# PyTorch Basics

*PyTorch* is a library for building and training neural-network-based models. It provides functions for matrix and vector operations, for computing losses and metrics, and for optimization. Crucially, it also implements automatic differentiation: whenever you apply a built-in PyTorch operation to tensors that track gradients, PyTorch records that operation in a computation graph. Back-propagation then walks this graph to compute the gradient of the final result with respect to each intermediate step.
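As a minimal sketch of automatic differentiation, the snippet below builds a tiny computation and asks PyTorch for its gradient. The function `y = x**2 + 2*x` and the starting value 3.0 are arbitrary choices made for illustration.

```python
import torch

# A scalar tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Each operation applied to x is recorded as it runs.
y = x ** 2 + 2 * x

# Back-propagate from y; this fills in x.grad with dy/dx = 2x + 2.
y.backward()

print(x.grad)  # tensor(8.)
```

Here `requires_grad=True` is what tells PyTorch to record operations on `x`; tensors created without it are treated as constants.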

## Tensors

![tensors.png](attachment:tensors.png)

In PyTorch (and other popular neural network libraries), all operations act on *tensors* of data.

These tensors are just multi-dimensional arrays of numbers. A 1-dimensional tensor is an array indexed by one number (a vector), a 2-dimensional tensor is an array indexed by two numbers (a matrix), and so on.

In PyTorch, tensors are 0-indexed.
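The following sketch shows tensors with different numbers of dimensions and how 0-based indexing picks out elements. The variable names and values are only illustrative.

```python
import torch

scalar = torch.tensor(5.0)                        # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])            # 1 dimension
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2 dimensions
cube = torch.zeros(2, 3, 4)                       # 3 dimensions

# Print the number of dimensions and the shape of each tensor.
for name, t in [('scalar', scalar), ('vector', vector),
                ('matrix', matrix), ('cube', cube)]:
    print(name, t.dim(), tuple(t.shape))

print(vector[0])     # tensor(1.), the first element
print(matrix[1, 0])  # tensor(3.), second row, first column
```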

In [1]:
import torch
import torch.nn.functional as F
import numpy as np
import os

## Practice with tensor operations

In [2]:
a = torch.FloatTensor([-1, 1]).view(2, 1) # Create a 2x1 matrix.
b = torch.FloatTensor([0,1]).view(1, 2) # Create a 1x2 matrix.
c = a @ b # Perform matrix multiplication on a and b.
print(f'c has a shape of {c.shape}, this means that it has {c.shape[0]} rows and {c.shape[1]} columns.')
print(f'c:\n {c}')

c has a shape of torch.Size([2, 2]), this means that it has 2 rows and 2 columns.
c:
 tensor([[-0., -1.],
        [ 0.,  1.]])


### Exercise
Compute `np.matmul(np.array([[1,2],[3,4],[5,6]]), np.array([[1],[2]])) + np.array([1,0,1])` with PyTorch.

In [3]:
a = torch.arange(24).view(4, 2, 3) # Create a tensor of shape 4x2x3.
# torch.arange(n) creates a 1-D tensor of the integers 0, 1, ..., n-1.
print(f'a:\n {a}')

a_sum = a.sum(dim=1) # The sum function adds up all the values along the specified dimension.
                     # Here dim=1 is the second dimension (size 2), so the 4x2x3 tensor becomes 4x3.
print(f'a_sum has a shape of {a_sum.shape}')
print(f'a_sum: {a_sum}')

# Try changing the dimension in a.sum to see how it affects the result. What happens if you don't pass in any dimension at all?

a:
 tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23]]])
a_sum has a shape of torch.Size([4, 3])
a_sum: tensor([[ 3,  5,  7],
        [15, 17, 19],
        [27, 29, 31],
        [39, 41, 43]])


## Building a simple model
We're now going to build a simple linear regression model. 

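To begin with, we will generate random inputs `train_X` and noisy targets `train_Y` to make up our training dataset. The targets are a linear function of the inputs with weights `[2, -1, 4]`, plus a small amount of Gaussian noise.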

In [4]:
n_data = 1000
dim = 3

train_X = torch.randn(n_data, dim) * 2

eps = torch.randn(n_data, 1) * 0.1
train_Y = (train_X @ torch.FloatTensor([2, -1, 4]).view(dim, 1)) + eps

Next we will define our regression model.

In [5]:
class LinearRegressor(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        
        self.W = torch.nn.Linear(d_in, d_out) 
        # This creates a "linear" module, which holds a weight matrix and a bias vector
        # and computes x @ weight.T + bias.
        # When a new instance of this class is created, this module is initialized with random values.
        # Since this class subclasses torch.nn.Module, all of the parameters in the linear module
        # are registered as parameters of this LinearRegressor.
    
    def forward(self, x):
        # The forward function is applied whenever this model is called on some input data.
        
        y_h = self.W(x) # Apply our linear operation to the input.
        return y_h

model = LinearRegressor(dim, 1)
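To see which parameters such a module registers, here is a minimal, self-contained sketch. It redefines the class so that it runs on its own; the dictionary `shapes` and the test input `x` are names introduced only for this illustration.

```python
import torch

class LinearRegressor(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = torch.nn.Linear(d_in, d_out)

    def forward(self, x):
        return self.W(x)

model = LinearRegressor(3, 1)

# Parameters of the inner linear module show up under the attribute name "W".
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)  # {'W.weight': (1, 3), 'W.bias': (1,)}

# The linear module computes x @ weight.T + bias.
x = torch.randn(5, 3)
manual = x @ model.W.weight.T + model.W.bias
print(torch.allclose(model(x), manual, atol=1e-6))
```

Note that the weight is stored with shape `(d_out, d_in)`, which is why the manual computation transposes it.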

Next we need an optimizer. 

The optimizer updates the values of all of the parameters given to it. Each time an optimization step is run, every parameter moves a small amount in the direction that reduces the loss; the learning rate controls how large each step is.

In [6]:
# Create a stochastic gradient descent optimizer and give it all of our model's parameters to optimize.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
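For plain SGD with no momentum or weight decay, one optimization step should amount to `p = p - lr * p.grad` for every parameter `p`. The sketch below checks this on a small, randomly generated problem; the layer, the data, and the name `expected` are assumptions made for the illustration and are not part of the model above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

layer = torch.nn.Linear(3, 1)
lr = 0.1
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

# Random inputs and targets, used only to produce some gradients.
x = torch.randn(8, 3)
y = torch.randn(8, 1)

loss = F.mse_loss(layer(x), y)
optimizer.zero_grad()
loss.backward()

# What a hand-written SGD update would produce for each parameter.
expected = [p.detach() - lr * p.grad for p in layer.parameters()]

optimizer.step()

# Compare the optimizer's result with the hand-written update.
matches = [torch.allclose(p, e) for p, e in zip(layer.parameters(), expected)]
print(matches)
```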

And finally, we can train our model. 

When training neural networks, it is common to run optimization steps over every input in the dataset multiple times. Each pass through the entire dataset is called one *epoch* of training, and models are usually trained for multiple epochs. Our dataset is small enough that each epoch here is a single optimization step computed on the full dataset.

In [7]:
n_epochs = 3

# Train our model for n_epochs epochs.
for i in range(n_epochs):
    yh = model(train_X)
    
    loss = F.mse_loss(yh, train_Y) # Compute mean squared error between our model output and the correct labels.
    
    optimizer.zero_grad() # Set all gradients to 0.
    loss.backward() # Calculate the gradient of the loss w.r.t. every parameter used to compute it.
    optimizer.step() # Update all model parameters according to calculated gradients.
    
    print(f'epoch {i}, W has value:')
    print(list(model.W.parameters()))
    print('\n')
    
# Our model can now be used to predict y values for new X as follows.
new_X = torch.arange(dim).float().view(1, dim)
new_y_h = model(new_X)
print(f'The model predicts {new_y_h}')

epoch 0, W has value:
[Parameter containing:
tensor([[ 1.3248, -0.6409,  3.2648]], requires_grad=True), Parameter containing:
tensor([0.0716], requires_grad=True)]


epoch 1, W has value:
[Parameter containing:
tensor([[ 1.8168, -0.8918,  3.8428]], requires_grad=True), Parameter containing:
tensor([0.0652], requires_grad=True)]


epoch 2, W has value:
[Parameter containing:
tensor([[ 1.9507, -0.9709,  3.9662]], requires_grad=True), Parameter containing:
tensor([0.0543], requires_grad=True)]


The model predicts tensor([[7.0159]], grad_fn=<AddmmBackward>)
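The loop above takes one step per epoch on the full dataset. On larger datasets, an epoch is usually split into mini-batches, with one optimization step per batch. The sketch below shows one way to do this with `TensorDataset` and `DataLoader` from `torch.utils.data`; the batch size of 100, the learning rate of 0.01, and the counter `steps` are illustrative choices, and the data is regenerated so that the block runs on its own.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Same kind of synthetic data as in the training set above.
X = torch.randn(1000, 3) * 2
Y = X @ torch.tensor([[2.0], [-1.0], [4.0]]) + torch.randn(1000, 1) * 0.1

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# The loader reshuffles the data and yields it in batches of 100 rows.
loader = DataLoader(TensorDataset(X, Y), batch_size=100, shuffle=True)

n_epochs = 3
steps = 0
for epoch in range(n_epochs):
    for batch_X, batch_Y in loader:
        loss = F.mse_loss(model(batch_X), batch_Y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        steps += 1
    print(f'epoch {epoch}, last batch loss {loss.item():.4f}')

print(steps)  # 30: 10 batches per epoch for 3 epochs
```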
