# Neural Networks with `PyTorch` 🔥
### Tutorial for CSC2515, Fall 2021
### Author: Marta Skreta

In this notebook, I will give an introduction to `PyTorch` and how to get started on training models. This tutorial is adapted from the following sources:


*   [60 Minute Blitz with PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
*   Andrew Ng's [Introduction to PyTorch Examples](https://cs230.stanford.edu/blog/pytorch/) for CS230 at Stanford University
*   Dilara Soylu's [PyTorch Tutorial](https://web.stanford.edu/class/cs224n/materials/CS224N_PyTorch_Tutorial.html) for CS224N at Stanford University

I highly recommend looking at them to get a better understanding of `PyTorch` basics.


---

Why use `PyTorch`? `PyTorch` has two main advantages:


*   You can perform NumPy-like array operations on a GPU (which can decrease runtime)
*   It has automatic differentiation, which makes model training easier

Also, `PyTorch` is often considered more intuitive than other frameworks (e.g. TensorFlow 1) because it follows common Python practices more closely.

You can load `PyTorch` into your Python environment with the following line:





In [None]:
import torch

The basic data structure in `PyTorch` is the `tensor`. It is very similar to a NumPy `ndarray`, except that a `tensor` can also be stored and operated on on a GPU.

You can initialize `tensors` in a few different ways:

### A few ways to initialize a `tensor`

> 1. Directly from the data:

In [None]:
x = [[0,1], [2,3]]
x_tensor = torch.tensor(x)

print(f"Tensor from data list: \n {x_tensor} \n")


Tensor from data list: 
 tensor([[0, 1],
        [2, 3]]) 





> 2. From a `NumPy` array:



In [None]:
import numpy as np
x = np.array(x)
x_tensor = torch.from_numpy(x)

print(f"Tensor from NumPy: \n {x_tensor} \n")


Tensor from NumPy: 
 tensor([[0, 1],
        [2, 3]]) 
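One difference between these two approaches is worth knowing: `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` always copies the data. Here's a minimal sketch showing this (the array values are arbitrary):

```python
import numpy as np
import torch

arr = np.array([[0, 1], [2, 3]])
shared = torch.from_numpy(arr)  # shares arr's memory buffer (no copy)
copied = torch.tensor(arr)      # always copies the data into a new tensor

arr[0, 0] = 100                 # modify the NumPy array in place

print(shared[0, 0].item())  # 100: the change is visible through the shared tensor
print(copied[0, 0].item())  # 0: the copy is unaffected

# The reverse direction also shares memory: .numpy() on a CPU tensor
back = shared.numpy()
back[1, 1] = -1
print(shared[1, 1].item())  # -1
```

So if you convert with `torch.from_numpy` and later modify the original array, the tensor changes too; use `torch.tensor` when you want an independent copy.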




> 3. With random or constant values:




In [None]:
shape = (2,3)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.5993, 0.9360, 0.8067],
        [0.4213, 0.9121, 0.8752]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
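Since `torch.rand` gives different values on every run, you can fix the seed with `torch.manual_seed` when you need reproducible results. The `*_like` functions are also handy for creating a tensor with the same shape as an existing one. A short sketch (the seed value 42 is an arbitrary choice):

```python
import torch

# Seeding the random number generator makes "random" tensors reproducible
torch.manual_seed(42)
a = torch.rand(2, 3)
torch.manual_seed(42)
b = torch.rand(2, 3)
print(torch.equal(a, b))  # True: same seed, same values

# *_like constructors reuse the shape (and, by default, the dtype and device) of another tensor
x = torch.tensor([[0, 1], [2, 3]])
z = torch.zeros_like(x)                    # 2x2 integer tensor of zeros
o = torch.ones_like(x, dtype=torch.float)  # 2x2 float tensor of ones
r = torch.rand_like(o)                     # rand_like needs a floating-point dtype
print(z.shape, z.dtype)  # torch.Size([2, 2]) torch.int64
print(o.dtype)           # torch.float32
```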


## Moving a `Tensor` onto the GPU

As mentioned above, a major benefit of `PyTorch` is that you can perform tensor operations on a GPU. So how do you do that? `PyTorch` doesn't automatically move `tensors` onto the GPU even if you have one, so you have to do it explicitly in your code:

In [None]:
x = [[0, 1, 2], [3, 4, 5]]
x_tensor = torch.tensor(x)
print(f"Device tensor is stored on: {x_tensor.device}")

# Move the tensor to the GPU if one is available (in Colab, you can enable a GPU via Edit > Notebook settings)
if torch.cuda.is_available():
  x_tensor = x_tensor.to('cuda')
  print(f"Device tensor is stored on: {x_tensor.device}")

Device tensor is stored on: cpu
Device tensor is stored on: cuda:0
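A common pattern is to choose the device once and reuse it everywhere, so the same code runs with or without a GPU. A minimal sketch of that pattern (the variable names are just illustrative):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device...
a = torch.ones(2, 3, device=device)
# ...or move an existing tensor there with .to(device)
b = torch.tensor([[0., 1., 2.], [3., 4., 5.]]).to(device)

# Both operands live on the same device, so this addition works
c = a + b
print(c.device)

# NumPy arrays live in CPU memory, so move the tensor back before calling .numpy()
c_np = c.cpu().numpy()
print(c_np)
```

Keep in mind that tensors taking part in the same operation must live on the same device; mixing a CPU tensor and a GPU tensor raises an error.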


## `Tensor` Operations

There are **a lot** of operations you can perform on `tensors` (like with NumPy arrays). I've only listed a few below, but the full list can be found [here](https://pytorch.org/docs/stable/torch.html).

In [None]:
# Taking the mean
x = torch.tensor([[1., 1.],
                  [2., 2.],
                  [3., 3.],
                  [4., 4.]])

print(f"Mean: {x.mean()}")
print(f"Mean of each column (dim=0): {x.mean(0)}")
print(f"Mean of each row (dim=1): {x.mean(1)}")

Mean: 2.5
Mean of each column (dim=0): tensor([2.5000, 2.5000])
Mean of each row (dim=1): tensor([1., 2., 3., 4.])


In [None]:
# Concatenating 
x_cat0 = torch.cat([x, x, x], dim=0)
x_cat1 = torch.cat([x, x, x], dim=1)

print(f"Initial shape: {x.shape}")
print(f"Shape after concatenation in dimension 0: {x_cat0.shape}")
print(f"Shape after concatenation in dimension 1: {x_cat1.shape}")

Initial shape: torch.Size([4, 2])
Shape after concatenation in dimension 0: torch.Size([12, 2])
Shape after concatenation in dimension 1: torch.Size([4, 6])
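`torch.cat` joins tensors along an existing dimension, while the related `torch.stack` joins them along a new one. Reductions such as `mean()` also remove the reduced dimension unless you pass `keepdim=True`. A small sketch comparing the resulting shapes, reusing the same `(4, 2)` tensor as above:

```python
import torch

x = torch.tensor([[1., 1.],
                  [2., 2.],
                  [3., 3.],
                  [4., 4.]])

# cat: the number of dimensions stays the same (2-D in, 2-D out)
print(torch.cat([x, x, x], dim=0).shape)    # torch.Size([12, 2])

# stack: a new dimension is inserted at position `dim` (2-D in, 3-D out)
print(torch.stack([x, x, x], dim=0).shape)  # torch.Size([3, 4, 2])
print(torch.stack([x, x, x], dim=1).shape)  # torch.Size([4, 3, 2])

# Reductions drop the reduced dimension unless keepdim=True
print(x.mean(1).shape)                      # torch.Size([4])
print(x.mean(1, keepdim=True).shape)        # torch.Size([4, 1])
```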


## Autograd

`PyTorch` can perform automatic differentiation on tensors. As you apply a sequence of operations, `PyTorch` builds a graph behind the scenes that records which tensors depend on which. Then, when you call the `backward()` method, `PyTorch` automatically computes the gradients for you!

In [None]:
# requires_grad is a parameter that indicates whether we want to compute the gradient for a given tensor
x = torch.tensor([2.], requires_grad=True)
print(f"Current gradient of x: {x.grad}")

Current gradient of x: None


In [None]:
# now let's set up the following operation: y = 3x^2
y = 3 * x * x
print(y)

tensor([12.], grad_fn=<MulBackward0>)


In [None]:
# let's compute the gradient of y wrt x:
y.backward()


In [None]:
# now let's view the gradient values:
print(f"Gradient of y wrt x: {x.grad}")

# Checking it by hand: dy/dx = 6x = 6*2 = 12, which matches the answer!

Gradient of y wrt x: tensor([12.])


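Note that gradients *accumulate*: calling `backward()` again adds the new gradient to whatever is already stored in `x.grad` instead of overwriting it. For `z = 2x`, we have dz/dx = 2, so `x.grad` becomes 12 + 2 = 14:

In [None]:
z = 2*x
z.backward()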


In [None]:
print(f"Current gradient of x: {x.grad}")


Current gradient of x: tensor([14.])
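Because `backward()` adds to whatever is already in `.grad`, you normally reset the gradient before computing a fresh one. A minimal sketch that repeats the two computations above, but clears `x.grad` in between (setting it to `None` is one option; `x.grad.zero_()` is another):

```python
import torch

x = torch.tensor([2.], requires_grad=True)

y = 3 * x * x
y.backward()
first_grad = x.grad.clone()
print(first_grad)  # tensor([12.]): dy/dx = 6x = 12

# Reset the stored gradient before the next backward pass
x.grad = None      # x.grad.zero_() would also work

z = 2 * x
z.backward()
print(x.grad)      # tensor([2.]): dz/dx = 2, instead of the accumulated 12 + 2 = 14
```

This is exactly what `optimizer.zero_grad()` does for every model parameter during training.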


## Training a Neural Network

There are 5 core steps to train a neural network:


1.   Passing the data through the model
2.   Computing a loss
3.   Clearing the previous gradients (why is it important to do this?)
4.   Computing the gradients of all variables with respect to the loss
5.   Updating the model parameters based on the gradients



 > 1. Passing data through a model


 First, let's learn how to set up a model. We will use predefined building blocks in the `torch.nn` module of `PyTorch`, which we can later use to build more complicated models. 



In [None]:
import torch.nn as nn

We can use `nn.Linear(D_in, D_out)` to create a linear layer. It takes a matrix of shape `(N, D_in)` and outputs a matrix of shape `(N, D_out)`. The layer computes `xW^T + b`, where `x` is the model input, `W` is the weight matrix of shape `(D_out, D_in)`, and `b` is the bias. If you don't want your model to learn a bias, you can set `bias=False`.



In [None]:
# model input where (N, D_in) = (10, 4) --> ten samples, each of 4 dimensions
input = torch.ones(10, 4)
# Make a linear layer transforming the number of dimensions from 4 to 2
# Notice that we don't care how many samples we are passing in, we just care 
# what dimensions each sample will have before and after being passed through the layer
linear = nn.Linear(4,2)
output = linear(input)

print(f"Model output: {output}")

Model output: tensor([[0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501],
        [0.1611, 0.9501]], grad_fn=<AddmmBackward0>)
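To check what `nn.Linear` computes, you can inspect its `weight` and `bias` and redo the computation by hand. `PyTorch` stores `weight` with shape `(D_out, D_in)`, so for a batch of row vectors the layer computes `x @ W.T + b`. A small sketch (the random seed and input are arbitrary; the tolerance in `allclose` just absorbs floating-point rounding):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the randomly initialized weights are reproducible

linear = nn.Linear(4, 2)
print(linear.weight.shape)  # torch.Size([2, 4]) -> (D_out, D_in)
print(linear.bias.shape)    # torch.Size([2])

x = torch.randn(10, 4)      # N = 10 samples with D_in = 4 features each
out = linear(x)
manual = x @ linear.weight.T + linear.bias  # the same affine map, written out by hand

print(out.shape)                              # torch.Size([10, 2])
print(torch.allclose(out, manual, atol=1e-6)) # True
```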


There are a number of `torch.nn` modules you can use apart from `nn.Linear`, such as `nn.LSTM`, `nn.Conv2d`, `nn.MaxPool2d`, `nn.BatchNorm1d`.

You can apply non-linear activations to your tensors. Some common ones include: `nn.ReLU()`, `nn.Sigmoid()` and `nn.LeakyReLU()`. 

In [None]:
sigmoid = nn.Sigmoid()
output_sigmoid = sigmoid(output)

print(f"Output after passing through non-linearity: {output_sigmoid}")

Output after passing through non-linearity: tensor([[0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211],
        [0.5402, 0.7211]], grad_fn=<SigmoidBackward0>)
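Activation modules act element-wise, so you can check them on a few known values. A short sketch with `nn.ReLU`, `nn.LeakyReLU` (which uses a default `negative_slope` of 0.01) and `nn.Sigmoid`; the input values are arbitrary:

```python
import torch
import torch.nn as nn

v = torch.tensor([-2., -0.5, 0., 1.5])

relu = nn.ReLU()
leaky = nn.LeakyReLU()  # negative_slope defaults to 0.01
sigmoid = nn.Sigmoid()

print(relu(v))   # negative entries are set to 0
print(leaky(v))  # negative entries are multiplied by 0.01 instead of zeroed
print(sigmoid(torch.tensor([0.])))  # tensor([0.5000]) since sigmoid(0) = 1 / (1 + e^0) = 0.5
```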


We can also combine `PyTorch` modules into a single block using `nn.Sequential`. That way, we can pass our data into the block in a single step, and it will perform all of our operations in order.

In [None]:
# Here, we are making a model block of one linear layer and one sigmoid activation

model = nn.Sequential(nn.Linear(4,2), nn.Sigmoid())
output = model(input)
print(f"Model output: {output}")

Model output: tensor([[0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648],
        [0.6118, 0.2648]], grad_fn=<SigmoidBackward0>)
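`nn.Sequential` simply calls its modules in order, and you can index into it to reach each layer. A small sketch confirming that the block gives the same result as applying its layers one at a time:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.Sigmoid())
x = torch.ones(10, 4)

out_block = model(x)
out_manual = model[1](model[0](x))  # model[0] is the nn.Linear, model[1] the nn.Sigmoid

print(torch.allclose(out_block, out_manual))  # True
print(model[0].weight.shape)                  # torch.Size([2, 4])
print(len(model))                             # 2
```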


### Defining a Custom Model

We can build custom models by subclassing `nn.Module`. This allows us to build more complicated models or add custom features. To do so, you define your model class as a subclass of `nn.Module` and implement two methods:



1.   the `__init__()` method, which defines the model's layers (its architecture) when the model is created
2.   the `forward()` method, which defines the forward pass: how a batch of your data flows through those layers



In [None]:
class CustomModel(nn.Module):
  def __init__(self, input_size, hidden_size, output_size):
    super(CustomModel, self).__init__()

    self.D_in = input_size
    self.H = hidden_size
    self.D_out = output_size

    self.linear = nn.Linear(self.D_in, self.H) 
    self.relu = nn.ReLU()
    self.linear2 = nn.Linear(self.H, self.D_out)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    linear = self.linear(x)
    relu = self.relu(linear)
    linear2 = self.linear2(relu)
    output = self.sigmoid(linear2)
    return output
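Assigning layers as attributes in `__init__()` is what registers their weights with the model; this is how `model.parameters()` (which we'll later hand to the optimizer) finds them. A sketch listing the parameters of a model with the same architecture as `CustomModel` (the class is repeated in a condensed form so the snippet runs on its own):

```python
import torch.nn as nn

class CustomModel(nn.Module):
  def __init__(self, input_size, hidden_size, output_size):
    super().__init__()
    self.linear = nn.Linear(input_size, hidden_size)
    self.relu = nn.ReLU()
    self.linear2 = nn.Linear(hidden_size, output_size)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    return self.sigmoid(self.linear2(self.relu(self.linear(x))))

model = CustomModel(input_size=10, hidden_size=5, output_size=3)

# Only the layers with learnable weights (the two nn.Linear layers) contribute parameters
for name, param in model.named_parameters():
  print(name, tuple(param.shape))
# linear.weight (5, 10)
# linear.bias (5,)
# linear2.weight (3, 5)
# linear2.bias (3,)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 5*10 + 5 + 3*5 + 3 = 73
```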

Let's do a forward pass through our model:

In [None]:
# Sample input
input = torch.randn(50, 10)
# Initialize our model
model = CustomModel(input_size=10, hidden_size=5, output_size=3)
# Forward pass through model (this calls the forward() method)
# The output is a matrix of N x 3, where 3 is the number of 
# features in the output (e.g. 3-class classification)
y_pred = model(input)

print(f"Model output: {y_pred}")

Model output: tensor([[0.4220, 0.4772, 0.5618],
        [0.4114, 0.4716, 0.5666],
        [0.4284, 0.4660, 0.5539],
        [0.5676, 0.4454, 0.5301],
        [0.5361, 0.4668, 0.4520],
        [0.5370, 0.4934, 0.4697],
        [0.6396, 0.4531, 0.5346],
        [0.4993, 0.4860, 0.5049],
        [0.5852, 0.4640, 0.4669],
        [0.6652, 0.4917, 0.4158],
        [0.4619, 0.4594, 0.5764],
        [0.4753, 0.5040, 0.5377],
        [0.4754, 0.4726, 0.5403],
        [0.4455, 0.4638, 0.5658],
        [0.4694, 0.4858, 0.4642],
        [0.5891, 0.4623, 0.4335],
        [0.5883, 0.5256, 0.4208],
        [0.4852, 0.4465, 0.5768],
        [0.5672, 0.5171, 0.4670],
        [0.4866, 0.5052, 0.4919],
        [0.6213, 0.4330, 0.5050],
        [0.4521, 0.4562, 0.5715],
        [0.5040, 0.4458, 0.5824],
        [0.5356, 0.4772, 0.4922],
        [0.4180, 0.4748, 0.5498],
        [0.4705, 0.5018, 0.5226],
        [0.6336, 0.4589, 0.4672],
        [0.4435, 0.4857, 0.5515],
        [0.4610, 0.4956, 0.5438],


 > 2. Computing a Loss


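Now that we have the outputs from the model, we can compute the loss between our model's predictions and the true labels. The `torch.nn` module has many standard loss functions. Here's an example using Cross Entropy Loss. Recall that the output from our model was an `N x 3` matrix, where `N` is the number of samples and 3 is the number of classes `C`. We also need a tensor of ground-truth labels with shape `(N,)`, where each element is the class index for that sample (i.e. an integer in `[0, C-1]`). Note that `nn.CrossEntropyLoss` applies a log-softmax internally and expects raw, unnormalized scores (logits); since our `CustomModel` ends with a sigmoid, this example only illustrates the API, and in practice you would remove the final activation when using this loss.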

In [None]:
target = torch.empty(50, dtype=torch.long).random_(3)

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(y_pred, target)
print(f"Model loss: {loss}")


Model loss: 1.094733715057373
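`nn.CrossEntropyLoss` combines a log-softmax over the class dimension with the negative log-likelihood loss, which is why it expects raw logits rather than probabilities. A sketch checking this equivalence on random logits, with shapes mirroring the example above (50 samples, 3 classes; the seed is arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(50, 3)          # raw, unnormalized scores with shape (N, C)
target = torch.randint(0, 3, (50,))  # one class index in [0, C-1] per sample, shape (N,)

ce = nn.CrossEntropyLoss()(logits, target)

# The same loss built from its two pieces: log-softmax, then negative log-likelihood
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, target)
# ...which is just the average of -log p(correct class) over the samples
by_hand = -log_probs[torch.arange(50), target].mean()

print(torch.allclose(ce, manual, atol=1e-6))   # True
print(torch.allclose(ce, by_hand, atol=1e-6))  # True
```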




> Choosing an Optimizer

The `torch.optim` package provides an easy-to-use interface for common optimization algorithms. Below is how you would define an optimizer, e.g. the Adam optimizer. When you initialize the optimizer, you pass in the model parameters that should be updated at every iteration.



In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)



> 3. Clearing the previous gradients 

Before you calculate your gradients and update your model parameters, you have to clear any gradients left over from previous steps (they are stored in each parameter's `.grad` attribute). Otherwise, because `backward()` adds to the existing gradients, you will keep accumulating gradients over several iterations, which leads to incorrect updates! This is a very common error when people first start coding with `PyTorch`, so be aware!





In [None]:
optimizer.zero_grad()



> 4. Computing gradients

Now, we want to compute the gradients of the loss with respect to the parameters. This all happens behind the scenes; you just need to write one simple line:




In [None]:
loss.backward()



> 5. Updating model parameters 

Now that we've computed all the necessary gradients, let's update our model parameters by taking a step with our optimizer:






In [None]:
optimizer.step()
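To make `optimizer.step()` less magical: for plain stochastic gradient descent (`torch.optim.SGD` without momentum or weight decay), a step updates each parameter as `p = p - lr * p.grad`. Adam's update rule is more involved, so this sketch swaps in SGD and checks that rule on a tiny linear layer; the layer sizes, data, and learning rate are arbitrary choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.1
layer = nn.Linear(3, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

x = torch.randn(8, 3)
y = torch.randn(8, 1)

optimizer.zero_grad()             # step 3: clear old gradients
loss = nn.MSELoss()(layer(x), y)  # steps 1-2: forward pass and loss
loss.backward()                   # step 4: compute gradients

# Predict the SGD update by hand before the optimizer applies it
expected = [(p - lr * p.grad).detach() for p in layer.parameters()]

optimizer.step()                  # step 5: update the parameters in place

for p, e in zip(layer.parameters(), expected):
  print(torch.allclose(p, e))     # True for both the weight and the bias
```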

And those are the fundamental parts of training a neural network with `PyTorch`! Let's put it all together and see if we can learn something:

In [None]:
# Sample input

# Target: a 10 x 5 tensor of ones
target = torch.ones((10,5), dtype=torch.float)
# Add Gaussian noise to the target to make our input --> let's see if we can recover the target!
input = target + torch.randn(target.shape)

print(f"Training data: {input}")
print(f"Target data: {target}")
print(f"Shape of training data: {input.shape}")
print(f"Shape of training targets: {target.shape}")


Training data: tensor([[ 1.4522,  1.4338,  0.5959, -0.7123,  1.1290],
        [ 1.0870,  0.4542,  0.6775,  0.1876,  1.6414],
        [ 0.7295,  1.0679,  0.9739,  1.7429,  0.8815],
        [ 1.5838, -1.0341,  0.5970, -0.8812,  1.1014],
        [-0.3065,  2.2987, -0.4444,  1.9551,  1.5242],
        [ 0.3595,  0.2741,  1.2791,  1.5211, -0.1085],
        [ 1.8763, -0.1687,  1.3902,  1.6630,  1.2452],
        [ 1.6329,  1.7060, -0.0257,  1.7950,  0.8797],
        [ 1.1290,  2.5302,  0.4437,  0.4617,  0.9326],
        [ 2.9515,  2.4048,  2.2147,  0.1978, -0.3741]])
Target data: tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
Shape of training data: torch.Size([10, 5])
Shape of training targets: torch.Size([10, 5])


In [None]:
# Initialize our model
model = CustomModel(input_size=5, hidden_size=3, output_size=5)
# Define our loss (Binary Cross Entropy Loss)
loss_fn = nn.BCELoss()
# Initialize our optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# total number of epochs we want to train over
epochs = 20
# set the model to training mode
model.train()

for epoch in range(epochs):
  # erase any stored gradients
  optimizer.zero_grad()
  # forward pass through model
  y_pred = model(input)
  # get the loss:
  loss = loss_fn(y_pred, target)
  # metric: sum of absolute errors over all elements (the same as nn.L1Loss(reduction='sum');
  # divide by target.numel() to get the mean absolute error)
  sum_abs_error = torch.sum(torch.abs(y_pred - target))
  # Compute the gradients
  loss.backward()
  # Print stats
  # Notice that I print `loss.item()` (a plain Python float) rather than the `loss` tensor itself.
  # Storing or accumulating the `loss` tensor (e.g. `total_loss += loss`) keeps its whole
  # computation graph alive, so memory usage builds up over time and can lead to memory errors!
  print(f"Epoch {epoch}: training loss: {loss.item()} \t Sum of absolute errors: {sum_abs_error}")

  # Take a step to optimize the weights
  optimizer.step()


Epoch 0: training loss: 0.7005239725112915 	 Sum of absolute errors: 24.63335418701172
Epoch 1: training loss: 0.5813505053520203 	 Sum of absolute errors: 21.56714630126953
Epoch 2: training loss: 0.42759600281715393 	 Sum of absolute errors: 16.952749252319336
Epoch 3: training loss: 0.2639540433883667 	 Sum of absolute errors: 11.26090145111084
Epoch 4: training loss: 0.13721483945846558 	 Sum of absolute errors: 6.203320503234863
Epoch 5: training loss: 0.06214423105120659 	 Sum of absolute errors: 2.917966604232788
Epoch 6: training loss: 0.025949794799089432 	 Sum of absolute errors: 1.2489770650863647
Epoch 7: training loss: 0.01069071888923645 	 Sum of absolute errors: 0.5226072072982788
Epoch 8: training loss: 0.004489819519221783 	 Sum of absolute errors: 0.22160351276397705
Epoch 9: training loss: 0.0019510985584929585 	 Sum of absolute errors: 0.09685015678405762
Epoch 10: training loss: 0.0008827897836454213 	 Sum of absolute errors: 0.04396253824234009
Epoch 11: training loss: 0.0004167843144387007 	 Sum of absolute errors: 0.0207928419113159

Let's see how our model does on new, unseen data:


In [None]:
# Test set
test_y = torch.ones((10,5), dtype=torch.float)
test_x = test_y + torch.randn(test_y.shape)
print(f"Test set: {test_x}")

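# Set the model to eval mode: this switches layers such as dropout and batch norm to
# inference behaviour (our model has neither, but it is good practice)
model.eval()
# torch.no_grad() stops PyTorch from recording operations for gradient computation
with torch.no_grad():
  # forward pass
  y_pred = model(test_x)
  sum_abs_error = torch.sum(torch.abs(y_pred - test_y))

print(f"Model predictions on test set: {y_pred}")
print(f"Sum of absolute errors on test set: {sum_abs_error}")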



Test set: tensor([[ 0.5713,  2.9225,  1.7564,  2.0633,  2.1324],
        [ 0.3882,  1.0137,  0.5670,  0.1869,  1.9320],
        [ 3.2297,  2.5533,  1.9314, -0.5266, -0.0843],
        [-1.0570,  2.8383,  2.5270,  1.9148,  0.4592],
        [ 0.8176,  0.6290,  0.0762,  0.3075,  0.8088],
        [ 0.3096,  2.6133,  2.2863,  0.5647,  3.1094],
        [ 1.4065,  0.9189,  1.2627,  1.4747,  1.9253],
        [ 1.9570,  0.2657, -0.5802,  0.3128,  0.8996],
        [-0.8213,  1.6385,  0.1345,  0.4373,  1.7196],
        [-0.4279,  0.7339,  0.7273,  1.4645,  1.2706]])
Model predictions on test set: tensor([[1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        

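The model recovers the all-ones target from the noisy test inputs pretty well! Keep in mind, though, that since the target is constant, the model only has to learn to output values close to 1.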
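Note that `model.eval()` and `torch.no_grad()` do different things: `eval()` switches layers such as dropout to inference behaviour, while `no_grad()` stops `PyTorch` from recording operations for gradient computation. A sketch using a hypothetical toy model with a dropout layer (not the `CustomModel` above, which has no dropout):

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only to show the effect of eval() on dropout
model = nn.Sequential(nn.Linear(5, 5), nn.Dropout(p=0.5), nn.Sigmoid())
x = torch.randn(4, 5)

model.eval()
out_eval = model(x)
print(out_eval.requires_grad)    # True: eval() alone still records a graph

with torch.no_grad():
  out_nograd = model(x)
print(out_nograd.requires_grad)  # False: no graph is recorded

# In eval mode dropout is disabled, so repeated forward passes give identical outputs
print(torch.equal(model(x), model(x)))  # True
```

In practice you usually combine both when evaluating, as in the test cell above.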

That's it for the introduction to training neural networks with `PyTorch`. I highly recommend following this [60 Minute Blitz with PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) to gain a better understanding of `PyTorch` basics. Specifically, you should learn about `PyTorch`'s `Dataset` and `DataLoader` classes next!