<a href="https://colab.research.google.com/github/FahimShahriarAnik/NLP/blob/main/Exploring_Pytorch_and_NN_training_using_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## In this lab, you will construct a neural network in PyTorch. An introduction to PyTorch comes first, followed by simple model building similar to what we do in class.

### **Torch Tensors**
Tensors are the primary data objects in torch, analogous to NumPy arrays (`ndarray`).

In [None]:
import torch

# Initializing a tensor
# As in NumPy, tensors can be created in several ways
data = torch.tensor([
                     [0, 1],
                     [2, 3],
                     [4, 5]
                    ])
zeros = torch.zeros(2, 5)
ones = torch.ones(3, 4)
rr = torch.arange(1, 10)
print(data)
print(zeros)
print(ones)
print(rr)

tensor([[0, 1],
        [2, 3],
        [4, 5]])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])


In [None]:
# scalar operation in torch
print(rr + 2)
print(rr * 2)

tensor([ 3,  4,  5,  6,  7,  8,  9, 10, 11])
tensor([ 2,  4,  6,  8, 10, 12, 14, 16, 18])


In [None]:
# matrix multiplication written in two equivalent ways
a = torch.tensor([[1, 2], [2, 3], [4, 5]])      # shape (3, 2)
b = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])  # shape (2, 4), so a @ b has shape (3, 4)

print("The product is", a.matmul(b))
print("The other product is", a @ b) # @ is the matrix-multiplication operator

The product is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])
The other product is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])


In [None]:
# reshaping tensors
rr = torch.arange(1, 16)
print("The shape is currently", rr.shape)
print("The contents are currently", rr)
print()
rr = rr.view(5, 3)
print("After reshaping, the shape is currently", rr.shape)
print("The contents are currently", rr)

The shape is currently torch.Size([15])
The contents are currently tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

After reshaping, the shape is currently torch.Size([5, 3])
The contents are currently tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12],
        [13, 14, 15]])
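
The reshaping step above can be extended a little. The following is a minimal sketch, using an illustrative tensor `t` that is not part of the lab, showing that `-1` lets PyTorch infer one dimension and that `view` shares storage with the original tensor.

```python
import torch

# Illustrative tensor (an assumption for this sketch): 15 consecutive integers.
t = torch.arange(1, 16)

# -1 asks PyTorch to infer that dimension from the total number of elements.
as_rows = t.view(-1, 3)      # inferred shape: (5, 3)
as_cols = t.reshape(3, -1)   # inferred shape: (3, 5)
print(as_rows.shape, as_cols.shape)

# view returns a tensor that shares storage with t,
# so an in-place write through the view is visible in t as well.
as_rows[0, 0] = 100
print(t[0].item())
```

Because of this sharing, call `.clone()` first if the reshaped tensor should be modified independently.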


In [None]:
# converting between numpy and torch
import numpy as np

# numpy.ndarray --> torch.Tensor:
arr = np.array([[1, 0, 5]])
data = torch.tensor(arr)
print("This is a torch.tensor", data)

# torch.Tensor --> numpy.ndarray:
new_arr = data.numpy()
print("This is a np.ndarray", new_arr)

This is a torch.tensor tensor([[1, 0, 5]])
This is a np.ndarray [[1 0 5]]
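
One detail worth knowing about the conversion above: `torch.tensor(arr)` copies the data, while `torch.from_numpy(arr)` shares memory with the NumPy array. A small sketch of the difference, with variable names chosen only for illustration:

```python
import numpy as np
import torch

arr = np.array([[1, 0, 5]])

copied = torch.tensor(arr)      # makes an independent copy of the data
shared = torch.from_numpy(arr)  # shares memory with arr

# Modify the NumPy array in place.
arr[0, 0] = 99

print(copied)  # still holds the original value 1 in position [0, 0]
print(shared)  # reflects the change to 99
```

Likewise, `.numpy()` on a CPU tensor returns an array that shares memory with the tensor.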


In [None]:
# operations within a torch tensor
data = torch.arange(1, 36, dtype=torch.float32).reshape(5, 7)
print("Data is:\n", data)

# dim=0 collapses the row dimension, giving one sum per column...
print("Taking the sum over columns:")
print(data.sum(dim=0))

# and dim=1 collapses the column dimension, giving one sum per row.
print("Taking the sum over rows:")
print(data.sum(dim=1))

# Other operations are available:
print("Taking the stdev over rows:")
print(data.std(dim=1))

Data is:
 tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19., 20., 21.],
        [22., 23., 24., 25., 26., 27., 28.],
        [29., 30., 31., 32., 33., 34., 35.]])
Taking the sum over columns:
tensor([ 75.,  80.,  85.,  90.,  95., 100., 105.])
Taking the sum over rows:
tensor([ 28.,  77., 126., 175., 224.])
Taking the stdev over rows:
tensor([2.1602, 2.1602, 2.1602, 2.1602, 2.1602])
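
To make the role of `dim` more concrete, here is a hedged sketch on the same 5 x 7 data. It uses `keepdim=True` so that the reduced result can be broadcast back against the original, and it recomputes one standard deviation by hand, since `std` applies Bessel's correction (division by n - 1) by default.

```python
import torch

data = torch.arange(1, 36, dtype=torch.float32).reshape(5, 7)

col_sums = data.sum(dim=0)                # one value per column, shape (7,)
row_sums = data.sum(dim=1, keepdim=True)  # one value per row, shape (5, 1)

# Because keepdim=True keeps the reduced axis, the result broadcasts
# against data: every row of row_fractions sums to 1.
row_fractions = data / row_sums

# Standard deviation of the first row computed by hand with n - 1 in the denominator.
row = data[0]
manual_std = ((row - row.mean()) ** 2).sum().div(row.numel() - 1).sqrt()
print(manual_std, data.std(dim=1)[0])
```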


In [None]:
# there are many ways to get elements by index
print(data[3,1])
print(data[1])
print(data[:,3])
# we can also index with lists of row and column indices (integer-array indexing)
x_ids = [0,2,4,2]
y_ids = [1,3,2,1]
print(data[x_ids,y_ids])
# use item to get python scalar value
print(data[0, 0].item())

tensor(23.)
tensor([ 8.,  9., 10., 11., 12., 13., 14.])
tensor([ 4., 11., 18., 25., 32.])
tensor([ 2., 18., 31., 16.])
1.0
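
The index lists above pick elements by position. Selecting by condition uses a boolean mask instead. A minimal sketch on the same data, with the threshold 30 and the `row_mask` values chosen arbitrarily for illustration:

```python
import torch

data = torch.arange(1, 36, dtype=torch.float32).reshape(5, 7)

# A boolean mask has the same shape as data; True marks elements to keep.
mask = data > 30
selected = data[mask]  # 1-D tensor of the elements where mask is True
print(selected)

# A 1-D boolean mask along the first dimension selects whole rows.
row_mask = torch.tensor([True, False, True, False, False])
print(data[row_mask])  # rows 0 and 2
```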


### **Autograd**
PyTorch is well known for its automatic differentiation feature. We can call the `backward()` method to ask PyTorch to calculate the gradients, which are then stored in the `grad` attribute.

In [None]:
# Create an example tensor
# requires_grad parameter tells PyTorch to store gradients
x = torch.tensor([2.], requires_grad=True)
print(x)
# Print the gradient if it has been calculated
# Currently None since backward() has not been called yet
print(x.grad)

tensor([2.], requires_grad=True)
None


In [None]:
# Calculating the gradient of y with respect to x
y = x * x * 3 # 3x^2
y.backward()
print(x.grad) # d(y)/d(x) = d(3x^2)/d(x) = 6x = 12

tensor([12.])


In [None]:
z = x * x * 3 # 3x^2
z.backward()
print(x.grad)

tensor([24.])


We can see that `x.grad` is updated to be the sum of the gradients calculated so far. When we run backprop in a neural network, we sum up all the gradient contributions for a particular parameter before making an update. This is exactly what is happening here! This is also the reason why we need to run `zero_grad()` in every training iteration (more on this later). Otherwise our gradients would keep building up from one training iteration to the next, which would make our updates wrong.
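
The accumulation described above, and the effect of resetting the gradient, can be sketched as follows. The variable names `first`, `accumulated` and `after_reset` are introduced only for this illustration.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

(3 * x * x).backward()
first = x.grad.clone()        # 6x = 12

(3 * x * x).backward()
accumulated = x.grad.clone()  # 12 + 12 = 24, gradients add up

# Reset the stored gradient in place before the next backward pass.
x.grad.zero_()
(3 * x * x).backward()
after_reset = x.grad.clone()  # back to 12

print(first, accumulated, after_reset)
```

An optimizer's `zero_grad()` performs this kind of reset for every parameter it manages.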

In [None]:
import torch.nn as nn

### **Linear Layer**
We can use `nn.Linear(H_in, H_out)` to create a linear layer. This will take a tensor of `(N, *, H_in)` dimensions and output a tensor of `(N, *, H_out)` dimensions. The `*` denotes that there can be any number of dimensions in between. The linear layer performs the operation `Ax+b`, where `A` and `b` are initialized randomly. If we don't want the linear layer to learn the bias parameters, we can initialize our layer with `bias=False`.

In [None]:
# Create the inputs
input = torch.ones(2,3,4)
print(input)
# (N, *, H_in) -> (N, *, H_out)


# Make a linear layer transforming (N, *, H_in)-dimensional inputs to
# (N, *, H_out)-dimensional outputs
linear = nn.Linear(4, 2)
linear_output = linear(input)
linear_output

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])


tensor([[[-0.2021, -0.0600],
         [-0.2021, -0.0600],
         [-0.2021, -0.0600]],

        [[-0.2021, -0.0600],
         [-0.2021, -0.0600],
         [-0.2021, -0.0600]]], grad_fn=<ViewBackward0>)

In [None]:
list(linear.parameters()) # Ax + b

[Parameter containing:
 tensor([[-0.2412,  0.0789, -0.4416,  0.3624],
         [ 0.0619,  0.3204, -0.4371, -0.2285]], requires_grad=True),
 Parameter containing:
 tensor([0.0395, 0.2234], requires_grad=True)]
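
The `Ax+b` operation can be checked by hand from the layer's parameters. The sketch below assumes an arbitrary seed and uses the name `inp` instead of `input` purely for illustration. Note that the weight is stored with shape `(H_out, H_in)`, so for row-vector inputs the product is written with the transposed weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

linear = nn.Linear(4, 2)
inp = torch.ones(2, 3, 4)

# weight has shape (H_out, H_in) = (2, 4); bias has shape (2,)
print(linear.weight.shape, linear.bias.shape)

# Apply the affine map manually to the last dimension of inp.
manual = inp @ linear.weight.T + linear.bias
print(torch.allclose(manual, linear(inp)))

# With bias=False the layer has only a weight parameter.
no_bias = nn.Linear(4, 2, bias=False)
print(no_bias.bias)
```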

### **Activation Function Layer**
We can also use the `nn` module to apply activation functions to our tensors. Activation functions are used to add non-linearity to our network. Some examples of activation functions are `nn.ReLU()`, `nn.Sigmoid()` and `nn.LeakyReLU()`. Activation functions operate on each element separately, so the output tensor has the same shape as the input tensor.

In [None]:
sigmoid = nn.Sigmoid()
output = sigmoid(linear_output)
output

tensor([[[0.4496, 0.4850],
         [0.4496, 0.4850],
         [0.4496, 0.4850]],

        [[0.4496, 0.4850],
         [0.4496, 0.4850],
         [0.4496, 0.4850]]], grad_fn=<SigmoidBackward0>)
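
As a quick check of the element-wise behaviour, the sketch below compares `nn.Sigmoid()` with the formula 1 / (1 + exp(-x)) and applies `nn.ReLU()` to the same values. The tensor `t` is an arbitrary example, not data from the lab.

```python
import torch
import torch.nn as nn

# Arbitrary example values.
t = torch.tensor([[-2.0, 0.0, 2.0],
                  [-0.5, 0.5, 3.0]])

sigmoid = nn.Sigmoid()
relu = nn.ReLU()

# Sigmoid computed by hand, element by element.
manual_sigmoid = 1 / (1 + torch.exp(-t))
print(torch.allclose(sigmoid(t), manual_sigmoid))

# ReLU replaces negative entries with 0 and keeps the rest.
print(relu(t))

# The shape is unchanged by either activation.
print(sigmoid(t).shape == t.shape, relu(t).shape == t.shape)
```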

### **Putting the Layers Together**
So far we have seen that we can create layers and pass the output of one as the input of the next. Instead of creating intermediate tensors and passing them around, we can use `nn.Sequential`, which does exactly that.

In [None]:
block = nn.Sequential(
    nn.Linear(4, 2),
    nn.Sigmoid()
)

input = torch.ones(2,3,4)
output = block(input)
output

tensor([[[0.2666, 0.7752],
         [0.2666, 0.7752],
         [0.2666, 0.7752]],

        [[0.2666, 0.7752],
         [0.2666, 0.7752],
         [0.2666, 0.7752]]], grad_fn=<SigmoidBackward0>)
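
To see that `nn.Sequential` only chains its layers, the sketch below applies the two layers of a block one after the other and compares the result with calling the block directly. The seed and the name `inp` are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed

block = nn.Sequential(
    nn.Linear(4, 2),
    nn.Sigmoid()
)
inp = torch.ones(2, 3, 4)

# Layers inside a Sequential can be accessed by index.
hidden = block[0](inp)     # output of the linear layer
manual = block[1](hidden)  # output of the sigmoid

print(torch.allclose(manual, block(inp)))
```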

### **Optimization**
We have shown how gradients are calculated with the `backward()` function. Having the gradients isn't enough for our models to learn. We also need to know how to update the parameters of our models. This is where optimizers come in. The `torch.optim` module contains several optimizers that we can use. Some popular examples are `optim.SGD` and `optim.Adam`. When initializing an optimizer, we pass our model parameters, which can be accessed with `model.parameters()`, telling the optimizer which values it will be optimizing. Optimizers also have a learning rate (`lr`) parameter, which determines how big an update is made in every step. Different optimizers have other hyperparameters as well.
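
To illustrate what an optimizer step does, here is a minimal sketch with plain `optim.SGD` (no momentum) on a single hand-made parameter `w`. The parameter, the loss and the learning rate are all assumptions chosen for the example. For this optimizer, one step amounts to subtracting `lr * grad` from the parameter.

```python
import torch
import torch.optim as optim

# A single illustrative parameter (not part of the lab's model).
w = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = optim.SGD([w], lr=0.1)

# Simple loss whose gradient with respect to w is 2w.
loss = (w ** 2).sum()
loss.backward()

# What plain SGD should produce: w - lr * grad.
expected = w.detach() - 0.1 * w.grad

sgd.step()
print(w.detach(), expected)
```

Other optimizers such as Adam use the same `zero_grad()` / `backward()` / `step()` pattern but compute the update differently.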

In [None]:
import torch.optim as optim

In [None]:
# Create the y data
y = torch.ones(10, 5)

# Add some noise to our goal y to generate our x
# We want our model to predict the original data despite the noise
x = y + torch.randn_like(y)
x

tensor([[-0.4663,  1.1629,  2.9229,  1.7671,  1.5086],
        [ 2.5267,  0.6002, -0.2588,  2.3200,  1.3105],
        [ 0.9154,  0.0099,  1.1392,  3.0061,  0.6774],
        [ 0.1302,  0.3888, -0.8516,  1.0685,  0.1376],
        [ 2.2306, -0.3827,  3.0548, -0.5210,  1.0460],
        [ 0.6874,  2.1466,  2.1439,  1.8129, -0.1852],
        [ 2.3790, -0.7556,  0.1863,  1.2663,  1.6452],
        [ 3.0661,  2.3477, -0.0770,  3.0439,  2.1019],
        [ 1.2420, -0.5397,  0.1221, -0.0594, -0.3585],
        [ 0.4145,  1.9138,  1.4450, -0.1977,  0.7329]])

In [None]:
# Instantiate the model
model = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 5),
    nn.Sigmoid()
)

# Define the optimizer
adam = optim.Adam(model.parameters(), lr=1e-1)

# Define loss using a predefined loss function
loss_function = nn.BCELoss()

# Calculate how our model is doing now
y_pred = model(x)
loss_function(y_pred, y).item()

0.8271358609199524

Let's see if we can have our model achieve a smaller loss. Now that we have everything we need, we can set up our training loop.

In [None]:
# Set the number of epochs, which determines the number of training iterations
n_epoch = 10

for epoch in range(n_epoch):
  # Set the gradients to 0
  adam.zero_grad()

  # Get the model predictions
  y_pred = model(x)

  # Get the loss
  loss = loss_function(y_pred, y)

  # Print stats
  print(f"Epoch {epoch}: training loss: {loss}")

  # Compute the gradients
  loss.backward()

  # Take a step to optimize the weights
  adam.step()

Epoch 0: training loss: 0.8271358609199524
Epoch 1: training loss: 0.6808784604072571
Epoch 2: training loss: 0.6123683452606201
Epoch 3: training loss: 0.5250092148780823
Epoch 4: training loss: 0.4013831317424774
Epoch 5: training loss: 0.27092137932777405
Epoch 6: training loss: 0.1613507717847824
Epoch 7: training loss: 0.0897623747587204
Epoch 8: training loss: 0.051260724663734436
Epoch 9: training loss: 0.030743010342121124
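
The `nn.BCELoss` used in the loop above is the mean binary cross-entropy between predictions in (0, 1) and targets. A hedged sketch that recomputes it by hand on three made-up values (`p` and `target` are illustrative, not taken from the run above):

```python
import torch
import torch.nn as nn

# Made-up predictions (probabilities) and binary targets.
p = torch.tensor([0.9, 0.6, 0.2])
target = torch.tensor([1.0, 1.0, 0.0])

# Binary cross-entropy averaged over all elements.
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()
builtin = nn.BCELoss()(p, target)

print(manual.item(), builtin.item())
```

This also shows why the model above ends with `nn.Sigmoid()`: the loss expects its inputs to lie between 0 and 1.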


### **Tasks**
We will reconstruct the input as in the previous example, with a few modifications:
1. Create a torch tensor `y` of size (10, 10) with random values in the range (-3, 7).
2. Create `error` with random values in the range (-0.5, 0.5) and add it to `y` to create our `input`.
3. Create a model with three hidden layers, where each layer consists of a linear unit followed by a ReLU non-linearity, except for the third (final) layer, which has no non-linearity attached.
4. The second and third hidden layers have input dimensions of 7 and 11, respectively.
5. Run the subsequent code to train the model.
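
For tasks 1 and 2, one way to draw continuous values from a range [a, b) is to scale uniform samples from [0, 1) as `x * (b - a) + a`. The sketch below is one possible reading of the task, not the only valid one. The seed and the names `low`, `high` and `noisy_input` are assumptions.

```python
import torch

torch.manual_seed(0)  # arbitrary seed

low, high = -3.0, 7.0

# torch.rand samples uniformly from [0, 1); scale and shift to [low, high).
y = torch.rand(10, 10) * (high - low) + low

# Same idea for the error: [0, 1) shifted to [-0.5, 0.5).
error = torch.rand_like(y) - 0.5

noisy_input = y + error
print(y.min().item(), y.max().item())
```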

In [None]:
# todo code here
torch.manual_seed(13)

# 10 x 10 target matrix; randint draws integers from -3 to 6 (the upper bound 7 is exclusive)
y = torch.randint(-3, 7, (10, 10), dtype=torch.float32)
# error has the same shape as y
error = torch.rand_like(y) * 1.0 - 0.5 # scale uniform [0, 1) samples to [a, b) via x * (b - a) + a, with a = -0.5 and b = 0.5
# input is the sum of y and error
input = y + error
# sequential model
model2 = nn.Sequential(
    nn.Linear(10, 7),
    nn.ReLU(),
    nn.Linear(7, 11),
    nn.ReLU(),
    nn.Linear(11, 10),
    nn.Sigmoid()  # note: task 3 asks for no non-linearity on the final layer; Sigmoid restricts outputs to (0, 1)
)

# optimizer settings carried over from the earlier example
adam = optim.Adam(model2.parameters(), lr=1e-1)
# we will use mean squared error as loss
loss_function = nn.MSELoss()
n_epoch = 10
for epoch in range(n_epoch):
  # reset the grads to make sure they do not accumulate values from previous steps
  adam.zero_grad()
  # getting output using forward pass
  y_pred = model2(input)
  # calculating loss
  loss = loss_function(y_pred, y)
  # print
  print(f"Epoch {epoch}: training loss: {loss}")
  # backward pass: calculate gradient
  loss.backward()
  # update weights
  adam.step()

Epoch 0: training loss: 9.028261184692383
Epoch 1: training loss: 8.470184326171875
Epoch 2: training loss: 7.989218235015869
Epoch 3: training loss: 7.737677574157715
Epoch 4: training loss: 7.630360126495361
Epoch 5: training loss: 7.662440776824951
Epoch 6: training loss: 7.627350330352783
Epoch 7: training loss: 7.563365936279297
Epoch 8: training loss: 7.502579212188721
Epoch 9: training loss: 7.430051326751709
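
The loss above levels off at a high value. One likely reason is that a final `nn.Sigmoid()` keeps every prediction inside (0, 1), while the targets range from -3 to 6. The following sketch is a variant that follows task 3 literally by leaving the last linear layer without an activation. The names `model3`, `noisy_input`, `optimizer` and `losses` are introduced here for illustration, and no particular loss values are claimed.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(13)

# Same data construction as in the cell above.
y = torch.randint(-3, 7, (10, 10), dtype=torch.float32)
noisy_input = y + (torch.rand_like(y) - 0.5)

# Three linear layers; the final one has no non-linearity,
# so the outputs are not restricted to a fixed interval.
model3 = nn.Sequential(
    nn.Linear(10, 7),
    nn.ReLU(),
    nn.Linear(7, 11),
    nn.ReLU(),
    nn.Linear(11, 10)
)

optimizer = optim.Adam(model3.parameters(), lr=1e-1)
loss_function = nn.MSELoss()

losses = []
for epoch in range(10):
    optimizer.zero_grad()                         # clear old gradients
    loss = loss_function(model3(noisy_input), y)  # forward pass and loss
    losses.append(loss.item())
    loss.backward()                               # compute gradients
    optimizer.step()                              # update weights

print(losses[0], losses[-1])
```

Comparing the first and last entries of `losses` with the run above gives a rough sense of how much the output activation matters for this target range.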
