<a href="https://colab.research.google.com/github/cs17emds11029/googlecolab/blob/master/Pytorch_Learning_Three.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we set up a simple neural network with the help of the torch.nn package.

In [0]:
import torch
import numpy as np
import pandas as pd

We follow the neural networks tutorial from pytorch.org, which builds a network for image classification across 10 classes. The 10 classes could be, for example, the 10 digits in the MNIST dataset. For now we feed the network a random input image of a chosen size.

In [0]:
import torch.nn as nn
import torch.nn.functional as F

In [0]:
# https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py

class LeNet(nn.Module):
  # Every new network subclasses nn.Module; a Module is a logical grouping of ANN layers
  def __init__(self):
    super(LeNet,self).__init__() # Call the superclass's __init__ method
    # In the __init__ method we define the layers of the network with the required sizes
    self.conv1 = nn.Conv2d(1, 6, 3)
    # How to read the arguments of a convolution layer:
    # parameter 1 is the number of input channels: 1 for grayscale images, 3 for RGB images
    # parameter 2 is the number of output channels, chosen according to the network architecture
    # parameter 2 here is 6, so there are 6 output channels, and 6 kernels operate on the input channel
    # parameter 3 is the kernel size: a 3x3 square kernel, applied with the default stride of 1
    # the height and width of the image are not specified here
    self.conv2 = nn.Conv2d(6, 16, 3)
    # the output of this layer has 16 channels, each produced by a different convolution filter
    # each of these 16 channels can respond to a particular aspect of the image
    self.fc1 = nn.Linear(16*6*6, 120)
    # for the fully connected layer we fix the spatial size of each of the 16 channels at 6x6
    # why 6x6? For a 32x32 input: conv1 -> 30x30, pool -> 15x15, conv2 -> 13x13, pool -> 6x6
    # the number of output nodes is 120, chosen according to the architecture
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)
    # we conclude the network with 10 output nodes for 10 class classification
  
  def forward(self,x):
    # Now we complete the architecture by making the connections between the layers
    # We do this in the forward pass; torch.autograd records these operations and
    # computes the gradients by backpropagation (the parameter update is a separate step)
    x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
    # Here we passed the input image x to the conv1 layer, applied a ReLU activation to its output,
    # and then applied a max pooling operation with a 2x2 window
    # now x holds the output of the first Conv-ReLU-MaxPool block, with 6 channels
    x = F.max_pool2d(F.relu(self.conv2(x)),(2,2))
    # now x has 16 channels and its shape is (batch, 16, h, w)
    x = x.view(-1, self.num_flat_features(x))
    # Here we flatten each sample in the batch to a vector
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
  
  def num_flat_features(self,x):
    size = x.size()[1:] # all dimensions except the batch dimension
    num_features = 1
    for s in size:
      num_features *= s
    return num_features

In the cell above we defined several things: the architecture of our neural network, and the (mandatory) forward function that describes how the input image progresses through the network. It is also possible to print or display the intermediate representations of the image inside the forward pass.
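
One way to look at the intermediate representations without editing forward is a forward hook. The sketch below is only an illustration: `features`, `record_shape` and `seen_shapes` are hypothetical names, and the `nn.Sequential` model is a stand-in that mirrors the convolutional part of LeNet rather than the class defined above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the convolutional part of LeNet, written with nn.Sequential
features = nn.Sequential(
    nn.Conv2d(1, 6, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 3), nn.ReLU(), nn.MaxPool2d(2),
)

seen_shapes = []

def record_shape(module, inputs, output):
    """Forward hook: remember the layer type and the shape of its output."""
    seen_shapes.append((type(module).__name__, tuple(output.shape)))

# Attach the same hook to every layer, run one forward pass, then detach the hooks
handles = [layer.register_forward_hook(record_shape) for layer in features]
with torch.no_grad():
    features(torch.randn(1, 1, 32, 32))
for handle in handles:
    handle.remove()

for name, shape in seen_shapes:
    print(name, shape)
```

The last recorded shape is the 16-channel, 6x6 representation that the first fully connected layer expects.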

In [4]:
# Let's initialize the network
lenet = LeNet()
print(lenet)

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [5]:
params = list(lenet.parameters())
for param in params:
  print(param.size())

torch.Size([6, 1, 3, 3])
torch.Size([6])
torch.Size([16, 6, 3, 3])
torch.Size([16])
torch.Size([120, 576])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])


Let's understand the parameter tensors above. conv1 takes 1 input channel and produces 6 output channels with a 3x3 kernel, so it has 6 * 1 * 3 * 3 = 54 kernel weights; hence the conv1 weight tensor has size [6, 1, 3, 3]. The next tensor, of size 6, is the bias of conv1: one scalar per output channel, added after the convolution. These are not kernel cells. The remaining parameters, up to the output nodes, follow the same pattern of a weight tensor followed by a bias vector.
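
The sizes printed above can also be turned into a parameter count. This is a small sketch in plain Python; the `shapes` list is copied by hand from the output above, with two tensors per layer.

```python
from math import prod

# Shapes copied from the printed output above, two tensors per layer
shapes = [
    (6, 1, 3, 3), (6,),      # conv1
    (16, 6, 3, 3), (16,),    # conv2
    (120, 576), (120,),      # fc1
    (84, 120), (84,),        # fc2
    (10, 84), (10,),         # fc3
]

counts = [prod(shape) for shape in shapes]   # number of scalars in each tensor
total = sum(counts)

print(counts)  # [54, 6, 864, 16, 69120, 120, 10080, 84, 840, 10]
print(total)   # 81194
```

Most of the parameters sit in fc1, whose 120 x 576 tensor alone accounts for 69120 of them.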

In [6]:
input = torch.randn(1,1,32,32) # batch size is one, channel is one, and image resolution is 32x32
# after convolution with a 3x3 kernel (no padding) the image becomes 30x30, and the 2x2 max pool halves it to 15x15
out = lenet(input)
print(out)

tensor([[-0.0087,  0.0871,  0.1658, -0.0222,  0.0193,  0.0642,  0.1592,  0.0725,
         -0.0765, -0.0130]], grad_fn=<AddmmBackward>)
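
The spatial size of the image after each layer, and the 16*6*6 input size of fc1, can be checked with a little arithmetic. The helpers `conv_out` and `pool_out` below are hypothetical names written for this sketch; they assume square inputs, no padding, stride 1 for the convolutions, and a pooling stride equal to the pooling window.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (square input and kernel assumed)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial size after max pooling whose stride equals the window (floor mode)."""
    return size // window

size = 32                      # input resolution used in this notebook
sizes = [size]
for _ in range(2):             # two Conv(3x3) -> MaxPool(2x2) blocks
    size = conv_out(size, kernel=3)
    sizes.append(size)
    size = pool_out(size, window=2)
    sizes.append(size)

flat_features = 16 * size * size   # 16 channels after conv2

print(sizes)          # [32, 30, 15, 13, 6]
print(flat_features)  # 576
```

The final value matches the in_features=576 reported for fc1 when the network was printed.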


In [0]:
lenet.zero_grad()
out.backward(torch.randn(1,10))

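In the above step, we set the gradients of the network to zero and did a backward step, which backpropagates through the network starting from the output. Since the output is not a scalar, we pass backward() a random tensor of the same shape as the output to act as the upstream gradient; no real loss is involved yet. In the next code blocks, we compute an actual loss with respect to a dummy target.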
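
One way to read a call such as out.backward(torch.randn(1,10)) is that it yields the same gradients as backpropagating the scalar sum(out * v), where v is the tensor passed in. The sketch below checks this on a hypothetical tiny `nn.Linear` layer rather than on LeNet; `layer`, `x` and `v` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)          # hypothetical tiny stand-in for the network
x = torch.randn(1, 4)
v = torch.randn(1, 3)            # plays the role of the random tensor given to backward()

# Variant 1: pass v directly to backward()
layer.zero_grad()
layer(x).backward(v)
grad_vector = layer.weight.grad.clone()

# Variant 2: build an explicit scalar and call backward() without arguments
layer.zero_grad()
(layer(x) * v).sum().backward()
grad_scalar = layer.weight.grad.clone()

print(torch.allclose(grad_vector, grad_scalar))
```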

In [19]:
output = lenet(input)
target = torch.randn(10)
target = target.view(1,-1)
print(target)

tensor([[-1.1712,  0.0261, -0.0733,  1.6994,  0.1407, -0.3993,  0.9905,  0.8395,
         -0.0738, -1.7455]])


In [20]:
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

tensor(0.8887, grad_fn=<MseLossBackward>)
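
With its default settings, nn.MSELoss averages the squared differences between output and target over all elements. A minimal check, using random stand-in tensors (`demo_output` and `demo_target` are hypothetical names, not the tensors from the cells above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_output = torch.randn(1, 10)     # stand-in for the network output
demo_target = torch.randn(1, 10)     # stand-in for the dummy target

loss_builtin = nn.MSELoss()(demo_output, demo_target)
loss_manual = ((demo_output - demo_target) ** 2).mean()   # mean of squared errors

print(loss_builtin.item(), loss_manual.item())
```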


In [21]:
# Now we can use this loss to call the backward function, which computes the gradients
# We can look at the computation graph that backpropagation will follow, walking back
# from 'loss' towards the 'input' tensor.
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0])
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])


<MseLossBackward object at 0x7fc94c467320>
<AddmmBackward object at 0x7fc94c46f278>
<AccumulateGrad object at 0x7fc94c467320>
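
Instead of chaining next_functions by hand, the whole graph can be walked recursively. This is a sketch on a hypothetical small model (`tiny`), not on LeNet; `walk` is a helper written for this example, and the exact node names it prints depend on the PyTorch version.

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, names=None):
    """Depth-first walk over an autograd graph, collecting indented node names."""
    if names is None:
        names = []
    if fn is None:               # inputs that do not require grad have no node
        return names
    names.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

torch.manual_seed(0)
tiny = nn.Linear(3, 2)           # hypothetical small model
demo_loss = nn.MSELoss()(tiny(torch.randn(1, 3)), torch.randn(1, 2))
graph_names = walk(demo_loss.grad_fn)
print("\n".join(graph_names))
```

The AccumulateGrad nodes at the leaves are where the gradients of the weight and bias tensors are stored.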


We can also integrate this with TensorBoard to view the computation graph or the neural network architecture.

In [0]:
# Now let's do one manual update step: backpropagate the loss
# and update every parameter with plain gradient descent
lenet.zero_grad()
loss.backward()
learning_rate = 0.01
with torch.no_grad(): # the update itself must not be tracked by autograd
  for f in lenet.parameters():
    f.sub_(learning_rate*f.grad)

The above step is the basic update rule for the parameters, i.e.
new_value = old_value - learning_rate * gradient. However, the PyTorch developers provide an optimizer package, torch.optim, that implements various parameter update rules such as SGD (optionally with Nesterov momentum), Adam, etc.
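
To see that plain optim.SGD (no momentum, no weight decay) applies exactly this rule, the sketch below performs one manual step and one optimizer step on two identical copies of a hypothetical tiny model and compares the results. All names (`model_a`, `model_b`, `sgd`, `x`, `y`, `lr`) are illustrative.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(4, 2)              # hypothetical tiny model
model_b = copy.deepcopy(model_a)       # identical starting weights
x, y = torch.randn(8, 4), torch.randn(8, 2)
lr = 0.01

# Manual rule: new_value = old_value - lr * gradient
nn.functional.mse_loss(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim
sgd = optim.SGD(model_b.parameters(), lr=lr)
sgd.zero_grad()
nn.functional.mse_loss(model_b(x), y).backward()
sgd.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```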

In [0]:
import torch.optim as optim

# use stochastic gradient descent
optimizer = optim.SGD(lenet.parameters(), lr=0.01)
optimizer.zero_grad() # the optimizer holds references to the network's parameters
# Zeroing the gradients through the optimizer therefore zeroes the gradients of the
# underlying neural network. Several optimizers can be created over the same LeNet
# object this way, but note that they all update the same parameters in place
output = lenet(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()

In [30]:
print(params[9])

Parameter containing:
tensor([ 0.0066,  0.0910,  0.0441, -0.0424,  0.0419,  0.0550,  0.0838,  0.0952,
        -0.1043, -0.0216], requires_grad=True)
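
In practice the zero_grad / forward / backward / step sequence is repeated many times. The following is a minimal sketch of such a loop on a hypothetical tiny model with a fixed dummy input and target; every name in it (`toy_net`, `toy_input`, `toy_target`, `loss_fn`, `toy_optimizer`, `history`) is illustrative and not part of the notebook above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
toy_net = nn.Linear(4, 2)              # hypothetical stand-in for lenet
toy_input = torch.randn(1, 4)          # fixed dummy input
toy_target = torch.randn(1, 2)         # fixed dummy target
loss_fn = nn.MSELoss()
toy_optimizer = optim.SGD(toy_net.parameters(), lr=0.01)

history = []
for step in range(20):
    toy_optimizer.zero_grad()          # clear gradients from the previous step
    step_loss = loss_fn(toy_net(toy_input), toy_target)
    step_loss.backward()               # compute fresh gradients
    toy_optimizer.step()               # apply the update rule
    history.append(step_loss.item())

print(history[0], history[-1])
```

With a small learning rate the recorded loss should shrink from step to step on this fixed example.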
