In [1]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

We will now build a simple deep learning model using PyTorch. We already encountered $autograd$ in the previous lecture. $nn$ is built on top of $autograd$ and contains the basic pre-built neural network layers, such as convolution, pooling, recurrent and dropout layers.

$nn.functional$ contains the same building blocks in function form, such as convolution, pooling and activation functions. Unlike the layers in $nn$, these functions do not hold any learnable parameters themselves; any weights must be passed in as arguments.
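To make the difference between the two concrete, here is a small sketch that runs one conv, ReLU and pooling step in both styles on a random input. The module version owns its weight and bias; the functional version receives them explicitly, so both should give identical results.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 20, 20)  # (batch, channels, height, width)

# Module form: nn.Conv2d creates and stores its own weight and bias
conv = nn.Conv2d(1, 5, 3)
y_module = nn.MaxPool2d(2)(nn.ReLU()(conv(x)))

# Functional form: the same operations, with the parameters passed in explicitly
y_functional = F.max_pool2d(F.relu(F.conv2d(x, conv.weight, conv.bias)), 2)

print(torch.allclose(y_module, y_functional))  # True
print(y_module.shape)                          # torch.Size([1, 5, 9, 9])
```

In practice, layers with parameters are usually declared as $nn$ modules in the constructor, while parameter-free operations such as ReLU and max pooling are often called through $nn.functional$ in $forward$, which is exactly the split used in the model below.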

In [2]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Convolutional (conv) layers:
        # conv1 has 1 input channel (grayscale), 5 output channels and a 3x3 kernel
        self.conv1 = nn.Conv2d(1, 5, 3)
        # conv2 has 5 input channels, 16 output channels and a 3x3 kernel
        self.conv2 = nn.Conv2d(5, 16, 3)
        # The next three layers are fully connected (FC) layers, each an affine operation: y = Wx + b
        # fc1 takes the 16 * 3 * 3 = 144 flattened features left after conv2 and pooling
        # (for the 20x20 input used below, the spatial size goes 20 -> 18 -> 9 -> 7 -> 3)
        # and produces 120 output features
        self.fc1 = nn.Linear(16 * 3 * 3, 120)
        # fc2 has 120 input features and 60 output features
        self.fc2 = nn.Linear(120, 60)
        # fc3 has 60 input features and 10 output features
        self.fc3 = nn.Linear(60, 10)

    def forward(self, x):
        # Convolution, ReLU, then max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # Since the window is square, a single integer 2
        # can be passed instead of the tuple (2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten the pooled feature maps into one vector per sample for the FC layers
        x = x.view(-1, self.num_flat_features(x))
        # Apply ReLU after the first FC layer
        x = F.relu(self.fc1(x))
        # Apply ReLU after the second FC layer
        x = F.relu(self.fc2(x))
        # The last FC layer has no activation; its 10 raw scores are returned
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        # For an n-dimensional tensor, return the product of all non-batch dimensions
        for s in size:
            num_features *= s
        return num_features
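Where does 16 * 3 * 3 come from? The sketch below applies the usual output-size formula for a convolution or pooling window, floor((size + 2 * padding - kernel) / stride) + 1, to a $20\times20$ input, then cross-checks the result with real layers of the same shapes as $conv1$ and $conv2$. The helper name `conv_out` is hypothetical, introduced only for this illustration; `torch.flatten(x, 1)` is shown as a built-in alternative to the $num\_flat\_features$ helper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling window (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1

s = 20
s = conv_out(s, 3)            # conv1: 20 -> 18
s = conv_out(s, 2, stride=2)  # pool:  18 -> 9
s = conv_out(s, 3)            # conv2: 9 -> 7
s = conv_out(s, 2, stride=2)  # pool:  7 -> 3 (the leftover row/column is dropped)
print(s, 16 * s * s)          # 3 144

# Cross-check with real layers shaped like conv1 and conv2
conv1, conv2 = nn.Conv2d(1, 5, 3), nn.Conv2d(5, 16, 3)
x = torch.randn(1, 1, 20, 20)
x = F.max_pool2d(F.relu(conv1(x)), 2)
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.shape)                # torch.Size([1, 16, 3, 3])

# Keep the batch axis, flatten everything else (same as the view in forward)
flat = torch.flatten(x, 1)
print(flat.shape)             # torch.Size([1, 144])
```

This also shows why the model is tied to its input size: a different height or width would give a flattened size other than 144 and $fc1$ would reject it.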

We have now defined the model class $Net$ with the required architecture.

Remember: we only had to define the $forward$ function. The backward function (where gradients are computed) is defined automatically by PyTorch through $autograd$.
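As a small illustration of what $autograd$ provides, the sketch below uses a single $nn.Linear$ layer (a stand-in, not part of $Net$), reduces its output to a scalar, and calls $backward()$. No backward code is written by hand, yet the resulting gradients match the ones derived on paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)   # y = x W^T + b, with W of shape (2, 4)
x = torch.randn(3, 4)     # a batch of 3 inputs

loss = layer(x).sum()     # reduce the output to a scalar
loss.backward()           # autograd derives and runs the backward pass

# For loss = sum over batch and outputs of (x W^T + b):
#   d loss / d W[i, j] = sum over the batch of x[:, j]  (same for every row i)
#   d loss / d b[i]    = batch size
print(torch.allclose(layer.weight.grad, x.sum(0).expand(2, 4)))  # True
print(layer.bias.grad)                                          # tensor([3., 3.])
```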

In [3]:
net = Net()
print(net)

Net(
  (conv1): Conv2d (1, 5, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d (5, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=144, out_features=120)
  (fc2): Linear(in_features=120, out_features=60)
  (fc3): Linear(in_features=60, out_features=10)
)


Printing the model shows the architecture we defined. Let us now use it.

In [4]:
params = list(net.parameters())
print(len(params))

10


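The $.parameters()$ method returns all the learnable parameters of the model. There are 10 of them because each of the five layers has one weight tensor and one bias tensor.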
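To see what those 10 tensors are, the sketch below lists each parameter's name and shape and counts the total number of learnable values. So that it runs on its own, it rebuilds only the five parameterised layers of $Net$ in an $nn.ModuleList$ (an assumption made for self-containment); calling $named\_parameters()$ on $net$ itself would list the same shapes under the names conv1.weight, conv1.bias, and so on.

```python
import torch.nn as nn

# The five parameterised layers of Net, rebuilt here so the snippet runs on its own
layers = nn.ModuleList([
    nn.Conv2d(1, 5, 3),
    nn.Conv2d(5, 16, 3),
    nn.Linear(16 * 3 * 3, 120),
    nn.Linear(120, 60),
    nn.Linear(60, 10),
])

params = list(layers.parameters())
print(len(params))  # 10: one weight and one bias per layer

for name, p in layers.named_parameters():
    print(name, tuple(p.shape))

# Total number of learnable values across all weights and biases
total = sum(p.numel() for p in layers.parameters())
print(total)        # 26056
```

Most of the parameters (17400 of 26056) sit in $fc1$, which is typical: fully connected layers on flattened feature maps tend to dominate the parameter count.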

In [5]:
print(params[0].size())

torch.Size([5, 1, 3, 3])


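As we can see, the first learnable parameter (the weight of $conv1$) is a $5\times1\times3\times3$ tensor, since the first conv layer has 5 output channels, 1 input channel and a $3\times3$ kernel. PyTorch stores convolution weights in the order (output channels, input channels, kernel height, kernel width).

Let us feed the network an example $20\times20$ input (a single-channel, i.e. grayscale, image). The input tensor has shape $1\times1\times20\times20$: batch size, channels, height, width.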
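One way to see what each axis of that weight tensor means is to compute a single output value by hand. In the sketch below, a standalone $nn.Conv2d$ shaped like $conv1$ is applied to a random image, and output channel $o$ at position (0, 0) is recomputed as the elementwise product of $weight[o]$ with the top-left $3\times3$ input patch, summed, plus $bias[o]$. Note that PyTorch's convolution is a cross-correlation, so the kernel is not flipped.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 5, 3)      # same shape as conv1 in Net
print(conv.weight.shape)       # torch.Size([5, 1, 3, 3]) -> (out, in, kH, kW)
print(conv.bias.shape)         # torch.Size([5])

x = torch.randn(1, 1, 20, 20)  # (batch, channels, height, width)
y = conv(x)                    # shape (1, 5, 18, 18)

# Recompute output channel o at position (0, 0) by hand:
# weight[o] has shape (in_channels, 3, 3) and is matched against the
# top-left 3x3 patch of every input channel, then the bias is added
o = 2
manual = (conv.weight[o] * x[0, :, 0:3, 0:3]).sum() + conv.bias[o]
print(torch.allclose(manual, y[0, o, 0, 0], atol=1e-5))  # True
```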
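Real images usually arrive as plain 2-D arrays, so they need extra axes before they fit the $(batch, channels, height, width)$ layout that $nn.Conv2d$ expects. A minimal sketch, assuming random tensors stand in for actual grayscale images:

```python
import torch

# A single 20x20 grayscale image as a plain 2-D tensor (height, width)
img = torch.randn(20, 20)

# Add a channel axis, then a batch axis, to get (1, 1, 20, 20)
x = img.unsqueeze(0).unsqueeze(0)
print(x.shape)      # torch.Size([1, 1, 20, 20])

# Several single-channel images can be stacked into one batch along a new first axis
batch = torch.stack([torch.randn(1, 20, 20) for _ in range(8)])
print(batch.shape)  # torch.Size([8, 1, 20, 20])
```

Passing such a batch of 8 images through the network would give an output of size $8\times10$, one row per image.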

In [6]:
input = Variable(torch.randn(1, 1, 20, 20))
out = net(input)
print(out)

Variable containing:
-0.0992 -0.0789  0.0040  0.0058 -0.0659 -0.0830  0.1925  0.0286 -0.0760 -0.0965
[torch.FloatTensor of size 1x10]



This $out$ tensor, of size $1\times10$, is the output of the neural network: one row of 10 values for the single input image.

So far, we have learnt to build and use a simple neural network. Now we will delve into the loss function and get a deeper insight into how PyTorch supports more complex neural networks.
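The 10 values are raw, unnormalised scores. Assuming they are treated as scores for 10 classes (the notebook has not yet attached any meaning to them), a common way to read them is to turn each row into probabilities with a softmax and to pick the highest-scoring index as the prediction. A random $1\times10$ tensor stands in for $out$ in this sketch.

```python
import torch
import torch.nn.functional as F

# Stand-in for `out`: a (1, 10) tensor of raw scores like the network produces
out = torch.randn(1, 10)

probs = F.softmax(out, dim=1)  # non-negative, each row sums to 1 (up to rounding)
pred = out.argmax(dim=1)       # index of the largest score in each row

print(probs.sum(dim=1))
print(pred.shape)              # torch.Size([1])
```

Softmax is monotonic, so taking the argmax of $probs$ gives the same index as taking it on the raw scores.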

In [7]:
net.zero_grad()

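Before backpropagating, we usually reset the gradients. PyTorch accumulates gradients in each parameter's $.grad$ attribute across calls to $backward()$, so stale values from a previous pass must be cleared first. $zero\_grad()$ does this for every parameter of the model.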
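The accumulation behaviour is easy to observe. In this sketch (a small stand-in $nn.Linear$ layer rather than $net$), the same backward pass is run twice without clearing, and the stored gradient doubles; $zero\_grad()$ then clears it. Depending on the PyTorch version, cleared gradients are either filled with zeros or set to $None$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(4, 3)

layer(x).sum().backward()
g1 = layer.weight.grad.clone()

layer(x).sum().backward()  # second backward pass, no zeroing in between
print(torch.allclose(layer.weight.grad, 2 * g1))  # True: gradients accumulated

layer.zero_grad()
# Depending on the PyTorch version, this prints a zero tensor or None
print(layer.weight.grad)
```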

In [8]:
out.backward(torch.randn(1, 10))

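Here we backpropagate with random gradients, which is not common in practice. Because $out$ is not a scalar, $backward()$ needs a gradient tensor with the same shape as $out$ ($1\times10$) as its argument; in this cell that tensor is random. Usually one computes a scalar loss instead and calls $backward()$ on it with no argument.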
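Passing a tensor $v$ to $backward()$ on a non-scalar output gives the same parameter gradients as calling $backward()$ on the scalar (output * v).sum(). The sketch below checks this equivalence on a small stand-in $nn.Linear(5, 10)$ layer, whose output has the same $1\times10$ shape as $out$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(5, 10)
x = torch.randn(1, 5)
v = torch.randn(1, 10)  # plays the role of torch.randn(1, 10) in the cell above

# Non-scalar output: supply a gradient tensor of the same shape
layer(x).backward(v)
g_direct = layer.weight.grad.clone()

layer.zero_grad()
# Equivalent scalar formulation: weight each output by v and sum
(layer(x) * v).sum().backward()
print(torch.allclose(layer.weight.grad, g_direct))  # True
```

Calling $backward()$ with no argument on the $1\times10$ output would instead raise a $RuntimeError$, because PyTorch only creates the starting gradient implicitly for scalar outputs.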

In the next lecture we will build a feedforward NN using PyTorch and run it on our GPU.