<a href="https://colab.research.google.com/github/desaiankitb/pytorch-basics/blob/main/deep-learning-blitz/02_neural_network_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%matplotlib inline

# Neural Networks 
- Neural networks can be constructed using the `torch.nn` package. 

- Now that you have had a glimpse of `autograd`: `nn` depends on `autograd` to define models and differentiate them. An `nn.Module` contains layers and a method `forward(input)` that returns the `output`.

For example, look at this network that classifies digit images:
![image](https://pytorch.org/tutorials/_images/mnist.png)
convnet

- It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output. 

- A typical training procedure for a neural network is as follows: 
  - Define the neural network that has some learnable parameters (or weights) 
  - Iterate over a dataset of inputs 
  - Process each input through the network 
  - Compute the loss (how far the output is from being correct) 
  - Propagate gradients back into the network's parameters 
  - Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
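
The update rule in the last step can be tried without any network. The following is a minimal sketch, assuming a single scalar weight and a toy loss `(weight - 3) ** 2` chosen only for illustration; neither the loss nor the starting values come from the tutorial.

```python
# Illustrative only: plain-Python gradient descent on a toy loss.
# The loss (weight - 3) ** 2 and all values are assumptions for this sketch.
weight = 0.0
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (weight - 3)                 # d/dw of (weight - 3) ** 2
    weight = weight - learning_rate * gradient  # the update rule from the list

print(round(weight, 4))
```

Each step moves `weight` a fraction of the way toward the minimizer of the toy loss, which is the same mechanism an optimizer applies to every parameter of a network.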


# Define the Network
Let us define this network: 

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F 

class Net(nn.Module):

  def __init__(self):
    super(Net, self).__init__()
    # 1 input image channel, 6 output channels, 5x5 square convolution
    # kernel
    self.conv1 = nn.Conv2d(1, 6, 5)
    self.conv2 = nn.Conv2d(6, 16, 5)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5X5 from image dimension 
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x):
    # Max pooling over (2, 2) window 
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    # If the size is square, you can specify with single number 
    x = F.max_pool2d(F.relu(self.conv2(x)), 2)
    x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You only have to define the `forward` function; the `backward` function (where gradients are computed) is automatically defined for you by `autograd`. You can use any of the Tensor operations in the `forward` function. 

The learnable parameters of the model are returned by `net.parameters()`:

In [3]:
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight 

10
torch.Size([6, 1, 5, 5])
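
The count of 10 comes from five layers with one weight and one bias tensor each. As a rough sketch, the same layer definitions can be rebuilt on their own to list every parameter's shape and the total number of learnable values. The container name `layers` is hypothetical and not part of the tutorial.

```python
import torch.nn as nn

# Same layer definitions as in Net, collected in a list for inspection only.
layers = nn.ModuleList([
    nn.Conv2d(1, 6, 5),
    nn.Conv2d(6, 16, 5),
    nn.Linear(16 * 5 * 5, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),
])

# named_parameters() yields (name, tensor) pairs: one weight and one bias per layer.
for name, p in layers.named_parameters():
    print(name, tuple(p.shape))

num_tensors = len(list(layers.parameters()))
total = sum(p.numel() for p in layers.parameters())
print(num_tensors, total)
```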


Let's try a random 32x32 input. Note: the expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, resize the images from the dataset to 32x32.

In [4]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.0342, -0.0147,  0.0803, -0.0666,  0.0221,  0.1259,  0.1522,  0.0921,
          0.0874,  0.0777]], grad_fn=<AddmmBackward>)
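
The 32x32 requirement follows from the layer sizes: `fc1` expects `16 * 5 * 5` features, so the feature map must be 5x5 after the second pooling step. The arithmetic can be sketched in plain Python, assuming no padding and no dilation, and a pooling stride equal to the window size. The helper `conv_out` is hypothetical, written only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # max pool: 28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # max pool: 10 -> 5

print(size, 16 * size * size)
```

Starting from 28 instead (the native MNIST size), the same steps end at 4, so `fc1` would receive 256 features rather than the 400 it was defined with.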




Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [5]:
net.zero_grad()
out.backward(torch.randn(1, 10))
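
The call to `net.zero_grad()` matters because `backward()` adds to `.grad` rather than overwriting it. A small sketch with a hypothetical two-element tensor `w` (a stand-in, not one of the network's parameters) shows the accumulation:

```python
import torch

w = torch.ones(2, requires_grad=True)  # hypothetical stand-in for a parameter

(w * 2).sum().backward()
first = w.grad.clone()                 # gradient after one backward pass

(w * 2).sum().backward()               # second pass adds to the existing .grad
second = w.grad.clone()

w.grad.zero_()                         # clear the buffer in place before the next pass
(w * 2).sum().backward()
after_reset = w.grad.clone()

print(first, second, after_reset)
```

Without the reset, the second gradient is twice the first, which is why the buffers are cleared before each new backward pass.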

> **Note:** `torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, not a single sample. 

> For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`. 

> If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension. 
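
As a brief illustration of the note, assuming a single 1-channel 32x32 sample held in a hypothetical tensor named `single`:

```python
import torch

single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)       # add a batch dimension of size 1 at index 0

print(single.shape)
print(batch.shape)
```

The data is unchanged; only a leading dimension of size 1 is added so the tensor has the `nSamples x nChannels x Height x Width` layout.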


Before proceeding further, let us recap all the classes you have seen so far. 

**Recap:**
- `torch.Tensor` - A *multi-dimensional array* with support for autograd operations like `backward()`. It also holds the *gradient* with respect to the tensor. 
- `nn.Module` - Neural network module. A *convenient way of encapsulating parameters*, with helpers for moving them to GPU, exporting, loading, etc. 
- `nn.Parameter` - A kind of Tensor that is *automatically* registered as a parameter when assigned as an attribute to a `Module`. 
- `autograd.Function` - Implements the *forward* and *backward* definitions of an autograd operation. Every `Tensor` operation creates at least a single `Function` node that connects to the functions that created a `Tensor` and encodes its history. 
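
The automatic registration of `nn.Parameter` can be seen in a small sketch. The module `Scale` and its attributes are hypothetical, made up for this illustration: one attribute is wrapped in `nn.Parameter`, the other is a plain tensor.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.factor = nn.Parameter(torch.ones(1))  # registered as a parameter
        self.offset = torch.zeros(1)               # plain tensor: not registered

    def forward(self, x):
        return x * self.factor + self.offset

names = [name for name, _ in Scale().named_parameters()]
print(names)
```

Only the attribute wrapped in `nn.Parameter` appears in `named_parameters()`, so only it would be seen by an optimizer.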

**At this point, we covered:**
- Defining a neural network
- Processing inputs and calling backward 

**Still left** 
- Computing the loss 
- Updating the weights of the network

## Loss Function 

A loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target. 

There are several different [loss functions](https://pytorch.org/docs/stable/nn.html) under the `nn` package. A simple loss is `nn.MSELoss`, which computes the mean-squared error between the input and the target. 

For example: 

In [6]:
output = net(input)
target = torch.randn(10) # a dummy target, for example 
target = target.view(1, -1) # make it the same shape as output 
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.9315, grad_fn=<MseLossBackward>)
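
To make the definition concrete, the mean-squared error can also be written out by hand. This is a sketch that uses random stand-in tensors with the same shapes as `output` and `target` rather than the network itself, and it assumes the default `reduction='mean'` of `nn.MSELoss`.

```python
import torch
import torch.nn as nn

# Random stand-ins for the network output and the target (same shapes as in the cell above).
output = torch.randn(1, 10)
target = torch.randn(1, 10)

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()  # mean of squared differences

print(loss.item(), manual.item())
```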


Now, if you follow `loss` in the backward direction, using its `.grad_fn` attribute, you will see a graph of computations that looks like this: 


  
    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d 
          -> flatten -> linear -> relu -> linear -> relu -> linear 
          -> MSELoss 
          -> loss

So, when we call `loss.backward()` the whole graph is differentiated w.r.t. the neural net parameters, and all Tensors in the graph that have `requires_grad=True` will have their `.grad` Tensor accumulated with the gradient. 

For illustration, let us follow a few steps backward: 

In [9]:
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # first input of the Linear node; the output below shows AccumulateGrad, not ReLU

<MseLossBackward object at 0x7f00144accd0>
<AddmmBackward object at 0x7f00144acc50>
<AccumulateGrad object at 0x7f00144acb90>
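
Following `next_functions` by hand gets tedious, so the walk can be automated. The sketch below assumes a small hypothetical `nn.Sequential` model in place of `net`, and the helper `walk` is not part of PyTorch. Node class names can differ between PyTorch versions (for example, a trailing `0`).

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, max_depth=3):
    """Print the class name of each backward node, indented by depth."""
    if fn is None or depth > max_depth:
        return []
    names = [type(fn).__name__]
    print("  " * depth + names[0])
    for next_fn, _ in fn.next_functions:
        names += walk(next_fn, depth + 1, max_depth)
    return names

# Hypothetical tiny model standing in for `net`.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))
visited = walk(loss.grad_fn)
```

Entries in `next_functions` that are `None` correspond to inputs that do not require gradients, such as the target, and are skipped.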
