<a href="https://colab.research.google.com/github/Shruti-Raj-Vansh-Singh/PyTorch-Tutorial/blob/master/PyTorch_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NEURAL NETWORKS
Neural networks can be constructed using the `torch.nn` package. \
`nn` depends on autograd to define models and differentiate them. An `nn.Module` contains layers and a method `forward(input)` that returns the output.
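As a minimal sketch of these two ideas, the module below registers one layer as an attribute and defines `forward`. The class name `TinyNet` and the layer sizes are hypothetical. Calling the module instance runs `forward`, and `named_parameters()` lists the parameters that were registered automatically through the attribute assignment.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        # Assigning a layer as an attribute registers its parameters with the module
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

tiny = TinyNet()
x = torch.randn(3, 4)          # a mini-batch of 3 samples with 4 features each
y = tiny(x)                    # calling the module runs forward()
print(y.shape)                 # torch.Size([3, 2])
print([name for name, _ in tiny.named_parameters()])  # ['linear.weight', 'linear.bias']
```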

## Define a neural network

In [9]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [10]:
class Net(nn.Module):

  def __init__(self):
    super(Net, self).__init__()
    # 1 input channel, 6 output channels, 3x3 square convolution kernel
    self.conv1 = nn.Conv2d(1,6,3)
    self.conv2 = nn.Conv2d(6,16,3)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16*6*6, 120)  # 16 channels x 6x6 spatial size after the conv/pool layers (32x32 input)
    self.fc2 = nn.Linear(120,84)
    self.fc3 = nn.Linear(84,10)

  def forward(self,x):
    # max pooling over a 2x2 window
    x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
    x = F.max_pool2d(F.relu(self.conv2(x)),2)   # for a square window, a single number is enough
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
  
  def num_flat_features(self,x):
    size = x.size()[1:]         # all dimensions except the batch dimension
    num_features = 1
    for s in size:
      num_features*=s
    return num_features
  
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
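The `16*6*6` input size of `fc1` follows from how each layer shrinks a 32x32 input. The sketch below re-creates the two conv layers from `Net` (so it runs on its own) and passes an all-zeros 32x32 image through them: a 3x3 convolution without padding removes 2 pixels from each side length, and 2x2 max pooling halves the size, rounding down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same conv layers as Net, re-created so the snippet is self-contained
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.zeros(1, 1, 32, 32)   # one 32x32 single-channel image
x = conv1(x)                    # 32 - 3 + 1 = 30  -> (1, 6, 30, 30)
print(x.shape)
x = F.max_pool2d(x, 2)          # 30 / 2 = 15      -> (1, 6, 15, 15)
print(x.shape)
x = conv2(x)                    # 15 - 3 + 1 = 13  -> (1, 16, 13, 13)
print(x.shape)
x = F.max_pool2d(x, 2)          # floor(13 / 2) = 6 -> (1, 16, 6, 6)
print(x.shape)
print(x[0].numel())             # 16 * 6 * 6 = 576 features fed into fc1
```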


Only the `forward` function has to be defined; the backward pass, where the gradients are computed, is generated automatically by autograd.

The learnable parameters of the model are returned by `net.parameters()`.

In [11]:
para = list(net.parameters())
print(len(para))
print(para[0].size())  # conv1's weight

10
torch.Size([6, 1, 3, 3])
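The count of 10 is one weight and one bias tensor for each of the five layers. A short sketch, rebuilding the same five layers outside `Net` so it is self-contained, also sums `numel()` over them to get the total number of learnable values:

```python
import torch.nn as nn

# The same five layers as Net, rebuilt here so the snippet runs on its own
layers = [
    nn.Conv2d(1, 6, 3),
    nn.Conv2d(6, 16, 3),
    nn.Linear(16 * 6 * 6, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),
]
params = [p for layer in layers for p in layer.parameters()]
print(len(params))                       # 10: a weight and a bias per layer
total = sum(p.numel() for p in params)   # 60 + 880 + 69240 + 10164 + 850
print(total)                             # 81194
```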


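Let's try a random 32x32 input. This network expects 32x32 images, because the `16*6*6` input size of `fc1` was computed for that size.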

In [13]:
input = torch.rand(1,1,32,32)   # samples, channels, height, width
out = net(input)
print(out)

tensor([[ 0.0501, -0.0101,  0.0341, -0.0379,  0.0629, -0.0435, -0.0348, -0.0989,
         -0.0470, -0.0881]], grad_fn=<AddmmBackward>)


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [14]:
net.zero_grad()
out.backward(torch.randn(1,10))
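Because `out` is not a scalar, `backward` needs a gradient tensor of the same shape as `out`; autograd then computes the vector-Jacobian product with it. A minimal sketch on a toy function (the tensors below are illustrative assumptions) where the Jacobian is easy to write down:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x                          # non-scalar output; Jacobian dy/dx = 2 * identity
v = torch.tensor([0.1, 1.0, 10.0])
y.backward(v)                      # stores v^T J = 2 * v in x.grad
print(x.grad)
```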

`torch.nn` is built around mini-batches: its layers expect a mini-batch of samples rather than a single sample. \

For example: \
`nn.Conv2d` takes a 4D tensor of shape `nSamples x nChannels x Height x Width`. \
If we have a single sample, we can use `input.unsqueeze(0)` to add a fake batch dimension.
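A minimal sketch of the `unsqueeze(0)` trick, using a standalone `nn.Conv2d` with the same settings as `conv1` (the layer and the random image are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)
single = torch.rand(1, 32, 32)   # one image: channels x height x width, no batch dim
batch = single.unsqueeze(0)      # insert a batch dimension of size 1 at position 0
print(batch.shape)               # torch.Size([1, 1, 32, 32])
out = conv(batch)
print(out.shape)                 # torch.Size([1, 6, 30, 30])
```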


# LOSS FUNCTION
A loss function takes a pair of inputs (output, target) and computes a value that estimates how far the output is from the target. \
There are several loss functions in the `nn` package. \
For example, `nn.MSELoss` computes the mean squared error between the output and the target.
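To make "mean squared error" concrete, the sketch below computes it by hand on two small hand-picked tensors and compares it with `nn.MSELoss` using its default `reduction='mean'`:

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.5, 2.0, 1.0]])

# Average of the squared element-wise differences: (0.25 + 0 + 4) / 3
manual = ((output - target) ** 2).mean()
builtin = nn.MSELoss()(output, target)
print(manual.item(), builtin.item())
```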

In [21]:
output = net(input)
target = torch.rand(10) #a dummy target
target = target.view(1,-1) #making it of the same shape as the output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

tensor(0.4432, grad_fn=<MseLossBackward>)


In [23]:
print(loss.grad_fn)                                              # MSELoss
print(loss.grad_fn.next_functions[0][0])                         # Linear (fc3, computed with addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])    # AccumulateGrad: a leaf parameter (fc3's bias), not ReLU

<MseLossBackward object at 0x7fbc4507ad30>
<AddmmBackward object at 0x7fbc4507a1d0>
<AccumulateGrad object at 0x7fbc4507ad30>
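The same `next_functions` chain can be followed programmatically. The sketch below uses a hypothetical helper, `graph_node_names`, and a small illustrative `nn.Sequential` model rather than `Net`; it walks the graph depth-first from `loss.grad_fn` and collects the class name of every node it reaches. Exact class names vary between PyTorch versions (for example `MseLossBackward` versus `MseLossBackward0`).

```python
import torch
import torch.nn as nn

def graph_node_names(grad_fn):
    """Hypothetical helper: class names of all autograd nodes reachable from grad_fn."""
    names, stack = [], [grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None:       # inputs that do not require grad appear as None
            continue
        names.append(type(fn).__name__)
        # next_functions is a tuple of (node, index) pairs; no de-duplication,
        # which is fine for this small tree-shaped graph
        stack.extend(node for node, _ in fn.next_functions)
    return names

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))
names = graph_node_names(loss.grad_fn)
print(names)
```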


# BACKPROP
To backpropagate the error, we call `loss.backward()`. \
We have to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.
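A minimal sketch of this accumulation on a single scalar weight (the tensor `w` and the function `3 * w` are illustrative): calling `backward` twice adds the gradients, and zeroing the buffer resets it.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()            # d(3w)/dw = 3
first = w.grad.item()         # 3.0

(3 * w).backward()            # not zeroed: the new gradient is added to the old one
accumulated = w.grad.item()   # 6.0

w.grad.zero_()                # clear the buffer before the next backward pass
(3 * w).backward()
after_reset = w.grad.item()   # 3.0
print(first, accumulated, after_reset)
```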

In [25]:
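# set_to_none=False keeps zero-filled gradient tensors; recent PyTorch versions
# default to setting .grad to None, which would print None below
net.zero_grad(set_to_none=False)   # zero the gradient buffers of all parameters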
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0028,  0.0000, -0.0048,  0.0018,  0.0012,  0.0075])


# UPDATING THE WEIGHTS

The simplest update rule is stochastic gradient descent (SGD): `weight = weight - learning_rate * gradient`

In [26]:
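learning_rate = 0.01
with torch.no_grad():   # the update itself must not be tracked by autograd
  for f in net.parameters():
    f.sub_(f.grad * learning_rate)   # weight = weight - learning_rate * gradient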
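With a single scalar weight, one step of this rule can be checked by hand. The sketch below assumes a toy loss `w ** 2` and a learning rate of 0.1, both chosen only for illustration:

```python
import torch

learning_rate = 0.1
w = torch.tensor(1.0, requires_grad=True)

loss = w ** 2            # d(loss)/dw = 2w = 2.0 at w = 1.0
loss.backward()

with torch.no_grad():    # do not record the update in the autograd graph
    w -= learning_rate * w.grad   # 1.0 - 0.1 * 2.0 = 0.8
print(w.item())
```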

PyTorch also provides built-in optimizers in `torch.optim`, such as SGD, Adam and RMSprop.

In [27]:
import torch.optim as optim

# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# inside the training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()        # apply the update
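As a sanity check, plain `optim.SGD` (no momentum, no weight decay) should take the same step as the manual update rule. The sketch below assumes a small illustrative `nn.Linear` model and random data, copies it so both versions start from identical weights, and compares the parameters after one step of each:

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)          # identical starting weights
x, y = torch.randn(4, 3), torch.randn(4, 1)
criterion = nn.MSELoss()
lr = 0.01

# Manual update: weight = weight - learning_rate * gradient
model_a.zero_grad()
criterion(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim.SGD
optimizer = optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)   # True
```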