can be constructed using the torch.nn package.<br><br>
nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.<br><br>
A typical neural network training procedure:<br><br>
<ul>
    <li>Define a neural network that has some learnable parameters</li>
    <li>Iterate over a dataset of inputs</li>
    <li>Process each input through the network</li>
    <li>Compute the loss</li>
    <li>Propagate gradients back into the network parameters</li>
    <li>Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient</li>
    </ul>
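
To make the update rule in the last step concrete, here is a minimal sketch in plain Python. The function f(w) = (w - 3)^2, the starting point and the learning rate are illustrative assumptions, not part of the network defined in this notebook.

```python
# Minimise f(w) = (w - 3)^2 with the rule:
#   weight = weight - learning_rate * gradient
# The function, starting point and learning rate are illustrative choices.

def gradient(w):
    """Derivative of f(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0

# One step by hand: gradient(0) = -6, so w moves from 0 to 0 - 0.1 * (-6)
w_after_one_step = w - learning_rate * gradient(w)
print(round(w_after_one_step, 4))

# Repeating the same rule moves w towards the minimum at w = 3
w = w_after_one_step
for _ in range(49):
    w = w - learning_rate * gradient(w)
print(round(w, 4))
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a few dozen iterations are enough for this toy problem.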

In [42]:
# Define the network
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # conv1: 1 input image channel, 32 output channels, 3x3 kernel, stride 1
        # conv2: 32 input channels, 64 output channels, 3x3 kernel, stride 1
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        # Affine operations: y = Wx + b
        # A 32x32 input becomes 64 channels of 6x6 after both conv + pool stages
        self.fc1 = nn.Linear(64 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2 x 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=2304, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


###### We only need to define the forward function. The backward function is defined automatically by autograd
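
As a small illustration of autograd supplying the backward pass, the sketch below differentiates y = sum(x^2) without any hand-written gradient code. The tensor values are arbitrary and chosen only so the result is easy to check against dy/dx = 2x.

```python
import torch

# A tensor that autograd should track
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Only the forward computation is written out
y = (x ** 2).sum()

# autograd derives the backward computation: dy/dx = 2 * x
y.backward()
print(x.grad)
```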

In [43]:
#Learnable parameters are returned by net.parameters()
params=list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([32, 1, 3, 3])
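
The count of 10 comes from five layers that each hold a weight and a bias. To see which tensor belongs to which layer, named_parameters() can be used instead of parameters(). The sketch below uses a deliberately tiny stand-in model (the name `tiny` and its layer sizes are assumptions for illustration, not the Net above).

```python
import torch.nn as nn

# A small stand-in model: one conv layer and one linear layer.
# For an 8x8 input, the 3x3 convolution leaves 4 channels of 6x6.
tiny = nn.Sequential(
    nn.Conv2d(1, 4, 3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 6 * 6, 2),
)

# Each layer with learnable state contributes a weight and a bias
for name, p in tiny.named_parameters():
    print(name, tuple(p.shape))

# Total number of learnable scalars in the model
total = sum(p.numel() for p in tiny.parameters())
print(total)
```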


###### Now try a 32x32 input

In [44]:
# To use this network on MNIST, resize the dataset images to 32x32
input=torch.randn(1,1,32,32)
out=net(input)
print(out)

tensor([[ 0.1287,  0.0478, -0.1340, -0.0307, -0.1510,  0.1099,  0.0204,  0.0143,
          0.0097, -0.0323]], grad_fn=<AddmmBackward>)


In [45]:
# Zero the gradient buffers of all parameters and backpropagate with random gradients
net.zero_grad()
out.backward(torch.randn(1,10))

torch.nn only supports mini-batches: the entire torch.nn package expects inputs that are a mini-batch of samples, not a single sample.
<br><br>
For example, nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width.<br><br>
If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.
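
A minimal sketch of adding the fake batch dimension, assuming a single-channel 32x32 sample. The layer and the variable names `sample` and `batch` are illustrative.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

# A single sample laid out as nChannels x Height x Width
sample = torch.randn(1, 32, 32)

# unsqueeze(0) inserts a new leading dimension of size 1,
# giving nSamples x nChannels x Height x Width
batch = sample.unsqueeze(0)
out = conv(batch)

print(batch.shape)  # 4D input with a batch of one
print(out.shape)    # 3x3 kernel without padding shrinks 32 to 30
```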


# Loss function


A loss function takes an (output, target) pair of inputs and computes a value that estimates how far the output is from the target.

Recap of the classes seen so far:

<ul>
    <li>torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.</li>
    <li>nn.Module - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.</li>
    <li>nn.Parameter - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.</li>
    <li>autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.</li>
</ul>


In [52]:
# Example using nn.MSELoss
output=net(input)
target=torch.randn(10) # dummy target
target=target.view(1,-1) # make it the same shape as output
criterion=nn.MSELoss()

loss=criterion(output, target)
print(loss)

tensor(0.9404, grad_fn=<MseLossBackward>)
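
With its default settings, nn.MSELoss averages the squared differences between output and target. The sketch below checks that against a hand-written version. The random tensors stand in for a real network output and are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)  # stand-in for a network output
target = torch.randn(1, 10)  # dummy target with the same shape

# Built-in criterion (default reduction is the mean)
builtin = nn.MSELoss()(output, target)

# The same quantity written out: mean of squared differences
by_hand = ((output - target) ** 2).mean()

print(builtin.item(), by_hand.item())
```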


Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:<br><br>
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
<br>
      -> view -> linear -> relu -> linear -> relu -> linear<br>
      -> MSELoss<br>
      -> loss<br>

So, when we call loss.backward(), the loss is differentiated through the whole graph, and every leaf Tensor in the graph that has requires_grad=True has the gradient accumulated into its .grad attribute.
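
"Accumulated" means that a second backward pass adds to the existing .grad instead of replacing it. A minimal sketch with an arbitrary two-element tensor (the values are illustrative):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw = 3 for every element
(w * 3).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is added on top
(w * 3).sum().backward()
second = w.grad.clone()

print(first, second)

# Resetting the buffer, which is what zero_grad() does for a whole model
w.grad.zero_()
```

This is why the gradient buffers are cleared before each backward pass in a training loop.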

In [53]:
print(loss.grad_fn) #MSE loss
print(loss.grad_fn.next_functions[0][0]) #Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) #first input of addmm is the fc3 bias, hence AccumulateGrad

<MseLossBackward object at 0x7ff14f1b2cc0>
<AddmmBackward object at 0x7ff14f1b2828>
<AccumulateGrad object at 0x7ff14f1b2cc0>
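
Instead of indexing next_functions by hand, the graph can be walked recursively. The helper `walk` and the small `model` below are hypothetical names introduced for this sketch. The exact class names printed (for example with or without a trailing 0) depend on the PyTorch version.

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, names=None):
    """Print the autograd graph below fn, one node per line, and collect node names."""
    if names is None:
        names = []
    if fn is None:  # inputs that do not require grad show up as None
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

# A small stand-in model, not the Net defined earlier
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))

visited = walk(loss.grad_fn)
```

The AccumulateGrad nodes at the ends of the graph are where gradients are written into the .grad of each parameter.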


# Backpropagate

In [55]:
# To backpropagate the error, all we have to do is call loss.backward()

# Clear the existing gradients first, otherwise the new gradients are accumulated into them
net.zero_grad()


print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0087,  0.0102,  0.0047,  0.0075,  0.0075,  0.0095, -0.0091, -0.0014,
        -0.0058, -0.0065, -0.0049, -0.0085,  0.0144, -0.0122, -0.0033,  0.0066,
        -0.0072, -0.0078,  0.0033,  0.0068,  0.0057,  0.0119, -0.0008, -0.0176,
         0.0077, -0.0030,  0.0047,  0.0027,  0.0038, -0.0047,  0.0044,  0.0038])


# Updating weights

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):
<br>

    weight = weight - learning_rate * gradient
<br>
This can be implemented with the following Python code:

    learning_rate = 0.01
    for f in net.parameters():
        f.data.sub_(f.grad.data * learning_rate)
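
Newer PyTorch code often performs the same manual update inside torch.no_grad() rather than going through .data. The sketch below shows that variant on a small stand-in layer. The layer, data and variable names are assumptions for illustration, not part of the notebook's Net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(8, 3)
y = torch.randn(8, 1)

# Forward and backward pass to populate .grad on the parameters
loss = F.mse_loss(layer(x), y)
loss.backward()

learning_rate = 0.01
before = layer.weight.detach().clone()
grad = layer.weight.grad.clone()

# The update itself must not be tracked by autograd
with torch.no_grad():
    for p in layer.parameters():
        p -= learning_rate * p.grad  # weight = weight - learning_rate * gradient

print(layer.weight)
```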


However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. For this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple:

In [56]:
import torch.optim as optim

#create optimizer
optimizer=optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
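
The cell above shows a single optimization step. As a rough sketch of how the same five lines look inside an actual loop, the example below fits a small linear model to synthetic data. The model, data, learning rate and number of iterations are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data: y = 2 * x0 - 1 * x1 + 0.5
x = torch.randn(64, 2)
y = x @ torch.tensor([[2.0], [-1.0]]) + 0.5

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    optimizer.zero_grad()          # zero the gradient buffers
    loss = criterion(model(x), y)  # forward pass and loss
    loss.backward()                # backpropagate
    optimizer.step()               # apply the update rule
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Swapping optim.SGD for another optimizer such as optim.Adam changes only the line that constructs the optimizer. The loop body stays the same.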