### Neural Networks
Neural networks can be constructed using the **torch.nn** package.

A typical training procedure for a neural network is as follows:
* Define the neural network that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process input through the network
* Compute the loss (how far the output is from being correct)
* Propagate gradients back into the network’s parameters
* Update the weights of the network, typically using a simple update rule:

weight = weight - learning_rate * gradient
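
The steps above can be sketched without any framework. The following is a minimal illustration, not the PyTorch implementation: it assumes a model with a single weight, a toy dataset invented for this example, and a squared-error loss whose gradient is written out by hand.

```python
# Toy dataset (an assumption for illustration): targets follow y = 3 * x,
# so the ideal weight is 3.0.
dataset = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight = 0.0            # the single learnable parameter
learning_rate = 0.01

for epoch in range(200):
    for x, target in dataset:                       # iterate over the dataset
        output = weight * x                         # process input through the "network"
        loss = (output - target) ** 2               # compute the loss
        gradient = 2.0 * (output - target) * x      # d(loss)/d(weight), derived by hand
        weight = weight - learning_rate * gradient  # step against the gradient

print(round(weight, 4))  # close to 3.0
```

The update subtracts the scaled gradient because the gradient points in the direction in which the loss increases.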

**Required steps in defining a network:**
* define the forward function; the backward function (where gradients are computed) is defined automatically by autograd

### Define the network

In [1]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

In [2]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

In [3]:
net = Net()
print(net)

Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
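
The `16 * 5 * 5` input size of `fc1` follows from the spatial size of the feature maps. The sketch below traces that size through the layers, assuming a 32x32 input (the size used in the forward-pass cell), stride 1 and no padding for the convolutions, and non-overlapping 2x2 pooling. The helper names `conv_out` and `pool_out` are hypothetical and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output width/height of non-overlapping max pooling (stride == window)."""
    return size // window


size = 32                  # assumed input height and width
size = conv_out(size, 5)   # conv1: 32 -> 28
size = pool_out(size, 2)   # pool:  28 -> 14
size = conv_out(size, 5)   # conv2: 14 -> 10
size = pool_out(size, 2)   # pool:  10 -> 5

flat_features = 16 * size * size   # 16 channels from conv2
print(flat_features)               # 400, the input size of fc1
```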


The learnable parameters of a model are returned by **net.parameters()**.

In [6]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])


In [8]:
# The input to the forward is an "autograd.Variable", and so is the output.
input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)

Variable containing:
-0.0369 -0.1117 -0.0400  0.1009 -0.1133  0.0894  0.1449 -0.0303 -0.0012  0.0786
[torch.FloatTensor of size 1x10]



In [9]:
# Zero the gradient buffers of all parameters, then backpropagate with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))

### Loss Function
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

A simple example is **nn.MSELoss**, which computes the mean-squared error between the input and the target.

In [12]:
torch.arange(1, 11)


  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
[torch.FloatTensor of size 10]

In [13]:
output = net(input)
target = Variable(torch.arange(1, 11))  # a dummy target, for example
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Variable containing:
 38.1994
[torch.FloatTensor of size 1]
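
The mean-squared error can be checked by hand. The sketch below uses plain Python lists: the output values are copied from the network output printed earlier (rounded to four decimals), and the target is 1 through 10, so the result should only agree with the printed loss up to rounding.

```python
# Network output as printed earlier, rounded to four decimals.
output_values = [-0.0369, -0.1117, -0.0400, 0.1009, -0.1133,
                 0.0894, 0.1449, -0.0303, -0.0012, 0.0786]
# Dummy target: the values 1, 2, ..., 10.
target_values = [float(t) for t in range(1, 11)]

# Mean of the element-wise squared differences.
squared_errors = [(o - t) ** 2 for o, t in zip(output_values, target_values)]
mse = sum(squared_errors) / len(squared_errors)

print(round(mse, 4))  # approximately 38.1994, matching the printed loss
```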



In [22]:
print(loss.creator)  # MSELoss
print(loss.creator.previous_functions[0][0])  # Linear
print(loss.creator.previous_functions[0][0].previous_functions[0][0])  # ReLU

<torch.nn._functions.thnn.auto.MSELoss object at 0x7f6c71af4588>
<torch.nn._functions.linear.Linear object at 0x7f6c71af43c8>
<torch.nn._functions.thnn.auto.Threshold object at 0x7f6c71af42e8>


### Backprop
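To backpropagate the error, call **loss.backward()**. Clear the existing gradients first, because gradients are accumulated into the existing buffers rather than overwritten.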

In [23]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

In [24]:
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
Variable containing:
 0
 0
 0
 0
 0
 0
[torch.FloatTensor of size 6]



In [25]:
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad after backward
Variable containing:
1.00000e-02 *
 -1.9524
  7.5726
  1.7037
 -1.3146
  2.3099
  9.6973
[torch.FloatTensor of size 6]
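
The reason for zeroing the gradient buffers before calling backward is that gradients are added to whatever the buffers already hold. The sketch below imitates that behaviour in plain Python. `Param` and `backward` are hypothetical stand-ins written for this example, not PyTorch classes or functions, and the loss is assumed to be a squared error on a one-weight model.

```python
class Param:
    """Hypothetical stand-in for a parameter: a value plus a gradient buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(w, x, target):
    """Accumulate d(loss)/d(w) for loss = (w * x - target) ** 2 into w.grad."""
    w.grad += 2.0 * (w.value * x - target) * x


w = Param(0.5)

backward(w, 2.0, 3.0)
first = w.grad          # gradient of a single backward pass

backward(w, 2.0, 3.0)
accumulated = w.grad    # second pass was added on top of the first

w.grad = 0.0            # the role played by zero_grad()
backward(w, 2.0, 3.0)
fresh = w.grad          # back to the gradient of a single pass

print(first, accumulated, fresh)  # -8.0 -16.0 -8.0
```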



**Read Later:**

  The neural network package contains various modules and loss functions
  that form the building blocks of deep neural networks. A full list with
  documentation is [here](http://pytorch.org/docs/nn).

### Update the weights
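The simplest update rule used in practice is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

Applied by hand, it looks like the code below. In practice, use the **torch.optim** package, which implements this and other update rules.

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)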

In [28]:
list(net.parameters())

[Parameter containing:
 (0 ,0 ,.,.) = 
   0.0873 -0.0059 -0.0492 -0.0136  0.1793
  -0.1611 -0.0533 -0.1136 -0.0417  0.0482
  -0.0747  0.1269  0.1576 -0.1951 -0.1203
  -0.1475  0.1606 -0.0976  0.1815  0.1100
   0.1899  0.1216 -0.1929 -0.1596  0.1579
 
 (1 ,0 ,.,.) = 
  -0.1146  0.0567  0.0032  0.0509  0.0041
  -0.1551  0.0860 -0.1748  0.0678  0.1489
  -0.0924 -0.0004 -0.0500 -0.1250  0.1926
   0.1996  0.0404  0.0676 -0.1631  0.0479
  -0.1213  0.0513 -0.0763  0.1673 -0.0726
 
 (2 ,0 ,.,.) = 
  -0.1124  0.0551 -0.1044 -0.1034  0.1417
   0.0866 -0.0229 -0.0657  0.1107  0.1507
  -0.0354  0.1142  0.1559  0.1049 -0.0169
   0.1865  0.1946 -0.1684  0.1034 -0.1876
  -0.1860 -0.0243  0.1977 -0.0003  0.1304
 
 (3 ,0 ,.,.) = 
   0.1723  0.1554 -0.1660  0.1960 -0.1823
  -0.1088 -0.0267 -0.0900 -0.0803  0.0254
  -0.1217  0.0401  0.0257 -0.0709 -0.0452
   0.1383  0.1107  0.1414 -0.0501 -0.1518
   0.0832 -0.0843 -0.0203 -0.1782  0.0806
 
 (4 ,0 ,.,.) = 
   0.0699 -0.1080  0.1186 -0.0550  0.0039
  -0.04

In [32]:
# list the size of each parameter
for para in net.parameters():
    print(para.size())

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 400])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
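
The ten entries are the weight and bias tensors of the five layers. As a small sketch in plain Python, the shapes printed above can be used to count the learnable scalars in the network. The shapes are copied by hand here rather than read from the model.

```python
from math import prod

# Shapes copied from the printed sizes above: (weight, bias) for each layer.
shapes = [
    (6, 1, 5, 5), (6,),      # conv1
    (16, 6, 5, 5), (16,),    # conv2
    (120, 400), (120,),      # fc1
    (84, 120), (84,),        # fc2
    (10, 84), (10,),         # fc3
]

# Number of scalars in each tensor is the product of its dimensions.
counts = [prod(shape) for shape in shapes]
total = sum(counts)

print(counts)  # [150, 6, 2400, 16, 48000, 120, 10080, 84, 840, 10]
print(total)   # 61706
```

Most of the parameters sit in `fc1`, the first fully connected layer.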


In [33]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
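
To make the zero_grad / backward / step pattern concrete, here is a minimal sketch in plain Python. `Param` and `TinySGD` are hypothetical stand-ins written for this example and are not the `torch.optim` implementation. The model is assumed to have one weight, and the gradient of the squared-error loss is computed by hand.

```python
class Param:
    """Hypothetical stand-in for a parameter: a value plus a gradient buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class TinySGD:
    """Hypothetical optimizer mirroring the zero_grad() / step() pattern."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        # Reset every gradient buffer before a new backward pass.
        for p in self.params:
            p.grad = 0.0

    def step(self):
        # Plain SGD: value = value - lr * gradient.
        for p in self.params:
            p.value -= self.lr * p.grad


w = Param(0.0)
optimizer = TinySGD([w], lr=0.1)
x, target = 1.0, 2.0

for _ in range(100):                          # the training loop
    optimizer.zero_grad()                     # zero the gradient buffers
    output = w.value * x                      # forward pass
    loss = (output - target) ** 2             # squared-error loss
    w.grad += 2.0 * (output - target) * x     # hand-written "backward"
    optimizer.step()                          # apply the update

print(round(w.value, 4))  # close to 2.0
```

Keeping the update rule inside the optimizer object means the training loop stays the same when the rule changes.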