# Neural Networks
Neural networks can be constructed using the torch.nn package.

* autograd: nn depends on autograd to define models and differentiate them
* nn.Module: contains layers and a method forward(input) that returns the output

A typical training procedure for a neural network is as follows:

1. define the neural network that has some learnable parameters (or weights)
2. iterate over a dataset of inputs
3. process input through the network (forward pass)
4. compute the loss (how far the output is from being correct)
5. propagate gradients back into the network's parameters (backward pass)
6. update the weights of the network, typically using a simple update rule: weight -= learning_rate * gradient
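
The six steps above can be sketched end to end on a toy problem. This is a minimal illustration, not the network built later in this notebook: the single linear layer, the synthetic dataset, the batch size of 16, and the learning rate of 0.1 are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. define a network with learnable parameters (a single linear layer here)
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
learning_rate = 0.1

# toy dataset (an assumption): targets are a fixed linear function of the inputs
inputs = torch.randn(64, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])

losses = []
for epoch in range(50):
    # 2. iterate over the dataset in mini-batches of 16
    for x, y in zip(inputs.split(16), targets.split(16)):
        output = model(x)             # 3. forward pass
        loss = criterion(output, y)   # 4. compute the loss
        model.zero_grad()             # clear gradients left from the previous step
        loss.backward()               # 5. backward pass
        with torch.no_grad():         # 6. weight -= learning_rate * gradient
            for p in model.parameters():
                p -= learning_rate * p.grad
    losses.append(loss.item())        # loss of the last batch in this epoch

print(losses[0], losses[-1])
```

The loss recorded after the last epoch should be much smaller than the one recorded after the first.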

Main objects and modules in PyTorch:

* **torch.Tensor** - A multi-dimensional array with support for autograd operations like *backward()*. Also holds the gradient w.r.t. the tensor.
* **nn.Module** - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
* **nn.Parameter** - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
* **autograd.Function** - Implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node that connects to the functions that created the Tensor and encodes its history.
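
A small sketch of how these four objects interact. The `Scale` module and its attribute names are hypothetical, introduced only for illustration.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # nn.Parameter: registered automatically
        self.offset = torch.zeros(3)               # plain Tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)          # ['weight']

y = m(torch.tensor([1.0, 2.0, 3.0])).sum()
print(y.grad_fn)      # the Function node that produced y
y.backward()
print(m.weight.grad)  # tensor([1., 2., 3.]), since d(sum(x * w + offset))/dw = x
```

Only the attribute wrapped in `nn.Parameter` shows up in `named_parameters()`, which is what optimizers and `zero_grad()` iterate over.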

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

## Define the Network

You just have to define the **forward** function, and the **backward** function (where gradients are computed) is automatically defined for you using **autograd**. You can use any of the **Tensor** operations in the forward function.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
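
The `16 * 5 * 5` input size of `fc1` follows from the spatial size of the feature maps after two convolution and pooling stages. A minimal sketch of that arithmetic, assuming a 32x32 input, stride 1 and no padding for the convolutions, and a pooling stride equal to the window size (the PyTorch defaults); `conv_out` is a helper defined here, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                    # input height and width
size = conv_out(size, kernel=5)              # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)    # max pool: 28 -> 14
size = conv_out(size, kernel=5)              # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)    # max pool: 10 -> 5

flat_features = 16 * size * size             # 16 channels from conv2
print(size, flat_features)                   # 5 400
```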

In [None]:
net = Net()
print(net)

In [None]:
# list the net parameters
params = list(net.parameters())
for param in params:
    print(param.size())
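
Each layer contributes a weight tensor and a bias tensor to `parameters()`. As a small sketch, here is a standalone layer with the same configuration as `conv1`, inspected by name; `named_parameters()` and `numel()` are standard PyTorch calls.

```python
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)  # same configuration as conv1 above

shapes = {name: tuple(p.size()) for name, p in conv.named_parameters()}
print(shapes)  # {'weight': (6, 1, 5, 5), 'bias': (6,)}

# 6 * 1 * 5 * 5 weights plus 6 biases
total = sum(p.numel() for p in conv.parameters())
print(total)   # 156
```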

## Feed the Network and Run the Forward Pass

In [None]:
# feed random data into the network
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

In [None]:
# zero the gradient buffers of all parameters, then backpropagate with random gradients
net.zero_grad()
out.backward(torch.randn(1, 10))
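
Because `out` is not a scalar, `backward` needs a tensor of the same shape as `out` to weight each output element. A minimal sketch on a small tensor (the values of `x` and `v` are arbitrary) showing that `out.backward(v)` gives the same gradient as backpropagating the scalar `(out * v).sum()`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 10.0])   # per-element weights, same shape as the output

out = x ** 2
out.backward(v)                      # non-scalar output: pass the weights explicitly
grad_a = x.grad.clone()

x.grad = None                        # reset before the second computation
(x ** 2 * v).sum().backward()        # equivalent scalar formulation
grad_b = x.grad.clone()

print(grad_a, grad_b)                # both equal 2 * x * v
```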

## Compute the Loss

A *loss function* takes an (output, target) pair and computes a value that estimates how far the output is from the target. The nn package provides several loss functions. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.

In [None]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
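
To make the definition concrete, the sketch below compares `nn.MSELoss` with the mean of squared differences computed by hand. The random tensors stand in for the network output and target above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for the network output
target = torch.randn(1, 10)   # stand-in for the target

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of element-wise squared errors
print(loss, manual)

# reduction='sum' adds the squared errors instead of averaging them
loss_sum = nn.MSELoss(reduction='sum')(output, target)
```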

In [None]:
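# walk a few steps backward through the graph that produced loss
print(loss.grad_fn)  # the node for the MSE loss
print(loss.grad_fn.next_functions[0][0])  # the node feeding it (the last linear layer)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # one step further back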
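
The same `grad_fn` / `next_functions` chain can be followed in a loop. This is a sketch on a tiny graph rather than on `net`; it only follows the first input of each node, and the exact node class names may differ between PyTorch versions.

```python
import torch

x = torch.ones(2, requires_grad=True)
loss = (x * 3).sum()

node = loss.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    # follow the first input only; a node without inputs ends the walk
    node = node.next_functions[0][0] if node.next_functions else None

print(names)
```

The walk ends at an `AccumulateGrad` node, which is where the gradient for the leaf tensor `x` is written into `x.grad`.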

## Back Propagation

To backpropagate the error, all we have to do is call *loss.backward()*. You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones. Every Tensor in the graph that has *requires_grad=True* will have its *.grad* Tensor accumulated with the gradient.

In [None]:
net.zero_grad()

print('conv1.bias.grad before backward')
# recent PyTorch releases reset gradients to None here; older ones print zeros
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
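
A minimal sketch of why the gradients must be cleared, using a two-element tensor `w` (a stand-in, not part of `net`): calling `backward()` twice without zeroing adds the second gradient to the first.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()         # 2 * w = [2., 4.]

(w ** 2).sum().backward()      # no zeroing: the new gradient is added to .grad
accumulated = w.grad.clone()   # [4., 8.]

w.grad.zero_()                 # clear the buffer, as net.zero_grad() does for a module
(w ** 2).sum().backward()
fresh = w.grad.clone()         # [2., 4.] again

print(first, accumulated, fresh)
```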

## Update the Weights
The simplest update rule used in practice is Stochastic Gradient Descent (SGD): weight = weight - learning_rate * gradient

In [None]:
learning_rate = 0.01
# update in place without recording the update itself in the autograd graph
with torch.no_grad():
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)

However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package, *torch.optim*, that implements all these methods. Using it is very simple:

In [None]:
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)
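
The other update rules mentioned above are constructed the same way. A sketch with a small stand-in model; the learning rates and the momentum value are illustrative assumptions, not recommendations.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in model, for illustration only

optimizers = {
    "SGD": optim.SGD(model.parameters(), lr=0.01),
    # Nesterov momentum is an option of optim.SGD and requires momentum > 0
    "Nesterov-SGD": optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
    "Adam": optim.Adam(model.parameters(), lr=0.001),
    "RMSProp": optim.RMSprop(model.parameters(), lr=0.01),
}

for name, opt in optimizers.items():
    print(name, type(opt).__name__)
```

In a real training run you would create only one of these; the rest of the training loop stays the same.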

In [None]:
# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()  # does the update
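
With no momentum or weight decay, one `optim.SGD` step should match the manual rule weight = weight - learning_rate * gradient. The sketch below checks this on a small stand-in model; the model, data, and variable names are assumptions for illustration.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
manual = nn.Linear(4, 2)             # updated by hand
reference = copy.deepcopy(manual)    # identical copy, updated by optim.SGD
x, y = torch.randn(8, 4), torch.randn(8, 2)
lr = 0.01

# manual update: weight -= learning_rate * gradient
F.mse_loss(manual(x), y).backward()
with torch.no_grad():
    for p in manual.parameters():
        p -= lr * p.grad

# the same step through the optimizer
opt = optim.SGD(reference.parameters(), lr=lr)
F.mse_loss(reference(x), y).backward()
opt.step()

same = all(torch.allclose(a, b)
           for a, b in zip(manual.parameters(), reference.parameters()))
print(same)  # True
```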

In [None]:
print(net.conv1.bias.data)