## Neural Networks
<hr>
In PyTorch, neural networks are built with the torch.nn package, which relies on PyTorch's built-in automatic differentiation engine (autograd) to compute gradients.

There are a few basic steps in training a neural network (NN):

1. Build a neural network with adjustable weights.

<u>For each piece of data in your dataset:</u>

2. **Make a Prediction**: Pass the input through the network to get an output.

3. **Measure Error**: Compare the output to the correct answer using a loss function.

4. **Learn from Mistakes**: Compute how much each weight contributed to the error (this is the gradient).

5. **Improve**: Adjust the weights by subtracting a small fraction of the gradient (this fraction is the learning rate).
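
The five steps can be sketched without any library for a hypothetical one-weight model `y = w * x`. The data, starting weight, and learning rate below are illustrative assumptions, not values from this notebook.

```python
# Toy data that follows y = 2 * x (illustrative values)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # step 1: a "network" with a single adjustable weight
learning_rate = 0.05  # assumed value, small enough to converge on this data

for epoch in range(50):
    for x, target in data:
        prediction = w * x                      # step 2: make a prediction
        loss = (prediction - target) ** 2       # step 3: measure error (squared error)
        grad = 2 * (prediction - target) * x    # step 4: d(loss)/dw
        w = w - learning_rate * grad            # step 5: improve

print(round(w, 3))  # the weight approaches 2.0
```

In the rest of this section, PyTorch computes step 4 automatically, so the gradient never has to be derived by hand.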




In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

  def __init__(self):
    super(Net, self).__init__()

    # A kernel is a small matrix that slides over the image. At each position it
    # looks at the region around a pixel to extract local features.
    # Kernels are not limited to images: they can also slide over audio and video.
    # Arguments: Conv2d(in_channels, out_channels, kernel_size)
    self.conv1 = nn.Conv2d(1, 6, 5)
    self.conv2 = nn.Conv2d(6, 16, 5)

    self.fc1 = nn.Linear(16 * 5 * 5, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, input):
    c1 = F.relu(self.conv1(input))
    s2 = F.max_pool2d(c1, (2, 2))
    c3 = F.relu(self.conv2(s2))
    s4 = F.max_pool2d(c3, 2)
    s4 = torch.flatten(s4, 1)
    f5 = F.relu(self.fc1(s4))
    f6 = F.relu(self.fc2(f5))

    output = self.fc3(f6)
    return output

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
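
The `16 * 5 * 5` input size of `fc1` follows from how each layer shrinks the image. A minimal sketch of that arithmetic, assuming the 32x32 input used in the test cell further down and the default stride and padding of `nn.Conv2d` and `F.max_pool2d`; the helper `conv_out` is a hypothetical name introduced only for this illustration:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # assumed input height and width
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # max pool: 28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # max pool: 10 -> 5

flat_features = 16 * size * size           # 16 output channels from conv2
print(size, flat_features)                 # 5 400
```

A different input size would change the flattened feature count, and `fc1` would then need a matching `in_features`.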


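net.parameters() returns the model's learnable parameters. There are 10 tensors (a weight and a bias for each of the 5 layers), and params[0] is conv1's weight.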

In [2]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 5, 5])
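
To see what each of the 10 tensors is, `named_parameters()` pairs every parameter with its name. The sketch below rebuilds the same five layers in an `nn.ModuleDict` (an assumption made only so the example is self-contained; on the notebook's `net` you would call `net.named_parameters()` directly).

```python
import torch.nn as nn

# Same layers as Net, held in a ModuleDict so this example runs on its own
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

# Map each parameter name to its shape
shapes = {name: tuple(p.shape) for name, p in layers.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of learnable scalars across all parameters
total = sum(p.numel() for p in layers.parameters())
print(total)
```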


Test the network on a random 32x32 input (a batch containing one single-channel image):

In [9]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.0269, -0.0766,  0.0478, -0.0963, -0.0346, -0.0355, -0.0733, -0.0697,
         -0.2306,  0.1410]], grad_fn=<AddmmBackward0>)


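In training, you repeatedly perform a forward pass followed by a backward pass for every mini-batch of data. You do this across multiple epochs until the network converges. The cell below zeroes the gradient buffers of all parameters and runs one backward pass with random gradients.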

In [10]:
net.zero_grad()
out.backward(torch.randn(1, 10))
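
Because `out` is not a scalar, `backward()` needs a tensor of the same shape as its argument; autograd then computes a vector-Jacobian product with it. A small sketch with made-up values, where the Jacobian is simply 2 times the identity:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x                           # non-scalar output, like `out` above

v = torch.tensor([1.0, 0.1, 0.01])  # illustrative upstream gradient
y.backward(v)                       # computes v^T * (dy/dx)

print(x.grad)                       # equals 2 * v
```

When the output is a scalar, such as a loss value, no argument is needed.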

**Recap so far:**

torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient with respect to the tensor.

nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.

nn.Parameter - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.

autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created the Tensor and encodes its history.
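
The difference between an nn.Parameter and a plain Tensor attribute can be checked directly. In this sketch, `Scale` is a hypothetical module written only for illustration:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, not part of the notebook
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)               # plain tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

names = [name for name, _ in Scale().named_parameters()]
print(names)  # only the nn.Parameter attribute is listed
```

Only registered parameters are returned by `parameters()`, so only they would be updated by an optimizer built from it.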

## Computing the loss

A loss function takes an (output, target) pair as input and computes a value that estimates how far the output is from the target.

In [14]:
output = net(input)
target = torch.randn(10)
target = target.view(1, -1)
criterion = nn.MSELoss() # Mean-squared error between the output and the target

loss = criterion(output, target)
print(loss)

tensor(1.4355, grad_fn=<MseLossBackward0>)
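
With its default settings, nn.MSELoss averages the squared differences over all elements. A quick check against a hand-written version, using small made-up tensors (`pred` and `truth` are illustrative names, chosen to avoid overwriting the notebook's `output` and `target`):

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0, 2.0, 3.0]])    # illustrative values
truth = torch.tensor([[1.0, 1.0, 1.0]])

builtin = nn.MSELoss()(pred, truth)
manual = ((pred - truth) ** 2).mean()     # (0 + 1 + 4) / 3

print(builtin.item(), manual.item())
```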


In [15]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear (recorded as addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of addmm: fc3's bias, hence AccumulateGrad (not ReLU)

<MseLossBackward0 object at 0x7da7aeeba4d0>
<AddmmBackward0 object at 0x7da7aeebace0>
<AccumulateGrad object at 0x7da7aeeba4d0>
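
Rather than indexing `next_functions` by hand, the whole graph can be walked recursively. A minimal sketch on a small stand-in model (one Linear layer after a ReLU, not the notebook's `net`); `collect_nodes` is a hypothetical helper name:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_nodes(fn, seen=None):
    """Depth-first walk over grad_fn.next_functions, returning node type names."""
    seen = [] if seen is None else seen
    if fn is None:  # inputs that do not require grad appear as None
        return seen
    seen.append(type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        collect_nodes(next_fn, seen)
    return seen

fc = nn.Linear(4, 2)                                    # small stand-in for fc3
hidden = F.relu(torch.randn(1, 4, requires_grad=True))  # stand-in for f6
loss_demo = nn.MSELoss()(fc(hidden), torch.zeros(1, 2))

names = collect_nodes(loss_demo.grad_fn)
print(names)
```

The ReLU node shows up among the inputs of the addmm node, alongside the AccumulateGrad nodes of the layer's bias and weight.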


## Backprop

Before backpropagating the error, we have to clear the existing gradients by calling net.zero_grad(). Otherwise, the new gradients will be accumulated on top of the existing ones.

In [16]:
net.zero_grad()

print(net.conv1.bias.grad)

loss.backward()

print(net.conv1.bias.grad)

None
tensor([ 0.0288, -0.0039,  0.0201,  0.0103,  0.0049,  0.0260])
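
The accumulation behaviour is easy to see on a single tensor. A small sketch with made-up values, calling backward twice without clearing the gradient in between:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * w).sum().backward()
first = w.grad.clone()    # d/dw sum(w^2) = 2w, so [2, 4]

(w * w).sum().backward()  # no zeroing in between
second = w.grad.clone()   # the new gradient was added: [4, 8]

w.grad.zero_()            # reset before the next backward pass
print(first, second, w.grad)
```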


## Update weights

We'll use stochastic gradient descent (SGD), the simplest update rule used in practice:

weight = weight - learning_rate * gradient

In [17]:
learning_rate = 0.01
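with torch.no_grad():  # the update itself must not be tracked by autograd
  for p in net.parameters():
    p.sub_(p.grad * learning_rate)  # in-place: p = p - learning_rate * grad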

PyTorch's torch.optim package provides this and other update rules ready to use:

In [18]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
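
Put inside a loop, the same five calls form a complete training routine. A minimal sketch, assuming a small Linear model and random data as stand-ins for `net`, `input`, and `target`; the names ending in `_demo`, the seed, and the step count are illustrative choices:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)          # small stand-in for Net
inputs = torch.randn(8, 3)       # illustrative data
targets = torch.randn(8, 1)

criterion_demo = nn.MSELoss()
optimizer_demo = optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer_demo.zero_grad()                            # clear old gradients
    loss_value = criterion_demo(model(inputs), targets)   # forward pass + loss
    loss_value.backward()                                 # compute gradients
    optimizer_demo.step()                                 # update the weights
    losses.append(loss_value.item())

print(losses[0], losses[-1])     # the loss should be lower at the end
```

In real training the loop would iterate over mini-batches from a dataset rather than reuse one fixed batch.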