
# Neural Networks

Neural networks can be constructed using the `torch.nn` package.

`nn` depends on `autograd` to define models and differentiate them.
An `nn.Module` contains layers, and a method `forward(input)` that returns `output`.

A typical training procedure for a NN is as follows:

* Define the NN that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process input through the network
* Compute the loss (how far the output is from being correct)
* Propagate gradients back into the network's parameters
* Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
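
As a minimal sketch of these steps, the loop below fits a single-weight model `y = w * x` in plain Python, using the update rule from the last bullet. The toy data, the learning rate, and the names `w`, `data`, and `learning_rate` are assumptions chosen for illustration; none of them come from `torch.nn`.

```python
# Toy dataset of (input, target) pairs generated by y = 3 * x (assumed for illustration)
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # the single learnable parameter
learning_rate = 0.05  # assumed step size

for epoch in range(200):
    for x, target in data:                     # iterate over the dataset
        output = w * x                         # process the input through the "network"
        loss = (output - target) ** 2          # squared error between output and target
        gradient = 2 * (output - target) * x   # d(loss)/dw, derived by hand
        w = w - learning_rate * gradient       # the simple update rule

print(round(w, 4))
```

Here the gradient is written out by hand; the rest of this section shows how `autograd` computes it automatically.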

## Defining the Network

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    # 1 input image channel, 6 output channels, 5x5 square convolution
    # kernel
    self.conv1 = nn.Conv2d(1,6,5)
    self.conv2 = nn.Conv2d(6,16,5)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16*5*5,120)  # 5*5 from image dimension
    self.fc2 = nn.Linear(120,84)
    self.fc3 = nn.Linear(84,10)

  def forward(self, input):
    # Convolution layer C1; 1 input image channel, 6 output channels
    # 5x5 square convolution, it uses ReLU activation function
    # outputs a Tensor with size (N, 6, 28, 28), where N is the size of the batch
    c1 = F.relu(self.conv1(input))
    # subsampling layer S2: 2x2 grid, purely functional
    # this layer does not have any parameter, and outputs a (N,6,14,14) Tensor
    s2 = F.max_pool2d(c1,(2,2))
    # convolution layer C3: 6 input channels, 16 output channels
    # 5x5 square convolution, it uses ReLU activation function
    # outputs a (N, 16, 10, 10) Tensor
    c3 = F.relu(self.conv2(s2))
    # subsampling layer S4: 2x2 grid, purely functional
    # this layer does not have any parameter, and outputs a (N, 16,5,5) tensor
    s4 = F.max_pool2d(c3,2)
    # Flatten operation: purely functional, outputs a (N, 400) Tensor
    s4 = torch.flatten(s4,1)
    # fully connected layer F5: (N, 400) Tensor input
    # and outputs a (N, 120) Tensor, it uses ReLU activation function
    f5 = F.relu(self.fc1(s4))
    # fully connected layer F6: (N,120) Tensor input
    # and outputs a (N,84) Tensor, it uses ReLU activation function
    f6 = F.relu(self.fc2(f5))
    # Gaussian layer OUTPUT: (N,84) Tensor input, and
    # outputs a (N,10) Tensor
    output = self.fc3(f6)
    return output


In [None]:
net = Net()

In [None]:
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
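
The `in_features=400` of `fc1` and the tensor sizes quoted in the comments of `forward` follow from the usual size arithmetic for a convolution with stride 1 and no padding, and for non-overlapping pooling. The helpers below are a small sketch of that arithmetic. `conv_out` and `pool_out` are hypothetical names introduced here, not PyTorch functions, and the 32x32 input size is an assumption taken from the sizes in those comments.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial output size of non-overlapping pooling (stride == window)."""
    return size // window

size = 32                  # assumed 32x32 input image
c1 = conv_out(size, 5)     # conv1, 5x5 kernel
s2 = pool_out(c1, 2)       # 2x2 max pooling
c3 = conv_out(s2, 5)       # conv2, 5x5 kernel
s4 = pool_out(c3, 2)       # 2x2 max pooling
flat = 16 * s4 * s4        # 16 channels flattened: the in_features of fc1

print(c1, s2, c3, s4, flat)    # 28 14 10 5 400
```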


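You only have to define the `forward` function; the `backward` function (where gradients are computed) is automatically defined for you using `autograd`. You can use any Tensor operation in the `forward` function.

The learnable parameters of a model are returned by `net.parameters()`.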
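
To make the role of `autograd` concrete, here is a minimal sketch on a three-element tensor where the derivative can be checked by hand. The names `x` and `y` are illustrative only.

```python
import torch

# A tiny "forward" computation: y = sum(x ** 2)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()

# No backward function was written by hand; autograd derives dy/dx = 2 * x
y.backward()
print(x.grad)
```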


In [None]:
params = list(net.parameters())
print(params)

[Parameter containing:
tensor([[[[-0.1289, -0.1922,  0.0675, -0.0346,  0.1895],
          [-0.1354, -0.0936, -0.1202, -0.0939,  0.1918],
          [ 0.1764,  0.0061, -0.0404, -0.0570,  0.0860],
          [-0.0106, -0.0181, -0.0172,  0.1913, -0.0198],
          [-0.1614, -0.1179,  0.0892,  0.0880,  0.0052]]],


        [[[ 0.0114, -0.1236, -0.1464, -0.1403,  0.1469],
          [ 0.0957, -0.0498, -0.1180,  0.0144,  0.1673],
          [-0.0158, -0.0606, -0.1491,  0.0504,  0.0668],
          [ 0.0622, -0.1026,  0.1005, -0.0071, -0.0368],
          [-0.0810,  0.1141,  0.0595, -0.0896,  0.1288]]],


        [[[ 0.1552,  0.0333,  0.0306,  0.0199,  0.1425],
          [-0.1677, -0.0734,  0.1480, -0.1355, -0.1569],
          [ 0.0871, -0.1008,  0.1446,  0.1421,  0.1361],
          [ 0.1672, -0.0554, -0.0875,  0.1406, -0.0253],
          [-0.1016,  0.1817, -0.0566,  0.0030, -0.1436]]],


        [[[ 0.0514, -0.0428,  0.1075,  0.1635, -0.0756],
          [-0.0035,  0.0897, -0.0825, -0.0163,  0.086
          ... (output truncated)

In [None]:
print(len(params))

10


In [None]:
print(params[0].size())  #conv1's .weight

torch.Size([6, 1, 5, 5])
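
The ten entries of `params` are the weight and bias tensors of the five layers. As a rough sketch, the snippet below rebuilds the same five layers in an `nn.ModuleDict` (used here only for inspection, and not part of the network above) and lists each parameter with its shape and element count.

```python
import torch.nn as nn

# The same five layers as in Net, collected in a ModuleDict for inspection only
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

# named_parameters() yields (name, tensor) pairs: one weight and one bias per layer
for name, p in layers.named_parameters():
    print(name, tuple(p.size()), p.numel())

total = sum(p.numel() for p in layers.parameters())
print(total)
```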


In [None]:
input = torch.randn(1,1,32,32)
out = net(input)

print(out)

tensor([[ 0.1023, -0.0090,  0.0497, -0.1490, -0.1222,  0.0969, -0.1421, -0.0969,
          0.0140,  0.0330]], grad_fn=<AddmmBackward0>)


Zero the gradient buffers of all parameters, then backpropagate with random gradients:

In [None]:
net.zero_grad()
out.backward(torch.randn(1,10))
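
Because `out` is not a scalar, `backward` needs a tensor of the same shape as `out`; autograd then computes a vector-Jacobian product with it. The sketch below shows this on a small tensor where the result can be checked by hand. The names `x`, `y`, and `v` are illustrative only.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x                            # non-scalar output, dy_i/dx_i = 2

v = torch.tensor([1.0, 0.1, 0.01])   # plays the role of torch.randn(1, 10) above
y.backward(v)                        # vector-Jacobian product, here 2 * v

print(x.grad)
```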


**Recap:**
* `torch.Tensor` - A *multi-dimensional array* with support for autograd operations like `backward()`. Also *holds the gradient* w.r.t. the tensor.
* `nn.Module` - Neural network module. *Convenient way of encapsulating parameters*, with helpers for moving them to GPU, exporting, loading, etc.
* `nn.Parameter` - A kind of Tensor that is *automatically registered as a parameter when assigned as an attribute to a* `Module`.
* `autograd.Function` - Implements *forward and backward definitions of an autograd operation*. Every `Tensor` operation creates at least a single `Function` node that connects to the functions that created a `Tensor` and *encodes its history*.
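
The registration behavior of `nn.Parameter` can be sketched with a tiny module. `Scale` is a hypothetical class written for this illustration; it contrasts a parameter attribute with a plain tensor attribute.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Assigned as an attribute, so it is registered as a parameter
        self.weight = nn.Parameter(torch.ones(3))
        # A plain tensor attribute is not registered as a parameter
        self.offset = torch.zeros(3)

    def forward(self, x):
        return x * self.weight + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)

out = m(torch.tensor([1.0, 2.0, 3.0]))
print(out.grad_fn)  # the Function node that recorded how `out` was created
```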

## Loss Function

A loss function takes the (output, target) pair of inputs and computes a value that estimates how far the output is from the target.

There are several different loss functions under the `nn` package. A simple one is `nn.MSELoss`, which computes the mean squared error between the output and the target.

In [None]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1,-1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.3206, grad_fn=<MseLossBackward0>)
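
To see what this number means, the sketch below compares `nn.MSELoss` with the same quantity computed by hand on fixed values. The tensors `pred` and `tgt` are made-up values for illustration, not outputs of the network above.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.5, -1.0, 2.0]])
tgt = torch.tensor([[1.0, 0.0, 2.0]])

# With its default settings, nn.MSELoss averages the squared differences over all elements
mse = nn.MSELoss()(pred, tgt)
manual = ((pred - tgt) ** 2).mean()

print(mse.item(), manual.item())
```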


In [None]:
# For illustration, follow a few steps backward:
print(loss.grad_fn)   #MSELoss
print(loss.grad_fn.next_functions[0][0])  #Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])   #AccumulateGrad for the fc3 bias; the ReLU node is at next_functions[1][0]

<MseLossBackward0 object at 0x7e9a8b321600>
<AddmmBackward0 object at 0x7e9a8b321900>
<AccumulateGrad object at 0x7e9a8b321600>


## Backprop

To backpropagate the error, all we have to do is call `loss.backward()`.
You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.

Now we shall call `loss.backward()`, and have a look at conv1's bias gradients before and after the backward.

In [None]:
net.zero_grad()    # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
None


In [None]:
loss.backward()  # backward

In [None]:
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad after backward
tensor([ 0.0012, -0.0134, -0.0056, -0.0024,  0.0051,  0.0024])
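
The accumulation behavior mentioned above can be seen on a small tensor. This is a sketch with an illustrative tensor `w`; it clears the stored gradient by setting it to `None`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()      # gradient of sum(3 * w) is [3, 3]

(3 * w).sum().backward()    # no reset in between
second = w.grad.clone()     # accumulated: [6, 6]

w.grad = None               # clear the stored gradient
(3 * w).sum().backward()
third = w.grad.clone()      # back to [3, 3]

print(first, second, third)
```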


We have now seen how to use loss functions.

**The only thing left to learn is:**
* Updating the weights of the NN

## Update the weights

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

```python
weight = weight - learning_rate * gradient
```

In [None]:
# this can be implemented in plain Python code:
learning_rate = 0.01
with torch.no_grad():  # the update itself should not be tracked by autograd
  for f in net.parameters():
    f.sub_(f.grad * learning_rate)

However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, and RMSProp. The `torch.optim` package implements these methods:

In [None]:
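import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # clear gradients left over from earlier backward calls
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # performs the update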
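
The cell above performs a single update. As a sketch of how these calls repeat in a loop, the standalone example below trains a small linear model on dummy data. The model `nn.Linear(3, 1)`, the random tensors, the seed, and the step count are all assumptions made so that the example runs quickly; they stand in for `net`, `input`, and `target`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                  # make the sketch reproducible

model = nn.Linear(3, 1)               # stand-in for net, assumed for brevity
inputs = torch.randn(8, 3)            # dummy batch
targets = torch.randn(8, 1)           # dummy targets
criterion = nn.MSELoss()

# Swapping the update rule only changes this line,
# e.g. optim.Adam(model.parameters(), lr=0.01)
optimizer = optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()             # clear gradients from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()                   # compute fresh gradients
    optimizer.step()                  # apply the update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling `zero_grad()` at the start of every iteration keeps each update based only on the gradients of the current step.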