# Neural Networks

## Introduction

The `torch.nn` package simplifies the construction of neural networks.
It works similarly to Keras' subclassing API, but is simpler because it
doesn't rely on the TF compute graph and is eager by default. In
particular, custom models don't need `tf.function` to perform reasonably.
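Because PyTorch runs operations eagerly, a model's `forward` can branch on tensor values with ordinary Python control flow. The module below is a hypothetical toy (`ClampIfLarge` is not part of PyTorch), sketched only to illustrate this point.

```python
import torch
import torch.nn as nn


class ClampIfLarge(nn.Module):
    """Hypothetical toy module: data-dependent Python control flow in forward."""

    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x):
        # Ops run immediately (eager mode), so x.abs().max() is a concrete
        # value here and can drive an ordinary Python `if`.
        if x.abs().max() > self.threshold:
            return x.clamp(-self.threshold, self.threshold)
        return x


module = ClampIfLarge(threshold=1.0)
small = torch.tensor([0.5, -0.25])
large = torch.tensor([3.0, -2.0])
print(module(small))  # returned unchanged
print(module(large))  # clamped to [-1, 1]
```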

*It's also worth noting that PyTorch provides `nn.Sequential`, which works
like the Keras `Sequential` model.*

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

## Example

We can start with an example. In PyTorch, models (and layers) subclass
`nn.Module`. For example, the class `MaxPool2d`
[inherits](https://github.com/pytorch/pytorch/blob/1a74bd407de335019afdcb748a758107092a8019/torch/nn/modules/pooling.py#L79)
from `nn.Module` via `_MaxPoolNd`.
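A quick check of that inheritance, plus a minimal subclass. `TinyModel` is a hypothetical example, not part of the notebook's model; it shows that assigning a layer as an attribute registers its parameters with the parent module.

```python
import torch
import torch.nn as nn

# MaxPool2d reaches nn.Module through its private base class _MaxPoolNd.
print(issubclass(nn.MaxPool2d, nn.Module))  # True


class TinyModel(nn.Module):
    """Hypothetical minimal model, used only to illustrate subclassing."""

    def __init__(self):
        super().__init__()
        # Assigning an nn.Module as an attribute registers it as a submodule,
        # so its weight and bias show up in TinyModel.parameters().
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


tiny = TinyModel()
print([name for name, _ in tiny.named_parameters()])  # ['linear.weight', 'linear.bias']
print(tiny(torch.zeros(3, 4)).shape)  # torch.Size([3, 2])
```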

In this example, we define the layers in the `__init__` method. We have:

- `convolution_1`: Maps `input_channels` input channels (1 by default) to
6 output channels.
- `convolution_2`: Maps the 6 channels from the previous layer to 16 output channels.

Each convolution layer has a $3 \times 3$ kernel.

Then we construct the dense layers. 

- `dense_1`: Takes the 16 channels from the last convolution. In this toy example,
the inputs are $32 \times 32$ images, so after two convolution and pooling stages
each feature map is $6 \times 6$. Thus, we have $16 \times 6 \times 6 = 576$ input
features and 120 outputs.
- `dense_2`: 120 inputs and 84 outputs.
- `classifier`: The final layer, producing scores for the 10 classes.
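The $6 \times 6$ figure follows from the output-size rule for convolution and pooling layers (no dilation, floor rounding), $\lfloor (n + 2p - k) / s \rfloor + 1$, where $n$ is the input size, $k$ the kernel size, $s$ the stride and $p$ the padding. Max pooling uses a stride equal to its window by default. The helper name `output_size` below is our own, not a PyTorch function.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                     # input height/width used in this notebook
size = output_size(size, kernel=3)            # convolution_1: 32 -> 30
size = output_size(size, kernel=2, stride=2)  # max pool:      30 -> 15
size = output_size(size, kernel=3)            # convolution_2: 15 -> 13
size = output_size(size, kernel=2, stride=2)  # max pool:      13 -> 6
print(size, 16 * size * size)  # 6 576
```

Note that the last pooling step drops a row and column: 13 is odd, so the floor gives 6 rather than 6.5.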

Note that all of the `Linear` layers apply an
[affine transform](https://en.wikipedia.org/wiki/Affine_transformation), $y = xW^\top + b$.
In addition, the `Conv2d` layers apply a 2D [convolution](https://en.wikipedia.org/wiki/Convolution)
over an input plane (strictly speaking, PyTorch computes a cross-correlation, as most deep learning libraries do).
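To see the affine transform concretely, the sketch below compares an `nn.Linear` layer with the explicit formula $y = xW^\top + b$, where `weight` has shape `(out_features, in_features)`. The sizes mirror `dense_1`; the batch of 2 random vectors is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(576, 120)
x = torch.randn(2, 576)  # a batch of 2 flattened feature vectors

# nn.Linear computes y = x W^T + b, with W of shape (out_features, in_features).
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-5))  # True
```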

In the `forward` method, we compute the forward pass. This includes two
[max pooling](https://computersciencewiki.org/index.php/Max-pooling_/_Pooling)
operations over a $2 \times 2$ window. A ReLU activation is applied after each
convolution and after the first two dense layers; the classifier outputs raw scores.
In Keras, this is roughly equivalent to:

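```python
from tensorflow import keras
from tensorflow.keras import layers

# Keras uses channels-last input: (height, width, channels).
model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 3, activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10),
])
```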
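Since the italic note near the top points out that PyTorch also has a `Sequential` container, here is a sketch of the same architecture written with `nn.Sequential`, assuming 1-channel $32 \times 32$ inputs. It is an illustration, not the model the rest of this notebook uses.

```python
import torch
import torch.nn as nn

sequential_model = nn.Sequential(
    nn.Conv2d(1, 6, 3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                # (N, 16, 6, 6) -> (N, 576)
    nn.Linear(16 * 6 * 6, 120),
    nn.ReLU(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 10),           # raw class scores, no activation
)

print(sequential_model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```

The trade-off: `nn.Sequential` is less code for a straight pipeline, but its layers are named by index (`0`, `1`, ...) and `forward` cannot contain custom logic, which is what subclassing `nn.Module` buys you.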

```python
class ConvolutionalNN(nn.Module):
    def __init__(self, input_channels=1):
        super().__init__()
        # input_channels -> 6 output channels, 3x3 convolution.
        self.convolution_1 = nn.Conv2d(input_channels, 6, 3)
        # 6 input channels from the previous layer -> 16 output channels.
        self.convolution_2 = nn.Conv2d(6, 16, 3)
        # Linear layers are affine transforms (no non-linearity).
        # 16 channels of 6x6 feature maps (for 32x32 inputs) -> 120 outputs.
        self.dense_1 = nn.Linear(16 * 6 * 6, 120)
        self.dense_2 = nn.Linear(120, 84)
        self.classifier = nn.Linear(84, 10)

    def forward(self, x):
        # Convolution -> ReLU -> max pooling over a 2x2 window.
        x = F.max_pool2d(F.relu(self.convolution_1(x)), (2, 2))
        # If the window is square, you can specify a single number.
        x = F.max_pool2d(F.relu(self.convolution_2(x)), 2)
        # Flatten everything except the batch dimension.
        x = x.view(-1, self._num_flat_features(x))
        x = F.relu(self.dense_1(x))
        x = F.relu(self.dense_2(x))
        x = self.classifier(x)
        return x

    def _num_flat_features(self, x):
        # Product of all dimensions except the batch dimension.
        size = x.size()[1:]
        feature_count = 1
        for dim in size:
            feature_count *= dim
        return feature_count
```
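The `_num_flat_features` helper multiplies out every non-batch dimension. PyTorch's built-in `torch.flatten(x, start_dim=1)` produces the same result, as this small check on a tensor shaped like the output of the second pooling layer shows.

```python
import torch

x = torch.randn(2, 16, 6, 6)  # batch of 2, shaped like the second pooling output

# Two equivalent ways to flatten all dimensions except the batch dimension.
a = x.view(-1, 16 * 6 * 6)
b = torch.flatten(x, start_dim=1)
print(a.shape, torch.equal(a, b))  # torch.Size([2, 576]) True
```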

```python
model = ConvolutionalNN()
print(model)
```

```
ConvolutionalNN(
  (convolution_1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (convolution_2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (dense_1): Linear(in_features=576, out_features=120, bias=True)
  (dense_2): Linear(in_features=120, out_features=84, bias=True)
  (classifier): Linear(in_features=84, out_features=10, bias=True)
)
```


We only need to define the `forward` function; autograd supplies the
`backward` pass automatically. The learnable parameters are returned by
`model.parameters()`.
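As a minimal illustration of what autograd does, the sketch below records a computation on a tensor with `requires_grad=True` and lets `backward()` fill in the gradient. For $y = \sum_i x_i^2$ the gradient is $\partial y / \partial x_i = 2 x_i$.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # autograd records the square and the sum
y.backward()        # populates x.grad with dy/dx = 2x
print(x.grad)       # tensor([2., 4., 6.])
```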

```python
print('[+] Model Parameters:')
for index, parameter in enumerate(model.parameters()):
    print(f'\t[+] Param Size {index}: {parameter.size()}')
```

```
[+] Model Parameters:
	[+] Param Size 0: torch.Size([6, 1, 3, 3])
	[+] Param Size 1: torch.Size([6])
	[+] Param Size 2: torch.Size([16, 6, 3, 3])
	[+] Param Size 3: torch.Size([16])
	[+] Param Size 4: torch.Size([120, 576])
	[+] Param Size 5: torch.Size([120])
	[+] Param Size 6: torch.Size([84, 120])
	[+] Param Size 7: torch.Size([84])
	[+] Param Size 8: torch.Size([10, 84])
	[+] Param Size 9: torch.Size([10])
```
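Each of these tensors contributes `numel()` scalars, so summing them gives the model's total parameter count. To keep the block self-contained, it builds a hypothetical stand-in (`stand_in`) with the same layers as `ConvolutionalNN`; in the notebook, `sum(p.numel() for p in model.parameters())` gives the same total.

```python
import torch.nn as nn

# Hypothetical stand-in holding the same layers as ConvolutionalNN.
stand_in = nn.ModuleDict({
    'convolution_1': nn.Conv2d(1, 6, 3),
    'convolution_2': nn.Conv2d(6, 16, 3),
    'dense_1': nn.Linear(16 * 6 * 6, 120),
    'dense_2': nn.Linear(120, 84),
    'classifier': nn.Linear(84, 10),
})

# Per layer: weight.numel() + bias.numel(), e.g. 6 * 1 * 3 * 3 + 6 = 60.
for name, module in stand_in.items():
    print(name, sum(p.numel() for p in module.parameters()))
total = sum(p.numel() for p in stand_in.parameters())
print('total', total)  # total 81194
```

Almost 85% of the parameters sit in `dense_1` (69,240 of 81,194), which is typical for small CNNs with a large flatten-to-dense step.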


All of the components in the `nn` package are built to process data in mini-batches,
so our model expects input of the form:

`samples` $\times$ `channels` $\times$ `height` $\times$ `width`.

To feed in a single sample, wrap it in a batch of size one by adding a leading
dimension with `data.unsqueeze(0)`.
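A short sketch of the `unsqueeze(0)` trick on a single $1 \times 32 \times 32$ sample (the random tensor stands in for a real image):

```python
import torch

single = torch.randn(1, 32, 32)  # one sample: channels x height x width
batch = single.unsqueeze(0)      # add a leading batch dimension of size 1
print(batch.shape)               # torch.Size([1, 1, 32, 32])
```

`unsqueeze` returns a view, so no data is copied; `batch[0]` is the original sample.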

```python
data = torch.randn((1, 1, 32, 32))
out = model(data)
print(out)
```

```
tensor([[ 0.0634, -0.0135,  0.1940, -0.0105,  0.0673,  0.0047,  0.0130, -0.0793,
          0.0876,  0.0252]], grad_fn=<AddmmBackward>)
```


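Now we can zero the gradient buffers of all parameters and run a backward pass.
Because `out` is not a scalar, `backward` needs a gradient tensor of the same
shape; here we pass random values.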

```python
model.zero_grad()
# Backprop with random gradients; out has shape (1, 10), so the gradient must too.
out.backward(torch.randn(1, 10))
```
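Zeroing matters because `backward` accumulates into `.grad` rather than overwriting it. The sketch below uses a small standalone `nn.Linear` (not the notebook's model) to show that two backward passes double the stored gradient, and that `zero_grad()` clears it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
x = torch.randn(4, 3)

layer.zero_grad()
layer(x).sum().backward()
first = layer.weight.grad.clone()

# A second backward pass adds to .grad instead of overwriting it.
layer(x).sum().backward()
accumulated = layer.weight.grad.clone()
print(torch.allclose(accumulated, 2 * first))  # True

# Depending on the PyTorch version, zero_grad() either fills .grad with zeros
# or (the default since 2.0) sets it to None.
layer.zero_grad()
grad = layer.weight.grad
print(grad is None or bool((grad == 0).all()))  # True
```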