# Build the Neural Network
Neural networks are composed of layers/modules that perform operations on data. The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks for neural network design. Every module in PyTorch subclasses [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). A neural network is itself a module that consists of other modules (layers). This nested structure allows for building and managing complex architectures.

In the following sections, we build a sample neural network to classify images from the FashionMNIST dataset.

In [1]:
# The imports
import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Get Device for Training
We want to be able to train our model on a hardware accelerator like a GPU, if it is available. To do this, we use [torch.cuda](https://pytorch.org/docs/stable/notes/cuda.html) to check if one is available.

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using {device} device")

Using cuda device
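
The device check above matters because tensors and models must live on the same device before they can be combined. The following is a minimal sketch of that idea; the tensor names `x`, `w`, and `y` are illustrative only, and the code falls back to the CPU when no GPU is present.

```python
import torch

# Pick the accelerator if present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default; .to(device) gives a tensor on
# the target device (the same tensor if it is already there).
x = torch.rand(2, 3)
x_dev = x.to(device)

# A tensor can also be created directly on the device.
w = torch.rand(3, 4, device=device)

# Both operands now share a device, so the matrix product is valid.
y = x_dev @ w
print(y.shape, y.device.type == device.type)
```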


## Define the class
We define a neural network by subclassing `nn.Module` and initialize the neural network layers in `__init__`. Every `nn.Module` subclass implements the operations on input data in the `forward` method.

In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )
        
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of `NeuralNetwork`, move it to the `device`, and print its structure.

In [4]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
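
The printed structure reflects how `nn.Module` registers submodules assigned as attributes. As a rough illustration, the sketch below uses a hypothetical, much smaller `TinyNet` class (not part of this tutorial) and walks its nested modules with `named_children()` and `named_modules()`.

```python
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical stand-in for NeuralNetwork
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.stack = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

    def forward(self, x):
        return self.stack(self.flatten(x))


net = TinyNet()

# Direct children only: the attributes assigned in __init__.
child_names = [name for name, _ in net.named_children()]

# All modules, recursively; the empty name refers to the root module itself.
module_names = [name for name, _ in net.named_modules()]

print(child_names)   # ['flatten', 'stack']
print(module_names)  # ['', 'flatten', 'stack', 'stack.0', 'stack.1', 'stack.2']
```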


To use this model, we pass it the input data. This executes the model's `forward` method, along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866) (such as running any registered hooks). Note: do not call `model.forward()` directly unless skipping those background operations is intended.
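
One background operation that is easy to observe is a forward hook. The sketch below is only an illustration on a small standalone `nn.Linear` layer (the names `layer` and `calls` are made up for this example): the hook fires when the module is called, but not when `forward` is invoked directly.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook receives (module, inputs, output) after forward() has run.
# Returning None leaves the output unchanged.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.rand(3, 4)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped

handle.remove()   # detach the hook once it is no longer needed
print(calls)      # [(3, 2)]
```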

Calling the model on the input returns a tensor of 10 raw predicted values, one per class, for each input in the batch. We get the prediction probabilities by passing it through an instance of the `nn.Softmax` module (some setups include this step inside the network; others leave it out because loss functions such as `nn.CrossEntropyLoss` expect raw logits).

In [5]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_prob = nn.Softmax(dim=1)(logits)
y_pred = pred_prob.argmax(1)
print(f"Predicted class: {y_pred}.")

Predicted class: tensor([5], device='cuda:0').


## Model Layers
Let's break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 each and see what happens to it as we pass it through the network.

In [6]:
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten
We initialize the `nn.Flatten` layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).

In [7]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
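
To make the behavior of `nn.Flatten` concrete, the sketch below compares it with an explicit `reshape` that keeps dim 0, and contrasts it with the function `torch.flatten`, whose default `start_dim` is 0 and which therefore also collapses the batch dimension. The variable names are illustrative.

```python
import torch
import torch.nn as nn

batch = torch.rand(3, 28, 28)

# nn.Flatten uses start_dim=1 by default, so the batch dimension survives.
flat = nn.Flatten()(batch)

# The same result written as an explicit reshape.
same = batch.reshape(batch.size(0), -1)

# torch.flatten defaults to start_dim=0 and flattens everything.
full = torch.flatten(batch)

print(flat.shape, full.shape, torch.equal(flat, same))
```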


### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a module that applies a linear transformation on the input using its stored weights and biases.

In [8]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
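
The linear transformation can be written out by hand as `x @ W.T + b`, where `W` has shape `(out_features, in_features)`. The following sketch checks that against the layer's own output; the names `layer`, `x`, and `manual` are illustrative, and a small tolerance is used because the two computations may round differently.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# Weights are stored as (out_features, in_features); the bias as (out_features,).
print(layer.weight.shape, layer.bias.shape)

# Manual version of the layer: y = x W^T + b.
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-5)
print(matches)
```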


### nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce *non-linearity*, helping the neural network learn a wide variety of phenomena. Without them, a stack of linear operations collapses into a single linear transformation, so the extra layers would add no expressive power.

In this model we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our linear layers, but there are other non-linear functions like `nn.ELU`, `nn.LeakyReLU`, `nn.Tanh`, `nn.GLU`, etc. More information about non-linear activations can be found in the [torch.nn](https://pytorch.org/docs/stable/nn.html) documentation.
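
The claim that stacked linear layers stay linear can be checked numerically. In the sketch below (layer sizes and names are arbitrary choices for illustration), two `nn.Linear` layers applied back to back give the same result as one linear map with weight `W2 @ W1` and bias `W2 @ b1 + b2`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f1 = nn.Linear(8, 5)
f2 = nn.Linear(5, 3)
x = torch.rand(4, 8)

with torch.no_grad():
    # Two linear layers with no activation in between.
    stacked = f2(f1(x))

    # The equivalent single linear map.
    W = f2.weight @ f1.weight          # shape (3, 8)
    b = f2.weight @ f1.bias + f2.bias  # shape (3,)
    collapsed = x @ W.T + b

    # With a ReLU in between, this single-map shortcut no longer applies in general.
    with_relu = f2(torch.relu(f1(x)))

is_same = torch.allclose(stacked, collapsed, atol=1e-5)
print(is_same)
```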

In [9]:
print(f"Before ReLU: {hidden1}.\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}.")

Before ReLU: tensor([[-5.0557e-01, -1.7142e-01,  6.6578e-01,  2.5479e-01,  3.3269e-01,
         -1.5199e-01, -3.9121e-01, -6.0498e-02,  1.1474e-01,  2.1987e-01,
         -4.8143e-01,  7.0115e-01,  3.7391e-01,  7.6089e-02,  5.5664e-01,
         -8.2708e-02,  4.5848e-01, -2.1165e-01, -7.7354e-02, -4.5861e-01],
        [-3.7117e-01, -3.2021e-04,  4.4860e-01,  6.6740e-01,  9.8738e-01,
          1.3331e-01, -7.4467e-01, -1.4885e-01,  5.4119e-02, -9.8745e-02,
         -7.4164e-01,  3.5549e-01,  3.2751e-01, -1.2333e-01,  7.7946e-01,
          2.1716e-01,  2.8492e-01, -5.5487e-02,  4.8754e-02, -3.9980e-01],
        [-4.9006e-01,  6.0524e-02,  3.1122e-01,  7.8763e-01,  3.9474e-01,
          6.7923e-02, -5.8953e-01,  7.0271e-02,  5.9303e-02,  1.1663e-01,
         -3.8154e-01,  5.9758e-01, -1.5838e-01, -1.6142e-01,  7.2429e-01,
          5.3106e-02,  5.0966e-01, -2.9236e-01,  1.9882e-01, -3.1382e-01]],
       grad_fn=<AddmmBackward0>).


After ReLU: tensor([[0.0000, 0.0000, 0.6658, 0.2548, 0.3327

### nn.Sequential
[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like `seq_modules`.

In [10]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10),
)
input_image = torch.rand(3, 28, 28)
logits = seq_modules(input_image)
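
As a rough sketch of what the container does, the example below rebuilds a similar stack under a separate name (`seq`, chosen for this illustration) and applies its modules one at a time in a loop. The result matches calling the container directly, and individual modules can be accessed by index.

```python
import torch
import torch.nn as nn

seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Apply each module in the order it was registered.
out = x
for module in seq:
    out = module(out)

print(torch.equal(out, seq(x)))  # same computation as calling the container
print(len(seq), seq[1])          # containers support len() and indexing
```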

### nn.Softmax
The last linear layer of the neural network returns *logits*, raw values in \[-∞, ∞\], which are passed to the [nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values in \[0, 1\] representing the model's predicted probabilities for each class (the function is also known as [softargmax](https://en.wikipedia.org/wiki/Softmax_function)). The `dim` parameter indicates the dimension along which the values must sum to 1.

In [11]:
softmax = nn.Softmax(dim=1)
pred_prob = softmax(logits)
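
To see what the scaling does, the sketch below applies `nn.Softmax` to a small hand-written tensor (the values are made up for illustration) and compares it with the textbook formula `exp(z) / sum(exp(z))`. Each row sums to 1, and the most likely class is the same whether it is read from the logits or from the probabilities.

```python
import torch
import torch.nn as nn

# Two samples, three classes; values chosen arbitrarily.
example_logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])

probs = nn.Softmax(dim=1)(example_logits)

# Softmax written out by hand along the class dimension.
exp = torch.exp(example_logits)
manual = exp / exp.sum(dim=1, keepdim=True)

row_sums = probs.sum(dim=1)
print(row_sums)
print(torch.allclose(probs, manual))

# Softmax is monotonic, so it does not change which class ranks highest.
print(probs.argmax(dim=1), example_logits.argmax(dim=1))
```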

## Model Parameters
Many layers inside a neural network are *parameterized*, i.e. have associated weights and biases that are optimized during training. Subclassing `nn.Module` automatically tracks all fields defined inside the model object, and makes all the parameters accessible using the model's `parameters()` or `named_parameters()` methods.
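
A common use of `parameters()` is counting how many trainable values a model has. The sketch below rebuilds the same layer sizes in a standalone `nn.Sequential` (named `stack` here purely for illustration) and checks the total against a hand calculation of weights plus biases per linear layer.

```python
import torch.nn as nn

stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Only the Linear layers contribute parameters; ReLU has none.
param_names = [name for name, _ in stack.named_parameters()]
total = sum(p.numel() for p in stack.parameters())

# Hand calculation: (in * out weights + out biases) for each linear layer.
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)

print(param_names)
print(total, total == expected)  # 669706 True
```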

In the cell below, we iterate over each parameter and print its size and a preview of its values.

In [12]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]}\n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[ 0.0198,  0.0242, -0.0213,  ...,  0.0192,  0.0120, -0.0115],
        [-0.0097, -0.0106, -0.0096,  ..., -0.0047,  0.0187,  0.0199]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([0.0021, 0.0158], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[-0.0375,  0.0182,  0.0351,  ...,  0.0093, -0.0031, -0.0367],
        [-0.0256,  0.0203,  0.0315,  ..., -0.0142, -0.0001, -0.0412]],
       device='cuda:0', grad_fn=<SliceBack