# Build The Neural Network
Neural networks consist of layers that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

In the following sections, we will build a neural network to classify images in the FashionMNIST dataset.

In [1]:
import os 
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms 

## Get Device for Training
We want to be able to train our model on a GPU. To do this, we check whether CUDA is available with torch.cuda.is_available(); if it is not, we fall back to the CPU.

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))

Using cuda device
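
As a minimal sketch of how the selected device is used afterwards, the snippet below creates a tensor on the CPU and copies it to the chosen device with .to(). The names x and x_dev are illustrative and are not part of the tutorial's model.

```python
import torch

# Same device selection as in the cell above
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(2, 3)      # tensors are created on the CPU by default
x_dev = x.to(device)      # returns a tensor on the selected device

print(x.device, x_dev.device)
```

Inputs and the model must live on the same device before the model is called, which is why later cells pass device=device when creating input tensors.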


## Define the Class 
We define our neural network by subclassing nn.Module and initializing the neural network layers in the __init__ method. Every nn.Module subclass implements the operations on input data in the forward method.

In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Flattens each 28x28 image into a vector of 784 values
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            # Note: this final ReLU clamps the 10 outputs to be non-negative
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, move it to the device, and print its structure.

In [4]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
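
The nested structure printed above can also be walked programmatically. The following sketch uses a small hypothetical TinyNet (an assumption for illustration, not the tutorial's model) to show how named_children() lists direct submodules while named_modules() recurses into containers.

```python
from torch import nn


class TinyNet(nn.Module):  # hypothetical model, used only for this illustration
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.stack = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

    def forward(self, x):
        return self.stack(self.flatten(x))


net = TinyNet()
# Direct submodules only
child_names = [name for name, _ in net.named_children()]
# The module itself (empty name) plus every nested submodule
module_names = [name for name, _ in net.named_modules()]

print(child_names)   # ['flatten', 'stack']
print(module_names)  # ['', 'flatten', 'stack', 'stack.0', 'stack.1', 'stack.2']
```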


To use the model, we pass it the input data. This executes the model's forward method, along with some background operations, so do not call model.forward() directly. Calling the model on the input returns a tensor with 10 raw predicted values, one for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.


In [5]:
X = torch.rand(1,28,28,device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([3], device='cuda:0')
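
One reason to call the model itself rather than its forward method is that the call also runs any registered hooks. The sketch below illustrates this on a single nn.Linear layer; the hook function record_shape and the list calls are hypothetical names introduced for the example.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []


def record_shape(module, inputs, output):
    # Forward hooks receive the module, its inputs, and its output
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_shape)
x = torch.rand(3, 4)

layer(x)          # goes through __call__, so the hook runs
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()   # detach the hook again

print(calls)  # [(3, 2)]
```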


## Model Layers 
Let's break down the layers in the FashionMNIST model. To illustrate, we take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.


In [6]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


## nn.Flatten 
We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, dim=0, is maintained).


In [7]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
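
To make the shape change concrete, the following sketch compares nn.Flatten with an equivalent manual reshape. The variable names batch, flat, and manual are illustrative.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

flat = nn.Flatten()(batch)                  # default start_dim=1 keeps dim 0 intact
manual = batch.reshape(batch.size(0), -1)   # the same operation written by hand

print(flat.shape)                  # torch.Size([3, 784])
print(torch.equal(flat, manual))   # True
```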


## nn.Linear 
The linear layer is a module that applies a linear transformation to the input using its stored weights and biases.


In [8]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
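
The linear transformation can be checked by hand. This sketch uses a small layer with assumed sizes (4 inputs, 2 outputs) and compares its output against x @ W.T + b computed from the stored weight and bias.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=2)
x = torch.rand(3, 4)

# weight has shape (out_features, in_features), hence the transpose
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)                            # torch.Size([2, 4])
print(torch.allclose(layer(x), manual, atol=1e-6))   # True
```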


## nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena. In this model, we use nn.ReLU between our linear layers, but there are other activations that introduce nonlinearity in your model.

In [9]:
print(f"Before ReLU: {hidden1} \n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU:{hidden1}")

Before ReLU: tensor([[-0.4856, -0.1061, -0.1030,  0.6917, -0.3450,  0.4654,  0.0239,  0.0550,
         -0.2685,  0.5198, -0.1533,  0.4484,  0.3980,  0.0649,  0.2138,  0.3012,
         -0.0321,  0.4431, -0.2107, -0.3729],
        [-0.2920, -0.0472, -0.3578,  0.1594, -0.2229,  0.1289, -0.0276, -0.1596,
         -0.1361,  0.2998, -0.0123, -0.1129,  0.1103,  0.1302, -0.0641,  0.3832,
          0.1046,  0.1503,  0.0341, -0.0296],
        [-0.0840, -0.2090, -0.2601,  0.1261, -0.5185, -0.1118, -0.0988,  0.2570,
         -0.2266,  0.4744, -0.1395,  0.1325,  0.3224,  0.4505,  0.1592,  0.0713,
         -0.0541,  0.3423, -0.4992, -0.2150]], grad_fn=<AddmmBackward>) 


After ReLU:tensor([[0.0000, 0.0000, 0.0000, 0.6917, 0.0000, 0.4654, 0.0239, 0.0550, 0.0000,
         0.5198, 0.0000, 0.4484, 0.3980, 0.0649, 0.2138, 0.3012, 0.0000, 0.4431,
         0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.1594, 0.0000, 0.1289, 0.0000, 0.0000, 0.0000,
         0.2998, 0.0000, 0.0000, 0.1103, 0.1302, 0.000
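
On a small hand-picked tensor the effect of ReLU is easier to see: negative entries become zero and the rest pass through unchanged. The sketch also applies nn.Tanh as one example of another activation; the input values are arbitrary.

```python
import torch
from torch import nn

x = torch.tensor([-1.5, 0.0, 2.0])

relu_out = nn.ReLU()(x)
print(relu_out)                                        # tensor([0., 0., 2.])
print(torch.equal(relu_out, torch.clamp(x, min=0)))    # True

# A different non-linearity: tanh squashes values into (-1, 1)
tanh_out = nn.Tanh()(x)
print(tanh_out)
```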

## nn.Sequential 
nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

In [10]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
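
The ordering guarantee can be verified by applying the contained modules one at a time. In this sketch, seq is a fresh container built with the same layer sizes as seq_modules; the loop and variable names are illustrative.

```python
import torch
from torch import nn

seq = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 20), nn.Linear(20, 10))
x = torch.rand(3, 28, 28)

# Apply each module by hand, in definition order
out = x
for module in seq:
    out = module(out)

print(len(seq))                       # 3
print(type(seq[1]).__name__)          # Linear
print(torch.allclose(out, seq(x)))    # True
```

Because a Sequential supports len(), indexing, and iteration, individual layers can be inspected without rewriting the model.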


## nn.Softmax 
The last linear layer of the neural network returns logits, raw values in [-infinity, infinity], which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] that represent the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

In [11]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
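
As a small worked example with hand-picked logits (the values are arbitrary), the sketch below compares nn.Softmax(dim=1) with the formula exp(x) / sum(exp(x)) applied along each row, and checks that every row sums to 1.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])

probs = nn.Softmax(dim=1)(logits)   # normalize across classes within each row
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

print(torch.allclose(probs, manual))   # True
print(probs.sum(dim=1))                # each row sums to 1, up to rounding
print(probs[1])                        # tensor([0.3333, 0.3333, 0.3333])
```

With dim=0 the normalization would instead run down each column, across the samples of the batch, which is rarely what a classifier needs.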

## Model Parameters
Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object and makes all parameters accessible through your model's parameters() or named_parameters() methods.


In [13]:
print('Module Structure:',model, '\n\n')
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \n")

Module Structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[-0.0191, -0.0246, -0.0105,  ...,  0.0002, -0.0156,  0.0342],
        [ 0.0015,  0.0018, -0.0253,  ..., -0.0254, -0.0276, -0.0092]],
       device='cuda:0', grad_fn=<SliceBackward>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([ 0.0097, -0.0169], device='cuda:0', grad_fn=<SliceBackward>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[-0.0438,  0.0257,  0.03
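
A common use of parameters() is counting the trainable values in a model. The sketch below rebuilds the same three linear layers in a standalone nn.Sequential named stack (an illustrative name, not the tutorial's model object) and sums the number of elements in each parameter.

```python
from torch import nn

stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Map each parameter name to its shape; ReLU layers contribute no parameters
shapes = {name: tuple(p.shape) for name, p in stack.named_parameters()}
# Count only parameters that will receive gradients during training
total = sum(p.numel() for p in stack.parameters() if p.requires_grad)

print(shapes["0.weight"])   # (512, 784)
print(total)                # 669706
```

The total is 784*512 + 512 for the first layer, 512*512 + 512 for the second, and 512*10 + 10 for the last.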