# Build the Neural Network

Neural networks are composed of layers/modules that perform operations on data.
The `torch.nn` namespace provides all the building blocks you need to build your own neural network.

In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset.

In [5]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Get Device for Training
We want to be able to train our model on a hardware accelerator like the GPU, if it is available.

In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cuda device
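
The device check above can be extended to other backends. The following is a minimal sketch, not part of the original notebook: the `mps` branch is an assumption for Apple-silicon machines, and the names `mps_backend`, `x`, and `x_on_device` are illustrative only. It also shows that a tensor has to be moved to the chosen device explicitly.

```python
import torch

# Pick the best available backend and fall back to the CPU.
# The MPS lookup is guarded because older PyTorch builds may not ship it.
mps_backend = getattr(torch.backends, "mps", None)
if torch.cuda.is_available():
    device = "cuda"
elif mps_backend is not None and mps_backend.is_available():
    device = "mps"
else:
    device = "cpu"

# Tensors are created on the CPU by default and must be moved explicitly.
x = torch.ones(2, 3)
x_on_device = x.to(device)
print(x.device, "->", x_on_device.device)
```

A model and the tensors passed to it need to live on the same device, which is why the notebook later calls `.to(device)` on the model and creates its input with `device=device`.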


## Define the Class
We define our neural network by subclassing `nn.Module`, and initialize the neural network layers in
`__init__`. Every `nn.Module` subclass implements the operations on input data in the `forward` method.

In [6]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        # logits are the raw, unnormalized predictions produced by the last layer
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of `NeuralNetwork`, move it to the `device` and print its structure.

In [7]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To use the model, we pass it the input data. This executes the model's `forward`, along with some
background operations. Do not call `model.forward()` directly!

Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the inputs in the batch, and dim=1
holds the 10 raw predicted values, one per class. We get the prediction probabilities by passing it through an
instance of the `nn.Softmax` module.

In [8]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([4], device='cuda:0')
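
One of the "background operations" that run when you call the model is the execution of registered hooks. The sketch below illustrates why calling `forward` directly is discouraged. It uses a small stand-in `nn.Linear` module rather than the `NeuralNetwork` class, and `tiny_model`, `calls`, and `record_call` are hypothetical names introduced for the illustration.

```python
import torch
from torch import nn

# A tiny stand-in model; any nn.Module behaves the same way here.
tiny_model = nn.Linear(4, 2)

calls = []

def record_call(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output.
    calls.append(tuple(output.shape))

handle = tiny_model.register_forward_hook(record_call)

x = torch.rand(1, 4)
tiny_model(x)          # goes through __call__, so the hook fires
tiny_model.forward(x)  # bypasses __call__, so the hook is skipped

handle.remove()
print(f"Hook fired {len(calls)} time(s)")
```

Only the first call is recorded, so code that relies on hooks would silently miss the direct `forward` call.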


## Model Layers
Let's break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3
images of size 28x28 and see what happens to it as we pass it through the network.

In [9]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten
We initialize the `nn.Flatten` layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values
(the minibatch dimension at dim=0 is maintained).

In [10]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
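
As a rough check of what flattening does, the sketch below compares `nn.Flatten` with a plain `reshape` that keeps dim 0 and merges the remaining dimensions. The variable names are illustrative and not part of the original notebook.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

# nn.Flatten flattens dims 1 through -1 by default, leaving the batch dim alone.
flat_module = nn.Flatten()(batch)
# Equivalent reshape: keep dim 0, merge everything else into one axis.
flat_reshape = batch.reshape(batch.size(0), -1)

print(flat_module.shape)
print(torch.equal(flat_module, flat_reshape))
```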


### nn.Linear
The linear layer is a module that applies a linear transformation on the input using its
stored weights and biases.

In [11]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
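
The linear transformation can be written out by hand as `x @ W.T + b`. The following sketch, with illustrative names and a fixed seed chosen only for repeatability, compares that expression against the module's own output.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# weight has shape [out_features, in_features]; bias has shape [out_features].
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
# Compare with a small tolerance, since the two code paths may round differently.
print(torch.allclose(layer(x), manual, atol=1e-5))
```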


### nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs.
They are applied after linear transformations to introduce *nonlinearity*, helping neural networks learn
a wide variety of phenomena.

In this model, we use `nn.ReLU` between our linear layers, but there are other activations that introduce
non-linearity in your model.

In [13]:
print(f"Before RELU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before RELU: tensor([[ 0.0585, -0.1827, -0.4178, -0.1620,  0.0647,  0.2405, -0.0272, -0.0306,
         -0.6204,  0.3967, -0.3779,  0.2941, -0.4398, -0.0065,  0.1209,  0.2750,
          0.3355,  0.3541, -0.0945,  0.3801],
        [ 0.0785,  0.0976, -0.1717, -0.2347, -0.0373,  0.0432, -0.1041, -0.0339,
         -0.4268,  0.1593,  0.2383, -0.1793, -0.3991, -0.0376,  0.3517, -0.1093,
          0.1517,  0.3553,  0.0493,  0.3638],
        [-0.1217,  0.0743, -0.2877, -0.2189,  0.2477, -0.3519, -0.0587,  0.1612,
         -0.4273,  0.2496, -0.2395, -0.1595, -0.4345, -0.4010,  0.1864,  0.0946,
          0.3028,  0.1294,  0.2372,  0.0223]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0585, 0.0000, 0.0000, 0.0000, 0.0647, 0.2405, 0.0000, 0.0000, 0.0000,
         0.3967, 0.0000, 0.2941, 0.0000, 0.0000, 0.1209, 0.2750, 0.3355, 0.3541,
         0.0000, 0.3801],
        [0.0785, 0.0976, 0.0000, 0.0000, 0.0000, 0.0432, 0.0000, 0.0000, 0.0000,
         0.1593, 0.2383, 0.0000, 0.0000, 0.0000, 0.35
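
ReLU replaces every negative entry with zero and leaves the rest unchanged. The sketch below checks this on a small hand-written tensor (the values are made up for illustration) and applies a few other activation modules from `torch.nn` to the same input for comparison.

```python
import torch
from torch import nn

values = torch.tensor([[-0.5, 0.0, 0.25], [1.5, -2.0, 3.0]])

relu_out = nn.ReLU()(values)
manual = torch.clamp(values, min=0)  # element-wise max(0, x)

print(relu_out)
print(torch.equal(relu_out, manual))

# A few other activation modules, applied to the same input.
for activation in (nn.Tanh(), nn.Sigmoid(), nn.LeakyReLU(negative_slope=0.01)):
    print(type(activation).__name__, activation(values))
```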

### nn.Sequential
`nn.Sequential` is an ordered container of modules. The data is passed through all the modules in the same
order as defined. You can use sequential containers to put together a quick network like `seq_modules`.

In [None]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
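
To see the container in action, the sketch below builds a similar stack from fresh layers (so it does not depend on the earlier cells), passes a minibatch of 3 random images through it, and confirms that applying the child modules one by one gives the same result. The names `seq`, `images`, and `step` are illustrative.

```python
import torch
from torch import nn

seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)

images = torch.rand(3, 28, 28)
out = seq(images)

# Applying the child modules in order reproduces the container's output.
step = images
for module in seq:
    step = module(step)

print(out.shape)
print(torch.allclose(out, step))
print(seq[1])  # children can be indexed by position
```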

### nn.Softmax
The last linear layer of the neural network returns *logits*, raw values in (-infinity, infinity), which are
passed to the `nn.Softmax` module. The logits are scaled to values in [0, 1] representing the model's predicted
probabilities for each class. The `dim` parameter indicates the dimension along which the values must sum to 1.

In [14]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
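
The scaling can be reproduced by hand: exponentiate the logits, then divide each row by its sum. The sketch below uses made-up logits for two inputs and three classes (not values from the model above) and compares the manual result with `nn.Softmax`.

```python
import torch
from torch import nn

example_logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])

probs = nn.Softmax(dim=1)(example_logits)

# Softmax by hand: exponentiate, then normalize each row.
exp = torch.exp(example_logits)
manual = exp / exp.sum(dim=1, keepdim=True)

print(probs)
print(probs.sum(dim=1))     # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))  # index of the most probable class per row
```

With `dim=1` the normalization runs across the class axis, so each input gets its own probability distribution. Softmax preserves the ordering of the logits, so `argmax` gives the same class before and after it is applied.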

## Model Parameters
Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized
during training. Subclassing `nn.Module` automatically tracks all fields defined inside your model object,
and makes all parameters accessible using your model's `parameters()` or `named_parameters()` methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

In [15]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[-0.0215,  0.0065, -0.0063,  ..., -0.0180,  0.0278, -0.0024],
        [ 0.0230, -0.0156,  0.0273,  ..., -0.0017, -0.0167, -0.0086]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([-0.0014,  0.0012], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[-0.0288, -0.0435,  0.0241,  ...,  0.0143, -0.0291, -0.0223],
        [-0.0033, -0.0213,  0.0360,  ..., -0.0250, -0.0094,  0.0211]],
       device='cuda:0', grad_fn=<Slice
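
A common use of `parameters()` is counting how many values the optimizer will update. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` so that it runs on its own; the name `stack` is illustrative, and the parameter names it prints therefore lack the `linear_relu_stack.` prefix shown above.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class, rebuilt for a self-contained example.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# ReLU has no parameters, so only the three Linear layers appear here.
for name, param in stack.named_parameters():
    print(f"{name:10s} shape={tuple(param.shape)} count={param.numel()}")

total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(f"Total parameters: {total} (trainable: {trainable})")
```

The total works out to 784*512 + 512 + 512*512 + 512 + 512*10 + 10 = 669,706 parameters.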