# Build the Neural Network

Neural networks comprise of layers/modules that perform operations on data. The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset.

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Get Device for Training

We want to be able to train our model on a hardware accelerator like the GPU or MPS, if available. Let's check to see if [torch.cuda](https://pytorch.org/docs/stable/notes/cuda.html) or [torch.backends.mps](https://pytorch.org/docs/stable/notes/mps.html) are available, otherwise we use the CPU.

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device


## Define the Class


We define our neural network by subclassing ```nn.Module```, and initialize the neural network layers in ```__init__```. Every ```nn.Module``` subclass implements the operations on input data in the ```forward``` method.

In [3]:
class MyNeuralNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(28*28, 512)
    self.relu1 = nn.ReLU()
    self.fc2 = nn.Linear(512,512)
    self.relu2 = nn.ReLU()
    self.fc3 = nn.Linear(512,10)
    
  def forward(self, x):
    x = self.fc1(x)
    x = self.relu1(x)
    x = self.fc2(x)
    x = self.relu2(x)
    raw_output = self.fc3(x)
    return raw_output

We create an instance of ```MyNeuralNetwork```, and move it to the ```device```, and print its structure.




In [4]:
model = MyNeuralNetwork().to(device)
print(model)

MyNeuralNetwork(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=512, out_features=512, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=512, out_features=10, bias=True)
)


To use the model, we pass it the input data. This executes the model's ```forward```, along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866). Do not call ```model.forward()``` directly!


Calling the model on the input returns a 2-dimensional tensor with dim=0 corresponding to each output of 10 raw predicted values for each class, and dim=1 corresponding to the individual values of each output. We get the prediction probabilities by passing it through an instance of the ```nn.Softmax``` module.

In [5]:
x = torch.rand(1, 28*28, device=device)
raw_output = model(x)
pred_probab = nn.Softmax(dim=1)(raw_output)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([8])


## Model Layers

Let's break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

In [6]:
input_image = torch.rand(3,28*28)
print(input_image.size())

torch.Size([3, 784])


### nn.Linear

The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a module that applies a linear transformation on the input using its stored weights and biases.



In [7]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(input_image)
print(hidden1.size())

torch.Size([3, 20])


## nn.ReLU

Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our linear layers, but there's other activations to introduce non-linearity in your model.

In [8]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.1857,  0.1029, -0.1446,  0.7264,  0.8233, -0.2805,  0.1828, -1.1818,
          0.1256, -0.0048,  0.3751, -0.5523,  0.2864,  0.2188, -0.0667, -0.2217,
          0.4612, -0.0630,  0.0155, -0.3488],
        [-0.4607,  0.2199,  0.3177,  0.8678,  0.6322, -0.2198, -0.0844, -0.8910,
         -0.0255, -0.2013,  0.5967, -0.2376, -0.0308,  0.1975, -0.0592,  0.0186,
          0.4229,  0.0118,  0.0820, -0.2285],
        [-0.5250,  0.2129,  0.3226,  1.1144,  0.2375,  0.0422,  0.1054, -0.9091,
          0.0061, -0.1383,  0.4276, -0.5400, -0.0509,  0.2644,  0.0788,  0.1945,
          0.7870,  0.0032,  0.1221, -0.4656]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.1029, 0.0000, 0.7264, 0.8233, 0.0000, 0.1828, 0.0000, 0.1256,
         0.0000, 0.3751, 0.0000, 0.2864, 0.2188, 0.0000, 0.0000, 0.4612, 0.0000,
         0.0155, 0.0000],
        [0.0000, 0.2199, 0.3177, 0.8678, 0.6322, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.5967, 0.0000, 0.0000, 0.1975, 0.00

## nn.Softmax


The last linear layer of the neural network returns *logits* - raw values in [-infty, infty] - which are passed to the [nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values [0, 1] representing the model's predicted probabilities for each class. ```dim``` parameter indicates the dimension along which the values must sum to 1.

In [12]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(raw_output)
print(pred_probab)

tensor([[0.0970, 0.1019, 0.1034, 0.0969, 0.0994, 0.1051, 0.0981, 0.0948, 0.1097,
         0.0938]], grad_fn=<SoftmaxBackward0>)


## Model Parameters


Many layers inside a neural network are *parameterized*, i.e. have associated weights and biases that are optimized during training. Subclassing ```nn.Module``` automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's ```parameters()``` or ```named_parameters()``` methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

In [13]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
  print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: MyNeuralNetwork(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=512, out_features=512, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=512, out_features=10, bias=True)
)


Layer: fc1.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0317, -0.0113,  0.0012,  ...,  0.0080, -0.0356, -0.0309],
        [ 0.0205, -0.0173,  0.0096,  ...,  0.0223,  0.0304,  0.0063]],
       grad_fn=<SliceBackward0>) 

Layer: fc1.bias | Size: torch.Size([512]) | Values : tensor([-0.0336, -0.0262], grad_fn=<SliceBackward0>) 

Layer: fc2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0208,  0.0398,  0.0278,  ..., -0.0347, -0.0075,  0.0373],
        [-0.0369,  0.0030,  0.0314,  ..., -0.0179, -0.0017, -0.0373]],
       grad_fn=<SliceBackward0>) 

Layer: fc2.bias | Size: torch.Size([512]) | Values : tensor([-0.0347,  0.0194], grad_fn=<SliceBackward0>) 

Layer: fc3.weight | Size: torch.Size([10, 512]) | Val