# **Build the Neural Network**

Neural networks consist of layers (modules) that perform operations on data. The `torch.nn` namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses `nn.Module`. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
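
To make the nesting idea concrete, here is a minimal sketch. `TinyBlock` and `TinyNet` are hypothetical toy classes invented for this illustration (they are not part of the FashionMNIST model built later); the sketch only shows that a module can hold other modules and that PyTorch can list them.

```python
import torch
from torch import nn


# Hypothetical toy module, used only to illustrate nesting.
class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(4, 8), nn.ReLU())

    def forward(self, x):
        return self.layers(x)


# Hypothetical outer module that contains TinyBlock plus one more layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = TinyBlock()
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.block(x))


net = TinyNet()

# Direct children only.
child_names = [name for name, _ in net.named_children()]
# Every module in the tree, including the root (whose name is the empty string).
module_names = [name for name, _ in net.named_modules()]

print(child_names)
print(module_names)
```

`named_children()` stops at the first level, while `named_modules()` walks the whole tree, which is why the inner `Linear` and `ReLU` only show up in the second list.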

### **Build a NN to classify images in the FashionMNIST dataset**



In [2]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

* **Get Device for Training**

We want to train our model on a hardware accelerator such as a CUDA GPU or Apple’s MPS backend, if one is available. Let’s check whether `torch.cuda` or `torch.backends.mps` is available; otherwise we fall back to the CPU.

In [4]:
device = (
    'cuda'
    if torch.cuda.is_available()
    else 'mps'
    if torch.backends.mps.is_available()
    else 'cpu'
)

print(f"Using {device} device")


Using cuda device
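
The selected device only matters once tensors and modules are actually placed on it. The following is a small sketch, assuming a PyTorch version that ships `torch.backends.mps`; the tensor names `x_cpu` and `x_dev` are made up for the example.

```python
import torch

# Same selection logic as above, repeated so this snippet runs on its own.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

x_cpu = torch.ones(2, 3)   # tensors are created on the CPU by default
x_dev = x_cpu.to(device)   # returns a tensor on `device` (x_cpu itself is unchanged)

print(x_cpu.device, x_dev.device)
```

A model and its inputs must live on the same device, which is why later cells create the input with `device=device` and move the model with `.to(device)`.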


* **Define the Neural Net Class**

We define our neural network by subclassing `nn.Module`, and initialize the neural network layers in `__init__`. Every `nn.Module` subclass implements the operations on input data in the `forward` method.

In [21]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()

        # (samples, rows, cols) => (samples, rows * cols); start_dim=1 leaves dim 0 (the batch dimension) unflattened
        self.flatten = nn.Flatten(start_dim=1, end_dim=-1)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )
    
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
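
It can help to trace the tensor shapes through the layers used in this class. The sketch below applies standalone layers with the same sizes to a random batch of 3 images; the batch size of 3 and the variable names are arbitrary choices for illustration.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)             # 3 fake grayscale images

# nn.Flatten defaults to start_dim=1, end_dim=-1, so dim 0 (the batch) is kept.
flat = nn.Flatten()(batch)                # (3, 784)

# One Linear + ReLU step: 784 input features -> 512 hidden features.
hidden = nn.ReLU()(nn.Linear(28 * 28, 512)(flat))   # (3, 512)

# Final Linear layer: 512 hidden features -> 10 class scores.
logits = nn.Linear(512, 10)(hidden)       # (3, 10)

print(flat.shape, hidden.shape, logits.shape)
```

Only the last dimension changes from layer to layer; the batch dimension passes through untouched.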



* **Create an instance of NeuralNetwork and move it to device**

In [22]:
# create an instance of NeuralNetwork and move it to device
model = NeuralNetwork().to(device)
model

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

* **To use the model, we pass it the input data**

Calling the model executes its `forward` method, along with some background operations such as registered hooks. **Do not call `model.forward()` directly!**
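
One way to see the difference is a forward hook, which is among the background operations that only run when the module itself is called. This is a minimal sketch on a single `nn.Linear` layer; the hook function `record_output_shape` and the `calls` list are names invented for the example.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []


def record_output_shape(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_output_shape)

x = torch.rand(3, 4)
layer(x)            # goes through __call__, so the hook fires
layer.forward(x)    # bypasses __call__, so the hook does not fire

handle.remove()     # detach the hook again
print(calls)
```

Only one entry ends up in `calls`, even though `forward` ran twice.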

> Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the samples in the batch, and dim=1 indexes the 10 raw predicted values (logits) for each sample, one per class.
>
> **We get the prediction probabilities by passing the logits through an instance of the `nn.Softmax` module.**

In [23]:
# input
X = torch.rand(1, 28, 28, device=device)  # (1 sample, rows, cols)

# raw output (logits)
logits = model(X)  # (1, 10)

# probability of each class
pred_probab = nn.Softmax(dim=1)(logits)  # dim=1 indicates the dimension along which the values must sum to 1.

# class of highest probability
y_pred = pred_probab.argmax(dim=1)

print(f"Predicted class: {y_pred}")

Predicted class: tensor([4], device='cuda:0')
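
The role of `dim=1` is easier to see with more than one sample. The sketch below uses random logits for a batch of 4 (an arbitrary size, not output from the model above) and checks that each row of probabilities sums to 1.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)               # stand-in for model output: 4 samples, 10 classes

probs = nn.Softmax(dim=1)(logits)         # normalize across the class dimension
row_sums = probs.sum(dim=1)               # one sum per sample
preds = probs.argmax(dim=1)               # one predicted class index per sample

# Softmax preserves ordering within a row, so the argmax of the logits matches.
same_as_logits = torch.equal(preds, logits.argmax(dim=1))

print(row_sums)
print(preds, same_as_logits)
```

Because softmax does not change which entry is largest, taking `argmax` on the raw logits gives the same predicted class; the softmax step is needed only when the probabilities themselves are of interest.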


* **Model Parameters**

Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing `nn.Module` automatically tracks all fields defined inside your model object, and makes all parameters accessible through your model’s `parameters()` or `named_parameters()` methods.

> In this example, we iterate over each parameter and print its name and size.

In [28]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) 

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) 
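
The sizes listed above can be added up to get the total number of parameters. The sketch below rebuilds the same stack of layers as a bare `nn.Sequential` (an assumption made to keep the snippet self-contained; it is not the `NeuralNetwork` class itself) and counts the elements with `numel()`.

```python
from torch import nn

# Same layer sizes as linear_relu_stack in the model above.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# numel() returns the number of elements in each weight or bias tensor.
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)

print(total, trainable)  # 669706 669706
```

The total follows directly from the shapes: (784 × 512 + 512) + (512 × 512 + 512) + (512 × 10 + 10) = 669,706. The `ReLU` layers contribute nothing because they have no parameters.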

