# BUILD THE NEURAL NETWORK 


- Neural networks consist of layers/modules that operate on data.
- PyTorch's torch.nn namespace provides the necessary building blocks for creating custom neural networks.
- In PyTorch, every module is a subclass of nn.Module.
- A neural network is also a module and can contain other modules (layers) within it.
- This nested structure allows for easy construction and management of complex network architectures.
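
The nesting described above can be sketched with two small modules. The class names `Block` and `TinyNet` and the layer sizes are hypothetical, chosen only for illustration; `named_children()` and `named_parameters()` are standard `nn.Module` methods.

```python
from torch import nn


class Block(nn.Module):
    """A small reusable module: a linear layer followed by ReLU."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class TinyNet(nn.Module):
    """A module that contains other modules, including nested ones."""

    def __init__(self):
        super().__init__()
        self.block1 = Block()
        self.block2 = Block()
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.block2(self.block1(x)))


net = TinyNet()
# Direct submodules are registered under their attribute names.
child_names = [name for name, _ in net.named_children()]
# Parameters of nested modules get dotted names that reflect the hierarchy.
param_names = [name for name, _ in net.named_parameters()]
print(child_names)      # ['block1', 'block2', 'head']
print(param_names[:2])  # ['block1.linear.weight', 'block1.linear.bias']
```

Because submodules are registered automatically when assigned as attributes, calls such as `net.to(device)` or `net.parameters()` reach every nested layer.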

**Building a neural network to classify images in the FashionMNIST dataset**

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Getting the Device Used for Training

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device


## Defining the Class

In [3]:
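class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Flattens each 28x28 image into a vector of 784 values
        self.flatten = nn.Flatten()
        # Two hidden layers of 512 units with ReLU, then 10 output values (one per class)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        # Raw, unnormalized class scores
        logits = self.linear_relu_stack(x)
        return logits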

**Creating an instance of NeuralNetwork, moving it to the device, and printing its structure**

In [4]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
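
A module is normally run by calling it like a function, for example `model(X)`, rather than by calling its `forward` method directly, because `nn.Module.__call__` wraps `forward` and also runs any registered hooks. The following is a minimal sketch of that difference; the small `nn.Linear` layer and the `calls` list are stand-ins chosen for illustration, not part of the model above.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []  # records the output shape each time the hook fires

# A forward hook receives (module, inputs, output) after forward() runs.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.rand(3, 4)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook does not fire
handle.remove()

print(calls)  # [(3, 2)]
```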


- Calling the model on the input returns a 2-dimensional tensor.
- Dimension 0 indexes the outputs, one per input sample; each output holds 10 raw predicted values, one for each class.
- Dimension 1 indexes the individual values within each output.
- To obtain prediction probabilities, the output tensor is passed through an instance of the nn.Softmax module.

In [5]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([2])


## Model Layers

**Breaking down the layers in the FashionMNIST model, using a sample minibatch of 3 images of size 28x28**

In [6]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

To transform each 2D 28x28 image into a contiguous array of 784 pixel values, we use the nn.Flatten layer. This layer keeps the minibatch dimension (dim=0) and flattens the remaining dimensions.

In [7]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
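
Because nn.Flatten treats dimension 0 as the minibatch dimension by default, a single image passed without a batch dimension is not flattened to 784 values. A short sketch of this behavior follows; the variable names are illustrative.

```python
import torch
from torch import nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

batch = torch.rand(3, 28, 28)  # minibatch of 3 images
single = torch.rand(28, 28)    # one image, no batch dimension

batch_shape = tuple(flatten(batch).shape)
single_shape = tuple(flatten(single).shape)
fixed_shape = tuple(flatten(single.unsqueeze(0)).shape)

print(batch_shape)   # (3, 784)
print(single_shape)  # (28, 28): dim 0 is treated as the batch, so nothing is merged
print(fixed_shape)   # (1, 784): adding a batch dimension first gives the intended result
```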


### nn.Linear

The linear layer is a module that utilizes its stored weights and biases to apply a linear transformation on the input.

In [8]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
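
The linear transformation is `y = x @ W.T + b`, where the weight `W` has shape `(out_features, in_features)` and the bias `b` has shape `(out_features,)`. The sketch below checks this against a small layer; the sizes 4 and 3 are arbitrary and chosen only to keep the example readable.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
x = torch.rand(2, 4)  # minibatch of 2 samples with 4 features each

# Apply the same transformation by hand using the stored parameters.
manual = x @ layer.weight.T + layer.bias
same = torch.allclose(layer(x), manual, atol=1e-6)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([3, 4]) torch.Size([3])
print(same)  # True
```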


### nn.ReLU

- Non-linear activations play a vital role in creating complex mappings within a neural network.
- They are applied after linear transformations to introduce nonlinearity.
- Nonlinear activations allow neural networks to learn a wide range of phenomena.
- In the model mentioned, nn.ReLU activation is used between linear layers to introduce nonlinearity.
- Other activation functions can also be used to incorporate nonlinearity in a model.
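
ReLU itself is a simple element-wise function, `max(0, x)`: positive values pass through unchanged and negative values become zero. As a plain-Python sketch (no PyTorch needed), applied to the first five values of the first row printed in the output below:

```python
def relu(values):
    """Element-wise ReLU: keep positive values, replace negatives with 0."""
    return [max(0.0, v) for v in values]


before = [0.7782, 0.3709, 0.0203, 0.6196, -0.3334]
after = relu(before)
print(after)  # [0.7782, 0.3709, 0.0203, 0.6196, 0.0]
```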

In [9]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.7782,  0.3709,  0.0203,  0.6196, -0.3334,  0.0912, -0.1739,  0.1793,
          0.3946, -0.3149,  0.2648,  0.0742, -0.1864,  0.3381, -0.2990,  0.1212,
         -0.0144, -0.0205, -0.0922,  0.0380],
        [ 0.3849,  0.1957,  0.2671,  0.1751, -0.4221, -0.3151,  0.1164,  0.1764,
         -0.0367, -0.4950,  0.3140,  0.0901, -0.6194,  0.1411, -0.0957,  0.1411,
         -0.4975, -0.0475, -0.1031, -0.1419],
        [ 0.3814,  0.3508,  0.2348,  0.2987, -0.3781, -0.2312,  0.0912,  0.4249,
          0.2784, -0.0694, -0.0733, -0.1032, -0.8455,  0.4289, -0.2112,  0.5242,
         -0.6410, -0.2007,  0.0052, -0.1337]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.7782, 0.3709, 0.0203, 0.6196, 0.0000, 0.0912, 0.0000, 0.1793, 0.3946,
         0.0000, 0.2648, 0.0742, 0.0000, 0.3381, 0.0000, 0.1212, 0.0000, 0.0000,
         0.0000, 0.0380],
        [0.3849, 0.1957, 0.2671, 0.1751, 0.0000, 0.0000, 0.1164, 0.1764, 0.0000,
         0.0000, 0.3140, 0.0901, 0.0000, 0.1411, 0.00

### nn.Sequential


nn.Sequential is an ordered container of modules in PyTorch. Data is passed through the modules in the order in which they were defined, which makes it convenient for assembling a network from existing layers.

In [10]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
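
To make the ordering concrete, the sketch below compares an nn.Sequential container with calling the same layers by hand, and shows that the container supports `len()` and integer indexing. The names `relu` and `layer2` are introduced here only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
flatten = nn.Flatten()
layer1 = nn.Linear(28 * 28, 20)
relu = nn.ReLU()
layer2 = nn.Linear(20, 10)
seq_modules = nn.Sequential(flatten, layer1, relu, layer2)

x = torch.rand(3, 28, 28)
# Chaining the modules manually, in the same order, gives the same result.
manual = layer2(relu(layer1(flatten(x))))
matches = torch.allclose(seq_modules(x), manual)

print(matches)                    # True
print(len(seq_modules))           # 4
print(seq_modules[1] is layer1)   # True: the container holds the same module objects
```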

### nn.Softmax


The last linear layer of the neural network returns logits, which are raw, unbounded values. Passing them through the nn.Softmax module scales them to values in [0, 1] that represent the model's predicted probability for each class. The dim parameter specifies the dimension along which the values must sum to 1.

In [11]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
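
The computation behind softmax is `exp(z_i) / sum_j exp(z_j)`. The following plain-Python sketch applies it to one row of hypothetical logits; subtracting the maximum first is a common numerical-stability step and does not change the result.

```python
import math


def softmax(logits):
    """Softmax over a single list of logits."""
    m = max(logits)  # shift by the max to avoid overflow in exp()
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.1]  # made-up values for illustration
probs = softmax(example_logits)
# The predicted class is the index of the largest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
print(predicted)             # 0
```

Softmax preserves the ordering of the logits, so taking `argmax` of the probabilities picks the same class as taking `argmax` of the logits.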

## Model Parameters
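
Each nn.Linear layer stores a weight of shape `(out_features, in_features)` and a bias of shape `(out_features,)`. As a rough sketch, the number of parameters in the model defined earlier can be worked out by hand from its three layer sizes; the `layer_sizes` list simply restates those sizes.

```python
# (in_features, out_features) for the three nn.Linear layers in NeuralNetwork
layer_sizes = [(28 * 28, 512), (512, 512), (512, 10)]

# Each layer has in*out weights plus out biases.
counts = [n_in * n_out + n_out for n_in, n_out in layer_sizes]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

With the model in memory, the same total should be obtainable as `sum(p.numel() for p in model.parameters())`.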


In [12]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0059, -0.0204,  0.0241,  ..., -0.0059, -0.0171,  0.0007],
        [-0.0141, -0.0242,  0.0274,  ..., -0.0138, -0.0138, -0.0232]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0064, -0.0191], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0034,  0.0370, -0.0198,  ..., -0.0201, -0.0402,  0.0084],
        [-0.0140, -0.0146, -0.0221,  ...,  0.0375,  0.0093,  0.0097]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | 