# Build the Neural Network

Neural networks are composed of ***layers***, or modules, that perform operations on data. The torch.nn namespace provides all the building blocks you need to construct your own neural network. Every module in PyTorch subclasses nn.Module, and a ***neural network*** is itself a module composed of other modules, which are referred to as layers. This nested structure makes it easy to build and manage complex architectures.
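To make the "modules inside modules" idea concrete, here is a minimal sketch. The classes TinyBlock and TinyNet are hypothetical names used only for illustration; the sketch shows that a module can contain other modules, and that nn.Module's named_modules() walks the whole hierarchy.

```python
import torch
from torch import nn

# Hypothetical "block" module that itself contains two layers
class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))

# Hypothetical network built from other modules, including a TinyBlock
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = TinyBlock()
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(self.block(x))

net = TinyNet()

# named_modules() visits the root (empty name) and then every nested submodule
names = [name for name, _ in net.named_modules()]
print(names)  # ['', 'block', 'block.linear', 'block.act', 'head']

out = net(torch.rand(3, 4))
print(out.shape)  # torch.Size([3, 2])
```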

In the following sections, we will build a neural network that classifies images from the ***FashionMNIST*** dataset.

```python
# Importing necessary libraries
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Check for available device (CUDA GPU, Apple MPS, or CPU)
device = (
    "cuda"
    if torch.cuda.is_available()  # Check if a CUDA GPU is available
    else "mps"
    if torch.backends.mps.is_available()  # Check if MPS (Apple Metal Performance Shaders) is available
    else "cpu"  # Use CPU if neither CUDA nor MPS is available
)
print(f"Using {device} device")

# Define the neural network architecture
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()  # Flatten layer to convert 2D image data into 1D tensor
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),  # Fully connected layer with 28*28 input features and 512 output features
            nn.ReLU(),              # ReLU activation function
            nn.Linear(512, 512),    # Second fully connected layer with 512 input features and 512 output features
            nn.ReLU(),              # ReLU activation function
            nn.Linear(512, 10),     # Final fully connected layer with 512 input features and 10 output features (for 10 classes)
        )

    # Define the forward pass of the network
    def forward(self, x):
        x = self.flatten(x)                 # Flatten the input
        logits = self.linear_relu_stack(x)  # Pass the flattened input through the linear stack
        return logits

# Create an instance of the neural network model and move it to the specified device
model = NeuralNetwork().to(device)
print(model)

# Generate random input data and make predictions
X = torch.rand(1, 28, 28, device=device)  # Generate random input data with shape (1, 28, 28)
logits = model(X)  # Forward pass through the model to obtain logits
pred_probab = nn.Softmax(dim=1)(logits)  # Compute softmax probabilities for each class
y_pred = pred_probab.argmax(1)  # Predicted class label with highest probability
print(f"Predicted class: {y_pred}")
```


```
Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Predicted class: tensor([9])
```


* The code defines the neural network architecture as the ***NeuralNetwork*** class, a subclass of ***nn.Module***. The network flattens its input and then passes it through three fully connected layers, with ReLU activation functions between them.

* The device is chosen by availability, in order of preference: a CUDA GPU, then Apple's MPS (Metal Performance Shaders) backend, then the CPU.

* An instance of the neural network model is created and moved to the specified device using .to(device).

* Random input data is generated and fed into the model to make a prediction. Because the model is untrained, the predicted class is arbitrary.

* The ***softmax*** function converts the logits into class probabilities, and argmax selects the class with the highest probability.
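The .to(device) step can be checked directly. The sketch below is a minimal example under simplifying assumptions: it only chooses between CUDA and CPU (the notebook also checks MPS), and it uses a single nn.Linear layer instead of the full model. It shows that .to() moves every parameter and that inputs must be created on the same device.

```python
import torch
from torch import nn

# Simplified device choice (CUDA or CPU only; the notebook also checks MPS)
device = "cuda" if torch.cuda.is_available() else "cpu"

# .to(device) moves every registered parameter and buffer, and returns the module
layer = nn.Linear(784, 10).to(device)
devices = {p.device.type for p in layer.parameters()}
print(devices)  # e.g. {'cpu'} on a machine without a GPU

# Inputs must live on the same device as the parameters
X = torch.rand(2, 784, device=device)
out = layer(X)
print(out.device.type == device)
```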


# Description of the layers


* First layer: Flatten layer

```python
input_image = torch.rand(3,28,28)
print(f"Tensor before first layer -> {input_image.size()}")
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(f"Tensor after first layer -> {flat_image.size()}")
```

```
Tensor before first layer -> torch.Size([3, 28, 28])
Tensor after first layer -> torch.Size([3, 784])
```



The value 784 comes from the size of the FashionMNIST images: each is a ***28x28 pixel*** grayscale image, for a total of ***28 * 28 = 784 pixels***.

nn.Flatten() keeps the first (batch) dimension and flattens each image in the minibatch separately. In the example above, the input has size [3, 28, 28]: the first axis indexes the 3 images in the minibatch, and each image is 28x28 pixels. After nn.Flatten(), each image becomes a contiguous array of 784 pixel values, so the resulting tensor has size [3, 784], where ***3 is the number of images in the minibatch and 784 is the number of pixels per image (28x28).***
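A short sketch of what nn.Flatten() does with its defaults (start_dim=1, end_dim=-1, as printed in the model summary). It assumes a second input layout with an explicit channel axis, [3, 1, 28, 28], which is how single-channel images typically arrive after torchvision's ToTensor; both layouts flatten to [3, 784]. The functional equivalents torch.flatten(x, start_dim=1) and x.reshape(x.size(0), -1) are shown for comparison.

```python
import torch
from torch import nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

# Batch of 3 images without a channel axis, as in the example above
x = torch.rand(3, 28, 28)
# Batch of 3 images with a single (grayscale) channel axis
x_ch = torch.rand(3, 1, 28, 28)

print(flatten(x).shape)     # torch.Size([3, 784])
print(flatten(x_ch).shape)  # torch.Size([3, 784]), since 1 * 28 * 28 = 784

# Equivalent functional forms: keep dim 0, merge everything else
assert torch.equal(flatten(x), torch.flatten(x, start_dim=1))
assert torch.equal(flatten(x), x.reshape(x.size(0), -1))
```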

* Second layer: Linear layer

```python
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
```

```
torch.Size([3, 20])
```



The second layer is a ***linear module*** that applies a linear transformation to its input using its stored weights and biases: ***y = xW^T + b***.

The output size torch.Size([3, 20]) is determined by the shape of the linear layer, not by the values of its weights and biases. When you define a linear layer in PyTorch, you specify the number of input and output features. Here the layer has 784 inputs (matching the flattened images) and 20 outputs, so its weight matrix has shape [20, 784]. Passing an input of size [3, 784] through it therefore produces an output of size [3, 20]: for each of the 3 images in the minibatch, the layer outputs a vector of ***20 elements.***
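The sketch below checks the formula y = xW^T + b by hand against nn.Linear, using a random input in place of the flattened images. It shows the stored parameter shapes, weight as (out_features, in_features) and bias as (out_features,), and how (3, 784) @ (784, 20) produces the (3, 20) output.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer1 = nn.Linear(in_features=28 * 28, out_features=20)

# Stored parameters: weight is (out_features, in_features), bias is (out_features,)
print(layer1.weight.shape)  # torch.Size([20, 784])
print(layer1.bias.shape)    # torch.Size([20])

flat_image = torch.rand(3, 28 * 28)
hidden1 = layer1(flat_image)

# Manual y = x W^T + b: (3, 784) @ (784, 20) + (20,) -> (3, 20)
manual = flat_image @ layer1.weight.T + layer1.bias
print(torch.allclose(hidden1, manual, atol=1e-5))
```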

* Next step: ReLU

```python
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
```

```
Before ReLU: tensor([[ 0.1018, -0.3573, -0.0997,  0.5860, -0.8258, -0.0697,  0.3445, -0.6693,
         -0.4315, -0.5842, -0.0681, -0.2668, -0.4820, -0.3326, -0.2112,  0.4897,
          0.2167, -0.1137,  0.3219, -0.2377],
        [-0.1717, -0.2395,  0.2679,  0.3177, -0.7740, -0.1124,  0.1189, -1.0750,
         -0.3568, -0.4217,  0.0756, -0.2403, -0.3589, -0.0416, -0.1043,  0.6492,
          0.5647, -0.0387,  0.1827, -0.2862],
        [-0.3486, -0.2811,  0.1589,  0.3587, -0.7152, -0.3140,  0.5873, -0.9105,
         -0.2763, -0.2148,  0.2119, -0.5197, -0.5806, -0.0996, -0.1221,  0.8211,
          0.3487, -0.2983,  0.1661, -0.1870]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.1018, 0.0000, 0.0000, 0.5860, 0.0000, 0.0000, 0.3445, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.4897, 0.2167, 0.0000,
         0.3219, 0.0000],
        [0.0000, 0.0000, 0.2679, 0.3177, 0.0000, 0.0000, 0.1189, 0.0000, 0.0000,
         0.0000, 0.0756, 0.0000, 0.0000, 0.0000, 0.00
```

The next step is nn.ReLU. ***Non-linear activations*** are applied after linear transformations (such as those performed by linear layers) to introduce nonlinearity into the model, which lets it learn complex mappings between inputs and outputs. Without them, any stack of linear layers would collapse into a single linear transformation, no matter how many layers it had.

nn.ReLU implements the rectified linear unit (ReLU), one of the most commonly used activation functions in neural networks. It is defined as ***ReLU(x) = max(0, x)***: negative values are set to zero and positive values pass through unchanged, as the output above shows.
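Both claims can be checked numerically. This is a minimal sketch with arbitrary, illustrative layer sizes (20 to 8 to 4). It verifies that nn.ReLU matches max(0, x) via torch.clamp, and that two linear layers with no activation between them equal one linear layer with weight W2 W1 and bias b1 W2^T + b2.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(3, 20)

# ReLU(x) = max(0, x), elementwise
relu_out = nn.ReLU()(x)
assert torch.equal(relu_out, torch.clamp(x, min=0))
assert (relu_out >= 0).all()

# Without an activation, two stacked linear layers collapse into one linear map:
# (x W1^T + b1) W2^T + b2 = x (W2 W1)^T + (b1 W2^T + b2)
l1, l2 = nn.Linear(20, 8), nn.Linear(8, 4)
stacked = l2(l1(x))
W = l2.weight @ l1.weight            # shape (4, 20)
b = l1.bias @ l2.weight.T + l2.bias  # shape (4,)
collapsed = x @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-5))
```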

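In the full model, the first linear layer maps 784 inputs to 512 outputs (rather than the 20 used in the example above), and the remaining steps are:
* Linear layer (in = 512, out = 512)
* ReLU
* Linear layer (in = 512, out = 10)
* Softmax (applied to the model's output logits, outside the model itself)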
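The whole sequence can be traced one layer at a time. This sketch rebuilds the model's layer stack as a plain nn.Sequential (an assumption for brevity; the notebook wraps it in the NeuralNetwork class) and records the output shape after each layer for a batch of 3 random images, with softmax applied afterwards, outside the stack.

```python
import torch
from torch import nn

# The model's layer stack as a single nn.Sequential (softmax kept outside)
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.rand(3, 28, 28)
shapes = []
# Run the input through each layer in turn and record the output shape
for layer in stack:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:>8} -> {tuple(x.shape)}")

# Softmax over the class dimension turns the (3, 10) logits into probabilities
probs = nn.Softmax(dim=1)(x)
print(tuple(probs.shape))  # (3, 10)
```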


The last linear layer of the neural network returns logits, raw values in (-inf, inf), which are passed to the nn.Softmax module. Softmax scales the logits to values in [0, 1] that represent the model’s predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1; with dim=1, the probabilities for each image in the batch sum to 1.

```python
# logits comes from the forward pass in the first cell
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
```
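As a check on the description above, this sketch computes softmax by hand, softmax(z)_i = exp(z_i) / sum_j exp(z_j) along dim=1, and compares it with nn.Softmax. The logits here are random stand-ins for a batch of 2 images, not the model's output. Because softmax is monotonic, taking argmax of the probabilities gives the same class as taking argmax of the raw logits.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(2, 10)  # random stand-in logits for a batch of 2 images

pred_probab = nn.Softmax(dim=1)(logits)

# softmax(z)_i = exp(z_i) / sum_j exp(z_j), computed along dim=1 (the class axis)
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
assert torch.allclose(pred_probab, manual, atol=1e-6)

# Every row is a probability distribution over the 10 classes
print(pred_probab.sum(dim=1))  # each entry is 1, up to float rounding

# Softmax is monotonic, so the most likely class is also the largest logit
assert torch.equal(pred_probab.argmax(dim=1), logits.argmax(dim=1))
```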

# Model Parameters
Many layers inside a neural network are parameterized, i.e., they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically registers every submodule and parameter assigned as an attribute of your model object, and makes all parameters accessible through the model’s parameters() or named_parameters() methods.

```python
# Print the structure of the model
print(f"Model structure: {model}\n\n")

# Iterate over each named parameter in the model
for name, param in model.named_parameters():
    # Print the name, size, and a preview of the parameter values
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
```


```
Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0255, -0.0144, -0.0293,  ...,  0.0168, -0.0282,  0.0111],
        [-0.0313,  0.0264,  0.0086,  ...,  0.0055,  0.0015,  0.0058]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0104, -0.0074], grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0089, -0.0298,  0.0193,  ..., -0.0438, -0.0038,  0.0425],
        [ 0.0311,  0.0257, -0.0121,  ...,  0.0108,  0.0092,  0.0274]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias |
```
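Since parameters() yields every weight and bias tensor, it also gives a quick way to count the model's size. The sketch below repeats the NeuralNetwork class so it runs on its own. Each nn.Linear layer holds out * in weights plus out biases, while Flatten and ReLU hold none, giving (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10) = 669,706 parameters.

```python
import torch
from torch import nn

# Same architecture as the NeuralNetwork class defined earlier
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))

model = NeuralNetwork()

# numel() is the number of scalar values in each parameter tensor
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total)  # 669706

# Per-tensor breakdown; names match the named_parameters() output above
for name, param in model.named_parameters():
    print(f"{name}: {param.numel()}")
```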

# Reference
https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html
# Author
Vetrano Alessio, 2024