# Build the Neural Network

- The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to build your own neural network.
- Every module in PyTorch subclasses [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
- A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.
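
The nested structure described above can be inspected directly. The following is a minimal sketch, assuming two hypothetical classes (`Block` and `TinyNet`, which are not part of this tutorial's model), that shows how a module built from other modules exposes its children:

```python
import torch
from torch import nn

# Hypothetical classes, used only to illustrate nesting.
class Block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Each attribute is itself a module containing further modules.
        self.block1 = Block(4, 8)
        self.block2 = Block(8, 2)

    def forward(self, x):
        return self.block2(self.block1(x))

net = TinyNet()
child_names = [name for name, _ in net.named_children()]   # direct children only
module_names = [name for name, _ in net.named_modules()]   # the full tree, root first
print(child_names)
print(module_names)
```

`named_children()` lists only the direct submodules, while `named_modules()` walks the whole tree, which is how a single call such as `.to(device)` can reach every layer.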

In [2]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Get Device for Training

In [3]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device
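
The selected device only takes effect once modules and tensors are placed on it, and both must live on the same device for a forward pass to work. A small sketch of that idea, assuming a simplified device check (CUDA or CPU only) and a throwaway `nn.Linear` layer that is not part of the tutorial's model:

```python
import torch
from torch import nn

# Simplified device selection for this sketch (the mps branch is omitted).
device = "cuda" if torch.cuda.is_available() else "cpu"

layer = nn.Linear(4, 2).to(device)      # moves the layer's parameters
x = torch.rand(3, 4, device=device)     # creates the input on the same device
out = layer(x)

print(out.device.type)
```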


# Define the Class
- We define our neural network by subclassing `nn.Module`, and initialize the neural network layers in `__init__`.
- Every `nn.Module` subclass implements the operations on input data in the `forward` method.

In [8]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

`class NeuralNetwork(nn.Module):` This line defines a new class named NeuralNetwork that inherits from the nn.Module class in PyTorch. Inheriting from nn.Module allows you to create custom neural network models.

`def __init__(self):` This is the initialization method (constructor) for the NeuralNetwork class. It defines the architecture of the neural network.

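`super().__init__()`: This line calls the constructor of the parent `nn.Module` class. It sets up the internal bookkeeping that `nn.Module` uses to register submodules and parameters, so it must run before any layer is assigned to `self`.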

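`self.flatten = nn.Flatten()`: This line creates an instance of the `nn.Flatten` module. With its default arguments (`start_dim=1`, `end_dim=-1`), it keeps the batch dimension and flattens all remaining dimensions, so a batch of 28x28 images with shape (N, 28, 28) becomes a tensor of shape (N, 784).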

`self.linear_relu_stack = nn.Sequential(...)`: This line defines a sequential container for the linear and activation layers of the neural network. It uses the `nn.Sequential` module to create a sequence of layers. Inside the sequential container, the following layers are defined:

- `nn.Linear(28*28, 512)`: This is the first linear layer that takes an input of size 28*28 (assuming the input is a 28x28 image) and outputs a tensor of size 512.
- `nn.ReLU()`: This is the activation function (Rectified Linear Unit) applied after the first linear layer.
- `nn.Linear(512, 512)`: This is the second linear layer that takes an input tensor of size 512 and outputs another tensor of size 512.
- `nn.ReLU()`: This is the activation function applied after the second linear layer.
- `nn.Linear(512, 10)`: This is the final linear layer that takes an input tensor of size 512 and outputs a tensor of size 10, representing the output classes.

`def forward(self, x):` This method defines the forward pass of the neural network.

`x = self.flatten(x)`: This line flattens the input tensor `x` using the flatten module defined earlier. It reshapes each sample from a multi-dimensional shape to a 1-dimensional vector while keeping the batch dimension.

`logits = self.linear_relu_stack(x)`: This line applies the sequential container `linear_relu_stack` to the flattened input tensor. It passes the input tensor through the defined linear and activation layers.

`return logits`: This line returns the final output tensor logits from the forward pass.

By defining this `NeuralNetwork` class, you can create an instance of the model and run a forward pass by calling the model object itself on an input, for example `model(X)`, rather than calling `forward()` directly.

We create an instance of ``NeuralNetwork``, move it to the ``device``, and print
its structure.



In [9]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To use the model, we pass it the input data. This executes the model's ``forward``,
along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866).
Do not call ``model.forward()`` directly!
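
One concrete example of those background operations is forward hooks. The sketch below, which uses a standalone `nn.Linear` layer and a hypothetical `record_call` hook rather than the tutorial's model, suggests why calling the module is preferred: the hook runs when the module is called, but is skipped when `forward()` is invoked directly.

```python
import torch
from torch import nn

calls = []

def record_call(module, inputs, output):
    # Hypothetical hook: remember which module type was called.
    calls.append(type(module).__name__)

layer = nn.Linear(4, 2)
handle = layer.register_forward_hook(record_call)

x = torch.rand(1, 4)
layer(x)           # goes through __call__, so the hook fires
layer.forward(x)   # bypasses __call__, so the hook is skipped
handle.remove()    # detach the hook again

print(calls)
```

Only one entry is recorded even though the layer computed its output twice.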

Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the samples in the batch, and dim=1 indexes the 10 raw predicted values (one per class) for each sample.
We get the prediction probabilities by passing it through an instance of the ``nn.Softmax`` module.

In [11]:
X = torch.rand(1, 28, 28, device = device)
logits = model(X)
print(logits)
pred_probab = nn.Softmax(dim = 1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

tensor([[ 0.0330, -0.0927, -0.1446, -0.0654, -0.0040,  0.0409,  0.0790,  0.0841,
         -0.0793, -0.0017]], grad_fn=<AddmmBackward0>)
Predicted class: tensor([7])


`X = torch.rand(1, 28, 28, device=device):` This line creates a random tensor X with a shape of (1, 28, 28) on the specified device. It represents a single input sample, assumed to be a 28x28 image.

`logits = model(X):` This line passes the input tensor X through the model, invoking the forward() method of the model. It computes the logits (raw output) of the model for the given input.

`pred_probab = nn.Softmax(dim=1)(logits):` This line applies the softmax function along dimension 1 to the logits tensor using nn.Softmax(dim=1). It converts the logits into predicted probabilities for each class.

`y_pred = pred_probab.argmax(1):` This line finds the index of the class with the highest probability by calling argmax(1) on pred_probab. It returns a tensor y_pred containing the predicted class label.

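By executing this code, you obtain a predicted class label for the input tensor `X`. Because the model has not been trained yet and `X` is random, the prediction is not meaningful; the cell only demonstrates the mechanics of a forward pass.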
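
Since softmax preserves the ordering of the values within each row, taking `argmax` on the logits gives the same class as taking it on the probabilities. A short sketch with made-up logits (two samples, three classes, not produced by the model above):

```python
import torch
from torch import nn

# Made-up logits for illustration: 2 samples, 3 classes.
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1,  0.2, 3.0]])

probs = nn.Softmax(dim=1)(logits)

pred_from_probs = probs.argmax(dim=1)
pred_from_logits = logits.argmax(dim=1)
print(pred_from_probs, pred_from_logits)
```

The softmax step is therefore only needed when the probabilities themselves are of interest, not just the predicted class.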

----

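# Model Layers

To see what each layer of the model does, we take a sample minibatch of 3 random images of size 28x28 and follow it through the layers one at a time.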

In [12]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten
We initialize the [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)
layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values
(the minibatch dimension at dim=0 is maintained).

In [13]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
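
As a rough illustration of what the default arguments do, the sketch below compares `nn.Flatten()` with a manual `reshape` and with `start_dim=0`, which also collapses the batch dimension. The variable names are chosen for this example only.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

flat_default = nn.Flatten()(batch)                # start_dim=1: batch dimension kept
flat_reshape = batch.reshape(batch.size(0), -1)   # the same result written by hand
flat_all = nn.Flatten(start_dim=0)(batch)         # collapses every dimension

print(flat_default.shape, flat_all.shape)
```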


### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation on the input using its stored weights and biases.

In [14]:
layer = nn.Linear(in_features = 28*28, out_features=20)
hidden1 = layer(flat_image)
print(hidden1.size())

torch.Size([3, 20])
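
The linear transformation can be written out explicitly as `x @ W.T + b`, where `W` is the layer's `weight` of shape (out_features, in_features) and `b` is its `bias`. A minimal sketch that checks this against the layer's own output, using a freshly created layer rather than the one from the cell above:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# Recompute the layer's output from its stored weights and biases.
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-5))
```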


### nn.ReLU
- Non-linear activations are what create the complex mappings between the model's inputs and outputs.
- They are applied after linear transformations to introduce *nonlinearity*, helping neural networks learn a wide variety of phenomena.

In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our linear layers, but there are other activations that can introduce non-linearity in your model.



In [15]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-2.8053e-01,  1.5342e-01,  2.7263e-01,  1.0547e-01, -4.2795e-01,
         -1.4632e-01,  5.4631e-01, -4.8849e-01,  5.0240e-01, -2.9074e-02,
          1.9351e-01, -4.6250e-01,  1.2593e-01,  7.3368e-01, -2.0106e-01,
         -3.7836e-04, -3.9227e-01,  3.2677e-02,  1.3237e-01,  6.7212e-01],
        [-1.7836e-01, -4.1733e-02,  4.7869e-01,  6.5184e-02, -3.4521e-01,
         -1.4592e-01,  4.8984e-01, -9.4041e-01,  4.7825e-01,  2.1122e-02,
          1.4657e-01, -6.8349e-01,  8.3621e-02,  3.4088e-01, -3.0523e-01,
          1.8552e-01, -4.2274e-01, -3.6072e-01,  2.7483e-02,  9.4940e-02],
        [-4.1318e-01,  1.8033e-01,  3.1851e-01, -1.9846e-01,  8.7675e-02,
         -5.8274e-02,  3.4326e-01, -6.3833e-01,  2.7314e-01, -1.0070e-01,
         -1.6563e-01, -5.2347e-01, -1.4071e-01,  4.1281e-01, -2.1325e-01,
          5.5213e-02, -3.9994e-01, -3.1361e-01,  3.7067e-01,  4.8612e-02]],
       grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.1534, 0.2726, 0.1055, 0.0000,
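
ReLU itself is a simple elementwise rule: negative values become zero and everything else passes through unchanged. A small sketch on made-up values (not the `hidden1` tensor above), compared with `torch.clamp`:

```python
import torch
from torch import nn

# Made-up values for illustration.
x = torch.tensor([[-0.28, 0.15, 0.0, 0.73, -0.43]])

relu_out = nn.ReLU()(x)
clamp_out = torch.clamp(x, min=0)   # the same elementwise rule: max(0, x)

print(relu_out)
```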

### nn.Sequential
[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered
container of modules. The data is passed through all the modules in the same order as defined. You can use
sequential containers to put together a quick network like ``seq_modules``.



In [16]:
seq_modules = nn.Sequential(
    flatten,
    layer,
    nn.ReLU(),
    nn.Linear(20,10)
)
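
A sequential container behaves like calling its modules one after another. The sketch below builds its own small container (named `seq` here, so it does not depend on the earlier cells) and checks that a manual loop over the modules gives the same result as calling the container:

```python
import torch
from torch import nn

torch.manual_seed(0)
seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
images = torch.rand(3, 28, 28)

# Pass the data through each module in order, by hand.
out = images
for module in seq:
    out = module(out)

print(torch.allclose(out, seq(images)))
print(seq[1], len(seq))   # containers support indexing and len()
```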

### nn.Softmax
The last linear layer of the neural network returns `logits`, raw values in (-∞, ∞), which are passed to the
[nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values
in [0, 1] representing the model's predicted probabilities for each class. The ``dim`` parameter indicates the dimension along
which the values must sum to 1. In the cell below, `logits` is the tensor computed earlier by `model(X)`.



In [17]:
softmax = nn.Softmax(dim = 1)
pred_probab = softmax(logits)
pred_probab

tensor([[0.1046, 0.0923, 0.0876, 0.0948, 0.1008, 0.1055, 0.1096, 0.1101, 0.0935,
         0.1011]], grad_fn=<SoftmaxBackward0>)
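
To make the role of ``dim`` concrete, the sketch below computes softmax by hand (exponentiate, then divide by the sum along the chosen dimension) on made-up logits and compares `dim=1` with `dim=0`:

```python
import torch
from torch import nn

# Made-up logits: 2 samples (rows), 3 classes (columns).
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

# Softmax written out: exp(z_i) / sum_j exp(z_j), summing over the classes.
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

by_row = nn.Softmax(dim=1)(logits)   # each row sums to 1 (per-sample probabilities)
by_col = nn.Softmax(dim=0)(logits)   # each column sums to 1 (usually not what you want here)

print(by_row.sum(dim=1))
print(by_col.sum(dim=0))
```

With a (batch, classes) layout, `dim=1` is the choice that yields one probability distribution per sample.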

# Model Parameters
Many layers inside a neural network are *parameterized*, i.e. have associated weights
and biases that are optimized during training. Subclassing ``nn.Module`` automatically
tracks all fields defined inside your model object, and makes all parameters
accessible using your model's ``parameters()`` or ``named_parameters()`` methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.




In [18]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values?: {param[:2]}\n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values?: tensor([[ 0.0077, -0.0142, -0.0143,  ...,  0.0182, -0.0334, -0.0028],
        [ 0.0063,  0.0172,  0.0037,  ..., -0.0197,  0.0016,  0.0213]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values?: tensor([-0.0096,  0.0110], grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values?: tensor([[-0.0147, -0.0327, -0.0097,  ...,  0.0046,  0.0188,  0.0307],
        [-0.0155,  0.0263, -0.0262,  ...,  0.0319, -0.0217,  0.0427]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Siz

`for name, param in model.named_parameters():` This line initiates a loop that iterates over the named parameters of the model. The `named_parameters()` method returns an iterator over module parameters, yielding both the name of the parameter and the parameter itself.

`print(f"Layer: {name} | Size: {param.size()} | Values?: {param[:2]}\n")`: This line prints the details of each parameter. Within the loop:

- `name` represents the name of the current parameter (e.g., "linear_relu_stack.0.weight").
- `param.size()` returns the size (shape) of the current parameter tensor.
- `param[:2]` selects the first two entries along dim 0 as a preview: the first two rows of a weight matrix, or the first two values of a bias vector.
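
The same iterators can be used to count parameters. The sketch below rebuilds the layer stack as a plain `nn.Sequential` (an assumption made so the example is self-contained; the parameter names therefore differ from `linear_relu_stack.*`) and sums `numel()` over all parameter tensors:

```python
import torch
from torch import nn

# Same layer sizes as the tutorial's model, rebuilt as a plain container.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
per_tensor = {name: tuple(p.shape) for name, p in model.named_parameters()}

print(total, trainable)
print(per_tensor)
```

The total follows from the layer sizes: (784 × 512 + 512) + (512 × 512 + 512) + (512 × 10 + 10) = 669,706. `nn.Flatten` and `nn.ReLU` contribute nothing because they have no parameters.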