# <div style="text-align:center; font-weight:bold">Build a Simple Neural Netework</div>

In [None]:
### In this module, we would build a simple neural network made up of 3 layers:
### 1. one input layer, a 28x28(input) by 512(output) dimension
### 2. one hiddel layer, a 512(input) by 512(output) dimension layer, activated with the ReLU activation function
### 3. one output layer, a 512(input) by 10(output) diminsion layer, activate with the ReLU activation function

## Import Neccessary Modules

In [3]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Obtain the Device

In [7]:
### you already know, one of the properties of a tensor is the device on which it resides
### this could be either a CPU or a GPU
### Generally, we would prefer to use a GPU it is available

# Device selection logic
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
    
print(f"Using device: {device}")

Using device: mps


## Define the Class (extend nn.Module class)

#### To create neural network, we must extend the nn.Module class. We would also need to implement the forward method

In [9]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

#### Having created our Neural Network model,let's intantiate it and move it to a device

In [10]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


## Making Prediction Using Our Model

#### To make prediction using our model, we would need to pass some input data to our model
#### This would return a 2-dimensional tensor with dim=0 corresponding to the raw ouput and dimention 1 corresponding to the output value.
#### Each of the output is one of 10 predicted classes

In [None]:
X = torch.rand(1, 28, 28, device= device)
logits = model(X)
prediction_probabilities = nn.Softmax(dim=1)(logits)
y_predicted = prediction_probabilities.argmax(1)
print(f"Predicted class {y_predicted}")

## Understanding the Model Layers

#### Let's now take some time to understand the model layers.
#### Let's take a batch of 3 images of size 28  x 28 and pass it through the layers of the network and see what we get

In [12]:
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

#### This layer would take the 2-dimentional 28 x 28 image and convert it into a 1-dimentional array of lenght 784 (28 x 28)
#### The dimension is maintained at dim=0

In [15]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


### nn.Linear

#### The linear layer will apply a linear transformation on the input layer using it's stored weights and biases.
#### For example, given x, we apply a function of x, f(x) = x.weight + bias

In [18]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])


### nn.ReLU

#### This is a non-linear activation function that is applied after the linear transformation  to introduce non-linearity.
#### This makes it possible for the neural network to learn a from a wider variety of input dataset

#### Although we use ReLU here, there are other activation functions that can be used

In [21]:
print(f"Before applying ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After applying ReLU: {hidden1}")

Before applying ReLU: tensor([[ 7.5765e-02,  5.7657e-01, -9.2276e-02,  7.9460e-02,  4.2622e-01,
          4.1913e-01,  2.0428e-01,  2.7230e-01, -3.7125e-02,  6.6380e-02,
         -4.6995e-02,  2.8898e-01, -2.9127e-01, -3.4531e-01, -5.0734e-02,
         -5.8721e-04,  2.9879e-01, -1.7054e-01,  2.3635e-01, -2.2674e-01],
        [-2.9477e-01,  4.8444e-01, -3.2404e-01, -2.5243e-01,  6.1391e-01,
          8.0211e-02,  2.2512e-01,  4.2206e-01, -4.6083e-01,  1.4280e-01,
         -2.7107e-01,  1.4749e-01, -5.0821e-01, -1.9676e-01, -2.1479e-01,
          1.8214e-01,  1.1709e-01, -5.4246e-03, -4.5451e-02, -6.3239e-01],
        [-2.9733e-01,  4.1945e-01,  7.8776e-03, -4.2119e-01,  5.2320e-01,
          2.7397e-01,  2.6396e-01,  4.9650e-01, -1.5880e-01,  9.2246e-03,
         -3.3053e-01,  2.3142e-01, -5.4229e-01,  6.7973e-02, -2.1788e-01,
          1.6566e-02,  7.7404e-01, -1.3827e-01,  2.0108e-01, -4.8575e-01]],
       grad_fn=<AddmmBackward0>)


After applying ReLU: tensor([[0.0758, 0.5766, 0.000

### nn.Sequential

#### nn.Sequential is an ordered container of modules.
#### Data is passed through all the modules in the same order as you defined it

In [None]:
dpanalyticsolution.com

In [None]:
# We an also build the network like like this using nn.Sequential
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU,
    nn.Linear(20,10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
logits

### nn.Softmax

#### The last layer of the neural network and it returns logits, that is raw values in [-inf, inf]
#### The output of this last layer is passed into the Softmax module
#### The logits are scaled to [0,1] which represents the model's predicted probabilities for each class
#### The dim parameter indicates the dimension along which the values must sum up to 1


In [24]:
softmax = nn.Softmax(dim=1)
predicted_probabilities = softmax(logits)
print(predicted_probabilities)

tensor([[0.0939, 0.0967, 0.0986, 0.0981, 0.1033, 0.1065, 0.1021, 0.1024, 0.0993,
         0.0990]], device='mps:0', grad_fn=<SoftmaxBackward0>)

## Model Parameters

#### Many layers inside a neural network are parameterized.
#### This means that they have associated weights and biases that are optimized during training
#### Extending the nn.Module class automatically tracks all the fields defined inside your model object 
#### and makes all parameters accessible using your model's parameters() or named_parameters() methods

In [25]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[-1.7760e-02, -9.9057e-03,  2.0733e-02,  ...,  4.2260e-05,
         -3.2030e-02,  1.6735e-02],
        [ 3.4041e-02, -5.3650e-03,  2.0612e-02,  ...,  1.4829e-02,
          5.5902e-03, -1.0065e-02]], device='mps:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([ 0.0299, -0.0093], device='mps:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[ 0.0178, -0.0357, -0.0225,  ...,  0.0375,  0.0302, -0.0233],
        [-0.0431, -0.0338,  0.0087,  ...,  0.0290,  