<a href="https://colab.research.google.com/github/alessitomas/Neural-Network-MNIST/blob/main/3.0_alessi_deeplearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Loading the dataset

This notebook was created on Google Colab to get access to a GPU, which is why it loads the dataset again.

In [1]:
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, cache=True)

# fetch_openml() returns the targets as strings, so we need to convert them to
# numeric values.
mnist.target = mnist.target.astype(np.int8)

mnist["data"], mnist["target"]


(       pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  pixel9  \
 0         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 1         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 2         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 3         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 4         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 ...       ...     ...     ...     ...     ...     ...     ...     ...     ...   
 69995     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 69996     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 69997     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 69998     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 69999     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
 
        pixel1

In [2]:
X, y = mnist['data'].to_numpy(), mnist['target'].to_numpy()

## Train and Test Split

This split was suggested by the creators of the dataset: "It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples". Reference: [OpenML Dataset](https://www.openml.org/search?type=data&sort=runs&id=554&status=active)


In [3]:
X_train = X[:60000]
y_train = y[:60000]

X_test = X[60000:]
y_test = y[60000:]
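
The slicing above can be checked without downloading anything. The following is a minimal sketch that uses stand-in lists with the same number of rows as MNIST; the helper name `split_first_n` and the dummy data are assumptions made for illustration and are not part of the notebook.

```python
def split_first_n(samples, labels, n_train):
    """Split parallel sequences into a leading training part and a trailing test part."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    return samples[:n_train], labels[:n_train], samples[n_train:], labels[n_train:]

# Stand-in data with the same number of rows as MNIST (70,000); the values are dummies.
samples = list(range(70_000))
labels = [i % 10 for i in samples]

train_x, train_y, test_x, test_y = split_first_n(samples, labels, 60_000)
print(len(train_x), len(test_x))
```

The same slice syntax works on NumPy arrays, which is what the cell above relies on.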

In [4]:
X_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [5]:
y_train

array([5, 0, 4, ..., 5, 6, 8], dtype=int8)

## Deep Learning

1.1 - Uses Artificial Neural Networks to mimic the human brain. They consist of layers of connected neurons that work together to learn and process information.

1.2 - A neuron, put simply, is a node that holds a number: its activity.

1.3 - The activity of the neurons in one layer determines the activity of the neurons in the next layer.

In [6]:
# First, check whether Google Colab's GPU is available
import torch

if torch.cuda.is_available():
    print("The notebook is using the: GPU")
else:
    print("The notebook is using the: CPU")



The notebook is using the: GPU


## Math Concepts

Analysing a Fully Connected Layer:



Equation to calculate the activity of a single neuron in layer 1 from the n activities of layer 0:

a<sup>(1)</sup> = b + w<sub>1</sub>a<sub>1</sub><sup>(0)</sup> + w<sub>2</sub>a<sub>2</sub><sup>(0)</sup> + w<sub>3</sub>a<sub>3</sub><sup>(0)</sup> + ... + w<sub>n</sub>a<sub>n</sub><sup>(0)</sup>

Now in matrix notation, to calculate the activity of all k neurons in a layer at once. Row j holds the bias and weights of neuron j, w(i,j) multiplies the i-th activity of layer 0, and the leading 1 in the input vector multiplies the bias column:


$$
\text{Parameters} =
\begin{bmatrix}
    b_1 & w(1,1) & w(2,1) & \ldots & w(n,1) \\
    b_2 & w(1,2) & w(2,2) & \ldots & w(n,2) \\
    b_3 & w(1,3) & w(2,3) & \ldots & w(n,3) \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    b_k & w(1,k) & w(2,k) & \ldots & w(n,k) \\
\end{bmatrix}
$$
$$
\text{NeuronsLayer}(0) =
\begin{bmatrix}
    1 \\
    a_1^{(0)} \\
    a_2^{(0)} \\
    \vdots \\
    a_n^{(0)}
\end{bmatrix}
$$


NeuronsLayer(1) = Parameters @ NeuronsLayer(0)
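
To make the matrix product above concrete, here is a minimal sketch in plain Python. The function name `layer_output` and the numbers are hypothetical; the only point is that prepending a 1 to the input vector lets the first column of the parameter matrix act as the bias.

```python
def layer_output(parameters, activations):
    """Compute Parameters @ [1, a_1, ..., a_n] with plain Python lists."""
    augmented = [1.0] + list(activations)  # the leading 1 multiplies the bias column
    return [sum(p * a for p, a in zip(row, augmented)) for row in parameters]

# Hypothetical layer: n = 3 inputs, k = 2 neurons.
# Each row is [b_j, w(1,j), w(2,j), w(3,j)].
parameters = [
    [0.5, 1.0, -1.0, 2.0],
    [-1.0, 0.0, 3.0, 1.0],
]
a0 = [1.0, 2.0, 3.0]

a1 = layer_output(parameters, a0)
print(a1)
```

With these values the first neuron evaluates to 0.5 + 1 - 2 + 6 = 5.5 and the second to -1 + 0 + 6 + 3 = 8.0. Libraries such as NumPy or PyTorch perform the same product in vectorized form.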

This equation can produce any real number. To turn that value into the activity of a neuron, it is common to pass the result through an activation function, such as the sigmoid or the Rectified Linear Unit (ReLU).
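
As an illustration of the two functions named above, the following sketch implements them for single numbers using only the standard library. The input values are arbitrary examples, not data from the notebook.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Keep positive values and replace negative values with 0."""
    return max(0.0, z)

pre_activations = [-2.0, 0.0, 3.0]
print([round(sigmoid(z), 4) for z in pre_activations])
print([relu(z) for z in pre_activations])
```

The sigmoid maps 0 to exactly 0.5 and saturates toward 0 or 1 for large inputs, while ReLU leaves positive inputs unchanged.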

### Creating the ANN

In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Use the GPU for PyTorch operations when available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Custom neural network architecture
class ANNModel(nn.Module):
    # Constructor method
    def __init__(self):
        super().__init__()  # Initialize the parent class nn.Module
        self.input_layer = nn.Flatten()  # Flatten each input into a one-dimensional vector
        self.hidden_layer1 = nn.Linear(28 * 28, 256)  # Fully connected layer, sizes (input, output)
        self.hidden_layer2 = nn.Linear(256, 128)  # Fully connected layer, sizes (input, output)
        self.output_layer = nn.Linear(128, 10)  # Fully connected layer, one output per digit class
        self.relu = nn.ReLU()  # Stateless, so one instance is shared by both hidden layers
        # No softmax layer: nn.CrossEntropyLoss expects raw scores (logits)

    # Define the sequence of operations from the input to the output
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer1(x)
        x = self.relu(x)
        x = self.hidden_layer2(x)
        x = self.relu(x)
        x = self.output_layer(x)
        return x
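
To get a feel for the size of this architecture, the sketch below counts its trainable parameters by hand. The helper `linear_param_count` is a hypothetical name; the layer sizes are the ones passed to `nn.Linear` above, and each fully connected layer is assumed to have one weight per input/output pair plus one bias per output.

```python
def linear_param_count(in_features, out_features):
    """Weights (in * out) plus one bias per output neuron."""
    return in_features * out_features + out_features

# Layer sizes taken from ANNModel: 784 -> 256 -> 128 -> 10.
layer_sizes = [28 * 28, 256, 128, 10]
counts = [linear_param_count(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

For these sizes the three layers hold 200,960, 32,896, and 1,290 parameters, or 235,146 in total.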



## Learning Process

Learning means finding the parameters that minimise the cost function.
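
The sentence above does not say which cost function is meant. The training cell later in this notebook uses `nn.CrossEntropyLoss`, so the following sketch shows that cost for a single sample, computed from raw scores (logits) with the standard library only. The function name and the example scores are assumptions for illustration; PyTorch's version additionally averages over the batch by default.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]

# Hypothetical raw scores for a 3-class problem.
confident_right = cross_entropy_from_logits([5.0, 0.0, 0.0], target=0)
confident_wrong = cross_entropy_from_logits([5.0, 0.0, 0.0], target=1)
uniform = cross_entropy_from_logits([0.0, 0.0, 0.0], target=2)
print(confident_right, confident_wrong, uniform)
```

The cost is close to 0 when the highest score belongs to the correct class, equals log(3) when all three classes score the same, and grows when the model is confidently wrong.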

### Backpropagation

Equation to calculate the activity of a neuron in Layer 4:

a<sup>(4)</sup> = b + w<sub>1</sub>a<sub>1</sub><sup>(3)</sup> + w<sub>2</sub>a<sub>2</sub><sup>(3)</sup> + w<sub>3</sub>a<sub>3</sub><sup>(3)</sup> + ... + w<sub>n</sub>a<sub>n</sub><sup>(3)</sup>

Once we have computed the cost function and its gradients, we need to update the model's parameters. In this example, Layer 4 is the output layer. Looking at the equation, there are two kinds of parameters to update: b (the bias term) and the weights w<sub>1</sub>, ..., w<sub>n</sub> associated with this layer.

The gradients are computed layer by layer, starting from Layer 4 and moving backward toward the input layer, because the gradient for each layer depends on the gradient of the layer after it. This backward, layer-by-layer pass is what gives rise to the term "Backpropagation."
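
A worked example may help here. The sketch below applies the chain rule to a single output neuron and then takes one gradient-descent step. To keep the arithmetic short it assumes a squared-error cost and no activation function, which differs from the cross-entropy loss used for training in this notebook; all names and numbers are hypothetical.

```python
def neuron(b, w, a):
    """Weighted sum b + w_1*a_1 + ... + w_n*a_n (no activation function)."""
    return b + sum(wi * ai for wi, ai in zip(w, a))

def cost(b, w, a, y):
    """Squared error between the neuron's output and the target y."""
    return (neuron(b, w, a) - y) ** 2

def gradients(b, w, a, y):
    """Chain rule: dC/dz first, then the factors for b, w, and the previous layer."""
    dC_dz = 2.0 * (neuron(b, w, a) - y)
    grad_b = dC_dz                      # dz/db = 1
    grad_w = [dC_dz * ai for ai in a]   # dz/dw_i = a_i
    grad_a = [dC_dz * wi for wi in w]   # dz/da_i = w_i, handed back to the earlier layer
    return grad_b, grad_w, grad_a

# Hypothetical values for one neuron with two inputs.
b, w, a, y = 0.1, [0.4, -0.3], [1.0, 2.0], 1.0
grad_b, grad_w, grad_a = gradients(b, w, a, y)

# One gradient-descent step on the parameters of this layer.
lr = 0.1
new_b = b - lr * grad_b
new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
print(cost(b, w, a, y), cost(new_b, new_w, a, y))
```

The list `grad_a` is the part that travels backward: an earlier layer would use it as its own starting gradient, in the same way this neuron used `dC_dz`. After the update the cost for this sample is lower than before.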


In [8]:
# Create an instance of the ANN and move it to the device (GPU or CPU)
model = ANNModel().to(device)

# Loss function
# criterion = nn.MSELoss()
criterion = nn.CrossEntropyLoss()

# Stochastic Gradient Descent (SGD) minimises the loss function
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Normalize pixel values to the range [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

# Move the data to the device: float32 inputs, int64 class labels
X_train = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train, dtype=torch.int64).to(device)
X_test = torch.tensor(X_test, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test, dtype=torch.int64).to(device)

# Training loop
epochs = 10
batch_size = 32

# Each epoch iterates through the whole training set once
for epoch in range(epochs):
    # Iterate through X_train, splitting it into batches
    for i in range(0, len(X_train), batch_size):
        # Select the current batch
        inputs = X_train[i:i+batch_size]
        labels = y_train[i:i+batch_size]

        # Clear the gradients from the previous iteration
        optimizer.zero_grad()

        # Forward pass: model predictions (logits)
        outputs = model(inputs)

        # Calculate the loss between the predictions and the labels
        loss = criterion(outputs, labels)

        # Backpropagation: compute the gradients of the loss with respect to the parameters
        loss.backward()

        # Update the parameters
        optimizer.step()
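
The batch slicing in the training loop can be examined on its own. This sketch lists the index ranges that `range(0, n, batch_size)` produces; the helper name `batch_bounds` and the second, smaller example are assumptions for illustration.

```python
def batch_bounds(n_samples, batch_size):
    """Return the (start, stop) index pairs produced by range(0, n, batch_size) slicing."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

full = batch_bounds(60_000, 32)   # sizes used in the training loop above
ragged = batch_bounds(100, 32)    # hypothetical size that 32 does not divide
print(len(full), full[-1], ragged[-1])
```

With 60,000 samples and a batch size of 32 there are exactly 1,875 parameter updates per epoch. When the batch size does not divide the dataset size, the last batch is simply shorter, which Python slicing handles without extra code.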




## Testing the Model

In [9]:
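# Evaluate the model on the test data
model.eval()  # Evaluation mode; no effect on this model, but a good habit
with torch.no_grad():
    test_outputs = model(X_test)
    predictions = torch.argmax(test_outputs, dim=1)  # Highest-scoring class per sample
    accuracy = (predictions == y_test.to(torch.int64)).float().mean().item()


print("Test Accuracy:", accuracy)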

Test Accuracy: 0.9573999643325806
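
The accuracy above is the fraction of test samples whose highest-scoring class matches the label. The following sketch reproduces that calculation on a tiny hand-made example in plain Python; the function name and the scores are hypothetical.

```python
def accuracy_from_logits(logits_rows, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits_rows]
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical scores for four samples and three classes.
logits_rows = [
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.0, 0.5, 0.2],    # predicts class 1
    [-1.0, 0.0, 3.0],   # predicts class 2
    [1.5, 1.0, 0.0],    # predicts class 0
]
labels = [0, 1, 2, 1]
print(accuracy_from_logits(logits_rows, labels))
```

Three of the four predictions match their labels, so the result is 0.75.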
