# Training a Neural Network

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/EnriqueVilchezL/ai_workshop_2025_neural_networks_math/blob/main/src/train.ipynb)


## 0. Introduction

Neural networks are powerful machine learning models inspired by the human brain. They are widely used for tasks such as image recognition, natural language processing, and pattern detection. Training a neural network involves adjusting its weights using a dataset to minimize prediction errors.

In this workshop, we will train a neural network using the MNIST dataset, a collection of handwritten digits. We will go through the essential steps required to build, train, and evaluate a neural network model.

## 1. Importing Modules and Initial Configuration

The first cells clone the workshop repository, install NumPy and Matplotlib, and import the `network` package used to build and train the model. A fixed NumPy seed is then set so that runs are reproducible.

In [None]:
!git clone https://github.com/EnriqueVilchezL/ai_workshop_2025_neural_networks_math
%pip install numpy
%pip install matplotlib

In [None]:
%cd ai_workshop_2025_neural_networks_math/src

In [None]:
from network.activation import *
from network.layer import *
from network.loss import *
from network.optimizer import *
from network.sequential import *
from network.metric import *

import numpy as np
import mnist.mnist as mnist

np.random.seed(30)
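
The seed matters because the training loop shuffles the data with `np.random.permutation` (and, presumably, because the layers initialize their weights randomly, which is an assumption about the workshop's `network` package). A minimal sketch of the effect, independent of the workshop code:

```python
import numpy as np

# Seeding the global NumPy generator makes the following random draws repeatable.
np.random.seed(30)
first = np.random.permutation(10)

# Re-seeding with the same value restarts the same sequence of draws.
np.random.seed(30)
second = np.random.permutation(10)

print(np.array_equal(first, second))  # prints True
```

Without the second call to `np.random.seed`, the two permutations would generally differ.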

## 2. Defining the Training Function

The `train` function below fits the model on the training set and evaluates it on the validation set at the end of every epoch. In each epoch, it performs the following tasks:
- Shuffles the data and splits it into batches.
- Performs forward propagation.
- Computes the loss and the metric.
- Performs backpropagation.
- Updates model parameters.
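
The batching step in the list above can be sketched on its own with a toy dataset. The helper name `iterate_minibatches` is hypothetical and is not part of the workshop's `network` package; it only illustrates shuffling once and then slicing fixed-size batches.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs; the last batch may be smaller."""
    indexes = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indexes[start:start + batch_size]
        # Index X and Y with the same positions so samples stay paired with labels.
        yield X[batch], Y[batch]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)  # 10 toy samples with one feature each
Y = np.arange(10)                 # label i belongs to sample i

batches = list(iterate_minibatches(X, Y, batch_size=4, rng=rng))
print([len(xb) for xb, _ in batches])  # prints [4, 4, 2]
```

Because 10 is not a multiple of 4, the final batch holds only the remaining 2 samples. The same thing happens in any loop that steps through the data with `range(0, len(X), batch_size)`.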

In [None]:
def train(
    model: Sequential,
    X: np.ndarray,
    Y: np.ndarray,
    X_val: np.ndarray,
    Y_val: np.ndarray,
    epochs: int,
    batch_size: int,
    optimizer: Optimizer,
    loss_function: Loss,
    metric_function: Metric,
) -> None:
    """Train `model` with mini-batches and report loss and metric after every epoch."""

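    for epoch in range(epochs):
        # Running sums of the per-batch loss and metric for this epoch
        loss = 0
        metric = 0
        batches_steps = range(0, len(X), batch_size)
        total_steps = len(batches_steps)
        # Reshuffle the training set so that batches differ between epochs
        shuffled_indexes = np.random.permutation(len(X))
        X = X[shuffled_indexes]
        Y = Y[shuffled_indexes]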
        for i in batches_steps:
            x_batch = X[i:i+batch_size]
            y_batch = Y[i:i+batch_size]
            
            # Forward pass
            y_hat = model.forward({'X': x_batch})
            # Compute the batch loss and metric
            batch_loss = loss_function.forward({'Y': y_batch, 'Y_hat': y_hat})
            batch_metric = metric_function.compute({'Y': y_batch, 'Y_hat': y_hat})
            # Gradient of the loss with respect to the predictions
            loss_function.backward()
            # Backpropagate that gradient through the model
            model.backward({'dY': loss_function.gradients['dY_hat']})
            # Update parameters
            optimizer.update(model)
            # Accumulate the mean batch loss and the batch metric
            loss += batch_loss.mean()
            metric += batch_metric
        
        # Evaluate on the full validation set at the end of the epoch
        y_hat = model.forward({'X': X_val})
        val_loss = loss_function.forward({'Y': Y_val, 'Y_hat': y_hat}).mean()
        val_metric = metric_function.compute({'Y': Y_val, 'Y_hat': y_hat})

        print(f"Train ==> Epoch {epoch+1}/{epochs} loss: {loss/total_steps} accuracy: {metric/total_steps}")
        print(f"Validation ==> Epoch {epoch+1}/{epochs} loss: {val_loss} accuracy: {val_metric}")
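
To make the loss and metric steps in `train` more concrete, here is a minimal NumPy sketch of a softmax output, a per-sample categorical cross-entropy, and an accuracy score. The helpers `softmax` and `categorical_cross_entropy` are written for this illustration only; the workshop's `Softmax`, `CategoricalCrossEntropy`, and `Accuracy` classes may be implemented differently.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    # One loss value per sample: -sum_k y_k * log(y_hat_k).
    return -np.sum(y * np.log(y_hat + eps), axis=1)

# Two toy samples with three classes each.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

y_hat = softmax(logits)
per_sample_loss = categorical_cross_entropy(y, y_hat)
# Accuracy: fraction of samples whose most probable class matches the label.
accuracy = np.mean(y_hat.argmax(axis=1) == y.argmax(axis=1))

print(per_sample_loss.mean(), accuracy)
```

Averaging the per-sample values corresponds to the `batch_loss.mean()` call in the training loop.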


## 3. Data Loading and Model Configuration

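This section loads the MNIST dataset, scales the pixel values to the range [0, 1], and one-hot encodes the labels. It then defines the network architecture, loss function, optimizer, and metric, and runs the training loop, using the test split as the validation set.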
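
The two preprocessing steps can be tried on a toy array first. This sketch assumes the labels are stored as a column of integers with shape `(N, 1)`, which is what the `.squeeze()` call in the cell below suggests; the arrays here are made-up stand-ins, not MNIST data.

```python
import numpy as np

# Toy stand-ins for MNIST: two "images" of four pixels and their integer labels.
pixels = np.array([[0, 51, 102, 255],
                   [255, 204, 0, 0]], dtype=np.uint8)
labels = np.array([[3], [0]])  # assumed column layout, shape (2, 1)

# Scale pixel intensities from [0, 255] to [0, 1].
scaled = pixels / 255

# Row i of the 10x10 identity matrix is the one-hot vector for class i.
# Indexing with a (2, 1) array gives shape (2, 1, 10); squeeze drops the middle axis.
one_hot = np.eye(10)[labels].squeeze(axis=1)

print(scaled.min(), scaled.max())  # prints 0.0 1.0
print(one_hot.shape)               # prints (2, 10)
```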

In [None]:
# Load mnist
x_train, y_train, x_test, y_test = mnist.load('mnist/mnist.pkl')

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255
x_test = x_test / 255
# One-hot encode the labels (10 classes)
y_test = np.eye(10)[y_test].squeeze()
y_train = np.eye(10)[y_train].squeeze()

model = Sequential([
    Dense(784, 16),
    Sigmoid(),
    Dense(16, 16),
    Sigmoid(),
    Dense(16, 10),
    Softmax()
])

optimizer = StochasticGradientDescent(learning_rate=0.001)
loss = CategoricalCrossEntropy()
metric = Accuracy()

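# Train for 100 epochs with a batch size of 64; the test split doubles as the validation set.
# The model is saved when training finishes or when it is interrupted from the keyboard.
try:
    train(model, x_train, y_train, x_test, y_test, 100, 64, optimizer, loss, metric)
    model.save("model.pkl")
except KeyboardInterrupt:
    model.save("model.pkl")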
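
As a rough sanity check on the configuration above, the number of trainable parameters and the number of update steps per epoch can be worked out by hand. Two assumptions go into this sketch: that each `Dense` layer holds a weight matrix plus one bias per output unit, and that `mnist.load` returns the standard 60,000-image MNIST training split.

```python
# (inputs, outputs) for each Dense layer in the model above.
layer_shapes = [(784, 16), (16, 16), (16, 10)]

# Assumption: weights (inputs x outputs) plus one bias per output unit.
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

# Assumption: 60,000 training images; the batch size of 64 comes from the train call.
steps_per_epoch = len(range(0, 60_000, 64))

print(params_per_layer)  # prints [12560, 272, 170]
print(total_params)      # prints 13002
print(steps_per_epoch)   # prints 938
```

Under these assumptions, 100 epochs correspond to 93,800 parameter updates with a learning rate of 0.001.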