# MNIST Dataset Digits Classification

## Objective:
Build a multi-layer neural network to classify handwritten digits (MNIST-like dataset).

## Steps Taken:
- **Libraries:** Used `numpy` for matrix operations and `pandas` for data loading, along with `OneHotEncoder` for label encoding.
- **Data Preprocessing:** Normalized pixel values and one-hot encoded the labels.
- **Model Design:**
  - **Forward Pass:** Used the sigmoid function for hidden layers and softmax for output layers.
  - **Loss Function:** Employed cross-entropy loss for optimization.
  - **Training:** Updated weights and biases using backpropagation and gradient descent.
- **Evaluation:** Tested the model on a test dataset and calculated accuracy.

## Learning Style:
- Gradient-based optimization and backpropagation.
- Continuous features like pixel intensities (numerical).

# Step 1: Importing Libraries

We began by importing the necessary libraries. For this task, we used `numpy` for numerical computations and `pandas` for loading the CSV data files. These libraries are essential for handling the dataset and performing matrix operations efficiently.

In [35]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score

# Step 2: Loading the Data

Training data, training labels, testing data, and testing labels. We will load these datasets using pandas.

In [37]:
# Load training data and labels
X_train = pd.read_csv('data/training60000.csv', header=None).values
y_train = pd.read_csv('data/training60000_labels.csv', header=None).values

# Load testing data and labels
X_test = pd.read_csv('data/testing10000.csv', header=None).values
y_test = pd.read_csv('data/testing10000_labels.csv', header=None).values

# Step 3: Preprocessing the Data

Normalize the input data (range normalization) and convert labels to one-hot encoding.

In [39]:
# Normalize the input data
X_train = (X_train - 0.01) / (1 - 0.01)  # Normalize using the given formula
X_test = (X_test - 0.01) / (1 - 0.01)

# One-hot encode the labels
encoder = OneHotEncoder(sparse_output=False)
y_train_one_hot = encoder.fit_transform(y_train.reshape(-1, 1))
y_test_one_hot = encoder.transform(y_test.reshape(-1, 1))

# Step 4: Defining Activation Functions

We'll need the logistic sigmoid function for the hidden layer and the softmax function for the output layer.

In [41]:
# Logistic (Sigmoid) function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Derivative of the logistic function (for backpropagation)
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

# Softmax function
def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # Stability improvement
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Cross-entropy loss function (for output layer)
def cross_entropy_loss(y_pred, y_true):
    return -np.sum(y_true * np.log(y_pred + 1e-8)) / y_true.shape[0]  # Add small value to prevent log(0)

# Step 5: Initializing Network Parameters

Now, we'll initialize the weights and biases for each layer. For simplicity, let's use random initialization for the weights and zeros for the biases.

In [43]:
# Set the number of units in each layer
input_size = 784  # 28x28 pixels
hidden_size = 128  # You can experiment with different sizes here
output_size = 10  # Digits 0-9

# Initialize weights and biases
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))

W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# Step 6: Forward Pass

Now, we will implement the forward pass, where we compute the activations for each layer.

In [45]:
# Forward pass
def forward(X):
    z1 = np.dot(X, W1) + b1
    a1 = sigmoid(z1)  # Hidden layer activation
    
    z2 = np.dot(a1, W2) + b2
    a2 = softmax(z2)  # Output layer activation
    
    return a1, a2

# Step 7: Backpropagation

The backpropagation algorithm will compute the gradients of the loss function with respect to the weights and biases and update them.

In [47]:
# Backpropagation
def backprop(X, y, a1, a2):
    # Output layer gradients (softmax + cross-entropy)
    delta2 = a2 - y  # Derivative of loss with respect to a2
    
    # Hidden layer gradients
    delta1 = np.dot(delta2, W2.T) * sigmoid_derivative(a1)  # Backprop through hidden layer
    
    # Gradients for weights and biases
    dW2 = np.dot(a1.T, delta2) / X.shape[0]
    db2 = np.sum(delta2, axis=0, keepdims=True) / X.shape[0]
    
    dW1 = np.dot(X.T, delta1) / X.shape[0]
    db1 = np.sum(delta1, axis=0, keepdims=True) / X.shape[0]
    
    return dW1, db1, dW2, db2

# Step 8: Training the Network

We will now train the network by iterating over the training data and updating the weights using gradient descent.

In [49]:
# Training function
def train(X_train, y_train, epochs=10, learning_rate=0.1, batch_size=64):
    global W1, b1, W2, b2
    
    num_samples = X_train.shape[0]
    num_batches = num_samples // batch_size
    
    for epoch in range(epochs):
        for i in range(num_batches):
            # Get the batch data
            batch_X = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = y_train[i*batch_size:(i+1)*batch_size]
            
            # Forward pass
            a1, a2 = forward(batch_X)
            
            # Backpropagation
            dW1, db1, dW2, db2 = backprop(batch_X, batch_y, a1, a2)
            
            # Update weights and biases using gradient descent
            W1 -= learning_rate * dW1
            b1 -= learning_rate * db1
            W2 -= learning_rate * dW2
            b2 -= learning_rate * db2
        
        # Print loss after each epoch
        _, a2 = forward(X_train)
        loss = cross_entropy_loss(a2, y_train)
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss:.4f}")

# Step 9: Evaluating the Model

Finally, we will test the model and calculate the classification accuracy.

In [51]:
def evaluate(X_test, y_test):
    # Perform forward pass
    _, a2 = forward(X_test)
    predictions = np.argmax(a2, axis=1)
    correct = np.sum(predictions == np.argmax(y_test, axis=1))
    incorrect = y_test.shape[0] - correct
    accuracy = correct / y_test.shape[0] * 100

    # Print results
    print("==== Results")
    print(f"Network properties:  Input: {input_size}, Hidden: {hidden_size}, Output: {output_size}")
    print(f"Correct classifications: {correct}")
    print(f"Incorrect classifications: {incorrect}")
    print(f"Accuracy: {accuracy:.2f}%")

    return accuracy

# Step 10: Putting It All Together

Now we can train the network and evaluate it.

In [53]:
# Train the model
train(X_train, y_train_one_hot, epochs=10, learning_rate=0.1, batch_size=64)

# Evaluate the model
evaluate(X_test, y_test_one_hot)

Epoch 1/10, Loss: 0.5075
Epoch 2/10, Loss: 0.3613
Epoch 3/10, Loss: 0.3202
Epoch 4/10, Loss: 0.3007
Epoch 5/10, Loss: 0.2892
Epoch 6/10, Loss: 0.2814
Epoch 7/10, Loss: 0.2757
Epoch 8/10, Loss: 0.2711
Epoch 9/10, Loss: 0.2674
Epoch 10/10, Loss: 0.2643
==== Results
Network properties:  Input: 784, Hidden: 128, Output: 10
Correct classifications: 9170
Incorrect classifications: 830
Accuracy: 91.70%


91.7