## CNN from scratch

The aim of this project is to build a neural network for digit classification entirely from scratch, using Pandas (for data manipulation) and NumPy (for mathematical operations).

The model is trained on the MNIST dataset of handwritten digits from 0 to 9.

After training and evaluating the model on MNIST, I will also test it on images of my own handwritten digits.

To preprocess these custom images before feeding them into the model, I will use OpenCV.
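One way the OpenCV step could look is sketched below. It assumes each custom image is a photo or scan of dark ink on light paper, and that OpenCV is only used to load and shrink it (for example `cv2.imread` with `cv2.IMREAD_GRAYSCALE`, then `cv2.resize` to 28x28). The helper `prepare_custom_image` is hypothetical and works on a plain NumPy array, so it runs without OpenCV installed. Because the MNIST training pixels are inverted during preprocessing (strokes near 0, background near 1), dark-ink-on-white images only need to be scaled to [0, 1], not inverted again. Real photos usually also need cropping, centering and contrast adjustment, which this sketch omits.

```python
import numpy as np


def prepare_custom_image(gray):
    """Hypothetical helper: 28x28 uint8 grayscale image -> (784, 1) column.

    With OpenCV, `gray` could be obtained, for example, via:
        img = cv2.imread("digit.png", cv2.IMREAD_GRAYSCALE)
        gray = cv2.resize(img, (28, 28), interpolation=cv2.INTER_AREA)
    """
    if gray.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {gray.shape}")
    # Scale to [0, 1]: ink stays near 0 and paper near 1, which already matches
    # the inverted MNIST training data, so no extra inversion is applied here
    x = gray.astype(float) / 255.0
    # Flatten row by row (the pixel0..pixel783 order) into a single column
    return x.reshape(784, 1)


# Synthetic example: white "paper" with a dark vertical stroke
demo = np.full((28, 28), 255, dtype=np.uint8)
demo[4:24, 13:15] = 0
x = prepare_custom_image(demo)
print(x.shape, x.min(), x.max())
```

The result has the same (784, 1) column layout as one example of `X_test`, so it could be passed straight to `model.predict`.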

### Conceptual Overview

The network has a very simple two-layer, fully connected architecture; despite the project's name, it contains no convolutional layers.

Let $A^{[0]}$ denote the input layer. It has 784 nodes (units), one for each pixel of a $28 \times 28$ image.

Let $A^{[1]}$ denote the hidden layer. It has 10 units with a ReLU activation function.

Let $A^{[2]}$ denote the output layer. It has 10 units, one for each of the 10 digit classes we are trying to predict, with a softmax activation function.
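As a quick sanity check on these layer sizes, the sketch below builds parameter arrays with the shapes that $Z = WA + b$ requires (one example per column) and counts the trainable parameters. The dictionary and variable names are illustrative only.

```python
import numpy as np

input_size, hidden_size, output_size = 784, 10, 10

# One weight matrix and one bias column per layer; shapes follow Z = W A + b
shapes = {
    "W1": (hidden_size, input_size),   # 10 x 784
    "b1": (hidden_size, 1),            # 10 x 1
    "W2": (output_size, hidden_size),  # 10 x 10
    "b2": (output_size, 1),            # 10 x 1
}
params = {name: np.zeros(shape) for name, shape in shapes.items()}

total = sum(p.size for p in params.values())
print(total)  # 7840 + 10 + 100 + 10 = 7960
```

Almost all of the 7,960 parameters sit in $W^{[1]}$, which connects every pixel to every hidden unit.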

Mathematically, this is what happens under the hood, with each column of $X$ holding one flattened image (so $X$ has shape $784 \times m$ for $m$ examples):

**Forward propagation**

$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
$$A^{[1]} = g_{\text{ReLU}}(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$
$$A^{[2]} = g_{\text{softmax}}(Z^{[2]})$$
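The forward equations translate almost line for line into NumPy. The sketch below runs them on a small random batch (random inputs and weights, assumed purely for illustration) and checks that every column of $A^{[2]}$ is a probability distribution. Subtracting the column maximum before exponentiating does not change the softmax result, but it keeps `np.exp` from overflowing for large scores.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5                                   # batch of 5 examples, one per column
X = rng.random((784, m))                # stand-in for pixel data in [0, 1]
W1, b1 = rng.uniform(-0.5, 0.5, (10, 784)), np.zeros((10, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (10, 10)), np.zeros((10, 1))


def softmax(Z):
    # Subtracting the per-column max leaves the result unchanged but avoids overflow
    Z_stable = Z - Z.max(axis=0, keepdims=True)
    exp_Z = np.exp(Z_stable)
    return exp_Z / exp_Z.sum(axis=0, keepdims=True)


Z1 = W1 @ X + b1            # (10, m)
A1 = np.maximum(Z1, 0)      # ReLU
Z2 = W2 @ A1 + b2           # (10, m)
A2 = softmax(Z2)            # each column sums to 1

print(A2.shape, np.allclose(A2.sum(axis=0), 1.0))

# Without the shift, np.exp(1000.0) would overflow to inf; with it the result is finite
print(softmax(np.array([[1000.0], [0.0]])).ravel())
```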

**Backward propagation**

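$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot g'_{\text{ReLU}}(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$

Here $Y$ is the $10 \times m$ one-hot encoded label matrix, $m$ is the number of training examples, $A^{[0]} = X$, $dZ^{(i)}$ denotes the $i$-th column of $dZ$, and $\odot$ denotes element-wise multiplication.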
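A finite-difference gradient check is a common way to confirm backpropagation formulas like these. The sketch below (a tiny network with random data and a fixed seed, assumed only for illustration) computes $dW^{[2]}$ and $dW^{[1]}$ with the equations above and compares one entry of each against a central-difference estimate of the average cross-entropy loss. The helper names `forward`, `loss` and `numeric_grad` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out, m = 6, 4, 3, 5
X = rng.random((n_in, m))
labels = rng.integers(0, n_out, size=m)
Y = np.eye(n_out)[:, labels]               # one-hot, shape (n_out, m)
W1, b1 = rng.normal(size=(n_hidden, n_in)), rng.normal(size=(n_hidden, 1))
W2, b2 = rng.normal(size=(n_out, n_hidden)), rng.normal(size=(n_out, 1))


def forward(W1, b1, W2, b2):
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0)
    Z2 = W2 @ A1 + b2
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, E / E.sum(axis=0, keepdims=True)


def loss(W1, b1, W2, b2):
    # Average cross-entropy over the m examples
    _, _, A2 = forward(W1, b1, W2, b2)
    return -np.sum(Y * np.log(A2)) / m


# Analytic gradients, following the backward-propagation equations
Z1, A1, A2 = forward(W1, b1, W2, b2)
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)
dW1 = dZ1 @ X.T / m


def numeric_grad(W, i, j, eps=1e-6):
    # Central difference on one weight; W is perturbed in place, then restored
    old = W[i, j]
    W[i, j] = old + eps
    plus = loss(W1, b1, W2, b2)
    W[i, j] = old - eps
    minus = loss(W1, b1, W2, b2)
    W[i, j] = old
    return (plus - minus) / (2 * eps)


# Both differences should be very close to 0
print(abs(dW2[0, 1] - numeric_grad(W2, 0, 1)))
print(abs(dW1[2, 3] - numeric_grad(W1, 2, 3)))
```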

**Parameter updates**

$$W^{[2]} := W^{[2]} - \alpha dW^{[2]}$$
$$b^{[2]} := b^{[2]} - \alpha db^{[2]}$$
$$W^{[1]} := W^{[1]} - \alpha dW^{[1]}$$
$$b^{[1]} := b^{[1]} - \alpha db^{[1]}$$

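Here $\alpha$ is the learning rate. Repeating forward propagation, backward propagation and the parameter updates (gradient descent) moves the parameters toward values that minimize the cross-entropy loss, so the model's predictions gradually improve.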
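Putting the three steps together gives plain full-batch gradient descent. The sketch below trains the same two-layer architecture on a tiny synthetic problem (random inputs whose label is the index of the largest of the first three features, an assumption made only so the example is self-contained) and prints the loss before and after training. The learning rate and iteration count are illustrative, not tuned.

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_hidden, n_out, m = 8, 10, 3, 200
X = rng.random((n_in, m))
labels = np.argmax(X[:n_out], axis=0)        # synthetic labels 0..2
Y = np.eye(n_out)[:, labels]                 # one-hot, shape (n_out, m)

W1, b1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in)), np.zeros((n_hidden, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden)), np.zeros((n_out, 1))
alpha = 0.1                                  # learning rate


def forward():
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0)
    Z2 = W2 @ A1 + b2
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, E / E.sum(axis=0, keepdims=True)


def cross_entropy(A2):
    return -np.sum(Y * np.log(A2)) / m


initial_loss = cross_entropy(forward()[2])
for _ in range(1000):
    Z1, A1, A2 = forward()                   # forward propagation
    dZ2 = A2 - Y                             # backward propagation
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    W2 -= alpha * dW2                        # parameter updates (in place)
    b2 -= alpha * db2
    W1 -= alpha * dW1
    b1 -= alpha * db1

final_loss = cross_entropy(forward()[2])
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```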



```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

data_train = pd.read_csv('train.csv')
data_test = pd.read_csv('test.csv')
```

```python
data_train.head()
```

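| | label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |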


### Preprocess the Dataset

```python
class DataPreprocessor:
    """Utility class for data preprocessing."""

    @staticmethod
    def prepare_data(data, test_size=1000):
        """
        Shuffle labelled data and split it into training and validation sets.

        Args:
            data: Labelled data (DataFrame or array); column 0 is the label,
                the remaining columns are the 784 pixel values
            test_size: Number of samples held out for the validation set

        Returns:
            tuple: (X_train, Y_train, X_val, Y_val), where each X has shape (784, m)
        """
        # Convert to a NumPy array of shape (m, n)
        data = np.array(data)
        m, n = data.shape

        # Shuffle the rows in place
        np.random.shuffle(data)

        # Split into validation and training sets; transpose so each column is one example
        data_val = data[:test_size].T
        Y_val = data_val[0].astype(int)
        X_val = data_val[1:n].astype(float)

        data_train_split = data[test_size:m].T
        Y_train = data_train_split[0].astype(int)
        X_train = data_train_split[1:n].astype(float)

        # Normalize pixel values from [0, 255] to [0, 1]
        X_train = X_train / 255.0
        X_val = X_val / 255.0

        # Invert the pixel values: strokes become dark (near 0) on a light (near 1) background
        X_train = 1.0 - X_train
        X_val = 1.0 - X_val

        return X_train, Y_train, X_val, Y_val

    @staticmethod
    def prepare_test_data(data_test):
        """
        Process unlabelled test data.

        Args:
            data_test: Test data (DataFrame or array) with pixel columns only, no labels

        Returns:
            np.ndarray: X_test of shape (784, m), preprocessed like the training data
        """
        # Convert to a NumPy array; every column is a pixel, so transpose to (784, m)
        X_test = np.array(data_test).T.astype(float)

        # Apply the same normalization and inversion as the training data
        X_test = X_test / 255.0
        X_test = 1.0 - X_test

        return X_test
```
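`np.random.shuffle` draws from NumPy's global random state, so the train/validation split (and therefore the reported accuracies) changes from run to run. A reproducible variant could use a seeded `np.random.default_rng` generator instead. The sketch below assumes the same column layout (label first, then pixels) and the same scaling and inversion; `split_with_seed` is a hypothetical helper, demonstrated on a small synthetic array rather than the real CSV.

```python
import numpy as np


def split_with_seed(data, test_size=1000, seed=0):
    """Hypothetical reproducible version of the split in prepare_data."""
    data = np.array(data)
    rng = np.random.default_rng(seed)
    data = data[rng.permutation(len(data))]  # shuffled copy; the input is untouched
    val, train = data[:test_size].T, data[test_size:].T

    def pixels(block):
        # Rows 1..n are pixels: scale to [0, 1], then invert as in prepare_data
        return 1.0 - block[1:].astype(float) / 255.0

    # Row 0 holds the labels
    return pixels(train), train[0].astype(int), pixels(val), val[0].astype(int)


# Synthetic stand-in for train.csv: 50 rows of (label, 784 pixels)
rng = np.random.default_rng(123)
fake = np.column_stack([rng.integers(0, 10, 50), rng.integers(0, 256, (50, 784))])

X_tr, Y_tr, X_va, Y_va = split_with_seed(fake, test_size=10, seed=7)
X_tr2, Y_tr2, X_va2, Y_va2 = split_with_seed(fake, test_size=10, seed=7)
print(X_tr.shape, X_va.shape, np.array_equal(Y_va, Y_va2))
```

Note that the weight initialization in the model also uses the global random state, so fully repeatable runs would need a seed there as well.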

### Build the Neural Network

```python
class CNN:
    """
    A simple 2-layer, fully connected neural network for digit classification.
    (Despite the class name, it has no convolutional layers.)
    """

    def __init__(self, input_size=784, hidden_size=10, output_size=10):
        """
        Initialize the neural network with specified layer sizes.

        Args:
            input_size (int): Number of input features (default: 784 for 28x28 images)
            hidden_size (int): Number of hidden layer neurons
            output_size (int): Number of output classes
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize parameters
        self.W1, self.b1, self.W2, self.b2 = self._init_params()

    def _init_params(self):
        """Initialize weights and biases uniformly in [-0.5, 0.5)."""
        W1 = np.random.rand(self.hidden_size, self.input_size) - 0.5
        b1 = np.random.rand(self.hidden_size, 1) - 0.5
        W2 = np.random.rand(self.output_size, self.hidden_size) - 0.5
        b2 = np.random.rand(self.output_size, 1) - 0.5
        return W1, b1, W2, b2

    @staticmethod
    def relu(Z):
        """ReLU activation function."""
        return np.maximum(Z, 0)

    @staticmethod
    def relu_derivative(Z):
        """Derivative of ReLU: a boolean mask that is True where Z > 0."""
        return Z > 0

    @staticmethod
    def softmax(Z):
        """Softmax activation function, applied column by column."""
        # Subtract max for numerical stability
        Z_stable = Z - np.max(Z, axis=0, keepdims=True)
        exp_Z = np.exp(Z_stable)
        return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

    @staticmethod
    def one_hot_encode(Y, num_classes=10):
        """Convert integer labels to a one-hot matrix of shape (num_classes, m)."""
        # Use a fixed number of classes: Y.max() + 1 would give too few rows
        # whenever the largest digit is missing from Y (e.g. in a small batch)
        one_hot_Y = np.zeros((Y.size, num_classes))
        one_hot_Y[np.arange(Y.size), Y] = 1
        return one_hot_Y.T

    def forward_propagation(self, X):
        """
        Perform forward propagation through the network.

        Args:
            X (np.array): Input data of shape (input_size, m)

        Returns:
            tuple: (Z1, A1, Z2, A2) - intermediate values and final output
        """
        Z1 = self.W1.dot(X) + self.b1
        A1 = self.relu(Z1)
        Z2 = self.W2.dot(A1) + self.b2
        A2 = self.softmax(Z2)
        return Z1, A1, Z2, A2

    def backward_propagation(self, Z1, A1, Z2, A2, X, Y):
        """
        Perform backward propagation to compute gradients.

        Args:
            Z1, A1, Z2, A2: Forward propagation outputs
            X: Input data
            Y: True labels

        Returns:
            tuple: Gradients (dW1, db1, dW2, db2)
        """
        m = X.shape[1]  # number of examples
        one_hot_Y = self.one_hot_encode(Y, self.output_size)

        # Backward propagation
        dZ2 = A2 - one_hot_Y
        dW2 = (1 / m) * dZ2.dot(A1.T)
        db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

        dZ1 = self.W2.T.dot(dZ2) * self.relu_derivative(Z1)
        dW1 = (1 / m) * dZ1.dot(X.T)
        db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

        return dW1, db1, dW2, db2

    def update_parameters(self, dW1, db1, dW2, db2, learning_rate):
        """Update network parameters using gradients."""
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2

    def predict(self, X):
        """Make predictions on input data."""
        _, _, _, A2 = self.forward_propagation(X)
        return np.argmax(A2, axis=0)

    def compute_accuracy(self, predictions, Y):
        """Compute accuracy of predictions."""
        return np.sum(predictions == Y) / Y.size

    def compute_loss(self, A2, Y):
        """Compute cross-entropy loss."""
        m = Y.size
        one_hot_Y = self.one_hot_encode(Y, self.output_size)
        # Clip probabilities away from 0 to prevent log(0)
        epsilon = 1e-15
        A2_clipped = np.clip(A2, epsilon, 1 - epsilon)
        loss = -np.sum(one_hot_Y * np.log(A2_clipped)) / m
        return loss

    def train(self, X, Y, learning_rate, epochs, print_every=10):
        """
        Train the neural network using full-batch gradient descent.

        Args:
            X: Training data of shape (input_size, m)
            Y: Training labels
            learning_rate: Learning rate for gradient descent
            epochs: Number of training iterations
            print_every: Print progress every n epochs

        Returns:
            dict: Training history with losses and accuracies
        """
        history = {'loss': [], 'accuracy': []}

        for epoch in range(epochs):
            # Forward propagation
            Z1, A1, Z2, A2 = self.forward_propagation(X)

            # Backward propagation
            dW1, db1, dW2, db2 = self.backward_propagation(Z1, A1, Z2, A2, X, Y)

            # Update parameters
            self.update_parameters(dW1, db1, dW2, db2, learning_rate)

            # Compute metrics
            if epoch % print_every == 0 or epoch == epochs - 1:
                loss = self.compute_loss(A2, Y)
                predictions = self.predict(X)
                accuracy = self.compute_accuracy(predictions, Y)

                history['loss'].append(loss)
                history['accuracy'].append(accuracy)

                print(f"Epoch {epoch:4d}: Loss = {loss:.4f}, Accuracy = {accuracy:.4f}")

        return history

    def evaluate(self, X, Y):
        """Evaluate the model on labelled data."""
        predictions = self.predict(X)
        accuracy = self.compute_accuracy(predictions, Y)
        _, _, _, A2 = self.forward_propagation(X)
        loss = self.compute_loss(A2, Y)
        return {'accuracy': accuracy, 'loss': loss, 'predictions': predictions}
```
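Sizing a one-hot matrix from the data (`Y.max() + 1`) rather than from the number of classes breaks as soon as the largest digit is missing from `Y`, for example when scoring a small batch: the matrix gets fewer than 10 rows and `A2 - one_hot_Y` can no longer broadcast. The standalone illustration below assumes 10 classes; both function names are hypothetical.

```python
import numpy as np


def one_hot_from_data(Y):
    # Width depends on the labels actually present in Y
    one_hot = np.zeros((Y.size, Y.max() + 1))
    one_hot[np.arange(Y.size), Y] = 1
    return one_hot.T


def one_hot_fixed(Y, num_classes=10):
    # Width is always num_classes, matching the 10 output units
    one_hot = np.zeros((Y.size, num_classes))
    one_hot[np.arange(Y.size), Y] = 1
    return one_hot.T


Y_batch = np.array([3, 0, 7, 3])             # no 8 or 9 in this small batch
A2 = np.full((10, Y_batch.size), 0.1)        # dummy softmax output for 10 classes

print(one_hot_from_data(Y_batch).shape)      # (8, 4): cannot be subtracted from A2
print(one_hot_fixed(Y_batch).shape)          # (10, 4)
print((A2 - one_hot_fixed(Y_batch)).shape)   # (10, 4)
```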

### Training

```python
preprocessor = DataPreprocessor()
X_train, Y_train, X_val, Y_val = preprocessor.prepare_data(data_train)

# Create and train the model
model = CNN(input_size=784, hidden_size=10, output_size=10)
history = model.train(X_train, Y_train, learning_rate=0.04, epochs=1500)
```

```
Epoch    0: Loss = 11.8556, Accuracy = 0.1188
Epoch   10: Loss = 2.3376, Accuracy = 0.1169
Epoch   20: Loss = 2.3191, Accuracy = 0.1183
Epoch   30: Loss = 2.3037, Accuracy = 0.1232
Epoch   40: Loss = 2.2863, Accuracy = 0.1406
Epoch   50: Loss = 2.2658, Accuracy = 0.1538
Epoch   60: Loss = 2.2425, Accuracy = 0.1643
Epoch   70: Loss = 2.2178, Accuracy = 0.1720
Epoch   80: Loss = 2.1930, Accuracy = 0.1766
Epoch   90: Loss = 2.1687, Accuracy = 0.1813
Epoch  100: Loss = 2.1453, Accuracy = 0.1860
Epoch  110: Loss = 2.1227, Accuracy = 0.1900
Epoch  120: Loss = 2.1009, Accuracy = 0.1933
Epoch  130: Loss = 2.0797, Accuracy = 0.1967
Epoch  140: Loss = 2.0591, Accuracy = 0.2000
Epoch  150: Loss = 2.0389, Accuracy = 0.2026
Epoch  160: Loss = 2.0192, Accuracy = 0.2056
Epoch  170: Loss = 1.9998, Accuracy = 0.2519
Epoch  180: Loss = 1.9809, Accuracy = 0.2567
Epoch  190: Loss = 1.9626, Accuracy = 0.2598
Epoch  200: Loss = 1.9450, Accuracy = 0.2632
Epoch  210: Loss = 1.9280, Accuracy = 0.2693
Epoch  22
```

### Testing on Validation Set

```python
# Evaluate on the validation set
val_results = model.evaluate(X_val, Y_val)
print(f"Validation Accuracy: {val_results['accuracy']:.4f}")
```

```
Validation Accuracy: 0.7140
```
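Overall accuracy hides which digits are being confused with each other. A confusion matrix counts how often true label $i$ is predicted as $j$. The sketch below uses a hypothetical NumPy-only helper (not scikit-learn's function of the same name) on a small hand-written example so that it runs on its own; with the notebook's objects it could instead be filled from `model.predict(X_val)` and `Y_val`.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered, so repeated pairs all count
    return cm


y_true = np.array([0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 2, 2, 2, 1])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# Per-class recall: correct predictions divided by how often each class occurs
print(np.diag(cm) / cm.sum(axis=1))
```

Large off-diagonal entries point to the digit pairs the model mixes up most often.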


### Predicting on the Test Set

The test set has no labels, so here the model can only generate predictions.

```python
X_test = preprocessor.prepare_test_data(data_test)
predictions = model.predict(X_test)
```

```python
print(predictions[:10])
```

```
[2 0 4 7 2 7 0 3 0 3]
```
