## CNN from scratch

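The aim of this project is to build a neural network entirely from scratch, using pandas for data handling and NumPy for the mathematical operations. Note that, despite the "CNN" name, the network described below is a small fully connected network with no convolutional layers.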

The training data used in this project is the MNIST dataset containing handwritten digits from 0 to 9.

After training and testing the model on the MNIST dataset, I will test the model on my own handwritten digits.

To process the custom images and feed them into the model, I will use OpenCV. 

### Conceptual Overview

The Neural Network will have a very simple two-layer architecture.

Consider A^[0] as the input layer; it is the input matrix X in the equations below. This layer will have 784 nodes/units, corresponding to the 784 pixels in each 28x28 image.

Consider A^[1] as the hidden layer. This layer will have 10 nodes/units with ReLU activation function.

Consider A^[2] as the output layer. This layer will have 10 nodes/units corresponding to the 10 digit classes we are trying to predict. This layer has a softmax activation function.
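
As a quick sanity check on the layer sizes above, the sketch below counts the trainable parameters implied by a 784-10-10 layout. The helper name `count_parameters` is made up for illustration and is not part of the notebook.

```python
def count_parameters(input_size=784, hidden_size=10, output_size=10):
    """Count weights and biases of a two-layer fully connected network."""
    # Layer 1: weight matrix (hidden x input) plus one bias per hidden unit
    layer1 = hidden_size * input_size + hidden_size
    # Layer 2: weight matrix (output x hidden) plus one bias per output unit
    layer2 = output_size * hidden_size + output_size
    return layer1, layer2, layer1 + layer2


layer1, layer2, total = count_parameters()
print(layer1, layer2, total)  # 7850 110 7960
```

Almost all of the parameters sit in the first weight matrix, because it connects every one of the 784 pixels to every hidden unit.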

Mathematically, this is what happens under the hood, where m is the number of training examples and Y is the one-hot encoded label matrix:

**Forward propagation**

$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
$$A^{[1]} = g_{\text{ReLU}}(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$
$$A^{[2]} = g_{\text{softmax}}(Z^{[2]})$$
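
To make the softmax step concrete, here is a small NumPy sketch on a made-up 3-class score matrix. It assumes the one-column-per-example layout used in the equations above; the input values are arbitrary, and the second column is deliberately large to show why subtracting the column maximum before exponentiating is useful.

```python
import numpy as np


def softmax(Z):
    """Column-wise softmax; subtracting the column max avoids overflow in exp."""
    shifted = Z - Z.max(axis=0, keepdims=True)
    exp_Z = np.exp(shifted)
    return exp_Z / exp_Z.sum(axis=0, keepdims=True)


# Two examples (columns), three classes (rows); values are arbitrary
Z = np.array([[1.0, 1000.0],
              [2.0, 1001.0],
              [3.0, 1002.0]])
A = softmax(Z)

print(A.sum(axis=0))     # each column sums to 1
print(A.argmax(axis=0))  # predicted class per column: [2 2]
```

The two columns differ only by a constant shift, so they produce the same probabilities. A naive `np.exp(Z)` on the second column would overflow.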

**Backward propagation**

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot g^{[1]\prime}(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$
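
One way to gain confidence in the backward-propagation formulas is a numerical gradient check. The sketch below is not part of the notebook: it builds a tiny network with made-up sizes (4 inputs, 3 hidden units, 3 classes, 5 examples), computes the gradients with the formulas above, and compares them with central finite differences of the mean cross-entropy loss. All function names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def forward(params, X):
    W1, b1, W2, b2 = params
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0)                      # ReLU
    Z2 = W2 @ A1 + b2
    Z2 = Z2 - Z2.max(axis=0, keepdims=True)     # stabilise softmax
    A2 = np.exp(Z2) / np.exp(Z2).sum(axis=0, keepdims=True)
    return Z1, A1, A2


def loss(params, X, Y):
    """Mean cross-entropy; Y is one-hot with shape (classes, m)."""
    A2 = forward(params, X)[2]
    return -np.sum(Y * np.log(A2)) / X.shape[1]


def analytic_grads(params, X, Y):
    """Gradients from the backward-propagation formulas."""
    W1, b1, W2, b2 = params
    m = X.shape[1]
    Z1, A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]


def numeric_grads(params, X, Y, eps=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = []
    for P in params:
        G = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            plus = loss(params, X, Y)
            P[idx] = old - eps
            minus = loss(params, X, Y)
            P[idx] = old                         # restore the original value
            G[idx] = (plus - minus) / (2 * eps)
        grads.append(G)
    return grads


# Toy problem with made-up sizes
n_in, n_hidden, n_out, m = 4, 3, 3, 5
params = [rng.normal(size=(n_hidden, n_in)), rng.normal(size=(n_hidden, 1)),
          rng.normal(size=(n_out, n_hidden)), rng.normal(size=(n_out, 1))]
X = rng.normal(size=(n_in, m))
labels = rng.integers(0, n_out, size=m)
Y = np.eye(n_out)[:, labels]                     # one-hot, shape (n_out, m)

pairs = zip(analytic_grads(params, X, Y), numeric_grads(params, X, Y))
max_diff = max(np.abs(a - n).max() for a, n in pairs)
print(f"largest gradient mismatch: {max_diff:.2e}")
```

A mismatch many orders of magnitude below the gradient values themselves suggests the formulas and the code agree. The check can be unreliable if a pre-activation happens to sit almost exactly at zero, where ReLU is not differentiable.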

**Parameter updates**

$$W^{[2]} := W^{[2]} - \alpha dW^{[2]}$$
$$b^{[2]} := b^{[2]} - \alpha db^{[2]}$$
$$W^{[1]} := W^{[1]} - \alpha dW^{[1]}$$
$$b^{[1]} := b^{[1]} - \alpha db^{[1]}$$

Here α is the learning rate. Repeating these three steps moves the parameters toward values that lower the cross-entropy loss, so the model's predictions gradually improve.



In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')

In [2]:
data.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Preprocess the Dataset

In [None]:
class DataPreprocessor:
    """Utility class for data preprocessing."""
    
    @staticmethod
    def prepare_data(data, test_size=1000):
        """
        Prepare and split data into train and test sets.
        
        Args:
            data: Raw data array
            test_size: Number of samples for test set
            
        Returns:
            tuple: (X_train, Y_train, X_test, Y_test)
        """
        data = np.array(data)
        m, n = data.shape
        
        # Shuffle data
        np.random.shuffle(data)
        
        # Split into test and train
        data_test = data[:test_size].T
        Y_test = data_test[0].astype(int)
        X_test = data_test[1:n].astype(float)
        
        data_train = data[test_size:m].T
        Y_train = data_train[0].astype(int)
        X_train = data_train[1:n].astype(float)
        
        # Normalize pixel values
        X_train = X_train / 255.0
        X_test = X_test / 255.0
        
        # Invert the pixel values so digits are dark (near 0) on a light (near 1) background
        X_train = 1.0 - X_train
        X_test = 1.0 - X_test

        return X_train, Y_train, X_test, Y_test
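
The transposes in `prepare_data` put one example in each column, which is the layout the equations in the overview expect. The toy sketch below illustrates that convention on a made-up 5-row array standing in for the CSV (first column is the label, the remaining columns are pixels). Shuffling and inversion are left out so the result is deterministic.

```python
import numpy as np

# Stand-in for the CSV: 5 samples, first column = label, 4 "pixel" columns
raw = np.array([
    [3,   0, 255, 128,  64],
    [7, 255,   0,   0,  32],
    [1,  10,  20,  30,  40],
    [0,   0,   0,   0,   0],
    [9, 255, 255, 255, 255],
])

test_size = 2
test = raw[:test_size].T     # shape (5, 2): row 0 holds labels, rows 1.. hold pixels
train = raw[test_size:].T    # shape (5, 3)

Y_test, X_test = test[0].astype(int), test[1:].astype(float) / 255.0
Y_train, X_train = train[0].astype(int), train[1:].astype(float) / 255.0

print(X_train.shape, Y_train.shape)  # (4, 3) (3,)
print(X_test.shape, Y_test.shape)    # (4, 2) (2,)
```

With the real data the same logic yields an `X_train` of shape (784, m - 1000) and a `Y_train` of length m - 1000, where m is the number of rows in the CSV.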

### Build the Neural Network

In [None]:
class CNN:
    """
    A simple 2-layer neural network for digit classification.
    """
    
    def __init__(self, input_size=784, hidden_size=10, output_size=10):
        """
        Initialize the neural network with specified layer sizes.
        
        Args:
            input_size (int): Number of input features (default: 784 for 28x28 images)
            hidden_size (int): Number of hidden layer neurons
            output_size (int): Number of output classes
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Initialize parameters
        self.W1, self.b1, self.W2, self.b2 = self._init_params()
        
    def _init_params(self):
        """Initialize weights and biases with random values."""
        W1 = np.random.rand(self.hidden_size, self.input_size) - 0.5
        b1 = np.random.rand(self.hidden_size, 1) - 0.5
        W2 = np.random.rand(self.output_size, self.hidden_size) - 0.5
        b2 = np.random.rand(self.output_size, 1) - 0.5
        return W1, b1, W2, b2
    
    @staticmethod
    def relu(Z):
        """ReLU activation function."""
        return np.maximum(Z, 0)
    
    @staticmethod
    def relu_derivative(Z):
        """Derivative of ReLU activation function."""
        return Z > 0
    
    @staticmethod
    def softmax(Z):
        """Softmax activation function."""
        # Subtract max for numerical stability
        Z_stable = Z - np.max(Z, axis=0, keepdims=True)
        exp_Z = np.exp(Z_stable)
        return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
    
    @staticmethod
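    def one_hot_encode(Y, num_classes=10):
        """Convert labels to one-hot encoding of shape (num_classes, m)."""
        # Use a fixed class count: Y.max() + 1 would give the wrong shape
        # whenever the largest label is missing from Y
        one_hot_Y = np.zeros((Y.size, num_classes))
        one_hot_Y[np.arange(Y.size), Y] = 1
        return one_hot_Y.T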
    
    def forward_propagation(self, X):
        """
        Perform forward propagation through the network.
        
        Args:
            X (np.array): Input data of shape (input_size, m)
            
        Returns:
            tuple: (Z1, A1, Z2, A2) - intermediate values and final output
        """
        Z1 = self.W1.dot(X) + self.b1
        A1 = self.relu(Z1)
        Z2 = self.W2.dot(A1) + self.b2
        A2 = self.softmax(Z2)
        return Z1, A1, Z2, A2
    
    def backward_propagation(self, Z1, A1, Z2, A2, X, Y):
        """
        Perform backward propagation to compute gradients.
        
        Args:
            Z1, A1, Z2, A2: Forward propagation outputs
            X: Input data
            Y: True labels
            
        Returns:
            tuple: Gradients (dW1, db1, dW2, db2)
        """
        m = X.shape[1]  # number of examples
        one_hot_Y = self.one_hot_encode(Y)
        
        # Backward propagation
        dZ2 = A2 - one_hot_Y
        dW2 = (1 / m) * dZ2.dot(A1.T)
        db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
        
        dZ1 = self.W2.T.dot(dZ2) * self.relu_derivative(Z1)
        dW1 = (1 / m) * dZ1.dot(X.T)
        db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
        
        return dW1, db1, dW2, db2
    
    def update_parameters(self, dW1, db1, dW2, db2, learning_rate):
        """Update network parameters using gradients."""
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
    
    def predict(self, X):
        """Make predictions on input data."""
        _, _, _, A2 = self.forward_propagation(X)
        return np.argmax(A2, axis=0)
    
    def compute_accuracy(self, predictions, Y):
        """Compute accuracy of predictions."""
        return np.sum(predictions == Y) / Y.size
    
    def compute_loss(self, A2, Y):
        """Compute cross-entropy loss."""
        m = Y.size
        one_hot_Y = self.one_hot_encode(Y)
        # Add small epsilon to prevent log(0)
        epsilon = 1e-15
        A2_clipped = np.clip(A2, epsilon, 1 - epsilon)
        loss = -np.sum(one_hot_Y * np.log(A2_clipped)) / m
        return loss
    
    def train(self, X, Y, learning_rate, epochs, print_every=10):
        """
        Train the neural network using gradient descent.
        
        Args:
            X: Training data of shape (input_size, m)
            Y: Training labels
            learning_rate: Learning rate for gradient descent
            epochs: Number of training iterations
            print_every: Print progress every n epochs
            
        Returns:
            dict: Training history with losses and accuracies
        """
        history = {'loss': [], 'accuracy': []}
        
        for epoch in range(epochs):
            # Forward propagation
            Z1, A1, Z2, A2 = self.forward_propagation(X)
            
            # Backward propagation
            dW1, db1, dW2, db2 = self.backward_propagation(Z1, A1, Z2, A2, X, Y)
            
            # Update parameters
            self.update_parameters(dW1, db1, dW2, db2, learning_rate)
            
            # Compute metrics
            if epoch % print_every == 0 or epoch == epochs - 1:
                loss = self.compute_loss(A2, Y)
                predictions = self.predict(X)
                accuracy = self.compute_accuracy(predictions, Y)
                
                history['loss'].append(loss)
                history['accuracy'].append(accuracy)
                
                
                print(f"Epoch {epoch:4d}: Loss = {loss:.4f}, Accuracy = {accuracy:.4f}")
        
        return history
    
    def evaluate(self, X, Y):
        """Evaluate the model on test data."""
        predictions = self.predict(X)
        accuracy = self.compute_accuracy(predictions, Y)
        _, _, _, A2 = self.forward_propagation(X)
        loss = self.compute_loss(A2, Y)
        return {'accuracy': accuracy, 'loss': loss, 'predictions': predictions}

### Training

In [None]:
preprocessor = DataPreprocessor()
X_train, Y_train, X_test, Y_test = preprocessor.prepare_data(data)
    
# Create and train model
model = CNN(input_size=784, hidden_size=10, output_size=10)
history = model.train(X_train, Y_train, learning_rate=0.05, epochs=1000)

Epoch    0: Loss = 3.8290, Accuracy = 0.1412
Epoch   10: Loss = 2.0560, Accuracy = 0.2800
Epoch   20: Loss = 1.8058, Accuracy = 0.3897
Epoch   30: Loss = 1.6128, Accuracy = 0.4695
Epoch   40: Loss = 1.4512, Accuracy = 0.5370
Epoch   50: Loss = 1.3164, Accuracy = 0.5873
Epoch   60: Loss = 1.2054, Accuracy = 0.6247
Epoch   70: Loss = 1.1148, Accuracy = 0.6532
Epoch   80: Loss = 1.0408, Accuracy = 0.6767
Epoch   90: Loss = 0.9798, Accuracy = 0.6949
Epoch  100: Loss = 0.9289, Accuracy = 0.7100
Epoch  110: Loss = 0.8859, Accuracy = 0.7233
Epoch  120: Loss = 0.8493, Accuracy = 0.7348
Epoch  130: Loss = 0.8177, Accuracy = 0.7450
Epoch  140: Loss = 0.7899, Accuracy = 0.7539
Epoch  150: Loss = 0.7655, Accuracy = 0.7616
Epoch  160: Loss = 0.7437, Accuracy = 0.7681
Epoch  170: Loss = 0.7243, Accuracy = 0.7739
Epoch  180: Loss = 0.7068, Accuracy = 0.7800
Epoch  190: Loss = 0.6909, Accuracy = 0.7848
Epoch  200: Loss = 0.6764, Accuracy = 0.7899
Epoch  210: Loss = 0.6632, Accuracy = 0.7945
Epoch  220
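
The `history` dict returned by `train` only stores metrics at the logged epochs, so plotting it needs the matching epoch indices. The sketch below is one possible way to do that. `logged_epochs` and `plot_history` are hypothetical helpers, not part of the notebook, and the plotting function assumes matplotlib is available (it is already imported in the first cell).

```python
def logged_epochs(epochs, print_every=10):
    """Epoch indices at which `train` records metrics:
    every `print_every` epochs, plus the final epoch."""
    return [e for e in range(epochs) if e % print_every == 0 or e == epochs - 1]


def plot_history(history, epochs, print_every=10):
    """Plot loss and accuracy curves from the dict returned by `train`."""
    # Imported here so that `logged_epochs` has no third-party dependency
    from matplotlib import pyplot as plt

    x = logged_epochs(epochs, print_every)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(x, history['loss'])
    ax_loss.set_xlabel('epoch')
    ax_loss.set_ylabel('cross-entropy loss')

    ax_acc.plot(x, history['accuracy'])
    ax_acc.set_xlabel('epoch')
    ax_acc.set_ylabel('training accuracy')

    fig.tight_layout()
    return fig
```

For the run above this would be called as `plot_history(history, epochs=1000)`, followed by `plt.show()` outside a notebook.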

### Testing

In [6]:
# Evaluate on test set
test_results = model.evaluate(X_test, Y_test)
print(f"Test Accuracy: {test_results['accuracy']:.4f}")

Test Accuracy: 0.8340
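
A single accuracy figure does not show which digits get confused with each other. Since `evaluate` also returns the raw predictions, a confusion matrix can be built from them. The sketch below is a minimal NumPy version run on made-up labels; `confusion_matrix` and `per_class_accuracy` are illustrative helpers, not part of the notebook.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Entry (i, j) counts examples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an (i, j) pair repeats
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly (NaN for absent classes)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.full(len(cm), np.nan), where=totals > 0)


# Made-up labels for a 3-class toy problem
y_true = np.array([0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 2, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(per_class_accuracy(cm))
```

With the objects from the cell above, the equivalent call would be `confusion_matrix(Y_test, test_results['predictions'])`.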


### Processing Custom Images

In [None]:
import time

import cv2


def capture_image(webcam_id=0):
    """Capture a single frame from the webcam and return it as an RGB array."""
    # Initialize the webcam
    print("Initialising video capture")
    cap = cv2.VideoCapture(webcam_id)
    time.sleep(1.0)  # give the camera a moment to warm up

    # Check if the webcam is opened correctly
    if not cap.isOpened():
        raise RuntimeError("Could not open webcam.")

    try:
        # Capture a single frame
        print("Capturing frame")
        ret, frame = cap.read()
    finally:
        # Release the webcam even if reading the frame fails
        print("Releasing webcam")
        cap.release()

    # Check if the frame was captured correctly
    if not ret:
        raise RuntimeError("Could not read frame.")

    # Convert from BGR to RGB
    return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
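
Before a captured frame can be passed to `model.predict`, it has to be reduced to the same 784 x 1 layout and value range as the training data. The sketch below shows one possible way to do this. It is an assumption about the missing step, not the notebook's own code, and `frame_to_input` is a hypothetical name. To keep it runnable without a camera or OpenCV, it uses plain NumPy channel averaging and block averaging. With OpenCV, `cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)` followed by `cv2.resize(gray, (28, 28), interpolation=cv2.INTER_AREA)` would play the same role.

```python
import numpy as np


def frame_to_input(frame_rgb, size=28):
    """Turn an RGB frame (H, W, 3) with values 0-255 into a
    (size*size, 1) column with values in [0, 1]."""
    # Simple grayscale: average the three colour channels
    gray = frame_rgb.astype(float).mean(axis=2)

    # Centre-crop to a square whose side is a multiple of `size`
    side = (min(gray.shape) // size) * size
    if side == 0:
        raise ValueError("frame is smaller than the target size")
    top = (gray.shape[0] - side) // 2
    left = (gray.shape[1] - side) // 2
    square = gray[top:top + side, left:left + side]

    # Downsample by averaging each (block x block) tile into one pixel
    block = side // size
    small = square.reshape(size, block, size, block).mean(axis=(1, 3))

    # Scale to [0, 1] and flatten row by row into a single column
    return (small / 255.0).reshape(size * size, 1)


# Synthetic all-white 480x640 frame standing in for a webcam capture
white = np.full((480, 640, 3), 255, dtype=np.uint8)
x = frame_to_input(white)
print(x.shape)  # (784, 1)
```

Because the training pixels were inverted during preprocessing (light background, dark digit), a photo of dark ink on white paper should already have the right polarity after scaling, so no extra inversion is applied here. A prediction would then look like `model.predict(frame_to_input(capture_image()))`. Real photos will likely also need cropping around the digit and some contrast adjustment, which this sketch does not attempt.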