# Convolutional Neural Network: Architecture, Implementation, and Results
## Objective
To design, implement, and evaluate a convolutional neural network (CNN) for a classification task. The report includes:
- Description of the architecture.
- Implementation details.
- Results: Test accuracy and loss.


In [1]:
import numpy as np
from keras.datasets import mnist # mnist.load_data()
from PIL import Image


2024-12-14 10:07:00.260375: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Loading/Preparing Dataset
Before training the neural network, we need to load the data and prepare it:
- Scale the input [0, 1]
- Convert output targets into one-hot encoding.
- Print results for clarity.


In [2]:
# Load the MNIST dataset and preprocess it by normalizing the pixel values.
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Check shapes
print("Training data shape:", X_train.shape, y_train.shape)
print("Test data shape:", X_test.shape, y_test.shape)

# Normalize the pixel values to the range [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

#initializing a list for one-hot encoded target arrayas
y_one_hot_train = []
y_one_hot_test = []

#iterating through each target in y
for target in y_train:
    one_hot = np.zeros(10)  #array of 10 zeross
    one_hot[target] = 1  
    y_one_hot_train.append(one_hot)  # Append to the list

for target in y_test:
    one_hot = np.zeros(10)  #array of 10 zeross
    one_hot[target] = 1  
    y_one_hot_test.append(one_hot)  # Append to the list

#print the first 5 one-hot encoded targets to verify
print(y_train[:5])
print(y_one_hot_train[:5])
print(y_test[:5])
print(y_one_hot_test[:5])

Training data shape: (60000, 28, 28) (60000,)
Test data shape: (10000, 28, 28) (10000,)
[5 0 4 1 9]
[array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]), array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]), array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1.])]
[7 2 1 0 4]
[array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]), array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]), array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]), array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])]


## Network Architecture and Implementation
### Network Architecture
The CNN consists of the following layers:
1. **Convolutional Layer 1**:
   - Kernel size: \(3 * 3\) * 2
   - Activation: Sigmoid
   - Output: \(26 * 26\) * 2

2. **Average Pooling Layers**:
   - Pool size: \(2 * 2\) *2
   - Output: \(13 * 13\) * 2

3. **1 by 1 Convolution Layer**:
   - Flattened input size: 338
   - Output size: Number of classes (e.g., 10 for MNIST).

4. **Output Layer**:
   - Activation: Softmax
   - Outputs class probabilities.

### Implementation
To implement the netwrok without using extra libraries, a convolution function, a forward propagation function, an average pooling functon and a backward propagation function were written:
1. **Convolve**: Written to relplace convolve2D function, this function recieves an input image (or feature map in other cases) and does a dot product between kernel weights and image, normalized pixel values, based on the stride and kernel sizes specified. It returns the resulting products.
2. **Average Pooling**: Written to replace avgPool function, this function does the same function of the convolve function of iterating the input feature map, but instead of returning the dot product of the kernel and a portion of the image, it returns an average value of the region in the feature map that it parses.
3. **Feed Forward**: This function implements the motion of an image through the CNN described above with whatever current learnable weights are in the kernels. It takes the input image, convolves it, applies a sigmoid to the convolution, then an avg pooling. It then flattens the output and puts it through 10 different 1*1 convolution kernels to get 10 different outputs. The outputs are passed to the softmax function to return probabilities which are then used to make a rrprediciton
4. **Back Propogate**: This function is used to update the weights after every feed forward run based on cross entropy loss. It updates the weights f the 1by1 convolutions by findnig the derivative of the cross entropy loss function with respect to the softmax output to the raw logit score and finally to the 1by1 kernel weights. It further propogates backwards through the flattened output, to the pooled feature map, to the sigmoid derivative to the 3by3 kernels upsampling the data when it needs to replacing the con2dTranspose function.

In [9]:
# Implement the CNN architecture described above using only basic Python libraries such as NumPy. 
class CNN:
    def __init__(self, learning_rate, epochs):
        self.lr=learning_rate
        self.epochs = epochs

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))  # Stability adjustment
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)
    
    def sigmoidDerivative(self, x):
        s = self.sigmoid(x)
        return s * (1 - s)  # Derivative: sigmoid(x) * (1 - sigmoid(x))
    
    def convolve(self, image, kernel, stride=1):
        kernel_height, kernel_width = kernel.shape #3 by 3
        image_height, image_width = image.shape # 28 by 28
    
        output_height = (image_height - kernel_height) // stride + 1 #26
        output_width = (image_width - kernel_width) // stride + 1 #26
    
        output = np.zeros((output_height, output_width))
        
        for i in range(output_height):
            for j in range(output_width):
                region = image[i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width] #selects part of image
                output[i, j] = np.sum(region * kernel) #dot product with kernel
        return output
    
    def average_pooling(self, featuremap, pool_size=2, stride=2):
        map_height, map_width = featuremap.shape
        output_height = (map_height - pool_size) // stride + 1
        output_width = (map_width - pool_size) // stride + 1
    
        output = np.zeros((output_height, output_width))
    
        for i in range(0, output_height):
            for j in range(0, output_width):
                region = featuremap[i * stride:i * stride + pool_size, j * stride:j * stride + pool_size]
                output[i, j] = np.mean(region)
        return output
    
    def flatten(self, pool1, pool2):
        conv_output = np.stack((pool1, pool2), axis=0)
        return conv_output.flatten()
    
    def feedForward(self, X): # Implement the forward and backward propagation for the network.
        
        # First convolutional layer feature maps after activation
        self.conv1 = self.sigmoid(self.convolve(X, self.kernel1, stride=1)) # 26 by 26
        self.conv2 = self.sigmoid(self.convolve(X, self.kernel2, stride=1)) #26 by 26
    
        # Average pooling layer
        self.pool1 = self.average_pooling(self.conv1, 2, 2) #13 by 13
        self.pool2 = self.average_pooling(self.conv2, 2, 2) #13 by 13

        # Flatten layer
        self.flattened_output = self.flatten(self.pool1, self.pool2)

        # 1x1 convolution (fully connected layer equivalent)
        fc_output = np.dot(self.flattened_output, self.fc_weights)  # Final concolution
    
        # Softmax activation
        predictions = self.softmax(fc_output.reshape(1, -1))
    
        return predictions

    
    def backPropagate(self, X, y, predictions):#Use cross entropy as the error function.  
       
        # Compute the loss gradient (Softmax and Cross-Entropy)
        delta_fc = predictions - y  # Shape: (1, 10)
    
        # Gradients for the 1x1 convolution weights
        grad_fc_weights = np.outer(self.flattened_output, delta_fc)  # Shape: (338, 10)
    
        # Backpropagate to the flattened layer
        delta_flattened = np.dot(delta_fc, self.fc_weights.T)  # Shape: (1, 338)
    
        # Reshape delta_flattened to match the shape of the pooled feature maps
        delta_pool1 = delta_flattened[:, :self.pool1.size].reshape(self.pool1.shape)
        delta_pool2 = delta_flattened[:, self.pool1.size:self.pool1.size + self.pool2.size].reshape(self.pool2.shape)
        
        # Upsample gradients from the pooling layer to match the convolutional layer
        def upsample(gradient, original_shape, pool_size, stride):
            upsampled = np.zeros(original_shape)
            for i in range(gradient.shape[0]):
                for j in range(gradient.shape[1]):
                    upsampled[i * stride:i * stride + pool_size, j * stride:j * stride + pool_size] = gradient[i, j] / (pool_size * pool_size)
            return upsampled
            
        grad_conv1 = upsample(delta_pool1, self.conv1.shape, pool_size=2, stride=2) * self.sigmoidDerivative(self.conv1)
        grad_conv2 = upsample(delta_pool2, self.conv2.shape, pool_size=2, stride=2) * self.sigmoidDerivative(self.conv2)
    
        # Gradients for the convolutional kernels
        grad_kernel1 = np.zeros_like(self.kernel1)
        grad_kernel2 = np.zeros_like(self.kernel2)
    
        for i in range(grad_kernel1.shape[0]):
            for j in range(grad_kernel1.shape[1]):
                grad_kernel1[i, j] = np.sum(
                    X[i:i + grad_conv1.shape[0], j:j + grad_conv1.shape[1]] * grad_conv1
                )
                grad_kernel2[i, j] = np.sum(
                    X[i:i + grad_conv2.shape[0], j:j + grad_conv2.shape[1]] * grad_conv2
                )
    
        # Update weights
        self.fc_weights -= self.lr * grad_fc_weights
        self.kernel1 -= self.lr * grad_kernel1
        self.kernel2 -= self.lr * grad_kernel2


    def train(self, X_train, y_train): # Train the network using the training dataset and evaluate its performance using the test dataset.
        # Initialize random weights for convolutional kernels (first layer kernels and 1by1 convolutions in end)
        self.kernel1 = np.random.randn(3, 3) * 0.1  #3 by 3
        self.kernel2 = np.random.randn(3, 3) * 0.1  #3 by 3 
        self.fc_weights = np.random.randn(338, 10) * 0.1  # Random weights for final convolution 10 output channels

        for epoch in range(self.epochs):
            for X, y in zip(X_train, y_train):
                predictions = self.feedForward(X)
                self.backPropagate(X, y, predictions)
         
           

    def evaluate(self, X_test, y_test):
        correct = 0
        total = X_test.shape[0]
        total_loss = 0.0
    
        for i in range(total):
            # Forward pass for a single sample
            y_pred = self.feedForward(X_test[i])  # Shape: (1, num_classes)
            y_pred_class = np.argmax(y_pred)  # Predicted class
            y_true_class = np.argmax(y_test[i])  # True class
            
            # Check if the prediction is correct
            if y_pred_class == y_true_class:
                correct += 1
            
            # Compute cross-entropy loss for this sample
            loss = -np.sum(y_test[i] * np.log(y_pred + 1e-8))  # Add small value to avoid log(0)
            total_loss += loss
    
        # Calculate accuracy
        accuracy = correct / total
        # Calculate average loss
        avg_loss = total_loss / total
    
        print(f"Accuracy: {accuracy:.2f}, Average Loss: {avg_loss:.4f}")
        return accuracy, avg_loss



## Training and Test Results:
The network was trained using stochastic gradient descent. A learning rate of 0.001 was used and a total of 10 epochs. The network was evaluated on the test dataset using accuracy and average cross-entropy loss as metrics.

In [10]:
#Accuracy and loss achieved by the network on the test dataset
myCNN = CNN(0.001, 10)
myCNN.train(X_train, y_one_hot_train)
accuracy, avg_loss = myCNN.evaluate(X_test, y_one_hot_test)
print(f"Accuracy: {accuracy:.2f}, Average Loss: {avg_loss:.4f}")


Accuracy: 0.89, Average Loss: 0.3695
Accuracy: 0.89, Average Loss: 0.3695
