The goal of this project was to build a Convolutional Neural Network (CNN) from scratch, without using high-level deep learning frameworks such as TensorFlow or PyTorch. All neural network computations are implemented using NumPy, while Pandas, Scikit-learn, and kagglehub are used only for data loading, preprocessing, and dataset management.
This project demonstrates a complete CNN pipeline, including convolution, pooling, flattening, dense layers, forward propagation, backpropagation, and training on the MNIST handwritten digit dataset.
The project is composed of several modular classes, each representing a core component of a CNN.
The Dense_Layer class represents a fully connected (dense) layer in the network. Each layer is initialized with a specified number of neurons, number of input features, and an activation function.
Key features:
- Xavier initialization for weight matrices
- Bias vectors for each neuron
- Supported activation functions: sigmoid, tanh, ReLU, and softmax
- Methods for computing weighted sums and applying activation functions
- Getters and setters for weights and biases
This class is used for both hidden dense layers and the final output layer.
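The core of such a layer can be sketched as follows. This is a minimal illustration, not the project's actual `Dense_Layer` API; the class and method names here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenseLayerSketch:
    """Minimal dense layer: Xavier-initialized weights plus a bias vector."""
    def __init__(self, n_inputs, n_neurons, rng=None):
        rng = rng or np.random.default_rng(0)
        # Xavier/Glorot uniform bound keeps activations well-scaled
        limit = np.sqrt(6.0 / (n_inputs + n_neurons))
        self.W = rng.uniform(-limit, limit, size=(n_inputs, n_neurons))
        self.b = np.zeros(n_neurons)

    def forward(self, x):
        z = x @ self.W + self.b   # weighted sum
        return sigmoid(z)         # activation (ReLU, tanh, softmax are analogous)
```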
The Convolutional_Layer class implements a 2D convolutional layer for image inputs.
Key features:
- Support for multi-channel inputs
- Configurable number of filters, filter size, stride, and padding
- Xavier or He initialization for filters
- Manual convolution implementation using NumPy
- Generation of feature maps
This layer is responsible for learning spatial features from input images.
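The manual convolution at the heart of this layer can be sketched for a single channel and filter; the real class extends this to multiple channels and filters. The function name is illustrative.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation) for one channel."""
    if padding:
        image = np.pad(image, padding)         # zero-pad all sides
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1   # output height
    ow = (image.shape[1] - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, summed
    return out
```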
The Pooling_Layer class performs downsampling on feature maps produced by convolutional layers.
Key features:
- Supports max pooling and average pooling
- Configurable filter size and stride
- Reduces spatial dimensions while preserving important features
- Includes backpropagation logic for both pooling types
Pooling layers improve computational efficiency and robustness to small spatial shifts.
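Max pooling over a single feature map can be sketched like this (average pooling replaces `.max()` with `.mean()`); the function name is illustrative:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Downsample a 2D feature map by taking the max of each window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out
```

During backpropagation, the gradient is routed only to the position that held the maximum in each window, which is why the layer records those positions in the forward pass.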
The Flattening_Layer class converts multi-dimensional feature maps into a one-dimensional vector suitable for dense layers.
Key features:
- Forward pass flattens feature maps into a vector
- Backward pass reshapes gradients back to the original feature map dimensions
- Ensures dimensional consistency during backpropagation
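The pattern is simple: cache the input shape on the forward pass, and reuse it to reshape gradients on the backward pass. A minimal sketch (class name is hypothetical):

```python
import numpy as np

class FlattenSketch:
    def forward(self, fmaps):
        self.shape = fmaps.shape           # remember original dimensions
        return fmaps.reshape(-1)           # collapse to a 1D vector

    def backward(self, grad):
        return grad.reshape(self.shape)    # restore feature-map dimensions
```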
The CNN class represents the full convolutional neural network and manages the interactions between all layers.
Key features:
- Sequential feedforward propagation through all layers
- Manual backpropagation for convolutional, pooling, flattening, and dense layers
- Support for cross-entropy and hinge loss functions
- Gradient-based parameter updates using stochastic gradient descent
- Training and testing routines with accuracy evaluation
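A key detail that makes the backward pass tractable is the softmax/cross-entropy pairing: their combined gradient with respect to the logits reduces to `probs - one_hot`. A minimal sketch of that piece (the actual method names in the CNN class may differ):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, one_hot):
    return -np.sum(one_hot * np.log(probs + 1e-12))  # epsilon avoids log(0)

# With softmax outputs, dLoss/dLogits simplifies to (probs - one_hot),
# which is the error signal backpropagation starts from at the output layer.
```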
This project uses the MNIST handwritten digit dataset, consisting of grayscale images of digits from 0 to 9.
- Data Loading: The dataset is downloaded using `kagglehub` and loaded into a Pandas DataFrame.
- Normalization: Pixel values are scaled from `[0, 255]` to `[0, 1]` to improve training stability.
- Train/Test Split: The dataset is split into training and testing sets using Scikit-learn with stratification to preserve class balance.
- Reshaping: Each image is reshaped to `(28, 28, 1)` before being passed into the CNN.
- One-Hot Encoding: Target labels are converted to one-hot encoded vectors during training for multi-class classification.
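The preprocessing steps above can be sketched with stand-in data. The array below is a hypothetical placeholder; the real project obtains the pixels via kagglehub and Pandas.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the MNIST pixel matrix: 100 flat 784-pixel images
X = np.random.randint(0, 256, size=(100, 784)).astype(np.float64)
y = np.repeat(np.arange(10), 10)            # 10 samples per digit class

X = X / 255.0                               # scale [0, 255] -> [0, 1]
X = X.reshape(-1, 28, 28, 1)                # (28, 28, 1) images for the CNN

# Stratified split preserves the class balance in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

one_hot = np.eye(10)[y_tr]                  # one-hot encode training labels
```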
The CNN architecture used in this project is:
- Convolutional Layer (8 filters, 3×3, stride 1, padding 1, He initialization)
- Max Pooling Layer (2×2, stride 2)
- Flattening Layer
- Dense Layer (128 neurons, ReLU activation)
- Dense Output Layer (10 neurons, Softmax activation)
This architecture is designed to balance clarity, simplicity, and performance.
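Tracing the tensor shapes through this architecture confirms how the layers connect. Using the standard output-size formula `(n + 2*pad - k) // stride + 1`:

```python
def conv_out(n, k, stride, pad):
    """Spatial output size of a convolution or pooling operation."""
    return (n + 2 * pad - k) // stride + 1

h = conv_out(28, 3, 1, 1)   # 28: 3x3 conv with padding 1 keeps 28x28
h = conv_out(h, 2, 2, 0)    # 14: 2x2 max pooling halves each side
flat = h * h * 8            # 8 feature maps -> flattened vector length
print(flat)                 # 1568, the input size of the 128-neuron dense layer
```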
The network is trained using stochastic gradient descent with configurable:
- Number of epochs
- Learning rate
- Loss function (cross-entropy by default; hinge loss is also supported)
During training, classification accuracy is reported after each epoch. After training, the model is evaluated on the test set, and overall test accuracy is displayed.
A fixed random seed is used to ensure reproducible results.
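The training loop's parameter update can be illustrated on a hypothetical single dense layer; this is not the project's training code, only the SGD update it describes, with a fixed seed for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)     # fixed seed for reproducible results

W = rng.normal(size=(4, 3)) * 0.1   # weights: 4 inputs, 3 classes
x = rng.normal(size=4)              # a single training example
target = np.array([1.0, 0.0, 0.0])  # one-hot label for class 0
lr = 0.1                            # configurable learning rate

for epoch in range(100):            # configurable number of epochs
    z = x @ W                       # forward pass
    e = np.exp(z - z.max())
    probs = e / e.sum()             # softmax output
    grad_z = probs - target         # cross-entropy gradient w.r.t. logits
    W -= lr * np.outer(x, grad_z)   # stochastic gradient descent update
```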
This project demonstrates how a convolutional neural network can be implemented entirely from scratch using low-level numerical operations. By manually implementing convolution, pooling, flattening, backpropagation, and optimization, the project provides a strong conceptual understanding of how CNNs function internally.
Overall, this work showcases both the theoretical foundations and practical implementation of deep learning applied to image classification using the MNIST dataset.