The objective of this project was to completely demystify the "black box" of machine learning. Rather than relying on modern frameworks from the start, this project was built by hand-coding the foundational linear algebra and calculus required for a neural network to learn. Once the raw mathematics were proven, the architecture was translated into PyTorch, optimized for edge-compute, and deployed via a custom Vanilla JavaScript computer vision pipeline.
The repository is structured sequentially to demonstrate the progression from mathematical theory to a production-ready web application.
File: 01_Neural_Network_From_Scratch.ipynb
- Architecture: Single-layer neural network.
- Implementation: Engineered entirely from scratch using only standard Python and NumPy.
- Core Mechanics: Hand-coded matrix multiplication for the forward pass, raw calculus chain rule derivatives for backward propagation, and Stochastic Gradient Descent (SGD) for weight updates.
- Results & Limitations: Achieved ~89% accuracy. The model hit a "glass ceiling" because a flat layer can only read raw pixels, lacking the capacity to detect complex geometric shapes.
File: 02_Two_Layer_Neural_Network.ipynb
- Architecture: Multi-Layer Perceptron (Deep Learning).
- Implementation: Upgraded the foundational math to support hidden layers.
- Core Mechanics: Implemented non-linear activation functions (ReLU, Softmax) and managed vanishing gradients during deep backpropagation.
- Results: Accuracy improved to ~97%.
File: 03_PyTorch_Baseline.ipynb
- Implementation: The custom NumPy architecture was explicitly translated into PyTorch. This validated that the hand-coded mathematics perfectly mirrored professional framework behavior before introducing complex convolutions.
File: 04_Optimized_CNN.py
- Architecture Upgrade: Migrated to a Convolutional Neural Network (CNN) to handle spatial anomalies (tilted handwriting, off-center numbers). Utilized 2x
Conv2dlayers paired withMaxPool2dto reduce the final linear connections from 25,088 to just 1,568, drastically cutting memory requirements. - Training Optimizations: * Implemented strict Data Augmentation (
RandomRotation(20),RandomAffine(translate=0.1, scale=0.9-1.1)) to force the model to learn messy, real-world handwriting.- Added a
Dropout(0.3)layer to prevent memorization/overfitting.
- Added a
- Quantization & Export: The trained PyTorch model was dynamically quantized to 8-bit integers (
QUInt8), compressing the ONNX file size by ~4x (from roughly 900KB down to ~225KB) with zero noticeable loss in inference accuracy.
Files: index.html | script.js | style.css
The front-end does not rely on a backend server. The model runs locally in the user's browser, processing a 1x1x28x28 float32 tensor mapped from a live camera feed.
Front-End Tech Stack: HTML5, CSS3 (CSS Variables, Flexbox, Backdrop-Filter), Vanilla JavaScript (ES6+), HTML5 Canvas API, WebRTC (MediaDevices API), ONNX Runtime Web (ort.min.js).
Custom Image Processing Algorithms: To bridge the gap between a high-resolution 1080p webcam and a tiny 28x28 neural network input, two custom algorithms were engineered in JavaScript:
- Smooth Multi-Step Downscaling: Instead of shrinking the 140x140 CSS target box directly to 28x28 (which causes browsers to drop pixels and erases thin pen lines), the Canvas API dynamically funnels the image through intermediate states (112x112 → 56x56 → 28x28). This preserves the continuous stroke of a 1px ballpoint pen.
- Local Adaptive Thresholding: Global contrast adjustments fail when shadows fall across paper. A custom algorithm was written to scan a 9x9 pixel grid (Radius: 4) around every single pixel to calculate the local average brightness of the paper. It strictly isolates ink that is mathematically darker than its immediate neighbors (using an 8-shade buffer), effectively destroying shadows and horizontal blue notebook lines, forcing the background to pure black (
0.0) and the ink to pure white (1.0).
To run the web application locally for testing:
- Clone this repository to your local machine.
- Ensure the quantized
digit_brain.onnxfile is located in thepage/subfolder alongside the web assets. - Open
index.htmlusing a local server environment (e.g., VS Code Live Server, Pythonhttp.server) to bypass browser cross-origin restrictions on local camera access.