This repository is a machine learning framework for training Multi-Layer Perceptrons (MLPs), built from scratch in C++20.
The goal is to explore the systems-level design of automatic differentiation engines, with an emphasis on memory layout, data ownership, and performance.
No external ML, autodiff or tensor libraries are used.
The implementation emphasizes:
- aligned, contiguous memory layouts to improve cache locality
- SIMD (AVX2) intrinsics for compute-heavy kernels
- explicit ownership and lifetime management of tensors
engine::Tensor owns aligned, contiguous storage for numerical data (see the first sketch below).
Backpropagation is performed using reverse-mode automatic differentiation over a thread-local operation tape, rather than an explicit computation graph (see the second sketch below).
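As a hedged illustration of the aligned-storage and SIMD points above, the first sketch shows an AVX2 kernel operating on aligned, contiguous float buffers. It is a minimal sketch, not the repository's actual kernel; the names axpy and alloc_aligned are assumptions.

```cpp
#include <immintrin.h>
#include <cstdlib>
#include <cstddef>

// y[i] += a * x[i] over 32-byte-aligned, contiguous float buffers.
// Assumes n is a multiple of 8; a real kernel would also handle the
// scalar tail. Compile with -mavx2 -mfma.
void axpy(float a, const float* x, float* y, std::size_t n) {
    const __m256 va = _mm256_set1_ps(a);
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 vx = _mm256_load_ps(x + i);  // aligned 8-wide load
        __m256 vy = _mm256_load_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);   // vy = va * vx + vy
        _mm256_store_ps(y + i, vy);         // aligned store
    }
}

// Aligned, contiguous allocation for such buffers. std::aligned_alloc
// requires the size to be a multiple of the alignment, hence n % 8 == 0.
float* alloc_aligned(std::size_t n) {
    return static_cast<float*>(std::aligned_alloc(32, n * sizeof(float)));
}
```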
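The second sketch shows the tape idea itself: each recorded operation pushes a backward closure onto a thread-local tape, and backpropagation replays the tape in reverse. The names (Tape, Value, mul, backward) are illustrative, not the engine's actual API, and ownership is elided for brevity.

```cpp
#include <functional>
#include <vector>

// Thread-local operation tape for reverse-mode autodiff.
struct Tape {
    std::vector<std::function<void()>> ops;
};
thread_local Tape g_tape;

struct Value {
    double data = 0.0;
    double grad = 0.0;
};

// Recording a multiply: compute the forward value, push the backward rule.
Value* mul(Value* a, Value* b) {
    auto* out = new Value{a->data * b->data, 0.0};  // leak: ownership elided
    g_tape.ops.push_back([a, b, out] {
        a->grad += b->data * out->grad;  // d(out)/d(a) = b
        b->grad += a->data * out->grad;  // d(out)/d(b) = a
    });
    return out;
}

// Backprop: seed the output gradient, replay the tape in reverse order.
void backward(Value* out) {
    out->grad = 1.0;
    for (auto it = g_tape.ops.rbegin(); it != g_tape.ops.rend(); ++it) (*it)();
}
```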
Two reference training runs are provided:

```sh
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release
./build-release/cvg
./build-release/mnist
```
The first (cvg) serves as a functional and performance sanity check for the autodiff engine; the second (mnist) trains a 2-layer MLP on MNIST.
Performance analysis was conducted using the Intel® VTune™ Profiler. With single-threaded execution:
- Physical core utilization: 94.6%
- CPI rate: 0.697
Results and conclusions are documented in docs/PERFORMANCE.md.
v1 (this):
- Uses std::shared_ptr for managing the lifetimes of engine::Tensor objects (see the sketch below)
- Simplifies correctness at the cost of atomic reference-count updates
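A hedged sketch of what this trade-off looks like in practice (the Tensor body shown is a placeholder, not the real class): every copy of a std::shared_ptr performs an atomic update on the shared control block.

```cpp
#include <memory>

struct Tensor { /* aligned, contiguous storage lives here */ };

using TensorPtr = std::shared_ptr<Tensor>;

void example() {
    TensorPtr a = std::make_shared<Tensor>();
    TensorPtr b = a;             // atomic increment of the reference count
    TensorPtr c = std::move(b);  // moves avoid the atomic update
}  // atomic decrements on scope exit; storage freed with the last owner
```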
v2 (next):
- Arena-based allocation for engine::Tensor objects (see the sketch below)
- Elimination of false sharing caused by reference-count contention under parallel execution
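A conceptual sketch of that direction (entirely illustrative; v2's actual design may differ): a monotonic bump arena that hands out aligned storage with no per-object reference counts and is released wholesale, e.g. once per training step.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Monotonic bump arena: allocation is a pointer bump, deallocation is a
// single reset, so there are no atomic refcounts for threads to contend on.
class Arena {
    std::byte* base_;
    std::size_t cap_;
    std::size_t off_ = 0;
public:
    // cap should be a multiple of 64 to satisfy std::aligned_alloc.
    explicit Arena(std::size_t cap)
        : base_(static_cast<std::byte*>(std::aligned_alloc(64, cap))), cap_(cap) {}
    ~Arena() { std::free(base_); }

    // Bump-allocate `bytes`, aligned to `align` (a power of two).
    void* allocate(std::size_t bytes, std::size_t align = 32) {
        std::size_t p = (off_ + align - 1) & ~(align - 1);
        if (p + bytes > cap_) throw std::bad_alloc{};
        off_ = p + bytes;
        return base_ + p;
    }

    void reset() { off_ = 0; }  // free everything at once
};
```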
- Conceptual inspiration from micrograd.
- The C++ Programming Language by Bjarne Stroustrup (4th Edition)
- all references to page numbers (such as p. xxx) are to this book
- Simple C++ reader for MNIST dataset
- xoshiro256+
- Intel Intrinsics Guide