NAND2LLM is a from-first-principles computer science and machine learning project inspired by Nand2Tetris.
Note: NAND2LLM is an independent educational project inspired by the pedagogical style of Nand2Tetris. It is not affiliated with, endorsed by, or sponsored by Nand2Tetris, Noam Nisan, Shimon Schocken, MIT Press, or Coursera.
Where Nand2Tetris starts with a NAND gate and builds toward a working computer and a playable version of Tetris, NAND2LLM starts with the core loop underneath modern learning systems and builds toward a tiny, inspectable language model and assistant.
This loop is the heartbeat of the project:
prediction -> loss -> gradient -> update -> better prediction
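As a rough illustration of that loop, here is a minimal sketch in plain Python. It assumes a one-parameter model `y = w * x`, a mean squared error loss, and a hand-derived gradient. The data, the function names, and the learning rate are all hypothetical choices for this example, not part of the project's code.

```python
# Hypothetical toy data generated by the rule y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def mean_squared_error(w):
    # prediction -> loss: compare w * x against the target y.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    # loss -> gradient: d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x).
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(w=0.0, learning_rate=0.01, steps=200):
    losses = []
    for _ in range(steps):
        losses.append(mean_squared_error(w))
        # gradient -> update: step against the gradient.
        w -= learning_rate * gradient(w)
    return w, losses

w, losses = train()
print(f"learned w = {w:.4f}, first loss = {losses[0]:.2f}, last loss = {losses[-1]:.2e}")
```

Each pass through the loop leaves `w` slightly closer to 3, which is the "better prediction" at the end of the chain.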
The goal is not to train a competitive model but instead to demystify large language models and make them feel understandable, buildable, and testable.
By the end, the reader should have built enough of the technology stack to understand how a small language model works from the inside out, having covered the following concepts:
- Arrays
- Tensors
- Autograd
- Neural networks
- Tokenization
- Embeddings
- Attention
- Transformers
- Training loops
- Decoding
- Evaluation
- Retrieval
- Tool use
- Serving
- Scaling pressure
- GPU optimization
The final artifact of this project is TinyTutor: a small assistant trained, tuned, and retrieval-augmented over the NAND2LLM material itself. The reader will ultimately build the model, teach the model what was built, and ask the model to explain the build.
This project is in its early stages. The project scaffolding and the curriculum design are expected to change.
The repository is intended to grow into:
- A GitHub learning project
- A book
- An interactive course
For now, the priority is to make the project executable, testable, and easy to contribute to.
The reader should implement the core machinery before leaning on established libraries or frameworks. The NAND2LLM curriculum is intentionally conservative about its dependencies.
The project should build, at minimum:
- Arrays, tensors, and shape-aware operations
- Scalar and tensor automatic differentiation
- Neural network layers
- Optimizers
- Tokenizers
- Embeddings
- Attention
- Transformer blocks
- Training loops
- Sampling and decoding
- Evaluation harnesses
- Retrieval components
- Tool-calling abstractions
- Basic inference serving
Libraries for tests, CLIs, serialization, plotting, documentation, and comparison against mature frameworks are acceptable, but the central learning machinery should be built by the reader.
Each abstraction should solve a concrete problem introduced by the previous layer. For example:
- When vectors and matrices become too limiting, introduce tensors.
- When hand-derived gradients become tedious and error-prone, introduce autograd.
- When fixed-size context representations become insufficient, introduce attention and transformers.
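To make the autograd step concrete, the following is a minimal sketch of a scalar reverse-mode engine that supports only addition and multiplication. The `Value` class and its field names are hypothetical and are not the project's actual design. The sketch only shows how recording local derivatives during the forward pass replaces deriving each gradient by hand.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical minimal design)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order nodes so that a node's gradient is complete before it is passed on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            # Chain rule: each parent receives local derivative times downstream gradient.
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


x, w, b = Value(2.0), Value(-3.0), Value(1.0)
z = w * x + b        # z = -5
loss = z * z         # loss = z^2 = 25
loss.backward()
print(loss.data, w.grad, x.grad, b.grad)
```

The gradients can be checked by hand: with `loss = (w*x + b)^2`, the derivative with respect to `w` is `2 * (w*x + b) * x`, which is `-20` at these values.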
The project assumes basic algebra skills to start and should build the necessary mathematical skills incrementally:
- Scalars, vectors, and matrices
- Functions and composition
- Rates of change
- Derivatives and gradients
- Probability
- Entropy and cross-entropy
- Optimization
- Dot products and similarity
- Attention as learned routing over information
NAND2LLM should introduce the math when it becomes useful, make it concrete in code, then give the formal version after the mathematical intuition has landed.
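As one example of making the math concrete in code first, the sketch below ties "dot products and similarity" to "attention as routing" for a single query. It assumes fixed, hand-picked vectors. In a real model the queries, keys, and values would come from learned projections, which this sketch leaves out. The function names are hypothetical.

```python
import math

def dot(a, b):
    # Similarity score: large when two vectors point the same way.
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each key against the query, scaled by sqrt(dimension).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # Route information: the output is a weighted average of the values.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([4.0, 0.0], keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])
```

The query lines up with the first and third keys, so those two values receive most of the weight and the second value is nearly ignored.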
NAND2LLM is a dual-track project:
- The Python Track teaches mathematical, modeling, and training ideas as clearly as possible.
- The Rust Track teaches the systems substrate:
  - Memory layout
  - Performance
  - Error handling
  - Explicit APIs
  - Deployment-oriented implementation
A Go Track may appear later as an optional comparison point for approachable service implementation.
Since scaling is fundamental to modern language models, GPU programming is a topic that can't be ignored in the NAND2LLM material.
The main path should not require GPU expertise up front. Instead, the project should focus on understandable CPU implementations, and then use those implementations to motivate:
- Why matrix multiplication dominates neural network workloads
- Why batching improves throughput
- Why data layout matters
- Why memory bandwidth matters
- Why parallel execution matters
- Why GPUs are effective
- Why mature tensor libraries and compilers exist
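The first of these points can be previewed on the CPU by counting operations instead of timing them. The sketch below uses a naive triple-loop matrix multiply and a hypothetical dense layer of arbitrary size. It compares the multiply-adds in the matrix product with the elementwise work, taken here as one bias add and one activation per output. The function names and sizes are illustrative assumptions.

```python
def matmul(a, b):
    """Naive triple-loop matrix multiply; returns (result, multiply_add_count)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += a[i][k] * b[k][j]
                count += 1
            out[i][j] = total
    return out, count

def layer_costs(batch, d_in, d_out):
    # Hypothetical dense layer: activations (batch x d_in) times weights (d_in x d_out).
    x = [[1.0] * d_in for _ in range(batch)]
    w = [[0.5] * d_out for _ in range(d_in)]
    _, matmul_ops = matmul(x, w)
    # Assumed elementwise work: one bias add plus one activation per output.
    elementwise_ops = batch * d_out * 2
    return matmul_ops, elementwise_ops

matmul_ops, elementwise_ops = layer_costs(batch=8, d_in=64, d_out=64)
print(matmul_ops, elementwise_ops, matmul_ops // elementwise_ops)
```

Under these assumptions the matrix product accounts for 32 times as many operations as the elementwise work, and the gap grows with `d_in`. The same weight matrix is also reused by every row of the batch, which hints at why batching and data layout matter.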
The NAND2LLM project should evolve in small, testable increments. A good contribution should include some combination of:
- Lesson material
- Implementation
- Exercises
- Tests
- Diagrams
- Experiments
- Failure cases
- A lead-in to the next abstraction
The curriculum is currently outlined as follows:
- Introduction - An outline of the project and development environment
- Numbers, Bits, and Arrays - The minimal array substrate
- The Dot Product Machine - A linear model
- Measuring Wrongness - Loss functions
- Learning by Descent - Gradient descent over simple models
- Scalar Autograd - A tiny computational graph engine
- Tensor Autograd - Shape-aware differential operations
- Neural Networks - MLP layers, activations, batching
- Optimization - SGD, momentum, Adam, clipping, schedules
- Text, Corpora, and Datasets - Dataset pipeline
- Tokenization - Character, byte, and simple BPE tokenizers
- Embeddings - Token and positional embeddings
- Next Token Prediction - Crude text generator
- Attention - Scaled dot-product attention
- Multi-Head Self-Attention - Reusable self-attention layer
- Transformer Block - Residual, norm, feed-forward, dropout
- TinyGPT - Small autoregressive language model
- Training Runs as Experiments - Configs, metrics, checkpoints
- Sampling and Decoding - Greedy, temperature, top-k, top-p
- Evaluation - Perplexity, validation loss, prompt regressions
- Model Introspection - Embeddings, logits, attention, activations
- Instruction Tuning - Simple instruction-following model
- Conversation and Context Windows - Tiny chat interface
- Retrieval-Augmented Generation - Local document assistant
- Tools and Function Calling - Structured action selection
- Safety, Alignment, and Boundaries - Policy-aware assistant shell
- Serving the Model - Local inference server and UI