# Day 1: The Value Class & Computation Graphs

**Building LLMs from Scratch** — Following Andrej Karpathy's micrograd/makemore/nanoGPT lectures.

---

## 1. Introduction

Before we can train neural networks, we need **automatic differentiation** — the ability to compute gradients automatically. The foundation of autograd is a simple **Value class** that:

1. Tracks scalar values and their gradients
2. Records how each value was computed (the operation and its inputs)
3. Builds a **computation graph** that we can traverse backward to compute gradients

In this notebook, we'll implement this Value class from scratch and explore how computation graphs are built.

## 2. The Value Class

Our Value class wraps a scalar and tracks:
- `data`: the numeric value
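- `grad`: the derivative of the final output with respect to this value; it starts at `0.0` and is filled in during backprop (Day 2)
- `_prev`: the set of Value objects this one was computed from (its parents in the graph); the constructor receives them as `_children`, the name micrograd uses for the same inputs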
- `_op`: the operation that created this value (e.g. `'+'`, `'*'`)

We overload `__add__` and `__mul__` so that `a + b` and `a * b` return new Value objects that record their lineage.

In [None]:
class Value:
    """A scalar value that tracks its computation graph for autograd."""
    
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op
    
    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        return out
    
    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        return out
    
    def __repr__(self):
        return f"Value(data={self.data})"

In [None]:
# Quick sanity check
x = Value(2.0)
y = Value(3.0)
z = x + y
print(f"z = {z}")
print(f"z.data = {z.data}")
print(f"z._prev = {z._prev}")
print(f"z._op = {z._op}")
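
The `isinstance` check in `__add__` and `__mul__` is what lets a plain Python number appear on the right-hand side of an operation. A minimal sketch of that behavior follows; the class is a trimmed copy of the one above (only `__add__`) so the cell runs on its own, and the variable names are illustrative.

```python
class Value:
    """Trimmed copy of the Value class above, repeated so this cell is self-contained."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        # Wrap plain numbers so both operands are Value objects.
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), '+')

    def __repr__(self):
        return f"Value(data={self.data})"


x = Value(2.0)

# Value on the left: Python calls x.__add__(1), which wraps the 1.
right = x + 1
print(right)                 # Value(data=3.0)
print(len(right._prev))      # 2: x and the wrapped constant

# Leaves have no inputs and an empty op string.
print(x._prev, repr(x._op))  # set() ''

# Number on the left: int.__add__ does not know about Value, and this
# class defines no __radd__, so Python raises TypeError.
try:
    1 + x
    reflected_failed = False
except TypeError:
    reflected_failed = True
print(reflected_failed)      # True
```

Exercise 4 at the end of this notebook asks you to close that gap.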

## 3. Building Computation Graphs

Let's build a simple expression: $e = a \cdot b + c$ where $a=2$, $b=-3$, $c=10$.

The computation graph will be:
- $d = a \cdot b$
- $e = d + c$

Each intermediate Value stores its parents in `_prev` and the operation in `_op`.
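
Working the forward pass by hand gives the values the graph should hold:

$$
d = a \cdot b = 2 \cdot (-3) = -6, \qquad e = d + c = -6 + 10 = 4
$$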

In [None]:
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)

d = a * b
e = d + c

print(f"a = {a}")
print(f"b = {b}")
print(f"c = {c}")
print(f"d = a * b = {d}")
print(f"e = d + c = {e}")
print()
print("Inspecting d:")
print(f"  d._prev = {d._prev}")
print(f"  d._op = '{d._op}'")
print()
print("Inspecting e:")
print(f"  e._prev = {e._prev}")
print(f"  e._op = '{e._op}'")

## 4. Visualizing the Graph

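We can draw the computation graph using **graphviz**, which needs both the `graphviz` Python package and the Graphviz system binaries (the `dot` executable). If graphviz isn't available, we fall back to a simple text representation.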

In [None]:
def draw_dot(root):
    """Build a graphviz representation of the computation graph."""
    from graphviz import Digraph
    
    def build_graph(node, visited, graph):
        if node in visited:
            return
        visited.add(node)
        node_id = str(id(node))
        graph.node(node_id, f"{node.data:.2f} | grad={node.grad:.2f}", shape='record')
        for child in node._prev:
            build_graph(child, visited, graph)
            graph.edge(str(id(child)), node_id, label=node._op)
    
    graph = Digraph(format='png', graph_attr={'rankdir': 'LR'})
    build_graph(root, set(), graph)
    return graph


def draw_dot_text(root):
    """Fallback: print a simple text representation of the graph."""

    def collect_nodes(node, visited):
        if node in visited:
            return []
        visited.add(node)
        nodes = [node]
        for child in node._prev:
            nodes.extend(collect_nodes(child, visited))
        return nodes
    
    nodes = collect_nodes(root, set())
    print("Computation graph (text):")
    for n in nodes:
        parents = [f"{p.data:.2f}" for p in n._prev]
        op_str = f" {n._op} ".join(parents) if n._op else ""
        print(f"  {n.data:.2f} <- {op_str or '(leaf)'}")

In [None]:
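try:
    from IPython.display import Image, display
    graph = draw_dot(e)
    display(Image(graph.pipe(format='png')))
except (ImportError, RuntimeError) as err:
    # ImportError: the graphviz (or IPython) Python package is missing.
    # RuntimeError: covers graphviz.ExecutableNotFound, raised when the
    # `dot` executable is not on PATH.
    print(f"graphviz unavailable ({err}). Using text fallback:")
    draw_dot_text(e)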
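
Rendering with graphviz depends on two separate installs: the `graphviz` Python package and the Graphviz `dot` executable. A quick way to see which one is missing, sketched with standard-library calls only (the variable names are illustrative):

```python
import importlib.util
import shutil

# The Python bindings: the importable `graphviz` module.
has_python_package = importlib.util.find_spec("graphviz") is not None

# The renderer itself: the Graphviz `dot` executable must be on PATH.
has_dot_binary = shutil.which("dot") is not None

print(f"graphviz Python package found: {has_python_package}")
print(f"dot executable found:          {has_dot_binary}")

if not (has_python_package and has_dot_binary):
    print("draw_dot cannot render a PNG in this environment.")
```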

## 5. Exploring the Graph Structure

We can walk through the graph by following `_prev` sets. Each node points to its parents.

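A topological sort orders the nodes so that every node comes after the inputs it was computed from. Backpropagation (Day 2) walks this list in reverse, starting at the output and ending at the leaves, so that each node's gradient is complete before it is passed on to its inputs.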

In [None]:
def build_topo(node, visited, topo):
    """Build topological order via DFS post-order: each node is appended after its inputs."""
    if node in visited:
        return
    visited.add(node)
    for child in node._prev:
        build_topo(child, visited, topo)
    topo.append(node)


topo = []
build_topo(e, set(), topo)

print("Topological order (backprop walks this in reverse):")
for i, node in enumerate(topo):
    parents_str = ", ".join(f"{p.data:.2f}" for p in node._prev)
    print(f"  {i}: {node.data:.2f}  <- {node._op or 'leaf'}({parents_str})")
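
To make the ordering property concrete, the sketch below rebuilds the same expression, checks that every input appears before the node that uses it, and then reverses the list to get the order a backward pass would visit. The class and `build_topo` are trimmed copies of the ones above so the cell runs on its own; `position`, `inputs_come_first`, and `backward_order` are illustrative names, and no gradients are computed here.

```python
class Value:
    """Trimmed copy of the Value class above so this cell runs on its own."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), '*')


def build_topo(node, visited, topo):
    """DFS post-order: a node is appended only after all of its inputs."""
    if node in visited:
        return
    visited.add(node)
    for inp in node._prev:
        build_topo(inp, visited, topo)
    topo.append(node)


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
d = a * b
e = d + c

topo = []
build_topo(e, set(), topo)

# Position of each node in the list, keyed by object identity.
position = {id(node): i for i, node in enumerate(topo)}

# The defining property: every input sits before the node that uses it.
inputs_come_first = all(
    position[id(inp)] < position[id(node)]
    for node in topo
    for inp in node._prev
)
print(inputs_come_first)       # True

# Reversed, the list starts at the output e and ends at a leaf.
backward_order = list(reversed(topo))
print(backward_order[0] is e)  # True
print(backward_order[0].data)  # 4.0
```

The relative order of the leaves can differ between runs because `_prev` is a set, but the inputs-first property holds for any order the DFS produces.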

## 6. Exercises

1. **Extend Value with subtraction** — Implement `__sub__` so that `a - b` works. Hint: `a - b` is `a + (-b)`.

2. **Add `__neg__`** — Implement `__neg__` so that `-a` returns a new Value. This helps with subtraction.

3. **Try `a + a` and inspect the graph** — What happens when both operands are the same object? How many nodes does the graph have? Does `_prev` contain one or two references?

4. **Optional:** Implement `__radd__` and `__rmul__` so that `2 + a` and `3 * a` work when the left operand is a plain Python number.

---

**Next:** [Day 2 — The Backward Pass](llm_day02_backward_pass.ipynb)

**Blog:** [omkarray.com — LLM Day 1](https://omkarray.com/llm-day1.html)