# Day 4: Building a Neuron & MLP from Scratch

**Building LLMs from Scratch** — Following Andrej Karpathy's micrograd lectures.

---

## 1. Introduction

From autograd to neural networks: we now build **neurons** and **multi-layer perceptrons (MLPs)** on top of the `Value` class. A neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation (tanh here). Stack neurons into layers and layers into an MLP, and you have the fully connected building block found inside most deep networks, including transformers.
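
As a rough sketch of that definition, here is a single neuron computed with plain Python floats, before any autograd is involved. The function name `neuron_forward` and the numbers are made up for illustration and are not part of the notebook.

```python
import math

def neuron_forward(weights, inputs, bias):
    """Return tanh(w . x + b) for plain Python floats (hypothetical helper)."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(pre_activation)

# Illustrative numbers only: 0.5*2.0 + (-1.0)*3.0 + 2.0*(-1.0) + 1.0 = -3.0
y = neuron_forward(weights=[0.5, -1.0, 2.0], inputs=[2.0, 3.0, -1.0], bias=1.0)
print(y)  # tanh(-3.0), a value strictly between -1 and 1
```

The `Neuron` class later in this notebook does the same arithmetic, but on `Value` objects so that gradients can be tracked.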

## 2. The Value Class (Complete, from Day 3)

The full `Value` class from Day 3, with `+`, `*`, `**`, negation, subtraction, `tanh`, `backward`, and the reflected operators `__radd__` and `__rmul__`. This is our autograd engine.

In [None]:
import math
import random

class Value:
    """A scalar value that tracks its computation graph for autograd."""
    
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op
    
    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out
    
    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out
    
    def __pow__(self, other):
        assert isinstance(other, (int, float)), "only supporting int/float powers"
        out = Value(self.data ** other, (self,), f'**{other}')
        def _backward():
            self.grad += (other * self.data ** (other - 1)) * out.grad
        out._backward = _backward
        return out
    
    def __neg__(self):
        return self * -1
    
    def __sub__(self, other):
        return self + (-other)
    
    def __radd__(self, other):
        return self + other
    
    def __rmul__(self, other):
        return self * other
    
    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t**2) * out.grad
        out._backward = _backward
        return out
    
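    def backward(self):
        # Build a topological order so every node comes after its children.
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        # Seed d(self)/d(self) = 1, then apply the chain rule from output to inputs.
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()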
    
    def __repr__(self):
        return f"Value(data={self.data})"
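
The local derivatives hard-coded in the `_backward` closures above can be sanity-checked against finite differences. The snippet below is a standalone sketch using plain floats. The helper `numerical_derivative`, the test point, and the tolerance are assumptions chosen for illustration.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary test point

# tanh: the backward rule uses 1 - tanh(x)^2
analytic_tanh = 1 - math.tanh(x) ** 2
numeric_tanh = numerical_derivative(math.tanh, x)

# power: the backward rule uses n * x^(n - 1)
n = 3
analytic_pow = n * x ** (n - 1)
numeric_pow = numerical_derivative(lambda v: v ** n, x)

# multiplication by a constant c: d(x * c)/dx = c
c = -2.5
numeric_mul = numerical_derivative(lambda v: v * c, x)

print(abs(analytic_tanh - numeric_tanh) < 1e-6)
print(abs(analytic_pow - numeric_pow) < 1e-6)
print(abs(c - numeric_mul) < 1e-6)
```

If any of these comparisons failed, the corresponding `_backward` rule would be the first place to look.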

## 3. The Neuron Class

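A neuron takes `nin` inputs. Each input gets its own weight, drawn uniformly at random between -1 and 1, and the neuron has a single bias, initialized to 0. The forward pass computes the weighted sum plus the bias, then applies the tanh activation.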

In [None]:
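class Neuron:
    def __init__(self, nin):
        # One weight per input, drawn uniformly between -1 and 1; bias starts at 0.
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(0)
    
    def __call__(self, x):
        # Weighted sum w . x + b: self.b is passed as sum()'s start value,
        # so the bias is folded into the same expression.
        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)
        return act.tanh()
    
    def parameters(self):
        # Weights first, then the bias.
        return self.w + [self.b]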

## 4. The Layer Class

A layer is a list of `nout` neurons that all receive the same `nin` inputs. Each neuron produces one output, so the layer returns `nout` values (or a single `Value` instead of a list when `nout` is 1).

In [None]:
class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    
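    def __call__(self, x):
        # Every neuron sees the same input vector x.
        out = [n(x) for n in self.neurons]
        # Unwrap single-output layers so the final result is a Value, not a list.
        return out[0] if len(out) == 1 else out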
    
    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]
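
One way to picture a layer is as a matrix-vector product followed by an element-wise tanh: one row of weights and one bias per neuron. The sketch below uses plain floats rather than `Value` objects. `layer_forward` and the weights are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math

def layer_forward(weight_rows, biases, inputs):
    """Apply one tanh layer: one weight row and one bias per neuron (hypothetical helper)."""
    outputs = []
    for row, b in zip(weight_rows, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(math.tanh(pre_activation))
    return outputs

# Illustrative 2-neuron layer with 3 inputs.
W = [[1.0, 0.0, 0.0],   # neuron 0 looks only at the first input
     [0.0, 1.0, 1.0]]   # neuron 1 adds the second and third inputs
b = [0.0, 0.0]
out = layer_forward(W, b, [2.0, 3.0, -1.0])
print(len(out))  # 2, one output per neuron
print(out)       # both entries equal tanh(2.0)
```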

## 5. The MLP Class

A multi-layer perceptron is a list of layers applied in sequence. `nin` is the number of inputs and `nouts` lists the size of each layer (e.g. `[4, 4, 1]` means two hidden layers of 4 neurons followed by a single output neuron).

In [None]:
class MLP:
    def __init__(self, nin, nouts):
        # Consecutive entries of sz give each layer's (inputs, outputs) pair.
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]
    
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
    
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]
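
To see how the `sz` list pairs up layer sizes, and how many parameters result, the following standalone sketch repeats the same indexing without building any neurons. The helper names `layer_shapes` and `count_parameters` are invented for this example. Each neuron contributes one weight per input plus one bias.

```python
def layer_shapes(nin, nouts):
    """Return (inputs, outputs) for each layer, mirroring the sz indexing in MLP."""
    sz = [nin] + nouts
    return [(sz[i], sz[i + 1]) for i in range(len(nouts))]

def count_parameters(nin, nouts):
    """Each of the n_out neurons in a layer has n_in weights and 1 bias."""
    return sum(n_out * (n_in + 1) for n_in, n_out in layer_shapes(nin, nouts))

print(layer_shapes(3, [4, 4, 1]))      # [(3, 4), (4, 4), (4, 1)]
print(count_parameters(3, [4, 4, 1]))  # 41 = 16 + 20 + 5
```

For `MLP(3, [4, 4, 1])`, this should match `len(model.parameters())`.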

## 6. Building an MLP

Create `MLP(3, [4, 4, 1])`: 3 inputs → 4 hidden → 4 hidden → 1 output.

In [None]:
random.seed(42)
model = MLP(3, [4, 4, 1])

print("Architecture: MLP(3, [4, 4, 1])")
print("  Inputs: 3")
print("  Hidden 1: 4 neurons")
print("  Hidden 2: 4 neurons")
print("  Output: 1 neuron")
print(f"\nTotal parameters: {len(model.parameters())}")

## 7. Forward Pass

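Pass sample input `[2.0, 3.0, -1.0]`. Here the inputs are wrapped in `Value` so they appear as nodes in the computation graph and receive gradients; plain floats would also work, because `__mul__` wraps non-`Value` operands automatically.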

In [None]:
x = [Value(2.0), Value(3.0), Value(-1.0)]
out = model(x)

print("Input: [2.0, 3.0, -1.0]")
print(f"Output: {out}")
print(f"Output value (scalar): {out.data:.4f}")

## 8. Inspecting Parameters

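All parameters are `Value` objects. Before any backward pass their gradients are all 0.0; after `backward()` each one holds the derivative of the output with respect to that parameter. Here we show their structure.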

In [None]:
print("Parameters (first 10):")
for i, p in enumerate(model.parameters()[:10]):
    print(f"  {i}: {p}  (grad={p.grad})")

print(f"\n... and {len(model.parameters()) - 10} more")
print(f"\nAll parameters are Value objects: {all(isinstance(p, Value) for p in model.parameters())}")

In [None]:
# Optional: run backward to see gradients flow
out.backward()
print("After backward(), sample parameter gradients:")
for i, p in enumerate(model.parameters()[:5]):
    print(f"  param {i}: data={p.data:.4f}, grad={p.grad:.4f}")
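
One caveat follows from the `+=` in every `_backward` closure: gradients accumulate, so calling `backward()` a second time on the same graph adds to the existing values instead of replacing them. The sketch below demonstrates this with a pared-down stand-in for `Value`. The class name `Scalar` is hypothetical and supports only multiplication to keep the example short.

```python
class Scalar:
    """Pared-down stand-in for Value (illustration only): supports just `*`."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __mul__(self, other):
        out = Scalar(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # accumulates, like Value
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, visited = [], set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

w = Scalar(3.0)
x = Scalar(2.0)
y = w * x

y.backward()
first = w.grad         # 2.0, i.e. dy/dw = x
y.backward()
second = w.grad        # 4.0, the second pass added to the first

w.grad = 0.0           # reset gradients before running backward again
x.grad = 0.0
y.backward()
after_reset = w.grad   # 2.0 again
print(first, second, after_reset)
```

In this notebook that means re-running the `out.backward()` cell would change the printed gradients unless each parameter's `grad` is reset to 0.0 first.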

---

**Blog:** [Day 4 — Neuron & MLP](https://omkarray.com/llm-day4.html)

**Prev:** [Day 3 — The Full Value Class](llm_day03_full_value.ipynb) · **Next:** [Day 5 — Training the MLP](llm_day05_training.ipynb)