# Build Micrograd (Part 2)

Video Resource: [YouTube - Micrograd Part 2](https://www.youtube.com/watch?v=VMj-3S1tku0)

In the first half of the video, we built the basic structure of Micrograd and worked through backpropagation, first by hand and then automatically using a topological sort.

In this second part, we start from a clean slate so that the code contains only the parts needed for automatic differentiation.


In [None]:
import math
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
class Value:
    
  def __init__(self, data, _children=(), _op='', label = ''):
    """
    Create a Value node in the computation graph.
    Args:
      data: some numerical value
      _children: a tuple of the Values this one was computed from
      _op: symbol representing the operation (+, *, etc.)
      label: a string label for the value (name of the variable for visualization)
    Attributes set here:
      grad: gradient of the final output with respect to this value (starts at 0.0)
      _backward: function that propagates this node's gradient to its children
    """
    self.data = data
    self.grad = 0.0
    self._backward = lambda: None
    self._prev = set(_children)
    self._op = _op
    self.label = label

  def __repr__(self):
    return f"Value(data={self.data})"
  
  def __add__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data + other.data, (self, other), '+')

    def _backward():
      self.grad += 1.0 * out.grad
      other.grad += 1.0 * out.grad
    out._backward = _backward

    return out

  def __mul__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data * other.data, (self, other), '*')

    def _backward():
      self.grad += other.data * out.grad
      other.grad += self.data * out.grad
    out._backward = _backward

    return out
  
  def __pow__(self, other): # self ** other
    assert isinstance(other, (int, float)), "only supporting int/float powers for now"
    out = Value(self.data**other, (self,), f'**{other}')

    def _backward():
      self.grad += other * (self.data ** (other - 1)) * out.grad
    out._backward = _backward

    return out
  
  def __neg__(self): # -self
    return self * -1
  
  def __sub__(self, other): # self - other
    return self + (-other)

  # Fallback for __mul__: Python calls this when the left operand cannot handle the multiplication, e.g. 2 * Value(3.0)
  def __rmul__(self, other): # other * self
    return self * other

  def __truediv__(self, other): # self / other
    return self * other**-1

  def tanh(self):
    x = self.data
    t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)

    out = Value(t, (self,), 'tanh')

    def _backward():
      self.grad += (1 - t ** 2) * out.grad
    out._backward = _backward

    return out
  
  def exp(self):
    x = self.data
    out = Value(math.exp(x), (self,), 'exp')

    def _backward():
      self.grad += out.data * out.grad
    out._backward = _backward

    return out

  def backward(self):
    # Build the topological order
    topo = []
    visited = set()

    def build_topo(node):
      if node not in visited:
        visited.add(node)
        for child in node._prev:
          build_topo(child)
        topo.append(node)

    build_topo(self)

    # Seed the output gradient, then apply the chain rule in reverse topological order
    self.grad = 1.0
    for node in reversed(topo):
      node._backward()
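
Each `_backward` closure in the `Value` class hard-codes a local derivative: `other * self.data ** (other - 1)` for `**`, `1 - t ** 2` for `tanh`, and `out.data` for `exp`. As a rough sanity check, those formulas can be compared against a central finite-difference estimate using plain floats. The helper `numeric_derivative`, the test point `x = 0.7`, and the exponent `3` are illustrative assumptions, not part of the notebook.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary test point (assumption)

# (name, forward function, local derivative as written in the matching _backward)
rules = [
    ("x**3", lambda v: v ** 3,       lambda v: 3 * v ** (3 - 1)),
    ("tanh", lambda v: math.tanh(v), lambda v: 1 - math.tanh(v) ** 2),
    ("exp",  lambda v: math.exp(v),  lambda v: math.exp(v)),
]

max_error = 0.0
for name, forward, local_grad in rules:
    analytic = local_grad(x)
    numeric = numeric_derivative(forward, x)
    max_error = max(max_error, abs(analytic - numeric))
    print(f"{name:5s} analytic={analytic:.6f} numeric={numeric:.6f}")
```

If the two columns agree to several decimal places, the local derivative rules are very likely coded correctly.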

In [None]:
# Copy-pasted helper code for visualizing the chain of operations. It also shows why each Value stores its previous values (_prev) and the operation (_op) that produced it.

from graphviz import Digraph

def trace(root):
  # builds a set of all nodes and edges in a graph
  nodes, edges = set(), set()
  def build(v):
    if v not in nodes:
      nodes.add(v)
      for child in v._prev:
        edges.add((child, v))
        build(child)
  build(root)
  return nodes, edges

def draw_dot(root):
  dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'}) # LR = left to right
  
  nodes, edges = trace(root)
  for n in nodes:
    uid = str(id(n))
    # for any value in the graph, create a rectangular ('record') node for it
    dot.node(name = uid, label = "{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')
    if n._op:
      # if this value is a result of some operation, create an op node for it
      dot.node(name = uid + n._op, label = n._op)
      # and connect this node to it
      dot.edge(uid + n._op, uid)

  for n1, n2 in edges:
    # connect n1 to the op node of n2
    dot.edge(str(id(n1)), str(id(n2)) + n2._op)

  return dot

# HOW TO USE
# You can use the draw_dot function to visualize the computation graph of a Value object.
# Simply call draw_dot(d) where d is the Value object you want to visualize.


In [None]:
# Following is an equation for a simple neuron with two inputs
# inputs x1,x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
# weights w1,w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
# bias of the neuron
b = Value(6.8813735870195432, label='b')

# x1*w1 + x2*w2 + b
x1w1 = x1*w1; x1w1.label = 'x1*w1'
x2w2 = x2*w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
# apply the tanh activation function
o = n.tanh(); o.label = 'o'

o.backward()
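
The gradients that `o.backward()` fills in can be cross-checked by applying the chain rule by hand. The sketch below uses plain floats with the same inputs, weights, and bias as the cell above; the dictionary `grads` is just an illustrative container.

```python
import math

# Same numbers as the neuron above, as plain floats
x1, x2 = 2.0, 0.0
w1, w2 = -3.0, 1.0
b = 6.8813735870195432

n = x1 * w1 + x2 * w2 + b  # pre-activation
o = math.tanh(n)           # neuron output

do_dn = 1 - o ** 2         # local derivative of tanh at n

# Chain rule: do/dx1 = do/dn * dn/dx1 = do_dn * w1, and so on
grads = {
    "x1": w1 * do_dn,
    "w1": x1 * do_dn,
    "x2": w2 * do_dn,
    "w2": x2 * do_dn,
    "b": do_dn,
}

for name, g in grads.items():
    print(name, round(g, 4))
```

With this particular bias, `o` is close to 0.7071 and `do_dn` is close to 0.5, so the gradients come out at roughly -1.5 for `x1`, 1.0 for `w1`, 0.5 for `x2`, and 0.0 for `w2`.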

In [None]:
draw_dot(o)

In [None]:
# Let's rewrite the previous expression so that tanh is built from exp instead of the tanh method
# inputs x1,x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
# weights w1,w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
# bias of the neuron
b = Value(6.8813735870195432, label='b')

# x1*w1 + x2*w2 + b
x1w1 = x1*w1; x1w1.label = 'x1*w1'
x2w2 = x2*w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'

# ------------
e = (2*n).exp(); e.label = 'e'
o = (e - 1) / (e + 1)
# ------------
o.label = 'o'

o.backward()

draw_dot(o)
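
Breaking `tanh` into `exp`, subtraction, addition, and division should change neither the output nor the gradient. A minimal check with plain floats follows; the value of `n` is approximately the pre-activation of the neuron above, and the variable names are illustrative.

```python
import math

n = 0.8813735870195432  # approximately the pre-activation of the neuron above

# Forward pass, two ways
e = math.exp(2 * n)
o_from_exp = (e - 1) / (e + 1)
o_from_tanh = math.tanh(n)

# Backward pass through the decomposition:
#   o = (e - 1) / (e + 1)  ->  do/de = 2 / (e + 1)**2   (quotient rule)
#   e = exp(2n)            ->  de/dn = 2 * e
do_dn_from_exp = (2 / (e + 1) ** 2) * (2 * e)
do_dn_from_tanh = 1 - o_from_tanh ** 2

print(o_from_exp, o_from_tanh)
print(do_dn_from_exp, do_dn_from_tanh)
```

Both pairs should agree up to floating-point rounding, which is why the two computation graphs produce the same gradients on the inputs and weights.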

---
## Using PyTorch

Now we will do the same thing with a full-featured library, PyTorch. What we built above is a small learning exercise modeled on PyTorch. PyTorch is a powerful library for building and training neural networks, and it is widely used in production and in research.


In [None]:
# In PyTorch, torch.Tensor plays the role of the Value class we used above.
import torch

In [None]:
x1 = torch.Tensor([2.0]).double()                ; x1.requires_grad = True
x2 = torch.Tensor([0.0]).double()                ; x2.requires_grad = True
w1 = torch.Tensor([-3.0]).double()               ; w1.requires_grad = True
w2 = torch.Tensor([1.0]).double()                ; w2.requires_grad = True
b = torch.Tensor([6.8813735870195432]).double()  ; b.requires_grad = True
n = x1*w1 + x2*w2 + b
o = torch.tanh(n)

print(o.data.item())
o.backward()

print('---')
print('x2', x2.grad.item())
print('w2', w2.grad.item())
print('x1', x1.grad.item())
print('w1', w1.grad.item())

In [None]:
# Let's work on creating a neuron using micrograd

class Neuron:
	def __init__(self, nin):
		"""
		Initialize the weights and bias for the neuron.
		
		Args:
		nin (int): The number of input connections to the neuron.
		"""
		self.w = [Value(random.uniform(-1,1)) for _ in range(nin)] # weights
		self.b = Value(random.uniform(-1,1)) # bias
        
	def __call__(self, x):
		"""
		This function computes the output of the neuron for a given input. i.e. y = f(wx + b)
		"""
		# Compute the pre-activation Σ(wi*xi) + b. zip pairs each weight with its input (wi, xi),
		# and sum() starts from self.b instead of 0.0, so the bias is added without an extra operation.
		activation = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)

		# now on top of that we will have an activation function. here it will be tanh
		output = activation.tanh()

		return output
	

# now that we have created a single neuron using micrograd, we will create a layer
# a layer is a bunch of neurons working together
class Layer:
	def __init__(self, nin, nout):
		"""
		Initialize the layer with a given number of inputs and neurons.
		Each neuron takes nin inputs and produces one output,
		so a layer of nout neurons produces nout outputs.

		Args:
		nin (int): The number of input connections for each neuron in the layer.
		nout (int): The number of neurons in the layer (one output per neuron).
		"""
		self.neurons = [Neuron(nin) for _ in range(nout)] # create nout neurons each with nin inputs

	def __call__(self, x):
		"""
		This function computes the output of the layer for a given input by passing the input through each neuron in the layer. That means all inputs are given to each neuron to generate list of outputs.

		Args:
		x (list of float): The input values to the layer.
		
		Returns:
		list of Value: The output values from each neuron in the layer.
		"""
		out = [n(x) for n in self.neurons] # pass the input x through each neuron

		return out
	

# When we stack several Layers one after another, we get a Multi-Layer Perceptron (MLP)
# An MLP is just multiple layers stacked together to form a neural network
class MLP:
	def __init__(self, nin, nouts):
		"""
		Initialize the MLP with a given number of input connections and a list of output sizes for each layer.
		So we will create multiple layers where each layer has its own number of neurons.

		Args:
		nin (int): The number of input connections for the first layer.
		nouts (list of int): A list where each element represents the number of neurons in that layer.
		"""
		sizes = [nin] + nouts # sizes will be [input_size, layer1_size, layer2_size, ...]
		self.layers = [Layer(sizes[i], sizes[i+1]) for i in range(len(nouts))] # create layers based on sizes

	def __call__(self, x):
		"""
		This function computes the output of the MLP for a given input by passing the input through each layer in sequence.

		Args:
		x (list of float): The input values to the MLP. The length of x should match the input size of the first layer.

		Returns:
		list of Value: The output values from the final layer of the MLP.
		"""
		for layer in self.layers:
			x = layer(x) # pass the input through each layer sequentially
		return x

x = [2.0, 3.0] # input to the neuron
neuron = Neuron(len(x)) # create a neuron with inputs as the length of x
print(f"Output of neuron: {neuron(x)}")

layer = Layer(len(x), 3) # create a layer with 3 neurons, each receiving the same inputs
print(f"Output of layer: {layer(x)}")

mlp = MLP(len(x), [4,4,1]) # create an MLP with 2 hidden layers of 4 neurons each and 1 output neuron
print(f"Output of MLP: {mlp(x)}")


# What we have created is a simple neural network: a list of values is the input, and we run it through an MLP.
# The MLP class creates multiple layers using the Layer class, and each layer creates multiple neurons
# using the Neuron class. The inputs travel from layer to layer until we get the final output.
#
# For example, MLP(len(x), [4,4,1]) with the input [2.0, 3.0] means:
#   input layer: [2.0, 3.0]
#   -> first hidden layer: 4 neurons, each getting the same 2 inputs and producing 1 output (4 outputs in total)
#   -> second hidden layer: 4 neurons, each getting the 4 outputs of the previous layer (4 outputs in total)
#   -> output layer: 1 neuron getting the 4 outputs of the previous layer and producing 1 final output

---
## Understanding the Forward Pass Through the Neuron, Layer, and MLP Classes

### 1. Neuron Class

A **Neuron** is the basic unit. It takes `nin` inputs and produces 1 output.

**Initialization (`__init__`):**
```
Neuron(2) creates:
├── self.w = [Value(random), Value(random)]  ← 2 weights (one per input)
└── self.b = Value(random)                    ← 1 bias
```

**Forward Pass (`__call__`):**
```
Input: x = [2.0, 3.0]

Step 1: zip(weights, inputs) → [(w0, 2.0), (w1, 3.0)]
Step 2: Compute weighted sum starting from bias
        → b + (w0 * 2.0) + (w1 * 3.0)
Step 3: Apply tanh activation
        → tanh(weighted_sum) → output ∈ (-1, 1)
```
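
To make the three steps concrete, here is a minimal sketch that runs them on plain floats with fixed weights instead of random `Value` objects. The function `neuron_forward` and the numbers chosen for `weights` and `bias` are hypothetical and exist only for illustration.

```python
import math

def neuron_forward(weights, bias, inputs):
    """Plain-float version of Neuron.__call__ (hypothetical helper)."""
    # Steps 1 and 2: pair weights with inputs and sum, starting from the bias
    activation = sum((w * x for w, x in zip(weights, inputs)), bias)
    # Step 3: squash with tanh
    return math.tanh(activation)

weights = [0.5, -0.25]  # made-up weights (assumption)
bias = 0.1              # made-up bias (assumption)
x = [2.0, 3.0]

# weighted sum = 0.1 + (0.5 * 2.0) + (-0.25 * 3.0) = 0.35
out = neuron_forward(weights, bias, x)
print(out)
```

The printed value is `tanh(0.35)`, which lies strictly between -1 and 1.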

---

### 2. Layer Class

A **Layer** is a collection of neurons. Each neuron receives the **same input** but has **different weights**.

**Initialization (`__init__`):**
```
Layer(2, 3) creates:
└── self.neurons = [Neuron(2), Neuron(2), Neuron(2)]  ← 3 neurons, each with 2 inputs
```

**Forward Pass (`__call__`):**
```
Input: x = [2.0, 3.0]

All neurons receive the SAME input:
├── Neuron 0: [2.0, 3.0] → w0·x + b0 → tanh → out0
├── Neuron 1: [2.0, 3.0] → w1·x + b1 → tanh → out1
└── Neuron 2: [2.0, 3.0] → w2·x + b2 → tanh → out2

Output: [out0, out1, out2]  ← 3 values (one per neuron)
```
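
The same idea can be sketched with plain floats: every neuron in the layer sees the identical input list, and the layer returns one output per neuron. The helpers `make_neuron`, `neuron_forward`, and `layer_forward` are hypothetical stand-ins for the classes above, and the fixed seed is only there to make the run repeatable.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable (assumption)

def make_neuron(nin):
    """Return (weights, bias) with nin random weights, mirroring Neuron.__init__."""
    return [random.uniform(-1, 1) for _ in range(nin)], random.uniform(-1, 1)

def neuron_forward(params, inputs):
    weights, bias = params
    return math.tanh(sum((w * x for w, x in zip(weights, inputs)), bias))

def layer_forward(neurons, inputs):
    # Every neuron receives the SAME inputs; only the parameters differ
    return [neuron_forward(p, inputs) for p in neurons]

layer = [make_neuron(2) for _ in range(3)]  # like Layer(2, 3)
x = [2.0, 3.0]
outs = layer_forward(layer, x)

print(len(outs))  # 3
print(outs)
```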

---

### 3. MLP Class (Multi-Layer Perceptron)

An **MLP** stacks multiple layers. The output of one layer becomes the input to the next.

**Initialization (`__init__`):**
```
MLP(2, [4, 4, 1]) creates:

sizes = [2, 4, 4, 1]  ← [input_size, layer1, layer2, layer3]

self.layers:
├── Layer(2, 4)  ← 4 neurons, each takes 2 inputs  → outputs 4 values
├── Layer(4, 4)  ← 4 neurons, each takes 4 inputs  → outputs 4 values
└── Layer(4, 1)  ← 1 neuron,  takes 4 inputs       → outputs 1 value
```
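
The way `sizes` turns into layer shapes can be reproduced without building any neurons. The sketch below also counts the parameters, assuming each neuron holds one weight per input plus one bias as in the `Neuron` class; the names `layer_shapes`, `params_per_layer`, and `total_params` are illustrative.

```python
nin = 2
nouts = [4, 4, 1]

# Same construction as in MLP.__init__
sizes = [nin] + nouts
layer_shapes = [(sizes[i], sizes[i + 1]) for i in range(len(nouts))]

# Each neuron has one weight per input plus one bias
params_per_layer = [n_out * (n_in + 1) for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

print(layer_shapes)      # [(2, 4), (4, 4), (4, 1)]
print(params_per_layer)  # [12, 20, 5]
print(total_params)      # 37
```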

**Forward Pass (`__call__`):**
```
Input: x = [2.0, 3.0]

    [2.0, 3.0]         ← 2 inputs
         │
         ▼
    ┌─────────┐
    │ Layer 0 │        4 neurons × 2 weights each
    └─────────┘
         │
    [v, v, v, v]       ← 4 values
         │
         ▼
    ┌─────────┐
    │ Layer 1 │        4 neurons × 4 weights each
    └─────────┘
         │
    [v, v, v, v]       ← 4 values
         │
         ▼
    ┌─────────┐
    │ Layer 2 │        1 neuron × 4 weights
    └─────────┘
         │
       [v]             ← 1 final output
```
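
The diagram can be traced in code by recording the width of the data after each layer. This is a plain-float sketch, not the `MLP` class itself: `make_layer` and `layer_forward` are hypothetical helpers, and the seed is arbitrary.

```python
import math
import random

random.seed(1)  # arbitrary seed for repeatability (assumption)

def make_layer(nin, nout):
    """nout neurons, each a (weights, bias) pair with nin random weights."""
    return [([random.uniform(-1, 1) for _ in range(nin)], random.uniform(-1, 1))
            for _ in range(nout)]

def layer_forward(layer, inputs):
    return [math.tanh(sum((w * x for w, x in zip(weights, inputs)), bias))
            for weights, bias in layer]

sizes = [2, 4, 4, 1]
layers = [make_layer(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

x = [2.0, 3.0]
widths = [len(x)]
for layer in layers:
    x = layer_forward(layer, x)  # output of one layer is the input to the next
    widths.append(len(x))

print(widths)  # [2, 4, 4, 1]
print(x)       # a single value in a list
```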

---

### Key Takeaways

| Component | Input | Output | Purpose |
|-----------|-------|--------|---------|
| **Neuron** | n values | 1 value | Weighted sum + activation |
| **Layer** | n values | m values | Multiple neurons in parallel |
| **MLP** | n values | k values | Layers connected in sequence |

The beauty of using `Value` objects is that every operation builds a **computation graph**. When you call `.backward()` on the final output, gradients automatically propagate back through the entire network!
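
As a rough illustration of that claim, the sketch below builds one neuron from a stripped-down stand-in for `Value` (called `MiniValue` here, a hypothetical name, supporting only `+`, `*`, and `tanh`), calls `.backward()`, and compares the resulting weight gradients with finite-difference estimates. The weights, bias, and inputs are made-up numbers.

```python
import math

class MiniValue:
    """Stripped-down stand-in for Value: only +, * and tanh (hypothetical)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = MiniValue(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = MiniValue(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = MiniValue(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, visited = [], set()
        def build_topo(node):
            if node not in visited:
                visited.add(node)
                for child in node._prev:
                    build_topo(child)
                topo.append(node)
        build_topo(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

def neuron(w_vals, b_val, x_vals):
    """Build tanh(sum(w*x) + b) from fresh nodes; returns (output, weight nodes)."""
    ws = [MiniValue(w) for w in w_vals]
    act = MiniValue(b_val)
    for w, x in zip(ws, x_vals):
        act = act + w * MiniValue(x)
    return act.tanh(), ws

w_vals, b_val, x_vals = [0.5, -0.25], 0.1, [2.0, 3.0]  # made-up numbers

# Gradients from backpropagation
out, ws = neuron(w_vals, b_val, x_vals)
out.backward()
auto_grads = [w.grad for w in ws]

# Gradients from central finite differences, one weight at a time
h = 1e-6
numeric_grads = []
for i in range(len(w_vals)):
    up, down = list(w_vals), list(w_vals)
    up[i] += h
    down[i] -= h
    diff = neuron(up, b_val, x_vals)[0].data - neuron(down, b_val, x_vals)[0].data
    numeric_grads.append(diff / (2 * h))

print(auto_grads)
print(numeric_grads)
```

The two lists should match to several decimal places, which suggests that a single `.backward()` call on the output has reached every weight in the graph.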