# Micrograd and Building an MLP from Scratch
In this notebook, we implement a minimal automatic differentiation engine (micrograd) and build a simple multilayer perceptron (MLP) for demonstration. We'll define a `Value` class for autograd, and then build Neuron, Layer, and MLP classes to create a small neural network that can be trained on sample data.


## 1. Import Libraries and Setup

In [80]:
import math
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 2. Defining the `Value` Class
The `Value` class is the core of our autograd engine. It encapsulates a scalar value, its gradient, and pointers to its predecessor nodes (forming a computation graph).  
- **Attributes:**
  - `data`: the scalar numerical value.
  - `grad`: gradient (initialized to 0.0) that will be computed during backpropagation.
  - `_backward`: a closure that propagates this node's gradient to its inputs via the chain rule (a no-op by default, which is what leaf nodes keep).
  - `_prev`: the set of `Value` objects this value was computed from.
  - `_op`: a string indicating the operation that produced this node (useful for debugging).
  - `label`: an optional label for visualization or debugging.
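
To make the meaning of `grad` concrete: after backpropagation, each node's `grad` should equal the derivative of the final output with respect to that node's `data`. The sketch below uses plain floats and a central finite difference to estimate such derivatives for a small expression, and compares them with the chain-rule result. The names `f` and `numeric_partial` and the input values are hypothetical, introduced here only for illustration.

```python
import math

def f(a, b, c):
    # A small expression built from operations Value supports: tanh(a*b + c)
    return math.tanh(a * b + c)

def numeric_partial(func, args, i, h=1e-6):
    # Central-difference estimate of d func / d args[i] (hypothetical helper)
    hi = list(args); hi[i] += h
    lo = list(args); lo[i] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

args = (2.0, -3.0, 10.0)  # illustrative values for a, b, c
t = f(*args)

# Analytic gradients via the chain rule: d tanh(u)/du = 1 - tanh(u)**2, with u = a*b + c
analytic = [
    (1 - t**2) * args[1],  # d/da = (1 - t^2) * b
    (1 - t**2) * args[0],  # d/db = (1 - t^2) * a
    (1 - t**2) * 1.0,      # d/dc = (1 - t^2)
]
numeric = [numeric_partial(f, args, i) for i in range(3)]

for name, an, nu in zip("abc", analytic, numeric):
    print(name, an, nu)
```

The two columns should agree to several decimal places. The same kind of comparison can be used to spot-check the gradients an autograd engine produces.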


In [83]:
class Value:
  
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self.grad = 0.0
    self._backward = lambda: None
    self._prev = set(_children)
    self._op = _op
    self.label = label

  def __repr__(self):
    return f"Value(data={self.data})"

  # Overloaded arithmetic operations
  
  def __add__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data + other.data, (self, other), '+')
    
    def _backward():
      self.grad += 1.0 * out.grad
      other.grad += 1.0 * out.grad
    out._backward = _backward
    
    return out

  def __mul__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data * other.data, (self, other), '*')
    
    def _backward():
      self.grad += other.data * out.grad
      other.grad += self.data * out.grad
    out._backward = _backward
      
    return out
  
  def __pow__(self, other):
    assert isinstance(other, (int, float)), "only supporting int/float powers for now"
    out = Value(self.data**other, (self,), f'**{other}')

    def _backward():
      self.grad += other * (self.data ** (other - 1)) * out.grad  # power rule: d/dx x^n = n * x^(n-1)
    out._backward = _backward

    return out
  
  def __rmul__(self, other): # other * self
    return self * other

  def __truediv__(self, other): # self / other
    return self * other**-1

  def __neg__(self): # -self
    return self * -1

  def __sub__(self, other): # self - other
    return self + (-other)

  def __radd__(self, other): # other + self
    return self + other

  # Activation function (tanh)

  def tanh(self):
    x = self.data
    t = math.tanh(x)  # same value as (e^(2x) - 1)/(e^(2x) + 1), but does not overflow for large x
    out = Value(t, (self, ), 'tanh')
    
    def _backward():
      self.grad += (1 - t**2) * out.grad
    out._backward = _backward
    
    return out
  
  def exp(self):
    x = self.data
    out = Value(math.exp(x), (self, ), 'exp')
    
    def _backward():
      self.grad += out.data * out.grad  # d/dx exp(x) = exp(x); += so gradients accumulate when a node is reused
    out._backward = _backward
    
    return out
  
  
  def backward(self):

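    # Backpropagation: build a topological order of the graph, then call each
    # node's _backward from the output back toward the inputs.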
      
    topo = []
    visited = set()
    def build_topo(v):
      if v not in visited:
        visited.add(v)
        for child in v._prev:
          build_topo(child)
        topo.append(v)
    build_topo(self)
    
    self.grad = 1.0
    for node in reversed(topo):
      node._backward()
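
The ordering logic in `backward` can be looked at in isolation. The sketch below reproduces the same depth-first topological sort on a bare-bones graph. `Node` and `topo_order` are hypothetical stand-ins introduced for illustration: `Node` keeps only a name and a `_prev` set, with no data or gradients.

```python
class Node:
    # Hypothetical stand-in for Value: only a name and a predecessor set
    def __init__(self, name, prev=()):
        self.name = name
        self._prev = set(prev)

def topo_order(root):
    # Same depth-first construction as build_topo inside Value.backward
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)
    return topo

# Graph for d = (a * b) + a, where `a` feeds two different operations
a = Node("a")
b = Node("b")
ab = Node("a*b", (a, b))
d = Node("d", (ab, a))

order = [node.name for node in reversed(topo_order(d))]
print(order)  # 'd' comes first; 'a*b' always precedes 'a' and 'b'
```

In the reversed order, every node appears before the nodes it was computed from, so a node's gradient is fully accumulated before its own `_backward` would run. Because `a` is reached along two paths here, it would receive two gradient contributions, which is why each `_backward` uses `+=` rather than `=`.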

## 3. Defining the `Neuron` Class
A neuron represents a basic unit of a neural network. It:
- Holds weights for each input and a bias.
- Computes the weighted sum of inputs.
- Applies an activation function (here, tanh).
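
Written as a formula, a neuron computes `tanh(w · x + b)`. As a plain-float sketch of that forward computation (the weights, bias, and the helper name `neuron_forward` below are made up for illustration and are not part of the notebook's classes):

```python
import math

def neuron_forward(w, b, x):
    # Weighted sum of the inputs plus the bias, followed by tanh
    act = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(act)

w = [0.5, -0.25, 0.1]   # illustrative weights
b = 0.2                 # illustrative bias
x = [2.0, 3.0, -1.0]    # three inputs, like the notebook's sample input
out = neuron_forward(w, b, x)
print(out)  # tanh squashes the activation into the open interval (-1, 1)
```

The `Neuron` class performs the same arithmetic, but on `Value` objects, so that each multiplication and addition is recorded in the computation graph.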


In [86]:
class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1,1))

    def __call__(self,x):
        # w * x + b
        act = sum((wi*xi for wi, xi in zip(self.w,x)), self.b)
        out = act.tanh()
        return out

    def parameters(self):
        return self.w + [self.b]

## 4. Building a Layer and a Multilayer Perceptron (MLP)
We organize neurons into layers, and stack multiple layers to build an MLP.
- **Layer:** Contains a list of neurons.
- **MLP:** A sequence of layers.
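
As a quick sanity check on this structure, the number of trainable parameters follows directly from the layer sizes: each neuron holds one weight per input plus one bias. The sketch below counts them for a network with 3 inputs and layer sizes `[4, 4, 1]`. `count_parameters` is a hypothetical helper, not part of the classes defined in this notebook.

```python
def count_parameters(nin, nouts):
    # Each neuron in a layer with n_in inputs holds n_in weights + 1 bias
    sizes = [nin] + nouts
    return sum((sizes[i] + 1) * sizes[i + 1] for i in range(len(nouts)))

total = count_parameters(3, [4, 4, 1])
print(total)  # (3+1)*4 + (4+1)*4 + (4+1)*1 = 41
```

For an `MLP` built with the same sizes, `len(mlp.parameters())` should give the same number.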


In [89]:
class Layer:
    def __init__(self,nin,nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

## 5. Testing the MLP with Sample Data
We create an instance of our MLP, pass some input through it, and then train the network on a simple dataset.


In [118]:
x = [2.0, 3.0, -1.0] # inputs 
n = MLP(3, [4,4,1]) # defining the MLP as 3 inputs, 2 hidden layers of 4 neurons each and 1 output

### Define the inputs `xs` and the desired targets `ys` for training

In [94]:
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]
ypreds = [n(x) for x in xs]

### Training loop: forward pass, loss computation, backward pass, and weight update

In [108]:
for k in range(20):
    # forward pass
    ypreds = [n(x) for x in xs]
    loss = sum([(yout - ygt)**2 for ygt, yout in zip(ys, ypreds)])

    # backward pass (zero the gradients first, because _backward accumulates with +=)
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # update: gradient descent step with learning rate 0.05
    for p in n.parameters():
        p.data += -0.05 * p.grad

    print(k, loss.data)

0 0.005904483588430149
1 0.005815143652664447
2 0.005728426464753935
3 0.0056442187653247735
4 0.005562413713252813
5 0.005482910437438475
6 0.005405613625650509
7 0.005330433146906642
8 0.005257283704239599
9 0.005186084515030364
10 0.005116759016387317
11 0.005049234593308567
12 0.004983442327597307
13 0.00491931676570306
14 0.004856795703844502
15 0.004795819988930248
16 0.0047363333339383
17 0.004678282146542857
18 0.004621615369892357
19 0.004566284334544605
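
The three steps in the loop (forward pass, backward pass, update) do not depend on the autograd engine itself. As a minimal sketch, here is the same pattern on a one-parameter model with a hand-derived gradient. The model `tanh(w * x)`, the data, and the starting weight are assumptions chosen for illustration; only the step size 0.05 is taken from the loop above.

```python
import math

xs = [1.0, -2.0, 0.5]   # illustrative inputs
ys = [1.0, -1.0, 1.0]   # illustrative targets
w = 0.1                 # illustrative starting weight
lr = 0.05               # same step size as the notebook's loop

losses = []
for k in range(20):
    # forward pass: predictions and sum-of-squares loss
    preds = [math.tanh(w * x) for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys))
    losses.append(loss)

    # backward pass: dL/dw = sum of 2*(p - y) * (1 - p^2) * x
    grad = sum(2 * (p - y) * (1 - p ** 2) * x for p, y, x in zip(preds, ys, xs))

    # update: step against the gradient
    w += -lr * grad

print(losses[0], losses[-1])
```

With an autograd engine, the hand-written `grad` line is replaced by a call such as `loss.backward()`, which fills in the gradient of every parameter automatically.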


### Checking whether the predictions are close to the target values

In [110]:
ypreds

[Value(data=0.9698178288215796),
 Value(data=-0.9761187512832474),
 Value(data=-0.9679841747247905),
 Value(data=0.9546128457691276)]

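Repeating the forward pass, backward pass, and weight update moves the predictions steadily closer to the targets 1.0, -1.0, -1.0, 1.0. Over the 20 steps shown, the loss falls from about 0.0059 to about 0.0046, and each prediction ends up within about 0.05 of its target. The loss is already small at step 0, which suggests the training cell had been run on this network before the output above was captured.

In each iteration, backpropagation computes the gradient of the loss with respect to every weight and bias, and the update step nudges each parameter a small amount in the direction that reduces the loss.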