# Karpathy's Introduction to Backpropagation in Neural Networks

I really want to cement my understanding of backpropagation in neural networks after going through Manning's [Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 3 - Backprop and Neural Networks](https://www.youtube.com/watch?v=X0Jw4kgaFlg). 

These notes follow Karpathy's [The spelled-out intro to neural networks and backpropagation: building micrograd](https://www.youtube.com/watch?v=VMj-3S1tku0) on YouTube.

NOTE: this notebook requires the following packages:
- `graphviz` system package for Linux, install with `sudo apt-get install graphviz`
- [`graphviz`](https://graphviz.readthedocs.io/en/stable/), the Python wrapper around the `graphviz` system package, install with `pip install graphviz`

In [None]:
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Intuition: Derivatives

#### Simple example: a function with a single input

Consider the simple function $f(x) = 3x^{2} - 4x + 5$.

In [None]:
# simple example: 1 input; a quadratic
def f(x):
    return 3*x**2 - 4*x + 5

In [None]:
f(3.0)

What does that look like?

In [None]:
xs = np.arange(-5, 5, 0.25)
ys = f(xs)
plt.plot(xs, ys)

In [None]:
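h = 0.0000001  # a tiny nudge to x

# evaluate the slope at a few points (uncomment one at a time)
# x = 3
# x = -3
x = 2/3  # the bottom of the parabola, where the slope is zero

# rise over run: how much f changes per unit change in x
(f(x + h) - f(x)) / h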
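As a sanity check, here is a small sketch comparing the finite-difference estimate with the analytic derivative $f'(x) = 6x - 4$, which comes from differentiating $3x^2 - 4x + 5$ by hand. The helper names `f_prime` and `numerical_slope` are my own, not from the video. Setting $6x - 4 = 0$ gives $x = 2/3$, which is why the slope vanishes there.

```python
def f(x):
    """The quadratic from above: f(x) = 3x^2 - 4x + 5."""
    return 3*x**2 - 4*x + 5

def f_prime(x):
    """Analytic derivative, worked out by hand: f'(x) = 6x - 4."""
    return 6*x - 4

def numerical_slope(fn, x, h=1e-6):
    """Forward-difference estimate of the slope of fn at x."""
    return (fn(x + h) - fn(x)) / h

for x in [3.0, -3.0, 2/3]:
    print(f"x={x:+.4f}  numeric={numerical_slope(f, x):+.5f}  analytic={f_prime(x):+.5f}")
```

The two columns should agree to about five decimal places. For a quadratic, the forward difference overshoots the true slope by exactly $3h$ (the $3h^2$ term divided by $h$), so shrinking `h` tightens the match until floating-point rounding takes over.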

#### A more complex example: 3 inputs

Now see how the output of $d(a, b, c) = a \times b + c$ changes when each input is increased individually by a small amount $h$. The change in $d$ divided by $h$ approximates the partial derivative of $d$ with respect to that input.

In [None]:
# a more complex example: 3 inputs!
a = 2.0
b = -3.0
c = 10.0
d = a*b + c
print(d)

In [None]:
h = 0.0001

# inputs, fixed
a = 2.0
b = -3.0
c = 10.0

d1 = a*b + c

# nudge exactly one input by h (uncomment a or b to test those instead)
# a += h
# b += h
c += h

d2 = a*b + c

print('d1: ', d1)
print('d2: ', d2)
print('slope: ', (d2 - d1) / h)
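For $d = a b + c$ the partial derivatives can be read straight off the expression: $\partial d/\partial a = b$, $\partial d/\partial b = a$, and $\partial d/\partial c = 1$. The sketch below bumps each input in turn and compares the finite-difference slope with those values. `d_of` and `slope_wrt` are hypothetical helper names I chose for illustration.

```python
def d_of(a, b, c):
    """d = a*b + c, the 3-input expression from above."""
    return a*b + c

def slope_wrt(name, inputs, h=1e-4):
    """Forward-difference slope of d with respect to one named input."""
    bumped = dict(inputs)
    bumped[name] += h  # nudge only this input, keep the others fixed
    return (d_of(**bumped) - d_of(**inputs)) / h

inputs = {'a': 2.0, 'b': -3.0, 'c': 10.0}
# dd/da = b, dd/db = a, dd/dc = 1
analytic = {'a': inputs['b'], 'b': inputs['a'], 'c': 1.0}

for name in ['a', 'b', 'c']:
    print(f"dd/d{name}: numeric={slope_wrt(name, inputs):+.4f}  analytic={analytic[name]:+.4f}")
```

Because $d$ is linear in each individual input, the forward difference here is exact up to floating-point rounding, unlike the quadratic case.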

----

In [None]:
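class Value:
    """Wraps a scalar and remembers how it was produced, forming an expression graph."""
    
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data              # the scalar value itself
        self._prev = set(_children)   # the Values this one was computed from
        self._op = _op                # the operation that produced it ('+', '*', or '' for a leaf)
        self.label = label            # a name used when drawing the graph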
        
    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        """ + operator responder! """
        out = Value(self.data + other.data, (self, other), '+')
        return out

    def __mul__(self, other):
        """ * operator responder! """
        out = Value(self.data * other.data, (self, other), '*')
        return out

In [None]:
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')

e = a*b; e.label = 'e'

d = e + c; d.label = 'd'

f = Value(-2.0, label='f')

L = d * f; L.label = 'L'

print(L)
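Before building any automatic machinery, the gradient of `L` with respect to each leaf can be worked out by hand with the chain rule. Here $L = d \cdot f$, $d = e + c$ and $e = a \cdot b$, so $\partial L/\partial d = f$ and $\partial L/\partial f = d$; a `+` node passes its incoming gradient through unchanged, and a `*` node multiplies it by the *other* input. This is a sketch using plain floats rather than `Value` objects (an assumption made to keep it self-contained), and `forward`, `manual_grads` and `leaves` are hypothetical names. Each hand-derived gradient is checked against a finite difference.

```python
def forward(a, b, c, f):
    """Recompute L = (a*b + c) * f from plain floats."""
    return (a * b + c) * f

def manual_grads(a, b, c, f):
    """Gradients of L with respect to each leaf, derived by hand with the chain rule."""
    e = a * b
    d = e + c
    dL_dd = f            # L = d*f   -> dL/dd = f
    dL_df = d            # L = d*f   -> dL/df = d
    dL_de = dL_dd * 1.0  # d = e + c -> '+' passes the gradient through unchanged
    dL_dc = dL_dd * 1.0
    dL_da = dL_de * b    # e = a*b   -> de/da = b
    dL_db = dL_de * a    # e = a*b   -> de/db = a
    return {'a': dL_da, 'b': dL_db, 'c': dL_dc, 'f': dL_df}

leaves = {'a': 2.0, 'b': -3.0, 'c': 10.0, 'f': -2.0}
grads = manual_grads(**leaves)

h = 1e-4
for name, grad in grads.items():
    bumped = dict(leaves)
    bumped[name] += h  # nudge one leaf at a time
    numeric = (forward(**bumped) - forward(**leaves)) / h
    print(f"dL/d{name}: chain rule={grad:+.4f}  numeric={numeric:+.4f}")
```

Working backward from `L` like this, reusing each node's gradient to compute its children's, is exactly the pattern backpropagation automates.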

In [None]:
from graphviz import Digraph

def trace(root):
    # builds a set of all nodes and edges in a graph
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v._prev:
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

def draw_dot(root):
    dot = Digraph(
        format='svg',
        graph_attr={'rankdir': 'LR'}
    )
    
    nodes, edges = trace(root)
    for n in nodes:
        uid = str(id(n))
        # for any value in the graph, create a rectangular ('record') node for it
        dot.node(name = uid, label = "{ %s | data %.4f }" % (n.label, n.data, ), shape='record') 
        if n._op:
            # if this value is a result of some operation, create an op node for it
            dot.node(name=uid + n._op, label = n._op)
            # and connect this node to it!
            dot.edge(uid + n._op, uid)
            
    for n1, n2 in edges:
        # connect n1 to the op node of n2
        dot.edge(str(id(n1)), str(id(n2)) + n2._op)
    
    return dot

In [None]:
draw_dot(L)
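If the Graphviz system package isn't installed, `draw_dot` can't render anything. A plain-text sketch of the same depth-first walk is handy as a fallback. It repeats a minimal copy of `Value` so the cell runs on its own; `graph_lines` and `build_example` are hypothetical helper names. Nodes reachable by more than one path would be printed once per path, which doesn't happen in this small tree.

```python
class Value:
    """Minimal copy of the Value class above."""
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')

def graph_lines(root, indent=0):
    """Depth-first text rendering: each node, followed by the operands that produced it."""
    op = f"  [{root._op}]" if root._op else ""
    lines = ["  " * indent + f"{root.label} = {root.data:.4f}{op}"]
    # _prev is a set, so sort children by label to get a stable order
    for child in sorted(root._prev, key=lambda v: v.label):
        lines.extend(graph_lines(child, indent + 1))
    return lines

def build_example():
    """Rebuild L = (a*b + c) * f with the same labels as above."""
    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10.0, label='c')
    e = a*b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0, label='f')
    L = d * f; L.label = 'L'
    return L

print("\n".join(graph_lines(build_example())))
```

Since `graph_lines` only reads `data`, `label`, `_op` and `_prev`, calling it on the `L` built earlier in the notebook should print the same tree.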