A Python implementation of automatic differentiation (autodiff) that shows how it works. Based on [this lecture by Alan Edelman](https://www.youtube.com/watch?v=rZS2LGiurKY&list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k&index=36).

Automatic differentiation can be implemented simply by replacing plain numbers with tuple objects that track both:
* the number $y$
* its derivative with respect to $x$, $\frac {dy} {dx}$

Then the derivative propagates *forward* through the calculation just as $y$ does. This representation also hints at why graph representations of a calculation are useful in machine learning.


The class below represents a number paired with its derivative. The first element of the tuple is $y$ and the second is $\frac {dy} {dx}$. `__mul__`, `__add__`, and `__truediv__` overload Python's `*`, `+`, and `/`. We can compute the resulting derivative because, for all three operations, it is a function of only the two operands and their derivatives.
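
Written out, with $u$ and $v$ standing for the two operands (names chosen here purely for illustration) and primes denoting derivatives with respect to $x$, the rules involved are the standard sum, product, and quotient rules:

$$
\begin{aligned}
(u + v)' &= u' + v' \\
(u v)' &= u' v + u v' \\
\left(\frac{u}{v}\right)' &= \frac{v u' - u v'}{v^2}
\end{aligned}
$$

Each right-hand side uses only $u$, $v$, $u'$, and $v'$, which is exactly what the two tuples hold.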


Notice that we generate a new `TupleNum` with each calculation rather than, for example, updating the values of an existing object. This is equivalent to creating a new node in the graph representation.

In [3]:
class TupleNum:
    """A number paired with its derivative: num = (y, dy/dx)."""

    def __init__(self, num):
        self.num = num

    def __mul__(self, other):
        # Product rule: (u*v)' = u'*v + u*v'
        x = self.num[0]*other.num[0]
        x_ = self.num[1]*other.num[0] + self.num[0]*other.num[1]
        return TupleNum((x, x_))

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        x = self.num[0] + other.num[0]
        x_ = self.num[1] + other.num[1]
        return TupleNum((x, x_))

    def __truediv__(self, other):
        # Quotient rule: (u/v)' = (v*u' - u*v') / v**2
        x = self.num[0]/other.num[0]
        x_ = (other.num[0]*self.num[1] - self.num[0]*other.num[1])/(other.num[0])**2
        return TupleNum((x, x_))

Here's where we see the usefulness of automatic differentiation. Below I implement the [Babylonian square root algorithm](https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method). How it works isn't important, except that it uses only the three operations we defined: `*`, `+`, and `/`. Notice that the function looks basically the same as it would if we only wanted $y$, yet we also get $\frac {dy} {dx}$, and we get it without ever using the power rule! As we numerically approximate $\sqrt x$, we also numerically approximate $\frac {d} {dx} \sqrt x$.
(With 10 iterations, $y$ and $\frac {dy} {dx}$ match the true values to roughly the precision of Python floats.)

In [4]:
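def babylonian_root(x, iters=10):
    S = x  # the number whose square root we want
    half = TupleNum((.5, 0))  # a constant, so its derivative is 0
    y = half * x  # initial guess: x/2
    for _ in range(iters):
        # Babylonian update: average the current guess y with S/y
        y = half*(y + S / y)
        # print(y.num[0])
        # print(y.num[1])
    return y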
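
To make explicit what the overloaded operators do on each pass of the loop, here is a minimal sketch of the same iteration written with plain floats, carrying the derivative by hand. The function name `babylonian_root_with_derivative` and the variables `q`, `dq`, and `dy` are made up for this illustration and are not part of the code above.

```python
import math


def babylonian_root_with_derivative(x, iters=10):
    """Approximate sqrt(x) and d/dx sqrt(x) using plain floats."""
    # Initial guess y = x/2, so its derivative with respect to x is 1/2.
    y, dy = 0.5 * x, 0.5
    for _ in range(iters):
        q = x / y                        # the quotient S / y, with S = x
        dq = (y * 1.0 - x * dy) / y**2   # quotient rule, using dx/dx = 1
        # Sum rule, then scaling by the constant 1/2 (whose derivative is 0).
        y, dy = 0.5 * (y + q), 0.5 * (dy + dq)
    return y, dy


y, dy = babylonian_root_with_derivative(10.0)
print(y, dy)
print(math.sqrt(10.0), 0.5 / math.sqrt(10.0))
```

The hand-written version needs a separate derivative line for every arithmetic step, which is the bookkeeping that `TupleNum` hides inside its operators.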

`x` is the starting node of the calculation. Its derivative is set to 1, because we are differentiating with respect to $x$ and $\frac {dx} {dx} = 1$. Constants, such as the half used in the Babylonian root algorithm, have their derivative set to 0.
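
The seeding rule can be sketched on a smaller function. The snippet below is a standalone illustration that uses bare tuples instead of `TupleNum` so it runs on its own; the helper names `variable`, `constant`, `add`, and `mul` are hypothetical and chosen only for this example.

```python
def variable(value):
    """Seed the input we differentiate with respect to: dx/dx = 1."""
    return (value, 1.0)


def constant(value):
    """Seed a constant: its derivative is 0."""
    return (value, 0.0)


def add(a, b):
    # Sum rule on (value, derivative) pairs.
    return (a[0] + b[0], a[1] + b[1])


def mul(a, b):
    # Product rule on (value, derivative) pairs.
    return (a[0] * b[0], a[1] * b[0] + a[0] * b[1])


# f(x) = x*x + 3*x, so f'(x) = 2*x + 3.
x = variable(2.0)
f = add(mul(x, x), mul(constant(3.0), x))
print(f)  # (10.0, 7.0)
```

Because the constant 3 is seeded with a derivative of 0, it contributes to the value but only scales the derivative of `x`.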

In [5]:
x = TupleNum((10, 1))
y = babylonian_root(x)

print("Babylonian approximation: ")
print("y: " + str(y.num[0]))
print("dy/dx: " + str(y.num[1]))

print("True values: ")
print("y: " + str(10**.5))
print("dy/dx: " + str(.5*10**-.5))

Babylonian approximation: 
y: 3.162277660168379
dy/dx: 0.15811388300841897
True values: 
y: 3.1622776601683795
dy/dx: 0.15811388300841897


As a postscript, I should add that, while this is great, forward propagation of derivatives wouldn't be feasible for deep learning. Deep learning needs derivatives with respect to many, many elements. Tuples of just two elements would no longer be enough: each tuple would have to hold $y$ as well as its derivative with respect to every element. For thousands and thousands of elements, that's not feasible.
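
To give a feel for that cost, here is a rough sketch of what the tuple would turn into with several inputs. The class `GradNum` and the helper `seed_inputs` are hypothetical names invented for this illustration; the sketch assumes each number simply carries a Python list with one derivative per input.

```python
class GradNum:
    """A value plus its derivative with respect to each of n inputs."""

    def __init__(self, value, grad):
        self.value = value
        self.grad = grad  # list of n partial derivatives

    def __add__(self, other):
        # Sum rule applied to every partial derivative.
        return GradNum(self.value + other.value,
                       [a + b for a, b in zip(self.grad, other.grad)])

    def __mul__(self, other):
        # Product rule applied to every partial derivative:
        # O(n) work for a single multiplication.
        return GradNum(self.value * other.value,
                       [a * other.value + self.value * b
                        for a, b in zip(self.grad, other.grad)])


def seed_inputs(values):
    """Give input i a one-hot gradient: d(x_i)/d(x_j) is 1 if i == j, else 0."""
    n = len(values)
    return [GradNum(v, [1.0 if i == j else 0.0 for j in range(n)])
            for i, v in enumerate(values)]


# f(x_0, x_1, x_2) = x_0^2 + x_1^2 + x_2^2
xs = seed_inputs([1.0, 2.0, 3.0])
total = xs[0] * xs[0]
for x in xs[1:]:
    total = total + x * x

print(total.value)  # 14.0
print(total.grad)   # [2.0, 4.0, 6.0]
```

Every intermediate result stores a list of length $n$, and every `+` or `*` loops over it, so both memory and time per operation grow with the number of inputs.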