<a href="https://colab.research.google.com/github/ShaunakSen/Deep-Learning/blob/master/Reverse_AD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Reverse Mode Auto Diff

```
# Program A
x = ?
y = ?
a = x * y
b = sin(x)
z = a + b
```

![](https://rufflewind.com/img/reverse-mode-automatic-differentiation-graph.png)

```
gz = ?
gb = gz
ga = gz
gy = x * ga
gx = y * ga + cos(x) * gb
```

Going back to the equations (R1), we see that if we substitute s=z, we would obtain the gradient in the last two equations. In the program, this is equivalent to setting gz = 1 since gz is just ∂s/∂z. We no longer need to run the program twice! This is reverse-mode automatic differentiation.
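
As a concrete check, the adjoint program can be transcribed into Python and run once with `gz = 1` to obtain both partial derivatives. This is only a minimal sketch: the function name `program_a_gradients` and the sample inputs `0.5` and `4.2` are assumptions chosen for illustration, not part of the original program.

```python
import math

def program_a_gradients(x, y):
    """Run Program A forward, then its adjoint program backward.

    Hypothetical helper for illustration; returns (z, gx, gy).
    """
    # Forward pass (Program A)
    a = x * y
    b = math.sin(x)
    z = a + b

    # Backward pass (adjoint program), seeded with gz = dz/dz = 1
    gz = 1.0
    gb = gz
    ga = gz
    gy = x * ga
    gx = y * ga + math.cos(x) * gb
    return z, gx, gy

z, gx, gy = program_a_gradients(0.5, 4.2)
print(gx, gy)  # gx should equal y + cos(x), gy should equal x
```

A single backward sweep yields both `gx` and `gy`, which is the point made above: the program does not have to be run once per input variable.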

### A simple implementation in Python


One way is to parse the original program and then generate an adjoint program that calculates the derivatives. This is usually quite complicated to implement, and its difficulty varies significantly depending on the complexity of the host language. Nonetheless, this may be worthwhile if efficiency is critical, as there are more opportunities to perform optimizations in this static approach.

A simpler way is to do this dynamically: construct a full graph that represents our original expression as the program runs. The goal is to get something akin to the dependency graph we drew earlier:

The “roots” of the graph are the independent variables x and y, which could also be thought of as nullary operations. Constructing these nodes is a simple matter of creating an object on the heap:



In [0]:
import math

In [22]:
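class Var:
  def __init__(self, value):
    self.value = value
    # (weight, child) pairs, where weight = ∂child/∂self
    self.children = []
    # Cached ∂output/∂self; None until computed or seeded by the caller
    self.grad_value = None

  def grad(self):
    # Chain rule: sum the children's gradients, weighted by the local derivatives
    if self.grad_value is None:
      self.grad_value = sum(weight * var.grad() for weight, var in self.children)
    return self.grad_value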
  
  def __mul__(self, other):
    z = Var(self.value * other.value)
    self.children.append((other.value, z))
    other.children.append((self.value, z))
    return z
  
  def __add__(self, other):
    z = Var(self.value + other.value)
    self.children.append((1.0, z))
    other.children.append((1.0, z))
    return z
  
  def __truediv__(self,other):
    z = Var(self.value/other.value)
    self.children.append((1.0/other.value, z))
    other.children.append((-1.0*self.value*other.value**-2, z))
    return z
  
  def sin(self):
    z = Var(math.sin(self.value))
    self.children.append((math.cos(self.value), z))
    return z
  
  def cos(self):
    z = Var(math.cos(self.value))
    self.children.append((-math.sin(self.value), z))
    return z
  
  def __pow__(self, other):
    # z = x**y, where both x and y are graph nodes:
    #   ∂z/∂x = y * x**(y-1)    (power rule)
    #   ∂z/∂y = x**y * ln(x)    (exponential rule; requires x > 0)
    x = self.value
    y = other.value
    z = Var(x**y)
    self.children.append((y * x**(y - 1), z))
    other.children.append((x**y * math.log(x), z))
    return z
  
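  def tan(self):
    # d/dx tan(x) = 1 / cos(x)^2
    z = Var(math.tan(self.value))
    self.children.append((1.0 / math.cos(self.value)**2, z))
    return z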
  
x = Var(0.5)
y = Var(4.2)
  
a = x*y + x**y

a.grad_value = 1.0

print("∂a/∂y = {}".format(y.grad()))

∂a/∂y = 0.46228627071977624
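
The same graph idea can be applied to Program A from the start of this section. The sketch below is self-contained, so it re-declares a stripped-down node class. The names `Node` and the free function `sin` are assumptions introduced here to avoid clashing with `Var`; only multiplication, addition, and sine are supported.

```python
import math

class Node:
    """Stripped-down graph node (hypothetical; mirrors the Var idea above)."""
    def __init__(self, value):
        self.value = value
        self.children = []      # (local derivative, child node) pairs
        self.grad_value = None  # cached d(output)/d(self)

    def grad(self):
        # Chain rule: sum over children of local derivative * child gradient
        if self.grad_value is None:
            self.grad_value = sum(w * c.grad() for w, c in self.children)
        return self.grad_value

    def __mul__(self, other):
        z = Node(self.value * other.value)
        self.children.append((other.value, z))
        other.children.append((self.value, z))
        return z

    def __add__(self, other):
        z = Node(self.value + other.value)
        self.children.append((1.0, z))
        other.children.append((1.0, z))
        return z

def sin(n):
    # d/dx sin(x) = cos(x)
    z = Node(math.sin(n.value))
    n.children.append((math.cos(n.value), z))
    return z

x = Node(0.5)
y = Node(4.2)
z = x * y + sin(x)   # Program A: a = x * y, b = sin(x), z = a + b
z.grad_value = 1.0   # seed: dz/dz = 1

print(x.grad(), y.grad())  # expected to match y + cos(x) and x
```

Because each node caches its gradient after the first call, a graph built this way should be treated as single-use: to differentiate a different output, rebuild the graph from fresh nodes.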


In [21]:
x = 0.5
y = 4.2

x + (x**y)*math.log(x)

0.46228627071977624
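
The closed-form check above can be complemented by a numerical one that needs no hand-derived formula. The following sketch approximates ∂a/∂y with a central finite difference. The helper names `a_of` and `central_difference` and the step size `h = 1e-6` are assumptions made for illustration.

```python
import math

def a_of(x, y):
    # Same expression as in the notebook: a = x*y + x**y
    return x * y + x ** y

def central_difference(f, x, y, h=1e-6):
    """Approximate the partial derivative of f with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x, y = 0.5, 4.2
numeric = central_difference(a_of, x, y)
analytic = x + (x ** y) * math.log(x)

# The two values should agree up to the finite-difference error
print(numeric, analytic)
```

Finite differences are only approximate and cost extra function evaluations per input variable, so they are better suited to testing a reverse-mode implementation than to replacing it.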