<a href="https://colab.research.google.com/github/ShaunakSen/Deep-Learning/blob/master/2_2_ReverseAD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 3: Reverse Mode Automatic Differentiation

Dynamic reverse-mode AD can be implemented by declaring a class that represents a value together with its child expressions, i.e. the results that are computed from that value. We've provided the implementation that was shown in the lecture slides as a basis below, but it's missing some parts that will make it useful.
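
To make the bookkeeping concrete, here is a sketch of the rule that the `grad` method in the class below applies. The bar notation $\bar{v}$ (the derivative of the final output $z$ with respect to a node $v$) and the name $\mathrm{children}(v)$ are our own shorthand, assumed here rather than taken from the lecture slides.

$$
\bar{v} = \frac{\partial z}{\partial v} = \sum_{c \in \mathrm{children}(v)} \frac{\partial c}{\partial v}\,\bar{c}, \qquad \bar{z} = 1
$$

Each `(weight, var)` pair stored in `children` holds the local derivative $\partial c/\partial v$ and the child node $c$, so `grad` only has to sum `weight * var.grad()` over those pairs.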

__Tasks:__

- Addition (`__add__`) is incomplete - can you finish it? 
- Can you also implement division (`__truediv__`), subtraction (`__sub__`) and power (`__pow__`)?

In [0]:
import math

class Var:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.grad_value = None

    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(weight * var.grad()
                                  for weight, var in self.children)
        return self.grad_value
    
    def __str__(self):
        return str(self.value)

    def __mul__(self, other):
        z = Var(self.value * other.value)
        self.children.append((other.value, z))
        other.children.append((self.value, z))
        return z

    def __add__(self, other):
        # d(x + y)/dx = 1 and d(x + y)/dy = 1
        z = Var(self.value + other.value)
        self.children.append((1.0, z))
        other.children.append((1.0, z))
        return z

    def __sub__(self, other):
        # d(x - y)/dx = 1 and d(x - y)/dy = -1
        z = Var(self.value - other.value)
        self.children.append((1.0, z))
        other.children.append((-1.0, z))
        return z

    def __truediv__(self, other):
        # d(x / y)/dx = 1/y and d(x / y)/dy = -x / y^2
        z = Var(self.value / other.value)
        self.children.append((1.0 / other.value, z))
        other.children.append((-self.value / other.value**2, z))
        return z

    def __pow__(self, other):
        # Base:     d(x^y)/dx = y * x^(y - 1)
        # Exponent: d(x^y)/dy = x^y * ln(x), which requires x > 0
        x = self.value
        y = other.value
        z = Var(x**y)
        self.children.append((y * x**(y - 1), z))
        other.children.append((x**y * math.log(x), z))
        return z

    


In [2]:
x = 0.5
y = 4.2

x - (x**y)*math.log(x)

0.5377137292802238

In [3]:
# Tests

Var(1) + Var(1) / Var(1) - Var(1)**Var(1)


<__main__.Var at 0x7fa97b6a6588>
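
The test above only confirms that the operators run without error. As a hedged sanity check, the sketch below compares the local derivative weights used in `__truediv__` and `__pow__` against central finite differences on plain floats. The helper `central_diff`, the step size `h` and the tolerance are our own assumptions, not part of the original lab.

```python
import math

def central_diff(f, a, h=1e-6):
    """Approximate f'(a) with a symmetric finite difference."""
    return (f(a + h) - f(a - h)) / (2 * h)

x, y = 0.5, 4.2  # same sample point used elsewhere in this notebook

# Each entry pairs the analytic weight (as written in the Var methods)
# with a numerical estimate of the same partial derivative.
weights = {
    "d(x/y)/dx": (1.0 / y, central_diff(lambda a: a / y, x)),
    "d(x/y)/dy": (-x / y**2, central_diff(lambda b: x / b, y)),
    "d(x**y)/dx": (y * x**(y - 1), central_diff(lambda a: a**y, x)),
    "d(x**y)/dy": (x**y * math.log(x), central_diff(lambda b: x**b, y)),
}

for name, (analytic, numeric) in weights.items():
    assert math.isclose(analytic, numeric, rel_tol=1e-5), name
    print(f"{name}: analytic={analytic:.8f} numeric={numeric:.8f}")
```

Finite differences are only approximate, so the comparison uses a relative tolerance rather than exact equality.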

## Implementing math functions

Just like when we were looking at Forward Mode AD, we also need to implement some core math functions. Here's the sine function for a `Var`:

In [0]:
def sin(x):
    z = Var(math.sin(x.value))
    x.children.append((math.cos(x.value), z))
    return z

__Task:__ can you implement the _cosine_ (`cos`), _tangent_ (`tan`), and _exponential_ (`exp`) functions in the code block below?

In [0]:
# TODO: implement additional math functions on Var objects

def cos(x):
    z = Var(math.cos(x.value))
    x.children.append((-math.sin(x.value), z))
    return z
    

def tan(x):
    # YOUR CODE HERE
    z = Var(math.tan(x.value))
    sec_x = 1.0/math.cos(x.value)
    x.children.append((sec_x**2, z))
    return z

def exp(x):
    # YOUR CODE HERE
    z = Var(math.exp(x.value))
    x.children.append((math.exp(x.value), z))
    return z

In [0]:
# Tests
assert cos(Var(0)).value == 1
assert tan(Var(0)).value == 0
assert exp(Var(0)).value == 1
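
The assertions above check function values only, not derivatives. The following sketch, which assumes a stripped-down copy of `Var` and two hypothetical helpers (`ad_derivative` and `numeric_derivative`), compares the reverse-mode gradient of each function at $x=0.5$ with a central finite difference.

```python
import math

class Var:
    """Minimal copy of the notebook's Var: enough for unary functions."""
    def __init__(self, value):
        self.value = value
        self.children = []
        self.grad_value = None

    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(w * v.grad() for w, v in self.children)
        return self.grad_value

def cos(x):
    z = Var(math.cos(x.value))
    x.children.append((-math.sin(x.value), z))
    return z

def tan(x):
    z = Var(math.tan(x.value))
    x.children.append((1.0 / math.cos(x.value) ** 2, z))
    return z

def exp(x):
    z = Var(math.exp(x.value))
    x.children.append((z.value, z))  # d/dx e^x = e^x
    return z

def ad_derivative(f, a):
    """Differentiate a one-argument Var function at a via reverse mode."""
    x = Var(a)
    z = f(x)
    z.grad_value = 1.0  # seed the output
    return x.grad()

def numeric_derivative(f, a, h=1e-6):
    """Central finite difference on plain floats."""
    return (f(a + h) - f(a - h)) / (2 * h)

results = {}
for var_fn, float_fn in [(cos, math.cos), (tan, math.tan), (exp, math.exp)]:
    ad = ad_derivative(var_fn, 0.5)
    fd = numeric_derivative(float_fn, 0.5)
    results[var_fn.__name__] = (ad, fd)
    assert math.isclose(ad, fd, rel_tol=1e-6), var_fn.__name__
```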


## Time to try it out

We're now in a position to try our implementation.

__Tasks:__ 

- Try running the following code to compute the value of the function $z=x\cdot y+sin(x)$ given $x=0.5$ and $y=4.2$, together with the derivative $\partial z/\partial x$ at that point. 
- Verify that the result is correct by hand-differentiating the function.

In [8]:
x = Var(0.5)
y = Var(4.2)
z = x * y + sin(x)
print('z:', z)

z.grad_value = 1.0  # Note that we have to 'seed' the gradient of z to 1 (i.e. ∂z/∂z=1) before computing grads
print('∂z/∂x:',x.grad())

z: 2.579425538604203
∂z/∂x: 5.077582561890373
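
Because `grad` caches its result in `grad_value`, the order of seeding matters. The sketch below, built on a minimal copy of `Var` with a hypothetical `build` helper that recreates the graph, illustrates what appears to happen when the seed is forgotten or applied too late.

```python
import math

class Var:
    """Minimal copy of the notebook's Var (only + and * are needed here)."""
    def __init__(self, value):
        self.value = value
        self.children = []
        self.grad_value = None

    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(w * v.grad() for w, v in self.children)
        return self.grad_value

    def __mul__(self, other):
        z = Var(self.value * other.value)
        self.children.append((other.value, z))
        other.children.append((self.value, z))
        return z

    def __add__(self, other):
        z = Var(self.value + other.value)
        self.children.append((1.0, z))
        other.children.append((1.0, z))
        return z

def sin(x):
    z = Var(math.sin(x.value))
    x.children.append((math.cos(x.value), z))
    return z

def build():
    """Rebuild the graph from scratch so no cached gradients are reused."""
    x, y = Var(0.5), Var(4.2)
    return x, y, x * y + sin(x)

# Case 1: grad() is called before the output is seeded.
x, y, z = build()
unseeded = x.grad()     # z has no children, so its gradient becomes sum([]) == 0
z.grad_value = 1.0      # seeding now is too late: x already cached its result
late_seeded = x.grad()

# Case 2: seed the output first, then ask for gradients.
x, y, z = build()
z.grad_value = 1.0
seeded = x.grad()

print(unseeded, late_seeded, seeded)
```

The same caching means that a `Var` reused across two different expressions keeps its first gradient, so creating fresh `Var` objects for each computation is the safer habit with this implementation.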




**Verification of the value of z**

The value of $z=x\cdot y+sin(x)$ given $x=0.5$ and $y=4.2$:

$z = 0.5\cdot4.2 + sin(0.5) = 2.57942554$

Thus this result is verified

**Verification of the derivative of z wrt x**


$z=x\cdot y+sin(x)$

$\partial z/\partial x = y + cos(x)$

Substituting x = 0.5 and y = 4.2:

$\partial z/\partial x = 4.2 + 0.877582562 = 5.07758256$

Thus, the result is verified

__Task:__ Now use the code block below to compute the derivative $\partial z/\partial y$ of the above expression (at the same point $x=0.5, y=4.2$ as above). Store the resultant gradient in the variable `dzdy`. Verify by hand that the result is correct.

In [9]:
# YOUR CODE HERE

dzdy = y.grad()

print('∂z/∂y:', dzdy)

∂z/∂y: 0.5


In [0]:
assert dzdy


**Verification of the derivative of z wrt y**


$z=x\cdot y+sin(x)$

$\partial z/\partial y = x$

Substituting x = 0.5:

$\partial z/\partial y =0.5$

Thus, the result is verified

## Differentiating Algorithms

Now, let's look at doing something wacky: differentiating an algorithm. For this example, we'll use an algorithm that is in a sense static, because the upper limit of the `for` loop is predetermined. However, it is not difficult to see that AD is much more general, and could even be applied to stochastic algorithms (say, if we replaced the upper limit of the loop below with `math.floor(random.random() * 10)`).
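
As a hedged illustration of the stochastic case, the sketch below draws the loop bound at random and then differentiates whichever computation actually ran. It uses a minimal copy of `Var`. The `run` helper, the fixed random seed and the range of the bound are our own assumptions for the example.

```python
import random

class Var:
    """Minimal copy of the notebook's Var (only + and * are needed here)."""
    def __init__(self, value):
        self.value = value
        self.children = []
        self.grad_value = None

    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(w * v.grad() for w, v in self.children)
        return self.grad_value

    def __mul__(self, other):
        z = Var(self.value * other.value)
        self.children.append((other.value, z))
        other.children.append((self.value, z))
        return z

    def __add__(self, other):
        z = Var(self.value + other.value)
        self.children.append((1.0, z))
        other.children.append((1.0, z))
        return z

def run(x_value, n):
    """Run the notebook's loop with an arbitrary upper limit n."""
    x = Var(x_value)
    z = Var(1.0)
    for i in range(n):
        z = (z + Var(float(i))) * x * x
    return x, z

random.seed(0)              # fixed seed so the sketch is repeatable
n = random.randrange(1, 5)  # random loop bound between 1 and 4

x, z = run(0.5, n)
z.grad_value = 1.0          # seed the output
ad_grad = x.grad()

# Cross-check with a central finite difference using the same n.
h = 1e-6
fd_grad = (run(0.5 + h, n)[1].value - run(0.5 - h, n)[1].value) / (2 * h)

print("n =", n, "AD:", ad_grad, "finite difference:", fd_grad)
assert abs(ad_grad - fd_grad) < 1e-6
```

The graph is rebuilt on every run, so the gradient always describes the loop that was executed for that particular draw of `n`.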

__Task:__ Consider the following algorithm and in the box below it manually compute the value of $z$ and the gradient $\partial z/\partial x$ at the end of execution.

In [11]:
x = Var(0.5)
z = Var(1)
for i in range(0,2):
    z = (z + Var(i)) * x * x
    
print(z)


0.3125


### Verifying the value of z

After 1st iteration:

$z := (z   + i)*x^{2}$

The initial value of z is 1 and i = 0. So,

The value of z after 1st iteration is: $1*0.5*0.5 = 0.25$

After 2nd iteration (i=1):

$z := (z   + 1)*x^{2}$

The value of z after 2nd iteration is: $(0.25+1)*0.5*0.5= 0.3125$

Thus the value of z is verified

### Verifying the derivative of z wrt x

After 1st iteration:

$z := (z   + 0)*x^{2}$

The initial value of z is 1. So,

The expression of z after 1st iteration is: $z := (1   + 0)*x^{2} = x^{2}$

After 2nd iteration:

$z := (z   + i)*x^{2}$

The value of z is $x^{2}$ and i = 1

So

$z := (x^{2} + 1)*x^{2}$

$z = x^{4} + x^{2}$

$\partial z/\partial x = 4x^{3} + 2x$

Substituting x = 0.5:

$\partial z/\partial x = 4\times0.5^{3} + 2\times0.5 = 1.5$


In [12]:
4*0.5**3 + 1

1.5

__Task:__ Now use the code block below to print out the gradient computed by our reverse AD by storing the result in a variable called `grad`. Does it match?

In [13]:
# YOUR CODE HERE

z.grad_value = 1.0


grad = x.grad()

print(grad)

1.5


Thus the value is verified

In [0]:
# Tests
assert grad


__Task:__ Finally, use the code block below to experiment and test the other math functions and methods you created.

Some other math functions to be tested are:

- tan
- power
- exp

Let us find $\partial z/\partial x$ for each of the following functions at **x=0.5 and y=4.2** 

1. $z = y^{x} + e^{x}$

  $\partial z/\partial x = y^{x} ln(y) + e^{x}$
  
  The result should be **4.589769365826166**

2. $z = tan(x) + cos(x)$

  $\partial z/\partial x = {sec}^{2}(x) - sin(x)$
  
  The result should be **0.8190208718053216**


In [31]:
# first function

x = Var(0.5)
y = Var(4.2)
  
z = y**x + exp(x)

z.grad_value = 1.0

print("z=", z)

print("∂z/∂x = {}".format(x.grad()))

z= 3.698111423892048
∂z/∂x = 4.589769365826166


In [23]:
# first function --- verify

4.2**0.5*math.log(4.2) + math.exp(0.5)

4.589769365826166

In [32]:
# second function

x = Var(0.5)
y = Var(4.2)

z = tan(x) + cos(x)

print("z=",z)

z.grad_value = 1.0

print("∂z/∂x = {}".format(x.grad()))

z= 1.4238850517341632
∂z/∂x = 0.8190208718053216


In [28]:
# second function --- verify

(1/math.cos(0.5))**2 - math.sin(0.5)

0.8190208718053216

**Thus, the functions are verified**
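
The experiments above exercise `tan`, `cos`, `exp` and `**`, but not the gradients of subtraction and division. A minimal sketch of such a check follows, using a cut-down copy of `Var` and a hypothetical test expression $z = x/y - x$ of our own choosing, for which $\partial z/\partial x = 1/y - 1$ and $\partial z/\partial y = -x/y^{2}$.

```python
import math

class Var:
    """Minimal copy of the notebook's Var (only - and / are needed here)."""
    def __init__(self, value):
        self.value = value
        self.children = []
        self.grad_value = None

    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(w * v.grad() for w, v in self.children)
        return self.grad_value

    def __sub__(self, other):
        z = Var(self.value - other.value)
        self.children.append((1.0, z))
        other.children.append((-1.0, z))
        return z

    def __truediv__(self, other):
        z = Var(self.value / other.value)
        self.children.append((1.0 / other.value, z))
        other.children.append((-self.value / other.value**2, z))
        return z

x, y = Var(0.5), Var(4.2)
z = x / y - x

z.grad_value = 1.0  # seed the output before asking for gradients
dzdx, dzdy = x.grad(), y.grad()

# Hand-derived values at x = 0.5, y = 4.2
expected_dzdx = 1.0 / 4.2 - 1.0
expected_dzdy = -0.5 / 4.2**2

assert math.isclose(dzdx, expected_dzdx)
assert math.isclose(dzdy, expected_dzdy)
print("∂z/∂x:", dzdx, "∂z/∂y:", dzdy)
```

Note that `x` appears twice in the expression, so its gradient is the sum of two contributions, one from the division and one from the subtraction.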