In [None]:
%matplotlib inline

# Computational Graphs (CG), Calculus on CG, and AutoGrad

Useful links
  * https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
  * https://colah.github.io/posts/2015-08-Backprop/
  * https://www.deepideas.net/deep-learning-from-scratch-i-computational-graphs/
  
What is torch.autograd?
 * PyTorch's automatic differentiation engine
 * Will help you train your neural networks!
 * Breakdown
   * forward prop: pushes input data through the net
   * backward prop: what we have been studying!

Let's create two tensors with autograd turned on (requires_grad=True)

In [None]:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
print(a)
print(b)

Create another tensor, Q, based on a and b

$Q = 3a^3 - b^2$

In [None]:
Q = 3*a**3 - b**2
print(Q)

If we calculate the gradients of Q analytically, we get

$\frac{\partial Q}{\partial a} = 9a^2$

$\frac{\partial Q}{\partial b} = -2b$
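A quick way to sanity-check these two formulas without autograd is a finite-difference approximation. The sketch below is plain Python; the helper names `Q` and `central_difference` and the step size `h` are illustrative choices, not part of the lecture code.

```python
def Q(a, b):
    """Scalar version of Q = 3a^3 - b^2."""
    return 3 * a**3 - b**2

def central_difference(f, x, h=1e-5):
    """Approximate df/dx at x with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Same values as the entries of the tensors a and b
a_vals = [2.0, 3.0]
b_vals = [6.0, 4.0]

for a_i, b_i in zip(a_vals, b_vals):
    # Vary one input at a time while holding the other fixed
    dQ_da = central_difference(lambda t: Q(t, b_i), a_i)
    dQ_db = central_difference(lambda t: Q(a_i, t), b_i)
    print(f"a={a_i}: numeric {dQ_da:.4f}, analytic {9 * a_i**2:.4f}")
    print(f"b={b_i}: numeric {dQ_db:.4f}, analytic {-2 * b_i:.4f}")
```

The numeric and analytic columns should agree to several decimal places, which is the same comparison autograd lets us make exactly.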

We can call .backward() on our tensor $Q$.

Autograd calculates the gradients and stores them in each tensor's .grad attribute.

However, we need to explicitly pass a gradient argument to Q.backward() because $Q$ is a vector, not a scalar.

The gradient argument is a tensor of the same shape as $Q$, and it represents the gradient of $Q$ w.r.t. itself:

$\frac{dQ}{dQ}=1$
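One way to see what the gradient argument does: passing a vector of ones should give the same result as summing $Q$ to a scalar and calling .backward() with no argument. The snippet below is a small sketch of that comparison; the variable names `grad_a_vector` and `grad_b_vector` are made up for illustration.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Option 1: pass a vector of ones as the gradient argument
Q = 3 * a**3 - b**2
Q.backward(gradient=torch.ones_like(Q))
grad_a_vector = a.grad.clone()
grad_b_vector = b.grad.clone()

# .grad accumulates across backward calls, so reset it before the second run
a.grad.zero_()
b.grad.zero_()

# Option 2: reduce Q to a scalar first, then call backward() with no argument
Q = 3 * a**3 - b**2   # rebuild the graph; the first one was freed by backward()
Q.sum().backward()

print(torch.equal(grad_a_vector, a.grad))
print(torch.equal(grad_b_vector, b.grad))
```

Both comparisons should print True, since each element of $Q$ depends only on the matching elements of a and b.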

In [None]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

We now have those gradients sitting for us in a.grad and b.grad!

In [None]:
print(a.grad)
print(9*a**2)

print(b.grad)
print(-2*b)

Cool, right?

Let's do an example by hand vs. all that PyTorch magic. Let's do the Perceptron.

In [None]:
import numpy as np

# define a data set
X = np.array([
    [0, 1],  # data point @ (0,1)
    [1, 1],  # data point @ (1,1)
    [0, -1],  # data point @ (0,-1)
    [1, -1],  # data point @ (1,-1)
]) 

# DESIRED labels
y = np.array([[-1], 
              [-1], 
              [1],
              [1]
             ])

# initial weight vector
w = np.asarray([1.,-0.5])

# push our wrong data point through the network! (other points are at least on the right side of the 0 check!)
v = np.sum(X[1,:] * w)
# this is tanh(v); stored as y_hat so the desired labels in y are not overwritten
y_hat = (np.exp(v)-np.exp(-v)) / (np.exp(v)+np.exp(-v))
print( y_hat )
print( 'wanted it to be negative and a -1' )
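To check the claim that only the second data point is on the wrong side of zero, we can push all four points through at once. This is a small NumPy sketch; the names `labels`, `v_all`, `y_all`, and `correct` are illustrative and not part of the cell above.

```python
import numpy as np

X = np.array([[0, 1], [1, 1], [0, -1], [1, -1]])
labels = np.array([-1, -1, 1, 1])   # desired outputs, flattened to 1-D
w = np.array([1., -0.5])

v_all = X @ w                 # one pre-activation per data point
y_all = np.tanh(v_all)        # same as (e^v - e^-v) / (e^v + e^-v)

# A point is on the correct side when the output and the label share a sign
correct = np.sign(y_all) == labels
print(v_all)
print(correct)
```

Only the entry for the point at (1,1) should come out False, which is why that point is the one used for the update.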

Let's do its update manually with tensors and autograd

In [None]:
# our forward pass

# input and weight vector
w = torch.tensor([1., -0.5], requires_grad=True)
x = torch.tensor([1., 1.], requires_grad=True)

# our network formula
v = torch.dot(x,w)
v.retain_grad()
y = (torch.exp(v)-torch.exp(-v)) / (torch.exp(v)+torch.exp(-v))
y.retain_grad()
e = 0.5 * ( -1. - y )**2

print(y)
print(e)

Note: I put in .retain_grad() to keep the gradients of non-leaf nodes (those intermediate variables!). By default, autograd only populates .grad for leaf tensors.
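If you only want to inspect intermediate gradients, an alternative to .retain_grad() is torch.autograd.grad, which returns the requested gradients directly instead of writing them to .grad. The sketch below rebuilds the same small network, using torch.tanh as shorthand for the exponential formula; the names `de_dy`, `de_dv`, and `de_dw` are illustrative.

```python
import torch

w = torch.tensor([1., -0.5], requires_grad=True)
x = torch.tensor([1., 1.])

v = torch.dot(x, w)
y = torch.tanh(v)
e = 0.5 * (-1. - y)**2

# w was created directly by us, so it is a leaf; v and y were computed, so they are not
print(w.is_leaf, v.is_leaf, y.is_leaf)

# Ask for the gradients of e with respect to y, v and w in one call.
# Nothing is written to .grad; the results come back as a tuple.
de_dy, de_dv, de_dw = torch.autograd.grad(e, [y, v, w])
print(de_dy, de_dv, de_dw)
```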

Let's evaluate that backward pass

In [None]:
e.backward()
print('y grad')
print(y.grad)
print('v grad')
print(v.grad)
print('w grad')
print(w.grad)

Let's work out by hand the gradients we would have computed (see our formula from the backprop day)
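The backprop-day formula is not reproduced in this notebook, so the following is a reconstruction from the network defined above, assuming the desired output is $-1$. With $e = \frac{1}{2}(-1 - y)^2$, $y = \tanh(v)$, and $v = x \cdot w$, the chain rule gives

$\frac{\partial e}{\partial y} = (-1 - y)(-1) = 1 + y$

$\frac{\partial e}{\partial v} = \frac{\partial e}{\partial y}\frac{\partial y}{\partial v} = (1 + y)(1 - \tanh^2(v))$

$\frac{\partial e}{\partial w_i} = \frac{\partial e}{\partial v}\frac{\partial v}{\partial w_i} = (1 + y)(1 - \tanh^2(v))\, x_i$

Since $x = (1, 1)$ here, both components of the weight gradient are equal.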

In [None]:
print('y grad')
print( (-1 - y) * (-1) )
print('v grad')
print( (-1 - y) * (-1) * (1 - torch.tanh(v)**2) )
print('w grad')
print( [ (-1 - y) * (-1) * (1 - torch.tanh(v)**2) * (1.), (-1 - y) * (-1) * (1 - torch.tanh(v)**2) * (1.) ] )

You can take it from here, class!
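As one possible starting point, here is a sketch of a single gradient descent step written in plain Python, using the hand-derived gradient for a tanh unit with squared error. The learning rate `eta` and the helper `forward` are assumptions made for illustration; the lecture does not fix either.

```python
import math

def forward(w, x, target=-1.0):
    """Return (y, e) for a single tanh unit with squared error."""
    v = sum(w_i * x_i for w_i, x_i in zip(w, x))
    y = math.tanh(v)
    e = 0.5 * (target - y) ** 2
    return y, e

w = [1.0, -0.5]
x = [1.0, 1.0]
target = -1.0
eta = 0.1  # hypothetical learning rate, not from the lecture

y, e_before = forward(w, x, target)

# Hand-derived gradient: de/dw_i = (y - target) * (1 - y^2) * x_i, with y = tanh(v)
grad_w = [(y - target) * (1.0 - y**2) * x_i for x_i in x]

# One gradient descent step: move against the gradient
w_new = [w_i - eta * g_i for w_i, g_i in zip(w, grad_w)]
_, e_after = forward(w_new, x, target)

print(e_before, e_after)
```

The error after the step should be smaller than before. Repeating the step, or swapping the hand-derived gradient for w.grad from autograd, is a natural next experiment.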