# Backprop Examples

In this notebook we will look at three different ways of computing gradients for neural network learning, all of which rely on the same basic principle (backprop).

1. The first method is to code a function and its derivatives manually using the rules of backprop.
2. The second method is to use automatic differentiation through a Python tool called [autograd](https://github.com/HIPS/autograd).
3. The third method is to use [TensorFlow](https://www.tensorflow.org/), a powerful machine learning toolkit from Google, which also includes methods to automatically differentiate an expression defined through TensorFlow primitives.

### More about ``autograd``

[autograd](https://github.com/HIPS/autograd) is a Python package for **algorithmic differentiation** (also called automatic differentiation). It lets you automatically compute derivatives of functions written in (nearly) native Python and NumPy code, which makes it easy to get the gradients of the complex non-linear functions that appear in neural networks. Under the hood, it uses reverse-mode autodiff (which is just backpropagation). To use autograd, add it to your cs490 Anaconda environment with the following command:

`conda install -c conda-forge autograd`

### More about TensorFlow

[TensorFlow](https://www.tensorflow.org/) is a powerful machine learning package from Google.

You can define functions as "computation graphs" using TensorFlow operations, and it can automatically perform backprop (aka reverse-mode automatic differentiation) to compute the gradient for you.

You can use the Python package for TensorFlow in the same way as autograd: you just need to install it in your Anaconda cs490 environment. To do this, use the following command:

`conda install tensorflow`

If you have a GPU available on your machine, you can install the GPU-compatible version of TensorFlow. It runs tensor operations in parallel across the GPU's cores (a "tensor" is essentially a matrix generalized to any number of dimensions; a batch of color images, for example, is a 4-D tensor of pixel values) and will be much more efficient if you are training a deep network or have a lot of training data. To install the GPU version, use this command:

`conda install tensorflow-gpu` 

**NOTE: the GPU version is apparently not available for Mac**
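
After running the install commands, it can be handy to confirm that the active environment actually sees the packages. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this illustration, and it only checks that a package can be found, not that it imports or runs correctly.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located in this environment."""
    # find_spec returns None when no top-level module with that name is found
    return importlib.util.find_spec(package_name) is not None


# Package names taken from the install commands above
for name in ["numpy", "autograd", "tensorflow"]:
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If a package shows up as not found, the usual culprit is that the notebook is running in a different environment than the one where `conda install` was executed.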

# Backprop Example 1

We will see three different ways to compute the gradient of

$$f(x,y,z) = (2x + y)*z$$

where we can interpret this as a very simple two-node network. $2x + y$ is one perceptron at the first layer of our network (i.e., the hidden layer), where we compute a linear combination of the inputs $x$ and $y$. The next layer (i.e., the output layer) multiplies that result by $z$ to compute the final output.
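
As a reference point for the three methods, the gradient can also be worked out by hand. Writing $h = 2x + y$ for the hidden node (the name $h$ is introduced here for convenience), the chain rule gives:

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial h}\frac{\partial h}{\partial x} = z \cdot 2, \qquad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial h}\frac{\partial h}{\partial y} = z \cdot 1, \qquad \frac{\partial f}{\partial z} = h = 2x + y$$

So $\nabla f = (2z,\ z,\ 2x + y)$. At the point $(x, y, z) = (1, 2, 3)$ used in the code, this evaluates to $(6, 3, 4)$.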

### Manual backprop

In [2]:
import numpy as np

# Backprop example

# Compute f(x,y,z) = (2*x+y)*z
x = 1.
y = 2.
z = 3.

# Forward pass: push the input through the network to get the prediction f
h = 2.*x + y   # Node 1
f = h*z        # Node 2

# Backward pass: apply the chain rule from the output back to each input to get the gradient
d_f = 1        # df/df
d_h = z * d_f  # Node 2 input: df/dh = z
d_z = h * d_f  # Node 2 input: df/dz = h
d_x = 2 * d_h  # Node 1 input: dh/dx = 2
d_y = 1 * d_h  # Node 1 input: dh/dy = 1

grad = np.array([d_x, d_y, d_z])

print(f)
print(grad)

12.0
[6. 3. 4.]
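
A common sanity check for a hand-written backward pass is to compare it against a numerical estimate. The sketch below uses central finite differences in plain Python; the helper `numerical_gradient` and the step size `eps` are choices made for this illustration, not part of the original notebook.

```python
def f(x, y, z):
    return (2.0 * x + y) * z


def numerical_gradient(func, point, eps=1e-6):
    """Central-difference estimate of the gradient of func at point."""
    estimate = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += eps    # nudge coordinate i up
        minus[i] -= eps   # nudge coordinate i down
        estimate.append((func(*plus) - func(*minus)) / (2.0 * eps))
    return estimate


approx = numerical_gradient(f, [1.0, 2.0, 3.0])
backprop = [6.0, 3.0, 4.0]  # gradient from the manual backward pass above

# The two should agree up to floating-point error
for a, b in zip(approx, backprop):
    assert abs(a - b) < 1e-6
print(approx)
```

Finite differences need two function evaluations per input, so they are only practical as a check; backprop gets the whole gradient from one forward and one backward pass.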


### Autograd

In [3]:
import autograd.numpy as np  # Thinly wrapped version of numpy
from autograd import grad

# define our top level function  f
def f(args):
    x,y,z = args
    return (2*x + y)*z

f_grad = grad(f)  # magic: returns a function that computes the gradient of f :)
                  # This essentially just applies the chain rule to the operations in f
                  # and accumulates the derivatives, which is exactly what backprop is!

# Plug in values for our inputs
x = 1.
y = 2.
z = 3.

print(f([x, y, z]))
print(f_grad([x, y, z]))


12.0
[array(6.), array(3.), array(4.)]
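
To demystify the "magic" a little, here is a toy reverse-mode autodiff sketch in plain Python. It is not how autograd is actually implemented; the `Var` class and `backward` function are hypothetical names invented for this illustration, and only `+` and `*` on scalars are supported. Each operation records its inputs together with the local derivative, and the backward sweep multiplies those local derivatives along the chain.

```python
class Var:
    """A scalar that remembers which values it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):
        # Depth-first search that lists every node after its parents
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back toward the inputs (chain rule)
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y, z = Var(1.0), Var(2.0), Var(3.0)
f = (2 * x + y) * z
backward(f)
print(f.value, x.grad, y.grad, z.grad)  # 12.0 6.0 3.0 4.0
```

The values match the manual backward pass: the forward pass builds the graph, and the backward sweep is the same sequence of multiplications written out by hand in the manual example.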


### TensorFlow

In [14]:
import tensorflow as tf

# define our top level function  f
def f(args):
    x,y,z = args
    return (2*x + y)*z

# TensorFlow computes gradients with respect to its own tensor types,
# so we wrap our inputs in tf.Variable
x = tf.Variable(1.)
y = tf.Variable(2.)
z = tf.Variable(3.)

# To compute the gradient, we need to declare what is called a "gradient tape"
# This is essentially a "tape" on which TensorFlow records the operations executed
# in the forward pass, so it can play them back in reverse during backpropagation
with tf.GradientTape() as tape:
    y_hat = f([x, y, z])    # First we do our feedforward run to get our prediction from f

grad = tape.gradient(y_hat, [x, y, z]) # Another magical gradient computation!

tf.print(y_hat)
tf.print(grad)


12
[6, 3, 4]


# Backprop Example 2

Here is a slightly more complex example:

$$f(x) = 10*\exp(\sin(x)) + \cos^2(x)$$
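
Applying the chain rule to each term by hand (included here as a reference for the outputs in this example) gives:

$$f'(x) = 10\cos(x)\exp(\sin(x)) - 2\sin(x)\cos(x)$$

The first term comes from $10\exp(\sin(x))$, where the derivative of the outer $\exp$ is multiplied by the derivative of the inner $\sin$. The second comes from $\cos^2(x)$, where the derivative of the square, $2\cos(x)$, is multiplied by the derivative of $\cos$, which is $-\sin(x)$.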

### Manual backprop

In [5]:
import numpy as np

# Backprop example
# f(x) = 10*np.exp(np.sin(x)) + np.cos(x)**2

# Forward pass
x = 1000
a = np.sin(x)   # Node 1
b = np.cos(x)   # Node 2
c = b**2        # Node 3
d = np.exp(a)   # Node 4
f = 10*d + c    # Node 5 (final output)

# Backward pass
d_f = 1
d_d = 10 * d_f            # Node 5 input
d_c = 1  * d_f            # Node 5 input
d_a = np.exp(a) * d_d     # Node 4 input
d_b = 2*b * d_c           # Node 3 input
d_x =  np.cos(x) * d_a - np.sin(x) * d_b  # Node 2 and 1 input

print(f, d_x)

23.1780070835713 11.92692295225547
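
As with the first example, the backward pass can be cross-checked numerically. The sketch below uses only the standard `math` module; the function name `f_prime` and the step size `eps` are choices made for this illustration. It compares the closed-form derivative against a central finite difference at the same input.

```python
import math


def f(x):
    return 10.0 * math.exp(math.sin(x)) + math.cos(x) ** 2


def f_prime(x):
    # Chain rule: d/dx exp(sin x) = cos(x) * exp(sin x), d/dx cos^2 x = -2 sin(x) cos(x)
    return 10.0 * math.cos(x) * math.exp(math.sin(x)) - 2.0 * math.sin(x) * math.cos(x)


x = 1000.0
eps = 1e-4  # step size for the finite-difference estimate

analytic = f_prime(x)
numeric = (f(x + eps) - f(x - eps)) / (2.0 * eps)

# The two estimates should agree to several decimal places
assert abs(analytic - numeric) < 1e-5
print(analytic, numeric)
```

Agreement between the two numbers is good evidence that each local derivative in the backward pass was written down correctly.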


### Autograd

In [7]:
import autograd.numpy as np  # Thinly wrapped version of numpy
from autograd import grad

# define our output function (result of forward pass)
def f(args):
    x = args
    return 10*np.exp(np.sin(x)) + np.cos(x)**2

# get the gradient function
f_grad = grad(f)

# set input value (a float, since autograd differentiates with respect to floating-point inputs)
x = 1000.

print(f(x))
print(f_grad([x]))

23.1780070835713
[11.92692295]


### TensorFlow

In [16]:
import tensorflow as tf

# define our output function (result of forward pass)
def f(args):
    x = args
    return 10*tf.exp(tf.sin(x)) + tf.cos(x)**2  # Use the tf version of exp, sin, and cos

# TensorFlow computes gradients with respect to its own tensor types,
# so we wrap our input in tf.Variable
x = tf.Variable(1000.)

# To compute the gradient, we need to declare what is called a "gradient tape"
# This is essentially a "tape" on which TensorFlow records the operations executed
# in the forward pass, so it can play them back in reverse during backpropagation
with tf.GradientTape() as tape:
    y_hat = f([x])    # First we do our feedforward run to get our prediction from f

grad = tape.gradient(y_hat, [x]) # Another magical gradient computation!

tf.print(y_hat)
tf.print(grad)


[23.178009]
[11.9269238]
