### Chain Rule

$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial g} \frac{ \partial g } {\partial x} $

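The quantity we usually want is the gradient of the loss with respect to an input (or weight):

$\frac{\partial loss}{\partial x}$

It answers the question: if $x$ changes slightly, how fast does the loss (error) change?

To get this for the final loss, we chain together all of the intermediate (local) derivatives using the chain rule. Doing this systematically, from the loss back toward the inputs, is backpropagation.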
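A minimal numeric sanity check of the chain rule, using a toy composition chosen only for illustration ($g(x) = 3x$ and $f(g) = g^2$, neither taken from the notes): the product of the two local derivatives should agree with a finite-difference estimate of $\frac{\partial f}{\partial x}$.

```python
def g(x):
    # inner function (hypothetical example): g(x) = 3x
    return 3.0 * x


def f(g_val):
    # outer function (hypothetical example): f(g) = g^2
    return g_val ** 2


def chain_rule_grad(x):
    # df/dg = 2g and dg/dx = 3; the chain rule multiplies them
    df_dg = 2.0 * g(x)
    dg_dx = 3.0
    return df_dg * dg_dx


def finite_difference_grad(x, eps=1e-6):
    # central-difference estimate of d f(g(x)) / dx
    return (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)


print(chain_rule_grad(2.0))  # 36.0
print(finite_difference_grad(2.0))  # approximately 36
```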

### Example


A single neuron takes two inputs, $x$ and $y$, and outputs a value $z$.

We want to find

$\frac{\partial loss}{\partial x}$ and $\frac{\partial loss}{\partial y}$

These tell us how fast the loss changes as $x$ and $y$ change; ideally (at a minimum of the loss) they are 0.


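First, calculate the local gradients

$\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$

which describe how $z$ changes with $x$ and with $y$.

Then, assume we are given $\frac{\partial loss}{\partial z}$ (the upstream gradient passed back from later in the network). By the chain rule: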

$ \frac{\partial loss}{\partial x} = \frac{\partial loss}{\partial z} \frac{\partial z}{\partial x} $ 

$ \frac{\partial loss}{\partial y} = \frac{\partial loss}{\partial z} \frac{\partial z}{\partial y} $
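The two-step recipe above (local gradients first, then multiply by the upstream gradient) can be sketched as a small gate object. `MultiplyGate` is a hypothetical name for illustration, and the sketch assumes the neuron multiplies its inputs ($z = x * y$), as in the concrete example that follows.

```python
class MultiplyGate:
    """Hypothetical gate computing z = x * y (illustration only)."""

    def forward(self, x, y):
        # cache the inputs, because the local gradients depend on them
        self.x, self.y = x, y
        return x * y

    def backward(self, dloss_dz):
        # local gradients of z = x * y
        dz_dx, dz_dy = self.y, self.x
        # chain rule: each local gradient times the upstream gradient
        return dloss_dz * dz_dx, dloss_dz * dz_dy


gate = MultiplyGate()
z = gate.forward(4.0, -1.0)
dloss_dx, dloss_dy = gate.backward(2.0)
print(z, dloss_dx, dloss_dy)  # -4.0 -2.0 8.0
```

The gate only needs its cached inputs and the upstream gradient; it never needs to know what the rest of the network or the loss looks like.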

#### More concrete example

$x = 2$, $y = 3$

Neuron function:

$z = f(x, y) = x * y$

$\frac{\partial z}{\partial x} = y = 3$

$\frac{\partial z}{\partial y} = x = 2$

Assume $\frac{\partial loss}{\partial z} = 5$

$ \frac{\partial loss}{\partial x} = 3 * 5 = 15 $

$ \frac{\partial loss}{\partial y} = 2 * 5 = 10 $
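The same numbers can be checked with PyTorch autograd. In this minimal sketch there is no real loss: `backward` is called on $z$ with an explicit upstream gradient of 5 standing in for the assumed $\frac{\partial loss}{\partial z}$.

```python
import torch

# leaf tensors for the two inputs of the neuron
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

z = x * y  # neuron function z = f(x, y) = x * y

# pretend the rest of the network passed back dloss/dz = 5
z.backward(gradient=torch.tensor(5.0))

print(x.grad.item(), y.grad.item())  # 15.0 10.0
```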

### Computational Graph

$ loss = (xw - y) ^ 2 = (\hat{y} - y) ^ 2 $

$ \hat{y} = xw $

The graph has three gates:

1. $\hat{y} = xw$ (multiply)
2. $s = \hat{y} - y$ (subtract)
3. $loss = s^2$ (square)

#### Forward

x = 1, y = 2, w = 1

$\hat{y} = xw = 1 * 1 = 1$

$s = \hat{y} - y = 1 - 2 = -1$

$loss = s^2 = (-1)^2 = 1$ (the loss value itself matters less here than its gradient with respect to $w$, which training drives toward 0)

#### Backward

$\frac{\partial s^2}{\partial s} = 2s$ (derivative of $s^2$ with respect to $s$)

$\frac{\partial loss}{\partial s} = 2s = 2 * (-1) = -2$

Since $s = \hat{y} - y$:

$\frac{\partial s}{\partial \hat{y}} = \frac{\partial (\hat{y} - y)}{\partial \hat{y}} = 1$

$\frac{\partial loss}{\partial \hat{y}} = \frac{\partial loss}{\partial s}\frac{\partial s}{\partial \hat{y}} = -2 * 1 = -2$

Since $\hat{y} = xw$:

$\frac{\partial \hat{y}}{\partial w} = \frac{\partial (xw)}{\partial w} = x$

$\frac{\partial loss}{\partial w} = \frac{\partial loss}{\partial \hat{y}}\frac{\partial \hat{y}}{\partial w} = -2 * x = -2 * 1 = -2$
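The forward and backward passes above can be written out gate by gate in plain Python. `forward_backward` is a hypothetical helper for illustration, not part of any library.

```python
def forward_backward(x, y, w):
    """Manual forward and backward pass for loss = (x*w - y)**2."""
    # forward: the three gates
    y_hat = x * w  # gate 1: multiply
    s = y_hat - y  # gate 2: subtract
    loss = s ** 2  # gate 3: square

    # backward: start at the loss and apply the chain rule gate by gate
    dloss_ds = 2 * s              # d(s^2)/ds = 2s
    dloss_dyhat = dloss_ds * 1    # ds/dy_hat = 1
    dloss_dw = dloss_dyhat * x    # dy_hat/dw = x
    return loss, dloss_dw


print(forward_backward(1.0, 2.0, 1.0))  # (1.0, -2.0)
```

Calling `forward_backward(2.0, 4.0, 1.0)` returns `(4.0, -8.0)`, the values worked out in Example 4-1.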

#### Example 4-1
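$x = 2$, $y = 4$, $w = 1$

Forward:

$\hat{y} = x * w = 2 * 1 = 2$

$s = \hat{y} - y = 2 - 4 = -2$

$loss = s^2 = (-2)^2 = 4$

Backward:

$\frac{\partial loss}{\partial s} = 2s = 2 * (-2) = -4$

$\frac{\partial loss}{\partial \hat{y}} = \frac{\partial loss}{\partial s}\frac{\partial s}{\partial \hat{y}} = -4 * 1 = -4$

$\frac{\partial loss}{\partial w} = \frac{\partial loss}{\partial \hat{y}}\frac{\partial \hat{y}}{\partial w} = -4 * x = -4 * 2 = -8$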
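A minimal sketch checking Example 4-1 with PyTorch autograd: building the same three gates on a tensor `w` and calling `loss.backward()` should leave $-8$ in `w.grad`.

```python
import torch

x, y = 2.0, 4.0
w = torch.tensor(1.0, requires_grad=True)  # the only tensor we differentiate

y_hat = x * w   # gate 1: multiply
s = y_hat - y   # gate 2: subtract
loss = s ** 2   # gate 3: square

loss.backward()  # autograd applies the chain rule through all three gates

print(loss.item(), w.grad.item())  # 4.0 -8.0
```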

https://my.pcloud.com/publink/show?code=XZYHWS7ZGNkSsHHaVhbDtGtgq8iMlY73yek0

```python
import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# single trainable weight, initialised to 1.0 (the data follow y = 2x)
w = torch.tensor([1.0], requires_grad=True)


def forward(x):
    # linear model without bias: y_hat = x * w
    return x * w


def loss(x, y):
    # squared error for a single sample
    y_pred = forward(x)
    return (y_pred - y) ** 2


print(f"When x=3, result is {forward(3).item()}, should be 6")
for ep in range(10):
    for x, y in zip(x_data, y_data):
        err = loss(x, y)
        err.backward()  # fills w.grad with d(err)/dw
        print(f"\t x = {x}, y = {y}, w={w.item():.4f}, pred={forward(x).item():.4f}, grad = {w.grad.item()}")
        # gradient-descent step, done outside autograd tracking
        with torch.no_grad():
            w -= 0.01 * w.grad
        w.grad.zero_()  # gradients accumulate, so reset before the next sample
    print(f"episode {ep} last loss = {err.item()} (error^2)")
print(f"When x=3, result is {forward(3).item()}, should be 6")
```

Recorded output (truncated):

```text
When x=3, result is 3.0, should be 6
	 x = 1.0, y = 2.0, w=$tensor([ 1.]), pred=$tensor([ 1.]), grad = -2.0
	 x = 2.0, y = 4.0, w=$tensor([ 1.0200]), pred=$tensor([ 2.0400]), grad = -7.840000152587891
	 x = 3.0, y = 6.0, w=$tensor([ 1.0984]), pred=$tensor([ 3.2952]), grad = -16.228801727294922
episode 0 last loss = 7.315943717956543 (error^2)
	 x = 1.0, y = 2.0, w=$tensor([ 1.2607]), pred=$tensor([ 1.2607]), grad = -1.478623867034912
	 x = 2.0, y = 4.0, w=$tensor([ 1.2755]), pred=$tensor([ 2.5509]), grad = -5.796205520629883
	 x = 3.0, y = 6.0, w=$tensor([ 1.3334]), pred=$tensor([ 4.0003]), grad = -11.998146057128906
episode 1 last loss = 3.9987640380859375 (error^2)
	 x = 1.0, y = 2.0, w=$tensor([ 1.4534]), pred=$tensor([ 1.4534]), grad = -1.0931644439697266
	 x = 2.0, y = 4.0, w=$tensor([ 1.4643]), pred=$tensor([ 2.9287]), grad = -4.285204887390137
	 x = 3.0, y = 6.0, w=$tensor([ 1.5072]), pred=$tensor([ 4.5216]), grad = -8.870372772216797
episode 2 last loss = 2.1856532096862793 (er
```
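The autograd gradients in this output can be cross-checked by hand: applying the chain rule to $(xw - y)^2$ gives $\frac{\partial loss}{\partial w} = 2x(xw - y)$. Below is a minimal plain-Python sketch of the same SGD loop (learning rate 0.01, 10 passes over the data) that uses this closed-form gradient instead of `backward()`; `gradient` is a hypothetical helper name.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]


def gradient(x, y, w):
    # chain rule on (x*w - y)^2: 2 * (x*w - y) * x
    return 2 * x * (x * w - y)


w = 1.0
learning_rate = 0.01
first_epoch_grads = []
for epoch in range(10):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y, w)
        if epoch == 0:
            first_epoch_grads.append(grad)  # keep these to compare with autograd
        w -= learning_rate * grad

print([round(g, 4) for g in first_epoch_grads])  # [-2.0, -7.84, -16.2288]
print(f"When x=3, result is {3 * w:.4f}, should be 6")
```

The first-epoch gradients match the recorded `-2.0`, `-7.84...` and `-16.2288...`; the extra trailing digits in the PyTorch output come from float32 arithmetic, while plain Python floats are float64.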

## Todo: Exercises 4-2, 4-3, 4-4, 4-5 (for fun)