# Backpropagation using a real-valued computational graph

```
 dQ/dX
X ---\
      \    df/dQ
       (+)---\
 dQ/dY/   Q   \
Y ---/         \
                (*)--- f(x, y, z)
               /   df/df
 df/dZ        /
Z -----------/
```

#### By the chain rule
```
given Q = X + Y and f = Q * Z

df/dQ = df/df * df/dQ
df/dX = df/dQ * dQ/dX
df/dY = df/dQ * dQ/dY
df/dZ = df/df * df/dZ
```
Manually calculating the partial derivatives gives:
```
df/dQ = Z
df/dX = Z * 1
df/dY = Z * 1
df/dZ = X + Y
```

In [None]:
def forward(x, y, z):
    return z*(x + y)

In [None]:
def backward(x, y, z):
    dfdx = z
    dfdy = z
    dfdz = x + y
    return dfdx, dfdy, dfdz

In [None]:
x = 2
y = 5
z = 9
dx, dy, dz = backward(x, y, z)
print("Gradients on X: {}, Y: {} and Z: {}".format(dx, dy, dz))
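
The hand-derived gradients can be sanity-checked numerically. The sketch below compares `backward` against a central finite-difference estimate. The helper `numerical_gradient` and its step size `eps` are illustrative assumptions and are not part of the original notebook; `forward` and `backward` are repeated so the cell runs on its own.

```python
def forward(x, y, z):
    return z * (x + y)


def backward(x, y, z):
    # Analytic gradients of f = (x + y) * z, as derived above.
    return z, z, x + y


def numerical_gradient(func, args, eps=1e-6):
    """Central-difference estimate of d func / d args[i] for every i.

    Hypothetical helper for checking gradients; eps is an assumed step size.
    """
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += eps
        minus[i] -= eps
        grads.append((func(*plus) - func(*minus)) / (2 * eps))
    return tuple(grads)


point = (2.0, 5.0, 9.0)
analytic = backward(*point)
numeric = numerical_gradient(forward, point)
for name, a, n in zip("xyz", analytic, numeric):
    print("df/d{}: analytic = {}, numeric = {:.6f}".format(name, a, n))
```

If the two columns disagree by more than a small rounding error, the analytic derivation most likely contains a mistake.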

---
# Converting the above computation graph to an optimization problem

### This can be done simply by adding a loss layer at the end. Let's say we add a squared-error loss `node` after `f(x, y, z)`. Here we define a target value `V` that we want the function `f(x, y, z)` to produce. Below is how the new computational graph looks. Notice that we are now taking derivatives of `L`, not of `f`.

```
 dQ/dX
X ---\
      \    df/dQ
       (+)---\
 dQ/dY/   Q   \
Y ---/         \
                (*)--- f(x, y, z)---Loss
               /   dL/df
        df/dZ /
Z -----------/
```

```
L = (V - f(x,y,z))^2
dL/dQ = dL/df * df/dQ
dL/dX = dL/dQ * dQ/dX
dL/dY = dL/dQ * dQ/dY
dL/dZ = dL/df * df/dZ
```

Analytically, this becomes:

```
dL/dQ = -2(V-f)*Z
dL/dX = -2(V-f)*Z*1 = dL/dQ*1
dL/dY = -2(V-f)*Z*1 = dL/dQ*1
dL/dZ = -2(V-f)*(X + Y)
```

Now the backward calculation changes to the function below. You can verify these derivatives using WolframAlpha.

In [None]:
def backward_w_loss(V, x, y, z):
    dldx = -2 * z * (V - (x + y) * z)
    dldy = -2 * z * (V - (x + y) * z)
    dldz = -2 * (x + y) * (V - (x + y) * z)
    return dldx, dldy, dldz
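
The closed-form function above can also be written node by node, following the chain-rule equations directly. The sketch below is one possible way to do that: it runs a forward pass to cache `q` and `f`, then multiplies local gradients from the loss back to the inputs. The name `backward_stepwise` is an assumption for illustration and is not in the original notebook; `backward_w_loss` is repeated so the cell runs on its own.

```python
def backward_w_loss(V, x, y, z):
    # Closed-form gradients, copied from the cell above.
    dldx = -2 * z * (V - (x + y) * z)
    dldy = -2 * z * (V - (x + y) * z)
    dldz = -2 * (x + y) * (V - (x + y) * z)
    return dldx, dldy, dldz


def backward_stepwise(V, x, y, z):
    # Forward pass: cache the value at every node of the graph.
    q = x + y          # (+) node
    f = q * z          # (*) node

    # Backward pass: start at the loss L = (V - f)^2 and apply the chain rule.
    dldf = -2.0 * (V - f)   # dL/df
    dldq = dldf * z         # dL/dQ = dL/df * df/dQ
    dldz = dldf * q         # dL/dZ = dL/df * df/dZ
    dldx = dldq * 1.0       # dL/dX = dL/dQ * dQ/dX
    dldy = dldq * 1.0       # dL/dY = dL/dQ * dQ/dY
    return dldx, dldy, dldz


closed_form = backward_w_loss(144, 2.0, 5.0, 9.0)
stepwise = backward_stepwise(144, 2.0, 5.0, 9.0)
print("closed form:", closed_form)
print("step by step:", stepwise)
```

Both versions should return the same gradients. The step-by-step form is the one that scales to larger graphs, because each node only needs its own local derivative and the gradient arriving from the node after it.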

# Optimization loop for variables `x`, `y` and `z`

Here I am optimizing the variables to produce a `final_value` of `144`, which is partly equivalent to training a neural network to learn a certain output. Notice that the iteration does not need the `forward` function. This is only true here, because the network is small and simple enough that we solved it analytically. If you were to do this piecewise, as in a real neural network, you would need the `forward` pass to calculate the loss, which then gets backpropagated through the network.

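Play around with the initial values of the variables (and run the loop long enough) and you will find that different starting points converge to different real values of `x`, `y` and `z` that satisfy `(x + y) * z = final_value`. The equation has infinitely many solutions, and gradient descent lands on one that depends on where it started.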

Notice that this problem has the same steps as the previous curve-fitting assignment. Here, instead, you are training a computational graph to approximate an output value. A more complex computational graph, such as a neural network, follows the same steps.

In [None]:
x = 12.3791
y = 1.4782
z = 8.192
alpha = 0.001
final_value = 144
n = 1000
for i in range(n):
    dx, dy, dz = backward_w_loss(final_value, x, y, z)
    x = x - alpha * dx
    y = y - alpha * dy
    z = z - alpha * dz
print("Final Values of x: {}, y: {} and z: {}".format(x, y, z))
# forward is only used here below
print("Evaluation of (x + y) * z = {}".format(forward(x, y, z)))
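
To see how the result depends on the starting point, the loop above can be wrapped in a function and run from several initial values. This is a minimal sketch: the `optimize` wrapper and the two extra starting points are assumptions chosen for illustration, while `alpha`, `n` and the target of `144` are taken from the cell above.

```python
def forward(x, y, z):
    return z * (x + y)


def backward_w_loss(V, x, y, z):
    dldx = -2 * z * (V - (x + y) * z)
    dldy = -2 * z * (V - (x + y) * z)
    dldz = -2 * (x + y) * (V - (x + y) * z)
    return dldx, dldy, dldz


def optimize(x, y, z, target=144, alpha=0.001, n=1000):
    # Plain gradient descent on L = (target - f(x, y, z))^2.
    for _ in range(n):
        dx, dy, dz = backward_w_loss(target, x, y, z)
        x = x - alpha * dx
        y = y - alpha * dy
        z = z - alpha * dz
    return x, y, z


# The first start is the one used above; the other two are arbitrary examples.
starts = [(12.3791, 1.4782, 8.192), (1.0, 2.0, 3.0), (-4.0, -1.0, -2.0)]
solutions = [optimize(*start) for start in starts]
for start, sol in zip(starts, solutions):
    print("start {} -> x: {:.4f}, y: {:.4f}, z: {:.4f}, f: {:.4f}".format(
        start, sol[0], sol[1], sol[2], forward(*sol)))
```

Each run should end with `f` close to `144` but with different values of `x`, `y` and `z`. Because `x` and `y` always receive the same gradient, their difference stays fixed at its initial value throughout the run.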

---
# Assignment

- Design another computational graph where you introduce weights in the intermediate nodes.

- Write a function to calculate the backward pass to optimize only the weights in the computational graph. Keep the `final_value` and your inputs constant. NOTE: this design of the computational graph would be more in the flavor of a neural network.
- Optimize the computational graph to obtain optimal weights such that your input variables produce your `final_value`.
 