# Backpropagation using a real-valued computational graph

```
 dQ/dX
X ---\
   w1   \    df/dQ
       (+)---\
 dQ/dY/   Q w4  \
Y ---/         \
   w2             (+)--- f(x, y, z)
               /   df/df
 df/dZ        /
Z -----w3------/
```

#### By the chain rule
```
given Q = w1*X + w2*Y and f = w4*Q + w3*Z

df/dQ = df/df * df/dQ
df/dX = df/dQ * dQ/dX
df/dY = df/dQ * dQ/dY
df/dZ = df/df * df/dZ
```
Manually calculating the partial derivatives, with the weights `w3` and `w4` from the graph included (so `f = w4*Q + w3*Z`), gives:
```
df/dQ = w4
df/dX = w4 * w1
df/dY = w4 * w2
df/dZ = w3
```
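The chain-rule steps above can also be carried out node by node, the way a backward pass would do it. The following is a minimal sketch, assuming `f = w4*Q + w3*Z` as drawn in the graph; the function name `forward_backward_f` and the dictionary keys are illustrative and not part of the original notebook.

```python
def forward_backward_f(x, y, z, w1, w2, w3, w4):
    """Hypothetical helper: one forward and one backward pass through the graph."""
    # Forward pass: evaluate each node from inputs to output.
    Q = w1 * x + w2 * y
    f = w4 * Q + w3 * z

    # Backward pass: start at the output and multiply local derivatives.
    df_df = 1.0              # derivative of f with respect to itself
    df_dQ = df_df * w4       # f = w4*Q + ...  ->  df/dQ = w4
    df_dX = df_dQ * w1       # Q = w1*X + ...  ->  dQ/dX = w1
    df_dY = df_dQ * w2       # Q = ... + w2*Y  ->  dQ/dY = w2
    df_dZ = df_df * w3       # f = ... + w3*Z  ->  df/dZ = w3
    return f, {"Q": df_dQ, "X": df_dX, "Y": df_dY, "Z": df_dZ}


f, grads = forward_backward_f(x=2, y=5, z=9, w1=1, w2=2, w3=3, w4=6)
print("f =", f)
print("gradients:", grads)
```

Each gradient is the upstream gradient times a local derivative, which is the pattern a larger network would repeat at every node.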

In [119]:
def forward(w1, w2, w3, w4):
    # x, y and z are read from the notebook's global scope
    return w4 * (w1 * x + w2 * y) + w3 * z

In [120]:
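def backward(w1, w2, w3, w4):
    # Partial derivatives of f = w4*(w1*x + w2*y) + w3*z with respect to each weight.
    # x, y and z are read from the notebook's global scope.
    dfdw1 = w4 * x
    dfdw2 = w4 * y
    dfdw3 = z
    dfdw4 = w1 * x + w2 * y
    return dfdw1, dfdw2, dfdw3, dfdw4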

In [121]:
x = 2
w1=1
y = 5
w2=2
z = 9
w3=3
w4=6

dw1, dw2, dw3, dw4 = backward(w1,w2,w3,w4)
print ("Gradients on w1: {}, w2: {}, w3: {} and w4: {}".format(dw1, dw2, dw3,dw4))

Gradients on w1: 12, w2: 30, w3: 9 and w4: 12


---
# Converting the above computation graph to an optimization problem

This can be done by adding a loss layer at the end. Let's say we add a squared-error loss node (mean squared error over a single sample) after `f(x, y, z)`. Here we define a target value `V` that we want `f(x, y, z)` to produce. Below is what the new computational graph looks like. Notice that we now take derivatives with respect to `L`, not `f`.

```
 dQ/dX
X ---\
      \    df/dQ
       (+)---\
 dQ/dY/   Q   \
Y ---/         \
                (+)--- f(x, y, z)---Loss
               /   dL/df
        df/dZ /
Z -----------/
```

```
L = (V - f(x,y,z))^2     where f(x,y,z) = w4(w1x+w2y)+w3*z
dL/dQ = dL/df * df/dQ
dL/dw1 = dL/dQ * dQ/dw1
dL/dw2 = dL/dQ * dQ/dw2
dL/dw3 = dL/df * df/dw3
dL/dw4 = dL/df * df/dw4

```

analytically this becomes...

```
dL/dw1 = -2(V-(w4(w1x+w2y)+w3*z))*w4x
dL/dw2 = -2(V-(w4(w1x+w2y)+w3*z))*w4y
dL/dw3 = -2(V-(w4(w1x+w2y)+w3*z))*z
dL/dw4 = -2(V-(w4(w1x+w2y)+w3*z))*(w1x+w2y)

```

The backward calculation now changes to the function below. You can verify these derivatives with WolframAlpha.

In [122]:
def backward_w_loss(V, w1, w2, w3, w4):
    # dL/df = -2*(V - f) is shared by all four gradients.
    # x, y and z are read from the notebook's global scope.
    dldf = -2 * (V - (w4 * (w1 * x + w2 * y) + w3 * z))
    dldw1 = dldf * w4 * x
    dldw2 = dldf * w4 * y
    dldw3 = dldf * z
    dldw4 = dldf * (w1 * x + w2 * y)
    return dldw1, dldw2, dldw3, dldw4
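As an alternative to checking the derivatives by hand, the analytic gradients can be compared against central finite differences. This is a sketch under stated assumptions: the helper names `loss`, `analytic_grads` and `numeric_grads`, the step `eps` and the target `V = 145` are chosen here for illustration; the inputs and starting weights match the notebook.

```python
x, y, z = 2, 5, 9  # same inputs as the notebook


def loss(V, w1, w2, w3, w4):
    # L = (V - f)^2 with f = w4*(w1*x + w2*y) + w3*z
    f = w4 * (w1 * x + w2 * y) + w3 * z
    return (V - f) ** 2


def analytic_grads(V, w1, w2, w3, w4):
    # Same formulas as the analytic expressions for dL/dw1 .. dL/dw4.
    dldf = -2 * (V - (w4 * (w1 * x + w2 * y) + w3 * z))
    return [dldf * w4 * x, dldf * w4 * y, dldf * z, dldf * (w1 * x + w2 * y)]


def numeric_grads(V, weights, eps=1e-5):
    # Central difference: (L(w + eps) - L(w - eps)) / (2 * eps), one weight at a time.
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(V, *up) - loss(V, *down)) / (2 * eps))
    return grads


V = 145
weights = [1.0, 2.0, 3.0, 6.0]
exact = analytic_grads(V, *weights)
approx = numeric_grads(V, weights)
for name, a, n in zip(["w1", "w2", "w3", "w4"], exact, approx):
    print("{}: analytic {:.4f}, numeric {:.4f}".format(name, a, n))
```

If the two columns disagree by more than rounding noise, the analytic formula (or its transcription into code) is the first place to look.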

# Optimization loop for weights `w1`, `w2`, `w3` and `w4`

Here I optimize the weights so that the graph produces a `final_value` of `145`, which is loosely equivalent to training a neural network to learn a certain output. Notice that the iteration does not need the `forward` function. This is only the case because the network is small enough that we solved the whole thing analytically. If you were to do this piecewise, as in a real neural network, you would need the forward pass to calculate the loss, which then gets backpropagated through the network.

Play around with the initial values of the weights (and run long enough) and you will find that different starting points converge to different real-valued solutions for `w1`, `w2`, `w3` and `w4` that satisfy the equation.
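To illustrate the effect of the starting point, the sketch below runs the same gradient descent from a few different initial weights. The helper `train`, the list `starts`, and the smaller step size `alpha=1e-4` with 20000 steps are assumptions made for this example (picked conservatively so that each of these starts settles smoothly); they are not values from the notebook.

```python
x, y, z = 2, 5, 9  # same inputs as the notebook
V = 145            # target value


def f_of(w1, w2, w3, w4):
    return w4 * (w1 * x + w2 * y) + w3 * z


def train(start, alpha=1e-4, steps=20000):
    """Hypothetical helper: plain gradient descent on L = (V - f)^2."""
    w1, w2, w3, w4 = start
    for _ in range(steps):
        dldf = -2 * (V - f_of(w1, w2, w3, w4))
        # Compute all gradients first, then update, so each uses the old weights.
        g1 = dldf * w4 * x
        g2 = dldf * w4 * y
        g3 = dldf * z
        g4 = dldf * (w1 * x + w2 * y)
        w1, w2, w3, w4 = (w1 - alpha * g1, w2 - alpha * g2,
                          w3 - alpha * g3, w4 - alpha * g4)
    return (w1, w2, w3, w4)


starts = [(1.0, 2.0, 3.0, 6.0), (0.0, 0.0, 0.0, 1.0), (2.0, 1.0, 5.0, 3.0)]
results = [train(s) for s in starts]
for s, w in zip(starts, results):
    print(s, "->", [round(v, 4) for v in w], "f =", round(f_of(*w), 6))
```

The equation has four unknowns and one constraint, so there is a whole family of solutions, and each start lands on a different member of it.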

Notice that this problem has the same steps as the previous curve-fitting assignment. Here, instead, you are training a computational graph to approximate an output value. A more complex computational graph, such as a neural network, follows the same steps.

In [132]:
x = 2
w1=1
y = 5
w2=2
z = 9
w3=3
w4=6


alpha = 0.001
final_value = 145
n = 10000
for i in range(n):
    dw1, dw2, dw3,dw4 = backward_w_loss(final_value, w1, w2, w3, w4)
    w1 = w1 - alpha * dw1
    w2 = w2 - alpha * dw2
    w3 = w3 - alpha * dw3
    w4 = w4 - alpha * dw4
print ("Final values of w1: {}, w2: {}, w3: {} and w4: {}".format(w1, w2, w3, w4))
# forward is only used here below
print ("Evaluation of w4*(w1*x + w2*y) + w3*z = {}".format(forward(w1, w2, w3, w4)))

Final values of w1: 1.4927532249355941, w2: 3.231883062338986, w3: 8.466380584931686 and w4: 3.593776751479741
Evaluation of w4*(w1*x + w2*y) + w3*z = 145.00000000000006
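For comparison, here is a sketch of the piecewise variant described earlier, where each iteration runs a forward pass to get the loss and then backpropagates node by node instead of using the closed-form gradients. The function name `train_piecewise` and the intermediate variable names are illustrative; the inputs, starting weights, step size and iteration count are the ones used in the loop above.

```python
def train_piecewise(x, y, z, weights, V, alpha=0.001, n=10000):
    """Hypothetical helper: forward pass, then node-by-node backward pass."""
    w1, w2, w3, w4 = weights
    for _ in range(n):
        # Forward pass: evaluate the graph and the loss.
        Q = w1 * x + w2 * y
        f = w4 * Q + w3 * z
        loss = (V - f) ** 2

        # Backward pass: propagate dL/df back through each node.
        dL_df = -2 * (V - f)
        dL_dQ = dL_df * w4       # df/dQ = w4
        dL_dw1 = dL_dQ * x       # dQ/dw1 = x
        dL_dw2 = dL_dQ * y       # dQ/dw2 = y
        dL_dw3 = dL_df * z       # df/dw3 = z
        dL_dw4 = dL_df * Q       # df/dw4 = Q

        # Gradient descent update.
        w1 = w1 - alpha * dL_dw1
        w2 = w2 - alpha * dL_dw2
        w3 = w3 - alpha * dL_dw3
        w4 = w4 - alpha * dL_dw4
    return (w1, w2, w3, w4), loss


final_weights, last_loss = train_piecewise(2, 5, 9, (1, 2, 3, 6), V=145)
fw1, fw2, fw3, fw4 = final_weights
final_f = fw4 * (fw1 * 2 + fw2 * 5) + fw3 * 9
print("weights:", final_weights)
print("f =", final_f, "last loss =", last_loss)
```

The arithmetic is the same as in the analytic version; the difference is that intermediate values such as `Q` and `dL_dQ` are computed once and reused, which is what makes the approach scale to larger graphs.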
