# Given

Training data:

|x|y|
|-|-|
|1|5|

and a TF model with one hidden layer and one output layer:

* 1 hidden layer $a_1$
    *   relu activation, $w_1 = 2$, $b_1 = 0$ 
* 1 output layer $a_2$
    *   relu activation, $w_2 = 3$, $b_2 = 1$

* No regularization

# Calculate

Backward propagation, i.e. the gradients of the cost $J$ with respect to the weights and biases:
* $\frac{dJ}{dw_1}$ and $\frac{dJ}{db_1}$
* $\frac{dJ}{dw_2}$ and $\frac{dJ}{db_2}$

# Solution

In [17]:
import numpy as np

x = 1
y = 5
w1, b1 = 2, 0
w2, b2 = 3, 1

### Forward propagation

In [18]:
def relu(a):
    # relu(a) = max(0, a)
    return np.maximum(0, a)


a1 = w1 * x + b1
activated_a1 = relu(a1)
a2 = w2 * activated_a1 + b2
activated_a2 = relu(a2)

print(f"a1: {activated_a1} -> a2: {activated_a2}")

# Cost
J = 1/2 * (activated_a2 - y) ** 2

print(f"Cost J: {J}")

a1: 2 -> a2: 7
Cost J: 2.0


### Backward Propagation

##### Layer 2

Formulas:

$$
\begin{cases}
\frac{\partial J}{\partial w_2} = \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} \cdot \frac{\partial a_2}{\partial w_2} \\[1em]
\frac{\partial J}{\partial b_2} = \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} \cdot \frac{\partial a_2}{\partial b_2}
\end{cases}
$$


Solution:

$$
\begin{aligned}
J = \frac{1}{2} \cdot (a^{activated}_2 - y)^2 &\Rightarrow \frac{\partial J}{\partial a^{activated}_2} = a^{activated}_2 - y = 7 - 5 = 2 \\[1em]
a^{activated}_2 = \max(0, a_2) &\Rightarrow \frac{\partial a^{activated}_2}{\partial a_2} = 1 \text{ (since } a_2 = 7 > 0\text{)} \\[1em]
a_2 = w_2 \cdot a^{activated}_1 + b_2 &\Rightarrow
\begin{cases}
    \frac{\partial a_2}{\partial w_2} = a^{activated}_1 = 2 \\
    \frac{\partial a_2}{\partial b_2} = 1
\end{cases}
\end{aligned}
$$

$$
\Rightarrow
\begin{cases}
    \frac{\partial J}{\partial w_2} = 2 \cdot 1 \cdot 2 = 4 \\
    \frac{\partial J}{\partial b_2} = 2 \cdot 1 \cdot 1 = 2
\end{cases}
$$
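
The layer 2 chain rule above can be sketched in plain Python. This is only an illustration under the problem's values; the names `relu_grad` and `delta2` are introduced here and do not appear elsewhere in the notebook, and taking the relu derivative as 0 at exactly `a == 0` is an assumed convention.

```python
# Values from the problem statement
x, y = 1, 5
w1, b1 = 2, 0
w2, b2 = 3, 1

def relu(a):
    return max(0, a)

def relu_grad(a):
    # Derivative of relu: 1 for a > 0, otherwise 0 (value at a == 0 is a convention)
    return 1.0 if a > 0 else 0.0

# Forward pass
a1 = w1 * x + b1
activated_a1 = relu(a1)
a2 = w2 * activated_a1 + b2
activated_a2 = relu(a2)

# Backward pass for layer 2, one chain-rule factor per line
dJ_dact2 = activated_a2 - y           # dJ / d(activated a2)
delta2 = dJ_dact2 * relu_grad(a2)     # dJ / d(a2)
dJ_dw2 = delta2 * activated_a1        # d(a2)/d(w2) = activated a1
dJ_db2 = delta2 * 1                   # d(a2)/d(b2) = 1

print(f"dJ/dw2 = {dJ_dw2}, dJ/db2 = {dJ_db2}")
```

The two printed values should match the hand-derived results of 4 and 2.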

##### Layer 1

Formulas:

$$
\begin{cases}
\frac{\partial J}{\partial w_1} = \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} \cdot \frac{\partial a_2}{\partial a^{activated}_1} \cdot \frac{\partial a^{activated}_1}{\partial a_1} \cdot \frac{\partial a_1}{\partial w_1}  \\[1em]
\frac{\partial J}{\partial b_1} = \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} \cdot \frac{\partial a_2}{\partial a^{activated}_1} \cdot \frac{\partial a^{activated}_1}{\partial a_1} \cdot \frac{\partial a_1}{\partial b_1}  
\end{cases}
$$


Solution:

$$
\begin{cases}
    \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} = 2 \cdot 1 = 2 \text{ (solved above)} \\
    a_2 = w_2 \cdot a^{activated}_1 + b_2 \Rightarrow \frac{\partial a_2}{\partial a^{activated}_1} = w_2 = 3 \\
    \frac{\partial a^{activated}_1}{\partial a_1} = 1 \text{ (since } a_1 = 2 > 0\text{)} \\
    a_1 = w_1 \cdot x + b_1 \Rightarrow \frac{\partial a_1}{\partial w_1} = x = 1
\end{cases}
\Rightarrow
\frac{\partial J}{\partial w_1} = 2 \cdot 3 \cdot 1 \cdot 1 = 6
$$

$$
\begin{cases}
    \frac{\partial J}{\partial a^{activated}_2} \cdot \frac{\partial a^{activated}_2}{\partial a_2} = 2 \cdot 1 = 2 \text{ (solved above)} \\
    \frac{\partial a_2}{\partial a^{activated}_1} = w_2 = 3 \text{ (solved in the previous step)} \\
    \frac{\partial a^{activated}_1}{\partial a_1} = 1 \text{ (solved in the previous step)} \\
    a_1 = w_1 \cdot x + b_1 \Rightarrow \frac{\partial a_1}{\partial b_1} = 1
\end{cases}
\Rightarrow
\frac{\partial J}{\partial b_1} = 2 \cdot 3 \cdot 1 \cdot 1 = 6
$$
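
As a rough sketch, the full backward pass for both layers can be wrapped in one function that reuses the shared factor $\frac{\partial J}{\partial a_2}$. The function name `backward` and the intermediate names `delta1` and `delta2` are assumptions made for this example, not part of the original notebook.

```python
def relu(a):
    return max(0, a)

def relu_grad(a):
    # Derivative of relu: 1 for a > 0, otherwise 0 (value at a == 0 is a convention)
    return 1.0 if a > 0 else 0.0

def backward(x, y, w1, b1, w2, b2):
    """Return (dJ/dw1, dJ/db1, dJ/dw2, dJ/db2) for J = 1/2 * (output - y)^2."""
    # Forward pass
    a1 = w1 * x + b1
    activated_a1 = relu(a1)
    a2 = w2 * activated_a1 + b2
    activated_a2 = relu(a2)

    # Layer 2: dJ/da2, shared by all four gradients
    delta2 = (activated_a2 - y) * relu_grad(a2)
    dJ_dw2 = delta2 * activated_a1
    dJ_db2 = delta2

    # Layer 1: propagate through w2 and the hidden relu
    delta1 = delta2 * w2 * relu_grad(a1)   # dJ/da1
    dJ_dw1 = delta1 * x
    dJ_db1 = delta1

    return dJ_dw1, dJ_db1, dJ_dw2, dJ_db2

grads = backward(x=1, y=5, w1=2, b1=0, w2=3, b2=1)
print(grads)
```

For the given data the four values should agree with the results derived by hand: 6 and 6 for layer 1, 4 and 2 for layer 2.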

### SymPy solver

In [94]:
import numpy as np
from sympy import symbols, diff, lambdify, Max, pprint

def relu(a):
    return Max(0, a)

# Define the symbols
w1, w2, b1, b2, x, y = symbols('w1 w2 b1 b2 x y')

# Define the equations
a1 = w1 * x + b1
activated_a1 = relu(a1)
a2 = w2 * activated_a1 + b2
activated_a2 = relu(a2)

# Define the cost function
J = 1/2 * (activated_a2 - y)**2

# Calculate the derivatives
dJ_dw2 = diff(J, w2)
dJ_db2 = diff(J, b2)
dJ_dw1 = diff(J, w1)
dJ_db1 = diff(J, b1)

In [97]:
pprint(dJ_dw2)

1.0⋅(-y + Max(0, b2 + w2*Max(0, b1 + w1*x)))⋅θ(b₂ + w₂⋅Max(0, b1 + w1*x))⋅Max(
0, b1 + w1*x)


In [98]:
pprint(dJ_db2)

1.0⋅(-y + Max(0, b2 + w2*Max(0, b1 + w1*x)))⋅θ(b₂ + w₂⋅Max(0, b1 + w1*x))


In [99]:
pprint(dJ_dw1)

1.0⋅w₂⋅x⋅(-y + Max(0, b2 + w2*Max(0, b1 + w1*x)))⋅θ(b₁ + w₁⋅x)⋅θ(b₂ + w₂⋅Max(0
, b1 + w1*x))


In [100]:
pprint(dJ_db1)

1.0⋅w₂⋅(-y + Max(0, b2 + w2*Max(0, b1 + w1*x)))⋅θ(b₁ + w₁⋅x)⋅θ(b₂ + w₂⋅Max(0, 
b1 + w1*x))


In [89]:
# Create functions to evaluate the derivatives
calc_dJ_dw2 = lambdify([w1, w2, b1, b2, x, y], dJ_dw2)
calc_dJ_db2 = lambdify([w1, w2, b1, b2, x, y], dJ_db2)

# Example usage
w1_val = 2
w2_val = 3
b1_val = 0
b2_val = 1
x_val = 1
y_val = 5

result_dJ_dw2 = calc_dJ_dw2(w1_val, w2_val, b1_val, b2_val, x_val, y_val)
result_dJ_db2 = calc_dJ_db2(w1_val, w2_val, b1_val, b2_val, x_val, y_val)

print("dJ/dw2 =", result_dJ_dw2)
print("dJ/db2 =", result_dJ_db2)

dJ/dw2 = 4.0
dJ/db2 = 2.0


In [90]:
# Create functions to evaluate the derivatives
calc_dJ_dw1 = lambdify([w1, w2, b1, b2, x, y], dJ_dw1)
calc_dJ_db1 = lambdify([w1, w2, b1, b2, x, y], dJ_db1)

# Example usage
w1_val = 2
w2_val = 3
b1_val = 0
b2_val = 1
x_val = 1
y_val = 5

result_dJ_dw1 = calc_dJ_dw1(w1_val, w2_val, b1_val, b2_val, x_val, y_val)
result_dJ_db1 = calc_dJ_db1(w1_val, w2_val, b1_val, b2_val, x_val, y_val)

print("dJ/dw1 =", result_dJ_dw1)
print("dJ/db1 =", result_dJ_db1)

dJ/dw1 = 6.0
dJ/db1 = 6.0
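
A second, independent check is a central finite-difference approximation of each gradient. The sketch below uses plain Python only; the helper names `cost` and `numerical_grad` and the step size `eps = 1e-6` are assumptions chosen for this illustration.

```python
def cost(w1, b1, w2, b2, x=1.0, y=5.0):
    # Same network as above: two relu layers and a half squared error cost
    activated_a1 = max(0.0, w1 * x + b1)
    activated_a2 = max(0.0, w2 * activated_a1 + b2)
    return 0.5 * (activated_a2 - y) ** 2

def numerical_grad(f, params, index, eps=1e-6):
    # Central difference: (f(p + eps) - f(p - eps)) / (2 * eps) for one parameter
    plus = list(params)
    minus = list(params)
    plus[index] += eps
    minus[index] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)

names = ["w1", "b1", "w2", "b2"]
params = [2.0, 0.0, 3.0, 1.0]   # values from the problem statement

num_grads = {name: numerical_grad(cost, params, i) for i, name in enumerate(names)}
for name, value in num_grads.items():
    print(f"dJ/d{name} ~ {value:.4f}")
```

The approximations should land close to the analytic values (6, 6, 4, 2). The check is only meaningful away from the relu kinks at $a_1 = 0$ and $a_2 = 0$, where the derivative is not defined.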
