In [1]:
import numpy as np
import tensorflow as tf
from sympy import *
import re
import matplotlib.pyplot as plt



Let's calculate the derivative of this expression: $J = (2 + 3w)^2$

In [2]:
plt.close("all")

# Forward Propagation

In [3]:
w = 3
a = 2 + 3*w
J = a**2
print(f"a = {a}, J = {J}")

a = 11, J = 121


# Backprop

Backprop starts at the right and moves to the left. The first node to consider is $J = a^2$, and the first step is to find $\frac{\partial J}{\partial a}$.

### Arithmetically

In [4]:
increase = 0.001
a_epsilon = a + increase
J_epsilon = a_epsilon**2
delta = (J_epsilon - J) / increase
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_da ~= k = {delta}") 

J = 121, J_epsilon = 121.02200099999999, dJ_da ~= k = 22.000999999988835
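
The finite-difference estimate above can be wrapped in a small reusable helper. This is a minimal sketch; the name `forward_difference` is hypothetical and not part of the notebook.

```python
def forward_difference(f, x, eps=0.001):
    """Approximate df/dx at x with a forward difference (hypothetical helper)."""
    return (f(x + eps) - f(x)) / eps

# J as a function of a, matching the node J = a**2, evaluated at a = 11
dJ_da_estimate = forward_difference(lambda a: a**2, 11)
print(dJ_da_estimate)  # close to 22; the exact derivative is 2*a = 22
```

The estimate is slightly above 22 because the step `eps` is finite; shrinking `eps` moves it closer to the exact value, up to floating-point round-off.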


### Symbolically

Using SymPy to calculate derivatives symbolically:

In [5]:
sw, sJ, sa = symbols('w, J, a')
sJ = sa**2
sJ

a**2

In [6]:
sJ.subs([(sa,a)])

121

In [7]:
dJ_da = diff(sJ, sa)
dJ_da

2*a
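
To turn the symbolic derivative into a number, the forward-pass value of `a` can be substituted into it. A minimal sketch, using the hypothetical name `a_sym` so the numeric variable `a` is not overwritten:

```python
from sympy import symbols, diff

a_sym = symbols('a')                   # hypothetical name for the symbol a
dJ_da_expr = diff(a_sym**2, a_sym)     # symbolic derivative of J = a**2
dJ_da_value = dJ_da_expr.subs(a_sym, 11)  # evaluate at the forward-pass value a = 11
print(dJ_da_expr, dJ_da_value)
```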

So, $\frac{\partial J}{\partial a} = 2a$. When $a = 11$, $\frac{\partial J}{\partial a} = 22$. Moving from right to left, the next value we would like to compute is $\frac{\partial J}{\partial w}$, which first requires the local derivative $\frac{\partial a}{\partial w}$.

### Arithmetically

In [8]:
increase = 0.001
w_epsilon = w + increase
a_epsilon = 2 + 3 * w_epsilon
delta = (a_epsilon - a) / increase
print(f"a = {a}, a_epsilon = {a_epsilon}, da_dw ~= k = {delta}")

a = 11, a_epsilon = 11.003, da_dw ~= k = 3.0000000000001137


### Symbolically

In [9]:
sa = 2 + 3*sw
sa

3*w + 2

In [10]:
da_dw = diff(sa, sw)
da_dw

3

Now we apply the *chain rule*:
$\frac{\partial J}{\partial w} = \frac{\partial a}{\partial w} \frac{\partial J}{\partial a}$

In [11]:
dJ_dw = da_dw * dJ_da
dJ_dw

6*a

In this example $a = 11$, so $\frac{\partial J}{\partial w} = 6a = 66$.
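
The chain-rule result can be cross-checked against differentiating $J = (2 + 3w)^2$ directly with respect to $w$. A minimal sketch, with hypothetical names (`w_sym`, `a_expr`, `direct`, `chain`) chosen to avoid clashing with the notebook's variables:

```python
from sympy import symbols, diff, expand

w_sym = symbols('w')
a_expr = 2 + 3 * w_sym          # a written in terms of w
J_expr = a_expr**2              # J = (2 + 3w)**2

direct = diff(J_expr, w_sym)    # differentiate J with respect to w in one step
chain = 6 * a_expr              # chain-rule result 6*a with a = 2 + 3w substituted

# The two expressions agree if their difference expands to zero
print(expand(direct - chain) == 0)
print(direct.subs(w_sym, 3))    # value at w = 3
```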

In [12]:
epsilon = 0.001
w_epsilon = w + epsilon
a_epsilon = 2 + 3*w_epsilon
J_epsilon = a_epsilon**2
k = (J_epsilon - J) / epsilon
print(f"J = {J}, J_epsilon= {J_epsilon}, dJ_dw ~= k = {k}")

J = 121, J_epsilon= 121.06600900000001, dJ_dw ~= k = 66.0090000000082
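
The small excess over 66 comes from the forward difference itself, whose error grows with the step size. As an illustration, a central difference (a different estimator from the one used in this notebook) removes that error for a quadratic such as $J$. The function names below are hypothetical.

```python
def J_of_w(w):
    """Forward pass of the first example: J = (2 + 3w)**2."""
    a = 2 + 3 * w
    return a**2

def central_difference(f, x, eps=0.001):
    """Symmetric estimate of df/dx; not the forward difference used above."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

eps = 0.001
forward = (J_of_w(3 + eps) - J_of_w(3)) / eps   # same estimator as the cell above
central = central_difference(J_of_w, 3, eps)
print(forward, central)  # forward is roughly 66.009, central is roughly 66
```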


Let's analyze another example:

# Forward propagation

In [13]:
# Inputs and parameters
x = 2
w = -2
b = 8
y = 1
# Calculate per-step values
c = w * x
a = c + b
d = a - y
J = d**2/2
print(f"J = {J}, d = {d}, a = {a}, c = {c}")

J = 4.5, d = 3, a = 4, c = -4


# Backprop

### Arithmetically

Let's find $\frac{\partial J}{\partial d}$:

In [14]:
epsilon = 0.001
d_epsilon = d + epsilon
J_epsilon = d_epsilon**2/2
delta = (J_epsilon - J) / epsilon
print(f"J ={J}, J_epsilon = {J_epsilon}, dJ_dd ~ = delta = {delta}")

J =4.5, J_epsilon = 4.5030005, dJ_dd ~ = delta = 3.0004999999997395


So $\frac{\partial J}{\partial d} \approx 3$.

### Symbolically

In [15]:
sx, sw, sb, sy, sJ = symbols('x, w, b, y, J')
sa, sc, sd = symbols('a, c, d')
sJ = sd**2/2
sJ

d**2/2

In [16]:
sJ.subs([(sd,d)])

9/2

In [17]:
dJ_dd = diff(sJ,sd)
dJ_dd

d

So, $\frac{\partial J}{\partial d} = d$. When $d = 3$, $\frac{\partial J}{\partial d} = 3$.

Now let's calculate $\frac{\partial J}{\partial a}$, where $d = a - y$. We start with the local derivative $\frac{\partial d}{\partial a}$, estimated arithmetically:

In [18]:
epsilon = 0.001
a_epsilon = a + epsilon
d_epsilon = a_epsilon - y 
delta = (d_epsilon - d) / epsilon
print(f"d ={d}, d_epsilon = {d_epsilon}, dd_da ~ = delta = {delta}")

d =3, d_epsilon = 3.0010000000000003, dd_da ~ = delta = 1.000000000000334


This means that arithmetically, $\frac{\partial d}{\partial a} \approx 1$.

### Symbolically

In [19]:
sd = sa - sy
sd

a - y

In [20]:
dd_da = diff(sd,sa)
dd_da

1

In [21]:
dJ_da = dd_da * dJ_dd
dJ_da

d

Since $d = 3$, $\frac{\partial J}{\partial a} = 3$. Checking arithmetically:

In [22]:
epsilon = 0.001
a_epsilon = a + epsilon
d_epsilon = a_epsilon - y 
J_epsilon = (d_epsilon**2/2)
delta = (J_epsilon - J) / epsilon
print(f"J ={J}, J_epsilon = {J_epsilon}, dJ_da ~ = delta = {delta}")

J =4.5, J_epsilon = 4.503000500000001, dJ_da ~ = delta = 3.0005000000006277


Let's continue with $\frac{\partial J}{\partial c}$ and $\frac{\partial J}{\partial b}$, where $a = c + b$.

In [23]:
sa = sc + sb
sa

b + c

In [24]:
da_dc = diff(sa, sc)
da_db = diff(sa, sb)
print(da_dc, da_db)

1 1


In [25]:
dJ_dc = da_dc * dJ_da
dJ_db = da_db * dJ_da
print(f"dJ_dc = {dJ_dc}, dJ_db = {dJ_db}")

dJ_dc = d, dJ_db = d


In the example, $d = 3$, so $\frac{\partial J}{\partial c} = \frac{\partial J}{\partial b} = 3$.
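
These two values can also be checked arithmetically, in the same way as the earlier nodes, by nudging `c` and `b` one at a time. A minimal sketch; `loss_from_c_b` is a hypothetical helper that replays the forward pass from `c` and `b` onward.

```python
def loss_from_c_b(c, b, y=1):
    """Forward pass from c and b to J (hypothetical helper)."""
    a = c + b
    d = a - y
    return d**2 / 2

eps = 0.001
c, b = -4, 8                      # forward-pass values from the example
J = loss_from_c_b(c, b)

dJ_dc_est = (loss_from_c_b(c + eps, b) - J) / eps   # perturb c only
dJ_db_est = (loss_from_c_b(c, b + eps) - J) / eps   # perturb b only
print(dJ_dc_est, dJ_db_est)       # both close to 3
```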

The last node is $c = w \cdot x$. Here, we are interested in how $J$ changes with respect to the parameter $w$. Let's compute $\frac{\partial c}{\partial w}$.

In [27]:
sc = sw * sx
sc

w*x

In [30]:
dc_dw = diff(sc, sw)
dc_dw

x
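
Putting the local derivatives together gives the whole backward pass for this example. The sketch below assumes the chain rule is applied to the last node in the same way as for the earlier ones, that is $\frac{\partial J}{\partial w} = \frac{\partial c}{\partial w} \frac{\partial J}{\partial c}$. It uses plain Python numbers rather than the SymPy expressions above.

```python
# Forward pass (same inputs and parameters as the example)
x, w, b, y = 2, -2, 8, 1
c = w * x
a = c + b
d = a - y
J = d**2 / 2

# Backward pass, moving right to left through the graph
dJ_dd = d            # from J = d**2 / 2
dJ_da = dJ_dd * 1    # from d = a - y, dd/da = 1
dJ_dc = dJ_da * 1    # from a = c + b, da/dc = 1
dJ_db = dJ_da * 1    # from a = c + b, da/db = 1
dJ_dw = dJ_dc * x    # from c = w * x, dc/dw = x (assumed final chain-rule step)
print(f"dJ_dw = {dJ_dw}, dJ_db = {dJ_db}")
```

Each line reuses the derivative computed just before it, which is why the backward pass needs only one multiplication per node.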