# **Codes to lecture 6.**


This notebook contains code for the forward propagation and back propagation calculations in a neural network with 2 layers.

Each of the two layers consists of 2 neurons.
We use the sigmoid as the activation function.

It is based on the example given in the notes of Prof. B. Dyda:
https://prac.im.pwr.edu.pl/~bdyda/sem2022zimowy/ml/back-propagation4.pdf



>[Codes to lecture 6.](#scrollTo=G0wY-sjAyIfA)

>[6.1. Forward propagation and back propagation example for simple neural network.](#scrollTo=Xorai9gk6smc)

>[6.1.1 Forward propagation.](#scrollTo=PUzVG0d892Yz)

>[6.1.2 Back propagation](#scrollTo=3JRWouppCpQt)

>[6.1.3 Weights update.](#scrollTo=Q5HN5T6vPwZq)



We import the NumPy package.

In [1]:
# import numpy
import numpy as np

# 6.1. Forward propagation and back propagation example for simple neural network.

Suppose that for the input
$$x = \begin{bmatrix}
1\\
0
\end{bmatrix}
$$
we expect the output
$$y = \begin{bmatrix}
0\\
1
\end{bmatrix}$$

We start with the following weights

$W^{[1]} = \begin{bmatrix}
-0.1 & -0.2 & 0.1\\
-0.4 & 0.7 & -0.6
\end{bmatrix}$

and

$W^{[2]} = \begin{bmatrix}
-0.15 & -0.25 & 0.15\\
-0.45 & 0.75 & -0.65
\end{bmatrix}$

In [2]:
x = np.array([1,0]).reshape(2,1)
x

array([[1],
       [0]])

In [3]:
y = np.array([0,1]).reshape(2,1)
y

array([[0],
       [1]])

In [4]:
W1 = np.array([[-0.1, -0.2, 0.1],
               [-0.4, 0.7, -0.6]])
W1

array([[-0.1, -0.2,  0.1],
       [-0.4,  0.7, -0.6]])

In [5]:
W2 = np.array([[-0.15, -0.25, 0.15],
               [-0.45, 0.75, -0.65]])
W2

array([[-0.15, -0.25,  0.15],
       [-0.45,  0.75, -0.65]])

Since the first column of each weight matrix holds the bias, we prepend a constant $1$ to the input and denote the resulting vector by $X^{[1]}$:

$$
X^{[1]} = \begin{bmatrix}
1\\
1\\
0
\end{bmatrix}
$$

In [6]:
X1 = np.vstack((1,x))
X1

array([[1],
       [1],
       [0]])

# 6.1.1 Forward propagation.

The first layer gives

$$
net^{[1]} = W^{[1]} \cdot X^{[1]}
=
\begin{bmatrix}
-0.3\\
0.3
\end{bmatrix}
$$

In [7]:
net1 = np.dot(W1, X1)
net1

array([[-0.3],
       [ 0.3]])

Applying the activation function (here: the sigmoid $\varphi$)
$$\varphi(x)=\frac{e^x}{1+e^x}$$
element-wise to
$$net^{[1]} = \begin{bmatrix}
-0.3\\
0.3
\end{bmatrix}$$
we obtain the output $a^{[1]}$ of the first layer:

$$a^{[1]}
= \varphi(net^{[1]})
= \begin{bmatrix}
\varphi(-0.3)\\
\varphi(0.3)
\end{bmatrix} =
\begin{bmatrix}
0.42555748\\
0.57444252
\end{bmatrix}
$$


In [8]:
def sigmoid(x):
  return np.exp(x) / (1+np.exp(x))

a1 = sigmoid(net1)
a1

array([[0.42555748],
       [0.57444252]])
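
The formula above is fine for the small values in this example, but `np.exp(x)` overflows for large positive `x`. The following is a minimal sketch of a numerically safer variant; the name `sigmoid_stable` is hypothetical and not part of the original notebook.

```python
import numpy as np

def sigmoid_stable(x):
    """Sigmoid that never evaluates np.exp on a large positive argument."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0 use 1 / (1 + e^{-x}); here e^{-x} <= 1, so nothing overflows.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0 use e^{x} / (1 + e^{x}); here e^{x} < 1, so nothing overflows.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

net1 = np.array([[-0.3], [0.3]])
print(sigmoid_stable(net1))                          # same values as a1 above
print(sigmoid_stable(np.array([1000.0, -1000.0])))   # saturates to 1 and 0
```

Both branches are algebraically the same function, so the results agree with `sigmoid` wherever the latter does not overflow.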

We add $1$ as the first coordinate and obtain the input $X^{[2]}$ of the second layer

$$X^{[2]} =
\begin{bmatrix}
1\\
0.42555748\\
0.57444252
\end{bmatrix}
$$

In [9]:
X2 = np.vstack((1,a1))
X2

array([[1.        ],
       [0.42555748],
       [0.57444252]])

The second layer (with weights $W^{[2]}$) gives us
$$
net^{[2]}
= W^{[2]} \cdot X^{[2]}
=
\begin{bmatrix}
-0.17022299\\
-0.50421952
\end{bmatrix}
$$

In [10]:
net2 = np.dot(W2,X2)
net2

array([[-0.17022299],
       [-0.50421952]])

Once again, we apply the activation function (here: the sigmoid $\varphi$) to $net^{[2]}$ to get the output $a^{[2]}$ of the whole neural network

$$a^{[2]}
= \varphi(net^{[2]})
= \begin{bmatrix}
\varphi(-0.17022299) \\
\varphi(-0.50421952)
\end{bmatrix}
=
\begin{bmatrix}
0.45754671\\
0.37654958
\end{bmatrix}
$$

In [11]:
a2 = sigmoid(net2)
a2

array([[0.45754671],
       [0.37654958]])
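
The forward steps above repeat the same pattern for each layer: prepend $1$, multiply by the weights, apply the sigmoid. As a sketch, they can be wrapped in one loop; the function name `forward` and its return values are assumptions introduced only for illustration.

```python
import numpy as np

def sigmoid(x):
    return np.exp(x) / (1 + np.exp(x))

def forward(x, weights):
    """Run the forward pass; return the output and the cached X and net of each layer."""
    a = x
    inputs, nets = [], []
    for W in weights:
        # Prepend a row of ones so that the first column of W acts as the bias.
        X = np.vstack((np.ones((1, a.shape[1])), a))
        net = W @ X
        a = sigmoid(net)
        inputs.append(X)
        nets.append(net)
    return a, inputs, nets

x = np.array([1, 0]).reshape(2, 1)
W1 = np.array([[-0.1, -0.2, 0.1],
               [-0.4, 0.7, -0.6]])
W2 = np.array([[-0.15, -0.25, 0.15],
               [-0.45, 0.75, -0.65]])

a2, inputs, nets = forward(x, [W1, W2])
print(a2)
```

The cached `inputs` and `nets` are exactly the quantities $X^{[k]}$ and $net^{[k]}$ that back propagation needs later.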

# 6.1.2 Back propagation

Our target value is
$$
y =
\begin{bmatrix}
0\\
1
\end{bmatrix}$$

Using the loss function

$$L(y, a^{[2]}) = \frac12 ||y-a^{[2]}||_2^2$$

we can write

$$\frac{\partial L}{\partial a^{[2]}} = a^{[2]} - y =
\begin{bmatrix}
0.45754671\\
0.37654958
\end{bmatrix}
-
\begin{bmatrix}
0\\
1
\end{bmatrix}
=
\begin{bmatrix}
0.45754671\\
-0.62345042
\end{bmatrix}
$$


In [12]:
dLda2 = a2 - y
dLda2

array([[ 0.45754671],
       [-0.62345042]])
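
For reference, the value of the loss itself can be computed alongside its derivative. This is a small sketch using the rounded network output from above; the variable name `loss_value` is hypothetical.

```python
import numpy as np

y = np.array([0, 1]).reshape(2, 1)
a2 = np.array([[0.45754671],
               [0.37654958]])

# L(y, a2) = 1/2 * ||y - a2||_2^2
loss_value = 0.5 * np.sum((y - a2) ** 2)
# The derivative with respect to a2 is simply the residual a2 - y.
dLda2 = a2 - y

print(loss_value)
print(dLda2)
```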

We calculate the delta signal $\delta^{[2]}$ as follows
$$\delta^{[2]} =
\begin{bmatrix}
0.45754671 \cdot \varphi'(-0.17022299)\\
-0.62345042 \cdot \varphi'(-0.50421952)
\end{bmatrix}
=
\begin{bmatrix}
0.11356205\\
-0.14636122
\end{bmatrix}
$$

Notice that we multiplied $\frac{\partial L}{\partial a^{[2]}}$ element-wise by the derivative of $\varphi$ evaluated at the entries of $net^{[2]}$.

Moreover, since $\varphi$ is a sigmoid function, we can use the following equality (check!)
$$\varphi'(x)=\varphi(x) \cdot (1-\varphi(x)).$$
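
One way to check this identity is to differentiate $\varphi(x)=\frac{e^x}{1+e^x}$ with the quotient rule and use $1-\varphi(x)=\frac{1}{1+e^x}$:

$$\varphi'(x)
= \frac{e^x(1+e^x) - e^x \cdot e^x}{(1+e^x)^2}
= \frac{e^x}{(1+e^x)^2}
= \frac{e^x}{1+e^x} \cdot \frac{1}{1+e^x}
= \varphi(x) \cdot (1-\varphi(x)).$$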

In [13]:
def sigmoid_derivative(x):
  return sigmoid(x) * (1-sigmoid(x))

delta2 = dLda2 * sigmoid_derivative(net2)
delta2

array([[ 0.11356205],
       [-0.14636122]])

Thus, the derivative

$$\frac{\partial L}{\partial W^{[2]}}$$

of the loss function $L$ with respect to the weights $W^{[2]}$ of the second layer is equal to

$$\frac{\partial L}{\partial W^{[2]}}
= \delta^{[2]} \cdot (X^{[2]})^T
=
\begin{bmatrix}
0.11356205 &  0.04832718 &  0.06523487\\
-0.14636122 & -0.06228511 & -0.08407611
\end{bmatrix}$$


In [14]:
dLdW2 = delta2 * X2.T
dLdW2

array([[ 0.11356205,  0.04832718,  0.06523487],
       [-0.14636122, -0.06228511, -0.08407611]])
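
Note that the cell above uses the element-wise operator `*`, which gives the matrix product $\delta^{[2]} \cdot (X^{[2]})^T$ only because NumPy broadcasts a $(2,1)$ column against a $(1,3)$ row. The sketch below, using the rounded values from above, compares three ways of writing the same outer product; the variable names are chosen for illustration.

```python
import numpy as np

delta2 = np.array([[0.11356205], [-0.14636122]])
X2 = np.array([[1.0], [0.42555748], [0.57444252]])

via_broadcast = delta2 * X2.T      # (2,1) * (1,3) broadcasts to shape (2,3)
via_matmul = delta2 @ X2.T         # matrix product of a (2,1) and a (1,3) array
via_outer = np.outer(delta2, X2)   # outer product of the flattened vectors

print(np.allclose(via_broadcast, via_matmul))
print(np.allclose(via_matmul, via_outer))
```

With a batch of several inputs the broadcast form would no longer be equivalent, so `@` (or `np.dot`) is the safer choice when generalizing.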

We have to move back to the first layer. Thus, we need to find the derivative

$$\frac{\partial L}{\partial X^{[2]}} $$

of the loss function $L$ with respect to $X^{[2]}$.

We can write

$$\frac{\partial L}{\partial X^{[2]}}  
= (W^{[2]})^T \cdot \delta^{[2]}
=\begin{bmatrix}
0.04882824\\
-0.13816143\\
0.1121691
\end{bmatrix}$$


In [15]:
dLdX2 = np.dot(W2.T, delta2)
dLdX2

array([[ 0.04882824],
       [-0.13816143],
       [ 0.1121691 ]])

We omit the first coordinate (it corresponds to the constant input $1$, which only encodes the bias and does not depend on the first layer) and obtain the derivative

$$\frac{\partial L}{\partial a^{[1]}}
=
\begin{bmatrix}
-0.13816143\\
0.1121691
\end{bmatrix}$$

of the loss function $L$ with respect to the output $a^{[1]}$ of the first layer (since $X^{[2]}$ is just $a^{[1]}$ with $1$ added as the first coordinate).

In [16]:
dLda1 = dLdX2[1:]
dLda1

array([[-0.13816143],
       [ 0.1121691 ]])
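
Dropping the first coordinate after the multiplication is equivalent to dropping the bias column of $W^{[2]}$ before it. A brief sketch with the rounded $\delta^{[2]}$ from above (the names `dLda1_sliced` and `dLda1_direct` are hypothetical):

```python
import numpy as np

W2 = np.array([[-0.15, -0.25, 0.15],
               [-0.45, 0.75, -0.65]])
delta2 = np.array([[0.11356205], [-0.14636122]])

# Variant used in the notebook: multiply by the full W2^T, then drop the bias entry.
dLda1_sliced = (W2.T @ delta2)[1:]
# Equivalent variant: drop the bias column of W2 first, then multiply.
dLda1_direct = W2[:, 1:].T @ delta2

print(dLda1_sliced)
print(dLda1_direct)
```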

Now we are in a position to continue in the same way as before, i.e. we get the delta signal $\delta^{[1]}$ as follows

$$\delta^{[1]}
=
\begin{bmatrix}
-0.13816143 \cdot \varphi'(-0.3)\\
0.1121691 \cdot \varphi'(0.3)
\end{bmatrix}
=
\begin{bmatrix}
-0.03377471\\
0.02742067
\end{bmatrix}
$$

In [17]:
delta1 = dLda1 * sigmoid_derivative(net1)
delta1

array([[-0.03377471],
       [ 0.02742067]])

Thus, the derivative

$$\frac{\partial L}{\partial W^{[1]}}$$

of the loss function $L$ with respect to the weights $W^{[1]}$ of the first layer is equal to

$$\frac{\partial L}{\partial W^{[1]}}
= \delta^{[1]} \cdot (X^{[1]})^T
=
\begin{bmatrix}
-0.03377471 & -0.03377471 & 0\\
0.02742067 &  0.02742067 &  0
\end{bmatrix}$$

In [18]:
dLdW1 = delta1 * X1.T
dLdW1

array([[-0.03377471, -0.03377471, -0.        ],
       [ 0.02742067,  0.02742067,  0.        ]])
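
A common way to validate hand-derived gradients is to compare them with central finite differences of the loss. The following is a sketch under the assumptions of this example (two sigmoid layers, squared-error loss); the helper names `loss` and `numerical_gradient` and the step size `h` are hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return np.exp(x) / (1 + np.exp(x))

def loss(W1, W2, x, y):
    """Forward pass followed by L = 1/2 * ||y - a2||^2."""
    X1 = np.vstack((1, x))
    a1 = sigmoid(W1 @ X1)
    X2 = np.vstack((1, a1))
    a2 = sigmoid(W2 @ X2)
    return 0.5 * np.sum((y - a2) ** 2)

def numerical_gradient(f, W, h=1e-6):
    """Central-difference approximation of df/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += h
        W_minus[idx] -= h
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * h)
    return grad

x = np.array([1, 0]).reshape(2, 1)
y = np.array([0, 1]).reshape(2, 1)
W1 = np.array([[-0.1, -0.2, 0.1],
               [-0.4, 0.7, -0.6]])
W2 = np.array([[-0.15, -0.25, 0.15],
               [-0.45, 0.75, -0.65]])

num_dLdW1 = numerical_gradient(lambda W: loss(W, W2, x, y), W1)
num_dLdW2 = numerical_gradient(lambda W: loss(W1, W, x, y), W2)
print(num_dLdW1)
print(num_dLdW2)
```

The printed matrices should agree with `dLdW1` and `dLdW2` from back propagation up to the finite-difference error.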

# 6.1.3 Weights update.

Finally, we adjust the weights by one gradient descent step (here with learning rate $c=0.1$):

$$W^{[2]}_{new} = W^{[2]} - c \frac{\partial L}{\partial W^{[2]}}
=\begin{bmatrix}
-0.16135621 & -0.25483272 &  0.14347651\\
-0.43536388 &  0.75622851 & -0.64159239
\end{bmatrix}$$

In [19]:
c = 0.1

W2new = W2 - c * dLdW2
W2new

array([[-0.16135621, -0.25483272,  0.14347651],
       [-0.43536388,  0.75622851, -0.64159239]])

Similarly, we adjust the weights in the first layer

$$W^{[1]}_{new} = W^{[1]} - c \frac{\partial L}{\partial W^{[1]}}
=\begin{bmatrix}
-0.09662253 & -0.19662253 &  0.1\\
-0.40274207 &  0.69725793 & -0.6
\end{bmatrix}$$

In [22]:
W1new = W1 - c * dLdW1
W1new

array([[-0.09662253, -0.19662253,  0.1       ],
       [-0.40274207,  0.69725793, -0.6       ]])
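
As a sanity check, the whole computation can be collected into one function and the loss compared before and after the update. This is a sketch only; the function name `loss_and_gradients` is hypothetical, and it uses $\varphi'(net) = a \cdot (1-a)$ to reuse the activations instead of calling the sigmoid again.

```python
import numpy as np

def sigmoid(x):
    return np.exp(x) / (1 + np.exp(x))

def loss_and_gradients(W1, W2, x, y):
    # Forward propagation.
    X1 = np.vstack((1, x))
    a1 = sigmoid(W1 @ X1)
    X2 = np.vstack((1, a1))
    a2 = sigmoid(W2 @ X2)
    loss = 0.5 * np.sum((y - a2) ** 2)
    # Back propagation.
    delta2 = (a2 - y) * a2 * (1 - a2)
    dLdW2 = delta2 @ X2.T
    delta1 = (W2.T @ delta2)[1:] * a1 * (1 - a1)   # drop the bias coordinate
    dLdW1 = delta1 @ X1.T
    return loss, dLdW1, dLdW2

x = np.array([1, 0]).reshape(2, 1)
y = np.array([0, 1]).reshape(2, 1)
W1 = np.array([[-0.1, -0.2, 0.1],
               [-0.4, 0.7, -0.6]])
W2 = np.array([[-0.15, -0.25, 0.15],
               [-0.45, 0.75, -0.65]])
c = 0.1

loss_before, dLdW1, dLdW2 = loss_and_gradients(W1, W2, x, y)
W1new = W1 - c * dLdW1
W2new = W2 - c * dLdW2
loss_after, _, _ = loss_and_gradients(W1new, W2new, x, y)
print(loss_before, loss_after)
```

For this example the loss after the step is smaller than before, which is what one gradient descent step with a small learning rate is expected to achieve.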

That is all: we have obtained the neural network with updated weights!