# Intuition Behind Artificial Neural Networks

2022 DS Elective 4 <br>
University of Science and Technology of the Philippines <br>
Instructor: Jhun Brian M. Andam <br>

## Example:

<center><img src="nn.png" width="400"></center>

Assume that the neurons have a `sigmoid` activation function and perform a forward pass on the network. Assume that the actual output $y$ is 1 and the `learning rate` $\alpha$ is 0.9, then perform a backward pass to update the parameters. Finally, perform another forward pass.

$$x = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

$$y = \begin{bmatrix} 1 \end{bmatrix}$$

$$\text{hidden unit weights} = 
\begin{bmatrix}
w_{11} = 0.2 && w_{12} = -0.3 \\
w_{13} = 0.4 && w_{14} = 0.1 \\
w_{15} = -0.5 && w_{16} = 0.2
\end{bmatrix}$$

$$\text{output unit weights} = 
\begin{bmatrix}
w_{21} = -0.3 \\
w_{22} = -0.2
\end{bmatrix}$$

$$\theta = 
\begin{bmatrix}
\theta_{1} = -0.4 \\
\theta_{2} = 0.2 \\
\theta_{3} = 0.1
\end{bmatrix}$$

## Forward Propagation

To calculate $H_1$, we first compute the weighted sum of its input values and then add the bias $\theta$. For a unit $j$, with $w_{ji}$ denoting the weight on the connection from input $i$ into unit $j$:

## $$Z_j = \sum_i{(w_{ji} \cdot x_i)} + \theta_j$$

### $Z_1 = w_{11} \cdot x_1 + w_{13} \cdot x_2 + w_{15} \cdot x_3 + \theta_1$

### $Z_1 = (0.2 \cdot 1) + (0.4 \cdot 0) + (-0.5 \cdot 1) + (-0.4)$

### $Z_1 = -0.7$
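
The weighted sum for $H_1$ can be checked with a few lines of Python. This is only a minimal sketch; the variable names (`x`, `w_h1`, `theta_1`, `z1`) are illustrative and not part of the original exercise.

```python
# Inputs and the weights feeding hidden unit H1 (values from the example above).
x = [1, 0, 1]
w_h1 = [0.2, 0.4, -0.5]  # w11, w13, w15
theta_1 = -0.4           # bias of H1

# Weighted sum of the inputs plus the bias.
z1 = sum(w * xi for w, xi in zip(w_h1, x)) + theta_1
print(round(z1, 3))  # -0.7
```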

Now we introduce `non-linearity` by applying the sigmoid function $\sigma$.

## $$\sigma(Z) = \frac{1}{1+e^{-Z}}$$



### $H_1 = \frac{1}{1+e^{-(-0.7)}}$

### $H_1 = 0.332$
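
As a quick check, the sigmoid can be written as a small Python function. The name `sigmoid` is an assumption made for this sketch; only the standard `math` module is used.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

h1 = sigmoid(-0.7)   # Z_1 from the step above
print(round(h1, 3))  # 0.332
```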

Similarly, we calculate the value of $H_2$ using the same procedure.

### $Z_2 = w_{12} \cdot x_1 + w_{14} \cdot x_2 + w_{16} \cdot x_3 + \theta_2$

### $Z_2 = (-0.3 \cdot 1) + (0.1 \cdot 0) + (0.2 \cdot 1) + (0.2)$

### $Z_2 = 0.1$

Apply the activation function $\sigma$.

### $H_2 = \frac{1}{1+e^{-0.1}}$

### $H_2 = 0.525$
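
Both hidden units follow the same recipe, so they can be computed in one loop. The sketch below assumes a row-per-unit layout for the weights (`hidden_weights`, `hidden_biases` are illustrative names, not notation from the exercise).

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

x = [1, 0, 1]
# One row per hidden unit: weights from x1, x2, x3 into that unit.
hidden_weights = [
    [0.2, 0.4, -0.5],  # w11, w13, w15 -> H1
    [-0.3, 0.1, 0.2],  # w12, w14, w16 -> H2
]
hidden_biases = [-0.4, 0.2]  # theta_1, theta_2

# Weighted sum plus bias, then sigmoid, for each hidden unit.
hidden = [
    sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
    for row, b in zip(hidden_weights, hidden_biases)
]
print([round(h, 3) for h in hidden])  # [0.332, 0.525]
```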

The same procedure is followed to calculate the value of $\hat{y}$, this time using the hidden-unit outputs $H_1$ and $H_2$ as inputs.


### $Z_3 = w_{21} \cdot H_1 + w_{22} \cdot H_2 + \theta_3$

### $Z_3 = (-0.3 \cdot 0.332) + (-0.2 \cdot 0.525) + (0.1)$

### $Z_3 = -0.105$

Apply the activation function $\sigma$.

### $\hat{y} = \frac{1}{1+e^{-(-0.105)}}$

### $\hat{y} = 0.474$
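
The whole forward pass can be sketched as a single function. The function name `forward` and its parameter names are assumptions made for illustration. The code keeps full floating-point precision between steps, so it agrees with the hand calculation after rounding to three decimals.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Return (hidden activations, prediction) for a single-output network."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    z_out = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return hidden, sigmoid(z_out)

hidden, y_hat = forward(
    x=[1, 0, 1],
    hidden_weights=[[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]],
    hidden_biases=[-0.4, 0.2],    # theta_1, theta_2
    output_weights=[-0.3, -0.2],  # w21, w22
    output_bias=0.1,              # theta_3
)
print(round(y_hat, 3))  # 0.474
```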

Now that we have the predicted output $\hat{y}$, we can determine how much error the model makes with the initial weights. To do that, we calculate the loss using a loss function. Assume that our error function is simply the difference between the target output and the predicted output.

## $$Error = y - \hat{y}$$

$Error = 1-0.474$

$Error = 0.526$
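
The same subtraction in Python, using the rounded prediction from the text (variable names are illustrative):

```python
y = 1          # target output
y_hat = 0.474  # prediction from the forward pass, rounded to three decimals

error = y - y_hat
print(round(error, 3))  # 0.526
```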

Ideally, the error is 0 or close to it. To minimize the error, we use an optimization method called gradient descent, which backpropagates the error through the network and updates each parameter $(w, \theta)$.

## Backward Propagation

Each weight is updated using:

## $$\Delta{w_{ji}} = \alpha \delta_j O_i$$

$$\delta_j = \begin{cases}
O_j(1-O_j)(t_j - O_j) && \text{if $j$ is an output unit} \\
O_j(1-O_j)\sum_k{\delta_k w_{kj}} && \text{if $j$ is a hidden unit}
\end{cases}$$

<br></br>

Where: 

* $O$ is the output of the neuron after the activation function is applied.
* $\alpha$ is the learning rate $(0.9)$
* $\delta_j$ is the error measure for unit $j$
* $t_j$ is the correct output for unit $j$
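
The update rule and the two cases of $\delta_j$ translate almost directly into code. The helper names below (`output_delta`, `hidden_delta`, `weight_change`) are hypothetical and chosen only for this sketch. They assume sigmoid units, as in the example.

```python
def output_delta(o, t):
    """Error term for an output unit with output o and target t."""
    return o * (1 - o) * (t - o)

def hidden_delta(o, downstream):
    """Error term for a hidden unit with output o.

    downstream is a list of (delta_k, w_kj) pairs, one per unit k
    that receives this hidden unit's output.
    """
    return o * (1 - o) * sum(delta_k * w_kj for delta_k, w_kj in downstream)

def weight_change(alpha, delta_j, o_i):
    """Change for the weight on the connection from unit i into unit j."""
    return alpha * delta_j * o_i
```

In this network each hidden unit feeds a single output unit, so `downstream` would contain exactly one pair.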

Since $\hat{y}$ is the output of the output unit, we calculate its error term $\delta_3$ using the first case of the equation above:

### $\delta_3 = \hat{y}(1-\hat{y}) (y - \hat{y})$

### $\delta_3 = 0.474 \cdot (1 - 0.474) \cdot (1 - 0.474)$

### $\delta_3 = 0.1311$

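The error term $\delta_2$ is calculated differently because it belongs to a hidden-layer unit. Since $H_2$ feeds only the output unit, the sum over $k$ reduces to the single term $w_{22} \cdot \delta_3$.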

### $\delta_2 = H_2 (1-H_2) w_{22} \cdot \delta_3$

### $\delta_2 = 0.525 (1-0.525) \cdot (-0.2 \cdot 0.1311)$

### $\delta_2 = -0.0065$

The same procedure applies to $\delta_1$.

### $\delta_1 = H_1 (1-H_1) w_{21} \cdot \delta_3$

### $\delta_1 = 0.332 (1-0.332) \cdot (-0.3 \cdot 0.1311)$

### $\delta_1 = -0.0087$
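
The three error terms can be reproduced with the rounded values used in the text. This is a sketch with illustrative variable names; `delta_3` is rounded to four decimals before it is reused, to mirror the hand calculation.

```python
y, y_hat = 1, 0.474     # target and rounded prediction
h1, h2 = 0.332, 0.525   # rounded hidden-unit outputs
w21, w22 = -0.3, -0.2   # output-unit weights before any update

# Output unit: O(1 - O)(t - O)
delta_3 = round(y_hat * (1 - y_hat) * (y - y_hat), 4)

# Hidden units: O(1 - O) times the error passed back through the outgoing weight
delta_2 = h2 * (1 - h2) * (w22 * delta_3)
delta_1 = h1 * (1 - h1) * (w21 * delta_3)

print(delta_3, round(delta_2, 4), round(delta_1, 4))  # 0.1311 -0.0065 -0.0087
```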

Now that we have our $\delta_j$ values, we can now calculate the rate of change of our parameters.

$$\delta = 
\begin{bmatrix} 
\delta_1 = -0.0087 \\ 
\delta_2 = -0.0065 \\
\delta_3 = 0.1311 
\end{bmatrix}$$

Let's calculate the rate of change of $w_{21}$.

### $\Delta w_{21} = \alpha \delta_3 H_1$

### $\Delta w_{21} = 0.9 \cdot 0.1311 \cdot 0.332$

### $\Delta w_{21} = 0.0392$

Finally, we can update $w_{21}$ by adding the rate of change $\Delta w_{21}$ to the initial value of the weight $w_{21}$.

### $w_{21 \ new} = \Delta w_{21} + w_{21}$

### $w_{21 \ new} = 0.0392 + (-0.3)$

### $w_{21 \ new} = -0.261$
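
A short sketch of this output-weight update, using the rounded values from the text (variable names are illustrative):

```python
alpha = 0.9       # learning rate
delta_3 = 0.1311  # error term of the output unit
h1 = 0.332        # output of H1, the source of this connection
w21 = -0.3        # current weight

delta_w21 = alpha * delta_3 * h1
w21_new = w21 + delta_w21
print(round(delta_w21, 4), round(w21_new, 3))  # 0.0392 -0.261
```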

The rate of change for a hidden-unit weight is calculated the same way, except that the incoming value is the input $x$ rather than a hidden-unit output.

### $\Delta w_{11} = \alpha \delta_1 x_1$

### $\Delta w_{11} = 0.9 (-0.0087) \cdot 1$

### $\Delta w_{11} = -0.0078$

### $w_{11 \ new} = \Delta w_{11} + w_{11}$

### $w_{11 \ new} = -0.0078 + 0.2$

### $w_{11 \ new} = 0.192$
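
The same rule applies to every weight feeding $H_1$. The sketch below (with illustrative names) updates all three at once and shows that $w_{13}$ does not change, because its input $x_2$ is 0.

```python
alpha = 0.9
delta_1 = -0.0087  # error term of H1, rounded as in the text
x = [1, 0, 1]
w_h1 = {"w11": 0.2, "w13": 0.4, "w15": -0.5}  # weights from x1, x2, x3 into H1

# Each weight moves by alpha * delta_1 * (its own input value).
w_h1_new = {
    name: w + alpha * delta_1 * xi
    for (name, w), xi in zip(w_h1.items(), x)
}
print({name: round(w, 3) for name, w in w_h1_new.items()})
# {'w11': 0.192, 'w13': 0.4, 'w15': -0.508}
```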

Similarly, we can calculate the rate of change for the bias $(\theta)$ to update the parameter.

### $\Delta \theta_3 = \alpha \delta_3$

### $\Delta \theta_3 = 0.9 \cdot 0.1311$

### $\Delta \theta_3 = 0.11799$

### $\theta_{3 \ new} = \Delta \theta_3 + \theta_3$

### $\theta_{3 \ new} = 0.11799 + 0.1$

### $\theta_{3 \ new} = 0.218$
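
Putting the pieces together, one full training step (forward pass, error terms, update of every weight and bias) followed by a second forward pass might look like the sketch below. The dictionary layout and the names `forward` and `train_step` are assumptions made for illustration. The code keeps full precision between steps, so its values can differ from the hand-rounded ones in the last digit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """Return (hidden activations, prediction)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(p["W_h"], p["b_h"])
    ]
    y_hat = sigmoid(sum(w * h for w, h in zip(p["w_o"], hidden)) + p["b_o"])
    return hidden, y_hat

def train_step(x, y, p, alpha):
    """Apply one update to every weight and bias; return the new parameters."""
    hidden, y_hat = forward(x, p)
    delta_o = y_hat * (1 - y_hat) * (y - y_hat)
    # Hidden error terms use the output weights from before the update.
    delta_h = [h * (1 - h) * w * delta_o for h, w in zip(hidden, p["w_o"])]
    return {
        "W_h": [[w + alpha * d * xi for w, xi in zip(row, x)]
                for row, d in zip(p["W_h"], delta_h)],
        "b_h": [b + alpha * d for b, d in zip(p["b_h"], delta_h)],
        "w_o": [w + alpha * delta_o * h for w, h in zip(p["w_o"], hidden)],
        "b_o": p["b_o"] + alpha * delta_o,
    }

params = {
    "W_h": [[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]],  # rows: weights into H1, H2
    "b_h": [-0.4, 0.2],                           # theta_1, theta_2
    "w_o": [-0.3, -0.2],                          # w21, w22
    "b_o": 0.1,                                   # theta_3
}
x, y = [1, 0, 1], 1

_, y_hat_before = forward(x, params)
new_params = train_step(x, y, params, alpha=0.9)
_, y_hat_after = forward(x, new_params)

print(round(y_hat_before, 3), round(y_hat_after, 3))
```

After the update, the second prediction should lie closer to the target of 1 than the first.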