This example walks step by step through our implementation of the backpropagation algorithm for training a neural
network. It is based on [this article](https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c).

In [1]:
import numpy as np

from dlfs.activation_functions import Sigmoid, ReLU, Softmax
from dlfs.losses import CategoricalCrossentropy

### The neural network

To keep the computations easy to check by hand, we use a small neural network with three input neurons, two hidden
layers of three neurons each, and three output neurons. The network is fed a single batch of three samples, each
paired with a one-hot target.

![](images/neural_network.png)

In [2]:
# The input shown on the website corresponds to the first sample in the batch. The remaining samples have
# been generated "randomly".

inp = np.array([[0.1, 0.2, 0.7],
                [0.2, 0.3, 0.5],
                [0.3, 0.4, 0.6]])

output = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])

In [3]:
weights_ij = np.array([[0.1, 0.2, 0.3],
                       [0.3, 0.2, 0.7],
                       [0.4, 0.3, 0.9]])

bias_j = np.array([1., 1., 1.])

weights_jk = np.array([[0.2, 0.3, 0.5],
                       [0.3, 0.5, 0.7],
                       [0.6, 0.4, 0.8]])

bias_k = np.array([1., 1., 1.])

weights_kl = np.array([[0.1, 0.4, 0.8],
                       [0.3, 0.7, 0.2],
                       [0.5, 0.2, 0.9]])

bias_l = np.array([1., 1., 1.])

### The forward pass

#### Layer 1

In [4]:
h1_in = inp @ weights_ij + bias_j
h1_in

array([[1.35, 1.27, 1.8 ],
       [1.31, 1.25, 1.72],
       [1.39, 1.32, 1.91]])

Each row of the matrix `h1_in` corresponds to one sample in the batch. The matrix holds the pre-activation values of
the first layer (the weighted sums plus the bias). Next, we apply the activation function to each element of the matrix.

In [5]:
relu = ReLU()
h1_out = relu(h1_in)
h1_out

array([[1.35, 1.27, 1.8 ],
       [1.31, 1.25, 1.72],
       [1.39, 1.32, 1.91]])
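
The output equals the input here because every pre-activation is positive. A minimal NumPy sketch of an element-wise ReLU, assuming the usual definition `max(0, x)`, makes this explicit; the function name `relu` below is illustrative and is not the `dlfs` API.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x); illustrative helper, not the dlfs implementation.
    return np.maximum(0.0, x)

# Pre-activations of layer 1, copied from the output above.
h1_in = np.array([[1.35, 1.27, 1.8],
                  [1.31, 1.25, 1.72],
                  [1.39, 1.32, 1.91]])

h1_out = relu(h1_in)
print(np.array_equal(h1_out, h1_in))  # True: no entry is negative
```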

#### Layer 2

In [6]:
h2_in = h1_out @ weights_jk + bias_k
h2_in

array([[2.731, 2.76 , 4.004],
       [2.669, 2.706, 3.906],
       [2.82 , 2.841, 4.147]])

In [7]:
sigmoid = Sigmoid()
h2_out = sigmoid(h2_in)
h2_out

array([[0.93883129, 0.94047563, 0.9820843 ],
       [0.93517243, 0.93737976, 0.98027604],
       [0.94374707, 0.94485159, 0.98443434]])
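
As a cross-check, the first row can be reproduced with plain NumPy, assuming the standard logistic function `1 / (1 + exp(-x))`. The helper name `sigmoid` is illustrative only and says nothing about how `dlfs` implements its `Sigmoid` class.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-x))

# First row of h2_in, copied from the output above.
h2_in_first = np.array([2.731, 2.76, 4.004])
h2_out_first = sigmoid(h2_in_first)
print(h2_out_first)
```

The printed values should agree with the first row of `h2_out` up to rounding.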

#### Layer 3 (output layer)

In [8]:
o_in = h2_out @ weights_kl + bias_l
o_in

array([[1.86706797, 2.23028232, 2.82303603],
       [1.86486919, 2.22629002, 2.81786233],
       [1.87004735, 2.23578181, 2.82995888]])

In [9]:
softmax = Softmax()
o_out = softmax(o_in)
o_out

array([[0.19844689, 0.28535553, 0.51619758],
       [0.19885349, 0.28542781, 0.5157187 ],
       [0.19790076, 0.28528827, 0.51681098]])
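
Each row of `o_out` is a probability distribution over the three classes. The sketch below reproduces it with a row-wise softmax in NumPy; subtracting the row maximum before exponentiating is a common stability trick and is an assumption here, not something confirmed about the `dlfs` `Softmax` class.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax; subtracting the row max does not change the result
    # but avoids overflow for large inputs.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

# Pre-activations of the output layer, copied from the output above.
o_in = np.array([[1.86706797, 2.23028232, 2.82303603],
                 [1.86486919, 2.22629002, 2.81786233],
                 [1.87004735, 2.23578181, 2.82995888]])

o_out = softmax(o_in)
print(o_out.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```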

### Computing the loss (cross-entropy loss)

The target output for the first sample is `[1.0, 0.0, 0.0]`, but the network produced
`[0.19844689, 0.28535553, 0.51619758]` (note that we are only focusing on the first sample).

In [10]:
cross_entropy = CategoricalCrossentropy()
loss = cross_entropy(output[0], o_out[0])
loss

1.6172337500003393
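
For a one-hot target, the cross-entropy reduces to the negative log of the probability assigned to the correct class. The sketch below checks the value above by hand, assuming the plain formula `-sum(y * log(p))` with no clipping or averaging; whether `dlfs` adds either is not shown here.

```python
import numpy as np

# Target and prediction for the first sample, copied from the outputs above.
y_true = np.array([1.0, 0.0, 0.0])
y_pred = np.array([0.19844689, 0.28535553, 0.51619758])

# Only the term for the true class (index 0) is non-zero.
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 4))  # 1.6172
```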

### Backpropagating the error

#### (Hidden Layer 2 to Output Layer) Weights

In [11]:
d_loss_d_o_out = cross_entropy.gradient(output, o_out)
d_loss_d_o_out


array([[-5.03913152,  1.39929719,  2.06695946],
       [ 1.24821114, -3.50351285,  2.06491558],
       [ 1.24672852,  1.3991655 , -1.93494342]])
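
The printed matrix is consistent with the element-wise derivative `-y/p + (1 - y)/(1 - p)`, where `y` is the target and `p` the predicted probability. This is inferred from the numbers above, not from the `dlfs` source, so treat the sketch below as a hedged reconstruction.

```python
import numpy as np

# One-hot targets and softmax outputs, copied from the cells above.
y = np.eye(3)
p = np.array([[0.19844689, 0.28535553, 0.51619758],
              [0.19885349, 0.28542781, 0.5157187],
              [0.19790076, 0.28528827, 0.51681098]])

# Assumed formula: derivative of -[y*log(p) + (1-y)*log(1-p)] with respect to p.
grad = -y / p + (1.0 - y) / (1.0 - p)
print(np.round(grad, 4))
```

Entries on the diagonal (the true class) are negative, so raising that probability lowers the loss; off-diagonal entries are positive.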

In [13]:
d_o_out_d_o_in = softmax.gradient(d_loss_d_o_out)
d_o_out_d_o_in

array([[0.00054148, 0.22402643, 0.22420076],
       [0.21222988, 0.00263291, 0.21324901],
       [0.24781641, 0.24922099, 0.01846397]])
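
The values printed above are consistent with the element-wise softmax derivative `s * (1 - s)`, where `s` is the softmax of the array passed in, here `d_loss_d_o_out`. The sketch below reproduces that and also evaluates the same expression at the pre-activation `o_in`, which is what the name `d_o_out_d_o_in` suggests. It covers only the diagonal of the softmax Jacobian, and the helper names are hypothetical rather than taken from `dlfs`.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax with the usual max-subtraction for stability.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_diag_derivative(x):
    # Diagonal of the softmax Jacobian: d s_i / d x_i = s_i * (1 - s_i).
    s = softmax(x)
    return s * (1.0 - s)

# Arrays copied from the outputs above.
o_in = np.array([[1.86706797, 2.23028232, 2.82303603],
                 [1.86486919, 2.22629002, 2.81786233],
                 [1.87004735, 2.23578181, 2.82995888]])
d_loss_d_o_out = np.array([[-5.03913152, 1.39929719, 2.06695946],
                           [1.24821114, -3.50351285, 2.06491558],
                           [1.24672852, 1.3991655, -1.93494342]])

# Same argument as in the cell above.
print(np.round(softmax_diag_derivative(d_loss_d_o_out), 4))
# Evaluated at the pre-activation of the output layer instead.
print(np.round(softmax_diag_derivative(o_in), 4))
```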