# Artificial Neural Network

## 1. Introduction and Overview

### 1.1 Introduction

Neural Networks (NNs) are the foundation of Deep Learning. With enough data and computational power, they can be applied to a very wide range of problems. While it is fine to treat an NN as a black box, it is more exciting and educational to learn what lies behind the term and how these networks work.

As such, this project aims to create an Artificial Neural Network (ANN) from scratch using Python and NumPy. This documentation explains the flow of the ANN and how it works under the hood.

### 1.2 Overview

![Neural Network Diagram](../static/Neural%20Network.png)

Above is a simplified diagram of a neural network. The coloured circles connected to each other are referred to as neurons. A Neural Network consists of an Input Layer, one or more Hidden Layers and a final Output Layer. The input enters at the Input Layer and is passed through the Hidden Layers before finally reaching the Output Layer.

For this particular ANN, the neurons are trained on a set of data, and their weights and biases are determined through backward propagation.

## 2. Neural Network

### 2.1 Initialization of Parameters

Referring to the code below, the parameter `layer_dims` holds the dimensions of the input, hidden and output layers. For instance, for a 4-layer neural network whose input layer has 784 neurons, whose two hidden layers have 128 and 64 neurons respectively, and whose output layer has 10 neurons, `layer_dims` is equal to `[784, 128, 64, 10]`. In general, it takes the form `[Input Layer Dimension, Hidden Layer Dimension, ..., Output Layer Dimension]`.

The weights of each layer are drawn from a standard normal distribution and then multiplied by 0.01.
The weights form an $m \times n$ matrix, where $m$ is the current layer dimension and $n$ is the previous layer dimension. The weight matrix of the $l$-th layer is as follows:

$$
weight_{l} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\
                             w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\
                             \vdots  & \vdots  & \ddots & \vdots  \\
                             w_{m,1} & w_{m,2} & \cdots & w_{m,n}
             \end{bmatrix}
$$

, where m is the current layer dimension and n is the previous layer dimension.

The biases of each neuron are initialized to zero. The initial bias vector is an $m \times 1$ column matrix, as follows:

$$
bias_{l} = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{bmatrix}
$$

, where m is the current layer dimension.

In [None]:
import numpy as np

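def initialize_parameters(layer_dims: list[int]) -> dict:
    """Create the weight matrix and bias vector for every layer."""
    # Fixed seed so that the random weights are reproducible between runs
    np.random.seed(1)

    parameters = {}

    for l in range(1, len(layer_dims)):
        # Weights: (current layer, previous layer) matrix of small random values
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        # Biases: (current layer, 1) column vector of zeros
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))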

    return parameters
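
As a quick check of the shapes described above, the following sketch lists the shape of every weight matrix and bias vector for `layer_dims = [784, 128, 64, 10]` and counts the trainable parameters. The helper `parameter_shapes` is a hypothetical name introduced only for this illustration; it is not part of the project code.

```python
def parameter_shapes(layer_dims):
    # Hypothetical helper: maps each parameter name to its (rows, columns) shape
    shapes = {}
    for l in range(1, len(layer_dims)):
        shapes[f"W{l}"] = (layer_dims[l], layer_dims[l - 1])  # m x n
        shapes[f"b{l}"] = (layer_dims[l], 1)                  # m x 1
    return shapes

shapes = parameter_shapes([784, 128, 64, 10])
for name, shape in shapes.items():
    print(name, shape)

# Total number of trainable values across all weights and biases
total = sum(rows * cols for rows, cols in shapes.values())
print(total)  # 109386
```

Most of these parameters sit in the first weight matrix, since it connects the 784 inputs to the 128 neurons of the first hidden layer.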

### 2.2 Forward Propagation

Forward propagation is the process of taking in inputs and producing a prediction.

For the forward propagation of this particular neural network, and assuming that there is a single training example, the Linear Hypothesis is defined as follows:

$$
Z_{(m, 1)} = W_{(m, n)} \cdot A_{(n, 1)} + b_{(m, 1)}
$$

, where $Z_{(m, 1)}$ is the $m \times 1$ linear output, $W_{(m, n)}$ is the $m \times n$ weight matrix, $A_{(n, 1)}$ is the $n \times 1$ activated output from the previous layer and $b_{(m, 1)}$ is the $m \times 1$ bias vector.
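
To make the dimensions concrete, here is a small worked example with $m = 2$ and $n = 3$. The numbers are made up purely for illustration.

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])            # weight matrix, shape (m, n) = (2, 3)
A_prev = np.array([[1.0], [0.0], [-1.0]])  # previous activations, shape (n, 1)
b = np.array([[0.5], [-0.5]])              # bias vector, shape (m, 1)

# Linear Hypothesis: Z = W . A + b
Z = np.dot(W, A_prev) + b
print(Z.shape)  # (2, 1)
print(Z)        # rows: -1.5 and -2.5
```

If several training examples are stacked as columns, `A_prev` becomes an $n \times k$ matrix and NumPy broadcasts `b` across the $k$ columns, so the same line of code still applies.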

The activation function for this particular neural network is the Rectified Linear Unit (ReLU), although the sigmoid function is another common choice. The ReLU is defined as follows:

$$
ReLU(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}
$$
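
The `forward_propagation` code in this section calls a `ReLU` function that is not shown here. A minimal NumPy sketch consistent with the definition above could look like this; the actual implementation in the project may differ.

```python
import numpy as np

def ReLU(Z: np.ndarray) -> np.ndarray:
    # Element-wise: negative entries become 0, non-negative entries pass through
    return np.maximum(0, Z)

Z = np.array([[-1.5], [0.0], [2.0]])
print(ReLU(Z))  # entries: 0.0, 0.0, 2.0
```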

The Linear Cache and the Activation Cache hold the inputs to the Linear Hypothesis and the Activation Function respectively. They are stored and returned for use in [backward propagation](#23-backward-propagation), which is explained in the next section.

In [None]:
def forward_propagation(X: np.ndarray, params: dict):
    A = X
    caches = []

    for l in range(1, (len(params) // 2) + 1):
        A_prev = A
        W = params['W'+str(l)]
        b = params['b'+str(l)]

        # Linear Hypothesis
        Z = np.dot(W, A_prev) + b

        # Linear Cache
        linear_cache = (A_prev, W, b)

        # Activation Function
        A = ReLU(Z)                     # type: ignore 

        # Activation Cache
        activation_cache = Z

        cache = (linear_cache, activation_cache)
        caches.append(cache)
    
    return A, caches
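
The sketch below shows how the function above might be called and how the returned caches can be unpacked. So that it runs on its own, it restates `forward_propagation` in condensed form and assumes a `ReLU` helper built on `np.maximum`. The toy layer sizes (3, 4, 2) and the random input are hypothetical.

```python
import numpy as np

def ReLU(Z):
    # Assumed helper: element-wise max(0, z)
    return np.maximum(0, Z)

def forward_propagation(X, params):
    # Condensed restatement of the function above
    A, caches = X, []
    for l in range(1, len(params) // 2 + 1):
        A_prev, W, b = A, params[f"W{l}"], params[f"b{l}"]
        Z = np.dot(W, A_prev) + b
        A = ReLU(Z)
        caches.append(((A_prev, W, b), Z))
    return A, caches

# Hypothetical toy network: 3 inputs, 4 hidden neurons, 2 outputs
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((4, 3)) * 0.01, "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((2, 4)) * 0.01, "b2": np.zeros((2, 1)),
}
X = rng.standard_normal((3, 1))  # one training example as a column vector

A_out, caches = forward_propagation(X, params)

# Unpack the cache of the last layer
linear_cache, activation_cache = caches[-1]
A_prev, W, b = linear_cache

print(A_out.shape)             # (2, 1)
print(len(caches))             # 2, one cache per layer
print(A_prev.shape, W.shape)   # (4, 1) (2, 4)
print(activation_cache.shape)  # (2, 1)
```

Each cache entry pairs the inputs of the Linear Hypothesis with the pre-activation `Z`, which is the information backward propagation needs for that layer.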

### 2.3 Backward Propagation

Backward propagation is slightly more complex than forward propagation. In essence, it is the learning mechanism by which the neural network tunes the weights and biases of each neuron. During training, the network evaluates the cost by comparing its predictions to the labels. The gradients of the cost are then determined via backward propagation and used to make small changes to the weights and biases of each neuron.
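
To illustrate how gradients are used to adjust the parameters, the sketch below applies one update step to a single layer, assuming plain gradient descent with a fixed learning rate. The text so far does not specify the update rule, so the names `learning_rate`, `dW1` and `db1` and all numeric values are hypothetical.

```python
import numpy as np

learning_rate = 0.1  # hypothetical step size

params = {"W1": np.array([[0.5, -0.2]]), "b1": np.array([[0.1]])}
# Hypothetical gradients of the cost with respect to W1 and b1
grads = {"dW1": np.array([[0.3, -0.1]]), "db1": np.array([[0.2]])}

# Move each parameter a small step against its gradient
params["W1"] = params["W1"] - learning_rate * grads["dW1"]
params["b1"] = params["b1"] - learning_rate * grads["db1"]

print(params["W1"])  # approximately [[0.47, -0.19]]
print(params["b1"])  # approximately [[0.08]]
```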