# Artificial Neural Networks

In [2]:
import numpy as np
seed = 42
rng = np.random.default_rng(seed=seed)

## Network Class

We shall start writing the `Network` class. The two methods that are indispensable for any ML class are :
- `fit`
- `predict`

Fitting a neural network model requires us to compute two passes on the data :
- `forward`
- `backward`

Before any of this can happen, the network and its hyperparameters have to be initialized, which requires an `init` method :
- `init`

Most of these methods rely on a few helper functions :
- `activations`
- `losses`

That is the overall process. We will work through it in the reverse order, so that no step depends on something defined later :
`helpers -> init -> forward -> backward -> fit -> predict`

The skeleton of the class is given in the code block that follows. For ease of exposition, we are going to discuss the methods one at a time and then plug them into the class right at the end.

In [3]:
class Network :
    
    def __init__(self, layers, activation_choice="relu", output_choice="softmax", loss_choice="cce") :
        pass
    
    def forward(self, X) :
        pass
    
    def backward(self, Y, Y_hat) :
        pass
    
    def fit(self, X, Y, lr=0.01, epochs=100, batch_size=100) :
        pass
    
    def predict(self, X) :
        pass

## Activation Functions

### Hidden Layer
We will look at two functions for the hidden layers. Both of these functions will be **applied element-wise**. The input to these functions can be scalars, vectors or matrices.

- Sigmoid :
$$
    g(z) = \frac {1} {1 + e^{-z}}
$$

Its derivative :

$$
    g'(z) = g(z)(1 - g(z))
$$

- ReLU ( Rectified Linear Unit ) :
$$
    g(z)=\begin{cases}
    z, & z \ge 0 \\ 
    0, & z<0
    \end{cases}
$$

Its derivative :

$$
    g'(z)=\begin{cases}
    1, & z \ge 0 \\ 
    0, & z<0
    \end{cases}
$$

ReLU is not differentiable at $z = 0$; by convention we take $g'(0) = 1$, which is what the $z \ge 0$ case above encodes.


In [4]:
def sigmoid(z) :
    return 1 / (1 + np.exp(-z))

def grad_sigmoid(z) :
    return sigmoid(z) * (1 - sigmoid(z))

def relu(z) :
    return np.where(z >= 0, z, 0)

def grad_relu(z) :
    return np.where(z >= 0, 1, 0)


# A dictionary of activation functions will be used while initializing the network
hidden_act = {"sigmoid" : sigmoid, "relu": relu}
grad_hiddent_act = {"sigmoid": grad_sigmoid, "relu": grad_relu}
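
The derivative formulas above can be sanity-checked numerically. The following is a minimal sketch, not part of the `Network` class: it repeats the four functions so that it runs on its own, and compares each analytical gradient against a central finite difference. The helper `numerical_grad` and the test points are assumptions made for this illustration; the points are chosen away from $z = 0$, where ReLU has a kink.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def grad_sigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))

def relu(z):
    return np.where(z >= 0, z, 0)

def grad_relu(z):
    return np.where(z >= 0, 1, 0)

def numerical_grad(f, z, eps=1e-6):
    # Hypothetical helper: element-wise central difference approximation of f'
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Test points away from 0, since ReLU is not differentiable there
z = np.array([[-2.0, -0.5], [0.5, 3.0]])

sigmoid_ok = np.allclose(grad_sigmoid(z), numerical_grad(sigmoid, z), atol=1e-6)
relu_ok = np.allclose(grad_relu(z), numerical_grad(relu, z), atol=1e-6)
print(sigmoid_ok, relu_ok)
```

If the formulas and the code agree, both flags should be `True`. Note that the input is a matrix, which also exercises the element-wise behaviour of all four functions.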

### Output Layer
We will look at 2 activation functions for the output layer :
    
- Identity ( For regression )
$$
    g(z) = z
$$

- Softmax ( For classification ) :
The input to the softmax function will always be a matrix of size $n \times k$, with one row per data point. Since we need a probability distribution for each data point, **the softmax will be computed row-wise**.

$$
    g(\textbf Z) = 
    \begin{pmatrix}
    ... & ... & ... \\
    ... & \frac {e^{Z_{ij}}} {\sum \limits_{l=1}^{k} e^{Z_{il}}} & ... \\
    ... & ... & ... \\
    \end{pmatrix}
$$

**To avoid overflow, we will subtract the row-wise maximum from each row while computing the softmax**
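
Subtracting a constant $c$ from every entry of a row leaves the softmax of that row unchanged, because the common factor $e^{-c}$ cancels between numerator and denominator. The sketch below illustrates why the shift matters in practice. The names `naive_softmax` and `stable_softmax` and the sample inputs are assumptions made for this illustration only.

```python
import numpy as np

def naive_softmax(z):
    # Hypothetical helper: row-wise softmax without the max shift
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stable_softmax(z):
    # Hypothetical helper: subtract the row-wise maximum before exponentiating
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])

# On small inputs the two versions agree
agree_on_small = np.allclose(naive_softmax(small), stable_softmax(small))

# On large inputs exp overflows to inf, and inf / inf yields nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = naive_softmax(large)

# The shifted version stays finite and each row still sums to 1
stable_large = stable_softmax(large)
print(agree_on_small, naive_large, stable_large.sum(axis=1))
```

Since `large` is just `small` shifted by a constant, the stable version returns the same probabilities for both inputs, while the naive version returns `nan` for the large one.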

In [None]:
def identity(z) :
    return z

def softmax(z) :
    """
    Row-wise softmax
    """
    # Check if z is a matrix
    assert z.ndim == 2
    
    # To prevent overflow, subtract the row-wise maximum
    # (out of place, so that the caller's array is not modified)
    z = z - z.max(axis=1, keepdims=True)
    
    # Compute row-wise softmax
    exp_z = np.exp(z)
    prob = exp_z / exp_z.sum(axis=1, keepdims=True)