# Neural Network from the Ground-Up

Start by importing `numpy` (used for most computations) and `matplotlib` (only used to plot the convergence). 
Note that the `@staticmethod`s are grouped into one class per activation function (e.g. Sigmoid, Tanh), which keeps the implementation of the activation functions modular.

In [None]:
import numpy as np  # required for most computations
import json         # Used to return the model params
import matplotlib.pyplot as plt # Used to display the training loss / iterations
import random       # Used to randomly move through the training set

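class SigmoidActivation:
    @staticmethod
    def forward(input) -> np.array:
        # 1 / (1 + e^-x); avoids the inf / inf = nan that e^x / (e^x + 1) gives for large inputs
        return 1 / (1 + np.exp(-np.asarray(input, dtype=float)))
    
    @staticmethod
    def backward(input) -> np.array:
        # Derivative with respect to the pre-activation input
        sigmoid = SigmoidActivation.forward(input)
        return sigmoid * (1 - sigmoid)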
    
class TanhActivation:
    @staticmethod
    def forward(input) -> np.array:
        # (e^x - e^-x) / (e^x + e^-x), evaluated by NumPy without overflow
        return np.tanh(input)
    
    @staticmethod
    def backward(input) -> np.array:
        # Derivative with respect to the pre-activation input: 1 - tanh(x)^2
        return 1 - np.tanh(input) ** 2
    
class ReLU:
    @staticmethod
    def forward(input) -> np.array:
        return np.maximum(0, input)
    
    @staticmethod
    def backward(input) -> np.array:
        return np.where(input <= 0, 0, 1)

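def softmax(x: np.array) -> np.array:
    # Subtracting the maximum leaves the result unchanged but prevents overflow in exp
    ex = np.exp(np.asarray(x, dtype=float) - np.max(x))
    return ex / np.sum(ex, axis=0)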

def calculateLoss(y_pred, y_true):
    loss = (y_pred - y_true) ** 2
    return np.mean(loss)

# Map the activation name chosen for the NN to its class; raise an error if the name is unknown
def checkActivation(activation_func) -> object:
    if activation_func == "sigmoid":
        activation = SigmoidActivation
    elif activation_func == "tanh":
        activation = TanhActivation
    elif activation_func == "relu":
        activation = ReLU
    else:
        raise AttributeError("Chosen activation function does not exist.")
    return activation
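
Because every activation class exposes the same two static methods, adding a new one only requires a `forward` and a `backward`. The sketch below illustrates this with a hypothetical `LeakyReLU` class; neither the class nor its `slope` value is part of the original notebook, they are assumptions made for the example.

```python
import numpy as np

class LeakyReLU:
    """Hypothetical activation, not part of the original notebook."""
    slope = 0.01  # assumed slope for negative inputs

    @staticmethod
    def forward(x) -> np.ndarray:
        x = np.asarray(x, dtype=float)
        return np.where(x > 0, x, LeakyReLU.slope * x)

    @staticmethod
    def backward(x) -> np.ndarray:
        # Derivative with respect to the pre-activation input x
        x = np.asarray(x, dtype=float)
        return np.where(x > 0, 1.0, LeakyReLU.slope)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(LeakyReLU.forward(z))
print(LeakyReLU.backward(z))
```

To make such a class selectable by name, `checkActivation` would need one extra `elif` branch.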

Here we define the `layer` class, the simplest building block of the neural network. Each layer is vectorised: every node is a position in an `ndarray` whose value is the weight of that node.

Furthermore, the `getLayer` method returns the weights of the layer (a getter), and the `compute` method calculates the output of the layer for a given input.

In [81]:
class layer:
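    def __init__(self, numberOfOutput: int, numberOfNodes: int):
        # One weight per node: shape (numberOfNodes,)
        self.weigths = np.random.rand(numberOfNodes)
        # One bias per (output, node) pair: shape (numberOfOutput, numberOfNodes)
        self.bias = np.random.rand(numberOfOutput, numberOfNodes)

    def getLayer(self):
        return self.weigths
    
    def compute(self, inputs: np.array, activation: object) -> np.array:
        # Debug output: show the operands of every layer evaluation
        print("ITTERATION!")
        print(f"inputs: {inputs}")
        print(f"weights: {self.weigths}")
        print(f"bias: {self.bias}")
        # Pre-activation value: dot product of inputs and weights plus the bias
        z = np.dot(inputs, self.weigths) + self.bias
        return activation.forward(z)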
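
With a 1-D weight vector, the dot product of a 1-D input and the weights is a single scalar, so the layer cannot produce one value per output neuron. A common alternative is to give each layer a weight matrix of shape `(n_inputs, n_outputs)` and one bias per output. The following is a minimal sketch under that assumption; `DenseLayer`, its parameter names, the initialisation range, and the seeds are hypothetical and not taken from the notebook.

```python
import numpy as np

class DenseLayer:
    """Hypothetical fully connected layer; names and shapes are assumptions."""

    def __init__(self, n_inputs: int, n_outputs: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One column of weights per output neuron
        self.weights = rng.uniform(-0.5, 0.5, size=(n_inputs, n_outputs))
        # One bias per output neuron
        self.bias = np.zeros(n_outputs)

    def compute(self, inputs, activation) -> np.ndarray:
        x = np.asarray(inputs, dtype=float)
        z = x @ self.weights + self.bias  # shape (n_outputs,)
        return activation(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden = DenseLayer(8, 3)
output = DenseLayer(3, 8, seed=1)
x = [0, 0, 0, 1, 0, 0, 0, 0]
h = hidden.compute(x, sigmoid)
y = output.compute(h, sigmoid)
print(h.shape, y.shape)  # (3,) (8,)
```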

Below, the `FFN` (Feed Forward Network) class is defined. This is a rather complex class, so I will break it down method by method. In general, the class represents a simple neural network as an `object`. This object holds all layers of the network (which in turn hold all weights and biases), and its methods perform operations on the network.

1. **init method** 

The init method initialises a network. It takes the dimensions of the network as a list, where each position corresponds to a layer and each value to the number of neurons in that layer (e.g. [3, 3, 1] -> 3 layers, where layer 1 = 3 neurons, layer 2 = 3 neurons, and layer 3 = 1 neuron). 
Additionally, an activation function (sigmoid by default) and a learning rate (alpha) can be chosen. 

2. **forward method**

The forward method performs a forward pass through the network for a valid input. It is comparable to the `.predict` method in the TensorFlow library. 

3. **backward method**

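The backward method performs backpropagation through the network. 
It runs a forward pass while storing the output of every layer, computes the output error (prediction minus target), scales it by the derivative of the activation function, and then walks through the layers in reverse order, updating the weights and biases of each layer by gradient descent with the learning rate. 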

4. **train method**

The train method is a wrapper around the backward method with some additional features. It calls the `backward` method in a _for-loop_, so the model can easily be trained for multiple iterations, and it records the average loss of every iteration, which is then displayed in a plot.
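
The forward and backward passes described above can be sketched independently of the `FFN` class. This is a minimal illustration under stated assumptions, not the notebook's implementation: it uses one weight matrix per pair of consecutive layers, a squared-error loss, and the sigmoid derivative written in terms of the layer output. The function names (`init_params`, `forward`, `backward`, `mse`), the initialisation range, the learning rate of 0.5, and the 2000 passes are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(dimensions, seed=0):
    """One (weights, bias) pair per connection between consecutive layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.uniform(-0.5, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(dimensions[:-1], dimensions[1:])
    ]

def forward(params, x):
    """Return the activations of every layer, the input included."""
    activations = [np.asarray(x, dtype=float)]
    for weights, bias in params:
        activations.append(sigmoid(activations[-1] @ weights + bias))
    return activations

def backward(params, x, y_true, learning_rate):
    """One gradient-descent step on the squared error of a single sample."""
    activations = forward(params, x)
    y_pred = activations[-1]
    # Sigmoid derivative expressed through its output a: a * (1 - a)
    delta = (y_pred - np.asarray(y_true, dtype=float)) * y_pred * (1 - y_pred)
    for i in reversed(range(len(params))):
        weights, bias = params[i]
        a_prev = activations[i]
        grad_w = np.outer(a_prev, delta)
        grad_b = delta
        if i > 0:
            # Propagate the error with the weights from before the update
            delta = (delta @ weights.T) * a_prev * (1 - a_prev)
        params[i] = (weights - learning_rate * grad_w,
                     bias - learning_rate * grad_b)

def mse(params, X, Y):
    return float(np.mean([(forward(params, x)[-1] - y) ** 2 for x, y in zip(X, Y)]))

X = np.eye(8)  # the eight one-hot vectors, used as both input and target
params = init_params([8, 3, 8])
loss_before = mse(params, X, X)
for _ in range(2000):
    for x in X:
        backward(params, x, x, learning_rate=0.5)
loss_after = mse(params, X, X)
print(loss_before, loss_after)
```

Two details matter in this sketch: the error is propagated to the previous layer before the current weights are overwritten, and the derivative is evaluated consistently on the stored layer outputs.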

In [None]:
class FFN:
    def __init__(self, dimensions: list[int], activation="sigmoid", alpha=1) -> None:
        if len(dimensions) < 2:
            raise AssertionError("Network must have at least two layers (input and output).")
        
        self.activation = checkActivation(activation_func=activation)
        self.learningRate = alpha

        self.layers = []
        for i in range(0, len(dimensions)):
            layer_dim = dimensions[i]
            print(f"i: {i}, len: {len(dimensions)}")
            if i != (len(dimensions) - 1): 
                output_dim = dimensions[i+1]
            else: 
                output_dim = layer_dim
            self.layers.append(layer(output_dim, layer_dim))

    def displayModel(self) -> None:
        json_obj = {
            "Number of Layers" : len(self.layers),
            "Layers" : []
        }

        i = 1
        for layer in self.layers:
            layer_data = {
                "Layer" : i,
                "Number of Nodes" : len(layer.weigths),
                "Number Biases" : len(layer.bias),
                "Weights" : layer.weigths.tolist(),
                "Bias" : layer.bias.tolist()
            }
            i += 1
            json_obj["Layers"].append(layer_data)

        print(json.dumps(json_obj, indent=4))

    def forward(self, input: list[float]) -> list[float]:
        activation = input
        for layer in self.layers:
            activation = layer.compute(activation, self.activation)
        return activation
        
    def backward(self, input: list[float], y_true: list[float]) -> None:
        store_impulse = [input]
        current_output = input

        for layer in self.layers:
            current_output = layer.compute(current_output, activation=self.activation)
            store_impulse.append(np.array(current_output)) 

        y_pred = store_impulse[-1]
        error = y_pred - y_true
        delta = error * self.activation.backward(y_pred)

        for i in reversed(range(len(self.layers))):
            layer = self.layers[i]
            activation_layer = store_impulse[i]

            activation_layer = np.array(activation_layer).reshape(1, -1)
            delta = delta.reshape(1, -1)


            layer.weigths -= self.learningRate * np.dot(activation_layer.T, delta)
            layer.bias -= self.learningRate * np.sum(delta, axis=0)

            if i > 0:
                delta = np.dot(delta, self.layers[i].weigths.T) * self.activation.backward(store_impulse[i])

    def train(self, X: list[list[float]], y: list[list[float]], max_itterations=1000, plot_loss=True) -> None:
        if len(X) != len(y):
            raise ValueError("The length of X and y must be the same.")
        losses = []

        for i in range(max_itterations):
            iteration_loss = 0 
            for _ in range(len(X)): 
                index = random.randint(0, len(X) - 1)
                xSelected = X[index]
                ySelected = y[index]

                self.backward(xSelected, ySelected)

                if plot_loss:
                    yPredicted = self.forward(xSelected)
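                    # Squared error; a plain mean of the differences lets positive and negative errors cancel out
                    example_loss = calculateLoss(yPredicted, np.asarray(ySelected))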
                    iteration_loss += example_loss

            if plot_loss:
                avg_loss = iteration_loss / len(X)
                losses.append(avg_loss)

        if plot_loss:
            plt.plot(range(max_itterations), losses)
            plt.xlabel("Iterations")
            plt.ylabel("Average Loss")
            plt.title("Training Loss over Iterations")
            plt.grid()
            plt.show()


In [91]:
xTrain =[[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0], 
        [0, 0, 1, 0, 0, 0, 0, 0], 
        [0, 0, 0, 1, 0, 0, 0, 0], 
        [0, 0, 0, 0, 1, 0, 0, 0], 
        [0, 0, 0, 0, 0, 1, 0, 0], 
        [0, 0, 0, 0, 0, 0, 1, 0], 
        [0, 0, 0, 0, 0, 0, 0, 1]]

yTrain =[[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0], 
        [0, 0, 1, 0, 0, 0, 0, 0], 
        [0, 0, 0, 1, 0, 0, 0, 0], 
        [0, 0, 0, 0, 1, 0, 0, 0], 
        [0, 0, 0, 0, 0, 1, 0, 0], 
        [0, 0, 0, 0, 0, 0, 1, 0], 
        [0, 0, 0, 0, 0, 0, 0, 1]]


Network_ = FFN([8, 3, 8], alpha=0.01)
Network_.train(X=xTrain, y=yTrain, max_itterations=1000)


i: 0, len: 3
i: 1, len: 3
i: 2, len: 3
ITTERATION!
inputs: [0, 0, 0, 0, 0, 0, 1, 0]
weights: [0.82849596 0.88488761 0.43816297 0.33149125 0.38319134 0.67650209
 0.34639775 0.8063838 ]
bias: [[0.47263886 0.97369162 0.97574955 0.04138936 0.07060159 0.63200443
  0.36404243 0.68539744]
 [0.76212882 0.91241896 0.39532338 0.03322995 0.78945612 0.61962017
  0.31963175 0.09895953]
 [0.2381396  0.1362166  0.68955988 0.28081035 0.03191991 0.42903582
  0.07331574 0.94669599]]
ITTERATION!
inputs: [[0.6940318  0.78919657 0.78953874 0.59574988 0.60276499 0.72679106
  0.67049842 0.73726378]
 [0.75185432 0.77882234 0.67737211 0.5937833  0.75691759 0.72432507
  0.66061353 0.60953482]
 [0.64211078 0.61836502 0.73806927 0.65185614 0.59346728 0.68469511
  0.60341469 0.78467038]]
weights: [0.23874636 0.33980828 0.99640392]
bias: [[0.75852929 0.32526029 0.95456396]
 [0.77126167 0.98937168 0.70527436]
 [0.72203406 0.65437884 0.88263147]
 [0.9998074  0.40444162 0.5080641 ]
 [0.10307328 0.54908688 0.6057169 ]


ValueError: shapes (3,8) and (3,) not aligned: 8 (dim 1) != 3 (dim 0)
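
The trace above shows where the shapes go wrong: in the first layer, the dot product of the 8-element input and the 8-element weight vector is a scalar, and adding the `(3, 8)` bias broadcasts it to a `(3, 8)` array. The second layer then tries to multiply that array with its 3-element weight vector. The sketch below reproduces the error with placeholder arrays and contrasts it with an assumed layout that uses one weight matrix per layer; the variable names are illustrative only.

```python
import numpy as np

hidden_out = np.zeros((3, 8))  # shape of the first layer's output in the trace
weights_1d = np.zeros(3)       # shape of the second layer's weight vector

try:
    np.dot(hidden_out, weights_1d)
except ValueError as err:
    print(f"ValueError: {err}")

# Assumed alternative: one weight matrix per layer, shaped (n_inputs, n_outputs)
x = np.zeros(8)
w_hidden = np.zeros((8, 3))
w_output = np.zeros((3, 8))
print((x @ w_hidden).shape)             # (3,)
print((x @ w_hidden @ w_output).shape)  # (8,)
```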

In [59]:
xTest = [0, 0, 0, 1, 0, 0, 0, 0]
yPred = Network_.forward(xTest)

print(yPred)
print(softmax(yPred)) # This should be equal to xTest
print(f"avg loss: {calculateLoss(yPred, xTest)}")

[0.11143225 0.13147238 0.13277603 0.1342457  0.12013994 0.11962479
 0.12160775 0.12671164]
[0.12334278 0.12583952 0.12600368 0.126189   0.1244215  0.12435742
 0.12460426 0.12524185]
avg loss: 0.10705626228747613
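
Since the softmax output above is close to uniform, an element-wise comparison with `xTest` is hard to read. One option is to compare only the position of the largest value. The snippet below is a small sketch of that check; it uses the prediction copied from the printed output, and the variable names are illustrative.

```python
import numpy as np

# Values copied from the printed prediction above
y_pred = np.array([0.11143225, 0.13147238, 0.13277603, 0.1342457,
                   0.12013994, 0.11962479, 0.12160775, 0.12671164])
x_test = np.array([0, 0, 0, 1, 0, 0, 0, 0])

predicted_index = int(np.argmax(y_pred))
target_index = int(np.argmax(x_test))
print(predicted_index, target_index)  # 3 3
```

The largest prediction sits at the same index as the 1 in `xTest`, although the margin over the other outputs is small.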


In [28]:
print(softmax([0, 0, 0, 1, 0, 0, 0, 0]))

[0.10289885 0.10289885 0.10289885 0.27970807 0.10289885 0.10289885
 0.10289885 0.10289885]


In [22]:
print(type(yPred))

<class 'numpy.ndarray'>


Get the model params

In [95]:
Network_.displayModel()

{
    "Number of Layers": 3,
    "Layers": [
        {
            "Layer": 1,
            "Number of Nodes": 8,
            "Number of Inputs": 3,
            "Weights": [
                0.8284959564695161,
                0.8848876102567131,
                0.43816296842151303,
                0.33149125119891176,
                0.3831913357217924,
                0.6765020893647472,
                0.34639774775088794,
                0.8063837990472243
            ],
            "Bias": [
                [
                    0.472638863176842,
                    0.9736916200828568,
                    0.9757495533260457,
                    0.04138935669830235,
                    0.07060158593021426,
                    0.6320044286865178,
                    0.364042428319279,
                    0.6853974394631653
                ],
                [
                    0.7621288199777693,
                    0.9124189598463068,
                    0.3953233825107837,
      