# Building a Basic Neural Network from scratch
Nowadays, building a neural network in Python is very easy. TensorFlow and Keras let us define a network structure in a few lines of code, and with one of the bundled datasets we can train it without further complication.
However, if we want to change how the network is trained, for example by optimizing the weights with a genetic algorithm, those tools get in the way because the training sequence is not easy to modify.  
This notebook explains how to build a neural network from scratch, using backpropagation as the weight optimization mechanism.  

## Use case scenario
A neural network can be represented from different points of view. In this notebook we represent it as a set of arrays that hold the weights and the biases of the network. We also store in the same structure the intermediate values needed to run backpropagation; these are overwritten on every pass, so they do not grow as training goes on. All of these arrays do grow in number and size with the complexity of the proposed architecture, so this must be taken into consideration before building a deep neural network.  
The neural network that we are going to build in this notebook is fully connected, which means that every neuron in one layer is connected to all the neurons of the following layer.
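
To get a feel for how these arrays grow with the architecture, the sketch below lists the array shapes and counts the trainable values for a fully connected layer list. `parameter_shapes` and `count_parameters` are hypothetical helper names used only for illustration; they are not part of the notebook's class.

```python
def parameter_shapes(layers):
    """Return the weight shape and bias shape for each pair of consecutive layers."""
    shapes = {}
    for i in range(len(layers) - 1):
        shapes['W{}'.format(i + 1)] = (layers[i], layers[i + 1])
        shapes['b{}'.format(i + 1)] = (layers[i + 1],)
    return shapes


def count_parameters(layers):
    """Total number of trainable values (weights plus biases)."""
    return sum(layers[i] * layers[i + 1] + layers[i + 1]
               for i in range(len(layers) - 1))


small = [3, 1]            # the architecture used in the demo at the end
deeper = [3, 16, 16, 1]   # a hypothetical deeper network

print(parameter_shapes(small))
print(count_parameters(small), count_parameters(deeper))
```

Every extra layer adds one weight matrix and one bias vector, and the weight matrix size is the product of the two layer widths it connects.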
The rest of the notebook contains a specification of the neural network, an explanation of the algorithm, and a demonstration that the algorithm actually works as intended.

## Basic neural network class
In this section we go through the code that makes training with backpropagation possible.

In [1]:
# Configure Jupyter so that the path to the project code resolves correctly
import sys
import os
sys.path.insert(0, os.path.abspath('..'))

from neuroevolution.activation_functions import sigmoid, sigmoid_der, relu, relu_der
from neuroevolution.error_functions import MSE, crossentropy_loss
import numpy as np
import typing

class BasicNeuralNetwork:
  """Class that implements the basic behaviour of a neural network.
  It represents fully connected networks, so take into account that all the
  links between nodes of consecutive layers will be found here.
  It gathers all the basics and it is generalized to match every architecture.
  However, it is important to know that since it needs to store derivatives
  to calculate the backpropagation step, it may not be very memory efficient.
  """

  def __init__(self, layers:list, num_of_classes:int, input_size:int, lr = 0.05,
               activation_functs = None):
    """Constructor of the basic neural network object.

    Keyword arguments:
      layers -- List with the number of neurons of each layer, input layer
      first and output layer last
      num_of_classes -- Number of outputs; must match the last layer
      input_size -- Number of input features; must match the first layer
      lr -- Learning rate, the size of the step taken on each weight update
      activation_functs -- List of activation functions, one per layer;
      defaults to sigmoid for every layer
    """
    # Check for an empty list first, otherwise the indexing below would fail
    if len(layers) == 0:
      raise AttributeError("Can't create a neural net without a single layer")
    if layers[-1] != num_of_classes:
      raise AttributeError("The number of classes should match the last layer")
    if layers[0] != input_size:
      raise AttributeError("The input size should match the 1st layer")
    self.layers = layers
    self.learning_rate = lr
    self.params = {}
    self.loss = []
    self.activation_functs = activation_functs
    if activation_functs is None:
      self.activation_functs = [sigmoid for i in range(len(self.layers))]
    self.initialize_weithts_and_biases()
  
  def initialize_weithts_and_biases(self):
    """Function that initializes weights and biases randomly. The values are
    drawn from a standard normal distribution without any scaling, which can
    cause slower convergence.
    """
    # np.random.seed is a function: it has to be called, not assigned to
    np.random.seed(42)
    for i in range(len(self.layers)-1):
      self.params['W{}'.format(i+1)] = np.random.randn(self.layers[i],
                                                     self.layers[i+1])
      self.params['b{}'.format(i+1)] = np.random.randn(self.layers[i+1])

In these lines we have shown how the network is defined in terms of data structures. We can find layers, learning rate, params, activation functions and loss.

* Layers: An integer list that contains the number of neurons in each layer.
* Learning Rate: The size of the step taken on each weight update. It controls how far the weights move through the solution space in each iteration: larger values move faster but can overshoot, smaller values move more slowly.
* Params: A dictionary that stores the weight and bias arrays, which are numpy arrays, together with the values calculated for backpropagation.
* Loss: A list that stores the loss of every epoch. This list is very useful to see how the training process reduces the value of the objective function.
* Activation Functions: A list containing the activation function associated with each layer.

Additionally, the code runs some checks to be sure that the first and last layers match the input size and the number of classes.
The weight initialization is random, and we use the traditional notation (W1, b1, W2, b2, ...) to label the weight and bias arrays. 
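
The activation helpers are imported from the `neuroevolution` package and are not shown in this notebook. As a rough sketch of what such helpers might look like (an assumption, not the package's actual code), the block below defines stand-ins named `sigmoid_sketch` and `sigmoid_der_sketch`, where the derivative is assumed to take the pre-activation values Z.

```python
import numpy as np


def sigmoid_sketch(z):
    """Logistic function applied element-wise: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_der_sketch(z):
    """Derivative of the logistic function with respect to z."""
    s = sigmoid_sketch(z)
    return s * (1.0 - s)


z = np.array([-2.0, 0.0, 2.0])
print(sigmoid_sketch(z))      # values strictly between 0 and 1
print(sigmoid_der_sketch(z))  # largest at z = 0, where it equals 0.25
```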

In [2]:
def train(self, inputs: np.ndarray, targets: np.ndarray, epochs: int):
    """Training method of the neural network.
    It optimizes the weights for the given inputs.
    Keyword Arguments:
      inputs -- the features that the neural network predicts from
      targets -- the expected outputs that match those features
      epochs -- number of optimization iterations that the neural network
      will perform
    """
    for i in range(epochs):
      y_hat = self.feed_forward(inputs)
      loss = crossentropy_loss(targets,y_hat)
      self.loss.append(loss)
      self.backpropagation(inputs,y=targets,y_hat=y_hat)
      self.__weight_updating()

In this cell we define the training loop. Each iteration runs the forward pass, calculates the loss, and backpropagates the error.
* In the forward pass we take the dot product of the inputs and the weights, add the bias, and use the activated result as the input to the following layer.
* In the backpropagation step we calculate the derivatives of the loss with respect to every weight and bias.
* Finally, we use those derivatives, scaled by the learning rate, to update the weights.  
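
`crossentropy_loss` is also imported rather than defined here. A minimal sketch of a binary cross-entropy averaged over the samples is given below; the name `binary_crossentropy_sketch` and the clipping constant `eps` are assumptions made for illustration. Averaging appears consistent with the demo output later in this notebook: applying this formula to the printed `A1` activations closely reproduces the last printed loss value.

```python
import numpy as np


def binary_crossentropy_sketch(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy; y_hat is clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


# Labels and final activations copied from the demo cell and its output.
labels = np.array([[1], [0], [0], [1], [1]])
a1 = np.array([[0.83403149], [0.21413619], [0.32476821],
               [0.88708867], [0.8743504]])

loss = binary_crossentropy_sketch(labels, a1)
print(loss)  # close to the last loss value printed by the demo
```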

In the following cells we explain each of these methods more thoroughly.

In [3]:
def feed_forward(self, inputs):
    """Function that performs the forward pass through the neural network. For
    each layer it computes the dot product between the features and the
    weights, adds the bias assigned to that layer and applies the activation.
    Returns:
        y_hat -- The activated output of the last layer
    """
    return self.calculate_feed_forward(inputs,self.params)

def calculate_feed_forward(self, inputs, store):
    """Runs the forward pass with the weights and biases found in store and
    saves every Z_i and A_i back into it for the backward pass.
    """
    A_i = inputs  # the input batch plays the role of A_0
    for i in range(len(self.layers)-1):
        Z_i = A_i.dot(store['W{}'.format(i+1)]) + store['b{}'.format(i+1)]
        A_i = self.activation_functs[i](Z_i)
        store['Z{}'.format(i+1)] = Z_i
        store['A{}'.format(i+1)] = A_i
    y_hat = A_i
    return y_hat

In the cell above we have two functions that serve the same purpose: calculating the forward pass. The second one is a later refactor, made because (spoiler alert) I needed to reuse this code to calculate the forward pass of a neural network whose weights are optimized by a genetic algorithm.
The intermediate values that backpropagation needs are Z and A, stored for every layer i:

Z_i = A_(i-1) · W_i + b_i  (with A_0 = inputs)  
A_i = activation_function(Z_i)

The function returns the last A_i, the activated output of the last layer. The meaning of this result depends on the activation function; with a sigmoid, for example, each output lies between 0 and 1.
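
To make the shapes concrete, here is a small standalone sketch of the same recurrence for a hypothetical `[3, 2, 1]` network with hand-picked weights. It does not use the notebook's class; `forward_sketch` and the local `sigmoid` are illustrative names, and a sigmoid activation on every layer is assumed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward_sketch(inputs, weights, biases):
    """Apply Z_i = A_(i-1) . W_i + b_i and A_i = sigmoid(Z_i) layer by layer."""
    cache = {}
    a = inputs  # A_0 is the input batch
    for i, (w, b) in enumerate(zip(weights, biases), start=1):
        z = a.dot(w) + b  # b is broadcast over the rows (samples)
        a = sigmoid(z)
        cache['Z{}'.format(i)] = z
        cache['A{}'.format(i)] = a
    return a, cache


x = np.array([[0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])   # 2 samples, 3 features
weights = [np.zeros((3, 2)), np.ones((2, 1))]
biases = [np.zeros(2), np.zeros(1)]

y_hat, cache = forward_sketch(x, weights, biases)
print(cache['A1'].shape, y_hat.shape)  # (2, 2) (2, 1)
print(y_hat)
```

Each row of the input is one sample, so every Z_i and A_i keeps one row per sample and has as many columns as the layer has neurons.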


In [4]:
def backpropagation(self, inputs, y, y_hat):
    """The backward pass that computes the gradients.
    It computes the amount by which the weights must be changed to match the
    targets, starting from the error that the neural network is making at the
    moment and moving from the last layer to the first.
    Note: It only performs the backpropagation, not the weight updating. The
    first step assumes a sigmoid output with cross-entropy loss, and the
    hidden layers use sigmoid_der regardless of activation_functs.

    The derivatives are stored in self.params as dl_wrt_w{i} and dl_wrt_b{i};
    nothing is returned.
    """
    dl_wrt_yhat = -(np.divide(y, y_hat) - np.divide((1 - y),(1-y_hat)))
    dl_wrt_sig = y_hat * (1-y_hat)
    dl_wrt_z_i = dl_wrt_yhat * dl_wrt_sig
    for i in range(len(self.layers)-1,0,-1):
      if i != len(self.layers)-1:
        dl_wrt_z_i = dl_wrt_A_j * sigmoid_der(self.params['Z{}'.format(i)])
      if i > 1:
        dl_wrt_A_j = dl_wrt_z_i.dot(self.params['W{}'.format(i)].T)
        dl_wrt_w_i = self.params['A{}'.format(i-1)].T.dot(dl_wrt_z_i)
      else:
        dl_wrt_w_i = inputs.T.dot(dl_wrt_z_i)
      dl_wrt_b_i = np.sum(dl_wrt_z_i, axis=0)
      self.params['dl_wrt_w{}'.format(i)] = dl_wrt_w_i
      self.params['dl_wrt_b{}'.format(i)] = dl_wrt_b_i

In the cell above we define the backpropagation step. It takes the activated output of the network, the true values of the dataset that we pass into the training loop, and the Z_i's and A_i's stored during the forward pass, and calculates all the derivatives needed to update the weights correctly. These derivatives are needed because of the kind of optimization that backpropagation feeds.  
Backpropagation supplies the gradient for gradient descent. Because this training loop uses the whole dataset in every epoch, it is batch gradient descent rather than the stochastic variant. The gradient is the vector of partial derivatives of the loss with respect to each weight and bias; it generalizes the slope of a one-variable function.
Once we have the gradient it is time to descend, which we do by updating the weights.
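
One way to gain confidence in hand-written derivatives is a finite-difference check. The sketch below does this for a single sigmoid layer with a binary cross-entropy summed over the samples; in that case the two output-layer factors used above, `dl_wrt_yhat * dl_wrt_sig`, simplify algebraically to `y_hat - y`. All function names here are illustrative and independent of the notebook's class, and the weights are arbitrary fixed values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def summed_bce(w, b, x, y):
    """Binary cross-entropy summed over all samples for one sigmoid layer."""
    y_hat = sigmoid(x.dot(w) + b)
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


def analytic_grads(w, b, x, y):
    """Gradients of summed_bce: dL/dZ = y_hat - y, then the chain rule."""
    dz = sigmoid(x.dot(w) + b) - y
    return x.T.dot(dz), np.sum(dz, axis=0)


def numeric_grad_w(w, b, x, y, h=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += h
        w_minus[idx] -= h
        grad[idx] = (summed_bce(w_plus, b, x, y)
                     - summed_bce(w_minus, b, x, y)) / (2 * h)
    return grad


# Data from the demo cell; the weights below are arbitrary.
x = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([[1], [0], [0], [1], [1]], dtype=float)
w = np.array([[0.5], [-0.3], [0.8]])
b = np.array([0.1])

dw, db = analytic_grads(w, b, x, y)
max_diff = np.max(np.abs(dw - numeric_grad_w(w, b, x, y)))
print(max_diff)  # should be tiny if the analytic gradient is right
```

The same idea extends to deeper networks: perturb one parameter at a time, recompute the loss, and compare against the stored `dl_wrt_w` entry.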

In [5]:
def __weight_updating(self):
    """In this function we update the weights and biases of the neural network
    using the derivatives calculated in the backpropagation pass.
    """
    for i in range(len(self.layers)-1):
      self.params['W{}'.format(i+1)] = self.params[
                  'W{}'.format(i+1)] - self.learning_rate * self.params[
                  'dl_wrt_w{}'.format(i+1)]
      self.params['b{}'.format(i+1)] = self.params[
                  'b{}'.format(i+1)] - self.learning_rate * self.params[
                  'dl_wrt_b{}'.format(i+1)]

In the cell above we define the weight updating method. It takes the weights and biases and their respective derivatives and moves each parameter a small step against its gradient, the direction in which the loss decreases. This procedure is executed for each layer of the network, from the first to the last, although the order doesn't matter since all the derivatives have already been stored.
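
The update rule is the same one used in plain gradient descent on any differentiable function. As a toy illustration, unrelated to the network itself, the sketch below minimizes f(w) = (w - 3)^2 with the same `w = w - learning_rate * gradient` step; the function and the starting point are arbitrary choices.

```python
def gradient(w):
    """Derivative of f(w) = (w - 3) ** 2."""
    return 2.0 * (w - 3.0)


learning_rate = 0.05  # same default as the class constructor
w = 0.0
history = []
for _ in range(100):
    history.append((w - 3.0) ** 2)        # value of f before the step
    w = w - learning_rate * gradient(w)   # move against the gradient

print(history[0], history[-1])
print(w)  # approaches the minimizer w = 3
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, which is why the recorded values fall steadily.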

## Basic Neural Network Demo
In this section we run a demo to test whether the neural network that we have built really optimizes the weights and reduces the value of the loss function over the course of the training loop.

In [6]:
from neuroevolution.networks.basic_neural_network import BasicNeuralNetwork
import numpy as np

if __name__ == "__main__":
    feature_set = np.array([[0,1,0],[0,0,1],[1,0,0],[1,1,0],[1,1,1]])
    labels = np.array([[1,0,0,1,1]])
    labels = labels.reshape(5,1)
    nnet = BasicNeuralNetwork(layers=[3,1], input_size=3, num_of_classes= 1)
    print(nnet.params)
    print("-----Training Process------")
    nnet.train(feature_set,labels, 100)
    print("--------LOSS VALUES--------")
    print(nnet.loss[1:5])
    print("...")
    print(nnet.loss[-4:])
    print("-----Training Finished-----")
    print(nnet.params)
    


{'W1': array([[ 1.22760983],
       [-0.76522897],
       [ 1.17770312]]), 'b1': array([-0.15035313])}
-----Training Process------
--------LOSS VALUES--------
[0.9209102583778608, 0.9007855144499435, 0.8812766039703068, 0.8623490723765888]
...
[0.2199382788228734, 0.21787288174165617, 0.21584304313975414, 0.21384790832632694]
-----Training Finished-----
{'W1': array([[ 0.44255866],
       [ 2.8135129 ],
       [-0.12578231]]), 'b1': array([-1.18553206]), 'Z1': array([[ 1.61447312],
       [-1.30017129],
       [-0.73194428],
       [ 2.06134215],
       [ 1.93998417]]), 'A1': array([[0.83403149],
       [0.21413619],
       [0.32476821],
       [0.88708867],
       [0.8743504 ]]), 'dl_wrt_w1': array([[ 0.08620729],
       [-0.40452943],
       [ 0.08848659]]), 'dl_wrt_b1': array([0.13437498])}


In the code cell above we have made a demo showcasing the power of backpropagation. We print the initial parameters of the neural network and, once it is trained, the final weights and biases together with the Z, A and derivative values stored during the last epoch. The exact numbers depend on the random initialization, so a rerun may print different values.
The code also prints a few of the earliest and the last loss values, to check how the objective function decreases as we make further iterations of gradient descent.
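
As a quick sanity check on the trained model, the final `W1` and `b1` printed above can be plugged into a forward pass by hand. The sketch below copies those values from the output and thresholds the sigmoid output at 0.5; the threshold is an assumption, since the notebook itself does not define a prediction step.

```python
import numpy as np

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([1, 0, 0, 1, 1])

# Final parameters copied from the printed output of the demo.
w1 = np.array([[0.44255866], [2.8135129], [-0.12578231]])
b1 = np.array([-1.18553206])

probabilities = 1.0 / (1.0 + np.exp(-(feature_set.dot(w1) + b1)))
predictions = (probabilities > 0.5).astype(int).ravel()

print(predictions)
print((predictions == labels).all())
```

This only looks at the five training samples, so it says nothing about how the network would behave on unseen data.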

## Conclusion and further improvements
In this notebook, we have explained how a basic neural network is set up from absolute scratch, using the numpy library for the matrix operations, which speeds up the algorithm since numpy's array routines are implemented in compiled C code. As stated, this class is general for any structure that you may imagine, always taking into consideration that the resulting net will be fully connected. As we have seen over the course of this notebook, the basic neural network class is able to reduce the loss function by using forward and back propagation. Finally, I want to encourage the reader to play with the code in this notebook, changing the structure or the data fed into the network, or even to collaborate in this repository.  
As improvements or things that will change over time, adding more activation functions and their derivatives would be a very interesting addition.  
Support for more types of networks in addition to fully connected ones is an interesting addition as well.  
To conclude, the basics of neural networks and the training procedure remain a mystery if you only use libraries like TensorFlow and Keras. I hope that after reading this notebook some of this black magic has been transferred to you, and that it encourages you to experiment with these structures or more complex ones.

## References
This notebook is a generic implementation of this Medium post: https://heartbeat.fritz.ai/building-a-neural-network-from-scratch-using-python-part-1-6d399df8d432
