## DEL-03 Programming Exercise - Multi-Layer Perceptron - Backward Propagation
### (created by Prof. Dr.-Ing. Christian Bergler & Prof. Dr. Fabian Brunner)

Documentation: **Python library Pandas** - https://pandas.pydata.org/docs/

Documentation: **Numpy** - https://numpy.org/doc/

Documentation: **Sklearn** - https://scikit-learn.org/stable/index.html

Documentation: **Matplotlib** - https://matplotlib.org/stable/index.html

Documentation: **Matplotlib** - Graphics Gallery: https://matplotlib.org/2.0.2/gallery.html

Additional Documentation: **Python Tutorial** - https://docs.python.org/3/tutorial/

Additional Documentation: **Matthes, Eric, "Python Crash Course: A Hands-On, Project-Based Introduction to Programming"**, ISBN: 978-1-59327-603-4, ©2023 No Starch Press

In [1]:
import pandas as pd
import numpy as np

### Task DEL-03-1 (Sigmoid Function)

In [2]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

### Task DEL-03-2 (Softmax Activation Function)

In [3]:
def softmax(O):
    O_exp = np.exp(O - np.max(O, axis=1, keepdims=True))
    partition = O_exp.sum(axis=1, keepdims=True)
    return O_exp / partition

### Task DEL-03-3 (Implementation Multi-Layer Perceptron for Classification)

In this and the next exercise, a `Multi-Layer Perceptron (Deep Feed Forward Neural Network)` for classification is to be implemented from scratch. The topology of the network is specified by a list called `nodes_per_layer`: the $i$-th entry gives the number of nodes in the $i$-th layer. The number of layers can be arbitrary. The number of classes corresponds to the number of nodes in the output layer.

Example: ``nodes_per_layer = [4,5,3]`` would realize a network with 4 nodes in the input layer, 5 nodes in the hidden layer and 3 nodes in the output layer.
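
To make the topology concrete, the following sketch derives the parameter shapes implied by `nodes_per_layer`. It assumes the convention that a weight matrix has shape `(n_in, n_out)`, so that a batch is propagated as `H @ W + b`; the names `shapes`, `weights`, `biases` and `rng` are illustrative and not part of the exercise skeleton.

```python
import numpy as np

nodes_per_layer = [4, 5, 3]  # example topology from the text

# One weight matrix and one bias vector per layer, except the input layer.
# Assumed convention: W has shape (n_in, n_out), b has shape (1, n_out).
shapes = list(zip(nodes_per_layer[:-1], nodes_per_layer[1:]))

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
weights = [rng.normal(size=shape) for shape in shapes]
biases = [rng.normal(size=(1, n_out)) for _, n_out in shapes]

for W, b in zip(weights, biases):
    print(W.shape, b.shape)
```

With this convention, a network with $n$ layers needs $n-1$ weight matrices, and the row index of each matrix refers to the preceding layer.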

Forward propagation was already implemented in the other exercise; now model training is to be implemented using gradient descent. To do this, complete the following class ``DeepFeedForward``. The gradients should be computed with the backpropagation algorithm presented in the lecture. In addition to implementing the ``fit`` method, it is recommended to modify the ``forward`` method so that it stores the `inputs` and `activations of the hidden layers` computed during forward propagation, since backpropagation accesses them.

#### Equation for Backpropagation (Recap!)

- Recursive calculation of the quantities $\delta^{[l]}$:

$$\begin{align*}
\delta^{[L]}&=\nabla_{\hat{y}}L \odot \bigl(f^{[L]}\bigr)'(z^{[L]})~,\\[0.2cm]
\delta^{[l]}&=\bigl(\mathbf{W}^{[l+1]}\delta^{[l+1]}\bigr)\odot\bigl(f^{[l]}\bigr)'(z^{[l]})~,\quad l=L-1,\ldots,1
\end{align*}$$

- Calculation of the partial derivatives with respect to the biases and weights in terms of the quantities $\delta^{[l]}$:

$$\begin{align*}
\frac{\partial L}{\partial b_j^{[l]}}&=\delta^{[l]}_j~,\quad l=1,\ldots,L\\[0.2cm]
\frac{\partial L}{\partial w^{[l]}_{kj}}&=h_k^{[l-1]}\delta_j^{[l]}~,\quad l=1,\ldots,L~.
\end{align*}
$$
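
The recursion above can be sketched in NumPy for a whole batch. This is a minimal illustration under stated assumptions, not the reference solution: it assumes the mean cross-entropy loss with one-hot targets (the loss is not fixed in the text above), for which the softmax output layer gives $\delta^{[L]} = \hat{y} - y$ per sample, and it assumes weight matrices of shape `(n_in, n_out)`. The helper names `forward`, `backward` and `cross_entropy` are hypothetical. A finite-difference check on a single weight compares the analytic gradient with a numerical one.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(O):
    O_exp = np.exp(O - np.max(O, axis=1, keepdims=True))
    return O_exp / O_exp.sum(axis=1, keepdims=True)

def forward(X, weights, biases):
    """Return the activations h^[0], ..., h^[L]; h^[0] is the input batch."""
    activations = [X]
    for l, (W, b) in enumerate(zip(weights, biases)):
        Z = activations[-1] @ W + b
        is_output = l == len(weights) - 1
        activations.append(softmax(Z) if is_output else sigmoid(Z))
    return activations

def cross_entropy(Y_hat, Y_onehot):
    # Assumed loss: mean cross-entropy over the batch.
    return -np.mean(np.sum(Y_onehot * np.log(Y_hat + 1e-12), axis=1))

def backward(activations, weights, Y_onehot):
    """Gradients of the mean cross-entropy w.r.t. all weights and biases."""
    n_samples = Y_onehot.shape[0]
    grads_W, grads_b = [], []
    # delta of the output layer for softmax + cross-entropy (assumption above)
    delta = (activations[-1] - Y_onehot) / n_samples
    for l in range(len(weights) - 1, -1, -1):
        H_prev = activations[l]
        grads_W.insert(0, H_prev.T @ delta)               # dL/dw_kj = h_k * delta_j
        grads_b.insert(0, delta.sum(axis=0, keepdims=True))  # dL/db_j = delta_j
        if l > 0:
            # Propagate delta one layer back; sigmoid'(z) = h * (1 - h)
            delta = (delta @ weights[l].T) * H_prev * (1 - H_prev)
    return grads_W, grads_b

# Finite-difference check on one weight entry (illustrative random data)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y_onehot = np.eye(3)[rng.integers(0, 3, size=6)]
weights = [rng.normal(size=(4, 5)), rng.normal(size=(5, 3))]
biases = [np.zeros((1, 5)), np.zeros((1, 3))]

grads_W, grads_b = backward(forward(X, weights, biases), weights, Y_onehot)

eps = 1e-6
W_plus = [W.copy() for W in weights]
W_minus = [W.copy() for W in weights]
W_plus[0][0, 0] += eps
W_minus[0][0, 0] -= eps
numeric = (cross_entropy(forward(X, W_plus, biases)[-1], Y_onehot)
           - cross_entropy(forward(X, W_minus, biases)[-1], Y_onehot)) / (2 * eps)
print(abs(numeric - grads_W[0][0, 0]))
```

A gradient check of this kind is a cheap way to catch index or transposition errors before running the full training loop.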

In [4]:
class DeepFeedForward:
    def __init__(self, nodes_per_layer, lr=1.0, num_iter = 100):
        self.learning_rate = lr
        self.num_iter = num_iter
        self.n_layers = len(nodes_per_layer) #number of layers
        self.nodes_per_layer = nodes_per_layer #list containing the number of nodes for each layer
        self.n_classes = nodes_per_layer[-1] #number of output units (=number of classes for classification)
        self.weight_matrices = [] #in this list, the weight matrices will be stored
        self.bias_vectors = [] #in this list, the bias vectors will be stored
   
    def initialize_weights(self):
        """
        When this function is called, the weight matrices and bias vectors are 
        initialized with random normally distributed numbers and stored in the instance parameters weight_matrices and bias_vectors
        For each layer (except the input layer), one weight matrix and one bias vector is needed.
        The dimensions of the matrices depend on the number of units of the layers.
        """ 
        self.weight_matrices = []
        self.bias_vectors = []
        #TODO

    def forward(self, X):
        """
        model function to perform the forward pass through the net for all samples in the batch X
        
        In each layer except the output layer, sigmoid activation is used. 
        In the output layer, softmax activation is used.
        
        :param X: batch of training data of dimension n_samples x n_features
        :type X: numpy array
        :return: array containing the predicted scores for all samples of the batch X (dimension: n_samples x n_classes)
        :rtype: numpy array
        """ 
        layer_inputs = []
        layer_outputs = []
        
        #TODO
        
        self.layer_outputs = layer_outputs
        self.layer_inputs = layer_inputs
        
        return H
    
    def fit(self, X, y):
        """
        Model training using gradient descent optimization algorithm

        :param X: batch of training data of dimension n_samples x n_features
        :type X: numpy array
        :param y: target values corresponding to records in X 
        :type y: numpy array
        :return: List containing the values of the loss function after each iteration of Gradient descent
        :rtype: list
        """
        
        loss_history = []
        
        #TODO
        
        return loss_history
            
    
    
    def predict(self, X):
        """
        Predict classes based on the largest predicted class probability
        
        :param X: batch of data to be scored
        :type X: numpy array
        :return: predicted classes for the records in X
        :rtype: numpy array
        """
        #TODO
    

### Task DEL-03-4 (Testing Implemented Backward Propagation)

- Create a new `DeepFeedForward` object with `4 input neurons, 10 hidden neurons, and 3 output neurons`
- Use a learning rate of `lr=0.2` and a maximum of `num_iter=300` iterations
- Initialize `weights` and `biases`
- Train the model using `fit`
- Evaluate the model using `predict` and plot the `loss` curve with matplotlib
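
For the evaluation step, two small helpers may be useful. The sketch below is only an illustration: `one_hot` and `accuracy` are hypothetical helper names, and the `scores` array is made-up toy data standing in for the output of a trained network. Whether one-hot targets are needed depends on how the loss is implemented in `fit`.

```python
import numpy as np

def one_hot(y, n_classes):
    """Turn integer labels of shape (n_samples,) into an (n_samples, n_classes) indicator matrix."""
    return np.eye(n_classes)[y]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(y_true == y_pred))

# Toy data (hypothetical): four samples, three classes
y_true = np.array([0, 2, 1, 1])
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3],
                   [0.6, 0.3, 0.1]])

# Predicted class = index of the largest predicted class probability
y_pred = np.argmax(scores, axis=1)

print(one_hot(y_true, 3))
print(accuracy(y_true, y_pred))  # 0.75
```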

In [5]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data 
Y = iris.target

In [6]:
#TODO

In [7]:
#TODO

#### Model Training

- Train the `DeepFeedForward` neural network using the implemented ``fit`` method and plot the progression of the loss function over the iterations. How does the curve differ from the curves you observed with `logistic regression` and `softmax` regression? Explain!

In [8]:
#TODO

In [9]:
#TODO

In [10]:
import matplotlib.pyplot as plt
#TODO