Team member 1 (name, email, id): Mhd Jawad Al Rahwanji, mhal00002@stud.uni-saarland.de, 7038980
Team member 2 (name, email, id): Christian Singer, chsi00002@stud.uni-saarland.de, 7039059


### Note: This assignment refers extensively to the coding exercise in Assignment 4.

## 6.2.a Building your own Neural-Network

Import numpy, which is really all we need to create our own NN.

In [4]:
import numpy as np
np.random.seed(32)

Recall that our simple neural network consisted of two layers. We also added an `activation` function as a non-linearity to the output of our intermediate layer. Given an input $\mathbf{x} \in \mathbb{R}^n $ we have

$ \mathbf{h} = f^{(1)}(\mathbf{x}; \mathbf{W},c) = activation\_fn(\mathbf{W}^\mathsf{T} \mathbf{x} + c) $ 

$ \mathbf{y} = f^{(2)}(\mathbf{h}; \mathbf{w},b) = \text{softmax}( \mathbf{w}^\mathsf{T} \mathbf{h} + b) $

In this exercise you will create your own network and are free to implement it with your own design choices. However, we will do it in a way that allows you to specify the depth of the network, i.e. we extend our network so that there is not just one intermediate layer $\mathbf{h}$, but rather $n$ of them, $\mathbf{h}_{i}$ with $i \in \{1,..., n\}$.
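
Written out, one way to read this generalization is the recursion below. The per-layer symbols $\mathbf{W}_i$, $c_i$ and the convention $\mathbf{h}_0 = \mathbf{x}$ are notation assumed here for illustration; they do not appear in the assignment text.

$ \mathbf{h}_0 = \mathbf{x} $

$ \mathbf{h}_i = activation\_fn(\mathbf{W}_i^\mathsf{T} \mathbf{h}_{i-1} + c_i), \quad i \in \{1, ..., n\} $

$ \mathbf{y} = \text{softmax}(\mathbf{w}^\mathsf{T} \mathbf{h}_n + b) $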

**NOTE**: You are not allowed to use any built-in functions to calculate Leaky_ReLU, Softmax or the forward/backward pass directly.

**NOTE 2**: Remember to include the non-linearity at every layer. Remember to also add the bias to every layer. Finally, remember to apply the softmax in the output layer.

## ToDo: Rewrite the Leaky_ReLU and Softmax functions as classes and implement a function in each of them to calculate gradients (1 point)
Remember that in PyTorch, these are implemented as classes so we also want to have them as classes.
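
For reference, the derivatives that the two classes need can be written as follows. This is a sketch under two assumptions: $0 \le \alpha \le 1$, so that $\max(x, \alpha x)$ really is the leaky ReLU, and the convention that the slope at $x = 0$ is taken to be $\alpha$ (the function is not differentiable there).

$ \text{LeakyReLU}(x) = \max(x, \alpha x), \qquad \frac{\partial\, \text{LeakyReLU}(x)}{\partial x} = \begin{cases} 1 & x > 0 \\ \alpha & x \le 0 \end{cases} $

$ y_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad \frac{\partial y_i}{\partial x_j} = y_i (\delta_{ij} - y_j), \qquad \text{i.e.} \quad \mathbf{J} = \text{diag}(\mathbf{y}) - \mathbf{y}\mathbf{y}^\mathsf{T} $

Note that the softmax Jacobian is expressed in terms of the output $\mathbf{y}$, not the input $\mathbf{x}$.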

In [5]:
class LeakyReLU():
    """ Leaky ReLU activation function. """
    def __init__(self, alpha: float=0.01):
        """ Initialize LeakyReLU activation function. """
        self.alpha = alpha
        self.trainable = False

    def forward(self, x):
        """ Forward pass of LeakyReLU activation function. """
        self.input = x
        return np.maximum(x, self.alpha * x)

    # Make forward method callable like LeakyReLU(x)
    __call__ = forward

    def gradient(self, x):
        """ Calculate gradient of function."""
        jacobian = np.where(x > 0, 1, self.alpha)
        return jacobian

    def backward(self, grad_output):
        """ Backward pass of LeakyReLU activation function. """
        return grad_output * self.gradient(self.input)

In [6]:
class Softmax():
    """ Softmax activation function. """
    def __init__(self):
        """ Initialize Softmax activation function. """
        self.trainable = False

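    def forward(self, x):
        """ Forward pass of Softmax activation function. """
        # Subtract the max for numerical stability; the result is unchanged.
        z = x - np.max(x)
        exp = np.exp(z)
        probs = exp / np.sum(exp)
        # Cache the output: the Jacobian is a function of it.
        self.output = probs
        return probs

    # Make forward method callable like Softmax(x)
    __call__ = forward

    def gradient(self, y):
        """ Calculate the Jacobian of softmax, given its output y. """
        jacobian = np.diag(y) - np.outer(y, y)
        return jacobian

    def backward(self, grad_output):
        """ Backward pass of Softmax activation function. """
        # Chain rule: push the upstream gradient through the Jacobian
        # evaluated at the cached output (the Jacobian is symmetric).
        jacobian = self.gradient(self.output)
        return np.dot(jacobian, grad_output)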

## ToDo: Calculate the gradient using your implemented functions in their respective classes and validate by manually calculating gradients using a toy value. (1 point)

In [8]:
activation1 = LeakyReLU()
activation2 = Softmax()

data = np.array([1/2, 1/4, 1/4])

print(f"Gradient LeakyRelu: {activation1.gradient(data)}")
print(f"Gradient Softmax: {activation2.gradient(data)}")

Gradient LeakyRelu: [1. 1. 1.]
Gradient Softmax: [[ 0.25   -0.125  -0.125 ]
 [-0.125   0.1875 -0.0625]
 [-0.125  -0.0625  0.1875]]
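
One possible way to validate these values beyond hand calculation is a central finite-difference check. The sketch below is self-contained and redefines the functions in plain form; the helper `numerical_jacobian`, the step size `eps` and the test vector `v` are assumptions made for this illustration. Since the softmax Jacobian is a function of the output, the logits are chosen as `np.log(y)` so that their softmax equals the toy value `[1/2, 1/4, 1/4]`.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.maximum(x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Elementwise slope: 1 for positive inputs, alpha otherwise.
    return np.where(x > 0, 1.0, alpha)

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

def softmax_jacobian(y):
    # Jacobian expressed in terms of the softmax output y.
    return np.diag(y) - np.outer(y, y)

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    jac = np.zeros((f(x).size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

# Logits whose softmax is exactly the toy value used above.
y = np.array([1/2, 1/4, 1/4])
x = np.log(y)
softmax_analytic = softmax_jacobian(softmax(x))
softmax_numeric = numerical_jacobian(softmax, x)

# A test vector with a negative entry, so that the alpha branch is exercised.
v = np.array([0.5, -2.0, 0.25])
relu_analytic = np.diag(leaky_relu_grad(v))
relu_numeric = numerical_jacobian(leaky_relu, v)

print(np.allclose(softmax_analytic, softmax_numeric, atol=1e-6))
print(np.allclose(relu_analytic, relu_numeric, atol=1e-6))
```

The toy input `[1/2, 1/4, 1/4]` is all positive, so on its own it never reaches the `alpha` branch of LeakyReLU; a vector with at least one negative entry covers both branches.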


## ToDo: Rewrite the code from Assignment 4 to include backpropagation in your class without using PyTorch. Remember to use your Leaky_ReLU class here as the activation function. (1.5 points)
#### Feel free to refer to your solutions from Assignment 4.

In [None]:
class LinearLayer():
    def __init__(self, input_dim, output_dim):
        """ Initialize LinearLayer with He-method. """
        self.W = np.random.randn(output_dim, input_dim) * np.sqrt(2 / input_dim)
        self.b = np.random.randn(output_dim) * np.sqrt(2 / input_dim)
        self.trainable = True

    def forward(self, x):
        """ Forward pass of LinearLayer. """
        self.input = x
        return np.dot(self.W, x) + self.b

    # Make forward method callable like LinearLayer(x)
    __call__ = forward

    def backward(self, grad_output):
        """ Backward pass of LinearLayer. """
        x = self.input
        self.grad_W = np.outer(grad_output, x)
        self.grad_b = grad_output
        return np.dot(self.W.T, grad_output)

    def update_params(self, learning_rate):
        """ Update parameters of LinearLayer. """
        self.W -= learning_rate * self.grad_W
        self.b -= learning_rate * self.grad_b


class Sequential():
    """ Sequential model. """
    def __init__(self, *layers):
        """ Initialize Sequential model. """
        self.layers = list(layers)

    def forward(self, x):
        """ Forward pass of Sequential model. """
        for layer in self.layers:
            x = layer(x)
        return x

    # Make forward method callable like Sequential(x)
    __call__ = forward

    def backward(self, grad_output):
        """ Backward pass of Sequential model. """
        for layer in reversed(self.layers):
            grad_output = layer.backward(grad_output)
        return grad_output

    def update_params(self, learning_rate):
        """ Update parameters of Sequential model. """
        for layer in self.layers:
            if layer.trainable:
                layer.update_params(learning_rate)
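
The gradient formulas used in `LinearLayer.backward` (`np.outer(grad_output, x)` for the weights, `grad_output` for the bias, `np.dot(self.W.T, grad_output)` for the input) can be checked numerically. The sketch below is one way to do so, under assumptions: the shapes, the random values, the helper `numerical_grad`, and the scalar `loss` are all hypothetical and chosen so that the upstream gradient with respect to the layer output is exactly `g`.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # (output_dim, input_dim), as in LinearLayer
b = rng.standard_normal(3)
x = rng.standard_normal(4)
g = rng.standard_normal(3)        # hypothetical upstream gradient dL/dz

def loss(W_, b_, x_):
    # Scalar loss L = g . (W x + b), so dL/dz equals g by construction.
    return np.dot(g, np.dot(W_, x_) + b_)

# Analytic gradients, same formulas as in the backward pass above.
grad_W = np.outer(g, x)
grad_b = g
grad_x = np.dot(W.T, g)

def numerical_grad(f, p, eps=1e-6):
    """Central-difference gradient of scalar f with respect to array p (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        step = np.zeros_like(p)
        step[idx] = eps
        grad[idx] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

num_grad_W = numerical_grad(lambda P: loss(P, b, x), W)
num_grad_b = numerical_grad(lambda P: loss(W, P, x), b)
num_grad_x = numerical_grad(lambda P: loss(W, b, P), x)

print(np.allclose(grad_W, num_grad_W, atol=1e-6))
print(np.allclose(grad_b, num_grad_b, atol=1e-6))
print(np.allclose(grad_x, num_grad_x, atol=1e-6))
```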

### 6.2.b.2 Training a network for MNIST (1.5 points)

Now that we know how to train a neural network in PyTorch, let's train and evaluate our model on a standard dataset; for now let's use MNIST. Design a network from scratch using PyTorch and include the following components. Remember that we need to use forward propagation and backpropagation.
- Training Loop
- Optimization 
- Evaluating Loop
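
The three components listed above could be laid out roughly as in the following sketch. It is not the required solution: random tensors stand in for MNIST so that the block runs without downloading data (a real run would load the dataset, for example through `torchvision.datasets.MNIST`), and the layer sizes, learning rate, and number of steps are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in for one batch of MNIST: 64 grayscale 28x28 images, 10 classes.
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),
    nn.LeakyReLU(0.01),
    nn.Linear(32, 10),  # outputs logits; CrossEntropyLoss applies log-softmax itself
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop with optimization: forward pass, loss, backprop, parameter update.
losses = []
model.train()
for step in range(50):
    optimizer.zero_grad()            # clear gradients from the previous step
    logits = model(images)           # forward propagation
    loss = loss_fn(logits, labels)
    loss.backward()                  # backpropagation
    optimizer.step()                 # gradient descent update
    losses.append(loss.item())

# Evaluation loop: no gradient tracking needed.
model.eval()
with torch.no_grad():
    predictions = model(images).argmax(dim=1)
    accuracy = (predictions == labels).float().mean().item()

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}, accuracy: {accuracy:.2f}")
```

Here the model is evaluated on the same synthetic batch it was trained on, which only shows the mechanics; with real data, evaluation would use the held-out test split.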

In [None]:
import torch
import torch.nn.functional as F

from torch import nn

In [None]:
class TorchFFNetwork(nn.Module):
    """
    A PyTorch implementation of a classifier for the MNIST dataset.
    """
    def __init__(self):
        raise NotImplementedError

### ToDo: Implement functions for Stochastic Gradient Descent and Stochastic Gradient Descent with momentum, and plot how each of them changes the gradient values. (1 point + 1 bonus point)
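
A minimal NumPy sketch of the two update rules is given below, under stated assumptions. The function names `sgd_step` and `sgd_momentum_step`, the toy quadratic loss, and the hyperparameters are hypothetical. The exact gradient of the quadratic stands in for a mini-batch gradient, so the stochastic part is not modelled. Plotting is left out; the recorded lists could be passed to a plotting library.

```python
import numpy as np

def sgd_step(param, grad, lr):
    """Plain SGD update: move against the gradient."""
    return param - lr * grad

def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9):
    """SGD with momentum: accumulate a decaying sum of past gradients."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

# Toy problem (assumption): L(w) = 0.5 * sum(A * w**2), with different curvature per axis.
A = np.array([1.0, 10.0])

def loss(w):
    return 0.5 * np.sum(A * w ** 2)

def grad(w):
    return A * w

w_start = np.array([1.0, 1.0])
w_sgd = w_start.copy()
w_mom = w_start.copy()
velocity = np.zeros_like(w_start)

# Record the gradient norm seen by each optimizer at every step.
sgd_grad_norms, mom_grad_norms = [], []
for _ in range(100):
    g_sgd = grad(w_sgd)
    sgd_grad_norms.append(np.linalg.norm(g_sgd))
    w_sgd = sgd_step(w_sgd, g_sgd, lr=0.05)

    g_mom = grad(w_mom)
    mom_grad_norms.append(np.linalg.norm(g_mom))
    w_mom, velocity = sgd_momentum_step(w_mom, g_mom, velocity, lr=0.05)

print(f"SGD final loss:      {loss(w_sgd):.6f}")
print(f"Momentum final loss: {loss(w_mom):.6f}")
```

With zero initial velocity, the first momentum step coincides with the plain SGD step; the two trajectories only diverge from the second step onward.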