# Exercise Sheet 4, Task 3

In this assignment, we will implement a small neural network “library” using Python and NumPy. Its design is inspired by PyTorch.

In the lecture, we covered vectorised backpropagation in detail. We also use it in this exercise for efficiency.
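To illustrate what "vectorised" means here, the following is a minimal sketch that compares two ways of computing the weight and bias gradients of a linear layer `Y = X @ W + b`. The first uses batched matrix products. The second uses an explicit loop over the samples. The random data and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))  # batch of 4 samples with 3 features each
W = rng.standard_normal((3, 2))  # weights of a 3 -> 2 linear layer
G = rng.standard_normal((4, 2))  # upstream gradient dL/dY, one row per sample

# Vectorised: a single matrix product (or sum) per gradient
W_grad_vec = X.T @ G
b_grad_vec = G.sum(axis=0, keepdims=True)

# Per-sample loop: accumulate the contribution of each sample separately
W_grad_loop = np.zeros_like(W)
b_grad_loop = np.zeros((1, 2))
for x_i, g_i in zip(X, G):
    W_grad_loop += np.outer(x_i, g_i)
    b_grad_loop += g_i

print(np.allclose(W_grad_vec, W_grad_loop), np.allclose(b_grad_vec, b_grad_loop))
```

Both versions compute the same sums. The vectorised one hands the loop over samples to NumPy's compiled routines instead of running it in Python.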

## Backward Pass
We now implement the ```backward``` functions for each of the module classes implemented last week. Each ```backward``` function receives the module's input and the gradient backpropagated from the following module. It returns the gradient of the loss with respect to the module's input. For ```FullyConnectedLayer```, we return a tuple with the gradient w.r.t. the input, the gradient w.r.t. the weights, and the gradient w.r.t. the bias. ```NeuralNetwork``` returns a tuple with the gradient w.r.t. the input, a list of gradients w.r.t. the weights of each layer, and a list of gradients w.r.t. the biases of each layer.
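For reference, these are the standard vectorised gradients that the ```backward``` functions compute. The notation is ours. $X$ is a batch of $N$ inputs (one row per sample), and $G = \partial L / \partial Y$ is the incoming gradient. $\mathbf{1} \in \mathbb{R}^{N \times 1}$ is a column of ones, and $M$ is the number of entries in the prediction $\hat{Y}$.

$$
\begin{aligned}
Y &= XW + \mathbf{1}b, &\quad \frac{\partial L}{\partial X} &= G W^\top, &\quad \frac{\partial L}{\partial W} &= X^\top G, &\quad \frac{\partial L}{\partial b} &= \mathbf{1}^\top G \\
A &= \sigma(Z), & \frac{\partial L}{\partial Z} &= G \odot \sigma(Z) \odot \bigl(1 - \sigma(Z)\bigr) \\
L &= \frac{1}{M} \sum_{k} \tfrac{1}{2} (\hat{y}_k - y_k)^2, & \frac{\partial L}{\partial \hat{Y}} &= \frac{1}{M} (\hat{Y} - Y)
\end{aligned}
$$

The bias gradient $\mathbf{1}^\top G$ sums $G$ over the batch dimension. For a single sample it reduces to $G$ itself.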

```python
# Most of this is copied from last week ;)
import numpy as np
from typing import List, Tuple

class Sigmoid:
    def __init__(self):
        pass

    def forward(self, x: np.ndarray) -> np.ndarray:
        return 1 / (1 + np.exp(-x))

    def backward(self, x: np.ndarray, grad: np.ndarray = np.array([[1]])) -> np.ndarray:
        # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); evaluate sigmoid(x) only once
        s = self.forward(x)
        return grad * s * (1 - s)

class MeanSquaredError:
    def __init__(self):
        pass

    def forward(self, y_pred: np.ndarray, y_true: np.ndarray) -> float:
        return np.mean(0.5 * (y_true - y_pred) ** 2)

    def backward(self, y_pred: np.ndarray, y_true: np.ndarray, grad: np.ndarray = np.array([[1]])) -> np.ndarray:
        # forward() averages over all entries, so each entry's gradient carries a 1/N factor
        return grad * (y_pred - y_true) / y_pred.size

class FullyConnectedLayer:
    def __init__(self, input_size: int, output_size: int):
        self.input_size = input_size
        self.output_size = output_size

        self.weights = np.random.randn(self.input_size, self.output_size)
        self.bias = np.zeros((1, self.output_size))

    def forward(self, x: np.ndarray) -> np.ndarray:
        return np.matmul(x, self.weights) + self.bias

    def backward(self, x: np.ndarray, grad: np.ndarray = np.array([[1]])) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        x_grad = np.matmul(grad, self.weights.T)      # dL/dx, one row per sample
        W_grad = np.matmul(x.T, grad)                 # dL/dW, summed over the batch by the matmul
        b_grad = np.sum(grad, axis=0, keepdims=True)  # dL/db, summed over the batch

        return x_grad, W_grad, b_grad

class NeuralNetwork:
    def __init__(self,
                 input_size: int,
                 output_size: int,
                 hidden_sizes: List[int],
                 activation=Sigmoid):
        self.activ_inputs = None
        self.layer_inputs = None
        s = [input_size] + hidden_sizes + [output_size]
        self.layers = [FullyConnectedLayer(s[i], s[i+1]) for i in range(len(s) - 1)]
        self.activation = activation()

    def forward(self, x: np.ndarray) -> np.ndarray:
        # We need to edit this function to cache the inputs and outputs of each layer during the forward pass!
        self.layer_inputs = []
        self.activ_inputs = []

        for layer in self.layers[:-1]:
            self.layer_inputs.append(x)
            x = layer.forward(x)
            self.activ_inputs.append(x)
            x = self.activation.forward(x)

        # The last layer should not use an activation function
        self.layer_inputs.append(x)
        x = self.layers[-1].forward(x)
        return x

    def backward(self, x: np.ndarray, grad: np.ndarray = np.array([[1]])) -> Tuple[np.ndarray, List[np.ndarray], List[np.ndarray]]:
        # x is not used directly: call forward() first, its cached inputs are used instead
        W_grads = []
        b_grads = []

        # Backward pass for the last layer
        grad, W_grad, b_grad = self.layers[-1].backward(self.layer_inputs[-1], grad)
        W_grads.append(W_grad)
        b_grads.append(b_grad)

        # Backward pass for the remaining layers: activation first, then the layer itself
        for i in reversed(range(len(self.activ_inputs))):
            grad = self.activation.backward(self.activ_inputs[i], grad)
            grad, W_grad, b_grad = self.layers[i].backward(self.layer_inputs[i], grad)
            W_grads.append(W_grad)
            b_grads.append(b_grad)

        # Reverse so that index 0 corresponds to the first layer
        return grad, list(reversed(W_grads)), list(reversed(b_grads))
```
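Before testing the full network, each module's ```backward``` can be checked against central finite differences. This minimal sketch re-declares stand-ins for `Sigmoid` and `FullyConnectedLayer` so that it runs on its own. These stand-ins use the batch-summed bias gradient and a random, non-zero bias. The helper `numerical_grad`, the random test data and the scalar test objective are our own assumptions, not part of the exercise.

```python
import numpy as np

# Stand-ins for the module classes above (batch-summed bias gradient)
class Sigmoid:
    def forward(self, x: np.ndarray) -> np.ndarray:
        return 1 / (1 + np.exp(-x))

    def backward(self, x: np.ndarray, grad: np.ndarray) -> np.ndarray:
        s = self.forward(x)
        return grad * s * (1 - s)

class FullyConnectedLayer:
    def __init__(self, input_size: int, output_size: int, rng: np.random.Generator):
        self.weights = rng.standard_normal((input_size, output_size))
        self.bias = rng.standard_normal((1, output_size))  # non-zero so the bias check is meaningful

    def forward(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weights + self.bias

    def backward(self, x: np.ndarray, grad: np.ndarray):
        return grad @ self.weights.T, x.T @ grad, grad.sum(axis=0, keepdims=True)

def numerical_grad(f, a: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central differences of the scalar f() w.r.t. each entry of a (perturbed in place, then restored)."""
    g = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        f_plus = f()
        a[idx] = old - eps
        f_minus = f()
        a[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
layer, act = FullyConnectedLayer(3, 2, rng), Sigmoid()
x = rng.standard_normal((5, 3))         # batch of 5 samples
upstream = rng.standard_normal((5, 2))  # arbitrary upstream gradient

def loss() -> float:
    # Scalar test objective whose gradient w.r.t. the sigmoid output is exactly `upstream`
    return float(np.sum(upstream * act.forward(layer.forward(x))))

# Analytic gradients: backward through the sigmoid, then through the layer
z = layer.forward(x)
x_grad, W_grad, b_grad = layer.backward(x, act.backward(z, upstream))

for name, analytic, param in [("x", x_grad, x), ("W", W_grad, layer.weights), ("b", b_grad, layer.bias)]:
    diff = np.max(np.abs(analytic - numerical_grad(loss, param)))
    print(f"max |analytic - numerical| for {name}: {diff:.2e}")
```

With a batch of 5, this check would fail for the bias if `b_grad` were returned as `grad` without summing over the batch. The shapes would not even match.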

## Testing the Implementation
Let's apply our backward pass to the network from last week by adding a few lines after the forward pass:

```python
# Network Initialization
net = NeuralNetwork(2, 1, [2], Sigmoid)

# Setting the layer weights
net.layers[0].weights = np.array([[0.5, 0.75], [0.25, 0.25]])
net.layers[1].weights = np.array([[0.5], [0.5]])

# Loss
loss_function = MeanSquaredError()

# Input
x = np.array([[1, 1]])
y = np.array([[0]])

# Forward Pass
pred = net.forward(x)

# Loss Calculation
loss = loss_function.forward(pred, y)

print(f"Prediction: {pred}")
print(f"Loss: {loss}")

# Backward Pass
grad = loss_function.backward(pred, y)
grad, W_grads, b_grads = net.backward(x, grad)

print(f"Gradients of the first layer: \n\nW1:\n{W_grads[0]}, \n\nb1: \n{b_grads[0]}\n")
print(f"Gradients of the second layer: \n\nW2:\n{W_grads[1]}, \n\nb2: \n{b_grads[1]}")
```

```
Prediction: [[0.70511864]]
Loss: 0.24859614746399733
Gradients of the first layer: 

W1:
[[0.07682091 0.06931737]
 [0.07682091 0.06931737]], 

b1: 
[[0.07682091 0.06931737]]

Gradients of the second layer: 

W2:
[[0.47890156]
 [0.51548303]], 

b2: 
[[0.70511864]]
```
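For comparison with a manual computation, here is a minimal sketch that applies the chain rule by hand, one unit at a time, to the same test network. It uses the same weights, input and target, with zero biases and no activation on the output. The variable names (`d_pred`, `d_z`, and so on) are ours. It reproduces the prediction and gradients printed above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Values from the test cell: x = [1, 1], y = 0, zero biases
x1, x2, y = 1.0, 1.0, 0.0
W1 = np.array([[0.5, 0.75], [0.25, 0.25]])
W2 = np.array([0.5, 0.5])

# Forward pass, one hidden unit at a time
z = np.array([x1 * W1[0, j] + x2 * W1[1, j] for j in range(2)])  # pre-activations [0.75, 1.0]
h = sigmoid(z)
pred = W2 @ h  # no activation on the output layer

# Backward pass (MSE on a single output: dL/dpred = pred - y)
d_pred = pred - y
dW2 = d_pred * h           # dL/dW2_j = dL/dpred * h_j
db2 = d_pred
d_h = d_pred * W2          # dL/dh_j = dL/dpred * W2_j
d_z = d_h * h * (1 - h)    # sigmoid'(z_j) = h_j * (1 - h_j)
dW1 = np.array([[x1 * d_z[0], x1 * d_z[1]],
                [x2 * d_z[0], x2 * d_z[1]]])  # dL/dW1_ij = x_i * dL/dz_j
db1 = d_z

print(pred, dW2, db2)
print(dW1, db1)
```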


We can now check whether the gradients computed by our network match those computed manually. However, on exercise sheet 2 we applied an activation function in the final layer, which we do not do here, so the results will not match without further changes. Consider that a bonus exercise ;)
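Below is a minimal sketch of one way to approach the bonus. It assumes that the sheet 2 network matches the test network here, except that a sigmoid is applied to the output. We have not reproduced sheet 2's numbers, so no expected values are given. The only new step is one more chain-rule factor, σ'(z_out), between the loss gradient and the backward pass through the layers. The result is checked against finite differences. All names in the block are ours.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Same weights and input as the test cell; the output sigmoid is the assumed change
W1 = np.array([[0.5, 0.75], [0.25, 0.25]])
W2 = np.array([[0.5], [0.5]])
b1, b2 = np.zeros((1, 2)), np.zeros((1, 1))
x = np.array([[1.0, 1.0]])
y = np.array([[0.0]])

def loss_fn(W1, W2):
    h = sigmoid(x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)  # extra activation on the output layer
    return np.mean(0.5 * (y - out) ** 2)

# Forward pass, keeping intermediate values
z1 = x @ W1 + b1
h = sigmoid(z1)
z2 = h @ W2 + b2
out = sigmoid(z2)

# Backward pass
grad = (out - y) / out.size              # dL/d(out) for the mean in loss_fn
grad = grad * out * (1 - out)            # new step: through the output sigmoid, gives dL/dz2
W2_grad = h.T @ grad
grad = (grad @ W2.T) * h * (1 - h)       # through W2 and the hidden sigmoid, gives dL/dz1
W1_grad = x.T @ grad

def numerical_grad(f, W, eps=1e-6):
    """Central differences of the scalar f(W) w.r.t. each entry of W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

print(np.max(np.abs(W1_grad - numerical_grad(lambda W: loss_fn(W, W2), W1))))
print(np.max(np.abs(W2_grad - numerical_grad(lambda W: loss_fn(W1, W), W2))))
```

With the classes above, the same extra factor would amount to computing `z_out = net.forward(x)`, then `grad = Sigmoid().backward(z_out, loss_function.backward(Sigmoid().forward(z_out), y))`, and passing that `grad` to `net.backward(x, grad)`.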