Name:  
Matriculation number:  
Email:  

Name:  
Matriculation number:  
Email:  

Name:  
Matriculation number:  
Email:  

## 4.4.a Building your own feed-forward network

Import numpy, which is really all we need to create our own NN.

In [1]:
import numpy as np

Recall that our simple neural network consisted of two layers. We also added a `ReLU` function as a non-linearity to the output of our intermediate layer. Given an input $\mathbf{x} \in \mathbb{R}^n $ we have

$ \mathbf{h} = f^{(1)}(\mathbf{x}; \mathbf{W},c) = \text{ReLU}(\mathbf{W}^\mathsf{T} \mathbf{x} + c) $

$ \mathbf{y} = f^{(2)}(\mathbf{h}; \mathbf{w},b) = \text{softmax}(\mathbf{w}^\mathsf{T} \mathbf{h} + b) $

In this exercise you will create your own network. However, we will do it in a way that allows you to specify the depth of the network, i.e. we extend our network such that there isn't just one intermediate layer $\mathbf{h}$, but rather $n$ of them, $\mathbf{h}_{i}$ with $i \in \{1, \dots, n\}$. (Here $n$ denotes the number of intermediate layers, not the input dimension used above.)
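Written out in the notation above, the deeper network could look as follows. The per-layer symbols $\mathbf{W}_i$ and $c_i$ are an assumption introduced here for illustration; the text above only names $\mathbf{W}$, $c$, $\mathbf{w}$ and $b$.

$ \mathbf{h}_1 = \text{ReLU}(\mathbf{W}_1^\mathsf{T} \mathbf{x} + c_1) $

$ \mathbf{h}_i = \text{ReLU}(\mathbf{W}_i^\mathsf{T} \mathbf{h}_{i-1} + c_i), \quad i \in \{2, \dots, n\} $

$ \mathbf{y} = \text{softmax}(\mathbf{w}^\mathsf{T} \mathbf{h}_n + b) $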

**NOTE**: You are not allowed to use any built-in functions to calculate the ReLU, Softmax or the forward pass directly.

**NOTE 2**: Remember to include the non-linearity at every layer. Remember to also add the bias to every layer. Finally, remember to apply the softmax in the output layer.

In [2]:
def relu(x):
    """
    Implement the ReLU function as defined in the lecture
    Input: an array of numbers
    Output: ReLU(x)
    """
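    # Work on a float copy so the caller's array is not modified in place
    x = np.array(x, dtype=float)
    x[x < 0] = 0
    return x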
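
One detail worth knowing about boolean-mask assignment such as `x[x<0] = 0`: it writes into the array that was passed in, so the caller's data can change as a side effect. The following is a minimal sketch of the difference; the function names `relu_inplace` and `relu_copy` are hypothetical and only serve this illustration.

```python
import numpy as np

def relu_inplace(x):
    # Boolean-mask assignment writes into the array that was passed in
    x[x < 0] = 0
    return x

def relu_copy(x):
    # Copy first, so the caller's array stays untouched
    out = np.array(x, dtype=float)
    out[out < 0] = 0
    return out

a = np.array([-1.0, 2.0, -3.0])
b = a.copy()

relu_copy(a)      # `a` keeps its negative entries
relu_inplace(b)   # `b` itself is overwritten
```

Whether the in-place behavior matters depends on whether the input is reused afterwards; in a forward pass that only consumes intermediate results, both variants produce the same output.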

In [3]:
def softmax(x):
    """
    Implement the `softmax` function as defined in the lecture
    """
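    x = np.asarray(x, dtype=float)
    # Subtract the maximum before exponentiating to avoid overflow;
    # this shift does not change the result
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()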
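
Softmax computed directly from `np.exp(x)` can overflow for large inputs, because `exp` exceeds the float64 range at roughly 709. Since softmax is unchanged when a constant is subtracted from every entry, subtracting the maximum is a common remedy. The sketch below compares both variants; the names `softmax_naive` and `softmax_shifted` are hypothetical.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_shifted(x):
    # softmax(x) == softmax(x - c) for any constant c;
    # choosing c = max(x) keeps every exponent <= 0
    e = np.exp(x - np.max(x))
    return e / e.sum()

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

# Silence the overflow warnings that the naive version triggers on purpose
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)   # exp(1000) is inf, so inf / inf gives nan

shifted_large = softmax_shifted(large)   # finite, and equal to softmax(small)
```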

In [4]:
class FFNetwork:
    """
    Class representing the feed-forward neural network
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        # TODO: Implement
        # Initialize each layer as a random matrix of the
        # appropriate dimensions
        
        ## SOLUTION ##

        self.layers = []
        
        # first layer
        self.layers.append([np.random.randn(input_dim, hidden_dim), np.zeros(hidden_dim)])
        
        # middle layer(s): `hidden_size` hidden-to-hidden layers, in addition
        # to the input-to-hidden layer above
        for _ in range(hidden_size):
            self.layers.append([np.random.randn(hidden_dim, hidden_dim), np.zeros(hidden_dim)])
        
        # last layer
        self.layers.append([np.random.randn(hidden_dim, output_dim), np.zeros(output_dim)])
        
        ## SOLUTION ##
    
    def forward(self, x):
        """
        Args:
        x: input to the neural network
        
        Output:
        `y`, i.e. the prediction of the network
        
        Note: Remember to apply the ReLU and add the bias for each layer
        """
        # TODO: Implement the forward pass of the network,
        # i.e. calculate `y` from an input `x`
        # Remember that each layer's output is calculated by
        # f^(i) = ReLU(W_i^T * f^(i-1) + b_i)
        res = x
        
        ## SOLUTION ##
        
        # first layer
        W, b = self.layers[0]
        res = res@W + b
        res = relu(res)
        
        # middle layers
        for W, b in self.layers[1:-1]:
            res = res@W + b
            res = relu(res)
            
        # last layer
        W, b = self.layers[-1]
        res = res@W + b
        res = softmax(res)
        
        ## SOLUTION ##
        
        return res
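
The formulas use $\mathbf{W}^\mathsf{T} \mathbf{x}$, while the code computes `res@W`. For a one-dimensional `x` these are the same product, provided each weight matrix has shape `(fan_in, fan_out)`. The sketch below traces the shapes through three such matrices; the names `W_in`, `W_hidden` and `W_out` and the chosen dimensions are assumptions for illustration only.

```python
import numpy as np

input_dim, hidden_dim, output_dim = 2, 3, 2

rng = np.random.default_rng(0)
W_in = rng.standard_normal((input_dim, hidden_dim))
W_hidden = rng.standard_normal((hidden_dim, hidden_dim))
W_out = rng.standard_normal((hidden_dim, output_dim))

x = np.array([1.0, 0.0])
h1 = x @ W_in          # shape (hidden_dim,)
h2 = h1 @ W_hidden     # shape (hidden_dim,)
logits = h2 @ W_out    # shape (output_dim,)

# For a 1-D input, x @ W is the same vector as W.T @ x
same_product = np.allclose(x @ W_in, W_in.T @ x)
```

Biases and non-linearities are left out here on purpose, since they do not change any shapes.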

Your implementation needs to be compatible with the following test code:

In [5]:
np.random.seed(0)

# A configuration that reflects the example from the lecture
# i.e. our input is of size 2, our intermediate layers are also of size 2,
# and we will only have 1 hidden layer.
network = FFNetwork(2, 2, 2, 1)
network.forward([1.,0.])

array([0.1314611, 0.8685389])

Disclaimer: Do not expect a correct output at this stage; you are simply building the structure of the network.

However, our setup also allows us to create larger networks:

In [6]:
np.random.seed(0)

network = FFNetwork(2, 3, 2, 1)
network.forward([1.,0.]) 

array([0.17044382, 0.82955618])

Some sanity checks:

1. The output should contain as many values as the number of output units you specified.
1. The numbers in your output should be in the range $[0,1]$.
1. The numbers should add up to $1$.
1. Varying the structure of the network should not break its functionality.
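
The first three checks can also be automated. The helper below is a sketch, not part of the exercise; its name `check_output` and the tolerance `tol` are assumptions.

```python
import numpy as np

def check_output(y, n_outputs, tol=1e-8):
    """Return True if `y` looks like a probability vector with `n_outputs` entries."""
    y = np.asarray(y, dtype=float)
    has_right_size = y.shape == (n_outputs,)
    in_unit_range = bool(np.all((y >= 0) & (y <= 1)))
    sums_to_one = bool(abs(y.sum() - 1.0) < tol)
    return has_right_size and in_unit_range and sums_to_one

# The output printed for the first test network above passes all three checks
ok = check_output([0.1314611, 0.8685389], n_outputs=2)

# Entries in [0, 1] that do not sum to 1 are rejected
bad = check_output([0.7, 0.7], n_outputs=2)
```

The fourth check can be covered by calling the helper on networks built with different dimensions and depths.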

## 4.4.b Implementing a feed-forward network using `torch`

### 4.4.b.1 Creating the network (1 point)

For this we will be using the `nn` module of `torch`, which contains modules representing types of layers. In your case, the relevant module is that of a *fully connected linear layer* (`nn.Linear`).

We will also be using the `nn.functional` module to take advantage of the built-in functions for ReLU and Softmax. In this exercise, you are allowed to use them.
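
As a brief illustration of these built-in functions, the sketch below applies `F.relu` and `F.softmax` to small hand-written tensors (the values are made up). Note that `F.softmax` needs a `dim` argument that says along which axis to normalize; with one input per row, `dim=-1` normalizes each row separately.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])
activated = F.relu(x)                # negative entries become 0

logits = torch.tensor([[1.0, 2.0],
                       [3.0, 1.0]])  # a batch of two inputs, one per row
probs = F.softmax(logits, dim=-1)    # each row sums to 1
```

For a single one-dimensional input, `dim=0` and `dim=-1` refer to the same axis; the two only differ once inputs are stacked into a batch.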

In [7]:
import torch
import torch.nn.functional as F

from torch import nn

In [8]:
class TorchFFNetwork(nn.Module):
    """
    A `torch` version of the network implemented for 4.4.a
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        ## SOLUTION ##
        
        super(TorchFFNetwork, self).__init__()
        
        layers = [nn.Linear(input_dim, hidden_dim, bias=True)] 
        layers.append(nn.ReLU())
        for i in range(hidden_size):
            layers.append(nn.Linear(hidden_dim, hidden_dim, bias=True))
            layers.append(nn.ReLU())
        layers.append(nn.Linear(hidden_dim, output_dim, bias=True))
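        # Normalize over the last (class) dimension so that batched inputs
        # of shape (batch, output_dim) get one distribution per row
        layers.append(nn.Softmax(dim=-1))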
        
        self.layers = nn.Sequential(*layers)
        
        
        ## SOLUTION ##

    def forward(self, x):
        ## SOLUTION ##
        
        return self.layers(x)
        
        ## SOLUTION ##
         


Your implementation, once more, needs to be compatible with the following test code:

In [9]:
torch_network = TorchFFNetwork(2, 3, 2, 1)

In [10]:
with torch.no_grad():
    print(torch_network(torch.tensor([1.,0.])))

tensor([0.3202, 0.6798])


Note that the `forward` method is automatically called when you call your network object.
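
A small sketch of this mechanism, using a hypothetical toy module `Doubler` that is not part of the exercise:

```python
import torch
from torch import nn

class Doubler(nn.Module):
    """Hypothetical toy module, only used to show the call mechanism."""
    def forward(self, x):
        return 2 * x

doubler = Doubler()
x = torch.tensor([1.0, 2.0])

via_call = doubler(x)             # nn.Module.__call__ dispatches to forward
via_forward = doubler.forward(x)  # same result, but skips any registered hooks
```

Calling the object is the usual style, since it also runs any hooks registered on the module.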

### 4.4.b.2 Training your network (1 point)

Even though we have not covered how training actually works, we will treat the training of a neural network as a black-box procedure for now. Later on, we will learn the internals of the training process (and even implement them ourselves!).

For now, train a neural network (the one you created above) to learn the XOR operation. You are to create a neural network with the appropriate number of input variables, an intermediate hidden layer with 2 units and an output layer with 2 units.

Notes:
- Please read [this introduction to the optimization loop in PyTorch](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimization-loop). It should give you a good overview of what PyTorch needs from you to train a neural network.
- You are to train the network until the network learns the operation. Remember to set your random seeds so the results are reproducible.
- There are many optimizers available. Adam is more complex than SGD and has not yet been covered in the lecture, but its usage in code is equivalent to that of SGD and it typically performs much better.
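
The loop described in the linked tutorial can be sketched on a toy problem as follows. Everything here is an assumption chosen for illustration (the regression task, the name `toy_model`, the learning rate and the number of steps); it is not the XOR solution.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression problem (an assumption for illustration): learn y = 2x
X = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
Y = 2 * X

toy_model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
# Swapping in torch.optim.SGD(toy_model.parameters(), lr=...) needs no other change
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = loss_fn(toy_model(X), Y)  # forward pass and loss
    loss.backward()                  # compute gradients
    optimizer.step()                 # update the parameters
    losses.append(loss.item())
```

Recording the loss per step, as done with `losses` here, gives a simple way to see whether training is making progress.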

In [11]:
# Our training X, where each instance includes an x1 and an x2, (where the operation is defined as x1 XOR x2)
training_x = [[0,0], [0,1], [1,0], [1,1]]

# We have only covered softmax in the lecture, so we format the output as follows:
training_y = [[1,0], [0,1], [0,1], [1,0]]

# Y is formatted such that its first element corresponds to the probability of the XOR resulting in a 0
# and its second element to the probability of the XOR resulting in a 1

################################################################
# TODO: Adapt the training set so it can be used with `pytorch`
################################################################
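
One possible way to adapt the training set is to convert the nested lists to float tensors. This is a sketch under the assumption that the whole training set is passed to the model as one batch; the names `x_tensor` and `y_tensor` are hypothetical.

```python
import torch

training_x = [[0, 0], [0, 1], [1, 0], [1, 1]]
training_y = [[1, 0], [0, 1], [0, 1], [1, 0]]

# Float tensors, since nn.Linear weights are float32 by default
x_tensor = torch.tensor(training_x, dtype=torch.float32)  # shape (4, 2)
y_tensor = torch.tensor(training_y, dtype=torch.float32)  # shape (4, 2)
```

Each row of `x_tensor` is one training instance, and the matching row of `y_tensor` is its target distribution.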

In [43]:
# Create the model from the previous class and pick a learning rate
torch.manual_seed(42)
model = TorchFFNetwork(2, 2, 2, 1)
learning_rate = 0.0003

In [56]:
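def train_loop(data, model, loss_fn, optimizer):
    """Run one optimization step on the full training set.

    `loss_fn` is accepted for compatibility but unused; the loss is a
    summed squared error computed directly below.
    """
    X, Y = data
    X = torch.as_tensor(X, dtype=torch.float32)
    Y = torch.as_tensor(Y, dtype=torch.float32)
    optimizer.zero_grad()                    # clear gradients from the previous step
    prediction = model(X)
    loss = torch.sum((prediction - Y) ** 2)  # summed squared error
    loss.backward()                          # compute gradients
    optimizer.step()                         # update the parameters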

In [57]:
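# TODO: Run training
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# A single step barely changes the weights, so repeat the step many times.
# The number of epochs is an assumption and may need tuning.
n_epochs = 5000
for epoch in range(n_epochs):
    train_loop((training_x, training_y), model, None, optimizer)

# Inspect the predictions without tracking gradients
with torch.no_grad():
    print(model(torch.tensor(training_x).float()))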