# **Lab 2**

TI3155TU Deep Learning (2024 - 2025)

Authors: Ali Alper Ataşoğlu, Elena Congeduti

# Instructions
We recommend that you fork the lab notebooks by selecting `Copy & Edit` on the notebook's homepage. This will create a copy in your Kaggle repository, allowing you to work on it and save your progress as you go. Kaggle provides a pre-configured virtual environment, which means that most of the libraries we will use are already downloaded and ready to use. Therefore, you typically do **not** need to `pip install` additional resources.

Alternatively, to work on Google Colab, you just need to select the `Open in Colab` option in the notebook's homepage menu. Finally, if you want to work locally, you will need to set up your own virtual environment. Check the Lab Instructions in [Learning Material](https://brightspace.tudelft.nl/d2l/le/content/682797/Home?itemIdentifier=D2L.LE.Content.ContentObject.ModuleCO-3812764) on Brightspace for detailed information on the virtual environment configuration.

These labs include programming exercises and insight questions. Follow the instructions in the notebook. Fill in the text blocks to answer the questions and write your own code to solve the programming tasks within the designated part of the code blocks:

```python
#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################


#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################
```

Solutions will be shared the week after the lab is published. Note that these labs are designed for practice and are therefore **ungraded**.

In [None]:
import torch 
import torch.nn as nn

import numpy as np

from scipy.stats import multivariate_normal

import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator

%matplotlib inline

# 1 Multi-layer Perceptrons

Modern neural networks consist of many building blocks that are assembled together to create deep and sophisticated architectures. Here we want to introduce you to the basic blocks that will allow you to build your first neural network.


## 1.1 Linear Layers

A linear (fully connected) layer represents a linear transformation of an input $x \in \mathbb{R}^{N_{\text{in}}}$ to an output $y  \in \mathbb{R}^{N_{\text{out}}}$ described as:

$$ y = x\cdot\mathbf{W}  + \mathbf{b}$$

with weight matrix $\mathbf{W}\in \mathbb{R}^{N_{\text{in}} \times N_{\text{out}}}$ and bias vector $\mathbf{b}\in \mathbb{R}^{N_{\text{out}}}$. $N_{\text{in}}$ and $N_{\text{out}}$ are the dimensions (numbers of features) of the input and output spaces, respectively.


***
**Question 1.1:** What does $\textbf{W}_{ij}$ represent exactly?

<font color='green'>Write your answer here</font>
****

Moreover, the forward-pass formula above can also be used to process multiple inputs at once in a *batch* $x \in \mathbb{R}^{\text{batch size}\times N_{\text{in}}}$, where *batch size* is the number of samples in the batch. This allows the computations to be parallelized, reducing processing time significantly. Note that in this case broadcasting is used to add the bias vector $\mathbf{b}$ to the matrix product $x \cdot\mathbf{W}$. This yields the same result as computing $y=x \cdot\mathbf{W}+\mathbf{1}\cdot\mathbf{b}^T$, where $\mathbf{1}\in\mathbb{R}^{\text{batch size}}$ is a vector whose components are all one and $\mathbf{1}\cdot\mathbf{b}^T\in \mathbb{R}^{\text{batch size}\times N_{\text{out}}}$ is a matrix whose $\text{batch size}$ rows are all equal to the bias $\mathbf{b}$.
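The equivalence between broadcasting the bias and adding the explicit matrix $\mathbf{1}\cdot\mathbf{b}^T$ can be checked numerically. The sketch below uses arbitrary sizes and hypothetical variable names (`x_demo`, `W_demo`, `b_demo`) so that it does not overwrite the notebook's own variables. It compares three versions: the broadcast version, the explicit outer-product version, and a row-by-row loop.

```python
import torch

torch.manual_seed(0)
batch_size_demo, N_in_demo, N_out_demo = 4, 3, 2

# Hypothetical batch, weight matrix and bias vector
x_demo = torch.randn(batch_size_demo, N_in_demo)
W_demo = torch.randn(N_in_demo, N_out_demo)
b_demo = torch.randn(N_out_demo)

# 1) Broadcasting: b_demo (shape [N_out]) is added to every row of x @ W
y_broadcast = x_demo @ W_demo + b_demo

# 2) Explicit 1 . b^T: a [batch_size, 1] column of ones times a [1, N_out] row
ones = torch.ones(batch_size_demo, 1)
y_explicit = x_demo @ W_demo + ones @ b_demo.reshape(1, N_out_demo)

# 3) One sample at a time, then stacked back into a batch
y_rows = torch.stack([x_demo[i] @ W_demo + b_demo for i in range(batch_size_demo)])

print(y_broadcast.shape)                        # torch.Size([4, 2])
print(torch.allclose(y_broadcast, y_explicit))  # True
print(torch.allclose(y_broadcast, y_rows))      # True
```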

To build such a layer, we can complete the reference implementation provided below.

****
**Task 1.2:** Complete the code to initialize the $\mathbf{W}$ and $\mathbf{b}$ matrices with the correct dimensions, and compute the output $y$ of the linear layer.

**Hint**: remember that the sum supports broadcasting.
****

In [None]:
class Linear (object):
    """
    Fully connected layer.
    """

    def __init__(self, N_in, N_out):
        """
        Args:
          N_in:  number of input features (input space dimension)
          N_out: number of output features (output space dimension)
        """
        #############################################################################
        #                           START OF YOUR CODE                              #
        #############################################################################
        self.weight =None
        self.bias =None
        #############################################################################
        #                            END OF YOUR CODE                               #
        #############################################################################

        # Initialize parameters
        self.init_params()

    def init_params(self):
        """
        Initialize layer parameters by sampling from uniform distribution over [0,1)
        """
        self.weight = torch.rand_like(self.weight)
        self.bias = torch.rand_like(self.bias)

    def forward(self, x):
        """
        Forward pass of Linear layer: multiply input tensor by weights and add
        bias.

        Args:
            x: input tensor

        Returns:
            y: output tensor
        """

        #############################################################################
        #                           START OF YOUR CODE                              #
        #############################################################################

        #############################################################################
        #                            END OF YOUR CODE                               #
        #############################################################################

        return y

Having defined a linear layer, we can now use it to perform a forward pass on some random input $x$.

*****
**Question 1.3:**
The input $x$ is a batch of 2 examples, each with 3 dimensions (or features), and the linear layer has 4 units. What is the shape of the output tensor $y$?

<font color='green'>Write your answer here</font>

****

****
**Task 1.4:** Instantiate a linear layer with the variable name ```layer```. Perform a forward pass on the input $x$ and check the shape of the output $y$.
****

In [None]:
# Define layer dimensions and dummy input
batch_size, N_in, N_out = 2, 3, 4

# Input tensor of shape [batch_size, N_in] with random values in [-1, 1)
x = 2*torch.rand((batch_size, N_in))-1

#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################
y= None
#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

# Shape output tensor y
print('Shape of output tensor y:', y.shape)

In order to validate our solution and further explore the capabilities of PyTorch, we can solve the same exercise by using the building blocks provided by PyTorch.

The linear layer as well as many other useful classes are defined in the **torch.nn** module. A full list of these classes can be found [here](https://pytorch.org/docs/stable/nn.html).

You can compare the outputs of both implementations using [`torch.allclose`](https://pytorch.org/docs/stable/generated/torch.allclose.html), which returns `True` if all elements of the two tensors are sufficiently "close" to each other, i.e. if every element-wise difference is within a small tolerance.
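For reference, `torch.allclose(input, other, rtol=1e-05, atol=1e-08)` checks element-wise that $|\text{input} - \text{other}| \le \text{atol} + \text{rtol}\cdot|\text{other}|$. The short sketch below uses hypothetical tensors. Tiny round-off-sized differences pass the check. Larger differences fail unless the tolerance is loosened.

```python
import torch

ref_t = torch.tensor([1.0, 2.0, 3.0])
near_t = ref_t + 1e-6   # tiny difference, e.g. from floating-point round-off
far_t = ref_t + 1e-2    # clearly different values

print(torch.allclose(ref_t, near_t))             # True
print(torch.allclose(ref_t, far_t))              # False
print(torch.allclose(ref_t, far_t, atol=1e-1))   # True: looser absolute tolerance
```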

****
**Task 1.5:** First check the documentation for the `Linear` class from `torch.nn` [here](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html). Initialize a linear layer, compute `torch_y` as the result of the forward pass on the input $x$, and verify that the two outputs `y` and `torch_y` coincide using `torch.allclose`.
****

In [None]:
#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################
torch_layer = None
#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

# Load the parameters from our layer into the Pytorch layer
torch_layer.weight = torch.nn.Parameter(layer.weight.T)
torch_layer.bias = torch.nn.Parameter(layer.bias)

#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################
torch_y = None
outputs_same = None
#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

# Shape of output tensor torch_y
print('Shape of output tensor torch_y:', torch_y.shape)

#Are the output tensors the same?
print('Outputs identical: ', outputs_same)

Your forward implementation of the linear layer is *correct* if `True` is returned.
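The transposes in the cell above (`layer.weight.T`) are needed because of a layout difference. `torch.nn.Linear` stores its weight with shape `(out_features, in_features)` and computes $y = x A^T + b$. Our `Linear` class, by contrast, stores $\mathbf{W}$ with shape `(N_in, N_out)`. Here is a quick check with hypothetical sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin_demo = nn.Linear(3, 4)      # in_features=3, out_features=4
print(lin_demo.weight.shape)    # torch.Size([4, 3])
print(lin_demo.bias.shape)      # torch.Size([4])

x_demo = torch.randn(2, 3)
manual = x_demo @ lin_demo.weight.T + lin_demo.bias   # y = x A^T + b
print(torch.allclose(lin_demo(x_demo), manual))        # True
```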

## 1.2 Activation functions

At this point, you may wonder what the difference is between a simple linear regression model and our linear layer. The short answer is: there is none. Stacking several linear layers does not help on its own either, since a composition of affine maps is again an affine map. However, we want our neural network to learn non-linear relationships between the input and output spaces. This is where non-linear activation functions come into play.
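Two linear layers without an activation in between always collapse into one:

$$
(x\mathbf{W}_1 + \mathbf{b}_1)\mathbf{W}_2 + \mathbf{b}_2 = x(\mathbf{W}_1\mathbf{W}_2) + (\mathbf{b}_1\mathbf{W}_2 + \mathbf{b}_2)
$$

The sketch below checks this numerically with hypothetical random parameters. It uses double precision to keep round-off negligible.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

# Hypothetical parameters of two stacked linear layers: 3 -> 5 -> 1
W1 = torch.randn(3, 5, dtype=dtype)
b1 = torch.randn(5, dtype=dtype)
W2 = torch.randn(5, 1, dtype=dtype)
b2 = torch.randn(1, dtype=dtype)
x_demo = torch.randn(4, 3, dtype=dtype)

# Two linear layers applied one after the other (no activation in between)
two_layers = (x_demo @ W1 + b1) @ W2 + b2

# A single equivalent linear layer
W_combined = W1 @ W2          # shape [3, 1]
b_combined = b1 @ W2 + b2     # shape [1]
one_layer = x_demo @ W_combined + b_combined

print(torch.allclose(two_layers, one_layer))  # True
```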

One of the most common activation functions is the Rectified Linear Unit (ReLU), defined as

$$
\operatorname{ReLU}(x) =
\begin{cases}
x & \text{if } x\geq 0\\
0 & \text{if } x < 0
\end{cases}
\quad= \quad\max\{0, x\}
$$

In [None]:
class ReLU (object):
    """
    ReLU non-linear activation function.
    """
    def __init__(self):
        super(ReLU, self).__init__()

    def forward(self, x):
        """
        Forward pass of ReLU non-linear activation function: y=max(0,x).
        Args:
            x: input tensor

        Returns:
            y: output tensor
        """
        #Clamp all the values below zero and replace them with 0
        y = torch.clamp(x,min=0)

        return y

Now let's test our implementation on the dummy input tensor $x$.

In [None]:
print('Input tensor x:\n', x)

#Our implementation
relu = ReLU()
y_relu = relu.forward(x)

print('\nOutput of our ReLU implementation:\n',y_relu)

We can also apply ReLU in PyTorch using the `torch.nn.ReLU` class.

****
**Task 1.6:** Check the documentation of `torch.nn.ReLU` then use it to initialize a ReLU function and perform a forward pass on the input $x$.
****

In [None]:
#Torch implementation
#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################
torch_y_relu = None
#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

check = torch.allclose(torch_y_relu,y_relu)
print('\nOutputs identical: ', check)

****
**Task 1.7:**
Implement the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) $\sigma(x) = 1/(1+e^{-x})$. Then verify your implementation by comparing it with its PyTorch counterpart `torch.nn.Sigmoid`.
****

In [None]:
class Sigmoid (object):
    """
    Sigmoid non-linear activation function.
    """
    def forward(self, x):
        """
        Forward pass of Sigmoid non-linear activation function: y=1/(1+exp(-x)).

        Args:
            x: input tensor

        Returns:
            y: output tensor
        """
        ########################################################################
        #                        START OF YOUR CODE                            #
        ########################################################################

        ########################################################################
        #                         END OF YOUR CODE                             #
        ########################################################################

        return y

In [None]:
#Our implementation
#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################

#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

#Torch implementation
#############################################################################
#                           START OF YOUR CODE                              #
#############################################################################

#############################################################################
#                            END OF YOUR CODE                               #
#############################################################################

check = torch.allclose(torch_y_sigma,y_sigma)

print('\nOutputs identical: ', check)

****
**Question 1.8:** What is the shape of the output tensor $y$ of an activation function for an input tensor $x\in\mathbb{R}^{\text{batch size}\times N_{\text{out}}}$?

<font color='green'>Write your answer here</font>
****

A list of all available non-linearities in PyTorch can be found [here](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).

Despite their similar implementations, these two activation functions (and activation functions in general) behave very differently. For example:

- The sigmoid squashes its input into $(0, 1)$ and saturates for large $|x|$.
- ReLU is unbounded for positive inputs and outputs exactly zero for negative ones.

We therefore recommend choosing activation functions carefully and reviewing the documentation for suggestions and warnings about their use.
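One concrete difference appears in the gradients that training will later rely on. The sketch below uses hypothetical input values and autograd to compare the two derivatives:

- The sigmoid's derivative is $\sigma'(x) = \sigma(x)(1-\sigma(x))$. It is at most $0.25$ and almost vanishes for large $|x|$.
- ReLU's derivative is $1$ for positive inputs and $0$ for negative ones.

```python
import torch

z_demo = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0], requires_grad=True)

# Derivative of the sigmoid at each input (summing makes the output a scalar)
torch.sigmoid(z_demo).sum().backward()
sigmoid_grad = z_demo.grad.clone()

# Reset the accumulated gradient, then differentiate ReLU
z_demo.grad = None
torch.relu(z_demo).sum().backward()
relu_grad = z_demo.grad.clone()

print(sigmoid_grad)  # largest (0.25) at 0, tiny at -10 and 10
print(relu_grad)     # 0 for negative inputs, 1 for positive inputs
```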

## 1.3 Building a network model

Using the linear layer and the activation functions described above, we can start piecing together our first neural network. To do so, we define a base class that allows us to stack an arbitrary number of layers and activation functions.

****
**Task 1.9:** Complete the code, performing a forward pass through all the layers in the net.
****

In [None]:
class Net (object):
    """
    Neural network object containing layers.
    """

    def __init__(self, layers):
        """
        Args:
          layers: list of layers in neural network
        """
        self.layers = layers

    def forward(self, x):
        """
        Performs forward pass through all layers of the network.

        Args:
            x: input tensor

        Returns:
            x: output tensor
        """

        ########################################################################
        #                        START OF YOUR CODE                            #
        ########################################################################
        x = None
        ########################################################################
        #                         END OF YOUR CODE                             #
        ########################################################################

        return x

As an example, we can now define a two-layer network with an input size of 3, 5 nodes in the hidden layer and an output size of 1.

In [None]:
N_in = 3
hidden_dim = 5
N_out = 1

# Define and initialize layers
layers = [Linear(N_in, hidden_dim),
          ReLU(),
          Linear(hidden_dim, N_out)]

# Initialize network
net = Net(layers)

# Do forward pass
y = net.forward(x)

# What will be the shape of output tensor y?
print('Shape of output tensor y:', y.shape)

****
**Question 1.10**: Depict the computational graph for this network.
What do the dimensions of the output correspond to?

<font color='green'>Write your answer here</font>
****


We will now create the same neural network in PyTorch. PyTorch uses the `nn.Module` base class for neural network architectures, which plays a role similar to the `Net` class we have just defined. However, `Net` receives a list of layers from outside, whereas an `nn.Module` subclass defines its layers inside its own `__init__` method and applies them in its `forward` method.

This is an important exercise, as this is how you will define all your future models in PyTorch.

You can print a PyTorch module (any instance of `nn.Module`) to see all of its sub-modules (i.e. layers).
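Layers assigned as attributes in `__init__` are registered automatically as sub-modules. As a result, they appear when the module is printed, and `.parameters()` / `.named_parameters()` collect their parameters. The minimal sketch below uses a hypothetical one-layer module `TinyNet`, which is not part of the lab:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical example module: one linear layer followed by a sigmoid."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)   # registered as sub-module 'fc'
        self.act = nn.Sigmoid()     # registered as 'act' (has no parameters)

    def forward(self, x):
        return self.act(self.fc(x))


tiny = TinyNet()
print(tiny)  # lists the sub-modules fc and act

# Parameters of all sub-modules are collected automatically
for name, param in tiny.named_parameters():
    print(name, tuple(param.shape))  # fc.weight (2, 4), then fc.bias (2,)

print(tiny(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
```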

****
**Task 1.11:** Complete the code, defining the same layers as we did in our network class with the attribute names ```layer1``` and ```layer2``` and using the activation function ```relu```. Implement the forward pass and initialize a TorchNet object as ```torch_net```.
****

In [None]:
class TorchNet(nn.Module):
    """
    PyTorch neural network. Network layers are defined in __init__ and forward
    pass implemented in forward.
    """

    def __init__(self, N_in, hidden_dim, N_out):
        """
        Args:
          N_in: number of features in input layer
          hidden_dim: number of features in hidden layer
          N_out: number of features in output layer
        """

        super(TorchNet, self).__init__()

        ########################################################################
        #                        START OF YOUR CODE                            #
        ########################################################################

        ########################################################################
        #                         END OF YOUR CODE                             #
        ########################################################################

    def forward(self, x):
        """
        Performs forward pass through all layers of the network.

        Args:
            x: input tensor

        Returns:
            x: output tensor
        """

        ########################################################################
        #                        START OF YOUR CODE                            #
        ########################################################################
        x = None
        ########################################################################
        #                         END OF YOUR CODE                             #
        ########################################################################
        return x



In [None]:
# Initialize Pytorch network
########################################################################
#                        START OF YOUR CODE                            #
########################################################################
torch_net = None
########################################################################
#                         END OF YOUR CODE                             #
########################################################################

print(torch_net)

To check your implementation we will now again compare the outputs of the two networks. For that, we need to load the weights from our network into the PyTorch network.

In [None]:
# Load the parameters from our model into the Pytorch model
torch_net.layer1.weight = nn.Parameter(net.layers[0].weight.t()) # transpose weight by .t()
torch_net.layer1.bias = nn.Parameter(net.layers[0].bias)
torch_net.layer2.weight = nn.Parameter(net.layers[2].weight.t()) # transpose weight by .t()
torch_net.layer2.bias = nn.Parameter(net.layers[2].bias)

# Perform forward pass
torch_y = torch_net(x)

# What will be the shape of output tensor torch_y?
print('Shape of output tensor torch_y:', torch_y.shape)

# Compare outputs using torch.allclose
outputs_same = torch.allclose(y, torch_y)
print('Network outputs identical: ', outputs_same)

Each `nn.Module` in PyTorch has a default initialization method for its parameters. Since initialization plays a critical role in how well the network learns, we will implement a custom initialization method that can also be called whenever we want to reset the network. We will sample the initial weights from a zero-mean Gaussian distribution with a small standard deviation (0.02 by default) and set the biases to zero.
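The `torch.nn.init` functions used here (`nn.init.normal_`, `nn.init.constant_`) modify a parameter tensor in place, as the trailing underscore indicates. The layer therefore keeps the same `Parameter` objects, unlike assigning a fresh `nn.Parameter`. The sketch below applies them to a hypothetical large layer, where the sample statistics of the new weights are easy to check:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer_demo = nn.Linear(1000, 1000)
weight_before = layer_demo.weight   # keep a reference to the Parameter object

nn.init.normal_(layer_demo.weight, mean=0.0, std=0.02)
nn.init.constant_(layer_demo.bias, 0.0)

print(weight_before is layer_demo.weight)    # True: same object, new values
print(layer_demo.weight.mean().item())       # close to 0
print(layer_demo.weight.std().item())        # close to 0.02
print(layer_demo.bias.abs().max().item())    # 0.0
```

Keeping the same objects matters once an optimizer holds references to the parameters.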

In [None]:
class TorchNet1(nn.Module):
    """
    PyTorch neural network. Network layers are defined in __init__ and forward
    pass implemented in forward.
    """

    def __init__(self, N_in, N_out):
        """
        Args:
          N_in: number of features in input layer
          N_out: number of features in output layer
        """
        
        super(TorchNet1, self).__init__()

        self.layer = nn.Linear(N_in, N_out)
        self.relu = nn.ReLU()

        self.reset_params()
        
    def reset_params(self, mean=0.0, std=0.02):
        """
        Initializes the parameters of the network
        """
        nn.init.normal_(self.layer.weight, mean=mean, std=std)
        nn.init.constant_(self.layer.bias, 0.0)
    
    def forward(self, x):
        """
        Performs forward pass through all layers of the network.

        Args:
            x: input tensor

        Returns:
            x: output tensor
        """

        x = self.layer(x)
        x = self.relu(x)

        return x

Now let's have a look at the parameter values and then reset them. To access the entire collection of learnable parameters, you can use the `nn.Module` method `.parameters()`.

****
**Task 1.12:** Reset the weights of the linear layer of the network below using mean 0 and standard deviation 1 and then print them.
(Hint: loop over `net1.parameters()` to access and print them all)
****

In [None]:
#Initialize a network
net1 = TorchNet1(2,1)

# Initial values of the parameters
print('Initial values of the network weights', net1.layer.weight.data)
print('Initial values of the network bias', net1.layer.bias.data)

########################################################################
#                        START OF YOUR CODE                            #
########################################################################

########################################################################
#                         END OF YOUR CODE                             #
########################################################################

# 2 Introducing a regression problem

Finally, we have gathered all the necessary tools to solve our first problem using a neural network.

In many areas of engineering, neural networks are used to approximate complex non-linear models when these models are either missing an analytical formulation, or are too computationally expensive to evaluate.

Starting off simple, the function we will try to approximate is the 2-dimensional Gaussian distribution:

$$
f(x, y)=\frac{1}{2 \pi \sigma^2} e^{-\left[\left(x-\mu_x\right)^2+\left(y-\mu_y\right)^2\right] /\left(2 \sigma^2\right)}
$$

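with variance $\sigma^2 = 50$ and means $\mu_{x} = \mu_{y} = 0$. (In the code below, the variable `sigma` holds this variance and is placed on the diagonal of the covariance matrix.)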



First, we generate a dataset on which to evaluate our network. We do this by sampling the function on a square grid spanning $[-2, 2] \times [-2, 2]$.

In [None]:
# Define the gaussian distribution function
mux = 0.
muy = 0.
sigma = 50  # variance sigma^2, placed on the diagonal of the covariance matrix
mean = [mux, muy]
cov = [[sigma, 0.0], [0.0, sigma]]
var = multivariate_normal(mean=mean, cov=cov)

# Number of grid points per axis
num_sample_points = 50

# Generate range of x and y values
x_range = np.linspace(-2, 2., num_sample_points)
y_range = np.linspace(-2, 2., num_sample_points)

# Generate grid coordinates that will be used to evaluate the gaussian on
x_nodes, y_nodes = np.meshgrid(x_range, y_range)
nodes = np.column_stack((x_nodes.ravel(), y_nodes.ravel()))

# Compute the function over the grid coordinates and store the values
ground_truth = np.zeros(len(nodes), dtype=np.float32)
for i, node in enumerate(nodes):
  gauss_value = var.pdf(node)
  ground_truth[i] = gauss_value

# Convert the NumPy array to a tensor
ground_truth = torch.Tensor(ground_truth)


And with this we have our dataset! The 2D grid input points are stored in the variable `nodes`, while the corresponding Gaussian values are stored in `ground_truth`. We can visualize the dataset using `matplotlib`.
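As a sanity check on the parameters, note that the data-generation cell passes `sigma` as the diagonal of the covariance matrix, i.e. as the variance $\sigma^2$. The sketch below works on a hypothetical coarse 5 × 5 grid. It compares the closed-form density with $\sigma^2 = 50$ against `scipy.stats.multivariate_normal`. It also shows that `.pdf` accepts all grid points in one call, so the per-point loop is optional.

```python
import numpy as np
from scipy.stats import multivariate_normal

variance = 50.0  # sigma^2, i.e. the value the notebook stores in `sigma`

# Hypothetical coarse 5 x 5 grid over [-2, 2] x [-2, 2]
grid = np.linspace(-2.0, 2.0, 5)
gx, gy = np.meshgrid(grid, grid)
points = np.column_stack((gx.ravel(), gy.ravel()))   # shape [25, 2]

# Closed-form density with mu_x = mu_y = 0
closed_form = np.exp(-(points[:, 0] ** 2 + points[:, 1] ** 2) / (2 * variance)) / (2 * np.pi * variance)

# SciPy density, evaluated on all points in a single vectorized call
scipy_pdf = multivariate_normal(mean=[0.0, 0.0], cov=[[variance, 0.0], [0.0, variance]]).pdf(points)

print(np.allclose(closed_form, scipy_pdf))  # True
print(closed_form.max())                    # peak at the origin: 1 / (2 * pi * 50), about 0.00318
```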

In [None]:
def plot_result_surface(x_nodes, y_nodes, z_values):
  if isinstance(z_values, torch.Tensor):
    z_values = z_values.detach().numpy()

  fig, ax = plt.subplots(subplot_kw={"projection": "3d"})


  # Plot the surface.
  surf = ax.plot_surface(x_nodes, y_nodes, np.reshape(z_values, x_nodes.shape), cmap=cm.coolwarm,
                        linewidth=0, antialiased=False)

  plt.show()

plot_result_surface(x_nodes, y_nodes, ground_truth)


To approximate the Gaussian distribution, we will use a two-layer network with a hidden dimension of 25.

****
**Question 2.1**: What should be the number of features of the input and output layers for this problem?

<font color='green'>Write your answer here</font>
****


****
**Task 2.2:** Define a PyTorch network with the dimensions mentioned above and evaluate it on the dataset.
*****

In [None]:
#Set the torch seed for reproducibility
torch.manual_seed(999)

# Initialize Pytorch network
########################################################################
#                        START OF YOUR CODE                            #
########################################################################
net = None
########################################################################
#                         END OF YOUR CODE                             #
########################################################################


# Convert the NumPy array of grid nodes to a PyTorch tensor
torch_nodes = torch.Tensor(nodes)

# Evaluate the network on the grid - Perform forward pass
########################################################################
#                        START OF YOUR CODE                            #
########################################################################
prediction=None
########################################################################
#                         END OF YOUR CODE                             #
########################################################################

print('Shape of the output:\n', prediction.shape)

# Plot the network's prediction
plot_result_surface(x_nodes, y_nodes, prediction)

Disappointingly, the result is far from a good approximation of the Gaussian function. This is because we have not trained the network yet: we are only looking at the output of a network with randomly initialized weights and biases. Without proper training, neural networks perform very poorly, regardless of the task.

Moreover, we currently have no way of quantifying how poorly our network approximates the target function. Therefore, we need a systematic method for measuring the error between the ground truth and the network's prediction.


# 3 Loss functions

Loss functions provide exactly this functionality, i.e. they quantify the error between the ground truth and the prediction. There are many types of loss functions, with different properties, suited to different kinds of problems. The most commonly used loss function for regression problems is the **Mean Squared Error (MSE)**, defined as:

$$
\mathrm{MSE}=\frac{1}{n} \sum_{i=1}^n\left(y_i-\hat{y}_i\right)^2
$$

where $y_{i}$ is the observed (ground-truth) value, $\hat{y}_{i}$ is the prediction and $n$ is the number of samples.
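Here is a small worked example with hypothetical values. For $y = (1, 2, 3)$ and $\hat{y} = (1, 1, 5)$ the squared errors are $0, 1, 4$, so $\mathrm{MSE} = 5/3 \approx 1.667$. The sketch checks this with `torch.nn.MSELoss`, whose default `reduction='mean'` averages the squared errors; `reduction='sum'` returns $5$ instead. It also shows a common pitfall. If the predictions have shape `[n, 1]` while the targets have shape `[n]`, element-wise subtraction broadcasts to an `[n, n]` matrix instead of pairing each $y_i$ with $\hat{y}_i$.

```python
import torch
import torch.nn as nn

y_obs = torch.tensor([1.0, 2.0, 3.0])   # hypothetical observed values
y_hat = torch.tensor([1.0, 1.0, 5.0])   # hypothetical predictions

mse_mean = nn.MSELoss()(y_hat, y_obs)            # default reduction='mean'
sse = nn.MSELoss(reduction="sum")(y_hat, y_obs)  # sum of squared errors
print(mse_mean.item(), sse.item())               # approximately 1.6667 and 5.0

# Shape pitfall: [3, 1] minus [3] broadcasts to [3, 3]
y_hat_col = y_hat.unsqueeze(1)
print((y_hat_col - y_obs).shape)                       # torch.Size([3, 3])
# Reshaping the predictions restores the intended element-wise pairing
print((y_hat_col.reshape(y_obs.shape) - y_obs).shape)  # torch.Size([3])
```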

PyTorch provides definitions of a myriad of loss functions, again saving some time for the user (check the documentation [here](https://pytorch.org/docs/stable/nn.html#loss-functions)).

Now we can use the MSE loss to measure how poorly our network is approximating the Gaussian. As usual, we will first define our own version and then compare it with the PyTorch implementation.


****
**Task 3.1:** Define a function that computes the MSE loss between the ground truth and the predicted values.
****

In [None]:
def MSE_loss (y,hat_y):
    ########################################################################
    #                        START OF YOUR CODE                            #
    ########################################################################
    loss = None
    ########################################################################
    #                         END OF YOUR CODE                             #
    ########################################################################
    return loss

Now we verify our implementation with the PyTorch function.

In [None]:
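# Define PyTorch's MSE loss under a different name so that our MSE_loss function is not overwritten
torch_MSE_loss = torch.nn.MSELoss()

# The network outputs shape [N, 1] while ground_truth has shape [N]: reshape the
# prediction so that the two tensors match element-wise (no broadcasting)
pred = prediction.reshape(ground_truth.shape)

# Compute the MSE over the ground truth and the network prediction with both implementations
torch_loss = torch_MSE_loss(pred, ground_truth)
loss = MSE_loss(ground_truth, pred)

print(f"Average loss over whole dataset: {torch_loss.item():.6f}")
print(f"MSE outputs identical: {torch.allclose(loss, torch_loss)}")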

Let's see if we can improve that using a simple training procedure.



# 4 First steps toward training
We can work our way through a simple training loop in which the network parameters are randomly re-drawn as long as the loss remains above a desired threshold. That means we will repeat the following training steps:

1.   Compute forward pass of the network.
2.   Compute loss between prediction and ground truth values.
3.   If the loss is below a certain threshold, stop training.
4.   Otherwise, randomly update the network's parameters.


****
**Task 4.1:** Complete the code below to implement the training loop.
****

In [None]:
# Define the error threshold
epsilon = 1e-5

# Iteration counter and maximum number of training iterations
iter_num = 0
max_iters = 50

while loss > epsilon and iter_num < max_iters:
    iter_num += 1

    # Randomly re-draw the weights and biases of every linear layer from uniform
    # distributions: weights in (-0.025, 0.025], biases in (-0.05, 0.05]
    for layer in net.children():
        if isinstance(layer, nn.Linear):
            layer.weight = nn.Parameter(0.025 * (-2 * torch.rand(layer.weight.shape) + 1))
            layer.bias = nn.Parameter(0.05 * (-2 * torch.rand(layer.bias.shape) + 1))

    # Evaluate the network on the grid
    ########################################################################
    #                        START OF YOUR CODE                            #
    ########################################################################
    prediction = None
    ########################################################################
    #                         END OF YOUR CODE                             #
    ########################################################################

    # Compute the loss
    ########################################################################
    #                        START OF YOUR CODE                            #
    ########################################################################
    loss = None
    ########################################################################
    #                         END OF YOUR CODE                             #
    ########################################################################

    print(f"Iteration: {iter_num}, Loss: {loss:.8f}")

# Plot the network's prediction
plot_result_surface(x_nodes, y_nodes, prediction)

While the result is still not perfect, it is much better than the initial one, even without applying any smart learning rule.
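Note that the random-search loop above only keeps the most recent random draw. If it stops because `max_iters` is reached, the plotted network is simply the last sample, not necessarily the best one seen. A common variant of random search remembers the best parameters found so far. The following sketch shows that variant on a hypothetical toy problem: fitting $\sin(x)$ with a small `nn.Sequential` network. The names, sampling ranges and number of trials are illustrative assumptions, not part of the lab.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy problem: approximate sin(x) on [-2, 2]
inputs = torch.linspace(-2.0, 2.0, 100).unsqueeze(1)   # shape [100, 1]
targets = torch.sin(inputs)                            # shape [100, 1]
toy_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()

best_loss = float("inf")
best_state = None
for trial in range(200):
    with torch.no_grad():
        # Randomly re-draw the parameters of every linear layer (in place)
        for module in toy_net.modules():
            if isinstance(module, nn.Linear):
                module.weight.uniform_(-1.0, 1.0)
                module.bias.uniform_(-1.0, 1.0)
        trial_loss = loss_fn(toy_net(inputs), targets).item()

    # Remember a copy of the best parameters seen so far
    if trial_loss < best_loss:
        best_loss = trial_loss
        best_state = copy.deepcopy(toy_net.state_dict())

# Restore the best parameters instead of keeping the last random draw
toy_net.load_state_dict(best_state)
with torch.no_grad():
    final_loss = loss_fn(toy_net(inputs), targets).item()
print(f"best loss: {best_loss:.6f}, loss after restoring: {final_loss:.6f}")
```

The `copy.deepcopy` is needed because `state_dict()` returns tensors that share memory with the live parameters. Without the copy, the stored "best" values would be overwritten by every later in-place draw.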

In the next labs, we will see how to implement learning algorithms that update the network parameters so that the networks converge much faster and achieve substantially better results.


**That's all for now, see you in the next lab!**

**Feedback Form:** please fill in the following form to provide feedback https://forms.office.com/e/zFR5Nx0NFC