# 1. Introduction

This lab was created for the course _Machine Learning for Structured Data_ at the University of Amsterdam. By completing this lab you will gain basic knowledge of PyTorch, which should help you spend less time debugging during the assignments.

### What is PyTorch?

PyTorch is an open-source deep learning framework designed to be flexible and modular, used both in research and production. It supports high-level functionalities such as tensor computation and GPU acceleration. PyTorch is developed and maintained by Meta's AI Research lab.

### Why PyTorch?

PyTorch has certain advantages:
- modular
- dynamic computational graphs
- automatic differentiation
- GPU acceleration
- popularity (both in research and production)

By the end of this lab, you will have experienced most of these advantages and will understand what they mean.

### Installation and Setup for Assignments

For all the assignments we will provide you with a _requirements.txt_ file containing all the required packages and their versions for running the files.

To install the packages in the Python environment associated with the currently active Python interpreter:

```pip install -r requirements.txt ```

If you are using [Anaconda](https://docs.anaconda.com/free/anaconda/install/) to manage your environments run in your terminal:

```conda create --name mlsd --file requirements.txt```

In this way, we ensure that everyone uses the same versions of packages.

Now, for the purpose of this lab, run the following cell to import everything that we need.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split


# 2. Tensors

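A Tensor is a multi-dimensional array/matrix. A Tensor stores and manipulates data more efficiently than nested Python lists. The simplest way to allocate memory for a tensor is `torch.empty`, which reserves the memory without initializing it, so the printed values are arbitrary: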

In [None]:
# 2d tensor
x = torch.empty(2, 3)
print(x)

In [None]:
# 3D tensor
x = torch.empty(3, 2, 4)
print(x)

In [None]:
# 4D tensor
x = torch.empty(3, 2, 3, 4)
print(x)

Let's take the last tensor and explore it.

In [None]:
# TODO: print the shape of x


In [None]:
# TODO: print the first element in the tensor and its shape


In [None]:
# TODO: print x[0][0], its transpose and the shape of the transpose


In [None]:
# TODO: check the documentation and create a tensor containing just zeros and just ones of different shapes


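__Indexing__ in PyTorch follows the same conventions as indexing in NumPy. The example below uses a 3D tensor, so every element is addressed by three indices, one more than for a matrix.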

In [None]:
x = torch.Tensor([
                  [[1, 2], [3, 4]],
                  [[5, 6], [7, 8]], 
                  [[9, 10], [11, 12]] 
                 ])
x

In [None]:
print(x[0, :]) # equivalent to x[0]

In [None]:
x[:, 0, 0] # in words: loop over the first dimension and always select the element x[i][0][0]

In [None]:
# get a list of the second elements of the first rows
x[:, 0, 1]
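
The slice notation above can be checked against an explicit Python loop. The sketch below rebuilds the same values with `torch.arange` under the illustrative name `t` (not part of the lab) and compares `t[:, 0, 1]` with the loop version.

```python
import torch

# Same values as the tensor above, stored under an illustrative name
t = torch.arange(1, 13).reshape(3, 2, 2)

# Slice: for every i along the first dimension, take t[i][0][1]
sliced = t[:, 0, 1]

# The same selection written as an explicit loop over the first dimension
looped = torch.stack([t[i][0][1] for i in range(t.shape[0])])

print(sliced)                       # the values 2, 6 and 10
print(torch.equal(sliced, looped))  # True
```

The slice is usually preferable because it avoids the Python loop entirely.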

In [None]:
# TODO: get a list of all the second elements


__Flattening__ a tensor refers to converting a tensor with multiple dimensions into a one-dimensional tensor.

In [None]:
x = torch.empty(3,2,4)
x_flattened = x.flatten()
print("Original tensor:")
print(x)
print(f"Shape: {x.shape}")
print("Flattened tensor:")
print(x_flattened)
print(f"Shape: {x_flattened.shape}")

__Unsqueezing__ a tensor adds an additional dimension to a tensor at a specific position. More precisely, it increases the tensor's rank (number of dimensions) by one.

In [None]:
x = torch.tensor([1,2,3,4])
print(f"Shape: {x.shape}")
x_unsqueezed_0 = x.unsqueeze(0)
x_unsqueezed_1 = x.unsqueeze(-1) # equivalent to x.unsqueeze(1) because it is the last dimension of this tensor
print("Original tensor:")
print(x)
print("Unsqueezed tensor on the first dimension:")
print(x_unsqueezed_0)
print("Unsqueezed tensor on the last dimension:")
print(x_unsqueezed_1)
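
The cell above prints the unsqueezed tensors but not their shapes. As a small sketch (the names `v`, `row` and `col` are illustrative), the shapes can be inspected directly, and `squeeze`, which is used later in this lab, removes a dimension of size 1 again.

```python
import torch

v = torch.tensor([1, 2, 3, 4])   # shape (4,)
row = v.unsqueeze(0)             # shape (1, 4): new first dimension
col = v.unsqueeze(-1)            # shape (4, 1): new last dimension
print(row.shape, col.shape)

# squeeze(dim) removes a dimension of size 1, undoing the unsqueeze
print(row.squeeze(0).shape)
print(torch.equal(col.squeeze(-1), v))  # True
```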

In [None]:
x = torch.empty(5,3,4)
# TODO: check the documentation to reshape x into a (5,12) tensor


__Element-wise operations__ on tensors include addition, subtraction, multiplication, and division. Let's see some examples.

In [None]:
x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])
summing_tensors = x + y
substr_tensors = x - y
mult_tensors = x * y # this is not matrix multiplication
div_tensors = x / y
print('x')
print(x)
print('y')
print(y)
print("Element-wise summing:")
print(summing_tensors)
print("Element-wise subtraction:")
print(substr_tensors)
print("Element-wise multiplication:")
print(mult_tensors)
print("Element-wise division:")
print(div_tensors)

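__Multiplying matrices__ is easy in PyTorch. All you have to do is match their dimensions! This becomes a bit more involved with 3D or 4D tensors: the `@` operator multiplies the last two dimensions and treats the leading dimensions as batch dimensions.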

In [None]:
x = torch.empty(10, 32, 64)
y = torch.empty(10, 64, 512)
result = x @ y
print(result.shape)
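
To make the difference between element-wise multiplication (`*`) and matrix multiplication (`@`) concrete, here is a minimal sketch on two small matrices. The names `a` and `b` are illustrative and hold the same values as the element-wise example earlier.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

elementwise = a * b   # multiplies entries at matching positions
matmul = a @ b        # rows of a times columns of b

print(elementwise)    # [[5, 12], [21, 32]]
print(matmul)         # [[19, 22], [43, 50]]
```

For example, the top-left entry of `a @ b` is 1*5 + 2*7 = 19, whereas the top-left entry of `a * b` is simply 1*5 = 5.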

In [None]:
x = torch.empty(10, 32, 64)
y = torch.empty(512, 64)
# TODO: match dimensions to multiply x and y and obtain a (10, 32, 512) tensor. Hint: transpose.


In [None]:
x = torch.empty(64, 8, 256)
y = torch.empty(512, 10)
# TODO: match dimensions to multiply x and y and obtain a (256, 10) tensor. Hint: 64*8=512.


PyTorch has __built-in functions__ which are incredibly useful because they are optimized, simple, and implemented with numerical stability in mind (e.g. torch.log, torch.exp, torch.softmax). Such built-in functions include sum, mean, max, min and so on. By using these functions, you take advantage of the computational power of PyTorch: they can be applied along a chosen dimension of a tensor without manually looping over the other dimensions, which speeds up the computation.
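
As a hedged illustration of the numerical-stability point, the sketch below compares a hand-written softmax with `torch.softmax` on deliberately large inputs. The values in `logits` are made up for this example.

```python
import torch

logits = torch.tensor([1000.0, 1001.0, 1002.0])

# Naive softmax: exp(1000) overflows to inf in float32, and inf / inf gives nan
naive = torch.exp(logits) / torch.exp(logits).sum()

# The built-in softmax stays finite for the same input
stable = torch.softmax(logits, dim=0)

print(naive)   # all entries are nan
print(stable)  # finite probabilities that sum to 1
```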

In [None]:
x = torch.Tensor([
                  [[1, 2], [3, 4]],
                  [[5, 6], [7, 8]], 
                  [[9, 10], [11, 12]] 
                 ])
print(x)
print(x.shape)

In [None]:
x.sum(dim=0)

In [None]:
print("What is happening behind the scenes for x.sum(dim=0):")
print(x[:, 0, 0])
print(x[:, 0, 0].sum())

print(x[:, 0, 1])
print(x[:, 0, 1].sum())

print(x[:, 1, 0])
print(x[:, 1, 0].sum())

print(x[:, 1, 1])
print(x[:, 1, 1].sum())

In [None]:
x.mean(dim=1)

In [None]:
x.min(dim=2)
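
Unlike `sum` and `mean`, calling `min` (or `max`) with a `dim` argument returns two tensors: the values and the indices where they were found. A minimal sketch on a small illustrative tensor `t`:

```python
import torch

t = torch.tensor([[3.0, 1.0], [2.0, 5.0]])
result = t.min(dim=1)

print(result.values)   # smallest entry of each row: 1. and 2.
print(result.indices)  # position of that entry within each row: 1 and 0
```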

In [None]:
torch.manual_seed(1)
x = torch.randint(low=1, high=11, size=(3, 2, 4))
# calculate the mean of all the last rows of each of the three (2,4) elements in the tensor
last_rows = x[:, -1, :]
mean_last_rows = torch.mean(last_rows, dim=1, dtype=float)
print(x)
print(mean_last_rows)

In [None]:
torch.manual_seed(1)
x = torch.randint(low=1, high=11, size=(3, 2, 5), dtype=torch.float64)
# TODO: calculate the mean of all the first rows of each of the three (2,5) elements in the tensor


# 3. Gradients in PyTorch

PyTorch also supports automatic differentiation. When a tensor is created with the flag `requires_grad=True`, PyTorch records the operations applied to it. Calling backward() on a scalar result then computes the gradients with respect to that tensor, which are stored in its grad attribute.

In [None]:
x = torch.tensor([2.], requires_grad=True)
y = 5*x*x + 1 # 5x^2 + 1
y.backward()
print(x.grad) # d(y)/d(x) = d(5x^2+1)/d(x) = 10x = 10 * [2.] = [20.]
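
One detail worth knowing about the grad attribute is that gradients accumulate across calls to backward(). The sketch below repeats the computation above twice on an illustrative tensor `w` and then resets the stored gradient.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

(5 * w * w + 1).backward()
first = w.grad.clone()       # 10 * 2 = 20

(5 * w * w + 1).backward()   # the second call adds to the stored gradient
second = w.grad.clone()      # 20 + 20 = 40

w.grad.zero_()               # reset before the next computation
print(first, second, w.grad)
```

This accumulation is the reason training loops zero out the gradients before each backward pass.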

In [None]:
x = torch.linspace(start=-12, end=10, steps=100).requires_grad_()
y = torch.sigmoid(x) # 1/(1+e^(-x))
scalar_y = torch.sum(y)
scalar_y.backward()
gradients = x.grad

# plot the function y and its first derivative function
plt.plot(x.detach().numpy(), y.detach().numpy(), label='sigmoid function')
plt.plot(x.detach().numpy(), gradients.detach().numpy(), label="first derivative of sigmoid")
plt.legend()
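
The sigmoid has a known closed-form derivative, sigmoid(x) * (1 - sigmoid(x)), so the gradients computed above can be checked against it. The sketch below uses the illustrative names `x_check` and `y_check`. Summing before backward() works here because each output depends only on its own input, so the gradient of the sum at position i is the derivative of the sigmoid at x_i.

```python
import torch

x_check = torch.linspace(start=-12, end=10, steps=100).requires_grad_()
y_check = torch.sigmoid(x_check)
y_check.sum().backward()

# Closed-form derivative of the sigmoid, evaluated at the same points
analytic = (y_check * (1 - y_check)).detach()

print(torch.allclose(x_check.grad, analytic, atol=1e-6))  # True
```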

In [None]:
x = torch.linspace(start=-12, end=10, steps=100).requires_grad_()
# TODO: add another function below to check what its first derivative looks like
y = ... # suggestion: take inspiration from the documentation (e.g. torch.relu(x), nn.ELU()(x), nn.PReLU()(x))
scalar_y = torch.sum(y)
scalar_y.backward()
gradients = x.grad

# plot the function y and its first derivative function
plt.plot(x.detach().numpy(), y.detach().numpy(), label='your function')
plt.plot(x.detach().numpy(), gradients.detach().numpy(), label='its first derivative')
plt.legend()

# 4. Modularity in PyTorch

PyTorch is a very modular framework. This makes code easy to read and interpret, and makes it easy to connect pieces of networks together. The code below is a quick demo of how a class works and how to declare and access its variables. It is a simple class called MyMeal which takes a list of ingredient strings as input and encodes them as natural numbers. The first thing you can do with a meal is prepare it, meaning that a combination of the ingredients is randomly selected and set as the meal. The second thing you can do with a meal is eat it, which first checks whether the meal was prepared.

It is strange to use PyTorch for this task, but it is a good reminder of how classes work. Notice how different torch functions are used.

In [None]:
class MyMeal:
    def __init__(self, ingredients) -> None:
        self.ingredients = ingredients
        self.ing_encodings = torch.tensor(list(range(len(ingredients))))
        self.meal = torch.zeros(len(ingredients))
    def prepare(self, number_of_ingredients = 3):
        combinations = torch.combinations(self.ing_encodings, r=number_of_ingredients)
        rand_idx = torch.randint(high=combinations.size(0), size=(1,)).item()
        self.meal = combinations[rand_idx]
    def eat(self):
        if not torch.all(self.meal == 0).item():
            selected_ingredients = [self.ingredients[idx] for idx in self.meal]
            self.meal = torch.zeros(len(self.ingredients))
            return selected_ingredients
        else:
            return 'Prepare it first!'

In [None]:
ingredients = ['Flour', 'Sugar', 'Eggs', 'Butter']
meal = MyMeal(ingredients)
meal.eat()

In [None]:
meal.prepare()
meal.eat()

For more on how classes work in Python, check [__Practical Programming__](https://pragprog.com/titles/gwpy3/practical-programming-third-edition/).

# 5. Neural Networks

Neural Networks are everywhere in artificial intelligence. The main advantage of a neural network is that it can learn to approximate an unknown function, for example one that separates the data into classes. The network adjusts its parameters (weights and biases) during the training phase to minimize the difference between the network's predicted output and the true output of the function.
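
To make "adjusting parameters to minimize the difference" concrete, here is a minimal sketch with a single weight. The target function y = 3x, the learning rate, and all names are assumptions chosen for illustration, not part of the lab's model.

```python
import torch

inputs = torch.tensor([1.0, 2.0, 3.0, 4.0])
targets = 3.0 * inputs                      # the "unknown" function is y = 3x

w = torch.tensor(0.0, requires_grad=True)   # single trainable weight
lr = 0.01                                   # illustrative learning rate

for _ in range(200):
    # Mean squared difference between prediction and true output
    loss = ((w * inputs - targets) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                    # gradient descent step
    w.grad.zero_()                          # reset the accumulated gradient

print(w.item())  # close to 3.0
```

A full neural network follows the same loop, only with many more parameters and an optimizer object performing the update step.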

Putting together everything from the basics of PyTorch, we can now use it to train a neural network for a classification task! In this section we will:
- generate our own training and validation data
- visualize the data
- define our own model architecture
- train our model
- validate the model
- visualize what has been learned by the network
- save the model and load it for more predictions


### Generate Data

In [None]:
np.random.seed(123)
torch.manual_seed(123)

In [None]:
def sample_circular_data(radius, noise_scale, num_samples):
    angles = np.random.uniform(0, 2 * np.pi, num_samples)
    x = radius * np.cos(angles)
    y = radius * np.sin(angles)
    noise = np.random.normal(scale=noise_scale, size=(num_samples, 2))
    x += noise[:, 0]
    y += noise[:, 1]
    return np.column_stack((x, y))

In [None]:
num_samples = 1000

samples_class1 = sample_circular_data(radius=5, noise_scale=1.5, num_samples=num_samples)
samples_class2 = sample_circular_data(radius=10, noise_scale=1.5, num_samples=num_samples)

# Create dataset with classes
data = np.concatenate((samples_class1, samples_class2), axis=0)
labels = np.concatenate((np.zeros(num_samples), np.ones(num_samples)))

In [None]:
# Split data into training and validation sets
train_data, val_data, train_labels, val_labels = train_test_split(data, labels, test_size=0.2, random_state=42)

In [None]:
# Convert numpy arrays to tensors
train_x = torch.tensor(train_data, dtype=torch.float32)
train_y = torch.tensor(train_labels, dtype=torch.float32).view(-1,1)
val_x = torch.tensor(val_data, dtype=torch.float32)
val_y = val_labels

### Visualize the data

Notice how the data is not linearly separable. A linear classifier would perform poorly on this classification task. This is where Neural Networks come in handy.

In [None]:
plt.scatter(samples_class1[:, 0], samples_class1[:, 1], label='Class 1')
plt.scatter(samples_class2[:, 0], samples_class2[:, 1], label='Class 2')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Data Visualization')
plt.axis('equal')
plt.legend()
plt.show()

### Define the Neural Network

Our task is to learn a model to separate the data we visualized above into two classes. Therefore, this is a binary classification task.

We define our own architecture for the Neural Network. It is a simple neural network with:
- three linear layers
    - input size is 2 (since each input has an x-value and a y-value as features)
    - output size is 1, since this is a binary classification task and the single output is the probability of an element belonging to a class
    - 128 and 64 are the numbers of hidden units
- a dropout layer that randomly drops each node with probability 0.3 during training
- ReLU as the activation function
- Sigmoid for the final layer, because it is a binary classification task
(note that the Sigmoid function produces the probability of an object being in a class)

Feel free to change the number of Linear layers or hidden units.

In the __forward__ function, the output of each hidden layer is passed through the activation function, and the output of the final layer is passed through the Sigmoid function to produce the probability of the input being in the class.

In [None]:
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(2, 128)
        self.fc2 = nn.Linear(128, 64)
        self.dropout = nn.Dropout(0.3)
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = torch.sigmoid(self.fc3(x))
        return x

In [None]:
# Create an instance of the network
model = SimpleNet()

In [None]:
print(model)
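
A quick sanity check on an architecture is to count its trainable parameters. The sketch below uses an `nn.Sequential` stand-in with the same layer sizes as SimpleNet (an assumption made so that the example is self-contained; the name `layers` is illustrative).

```python
import torch.nn as nn

# Stand-in with the same layer sizes as SimpleNet
layers = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1), nn.Sigmoid(),
)

# Each linear layer has in_features * out_features weights plus out_features biases
num_params = sum(p.numel() for p in layers.parameters())
print(num_params)  # 384 + 8256 + 65 = 8705
```

Changing the number of hidden units changes this count, which is a useful way to compare model sizes.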

### Train Model

It's time to train the model. We will use a Binary Cross Entropy Loss (i.e. nn.BCELoss()) and a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01.

__Important:__ set the model in training mode (i.e. model.train()) when training and in evaluation mode (i.e. model.eval()) when validating or testing the model. Why? For example, Dropout layers are only active during training. Evaluating in training mode would randomly drop units and misrepresent the model's performance, so setting the mode correctly gives you the true performance of your model.
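
The effect of the two modes can be observed on a Dropout layer in isolation. This is a minimal sketch; the names `drop` and `ones` and the seed are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(0.3)
ones = torch.ones(1000)

drop.train()
out_train = drop(ones)   # some entries zeroed, the rest scaled by 1 / (1 - 0.3)

drop.eval()
out_eval = drop(ones)    # dropout does nothing in evaluation mode

print((out_train == 0).float().mean().item())  # roughly 0.3
print(torch.equal(out_eval, ones))             # True
```

Calling model.train() or model.eval() on a network sets this mode for all of its sub-modules at once.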

In [None]:
# Define the loss function and optimizer
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
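
For reference, the Binary Cross Entropy for a prediction p and label y is -(y * log(p) + (1 - y) * log(1 - p)), averaged over the samples. The sketch below checks this formula against nn.BCELoss on made-up values; `probs` and `targets` are illustrative names.

```python
import torch
import torch.nn as nn

probs = torch.tensor([0.9, 0.2, 0.7])     # illustrative predicted probabilities
targets = torch.tensor([1.0, 0.0, 1.0])   # illustrative true labels

# Binary cross entropy written out by hand, averaged over the samples
manual = -(targets * torch.log(probs)
           + (1 - targets) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, targets)

print(manual.item(), builtin.item())
```

Both values agree, which also shows why nn.BCELoss expects probabilities (the Sigmoid output) rather than raw scores.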

In [None]:
# Train the network
num_epochs = 2000

for epoch in range(num_epochs):

    # TODO: set the model in training mode
    

    # Step 1: zero out the gradients of the parameters that are being optimized
    optimizer.zero_grad()

    # Step 2: forward pass
    outputs = model(train_x)
    loss = loss_fn(outputs, train_y)

    # Step 3: backward pass and optimization step
    loss.backward()
    optimizer.step()

    # TODO: set the model in eval mode
    
    
    with torch.no_grad():
        val_outputs = model(val_x)
        val_predictions = (val_outputs >= 0.5).squeeze().numpy()
    
    val_accuracy = (val_predictions == val_y).mean()


    if (epoch+1) % 100 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}, Val: {val_accuracy}")

In [None]:
# Plot the decision boundary
x_min, x_max = data[:, 0].min() - 1, data[:, 0].max() + 1
y_min, y_max = data[:, 1].min() - 1, data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
grid_tensor = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
model.eval()  # disable dropout so the plotted boundary is deterministic
with torch.no_grad():
    Z = model(grid_tensor).numpy().reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(samples_class1[:, 0], samples_class1[:, 1], label='Class 1')
plt.scatter(samples_class2[:, 0], samples_class2[:, 1], label='Class 2')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Decision Boundary')
plt.legend()
plt.show()

# 6. Save and Load a Model (reusing the pretrained weights)

### Save Model

In [None]:
# Save the trained model
torch.save(model.state_dict(), 'model.pth')

### Load Model weights

In [None]:
# Load the saved model
loaded_model = SimpleNet()
loaded_model.load_state_dict(torch.load('model.pth'))
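
To confirm that saving and loading a state dict restores the exact weights, a round trip can be run on a small stand-in layer. In this sketch the layer, the temporary directory, and the file name `layer.pth` are all illustrative assumptions.

```python
import os
import tempfile

import torch
import torch.nn as nn

layer = nn.Linear(2, 1)   # small stand-in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "layer.pth")   # hypothetical file name
    torch.save(layer.state_dict(), path)

    restored = nn.Linear(2, 1)                  # same architecture, fresh weights
    restored.load_state_dict(torch.load(path))

print(torch.equal(layer.weight, restored.weight))  # True
print(torch.equal(layer.bias, restored.bias))      # True
```

Note that the state dict only contains the parameters, so the receiving model must be created with the same architecture first.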

### Use the weights for predictions

In [None]:
# TODO: generate some play test data in the same way as above and check how your loaded model performs on it

num_samples = 500

new_samples_class1 = ...
new_samples_class2 = ...

# Create dataset with classes
new_data = np.concatenate((new_samples_class1, new_samples_class2), axis=0)
data_y = np.concatenate((np.zeros(num_samples), np.ones(num_samples)))

data_x = torch.tensor(new_data, dtype=torch.float32)

loaded_model.eval()

with torch.no_grad():
    outputs = loaded_model(data_x)
    predictions = (outputs >= 0.5).squeeze().numpy()

accuracy = (predictions == data_y).mean()
print(f"Accuracy: {accuracy}")

# 7. More resources

- [Official PyTorch Tutorial](https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html)
- [Stanford Notebook on PyTorch](https://web.stanford.edu/class/cs224n/materials/CS224N_PyTorch_Tutorial.html)
- [UvA PyTorch Notebook](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.html)
- [Kaggle notebook on how gradient descent works in neural networks](https://www.kaggle.com/code/jhoward/how-does-a-neural-net-really-work)