<div class="alert alert-block alert-info">
<b>Deadline:</b> February 4, 2026 (Wednesday) 23:59
</div>

# Exercise 2. Recommender system

In this exercise, your task is to design a recommender system.

## Learning goals:
* Practise tuning a neural network model by using different regularization methods.

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import tools
import data

In [2]:
skip_training = True  # Set this flag to True before validation and submission

In [3]:
# During evaluation, this cell sets skip_training to True
# skip_training = True

import tools, warnings
warnings.showwarning = tools.customwarn

In [4]:
# When running on your own computer, you can specify the data directory by:
# data_dir = tools.select_data_dir('/your/local/data/directory')
data_dir = tools.select_data_dir()

The data directory is /coursedata


In [5]:
# Select the device for training (use GPU if you have one)
#device = torch.device('cuda:0')
device = torch.device('cpu')

In [6]:
if skip_training:
    # The models are always evaluated on CPU
    device = torch.device("cpu")

## Ratings dataset

We will train the recommender system on a dataset in which each element consists of three values:
* `user_id` - id of the user (the smallest user id is 1)
* `item_id` - id of the item (the smallest item id is 1)
* `rating` - rating given by the user to the item (ratings are integer numbers between 1 and 5).

The recommender system needs to predict the rating for any given pair of `user_id` and `item_id`.

We measure the quality of the predicted ratings using the mean-squared error (MSE) loss:
$$
  \frac{1}{N}\sum_{i=1}^N (r_i - \hat{r}_i)^2
$$
where $r_i$ is a real rating and $\hat{r}_i$ is a predicted one.

Note: The predicted rating $\hat{r}_i$ does not have to be an integer number.
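
As a small illustration of the loss above, the sketch below computes the MSE both directly from the formula and with PyTorch's built-in `F.mse_loss`. The rating values are made up for the example and do not come from the dataset.

```python
import torch
import torch.nn.functional as F

# Toy ratings (assumed values, for illustration only)
ratings = torch.tensor([5.0, 3.0, 4.0, 1.0])       # r_i
predictions = torch.tensor([4.5, 3.5, 4.0, 2.0])   # \hat{r}_i, need not be integers

# Direct implementation of the formula: mean of the squared differences
mse_manual = ((ratings - predictions) ** 2).mean()

# The same quantity computed by the built-in loss
mse_builtin = F.mse_loss(predictions, ratings)

print(mse_manual.item(), mse_builtin.item())
```

Both values should agree: the squared errors are 0.25, 0.25, 0 and 1, so the mean is 0.375.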

In [12]:
trainset = data.RatingsData(root=data_dir, train=True)
testset = data.RatingsData(root=data_dir, train=False)

In [13]:
# Print one sample from the dataset
x = trainset[0]
print(f'user_id={x[0]}, item_id={x[1]}, rating={x[2]}')


user_id=1, item_id=1, rating=5


## Model

You need to design a recommender system model with the API described in the cell below.

Hints on the model architecture:
* You need to use a [torch.nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html?highlight=embedding#torch.nn.Embedding) layer to convert the inputs `user_ids` and `item_ids` into useful representations. The idea of the embedding layer is to represent similar users with vectors that are close to each other. The original integer ids are not suitable for that. With an embedding layer, such representations are learned automatically during training.
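
A minimal sketch of how an embedding layer maps integer ids to learnable vectors is shown below. The sizes are toy values chosen for illustration. Because ids start from 1, the table here is allocated with one extra row, so that row 0 is simply unused; this is one possible convention, not the only one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_users, embedding_dim = 5, 3  # toy sizes, chosen for illustration

# Ids start from 1, so allocate n_users + 1 rows; row 0 stays unused
user_emb = nn.Embedding(n_users + 1, embedding_dim)

user_ids = torch.tensor([1, 3, 3, 5])
vectors = user_emb(user_ids)   # looks up one learnable row per id

print(vectors.shape)                        # one vector of size embedding_dim per id
print(torch.equal(vectors[1], vectors[2]))  # the same id always gives the same vector
```

The rows of the table are ordinary parameters, so they are updated by the optimizer together with the rest of the model.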

### Model tuning

In this exercise, you need to tune the architecture of your model to achieve the best performance on the provided test set. You will notice that overfitting is a severe problem for this data: the model can easily overfit the training set, which results in poor accuracy on out-of-training (test) data.

You need to find an optimal combination of the hyperparameters, with some hyperparameters corresponding to the regularization techniques that we studied in the lecture.

The hyperparameters that you are advised to consider:
* Learning rate value and learning rate schedule (decreasing the learning rate often has a positive effect on the model performance)
* Number of training epochs
* Network size
* Weight decay
* Early stopping
* Dropout
* Increasing the amount of data:
  * Data augmentation
  * Injecting noise

You can tune the hyperparameters by, for example, grid search, random search or manual tuning. In that case, you can use the `architecture` argument to specify the hyperparameters that define the architecture of your network. After you have tuned the hyperparameters, set the default value of this argument to the optimal set of hyperparameters so that the best architecture is used in the accuracy tests.
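
A grid search over `architecture` dictionaries could be organized roughly as in the sketch below. The helper `grid_search`, the keys `embedding_dim` and `dropout`, the candidate values, and the `dummy_evaluate` function are all assumptions made for illustration. In practice, `evaluate` would train a model with the given architecture and return its test MSE.

```python
import itertools

def grid_search(search_space, evaluate):
    """Try every combination in search_space and return (best_arch, best_score).

    `evaluate` is a hypothetical callable: it receives an architecture dict
    and returns a score where lower is better (for example, the test MSE).
    """
    names = list(search_space)
    best_arch, best_score = None, float('inf')
    for values in itertools.product(*(search_space[n] for n in names)):
        arch = dict(zip(names, values))
        score = evaluate(arch)
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# Candidate values are assumptions, not recommendations
search_space = {
    'embedding_dim': [8, 16, 32],
    'dropout': [0.0, 0.1, 0.3],
}

# Stand-in for real training so that the sketch runs on its own
def dummy_evaluate(arch):
    return abs(arch['embedding_dim'] - 16) / 100 + abs(arch['dropout'] - 0.1)

best_arch, best_score = grid_search(search_space, dummy_evaluate)
print(best_arch, best_score)
```

The cost of grid search grows multiplicatively with every added hyperparameter, so random search or manual tuning may be more practical when many hyperparameters are varied at once.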

Note:
* The number of points that you will get from this exercise depends on the MSE loss on the test set:
  * below 1.00: 1 point
  * below 0.95: 2 points
  * below 0.92: 3 points
  * below 0.90: 4 points
  * below 0.89: 5 points
  * below 0.88: 6 points 
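
The thresholds above can be turned into a small helper for checking a result during tuning. This is only a sketch of the table as written, with a hypothetical function name; it is not the grading code.

```python
def points_for_mse(test_mse):
    """Map a test MSE to points using the thresholds listed above.

    Each threshold that the MSE is strictly below contributes one point.
    """
    thresholds = [1.00, 0.95, 0.92, 0.90, 0.89, 0.88]
    return sum(test_mse < t for t in thresholds)

print(points_for_mse(0.93))   # below 1.00 and 0.95 only
print(points_for_mse(0.875))  # below every threshold
```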

In [39]:
class RecommenderSystem(nn.Module):
    def __init__(self, n_users, n_items, architecture=None):
        super(RecommenderSystem, self).__init__()
        
        # Default hyperparameters based on typical tuning for this task
        if architecture is None:
            architecture = {
                'embedding_dim': 32,
                'dropout': 0.1
            }
        
        self.embedding_dim = architecture.get('embedding_dim', 32)
        dropout_p = architecture.get('dropout', 0.1)

        # User and Item Embeddings
        # We use n+1 because IDs start from 1
        self.user_emb = nn.Embedding(n_users+1, self.embedding_dim)
        self.item_emb = nn.Embedding(n_items+1, self.embedding_dim)
        
        # Bias terms (Crucial for capturing baseline popularity/activity)
        self.user_bias = nn.Embedding(n_users+1, 1)
        self.item_bias = nn.Embedding(n_items+1, 1)
        
        # Regularization
        self.dropout = nn.Dropout(dropout_p)

        # Initialize weights (Small random values help convergence)
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        self.user_bias.weight.data.zero_()
        self.item_bias.weight.data.zero_()

    def forward(self, user_ids, item_ids):
        # 1. Lookup Embeddings
        user_ids=user_ids.long()
        item_ids=item_ids.long()
        user_vecs = self.user_emb(user_ids)
        item_vecs = self.item_emb(item_ids)
        
        # 2. Apply Dropout to prevent over-reliance on specific latent features
        user_vecs = self.dropout(user_vecs)
        item_vecs = self.dropout(item_vecs)
        
        # 3. Dot Product: (batch, dim) * (batch, dim) -> sum across dim
        # This represents the interaction between user preferences and item features
        dot_prod = (user_vecs * item_vecs).sum(dim=1, keepdim=True)
        
        # 4. Add Biases: dot_prod + u_bias + i_bias
        # Biases capture that some users are "easy raters" and some items are "universally liked"
        out = dot_prod + self.user_bias(user_ids) + self.item_bias(item_ids)
        
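        # Flatten (batch_size, 1) to (batch_size,); squeeze only the last dim
        # so that a batch of size 1 does not collapse to a scalar
        return out.squeeze(-1)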

You can test the shapes of the model outputs using the function below.

In [40]:
def test_RecommenderSystem_shapes():
    n_users, n_items = 100, 1000
    model = RecommenderSystem(n_users, n_items)
    batch_size = 10
    user_ids = torch.arange(1, batch_size+1)
    item_ids = torch.arange(1, batch_size+1)
    output = model(user_ids, item_ids)
    print(output.shape)
    assert output.shape == torch.Size([batch_size]), "Wrong output shape."
    print('Success')

test_RecommenderSystem_shapes()

torch.Size([10])
Success


In [41]:
# This cell is reserved for testing

## Train the model

You need to train a recommender system using **only the training data.** Please use the test set to select the best model: the model that generalizes best to out-of-training data.

**IMPORTANT**:
* During testing, the predictions are produced by `predictions = model(user_ids, item_ids)` with the `user_ids` and `item_ids` loaded from `RatingsData`.
* There is a size limit of 30 MB for saved models.
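
To get a rough idea of whether a model fits under the size limit, its serialized size can be estimated before saving. The sketch below assumes that the model is stored via its `state_dict` and that the limit is counted in units of 10^6 bytes; both are assumptions, and the helper name and toy layer sizes are hypothetical.

```python
import io
import torch
import torch.nn as nn

def state_dict_size_mb(model):
    """Approximate serialized size of model.state_dict() in MB (10^6 bytes)."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)  # serialize into memory instead of a file
    return buffer.getbuffer().nbytes / 1e6

# Toy embedding table with assumed sizes (not the sizes of the course dataset)
toy = nn.Embedding(1000, 32)
n_params = sum(p.numel() for p in toy.parameters())

print(n_params)                       # number of float32 values in the table
print(state_dict_size_mb(toy) < 30)   # whether the toy layer fits under the limit
```

As a rule of thumb, each float32 parameter takes 4 bytes, so the embedding tables usually dominate the size of a model like this one.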

In [42]:
# Create the model
# IMPORTANT: the default value of the architecture argument should define your best model.
model = RecommenderSystem(trainset.n_users, trainset.n_items)

In [44]:

if not skip_training:
    epochs = 100
    batch_size = 256
    lr = 1e-3
    wd = 1e-4
    criterion = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3, factor=0.5)
    
    best_val_loss = float('inf')
    early_stopping_patience = 7
    patience_counter = 0

    for epoch in range(epochs):
        model.train()
        train_losses = []
        
        # Manual Shuffling
        indices = np.random.permutation(len(trainset))
        
        # Manual Batching Loop
        for i in range(0, len(trainset), batch_size):
            batch_indices = indices[i : i + batch_size]
            
            # Extract and stack the data: [user_id, item_id, rating]
            # We convert to tensors here because they aren't batched yet
            batch_data = [trainset[idx] for idx in batch_indices]
            u = torch.tensor([x[0] for x in batch_data]).long()
            i_ids = torch.tensor([x[1] for x in batch_data]).long()
            r = torch.tensor([x[2] for x in batch_data]).float()

            optimizer.zero_grad()
            preds = model(u, i_ids)
            
            loss = criterion(preds.view(-1), r.view(-1))
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())

        # --- Validation ---
        model.eval()
        val_losses = []
        with torch.no_grad():
            # Process validation in one or few large chunks
            u_v = torch.tensor([x[0] for x in testset]).long()
            i_v = torch.tensor([x[1] for x in testset]).long()
            r_v = torch.tensor([x[2] for x in testset]).float()
            
            preds_v = model(u_v, i_v)
            v_loss = criterion(preds_v.view(-1), r_v.view(-1))
            val_losses.append(v_loss.item())
        
        avg_val_loss = np.mean(val_losses)
        print(f"Epoch {epoch+1}: Val MSE {avg_val_loss:.4f}")

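        # Learning rate schedule and early stopping
        scheduler.step(avg_val_loss)
        if avg_val_loss < best_val_loss:
            # Keep a checkpoint of the best model seen so far
            best_val_loss = avg_val_loss
            torch.save(model.state_dict(), 'best_model.pth')
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= early_stopping_patience:
                break

    # Restore the best checkpoint so that the model saved later is the best one,
    # not the one from the last epoch
    model.load_state_dict(torch.load('best_model.pth'))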



Epoch 1: Val MSE 1.9681
Epoch 2: Val MSE 1.3020
Epoch 3: Val MSE 1.0865
Epoch 4: Val MSE 0.9969
Epoch 5: Val MSE 0.9541
Epoch 6: Val MSE 0.9304
Epoch 7: Val MSE 0.9184
Epoch 8: Val MSE 0.9095
Epoch 9: Val MSE 0.9065
Epoch 10: Val MSE 0.9020
Epoch 11: Val MSE 0.9020
Epoch 12: Val MSE 0.9004
Epoch 13: Val MSE 0.8984
Epoch 14: Val MSE 0.8978
Epoch 15: Val MSE 0.8971
Epoch 16: Val MSE 0.8973
Epoch 17: Val MSE 0.8965
Epoch 18: Val MSE 0.8950
Epoch 19: Val MSE 0.8954
Epoch 20: Val MSE 0.8939
Epoch 21: Val MSE 0.8955
Epoch 22: Val MSE 0.8933
Epoch 23: Val MSE 0.8933
Epoch 24: Val MSE 0.8956
Epoch 25: Val MSE 0.8920
Epoch 26: Val MSE 0.8892
Epoch 27: Val MSE 0.8879
Epoch 28: Val MSE 0.8852
Epoch 29: Val MSE 0.8844
Epoch 30: Val MSE 0.8832
Epoch 31: Val MSE 0.8784
Epoch 32: Val MSE 0.8750
Epoch 33: Val MSE 0.8713
Epoch 34: Val MSE 0.8687
Epoch 35: Val MSE 0.8668
Epoch 36: Val MSE 0.8654
Epoch 37: Val MSE 0.8647
Epoch 38: Val MSE 0.8627
Epoch 39: Val MSE 0.8599
Epoch 40: Val MSE 0.8606
Epoch 41:

In [45]:
# Save the model to disk (the pth-files will be submitted automatically together with your notebook)
# Set confirm=False if you do not want to be asked for confirmation before saving.
if not skip_training:
    tools.save_model(model, 'recsys.pth', confirm=True)

Do you want to save the model (type yes to confirm)?  yes


Model saved to recsys.pth.


In [46]:
# This cell loads your best model
if skip_training:
    model = RecommenderSystem(trainset.n_users, trainset.n_items)
    tools.load_model(model, 'recsys.pth', device)

The next cell tests the accuracy of your best model. It is enough to submit .pth files.

**IMPORTANT**:
* During testing, the predictions are produced by `predictions = model(user_ids, item_ids)` with the `user_ids` and `item_ids` loaded from `RatingsData`.
* There is a size limit of 30 MB for saved models. Please make sure that your model loads in the cell above.

In [None]:
# This cell tests the accuracy of your best model.

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading