### Background



**MovieLens** is a movie recommendation system operated by GroupLens, a research group at the University of Minnesota. 

1. Propose and implement your own recommendation system based on the MovieLens dataset. Use `ratings_train.csv` as the training set and `ratings_valid.csv` as the validation set. Your system may also use information from `movies.csv` and `tags.csv` to make recommendations. An undisclosed test set will be used to evaluate your system.
   - The data file structure is available at https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html.
   - The main goal of the recommendation system is to minimize the root-mean-square error (RMSE).
   - The implementation should include a function named `predict_rating`. This function accepts a DataFrame with two columns, `userId` and `movieId`, and adds a column named `rating` that stores the predicted rating of each `movieId` by the corresponding `userId`.
   - Your program must still return a root-mean-square error value when the validation set is replaced with another file. Otherwise, your score will be reduced by 50%.
   - You must modify the given program to make better recommendations. Submitting the original program without modification is considered plagiarism.
2. Prepare slides for a 7-minute presentation that explains your proposed recommendation technique and algorithm and shows your RMSE results on the validation set.
3. Submit all required documents by April 30, 2023, at 23:59. Late submissions will not be accepted and will receive a mark of 0. Do not wait until the last minute. Plagiarism and code duplication will be checked.
4. Present your work on May 1, 2023 within 7 minutes. Presentations exceeding 7 minutes will be subject to a point deduction.
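
The requirement that the program still reports an RMSE when the validation file is swapped can be met by keeping the file path as a parameter. The following is only a minimal sketch: the helper name `rmse_on_file` is hypothetical, and it assumes the replacement file has the same `userId`, `movieId`, and `rating` columns as `ratings_valid.csv`.

```python
import numpy as np
import pandas as pd


def rmse_on_file(valid_path, predict_rating):
    """Read a validation CSV and return the RMSE of predict_rating on it.

    valid_path     : path (or file-like object) of a CSV with userId, movieId, rating
    predict_rating : function that takes a DataFrame with userId and movieId
                     and returns it with an added rating column
    """
    valid = pd.read_csv(valid_path)
    pred = predict_rating(valid[['userId', 'movieId']])
    errors = valid['rating'].to_numpy() - pred['rating'].to_numpy()
    # RMSE = square root of the mean squared error
    return float(np.sqrt(np.mean(errors ** 2)))
```

With a helper like this, switching files becomes a one-argument change, for example `rmse_on_file('datasets/ratings_valid.csv', predict_rating)`.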

### Loading data

In [1]:
import numpy as np
import pandas as pd

In [2]:
ratings_train = pd.read_csv('datasets/ratings_train.csv')
ratings_valid = pd.read_csv('datasets/ratings_valid.csv')
movies = pd.read_csv('datasets/movies.csv')

### Constructing model and predicting ratings

In [3]:
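# Model construction: mean rating per movie, plus a global mean as a fallback
avg_rating = ratings_train[['movieId', 'rating']].groupby(by='movieId').mean()
global_mean = ratings_train['rating'].mean()

# Prediction
def predict_rating(df):
    # Input:
    #   df = a dataframe with two columns: userId, movieId
    # Output:
    #   a dataframe with three columns: userId, movieId, rating
    pred = df.join(avg_rating, on='movieId')
    # Movies that never appear in the training set have no mean; use the global mean
    pred['rating'] = pred['rating'].fillna(global_mean)
    return pred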


In [4]:
# Prepare df for prediction
r = ratings_valid[['userId', 'movieId']]

# Predict ratings
ratings_pred = predict_rating(r)

In [5]:
from sklearn.metrics import mean_squared_error

r_true = ratings_valid['rating'].to_numpy()
r_pred = ratings_pred['rating'].to_numpy()

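# Take the square root explicitly; the squared=False argument is deprecated in newer scikit-learn versions
rmse = np.sqrt(mean_squared_error(r_true, r_pred))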
print(f"RMSE = {rmse:.4f}")

RMSE = 0.9171
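
The per-movie mean above ignores how individual users tend to rate. One common way to extend it is a regularized bias baseline: global mean plus a movie offset plus a user offset. The sketch below is only an illustration, not the assignment's reference solution. The function names `fit_bias_baseline` and `predict_with_bias` and the regularization value `reg=10.0` are assumptions chosen for the example. The clipping range follows the MovieLens rating scale of 0.5 to 5.0 stars.

```python
import pandas as pd


def fit_bias_baseline(train, reg=10.0):
    """Fit global mean, movie offsets, and user offsets (all shrunk toward 0 by reg)."""
    mu = train['rating'].mean()
    # Movie offset: shrunken average deviation of a movie's ratings from the global mean
    resid_movie = train['rating'] - mu
    g_movie = resid_movie.groupby(train['movieId'])
    b_movie = g_movie.sum() / (g_movie.count() + reg)
    # User offset: shrunken average of what remains after removing the movie offset
    resid_user = train['rating'] - mu - train['movieId'].map(b_movie)
    g_user = resid_user.groupby(train['userId'])
    b_user = g_user.sum() / (g_user.count() + reg)
    return mu, b_user, b_movie


def predict_with_bias(df, mu, b_user, b_movie):
    """Add a rating column; unseen users or movies contribute an offset of 0."""
    out = df[['userId', 'movieId']].copy()
    rating = (mu
              + out['userId'].map(b_user).fillna(0.0)
              + out['movieId'].map(b_movie).fillna(0.0))
    # Keep predictions inside the valid MovieLens rating range
    out['rating'] = rating.clip(0.5, 5.0)
    return out


# Toy data standing in for ratings_train.csv (values are made up for illustration)
toy_train = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3],
    'movieId': [10, 20, 10, 30, 20],
    'rating':  [4.0, 3.0, 5.0, 2.0, 4.0],
})
mu, b_user, b_movie = fit_bias_baseline(toy_train)
toy_query = pd.DataFrame({'userId': [1, 99], 'movieId': [10, 999]})
print(predict_with_bias(toy_query, mu, b_user, b_movie))
```

To plug this into the notebook, `predict_rating` could call `predict_with_bias` with parameters fitted on `ratings_train`. Whether it lowers the validation RMSE has to be checked by rerunning the evaluation cell.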


### Recommendation system based on Transformer Architecture - Prototype

Note: this prototype is not working yet.

In [6]:
# Load the dataset using pandas
import pandas as pd

# Loaded as DataFrames here; converted to tensors in the next cell
train_data_tensor = pd.read_csv("datasets/ratings_train.csv")
test_data = pd.read_csv("datasets/ratings_valid.csv")

# Preprocess the data
# ...
# ratings_valid.csv already serves as the held-out set, so the training data is
# not split again (a train_test_split here would overwrite test_data).

# Build the Transformer-based model using PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Define the hyperparameters and device for the model
input_dim = 24
block_size = 3   # rows per window
batch_size = 64  # windows per batch
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'




In [7]:
train_data_tensor.drop(['timestamp'], axis=1, inplace=True)
test_data.drop(['timestamp'], axis=1, inplace=True)
train_data_tensor = torch.tensor(train_data_tensor.values, dtype=torch.float)
test_data_tensor = torch.tensor(test_data.values, dtype=torch.float)

def get_batch(split):
    # generate a small batch of data of inputs x and targets y
    data = train_data_tensor if split == 'train' else test_data_tensor
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i+block_size] for i in ix])
    y = torch.stack([data[i+1:i+block_size+1] for i in ix])
    x, y = x.to(device), y.to(device)
    return x, y
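
To make the shapes produced by `get_batch` concrete, here is a small sketch on toy data. It uses fixed start rows instead of `torch.randint` so the result is reproducible. The helper `window_pair` and the tensor `toy` are hypothetical names for this illustration only.

```python
import torch

block_size = 3  # same window length as in the cell above

# Toy stand-in for train_data_tensor: 6 rows of (userId, movieId, rating)
toy = torch.arange(18, dtype=torch.float).reshape(6, 3)


def window_pair(data, i, block_size):
    """Return the window starting at row i and the same window shifted down by one row."""
    x = data[i:i + block_size]
    y = data[i + 1:i + block_size + 1]
    return x, y


starts = [0, 2]  # fixed start rows; 2 is the largest valid start for 6 rows
pairs = [window_pair(toy, i, block_size) for i in starts]
x = torch.stack([p[0] for p in pairs])
y = torch.stack([p[1] for p in pairs])

# Both tensors have shape (number of windows, block_size, 3 columns)
print(x.shape, y.shape)
# Rows 1..2 of each input window reappear as rows 0..1 of its target window
print(torch.equal(x[:, 1:], y[:, :-1]))
```

Note that `nn.Transformer` defaults to `batch_first=False`, so it interprets its inputs as (sequence, batch, feature). A tensor laid out as (batch, block_size, 3) is therefore read with the first two axes swapped unless `batch_first=True` is passed or the tensor is transposed.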

In [8]:

transformer_model = nn.Transformer(d_model=3, nhead=3, num_encoder_layers=12)
transformer_model.to(device)

Transformer(
  (encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-11): 12 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=3, out_features=3, bias=True)
        )
        (linear1): Linear(in_features=3, out_features=2048, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=2048, out_features=3, bias=True)
        (norm1): LayerNorm((3,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((3,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((3,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (layers): ModuleList(
      (0-5): 6 x TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=3, out_features=3,

In [34]:
import math
import time


ntokens = 9  # block_size * 3 columns (userId, movieId, rating) per flattened window
sequence_length = 9
criterion = nn.MSELoss()
epochs = 40
lr = 20

def train():
    # Turn on training mode which enables dropout.
    transformer_model.train()
    total_loss = 0.
    start_time = time.time()

    for batch, _ in enumerate(range(0, train_data_tensor.size(0) - 1, sequence_length)):
        data, targets = get_batch('train')
        # Clear gradients left over from the previous batch.
        transformer_model.zero_grad()
        output = transformer_model(data, targets)
        # Flatten each window to one row: (batch_size, block_size * 3).
        output = output.reshape(-1, ntokens)
        targets = targets.reshape(-1, ntokens)
        loss = criterion(output, targets)
        loss.backward()

        # Clip the gradient norm to 0.25 to guard against exploding gradients.
        torch.nn.utils.clip_grad_norm_(transformer_model.parameters(), 0.25)
        # Plain SGD step with the current learning rate.
        for p in transformer_model.parameters():
            p.data.add_(p.grad, alpha=-lr)

        total_loss += loss.item()

        if batch % 100 == 0 and batch > 0:
            cur_loss = total_loss / 100
            elapsed = time.time() - start_time
            # Perplexity is not reported: it is not meaningful for an MSE loss,
            # and math.exp overflows for large loss values.
            print('| {:5d}/{:5d} batches | lr {:02.2f} | ms/batch {:5.2f} | '
                    'loss {:5.2f}'.format(
                batch, len(train_data_tensor) // sequence_length, lr,
                elapsed * 1000 / 100, cur_loss))
            total_loss = 0
            start_time = time.time()

In [35]:

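def evaluate(data_source):
    # Turn on evaluation mode which disables dropout.
    transformer_model.eval()
    total_loss = 0.
    n_batches = 0
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, sequence_length):
            # get_batch takes only a split name; any name other than 'train'
            # samples from test_data_tensor, which is what data_source holds here.
            data, targets = get_batch('valid')
            # nn.Transformer needs both a source and a target sequence.
            output = transformer_model(data, targets)
            output = output.reshape(-1, ntokens)
            targets = targets.reshape(-1, ntokens)
            total_loss += criterion(output, targets).item()
            n_batches += 1
    # Mean MSE over the sampled batches.
    return total_loss / max(n_batches, 1)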

In [37]:
best_val_loss = None
# At any point you can hit Ctrl + C to break out of training early.
try:
    for epoch in range(1, epochs+1):
        epoch_start_time = time.time()
        train()
        val_loss = evaluate(test_data_tensor)
        print('-' * 89)
        print('| end of epoch {:3d} | time: {:5.2f}s | valid loss {:5.2f}'.format(
            epoch, (time.time() - epoch_start_time), val_loss))
        print('-' * 89)
        # Track the best validation loss seen so far.
        if best_val_loss is None or val_loss < best_val_loss:
            best_val_loss = val_loss
        else:
            # Anneal the learning rate if no improvement has been seen in the validation dataset.
            lr /= 4.0
except KeyboardInterrupt:
    print('-' * 89)
    print('Exiting from training early')

torch.Size([64, 9]) torch.Size([64, 9])


Traceback (most recent call last):
  File "_pydevd_bundle/pydevd_cython.pyx", line 577, in _pydevd_bundle.pydevd_cython.PyDBFrame._handle_exception
  File "_pydevd_bundle/pydevd_cython.pyx", line 312, in _pydevd_bundle.pydevd_cython.PyDBFrame.do_wait_suspend
  File "c:\Users\Reychard\Cineflexi\.venv\lib\site-packages\debugpy\_vendored\pydevd\pydevd.py", line 2070, in do_wait_suspend
    keep_suspended = self._do_wait_suspend(thread, frame, event, arg, suspend_type, from_this_thread, frames_tracker)
  File "c:\Users\Reychard\Cineflexi\.venv\lib\site-packages\debugpy\_vendored\pydevd\pydevd.py", line 2106, in _do_wait_suspend
    time.sleep(0.01)
KeyboardInterrupt
