# Introduction

## Purpose

This notebook develops and evaluates a movie recommendation system on the MovieLens dataset and compares four approaches: Matrix Factorization with PyTorch, Funk Singular Value Decomposition (SVD) with Robust Principal Component Analysis (RPCA), and Neural Collaborative Filtering (NCF) with and without contrastive learning. My objective is to identify the algorithm with the best rating-prediction and recommendation quality. Quality is measured by accuracy (the share of predictions within a ±0.5-star tolerance of the true rating), precision@K, recall@K, F1-score@K, and NDCG@K. The workflow covers data preprocessing, per-user train/cross-validation/test splitting, hyperparameter tuning, and evaluation on held-out ratings to check how well each model generalizes on sparse rating data.
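For reference, the ranking metrics listed above can be computed per user as in the sketch below. It uses the standard binary-relevance definitions. An item counts as relevant if its true rating is at least a threshold. Precision@K divides the number of relevant items in the top K by K, and NDCG@K normalizes by the ideal ranking of all the user's relevant items. The function name `ranking_metrics_at_k` is hypothetical, and the default threshold of 5.0 is an assumption chosen to match the `relevance_threshold=5.0` used in this notebook's evaluation calls.

```python
import numpy as np


def ranking_metrics_at_k(actual, predicted, k=20, relevance_threshold=5.0):
    """Precision@k, recall@k, F1@k and NDCG@k for one user (binary relevance sketch)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    relevant = actual >= relevance_threshold            # ground-truth relevant items
    top_k = np.argsort(-predicted, kind="stable")[:k]   # indices of the k highest scores
    hits = relevant[top_k].astype(float)                # 1.0 where a top-k item is relevant
    n_relevant = int(relevant.sum())

    precision = hits.sum() / k
    recall = hits.sum() / n_relevant if n_relevant > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

    # DCG discounts are 1/log2(rank + 1) for ranks 1..k; the ideal ranking places
    # min(n_relevant, k) relevant items at the top of the list
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float(np.sum(hits * discounts[: len(hits)]))
    idcg = float(np.sum(discounts[: min(n_relevant, k)]))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, f1, ndcg


# Toy user: items 0 and 2 are relevant (rating 5.0), but item 2 is ranked last
p, r, f1, ndcg = ranking_metrics_at_k([5.0, 3.0, 5.0, 4.0], [0.9, 0.8, 0.1, 0.7], k=2)
```

In the toy case only one of the two relevant movies reaches the top 2, so precision and recall are both 0.5. NDCG falls below 1 because the ideal list would hold both relevant movies.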

## Dataset Description

The MovieLens dataset (ml-latest-small), published by the GroupLens research group, contains 100,836 ratings from 610 users. The movie catalog (`movies.csv`) lists 9,742 movies, and 9,724 of them have at least one rating. Each record in `ratings.csv` includes a user ID, movie ID, rating (0.5 to 5.0 in 0.5 increments), and timestamp, while `movies.csv` adds titles and genres. The data is highly sparse: only about 1.7% of the user-movie rating matrix is filled, which creates cold-start problems for users and movies with few ratings. This study addresses these challenges in three ways. First, it filters out users with fewer than 20 ratings and movies with fewer than 5 ratings; in practice only the movie threshold removes data, since every user in ml-latest-small already has at least 20 ratings. Second, it splits each user's ratings into 70% training, 15% cross-validation, and 15% test. Third, it applies matrix factorization and neural network techniques.
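The 1.7% fill figure follows directly from the counts. The calculation below uses the 9,724 movies that actually appear in `ratings.csv`; using the 9,742 catalog entries in `movies.csv` instead gives about 1.70% as well.

$$\text{fill} = \frac{100{,}836}{610 \times 9{,}724} = \frac{100{,}836}{5{,}931{,}640} \approx 0.0170 \approx 1.7\%$$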

# Setup

## Libraries

In [None]:
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
import scipy.sparse
from scipy.sparse.linalg import svds
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dropout, Dense, BatchNormalization, Concatenate, Multiply, Lambda
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import regularizers
import random
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile
import os
import contextlib
from torch.autograd import Variable
from tqdm.notebook import tqdm  # notebook-friendly progress bars

## Configuration

In [None]:
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)
torch.manual_seed(42)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 10)
cuda = torch.cuda.is_available()
print(f"Is running on GPU: {cuda}")

Is running on GPU: False


# Data Loading

This cell downloads the MovieLens dataset (ml-latest-small) from the GroupLens repository with `curl`. It then extracts the archive into a `data` directory with the `zipfile` module and loads two files into pandas DataFrames: `movies.csv` (movie IDs, titles, genres) and `ratings.csv` (user IDs, movie IDs, ratings, timestamps). All later analysis and modeling steps start from these two DataFrames.

In [None]:
! curl http://files.grouplens.org/datasets/movielens/ml-latest-small.zip -o ml-latest-small.zip

with zipfile.ZipFile('ml-latest-small.zip', 'r') as zip_ref:
    zip_ref.extractall('data')
movies_df = pd.read_csv('data/ml-latest-small/movies.csv')
ratings_df = pd.read_csv('data/ml-latest-small/ratings.csv')


# Data Exploration

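This section first prints the shape and first rows of the movies table. For the ratings table, it prints the first rows, the column schema, and summary statistics of the `rating` column. It then maps movie IDs to titles, counts unique users and movies, and computes the size and fill percentage of the full user-movie rating matrix. The low fill percentage shows that a dense user × movie matrix would be mostly empty. This motivates sparse representations and matrix factorization methods that learn latent patterns from the observed ratings alone.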
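The memory argument can be made concrete by building the user × movie rating matrix in SciPy's compressed sparse row (CSR) format and comparing its footprint with a dense array. The helper names `ratings_to_csr` and `density_and_memory` are hypothetical and not part of this notebook. By the same arithmetic, a dense float32 matrix for 610 users × 9,724 movies needs about 23.7 MB. CSR stores only the ~100k observed ratings plus their indices, which comes to under 1 MB assuming 32-bit indices.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def ratings_to_csr(ratings_df):
    """Build a user x movie CSR matrix from a ratings DataFrame (sketch)."""
    # pd.factorize assigns codes 0..n-1 in order of first appearance
    user_codes, user_ids = pd.factorize(ratings_df['userId'])
    movie_codes, movie_ids = pd.factorize(ratings_df['movieId'])
    matrix = scipy.sparse.csr_matrix(
        (ratings_df['rating'].to_numpy(dtype=np.float32), (user_codes, movie_codes)),
        shape=(len(user_ids), len(movie_ids)),
    )
    return matrix, user_ids, movie_ids


def density_and_memory(matrix):
    """Return the fill fraction plus dense vs. CSR memory footprint in bytes."""
    n_rows, n_cols = matrix.shape
    density = matrix.nnz / (n_rows * n_cols)
    dense_bytes = n_rows * n_cols * matrix.dtype.itemsize
    csr_bytes = matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes
    return density, dense_bytes, csr_bytes


# Toy ratings: 3 users, 4 movies, 6 observed ratings
toy = pd.DataFrame({
    'userId': [1, 1, 2, 3, 3, 3],
    'movieId': [10, 20, 10, 30, 40, 20],
    'rating': [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})
R, users, movies = ratings_to_csr(toy)
density, dense_bytes, csr_bytes = density_and_memory(R)
print(R.shape, R.nnz, density)
```

On a toy matrix this small, CSR overhead can exceed the dense size. The savings only appear at realistic sparsity levels such as the ~1.7% fill of MovieLens.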

In [None]:
print('Movies DataFrame Dimensions:', movies_df.shape)
print('\nMovies DataFrame Sample:')
print(movies_df.head())
print('\nRatings DataFrame Sample:')
print(ratings_df.head())
print('\nRatings DataFrame Info:')
ratings_df.info()  # info() prints the summary itself and returns None
print('\nRatings Descriptive Statistics:')
print(ratings_df['rating'].describe())

movie_names = movies_df.set_index('movieId')['title'].to_dict()
n_users = len(ratings_df.userId.unique())
n_items = len(ratings_df.movieId.unique())
print("\nNumber of unique users:", n_users)
print("Number of unique movies:", n_items)
print("Full rating matrix elements:", n_users * n_items)
print("Number of ratings:", len(ratings_df))
print("Matrix fill percentage:", len(ratings_df) / (n_users * n_items) * 100, '%')

Movies DataFrame Dimensions: (9742, 3)

Movies DataFrame Sample:
   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  

Ratings DataFrame Sample:
   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931

Ratings DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 ent

# Data Preprocessing

This cell filters the MovieLens ratings and creates index mappings for users and movies. The `filter_data` function keeps only users with at least 20 ratings and movies with at least 5 ratings. As the printed output shows, 90,274 ratings from 610 users on 3,650 movies remain. Both counts are computed once on the unfiltered data, so a user can fall below 20 ratings after low-count movies are removed. The `create_index_mappings` function maps user and movie IDs to contiguous zero-based indices and returns the number of unique users and movies. The embedding tables in the Matrix Factorization and Neural Collaborative Filtering models are looked up by these contiguous indices.
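A single pass can leave some users or movies below their thresholds, so one variant to consider repeats the filter until both thresholds hold at the same time. This is often called k-core filtering. The function `filter_until_stable` is a hypothetical sketch, not part of this notebook's pipeline, and it may remove more data than `filter_data` does:

```python
import pandas as pd


def filter_until_stable(ratings_df, min_ratings_user=20, min_ratings_movie=5, max_rounds=100):
    """Repeat the user/movie count filter until no more rows are removed (sketch)."""
    df = ratings_df
    for _ in range(max_rounds):
        # Counts are recomputed on the current (already filtered) data each round
        user_counts = df.groupby('userId')['movieId'].transform('size')
        movie_counts = df.groupby('movieId')['userId'].transform('size')
        keep = (user_counts >= min_ratings_user) & (movie_counts >= min_ratings_movie)
        if keep.all():
            break
        df = df[keep]
    return df


# Toy case: one pass drops (user 3, movie 3) and leaves user 3 with a single rating;
# the second round removes user 3 entirely
toy = pd.DataFrame({
    'userId': [1, 1, 2, 2, 3, 3],
    'movieId': [1, 2, 1, 2, 1, 3],
    'rating': [4.0, 3.0, 5.0, 4.0, 2.0, 3.5],
})
stable = filter_until_stable(toy, min_ratings_user=2, min_ratings_movie=2)
```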

In [None]:
def filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5):
    user_counts = ratings_df['userId'].value_counts()
    movie_counts = ratings_df['movieId'].value_counts()
    filtered_df = ratings_df[
        (ratings_df['userId'].isin(user_counts[user_counts >= min_ratings_user].index)) &
        (ratings_df['movieId'].isin(movie_counts[movie_counts >= min_ratings_movie].index))
    ]
    print(f"Filtered data: {len(filtered_df)} ratings, {len(filtered_df['userId'].unique())} users, {len(filtered_df['movieId'].unique())} movies")
    return filtered_df

ratings_df_filtered = filter_data(ratings_df)

def create_index_mappings(ratings_df):
    user_ids = ratings_df['userId'].unique()
    movie_ids = ratings_df['movieId'].unique()
    userid2idx = {uid: idx for idx, uid in enumerate(user_ids)}
    movieid2idx = {mid: idx for idx, mid in enumerate(movie_ids)}
    return userid2idx, movieid2idx, len(user_ids), len(movie_ids)

userid2idx, movieid2idx, n_users, n_items = create_index_mappings(ratings_df_filtered)

Filtered data: 90274 ratings, 610 users, 3650 movies


## Handling Missing Values

This cell checks the filtered dataset for missing values. The output shows zero missing values in the `userId`, `movieId`, `rating`, and `timestamp` columns, so no imputation is needed before modeling.

In [None]:
print("\nMissing Values Check:")
print(ratings_df_filtered.isnull().sum())


Missing Values Check:
userId       0
movieId      0
rating       0
timestamp    0
dtype: int64


# EDA

This cell summarizes and visualizes the rating distribution of the filtered dataset. It prints the proportion of 5.0 ratings and the mean and standard deviation of all ratings. It then uses Seaborn's `histplot` to draw a histogram with one bin per half-star value from 0.5 to 5.0, saves it as `rating_distribution.png`, and closes the figure. The mean is about 3.54 on a 0.5 to 5.0 scale and 13.8% of ratings are 5.0, so ratings concentrate toward the upper end of the scale. The tail toward low ratings is longer, which makes the distribution left-skewed (negatively skewed).

In [None]:
print("\nRating Distribution Check:")
print(f"Ratings = 5.0: {len(ratings_df_filtered[ratings_df_filtered['rating'] == 5.0]) / len(ratings_df_filtered):.4f}")
print(f"Rating mean/std: {ratings_df_filtered['rating'].mean():.4f}/{ratings_df_filtered['rating'].std():.4f}")

plt.figure(figsize=(8, 6))
# Edges at 0.25, 0.75, ..., 5.25 centre one bin on each half-star value
# (edges at 0.5, 1.0, ..., 5.0 would put 4.5 and 5.0 into the same final bin)
sns.histplot(ratings_df_filtered['rating'], bins=np.arange(0.25, 5.5, 0.5), kde=False)
plt.title('Rating Distribution')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.savefig('rating_distribution.png')
plt.close()


Rating Distribution Check:
Ratings = 5.0: 0.1379
Rating mean/std: 3.5374/1.0299
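Half-star ratings make histogram bin edges easy to get wrong. If the edges sit on the rating values themselves, 4.5 and 5.0 land in the same final bin, because NumPy's last bin includes its right edge. A small check with `np.histogram` shows the difference; the helper name `half_star_counts` is hypothetical:

```python
import numpy as np


def half_star_counts(ratings):
    """Count ratings per half-star value using edges centred on each value (sketch)."""
    # Edges 0.25, 0.75, ..., 5.25 give ten bins, one per value in {0.5, 1.0, ..., 5.0}
    edges = np.arange(0.25, 5.5, 0.5)
    counts, _ = np.histogram(ratings, bins=edges)
    values = edges[:-1] + 0.25  # bin centres: 0.5, 1.0, ..., 5.0
    return dict(zip(values.round(1).tolist(), counts.tolist()))


counts = half_star_counts([4.5, 5.0, 5.0, 0.5, 3.0])
# With edges 0.5, 1.0, ..., 5.0 there are only nine bins and the last one mixes 4.5 and 5.0
merged_last_bin = np.histogram([4.5, 5.0, 5.0], bins=np.arange(0.5, 5.5, 0.5))[0][-1]
```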


# Data Splitting

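The `split_ratings_per_user` function randomly splits each user's ratings into training, cross-validation, and test sets with scikit-learn's `train_test_split`. It first holds out 15% of the user's ratings as the test set, then divides the remaining 85% into training and cross-validation portions in a 70:15 ratio. Because the split is done per user, every user with at least three ratings appears in all three sets, so each model is evaluated only on users it has seen during training. The overlap check after the split confirms that no (user, movie) pair occurs in more than one set.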
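The ratio used for the second split (`train_ratio_adjusted` in the code) follows from `train_ratio=0.7`, `cv_ratio=0.15`, and `test_ratio=0.15`:

$$\frac{0.70}{0.70 + 0.15} = \frac{14}{17} \approx 0.8235, \qquad 1 - \frac{14}{17} = \frac{3}{17} \approx 0.1765, \qquad 0.85 \times \frac{3}{17} = 0.15$$

So the cross-validation set holds about 15% of each user's ratings. Note that `train_test_split` rounds the held-out count up (`ceil(test_size * n_samples)`) in each call. Per-user groups therefore give slightly more than their nominal share to the held-out sets, which is consistent with the training set ending up at 69.4% rather than 70%.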

In [None]:
def split_ratings_per_user(ratings_df, train_ratio=0.7, cv_ratio=0.15, test_ratio=0.15, random_state=42):
    train_data, cv_data, test_data = [], [], []
    for user_id, user_ratings in ratings_df.groupby('userId'):
        if len(user_ratings) < 3:
            continue
        train_cv_ratings, test_ratings = train_test_split(
            user_ratings, test_size=test_ratio, random_state=random_state, shuffle=True
        )
        train_ratio_adjusted = train_ratio / (train_ratio + cv_ratio)
        train_ratings, cv_ratings = train_test_split(
            train_cv_ratings, test_size=1 - train_ratio_adjusted, random_state=random_state, shuffle=True
        )
        train_data.append(train_ratings)
        cv_data.append(cv_ratings)
        test_data.append(test_ratings)
    train_df = pd.concat(train_data, ignore_index=True)
    cv_df = pd.concat(cv_data, ignore_index=True)
    test_df = pd.concat(test_data, ignore_index=True)
    print(f"Train: {len(train_df)} ratings ({len(train_df)/len(ratings_df)*100:.1f}%)")
    print(f"Cross-Validation: {len(cv_df)} ratings ({len(cv_df)/len(ratings_df)*100:.1f}%)")
    print(f"Test: {len(test_df)} ratings ({len(test_df)/len(ratings_df)*100:.1f}%)")
    return train_df, cv_df, test_df

train_df, cv_df, test_df = split_ratings_per_user(ratings_df_filtered)

train_pairs = set(zip(train_df['userId'], train_df['movieId']))
cv_pairs = set(zip(cv_df['userId'], cv_df['movieId']))
test_pairs = set(zip(test_df['userId'], test_df['movieId']))
print("\nSplit Integrity Check:")
print("Train-CV overlap:", len(train_pairs & cv_pairs))
print("Train-Test overlap:", len(train_pairs & test_pairs))
print("CV-Test overlap:", len(cv_pairs & test_pairs))

Train: 62620 ratings (69.4%)
Cross-Validation: 13827 ratings (15.3%)
Test: 13827 ratings (15.3%)

Split Integrity Check:
Train-CV overlap: 0
Train-Test overlap: 0
CV-Test overlap: 0
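The overlap check confirms that the sets are disjoint, but not that every user appears in each split. The sketch below adds that per-user check. The function name `per_user_coverage` is hypothetical, and the toy frames only illustrate the output format:

```python
import pandas as pd


def per_user_coverage(train_df, cv_df, test_df, user_col='userId'):
    """Count users present in all three splits and each user's training fraction (sketch)."""
    in_all = set(train_df[user_col]) & set(cv_df[user_col]) & set(test_df[user_col])
    # Per-user rating counts in each split; users missing from a split get 0
    sizes = pd.concat([
        train_df[user_col].value_counts().rename('train'),
        cv_df[user_col].value_counts().rename('cv'),
        test_df[user_col].value_counts().rename('test'),
    ], axis=1).fillna(0)
    train_frac = sizes['train'] / sizes.sum(axis=1)
    return len(in_all), train_frac


# Toy splits: user 1 is in all three, user 2 has no test rating, user 3 has only a test rating
train_toy = pd.DataFrame({'userId': [1, 1, 1, 2, 2]})
cv_toy = pd.DataFrame({'userId': [1, 2]})
test_toy = pd.DataFrame({'userId': [1, 3]})
n_in_all, train_frac = per_user_coverage(train_toy, cv_toy, test_toy)
```

Applied to `train_df`, `cv_df`, and `test_df`, the count should equal the number of users who passed the three-rating minimum. The training fractions should cluster near 0.7.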


# Model Building

**Matrix Factorization:** represents each user and each movie with a 20-dimensional embedding (`user_factors` and `item_factors`), and `forward` predicts a rating as the dot product of the two vectors. The model has no bias terms. It is trained for 200 epochs with Adam (learning rate 1e-3, weight decay 1e-4) to minimize the MSE between predicted and observed ratings, and the weight decay acts as L2 regularization that limits overfitting on the sparse training data.
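Written out, the objective is approximately the following. The notation $\mathbf{p}_u$, $\mathbf{q}_i$, $\theta$, $\lambda$, and $\mathcal{B}$ is introduced here for illustration. PyTorch's `Adam` applies `weight_decay` by adding $\lambda\theta$ to each gradient, which matches adding $\tfrac{\lambda}{2}\lVert\theta\rVert_2^2$ to the loss:

$$\hat r_{ui} = \mathbf{p}_u^\top \mathbf{q}_i, \qquad \mathcal{L}(\mathcal{B}) = \frac{1}{|\mathcal{B}|}\sum_{(u,i)\in\mathcal{B}} \left(r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i\right)^2 + \frac{\lambda}{2}\lVert\theta\rVert_2^2, \qquad \lambda = 10^{-4}$$

Here $\mathcal{B}$ is a mini-batch of 128 observed ratings and $\theta$ collects all embedding weights. The embeddings start uniform in $[0, 0.05]$, so initial predictions are close to zero and the initial MSE is close to the mean squared rating. On the filtered data that is $\bar r^2 + s^2 \approx 3.54^2 + 1.03^2 \approx 13.6$, which is consistent with the first-epoch average loss of about 11.3 in the training log; that average already includes the first epoch's updates.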

**Funk SVD:** `funk_svd` factorizes the rating matrix into 30-dimensional user factors `P` and item factors `Q`, together with user and item bias terms. These parameters are learned by stochastic gradient descent on the squared prediction error over 200 epochs, with an initial learning rate of 0.02 that decays during training and a regularization strength of 0.01. RPCA is used to impute missing ratings, with the aim of improving prediction accuracy.
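A biased matrix factorization trained with per-rating SGD, in the style of Funk SVD, can be sketched in NumPy as follows. This is not the notebook's `funk_svd`: the function name, the Gaussian initialization, and the multiplicative per-epoch decay factor `decay=0.95` are assumptions, and the RPCA imputation step is omitted. Each prediction is `mu + b_u[u] + b_i[i] + P[u] @ Q[i]`.

```python
import numpy as np


def funk_svd_sketch(ratings, n_users, n_items, n_factors=30, lr=0.02, reg=0.01,
                    n_epochs=200, decay=0.95, seed=42):
    """Biased matrix factorization trained with per-rating SGD (illustrative sketch).

    ratings: sequence of (user_idx, item_idx, rating) triples with 0-based indices.
    """
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    mu = ratings[:, 2].mean()                          # global mean rating
    b_u = np.zeros(n_users)                            # user biases
    b_i = np.zeros(n_items)                            # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))       # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))       # item factors
    history = []
    for _ in range(n_epochs):
        sq_err = 0.0
        for idx in rng.permutation(len(ratings)):      # visit ratings in random order
            u, i, r = int(ratings[idx, 0]), int(ratings[idx, 1]), ratings[idx, 2]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            sq_err += err ** 2
            # Gradient steps on squared error with L2 penalties
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()                        # use pre-update user factors for Q
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
        history.append(sq_err / len(ratings))          # mean squared error seen this epoch
        lr *= decay                                    # learning-rate decay
    return mu, b_u, b_i, P, Q, history


toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
mu, b_u, b_i, P, Q, history = funk_svd_sketch(toy, n_users=3, n_items=3,
                                              n_epochs=100, lr=0.05, decay=1.0)
```

The user and item biases absorb systematic offsets such as a generous rater or a widely liked movie. As a result, the factors only need to model the remaining interaction effects.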

**NCF:** `build_ncf_model` combines a Generalized Matrix Factorization (GMF) branch and a multilayer perceptron (MLP) branch to predict ratings. It uses 64-dimensional embeddings with L2 regularization (`l2_lambda=1e-4`) and MLP layers of sizes [128, 64, 32]. `train_ncf` trains the model with Adam (lr=0.001) for up to 50 epochs, minimizing MSE loss with early stopping so that the model captures non-linear interactions without overfitting.
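The GMF + MLP structure can be sketched as a NumPy forward pass in the NeuMF-style layout. This illustration rests on assumptions and is not the notebook's Keras `build_ncf_model`. The parameter names, the initialization, and the single linear output layer are hypothetical, and dropout, batch normalization, and L2 regularization are left out.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_params(n_users, n_items, emb_dim=64, layers=(128, 64, 32), seed=0):
    """Random parameters for a GMF + MLP model (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    params = {
        'gmf_user': rng.normal(0, 0.01, (n_users, emb_dim)),  # separate embeddings per branch
        'gmf_item': rng.normal(0, 0.01, (n_items, emb_dim)),
        'mlp_user': rng.normal(0, 0.01, (n_users, emb_dim)),
        'mlp_item': rng.normal(0, 0.01, (n_items, emb_dim)),
    }
    mlp_layers, in_dim = [], 2 * emb_dim
    for out_dim in layers:
        W = rng.normal(0, np.sqrt(2.0 / in_dim), (in_dim, out_dim))  # He-style init
        mlp_layers.append((W, np.zeros(out_dim)))
        in_dim = out_dim
    params['mlp_layers'] = mlp_layers
    params['out_w'] = rng.normal(0, 0.01, (emb_dim + in_dim,))  # weights over fused vector
    params['out_b'] = 0.0
    return params


def ncf_forward(user_idx, item_idx, params):
    """Predict one rating per (user, item) pair."""
    # GMF branch: element-wise product of the user and item embeddings
    gmf = params['gmf_user'][user_idx] * params['gmf_item'][item_idx]
    # MLP branch: concatenated embeddings passed through ReLU layers
    h = np.concatenate([params['mlp_user'][user_idx], params['mlp_item'][item_idx]], axis=1)
    for W, b in params['mlp_layers']:
        h = relu(h @ W + b)
    # Fuse both branches and apply a linear output layer (rating regression)
    fused = np.concatenate([gmf, h], axis=1)
    return fused @ params['out_w'] + params['out_b']


params = init_params(n_users=5, n_items=7)
preds = ncf_forward(np.array([0, 1, 4]), np.array([2, 3, 6]), params)
```

The GMF branch keeps the linear, dot-product-like interaction of matrix factorization, while the MLP branch can learn non-linear interactions. Concatenating them lets the output layer weight both kinds of signal.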

**NCF + Contrastive Learning:** extends the NCF model from `build_ncf_model` with a `contrastive_loss` (InfoNCE) term in `train_ncf`, which aligns the user and item embeddings of both the GMF and MLP branches. Negative items are drawn with `generate_negative_samples`. The model minimizes a combined MSE and contrastive loss (`alpha=0.01`, `l2_lambda=1e-5`) with Adam (lr=0.0005) for 50 epochs, with the goal of improving embedding quality and generalization.
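The sketch below shows an InfoNCE term with sampled negatives in NumPy, plus its weighted combination with the MSE loss. The cosine-similarity scoring, the temperature of 0.1, and the function names are assumptions for illustration, and the notebook's `contrastive_loss` may score pairs differently. Only `alpha=0.01` comes from the description above.

```python
import numpy as np


def info_nce_loss(user_emb, pos_item_emb, neg_item_emb, temperature=0.1):
    """InfoNCE with one positive and K sampled negatives per user (sketch).

    user_emb: (B, d), pos_item_emb: (B, d), neg_item_emb: (B, K, d).
    """
    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    u = l2_normalize(user_emb)
    pos = l2_normalize(pos_item_emb)
    neg = l2_normalize(neg_item_emb)
    pos_logit = np.sum(u * pos, axis=-1, keepdims=True) / temperature   # (B, 1)
    neg_logits = np.einsum('bd,bkd->bk', u, neg) / temperature          # (B, K)
    logits = np.concatenate([pos_logit, neg_logits], axis=1)           # positive in column 0
    # Numerically stable log-softmax; the loss is the mean negative log-probability of the positive
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -log_probs[:, 0].mean()


def combined_loss(mse, contrastive, alpha=0.01):
    """Rating loss plus the weighted contrastive term."""
    return mse + alpha * contrastive


rng = np.random.default_rng(0)
users = rng.normal(size=(2, 4))
# Positive identical to the user, negatives pointing the opposite way: near-zero loss
aligned = info_nce_loss(users, users, np.stack([-users] * 3, axis=1))
# Positive opposite to the user, negatives identical: large loss
reversed_ = info_nce_loss(users, -users, np.stack([users] * 3, axis=1))
```

The small `alpha` keeps MSE as the main training signal. The contrastive term acts as a regularizer that pulls users toward the items they rated and away from the sampled negatives.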

## Original Matrix Factorization

In [None]:
def load_data(file_path):
    # Load the raw ratings file from the given path and validate its schema
    try:
        ratings_df = pd.read_csv(file_path)
        if not all(col in ratings_df.columns for col in ['userId', 'movieId', 'rating']):
            raise ValueError("ratings_df must contain 'userId', 'movieId', and 'rating' columns")
        print(f"Loaded data: {len(ratings_df)} ratings, {len(ratings_df['userId'].unique())} users, {len(ratings_df['movieId'].unique())} movies")
        return ratings_df
    except Exception as e:
        print(f"Error loading data: {e}")
        raise

def split_ratings_per_user(ratings_df, train_ratio=0.7, cv_ratio=0.15, test_ratio=0.15, random_state=456):
    # Same per-user 70/15/15 split as above, with a different default seed
    try:
        train_data = []
        cv_data = []
        test_data = []
        for user_id, user_ratings in ratings_df.groupby('userId'):
            if len(user_ratings) < 3:
                continue
            train_cv_ratings, test_ratings = train_test_split(
                user_ratings, test_size=test_ratio, random_state=random_state, shuffle=True
            )
            train_ratio_adjusted = train_ratio / (train_ratio + cv_ratio)
            train_ratings, cv_ratings = train_test_split(
                train_cv_ratings, test_size=1 - train_ratio_adjusted, random_state=random_state, shuffle=True
            )
            train_data.append(train_ratings)
            cv_data.append(cv_ratings)
            test_data.append(test_ratings)
        return pd.concat(train_data, ignore_index=True), pd.concat(cv_data, ignore_index=True), pd.concat(test_data, ignore_index=True)
    except Exception as e:
        # Fail loudly: returning the full DataFrame three times would silently
        # put identical data in the train, CV and test sets
        print(f"Error in split_ratings_per_user: {e}")
        raise

ratings_df = load_data('data/ml-latest-small/ratings.csv')
ratings_df_filtered = filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5)
train_ratings, cv_ratings, test_ratings = split_ratings_per_user(ratings_df_filtered, random_state=456)


userid2idx = {uid: idx for idx, uid in enumerate(ratings_df_filtered['userId'].unique())}
movieid2idx = {mid: idx for idx, mid in enumerate(ratings_df_filtered['movieId'].unique())}
n_users = len(userid2idx)
n_items = len(movieid2idx)

class Loader(Dataset):
    def __init__(self, ratings_df, userid2idx, movieid2idx):
        self.ratings = ratings_df.copy()
        self.userid2idx = userid2idx
        self.movieid2idx = movieid2idx
        self.idx2userid = {idx: uid for uid, idx in userid2idx.items()}
        self.idx2movieid = {idx: mid for mid, idx in movieid2idx.items()}
        self.ratings['user_idx'] = self.ratings['userId'].map(self.userid2idx)
        self.ratings['movie_idx'] = self.ratings['movieId'].map(self.movieid2idx)
        initial_len = len(self.ratings)
        self.ratings = self.ratings.dropna(subset=['user_idx', 'movie_idx'])
        dropped_rows = initial_len - len(self.ratings)
        if dropped_rows > 0:
            print(f"Warning: Dropped {dropped_rows} rows due to invalid user or movie IDs")
        self.ratings['user_idx'] = self.ratings['user_idx'].astype(int)
        self.ratings['movie_idx'] = self.ratings['movie_idx'].astype(int)
        self.data = torch.tensor(self.ratings[['user_idx', 'movie_idx']].values).long()
        self.y = torch.tensor(self.ratings['rating'].values).float()

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return self.data[idx], self.y[idx]

batch_size = 128
train_dataset = Loader(train_ratings, userid2idx, movieid2idx)
cv_dataset = Loader(cv_ratings, userid2idx, movieid2idx)
test_dataset = Loader(test_ratings, userid2idx, movieid2idx)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
cv_loader = DataLoader(cv_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

class MatrixFactorization(torch.nn.Module):
    def __init__(self, n_users, n_items, n_factors=20):
        super().__init__()
        self.user_factors = torch.nn.Embedding(n_users, n_factors)
        self.item_factors = torch.nn.Embedding(n_items, n_factors)
        self.user_factors.weight.data.uniform_(0, 0.05)
        self.item_factors.weight.data.uniform_(0, 0.05)

    def forward(self, data):
        users, items = data[:, 0], data[:, 1]
        return (self.user_factors(users) * self.item_factors(items)).sum(1)

    def predict(self, user, item):
        return self.forward(torch.stack([user, item], dim=1))

cuda = torch.cuda.is_available()
model = MatrixFactorization(n_users, n_items, n_factors=20)
if cuda:
    model = model.cuda()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
num_epochs = 200

for it in tqdm(range(num_epochs), desc="Training"):
    losses = []
    for x, y in train_loader:
        if cuda:
            x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        outputs = model(x)
        loss = loss_fn(outputs.squeeze(), y.type(torch.float32))
        losses.append(loss.item())
        loss.backward()
        optimizer.step()
    print(f"Epoch {it}, Loss: {sum(losses) / len(losses):.4f}")

def evaluate_dataset(loader, dataset_name, k=20, relevance_threshold=5.0, tolerance=0.5):
    model.eval()
    predicted_ratings = []
    actual_ratings = []
    user_indices = []
    movie_indices = []
    with torch.no_grad():
        for x, y in tqdm(loader, desc=f"Collecting Predictions ({dataset_name})"):
            if cuda:
                x = x.cuda()
            outputs = model(x).squeeze()
            predicted_ratings.extend(outputs.cpu().numpy())
            actual_ratings.extend(y.numpy())
            user_indices.extend(x[:, 0].cpu().numpy())
            movie_indices.extend(x[:, 1].cpu().numpy())
    eval_df = pd.DataFrame({
        'user_idx': user_indices,
        'movie_idx': movie_indices,
        'predicted_rating': predicted_ratings,
        'actual_rating': actual_ratings
    })
    invalid_user_idx = eval_df[~eval_df['user_idx'].isin(loader.dataset.idx2userid.keys())]['user_idx']
    invalid_movie_idx = eval_df[~eval_df['movie_idx'].isin(loader.dataset.idx2movieid.keys())]['movie_idx']
    if not invalid_user_idx.empty:
        print(f"Warning: Invalid user_idx found: {invalid_user_idx.unique()}")
    if not invalid_movie_idx.empty:
        print(f"Warning: Invalid movie_idx found: {invalid_movie_idx.unique()}")
        eval_df = eval_df[eval_df['movie_idx'].isin(loader.dataset.idx2movieid.keys())]
        print(f"Filtered out {len(invalid_movie_idx)} invalid movie_idx entries")
    eval_df['userId'] = eval_df['user_idx'].apply(lambda x: loader.dataset.idx2userid.get(x, -1))
    eval_df['movieId'] = eval_df['movie_idx'].apply(lambda x: loader.dataset.idx2movieid.get(x, -1))
    eval_df = eval_df[(eval_df['userId'] != -1) & (eval_df['movieId'] != -1)]
    if eval_df.empty:
        print(f"Error: No valid data left after filtering invalid indices for {dataset_name}")
        return
    eval_df['correct'] = (abs(eval_df['predicted_rating'] - eval_df['actual_rating']) <= tolerance).astype(int)
    accuracy = eval_df['correct'].mean()
    precision_scores = []
    recall_scores = []
    f1_scores = []
    ndcg_scores = []
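    # Note: IDCG is computed from the relevant items inside the top-k list only, so this
    # scores how well relevant items are ordered within the retrieved list; standard NDCG@k
    # builds the ideal ranking from all of the user's relevant items.
    def ndcg_at_k(actual, predicted, k):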
        sorted_indices = np.argsort(predicted)[::-1][:k]
        actual_k = actual[sorted_indices]
        relevant = (actual_k >= relevance_threshold).astype(int)
        dcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(relevant))
        ideal_relevant = np.sort(relevant)[::-1][:k]
        idcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(ideal_relevant))
        return dcg / idcg if idcg > 0 else 0.0
    for user_id in tqdm(eval_df['userId'].unique(), desc=f"Evaluating Users ({dataset_name})"):
        user_data = eval_df[eval_df['userId'] == user_id]
        if len(user_data) < k:
            user_idx = loader.dataset.userid2idx[user_id]
            all_movie_indices = list(loader.dataset.movieid2idx.values())
            user_tensor = torch.tensor([user_idx] * len(all_movie_indices)).long()
            movie_tensor = torch.tensor(all_movie_indices).long()
            data = torch.stack([user_tensor, movie_tensor], dim=1)
            if cuda:
                data = data.cuda()
            with torch.no_grad():
                preds = model(data).squeeze().cpu().numpy()
            user_data = pd.DataFrame({
                'movie_idx': all_movie_indices,
                'predicted_rating': preds,
                'actual_rating': [eval_df[(eval_df['userId'] == user_id) & (eval_df['movie_idx'] == idx)]['actual_rating'].iloc[0] if idx in user_data['movie_idx'].values else np.nan for idx in all_movie_indices]
            })
            user_data['actual_rating'] = user_data['actual_rating'].fillna(3.0)
        top_k_pred = user_data.nlargest(k, 'predicted_rating')
        actual_ratings_k = top_k_pred['actual_rating'].values
        predicted_ratings_k = top_k_pred['predicted_rating'].values
        relevant = (actual_ratings_k >= relevance_threshold).astype(int)
        precision = np.mean(relevant)
        precision_scores.append(precision)
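        all_relevant = (user_data['actual_rating'] >= relevance_threshold).sum()
        # Reset recall per user so a previous user's value is never reused below
        recall = 0.0
        if all_relevant > 0:
            recall = np.sum(relevant) / all_relevant
            recall_scores.append(recall)
        # F1 is averaged only over users with non-zero precision and recall
        if precision > 0 and recall > 0:
            f1 = 2 * (precision * recall) / (precision + recall)
            f1_scores.append(f1)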
        ndcg = ndcg_at_k(actual_ratings_k, predicted_ratings_k, k)
        ndcg_scores.append(ndcg)
    print(f"\nMetrics for {dataset_name} Dataset:")
    print(f"Accuracy (±{tolerance} tolerance): {accuracy:.4f}")
    print(f"Precision@{k}: {np.mean(precision_scores):.4f}")
    print(f"Recall@{k}: {np.mean(recall_scores):.4f}")
    print(f"F1-score@{k}: {np.mean(f1_scores):.4f}")
    print(f"NDCG@{k}: {np.mean(ndcg_scores):.4f}")

train_pairs = set(zip(train_ratings['userId'], train_ratings['movieId']))
cv_pairs = set(zip(cv_ratings['userId'], cv_ratings['movieId']))
test_pairs = set(zip(test_ratings['userId'], test_ratings['movieId']))
print("\nSplit Integrity Check:")
print("Train-CV overlap:", len(train_pairs & cv_pairs))
print("Train-Test overlap:", len(train_pairs & test_pairs))
print("CV-Test overlap:", len(cv_pairs & test_pairs))

evaluate_dataset(train_loader, "Training", relevance_threshold=5.0)
evaluate_dataset(cv_loader, "Cross-Validation", relevance_threshold=5.0)
evaluate_dataset(test_loader, "Test", relevance_threshold=5.0)

Loaded data: 100836 ratings, 610 users, 9724 movies
Filtered data: 90274 ratings, 610 users, 3650 movies



Epoch 0, Loss: 11.2868
Epoch 1, Loss: 4.4828
Epoch 2, Loss: 2.0910
Epoch 3, Loss: 1.3772
Epoch 4, Loss: 1.0628
Epoch 5, Loss: 0.9077
Epoch 6, Loss: 0.8228
Epoch 7, Loss: 0.7755
Epoch 8, Loss: 0.7487
Epoch 9, Loss: 0.7330
Epoch 10, Loss: 0.7239
Epoch 11, Loss: 0.7189
Epoch 12, Loss: 0.7160
Epoch 13, Loss: 0.7147
Epoch 14, Loss: 0.7127
Epoch 15, Loss: 0.7126
Epoch 16, Loss: 0.7118
Epoch 17, Loss: 0.7109
Epoch 18, Loss: 0.7113
Epoch 19, Loss: 0.7103
Epoch 20, Loss: 0.7123
Epoch 21, Loss: 0.7101
Epoch 22, Loss: 0.7091
Epoch 23, Loss: 0.7091
Epoch 24, Loss: 0.7093
Epoch 25, Loss: 0.7081
Epoch 26, Loss: 0.7075
Epoch 27, Loss: 0.7071
Epoch 28, Loss: 0.7037
Epoch 29, Loss: 0.7034
Epoch 30, Loss: 0.7019
Epoch 31, Loss: 0.7001
Epoch 32, Loss: 0.6971
Epoch 33, Loss: 0.6929
Epoch 34, Loss: 0.6907
Epoch 35, Loss: 0.6849
Epoch 36, Loss: 0.6789
Epoch 37, Loss: 0.6716
Epoch 38, Loss: 0.6637
Epoch 39, Loss: 0.6544
Epoch 40, Loss: 0.6443
Epoch 41, Loss: 0.6331
Epoch 42, Loss: 0.6206
Epoch 43, Loss: 0.60



Metrics for Training Dataset:
Accuracy (±0.5 tolerance): 0.7987
Precision@20: 0.4011
Recall@20: 0.7822
F1-score@20: 0.4711
NDCG@20: 0.7678




Metrics for Cross-Validation Dataset:
Accuracy (±0.5 tolerance): 0.4008
Precision@20: 0.0666
Recall@20: 0.3688
F1-score@20: 0.2320
NDCG@20: 0.2238




Metrics for Test Dataset:
Accuracy (±0.5 tolerance): 0.4088
Precision@20: 0.0716
Recall@20: 0.3712
F1-score@20: 0.2400
NDCG@20: 0.2311


## Tuned Matrix Factorization

In [None]:

class MatrixFactorization(torch.nn.Module):
    def __init__(self, n_users, n_items, n_factors=20):
        super().__init__()
        self.user_factors = torch.nn.Embedding(n_users, n_factors)
        self.item_factors = torch.nn.Embedding(n_items, n_factors)
        self.user_factors.weight.data.uniform_(0, 0.05)
        self.item_factors.weight.data.uniform_(0, 0.05)

    def forward(self, data):
        users, items = data[:, 0], data[:, 1]
        return (self.user_factors(users) * self.item_factors(items)).sum(1)

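    def predict(self, user, item):
        # forward() expects a single (batch, 2) tensor of [user_idx, movie_idx] pairs
        return self.forward(torch.stack([user, item], dim=1))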

model = MatrixFactorization(n_users, n_items, n_factors=20)
if cuda:
    model = model.cuda()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
num_epochs = 200

for it in tqdm(range(num_epochs), desc="Training"):
    losses = []
    for x, y in train_loader:
        if cuda:
            x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        outputs = model(x)
        loss = loss_fn(outputs.squeeze(), y.type(torch.float32))
        losses.append(loss.item())
        loss.backward()
        optimizer.step()
    print(f"Epoch {it}, Loss: {sum(losses) / len(losses):.4f}")


def evaluate_dataset(loader, dataset_name, k=20, relevance_threshold=5.0, tolerance=0.5):
    model.eval()
    predicted_ratings = []
    actual_ratings = []
    user_indices = []
    movie_indices = []

    with torch.no_grad():
        for x, y in tqdm(loader, desc=f"Collecting Predictions ({dataset_name})"):
            if cuda:
                x = x.cuda()
            outputs = model(x).squeeze()
            predicted_ratings.extend(outputs.cpu().numpy())
            actual_ratings.extend(y.numpy())
            user_indices.extend(x[:, 0].cpu().numpy())
            movie_indices.extend(x[:, 1].cpu().numpy())

    eval_df = pd.DataFrame({
        'user_idx': user_indices,
        'movie_idx': movie_indices,
        'predicted_rating': predicted_ratings,
        'actual_rating': actual_ratings
    })
    eval_df['userId'] = eval_df['user_idx'].apply(lambda x: loader.dataset.idx2userid[x])
    eval_df['movieId'] = eval_df['movie_idx'].apply(lambda x: loader.dataset.idx2movieid[x])


    eval_df['correct'] = (abs(eval_df['predicted_rating'] - eval_df['actual_rating']) <= tolerance).astype(int)
    accuracy = eval_df['correct'].mean()

    precision_scores = []
    recall_scores = []
    f1_scores = []
    ndcg_scores = []

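    # Note: IDCG is computed from the relevant items inside the top-k list only, so this
    # scores how well relevant items are ordered within the retrieved list; standard NDCG@k
    # builds the ideal ranking from all of the user's relevant items.
    def ndcg_at_k(actual, predicted, k):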
        sorted_indices = np.argsort(predicted)[::-1][:k]
        actual_k = actual[sorted_indices]
        relevant = (actual_k >= relevance_threshold).astype(int)
        dcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(relevant))
        ideal_relevant = np.sort(relevant)[::-1][:k]
        idcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(ideal_relevant))
        return dcg / idcg if idcg > 0 else 0.0

    for user_id in tqdm(eval_df['userId'].unique(), desc=f"Evaluating Users ({dataset_name})"):
        user_data = eval_df[eval_df['userId'] == user_id]


        if len(user_data) < k:
            user_idx = loader.dataset.userid2idx[user_id]
            all_movie_indices = list(loader.dataset.movieid2idx.values())
            user_tensor = torch.tensor([user_idx] * len(all_movie_indices)).long()
            movie_tensor = torch.tensor(all_movie_indices).long()
            data = torch.stack([user_tensor, movie_tensor], dim=1)
            if cuda:
                data = data.cuda()
            with torch.no_grad():
                preds = model(data).squeeze().cpu().numpy()
            user_data = pd.DataFrame({
                'movie_idx': all_movie_indices,
                'predicted_rating': preds,
                'actual_rating': [eval_df[(eval_df['userId'] == user_id) & (eval_df['movie_idx'] == idx)]['actual_rating'].iloc[0] if idx in user_data['movie_idx'].values else np.nan for idx in all_movie_indices]
            })
            user_data['actual_rating'] = user_data['actual_rating'].fillna(3.0)
        top_k_pred = user_data.nlargest(k, 'predicted_rating')
        actual_ratings_k = top_k_pred['actual_rating'].values
        predicted_ratings_k = top_k_pred['predicted_rating'].values

        relevant = (actual_ratings_k >= relevance_threshold).astype(int)

        precision = np.mean(relevant)
        precision_scores.append(precision)

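        all_relevant = (user_data['actual_rating'] >= relevance_threshold).sum()
        # Reset recall per user so a previous user's value is never reused below
        recall = 0.0
        if all_relevant > 0:
            recall = np.sum(relevant) / all_relevant
            recall_scores.append(recall)

        # F1 is averaged only over users with non-zero precision and recall
        if precision > 0 and recall > 0:
            f1 = 2 * (precision * recall) / (precision + recall)
            f1_scores.append(f1)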

        ndcg = ndcg_at_k(actual_ratings_k, predicted_ratings_k, k)
        ndcg_scores.append(ndcg)

    print(f"\nMetrics for {dataset_name} Dataset:")
    print(f"Accuracy (±{tolerance} tolerance): {accuracy:.4f}")
    print(f"Precision@{k}: {np.mean(precision_scores):.4f}")
    print(f"Recall@{k}: {np.mean(recall_scores):.4f}")
    print(f"F1-score@{k}: {np.mean(f1_scores):.4f}")
    print(f"NDCG@{k}: {np.mean(ndcg_scores):.4f}")

train_pairs = set(zip(train_ratings['userId'], train_ratings['movieId']))
cv_pairs = set(zip(cv_ratings['userId'], cv_ratings['movieId']))
test_pairs = set(zip(test_ratings['userId'], test_ratings['movieId']))
print("\nSplit Integrity Check:")
print("Train-CV overlap:", len(train_pairs & cv_pairs))
print("Train-Test overlap:", len(train_pairs & test_pairs))
print("CV-Test overlap:", len(cv_pairs & test_pairs))

evaluate_dataset(train_loader, "Training", relevance_threshold=5.0)
evaluate_dataset(cv_loader, "Cross-Validation", relevance_threshold=5.0)
evaluate_dataset(test_loader, "Test", relevance_threshold=5.0)

Epoch 0, Loss: 11.2896
Epoch 1, Loss: 4.4770
Epoch 2, Loss: 2.0897
Epoch 3, Loss: 1.3748
Epoch 4, Loss: 1.0641
Epoch 5, Loss: 0.9072
Epoch 6, Loss: 0.8228
Epoch 7, Loss: 0.7766
Epoch 8, Loss: 0.7490
Epoch 9, Loss: 0.7338
Epoch 10, Loss: 0.7245
Epoch 11, Loss: 0.7188
Epoch 12, Loss: 0.7163
Epoch 13, Loss: 0.7138
Epoch 14, Loss: 0.7137
Epoch 15, Loss: 0.7124
Epoch 16, Loss: 0.7123
Epoch 17, Loss: 0.7108
Epoch 18, Loss: 0.7114
Epoch 19, Loss: 0.7112
Epoch 20, Loss: 0.7106
Epoch 21, Loss: 0.7101
Epoch 22, Loss: 0.7101
Epoch 23, Loss: 0.7086
Epoch 24, Loss: 0.7097
Epoch 25, Loss: 0.7081
Epoch 26, Loss: 0.7072
Epoch 27, Loss: 0.7062
Epoch 28, Loss: 0.7053
Epoch 29, Loss: 0.7046
Epoch 30, Loss: 0.7025
Epoch 31, Loss: 0.6994
Epoch 32, Loss: 0.6960
Epoch 33, Loss: 0.6924
Epoch 34, Loss: 0.6882
Epoch 35, Loss: 0.6830
Epoch 36, Loss: 0.6777
Epoch 37, Loss: 0.6709
Epoch 38, Loss: 0.6613
Epoch 39, Loss: 0.6526
Epoch 40, Loss: 0.6413
Epoch 41, Loss: 0.6283
Epoch 42, Loss: 0.6161
Epoch 43, Loss: 0.6027
Epoch 44, Loss: 0.5874
Epoch 45, Loss: 0.5730
Epoch 46, Loss: 0.5576
Epoch 47, Loss: 0.5431
Epoch 48, Loss: 0.5289
Epoch 49, Loss: 0.5140
Epoch 50, Loss: 0.5009
Epoch 51, Loss: 0.4867
Epoch 52, Loss: 0.4748
Epoch 53, Loss: 0.4623
Epoch 54, Loss: 0.4491
Epoch 55, Loss: 0.4386
Epoch 56, Loss: 0.4271
Epoch 57, Loss: 0.4157
Epoch 58, Loss: 0.4064
Epoch 59, Loss: 0.3967
Epoch 60, Loss: 0.3870
Epoch 61, Loss: 0.3782
Epoch 62, Loss: 0.3693
Epoch 63, Loss: 0.3614
Epoch 64, Loss: 0.3539
Epoch 65, Loss: 0.3465
Epoch 66, Loss: 0.3392
Epoch 67, Loss: 0.3326
Epoch 68, Loss: 0.3260
Epoch 69, Loss: 0.3204
Epoch 70, Loss: 0.3144
Epoch 71, Loss: 0.3091
Epoch 72, Loss: 0.3043
Epoch 73, Loss: 0.2985
Epoch 74, Loss: 0.2942
Epoch 75, Loss: 0.2903
Epoch 76, Loss: 0.2856
Epoch 77, Loss: 0.2817
Epoch 78, Loss: 0.2782
Epoch 79, Loss: 0.2751
Epoch 80, Loss: 0.2713
Epoch 81, Loss: 0.2681
Epoch 82, Loss: 0.2649
Epoch 83, Loss: 0.2623
Epoch 84, Loss: 0.2594
Epoch 85, Loss: 0.2566
Epoch 86, Loss: 0.2550
Epoch 87, Loss: 0.2519
Epoch 88, Loss: 0.2501
Epoch 89, Loss: 0.2482
Epoch 90, Loss: 0.2460
Epoch 91, Loss: 0.2436
Epoch 92, Loss: 0.2421
Epoch 93, Loss: 0.2406
Epoch 94, Loss: 0.2389
Epoch 95, Loss: 0.2371
Epoch 96, Loss: 0.2359
Epoch 97, Loss: 0.2344
Epoch 98, Loss: 0.2328
Epoch 99, Loss: 0.2317
Epoch 100, Loss: 0.2307
Epoch 101, Loss: 0.2287
Epoch 102, Loss: 0.2280
Epoch 103, Loss: 0.2272
Epoch 104, Loss: 0.2257
Epoch 105, Loss: 0.2250
Epoch 106, Loss: 0.2237
Epoch 107, Loss: 0.2230
Epoch 108, Loss: 0.2220
Epoch 109, Loss: 0.2213
Epoch 110, Loss: 0.2204
Epoch 111, Loss: 0.2194
Epoch 112, Loss: 0.2185
Epoch 113, Loss: 0.2181
Epoch 114, Loss: 0.2175
Epoch 115, Loss: 0.2160
Epoch 116, Loss: 0.2160
Epoch 117, Loss: 0.2155
Epoch 118, Loss: 0.2139
Epoch 119, Loss: 0.2137
Epoch 120, Loss: 0.2133
Epoch 121, Loss: 0.2130
Epoch 122, Loss: 0.2123
Epoch 123, Loss: 0.2117
Epoch 124, Loss: 0.2110
Epoch 125, Loss: 0.2107
Epoch 126, Loss: 0.2099
Epoch 127, Loss: 0.2098
Epoch 128, Loss: 0.2092
Epoch 129, Loss: 0.2088
Epoch 130, Loss: 0.2082
Epoch 131, Loss: 0.2075
Epoch 132, Loss: 0.2073
Epoch 133, Loss: 0.2073
Epoch 134, Loss: 0.2066
Epoch 135, Loss: 0.2062
Epoch 136, Loss: 0.2063
Epoch 137, Loss: 0.2054
Epoch 138, Loss: 0.2049
Epoch 139, Loss: 0.2048
Epoch 140, Loss: 0.2043
Epoch 141, Loss: 0.2042
Epoch 142, Loss: 0.2038
Epoch 143, Loss: 0.2037
Epoch 144, Loss: 0.2035
Epoch 145, Loss: 0.2031
Epoch 146, Loss: 0.2024
Epoch 147, Loss: 0.2025
Epoch 148, Loss: 0.2023
Epoch 149, Loss: 0.2020
Epoch 150, Loss: 0.2014
Epoch 151, Loss: 0.2014
Epoch 152, Loss: 0.2014
Epoch 153, Loss: 0.2007
Epoch 154, Loss: 0.2007
Epoch 155, Loss: 0.2002
Epoch 156, Loss: 0.2003
Epoch 157, Loss: 0.2002
Epoch 158, Loss: 0.1997
Epoch 159, Loss: 0.1993
Epoch 160, Loss: 0.1997
Epoch 161, Loss: 0.1992
Epoch 162, Loss: 0.1988
Epoch 163, Loss: 0.1984
Epoch 164, Loss: 0.1985
Epoch 165, Loss: 0.1984
Epoch 166, Loss: 0.1980
Epoch 167, Loss: 0.1980
Epoch 168, Loss: 0.1977
Epoch 169, Loss: 0.1978
Epoch 170, Loss: 0.1974
Epoch 171, Loss: 0.1970
Epoch 172, Loss: 0.1973
Epoch 173, Loss: 0.1971
Epoch 174, Loss: 0.1963
Epoch 175, Loss: 0.1966
Epoch 176, Loss: 0.1964
Epoch 177, Loss: 0.1964
Epoch 178, Loss: 0.1963
Epoch 179, Loss: 0.1958
Epoch 180, Loss: 0.1958
Epoch 181, Loss: 0.1953
Epoch 182, Loss: 0.1952
Epoch 183, Loss: 0.1953
Epoch 184, Loss: 0.1955
Epoch 185, Loss: 0.1950
Epoch 186, Loss: 0.1952
Epoch 187, Loss: 0.1945
Epoch 188, Loss: 0.1951
Epoch 189, Loss: 0.1947
Epoch 190, Loss: 0.1944
Epoch 191, Loss: 0.1941
Epoch 192, Loss: 0.1943
Epoch 193, Loss: 0.1940
Epoch 194, Loss: 0.1936
Epoch 195, Loss: 0.1940
Epoch 196, Loss: 0.1935
Epoch 197, Loss: 0.1933
Epoch 198, Loss: 0.1937
Epoch 199, Loss: 0.1935
Training: 100%|██████████| 200/200 [05:21<00:00,  1.61s/it]

Split Integrity Check:
Train-CV overlap: 0
Train-Test overlap: 0
CV-Test overlap: 0


Collecting Predictions (Training): 100%|██████████| 490/490 [00:00<00:00, 696.15it/s]
Evaluating Users (Training): 100%|██████████| 610/610 [00:08<00:00, 74.47it/s]



Metrics for Training Dataset:
Accuracy (±0.5 tolerance): 0.8000
Precision@20: 0.3960
Recall@20: 0.7689
F1-score@20: 0.4671
NDCG@20: 0.7575


Collecting Predictions (Cross-Validation): 100%|██████████| 109/109 [00:00<00:00, 641.50it/s]
Evaluating Users (Cross-Validation): 100%|██████████| 610/610 [00:19<00:00, 32.07it/s]



Metrics for Cross-Validation Dataset:
Accuracy (±0.5 tolerance): 0.4004
Precision@20: 0.0677
Recall@20: 0.3586
F1-score@20: 0.2343
NDCG@20: 0.2217


Collecting Predictions (Test): 100%|██████████| 109/109 [00:00<00:00, 687.07it/s]
Evaluating Users (Test): 100%|██████████| 610/610 [00:21<00:00, 28.63it/s]


Metrics for Test Dataset:
Accuracy (±0.5 tolerance): 0.4103
Precision@20: 0.0719
Recall@20: 0.3739
F1-score@20: 0.2377
NDCG@20: 0.2372
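
The evaluation loop above scores each user's top-K list with binary relevance: a rating at or above `relevance_threshold` counts as relevant, precision is the share of retrieved items that are relevant, and recall divides the hits by all of the user's relevant items. A minimal per-user sketch of these definitions follows; `metrics_at_k` is a hypothetical helper, not part of the notebook. It deliberately uses the common NDCG convention in which the ideal ranking places all of the user's relevant items (up to K) first. The `ndcg_at_k` helper shown in the Funk SVD section instead builds its ideal list from the retrieved top-K items only, which can only produce an equal or higher score.

```python
import numpy as np

def metrics_at_k(actual, predicted, k=20, threshold=5.0):
    """Binary-relevance precision, recall, F1 and NDCG at k for one user (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    top = np.argsort(-predicted, kind="stable")[:k]       # indices of the k highest predictions
    rel_at_k = (actual[top] >= threshold).astype(float)   # 1 if a retrieved item is relevant
    n_relevant = int((actual >= threshold).sum())         # all relevant items for this user

    precision = rel_at_k.mean() if len(rel_at_k) else 0.0
    recall = rel_at_k.sum() / n_relevant if n_relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

    discounts = 1.0 / np.log2(np.arange(2, len(rel_at_k) + 2))   # 1 / log2(rank + 1)
    dcg = float((rel_at_k * discounts).sum())
    # Ideal DCG: every relevant item (up to the list length) ranked ahead of non-relevant ones
    n_ideal = min(n_relevant, len(rel_at_k))
    idcg = float(discounts[:n_ideal].sum())
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, f1, ndcg

# Toy user: two relevant items (rating 5.0), only one of which lands in the top 3
p, r, f1, ndcg = metrics_at_k([5, 3, 5, 4, 2], [4.8, 4.9, 3.0, 4.0, 1.0], k=3)
print(f"P@3={p:.3f} R@3={r:.3f} F1@3={f1:.3f} NDCG@3={ndcg:.3f}")
```

For this toy user, the notebook's `ndcg_at_k` would return 1/log2(3) ≈ 0.631 rather than ≈ 0.387, because the relevant item that was not retrieved never enters its IDCG. The sketch also returns F1 = 0 for a user without a hit, whereas the notebook leaves such users out of the F1 average.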





## FunkSVD
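
Funk SVD, as trained by `funk_svd` in the next cell, predicts $\hat{r}_{ui} = \mu + b_u + b_i + p_u^\top q_i$ and, after every observed rating, takes an L2-regularised stochastic gradient step on the biases and on the factor vectors $p_u$ and $q_i$. The core update can be sketched on synthetic data as follows. `funk_svd_sgd` is a hypothetical, stripped-down helper: it omits the notebook's error clipping, update clipping and learning-rate decay, and the toy rank-2 data is generated purely for illustration.

```python
import numpy as np

def funk_svd_sgd(users, items, ratings, n_users, n_items, k=8, lr=0.02, reg=0.01, epochs=30, seed=0):
    """Biased matrix factorisation trained with per-rating SGD (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    mu = ratings.mean()                                  # global mean rating
    P = rng.normal(0, 0.1, (n_users, k))                 # user factors p_u
    Q = rng.normal(0, 0.1, (n_items, k))                 # item factors q_i
    bu, bi = np.zeros(n_users), np.zeros(n_items)        # user and item biases
    rmse_hist = []
    for _ in range(epochs):
        sq_err = 0.0
        for idx in rng.permutation(len(ratings)):        # reshuffle the ratings every epoch
            u, i = users[idx], items[idx]
            err = ratings[idx] - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            sq_err += err ** 2
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update p_u and q_i together so both steps use the pre-update values
            P[u], Q[i] = P[u] + lr * (err * Q[i] - reg * P[u]), Q[i] + lr * (err * P[u] - reg * Q[i])
        rmse_hist.append(float(np.sqrt(sq_err / len(ratings))))
    return mu, bu, bi, P, Q, rmse_hist

# Toy data: 30 users x 20 items from a rank-2 model, roughly 40% of cells observed
rng = np.random.default_rng(42)
true_scores = 3.5 + rng.normal(0, 0.7, (30, 2)) @ rng.normal(0, 0.7, (2, 20))
obs_u, obs_i = np.nonzero(rng.random((30, 20)) < 0.4)
obs_r = np.clip(true_scores[obs_u, obs_i], 0.5, 5.0)
mu, bu, bi, P, Q, rmse_hist = funk_svd_sgd(obs_u, obs_i, obs_r, 30, 20)
print(f"epoch 1 RMSE {rmse_hist[0]:.3f} -> epoch {len(rmse_hist)} RMSE {rmse_hist[-1]:.3f}")
```

The tuple assignment mirrors how the notebook computes `P_update` and `Q_update` from the current `P[u]` and `Q[i]` before applying either one.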

In [None]:
def split_ratings_per_user(ratings_df, train_ratio=0.7, cv_ratio=0.15, test_ratio=0.15, random_state=456):
    # Split each user's ratings separately so every retained user appears in train, CV and test
    try:
        train_data = []
        cv_data = []
        test_data = []
        for user_id, user_ratings in ratings_df.groupby('userId'):
            # Too few ratings to place at least one in each split
            if len(user_ratings) < 3:
                continue
            # Hold out test_ratio of this user's ratings as the test set first
            train_cv_ratings, test_ratings = train_test_split(
                user_ratings, test_size=test_ratio, random_state=random_state, shuffle=True
            )
            # Re-express the train share relative to the remaining train+CV ratings (0.7 / 0.85)
            train_ratio_adjusted = train_ratio / (train_ratio + cv_ratio)
            train_ratings, cv_ratings = train_test_split(
                train_cv_ratings, test_size=1 - train_ratio_adjusted, random_state=random_state, shuffle=True
            )
            train_data.append(train_ratings)
            cv_data.append(cv_ratings)
            test_data.append(test_ratings)
        return pd.concat(train_data, ignore_index=True), pd.concat(cv_data, ignore_index=True), pd.concat(test_data, ignore_index=True)
    except Exception as e:
        # Re-raise: returning the full ratings_df for all three splits would silently leak test ratings into training
        print(f"Error in split_ratings_per_user: {e}")
        raise

def create_rating_matrix(ratings_df, n_users, n_items, userid2idx, movieid2idx):
    try:
        data = ratings_df['rating'].values
        rows = [userid2idx[uid] for uid in ratings_df['userId']]
        cols = [movieid2idx[mid] for mid in ratings_df['movieId']]
        return scipy.sparse.csr_matrix((data, (rows, cols)), shape=(n_users, n_items))
    except Exception as e:
        print(f"Error in create_rating_matrix: {e}")
        return None

def rpca(M, lambda_=0.00001, mu=1.0, max_iter=100, tol=1e-5, k_svd=150):
    # Decompose M into a low-rank part L and a sparse part S (M ≈ L + S) with an
    # augmented-Lagrangian-style loop that uses a rank-k_svd truncated SVD
    try:
        M_sparse = scipy.sparse.csr_matrix(M)
        Y = M_sparse.copy()
        L = scipy.sparse.csr_matrix(M.shape)
        S = scipy.sparse.csr_matrix(M.shape)
        for it in tqdm(range(max_iter), desc="RPCA"):
            Y_dense = Y.toarray()
            S_dense = S.toarray()
            # Low-rank step: truncated SVD, then shrink the singular values by 1/mu
            U, Sigma, Vt = svds(Y_dense - S_dense, k=min(k_svd, min(M.shape) - 1))
            Sigma_shrunk = np.maximum(Sigma - 1 / mu, 0)
            L_new = U @ np.diag(Sigma_shrunk) @ Vt
            # Sparse step: element-wise soft thresholding by lambda_/mu
            S_new = np.sign(Y_dense - L_new) * np.maximum(np.abs(Y_dense - L_new) - lambda_ / mu, 0)
            # Dual step on the constraint residual M - L - S
            Y = Y + mu * (M_sparse - scipy.sparse.csr_matrix(L_new) - scipy.sparse.csr_matrix(S_new))
            if np.linalg.norm(L_new - L.toarray(), 'fro') < tol and np.linalg.norm(S_new - S.toarray(), 'fro') < tol:
                print(f"RPCA converged after {it + 1} iterations")
                break
            L = scipy.sparse.csr_matrix(L_new)
            S = scipy.sparse.csr_matrix(S_new)
        # Clip to the rating scale, rescale to mean 3.54 and std 1.03, then clip again
        L_new = np.clip(L_new, 0.5, 5.0)
        target_std = 1.03
        L_new = (L_new - L_new.mean()) / (L_new.std() + 1e-8) * target_std + 3.54
        L_new = np.clip(L_new, 0.5, 5.0)
        # Report how the rescaled low-rank estimates fall into half-star bins
        L_flat = L_new.flatten()
        bins = np.arange(0.5, 5.5, 0.5)
        hist, _ = np.histogram(L_flat, bins=bins)
        for i, count in enumerate(hist):
            print(f"[{bins[i]:.1f}, {bins[i+1]:.1f}]: {count} ({count/len(L_flat):.4f})")
        return L_new, S_new
    except Exception as e:
        print(f"Error in rpca: {e}")
        return M, np.zeros_like(M)

def funk_svd(ratings_df, L, n_users, n_items, userid2idx, movieid2idx, k=30, lr=0.02, reg=0.01, epochs=200):
    # Biased matrix factorisation trained with per-rating SGD:
    #   r_hat(u, i) = global_mean + user_bias[u] + item_bias[i] + P[u] . Q[i]
    # Note: the RPCA low-rank matrix L is accepted as an argument but is not used in training.
    global_mean = ratings_df['rating'].mean()
    try:
        P = np.random.normal(0, 0.1, (n_users, k))
        Q = np.random.normal(0, 0.1, (n_items, k))
        user_bias = np.zeros(n_users)
        item_bias = np.zeros(n_items)
        initial_lr = lr
        errors = []
        for epoch in tqdm(range(epochs), desc="Funk SVD Training"):
            # Inverse-time learning-rate decay
            lr = initial_lr / (1 + 0.005 * epoch)
            epoch_error = 0
            count = 0
            max_error = -np.inf
            min_error = np.inf
            error_5_0 = []
            # Visit the training ratings in a fresh random order each epoch
            for row in ratings_df.sample(frac=1).itertuples(index=False):
                u = userid2idx.get(row.userId, -1)
                i = movieid2idx.get(row.movieId, -1)
                if u == -1 or i == -1:
                    continue
                r_ui = row.rating
                pred = global_mean + user_bias[u] + item_bias[i] + np.dot(P[u], Q[i])
                # Clip the residual so a single rating cannot produce a huge update
                error = np.clip(r_ui - pred, -5.0, 5.0)
                epoch_error += error ** 2
                count += 1
                max_error = max(max_error, error)
                min_error = min(min_error, error)
                if r_ui == 5.0:
                    error_5_0.append(error)
                # Regularised SGD steps; both factor updates use the pre-update P[u] and Q[i]
                user_bias[u] += lr * (error - reg * user_bias[u])
                item_bias[i] += lr * (error - reg * item_bias[i])
                P_update = lr * (error * Q[i] - reg * P[u])
                Q_update = lr * (error * P[u] - reg * Q[i])
                P[u] += np.clip(P_update, -1.0, 1.0)
                Q[i] += np.clip(Q_update, -1.0, 1.0)
            if count == 0:
                continue
            rmse = np.sqrt(epoch_error / count)
            if np.isnan(rmse):
                # Halve the base rate; the decay schedule at the top of the loop applies it next epoch
                print(f"NaN detected at epoch {epoch}. Reducing learning rate.")
                initial_lr *= 0.5
                continue
            errors.append(rmse)
            mean_error_5_0 = np.mean(error_5_0) if error_5_0 else 0.0
            if epoch % 10 == 0:
                print(f"Epoch {epoch}, RMSE: {rmse:.4f}, Max Error: {max_error:.4f}, Min Error: {min_error:.4f}, Mean Error for 5.0: {mean_error_5_0:.4f}")
        if errors:
            print(f"Final Training RMSE: {errors[-1]:.4f}")
        return P, Q, user_bias, item_bias, global_mean
    except Exception as e:
        print(f"Error in funk_svd: {e}")
        return None, None, None, None, global_mean

def evaluate_dataset_funk_svd(ratings_df, P, Q, user_bias, item_bias, global_mean, dataset_name, userid2idx, movieid2idx, movie_names, k=20, relevance_threshold=5.0, tolerance=0.25):
    try:
        eval_df = ratings_df.copy()
        # Fallbacks for movies without a learned embedding: mean item bias and mean item factor vector
        avg_item_bias = np.mean(item_bias) if len(item_bias) > 0 else 0.0
        avg_Q = np.mean(Q, axis=0)

        def predict(uid, mid):
            # Unknown user: fall back to the global mean rating
            if uid not in userid2idx:
                return global_mean
            u = userid2idx[uid]
            # Known user, unknown movie: use the average item bias and item factors
            if mid not in movieid2idx:
                return global_mean + user_bias[u] + avg_item_bias + np.dot(P[u], avg_Q)
            i = movieid2idx[mid]
            return global_mean + user_bias[u] + item_bias[i] + np.dot(P[u], Q[i])

        eval_df['predicted_rating'] = [predict(uid, mid) for uid, mid in zip(eval_df['userId'], eval_df['movieId'])]
        eval_df['predicted_rating'] = eval_df['predicted_rating'].clip(0.5, 5.0)

        # Accuracy: share of predictions within ±tolerance of the true rating
        eval_df['error'] = (eval_df['predicted_rating'] - eval_df['rating']).abs()
        eval_df['correct'] = (eval_df['error'] <= tolerance).astype(int)
        accuracy = eval_df['correct'].mean()

        precision_scores = []
        recall_scores = []
        f1_scores = []
        ndcg_scores = []

        def ndcg_at_k(actual, predicted, k):
            # Binary gains; the ideal ordering is built from the same top-k items
            sorted_indices = np.argsort(predicted)[::-1][:k]
            actual_k = actual[sorted_indices]
            relevant = (actual_k >= relevance_threshold).astype(int)
            dcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(relevant))
            ideal_relevant = np.sort(relevant)[::-1][:k]
            idcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(ideal_relevant))
            return dcg / idcg if idcg > 0 else 0.0

        all_movie_ids = list(movieid2idx.keys())
        for user_id in tqdm(eval_df['userId'].unique(), desc=f"Evaluating Users ({dataset_name})"):
            user_data = eval_df[eval_df['userId'] == user_id]
            if len(user_data) < k and user_id in userid2idx:
                # Too few ratings for a top-k list: score every movie and give movies the user
                # has not rated in this split a placeholder rating of 3.0 (i.e. non-relevant)
                known_ratings = dict(zip(user_data['movieId'], user_data['rating']))
                user_data = pd.DataFrame({
                    'movieId': all_movie_ids,
                    'predicted_rating': [predict(user_id, mid) for mid in all_movie_ids],
                    'rating': [known_ratings.get(mid, 3.0) for mid in all_movie_ids],
                })

            top_k_pred = user_data.nlargest(k, 'predicted_rating')
            actual_ratings_k = top_k_pred['rating'].values
            predicted_ratings_k = top_k_pred['predicted_rating'].values

            relevant = (actual_ratings_k >= relevance_threshold).astype(int)
            precision = np.mean(relevant)
            precision_scores.append(precision)

            # Recall is only defined (and averaged) for users with at least one relevant rating
            recall = 0.0
            all_relevant = (user_data['rating'] >= relevance_threshold).sum()
            if all_relevant > 0:
                recall = np.sum(relevant) / all_relevant
                recall_scores.append(recall)

            # F1 is averaged over users with at least one relevant item in their top-k
            if precision > 0 and recall > 0:
                f1 = 2 * (precision * recall) / (precision + recall)
                f1_scores.append(f1)

            ndcg = ndcg_at_k(actual_ratings_k, predicted_ratings_k, k)
            ndcg_scores.append(ndcg)

        print(f"\nMetrics for {dataset_name} Dataset (Funk SVD):")
        print(f"Accuracy (±{tolerance} tolerance): {accuracy:.4f}")
        print(f"Precision@{k}: {np.mean(precision_scores):.4f}")
        print(f"Recall@{k}: {np.mean(recall_scores):.4f}")
        print(f"F1-score@{k}: {np.mean(f1_scores):.4f}")
        print(f"NDCG@{k}: {np.mean(ndcg_scores):.4f}")

        if dataset_name == "Test":
            # Inspect predictions for test movies rated at or above the relevance threshold
            eval_df['relevant'] = eval_df['rating'] >= relevance_threshold
            relevant_preds = eval_df[eval_df['relevant']][['movieId', 'predicted_rating', 'rating']].copy()
            relevant_preds['movie_title'] = relevant_preds['movieId'].map(movie_names)
            print(f"\nPredictions for {relevance_threshold}-rated movies in Test set (first 10):")
            print(relevant_preds.head(10).round(2))

            print("\nPredicted rating distribution (Test):")
            print(eval_df['predicted_rating'].describe().round(2))
            print(f"Mean predicted rating for {relevance_threshold}: {relevant_preds['predicted_rating'].mean():.4f}")
    except Exception as e:
        print(f"Error in evaluate_dataset_funk_svd: {e}")

try:
    if not all(col in ratings_df.columns for col in ['userId', 'movieId', 'rating']):
        raise ValueError("ratings_df missing required columns: userId, movieId, rating")
    if not isinstance(movie_names, dict):
        raise ValueError("movie_names must be a dictionary mapping movieId to titles")
    print(f"Input data: {len(ratings_df)} ratings, {len(ratings_df['userId'].unique())} users, {len(ratings_df['movieId'].unique())} movies")
except Exception as e:
    print(f"Input data error: {e}")
    raise

ratings_df_filtered = filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5)

train_ratings, cv_ratings, test_ratings = split_ratings_per_user(ratings_df_filtered, random_state=456)


train_pairs = set(zip(train_ratings['userId'], train_ratings['movieId']))
cv_pairs = set(zip(cv_ratings['userId'], cv_ratings['movieId']))
test_pairs = set(zip(test_ratings['userId'], test_ratings['movieId']))
print("\nSplit Integrity Check:")
print("Train-CV overlap:", len(train_pairs & cv_pairs))
print("Train-Test overlap:", len(train_pairs & test_pairs))
print("CV-Test overlap:", len(cv_pairs & test_pairs))


try:
    # Index every user and movie that survived filtering
    userid2idx = {uid: idx for idx, uid in enumerate(ratings_df_filtered['userId'].unique())}
    movieid2idx = {mid: idx for idx, mid in enumerate(ratings_df_filtered['movieId'].unique())}
    n_users, n_items = len(userid2idx), len(movieid2idx)
    # Build the user-item matrix from training ratings only, so CV/test ratings cannot leak into RPCA
    M = create_rating_matrix(train_ratings, n_users, n_items, userid2idx, movieid2idx)
    if M is None:
        raise ValueError("Failed to create rating matrix")
    # Spot-check 100 training ratings against the matrix entries
    M_dense = M.toarray()
    for _, row in train_ratings.sample(100).iterrows():
        u = userid2idx[row['userId']]
        i = movieid2idx[row['movieId']]
        assert abs(M_dense[u, i] - row['rating']) < 1e-5, f"Mismatch at user {u}, item {i}: M={M_dense[u, i]}, rating={row['rating']}"
    print("Rating matrix validated.")
    L, S = rpca(M_dense, lambda_=0.00001, mu=1.0, k_svd=150)
    print("RPCA completed (no evaluation).")
except Exception as e:
    print(f"Error in RPCA pipeline: {e}")
    raise

try:
    P, Q, user_bias, item_bias, global_mean = funk_svd(train_ratings, L, n_users, n_items, userid2idx, movieid2idx, k=30, lr=0.02, reg=0.01, epochs=200)
    if P is None:
        raise ValueError("Funk SVD training failed")
except Exception as e:
    print(f"Error in Funk SVD pipeline: {e}")
    raise

evaluate_dataset_funk_svd(train_ratings, P, Q, user_bias, item_bias, global_mean, "Training", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_dataset_funk_svd(cv_ratings, P, Q, user_bias, item_bias, global_mean, "Cross-Validation", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_dataset_funk_svd(test_ratings, P, Q, user_bias, item_bias, global_mean, "Test", userid2idx, movieid2idx, movie_names, tolerance=0.25)

Input data: 100836 ratings, 610 users, 9724 movies
Filtered data: 90274 ratings, 610 users, 3650 movies

Split Integrity Check:
Train-CV overlap: 0
Train-Test overlap: 0
CV-Test overlap: 0
Rating matrix validated.


Error in rpca: name 'hist' is not defined
RPCA completed (no evaluation).


Epoch 0, RMSE: 0.9212, Max Error: 3.0810, Min Error: -3.8812, Mean Error for 5.0: 1.2109
Epoch 10, RMSE: 0.5652, Max Error: 2.3955, Min Error: -3.2913, Mean Error for 5.0: 0.5835
Epoch 20, RMSE: 0.3729, Max Error: 1.6942, Min Error: -2.2753, Mean Error for 5.0: 0.3114
Epoch 30, RMSE: 0.3012, Max Error: 1.4711, Min Error: -2.1231, Mean Error for 5.0: 0.2248
Epoch 40, RMSE: 0.2652, Max Error: 1.4305, Min Error: -1.8844, Mean Error for 5.0: 0.1866
Epoch 50, RMSE: 0.2435, Max Error: 1.4725, Min Error: -1.8887, Mean Error for 5.0: 0.1644
Epoch 60, RMSE: 0.2292, Max Error: 1.5210, Min Error: -1.6461, Mean Error for 5.0: 0.1523
Epoch 70, RMSE: 0.2188, Max Error: 1.4060, Min Error: -1.6482, Mean Error for 5.0: 0.1435
Epoch 80, RMSE: 0.2110, Max Error: 1.5178, Min Error: -1.5971, Mean Error for 5.0: 0.1374
Epoch 90, RMSE: 0.2048, Max Error: 1.3751, Min Error: -1.5759, Mean Error for 5.0: 0.1335
Epoch 100, RMSE: 0.1998, Max Error: 1.4500, Min Error: -1.5439, Mean Error for 5.0: 0.1301
Epoch 110,


Metrics for Training Dataset (Funk SVD):
Accuracy (±0.25 tolerance): 0.8825
Precision@20: 0.4348
Recall@20: 0.7872
F1-score@20: 0.5432
NDCG@20: 0.7834



Metrics for Cross-Validation Dataset (Funk SVD):
Accuracy (±0.25 tolerance): 0.2277
Precision@20: 0.0582
Recall@20: 0.2744
F1-score@20: 0.2633
NDCG@20: 0.1695



Metrics for Test Dataset (Funk SVD):
Accuracy (±0.25 tolerance): 0.2251
Precision@20: 0.0621
Recall@20: 0.2854
F1-score@20: 0.2667
NDCG@20: 0.1804

Predictions for 5.0-rated movies in Test set (Top 10):
    movieId  predicted_rating  rating                  movie_title
7      1213              4.71     5.0            Goodfellas (1990)
8      2137              5.00     5.0       Charlotte's Web (1973)
9      1282              5.00     5.0              Fantasia (1940)
10       47              4.17     5.0  Seven (a.k.a. Se7en) (1995)
11     2141              4.26     5.0     American Tail, An (1986)
13     1256              4.59     5.0             Duck Soup (1933)
14     2094              4.31     5.0        Rocketeer, The (1991)
15     2529              4.08     5.0    Planet of the Apes (1968)
16      608              5.00     5.0                 Fargo (1996)
17     2078              4.67     5.0      Jungle Book, The (1967)
count    13827.00
mean         3.54
std          0.85
min  
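
The Funk SVD call above returns latent factors `P` and `Q`, per-user and per-item biases, and the global mean. Assuming `funk_svd` uses the standard biased formulation (its body is defined earlier in the notebook and is not reproduced here), a rating is predicted as $\hat r_{ui} = \mu + b_u + b_i + p_u^\top q_i$, and each observed rating drives one regularized SGD step. The NumPy sketch below illustrates that update with the same `lr=0.02`, `reg=0.01` and `k=30` values; `predict` and `sgd_step` are hypothetical helper names, not functions from the notebook.

```python
import numpy as np


def predict(mu, bu, bi, P, Q, u, i):
    """Biased matrix-factorization prediction: global mean + biases + factor dot product."""
    return mu + bu[u] + bi[i] + P[u] @ Q[i]


def sgd_step(mu, bu, bi, P, Q, u, i, r, lr=0.02, reg=0.01):
    """One regularized SGD update for a single observed rating r; arrays are updated in place."""
    err = r - predict(mu, bu, bi, P, Q, u, i)
    bu[u] += lr * (err - reg * bu[u])
    bi[i] += lr * (err - reg * bi[i])
    p_old = P[u].copy()  # the item update must use the user vector from before this step
    P[u] += lr * (err * Q[i] - reg * P[u])
    Q[i] += lr * (err * p_old - reg * Q[i])
    return err


# Toy example: 3 users, 4 items, k = 30 latent factors (as in the funk_svd call above).
rng = np.random.default_rng(0)
n_users, n_items, k = 3, 4, 30
P = rng.normal(0.0, 0.1, (n_users, k))
Q = rng.normal(0.0, 0.1, (n_items, k))
bu, bi = np.zeros(n_users), np.zeros(n_items)
mu = 3.5

err_before = abs(5.0 - predict(mu, bu, bi, P, Q, 0, 1))
for _ in range(50):
    sgd_step(mu, bu, bi, P, Q, 0, 1, r=5.0)
err_after = abs(5.0 - predict(mu, bu, bi, P, Q, 0, 1))
print(f"|error| before: {err_before:.3f}, after 50 steps: {err_after:.3f}")
```

Repeating the step on one rating shrinks its error; in full training the same update is applied to every training rating once per epoch.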

## NCF
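
The NCF model below uses a NeuMF-style design: a GMF branch multiplies user and item embeddings element-wise, an MLP branch passes the concatenated embeddings through dense ReLU layers, and a final dense layer over both branches is squashed by `sigmoid(x) * 4.5 + 0.5` into the rating range (0.5, 5.0). The NumPy sketch below traces one inference-time forward pass with random weights to make the tensor shapes explicit. Dropout and batch normalization are omitted, and `neumf_forward` is a hypothetical helper, not part of the Keras model.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def neumf_forward(u_gmf, i_gmf, u_mlp, i_mlp, mlp_weights, w_out, b_out):
    """Score (user, item) pairs by combining a GMF branch and an MLP branch.

    Inference-time sketch: dropout and batch normalization are left out.
    """
    gmf = u_gmf * i_gmf                          # GMF: element-wise product, (batch, d)
    h = np.concatenate([u_mlp, i_mlp], axis=-1)  # MLP input: concatenation, (batch, 2d)
    for W, b in mlp_weights:
        h = relu(h @ W + b)                      # dense ReLU layers, e.g. 128 -> 64 units
    logits = np.concatenate([gmf, h], axis=-1) @ w_out + b_out   # (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits)) * 4.5 + 0.5             # rescale into (0.5, 5.0)


rng = np.random.default_rng(42)
batch, d, mlp_layers = 8, 64, [128, 64]  # embed_size and mlp_layers as in build_ncf_model
u_gmf, i_gmf, u_mlp, i_mlp = (rng.normal(0.0, 0.1, (batch, d)) for _ in range(4))

mlp_weights, fan_in = [], 2 * d
for units in mlp_layers:
    mlp_weights.append((rng.normal(0.0, 0.1, (fan_in, units)), np.zeros(units)))
    fan_in = units
w_out = rng.normal(0.0, 0.1, (d + mlp_layers[-1], 1))
b_out = np.zeros(1)

scores = neumf_forward(u_gmf, i_gmf, u_mlp, i_mlp, mlp_weights, w_out, b_out)
print(scores.shape, scores.min(), scores.max())
```

Because of the scaled sigmoid, every prediction stays strictly inside (0.5, 5.0), which is why the evaluation's `np.clip(..., 0.5, 5.0)` never changes the model's outputs.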

In [None]:
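from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from tqdm import tqdm

# def filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5):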
#     try:
#         user_counts = ratings_df['userId'].value_counts()
#         movie_counts = ratings_df['movieId'].value_counts()
#         filtered_df = ratings_df[
#             (ratings_df['userId'].isin(user_counts[user_counts >= min_ratings_user].index)) &
#             (ratings_df['movieId'].isin(movie_counts[movie_counts >= min_ratings_movie].index))
#         ]
#         print(f"Filtered data: {len(filtered_df)} ratings, {len(filtered_df['userId'].unique())} users, {len(filtered_df['movieId'].unique())} movies")
#         return filtered_df
#     except Exception as e:
#         print(f"Error in filter_data: {e}")
#         return ratings_df


def create_index_mappings(ratings_df):
    user_ids = ratings_df['userId'].unique()
    movie_ids = ratings_df['movieId'].unique()
    userid2idx = {uid: idx for idx, uid in enumerate(user_ids)}
    movieid2idx = {mid: idx for idx, mid in enumerate(movie_ids)}
    return userid2idx, movieid2idx, len(user_ids), len(movie_ids)

def build_ncf_model(n_users, n_items, embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-4):

    user_input = Input(shape=(1,), dtype=tf.int64, name='user_input')
    item_input = Input(shape=(1,), dtype=tf.int64, name='item_input')

    user_emb_gmf = Embedding(n_users, embed_size, name='user_emb_gmf',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(user_input)
    item_emb_gmf = Embedding(n_items, embed_size, name='item_emb_gmf',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(item_input)
    gmf_output = Multiply()([user_emb_gmf, item_emb_gmf])
    gmf_output = Lambda(lambda x: tf.squeeze(x, axis=1), name='gmf_squeeze')(gmf_output)

    user_emb_mlp = Embedding(n_users, embed_size, name='user_emb_mlp',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(user_input)
    item_emb_mlp = Embedding(n_items, embed_size, name='item_emb_mlp',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(item_input)
    mlp_input = Concatenate(axis=-1)([user_emb_mlp, item_emb_mlp])
    mlp_input = Lambda(lambda x: tf.squeeze(x, axis=1), name='mlp_squeeze')(mlp_input)

    mlp_output = mlp_input
    for i, units in enumerate(mlp_layers):
        mlp_output = Dense(units, activation='relu', name=f'mlp_dense_{i}',
                           kernel_regularizer=regularizers.l2(l2_lambda))(mlp_output)
        mlp_output = Dropout(dropout, name=f'mlp_dropout_{i}')(mlp_output)
        mlp_output = BatchNormalization(name=f'mlp_bn_{i}')(mlp_output)

    combined = Concatenate(axis=-1, name='combine_gmf_mlp')([gmf_output, mlp_output])
    output = Dense(1, activation=None, name='output',
                   kernel_regularizer=regularizers.l2(l2_lambda))(combined)

    output = Lambda(lambda x: tf.sigmoid(x) * 4.5 + 0.5, name='scale_output')(output)

    model = Model(inputs=[user_input, item_input], outputs=output, name='NCF_Model')
    return model

def train_ncf(train_df, cv_df, n_users, n_items, userid2idx, movieid2idx, embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-4, lr=0.001, epochs=50, batch_size=256, patience=10):
    try:

        train_users = np.array([userid2idx[uid] for uid in train_df['userId']], dtype=np.int64)
        train_items = np.array([movieid2idx[mid] for mid in train_df['movieId']], dtype=np.int64)
        train_ratings = np.array(train_df['rating'], dtype=np.float32)

        cv_users = np.array([userid2idx[uid] for uid in cv_df['userId']], dtype=np.int64)
        cv_items = np.array([movieid2idx[mid] for mid in cv_df['movieId']], dtype=np.int64)
        cv_ratings = np.array(cv_df['rating'], dtype=np.float32)


        model = build_ncf_model(n_users, n_items, embed_size, mlp_layers, dropout, l2_lambda)
        model.compile(optimizer=Adam(learning_rate=lr), loss='mse', metrics=['mae'])


        early_stopping = EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=True, mode='min')


        history = model.fit(
            x=[train_users, train_items],
            y=train_ratings,
            validation_data=([cv_users, cv_items], cv_ratings),
            epochs=epochs,
            batch_size=batch_size,
            callbacks=[early_stopping],
            verbose=1
        )


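        # The reported 'loss' includes the L2 regularization penalties, so this slightly
        # overstates the pure training RMSE.
        train_rmse = np.sqrt(history.history['loss'][-1])
        # Report the best epoch (the quantity EarlyStopping monitors), not the last one.
        cv_rmse = np.sqrt(min(history.history['val_loss']))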
        cv_preds = model.predict([cv_users, cv_items], batch_size=batch_size, verbose=0)
        cv_pred_std = np.std(cv_preds) if cv_preds.size else 0.0

        print(f"Final Train RMSE: {train_rmse:.4f}, Best CV RMSE: {cv_rmse:.4f}, CV Pred Std: {cv_pred_std:.4f}")
        for epoch in range(0, len(history.history['loss']), 5):
            print(f"Epoch {epoch}, Train RMSE: {np.sqrt(history.history['loss'][epoch]):.4f}, CV RMSE: {np.sqrt(history.history['val_loss'][epoch]):.4f}")

        return model

    except Exception as e:
        print(f"Error in train_ncf: {e}")
        return None


def evaluate_ncf(ratings_df, model, dataset_name, userid2idx, movieid2idx, movie_names, k=20, relevance_threshold=5.0, tolerance=0.25):
    try:
        eval_df = ratings_df.copy()


        user_ids = np.array([userid2idx.get(uid, 0) for uid in eval_df['userId']], dtype=np.int64)
        item_ids = np.array([movieid2idx.get(mid, 0) for mid in eval_df['movieId']], dtype=np.int64)
        predictions = model.predict([user_ids, item_ids], batch_size=256, verbose=0)
        eval_df['predicted_rating'] = np.clip(predictions.flatten(), 0.5, 5.0)


        eval_df['error'] = abs(eval_df['predicted_rating'] - eval_df['rating'])
        eval_df['correct'] = (eval_df['error'] <= tolerance).astype(int)
        accuracy = eval_df['correct'].mean()


        precision_scores = []
        recall_scores = []
        f1_scores = []
        ndcg_scores = []

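        # NDCG over the retrieved top-k list. The ideal DCG is built from the relevance labels
        # of these same k items, not from all of the user's relevant items.
        def ndcg_at_k(actual, predicted, k):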
            sorted_indices = np.argsort(predicted)[::-1][:k]
            actual_k = actual[sorted_indices]
            relevant = (actual_k >= relevance_threshold).astype(int)
            dcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(relevant))
            ideal_relevant = np.sort(relevant)[::-1][:k]
            idcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(ideal_relevant))
            return dcg / idcg if idcg > 0 else 0.0

        for user_id in tqdm(eval_df['userId'].unique(), desc=f"Evaluating Users ({dataset_name})"):
            user_data = eval_df[eval_df['userId'] == user_id]
            if len(user_data) < k and user_id in userid2idx:
                # Too few held-out ratings for a top-k list: score every movie for this user and
                # treat movies without a held-out rating as non-relevant (placeholder rating 3.0,
                # below the default relevance threshold).
                user_idx = userid2idx[user_id]
                all_movie_ids = list(movieid2idx.keys())
                user_array = np.full(len(all_movie_ids), user_idx, dtype=np.int64)
                item_array = np.array([movieid2idx[mid] for mid in all_movie_ids], dtype=np.int64)
                preds = model.predict([user_array, item_array], batch_size=256, verbose=0)
                # Dictionary lookup instead of filtering eval_df once per movie.
                known_ratings = dict(zip(user_data['movieId'], user_data['rating']))
                user_data = pd.DataFrame({
                    'movieId': all_movie_ids,
                    'predicted_rating': np.clip(preds.flatten(), 0.5, 5.0),
                    'rating': [known_ratings.get(mid, 3.0) for mid in all_movie_ids]
                })

            top_k_pred = user_data.nlargest(k, 'predicted_rating')
            actual_ratings_k = top_k_pred['rating'].values
            predicted_ratings_k = top_k_pred['predicted_rating'].values

            relevant = (actual_ratings_k >= relevance_threshold).astype(int)
            precision = np.mean(relevant)
            precision_scores.append(precision)

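            all_relevant = (user_data['rating'] >= relevance_threshold).sum()
            recall = 0.0  # reset per user so a previous user's recall is never reused
            if all_relevant > 0:
                recall = np.sum(relevant) / all_relevant
                recall_scores.append(recall)

            # F1 is averaged only over users with non-zero precision and recall; users with
            # no hits are skipped rather than counted as 0, which raises the reported F1@k.
            if precision > 0 and recall > 0: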
                f1 = 2 * (precision * recall) / (precision + recall)
                f1_scores.append(f1)

            ndcg = ndcg_at_k(actual_ratings_k, predicted_ratings_k, k)
            ndcg_scores.append(ndcg)

        print(f"\nMetrics for {dataset_name} Dataset (NCF):")
        print(f"Accuracy (±{tolerance} tolerance): {accuracy:.4f}")
        print(f"Precision@{k}: {np.mean(precision_scores):.4f}")
        print(f"Recall@{k}: {np.mean(recall_scores):.4f}")
        print(f"F1-score@{k}: {np.mean(f1_scores):.4f}")
        print(f"NDCG@{k}: {np.mean(ndcg_scores):.4f}")


        if dataset_name == "Test":
            eval_df['relevant'] = eval_df['rating'] == 5.0
            relevant_preds = eval_df[eval_df['relevant']][['movieId', 'predicted_rating', 'rating']]
            relevant_preds['movie_title'] = relevant_preds['movieId'].map(movie_names)
            print("\nPredictions for 5.0-rated movies in Test set (Top 10):")
            print(relevant_preds.head(10).round(2))
            print(f"Mean predicted rating for 5.0: {relevant_preds['predicted_rating'].mean():.4f}")

    except Exception as e:
        print(f"Error in evaluate_ncf: {e}")

try:
    if not all(col in ratings_df.columns for col in ['userId', 'movieId', 'rating']):
        raise ValueError("ratings_df missing required columns: userId, movieId, rating")
    if not isinstance(movie_names, dict):
        raise ValueError("movie_names must be a dictionary mapping movieId to titles")
    print(f"Input data: {len(ratings_df)} ratings, {len(ratings_df['userId'].unique())} users, {len(ratings_df['movieId'].unique())} movies")
except Exception as e:
    print(f"Input data error: {e}")
    raise


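# filter_data is defined in an earlier cell (the copy above is commented out). train_df, cv_df
# and test_df are also not created here; they must already exist from an earlier per-user split.
ratings_df_filtered = filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5)
userid2idx, movieid2idx, n_users, n_items = create_index_mappings(ratings_df_filtered)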

try:
    model = train_ncf(
        train_df, cv_df, n_users, n_items, userid2idx, movieid2idx,
        embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-4,
        lr=0.001, epochs=50, batch_size=256, patience=10
    )
    if model is None:
        raise ValueError("NCF training failed")
except Exception as e:
    print(f"Error in NCF training: {e}")
    raise
evaluate_ncf(train_df, model, "Training", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_ncf(cv_df, model, "Cross-Validation", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_ncf(test_df, model, "Test", userid2idx, movieid2idx, movie_names, tolerance=0.25)

Input data: 100836 ratings, 610 users, 9724 movies
Filtered data: 90274 ratings, 610 users, 3650 movies
Epoch 1/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 11s 24ms/step - loss: 1.8392 - mae: 1.0847 - val_loss: 1.0937 - val_mae: 0.8354
Epoch 2/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 1.1665 - mae: 0.8534 - val_loss: 1.0499 - val_mae: 0.8003
Epoch 3/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - loss: 1.0685 - mae: 0.8080 - val_loss: 0.9511 - val_mae: 0.7491
Epoch 4/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 0.9502 - mae: 0.7474 - val_loss: 0.8732 - val_mae: 0.7041
Epoch 5/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.8671 - mae: 0.7011 - val_loss: 0.8437 - val_mae: 0.6872
Epoch 6/50
245/245 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 0.8269 - mae: 0.6786 - val_loss: 0.8244

Evaluating Users (Training): 100%|██████████| 610/610 [00:21<00:00, 28.81it/s]



Metrics for Training Dataset (NCF):
Accuracy (±0.25 tolerance): 0.2725
Precision@20: 0.2659
Recall@20: 0.5193
F1-score@20: 0.3567
NDCG@20: 0.5681


Evaluating Users (Cross-Validation): 100%|██████████| 610/610 [01:07<00:00,  8.98it/s]



Metrics for Cross-Validation Dataset (NCF):
Accuracy (±0.25 tolerance): 0.2533
Precision@20: 0.0725
Recall@20: 0.3460
F1-score@20: 0.2613
NDCG@20: 0.2272


Evaluating Users (Test): 100%|██████████| 610/610 [01:10<00:00,  8.68it/s]


Metrics for Test Dataset (NCF):
Accuracy (±0.25 tolerance): 0.2476
Precision@20: 0.0757
Recall@20: 0.3658
F1-score@20: 0.2658
NDCG@20: 0.2390

Predictions for 5.0-rated movies in Test set (Top 10):
    movieId  predicted_rating  rating  \
0       157              3.93     5.0   
1      2291              4.36     5.0   
2       260              4.78     5.0   
3      2078              4.51     5.0   
4      2427              4.36     5.0   
8       923              4.56     5.0   
9       457              4.69     5.0   
10     1625              4.30     5.0   
11     2700              4.52     5.0   
15     2005              4.35     5.0   

                                    movie_title  
0                         Canadian Bacon (1995)  
1                    Edward Scissorhands (1990)  
2     Star Wars: Episode IV - A New Hope (1977)  
3                       Jungle Book, The (1967)  
4                     Thin Red Line, The (1998)  
8                           Citizen Kane (1941)  
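
Two details of `evaluate_ncf` affect how the ranking numbers above should be read. First, `ndcg_at_k` builds the ideal DCG from the relevance labels of the same top-k items it scores, so a list that places its only retrieved relevant movie first gets NDCG = 1 even if the user has other relevant movies that were not retrieved; the more common definition of NDCG@k normalizes by the best possible ranking of all of the user's relevant items. Second, F1@k is averaged only over users whose precision and recall are both non-zero; counting the remaining users as 0 could only lower it, and the reported Test F1@20 of 0.2658 is far above the harmonic mean of the reported Precision@20 and Recall@20 (about 0.13). The sketch below computes per-user precision, recall, F1 and NDCG@k with those conventional choices, next to the retrieved-only NDCG variant for comparison. `topk_metrics` and its arguments are hypothetical names; relevance is taken as rating ≥ 5.0, matching the default `relevance_threshold` in `evaluate_ncf`.

```python
import numpy as np


def dcg(gains):
    """Discounted cumulative gain with log2(rank + 1) discounts, ranks starting at 1."""
    gains = np.asarray(gains, dtype=float)
    return float(np.sum(gains / np.log2(np.arange(2, gains.size + 2))))


def topk_metrics(ranked_ratings, n_relevant_total, k=20, threshold=5.0):
    """Precision, recall, F1 and NDCG@k for one user.

    ranked_ratings: true ratings of the recommended items, best prediction first.
    n_relevant_total: number of relevant items this user has in the evaluated split.
    """
    rel = (np.asarray(ranked_ratings[:k], dtype=float) >= threshold).astype(float)
    hits = rel.sum()
    precision = hits / k
    recall = hits / n_relevant_total if n_relevant_total > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    idcg = dcg(np.ones(min(n_relevant_total, k)))  # best ranking of ALL relevant items
    ndcg = dcg(rel) / idcg if idcg > 0 else 0.0
    return precision, recall, f1, ndcg


def ndcg_retrieved_only(ranked_ratings, k=20, threshold=5.0):
    """Variant used by evaluate_ncf: ideal DCG from the retrieved top-k labels only."""
    rel = (np.asarray(ranked_ratings[:k], dtype=float) >= threshold).astype(float)
    idcg = dcg(np.sort(rel)[::-1])
    return dcg(rel) / idcg if idcg > 0 else 0.0


# One relevant movie retrieved at rank 1, but the user has 4 relevant movies in the split.
ranked = [5.0, 3.0, 4.0]
p, r, f1, ndcg = topk_metrics(ranked, n_relevant_total=4, k=3)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} NDCG={ndcg:.3f}")
print(f"retrieved-only NDCG={ndcg_retrieved_only(ranked, k=3):.3f}")
```

Averaging these per-user values over all users (with zeros kept) gives metrics that are directly comparable across the models in this notebook.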




## NCF + Contrastive learning
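
This variant augments training with sampled negatives and adds `contrastive_loss` to the MSE objective, weighted by `alpha`. As written, `contrastive_loss` is a pointwise similarity regression rather than an InfoNCE-style loss: the cosine similarity of the user and item embeddings, divided by `temperature`, is pulled towards the rating rescaled to [0, 1]. Because the Keras `Embedding` outputs have shape (batch, 1, d), the summed similarity has shape (batch, 1), and subtracting a (batch,) target silently broadcasts to a (batch, batch) matrix unless both are flattened. The NumPy sketch below reproduces that computation to make the shapes visible; `similarity_regression_loss` is a hypothetical name, and `l2_normalize` is assumed to mirror `tf.nn.l2_normalize`.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm (assumed to mirror tf.nn.l2_normalize)."""
    return x / np.sqrt(np.maximum(np.sum(x * x, axis=axis, keepdims=True), eps))


def similarity_regression_loss(user_emb, item_emb, ratings, temperature=0.07):
    """Mean squared gap between cosine/temperature and the rating rescaled to [0, 1]."""
    sim = np.sum(l2_normalize(user_emb) * l2_normalize(item_emb), axis=-1).reshape(-1)
    target = np.clip((np.asarray(ratings, dtype=float).reshape(-1) - 0.5) / 4.5, 0.0, 1.0)
    return float(np.mean((sim / temperature - target) ** 2))


rng = np.random.default_rng(0)
batch, d = 4, 8
user_emb = rng.normal(size=(batch, 1, d))  # same (batch, 1, d) layout as the Embedding outputs
item_emb = rng.normal(size=(batch, 1, d))
ratings = np.array([5.0, 0.5, 3.0, 0.0])   # 0.0 = a sampled negative

sim_raw = np.sum(l2_normalize(user_emb) * l2_normalize(item_emb), axis=-1)
print(sim_raw.shape)              # (4, 1)
print((sim_raw - ratings).shape)  # (4, 4): every similarity paired with every target
print(similarity_regression_loss(user_emb, item_emb, ratings))
```

Since cosine similarity lies in [-1, 1], dividing by `temperature=0.07` stretches it to roughly [-14.3, 14.3], so the loss is minimized when the cosine itself is close to 0.07 times the rescaled rating.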

In [None]:
# def filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5):
#     try:
#         user_counts = ratings_df['userId'].value_counts()
#         movie_counts = ratings_df['movieId'].value_counts()
#         filtered_df = ratings_df[
#             (ratings_df['userId'].isin(user_counts[user_counts >= min_ratings_user].index)) &
#             (ratings_df['movieId'].isin(movie_counts[movie_counts >= min_ratings_movie].index))
#         ]
#         print(f"Filtered data: {len(filtered_df)} ratings, {len(filtered_df['userId'].unique())} users, {len(filtered_df['movieId'].unique())} movies")
#         return filtered_df
#     except Exception as e:
#         print(f"Error in filter_data: {e}")
#         return ratings_df

def split_data(ratings_df, train_ratio=0.7, cv_ratio=0.15, test_ratio=0.15, random_state=42):
    try:
        train_data, cv_data, test_data = [], [], []
        for user_id, user_ratings in ratings_df.groupby('userId'):
            if len(user_ratings) < 3:
                continue
            train_cv, test = train_test_split(
                user_ratings, test_size=test_ratio, random_state=random_state, shuffle=True
            )
            train_ratio_adj = train_ratio / (train_ratio + cv_ratio)
            train, cv = train_test_split(
                train_cv, test_size=1 - train_ratio_adj, random_state=random_state, shuffle=True
            )
            train_data.append(train)
            cv_data.append(cv)
            test_data.append(test)
        train_df = pd.concat(train_data, ignore_index=True)
        cv_df = pd.concat(cv_data, ignore_index=True)
        test_df = pd.concat(test_data, ignore_index=True)
        print(f"Train: {len(train_df)} ratings, CV: {len(cv_df)} ratings, Test: {len(test_df)} ratings")
        return train_df, cv_df, test_df
    except Exception as e:
        print(f"Error in split_data: {e}")
        return ratings_df, ratings_df, ratings_df


def create_index_mappings(ratings_df):
    user_ids = ratings_df['userId'].unique()
    movie_ids = ratings_df['movieId'].unique()
    userid2idx = {uid: idx for idx, uid in enumerate(user_ids)}
    movieid2idx = {mid: idx for idx, mid in enumerate(movie_ids)}
    return userid2idx, movieid2idx, len(user_ids), len(movie_ids)


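def generate_negative_samples(train_df, n_items, movieid2idx, neg_ratio=1):
    # For each user, draws as many unrated movies as they have training ratings and labels them 0.0
    # (n_items is unused). Only training ratings are excluded, and np.random is not seeded here,
    # so the negatives change between runs. The scaled output (0.5, 5.0) can never reach 0.0.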
    neg_users, neg_items, neg_ratings = [], [], []
    rated_items_per_user = train_df.groupby('userId')['movieId'].apply(set).to_dict()

    for user_id in train_df['userId'].unique():
        rated_items = rated_items_per_user.get(user_id, set())
        n_pos = len(rated_items)
        n_neg = int(n_pos * neg_ratio)
        unrated_items = [mid for mid in movieid2idx.keys() if mid not in rated_items]
        if len(unrated_items) < n_neg:
            continue
        neg_samples = np.random.choice(unrated_items, size=n_neg, replace=False)
        neg_users.extend([user_id] * n_neg)
        neg_items.extend(neg_samples)
        neg_ratings.extend([0.0] * n_neg)

    neg_df = pd.DataFrame({'userId': neg_users, 'movieId': neg_items, 'rating': neg_ratings})
    return neg_df


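def contrastive_loss(user_emb, item_emb, ratings, temperature=0.07):
    # Pointwise similarity regression: cosine similarity / temperature is pulled towards the
    # rating rescaled to [0, 1]. The embeddings arrive with shape (batch, 1, d), so both the
    # similarity and the target are flattened to (batch,) to avoid silent (batch, batch)
    # broadcasting in the subtraction.
    user_emb = tf.nn.l2_normalize(user_emb, axis=-1)
    item_emb = tf.nn.l2_normalize(item_emb, axis=-1)
    similarity = tf.reshape(tf.reduce_sum(user_emb * item_emb, axis=-1), [-1]) / temperature

    ratings = tf.reshape(tf.cast(ratings, tf.float32), [-1])
    target_similarity = tf.clip_by_value((ratings - 0.5) / 4.5, 0.0, 1.0)
    loss = tf.reduce_mean(tf.square(similarity - target_similarity))
    return loss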


def build_ncf_model(n_users, n_items, embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-5):
    user_input = Input(shape=(1,), dtype=tf.int64, name='user_input')
    item_input = Input(shape=(1,), dtype=tf.int64, name='item_input')
    # rating_input is not connected to any layer; callers pass the batch ratings or zeros for it.
    rating_input = Input(shape=(1,), dtype=tf.float32, name='rating_input')


    user_emb_gmf = Embedding(n_users, embed_size, name='user_emb_gmf',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(user_input)
    item_emb_gmf = Embedding(n_items, embed_size, name='item_emb_gmf',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(item_input)
    gmf_output = Multiply()([user_emb_gmf, item_emb_gmf])
    gmf_output = Lambda(lambda x: tf.squeeze(x, axis=1), name='gmf_squeeze')(gmf_output)


    user_emb_mlp = Embedding(n_users, embed_size, name='user_emb_mlp',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(user_input)
    item_emb_mlp = Embedding(n_items, embed_size, name='item_emb_mlp',
                             embeddings_regularizer=regularizers.l2(l2_lambda))(item_input)
    mlp_input = Concatenate(axis=-1)([user_emb_mlp, item_emb_mlp])
    mlp_input = Lambda(lambda x: tf.squeeze(x, axis=1), name='mlp_squeeze')(mlp_input)


    mlp_output = mlp_input
    for i, units in enumerate(mlp_layers):
        mlp_output = Dense(units, activation='relu', name=f'mlp_dense_{i}',
                           kernel_regularizer=regularizers.l2(l2_lambda))(mlp_output)
        mlp_output = Dropout(dropout, name=f'mlp_dropout_{i}')(mlp_output)
        mlp_output = BatchNormalization(name=f'mlp_bn_{i}')(mlp_output)


    combined = Concatenate(axis=-1, name='combine_gmf_mlp')([gmf_output, mlp_output])
    output = Dense(1, activation=None, name='output',
                   kernel_regularizer=regularizers.l2(l2_lambda))(combined)
    rating_output = Lambda(lambda x: tf.sigmoid(x) * 4.5 + 0.5, name='scale_output')(output)

    model = Model(inputs=[user_input, item_input, rating_input],
                  outputs=[rating_output, user_emb_gmf, item_emb_gmf, user_emb_mlp, item_emb_mlp])
    return model


def train_ncf(train_df, cv_df, n_users, n_items, userid2idx, movieid2idx, embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-5, lr=0.0005, alpha=0.01, epochs=50, batch_size=256, patience=10):
    try:

        neg_df = generate_negative_samples(train_df, n_items, movieid2idx, neg_ratio=1)
        train_df_aug = pd.concat([train_df, neg_df], ignore_index=True)


        train_users = np.array([userid2idx[uid] for uid in train_df_aug['userId']], dtype=np.int64)
        train_items = np.array([movieid2idx[mid] for mid in train_df_aug['movieId']], dtype=np.int64)
        train_ratings = np.array(train_df_aug['rating'], dtype=np.float32)

        cv_users = np.array([userid2idx[uid] for uid in cv_df['userId']], dtype=np.int64)
        cv_items = np.array([movieid2idx[mid] for mid in cv_df['movieId']], dtype=np.int64)
        cv_ratings = np.array(cv_df['rating'], dtype=np.float32)


        model = build_ncf_model(n_users, n_items, embed_size, mlp_layers, dropout, l2_lambda)


        optimizer = Adam(learning_rate=lr, clipnorm=1.0)
        mse_loss_fn = tf.keras.losses.MeanSquaredError()

        @tf.function
        def train_step(users, items, ratings):
            with tf.GradientTape() as tape:
                rating_pred, user_emb_gmf, item_emb_gmf, user_emb_mlp, item_emb_mlp = model([users, items, ratings], training=True)
                mse_loss = mse_loss_fn(ratings, rating_pred)

                cont_loss_gmf = contrastive_loss(user_emb_gmf, item_emb_gmf, ratings, temperature=0.07)
                cont_loss_mlp = contrastive_loss(user_emb_mlp, item_emb_mlp, ratings, temperature=0.07)
                cont_loss = (cont_loss_gmf + cont_loss_mlp) / 2
                total_loss = mse_loss + alpha * cont_loss + sum(model.losses)
            gradients = tape.gradient(total_loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            return mse_loss, cont_loss


        history = {'loss': [], 'val_loss': [], 'cont_loss': []}
        best_val_loss = float('inf')
        best_weights = model.get_weights()  # initialised so best_weights always exists
        patience_counter = 0

        for epoch in range(epochs):
            indices = np.random.permutation(len(train_users))
            train_users_shuffled = train_users[indices]
            train_items_shuffled = train_items[indices]
            train_ratings_shuffled = train_ratings[indices]

            total_mse_loss = 0.0
            total_cont_loss = 0.0
            steps = 0
            for i in range(0, len(train_users), batch_size):
                batch_users = train_users_shuffled[i:i+batch_size]
                batch_items = train_items_shuffled[i:i+batch_size]
                batch_ratings = train_ratings_shuffled[i:i+batch_size]
                mse_loss, cont_loss = train_step(batch_users, batch_items, batch_ratings)
                total_mse_loss += mse_loss.numpy()
                total_cont_loss += cont_loss.numpy()
                steps += 1

            val_pred, _, _, _, _ = model([cv_users, cv_items, cv_ratings], training=False)
            val_loss = mse_loss_fn(cv_ratings, val_pred).numpy()

            history['loss'].append(total_mse_loss / steps)
            history['cont_loss'].append(total_cont_loss / steps)
            history['val_loss'].append(val_loss)

            print(f"Epoch {epoch+1}/{epochs}, Train MSE Loss: {total_mse_loss/steps:.4f}, Contrastive Loss: {total_cont_loss/steps:.4f}, Val Loss: {val_loss:.4f}")

            if val_loss < best_val_loss:
                best_val_loss = val_loss
                patience_counter = 0
                best_weights = model.get_weights()
            else:
                patience_counter += 1
                if patience_counter >= patience:
                    print(f"Early stopping at epoch {epoch+1}")
                    model.set_weights(best_weights)
                    break


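        # Restore the best weights even when all epochs run without early stopping.
        model.set_weights(best_weights)

        # history['loss'] is the MSE on the augmented set (ratings plus 0.0-labelled negatives),
        # so this train RMSE is not directly comparable to the CV RMSE.
        train_rmse = np.sqrt(history['loss'][-1])
        cv_rmse = np.sqrt(best_val_loss)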
        cv_preds = model.predict([cv_users, cv_items, cv_ratings], batch_size=batch_size, verbose=0)[0]
        cv_pred_std = np.std(cv_preds) if cv_preds.size else 0.0

        print(f"Final Train RMSE: {train_rmse:.4f}, Best CV RMSE: {cv_rmse:.4f}, CV Pred Std: {cv_pred_std:.4f}")
        for epoch in range(0, len(history['loss']), 5):
            print(f"Epoch {epoch}, Train RMSE: {np.sqrt(history['loss'][epoch]):.4f}, CV RMSE: {np.sqrt(history['val_loss'][epoch]):.4f}")

        return model

    except Exception as e:
        print(f"Error in train_ncf: {e}")
        return None


def evaluate_ncf(ratings_df, model, dataset_name, userid2idx, movieid2idx, movie_names, k=20, relevance_threshold=5.0, tolerance=0.25):
    try:
        eval_df = ratings_df.copy()


        user_ids = np.array([userid2idx.get(uid, 0) for uid in eval_df['userId']], dtype=np.int64)
        item_ids = np.array([movieid2idx.get(mid, 0) for mid in eval_df['movieId']], dtype=np.int64)
        ratings_dummy = np.zeros_like(user_ids, dtype=np.float32)
        predictions = model.predict([user_ids, item_ids, ratings_dummy], batch_size=256, verbose=0)[0]
        eval_df['predicted_rating'] = np.clip(predictions.flatten(), 0.5, 5.0)


        eval_df['error'] = abs(eval_df['predicted_rating'] - eval_df['rating'])
        eval_df['correct'] = (eval_df['error'] <= tolerance).astype(int)
        accuracy = eval_df['correct'].mean()

        precision_scores = []
        recall_scores = []
        f1_scores = []
        ndcg_scores = []

        def ndcg_at_k(actual, predicted, k):
            sorted_indices = np.argsort(predicted)[::-1][:k]
            actual_k = actual[sorted_indices]
            relevant = (actual_k >= relevance_threshold).astype(int)
            dcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(relevant))
            ideal_relevant = np.sort(relevant)[::-1][:k]
            idcg = sum(rel / np.log2(i + 2) for i, rel in enumerate(ideal_relevant))
            return dcg / idcg if idcg > 0 else 0.0

        for user_id in tqdm(eval_df['userId'].unique(), desc=f"Evaluating Users ({dataset_name})"):
            user_data = eval_df[eval_df['userId'] == user_id]
            if len(user_data) < k and user_id in userid2idx:
                # Too few held-out ratings for a top-k list: score every movie for this user and
                # treat movies without a held-out rating as non-relevant (placeholder rating 3.0,
                # below the default relevance threshold).
                user_idx = userid2idx[user_id]
                all_movie_ids = list(movieid2idx.keys())
                user_array = np.full(len(all_movie_ids), user_idx, dtype=np.int64)
                item_array = np.array([movieid2idx[mid] for mid in all_movie_ids], dtype=np.int64)
                ratings_dummy = np.zeros_like(user_array, dtype=np.float32)  # rating_input is unused
                preds = model.predict([user_array, item_array, ratings_dummy], batch_size=256, verbose=0)[0]
                # Dictionary lookup instead of filtering eval_df once per movie.
                known_ratings = dict(zip(user_data['movieId'], user_data['rating']))
                user_data = pd.DataFrame({
                    'movieId': all_movie_ids,
                    'predicted_rating': np.clip(preds.flatten(), 0.5, 5.0),
                    'rating': [known_ratings.get(mid, 3.0) for mid in all_movie_ids]
                })

            top_k_pred = user_data.nlargest(k, 'predicted_rating')
            actual_ratings_k = top_k_pred['rating'].values
            predicted_ratings_k = top_k_pred['predicted_rating'].values

            relevant = (actual_ratings_k >= relevance_threshold).astype(int)
            precision = np.mean(relevant)
            precision_scores.append(precision)

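            all_relevant = (user_data['rating'] >= relevance_threshold).sum()
            recall = 0.0  # reset per user so a previous user's recall is never reused
            if all_relevant > 0:
                recall = np.sum(relevant) / all_relevant
                recall_scores.append(recall)

            # F1 is averaged only over users with non-zero precision and recall; users with
            # no hits are skipped rather than counted as 0, which raises the reported F1@k.
            if precision > 0 and recall > 0: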
                f1 = 2 * (precision * recall) / (precision + recall)
                f1_scores.append(f1)

            ndcg = ndcg_at_k(actual_ratings_k, predicted_ratings_k, k)
            ndcg_scores.append(ndcg)

        print(f"\nMetrics for {dataset_name} Dataset (NCF):")
        print(f"Accuracy (±{tolerance} tolerance): {accuracy:.4f}")
        print(f"Precision@{k}: {np.mean(precision_scores):.4f}")
        print(f"Recall@{k}: {np.mean(recall_scores):.4f}")
        print(f"F1-score@{k}: {np.mean(f1_scores):.4f}")
        print(f"NDCG@{k}: {np.mean(ndcg_scores):.4f}")


        if dataset_name == "Test":
            eval_df['relevant'] = eval_df['rating'] == 5.0
            relevant_preds = eval_df[eval_df['relevant']][['movieId', 'predicted_rating', 'rating']]
            relevant_preds['movie_title'] = relevant_preds['movieId'].map(movie_names)
            print("\nPredictions for 5.0-rated movies in Test set (Top 10):")
            print(relevant_preds.head(10).round(2))

    except Exception as e:
        print(f"Error in evaluate_ncf: {e}")


try:
    if not all(col in ratings_df.columns for col in ['userId', 'movieId', 'rating']):
        raise ValueError("ratings_df missing required columns: userId, movieId, rating")
    if not isinstance(movie_names, dict):
        raise ValueError("movie_names must be a dictionary mapping movieId to titles")
    print(f"Input data: {len(ratings_df)} ratings, {len(ratings_df['userId'].unique())} users, {len(ratings_df['movieId'].unique())} movies")
except Exception as e:
    print(f"Input data error: {e}")
    raise


ratings_df_filtered = filter_data(ratings_df, min_ratings_user=20, min_ratings_movie=5)
train_df, cv_df, test_df = split_data(ratings_df_filtered, random_state=42)
userid2idx, movieid2idx, n_users, n_items = create_index_mappings(ratings_df_filtered)


try:
    model = train_ncf(
        train_df, cv_df, n_users, n_items, userid2idx, movieid2idx,
        embed_size=64, mlp_layers=[128, 64], dropout=0.9, l2_lambda=1e-5,
        lr=0.0005, alpha=0.01, epochs=50, batch_size=256, patience=10
    )
    if model is None:
        raise ValueError("NCF training failed")
except Exception as e:
    print(f"Error in NCF training: {e}")
    raise


evaluate_ncf(train_df, model, "Training", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_ncf(cv_df, model, "Cross-Validation", userid2idx, movieid2idx, movie_names, tolerance=0.25)
evaluate_ncf(test_df, model, "Test", userid2idx, movieid2idx, movie_names, tolerance=0.25)

Input data: 100836 ratings, 610 users, 9724 movies
Filtered data: 90274 ratings, 610 users, 3650 movies
Train: 62620 ratings, CV: 13827 ratings, Test: 13827 ratings
Epoch 1/50, Train MSE Loss: 4.2266, Contrastive Loss: 3.4101, Val Loss: 3.8130
Epoch 2/50, Train MSE Loss: 3.7327, Contrastive Loss: 2.2342, Val Loss: 3.8684
Epoch 3/50, Train MSE Loss: 3.5720, Contrastive Loss: 1.9123, Val Loss: 3.8465
Epoch 4/50, Train MSE Loss: 3.2708, Contrastive Loss: 1.7458, Val Loss: 3.4799
Epoch 5/50, Train MSE Loss: 3.0601, Contrastive Loss: 1.6282, Val Loss: 3.3447
Epoch 6/50, Train MSE Loss: 2.9128, Contrastive Loss: 1.4980, Val Loss: 3.2998
Epoch 7/50, Train MSE Loss: 2.7373, Contrastive Loss: 1.3951, Val Loss: 3.1502
Epoch 8/50, Train MSE Loss: 2.4924, Contrastive Loss: 1.3064, Val Loss: 3.0178
Epoch 9/50, Train MSE Loss: 2.2239, Contrastive Loss: 1.2434, Val Loss: 2.9480
Epoch 10/50, Train MSE Loss: 1.9619, Contrastive Loss: 1.1917, Val Loss: 2.8716
Epoch 11/50, Train MSE Loss: 1.7241, Contras

Evaluating Users (Training): 100%|██████████| 610/610 [00:25<00:00, 24.00it/s]



Metrics for Training Dataset (NCF):
Accuracy (±0.25 tolerance): 0.1668
Precision@20: 0.2778
Recall@20: 0.5623
F1-score@20: 0.3445
NDCG@20: 0.6258


Evaluating Users (Cross-Validation): 100%|██████████| 610/610 [01:27<00:00,  6.98it/s]



Metrics for Cross-Validation Dataset (NCF):
Accuracy (±0.25 tolerance): 0.1101
Precision@20: 0.0706
Recall@20: 0.3977
F1-score@20: 0.2255
NDCG@20: 0.2474


Evaluating Users (Test): 100%|██████████| 610/610 [01:25<00:00,  7.11it/s]


Metrics for Test Dataset (NCF):
Accuracy (±0.25 tolerance): 0.1109
Precision@20: 0.0730
Recall@20: 0.3947
F1-score@20: 0.2342
NDCG@20: 0.2478

Predictions for 5.0-rated movies in Test set (Top 10):
    movieId  predicted_rating  rating  \
0       157              2.03     5.0   
1      2291              0.90     5.0   
2       260              4.50     5.0   
3      2078              4.13     5.0   
4      2427              3.04     5.0   
8       923              1.96     5.0   
9       457              3.39     5.0   
10     1625              2.05     5.0   
11     2700              3.94     5.0   
15     2005              4.12     5.0   

                                    movie_title  
0                         Canadian Bacon (1995)  
1                    Edward Scissorhands (1990)  
2     Star Wars: Episode IV - A New Hope (1977)  
3                       Jungle Book, The (1967)  
4                     Thin Red Line, The (1998)  
8                           Citizen Kane (1941)  
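The evaluation outputs above report five metrics. `evaluate_ncf` is not defined in this section, so the following is a minimal sketch under stated assumptions. Accuracy counts a prediction as correct when it lies within ±`tolerance` of the true rating. The ranking metrics use binary relevance for a single user. How the notebook decides which held-out movies count as relevant (for example, a rating threshold) is left to the caller as an assumption, and per-user values would then be averaged across users. The helper names `tolerance_accuracy` and `ranking_metrics_at_k` are hypothetical.

```python
import math

def tolerance_accuracy(predicted, actual, tolerance=0.25):
    """Fraction of predictions within +/- tolerance of the true rating."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual) if actual else 0.0

def ranking_metrics_at_k(ranked_items, relevant_items, k=20):
    """Precision, recall, F1 and NDCG at k for one user (binary relevance).

    ranked_items: item ids sorted by predicted score, best first.
    relevant_items: set of held-out item ids treated as relevant (assumed given).
    """
    top_k = ranked_items[:k]
    hits = [1 if item in relevant_items else 0 for item in top_k]
    n_hits = sum(hits)
    precision = n_hits / k
    recall = n_hits / len(relevant_items) if relevant_items else 0.0
    f1 = 2 * precision * recall / (precision + recall) if n_hits else 0.0
    # DCG: each hit at 0-based rank r is discounted by log2(r + 2).
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal DCG: all relevant items (up to k) placed at the top of the list.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, f1, ndcg

# Toy example: 4 recommendations, 3 relevant items, 2 of them retrieved.
p, r, f1, ndcg = ranking_metrics_at_k([10, 20, 30, 40], {20, 40, 50}, k=4)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} NDCG={ndcg:.3f}")
```

Because MovieLens ratings move in 0.5 steps, a ±0.25 window is roughly equivalent to rounding the prediction to the nearest half star and requiring an exact match. This helps explain why accuracy stays low even when the ranking metrics are reasonable.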




# Conclusion

The primary metrics for this project are Recall@20, which measures the fraction of each user's relevant held-out movies that appear among the top 20 recommendations, and NDCG@20, which measures how close to the top of that list the relevant movies are ranked. The test-set results for these metrics are:

- Matrix Factorization: Recall@20 = 0.3712, NDCG@20 = 0.2311
- Tuned Matrix Factorization: Recall@20 = 0.3739, NDCG@20 = 0.2372
- Funk SVD + RPCA: Recall@20 = 0.2854, NDCG@20 = 0.1804
- Neural Collaborative Filtering: Recall@20 = 0.3658, NDCG@20 = 0.2390
- NCF + Contrastive Learning: Recall@20 = 0.3947, NDCG@20 = 0.2478
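To show the size of these gaps, the snippet below computes the relative improvement of NCF + Contrastive Learning over each other model. It uses only the test-set figures listed above. `relative_gain` is a helper introduced here for illustration.

```python
# Test-set (Recall@20, NDCG@20) scores copied from the list above.
results = {
    "Matrix Factorization": (0.3712, 0.2311),
    "Tuned Matrix Factorization": (0.3739, 0.2372),
    "Funk SVD + RPCA": (0.2854, 0.1804),
    "Neural Collaborative Filtering": (0.3658, 0.2390),
    "NCF + Contrastive Learning": (0.3947, 0.2478),
}

best = "NCF + Contrastive Learning"
best_recall, best_ndcg = results[best]

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

for name, (recall, ndcg) in results.items():
    if name == best:
        continue
    print(f"{name}: Recall@20 +{relative_gain(best_recall, recall):.1f}%, "
          f"NDCG@20 +{relative_gain(best_ndcg, ndcg):.1f}%")
```

On these figures, the margin over Tuned Matrix Factorization is about 5.6% in Recall@20 and 4.5% in NDCG@20. The NDCG@20 margin over plain NCF is about 3.7%. These results come from a single train/test split (`random_state=42` in the code above), so the smaller gaps should be treated as indicative rather than statistically established.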

NCF with Contrastive Learning achieved the highest test Recall@20 (0.3947) and NDCG@20 (0.2478) of the five models. This indicates that it both retrieved the most relevant movies and ranked them best. Tuned Matrix Factorization placed second on Recall@20 (0.3739, NDCG@20 = 0.2372), slightly ahead of standard Matrix Factorization (Recall@20 = 0.3712, NDCG@20 = 0.2311). Plain NCF placed second on NDCG@20 (0.2390) despite a lower Recall@20 (0.3658). Funk SVD + RPCA generalized worst. It scored highly during training (Recall@20 = 0.7872, NDCG@20 = 0.7834) but reached only 0.2854 Recall@20 and 0.1804 NDCG@20 on the test set, which points to overfitting.

Our scores are lower than those reported for established methods such as SVD++ and NCF on larger datasets. This is likely due in part to the small size of the MovieLens ml-latest-small dataset. Data filtering and regularization were used throughout the pipeline to cope with sparsity. The clearest measured gain came from contrastive learning: adding it to NCF raised test Recall@20 from 0.3658 to 0.3947 and NDCG@20 from 0.2390 to 0.2478. Future work will incorporate content-based attributes, explore more sophisticated ensemble approaches, and evaluate on larger datasets to improve Recall@20 and NDCG@20 further.
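The training log reports a separate "Contrastive Loss" alongside the MSE term, and `train_ncf` receives `alpha=0.01`. The formulation inside `train_ncf` is not shown in this section. One plausible reading is that `alpha` weights a contrastive term added to the rating MSE. The sketch below assumes an InfoNCE-style loss over two perturbed views of the same user embeddings and is written in NumPy for clarity. The Gaussian-noise augmentation, `temperature=0.2`, and the helper names `info_nce` and `combined_loss` are all hypothetical, not the notebook's confirmed method.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.2):
    """InfoNCE loss where row i of z1 and row i of z2 form a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives sit on the diagonal

def combined_loss(pred, target, z1, z2, alpha=0.01, temperature=0.2):
    """Rating MSE plus an alpha-weighted contrastive term (assumed form)."""
    mse = np.mean((pred - target) ** 2)
    return mse + alpha * info_nce(z1, z2, temperature)

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(8, 64))                        # batch of 8 users, embed_size=64
view1 = user_emb + 0.1 * rng.normal(size=user_emb.shape)   # two noisy "views" (hypothetical augmentation)
view2 = user_emb + 0.1 * rng.normal(size=user_emb.shape)
unrelated = rng.normal(size=user_emb.shape)

# Matching views of the same users should yield a lower contrastive loss than unrelated embeddings.
print(info_nce(view1, view2), info_nce(view1, unrelated))
```

With a small `alpha` such as 0.01, the MSE term dominates the gradient. The contrastive term acts as a regularizer that pulls two views of the same user together and pushes different users apart in embedding space.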

# References

- He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. In *Proceedings of the 26th International Conference on World Wide Web* (pp. 173–182). International World Wide Web Conferences Steering Committee. https://doi.org/10.1145/3038912.3052569
- Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. *Computer*, 42(8), 30–37. https://doi.org/10.1109/MC.2009.263
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In *Advances in Neural Information Processing Systems* (pp. 8024–8035). https://pytorch.org/
- Hug, N. (2020). Surprise: A Python library for recommender systems. *Journal of Open Source Software*, 5(52), 2174. https://doi.org/10.21105/joss.02174
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. *Journal of Machine Learning Research*, 12, 2825–2830. https://scikit-learn.org/
- Harper, F. M., & Konstan, J. A. (2015). The MovieLens datasets: History and context. *ACM Transactions on Interactive Intelligent Systems*, 5(4), 19:1–19:19. https://doi.org/10.1145/2827872