# Benchmark with the MovieLens dataset
This notebook is not meant to produce comprehensive benchmarking results across many datasets. Rather, it evaluates the recommender algorithms in this repository (SVD, LightGCN, Transformer, and our algorithm) on MovieLens.

* Datasets
  * [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/).
  * [MovieLens 1M](https://grouplens.org/datasets/movielens/1m/).

* Data split
  * TODO
  

* Evaluation metrics
  * Ranking metrics:
    * Precision@k.
    * Recall@k.
    * Normalized discounted cumulative gain@k (NDCG@k).
    * Mean average precision (MAP).
  * Rating metrics:
    * Root mean squared error (RMSE).
    * Mean absolute error (MAE).
    * R squared.
    * Explained variance.
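
The metrics listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used by `utils.benchmark`: the function names are hypothetical, relevance is assumed to be binary (an item is either relevant or not), and Precision@k is assumed to divide by `k` rather than by the number of items actually returned. The sample inputs are made up for illustration.

```python
import numpy as np


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and Recall@k for one user (binary relevance assumed)."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k for one user."""
    top_k = recommended[:k]
    # A hit at 0-based rank i contributes a gain of 1 / log2(i + 2).
    dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(top_k) if item in relevant)
    # The ideal ranking places all relevant items (up to k) at the top.
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def rating_metrics(y_true, y_pred):
    """RMSE, MAE, R squared and explained variance for predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # R squared compares the squared error with the spread of the true ratings.
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
        # Explained variance ignores a constant bias in the predictions.
        "Explained Variance": float(1.0 - np.var(err) / np.var(y_true)),
    }


# Made-up example data, for illustration only.
recommended = [1, 2, 3, 4, 5]   # ranked item ids for one user
relevant = {2, 5, 9}            # items the user actually liked
print(precision_recall_at_k(recommended, relevant, k=5))
print(ndcg_at_k(recommended, relevant, k=5))
print(rating_metrics([4, 3, 5, 2], [3.5, 3, 4, 2.5]))
```

R squared and explained variance coincide when the mean prediction error is zero, so a small gap between the two suggests the predictions carry little systematic bias.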

In [6]:
!pip install torch



In [7]:
# Silence library warnings and log only errors so the benchmark output stays readable.
import warnings
warnings.filterwarnings("ignore")
import logging
logging.basicConfig(level=logging.ERROR)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from utils.dataloader import load_data_df, load_item_df, load_user_features, maybe_download
from utils.benchmark import svd_model_benchmark

In [8]:
def generate_summary(data, algo, k, rating_metrics, ranking_metrics):
    """Build one result row; a metric group passed as None is filled with NaN."""
    summary = {"Data": data, "Algo": algo, "K": k}
    if rating_metrics is None:
        rating_metrics = {
            "RMSE": np.nan,
            "MAE": np.nan,
            "R2": np.nan,
            "Explained Variance": np.nan,
        }
    if ranking_metrics is None:
        ranking_metrics = {
            "MAP": np.nan,
            "nDCG@k": np.nan,
            "Precision@k": np.nan,
            "Recall@k": np.nan,
        }
    summary.update(rating_metrics)
    summary.update(ranking_metrics)
    return summary
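
As a quick check of the NaN fallback, the sketch below repeats `generate_summary` so that it runs on its own and calls it for an algorithm that reports rating metrics only. The algorithm name and the metric values are made up for illustration; they are not measured results.

```python
import numpy as np


def generate_summary(data, algo, k, rating_metrics, ranking_metrics):
    summary = {"Data": data, "Algo": algo, "K": k}
    if rating_metrics is None:
        rating_metrics = {
            "RMSE": np.nan,
            "MAE": np.nan,
            "R2": np.nan,
            "Explained Variance": np.nan,
        }
    if ranking_metrics is None:
        ranking_metrics = {
            "MAP": np.nan,
            "nDCG@k": np.nan,
            "Precision@k": np.nan,
            "Recall@k": np.nan,
        }
    summary.update(rating_metrics)
    summary.update(ranking_metrics)
    return summary


# Illustrative values only; "example-algo" is a hypothetical name.
rating_only = {"RMSE": 1.0, "MAE": 0.8, "R2": 0.2, "Explained Variance": 0.2}
row = generate_summary("100k", "example-algo", 10, rating_only, None)

# The ranking columns are present but hold NaN, so every row has the same keys.
print(row)
```

Keeping the same keys in every row means the results table has a fixed set of columns regardless of which metric groups an algorithm supports.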

In [9]:
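def benchmark_recommenders():
    """Run each registered algorithm on each dataset size and collect one row per run."""
    cols = ["Data", "Algo", "K", "RMSE", "MAE", "R2", "Explained Variance", "MAP", "nDCG@k", "Precision@k", "Recall@k"]
    df_results = pd.DataFrame(columns=cols)
    sizes = ["100k"]
    algos = ["svd"]
    # Map each algorithm name to its benchmark function.
    models = {"svd": svd_model_benchmark}
    for size in sizes:
        for algo in algos:
            # Each benchmark function returns (rating_metrics, ranking_metrics).
            ratings, rankings = models[algo](size)
            summary = generate_summary(size, algo, 10, ratings, rankings)
            # Append the row; the index starts at 1.
            df_results.loc[df_results.shape[0] + 1] = summary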
    return df_results
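
Judging from the loop above, each entry in `models` is a callable that takes a dataset size and returns a `(rating_metrics, ranking_metrics)` pair of dicts (or `None`). The sketch below shows how another algorithm could be plugged in under that assumption. `stub_model_benchmark` is a hypothetical placeholder, not a function from `utils.benchmark`, and it returns NaN instead of real metrics. The rows are collected in a list and turned into a DataFrame once, which is an alternative to appending with `.loc`; note that this variant indexes rows from 0.

```python
import numpy as np
import pandas as pd

COLS = ["Data", "Algo", "K", "RMSE", "MAE", "R2", "Explained Variance",
        "MAP", "nDCG@k", "Precision@k", "Recall@k"]


def stub_model_benchmark(size):
    """Hypothetical placeholder with the call shape assumed for svd_model_benchmark."""
    rating_metrics = None  # this stub reports no rating metrics
    ranking_metrics = {
        "MAP": np.nan,
        "nDCG@k": np.nan,
        "Precision@k": np.nan,
        "Recall@k": np.nan,
    }
    return rating_metrics, ranking_metrics


def run_benchmarks(models, sizes, k=10):
    """Collect one row per (size, algorithm) pair and build the table once."""
    rows = []
    for size in sizes:
        for algo, run in models.items():
            ratings, rankings = run(size)
            row = {"Data": size, "Algo": algo, "K": k}
            row.update(ratings or {})
            row.update(rankings or {})
            rows.append(row)
    # Columns missing from a row (e.g. RMSE for the stub) are filled with NaN.
    return pd.DataFrame(rows, columns=COLS)


df = run_benchmarks({"stub": stub_model_benchmark}, ["100k"])
print(df)
```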



In [10]:
df_results = benchmark_recommenders()
df_results

file is already exist
/Users/sun/Desktop/NUS/CS5248/project/movie_recommdender/dinghui101/data/ml-100k/u.data


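|   | Data | Algo | K | RMSE | MAE | R2 | Explained Variance | MAP | nDCG@k | Precision@k | Recall@k |
|---|------|------|---|------|-----|----|--------------------|-----|--------|-------------|----------|
| 1 | 100k | svd | 10 | 0.946796 | 0.744555 | 0.291006 | 0.291048 | 0.016135 | 0.113928 | 0.100743 | 0.035116 |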
