# RecSys Tutorial for the EvalRS 2023 hackathon

## RecList sample evaluation notebook from saved predictions

In this notebook we show how to take a parquet file with predictions/recommendations from a model and evaluate it with [RecList](https://github.com/RecList/reclist).  
In this case, the prediction files are the output of the models trained in the [Merlin tutorial notebook](evalrs_kdd_2023_tutorial_retrieval_models_with_merlin_tf.ipynb).

## Setup

In [None]:
%%sh
pushd $HOME/
# make a workspace for us
mkdir -p reclist_workspace
cd reclist_workspace
git clone https://github.com/Reclist/reclist/
cd reclist
echo "*********installing reclist requirements**************"
pip install -e .
popd
echo "*********installing kdd 2023 requirements**************"
pip install -r requirements.txt

In [2]:
import os
import sys
import pandas as pd
import numpy as np
import shutil
from functools import partial
sys.path.append(os.path.abspath('../../evaluation'))

from EvalRSRunner import ChallengeDataset
from EvalRSReclist import EvalRSReclist
from reclist.reclist import LOGGER, METADATA_STORE

Installing RecList and dependencies

## Dataset download and split

Let's first download and uncompress the dataset and all its tables.  
To reduce the dataset for faster and less memory-intensive computation, we set `sample_users_perc=0.25` to sample 25% of the users and keep only their events. We also set `min_user_item_freq=10` so that the remaining users and items have a minimum frequency of 10.  
**Please don't change the options of ChallengeDataset, otherwise the set of users won't match the ones we used for generating predictions.**

In [9]:
# note: if force_download=True, the dataset will be downloaded again
dataset = ChallengeDataset(force_download=False,
                           folded_dataset_split=False,
                           sample_users_perc=0.25,
                           min_user_item_freq=10)

Downloading LFM dataset...
Downloading to /root/.cache/evalrs/evalrs_dataset.zip...


evalrs_dataset_KDD_2023.zip: 100%|█████████| 1.60G/1.60G [02:15<00:00, 12.6MB/s]


Loading dataset.
Generating dataset hashes.


The test set is defined as the last interaction of each user. All users and items in the test set are therefore also present in the train set, as we are not focused on exploring the user/item cold-start problem in this music streaming example.
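As a minimal sketch of this kind of split, the snippet below holds out the last interaction of each user from a toy event log and checks for cold-start users and items. The column names `user_id`, `track_id` and `timestamp` and the toy data are assumptions for illustration, not the schema or logic used by `ChallengeDataset`.

```python
import pandas as pd

# Toy interaction log; column names are assumptions for illustration only.
events = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 3, 3],
    "track_id":  [10, 11, 10, 12, 10, 11, 12],
    "timestamp": [1, 2, 3, 1, 2, 1, 2],
})

# Sort chronologically, then mark each user's final interaction as the test row.
events = events.sort_values(["user_id", "timestamp"])
is_last = events.groupby("user_id").cumcount(ascending=False) == 0
train_df, test_df = events[~is_last], events[is_last]

# Cold-start check: test users/items that never appear in the train rows.
unseen_users = set(test_df["user_id"]) - set(train_df["user_id"])
unseen_items = set(test_df["track_id"]) - set(train_df["track_id"])
print(len(train_df), len(test_df), unseen_users, unseen_items)
```

The same two set differences can be computed on the real train and test frames to confirm that no test user or item is unseen, once the actual column names are known.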

In [10]:
events_train_df, events_test_df = dataset._get_train_set(), dataset._get_test_set()
print(len(events_train_df), len(events_test_df))

6869679 29722


## Predictions download

For your convenience, we have already run the Merlin tutorial notebook on a V100 GPU and saved the [model prediction parquet files](https://drive.google.com/file/d/1PrFP5KWvU8tlMRgFst_nXPYYq-_2q-sY/view?usp=sharing). Download that file and uncompress it so that the extracted files sit inside the folder where this notebook is located.
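If you prefer to unpack the download from Python, the following is a minimal sketch. It assumes the downloaded file is a zip archive; the helper name `extract_predictions` and the archive file name in the usage comment are hypothetical, so adjust them to whatever you actually downloaded.

```python
import zipfile
from pathlib import Path

def extract_predictions(archive_path, target_dir="."):
    """Unpack a zip archive into target_dir and list the parquet files found there.

    Hypothetical helper: assumes the download is a zip archive.
    """
    target = Path(target_dir)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    # Return every parquet file under target_dir so the paths can be checked.
    return sorted(target.rglob("*.parquet"))

# Example usage (the archive name is an assumption):
# for path in extract_predictions("evalrs23_model_predictions.zip"):
#     print(path)
```

Printing the returned paths is a quick way to confirm that the folder layout matches the path passed to `pd.read_parquet` later in the notebook.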


## Evaluating the prediction file

We made available prediction files for the following retrieval models:
- `mf_test_preds.parquet` - Matrix Factorization (uses only user id and item id)
- `tt_test_preds.parquet` - Two-Tower architecture (uses many user and item features)

In [11]:
preds_df = pd.read_parquet('evalrs23_model_predictions/model_predictions/mf_test_preds.parquet')
preds_df

user_id,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
50967444,15310,20166813,926150,17820138,106622,67791,124951,645045,15306,20166814,...,15314,58863,67583,282393,266914,198971,274461,11263914,284270,136145
50900118,29448037,114443,114449,700753,109073,41288,29061,109633,602229,614225,...,210911,179201,21897545,12602652,652288,114451,817817,29477042,109118,29450642
50086315,217676,214609,211420,213432,211529,214756,211425,211527,220022,216227,...,13455577,214490,8166,216264,214642,216256,13455574,211732,215612,212474
50085736,102667,102663,102640,102659,102658,102642,118966,102662,102668,102648,...,11167153,102594,150747,3540,11167154,108573,106121,17716901,38333,18590
50083088,91840,95477,78164,90245,93505,90520,247378,91843,78157,78163,...,92220,451166,1374552,91934,93503,115490,98673,98677,94888,90810
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15067,12654389,12654387,12654386,12654388,7894841,12654384,13468147,12654385,12654383,877324,...,7101845,877378,799040,104064,877379,1056660,2275832,297343,245076,608414
14308,161793,544088,151742,152099,161048,129676,162617,1015443,162626,162621,...,6956,332092,329225,22076,152096,9319564,679190,152093,52692,204425
12619,17145266,12890610,17145268,17145267,17145264,17145263,17145262,5478591,16592156,799197,...,297344,17145265,13667842,26125385,2036766,16413496,296683,1128056,12288156,26031145
10879,53645,49656,82547,1744,91521,82548,91522,1746,37988,82306,...,137486,166068,56387,59695,82620,163304,29220,1749,59697,166041


Here are the `RecList` metrics for these predictions, evaluated on each LastFM user's last listened track:

In [12]:
# initialize with everything
cdf = EvalRSReclist(
    dataset=dataset,
    model_name="SimpleModel",
    predictions=preds_df,
    logger=LOGGER.LOCAL,
    metadata_store=METADATA_STORE.LOCAL,
)

# run reclist
cdf(verbose=True)
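For intuition about what a retrieval metric measures over a predictions table in this wide layout (one row per user, one column per rank), here is a minimal sketch of hit rate at k on toy data. It is not RecList's implementation: the function `hit_rate_at_k`, the toy frames and the `track_id` name are assumptions made for illustration.

```python
import pandas as pd

# Toy predictions in a wide layout: one row per user,
# columns 0..2 hold recommended track ids in rank order.
toy_preds = pd.DataFrame(
    [[10, 11, 12], [20, 21, 22], [30, 31, 32]],
    index=pd.Index([1, 2, 3], name="user_id"),
)
# Hypothetical ground truth: the single held-out track of each user.
toy_truth = pd.Series({1: 12, 2: 99, 3: 30}, name="track_id")

def hit_rate_at_k(preds, truth, k):
    """Fraction of users whose held-out track appears in their top-k recommendations."""
    top_k = preds.iloc[:, :k]
    # Align the ground truth to the prediction rows by user id.
    aligned_truth = truth.reindex(top_k.index)
    # Compare every rank column against each user's held-out track.
    hits = top_k.eq(aligned_truth, axis=0).any(axis=1)
    return float(hits.mean())

print(hit_rate_at_k(toy_preds, toy_truth, k=1))
print(hit_rate_at_k(toy_preds, toy_truth, k=3))
```

In the toy data only user 3 has the held-out track at rank 0, while user 1 has it at rank 2 and user 2 never receives it, so the hit rate grows from one third at k=1 to two thirds at k=3.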