# Running a dummy model

This notebook is a near replica of our [Colab notebook](https://colab.research.google.com/drive/1QeXglfCUEcscHB6L0Gch2qDKDDlfwLlq?usp=sharing#scrollTo=qYkGWADDxdlL) which demonstrates the same thing. 

This is the official repository for [EvalRS @ KDD 2023](https://reclist.io/kdd2023-cup/): _a Well-Rounded Evaluation of Recommender Systems_.

During KDD 2023 we will host a pizza hackathon night, where participants will pursue innovative projects for the rounded evaluation of recommender systems. The aim of the hackathon is to evaluate recommender systems across a set of important dimensions (accuracy being _one_ of them) through a principled and reusable set of abstractions, as provided by [RecList](https://github.com/jacopotagliabue/reclist) 🚀.

Organizers will provide in advance an open dataset and tools to help the teams, and award monetary prizes for the best projects. Everything will go back to the community as open source contributions!

This is a basic notebook that should get you up to speed on how to write code and models for EvalRS2023. We suggest running this notebook using either venv or conda to manage your Python dependencies. This notebook makes the following assumptions:

- a Unix-like operating system
- write access to your `$HOME` directory
- a correctly running Jupyter kernel (see [this guide on Jupyter + virtual environments for setup](https://janakiev.com/blog/jupyter-virtual-envs/))


The first cell is a setup cell that will clone the reclist repo locally and install it into your environment.

In [None]:
%%bash
# run with bash: pushd/popd are bash builtins and may be missing from plain sh
pushd "$HOME"
# make a workspace for us
mkdir -p reclist_workspace
cd reclist_workspace
git clone https://github.com/Reclist/reclist/
cd reclist
echo "*********installing reclist requirements**************"
pip install -e .
popd
echo "*********installing kdd 2023 requirements**************"
pip install -r requirements.txt
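Once the setup cell has finished, it can be worth confirming that the kernel actually sees the freshly installed packages before going further. The snippet below is a minimal sketch of such a check; `is_importable` is a hypothetical helper written for this notebook, not part of reclist or the EvalRS code.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    # find_spec returns None when a top-level module cannot be found;
    # it does not import the module itself.
    return importlib.util.find_spec(module_name) is not None


# Packages this notebook relies on in the cells that follow.
for name in ("reclist", "pandas", "numpy"):
    status = "ok" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If a package is reported as missing, the kernel is probably not running inside the environment that `pip` installed into.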

The following cell adds the `evaluation` directory of the `evalRS-KDD-2023` repo to `sys.path` so that the kernel can import it, and then loads the rest of our dependencies.

In [1]:
import os
import sys
import pandas as pd
import numpy as np
sys.path.append(os.path.abspath('./evaluation'))

from EvalRSRunner import ChallengeDataset
from EvalRSReclist import EvalRSReclist
from reclist.reclist import LOGGER, METADATA_STORE

In [2]:
# Loads the dataset; the download is skipped if a cached copy is found locally.
dataset = ChallengeDataset()

LFM dataset already downloaded. Skipping download.
Loading dataset.
Generating Train/Test Split.
Generating dataset hashes.


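The basic wrapper class below shows how to define `predict`: it receives a DataFrame of user IDs and returns a DataFrame indexed by `user_id`, with one column per recommended item.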

In [3]:
class EvalRSSimpleModel(object):
    """
    This is a dummy model that returns random predictions on the EvalRS dataset.
    """
    def __init__(self, items: pd.DataFrame, top_k: int=10, **kwargs):
        self.items = items
        self.top_k = top_k
        print("Received additional arguments: {}".format(kwargs))

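    def predict(self, user_ids: pd.DataFrame) -> pd.DataFrame:
        k = self.top_k
        num_users = len(user_ids)
        # sample k item ids per user, uniformly at random with replacement
        pred = self.items.sample(n=k*num_users, replace=True).index.values
        # one row per user, one column per recommended item
        pred = pred.reshape(num_users, k)
        # prepend the user ids as the first column
        pred = np.concatenate((user_ids[['user_id']].values, pred), axis=1)
        # columns are 'user_id' followed by '0' ... 'k-1'; index the result by user_id
        return pd.DataFrame(pred, columns=['user_id', *[str(i) for i in range(k)]]).set_index('user_id')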
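To see the shape of the DataFrame that `predict` produces without loading the full dataset, the same sampling steps can be run on toy data. This is only a sketch: `toy_items` and `toy_users` are hypothetical stand-ins for `dataset.df_tracks` and the test users, and the fixed `random_state` is added here purely to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalogue, indexed by track id (stand-in for dataset.df_tracks).
toy_items = pd.DataFrame(
    {"title": ["a", "b", "c", "d", "e"]},
    index=pd.Index([101, 102, 103, 104, 105], name="track_id"),
)
# Hypothetical users to generate recommendations for.
toy_users = pd.DataFrame({"user_id": [1, 2, 3]})

k = 2
num_users = len(toy_users)

# Draw k item ids per user at random, with replacement.
sampled = toy_items.sample(n=k * num_users, replace=True, random_state=0).index.values
sampled = sampled.reshape(num_users, k)

# Prepend the user ids, then name the item columns '0' ... 'k-1'.
pred = np.concatenate((toy_users[["user_id"]].values, sampled), axis=1)
toy_predictions = pd.DataFrame(
    pred, columns=["user_id", *[str(i) for i in range(k)]]
).set_index("user_id")

print(toy_predictions)
```

The result has one row per user and `k` columns of item ids, which is the same layout the dummy model returns for the real dataset.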

In [4]:
# dummy model
my_df_model = EvalRSSimpleModel(dataset.df_tracks, top_k=100)
# get some predictions
df_predictions = my_df_model.predict(dataset._get_test_set(fold=0)[['user_id']])

Received additional arguments: {}


In [None]:
df_predictions
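Before passing predictions to the evaluation, a quick format check can catch mistakes early. The helper below is a hypothetical sketch, not part of the EvalRS code: it assumes the layout produced by the dummy model above (index named `user_id`, columns named `'0'` to `'k-1'`, no missing values) and does not claim to cover everything the evaluation itself may require.

```python
import pandas as pd


def check_predictions(df: pd.DataFrame, top_k: int) -> None:
    """Raise ValueError if df does not follow the dummy model's output layout."""
    if df.index.name != "user_id":
        raise ValueError("predictions must be indexed by 'user_id'")
    expected_columns = [str(i) for i in range(top_k)]
    if list(df.columns) != expected_columns:
        raise ValueError(f"expected columns '0' to '{top_k - 1}'")
    if df.isna().any().any():
        raise ValueError("predictions contain missing values")


# Small hand-made example in the expected layout (hypothetical ids).
example = pd.DataFrame(
    {"user_id": [1, 2], "0": [10, 11], "1": [12, 13]}
).set_index("user_id")
check_predictions(example, top_k=2)
print("format looks fine")
```

With the real data, the call would be `check_predictions(df_predictions, top_k=100)`, matching the `top_k` passed to the model.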

In [6]:
# initialize with everything
cdf = EvalRSReclist(
    dataset=dataset,
    model_name="SimpleModel",
    predictions=df_predictions,
    logger=LOGGER.LOCAL,
    metadata_store=METADATA_STORE.LOCAL,
)

# run reclist
cdf(verbose=True)
