## Random model tutorial


In this notebook, we present a more verbose version of the standard submission.py script, with the aim of explaining in detail how the main abstractions work and showing how easy it is to participate in the challenge.

_NOTE_: this notebook is meant as a coding guide to the evaluation script and a walk-through of a baseline submission, explaining how to participate in the challenge. While you're free to experiment with this or other notebooks and even submit to the leaderboard from here, the _final_ submission should comply with the template scripts, as explained in the README.

Please contact the organizers on Slack if you have any doubts.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# check we are using the right interpreter with the right RecList version
!which python

In [None]:
import os
import sys
sys.path.insert(0, '../')

_Basic imports, read the credentials from the env file_

In [None]:
import numpy as np
import pandas as pd
from dotenv import load_dotenv

load_dotenv('../upload.env')

EMAIL = os.getenv('EMAIL')  # the e-mail you used to sign up
assert EMAIL != '' and EMAIL is not None
BUCKET_NAME = os.getenv('BUCKET_NAME') # you received it in your e-mail
PARTICIPANT_ID = os.getenv('PARTICIPANT_ID') # you received it in your e-mail
AWS_ACCESS_KEY = os.getenv('AWS_ACCESS_KEY') # you received it in your e-mail
AWS_SECRET_KEY = os.getenv('AWS_SECRET_KEY') # you received it in your e-mail
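# parse the flag explicitly: bool() of any non-empty string (including 'False' or '0') is True
UPLOAD = (os.getenv('UPLOAD') or '').strip().lower() in ('1', 'true', 'yes')  # True if you want to upload your submission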
LIMIT = int(os.getenv('LIMIT'))  # limit the number of test cases, for quick tests / iterations. 0 for no limit
FOLDS = int(os.getenv('FOLDS'))  # number of folds for evaluation
TOP_K = int(os.getenv('TOP_K'))  # number of recommendations to be provided by the model

print("Submission will be uploaded: {}".format(UPLOAD))
if LIMIT > 0:
    print("\nWARNING: only {} test cases will be used".format(LIMIT))
if FOLDS != 4 or TOP_K != 20 or LIMIT != 0:
    print("\nWARNING: default values are not used - the evaluation will run locally but won't be considered for the leaderboard")

_NOTE: as long as there is a limit specified, the runner won't upload results: make sure to have LIMIT=0 when you want to submit to the leaderboard!_
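
Every value loaded from the env file arrives as a string, so flags such as UPLOAD and LIMIT are easy to misread. The sketch below shows one way to parse them defensively; `env_flag` and `env_int` are hypothetical helpers written for this illustration and are not part of the challenge code.

```python
import os

# strings we choose to treat as "true" (an assumption made for this sketch)
TRUTHY = {"1", "true", "yes", "y", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def env_int(name: str, default: int) -> int:
    """Read an integer from the environment, falling back to a default (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


# bool() on a non-empty string is always True, whatever the text says
os.environ["UPLOAD"] = "False"
naive = bool(os.getenv("UPLOAD"))
parsed = env_flag("UPLOAD")
print(naive, parsed)  # True False
```

With helpers like these, a missing LIMIT falls back to a default instead of raising a `TypeError` inside `int(None)`.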

In [None]:
from evaluation.EvalRSRunner import EvalRSRunner
from evaluation.EvalRSRecList import EvalRSRecList
from reclist.abstractions import RecModel

_Declare our model, in this case, a random generator: any model needs to include an implementation of "predict", taking user IDs as input and returning a DataFrame with predictions as output._

In [None]:
class RandomModel(RecModel):
    
    def __init__(self, items: pd.DataFrame, top_k: int=20):
        super().__init__()
        self.items = items
        self.top_k = top_k

    def predict(self, user_ids: pd.DataFrame) -> pd.DataFrame:
        """
        
        This function takes as input all the users that we want to predict the top-k items for, and 
        returns all the predicted songs.

        While this example is just a random generator, the same logic in your implementation 
        would allow for batch predictions of all the target data points.
        
        """
        k = self.top_k
        num_users = len(user_ids)
        pred = self.items.sample(n=k*num_users, replace=True).index.values
        pred = pred.reshape(num_users, k)
        pred = np.concatenate((user_ids[['user_id']].values, pred), axis=1)
        return pd.DataFrame(pred, columns=['user_id', *[str(i) for i in range(k)]]).set_index('user_id')
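
To make the output format of `predict` concrete, here is a minimal sketch that repeats the same sampling steps on toy data, outside of any RecList class. The catalog contents, the `track_id` index name and the fixed `random_state` are assumptions made for the illustration; only the shape and the column layout matter.

```python
import numpy as np
import pandas as pd

# toy catalog indexed by track id (values are made up for illustration)
items = pd.DataFrame(
    {"title": ["a", "b", "c", "d", "e"]},
    index=pd.Index([101, 102, 103, 104, 105], name="track_id"),
)
user_ids = pd.DataFrame({"user_id": [1, 2, 3]})

k = 2
num_users = len(user_ids)
# sample k track ids per user, with replacement, as RandomModel.predict does
pred = items.sample(n=k * num_users, replace=True, random_state=0).index.values
pred = pred.reshape(num_users, k)
# prepend the user id so that each row reads: user_id, rec_0, ..., rec_{k-1}
pred = np.concatenate((user_ids[["user_id"]].values, pred), axis=1)
toy_predictions = pd.DataFrame(
    pred, columns=["user_id", *[str(i) for i in range(k)]]
).set_index("user_id")

print(toy_predictions.shape)          # (3, 2)
print(list(toy_predictions.columns))  # ['0', '1']
```

The result has one row per user, indexed by `user_id`, and one column per rank position, named `'0'` to `str(k - 1)`.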

_We inherit from EvalRSRunner and implement the required method, train_model: train_model encapsulates all your training logic, and should return a model object that wraps predictions as shown above._

MyEvalRSRunner is the Python object that will run the evaluation loop and provide utility functions to access data assets, run tests, and upload results to the leaderboard.

In [None]:
class MyEvalRSRunner(EvalRSRunner):
    
    def train_model(self, train_df: pd.DataFrame, **kwargs):
        """
        Implement your training logic here. Since our example is a simple random model,
        we don't actually use any training data to build it, but you should ;-)

        At the end of training, you should return a model class that implements the `predict` method,
        as RandomModel does.
        """
        # kwargs may contain additional arguments in case, for example, you 
        # have data augmentation strategies
        print("Received additional arguments: {}".format(kwargs))
        # use the TOP_K read from the env file, so predictions match evaluate(top_k=TOP_K)
        return RandomModel(self.df_tracks, top_k=TOP_K)
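
As a contrast to the random model, the sketch below shows what a model that does use the training data could look like: a most-popular baseline that recommends the same top-k tracks to everyone. `PopularityModel` is a hypothetical class, and the `track_id` column name in the training DataFrame is an assumption; in the notebook it would subclass `RecModel` and be returned from `train_model`.

```python
import pandas as pd


class PopularityModel:
    """Hypothetical baseline: recommend the most frequent training items to every user."""

    def __init__(self, train_df: pd.DataFrame, top_k: int = 20, item_col: str = "track_id"):
        # count how often each item appears in the training interactions
        counts = train_df[item_col].value_counts()
        # keep the ids of the top_k most frequent items (fewer if the catalog is tiny)
        self.top_items = counts.index[:top_k].tolist()

    def predict(self, user_ids: pd.DataFrame) -> pd.DataFrame:
        k = len(self.top_items)
        # every user gets the same ranked list
        rows = [self.top_items] * len(user_ids)
        out = pd.DataFrame(rows, columns=[str(i) for i in range(k)])
        out.insert(0, "user_id", user_ids["user_id"].values)
        return out.set_index("user_id")


# toy interactions: track 10 is played three times, 20 twice, 30 once
toy_train = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3],
    "track_id": [10, 20, 10, 20, 10, 30],
})
model = PopularityModel(toy_train, top_k=2)
popular_preds = model.predict(pd.DataFrame({"user_id": [7, 8]}))
print(popular_preds)
```

The returned DataFrame follows the same layout as `RandomModel.predict`: indexed by `user_id`, with one string-named column per rank.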

_We initialize the runner with our credentials_

In [None]:
runner = MyEvalRSRunner(
    num_folds=FOLDS,
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    participant_id=PARTICIPANT_ID,
    bucket_name=BUCKET_NAME,
    email=EMAIL)

_Let's inspect the main data assets first_

In [None]:
runner.df_tracks.head()

In [None]:
runner._get_train_set(3).head()

In [None]:
runner.df_users.head()

_Finally, we run the evaluation code_

In [None]:
runner.evaluate(upload=UPLOAD, limit=LIMIT, top_k=TOP_K)

### Customizing RecList

A major motivation behind the Challenge is to build, as a community, shareable insights in the form of working tests for our use case.

While your leaderboard score is ONLY influenced by the official tests as stated in the evaluation README, we encourage your final submissions to also include custom tests that you found helpful / insightful when improving your model.

The snippet below shows a working example of how to _extend_ the default RecList with additional tests, and run the evaluation code.

In [None]:
from reclist.abstractions import rec_test

class myRecList(EvalRSRecList):
    
    @rec_test(test_type='custom_test')
    def lucky_user_test(self):
        """
        Custom test, returning my lucky user, picked at random from the test set
        """
        from random import choice

        return {
            "lucky_user": str(choice(self._x_test['user_id'].unique()))
        }
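
A custom test can also compute something more informative than a lucky user. The sketch below computes catalog coverage (the share of the catalog that appears in at least one recommendation list) as a plain function over a predictions DataFrame in the format returned by `predict`. `catalog_coverage` is a hypothetical helper; how the predictions are exposed inside a RecList is not shown in this notebook, so wiring it into a `@rec_test` method is left as an assumption.

```python
import pandas as pd


def catalog_coverage(predictions: pd.DataFrame, catalog_size: int) -> float:
    """Fraction of the catalog recommended to at least one user (hypothetical helper).

    `predictions` is expected to be indexed by user_id, with one column per rank.
    """
    recommended = pd.unique(predictions.values.ravel())
    return len(recommended) / catalog_size


# toy predictions for three users, drawn from a catalog of five tracks
toy_preds = pd.DataFrame(
    {"0": [101, 102, 101], "1": [102, 103, 101]},
    index=pd.Index([1, 2, 3], name="user_id"),
)
coverage = catalog_coverage(toy_preds, catalog_size=5)
print(coverage)  # 0.6
```

Keeping the metric as a standalone function makes it easy to check on toy data before running the full evaluation loop.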


_Re-run the evaluation with the additional test, which gets executed together with the default ones that produce the leaderboard score._

In [None]:
runner.evaluate(upload=UPLOAD, limit=LIMIT, custom_RecList=myRecList, top_k=TOP_K)

### Final submission to the committee

Since this is a code competition, you'll be required to submit your repository for statistical verification of your scores. 

Please consult the README carefully to make sure your project complies with the rules and follows the provided template script.