Testing Recommender Systems with Deepchecks
===============================================================================

Deepchecks Recommender is your go-to tool for developing and evaluating recommender system models and checking their robustness before deployment. Its testing package detects potential failures and saves you development time. In this quickstart guide, you'll learn how to use Deepchecks Recommender to analyze and evaluate various aspects of your recommender system, including data quality, leakage, product associations, cold start detection, and drift. Let's get started.

**Step 1: Data Preparation and Auto Analysis**
---------------------------------------------

To run Deepchecks Recommender, make sure you have the following data for both your training and testing sets:

1. User-Item Interaction Data: A structured dataset containing information about user-item interactions. Each record represents a user's interaction with an item, such as viewing, purchasing, or rating.

2. Product Information: Additional information about the items in your catalog, like product categories, descriptions, or features.

3. User Information (Optional): If available, user-specific data such as demographics, preferences, or historical behavior can enhance the evaluation.

4. Your labels: These are not needed for checks that don't require labels (such as the Cold Start Detection check or most data integrity checks), but they are needed for many other checks.

5. Your model's predictions: These are needed only for the model-related checks, shown in the Model Evaluation section of this guide.
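
As a rough illustration of the inputs listed above, the toy frames below mirror the MovieLens column names used later in this guide (`userId`, `movieId`, `rating`, `timestamp`, `title`, `genres`). All values are invented for illustration, and the final sanity check is only a suggestion, not something Deepchecks requires.

```python
import pandas as pd

# Toy user-item interactions: one row per (user, item) event. Values are made up.
interactions = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
    "timestamp": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-03", "2023-01-04"]
    ),
})

# Toy item catalog keyed by the same item identifier.
items = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Comedy", "Drama", "Action|Thriller"],
})

# Sanity check: every interacted item should exist in the catalog.
unknown_items = set(interactions["movieId"].tolist()) - set(items["movieId"].tolist())
print(f"items missing from the catalog: {unknown_items}")
```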

What has been done
===============

- All checks pass Pylint (docstrings, snake_case naming, ...).

- Sped up the checks (some, such as TrainTestOverlap, took 50 seconds to run).

- Fixed the PR (logic, class inheritance, docstrings).

- A first, simple version of the quickstart is available to use.

Setting Up
----------

In [2]:
import pandas as pd
import numpy as np
from datetime import datetime
from collections import defaultdict, Counter

### Helper functions

In [3]:
# Create train and validation split.
def split_train_validation(interaction_df: pd.DataFrame,
                           session_col: str,
                           item_col: str,
                           timestamp_col: str,
                           test_percentage: float = 0.2,
                           random_seed=None):
    """Split interactions by time, then hold out the tail of each test session as labels."""
    assert {session_col, item_col, timestamp_col}.issubset(interaction_df.columns)
    rng = np.random.default_rng(random_seed)

    interaction_df = interaction_df.sort_values(timestamp_col, ascending=True)
    split_index = int(len(interaction_df) * (1 - test_percentage))

    train = interaction_df.iloc[:split_index]
    test = interaction_df.iloc[split_index:]

    # Discard sessions that also appear in train so the two sets are disjoint and independent.
    overlapping_sessions = set(train[session_col]).intersection(test[session_col])
    test = test[~test[session_col].isin(overlapping_sessions)]

    # Each remaining session needs at least two interactions:
    # one to keep as model input and one to hold out as a label.
    session_sizes = test.groupby(session_col)[item_col].transform('size')
    test = test[session_sizes >= 2]

    test_inputs = []
    test_targets = []
    for _, session in test.groupby(session_col):
        # Random cutoff in [1, len - 1], leaving at least one item on each side.
        cutoff = rng.integers(1, len(session))
        test_inputs.append(session.iloc[:cutoff])
        test_targets.append(session.iloc[cutoff:])

    test = pd.concat(test_inputs).reset_index(drop=True)
    test_labels = pd.concat(test_targets).reset_index(drop=True)
    # The held-out labels must not end earlier than the inputs.
    assert test[timestamp_col].max() <= test_labels[timestamp_col].max()

    test_labels = test_labels.groupby(session_col)[item_col].apply(list)
    # Inputs and labels must cover the same sessions, in the same order.
    assert (test[session_col].unique() == test_labels.index.values).all()

    return train, test, test_labels

# Create user features.
def user_features(X_interaction: pd.DataFrame) -> pd.DataFrame:
    # Aggregate per user in a single groupby so every statistic stays aligned with its userId.
    # (Pairing `unique()` with groupby results misaligns rows when the input is not sorted by userId.)
    user_df = (X_interaction.groupby('userId')['rating']
               .agg(mean_rating='mean',
                    median_rating='median',
                    std_rating='std',
                    session_length='count')
               .reset_index())
    return user_df
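
To make the first stage of the split helper concrete, here is a minimal sketch on invented data: order the events by time, cut at a percentage, then drop test sessions whose user already appears in train. The toy values and the 25% test share are assumptions chosen so the result is easy to follow.

```python
import pandas as pd

# Invented toy interactions; column names follow the helper above.
toy = pd.DataFrame({
    "userId": [1, 2, 1, 2, 3, 1, 3, 4],
    "movieId": [10, 11, 12, 13, 14, 15, 16, 17],
    "timestamp": pd.date_range("2023-01-01", periods=8, freq="D"),
})

toy = toy.sort_values("timestamp")
split_index = int(len(toy) * (1 - 0.25))  # the first 75% of events go to train
train, test = toy.iloc[:split_index], toy.iloc[split_index:]

# User 3 has events on both sides of the cut, so that session is dropped from test.
overlapping = set(train["userId"].tolist()) & set(test["userId"].tolist())
test = test[~test["userId"].isin(overlapping)]

train_users = sorted(train["userId"].unique().tolist())
test_users = sorted(test["userId"].unique().tolist())
print(train_users)  # [1, 2, 3]
print(test_users)   # [4]
```

The trade-off is visible even here: making the sets disjoint throws away test events, so a time-based split on a small dataset can leave few test sessions.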

### Load Data


For the purpose of this guide, we'll use a small subset of the [MovieLens](https://grouplens.org/datasets/movielens/) dataset:

In [4]:
%%time

# Load interaction data.
df = pd.read_csv("/Users/rayanaay/Downloads/testing_deepchecks/ml-latest-small/ratings.csv")

# Load item data.
movie_df =  pd.read_csv("/Users/rayanaay/Downloads/testing_deepchecks/ml-latest-small/movies.csv")

# Split interaction data into train and validation data.
df['timestamp'] = df['timestamp'].apply(lambda x : datetime.fromtimestamp(x))
X_train_interactions, X_test_interactions, y_test = split_train_validation(interaction_df=df,
                                           session_col='userId',
                                           item_col ='movieId',
                                           timestamp_col='timestamp')

# Create User Dataframe
train_users_df = user_features(X_train_interactions)
valid_users_df = user_features(X_test_interactions)

# Add targets to the valid users
valid_users_df = pd.merge(valid_users_df,y_test.rename('target'),how="left",on="userId")



Create Recommender Datasets
================================

We can now create Dataset objects for the train and test dataframes. These objects are
used to pass your data to the Deepchecks checks.

To create a Recommender Dataset, the only required argument is the data
itself, but passing only the data will prevent several checks from
running. In this example we'll also specify the user and item index columns,
the datetime column, and the feature columns, all of which we'll use later on in the
guide.

In [6]:
from deepchecks.recommender import InteractionDataset, UserDataset, ItemDataset

# Interaction Datasets
#################################################################
train_interaction_ds = InteractionDataset(df=X_train_interactions,                    
                features=['rating'],
                datetime_name='timestamp',
                user_index_name='userId',
                item_index_name='movieId')

valid_interaction_ds = InteractionDataset(df=X_test_interactions,                    
                features=['rating'],
                datetime_name='timestamp',
                user_index_name='userId',
                item_index_name='movieId')

# User Datasets
#################################################################
train_user_ds = UserDataset(df = train_users_df,
                label = None,                    
                features=['mean_rating', 'session_length'],
                cat_features=None)

valid_user_ds = UserDataset(df = valid_users_df,
                label = "target",                    
                features=['mean_rating', 'session_length'],
                cat_features=None)
            
# Item Dataset
#################################################################
item_ds = ItemDataset(df=movie_df,
                      item_column_name='title',
                      features=['title','genres'],
                      cat_features=['title','genres'])



Create a Recommender Model Class
================================

In [None]:
class CoOccurrenceRecommender:
    def __init__(self, col, item_col, num_predictions=20):
        self.col = col
        self.item_col = item_col
        self.num_predictions = num_predictions
        self.co_occurences = defaultdict(Counter)
        
    def fit(self, X_train):
        # Make a copy of the training data
        filtered_interactions = X_train.copy()
        
        # Create a new column with the previous item in each user session
        prev_item_col = f'prev_{self.item_col}'
        # The first item of each session has no predecessor and stays <NA>; those rows are dropped below.
        filtered_interactions[prev_item_col] = filtered_interactions.groupby(self.col)[self.item_col].shift(1).astype("Int64")
        
        # Create a DataFrame with columns 'previous item' and 'item'
        products_association_df = filtered_interactions[[prev_item_col, self.item_col]].copy().dropna()
        
        # Generate associations between 'previous item' and 'item'
        for row in products_association_df.itertuples(index=False):
            self.co_occurences[row[0]][row[1]] += 1

    def predict(self, X_valid):
        # Generate predictions for the validation set
        labels = []
        X_test_session_items = X_valid.groupby(self.col)[self.item_col].apply(list)

        for items in X_test_session_items:
            items = list(dict.fromkeys(items[::-1]))

            counter = Counter()

            for item in items:
                subsequent_item_counter = self.co_occurences.get(item)
                if subsequent_item_counter:
                    counter += subsequent_item_counter
            
            # Get the top N recommended items based on the associations
            recommendations = [item for item, cnt in counter.most_common(self.num_predictions) if item not in items]
            labels.append(recommendations)
        
        return labels
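
The idea behind the class above can be seen on a handful of invented sessions. This is a minimal standalone sketch using only the standard library; the session data and the `recommend` helper are hypothetical and not part of Deepchecks.

```python
from collections import Counter, defaultdict

# Invented sessions: each list is one user's items in chronological order.
sessions = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["B", "C"],
}

# Count how often `nxt` directly follows `prev` within a session.
co_occurrences = defaultdict(Counter)
for items in sessions.values():
    for prev, nxt in zip(items, items[1:]):
        co_occurrences[prev][nxt] += 1


def recommend(history, k=2):
    """Sum the follower counts of every item in `history`, skipping items already seen."""
    scores = Counter()
    for item in history:
        scores += co_occurrences.get(item, Counter())
    return [item for item, _ in scores.most_common() if item not in history][:k]


print(dict(co_occurrences["A"]))  # {'B': 2}
print(recommend(["A"]))           # ['B']
print(recommend(["A", "B"]))      # ['C']
```

One design difference worth noting: this sketch filters out seen items before truncating to `k`, whereas the class above takes the top `num_predictions` first and filters afterwards, so it can return fewer than `num_predictions` items.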

In [None]:
%%time

# X_train_interactions and X_test_interactions are DataFrames with columns 'userId' and 'movieId'.
# Create an instance of the CoOccurrenceRecommender class
recommender = CoOccurrenceRecommender(col='userId',
                                      item_col='movieId',
                                      num_predictions=20)

# Fit the model on the training data
recommender.fit(X_train_interactions)

# Generate predictions for the validation data
predictions = recommender.predict(X_test_interactions)


In [None]:
from deepchecks.recommender.ranking import mean_average_recall_at_k, mean_average_precision_at_k

mean_average_recall_at_k(y_test.values.tolist(),
                         predictions,
                         k=20)
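
For intuition about what a top-k ranking metric measures, here is a minimal sketch of plain precision@k and recall@k averaged over users. This is an illustration only: the exact definition behind `mean_average_recall_at_k` is not shown in this guide and may differ (for example, by averaging over several cutoffs), and the helper names below are hypothetical.

```python
def precision_at_k(relevant, predicted, k):
    """Fraction of the top-k predictions that are relevant."""
    top_k = predicted[:k]
    if not top_k:
        return 0.0
    return len(set(top_k) & set(relevant)) / len(top_k)


def recall_at_k(relevant, predicted, k):
    """Fraction of the relevant items that appear in the top-k predictions."""
    if not relevant:
        return 0.0
    return len(set(predicted[:k]) & set(relevant)) / len(set(relevant))


def mean_over_users(metric, all_relevant, all_predicted, k):
    """Average a per-user metric over all users."""
    scores = [metric(r, p, k) for r, p in zip(all_relevant, all_predicted)]
    return sum(scores) / len(scores)


# Invented labels and predictions for two users.
y_true = [[1, 2, 3], [4]]
y_pred = [[1, 9, 2], [7, 8]]

mean_recall = mean_over_users(recall_at_k, y_true, y_pred, k=2)
mean_precision = mean_over_users(precision_at_k, y_true, y_pred, k=2)
print(round(mean_recall, 4), mean_precision)  # 0.1667 0.25
```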

In [None]:
%%time

from deepchecks.recommender.checks import SamplePerformance

check = SamplePerformance(scorers=['mean_average_precision_at_k',
                                   'mean_average_recall_at_k',
                                   'mean_reciprocal_rank'])

result = check.run(valid_user_ds,
                   y_pred=predictions)

In [None]:
result.show()
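
Of the three scorers passed above, reciprocal rank is the least self-explanatory, so here is a minimal sketch of the textbook definition: the score for one user is 1 divided by the rank of the first relevant prediction, or 0 if none is relevant. The function name `mrr_sketch` is hypothetical, and the Deepchecks scorer may handle edge cases differently.

```python
def reciprocal_rank(relevant, predicted):
    """Return 1 / rank of the first relevant item, or 0.0 if none is found."""
    relevant = set(relevant)
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mrr_sketch(all_relevant, all_predicted):
    """Average the reciprocal rank over all users."""
    scores = [reciprocal_rank(r, p) for r, p in zip(all_relevant, all_predicted)]
    return sum(scores) / len(scores)


# Invented data: first hit at rank 2, at rank 1, and no hit at all.
y_true = [[1, 2, 3], [4], [5]]
y_pred = [[9, 2, 1], [4, 7], [6]]
mrr = mrr_sketch(y_true, y_pred)
print(mrr)  # 0.5
```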

### DateTrainTestLeakageOverlap

In [None]:
from deepchecks.recommender.checks import DateTrainTestLeakageOverlap

check = DateTrainTestLeakageOverlap(validation_per_user=False)

result = check.run(train_dataset=train_interaction_ds,
                   test_dataset=valid_interaction_ds)
result.show()
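
The check's name suggests it looks for test interactions dated no later than the last training interaction. The sketch below illustrates that idea on invented timestamps; it is an assumption about the intent, not the check's actual implementation.

```python
from datetime import datetime

# Invented interaction dates (January 2023).
train_times = [datetime(2023, 1, day) for day in (1, 2, 3, 5)]
test_times = [datetime(2023, 1, day) for day in (4, 6, 7, 8)]

# Any test interaction on or before the last train interaction overlaps in time.
train_end = max(train_times)
leaked = [t for t in test_times if t <= train_end]
leak_ratio = len(leaked) / len(test_times)

print(f"{leak_ratio:.0%} of test interactions fall inside the training period")
```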

### Cold Start Detection

In [None]:
from deepchecks.recommender.checks import ColdStartDetection

all_interaction_ds = train_interaction_ds + valid_interaction_ds
check = ColdStartDetection()
result = check.run(all_interaction_ds)
result.show()
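
Cold start generally refers to users or items with too few interactions to model well. The sketch below counts them on invented data; the `min_interactions` threshold is a hypothetical parameter, and how `ColdStartDetection` defines or reports cold entities is not specified in this guide.

```python
from collections import Counter

# Invented (user, item) interaction pairs.
interactions = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"),
    ("u3", "A"), ("u3", "B"),
]
min_interactions = 2  # hypothetical threshold

user_counts = Counter(user for user, _ in interactions)
item_counts = Counter(item for _, item in interactions)

# Anything seen fewer than `min_interactions` times is treated as cold.
cold_users = sorted(u for u, c in user_counts.items() if c < min_interactions)
cold_items = sorted(i for i, c in item_counts.items() if c < min_interactions)

print(cold_users)  # ['u2']
print(cold_items)  # ['C']
```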

Product Association
=====================

In [None]:
from deepchecks.recommender.checks import ProductAssociation

check = ProductAssociation(max_timestamp_delta=3600)
result = check.run(all_interaction_ds,
                   item_dataset=item_ds
                   )
result.show()
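
One plausible reading of `max_timestamp_delta=3600` is that two consecutive items in a session count as associated only if they occur within 3600 seconds of each other. The sketch below illustrates that reading on invented events; both the unit (seconds) and the pairing rule are assumptions, not the documented behavior of `ProductAssociation`.

```python
from collections import Counter

# Invented events: (user, item, time in seconds).
events = [
    ("u1", "A", 0), ("u1", "B", 600), ("u1", "C", 10_000),
    ("u2", "A", 100), ("u2", "B", 900),
]
max_timestamp_delta = 3600  # assumed to be in seconds

# Group each user's events in chronological order.
by_user = {}
for user, item, ts in sorted(events, key=lambda e: (e[0], e[2])):
    by_user.setdefault(user, []).append((item, ts))

# Count consecutive item pairs that occur close enough in time.
pair_counts = Counter()
for sequence in by_user.values():
    for (first, t_first), (second, t_second) in zip(sequence, sequence[1:]):
        if t_second - t_first <= max_timestamp_delta:
            pair_counts[(first, second)] += 1

print(dict(pair_counts))  # {('A', 'B'): 2}
```

The pair `("B", "C")` for `u1` is skipped because the two events are 9400 seconds apart.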

User Session Length Drift
===================================

Also in the "Didn't Pass" tab we can see the two segment performance
checks: Property Segment Performance and Metadata Segment Performance.
These use the metadata columns of user-related information or our
calculated properties to try to **automatically** detect significant data
segments on which our model performs badly.




In [None]:
from deepchecks.recommender.checks import UserSessionDrift

check = UserSessionDrift()

result = check.run(train_dataset = train_interaction_ds,
                   test_dataset = valid_interaction_ds)
result.show()
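
This guide does not state which drift measure `UserSessionDrift` uses. As a stand-in, the sketch below compares train and test session-length distributions with a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), implemented with the standard library. The session lengths are invented.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Invented number of interactions per user session.
train_session_lengths = [1, 2, 2, 3, 5]
test_session_lengths = [4, 5, 6, 6, 8]

drift = ks_statistic(train_session_lengths, test_session_lengths)
print(round(drift, 2))  # 0.8
```

A value near 0 means the two distributions look alike; a value near 1 means test sessions are systematically longer or shorter than train sessions.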



Prediction Popularity Drift
================================

In [None]:
from deepchecks.recommender.checks import PredictionPopularityDrift

check = PredictionPopularityDrift()
result = check.run(valid_user_ds,
                   y_pred=predictions,
                   interaction_dataset=train_interaction_ds)
result
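
To illustrate what "prediction popularity" could mean, the sketch below scores each recommendation list by the average training popularity of its items. This is an assumption about the underlying quantity, not the implementation of `PredictionPopularityDrift`; the data and the `mean_popularity` helper are hypothetical.

```python
from collections import Counter

# Invented training interactions (one entry per interaction with an item).
train_items = ["A", "A", "A", "B", "B", "C"]
popularity = Counter(train_items)

# Invented recommendation lists, one per user.
recommended = [["A", "B"], ["A", "C"], ["D"]]


def mean_popularity(items):
    """Average training interaction count of the items; unseen items count as 0."""
    if not items:
        return 0.0
    return sum(popularity.get(item, 0) for item in items) / len(items)


per_user_popularity = [mean_popularity(items) for items in recommended]
print(per_user_popularity)  # [2.5, 2.0, 0.0]
```

Comparing this distribution with the popularity of the items users actually interacted with would show whether the model leans toward blockbusters or toward the long tail.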


Label Popularity Drift
================================

In [None]:
from deepchecks.recommender.checks import LabelPopularityDrift

check = LabelPopularityDrift()
result = check.run(valid_user_ds,
                   interaction_dataset=train_interaction_ds + valid_interaction_ds)
result

### Segment Performance

In [None]:
from deepchecks.recommender.checks import SegmentPerformance
import traceback

try:
    result = SegmentPerformance(feature_1='session_length',
                                feature_2='mean_rating',
                                alternative_scorer={'recall': 'mean_average_precision_at_k'},
                                max_segments=3
                                ).run(valid_user_ds, y_pred=predictions)
except Exception:
    # Print the traceback instead of stopping the notebook; `result` is left unchanged on failure.
    traceback.print_exc()

In [None]:
result.show()
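
Segment performance boils down to grouping users by a feature and comparing the model's score across groups. The sketch below does this for a single feature with fixed bin edges; the edges, the per-user scores, and the `segment_index` helper are all hypothetical, and the Deepchecks check chooses its own segments over two features.

```python
from collections import defaultdict


def segment_index(value, edges):
    """Index of the first bin whose upper edge is above `value`."""
    for index, edge in enumerate(edges):
        if value < edge:
            return index
    return len(edges)


edges = [5, 20]  # hypothetical bin edges for session_length: <5, 5-19, >=20

# Invented (session_length, per-user score) pairs.
users = [(2, 0.10), (3, 0.20), (8, 0.50), (15, 0.30), (40, 0.60), (60, 0.80)]

scores_by_segment = defaultdict(list)
for session_length, score in users:
    scores_by_segment[segment_index(session_length, edges)].append(score)

segment_means = {
    seg: sum(scores) / len(scores) for seg, scores in sorted(scores_by_segment.items())
}
weakest_segment = min(segment_means, key=segment_means.get)

print({seg: round(mean, 2) for seg, mean in segment_means.items()})
print(f"weakest segment: {weakest_segment}")
```

In this toy data the shortest sessions score worst, which is the kind of pattern a segment check is meant to surface.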

What's left
===============
- **merging the PR.**

- **finalizing the quickstart**
    - description (markdown).
    - Initial simplified description of the heuristic model used.
    - replace SegmentPerformance with WeakSegmentPerformance, because the former is deprecated.
    - push it without the display.
- **make pylint (specific to deepchecks)**

- **Suites & Conditions**

- **adding a Reranker to the quickstart**
    - lightgbm classifier.
    - use of classic checks of classifier.
- **try all other scorers to find and fix potential issues**