Test Deepchecks for Recommender System
===============================================================================

Deepchecks Recommender is your go-to tool for developing and evaluating recommender system models and checking their robustness before deployment. Our comprehensive testing package not only detects potential failures but also saves you valuable development time. In this quickstart guide, you'll learn how to use Deepchecks Recommender to analyze and evaluate various aspects of your recommender system, including data quality, leakage, product associations, cold start detection, and drift. Let's get started.

**Step 1: Data Preparation and Auto Analysis**
---------------------------------------------

To run Deepchecks Recommender, make sure you have the following data for both your training and testing sets:

1. User-Item Interaction Data: A structured dataset containing information about user-item interactions. Each record represents a user's interaction with an item, such as viewing, purchasing, or rating.

2. Product Information: Additional information about the items in your catalog, like product categories, descriptions, or features.

3. User Information (Optional): If available, user-specific data such as demographics, preferences, or historical behavior can enhance the evaluation.

4. Your labels: These are not needed for checks that don't require labels (such as the Cold Start Detection check or most data integrity checks), but are needed for many other checks.

5. Your model's predictions: These are needed only for the model related checks, shown in the Model Evaluation section of this guide.
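
To make the list above concrete, here is a minimal sketch of what these inputs might look like as pandas objects. The column names follow the MovieLens files used later in this guide; the values, titles, labels, and predictions are made up for illustration.

```python
import pandas as pd

# User-item interactions: one row per interaction (toy values).
interactions = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Product information: one row per item (hypothetical titles and genres).
items = pd.DataFrame({
    "movieId": [10, 20],
    "title": ["Movie A", "Movie B"],
    "genres": ["Comedy|Drama", "Action"],
})

# Labels: for each user, the list of items they interacted with afterwards.
labels = pd.Series({1: [30], 2: [20, 30]}, name="target")

# Predictions: for each user, a ranked list of recommended items.
predictions = pd.Series({1: [30, 40], 2: [20, 50]})

print(interactions.head())
```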

Installation of Libraries
=========================
We will install two essential Python libraries, gensim and annoy, to power our recommender system.

`gensim` is a popular library for topic modeling and document similarity analysis. It provides tools for training and using word embeddings, topic models, and other natural language processing techniques.

`annoy` is a library for approximate nearest neighbor search. It is often used in information retrieval and recommendation systems to efficiently find similar items in large datasets.

In [None]:
!pip3 install gensim==4.3.1 annoy==1.17.3

Setting Up
----------

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
from collections import defaultdict, Counter

### Helper functions

In [None]:
# Create train and validation split.
def split_train_validation(interaction_df : pd.DataFrame,
                           session_col : str,
                           item_col : str,
                           timestamp_col : str, 
                           test_percentage=0.2,
                           random_seed=None):
    
    assert set([session_col, item_col, timestamp_col]).issubset(interaction_df.columns)
    np.random.seed(random_seed)
    
    interaction_df = interaction_df.sort_values(timestamp_col,ascending=True)
    split_index = int(len(interaction_df) * (1 - test_percentage))

    train = interaction_df.iloc[:split_index]
    test = interaction_df.iloc[split_index:]

    # Discard overlapping sessions so that the train and validation sets are disjoint and independent.
    overlapping_sessions = set(train[session_col]).intersection(set(test[session_col]))

    test = test[~test[session_col].isin(overlapping_sessions)]

    data_to_calculate_validation_score = []
    new_test = []
    for grp in test.groupby(session_col):
        cutoff = np.random.randint(1, grp[1].shape[0]) # we want at least a single item in our validation data for each userId
        new_test.append(grp[1].iloc[:cutoff])
        data_to_calculate_validation_score.append(grp[1].iloc[cutoff:])

    test = pd.concat(new_test).reset_index(drop=True)
    
    test_labels = pd.concat(data_to_calculate_validation_score).reset_index(drop=True)
    assert test[timestamp_col].max() < test_labels[timestamp_col].max()
    
    test_labels = test_labels.groupby(session_col)[item_col].apply(list)
    # Every session in the test data must have a matching entry in the labels.
    assert (test[session_col].unique() == test_labels.index.values).all()
    
    return train,test,test_labels

# Create user features.
def user_features(X_interaction : pd.DataFrame):
    # groupby sorts its output by userId, so the id column is sorted as well
    # to keep every statistic aligned with the user it belongs to.
    grouped = X_interaction.groupby("userId")
    user_df = pd.DataFrame()
    user_df['userId'] = np.sort(X_interaction['userId'].unique())
    user_df['mean_rating'] = grouped['rating'].mean().values
    user_df['median_rating'] = grouped['rating'].median().values
    user_df['std_rating'] = grouped['rating'].std().values
    user_df['session_length'] = grouped['rating'].count().values

    user_df['min_rating'] = grouped['rating'].min().values
    user_df['max_rating'] = grouped['rating'].max().values
    # last() returns one value per user in group order (tail(1) keeps row order instead).
    user_df['last_timestamp'] = grouped['timestamp'].last().values
    user_df['last_timestamp'] = user_df['last_timestamp'].apply(lambda x: pd.to_datetime(x).timestamp()).astype(int)

    user_df['sum_rating'] = grouped['rating'].sum().values

    # Random feature with no signal, useful as a baseline in segment checks.
    user_df['noise'] = np.random.normal(0,1,size=(user_df['session_length'].shape))

    return user_df

### Load Data


For the purpose of this guide, we'll use a small subset of the [MovieLens](https://grouplens.org/datasets/movielens/) dataset:

In [None]:
!curl -O http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
!unzip -n ml-latest-small.zip

In [None]:
%%time

# Load interaction data.
df = pd.read_csv("ml-latest-small/ratings.csv")

# Load item data.
movie_df =  pd.read_csv("ml-latest-small/movies.csv")

# Split interaction data into train and validation data.
df['timestamp'] = df['timestamp'].apply(lambda x : datetime.fromtimestamp(x))
X_train_interactions, X_test_interactions, y_test = split_train_validation(interaction_df=df,
                                           session_col='userId',
                                           item_col ='movieId',
                                           timestamp_col='timestamp',
                                           test_percentage=0.6)

# Create User Dataframe
train_users_df = user_features(X_train_interactions)
valid_users_df = user_features(X_test_interactions)

# Add targets to the valid users
valid_users_df = pd.merge(valid_users_df,y_test.rename('target'),how="left",on="userId")


Create Recommender Datasets
===========================

We can now create Dataset objects for the train and test dataframes. These objects are
used to pass your data to the deepchecks checks.

To create a Recommender Dataset, the only required argument is the data
itself, but passing only the data will prevent multiple checks from
running. In this example we'll also define the feature columns, the datetime
column, and the user and item index columns, which we'll use later on in the
guide.

In [None]:
from deepchecks.recommender import InteractionDataset, UserDataset,ItemDataset

# Interaction Datasets
#################################################################
train_interaction_ds = InteractionDataset(df=X_train_interactions,                    
                features=['rating'],
                datetime_name='timestamp',
                user_index_name='userId',
                item_index_name='movieId')

valid_interaction_ds = InteractionDataset(df=X_test_interactions,                    
                features=['rating'],
                datetime_name='timestamp',
                user_index_name='userId',
                item_index_name='movieId')

# User Datasets
#################################################################
train_user_ds = UserDataset(df = train_users_df,
                label = None,                    
                features=['mean_rating', 'session_length','median_rating','std_rating','noise','min_rating','max_rating','last_timestamp','sum_rating'],
                cat_features=None)

valid_user_ds = UserDataset(df = valid_users_df,
                label = "target",                    
                features=['mean_rating', 'session_length','median_rating','std_rating','noise','min_rating','max_rating','last_timestamp','sum_rating'],
                cat_features=None)
            
# Item Dataset
#################################################################
item_ds = ItemDataset(df=movie_df,
                      item_column_name='title',
                      features=['title','genres'],
                      cat_features=['title','genres'])


Create a Recommender Model Class
================================

We start with a covisitation (co-occurrence) recommender to generate a diverse set of candidate items for recommendation.


The co-occurrence recommender leverages the patterns of item co-occurrence in user sessions: if two items frequently appear together in user sessions, they might be related or have some inherent similarity that can be exploited for recommendations.


1. **Fitting**: Using training data, we track associations between previous and current items based on their co-occurrence in user sessions.

2. **Predicting**: For each user in validation data, we accumulate associated item counts from their session items.

3. **Recommendations**: We suggest the most common associated items, excluding those already in the session.
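
As a rough illustration of the three steps above, here is a minimal sketch on toy sessions. The session data and item names are made up, and the code is a simplified stand-in for the idea, not the class used in this guide.

```python
from collections import Counter, defaultdict

# Hypothetical toy sessions: each list is one user's items in chronological order.
sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"]]

# Fitting: count how often each item directly follows another one.
co_occurrences = defaultdict(Counter)
for items in sessions:
    for prev_item, item in zip(items, items[1:]):
        co_occurrences[prev_item][item] += 1

# Predicting: accumulate the counters of every item in a new session.
new_session = ["A"]
scores = Counter()
for item in new_session:
    scores += co_occurrences.get(item, Counter())

# Recommendations: most common associated items not already in the session.
recommendations = [item for item, _ in scores.most_common(5)
                   if item not in new_session]
print(recommendations)
```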

In [None]:
class CoOccurrenceRecommender:
    def __init__(self, user_col, item_col, num_predictions=20):
        self.user_col = user_col
        self.item_col = item_col
        self.num_predictions = num_predictions
        self.co_occurences = defaultdict(Counter)
        
    def fit(self, X_train):
        # Make a copy of the training data
        filtered_interactions = X_train.copy()
        
        # Create a new column with the previous item in each user session
        prev_item_col = f'prev_{self.item_col}'
        filtered_interactions[prev_item_col] = filtered_interactions.groupby(self.user_col)[self.item_col].shift(1).astype("Int64").dropna()
        
        # Create a DataFrame with columns 'previous item' and 'item'
        products_association_df = filtered_interactions[[prev_item_col, self.item_col]].copy().dropna()
        
        # Generate associations between 'previous item' and 'item'
        for row in products_association_df.itertuples(index=False):
            self.co_occurences[row[0]][row[1]] += 1

    def predict(self, X_valid):
        user_recommendations = {}

        X_test_session_items = X_valid.groupby(self.user_col)[self.item_col].apply(list)

        for user, items in X_test_session_items.items():
            items = list(dict.fromkeys(items[::-1]))

            counter = Counter()

            for item in items:
                subsequent_item_counter = self.co_occurences.get(item)
                if subsequent_item_counter:
                    counter += subsequent_item_counter
            
            # Get the top N recommended items based on the associations
            recommendations = [item for item, cnt in counter.most_common(self.num_predictions) if item not in items]
            user_recommendations[user] = recommendations
        
        recommendations_series = pd.Series(user_recommendations)
        return recommendations_series

In summary, this method uses item co-occurrence patterns to provide recommendations by suggesting items that often appear together in user sessions.

In [None]:
%%time

recommender = CoOccurrenceRecommender(user_col='userId',
                                      item_col='movieId',
                                      num_predictions=50)

# Fit the model on the training data
recommender.fit(X_train_interactions)

# Generate predictions for the validation data
predictions = recommender.predict(X_test_interactions)


SamplePerformance
===================
This evaluation check is tailored for assessing the performance of a recommender system on a labeled dataset using various metrics.

In the context of recommender systems, we employ specific metrics to evaluate the model's effectiveness. These metrics gauge the quality of recommendations provided by the system. Just as scorers are a convention in sklearn for model evaluation, these metrics are standard in the realm of recommender systems.

The default metrics we employ for recommender systems are Mean Average Recall (MAR), Mean Average Precision (MAP), Mean Average F1-Score (MA F1), Normalized Discounted Cumulative Gain (NDCG), and Mean Reciprocal Rank (MRR). Each of these metrics quantifies a different aspect of recommendation quality.
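
To give a feel for what such metrics measure, here is a minimal sketch of three per-user building blocks: recall at k, precision at k, and reciprocal rank. The function names are hypothetical and the definitions are simplified textbook versions, so they should not be read as the Deepchecks implementation; the "mean" variants additionally average these values over users.

```python
def recall_at_k(relevant, recommended, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(relevant & set(recommended[:k]))
    return hits / len(relevant)


def precision_at_k(relevant, recommended, k):
    """Share of the top-k recommendations that are relevant (divided by k)."""
    relevant = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def reciprocal_rank(relevant, recommended):
    """1 / rank of the first relevant recommendation, or 0 if there is none."""
    relevant = set(relevant)
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Toy example: the user later interacted with items 1, 2 and 3.
relevant_items = [1, 2, 3]
recommended_items = [5, 2, 9, 1]

recall = recall_at_k(relevant_items, recommended_items, k=4)
precision = precision_at_k(relevant_items, recommended_items, k=4)
rr = reciprocal_rank(relevant_items, recommended_items)
print(recall, precision, rr)
```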

In [None]:
from deepchecks.recommender.checks import SamplePerformance

perf_check = SamplePerformance(scorers=[
                                   'mean_average_recall_at_k',
                                   'mean_average_precision_at_k',
                                   'mean_average_f1_at_k',
                                   'mean_average_ndcg_k',
                                   'mean_reciprocal_rank'
                                   ],
                                   k = 20
                                )
perf_check.add_condition_greater_than(threshold=0.1, class_mode='all')
result = perf_check.run(valid_user_ds,
                   y_pred=predictions.values.tolist())
result.show()

DateTrainTestLeakageOverlap
============================

Guarding against data leakage and ensuring user independence between the train and test datasets is crucial for **the reliability of recommender systems**.

These safeguards prevent inadvertent information transfer between training and testing data and verify that users in the training set are distinct from those in the test set. By doing so, the system maintains its:
- integrity.
- accuracy.
- unbiased performance evaluation.
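
The two safeguards described above can be sketched with plain pandas. This is an illustrative approximation on made-up data, not the logic of the Deepchecks check: it measures the share of test interactions dated before the end of the training period, and lists the users that appear in both sets.

```python
import pandas as pd

# Toy train and test interactions (hypothetical values).
train = pd.DataFrame({
    "userId": [1, 1, 2],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-03"]),
})
test = pd.DataFrame({
    "userId": [3, 3, 2],
    "timestamp": pd.to_datetime(["2020-01-04", "2020-01-10", "2020-01-12"]),
})

# Date leakage: test interactions that happened before the last training interaction.
train_end = train["timestamp"].max()
leaked_ratio = (test["timestamp"] < train_end).mean()

# User independence: users that appear in both the train and the test set.
shared_users = set(train["userId"]) & set(test["userId"])

print(f"Leaked test interactions: {leaked_ratio:.0%}, shared users: {shared_users}")
```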

In [None]:
from deepchecks.recommender.checks import DateTrainTestLeakageOverlap

check = DateTrainTestLeakageOverlap(validation_per_user=False)

result = check.run(train_dataset=train_interaction_ds,
                   test_dataset=valid_interaction_ds)
result.show()

Cold Start Detection
=====================

Detecting cold start users is vital in recommender systems due to their scarce historical data. This hampers personalization, user engagement, and system performance, leading to sparse data challenges. By addressing cold start users, systems can enhance personalization, user onboarding, overall performance, and business outcomes, ensuring effective recommendations and user satisfaction.
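
As a minimal sketch of the idea, cold start users can be approximated as users with fewer interactions than some threshold. The threshold name and value below are assumptions chosen for illustration, and the data is made up; the Deepchecks check may define cold start entities differently.

```python
import pandas as pd

# Toy interaction log (hypothetical values).
interactions = pd.DataFrame({
    "userId": [1, 1, 1, 2, 3, 3],
    "movieId": [10, 11, 12, 10, 11, 13],
})

# Assumed threshold: users with fewer interactions than this count as cold start.
min_interactions = 2

# Count interactions per user and keep the users below the threshold.
counts = interactions.groupby("userId").size()
cold_users = counts[counts < min_interactions].index.tolist()
cold_ratio = len(cold_users) / counts.shape[0]

print(f"Cold start users: {cold_users} ({cold_ratio:.0%} of all users)")
```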

In [None]:
from deepchecks.recommender.checks import ColdStartDetection

all_interaction_ds = train_interaction_ds + valid_interaction_ds
check = ColdStartDetection()
check.add_condition_greater_than(min_cold_start_entity=0.1)

result = check.run(all_interaction_ds)
result.show()

Product Association
=====================

Product association is a fundamental concept within recommender systems, focusing on understanding the connections and patterns of products that tend to be chosen together by consumers. This notion holds significant importance in optimizing the performance of machine learning models.

Product association dynamics come into play when shifts in consumer behavior lead to changes in the correlations between items frequently purchased or recommended in tandem. Recognizing and analyzing product associations is particularly advantageous when there is a lack of available labels for the test dataset.

*Let's say the probability of buying product X (e.g., ketchup) in a supermarket is 10%,
and product Y (e.g., ground beef) is 20%. If we assume independence, the probability
of both happening together would be P(X) * P(Y) = 2%.
However, if the probability of both products occurring together is 8%, the lift is 8% / 2% = 4,
which means that those products are four times more likely to be purchased together
than if they had no relationship to each other.*
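
The arithmetic of the example above can be written out in a few lines. The variable names are chosen here for illustration only.

```python
# Probabilities from the ketchup / ground beef example.
p_x = 0.10   # P(X): probability of buying ketchup
p_y = 0.20   # P(Y): probability of buying ground beef
p_xy = 0.08  # P(X and Y): observed probability of buying both

# Joint probability we would expect if the two purchases were independent.
expected_if_independent = p_x * p_y

# Lift: how much more often the pair occurs than independence would predict.
lift = p_xy / expected_if_independent
print(round(lift, 2))
```

A lift close to 1 suggests the products are bought independently, while values well above 1 point to a positive association.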

In [None]:
from deepchecks.recommender.checks import ProductAssociation

check = ProductAssociation(max_timestamp_delta=3600)
result = check.run(all_interaction_ds,
                   item_dataset=item_ds
                   )
result.show()

Drift Checks
==============

Popularity drift is a phenomenon within recommender systems where the relative popularity of items changes over time. This dynamic shift in item popularity significantly impacts the performance of machine learning models.

Popularity drift occurs when the distribution of item preferences or choices evolves, leading to changes in the popularity ranking of items. Recognizing and quantifying popularity drift is particularly valuable when there is a lack of available labels for the test dataset. In such cases, observing a drift in the predicted popularities serves as the primary signal that shifts have transpired in the underlying data affecting the model's predictions.
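
One simple way to put a number on popularity drift is to compare the item popularity distributions of two periods. The sketch below uses the total variation distance on made-up data; this choice of distance is an assumption for illustration, and the Deepchecks drift checks may rely on a different measure.

```python
import pandas as pd

# Items interacted with in two periods (toy data).
train_items = pd.Series(["A", "A", "A", "B"])
test_items = pd.Series(["A", "B", "B", "C"])

# Relative popularity of each item in each period.
train_pop = train_items.value_counts(normalize=True)
test_pop = test_items.value_counts(normalize=True)

# Align both distributions on the union of items, filling unseen items with 0.
all_items = train_pop.index.union(test_pop.index)
train_pop = train_pop.reindex(all_items, fill_value=0.0)
test_pop = test_pop.reindex(all_items, fill_value=0.0)

# Total variation distance: 0 means identical popularity, 1 means no overlap.
drift_score = 0.5 * (train_pop - test_pop).abs().sum()
print(drift_score)
```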

User Session Length Drift
===================================




In [None]:
from deepchecks.recommender.checks import UserSessionDrift
# Compare the distribution of session lengths (interactions per user) between train and test.
check = UserSessionDrift()
check.add_condition_drift_score_less_than()
result = check.run(train_dataset = train_interaction_ds,
                   test_dataset = valid_interaction_ds)
result.show()



Prediction Popularity Drift
================================

In [None]:
from deepchecks.recommender.checks import PredictedItemsPopularityDrift

check = PredictedItemsPopularityDrift()
check.add_condition_drift_score_less_than()
result = check.run(valid_user_ds,
                   y_pred=predictions.values.tolist(),
                   interaction_dataset=train_interaction_ds)
result.show()


Label Popularity Drift
================================

In [None]:
from deepchecks.recommender.checks import LabelPopularityDrift

check = LabelPopularityDrift()
check.add_condition_drift_score_less_than()

result = check.run(valid_user_ds,
                   interaction_dataset=train_interaction_ds+valid_interaction_ds)
result.show()

Segment Performance
-------------------

In [None]:
from deepchecks.recommender.checks import SegmentPerformance
import traceback
try:
    result = SegmentPerformance(feature_1='session_length',
                       feature_2='mean_rating',
                       alternative_scorer={'recall':'mean_average_precision_at_k'},
                       max_segments=3
                       ).run(valid_user_ds, y_pred=predictions.values.tolist())
except Exception:
    # Print the error and skip the display instead of showing a stale result.
    traceback.print_exc()
else:
    result.show()

WeakSegmentsPerformance
-----------------------


The check is designed to help you easily identify the model’s weakest segments in the data provided. In addition, it lets you provide a sublist of the Dataset’s features, thus limiting the check to search in interesting subspaces.

In [None]:
from deepchecks.recommender.checks import WeakSegmentsPerformance

check = WeakSegmentsPerformance(columns=['mean_rating',
                                        'session_length',
                                        'median_rating',
                                        'std_rating',
                                        'noise',
                                        'min_rating',
                                        'max_rating',
                                        'last_timestamp',
                                        'sum_rating'],
                                alternative_scorer={'mean_average_recall_at_k': 'mean_average_recall_at_k'},
                                segment_minimum_size_ratio=0.1,
                                categorical_aggregation_threshold=0.5)
check.add_condition_segments_relative_performance_greater_than(0.1)

result = check.run(valid_user_ds, y_pred=predictions.values.tolist())
result.show()

## Generate More Candidates: a Word2Vec Approach

The integration of the co-occurrence recommender was just one facet of our strategy. In tandem, let's incorporate the Word2Vec model, which delves deeper into the semantic relationships between items. By mapping items into a multi-dimensional vector space, the Word2Vec model identifies underlying similarities between items, even when co-occurrence patterns are not explicit.



To fit the Word2Vec model:

1. **Data Preparation**: We combine training and validation data (without labels), grouping user interactions by the user column.

2. **Creating Sentences**: User interactions become "sentences" for the Word2Vec model.

3. **Training Word2Vec**: The model is trained using collected sentences.

4. **Index Mapping**: We map item IDs to their Word2Vec indices.

5. **Building K-NN Graph**: We use ``annoy`` to build the k-NN graph, which is suitable for large datasets and avoids memory issues.

6. **Adding Items to Graph**: For each item, we add its index and vector to the k-nearest neighbor graph.

7. **Building K-NN Structure**: The graph is built for k-nearest neighbor queries.

8. **Prediction**: The recommendations are the k nearest neighbors of the last item each user interacted with, found in the k-NN graph built on the item embeddings.
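
The prediction step boils down to a nearest-neighbor lookup in the embedding space. The sketch below shows that lookup as an exact brute-force search with NumPy on made-up 2-D embeddings. It is only a stand-in to illustrate the idea: the guide itself uses ``annoy`` for an approximate search, which scales to far larger catalogs, and the helper name `nearest_items` is hypothetical.

```python
import numpy as np

# Hypothetical 2-D item embeddings (real Word2Vec vectors have more dimensions).
item_ids = ["A", "B", "C", "D"]
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])


def nearest_items(query_id, k):
    """Return the k items whose embeddings are most similar to query_id (cosine)."""
    # Normalize the rows so that a dot product equals the cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = unit[item_ids.index(query_id)]
    similarity = unit @ query
    # Sort from most to least similar and drop the query item itself.
    order = np.argsort(-similarity)
    return [item_ids[i] for i in order if item_ids[i] != query_id][:k]


# Recommend the two items closest to the last item a user interacted with.
print(nearest_items("A", k=2))
```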

In [None]:
from gensim.models import Word2Vec
from annoy import AnnoyIndex

class Word2VecRecommender:
    def __init__(self, user_col, item_col, vector_size=8, num_recommendations=20):
        self.user_col = user_col
        self.item_col = item_col
        self.vector_size = vector_size
        self.num_recommendations = num_recommendations
        self.w2vec = None
        self.knn_graph = None
        self.item2idx = None
    
    def fit(self, X_train, X_valid):
        sentences_df = pd.concat([X_train, X_valid]).groupby(self.user_col)[self.item_col].apply(list).reset_index()
        sentences_df.rename(columns={self.item_col: 'sentence'}, inplace=True)

        sentences = sentences_df['sentence'].to_list()

        self.w2vec = Word2Vec(sentences=sentences, vector_size=self.vector_size, min_count=1)

        self.item2idx = {aid: i for i, aid in enumerate(self.w2vec.wv.index_to_key)}

        self.knn_graph = AnnoyIndex(self.vector_size, 'angular')

        for aid, idx in self.item2idx.items():
            self.knn_graph.add_item(idx, self.w2vec.wv.vectors[idx])

        self.knn_graph.build(30)
    
    def predict(self, X_valid):
        user_recommendations = {}

        X_test_session_items = X_valid.groupby(self.user_col)[self.item_col].apply(list)

        for user, items in X_test_session_items.items():
            items = list(dict.fromkeys(items[::-1]))

            most_recent_aid = items[0]

            nns = [self.w2vec.wv.index_to_key[i] for i in 
                   self.knn_graph.get_nns_by_item(self.item2idx[most_recent_aid], self.num_recommendations + 1)[1:]]

            recommendations = [item for item in nns if item not in items]
            user_recommendations[user] = recommendations

        word2vec_recommendations = pd.Series(user_recommendations)
        return word2vec_recommendations

In [None]:
%%time

word2vec_recommender = Word2VecRecommender(user_col='userId',
                                           item_col='movieId',
                                           vector_size=8,
                                           num_recommendations=50)
word2vec_recommender.fit(X_train_interactions, X_test_interactions)
word2vec_predictions = word2vec_recommender.predict(X_test_interactions)

In [None]:
# Word2Vec Performance
result = perf_check.run(valid_user_ds,
                   y_pred=word2vec_predictions.values.tolist())

result.show()

Combine predictions/candidates
------------------------------

In [None]:
all_predictions = pd.concat([predictions.rename("co-occurence_pred"),word2vec_predictions.rename("word2vec_pred")],axis=1)
all_predictions['all_preds'] = all_predictions['co-occurence_pred'] + all_predictions['word2vec_pred']
all_predictions['all_preds'] = all_predictions['all_preds'].apply(lambda x : list(set(x)))

## Reranking: an LGBMRanker Approach

After generating a large set of candidate items, let's use the LGBMRanker algorithm for reranking. 

LGBMRanker is a gradient boosting algorithm designed for ranking tasks, and it's well-suited for reranking a list of items based on their predicted relevance to users.

- The reranking process involves considering multiple features or signals associated with items and users to estimate the relevance of items for individual users. The algorithm takes into account various factors such as item popularity, user behavior, and more.

- By reranking the candidates using LGBMRanker, we will reorder the candidate items for each user in a way that aims to improve the overall relevance of the recommendations.
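
Two mechanics are worth seeing in isolation before the full pipeline: a ranker is trained on rows grouped per user, so it needs the number of candidate rows in each group, and reranking itself is a per-user sort by predicted score. The sketch below illustrates both with pandas on made-up candidates; the scores are invented here, whereas in this guide they come from the trained ranker.

```python
import pandas as pd

# Toy candidate table: one row per (user, candidate item) with a relevance score.
candidates = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "item": [10, 11, 12, 10, 13],
    "score": [0.2, 0.9, 0.5, 0.7, 0.1],
})

# Group sizes: number of candidate rows per user, in the order the users appear.
# Rows of the same user must be contiguous for these sizes to be meaningful.
group_sizes = candidates.groupby("userId", sort=False).size().tolist()

# Reranking: order each user's candidates from highest to lowest score.
reranked = (
    candidates.sort_values(["userId", "score"], ascending=[True, False])
    .groupby("userId")["item"]
    .apply(list)
)

print(group_sizes)
print(reranked)
```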

In [None]:
# Preprocess item features
item_popularity = pd.concat([X_train_interactions,X_test_interactions],axis=0)['movieId'].value_counts().to_dict()

def preprocess_item_features(movie_df, item_popularity):
    movie_df[['genre_1', 'genre_2', 'genre_3']] = movie_df['genres'].str.split('|', expand=True).iloc[:, :3]
    popularity_df = pd.DataFrame(list(item_popularity.items()), columns=['movieId', 'popularity'])
    item_features = movie_df.merge(popularity_df, on='movieId').drop(['title', 'genres'], axis=1)
    return item_features

item_features = preprocess_item_features(movie_df, item_popularity)

# Preprocess user features
def preprocess_user_features(valid_users_df):
    user_features = valid_users_df.copy().drop('target', axis=1)
    return user_features

user_features = preprocess_user_features(valid_users_df)

In [None]:
# Explode candidates and true labels
def explode_candidates_true_labels(predictions, y_test, item_features, user_features):
    candidates = predictions.reset_index()
    candidates.columns = ['userId', 'item']
    candidates = candidates.explode('item')

    # Join on the movieId column; passing right_index=True as well would match against the row index instead.
    df = candidates.merge(item_features, left_on='item', right_on='movieId', how='left').fillna(-1)
    df = df.merge(user_features, on='userId', how='left').fillna(-1)

    true_labels = y_test.reset_index()
    true_labels.columns = ['userId', 'item']
    true_labels = true_labels.explode('item')
    true_labels['gt'] = 1

    df_ = pd.merge(df, true_labels, on=['userId', 'item'], how='left').fillna(0)
    df_['gt'] = df_['gt'].astype(int)
    
    object_columns = df_.select_dtypes(include=['object']).columns
    df_[object_columns] = df_[object_columns].astype('category')

    return df_

df_ = explode_candidates_true_labels(all_predictions['all_preds'], y_test, item_features, user_features)

In [None]:
from lightgbm.sklearn import LGBMRanker

# Splitting the data into train and validation
def split_data(df_):
    unique_users = df_['userId'].unique()
    np.random.shuffle(unique_users)
    train_size = int(len(unique_users) * 0.7)
    train_users = unique_users[:train_size]
    valid_users = unique_users[train_size:]
    train_df = df_[df_['userId'].isin(train_users)]
    valid_df = df_[df_['userId'].isin(valid_users)]

    d_train = train_df.groupby("userId").size().values.tolist()
    d_valid = valid_df.groupby("userId").size().values.tolist()

    return train_df, valid_df, d_train, d_valid

train_df, valid_df, d_train, d_valid = split_data(df_)


# Training and evaluation
def train_evaluate_lgbm_ranker(train_df, valid_df, features_col, target_col, d_train, d_valid):
    ranker = LGBMRanker(
        objective="lambdarank",
        metric="ndcg",
        boosting_type="gbdt",
        n_estimators=100,
        max_depth=20,
    )
    n_eval = int(df_.groupby("userId").size().mean())
    ranker.fit(
        train_df[features_col],
        train_df[target_col],
        group=d_train,
        eval_set=[(valid_df[features_col], valid_df[target_col])],
        eval_group=[d_valid],
        eval_metric="ndcg",
        eval_at=[int(n_eval/3), int(n_eval/2),n_eval]
    )

    return ranker, ranker.best_score_['valid_0']

features_col = ['genre_1', 'genre_2', 'genre_3',
       'popularity', 'mean_rating', 'median_rating', 'std_rating',
       'session_length', 'min_rating', 'max_rating', 'last_timestamp',
       'sum_rating', 'noise']
target_col = ['gt']

ranker, best_ndcg_score = train_evaluate_lgbm_ranker(train_df, valid_df, features_col, target_col, d_train, d_valid)
print(f"Best NDCG Score: {best_ndcg_score}")


In [None]:
# Generating predictions
def generate_predictions(ranker, candidates, features_col):
    scores = ranker.predict(candidates[features_col])

    candidates['score'] = scores
    predictions_lgbm = (
        candidates.sort_values(by=['userId', 'score'], ascending=[True, False])
        .groupby('userId')
        .apply(lambda group: group['item'].head(20).tolist())
    )
    return predictions_lgbm

predictions_lgbm = generate_predictions(ranker, df_, features_col)

In [None]:
y_true_valid2 = pd.merge(predictions_lgbm.rename("lgbm_pred"),y_test.rename("true_labels"),how="inner",on="userId")['true_labels']

In [None]:
from deepchecks.recommender.ranking import mean_average_recall_at_k, mean_average_precision_at_k, mean_average_f1_at_k

print("mean average recall",mean_average_recall_at_k(y_true_valid2.values.tolist(),predictions_lgbm.values.tolist()))
print("mean average precision",mean_average_precision_at_k(y_true_valid2.values.tolist(),predictions_lgbm.values.tolist()))
print("mean average f1",mean_average_f1_at_k(y_true_valid2.values.tolist(),predictions_lgbm.values.tolist()))

As evident from the results, using a reranker algorithm like LGBMRanker enhances the performance of our recommender system.


In summary, we've combined the strengths of both the covisitation recommender, which captures item co-occurrence patterns, and the Word2Vec model, which captures item semantics, to generate an extensive list of candidate items. Then, by using the LGBMRanker algorithm to rerank these candidates, we've achieved better recommendations by considering multiple factors related to item-user interactions and relevance. This approach reflects a well-rounded and effective strategy for improving the recommendation quality of our system.