This notebook compares the NDCG scores of three models (random, top popular, and LightFM's pure collaborative filtering model) across different data preprocessing and metric settings: specifically, the minimum number of interactions a user needs to be included in the dataset, and the number of top scores considered when calculating NDCG.

In [1]:
# imports
import sys
# make the project's model and data modules importable
sys.path.insert(0, '../src/models')
sys.path.insert(0, '../src/data')

from fbpreprocessing import create_fbcommenters, create_UI_interactions
from model_preprocessing import get_data 
from run_models import run_models

  "LightFM was compiled without OpenMP support. "


In [2]:
# paths
comments_path = '../data/raw/comments.csv'
fbcommenters_path = '../data/preprocessed/fbcommenters.csv'
user_item_interactions_path = '../data/preprocessed/user_item_interactions.csv'

The function `run` performs the data preprocessing needed for this analysis, then runs the three models and reports their results. It takes two parameters, `d` and `k`:
- `d` is the minimum number of interactions a user needs to be included in the dataset. For example, if `d` is 5, any user with fewer than 5 interactions will not have their interactions show up in `user_item_interactions.csv`.
- `k` is the number of scores considered for NDCG (the top `k` scores). `k=None`, the default, uses all the scores. This follows sklearn's [NDCG](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ndcg_score.html) documentation.
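
The effect of `d` can be illustrated with a minimal sketch. It assumes interactions are available as `(user, item)` pairs and that every pair counts as one interaction; `filter_min_interactions` is a hypothetical helper for illustration, not the code inside `create_fbcommenters`, whose internals are not shown in this notebook.

```python
from collections import Counter


def filter_min_interactions(interactions, d=None):
    """Keep only the (user, item) pairs of users with at least d interactions.

    Hypothetical helper: d=None is assumed to mean "no filtering".
    """
    interactions = list(interactions)
    if d is None:
        return interactions
    # Count how many interactions each user has.
    counts = Counter(user for user, _ in interactions)
    return [(user, item) for user, item in interactions if counts[user] >= d]


# Toy data: user "a" has 3 interactions, "b" has 2, "c" has 1.
toy = [("a", "w1"), ("a", "w2"), ("a", "w3"),
       ("b", "w1"), ("b", "w4"),
       ("c", "w5")]

for d in (None, 2, 3):
    kept = filter_min_interactions(toy, d)
    users = {user for user, _ in kept}
    items = {item for _, item in kept}
    print(f"d={d}: {len(users)} users, {len(items)} items")
```

Raising `d` removes users, and any item that only those users interacted with disappears along with them, so the item count can shrink as well as the user count.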

In [3]:
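def run(d=None, k=None):
    """Preprocess the data with threshold d, then run the three models with NDCG at k."""
    # rebuild both preprocessed files for the given d
    create_fbcommenters(comments_path, fbcommenters_path, d)
    create_UI_interactions(comments_path, fbcommenters_path, user_item_interactions_path)
    data = get_data(user_item_interactions_path)
    # prints the average NDCG of each model
    run_models(data, k)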

With k kept at None, NDCG increases for all three recommendation models as d increases from 5 to 20. This is intuitive, since dropping users with few interactions should reduce the noise in the models. In absolute terms, the increase is larger for top popular and LightFM (about 0.13 each) than for the random recommender (about 0.11).


In [4]:
# dropping users with fewer than 5 interactions, all scores considered for NDCG
run(d=5)

Num users: 4026, num_items 580.

Running Random...
Average NDCG of Random: 0.1865467894085752

Running Top Popular...
Average NDCG of Top Popular: 0.26864000890074335

Running LightFM...
Average NDCG of LightFM: 0.2662631783853528


In [5]:
# dropping users with fewer than 10 interactions, all scores considered for NDCG
run(d=10)

Num users: 1813, num_items 579.

Running Random...
Average NDCG of Random: 0.23601372445714858

Running Top Popular...
Average NDCG of Top Popular: 0.3331195537758132

Running LightFM...
Average NDCG of LightFM: 0.3255172633564795


In [6]:
# dropping users with fewer than 15 interactions, all scores considered for NDCG
run(d=15)

Num users: 1079, num_items 579.

Running Random...
Average NDCG of Random: 0.26742703481502855

Running Top Popular...
Average NDCG of Top Popular: 0.3704792543567162

Running LightFM...
Average NDCG of LightFM: 0.3617290772757607


In [7]:
# dropping users with fewer than 20 interactions, all scores considered for NDCG
run(d=20)

Num users: 709, num_items 578.

Running Random...
Average NDCG of Random: 0.29290975190967555

Running Top Popular...
Average NDCG of Top Popular: 0.4001573719249373

Running LightFM...
Average NDCG of LightFM: 0.3956638201395667
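
The gains across the `d` sweep can be checked with a little arithmetic. The snippet below hard-codes the average NDCG values printed above for d=5 and d=20 (the dictionary and variable names are only for illustration) and reports the absolute and relative change for each model.

```python
# Average NDCG (k=None) copied from the outputs above, as (d=5, d=20).
ndcg = {
    "Random": (0.1865467894085752, 0.29290975190967555),
    "Top Popular": (0.26864000890074335, 0.4001573719249373),
    "LightFM": (0.2662631783853528, 0.3956638201395667),
}

gains = {}
for model, (at_d5, at_d20) in ndcg.items():
    absolute = at_d20 - at_d5      # change in NDCG
    relative = absolute / at_d5    # change as a fraction of the d=5 score
    gains[model] = (absolute, relative)
    print(f"{model:12s} absolute: {absolute:+.3f}  relative: {relative:+.1%}")
```

In absolute terms the two non-random models gain more (about 0.13 versus 0.11), while relative to its lower starting point the random recommender gains the most (about 57% versus 49%).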


We also compare different k values (10, 20, and 50) at a fixed d of 5. Compared with the previous runs, where k=None means all scores are considered (effectively k ≈ 580, the number of items), the NDCG scores of all models are much lower at these k values, most of all for the random recommender. Within this range, a higher k also gives a higher NDCG.
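
To see why truncating at k lowers the score, here is a minimal pure-Python sketch of NDCG at k with linear gains and a log2 discount, the definition documented for sklearn's `ndcg_score`. The function names are hypothetical and, unlike `ndcg_score`, ties between predicted scores are simply broken by item order. The toy user is made up for illustration.

```python
import math


def dcg_at_k(relevances, k=None):
    """Discounted cumulative gain over the first k relevances (all if k is None)."""
    # Slicing with k=None keeps the whole list.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(y_true, y_score, k=None):
    """NDCG at k: DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    # Order the true relevances by descending predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(y_true, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy user: 6 items, two relevant ones ranked 4th and 6th by the model.
y_true = [0, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

for k in (3, 4, None):
    print(f"k={k}: NDCG = {ndcg_at_k(y_true, y_score, k):.3f}")
```

With k=3 neither relevant item falls inside the cutoff, so the score is 0; each larger cutoff admits another relevant item and the score rises. More generally, once k is at least the number of relevant items, the ideal DCG stops growing while the DCG of the ranking can only gain terms, so NDCG at k cannot decrease from that point on.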

In [8]:
# dropping users with fewer than 5 interactions, top 10 scores considered for NDCG
run(d=5, k=10)

Num users: 4026, num_items 580.

Running Random...
Average NDCG of Random: 0.010693453441008716

Running Top Popular...
Average NDCG of Top Popular: 0.07292912878426112

Running LightFM...
Average NDCG of LightFM: 0.06746902416725901


In [9]:
# dropping users with fewer than 5 interactions, top 20 scores considered for NDCG
run(d=5, k=20)

Num users: 4026, num_items 580.

Running Random...
Average NDCG of Random: 0.01722516408386896

Running Top Popular...
Average NDCG of Top Popular: 0.09887370285575565

Running LightFM...
Average NDCG of LightFM: 0.0959112871832934


In [10]:
# dropping users with fewer than 5 interactions, top 50 scores considered for NDCG
run(d=5, k=50)

Num users: 4026, num_items 580.

Running Random...
Average NDCG of Random: 0.02840079036319522

Running Top Popular...
Average NDCG of Top Popular: 0.1478240491530547

Running LightFM...
Average NDCG of LightFM: 0.14178593064705813


Overall, across all values of d and k, the random recommender performs the worst, as expected. Top popular and LightFM's collaborative filtering model perform roughly the same, with top popular scoring slightly higher on NDCG in every run.

Note: the user-item interaction data inferred from Fitness Blender commenters is preprocessed with d=5 before being uploaded to our database (which is used by our web application). The main reason for choosing d=5 is that we want to keep all workout videos in our dataset available to recommend to users: increasing d reduces the number of workouts below 580, the total number of workouts scraped (for example, 579 remain at d=10).