## Demonstration of the User-Based Collaborative Recommender

This system applies user-based collaborative filtering: it analyzes user interactions, such as scroll depth and read time, to identify users with similar behavior. It therefore focuses on the user-item relation.

It recommends articles that these similar users have engaged with, aiming to provide personalized suggestions. The model's performance is evaluated using the MAP@K and NDCG@K metrics.
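
As a rough illustration of the two metrics named above, the sketch below computes AP@K and NDCG@K for a single user with binary relevance, then averages over users to obtain MAP@K. The function names and the normalisation choices (dividing AP by `min(len(relevant), k)`, using a log2 discount) are assumptions; the notebook's own `evaluate_recommender` may define them differently.

```python
import math

def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user: precision at each hit rank, averaged over the possible hits."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)

def ndcg_at_k(recommended, relevant, k):
    """NDCG@K for one user with binary relevance and a log2 rank discount."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mean_over_users(metric, recs_by_user, truth_by_user, k):
    """Average a per-user metric over every user that has ground truth."""
    scores = [metric(recs_by_user.get(user, []), truth, k)
              for user, truth in truth_by_user.items()]
    return sum(scores) / len(scores) if scores else 0.0

# Invented ids, for illustration only.
recs = {1: [1, 2, 3, 4], 2: [9]}
truth = {1: {2, 4}, 2: {9}}
map_at_4 = mean_over_users(average_precision_at_k, recs, truth, k=4)
ndcg_at_4 = mean_over_users(ndcg_at_k, recs, truth, k=4)
```

Both metrics reward placing relevant articles early in the list; NDCG discounts each hit by its rank, while MAP averages the precision observed at each hit.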



In [1]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)

import polars as pl
import numpy as np

from parquet_data_reader import ParquetDataReader
from utils.data_preprocessing import DataProcesser
from models.hybrid.most_popular_user_CF import MostPopularCollaborativeRecommender

pl.Config.set_tbl_cols(-1)

polars.config.Config

## Data Import and EDA

In [2]:
dataProcesser = DataProcesser()
behaviors_df = dataProcesser.collaborative_filtering_preprocess()
train_df, test_df = dataProcesser.random_split(behaviors_df, test_ratio=0.3)
print(train_df.head())

shape: (5, 4)
┌────────────┬─────────┬────────────────┬────────────┐
│ article_id ┆ user_id ┆ total_readtime ┆ max_scroll │
│ ---        ┆ ---     ┆ ---            ┆ ---        │
│ i32        ┆ u32     ┆ f32            ┆ f32        │
╞════════════╪═════════╪════════════════╪════════════╡
│ 9781262    ┆ 787192  ┆ 31.0           ┆ 100.0      │
│ 9772355    ┆ 73373   ┆ 73.0           ┆ 100.0      │
│ 9778168    ┆ 533329  ┆ 126.0          ┆ 100.0      │
│ 9784702    ┆ 207983  ┆ 57.0           ┆ 100.0      │
│ 9772543    ┆ 544609  ┆ 33.0           ┆ 100.0      │
└────────────┴─────────┴────────────────┴────────────┘


## Model Fit

The first model compares users based on their read time and scroll percentage (`total_readtime` and `max_scroll`) for each article.
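
A minimal sketch of how such a comparison could work, assuming users are compared with cosine similarity over per-article interaction weights. The weighting function `interaction_strength` is hypothetical; how `MostPopularCollaborativeRecommender` actually combines the two signals is not shown in this notebook.

```python
import math

def interaction_strength(total_readtime, max_scroll):
    """Hypothetical weighting: read time scaled by the fraction of the article scrolled."""
    return total_readtime * (max_scroll / 100.0)

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {article_id: weight}."""
    shared = u.keys() & v.keys()
    dot = sum(u[a] * v[a] for a in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Rows shaped like train_df: (article_id, user_id, total_readtime, max_scroll).
# The values are invented for illustration.
rows = [
    (101, 1, 30.0, 100.0),
    (102, 1, 60.0, 50.0),
    (101, 2, 30.0, 100.0),
    (103, 2, 10.0, 100.0),
]

# Build one sparse interaction profile per user.
profiles = {}
for article_id, user_id, readtime, scroll in rows:
    profiles.setdefault(user_id, {})[article_id] = interaction_strength(readtime, scroll)

similarity = cosine_similarity(profiles[1], profiles[2])
```

Only articles that both users interacted with contribute to the dot product, so users with no overlap get a similarity of 0.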

In [3]:
recommender = MostPopularCollaborativeRecommender(train_df)
recommender.fit()

{787192: [(2270736, np.float64(0.9995240233285215)),
  (1309274, np.float64(0.7341892117091785)),
  (2434131, np.float64(0.5666302768389282)),
  (1567686, np.float64(0.4889148083921879)),
  (1817209, np.float64(0.2548501952552784)),
  (1307206, np.float64(0.22850211739258253)),
  (1926368, np.float64(0.15616458883442186)),
  (1196555, np.float64(0.04308529138342487)),
  (501716, np.float64(0.039007180613341186)),
  (1766749, np.float64(0.03157404577353706))],
 73373: [(1358976, np.float64(0.7728046104460357)),
  (1486263, np.float64(0.7728046104460355)),
  (576307, np.float64(0.7728046104460355)),
  (864221, np.float64(0.7728046104460355)),
  (621197, np.float64(0.7292706210383284)),
  (827851, np.float64(0.72696684113746)),
  (1074758, np.float64(0.6638860224981276)),
  (1251347, np.float64(0.6298012321342792)),
  (2032861, np.float64(0.54645538057862)),
  (476654, np.float64(0.531676846953809))],
 533329: [(395122, np.float64(0.9935462865196026)),
  (1692114, np.float64(0.99354628651

The second model is binary: when comparing users, it only considers which articles each user has read.
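
The scores printed by the binary model in the next cell (0.333..., 0.288..., 0.577..., 0.707...) are consistent with cosine similarity on 0/1 read-vectors, which reduces to `|A ∩ B| / sqrt(|A| * |B|)` for two sets of read articles. This is an inference from the output, not something the notebook states; the sketch below only illustrates that formula.

```python
import math

def binary_cosine(articles_u, articles_v):
    """Cosine similarity of two 0/1 read-vectors, computed from the sets of read articles."""
    if not articles_u or not articles_v:
        return 0.0
    overlap = len(articles_u & articles_v)
    return overlap / math.sqrt(len(articles_u) * len(articles_v))

# Invented article ids, for illustration only.
a = {1, 2, 3}
b = {3, 4, 5}
c = {3, 6, 7, 8}

sim_ab = binary_cosine(a, b)  # 1 shared article: 1 / sqrt(3 * 3)
sim_ac = binary_cosine(a, c)  # 1 shared article: 1 / sqrt(3 * 4)
```

Because every read counts equally, many neighbours end up with identical scores, which matches the long runs of tied values in the binary output.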

In [4]:
binary_recommender = MostPopularCollaborativeRecommender(train_df, binary_model=True)
binary_recommender.fit()

{787192: [(269545, np.float64(0.33333333333333337)),
  (1556630, np.float64(0.33333333333333337)),
  (1309274, np.float64(0.33333333333333337)),
  (1977478, np.float64(0.28867513459481287)),
  (2533123, np.float64(0.28867513459481287)),
  (1899524, np.float64(0.28867513459481287)),
  (2127509, np.float64(0.28867513459481287)),
  (1869248, np.float64(0.28867513459481287)),
  (1992275, np.float64(0.28867513459481287)),
  (2387021, np.float64(0.28867513459481287))],
 73373: [(494828, np.float64(0.5773502691896258)),
  (864221, np.float64(0.5773502691896258)),
  (1358976, np.float64(0.5773502691896258)),
  (1243623, np.float64(0.5773502691896258)),
  (1279764, np.float64(0.5773502691896258)),
  (44468, np.float64(0.5773502691896258)),
  (2499070, np.float64(0.5773502691896258)),
  (2422862, np.float64(0.5773502691896258)),
  (2461297, np.float64(0.5773502691896258)),
  (576307, np.float64(0.5773502691896258))],
 533329: [(1067149, np.float64(0.7071067811865475)),
  (412935, np.float64(0.5)

Of the original 15143 users, only 9194 (about 61%) are covered by the current solution. This should be addressed in future work.

## Model Presentation

### Article Recommendation

In [5]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9778351, 9773356, 9771554, 9776497, 9776147]
recommended for user  620796 :  [9770450, 9783865, 9785049, 9783137, 9765753]
recommended for user  1067393 :  [9776234, 9773282, 9790548, 9771919, 9780195]
recommended for user  1726258 :  [9771224, 9771996, 9772923, 9780428, 9778219]
recommended for user  17205 :  [9780325, 9765941, 9788125, 9788760, 9775776]


In [6]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", binary_recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9783334, 9782722, 9788362, 9774142, 9783509]
recommended for user  620796 :  [9770082, 9779737, 9774392, 9788760, 9772485]
recommended for user  1067393 :  [9776234, 9773282, 9790548, 9771919, 9780195]
recommended for user  1726258 :  [9771224, 9786209, 9773846, 9774789, 9781998]
recommended for user  17205 :  [9775562, 9771846, 9780325, 9771796, 9769504]
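
The calls above take `n` and `allow_read_articles`. One plausible way to turn a neighbour list like the one returned by `fit()` into a ranked list is to score each candidate article by the summed similarity of the neighbours who read it. The helper `recommend_from_neighbours` and its inputs are hypothetical and only sketch this idea; they are not the class's actual implementation.

```python
from collections import defaultdict

def recommend_from_neighbours(user_id, neighbours, read_by_user, n, allow_read_articles=False):
    """Rank articles by the summed similarity of the neighbours who read them.

    neighbours:   {user_id: [(neighbour_id, similarity), ...]}
    read_by_user: {user_id: set of article ids the user has read}
    """
    scores = defaultdict(float)
    already_read = read_by_user.get(user_id, set())
    for neighbour_id, similarity in neighbours.get(user_id, []):
        for article_id in read_by_user.get(neighbour_id, set()):
            if not allow_read_articles and article_id in already_read:
                continue  # skip articles the target user has already seen
            scores[article_id] += similarity
    # Highest score first; ties broken by article id to keep the order deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [article_id for article_id, _ in ranked[:n]]

# Invented ids and similarities, for illustration only.
neighbours = {1: [(2, 0.9), (3, 0.5)]}
read_by_user = {1: {10, 11}, 2: {10, 12}, 3: {12, 13}}

unread_only = recommend_from_neighbours(1, neighbours, read_by_user, n=2)
including_read = recommend_from_neighbours(1, neighbours, read_by_user, n=2,
                                           allow_read_articles=True)
```

A user with no neighbours gets an empty list under this scheme, which is one way users can end up without recommendations.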


### Evaluation Scores

#### Without the Ability to Recommend Read Articles

The complex model, recommending only articles the user has not yet read:

In [7]:
results = recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=20, allow_read_articles=False)
results

{'MAP@K': np.float64(0.005454545454545455),
 'NDCG@K': np.float64(0.0732414308028071)}

The binary recommender model, recommending only articles the user has not yet read:

In [8]:
results = binary_recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=20, allow_read_articles=False)
results

{'MAP@K': np.float64(0.004615384615384616),
 'NDCG@K': np.float64(0.06265566652392031)}

#### With the Ability to Recommend Previously Read Articles

The complex model, allowed to recommend articles even if the user has read them before:

In [9]:
results = recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=20, allow_read_articles=True)
results

{'MAP@K': np.float64(0.005384615384615385),
 'NDCG@K': np.float64(0.07120588911148355)}

The binary recommender model, allowed to recommend articles even if the user has read them before:

In [10]:
results = binary_recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=20, allow_read_articles=True)
results

{'MAP@K': np.float64(0.00923076923076923),
 'NDCG@K': np.float64(0.04947630601400477)}

## Model Experimentation

In [11]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=1000, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9773700, 9781902, 9784591, 9778448, 9774352, 9773711, 9774864, 9776916, 9779860, 9776406, 9788705, 9774120, 9756075, 9782315, 9787441, 9770045, 9778500, 9778628, 9782726, 9538375, 9776071, 9787465, 9780348, 9771473, 9776855, 9776862, 9786718, 9786719, 9780193, 9738729, 9775985, 9771127, 9779577, 9775484}
[9778351, 9773356, 9771554, 9776497, 9776147, 9781878, 9769557, 9782869, 9771168, 9783657, 9774229, 9774430, 9780476, 9770551, 9786378, 9775785, 9771330, 9779269, 9780284, 9773744, 9772343, 9771253, 9780677, 9780384, 9789647, 9775207, 9781785, 9779650, 9776560, 9790987, 9775562, 9784863, 9779737, 9778915, 9769624, 9776985, 9780406, 9771919, 9775881, 9772227, 9772475, 9771916, 9789997, 9779045, 9781991, 9772862, 9771350, 9783057, 9784879, 9784575, 9783655, 9779133, 9777319, 9773210, 9774363, 9780561, 9773078, 9777393, 9779294, 9771187, 9777299, 9789517, 9772442, 9774187, 9781389, 9760091, 9781423, 9786351, 9776234, 9773282, 9790548, 9780195, 9781476, 9775776, 9785475, 9781086, 9787465,

In [12]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=1000, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9773700, 9781902, 9784591, 9778448, 9774352, 9773711, 9774864, 9776916, 9779860, 9776406, 9788705, 9774120, 9756075, 9782315, 9787441, 9770045, 9778500, 9778628, 9782726, 9538375, 9776071, 9787465, 9780348, 9771473, 9776855, 9776862, 9786718, 9786719, 9780193, 9738729, 9775985, 9771127, 9779577, 9775484}
[9778351, 9773356, 9771554, 9776497, 9776147, 9781878, 9769557, 9782869, 9771168, 9783657, 9774229, 9774430, 9780476, 9770551, 9786378, 9771330, 9779269, 9775785, 9780284, 9773744, 9772343, 9771253, 9780677, 9780384, 9789647, 9775207, 9781785, 9779650, 9776560, 9790987, 9775562, 9779737, 9784863, 9778915, 9776985, 9769624, 9780406, 9771919, 9775881, 9772227, 9772475, 9771916, 9789997, 9779045, 9781991, 9772862, 9771350, 9783057, 9784879, 9784575, 9779133, 9783655, 9777319, 9773210, 9780561, 9774363, 9773078, 9777299, 9789517, 9779294, 9771187, 9777393, 9772442, 9774187, 9781389, 9760091, 9781423, 9786351, 9776234, 9773282, 9790548, 9780195, 9781476, 9775776, 9785475, 9781086, 9787465,

In [13]:
from utils.evaluation import perform_model_evaluation

metrics = perform_model_evaluation(recommender, test_df, k=5)
metrics

{'precision@k': np.float64(0.0067671850884482975),
 'recall@k': np.float64(0.011043252720062312),
 'fpr@k': np.float64(0.002231791280228843)}
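
For reference, the three scores above can be defined per user as in the sketch below. It assumes binary relevance and that false-positive rate is measured against all catalogue articles the user did not read; `perform_model_evaluation` may use different denominators or averaging, so treat the function and its `catalogue` parameter as hypothetical.

```python
def precision_recall_fpr_at_k(recommended, relevant, catalogue, k):
    """Per-user precision@k, recall@k and false-positive rate@k.

    recommended: ranked list of article ids (assumed to be drawn from catalogue)
    relevant:    article ids the user actually read in the test set
    catalogue:   every article id that could have been recommended
    """
    top_k = set(recommended[:k])
    relevant = set(relevant)
    negatives = set(catalogue) - relevant

    true_positives = len(top_k & relevant)
    false_positives = len(top_k - relevant)

    precision = true_positives / len(top_k) if top_k else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    fpr = false_positives / len(negatives) if negatives else 0.0
    return {"precision@k": precision, "recall@k": recall, "fpr@k": fpr}

# Invented ids, for illustration only: 10 articles, 3 of them relevant.
example = precision_recall_fpr_at_k(
    recommended=[1, 4, 2, 5, 6],
    relevant={1, 2, 3},
    catalogue=range(1, 11),
    k=5,
)
```

With a large catalogue and a small `k`, the false-positive rate is small almost by construction, so precision and recall are usually the more informative of the three.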

### Carbon Footprint
This section creates an `emissions.csv` file in the `output` folder.
It uses codecarbon's `EmissionsTracker` to record the carbon footprint of the model's `fit` and `recommend` methods.
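
The implementation of `track_model_energy` is not shown in this notebook. As a hedged sketch, a helper like the hypothetical `track_emissions` below could wrap any call in a tracker. It assumes codecarbon's `EmissionsTracker` with its `output_dir` argument, plus the `allow_multiple_runs` option that codecarbon's own lock-file error message refers to (available only in versions that support it).

```python
import os

def track_emissions(fn, *args, tracker=None, **kwargs):
    """Run fn under an emissions tracker and return (result, emissions).

    tracker: any object with start() and stop(); defaults to a codecarbon
             EmissionsTracker writing emissions.csv to the "output" folder.
    """
    if tracker is None:
        # Imported lazily so the helper can be exercised without codecarbon installed.
        from codecarbon import EmissionsTracker
        os.makedirs("output", exist_ok=True)
        # Assumption: allow_multiple_runs avoids the lock-file error on concurrent runs.
        tracker = EmissionsTracker(output_dir="output", allow_multiple_runs=True)
    tracker.start()
    try:
        result = fn(*args, **kwargs)
    finally:
        # stop() is always called so the tracker does not stay active after a failure.
        emissions = tracker.stop()
    return result, emissions
```

Usage would look like `track_emissions(recommender.fit)` or `track_emissions(recommender.recommend_n_articles, user_id=test_user_id, n=5)`; with codecarbon, the second returned value is the estimated emissions reported by `stop()`.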

In [14]:
from utils.evaluation import track_model_energy

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "user_based", user_id=test_user_id, n=5)
footprint

[codecarbon ERROR @ 16:07:11] Error: Another instance of codecarbon is probably running as we find `C:\Users\chris\AppData\Local\Temp\.codecarbon.lock`. Turn off the other instance to be able to run this one or use `allow_multiple_runs` or delete the file. Exiting.



Carbon footprint of the recommender:


[codecarbon ERROR @ 17:00:48] Error: Another instance of codecarbon is probably running as we find `C:\Users\chris\AppData\Local\Temp\.codecarbon.lock`. Turn off the other instance to be able to run this one or use `allow_multiple_runs` or delete the file. Exiting.


{'fit': ({787192: [(2270736, np.float64(0.9995240233285215)),
    (1309274, np.float64(0.7341892117091785)),
    (2434131, np.float64(0.5666302768389282)),
    (1567686, np.float64(0.4889148083921879)),
    (1817209, np.float64(0.2548501952552784)),
    (1307206, np.float64(0.22850211739258253)),
    (1926368, np.float64(0.15616458883442186)),
    (1196555, np.float64(0.04308529138342487)),
    (501716, np.float64(0.039007180613341186)),
    (1766749, np.float64(0.03157404577353706))],
   73373: [(1358976, np.float64(0.7728046104460357)),
    (1486263, np.float64(0.7728046104460355)),
    (576307, np.float64(0.7728046104460355)),
    (864221, np.float64(0.7728046104460355)),
    (621197, np.float64(0.7292706210383284)),
    (827851, np.float64(0.72696684113746)),
    (1074758, np.float64(0.6638860224981276)),
    (1251347, np.float64(0.6298012321342792)),
    (2032861, np.float64(0.54645538057862)),
    (476654, np.float64(0.531676846953809))],
   533329: [(395122, np.float64(0.9935462