## Demonstration of the Item-Based Collaborative Recommender

This system applies collaborative filtering by analyzing how user-item interactions link items to one another. It therefore focuses on the user-item relation.

For each user, it recommends articles similar to those the user has already engaged with, aiming to provide personalized suggestions. The model's performance is evaluated using the MAP@K and NDCG@K metrics.
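For readers unfamiliar with the two metrics, here is a minimal sketch of one common way to compute them with binary relevance. The function names are illustrative and are not taken from the project code; normalisation conventions for AP@K vary, and the project's `evaluate_recommender` may use a different one.

```python
import math


def average_precision_at_k(recommended, relevant, k):
    """AP@K: precision at each rank where a relevant item appears, averaged.

    Assumption: the sum is normalised by min(len(relevant), k).
    """
    relevant = set(relevant)
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K with binary gains: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(metric, recommendations, ground_truth, k):
    """Average a per-user metric over all users in ground_truth (gives MAP@K or mean NDCG@K)."""
    scores = [
        metric(recommendations.get(user, []), items, k)
        for user, items in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Both metrics reward placing relevant articles near the top of the list, so a hit at rank 1 counts for more than a hit at rank 10.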

In [1]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)

import polars as pl
import numpy as np

from parquet_data_reader import ParquetDataReader
from utils.data_preprocessing import DataProcesser
from models.item_based_CF import ItemBasedCollaborativeRecommender

pl.Config.set_tbl_cols(-1)

polars.config.Config

## Data import & Preprocessing

In [2]:
dataProcesser = DataProcesser()
behaviors_df = dataProcesser.collaborative_filtering_preprocess()
train_df, test_df = dataProcesser.random_split(behaviors_df, test_ratio=0.2)
print(train_df.head())

shape: (5, 4)
┌────────────┬─────────┬────────────────┬────────────┐
│ article_id ┆ user_id ┆ total_readtime ┆ max_scroll │
│ ---        ┆ ---     ┆ ---            ┆ ---        │
│ i32        ┆ u32     ┆ f32            ┆ f32        │
╞════════════╪═════════╪════════════════╪════════════╡
│ 9770082    ┆ 2367662 ┆ 161.0          ┆ 100.0      │
│ 9772517    ┆ 2077871 ┆ 363.0          ┆ 100.0      │
│ 9776238    ┆ 2187849 ┆ 26.0           ┆ 100.0      │
│ 9771330    ┆ 1761009 ┆ 96.0           ┆ 100.0      │
│ 9772088    ┆ 151570  ┆ 45.0           ┆ 100.0      │
└────────────┴─────────┴────────────────┴────────────┘


## Model Fit

This first model uses read time and read percentage to weight the user interactions it compares.
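The idea of comparing weighted interactions can be sketched as follows. This is not the code of `ItemBasedCollaborativeRecommender`; it assumes cosine similarity between per-article vectors of user weights, and `interaction_weight` is a hypothetical way of combining read time and scroll percentage into a single score.

```python
import math
from collections import defaultdict


def interaction_weight(total_readtime, max_scroll):
    """Hypothetical implicit score: dampened read time scaled by scroll fraction."""
    return math.log1p(total_readtime) * (max_scroll / 100.0)


def item_cosine_similarities(interactions, top_n=10):
    """Return {article_id: [(other_article_id, similarity), ...]}.

    interactions: iterable of (user_id, article_id, weight) triples.
    """
    # One sparse vector per article: user_id -> weight.
    item_vectors = defaultdict(dict)
    for user_id, article_id, weight in interactions:
        item_vectors[article_id][user_id] = weight

    norms = {
        article: math.sqrt(sum(w * w for w in vector.values()))
        for article, vector in item_vectors.items()
    }

    similarities = {}
    for a, vec_a in item_vectors.items():
        scored = []
        for b, vec_b in item_vectors.items():
            if a == b:
                continue
            # Only users who interacted with both articles contribute.
            dot = sum(w * vec_b[u] for u, w in vec_a.items() if u in vec_b)
            if dot > 0 and norms[a] > 0 and norms[b] > 0:
                scored.append((b, dot / (norms[a] * norms[b])))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        similarities[a] = scored[:top_n]
    return similarities
```

The returned dictionary has the same general shape as the `fit()` output below: each article maps to a ranked list of `(article_id, score)` pairs. The pairwise loop is quadratic in the number of articles, so a real implementation would more likely use sparse matrix operations.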

In [3]:
recommender = ItemBasedCollaborativeRecommender(train_df)
recommender.fit()

{9770082: [(9771151, np.float64(0.003365974955387152)),
  (9785049, np.float64(0.0002885630454896315)),
  (9781621, np.float64(4.4684089752444756e-05)),
  (9767483, np.float64(8.027310498093243e-06)),
  (9782126, np.float64(3.975478888862405e-06)),
  (8315213, np.float64(1.2728615342849636e-06)),
  (9783019, np.float64(4.595927526329646e-07)),
  (9778736, np.float64(1.2391621673213393e-07)),
  (9770194, np.float64(1.2354973932815483e-07)),
  (9769684, np.float64(1.173139807519874e-07))],
 9772517: [(9782290, np.float64(0.9999830956386629)),
  (9784064, np.float64(0.999746449002802)),
  (9768850, np.float64(0.9629640196432988)),
  (9767484, np.float64(0.5888637526438537)),
  (9778971, np.float64(0.26802386286082247)),
  (9777034, np.float64(0.22523608044874976)),
  (9777308, np.float64(0.1192746895591934)),
  (9791165, np.float64(0.07989033285812619)),
  (9769370, np.float64(0.05958371582201738)),
  (9778668, np.float64(0.024674736740216918))],
 9776238: [(9782929, np.float64(0.99999971

This second, binary model only considers which articles each user has read when comparing interactions.
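With binary interactions, each article reduces to the set of users who read it. A minimal sketch, assuming cosine similarity is the measure (the notebook does not show what `binary_model=True` actually computes):

```python
import math
from collections import defaultdict


def readers_by_article(pairs):
    """Group (user_id, article_id) pairs into {article_id: set of user_ids}."""
    readers = defaultdict(set)
    for user_id, article_id in pairs:
        readers[article_id].add(user_id)
    return dict(readers)


def binary_cosine(readers_a, readers_b):
    """Cosine similarity of two 0/1 vectors: shared readers / sqrt(|A| * |B|)."""
    if not readers_a or not readers_b:
        return 0.0
    shared = len(readers_a & readers_b)
    return shared / math.sqrt(len(readers_a) * len(readers_b))
```

Under this measure, two articles that each have three readers and share two of them score 2/3, regardless of how long anyone spent reading.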

In [4]:
binary_recommender = ItemBasedCollaborativeRecommender(train_df, binary_model=True)
binary_recommender.fit()

{9770082: [(9771168, np.float64(0.10537131981999948)),
  (9769424, np.float64(0.10092277355803603)),
  (9771686, np.float64(0.10001020564374752)),
  (9774972, np.float64(0.09994504533344628)),
  (9771576, np.float64(0.0997096032035154)),
  (9644742, np.float64(0.09690031662230181)),
  (9766441, np.float64(0.09690031662230181)),
  (9674928, np.float64(0.09690031662230181)),
  (9771572, np.float64(0.09630764857562957)),
  (9771916, np.float64(0.09524781489880718))],
 9772517: [(9772297, np.float64(0.15931737313308103)),
  (9772380, np.float64(0.14562017853000253)),
  (9772029, np.float64(0.1346983657012849)),
  (9644742, np.float64(0.13074409009212262)),
  (9769581, np.float64(0.1282051282051282)),
  (9767697, np.float64(0.12766073029704128)),
  (9788126, np.float64(0.11694106924093717)),
  (9772389, np.float64(0.11020775375559677)),
  (9780815, np.float64(0.10785837148823896)),
  (9774764, np.float64(0.10747098003652678))],
 9776238: [(9575236, np.float64(0.13245323570650436)),
  (97443

Of the original 15143 users, only 9194 can be accounted for with the current solution. This should be changed in the future.

## Model presentation

### Article Recommendation

In [5]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("reccomended for user ", user, ": ", recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

reccomended for user  630220 :  [9776917, 9787261, 9782180, 4265340, 9771355]
reccomended for user  620796 :  [9749637, 9306867, 9500467, 9678305, 9772433]
reccomended for user  1067393 :  [9770082, 9772517, 9776238, 9771330, 9772088]
reccomended for user  1726258 :  [9777324, 9778842, 9779520, 9500467, 9306867]
reccomended for user  17205 :  [9766441, 9773296, 9640315, 9627627, 9776688]


In [6]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("reccomended for user ", user, ": ", binary_recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

reccomended for user  630220 :  [9774864, 9672256, 9673979, 9738729, 9769367]
reccomended for user  620796 :  [9765326, 9766178, 9749469, 9749901, 9306867]
reccomended for user  1067393 :  [9787659, 9728471, 9655661, 9768129, 9788400]
reccomended for user  1726258 :  [9306867, 9718808, 9341193, 9749637, 9722269]
reccomended for user  17205 :  [9640315, 9627627, 9771910, 9737199, 9737266]


### Evaluation Scores

#### Without ability to recommend read articles

The complex model, recommending only articles the user has not yet read:
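As a rough illustration of what excluding read articles means, the sketch below filters a scored candidate list against a user's reading history. The function and parameter names are hypothetical; only the `allow_read_articles` flag mirrors the notebook, and the model's own filtering logic is not shown here.

```python
def pick_top_n(scored_candidates, read_articles, n, allow_read_articles=False):
    """Return the n best article ids, optionally skipping ones already read.

    scored_candidates: list of (article_id, score) pairs in any order.
    read_articles: set of article ids the user has already read.
    """
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    picks = []
    for article_id, _score in ranked:
        if not allow_read_articles and article_id in read_articles:
            continue  # skip articles from the user's history
        picks.append(article_id)
        if len(picks) == n:
            break
    return picks
```

Filtering happens after ranking, so the list can come back shorter than `n` when too few unread candidates exist.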

In [7]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.001851851851851852),
 'NDCG@K': np.float64(0.002429466707645081)}

The binary recommender model, recommending only articles the user has not yet read:

In [8]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.005357142857142857),
 'NDCG@K': np.float64(0.01590832239857842)}

#### With ability to recommend previously read articles

The complex model, recommending articles to the user even if they have read them before:

In [9]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.002), 'NDCG@K': np.float64(0.003757786548120076)}

The binary recommender model, recommending articles to the user even if they have read them before:

In [10]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.001851851851851852),
 'NDCG@K': np.float64(0.010464135956765748)}

## Model Experimentation

In [11]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=100, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9788705, 9789473, 9772355, 9778500, 9786566, 9774120, 9778413, 9781902, 9778351, 9774352, 9783824, 9779860, 9783509, 9776406, 9778939, 9776862, 9781887}
[9776917, 9787261, 9782180, 4265340, 9771355, 9772710, 9787586, 9775985, 9759717, 9759345, 9776691, 9783655, 9772221, 9756546, 9769367, 9790019, 9774864, 9767639, 9776694, 9767852, 9771568, 9767233, 9764444, 9778845, 9789404, 8534547, 9766803, 9239697, 9494299, 9777200, 8860119, 9777397, 9789427, 9785475, 9782915, 9789502, 9777464, 9773873, 9783585, 9714168, 9625415, 9783276, 9789494, 9777492, 9765759, 9775568, 9578459, 9514605, 9790755, 9773316, 9539706, 9722202, 9080070, 9774079, 9789745, 9778021, 9776566, 9783405, 9710762, 9788524, 9768062, 9775990, 9738729, 9569934, 9766635, 9673979, 9787230, 9654458, 9779242, 9782361, 9770450, 9655559, 9775573, 9769679, 9686860, 9769917, 9786860, 9672256, 9774287, 9440043, 9787243, 9538375, 9778745, 9732481, 9780096, 9766225, 9778682, 9780815, 9781991, 9772256, 8315213, 9781389, 9778219, 9782315,

In [None]:
from utils.evaluation import perform_model_evaluation
metrics = perform_model_evaluation(binary_recommender, test_data=test_df, k=5)
metrics

{'precision@k': np.float64(0.0032693674484719263),
 'recall@k': np.float64(0.009326988779912133),
 'fpr@k': np.float64(0.0022216772175403677)}
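For reference, the three metrics above can be defined per user as in the sketch below. This shows the standard definitions only; how `perform_model_evaluation` computes and aggregates them is not shown in this notebook, and the `catalog` argument (the full set of candidate articles) is an assumption needed to define the false positive rate.

```python
def precision_recall_fpr_at_k(recommended, relevant, catalog, k):
    """Per-user precision@k, recall@k and false positive rate@k.

    recommended: ranked list of unique article ids.
    relevant: article ids the user actually read in the test data.
    catalog: all article ids that could have been recommended.
    """
    top_k = set(recommended[:k])
    relevant = set(relevant)
    hits = len(top_k & relevant)
    false_positives = len(top_k) - hits
    negatives = len(set(catalog) - relevant)  # articles the user did not read

    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    fpr = false_positives / negatives if negatives > 0 else 0.0
    return {"precision@k": precision, "recall@k": recall, "fpr@k": fpr}
```

Because the catalog is large compared with `k`, the false positive rate stays small even when most recommendations miss.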

### Carbon Footprint
This section creates an emissions CSV file in the `output` folder.
It uses the `EmissionsTracker` from `codecarbon` to record the carbon footprint of the model's `fit` and `recommend` methods.
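The general pattern for measuring a single call with `codecarbon` looks roughly like the sketch below. The helper `track_emissions` and its `tracker_factory` parameter are hypothetical and are not the project's `track_model_energy`; only `EmissionsTracker`, its `output_dir` and `output_file` arguments, and its `start()` and `stop()` methods come from the library.

```python
def track_emissions(fn, *args, tracker_factory=None, **kwargs):
    """Run fn(*args, **kwargs) under an emissions tracker.

    Returns (result, emissions), where emissions is whatever tracker.stop()
    returns (kg CO2-equivalent for codecarbon's EmissionsTracker).
    """
    if tracker_factory is None:
        # Imported lazily so the helper can be defined without codecarbon installed.
        from codecarbon import EmissionsTracker
        tracker_factory = EmissionsTracker

    tracker = tracker_factory()
    tracker.start()
    result = None
    try:
        result = fn(*args, **kwargs)
    finally:
        # Always stop the tracker, even if fn raises.
        emissions = tracker.stop()
    return result, emissions


# Example usage (assumes codecarbon is installed and the folder exists):
# from codecarbon import EmissionsTracker
# factory = lambda: EmissionsTracker(output_dir="output", output_file="emissions.csv")
# fitted, kg_co2 = track_emissions(recommender.fit, tracker_factory=factory)
```

Passing the tracker in as a factory keeps the measurement logic separate from the library, which also makes the helper easy to exercise with a stand-in tracker.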

In [14]:
from utils.evaluation import track_model_energy

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "item_based", user_id=test_user_id, n=5)
footprint

[codecarbon INFO @ 14:25:05] [setup] RAM Tracking...
[codecarbon INFO @ 14:25:05] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU




Carbon footprint of the recommender:


[codecarbon INFO @ 14:25:07] CPU Model on constant consumption mode: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:25:07] [setup] GPU Tracking...
[codecarbon INFO @ 14:25:08] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 14:25:08] >>> Tracker's metadata:
[codecarbon INFO @ 14:25:08]   Platform system: Windows-10-10.0.19045-SP0
[codecarbon INFO @ 14:25:08]   Python version: 3.11.1
[codecarbon INFO @ 14:25:08]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 14:25:08]   Available RAM : 7.338 GB
[codecarbon INFO @ 14:25:08]   CPU count: 12
[codecarbon INFO @ 14:25:08]   CPU model: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:25:08]   GPU count: 1
[codecarbon INFO @ 14:25:08]   GPU model: 1 x NVIDIA GeForce GTX 1650
[codecarbon INFO @ 14:25:09] Saving emissions data to file c:\Users\chris\Desktop\NTNU Ting\8. Semester\Anbefalingssystemer\Project\TDT4215\recommender_system\demostrations\output\item_based_fit_emission.csv
[codecarbon INFO @ 14:25:24] Energy c

{'fit': ({9770082: [(9771151, np.float64(0.003365974955387152)),
    (9785049, np.float64(0.0002885630454896315)),
    (9781621, np.float64(4.4684089752444756e-05)),
    (9767483, np.float64(8.027310498093243e-06)),
    (9782126, np.float64(3.975478888862405e-06)),
    (8315213, np.float64(1.2728615342849636e-06)),
    (9783019, np.float64(4.595927526329646e-07)),
    (9778736, np.float64(1.2391621673213393e-07)),
    (9770194, np.float64(1.2354973932815483e-07)),
    (9769684, np.float64(1.173139807519874e-07))],
   9772517: [(9782290, np.float64(0.9999830956386629)),
    (9784064, np.float64(0.999746449002802)),
    (9768850, np.float64(0.9629640196432988)),
    (9767484, np.float64(0.5888637526438537)),
    (9778971, np.float64(0.26802386286082247)),
    (9777034, np.float64(0.22523608044874976)),
    (9777308, np.float64(0.1192746895591934)),
    (9791165, np.float64(0.07989033285812619)),
    (9769370, np.float64(0.05958371582201738)),
    (9778668, np.float64(0.024674736740216918