## Demonstration of the Item-Based Collaborative Recommender

This system applies collaborative filtering by analyzing how user-item interactions link items to one another.
It therefore focuses on the user-item relation.

It recommends articles similar to those a user has already engaged with, aiming to provide personalized suggestions. The model's performance is evaluated using the MAP@K and NDCG@K metrics.
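
For readers unfamiliar with the two metrics, the following is a minimal sketch of how AP@K and NDCG@K can be computed for a single user with binary relevance, and how they are averaged over users. The function names are hypothetical and the normalization choices (dividing AP by `min(len(relevant), k)`, binary gains for NDCG) are assumptions; the `evaluate_recommender` method used later in this notebook may define them differently.

```python
import math


def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user, assuming `recommended` has no duplicates."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit position
    # Assumed normalization: best achievable number of hits within K.
    return precision_sum / min(len(relevant), k)


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K for one user with binary relevance (gain 1 for a hit)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(metric, recommendations, ground_truth, k):
    """Average a per-user metric over users present in both dicts."""
    users = [u for u in ground_truth if u in recommendations]
    if not users:
        return 0.0
    return sum(metric(recommendations[u], ground_truth[u], k) for u in users) / len(users)
```

MAP@K is then `mean_over_users(average_precision_at_k, ...)`, and the reported NDCG@K is the same average taken over `ndcg_at_k`.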

In [30]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)

import polars as pl
import numpy as np

from parquet_data_reader import ParquetDataReader
from utils.data_preprocessing import DataProcesser
from models.collaborative.item_based_CF import ItemBasedCollaborativeRecommender

pl.Config.set_tbl_cols(-1)

polars.config.Config

## Data import & Preprocessing

In [31]:
dataProcesser = DataProcesser()
behaviors_df = dataProcesser.collaborative_filtering_preprocess()
train_df, test_df = dataProcesser.random_split(behaviors_df, test_ratio=0.2)
print(train_df.head())

shape: (5, 4)
┌────────────┬─────────┬────────────────┬────────────┐
│ article_id ┆ user_id ┆ total_readtime ┆ max_scroll │
│ ---        ┆ ---     ┆ ---            ┆ ---        │
│ i32        ┆ u32     ┆ f32            ┆ f32        │
╞════════════╪═════════╪════════════════╪════════════╡
│ 9775699    ┆ 1425490 ┆ 145.0          ┆ 100.0      │
│ 9771919    ┆ 1754050 ┆ 99.0           ┆ 100.0      │
│ 9789664    ┆ 519934  ┆ 6.0            ┆ 100.0      │
│ 9787586    ┆ 409445  ┆ 1836.0         ┆ 100.0      │
│ 9696204    ┆ 2371157 ┆ 171.0          ┆ 100.0      │
└────────────┴─────────┴────────────────┴────────────┘


## Model Fit

The first model uses read time and scroll percentage to weight each user-article interaction when computing similarities.
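
As an illustration of the idea, the sketch below combines read time and scroll percentage into one interaction score and compares two articles by the cosine similarity of their per-user score vectors. The linear combination and the helper names are assumptions made for this example; the internals of `ItemBasedCollaborativeRecommender.fit` are not shown in this notebook and may differ.

```python
import math
from collections import defaultdict


def build_item_vectors(rows, readtime_weight=1.0, scroll_weight=0.1):
    """Map article_id -> {user_id: interaction score}.

    `rows` is an iterable of (article_id, user_id, total_readtime, max_scroll).
    The linear scoring formula is an assumption for illustration only.
    """
    vectors = defaultdict(dict)
    for article_id, user_id, readtime, scroll in rows:
        vectors[article_id][user_id] = readtime_weight * readtime + scroll_weight * scroll
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared_users = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[u] * vec_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


rows = [
    (1, 10, 100.0, 100.0),  # article 1 read by user 10
    (2, 10, 50.0, 80.0),    # article 2 read by the same user
    (3, 11, 30.0, 100.0),   # article 3 read by a different user
]
vectors = build_item_vectors(rows)
similar = cosine_similarity(vectors[1], vectors[2])
dissimilar = cosine_similarity(vectors[1], vectors[3])
```

Articles that share engaged readers receive a positive similarity, while articles with no readers in common score zero.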

In [32]:
recommender = ItemBasedCollaborativeRecommender(train_df)
recommender.fit(readtime_weight=1.0, scroll_weight=0.1)

{9775699: [(9786213, np.float64(0.4910119457826563)),
  (9770452, np.float64(0.4902539369162391)),
  (9780384, np.float64(0.4697314012417555)),
  (9649538, np.float64(0.3682452378012825)),
  (9699867, np.float64(0.347272105418589)),
  (9773292, np.float64(0.19282748351752677)),
  (9771355, np.float64(0.18280354337270877)),
  (9785790, np.float64(0.177030518149079)),
  (9781756, np.float64(0.17703049534168458)),
  (9775909, np.float64(0.1763792429758776))],
 9771919: [(9788197, np.float64(0.11686603657672934)),
  (9774461, np.float64(0.0010152369966798247)),
  (9740324, np.float64(0.0004475680331558207)),
  (9773539, np.float64(0.00020998115676806872)),
  (9788095, np.float64(0.00019131703253894017)),
  (9775493, np.float64(0.00017765472034259044)),
  (9773673, np.float64(0.0001609660384312983)),
  (9785986, np.float64(9.429451857756455e-05)),
  (9778328, np.float64(4.865305132406572e-05)),
  (9773612, np.float64(4.592527657654166e-05))],
 9789664: [(9780271, np.float64(0.07726811648604

The second model is binary: it only considers which articles each user has read, ignoring read time and scroll percentage.
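
With binary interactions, cosine similarity between two articles reduces to the number of shared readers divided by the square root of the product of their reader counts. The sketch below shows that reduction; the function name is hypothetical, and it is an assumption that `binary_model=True` computes exactly this quantity.

```python
import math


def binary_item_similarity(readers_a, readers_b):
    """Cosine similarity of two articles given only their sets of readers."""
    readers_a, readers_b = set(readers_a), set(readers_b)
    if not readers_a or not readers_b:
        return 0.0
    shared = len(readers_a & readers_b)
    return shared / math.sqrt(len(readers_a) * len(readers_b))


# One shared reader between an article with 4 readers and one with 7 readers.
example = binary_item_similarity({1, 2, 3, 4}, {4, 5, 6, 7, 8, 9, 10})
```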

In [33]:
binary_recommender = ItemBasedCollaborativeRecommender(train_df, binary_model=True)
binary_recommender.fit()

{9775699: [(9651883, np.float64(0.18898223650461365)),
  (9649514, np.float64(0.18898223650461365)),
  (9649560, np.float64(0.18898223650461365)),
  (9659345, np.float64(0.1428571428571428)),
  (9775716, np.float64(0.14173667737846019)),
  (9712699, np.float64(0.13363062095621214)),
  (9699867, np.float64(0.13363062095621214)),
  (9674928, np.float64(0.13363062095621214)),
  (9775648, np.float64(0.13363062095621214)),
  (9696657, np.float64(0.10910894511799618))],
 9771919: [(9771916, np.float64(0.190997473078107)),
  (9769348, np.float64(0.1379858122713551)),
  (9772000, np.float64(0.13446321855011933)),
  (9771948, np.float64(0.12532010295690077)),
  (9771896, np.float64(0.12222222222222223)),
  (9771187, np.float64(0.11846977555181848)),
  (9779267, np.float64(0.11180339887498947)),
  (9769370, np.float64(0.10801234497346435)),
  (9771846, np.float64(0.10651074037450892)),
  (9774074, np.float64(0.09799578870122228))],
 9789664: [(9790414, np.float64(0.14564381625088385)),
  (912714

Of the original 15143 users, only 9194 can be accounted for with the current solution. This should be improved in the future.
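
One way to inspect such a gap is to check which users have at least one interaction in the training data. The helper below is a hypothetical sketch under that assumption; the notebook does not state how the figures above were obtained, so the actual cause of the missing users may be different.

```python
def split_users_by_coverage(train_user_ids, all_user_ids):
    """Return (covered, uncovered) sets of user ids.

    A user counts as covered here if they appear at least once in the
    training interactions; this definition is an assumption.
    """
    train_users = set(train_user_ids)
    all_users = set(all_user_ids)
    covered = all_users & train_users
    uncovered = all_users - train_users
    return covered, uncovered


covered, uncovered = split_users_by_coverage([1, 1, 2, 3], [1, 2, 3, 4, 5])
coverage_ratio = len(covered) / (len(covered) + len(uncovered))
```

Uncovered users could then be served by a fallback strategy, which is outside the scope of this demonstration.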

## Model presentation

### Article Recommendation

In [34]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9777846, 9777969, 9790293, 9775985, 9780193]
recommended for user  620796 :  [9784275, 9786217, 9788126, 9767490, 9785016]
recommended for user  1067393 :  [9775699, 9771919, 9789664, 9787586, 9696204]
recommended for user  1726258 :  [9786209, 9784758, 9771197, 9788362, 9440508]
recommended for user  17205 :  [9773488, 9780496, 9767955, 9766441, 9482380]


In [35]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", binary_recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9774864, 9672256, 9769367, 9695259, 9152828]
recommended for user  620796 :  [9765326, 9749469, 9766178, 9782495, 9782616]
recommended for user  1067393 :  [9771627, 9769471, 9097165, 9575236, 9728471]
recommended for user  1726258 :  [9775804, 9718262, 9632108, 9667501, 9474656]
recommended for user  17205 :  [9768638, 9482380, 9705425, 9766441, 9768860]


### Evaluation Scores

#### Without ability to recommend read articles

The complex model, recommending only articles the user has not yet read:

In [36]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.0008403361344537816),
 'NDCG@K': np.float64(0.001975172684155339)}

The binary recommender model, recommending only articles the user has not yet read:

In [37]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.0032258064516129037),
 'NDCG@K': np.float64(0.006316683617423812)}
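
The effect of `allow_read_articles=False` can be pictured as a filter applied to the ranked candidate list before truncating it to the requested length. The following is a minimal sketch of that idea with a hypothetical helper; how `recommend_n_articles` implements the flag internally is an assumption.

```python
def top_n_articles(ranked_articles, read_articles, n, allow_read_articles=False):
    """Return the first n ranked articles, optionally skipping read ones."""
    if allow_read_articles:
        return list(ranked_articles[:n])
    read = set(read_articles)
    # Filter first, then truncate, so the list still has n items when possible.
    return [article for article in ranked_articles if article not in read][:n]


ranked = [9775699, 9771919, 9789664, 9787586, 9696204]
already_read = {9771919, 9787586}
unread_only = top_n_articles(ranked, already_read, n=2)
with_read = top_n_articles(ranked, already_read, n=2, allow_read_articles=True)
```

Filtering before truncation matters: truncating first could return fewer than `n` articles even when enough unread candidates exist.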

#### With ability to recommend previously read articles

The complex model, recommending articles to the user even if they have read them before:

In [38]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.001075268817204301),
 'NDCG@K': np.float64(0.0009777207593438585)}

The binary recommender model, recommending articles to the user even if they have read them before:

In [39]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.004672897196261682),
 'NDCG@K': np.float64(0.01441380158202379)}

## Model Experimentation

In [40]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=100, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9773700, 9776916, 9769504, 9788705, 9774142, 9779777, 9771460, 9782726, 9776071, 9786566, 9787465, 9788362, 9782092, 9771473, 9780181, 9786351, 9780849, 9787510, 9673979}
[9777846, 9777969, 9790293, 9775985, 9780193, 9778413, 9769193, 9735085, 9778155, 9784506, 9657822, 9785986, 9772710, 9777200, 9785062, 9788692, 9775323, 9772297, 9783276, 9789494, 9734133, 9789745, 9776238, 9776566, 9710762, 9768062, 9781870, 9775990, 9779724, 9785009, 9774120, 8860119, 9777397, 9539706, 9722202, 9786529, 9432542, 9756546, 9769367, 9788048, 9583381, 9356137, 9556552, 9784607, 9700883, 9779996, 9757878, 9765753, 9781902, 9776152, 9780329, 9767852, 9385951, 9776694, 9787243, 9778973, 9778701, 9778021, 9775573, 9774864, 9769679, 9655559, 9776715, 9783751, 9781389, 9761784, 9767342, 9782315, 9339920, 9777228, 9773288, 9744403, 9706355, 9777320, 9782633, 9672915, 9765336, 9779370, 9785174, 9778661, 9772367, 9699394, 9670430, 9497898, 9683742, 9768599, 9780983, 9718262, 9674243, 9776551, 9776147, 9773335,

In [42]:
from utils.evaluation import perform_model_evaluation
metrics = perform_model_evaluation(binary_recommender, test_data=test_df, k=5)
metrics

{'precision@k': np.float64(0.003618090452261307),
 'recall@k': np.float64(0.010873761717264948),
 'fpr@k': np.float64(0.002207846354639879)}
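
For reference, the three metrics above can be defined per user as in the sketch below, which treats the top-k list as the positive predictions and every other article in the catalog as a negative prediction. The function name and the `catalog_size` parameter are hypothetical, and it is an assumption that `perform_model_evaluation` uses the same definitions.

```python
def classification_metrics_at_k(recommended, relevant, catalog_size, k):
    """Precision, recall and false positive rate at k for one user.

    Assumes `recommended` has no duplicates and that `catalog_size` counts
    every article that could have been recommended.
    """
    top_k = recommended[:k]
    relevant = set(relevant)
    true_positives = sum(1 for item in top_k if item in relevant)
    false_positives = len(top_k) - true_positives
    negatives = catalog_size - len(relevant)  # articles the user did not read
    return {
        "precision@k": true_positives / len(top_k) if top_k else 0.0,
        "recall@k": true_positives / len(relevant) if relevant else 0.0,
        "fpr@k": false_positives / negatives if negatives > 0 else 0.0,
    }


example_metrics = classification_metrics_at_k(
    recommended=[1, 2, 3, 4, 5], relevant={2, 9}, catalog_size=12, k=5
)
```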

### Carbon Footprint
This section creates an emissions CSV file in the `output` folder.
It uses `EmissionsTracker` from the `codecarbon` package to record the carbon footprint of the model's `fit` and `recommend` methods.
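
The general pattern is to start a tracker, run the code of interest, and stop the tracker to obtain the estimated emissions. The sketch below shows this pattern with two hypothetical helpers; the body of `track_model_energy` is not shown in this notebook, so this is an assumption about its approach rather than a copy of it. The `codecarbon` import is kept inside a function so the wrapper can also be exercised with any object exposing `start()` and `stop()`.

```python
def run_with_tracker(fn, tracker):
    """Run fn() between tracker.start() and tracker.stop().

    Returns (result of fn, value returned by tracker.stop()). For a
    codecarbon EmissionsTracker, stop() returns estimated kg of CO2-eq.
    """
    tracker.start()
    try:
        result = fn()
    finally:
        # Always stop the tracker, even if fn raises.
        emissions = tracker.stop()
    return result, emissions


def make_codecarbon_tracker(output_file, output_dir="output"):
    """Create a codecarbon tracker that writes to output_dir/output_file.

    The output directory is assumed to exist already.
    """
    from codecarbon import EmissionsTracker  # imported lazily on purpose

    return EmissionsTracker(output_dir=output_dir, output_file=output_file)
```

A call such as `run_with_tracker(lambda: recommender.fit(), make_codecarbon_tracker("item_based_fit_emission.csv"))` would then measure one `fit` run, assuming `codecarbon` is installed.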

In [None]:
from utils.evaluation import track_model_energy

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "item_based", user_id=test_user_id, n=5)
footprint

[codecarbon INFO @ 19:58:06] [setup] RAM Tracking...
[codecarbon INFO @ 19:58:06] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU




Carbon footprint of the recommender:


[codecarbon INFO @ 19:58:08] CPU Model on constant consumption mode: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 19:58:08] [setup] GPU Tracking...
[codecarbon INFO @ 19:58:08] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 19:58:08] >>> Tracker's metadata:
[codecarbon INFO @ 19:58:08]   Platform system: Windows-10-10.0.19045-SP0
[codecarbon INFO @ 19:58:08]   Python version: 3.11.1
[codecarbon INFO @ 19:58:08]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 19:58:08]   Available RAM : 7.338 GB
[codecarbon INFO @ 19:58:08]   CPU count: 12
[codecarbon INFO @ 19:58:08]   CPU model: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 19:58:08]   GPU count: 1
[codecarbon INFO @ 19:58:08]   GPU model: 1 x NVIDIA GeForce GTX 1650
[codecarbon INFO @ 19:58:09] Saving emissions data to file c:\Users\chris\Desktop\NTNU Ting\8. Semester\Anbefalingssystemer\Project\TDT4215\recommender_system\demostrations\output\item_based_fit_emission.csv
[codecarbon INFO @ 19:58:24] Energy c