## Demonstration of the User-Based Collaborative Recommender

This system uses collaborative filtering: it analyzes user interactions, such as scroll depth and read time, to identify users with similar behavior. It therefore focuses on the user-item relation.

It recommends articles that these similar users have engaged with, aiming to provide personalized suggestions. The model's performance is evaluated using the MAP@K and NDCG@K metrics.
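
As a rough illustration of the two metrics, the sketch below computes AP@K and NDCG@K for a single user with binary relevance and averages AP@K over users to get MAP@K. It follows one common definition (AP normalised by `min(len(relevant), k)`); the function names are hypothetical and the notebook's own `evaluate_recommender` may define the metrics differently.

```python
import math


def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user; assumes `recommended` contains no duplicates."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Normalise by the best achievable number of hits within the top k.
    return precision_sum / min(len(relevant), k)


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K for one user with binary relevance (gain 1 for a hit, else 0)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def map_at_k(recommendations, ground_truth, k):
    """MAP@K: mean of AP@K over the users present in `ground_truth`."""
    if not ground_truth:
        return 0.0
    scores = [
        average_precision_at_k(recommendations.get(user, []), items, k)
        for user, items in ground_truth.items()
    ]
    return sum(scores) / len(scores)


# Toy data, not taken from the dataset.
toy_recommended = ["a", "b", "c", "d"]
toy_relevant = {"a", "c"}
toy_ap = average_precision_at_k(toy_recommended, toy_relevant, k=4)
toy_ndcg = ndcg_at_k(toy_recommended, toy_relevant, k=4)
```

Both metrics reward hits near the top of the list, so a model that finds relevant articles only deep in a top-100 list will score low on both.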



In [None]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)

import polars as pl
import numpy as np

from parquet_data_reader import ParquetDataReader
from utils.data_preprocessing import DataProcesser
from models.collaborative.user_based_CF import UserBasedCollaborativeRecommender

pl.Config.set_tbl_cols(-1)

polars.config.Config

## Data Import and EDA

In [2]:
dataProcesser = DataProcesser()
behaviors_df = dataProcesser.collaborative_filtering_preprocess()
train_df, test_df = dataProcesser.random_split(behaviors_df, test_ratio=0.2)
print(train_df.head())

shape: (5, 4)
┌────────────┬─────────┬────────────────┬────────────┐
│ article_id ┆ user_id ┆ total_readtime ┆ max_scroll │
│ ---        ┆ ---     ┆ ---            ┆ ---        │
│ i32        ┆ u32     ┆ f32            ┆ f32        │
╞════════════╪═════════╪════════════════╪════════════╡
│ 9781998    ┆ 1975245 ┆ 50.0           ┆ 100.0      │
│ 9778318    ┆ 1720950 ┆ 62.0           ┆ 100.0      │
│ 9771168    ┆ 91186   ┆ 37407.0        ┆ 100.0      │
│ 9777199    ┆ 1757161 ┆ 54.0           ┆ 100.0      │
│ 9778769    ┆ 2121561 ┆ 116.0          ┆ null       │
└────────────┴─────────┴────────────────┴────────────┘


## Model Fit

This first model compares users by their read time and read percentage (`total_readtime` and `max_scroll`) for each article.
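
One way such a comparison could work is cosine similarity between per-user interaction vectors. The sketch below is only an assumption about the approach: `interaction_score` is a hypothetical weighting (including its fallback for a missing scroll value), and the actual `UserBasedCollaborativeRecommender` may combine the two signals differently.

```python
import math


def interaction_score(total_readtime, max_scroll):
    """Hypothetical weighting, not taken from the notebook.

    log1p dampens extreme read times; the scroll percentage scales the result.
    A missing scroll value falls back to an arbitrary neutral 0.5.
    """
    scroll = 0.5 if max_scroll is None else max_scroll / 100.0
    return math.log1p(total_readtime) * scroll


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {article_id: score}."""
    shared = u.keys() & v.keys()
    dot = sum(u[a] * v[a] for a in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy users with made-up article ids and interactions.
user_a = {1: interaction_score(50.0, 100.0), 2: interaction_score(60.0, 80.0)}
user_b = {1: interaction_score(45.0, 100.0), 3: interaction_score(30.0, None)}
user_c = {4: interaction_score(10.0, 50.0)}

sim_ab = cosine_similarity(user_a, user_b)  # share article 1
sim_ac = cosine_similarity(user_a, user_c)  # no shared articles
```

Users with no article in common get a similarity of 0, so they never become neighbours of each other.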

In [3]:
recommender = UserBasedCollaborativeRecommender(train_df)
recommender.fit()

{1975245: [(2323519, np.float64(0.9806188678907397)),
  (1311412, np.float64(0.9806188678907397)),
  (290581, np.float64(0.9806188678907397)),
  (2025427, np.float64(0.9741663842919888)),
  (1785274, np.float64(0.9395433607032765)),
  (1567415, np.float64(0.9212210999418743)),
  (1336910, np.float64(0.7764514493528887)),
  (1123281, np.float64(0.7566356879068933)),
  (2244766, np.float64(0.44523665397489554)),
  (1873098, np.float64(0.42394136021199813))],
 1720950: [(1127322, np.float64(0.9356325025246671)),
  (1049088, np.float64(0.9356325025246671)),
  (1079131, np.float64(0.9356325025246345)),
  (579626, np.float64(0.9356002343069257)),
  (1101117, np.float64(0.9353385161504759)),
  (1671916, np.float64(0.9247013446977203)),
  (1580025, np.float64(0.7212069955858702)),
  (733262, np.float64(0.7179473998628796)),
  (1334762, np.float64(0.5856668241463715)),
  (691640, np.float64(0.4832551243516786))],
 91186: [(336676, np.float64(0.9854474476187824)),
  (1918780, np.float64(0.985446

This second model is binary: it compares users only by which articles they have read, ignoring read time and scroll.
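
With binary interactions, cosine similarity reduces to the number of shared articles divided by the geometric mean of the two users' article counts. The sketch below assumes that form; the notebook does not state which similarity the binary model uses. Values such as 0.3015 in the output of the next cell equal 1/sqrt(11), which is consistent with this formula but does not confirm it.

```python
import math


def binary_cosine(articles_a, articles_b):
    """Cosine similarity between two users represented as sets of read articles."""
    a, b = set(articles_a), set(articles_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Toy example: one user read a single article, the other read eleven,
# and they share exactly one.
sim = binary_cosine([1], range(11))
```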

In [4]:
binary_recommender = UserBasedCollaborativeRecommender(train_df, binary_model=True)
binary_recommender.fit()

{1975245: [(1205878, np.float64(0.2038588765750502)),
  (2376979, np.float64(0.20225995873897262)),
  (242138, np.float64(0.19069251784911845)),
  (52029, np.float64(0.19069251784911845)),
  (314421, np.float64(0.19069251784911845)),
  (37352, np.float64(0.19069251784911845)),
  (1958444, np.float64(0.19069251784911845)),
  (1471138, np.float64(0.19069251784911845)),
  (1483380, np.float64(0.19069251784911845)),
  (250718, np.float64(0.19069251784911845))],
 1720950: [(1334762, np.float64(0.3481553119113957)),
  (691640, np.float64(0.3481553119113957)),
  (1824385, np.float64(0.30151134457776363)),
  (1233960, np.float64(0.30151134457776363)),
  (2494588, np.float64(0.30151134457776363)),
  (1994147, np.float64(0.30151134457776363)),
  (1873528, np.float64(0.30151134457776363)),
  (2078289, np.float64(0.30151134457776363)),
  (1805779, np.float64(0.30151134457776363)),
  (290532, np.float64(0.30151134457776363))],
 91186: [(2330871, np.float64(0.20100756305184242)),
  (1989532, np.floa

Of the original 15143 users, only 9194 (about 61%) can be accounted for with the current solution. This should be improved in the future.
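
A coverage figure like this can be checked with a small helper. The sketch assumes that "accounted for" means the user has an entry in the mapping returned by `fit()`, which the notebook does not state explicitly; `user_coverage` is a hypothetical name.

```python
def user_coverage(all_user_ids, similar_users):
    """Return (covered, total, ratio) for users that have an entry in `similar_users`."""
    all_users = set(all_user_ids)
    covered = all_users & set(similar_users)
    ratio = len(covered) / len(all_users) if all_users else 0.0
    return len(covered), len(all_users), ratio


# Toy data in the same shape as the fit() output: user -> [(neighbour, similarity)].
toy_similar = {10: [(20, 0.9)], 20: [(10, 0.9)]}
covered, total, ratio = user_coverage([10, 20, 30, 40], toy_similar)
```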

## Model Presentation

### Article Recommendation

In [5]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9778351, 9776497, 9775484, 9776147, 9778945]
recommended for user  620796 :  [9770450, 9770288, 9783379, 9778732, 9772193]
recommended for user  1067393 :  [9771113, 9774789, 9785986, 9787230, 9785424]
recommended for user  1726258 :  [9774142, 9771224, 9789997, 9759891, 9790756]
recommended for user  17205 :  [9780325, 9774764, 9781991, 9778939, 9776917]


In [6]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("recommended for user ", user, ": ", binary_recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

recommended for user  630220 :  [9783334, 9778628, 9776497, 9783137, 9787465]
recommended for user  620796 :  [9769432, 9770082, 9771168, 9779777, 9771686]
recommended for user  1067393 :  [9787098, 9771113, 9771758, 9782517, 9098807]
recommended for user  1726258 :  [9771224, 9771242, 9786618, 9774392, 9775717]
recommended for user  17205 :  [9775562, 9779269, 9777804, 9771333, 9779674]
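
For intuition, recommending from similar users can be sketched as a similarity-weighted vote over the neighbours' articles. This is a minimal sketch under assumptions, not the notebook's `recommend_n_articles`: the function name, the summed-similarity scoring and the tie-breaking rule are all hypothetical. Only the shape of `similar_users` (user id mapped to a list of `(neighbour_id, similarity)` pairs) mirrors the `fit()` output shown earlier.

```python
from collections import defaultdict


def recommend_from_neighbours(user_id, similar_users, user_articles, n,
                              allow_read_articles=False):
    """Rank articles by the summed similarity of the neighbours who read them."""
    scores = defaultdict(float)
    for neighbour_id, similarity in similar_users.get(user_id, []):
        for article_id in user_articles.get(neighbour_id, ()):
            scores[article_id] += similarity

    if not allow_read_articles:
        # Drop everything the target user has already read.
        already_read = user_articles.get(user_id, set())
        scores = {a: s for a, s in scores.items() if a not in already_read}

    # Highest score first; ties broken by article id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article_id for article_id, _ in ranked[:n]]


# Toy data, not taken from the dataset.
toy_similar = {1: [(2, 0.9), (3, 0.4)]}
toy_articles = {1: {"a"}, 2: {"a", "b"}, 3: {"b", "c"}}

unread_only = recommend_from_neighbours(1, toy_similar, toy_articles, n=5)
with_read = recommend_from_neighbours(1, toy_similar, toy_articles, n=5,
                                      allow_read_articles=True)
```

A user without an entry in `similar_users` gets an empty list here; a real system would need a fallback for such users.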


### Evaluation Scores

#### Without the Ability to Recommend Read Articles

The complex model, recommending only articles the user has not yet read:

In [7]:
results = recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.002156862745098039),
 'NDCG@K': np.float64(0.013656959065451457)}

The binary recommender model, recommending only articles the user has not yet read:

In [8]:
results = binary_recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.0008181818181818183),
 'NDCG@K': np.float64(0.005322055887025115)}

#### With the Ability to Recommend Previously Read Articles

The complex model, recommending articles to the user even if they have read them before:

In [9]:
results = recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.001836734693877551),
 'NDCG@K': np.float64(0.019060568710418134)}

The binary recommender model, recommending articles to the user even if they have read them before:

In [10]:
results = binary_recommender.evaluate_recommender(test_df, k=100, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.0004347826086956522),
 'NDCG@K': np.float64(0.0021494533036356058)}

## Model Experimentation

In [11]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=1000, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9787524, 9773700, 9771916, 9776406, 9789473, 9774120, 9782315, 9771948, 9787441, 9776691, 9773248, 9774532, 9778500, 9791049, 9780815, 9780181, 9776862, 9786719, 9778657, 9788661, 9673979, 9781887}
[9778351, 9776497, 9775484, 9776147, 9778945, 9781878, 9769557, 9772300, 9779867, 9782869, 9779289, 9771168, 9779411, 9774229, 9783993, 9778035, 9782423, 9774430, 9780476, 9773846, 9774120, 9786378, 9775785, 9779269, 9771330, 9780284, 9779408, 9777621, 9772343, 9771253, 9780677, 9784425, 9780384, 9789647, 9781785, 9779650, 9776560, 9775562, 9779737, 9784863, 9779724, 9769624, 9780406, 9775881, 9781913, 9772227, 9780960, 9772475, 9771916, 9789997, 9771796, 9779045, 9772862, 9771350, 9784696, 9783732, 9779133, 9783655, 9777319, 9773137, 9773210, 9780561, 9773543, 9784852, 9784710, 9771187, 9777299, 9779294, 9772442, 9771919, 9780496, 9771170, 9781389, 9781598, 9760091, 9781423]
Yes
Yes


In [12]:
test_user_id = 630220

predictions = recommender.recommend_n_articles(user_id=test_user_id, n=1000, allow_read_articles=True)
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

for prediction in predictions:
    if prediction in results:
        print("Yes")

{9787524, 9773700, 9771916, 9776406, 9789473, 9774120, 9782315, 9771948, 9787441, 9776691, 9773248, 9774532, 9778500, 9791049, 9780815, 9780181, 9776862, 9786719, 9778657, 9788661, 9673979, 9781887}
[9778351, 9776497, 9775484, 9776147, 9778945, 9781878, 9769557, 9772300, 9779867, 9782869, 9779289, 9779411, 9771168, 9774229, 9782423, 9783993, 9778035, 9774430, 9780476, 9773846, 9774120, 9786378, 9771330, 9775785, 9779269, 9780284, 9779408, 9777621, 9772343, 9771253, 9780677, 9784425, 9780384, 9789647, 9781785, 9779650, 9776560, 9775562, 9779737, 9784863, 9769624, 9779724, 9780406, 9775881, 9781913, 9772227, 9780960, 9772475, 9771916, 9789997, 9771796, 9779045, 9772862, 9771350, 9784696, 9783655, 9783732, 9779133, 9777319, 9773137, 9780561, 9773210, 9773543, 9784852, 9784710, 9779294, 9771187, 9777299, 9772442, 9771919, 9780496, 9771170, 9781389, 9781598, 9760091, 9781423]
Yes
Yes


In [None]:
from utils.evaluation import perform_model_evaluation

metrics = perform_model_evaluation(recommender, test_df, k=5)
metrics

{'precision@k': np.float64(0.004309360730593608),
 'recall@k': np.float64(0.007180823964876379),
 'fpr@k': np.float64(0.0019355669596602982)}
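
For reference, the three metrics above can be computed for one user from the confusion counts of a top-k list. The sketch below uses the standard definitions and assumes a known catalog size, unique recommendations, and relevant items that all belong to the catalog; how `perform_model_evaluation` defines and averages them is not shown in this notebook.

```python
def precision_recall_fpr_at_k(recommended, relevant, catalog_size, k):
    """Per-user precision@k, recall@k and false positive rate@k."""
    top_k = list(recommended[:k])
    relevant = set(relevant)

    tp = sum(1 for item in top_k if item in relevant)  # recommended and relevant
    fp = len(top_k) - tp                               # recommended, not relevant
    fn = len(relevant) - tp                            # relevant, not recommended
    tn = catalog_size - tp - fp - fn                   # everything else

    precision = tp / len(top_k) if top_k else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision@k": precision, "recall@k": recall, "fpr@k": fpr}


# Toy example: 10 articles in the catalog, 2 relevant, 1 of them recommended.
toy_metrics = precision_recall_fpr_at_k(
    recommended=[1, 2, 3, 4, 5], relevant={2, 9}, catalog_size=10, k=5
)
```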

### Carbon Footprint
This section writes emissions data as CSV to the `output` folder.
It uses the `EmissionsTracker` from `codecarbon` to record the carbon footprint of the model's `fit` and `recommend` methods.
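
A helper in the spirit of `track_model_energy` might wrap a call between the tracker's `start()` and `stop()`. The sketch below is an assumption about how such a helper could look, not the project's implementation: `run_tracked` and `make_tracker` are hypothetical names, and the output file naming is only illustrative.

```python
import os

try:
    from codecarbon import EmissionsTracker  # third-party, optional for this sketch
except ImportError:
    EmissionsTracker = None


def run_tracked(tracker, fn, *args, **kwargs):
    """Run `fn` between tracker.start() and tracker.stop().

    Returns (result, emissions), where emissions is whatever stop() returns
    (for codecarbon, the estimated emissions in kg CO2-equivalent).
    """
    tracker.start()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Always stop the tracker, even if `fn` raises.
        emissions = tracker.stop()
    return result, emissions


def make_tracker(project_name, output_dir="output"):
    """Create a codecarbon tracker that writes one CSV per project name."""
    if EmissionsTracker is None:
        raise RuntimeError("codecarbon is not installed")
    os.makedirs(output_dir, exist_ok=True)
    return EmissionsTracker(
        project_name=project_name,
        output_dir=output_dir,
        output_file=f"{project_name}_emission.csv",
    )


# Possible usage with the notebook's model (not executed here):
# _, fit_kg = run_tracked(make_tracker("user_based_fit"), recommender.fit)
```

Passing the tracker in as an argument keeps the timing logic independent of `codecarbon`, so it can be exercised with a stand-in object in tests.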

In [14]:
from utils.evaluation import track_model_energy

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "user_based", user_id=test_user_id, n=5)
footprint

[codecarbon INFO @ 14:32:14] [setup] RAM Tracking...
[codecarbon INFO @ 14:32:14] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU




Carbon footprint of the recommender:


[codecarbon INFO @ 14:32:16] CPU Model on constant consumption mode: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:32:16] [setup] GPU Tracking...
[codecarbon INFO @ 14:32:17] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 14:32:17] >>> Tracker's metadata:
[codecarbon INFO @ 14:32:17]   Platform system: Windows-10-10.0.19045-SP0
[codecarbon INFO @ 14:32:17]   Python version: 3.11.1
[codecarbon INFO @ 14:32:17]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 14:32:17]   Available RAM : 7.338 GB
[codecarbon INFO @ 14:32:17]   CPU count: 12
[codecarbon INFO @ 14:32:17]   CPU model: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:32:17]   GPU count: 1
[codecarbon INFO @ 14:32:17]   GPU model: 1 x NVIDIA GeForce GTX 1650
[codecarbon INFO @ 14:32:17] Saving emissions data to file c:\Users\chris\Desktop\NTNU Ting\8. Semester\Anbefalingssystemer\Project\TDT4215\recommender_system\demostrations\output\user_based_fit_emission.csv
[codecarbon INFO @ 14:32:33] Energy c

{'fit': ({1975245: [(2323519, np.float64(0.9806188678907397)),
    (1311412, np.float64(0.9806188678907397)),
    (290581, np.float64(0.9806188678907397)),
    (2025427, np.float64(0.9741663842919888)),
    (1785274, np.float64(0.9395433607032765)),
    (1567415, np.float64(0.9212210999418743)),
    (1336910, np.float64(0.7764514493528887)),
    (1123281, np.float64(0.7566356879068933)),
    (2244766, np.float64(0.44523665397489554)),
    (1873098, np.float64(0.42394136021199813))],
   1720950: [(1127322, np.float64(0.9356325025246671)),
    (1049088, np.float64(0.9356325025246671)),
    (1079131, np.float64(0.9356325025246345)),
    (579626, np.float64(0.9356002343069257)),
    (1101117, np.float64(0.9353385161504759)),
    (1671916, np.float64(0.9247013446977203)),
    (1580025, np.float64(0.7212069955858702)),
    (733262, np.float64(0.7179473998628796)),
    (1334762, np.float64(0.5856668241463715)),
    (691640, np.float64(0.4832551243516786))],
   91186: [(336676, np.float64(0.98