## Demonstration of the Item-Based Collaborative Recommender

This system uses collaborative filtering, analyzing user-item interactions to link items that are engaged with by the same users. It therefore focuses on the user-item relation.
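The `fit()` outputs further down appear to map each article to its ten most similar articles, each with a similarity score. The internals of `UserBasedCollaborativeRecommender` are not shown, so the following is only a minimal sketch of item-item cosine similarity. It assumes interactions arrive as parallel lists of user ids, article ids and weights, and the function name `item_neighbours` is hypothetical.

```python
import numpy as np


def item_neighbours(
    user_ids: list[int],
    item_ids: list[int],
    weights: list[float],
    top_k: int = 10,
) -> dict[int, list[tuple[int, float]]]:
    """Return, for every item, its top_k most similar items by cosine similarity."""
    users = sorted(set(user_ids))
    items = sorted(set(item_ids))
    u_idx = {u: i for i, u in enumerate(users)}
    i_idx = {it: j for j, it in enumerate(items)}

    # Dense user x item matrix; repeated (user, item) pairs are summed.
    m = np.zeros((len(users), len(items)))
    for u, it, w in zip(user_ids, item_ids, weights):
        m[u_idx[u], i_idx[it]] += w

    # Normalise each item column, then dot products give cosine similarities.
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    normed = m / norms
    sim = normed.T @ normed
    np.fill_diagonal(sim, 0.0)  # an item is not its own neighbour

    result: dict[int, list[tuple[int, float]]] = {}
    for j, it in enumerate(items):
        order = np.argsort(-sim[j])[:top_k]
        # Keep only neighbours that share at least one user (similarity > 0).
        result[it] = [(items[k], float(sim[j, k])) for k in order if sim[j, k] > 0]
    return result
```

A dense matrix keeps the sketch short; for the full dataset a sparse matrix (for example from `scipy.sparse`) would be the more practical choice.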

It recommends articles that similar users have engaged with, aiming to provide personalized suggestions. The model's performance is evaluated with the MAP@K and NDCG@K metrics.
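MAP@K and NDCG@K both reward relevant articles that appear near the top of a ranked list of length K. The following is a minimal sketch with binary relevance (an article counts as relevant if the user read it in the test split). It assumes AP@K is normalised by min(|relevant|, K); other normalisations exist, and the notebook's `evaluate_recommender` may compute these differently.

```python
import math


def average_precision_at_k(recommended: list[int], relevant: set[int], k: int) -> float:
    """AP@K: sum of precision@i at each rank i <= k holding a relevant item, normalised."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def ndcg_at_k(recommended: list[int], relevant: set[int], k: int) -> float:
    """NDCG@K with binary relevance: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def mean_metrics_at_k(recs: dict[int, list[int]], truth: dict[int, set[int]], k: int) -> dict[str, float]:
    """Average both metrics over users that have both recommendations and ground truth."""
    users = [u for u in truth if u in recs]
    if not users:
        return {"MAP@K": 0.0, "NDCG@K": 0.0}
    return {
        "MAP@K": sum(average_precision_at_k(recs[u], truth[u], k) for u in users) / len(users),
        "NDCG@K": sum(ndcg_at_k(recs[u], truth[u], k) for u in users) / len(users),
    }
```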

In [None]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)

import polars as pl
import numpy as np

from parquet_data_reader import ParquetDataReader
from utils.data_preprocessing import DataProcesser
from models.user_based_CF import UserBasedCollaborativeRecommender

pl.Config.set_tbl_cols(-1)

polars.config.Config

## Data import & Preprocessing

In [2]:
dataProcesser = DataProcesser()
behaviors_df = dataProcesser.collaborative_filtering_preprocess()
train_df, test_df = dataProcesser.random_split(behaviors_df, test_ratio=0.2)
print(train_df.head())

shape: (5, 4)
┌────────────┬─────────┬────────────────┬────────────┐
│ article_id ┆ user_id ┆ total_readtime ┆ max_scroll │
│ ---        ┆ ---     ┆ ---            ┆ ---        │
│ i32        ┆ u32     ┆ f32            ┆ f32        │
╞════════════╪═════════╪════════════════╪════════════╡
│ 9780325    ┆ 2425189 ┆ 82.0           ┆ 100.0      │
│ 9771121    ┆ 346297  ┆ 129.0          ┆ 100.0      │
│ 9769306    ┆ 2500030 ┆ 75.0           ┆ 100.0      │
│ 9782407    ┆ 746371  ┆ 210.0          ┆ 100.0      │
│ 9776234    ┆ 1292585 ┆ 38.0           ┆ 100.0      │
└────────────┴─────────┴────────────────┴────────────┘


## Model Fit

The first model uses read time (`total_readtime`) and read percentage (`max_scroll`) to weight each user's interactions when comparing them.
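The notebook does not show how `total_readtime` and `max_scroll` are combined into an interaction strength. One plausible, purely hypothetical weighting damps long read times with a logarithm and scales the result by the fraction of the article scrolled. The function name `interaction_weight` is an assumption for illustration.

```python
import math


def interaction_weight(total_readtime: float, max_scroll: float) -> float:
    """Hypothetical interaction strength from read time and scroll depth (in percent)."""
    # log1p damps very long sessions so a single outlier read does not dominate.
    time_part = math.log1p(max(total_readtime, 0.0))
    # Scroll depth clamped to [0, 100] and converted to a fraction in [0, 1].
    scroll_part = min(max(max_scroll, 0.0), 100.0) / 100.0
    return time_part * scroll_part
```

In polars, roughly the same weight (without the clamping) could be added as a column with an expression such as `pl.col("total_readtime").log1p() * pl.col("max_scroll") / 100`, which avoids a Python-level loop.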

In [None]:
recommender = UserBasedCollaborativeRecommender(train_df)
recommender.fit()

{9780325: [(9774652, np.float64(0.9949266249057374)),
  (9778917, np.float64(0.9949259518842211)),
  (9766592, np.float64(0.349454857993146)),
  (9775562, np.float64(0.15300251688098676)),
  (9778945, np.float64(0.07579577530878145)),
  (9768790, np.float64(0.010629074798780613)),
  (9779748, np.float64(0.010629056817913929)),
  (9669015, np.float64(0.010549662999354226)),
  (9789754, np.float64(0.010130575055592672)),
  (9782115, np.float64(0.009664744054166952))],
 9771121: [(9779968, np.float64(0.9547185514867494)),
  (9771775, np.float64(0.49157095336893253)),
  (9773962, np.float64(0.09599151178855103)),
  (9774598, np.float64(0.029919197222921357)),
  (9779184, np.float64(0.014225332348961506)),
  (9774648, np.float64(0.009828771818059967)),
  (9778915, np.float64(0.0026040330636389886)),
  (9775323, np.float64(0.0023767566839286713)),
  (9773465, np.float64(0.0020369531631946325)),
  (9765943, np.float64(0.0008164455784350766))],
 9769306: [(9778074, np.float64(0.034624142345548

The second model is binary: when comparing users, it only considers which articles each user has read, ignoring read time and read percentage.
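With binary interactions, cosine similarity reduces to the overlap of two reader sets divided by the geometric mean of their sizes. The following is a minimal sketch, assuming the binary model uses cosine similarity (the notebook does not say so). The helper name `binary_cosine` is hypothetical.

```python
import math


def binary_cosine(readers_a: set[int], readers_b: set[int]) -> float:
    """Cosine similarity of two 0/1 vectors, computed directly from their sets of readers."""
    if not readers_a or not readers_b:
        return 0.0
    overlap = len(readers_a & readers_b)
    return overlap / math.sqrt(len(readers_a) * len(readers_b))
```

Under this measure, different pairs with the same overlap and set sizes receive identical scores. That would be consistent with the repeated value 0.1290994448735805 (equal to 1/sqrt(60)) in the output below.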

In [None]:
binary_recommender = UserBasedCollaborativeRecommender(train_df, binary_model=True)
binary_recommender.fit()

{9780325: [(9780482, np.float64(0.13268304495737837)),
  (9779713, np.float64(0.10530400393442729)),
  (9780284, np.float64(0.10074320393528013)),
  (9788125, np.float64(0.09477360354098452)),
  (9777846, np.float64(0.09383985068515122)),
  (9780271, np.float64(0.09248956660924212)),
  (9781947, np.float64(0.09188876191104745)),
  (9789997, np.float64(0.09183209532911096)),
  (9767484, np.float64(0.08935341032175403)),
  (9780447, np.float64(0.08822942331061989))],
 9771121: [(9770451, np.float64(0.13181762600207414)),
  (9765943, np.float64(0.1290994448735805)),
  (9699564, np.float64(0.1290994448735805)),
  (9530940, np.float64(0.1290994448735805)),
  (9771910, np.float64(0.1290994448735805)),
  (9663634, np.float64(0.1290994448735805)),
  (9735991, np.float64(0.1290994448735805)),
  (9561819, np.float64(0.1290994448735805)),
  (9354622, np.float64(0.1290994448735805)),
  (9775003, np.float64(0.10741723110591495))],
 9769306: [(9770741, np.float64(0.16754156331667813)),
  (9740182, n

Only 9194 of the original 15143 users are covered by the current solution; this should be addressed in future work.

## Model presentation

### Article Recommendation

In [5]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("reccomended for user ", user, ": ", recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

reccomended for user  630220 :  [9785596, 9774864, 9653790, 9746073, 9673979]
reccomended for user  620796 :  [9678305, 9498042, 9784489, 9778257, 9339920]
reccomended for user  1067393 :  [9780325, 9771121, 9769306, 9782407, 9776234]
reccomended for user  1726258 :  [9778842, 9777324, 9759891, 9779520, 9767665]
reccomended for user  17205 :  [9774652, 9778917, 9787510, 9790987, 9780195]


In [6]:
for user in [630220, 620796, 1067393, 1726258, 17205]:
    print("reccomended for user ", user, ": ", binary_recommender.recommend_n_articles(user_id=user, n=5, allow_read_articles=True))

reccomended for user  630220 :  [9774864, 9673979, 9699524, 9672256, 9497898]
reccomended for user  620796 :  [9765326, 9749469, 9766178, 9740047, 9551172]
reccomended for user  1067393 :  [9768129, 9728471, 9767975, 9084355, 9734834]
reccomended for user  1726258 :  [9718262, 9722269, 9566544, 9657797, 9306867]
reccomended for user  17205 :  [9768638, 9766441, 9364014, 9482380, 9772104]
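The complex model's list for user 1067393 matches the first five `article_id` values in `train_df.head()`. This may indicate a fallback for users without usable history. The following is a minimal sketch of similarity-based ranking with an `allow_read_articles` switch. It assumes per-article neighbour lists shaped like the `fit()` output, and `recommend_n` is a hypothetical name. Unlike a fallback, it simply returns an empty list when the user has no history.

```python
from collections import defaultdict


def recommend_n(
    read_articles: set[int],
    neighbours: dict[int, list[tuple[int, float]]],
    n: int = 5,
    allow_read_articles: bool = False,
) -> list[int]:
    """Score candidates by summing their similarity to every article the user has read."""
    scores: dict[int, float] = defaultdict(float)
    for article in read_articles:
        for candidate, sim in neighbours.get(article, []):
            # Skip already-read articles unless explicitly allowed.
            if allow_read_articles or candidate not in read_articles:
                scores[candidate] += sim
    # Highest total similarity first; ties broken by article id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:n]]
```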


### Evaluation Scores

#### Without the ability to recommend previously read articles

The complex model, recommending only articles the user has not yet read:

In [7]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.0), 'NDCG@K': np.float64(0.0)}

The binary model, recommending only articles the user has not yet read:

In [8]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=False)
results

{'MAP@K': np.float64(0.006666666666666667),
 'NDCG@K': np.float64(0.024664567357916917)}

#### With the ability to recommend previously read articles

The complex model, allowed to recommend articles even if the user has read them before:

In [9]:
results = recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.0026548672566371685),
 'NDCG@K': np.float64(0.00449403350647971)}

The binary model, allowed to recommend articles even if the user has read them before:

In [10]:
results = binary_recommender.evaluate_recommender(test_df, k=10, n_jobs=4, user_sample=200, allow_read_articles=True)
results

{'MAP@K': np.float64(0.00196078431372549),
 'NDCG@K': np.float64(0.0056890118710962)}
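`evaluate_recommender` scores a random sample of `user_sample` users using `n_jobs` workers. The following is a minimal sketch of that pattern, assuming a per-user scoring function such as AP@K. The names are hypothetical, and the sketch uses standard-library threads, whereas the real implementation may use processes or joblib.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def evaluate_sampled(
    users: Iterable[int],
    score_user: Callable[[int], float],
    user_sample: int = 200,
    n_jobs: int = 4,
    seed: int = 0,
) -> float:
    """Average score_user over a reproducible random sample of users."""
    candidates = sorted(set(users))  # sort so the sample depends only on the seed
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(user_sample, len(candidates)))
    if not sample:
        return 0.0
    # Threads keep the sketch simple; CPU-bound scoring would benefit more from processes.
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        scores = list(executor.map(score_user, sample))
    return sum(scores) / len(scores)
```

With only 200 sampled users, the differences between these very small scores may well fall within sampling noise. Fixing the seed, or averaging over several seeds, makes comparisons between the models more meaningful.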

## Model Experimentation

In [11]:
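test_user_id = 630220

# Top-100 recommendations for one user, allowing already-read articles
predictions = recommender.recommend_n_articles(user_id=test_user_id, n=100, allow_read_articles=True)
# Articles this user actually read in the held-out test split
results = set(test_df.filter(pl.col("user_id") == test_user_id)["article_id"])

print(results)
print(predictions)

# Print "Yes" once for every recommended article that appears in the test split
for prediction in predictions:
    if prediction in results:
        print("Yes")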

{9789473, 9778500, 9782726, 9786566, 9772045, 9786351, 9776497, 9774142, 9786719}
[9785596, 9774864, 9653790, 9746073, 9673979, 9785973, 9771065, 9782915, 9776147, 9749857, 9784808, 9772710, 9714168, 9759717, 9759345, 9789125, 9785668, 9789810, 9772029, 9672256, 9778745, 9556552, 9356137, 9583381, 9220931, 9766803, 9767852, 9756546, 9769712, 9784952, 9783379, 9732481, 9782315, 9430567, 9702511, 9575287, 9463833, 9770997, 9779538, 9778661, 9773288, 9776152, 9723342, 9644742, 9774392, 9683742, 9497898, 9768599, 9774937, 9674243, 9428643, 9772443, 9776087, 9645545, 9699394, 9788106, 9782361, 9785205, 9779150, 9773727, 9775577, 9566633, 9789754, 9779267, 9776070, 9777983, 9784758, 9782836, 9774840, 9785174, 9780428, 9778413, 9669647, 9786243, 9786280, 9527217, 9287091, 9778351, 9786718, 9661024, 9777858, 9789327, 9771739, 9778028, 9538375, 9785016, 9777475, 9786230, 9783183, 9772508, 9359995, 9790002, 9778500, 9781859, 9275787, 9776710, 9782234, 9790713, 9783019, 9791128]
Yes
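Printing "Yes" shows that a hit occurred but not where it occurred. A small variant (hypothetical helper `hit_ranks`) reports 1-based rank positions instead. Applied to the output above, the single hit 9778500 sits at rank 93, far outside the top 10 used for MAP@K and NDCG@K.

```python
def hit_ranks(predictions: list[int], relevant: set[int]) -> list[tuple[int, int]]:
    """Return (rank, article_id) for every prediction found in the held-out set (1-based)."""
    return [(rank, article) for rank, article in enumerate(predictions, start=1) if article in relevant]
```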


In [15]:
from utils.evaluation import perform_model_evaluation
metrics = perform_model_evaluation(binary_recommender, test_data=test_df, k=5)
metrics


{'precision@k': np.float64(0.003161042289619821),
 'recall@k': np.float64(0.009196432295535245),
 'fpr@k': np.float64(0.002176950737869508)}
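precision@k, recall@k and fpr@k follow the usual confusion-matrix definitions restricted to the top k. The implementation of `perform_model_evaluation` is not shown. This sketch assumes that negatives are all catalogue articles the user did not read in the test split, with `n_items` (a hypothetical parameter) giving the catalogue size.

```python
def precision_recall_fpr_at_k(
    recommended: list[int], relevant: set[int], n_items: int, k: int
) -> tuple[float, float, float]:
    """Precision, recall and false-positive rate of the top-k recommendations."""
    top_k = recommended[:k]
    tp = sum(1 for article in top_k if article in relevant)
    fp = len(top_k) - tp
    precision = tp / k if k else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    # Assumed negatives: every catalogue article not in the user's relevant set.
    negatives = n_items - len(relevant)
    fpr = fp / negatives if negatives > 0 else 0.0
    return precision, recall, fpr
```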

### Carbon Footprint
This section writes emission logs (for example `item_based_fit_emission.csv`) to the `output` folder. It uses CodeCarbon's `EmissionsTracker` to record the carbon footprint of the model's `fit` and `recommend` methods.
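`track_model_energy` lives in `utils.evaluation`, and its internals are not shown. The following is a minimal sketch of the same idea that wraps one call in a CodeCarbon `EmissionsTracker`. The helper name `track_call` and its labels are hypothetical. If `codecarbon` is not installed, the sketch falls back to measuring wall-clock time only.

```python
import os
import time
from typing import Any, Callable

try:
    from codecarbon import EmissionsTracker
except ImportError:  # codecarbon not installed: only wall-clock time is measured
    EmissionsTracker = None


def track_call(label: str, fn: Callable[..., Any], *args: Any,
               output_dir: str = "output", **kwargs: Any) -> dict:
    """Run fn(*args, **kwargs); return its result, runtime and emissions (kg CO2eq)."""
    tracker = None
    if EmissionsTracker is not None:
        os.makedirs(output_dir, exist_ok=True)
        tracker = EmissionsTracker(
            project_name=label,
            output_dir=output_dir,
            output_file=f"{label}_emission.csv",  # same naming pattern as in the log
        )
        tracker.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        # stop() writes the CSV row and returns the emissions estimate
        emissions = tracker.stop() if tracker is not None else None
    return {"result": result, "seconds": elapsed, "emissions_kg": emissions}
```

For example, `track_call("item_based_fit", recommender.fit)` and `track_call("item_based_recommend", recommender.recommend_n_articles, user_id=test_user_id, n=5, allow_read_articles=True)` would produce one CSV per method.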

In [14]:
from utils.evaluation import track_model_energy

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "item_based", user_id=test_user_id, n=5)
footprint

[codecarbon INFO @ 14:12:56] [setup] RAM Tracking...
[codecarbon INFO @ 14:12:56] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU




Carbon footprint of the recommender:


[codecarbon INFO @ 14:12:58] CPU Model on constant consumption mode: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:12:58] [setup] GPU Tracking...
[codecarbon INFO @ 14:12:59] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 14:12:59] >>> Tracker's metadata:
[codecarbon INFO @ 14:12:59]   Platform system: Windows-10-10.0.19045-SP0
[codecarbon INFO @ 14:12:59]   Python version: 3.11.1
[codecarbon INFO @ 14:12:59]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 14:12:59]   Available RAM : 7.338 GB
[codecarbon INFO @ 14:12:59]   CPU count: 12
[codecarbon INFO @ 14:12:59]   CPU model: AMD Ryzen 5 5500U with Radeon Graphics
[codecarbon INFO @ 14:12:59]   GPU count: 1
[codecarbon INFO @ 14:12:59]   GPU model: 1 x NVIDIA GeForce GTX 1650
[codecarbon INFO @ 14:13:00] Saving emissions data to file c:\Users\chris\Desktop\NTNU Ting\8. Semester\Anbefalingssystemer\Project\TDT4215\recommender_system\demostrations\output\item_based_fit_emission.csv
[codecarbon INFO @ 14:13:15] Energy c

{'fit': ({9780325: [(9774652, np.float64(0.9949266249057374)),
    (9778917, np.float64(0.9949259518842211)),
    (9766592, np.float64(0.349454857993146)),
    (9775562, np.float64(0.15300251688098676)),
    (9778945, np.float64(0.07579577530878145)),
    (9768790, np.float64(0.010629074798780613)),
    (9779748, np.float64(0.010629056817913929)),
    (9669015, np.float64(0.010549662999354226)),
    (9789754, np.float64(0.010130575055592672)),
    (9782115, np.float64(0.009664744054166952))],
   9771121: [(9779968, np.float64(0.9547185514867494)),
    (9771775, np.float64(0.49157095336893253)),
    (9773962, np.float64(0.09599151178855103)),
    (9774598, np.float64(0.029919197222921357)),
    (9779184, np.float64(0.014225332348961506)),
    (9774648, np.float64(0.009828771818059967)),
    (9778915, np.float64(0.0026040330636389886)),
    (9775323, np.float64(0.0023767566839286713)),
    (9773465, np.float64(0.0020369531631946325)),
    (9765943, np.float64(0.0008164455784350766))],
  