## Demonstration of a Simple Most Popular Baseline Recommender System
This notebook demonstrates a baseline model that recommends the most popular news articles.

It ranks articles by their *click_count* and recommends the top-ranked ones.

The model is then evaluated using the metrics *precision* and *recall*.
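The ranking idea can be sketched in a few lines. This is a minimal illustration using only the standard library, not the `MostPopularRecommender` implementation used in this notebook; the function `most_popular` and the sample click log are hypothetical.

```python
from collections import Counter


def most_popular(clicked_article_ids, n=5):
    """Return the n most-clicked article ids, most popular first."""
    # Count how many times each article id appears in the click log
    click_count = Counter(clicked_article_ids)
    # most_common sorts by count, descending
    return [article_id for article_id, _ in click_count.most_common(n)]


# Hypothetical click log: one entry per click
clicks = [101, 102, 101, 103, 101, 102, 104]
top_two = most_popular(clicks, n=2)
print(top_two)  # [101, 102]
```

Because the ranking is global, a sketch like this returns the same list for every user. Whether the real model also filters on a user's history is not shown here.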

In [1]:
# Imports and setup
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)
from parquet_data_reader import ParquetDataReader
from models.baseline.most_popular import MostPopularRecommender

import polars as pl
pl.Config.set_tbl_cols(-1)
import numpy as np
parquet_reader = ParquetDataReader()

### Data Extraction and Preprocessing

In [2]:
from utils.baseline_processing import process_behavior_data, random_split, time_based_split

train_behavior_df = parquet_reader.read_data("../../data/train/behaviors.parquet")
test_behaviours_df = parquet_reader.read_data('../../data/validation/behaviors.parquet')

# Combine and preprocess the train and validation behavior data
combined_df = process_behavior_data(train_behavior_df, test_behaviours_df)

# ----- Method 1: Random Split -----
train_random, test_random = random_split(combined_df, test_ratio=0.30)
print("Random Split:")
print("Train shape:", train_random.shape)
print("Test shape:", test_random.shape)

# ----- Method 2: Time-based Split -----
train_time, test_time = time_based_split(combined_df, test_ratio=0.30)
print("\nTime-based Split:")
print("Train shape:", train_time.shape)
print("Test shape:", test_time.shape)

Random Split:
Train shape: (98957, 4)
Test shape: (42766, 4)

Time-based Split:
Train shape: (99207, 4)
Test shape: (42516, 4)


### Method 1: Random Split of Train/Test for Recommendations

In [3]:
# Creates a recommender and fits it to the training data split using the random split method
recommender = MostPopularRecommender(behaviors=train_random)
recommender.fit()

# A test user known to have interactions in the data
user_id_test = 151570
recommendations = recommender.recommend(user_id=user_id_test, n=5)

print(f"Recommendations for user {user_id_test}:")
print(recommendations)

Recommendations for user 151570:
shape: (5,)
Series: 'article_id' [i32]
[
	9773282
	9775562
	9776234
	9775776
	9786378
]


### Method 2: Time-Based Split Train/Test Recommendations
This method splits the combined data (train and test) by time: the oldest interactions (a *test_ratio* fraction of the rows) are used for testing, and the newest interactions are used for training. The split is performed after the two datasets have been combined.
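A split of this kind might look like the sketch below. It assumes each interaction carries a sortable `timestamp` field (a hypothetical name) and follows the description above; it is not the code of `time_based_split` from `utils.baseline_processing`.

```python
def time_split_sketch(rows, test_ratio=0.30):
    """Split rows by time: the oldest fraction becomes the test set."""
    # Sort interactions from oldest to newest
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    # Number of test rows, truncated to an integer
    n_test = int(len(ordered) * test_ratio)
    test = ordered[:n_test]   # oldest interactions
    train = ordered[n_test:]  # newest interactions
    return train, test


# Hypothetical interactions with integer timestamps 0..7
rows = [{"user_id": i % 3, "timestamp": t} for i, t in enumerate([5, 2, 7, 0, 3, 6, 1, 4])]
train, test = time_split_sketch(rows, test_ratio=0.25)
print(len(train), len(test))  # 6 2
```

Truncating `len(rows) * test_ratio` to an integer gives 42,516 test rows for the 141,723 combined rows, which matches the time-based test shape printed earlier, although the actual implementation may differ.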

In [4]:
# Creates a recommender and fits it to the training data split using the time split method
recommender2 = MostPopularRecommender(behaviors=train_time)
recommender2.fit()

recommendations2 = recommender2.recommend(user_id=user_id_test, n=5)

print(f"Recommendations for user {user_id_test}:")
print(recommendations2)

Recommendations for user 151570:
shape: (5,)
Series: 'article_id' [i32]
[
	9776234
	9787465
	9785668
	9780195
	9786378
]


### Comparison: Evaluation of the Most Popular (Baseline) Recommender
This section compares the two data splits for the most popular recommender using the metrics *precision* and *recall* at k.
The false positive rate (*FPR*) is also printed for reference.
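For a single user, precision at k is the share of the top-k recommendations that the user actually interacted with, and recall at k is the share of the user's interactions that appear in the top k. The sketch below shows that calculation; `precision_recall_at_k` and the sample ids are hypothetical, and `perform_model_evaluation` may aggregate over users or define the metrics differently.

```python
def precision_recall_at_k(recommended, relevant, k=5):
    """Compute precision@k and recall@k for one user."""
    top_k = recommended[:k]
    # Recommended items the user actually interacted with
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 5 recommendations, 2 relevant articles, 1 overlap
recommended = [11, 12, 13, 14, 15]
relevant = {13, 99}
p, r = precision_recall_at_k(recommended, relevant, k=5)
print(p, r)  # 0.2 0.5
```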

In [5]:
from utils.evaluation import perform_model_evaluation

# Evaluate the random-split recommender on the random-split test data
metrics = perform_model_evaluation(recommender, test_data=test_random, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics)


# Evaluate the time-split recommender on the time-split test data
metrics2 = perform_model_evaluation(recommender2, test_data=test_time, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics2)



Evaluation metrics (precision and recall at k):
{'precision@k': np.float64(0.012832102304045861), 'recall@k': np.float64(0.019111413300729043), 'fpr@k': np.float64(0.0021611723302661363)}

Evaluation metrics (precision and recall at k):
{'precision@k': np.float64(0.0), 'recall@k': np.float64(0.0), 'fpr@k': np.float64(0.004364917949446781)}


In [6]:
from utils.evaluation import append_model_metrics
append_model_metrics(metrics, "most_pop_baseline_random_split")
append_model_metrics(metrics2, "most_pop_baseline_time_split")

### Model Diversity Evaluation

Calculates the aggregate diversity of the model's recommendations and appends the result to the `/evaluation_summary/model_overview_diversity.csv` file.
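Aggregate diversity is commonly defined as the number of distinct items recommended across a set of users, divided by the size of the item catalog. The sketch below uses that definition; it is an assumption about what `aggregate_diversity` in `utils.evaluation` computes, and `aggregate_diversity_sketch` and its inputs are hypothetical.

```python
def aggregate_diversity_sketch(recommendation_lists, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    distinct_items = set()
    for recs in recommendation_lists:
        distinct_items.update(recs)
    return len(distinct_items) / catalog_size


# Hypothetical case: 1000 users who all receive the same five articles
same_top5 = [[1, 2, 3, 4, 5] for _ in range(1000)]
score = aggregate_diversity_sketch(same_top5, catalog_size=2908)
print(round(score, 4))  # 0.0017
```

A popularity baseline that serves the same top-5 list to everyone covers only five articles, so its score stays very low regardless of how many users are sampled. The catalog size of 2,908 is chosen so the result lands near the value this notebook prints; it is an inference, not a confirmed catalog size.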

In [None]:
from utils.evaluation import aggregate_diversity
from utils.evaluation import append_aggregate_diversity

# For the random split model
diversity = aggregate_diversity(recommender, combined_df, user_sample=1000)

print("Diversity Random Split")
print(diversity)

append_aggregate_diversity(diversity, "most_popular_random")

# For the time split model
diversity2 = aggregate_diversity(recommender2, combined_df, user_sample=1000)

print("Diversity Time Split")
print(diversity2)

append_aggregate_diversity(diversity2, "most_popular_time")

Diversity Random Split
0.0017193947730398899
Diversity Time Split
0.0017193947730398899


### Carbon Footprint
This section records the carbon footprint of the model's `fit` and `recommend` methods using the `EmissionsTracker` from `codecarbon`.
The emissions data is written to a CSV file in the "output" folder.

In [8]:
from utils.evaluation import record_carbon_footprint, track_model_energy

# Records the carbon footprint of the recommender
#carbon_footprint = record_carbon_footprint(recommender.recommend, user_id=user_id_test, n=5)

print("\nCarbon footprint of the recommender:")
footprint = track_model_energy(recommender, "most_popular", user_id=user_id_test, n=5)
footprint

[codecarbon INFO @ 15:50:11] [setup] RAM Tracking...
[codecarbon INFO @ 15:50:11] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU

Carbon footprint of the recommender:

[codecarbon INFO @ 15:50:13] CPU Model on constant consumption mode: 13th Gen Intel(R) Core(TM) i7-13700H
[codecarbon INFO @ 15:50:13] [setup] GPU Tracking...
[codecarbon INFO @ 15:50:13] No GPU found.
[codecarbon INFO @ 15:50:13] >>> Tracker's metadata:
[codecarbon INFO @ 15:50:13]   Platform system: Windows-10-10.0.26100-SP0
[codecarbon INFO @ 15:50:13]   Python version: 3.11.9
[codecarbon INFO @ 15:50:13]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 15:50:13]   Available RAM : 15.731 GB
[codecarbon INFO @ 15:50:13]   CPU count: 20
[codecarbon INFO @ 15:50:13]   CPU model: 13th Gen Intel(R) Core(TM) i7-13700H
[codecarbon INFO @ 15:50:13]   GPU count: None
[codecarbon INFO @ 15:50:13]   GPU model: None
[codecarbon INFO @ 15:50:16] Saving emissions data to file c:\Users\magnu\NewDesk\An.sys\TDT4215\recommender_system\demostrations\output\most_popular_fit_emission.csv
[codecarbon INFO @ 15:50:16] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.899243354797363 W
[codecarbon INFO 

{'fit': (None, 6.2988015538794525e-09),
 'recommend': (shape: (5,)
  Series: 'article_id' [i32]
  [
  	9773282
  	9775562
  	9776234
  	9775776
  	9786378
  ],
  1.2178898341192162e-08)}