## Demonstration of a Simple Most Popular Baseline Recommender System
This notebook demonstrates a baseline model that recommends the most popular news articles.

The model ranks articles by their *click_count* and recommends the top-ranked ones.
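
The ranking idea can be sketched in a few lines of plain Python. This is only an illustration, not the `MostPopularRecommender` implementation: the function name `most_popular` and the `(user_id, article_id)` event layout are assumptions, and the sketch ignores the user entirely, so every user receives the same list.

```python
from collections import Counter


def most_popular(interactions, n=5):
    """Return the n article ids with the highest click counts.

    `interactions` is an iterable of (user_id, article_id) click events.
    This layout is hypothetical and chosen for illustration only.
    """
    # Count one click per event, keyed by article id
    click_count = Counter(article_id for _, article_id in interactions)
    # most_common sorts by count in descending order
    return [article_id for article_id, _ in click_count.most_common(n)]


# Toy click log: article "a" has 3 clicks, "b" has 2, "c" has 1
clicks = [(1, "a"), (2, "a"), (3, "b"), (1, "c"), (2, "b"), (3, "a")]
top_two = most_popular(clicks, n=2)
print(top_two)
```

Because the ranking depends only on the training interactions, the recommended list changes whenever the train/test split changes.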

The notebook also evaluates the recommender using the metrics *Precision* and *Recall*.

In [37]:
# Imports and setup
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)
from parquet_data_reader import ParquetDataReader
from models.most_popular import MostPopularRecommender

import polars as pl
pl.Config.set_tbl_cols(-1)
import numpy as np
parquet_reader = ParquetDataReader()

### Data Extraction and Preprocessing

In [38]:
import polars as pl
from utils.baseline_processing import process_behavior_data, random_split, time_based_split

train_behavior_df = parquet_reader.read_data("../../data/train/behaviors.parquet")
test_behaviours_df = parquet_reader.read_data('../../data/validation/behaviors.parquet')

# Combine and preprocess the train and validation behavior data
combined_df = process_behavior_data(train_behavior_df, test_behaviours_df)

# ----- Method 1: Random Split -----
train_random, test_random = random_split(combined_df, test_ratio=0.30)
print("Random Split:")
print("Train shape:", train_random.shape)
print("Test shape:", test_random.shape)

# ----- Method 2: Time-based Split -----
train_time, test_time = time_based_split(combined_df, test_ratio=0.30)
print("\nTime-based Split:")
print("Train shape:", train_time.shape)
print("Test shape:", test_time.shape)

Random Split:
Train shape: (99210, 4)
Test shape: (42513, 4)

Time-based Split:
Train shape: (99207, 4)
Test shape: (42516, 4)


### Method 1: Random Split of Train/Test for Recommendations

In [39]:
# Creates a recommender and fits it to the training data split using the random split method
recommender = MostPopularRecommender(behaviors=train_random)
recommender.fit()

# Test user which is known to have interactions in the data
user_id_test = 151570
recommendations = recommender.recommend(user_id=user_id_test, n=5)

print(f"Recommendations for user {user_id_test}:")
print(recommendations)

Recommendations for user 151570:
shape: (5,)
Series: 'article_id' [i32]
[
	9773282
	9775562
	9776234
	9786378
	9775776
]


### Method 2: Time-Based Split of Train/Test for Recommendations
This method splits the data by time: the oldest interactions (a fraction *test_ratio* of the rows) are used for testing, and the newest interactions are used for training. The split is applied after the train and validation data have been combined.
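
A minimal sketch of a split along these lines, following the description above, is shown here. It is not the code of `time_based_split` from `utils.baseline_processing`, which is not shown in this notebook: the function name `split_by_time_sketch`, the `timestamp` key, and the list-of-dicts layout are all assumptions made for illustration.

```python
def split_by_time_sketch(rows, test_ratio=0.30, time_key="timestamp"):
    """Split rows by time, following the description in the text.

    Hypothetical helper: `rows` is a list of dicts that each carry a
    sortable value under `time_key`.
    """
    # Sort from oldest to newest
    ordered = sorted(rows, key=lambda row: row[time_key])
    n_test = int(len(ordered) * test_ratio)
    test = ordered[:n_test]    # oldest interactions
    train = ordered[n_test:]   # newest interactions
    return train, test


# Ten toy interactions with timestamps 0..9 in scrambled order
rows = [{"article_id": i, "timestamp": t}
        for i, t in enumerate([5, 2, 9, 0, 7, 1, 8, 3, 6, 4])]
train, test = split_by_time_sketch(rows, test_ratio=0.30)
print(len(train), len(test))
```

Unlike a random split, the train and test sets here cover disjoint time ranges, so articles that are popular in one period may be rare or absent in the other.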

In [40]:
# Creates a recommender and fits it to the training data split using the time split method
recommender2 = MostPopularRecommender(behaviors=train_time)
recommender2.fit()

recommendations2 = recommender2.recommend(user_id=user_id_test, n=5)

print(f"Recommendations for user {user_id_test}:")
print(recommendations2)

Recommendations for user 151570:
shape: (5,)
Series: 'article_id' [i32]
[
	9776234
	9787465
	9785668
	9780195
	9786378
]


### Comparison: Evaluation of the Most Popular (Baseline) Recommender
This section compares the two data splits for the most popular recommender using the metrics *Precision* and *Recall* at k = 5.
The false positive rate (*FPR*) is also printed for reference.
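
For reference, the three metrics can be computed for a single user as sketched below, using the standard definitions. How `evaluate_recommender` computes and averages them is not shown in this notebook, so the function name `metrics_at_k`, its arguments, and the use of a full item `catalog` to count negatives are assumptions.

```python
def metrics_at_k(recommended, relevant, catalog, k=5):
    """Precision, recall and false positive rate at k for one user.

    Hypothetical helper: `recommended` is a ranked list without duplicates,
    `relevant` holds the items the user clicked in the test data, and
    `catalog` holds all candidate items.
    """
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)   # true positives
    false_positives = len(top_k) - hits
    negatives = len(set(catalog) - relevant)              # items not clicked
    return {
        "precision@k": hits / k,
        "recall@k": hits / len(relevant) if relevant else 0.0,
        "fpr@k": false_positives / negatives if negatives else 0.0,
    }


catalog = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
result = metrics_at_k(
    recommended=["a", "b", "c", "d", "e"],
    relevant={"a", "f"},
    catalog=catalog,
    k=5,
)
print(result)
```

In this toy case one of the five recommendations is relevant, so precision is 1/5, recall is 1/2, and four of the eight non-relevant items were recommended, giving an FPR of 1/2.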

In [41]:
from utils.evaluation import evaluate_recommender

# Evaluates the random-split recommender on the held-out random test split
metrics = evaluate_recommender(recommender, test_data=test_random, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics)


# Evaluates the time-split recommender on the held-out time-based test split
metrics2 = evaluate_recommender(recommender2, test_data=test_time, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics2)



Evaluation metrics (precision and recall at k):
{'precision@k': np.float64(0.01234896352954218), 'recall@k': np.float64(0.017889268359968683), 'fpr@k': np.float64(0.0021803865997622454)}

Evaluation metrics (precision and recall at k):
{'precision@k': np.float64(0.0), 'recall@k': np.float64(0.0), 'fpr@k': np.float64(0.004364917949446781)}
