## Demonstration of a ring buffer baseline recommender system
Here the demonstration of the ring buffer recommender is presented. It recommends items to the user based on which are present in the current *RingBuffer* (see implementation in *ring_buffer_baseline.py*). This way, it captures both recency of the items and the popularity through how many ring buffer entries each item has.

It simply recommends the first item which is not the one the user is currently browsing, found by looking back in the ring buffer.

The file also includes the evaluation of the recommender model using the metrics *Precision and Recall*.

In [1]:
import sys
import os

parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(parent_dir)
from parquet_data_reader import ParquetDataReader
from models.ring_buffer_baseline import RingBufferBaseline

import polars as pl
pl.Config.set_tbl_cols(-1)
import numpy as np
parquet_reader = ParquetDataReader()

### Data extraction and processing

In [2]:
import polars as pl
from utils.baseline_processing import process_behavior_data, random_split, time_based_split

train_behavior_df = parquet_reader.read_data("../../data/train/behaviors.parquet")
test_behaviours_df = parquet_reader.read_data('../../data/validation/behaviors.parquet')

# Processes the data
combined_df = process_behavior_data(train_behavior_df, test_behaviours_df)

# ----- Method 1: Random Split -----
train_random, test_random = random_split(combined_df, test_ratio=0.30)
print("Random Split:")
print("Train shape:", train_random.shape)
print("Test shape:", test_random.shape)

# ----- Method 2: Time-based Split -----
train_time, test_time = time_based_split(combined_df, test_ratio=0.30)
print("\nTime-based Split:")
print("Train shape:", train_time.shape)
print("Test shape:", test_time.shape)


Random Split:
Train shape: (99243, 17)
Test shape: (42913, 17)

Time-based Split:
Train shape: (99510, 17)
Test shape: (42646, 17)


### Method 1: Random split of train/ for recommendations

In [3]:
# Creates a recommender and fits it to the training data split using the random split method
recommender = RingBufferBaseline(behaviors=train_random)
recommender.fit()

user_id_test = 151570
recommendations = recommender.recommend(user_id=user_id_test, n=5)

print(f"Recommendations for user {user_id_test}:")
print(recommendations)

Recommendations for user 151570:
[9770989, 9770538, 9770541, 9769650, 9769650]


### Method 2: Time-based split of train/test for recommendations
This methods splits the data into the oldest interactions *(test_ratio percent)*
are used for testing, and the newest interactions are used for training. This happens after the total data (train and test) has been combined. 

In [4]:
# Creates a recommender and fits it to the training data split using the time-based split method
recommender2 = RingBufferBaseline(behaviors=train_time)
recommender2.fit()

user_id_test2 = 151570
recommendations2 = recommender.recommend(user_id=user_id_test2, n=5)

print(f"Recommendations for user {user_id_test2}:")
print(recommendations2)

Recommendations for user 151570:
[9770989, 9770538, 9770541, 9769650, 9769650]


### Comparison: Evaluation of the ring buffer baseline recommender
Comparing the two different data-splits for this  ring buffer baseline recommender using the metrics *Precision and Recall*.
*FPR* is also printed for reference.

In [5]:
from utils.evaluation import evaluate_recommender

# Evaluates the recommender using the same data as test data
metrics = evaluate_recommender(recommender, test_data=test_random, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics)


# Evaluates the recommender using the same data as test data
metrics2 = evaluate_recommender(recommender2, test_data=test_time, k=5)
print("\nEvaluation metrics (precision and recall at k):")
print(metrics2)



Evaluation metrics (precision and recall at k):
{'precision': np.float64(0.000504607283896446), 'recall': np.float64(0.0006729838344581654), 'fpr': np.float64(0.0021901364748191576)}

Evaluation metrics (precision and recall at k):
{'precision': np.float64(0.014185981569394024), 'recall': np.float64(0.02262336024255056), 'fpr': np.float64(0.00430274981628095)}
