# 03 — Recommendation Engine

This notebook develops and evaluates a hybrid recommendation engine:

1. **Collaborative filtering** with alternating least squares (ALS) on implicit feedback (playtime)
2. **Content-based filtering** (cosine similarity on game features) for cold-start games
3. **Hybrid blend** with confidence-weighted α
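The notebook does not spell out how α is computed. One plausible scheme is sketched here as a hypothetical example. α ramps linearly with an item's interaction count up to `interaction_threshold` (100 in this notebook), and each score vector is min-max normalized first so that CF dot products and cosine similarities sit on a comparable scale. The function names and the linear ramp are assumptions, not the `HybridRecommender` internals.

```python
import numpy as np


def minmax(x):
    """Rescale a score vector to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def blend_scores(cf_scores, cb_scores, n_interactions, threshold=100):
    """Hypothetical confidence-weighted blend of CF and content-based scores.

    alpha rises linearly from 0 (no interactions) to 1 (>= threshold), so
    well-observed items lean on CF and sparse items lean on content similarity.
    """
    alpha = np.clip(np.asarray(n_interactions, dtype=float) / threshold, 0.0, 1.0)
    return alpha * minmax(cf_scores) + (1.0 - alpha) * minmax(cb_scores)


# Item 0 is warm (100 interactions), item 1 is cold (25 interactions)
print(blend_scores([1.0, 0.0], [0.0, 1.0], [100, 25]))
```

A smooth ramp avoids a hard switch at the threshold, where an item with 99 interactions would otherwise be scored by a completely different model than one with 100.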

## Why ALS over Neural Approaches?

- Transparency and explainability: the learned user and item factors, and the item-item similarities they induce, can be inspected directly
- Well-suited for implicit feedback (playtime, not ratings)
- Scales to our dataset size without GPU requirements
- Fewer moving parts than a neural model, which keeps training, tuning, and debugging simple (avoid unnecessary complexity)
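To show why ALS fits implicit feedback, here is a minimal dense NumPy sketch of the textbook implicit-feedback ALS of Hu, Koren and Volinsky. Playtime becomes a confidence weight c = 1 + conf_scale · r on a binary "played" preference, and each sweep solves a small ridge regression per user and then per item. The function name, `conf_scale=40.0`, and the toy data are assumptions for illustration. `CollaborativeFilter` may wrap a library implementation with different defaults. `conf_scale` is unrelated to the hybrid blend's α.

```python
import numpy as np


def implicit_als(R, factors=8, reg=0.1, conf_scale=40.0, iterations=15, seed=0):
    """Dense implicit-feedback ALS for small toy matrices.

    R: users x items array of non-negative feedback (e.g. log1p playtime).
    Returns user factors X (users x factors) and item factors Y (items x factors).
    """
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)       # preference: 1 if the game was played at all
    C = 1.0 + conf_scale * R        # confidence grows with (transformed) playtime
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)

    for _ in range(iterations):
        # Item factors fixed: weighted ridge regression for each user
        for u in range(n_users):
            YtCu = Y.T * C[u]                  # Y^T C_u, shape (factors, items)
            X[u] = np.linalg.solve(YtCu @ Y + ridge, YtCu @ P[u])
        # User factors fixed: weighted ridge regression for each item
        for i in range(n_items):
            XtCi = X.T * C[:, i]               # X^T C_i, shape (factors, users)
            Y[i] = np.linalg.solve(XtCi @ X + ridge, XtCi @ P[:, i])
    return X, Y


# Toy check: two user groups with disjoint tastes; user 0 has not played item 2
R = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)
X, Y = implicit_als(R, factors=2)
scores = X @ Y.T
print(f"user 0 -> item 2: {scores[0, 2]:.2f}, item 3: {scores[0, 3]:.2f}")
```

Item 2 should score well above item 3 for user 0, because users with the same history played it. Each factor row is a plain vector that can be inspected, which is the transparency argument above.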

In [None]:
import sys
from pathlib import Path

import numpy as np
import pandas as pd

sys.path.insert(0, str(Path.cwd().parent))

from src.evaluation.metrics import evaluate_recommender
from src.evaluation.validation import (
    cold_start_split,
    leave_one_out_split,
    popularity_baseline,
)
from src.models.recommender import (
    CollaborativeFilter,
    ContentBasedFilter,
    HybridRecommender,
)
from src.processing.features import build_game_features, build_interaction_matrix

PROCESSED_DIR = Path("../data/processed")

In [None]:
# Load data
games = pd.read_json(PROCESSED_DIR / "games.json", lines=True)
user_games = pd.read_csv(PROCESSED_DIR / "user_games.csv")

print(f"Games: {len(games):,} | User-game pairs: {len(user_games):,}")

## Build Interaction Matrix & Feature Vectors

In [None]:
# Interaction matrix (implicit feedback: log1p(playtime))
interaction_matrix, user_ids, app_ids = build_interaction_matrix(user_games, min_playtime=0)
print(f"Interaction matrix: {interaction_matrix.shape}")
print(f"Sparsity: {1 - interaction_matrix.nnz / (interaction_matrix.shape[0] * interaction_matrix.shape[1]):.4%}")
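To make the log1p transform and the sparsity figure concrete, here is a hypothetical dense toy version of the matrix the comment above describes. The real `build_interaction_matrix` returns a sparse matrix whose construction is not shown, so the names below are assumptions. Note that log1p(0) = 0. With `min_playtime=0`, a zero-playtime pair is therefore stored as 0, which a dense matrix cannot distinguish from no interaction.

```python
import numpy as np


def toy_interaction_matrix(pairs, min_playtime=0):
    """Hypothetical dense stand-in for build_interaction_matrix.

    pairs: iterable of (user_id, app_id, playtime) tuples.
    Returns a users x items array of log1p(playtime) plus the row/column id orders.
    """
    kept = [(u, a, t) for u, a, t in pairs if t >= min_playtime]
    user_ids = sorted({u for u, _, _ in kept})
    app_ids = sorted({a for _, a, _ in kept})
    row = {u: i for i, u in enumerate(user_ids)}
    col = {a: j for j, a in enumerate(app_ids)}
    M = np.zeros((len(user_ids), len(app_ids)))
    for u, a, t in kept:
        M[row[u], col[a]] = np.log1p(t)   # dampens heavy-tailed playtime
    return M, user_ids, app_ids


def sparsity(M):
    """Share of user-item cells with no recorded (non-zero) interaction."""
    return 1 - np.count_nonzero(M) / M.size


M, user_ids, app_ids = toy_interaction_matrix([("a", 10, 9), ("a", 20, 0), ("b", 20, 99)])
print(M.shape, sparsity(M))
```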

In [None]:
# Content-based feature vectors
feature_df, feature_meta = build_game_features(games)

# Map each app_id to its column index in the interaction matrix
app_id_to_idx = {aid: i for i, aid in enumerate(app_ids)}

# Reindex features to the interaction matrix's item order so that row i of
# feature_matrix describes item i. Filtering and sorting alone would shift rows
# whenever a game has no features; reindexing keeps an all-zero row instead.
feature_df_aligned = (
    feature_df.drop_duplicates("app_id")
    .set_index("app_id")
    .reindex(list(app_ids))
)
numeric_cols = list(feature_df_aligned.columns)
feature_matrix = feature_df_aligned[numeric_cols].fillna(0).to_numpy(dtype=float)
print(f"Feature matrix: {feature_matrix.shape}")
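Here is a sketch of cosine-similarity scoring for the content-based component. It assumes a user is profiled by the games they played. Each feature row is L2-normalized, and a candidate's score is its mean cosine similarity to those games. The function names and the mean aggregation are assumptions, and `ContentBasedFilter`'s actual scoring is not shown here. Because the score needs only feature vectors, it is defined even for a game nobody has played, which is what makes it useful for cold start.

```python
import numpy as np


def l2_normalize_rows(F):
    """Scale each row to unit length; featureless (all-zero) rows stay zero."""
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0   # avoid division by zero; similarity of a zero row is 0
    return F / norms


def content_scores(F, played_items):
    """Mean cosine similarity of every item to the items in played_items."""
    Fn = l2_normalize_rows(F)
    sims = Fn @ Fn[played_items].T    # shape: items x len(played_items)
    return sims.mean(axis=1)


F = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 0.0]])
print(content_scores(F, [0]).round(3))
```

Scoring against only the played rows avoids materializing the full items × items similarity matrix.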

## Train/Test Split

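Using leave-one-out evaluation: for each user, one game they have played is removed from the training matrix, and the model is scored on whether it ranks that game among its top K recommendations.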
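Here is a sketch of the leave-one-out procedure on a small dense matrix. Which item `leave_one_out_split` actually holds out (random, most recent, most played) is not shown. The random choice, the seed, the two-interaction minimum, and the `user_idx` key are therefore assumptions. Only the `test_items` key mirrors how `test_data` is used later in this notebook.

```python
import numpy as np


def leave_one_out(R, seed=0):
    """Hypothetical leave-one-out split on a dense users x items matrix.

    For each user with at least two interactions, one interacted item is
    zeroed out of the training copy and recorded as that user's test item.
    """
    rng = np.random.default_rng(seed)
    train = np.array(R, dtype=float, copy=True)
    test = []
    for u in range(train.shape[0]):
        items = np.flatnonzero(train[u])
        if len(items) < 2:          # keep single-interaction users in training only
            continue
        held_out = int(rng.choice(items))
        train[u, held_out] = 0.0
        test.append({"user_idx": u, "test_items": [held_out]})
    return train, test


R = np.array([[1, 2, 0], [0, 3, 0], [4, 5, 6]])
train, test = leave_one_out(R)
print(test)
```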

In [None]:
train_matrix, test_data = leave_one_out_split(interaction_matrix)
print(f"Train nnz: {train_matrix.nnz:,} | Test users: {len(test_data):,}")

# Cold-start split: bucket test cases by how many interactions the held-out item has
warm_test, cold_test = cold_start_split(test_data, interaction_matrix, threshold=100)
print(f"Warm test: {len(warm_test):,} | Cold test: {len(cold_test):,}")
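The warm and cold buckets are defined by the held-out item's interaction count. The result headings further down use ≥100 and <100. Here is a hypothetical sketch of that bucketing, assuming each test case holds out a single item as in leave-one-out. `cold_start_split` may count interactions differently.

```python
import numpy as np


def split_by_item_count(test, R, threshold=100):
    """Hypothetical warm/cold bucketing of leave-one-out test cases.

    A test case is warm if its held-out item has at least `threshold`
    non-zero entries in R (users x items), otherwise cold.
    """
    item_counts = np.count_nonzero(np.asarray(R), axis=0)
    warm = [e for e in test if item_counts[e["test_items"][0]] >= threshold]
    cold = [e for e in test if item_counts[e["test_items"][0]] < threshold]
    return warm, cold


R = np.array([[1, 1], [1, 0], [1, 0]])            # item 0: 3 users, item 1: 1 user
test = [{"test_items": [0]}, {"test_items": [1]}]
warm, cold = split_by_item_count(test, R, threshold=2)
print(len(warm), len(cold))
```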

## Train Models

In [None]:
# Collaborative filtering
cf = CollaborativeFilter(factors=64, regularization=0.01, iterations=15)
cf.fit(train_matrix)

# Content-based
cb = ContentBasedFilter(feature_matrix)

# Hybrid
hybrid = HybridRecommender(cf, cb, interaction_threshold=100)
# Use training counts so held-out interactions do not leak into the blend weights
hybrid.set_interaction_counts(train_matrix)

print("All models trained.")

## Evaluation

### Standard Metrics: P@K, Recall@K, NDCG@K

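We do **not** report accuracy. With implicit feedback there are no observed negatives, because an unplayed game may simply be one the user has never seen. Accuracy over the catalogue is also dominated by the vast number of unplayed items, so it says nothing about ranking quality. We use ranking metrics over the top K recommendations instead.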
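Here are minimal reference versions of the three ranking metrics with binary relevance. They are named differently from the imported `precision_at_k` and `ndcg_at_k` so that they do not shadow them. The library's exact conventions, such as how the NDCG ideal is computed, are assumptions. Under leave-one-out each user has exactly one relevant item. P@10 is therefore at most 0.1, Recall@K equals the hit rate, and NDCG@K reduces to 1/log2(rank + 1) for a hit at 1-based position rank.

```python
import math


def precision_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = list(recommended)[:k]
    return sum(item in relevant for item in top_k) / k


def recall_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    top_k = list(recommended)[:k]
    return sum(item in relevant for item in top_k) / len(relevant)


def ndcg_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    top_k = list(recommended)[:k]
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(top_k) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


# One held-out game (id 3) ranked second
recs, held_out = [5, 3, 9], {3}
print(precision_k(recs, held_out, 3), recall_k(recs, held_out, 3), ndcg_k(recs, held_out, 3))
```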

In [None]:
# Build revenue map (item index -> estimated revenue) for revenue-weighted metrics
revenue_map = {}
if "estimated_revenue" in games.columns:
    rev = games.loc[games["app_id"].isin(app_ids), ["app_id", "estimated_revenue"]]
    revenue_map = {
        app_id_to_idx[aid]: float(r)
        for aid, r in zip(rev["app_id"], rev["estimated_revenue"].fillna(0))
    }

# Evaluate all test users
print("=== All Users ===")
all_results = evaluate_recommender(hybrid, test_data, item_revenues=revenue_map, k_values=[5, 10, 20])
for metric, value in sorted(all_results.items()):
    print(f"  {metric}: {value:.4f}")
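The Key Takeaways section refers to a revenue-weighted hit rate. How `evaluate_recommender` weights revenue is not shown. One natural definition, sketched here as an assumption, is the share of held-out revenue that lands in users' top-K lists. The `recommendations` dict and the `user_idx` key are hypothetical.

```python
def revenue_weighted_hit_rate(recommendations, test, item_revenues, k=10):
    """Hypothetical revenue-weighted hit rate.

    recommendations: dict user_idx -> ranked list of item indices.
    Each held-out item contributes its revenue if it appears in that user's
    top-k; the result is captured revenue divided by total held-out revenue.
    """
    captured = total = 0.0
    for e in test:
        top_k = set(recommendations.get(e["user_idx"], [])[:k])
        for item in e["test_items"]:
            rev = float(item_revenues.get(item, 0.0))
            total += rev
            if item in top_k:
                captured += rev
    return captured / total if total > 0 else 0.0


recs = {0: [1, 2], 1: [3]}
test = [{"user_idx": 0, "test_items": [2]}, {"user_idx": 1, "test_items": [4]}]
print(revenue_weighted_hit_rate(recs, test, {2: 30.0, 4: 10.0}))
```

Unlike a plain hit rate, a hit on a high-revenue game counts for more than a hit on a free title.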

In [None]:
# Warm vs. cold-start comparison
print("\n=== Warm Items (≥100 interactions) ===")
warm_results = evaluate_recommender(hybrid, warm_test, item_revenues=revenue_map, k_values=[5, 10, 20])
for metric, value in sorted(warm_results.items()):
    print(f"  {metric}: {value:.4f}")

print("\n=== Cold-Start Items (<100 interactions) ===")
cold_results = evaluate_recommender(hybrid, cold_test, item_revenues=revenue_map, k_values=[5, 10, 20])
for metric, value in sorted(cold_results.items()):
    print(f"  {metric}: {value:.4f}")

### Popularity Baseline Comparison

Any useful recommender should beat a simple popularity baseline, which recommends the same most-interacted games to every user.
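Here is a sketch of a per-user popularity baseline that skips games already in the user's training history. The shared `pop_items` list evaluated in this section does not do this. If `HybridRecommender` excludes training items (not confirmed here), the shared list is slightly handicapped, because some of its top slots go to games the user has already played. The function name is hypothetical.

```python
import numpy as np


def personalized_popularity(R_train, user_idx, n=20):
    """Most-interacted items overall, skipping items the user already has in training."""
    R_train = np.asarray(R_train)
    counts = np.count_nonzero(R_train, axis=0)
    order = np.argsort(-counts, kind="stable")   # most popular first, ties by index
    seen = set(np.flatnonzero(R_train[user_idx]).tolist())
    return [int(i) for i in order if int(i) not in seen][:n]


R_train = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
])
print(personalized_popularity(R_train, user_idx=0, n=2))
```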

In [None]:
# Count popularity on the training matrix only, so held-out items do not leak in
pop_items = popularity_baseline(train_matrix, n=20)
print(f"Popularity baseline top-20 items: {pop_items}")

# Manual baseline evaluation
from src.evaluation.metrics import precision_at_k, ndcg_at_k

baseline_p10 = np.mean([
    precision_at_k(pop_items, set(e["test_items"]), 10)
    for e in test_data
])
baseline_ndcg10 = np.mean([
    ndcg_at_k(pop_items, set(e["test_items"]), 10)
    for e in test_data
])
print(f"\nPopularity baseline — P@10: {baseline_p10:.4f}, NDCG@10: {baseline_ndcg10:.4f}")
print(f"Hybrid model      — P@10: {all_results.get('precision@10', 0):.4f}, NDCG@10: {all_results.get('ndcg@10', 0):.4f}")

## Key Takeaways

*To be filled after running with real data:*

- For games with ≥100 interactions, the hybrid achieves P@10 of X
- For cold-start games (<100 interactions), content-based achieves P@10 of Y, vs. Z for popularity baseline
- Revenue-weighted hit rate shows the model surfaces high-value games, not just popular free titles