# Music Recommendation System — Evaluation

This notebook evaluates the three recommendation models:
- **SVD** (Collaborative Filtering) — learns user preferences from listening history
- **KNN** (Content-Based Filtering) — finds sonically similar songs via audio features
- **Hybrid** (KNN → SVD) — KNN generates candidates, SVD ranks them

We measure each model using ranking metrics (Precision@k, Recall@k, NDCG@k) and generate visualizations of the training process and feature space.
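
To make the Hybrid (KNN → SVD) flow concrete, here is a minimal sketch on toy data. It assumes cosine similarity for the KNN step and a plain dot product of latent factors for the SVD step. The function names `knn_candidates` and `svd_rank`, and all the numbers, are hypothetical and are not taken from the `model` package.

```python
import numpy as np

def knn_candidates(features, seed_idx, n_candidates):
    """Return indices of the songs most similar to the seed song (cosine similarity)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[seed_idx]
    sims[seed_idx] = -np.inf  # never return the seed itself
    return np.argsort(-sims)[:n_candidates]

def svd_rank(user_vec, item_factors, candidates, top_n):
    """Order the candidates by predicted preference (user vector dot item factors)."""
    scores = item_factors[candidates] @ user_vec
    order = np.argsort(-scores)[:top_n]
    return candidates[order]

# Toy data (illustrative only): 6 songs, 3 audio features, 2 latent factors.
features = np.array([
    [0.90, 0.10, 0.20],
    [0.80, 0.20, 0.10],
    [0.10, 0.90, 0.70],
    [0.85, 0.15, 0.25],
    [0.20, 0.80, 0.90],
    [0.70, 0.30, 0.30],
])
item_factors = np.array([
    [0.5, 0.1],
    [0.1, 0.9],
    [0.3, 0.3],
    [0.9, 0.2],
    [0.2, 0.2],
    [0.4, 0.8],
])
user_vec = np.array([1.0, 0.0])  # this toy user only responds to factor 0

# Step 1: KNN proposes songs that sound like song 0.
candidates = knn_candidates(features, seed_idx=0, n_candidates=3)
# Step 2: SVD re-orders those candidates by the user's predicted preference.
recommended = svd_rank(user_vec, item_factors, candidates, top_n=2)
print(candidates, recommended)
```

The split keeps the two concerns separate: the content-based step decides what is eligible, and the collaborative step decides the order within that set.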

## 1. Setup

In [None]:
import sys
from pathlib import Path

# Assumes the notebook is run from a directory one level below the project root
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

from model.evaluation import (
    run_full_evaluation,
    plot_training_curves,
    plot_feature_distributions,
    plot_correlation_heatmap,
    plot_tsne_embeddings,
    plot_model_comparison,
)

import matplotlib.pyplot as plt
%matplotlib inline

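data_dir = project_root / "data"
plots_dir = project_root / "plots"
# Make sure the output directory exists before any plot is saved
plots_dir.mkdir(parents=True, exist_ok=True)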
print(f"Project root: {project_root}")
print(f"Audio features exist: {(data_dir / 'processed' / 'audio_features.csv').exists()}")
print(f"SVD model exists: {(data_dir / 'models' / 'svd_model.npz').exists()}")

## 2. Run Full Evaluation Pipeline

This trains SVD from scratch for 20 epochs, loads KNN and Hybrid only if audio features are available, and computes Precision@k, Recall@k, and NDCG@k at k = 5, 10, and 20 for each model across all 11 users.

In [None]:
results = run_full_evaluation(data_dir, k_values=[5, 10, 20], svd_epochs=20)
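
For reference, the three ranking metrics can be sketched as follows. This assumes binary relevance (a song is either in the user's held-out set or not) and divides precision by `k`. The exact definitions inside `model.evaluation` may differ, so treat this as an illustration and not as the pipeline's implementation.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ordering (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: hits at positions 1 and 3, one relevant song never recommended.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "z"}

for k in (1, 3, 5):
    print(
        k,
        round(precision_at_k(recommended, relevant, k), 4),
        round(recall_at_k(recommended, relevant, k), 4),
        round(ndcg_at_k(recommended, relevant, k), 4),
    )
```

Precision and recall only count hits, while NDCG also rewards placing those hits near the top of the list, which is why the three can rank models differently.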

## 3. SVD Training Curves

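How train and validation MSE evolve across 20 epochs of SGD. We expect both to decrease and then plateau. A small, stable gap between the two curves suggests the model generalizes; a validation curve that flattens or rises while train MSE keeps falling signals overfitting.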

In [None]:
from IPython.display import Image, display

plot_training_curves(results["svd_history"], save_path=plots_dir / "svd_training_curves.png")

# Show the saved figure inline
display(Image(filename=str(plots_dir / "svd_training_curves.png")))
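
As a rough picture of where such curves come from, the sketch below trains a small matrix factorization with SGD on synthetic data and records train and validation MSE after every epoch. It is an assumption-laden stand-in: the function `train_mf_sgd`, the history keys `train_mse` and `val_mse`, and the hyperparameters are hypothetical, and the project's SVD model may add bias terms or use different settings.

```python
import numpy as np

def train_mf_sgd(train, val, n_users, n_items, n_factors=4,
                 epochs=20, lr=0.05, reg=0.02, seed=0):
    """Fit user/item factors with SGD and track MSE per epoch."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))  # item factors
    history = {"train_mse": [], "val_mse": []}

    def mse(triples):
        return float(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in triples]))

    for _ in range(epochs):
        # Visit the training interactions in a fresh random order each epoch.
        for idx in rng.permutation(len(train)):
            u, i, r = train[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()  # use the pre-update user vector for the item step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
        history["train_mse"].append(mse(train))
        history["val_mse"].append(mse(val))
    return history

# Synthetic interactions generated from a known rank-2 structure.
rng = np.random.default_rng(42)
n_users, n_items = 10, 30
true_P = rng.normal(size=(n_users, 2))
true_Q = rng.normal(size=(n_items, 2))
triples = [(u, i, float(true_P[u] @ true_Q[i]))
           for u in range(n_users) for i in range(n_items)]

# 80/20 random split into train and validation interactions.
order = rng.permutation(len(triples))
split = int(0.8 * len(triples))
train = [triples[j] for j in order[:split]]
val = [triples[j] for j in order[split:]]

history = train_mf_sgd(train, val, n_users, n_items)
print(history["train_mse"][0], history["train_mse"][-1])
print(history["val_mse"][0], history["val_mse"][-1])
```

Recording both errors at the end of each epoch is what makes the gap visible: the training error is measured on data the model has updated on, the validation error on data it has not.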

## 4. Audio Feature Analysis

Visualizing the 13 Spotify audio features: distributions, correlations, and a t-SNE embedding of the song space.

**Requires `audio_features.csv`.** If the file has not been generated yet, each cell in this section prints a message and skips its plot.

In [None]:
if results["features_df"] is not None:
    plot_feature_distributions(results["features_df"], save_path=plots_dir / "audio_feature_distributions.png")
    # A bare expression inside an if-block is not auto-rendered, so display it explicitly
    display(Image(filename=str(plots_dir / "audio_feature_distributions.png")))
else:
    print("audio_features.csv not found. Run: python scripts/match_audio_features.py")

In [None]:
if results["features_df"] is not None:
    plot_correlation_heatmap(results["features_df"], save_path=plots_dir / "feature_correlation_heatmap.png")
    display(Image(filename=str(plots_dir / "feature_correlation_heatmap.png")))
else:
    print("Skipped (no audio features)")

In [None]:
if results["X_normalized"] is not None:
    plot_tsne_embeddings(
        results["X_normalized"], results["features_df"],
        save_path=plots_dir / "tsne_song_embeddings.png",
        sample_size=5000,
    )
    display(Image(filename=str(plots_dir / "tsne_song_embeddings.png")))
else:
    print("Skipped (no audio features)")
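
Audio features live on very different scales (tempo in BPM, loudness in dB, energy between 0 and 1), so distance-based methods such as KNN and t-SNE usually need a normalized matrix. The sketch below assumes z-score standardization and uses synthetic values for three features. How `X_normalized` is actually produced in this project is not shown here, so the `standardize` helper is hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each column: zero mean, unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - mean) / std

# Synthetic stand-ins for three audio features (values are made up).
rng = np.random.default_rng(0)
energy = rng.uniform(0, 1, 200)
loudness = -60 + 55 * energy + rng.normal(0, 3, 200)  # built to track energy
tempo = rng.uniform(60, 200, 200)                      # independent of both

X = np.column_stack([energy, loudness, tempo])
X_norm = standardize(X)

# Pearson correlation between feature columns, the quantity a heatmap visualizes.
corr = np.corrcoef(X_norm, rowvar=False)
print(np.round(corr, 2))
```

Without the rescaling step, tempo would dominate any Euclidean distance simply because its raw values are two orders of magnitude larger than energy.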

## 5. Model Comparison

Side-by-side comparison of Precision@k, Recall@k, and NDCG@k across all available models.

In [None]:
plot_model_comparison(results, save_path=plots_dir / "model_comparison.png")
Image(filename=str(plots_dir / "model_comparison.png"))

## 6. Per-User Breakdown

Detailed metrics for each of the 11 users.

In [None]:
import pandas as pd

rows = []
for user_name, user_data in sorted(results["per_user"].items()):
    for model_name, metrics in user_data.items():
        row = {"user": user_name, "model": model_name.upper()}
        row.update(metrics)
        rows.append(row)

if rows:
    df = pd.DataFrame(rows)
    # Round every metric column to 4 decimal places
    numeric_cols = [c for c in df.columns if c not in ("user", "model")]
    df[numeric_cols] = df[numeric_cols].round(4)
    display(df)
else:
    print("No per-user results available.")