# Movie Recommender System Evaluation (Mindminer)

This notebook computes evaluation metrics suited to a **content-based** recommender system:

- **Precision@K** and **Recall@K**: precision and recall over the top K recommendations (a movie counts as relevant if it shares enough tags with the query movie)
- **MRR** (Mean Reciprocal Rank): the reciprocal rank of the first relevant item, averaged over all queries
- **NDCG@K** (Normalized Discounted Cumulative Gain): ranking quality with graded relevance
- **Catalog Coverage**: the percentage of movies that appear at least once in the recommendation lists
- **Intra-list Diversity**: how dissimilar the movies within each recommendation list are
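The binary ranking metrics above fit in a few lines of plain Python. This is a minimal sketch using common textbook definitions, not the code in `src.evaluation.metrics`. The helper names (`precision_at_k_sketch`, `recall_at_k_sketch`, `reciprocal_rank_sketch`) are hypothetical, and the project's functions may handle edge cases, such as an empty relevant set, differently.

```python
def precision_at_k_sketch(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant_set)
    return hits / k


def recall_at_k_sketch(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumption: recall is 0 when no movie is relevant
    hits = sum(1 for item in recommended[:k] if item in relevant_set)
    return hits / len(relevant_set)


def reciprocal_rank_sketch(recommended, relevant):
    """1 / (1-based rank of the first relevant item), or 0.0 if none is relevant."""
    relevant_set = set(relevant)
    for rank, item in enumerate(recommended, start=1):
        if item in relevant_set:
            return 1.0 / rank
    return 0.0


# Toy query: 5 recommendations; 3 relevant movies overall (42 was not recommended)
recs = [10, 7, 3, 25, 8]
rel = [7, 8, 42]
print(precision_at_k_sketch(recs, rel, k=5))  # 2 hits / 5 = 0.4
print(recall_at_k_sketch(recs, rel, k=5))     # 2 hits / 3 relevant, about 0.667
print(reciprocal_rank_sketch(recs, rel))      # first hit (7) at rank 2 -> 0.5
```

MRR is the mean of the per-query reciprocal rank over all evaluated query movies.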

In [None]:
import sys
from pathlib import Path
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.stem.porter import PorterStemmer

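# Project root: works when run from the repo root or from any folder under src/
PROJECT_ROOT = Path(".").resolve()
if "src" in PROJECT_ROOT.parts:
    # Climb to the directory that contains src/; checking path components avoids
    # false matches on folder names that merely contain "src" (e.g. "websrc")
    PROJECT_ROOT = Path(*PROJECT_ROOT.parts[:PROJECT_ROOT.parts.index("src")])
sys.path.insert(0, str(PROJECT_ROOT))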

from src.evaluation.metrics import (
    precision_at_k,
    recall_at_k,
    mean_reciprocal_rank,
    ndcg_at_k,
    catalog_coverage,
    average_diversity,
    relevance_by_tag_overlap,
    relevance_score_tag_overlap,
)

DATA_PATH = PROJECT_ROOT / "data" / "processed" / "movies_merge.csv"
print("Project root:", PROJECT_ROOT)
print("Data exists:", DATA_PATH.exists())

## Load Data and Build Model (same pipeline as training)

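We load the data, apply Porter stemming to the `tags` column, turn each movie's tags into a bag-of-words count vector with `CountVectorizer` (top 5,000 terms, English stop words removed), and compute the pairwise cosine similarity between all movie vectors.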
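Here is a dependency-free sketch of what `CountVectorizer` followed by `cosine_similarity` computes. Each tag string becomes a vector of token counts, and two movies are compared by the cosine of the angle between their vectors. The sketch deliberately skips stemming, stop-word removal, the 5,000-term cap, and scikit-learn's default token pattern (which, among other things, drops single-character tokens), so its scores will not match the real pipeline exactly. The tag strings are made up.

```python
import math
from collections import Counter


def count_vector(text):
    """Bag-of-words: map each lowercased, whitespace-separated token to its count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag string is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical, already-stemmed tag strings
tags = {
    "Movie A": "space alien war hero",
    "Movie B": "space alien crew",
    "Movie C": "romanc pari love",
}
vectors = {title: count_vector(t) for title, t in tags.items()}

query = "Movie A"
scores = {
    title: cosine(vectors[query], vec)
    for title, vec in vectors.items()
    if title != query  # exclude the query itself, as get_top_k_indices does
}
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title}: {score:.3f}")  # Movie B: 0.577, then Movie C: 0.000
```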

In [None]:
df = pd.read_csv(DATA_PATH)
print("Shape:", df.shape)
print("Columns:", list(df.columns))
df.head()

In [None]:
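ps = PorterStemmer()

def stem(text):
    """Porter-stem each whitespace-separated token; missing tags become an empty string."""
    if not isinstance(text, str):  # pandas reads empty cells as NaN (a float), which has no .split()
        return ""
    return " ".join(ps.stem(w) for w in text.split())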

df = df.copy()
df["tags"] = df["tags"].apply(stem)
cv = CountVectorizer(max_features=5000, stop_words="english")
vectors = cv.fit_transform(df["tags"]).toarray()
similarity = cosine_similarity(vectors)
print("Similarity matrix shape:", similarity.shape)

## Defining "Relevant" and the Evaluation Loop

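For a content-based system without user ratings, we define **relevance** by **tag overlap**: a movie is relevant if it shares at least `min_common` tags with the query movie (`MIN_COMMON_TAGS` below). Evaluation runs on a random sample of movies (or on all of them), and Precision@K, Recall@K, MRR, and NDCG@K are computed for each query movie. Precision, recall, and MRR use this binary definition; NDCG@K instead uses a graded score, the Jaccard similarity of the two movies' tags.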
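Both relevance notions can be sketched as set operations on tags. The sketch assumes tags are whitespace-separated tokens compared as sets, so duplicates are ignored. The real `relevance_by_tag_overlap` and `relevance_score_tag_overlap` in `src.evaluation.metrics` may tokenize differently, so `shares_min_tags` and `jaccard_tags` below are hypothetical stand-ins.

```python
def tag_set(tags):
    """Distinct whitespace-separated tags; non-strings (e.g. NaN) give an empty set."""
    return set(tags.split()) if isinstance(tags, str) else set()


def shares_min_tags(query_tags, candidate_tags, min_common=3):
    """Binary relevance: True if the movies share at least `min_common` distinct tags."""
    return len(tag_set(query_tags) & tag_set(candidate_tags)) >= min_common


def jaccard_tags(query_tags, candidate_tags):
    """Graded relevance in [0, 1]: |intersection| / |union| of the two tag sets."""
    q, c = tag_set(query_tags), tag_set(candidate_tags)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


query = "space alien war hero ship"
candidate = "space alien war crew"
print(shares_min_tags(query, candidate, min_common=3))  # 3 shared tags -> True
print(jaccard_tags(query, candidate))                   # 3 shared / 6 total -> 0.5
```

The threshold makes Precision@K and Recall@K all-or-nothing per movie, whereas NDCG@K can credit a candidate with a Jaccard score of 0.5 more than one scoring 0.2.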

In [None]:
TOP_K = 5
MIN_COMMON_TAGS = 3   # Minimum shared tags for "relevant"
SAMPLE_SIZE = 500     # Number of movies to evaluate (None = all)
np.random.seed(42)
n_movies = len(df)
eval_indices = np.random.choice(n_movies, size=min(SAMPLE_SIZE, n_movies), replace=False) if SAMPLE_SIZE else np.arange(n_movies)
print(f"Evaluating on {len(eval_indices)} movies, top_k={TOP_K}, min_common_tags={MIN_COMMON_TAGS}")

In [None]:
def get_top_k_indices(sim_row, query_idx, k=TOP_K):
    """Return top-k indices (excluding the query movie itself)."""
    ranked = np.argsort(sim_row)[::-1]
    ranked = ranked[ranked != query_idx]
    return ranked[:k].tolist()

# Plain list of (stemmed) tag strings; avoids a slow df.iloc lookup for every pair
TAGS = df["tags"].tolist()

def get_relevant_indices(query_idx, min_common=MIN_COMMON_TAGS):
    """Indices of movies sharing at least `min_common` tags with the query (query excluded)."""
    query_tags = TAGS[query_idx]
    return [
        i for i, cand_tags in enumerate(TAGS)
        if i != query_idx and relevance_by_tag_overlap(query_tags, cand_tags, min_common=min_common)
    ]

def relevance_fn_for_ndcg(query_idx):
    """Relevance function for NDCG: score based on Jaccard similarity of tags."""
    query_tags = TAGS[query_idx]
    def fn(candidate_idx):
        return relevance_score_tag_overlap(query_tags, TAGS[candidate_idx])
    return fn

In [None]:
results = []
all_recommendations = []

for idx in eval_indices:
    sim_row = similarity[idx]
    rec = get_top_k_indices(sim_row, idx, k=TOP_K)
    relevant = get_relevant_indices(idx, min_common=MIN_COMMON_TAGS)
    all_recommendations.append(rec)

    # NDCG gets the full similarity-ordered ranking (query excluded), not just the top K, plus a graded relevance function
    rel_fn = relevance_fn_for_ndcg(idx)
    full_ranking = np.argsort(sim_row)[::-1]
    full_ranking = full_ranking[full_ranking != idx].tolist()
    ndcg = ndcg_at_k(full_ranking, rel_fn, k=TOP_K)

    results.append({
        "precision_at_k": precision_at_k(rec, relevant, k=TOP_K),
        "recall_at_k": recall_at_k(rec, relevant, k=TOP_K),
        "mrr": mean_reciprocal_rank(rec, relevant),
        "ndcg_at_k": ndcg,
        "n_relevant": len(relevant),
    })

results_df = pd.DataFrame(results)
results_df.describe()

## Evaluation Metrics Summary (mean over sample)

In [None]:
summary = {
    f"Precision@{TOP_K}": results_df["precision_at_k"].mean(),
    f"Recall@{TOP_K}": results_df["recall_at_k"].mean(),
    "MRR": results_df["mrr"].mean(),
    f"NDCG@{TOP_K}": results_df["ndcg_at_k"].mean(),
}
summary_series = pd.Series(summary)
print("Ranking and relevance metrics:")
display(summary_series.to_frame("Value"))

## Catalog Coverage and Diversity

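- **Catalog Coverage**: fraction of all movies that appear in at least one top-K recommendation list (K = `TOP_K`). Only the sampled queries produce lists, so at most `SAMPLE_SIZE × TOP_K` distinct movies (2,500 with the defaults) can be covered; on a sample, this metric is a lower bound on full-catalog coverage.
- **Average Intra-list Diversity**: mean, over all recommendation lists, of 1 minus the average pairwise cosine similarity between the movies in a list.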
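Both list-level metrics can be sketched in plain Python over a similarity matrix stored as nested lists. The sketch follows the definitions in the bullets above, but it is not the code of `catalog_coverage` or `average_diversity`. In particular, skipping lists with fewer than two items (which have no pairs) is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations


def catalog_coverage_sketch(recommendation_lists, catalog_size):
    """Fraction of the catalog that appears in at least one recommendation list."""
    recommended = {item for rec in recommendation_lists for item in rec}
    return len(recommended) / catalog_size if catalog_size else 0.0


def intra_list_diversity(similarity, rec):
    """1 - mean pairwise similarity among the items of one list (None if < 2 items)."""
    pairs = list(combinations(rec, 2))
    if not pairs:
        return None
    mean_sim = sum(similarity[i][j] for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim


def average_diversity_sketch(similarity, recommendation_lists):
    """Mean intra-list diversity over all lists that have at least two items."""
    scores = []
    for rec in recommendation_lists:
        d = intra_list_diversity(similarity, rec)
        if d is not None:  # assumption: single-item lists are skipped
            scores.append(d)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical 4-movie similarity matrix (symmetric, 1.0 on the diagonal)
sim = [
    [1.0, 0.8, 0.2, 0.0],
    [0.8, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
]
lists = [[1, 2], [0, 1]]  # two hypothetical top-2 lists
print(f"{catalog_coverage_sketch(lists, catalog_size=4):.2f}")  # movies 0, 1, 2 -> 0.75
print(f"{average_diversity_sketch(sim, lists):.2f}")            # mean(1 - 0.4, 1 - 0.8) -> 0.40
```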

In [None]:
coverage = catalog_coverage(all_recommendations, catalog_size=len(df))
diversity = average_diversity(similarity, all_recommendations)
print(f"Catalog Coverage: {coverage:.4f} ({coverage*100:.2f}%)")
print(f"Average Intra-list Diversity: {diversity:.4f}")

## Metrics Plots

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Main metrics
metrics_for_chart = {**summary, "Coverage": coverage, "Diversity": diversity}
axes[0].bar(metrics_for_chart.keys(), metrics_for_chart.values(), color=["#2ecc71", "#3498db", "#9b59b6", "#e74c3c", "#1abc9c", "#f39c12"])
axes[0].set_ylabel("Value")
axes[0].set_title("Recommender System Evaluation Metrics")
axes[0].tick_params(axis="x", rotation=25)

# Precision@K distribution
axes[1].hist(results_df["precision_at_k"], bins=15, edgecolor="black", alpha=0.7)
axes[1].set_xlabel(f"Precision@{TOP_K}")
axes[1].set_ylabel("Count")
axes[1].set_title(f"Precision@{TOP_K} Distribution over Sample")

plt.tight_layout()
plt.show()

## Metrics Summary Table

| Metric | Description |
|--------|-------------|
| **Precision@K** | Fraction of the top-K recommendations that are relevant (tag overlap ≥ `MIN_COMMON_TAGS`). |
| **Recall@K** | Fraction of all relevant movies that appear in the top-K recommendations. |
| **MRR** | Reciprocal rank of the first relevant movie in the top-K list, averaged over queries; higher is better. |
| **NDCG@K** | Ranking quality that accounts for the degree of relevance (Jaccard similarity of tags). |
| **Catalog Coverage** | Fraction of the catalog recommended at least once; higher means more variety across lists. |
| **Intra-list Diversity** | Dissimilarity among the movies within each recommendation list; higher means more diverse recommendations. |
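NDCG@K with graded relevance can be sketched as follows. The sketch assumes the common log2 discount, DCG@K = Σ rel_i / log2(i + 1) over ranks i = 1..K, normalized by the DCG of the ideal (relevance-sorted) ordering of all candidates. `ndcg_at_k` in `src.evaluation.metrics` may use a different gain (for example 2^rel - 1) or build the ideal ordering differently, and `ndcg_at_k_sketch` is a hypothetical name.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: sum of gain / log2(rank + 1) over the first k ranks."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k_sketch(ranking, relevance_fn, k):
    """NDCG@k for a ranked list, given a function mapping item -> graded relevance."""
    gains = [relevance_fn(item) for item in ranking]
    ideal = sorted(gains, reverse=True)  # best possible ordering of the same candidates
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical Jaccard scores of five candidates w.r.t. one query movie
jaccard = {"a": 0.1, "b": 0.5, "c": 0.0, "d": 0.3, "e": 0.2}
ranking = ["a", "b", "c", "d", "e"]  # order produced by cosine similarity
print(round(ndcg_at_k_sketch(ranking, jaccard.get, k=3), 3))  # about 0.526
```

Passing the full similarity-ordered list, as the evaluation loop does, lets the ideal ordering in this sketch draw on every candidate rather than only on the top K.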

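To change the evaluation sample size, set `SAMPLE_SIZE` in the parameters cell; use `None` to evaluate on all movies. Expect a much longer run in that case: `get_relevant_indices` compares each query against every movie, so the number of tag comparisons grows quadratically with the catalog size.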