# Recommendation System Sampling Experiments - Results Analysis

This notebook visualizes and analyzes the results of the sampling experiments.

## Overview

We analyze how different sampling strategies affect recommendation system performance across:
- **12 Datasets**: Amazon Health, Grocery, book-crossing, lastfm, ModCloth, pinterest, RateBeer, steam, yelp2022, jester, Behance, mind
- **3 Algorithms**: LightGCN, BPR, NeuMF
- **4 Sampling Strategies**: difficult, random, difficult_inverse (easiest), temporal
- **10 Sampling Rates**: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%
- **3 Metrics**: Precision@10, NDCG@10, MAP@10
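
The dimensions above define the full experiment grid, and enumerating it gives the number of rows a complete results file should contain. A minimal sketch follows; the dataset labels are shorthand taken from the list above and may not match the strings stored in the results CSV. Metrics are not a grid dimension here, on the assumption that each run reports all three.

```python
from itertools import product

# Dimension values copied from the overview list. Dataset labels are
# shorthand and may differ from the identifiers used in the results file.
datasets = ["Amazon Health", "Grocery", "book-crossing", "lastfm", "ModCloth",
            "pinterest", "RateBeer", "steam", "yelp2022", "jester", "Behance", "mind"]
algorithms = ["LightGCN", "BPR", "NeuMF"]
strategies = ["difficult", "random", "difficult_inverse", "temporal"]
sampling_rates = list(range(10, 101, 10))  # 10%, 20%, ..., 100%

# One run per (dataset, algorithm, strategy, sampling rate) combination
grid = list(product(datasets, algorithms, strategies, sampling_rates))
print(len(grid))  # 12 * 3 * 4 * 10 = 1440
```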

## Key Questions

1. Which sampling strategy minimizes performance loss with limited data?
2. At what sampling rate does performance plateau?
3. Are difficult ratings more informative than randomly sampled ratings?
4. Do results generalize across datasets and algorithms?
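
One way to approach question 3 is a paired comparison: for every (dataset, algorithm, sampling rate) configuration, subtract the `random` score from the `difficult` score. The sketch below assumes the column names used in this notebook (`dataset`, `algorithm`, `strategy`, `sampling_rate`, `ndcg`); the helper `paired_difference` and the toy rows are illustrative only, not real results.

```python
import pandas as pd

def paired_difference(results, a="difficult", b="random", metric="ndcg"):
    """Per-configuration difference metric(a) - metric(b) (hypothetical helper)."""
    wide = results.pivot_table(
        index=["dataset", "algorithm", "sampling_rate"],
        columns="strategy",
        values=metric,
    )
    # Keep only configurations where both strategies were run
    wide = wide.dropna(subset=[a, b])
    return (wide[a] - wide[b]).rename(f"{metric}_{a}_minus_{b}")

# Toy rows for illustration, not experiment output
toy = pd.DataFrame({
    "dataset": ["toy"] * 4,
    "algorithm": ["BPR"] * 4,
    "strategy": ["difficult", "random", "difficult", "random"],
    "sampling_rate": [50, 50, 100, 100],
    "ndcg": [0.08, 0.06, 0.10, 0.10],
})
diff = paired_difference(toy)
print(diff)
```

Positive values mean `difficult` scored higher in that configuration. At a sampling rate of 100% the difference should be zero if all strategies train on the same full data.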

# Section 1: Setup & Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Try importing plotly for interactive visualizations
try:
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    PLOTLY_AVAILABLE = True
    print("Plotly available: Interactive plots enabled")
except ImportError:
    PLOTLY_AVAILABLE = False
    print("Plotly not available: Skipping interactive plots")

# Configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Figure sizes
FIGSIZE_SINGLE = (10, 6)
FIGSIZE_TRIPLE = (15, 5)
FIGSIZE_GRID = (12, 10)

# Color scheme (consistent with run_experiments.py)
COLORS = {
    'difficult': '#E74C3C',           # Red
    'random': '#3498DB',              # Blue
    'difficult_inverse': '#2ECC71',   # Green (easiest)
    'temporal': '#F77F00'             # Orange
}

# Font sizes
TITLE_SIZE = 14
LABEL_SIZE = 12
TICK_SIZE = 10

# Directory paths
RESULTS_DIR = Path('results')
PLOTS_DIR = Path('plots')
PLOTS_DIR.mkdir(exist_ok=True)

print("Setup complete!")

In [None]:
# Load aggregated results
csv_path = RESULTS_DIR / 'all_results_summary.csv'

if not csv_path.exists():
    # Fail fast: every later cell depends on `df`
    raise FileNotFoundError(
        f"{csv_path} not found. "
        f"Please ensure experiments have been run and results are in {RESULTS_DIR}/"
    )

df = pd.read_csv(csv_path)
print(f"Loaded data from: {csv_path}")
print(f"Data shape: {df.shape[0]} rows × {df.shape[1]} columns")

In [None]:
# Data overview
print("=" * 80)
print("DATA OVERVIEW")
print("=" * 80)

print("\nColumn Names:")
print(df.columns.tolist())

print("\nData Types:")
print(df.dtypes)

print("\nFirst 5 rows:")
display(df.head())

print("\nLast 5 rows:")
display(df.tail())

print("\nBasic Statistics:")
display(df.describe())

In [None]:
# Data completeness analysis
print("=" * 80)
print("DATA COMPLETENESS ANALYSIS")
print("=" * 80)

print("\nUnique values per dimension:")
print(f"Datasets: {df['dataset'].nunique()} - {sorted(df['dataset'].unique())}")
print(f"Algorithms: {df['algorithm'].nunique()} - {sorted(df['algorithm'].unique())}")
print(f"Strategies: {df['strategy'].nunique()} - {sorted(df['strategy'].unique())}")
print(f"Sampling Rates: {df['sampling_rate'].nunique()} - {sorted(df['sampling_rate'].unique())}")

# Expected vs actual experiments
expected_datasets = 12
expected_algorithms = 3
expected_strategies = 4
expected_rates = 10

expected_total = expected_datasets * expected_algorithms * expected_strategies * expected_rates
actual_total = len(df)

print(f"\nExpected total experiments: {expected_total}")
print(f"Actual experiments: {actual_total}")
print(f"Completion: {actual_total/expected_total*100:.1f}%")
print(f"Missing: {expected_total - actual_total} experiments")

# Count by dataset-algorithm combination
print("\nExperiments per dataset-algorithm combination:")
pivot = df.groupby(['dataset', 'algorithm']).size().unstack(fill_value=0)
display(pivot)

# Missing combinations
print("\nMissing dataset-algorithm combinations:")
all_datasets = ['Amazon_Health_and_Personal_Care', 'Amazon_Grocery_and_Gourmet_Food', 
                'book-crossing', 'lastfm', 'ModCloth', 'pinterest', 'RateBeer', 
                'steam', 'yelp2022', 'jester', 'Behance', 'mind']
all_algorithms = ['LightGCN', 'BPR', 'NeuMF']

missing_combos = []
for dataset in all_datasets:
    for algorithm in all_algorithms:
        if len(df[(df['dataset'] == dataset) & (df['algorithm'] == algorithm)]) == 0:
            missing_combos.append(f"{dataset} - {algorithm}")

if missing_combos:
    print(f"Found {len(missing_combos)} missing combinations:")
    for combo in missing_combos[:10]:  # Show first 10
        print(f"  - {combo}")
    if len(missing_combos) > 10:
        print(f"  ... and {len(missing_combos) - 10} more")
else:
    print("All dataset-algorithm combinations present!")

In [None]:
# Summary statistics by metric
print("=" * 80)
print("SUMMARY STATISTICS BY METRIC")
print("=" * 80)

metrics = ['precision', 'ndcg', 'map']

for metric in metrics:
    if metric in df.columns:
        print(f"\n{metric.upper()}@10:")
        print(f"  Mean: {df[metric].mean():.4f}")
        print(f"  Std:  {df[metric].std():.4f}")
        print(f"  Min:  {df[metric].min():.4f}")
        print(f"  Max:  {df[metric].max():.4f}")
        
        # By strategy
        print(f"\n  By Strategy:")
        strategy_stats = df.groupby('strategy')[metric].agg(['mean', 'std', 'count'])
        display(strategy_stats)

# RPA statistics
print("\n" + "=" * 80)
print("RELATIVE PERFORMANCE ANALYSIS (RPA) STATISTICS")
print("=" * 80)

rpa_metrics = ['precision_rpa', 'ndcg_rpa', 'map_rpa']

for metric in rpa_metrics:
    if metric in df.columns:
        # Filter out 100% sampling (RPA = 0 by definition)
        df_filtered = df[df['sampling_rate'] != 100]
        
        print(f"\n{metric.upper()} (% change vs 100% baseline):")
        print(f"  Mean: {df_filtered[metric].mean():.2f}%")
        print(f"  Std:  {df_filtered[metric].std():.2f}%")
        print(f"  Min:  {df_filtered[metric].min():.2f}%")
        print(f"  Max:  {df_filtered[metric].max():.2f}%")

# Section 2: Individual Performance Plots

This section recreates the metric and RPA plots for each dataset-algorithm combination.
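
The RPA columns are read from the results file rather than computed in this notebook. As a rough sketch of what they presumably contain, the code below computes the percent change of a metric relative to the 100% sampling run of the same dataset, algorithm, and strategy. The formula is an assumption based on the "% change vs 100% baseline" labels used here, and `add_rpa` is a hypothetical helper, not the code in `run_experiments.py`.

```python
import pandas as pd

KEYS = ["dataset", "algorithm", "strategy"]

def add_rpa(results, metric="ndcg"):
    """Append '<metric>_rpa': % change relative to the 100% sampling run (assumed definition)."""
    baseline = (
        results.loc[results["sampling_rate"] == 100, KEYS + [metric]]
        .rename(columns={metric: "baseline"})
    )
    # Attach each row's own 100% baseline, then compute the relative change
    out = results.merge(baseline, on=KEYS, how="left")
    out[f"{metric}_rpa"] = (out[metric] - out["baseline"]) / out["baseline"] * 100.0
    return out.drop(columns="baseline")

# Toy rows for illustration, not experiment output
toy = pd.DataFrame({
    "dataset": ["toy", "toy"],
    "algorithm": ["BPR", "BPR"],
    "strategy": ["random", "random"],
    "sampling_rate": [50, 100],
    "ndcg": [0.08, 0.10],
})
toy_rpa = add_rpa(toy)
print(toy_rpa[["sampling_rate", "ndcg_rpa"]])
```

Under this definition a negative RPA is a loss relative to training on all data, and the 100% row is zero by construction.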

In [None]:
# Helper function: Filter data
def get_data(df, dataset=None, algorithm=None, strategy=None, sampling_rate=None):
    """
    Filter dataframe by criteria.
    
    Parameters:
    -----------
    df : pd.DataFrame
        Full dataset
    dataset : str, optional
        Filter by dataset name
    algorithm : str, optional
        Filter by algorithm
    strategy : str, optional
        Filter by sampling strategy
    sampling_rate : int, optional
        Filter by sampling rate
    
    Returns:
    --------
    pd.DataFrame : Filtered data
    """
    filtered = df.copy()
    if dataset:
        filtered = filtered[filtered['dataset'] == dataset]
    if algorithm:
        filtered = filtered[filtered['algorithm'] == algorithm]
    if strategy:
        filtered = filtered[filtered['strategy'] == strategy]
    if sampling_rate is not None:
        filtered = filtered[filtered['sampling_rate'] == sampling_rate]
    return filtered

In [None]:
# Helper function: Plot metrics for one dataset-algorithm
def plot_metric_comparison(df, dataset, algorithm, save=True):
    """
    Generate metric comparison plot (Precision, NDCG, MAP) for one dataset-algorithm.
    
    Parameters:
    -----------
    df : pd.DataFrame
        Full dataset
    dataset : str
        Dataset name
    algorithm : str
        Algorithm name
    save : bool, optional
        Whether to save the plot
    """
    # Filter data
    data = get_data(df, dataset=dataset, algorithm=algorithm)
    
    # Check if enough data
    if len(data) < 3:
        print(f"Insufficient data for {dataset} - {algorithm} (only {len(data)} rows)")
        return
    
    # Create figure
    fig, axes = plt.subplots(1, 3, figsize=FIGSIZE_TRIPLE)
    metrics = ['precision', 'ndcg', 'map']
    metric_labels = ['Precision@10', 'NDCG@10', 'MAP@10']
    
    for ax, metric, label in zip(axes, metrics, metric_labels):
        for strategy in data['strategy'].unique():
            strategy_data = data[data['strategy'] == strategy].sort_values('sampling_rate')
            
            # Only plot if data exists
            if len(strategy_data) > 0:
                ax.plot(strategy_data['sampling_rate'], strategy_data[metric],
                       marker='o', label=strategy, color=COLORS.get(strategy, 'gray'),
                       linewidth=2, markersize=6)
        
        ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
        ax.set_ylabel(label, fontsize=LABEL_SIZE)
        ax.set_title(f'{label}', fontsize=TITLE_SIZE)
        ax.legend(fontsize=TICK_SIZE)
        ax.grid(True, alpha=0.3)
        ax.tick_params(labelsize=TICK_SIZE)
    
    # Overall title
    fig.suptitle(f'{dataset} - {algorithm}', fontsize=TITLE_SIZE + 2, y=1.02)
    plt.tight_layout()
    
    # Save if requested
    if save:
        filename = PLOTS_DIR / f'{dataset}_{algorithm}_metrics.png'
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"Saved: {filename}")
    
    plt.show()

In [None]:
# Helper function: Plot RPA for one dataset-algorithm
def plot_rpa_comparison(df, dataset, algorithm, save=True):
    """
    Generate RPA comparison plot for one dataset-algorithm.
    
    Parameters:
    -----------
    df : pd.DataFrame
        Full dataset
    dataset : str
        Dataset name
    algorithm : str
        Algorithm name
    save : bool, optional
        Whether to save the plot
    """
    # Filter data (exclude 100% sampling for RPA plots)
    data = get_data(df, dataset=dataset, algorithm=algorithm)
    data = data[data['sampling_rate'] != 100]
    
    # Check if enough data
    if len(data) < 3:
        print(f"Insufficient data for {dataset} - {algorithm} RPA (only {len(data)} rows)")
        return
    
    # Create figure
    fig, axes = plt.subplots(1, 3, figsize=FIGSIZE_TRIPLE)
    rpa_metrics = ['precision_rpa', 'ndcg_rpa', 'map_rpa']
    metric_labels = ['Precision@10 RPA', 'NDCG@10 RPA', 'MAP@10 RPA']
    
    for ax, metric, label in zip(axes, rpa_metrics, metric_labels):
        for strategy in data['strategy'].unique():
            strategy_data = data[data['strategy'] == strategy].sort_values('sampling_rate')
            
            # Only plot if data exists
            if len(strategy_data) > 0:
                ax.plot(strategy_data['sampling_rate'], strategy_data[metric],
                       marker='o', label=strategy, color=COLORS.get(strategy, 'gray'),
                       linewidth=2, markersize=6)
        
        # Add horizontal line at 0%
        ax.axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
        
        ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
        ax.set_ylabel('% Change vs 100%', fontsize=LABEL_SIZE)
        ax.set_title(f'{label}', fontsize=TITLE_SIZE)
        ax.legend(fontsize=TICK_SIZE)
        ax.grid(True, alpha=0.3)
        ax.tick_params(labelsize=TICK_SIZE)
    
    # Overall title
    fig.suptitle(f'{dataset} - {algorithm} (RPA)', fontsize=TITLE_SIZE + 2, y=1.02)
    plt.tight_layout()
    
    # Save if requested
    if save:
        filename = PLOTS_DIR / f'{dataset}_{algorithm}_rpa.png'
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"Saved: {filename}")
    
    plt.show()

In [None]:
# Generate plots for all dataset-algorithm combinations with data
print("=" * 80)
print("GENERATING INDIVIDUAL PLOTS")
print("=" * 80)

combinations = df.groupby(['dataset', 'algorithm']).size().reset_index()[['dataset', 'algorithm']]

print(f"\nFound {len(combinations)} dataset-algorithm combinations with data\n")

for idx, row in combinations.iterrows():
    dataset = row['dataset']
    algorithm = row['algorithm']
    
    print(f"\n[{idx+1}/{len(combinations)}] Plotting: {dataset} - {algorithm}")
    print("-" * 80)
    
    # Metric plots
    print("Metrics plot:")
    plot_metric_comparison(df, dataset, algorithm, save=True)
    
    # RPA plots
    print("\nRPA plot:")
    plot_rpa_comparison(df, dataset, algorithm, save=True)
    
print("\n" + "=" * 80)
print("INDIVIDUAL PLOTS COMPLETE")
print("=" * 80)

# Section 3: Aggregated Analysis

This section combines results across datasets and algorithms to identify general trends.
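
The aggregations in this section group by strategy and sampling rate. As an alternative to a dict-of-lists aggregation, which yields two-level column names, pandas named aggregation produces flat columns. A minimal sketch, assuming the notebook's column names; `summarize_by_strategy` and the toy rows are illustrative.

```python
import pandas as pd

def summarize_by_strategy(results, metric="ndcg"):
    """Mean, std, and run count per (strategy, sampling_rate), with flat column names."""
    return (
        results.groupby(["strategy", "sampling_rate"])
        .agg(
            mean_value=(metric, "mean"),
            std_value=(metric, "std"),
            n_runs=(metric, "count"),
        )
        .reset_index()
    )

# Toy rows for illustration, not experiment output
toy = pd.DataFrame({
    "strategy": ["random", "random", "difficult", "difficult"],
    "sampling_rate": [50, 50, 50, 50],
    "ndcg": [0.06, 0.08, 0.07, 0.09],
})
summary = summarize_by_strategy(toy)
print(summary)
```

Reporting `n_runs` alongside the mean makes it visible when a group is averaged over fewer dataset-algorithm combinations than the others.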

In [None]:
# Average performance by strategy across all datasets
print("=" * 80)
print("AVERAGE PERFORMANCE BY STRATEGY")
print("=" * 80)

# Group by strategy and sampling_rate, calculate mean and std
strategy_avg = df.groupby(['strategy', 'sampling_rate']).agg({
    'ndcg': ['mean', 'std'],
    'precision': ['mean', 'std'],
    'map': ['mean', 'std']
}).reset_index()

# Plot average NDCG@10 by strategy
fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)

for strategy in df['strategy'].unique():
    strategy_data = strategy_avg[strategy_avg['strategy'] == strategy].sort_values('sampling_rate')
    
    if len(strategy_data) > 0:
        means = strategy_data['ndcg']['mean'].values
        stds = strategy_data['ndcg']['std'].values
        rates = strategy_data['sampling_rate'].values
        
        # Plot with error bars
        ax.plot(rates, means, marker='o', label=strategy, 
               color=COLORS.get(strategy, 'gray'), linewidth=2, markersize=6)
        ax.fill_between(rates, means - stds, means + stds, 
                       alpha=0.2, color=COLORS.get(strategy, 'gray'))

ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
ax.set_ylabel('Average NDCG@10', fontsize=LABEL_SIZE)
ax.set_title('Average NDCG@10 Across All Datasets', fontsize=TITLE_SIZE)
ax.legend(fontsize=TICK_SIZE)
ax.grid(True, alpha=0.3)
ax.tick_params(labelsize=TICK_SIZE)

plt.tight_layout()
plt.savefig(PLOTS_DIR / 'aggregated_ndcg_by_strategy.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nAverage NDCG@10 statistics by strategy:")
display(df.groupby('strategy')['ndcg'].describe())

In [None]:
# Heatmap: Best NDCG@10 at 50% sampling
print("=" * 80)
print("PERFORMANCE HEATMAP AT 50% SAMPLING")
print("=" * 80)

# Filter for 50% sampling rate
df_50 = df[df['sampling_rate'] == 50].copy()

if len(df_50) > 0:
    # Mean NDCG per dataset-algorithm-strategy at 50% sampling
    best_ndcg = df_50.groupby(['dataset', 'algorithm', 'strategy'])['ndcg'].mean().reset_index()
    
    # Dataset x algorithm matrix holding the best strategy's NDCG;
    # missing combinations stay NaN (blank cells) rather than a misleading 0
    pivot_best = best_ndcg.groupby(['dataset', 'algorithm'])['ndcg'].max().unstack()
    
    # Create heatmap
    fig, ax = plt.subplots(figsize=FIGSIZE_GRID)
    sns.heatmap(pivot_best, annot=True, fmt='.4f', cmap='YlOrRd', 
                cbar_kws={'label': 'Best NDCG@10'}, ax=ax)
    ax.set_title('Best NDCG@10 at 50% Sampling\n(across all strategies)', fontsize=TITLE_SIZE)
    ax.set_xlabel('Algorithm', fontsize=LABEL_SIZE)
    ax.set_ylabel('Dataset', fontsize=LABEL_SIZE)
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'heatmap_50pct_sampling.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\nBest NDCG@10 values at 50% sampling:")
    display(pivot_best)
else:
    print("No data available for 50% sampling rate")

In [None]:
# Strategy effectiveness comparison (Average RPA)
print("=" * 80)
print("STRATEGY EFFECTIVENESS COMPARISON")
print("=" * 80)

# Filter out 100% sampling
df_rpa = df[df['sampling_rate'] != 100].copy()

if len(df_rpa) > 0:
    # Key sampling rates for comparison
    key_rates = [30, 50, 70]
    available_rates = [r for r in key_rates if r in df_rpa['sampling_rate'].unique()]
    
    if available_rates:
        df_key = df_rpa[df_rpa['sampling_rate'].isin(available_rates)]
        
        # Calculate average RPA by strategy and sampling rate
        rpa_avg = df_key.groupby(['strategy', 'sampling_rate'])['ndcg_rpa'].mean().reset_index()
        
        # Bar chart
        fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)
        
        x = np.arange(len(available_rates))
        width = 0.2
        
        strategies = sorted(df['strategy'].unique())
        for i, strategy in enumerate(strategies):
            strategy_data = rpa_avg[rpa_avg['strategy'] == strategy]
            values = [strategy_data[strategy_data['sampling_rate'] == rate]['ndcg_rpa'].values[0] 
                     if len(strategy_data[strategy_data['sampling_rate'] == rate]) > 0 else 0
                     for rate in available_rates]
            
            ax.bar(x + i*width, values, width, label=strategy, 
                  color=COLORS.get(strategy, 'gray'))
        
        ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
        ax.set_ylabel('Average NDCG RPA (%)', fontsize=LABEL_SIZE)
        ax.set_title('Average Performance Loss by Strategy', fontsize=TITLE_SIZE)
        ax.set_xticks(x + width * (len(strategies)-1) / 2)
        ax.set_xticklabels(available_rates)
        ax.legend(fontsize=TICK_SIZE)
        ax.axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
        ax.grid(True, alpha=0.3, axis='y')
        
        plt.tight_layout()
        plt.savefig(PLOTS_DIR / 'strategy_rpa_comparison.png', dpi=300, bbox_inches='tight')
        plt.show()
        
        print("\nAverage NDCG RPA by strategy:")
        display(df_rpa.groupby('strategy')['ndcg_rpa'].describe())
    else:
        print(f"None of the key rates {key_rates} available in data")
else:
    print("No RPA data available (all sampling rates are 100%)")

In [None]:
# Aggregated RPA plot (similar to run_experiments.py)
print("=" * 80)
print("AGGREGATED RPA PLOT")
print("=" * 80)

if len(df_rpa) > 0:
    # Calculate average RPA across all datasets and algorithms
    aggregated_rpa = df_rpa.groupby(['strategy', 'sampling_rate']).agg({
        'ndcg_rpa': ['mean', 'std']
    }).reset_index()
    
    # Plot
    fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)
    
    for strategy in df['strategy'].unique():
        strategy_data = aggregated_rpa[aggregated_rpa['strategy'] == strategy].sort_values('sampling_rate')
        
        if len(strategy_data) > 0:
            rates = strategy_data['sampling_rate'].values
            means = strategy_data['ndcg_rpa']['mean'].values
            stds = strategy_data['ndcg_rpa']['std'].values
            
            ax.plot(rates, means, marker='o', label=strategy,
                   color=COLORS.get(strategy, 'gray'), linewidth=2, markersize=6)
            ax.fill_between(rates, means - stds, means + stds,
                           alpha=0.2, color=COLORS.get(strategy, 'gray'))
    
    ax.axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5, label='Baseline (100%)')
    ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
    ax.set_ylabel('Average NDCG RPA (%)', fontsize=LABEL_SIZE)
    ax.set_title('Aggregated RPA: Average Performance Loss Across All Experiments', fontsize=TITLE_SIZE)
    ax.legend(fontsize=TICK_SIZE)
    ax.grid(True, alpha=0.3)
    ax.tick_params(labelsize=TICK_SIZE)
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'aggregated_rpa.png', dpi=300, bbox_inches='tight')
    plt.show()
else:
    print("No RPA data available")

# Section 4: Statistical Analysis

Deeper statistical insights into the data.
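
For the plateau question, one possible stricter definition is the smallest sampling rate from which the mean RPA stays above the threshold at every higher rate, so that a single favorable rate followed by a dip does not count. The sketch below implements that definition; `plateau_rate` is a hypothetical helper and the input series is toy data.

```python
import pandas as pd

def plateau_rate(mean_rpa, threshold=-5.0):
    """Smallest rate from which mean RPA stays above `threshold`.

    `mean_rpa` is a Series indexed by sampling rate. Returns None when the
    highest rate is itself at or below the threshold.
    """
    ordered = mean_rpa.sort_index()
    plateau = None
    # Walk from the highest rate downward while the condition keeps holding
    for rate, value in ordered.iloc[::-1].items():
        if value <= threshold:
            break
        plateau = rate
    return plateau

# Toy mean RPA per sampling rate: 30% passes the threshold but 50% dips below it
toy = pd.Series({10: -30.0, 30: -4.0, 50: -8.0, 70: -3.0, 90: -1.0})
print(plateau_rate(toy))  # 70
```

With this definition the 30% rate is not reported, because performance falls below the threshold again at 50%.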

In [None]:
# Distribution of RPA by strategy
print("=" * 80)
print("RPA DISTRIBUTION ANALYSIS")
print("=" * 80)

if len(df_rpa) > 0:
    fig, axes = plt.subplots(1, 3, figsize=FIGSIZE_TRIPLE)
    rpa_metrics = ['precision_rpa', 'ndcg_rpa', 'map_rpa']
    metric_labels = ['Precision RPA', 'NDCG RPA', 'MAP RPA']
    
    for ax, metric, label in zip(axes, rpa_metrics, metric_labels):
        # Box plot
        strategy_order = sorted(df['strategy'].unique())
        data_for_plot = [df_rpa[df_rpa['strategy'] == s][metric].dropna() 
                        for s in strategy_order]
        
        bp = ax.boxplot(data_for_plot, patch_artist=True)
        # Set tick labels separately; the `labels` keyword is deprecated in recent matplotlib releases
        ax.set_xticklabels(strategy_order)
        
        # Color boxes
        for patch, strategy in zip(bp['boxes'], strategy_order):
            patch.set_facecolor(COLORS.get(strategy, 'gray'))
            patch.set_alpha(0.6)
        
        ax.axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
        ax.set_ylabel(f'{label} (%)', fontsize=LABEL_SIZE)
        ax.set_title(f'{label} Distribution', fontsize=TITLE_SIZE)
        ax.grid(True, alpha=0.3, axis='y')
        ax.tick_params(labelsize=TICK_SIZE)
        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'rpa_distributions.png', dpi=300, bbox_inches='tight')
    plt.show()
else:
    print("No RPA data available")

In [None]:
# Correlation analysis between metrics
print("=" * 80)
print("METRIC CORRELATION ANALYSIS")
print("=" * 80)

# Correlation matrix
metrics_for_corr = ['precision', 'ndcg', 'map']
corr_matrix = df[metrics_for_corr].corr()

print("\nCorrelation Matrix:")
display(corr_matrix)

# Heatmap
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='coolwarm', 
            center=0, square=True, ax=ax, vmin=-1, vmax=1)
ax.set_title('Metric Correlation Matrix', fontsize=TITLE_SIZE)
plt.tight_layout()
plt.savefig(PLOTS_DIR / 'metric_correlation_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

# Scatter plots
fig, axes = plt.subplots(1, 3, figsize=FIGSIZE_TRIPLE)
pairs = [('precision', 'ndcg'), ('precision', 'map'), ('ndcg', 'map')]

for ax, (metric1, metric2) in zip(axes, pairs):
    for strategy in df['strategy'].unique():
        strategy_data = df[df['strategy'] == strategy]
        ax.scatter(strategy_data[metric1], strategy_data[metric2],
                  alpha=0.6, label=strategy, color=COLORS.get(strategy, 'gray'))
    
    ax.set_xlabel(f'{metric1.upper()}@10', fontsize=LABEL_SIZE)
    ax.set_ylabel(f'{metric2.upper()}@10', fontsize=LABEL_SIZE)
    ax.set_title(f'{metric1.upper()} vs {metric2.upper()}', fontsize=TITLE_SIZE)
    ax.legend(fontsize=TICK_SIZE-2)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(PLOTS_DIR / 'metric_scatter_plots.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Sampling efficiency analysis
print("=" * 80)
print("SAMPLING EFFICIENCY ANALYSIS")
print("=" * 80)

# For each strategy, find the sampling rate where performance plateaus
# (defined as RPA > -5%, i.e., within 5% of 100% baseline)

if len(df_rpa) > 0:
    print("\nSampling rate where RPA > -5% (within 5% of baseline):")
    print("-" * 80)
    
    plateau_threshold = -5.0
    
    for strategy in sorted(df['strategy'].unique()):
        # Average over datasets and algorithms first; otherwise a single
        # favorable run would set the plateau for the whole strategy
        mean_rpa = (df_rpa[df_rpa['strategy'] == strategy]
                    .groupby('sampling_rate')['ndcg_rpa'].mean())
        
        # Find minimum sampling rate where mean NDCG RPA > threshold
        plateau_data = mean_rpa[mean_rpa > plateau_threshold]
        
        if len(plateau_data) > 0:
            min_rate = plateau_data.index.min()
            print(f"{strategy:20s}: {min_rate:3.0f}%")
        else:
            print(f"{strategy:20s}: Never reaches plateau (all <= {plateau_threshold}%)")
    
    # Marginal loss analysis
    print("\n" + "=" * 80)
    print("MARGINAL LOSS PER 10% DATA REDUCTION")
    print("=" * 80)
    
    for strategy in sorted(df['strategy'].unique()):
        # Mean NDCG per sampling rate, so consecutive entries always differ in rate
        strategy_data = (df[df['strategy'] == strategy]
                         .groupby('sampling_rate')['ndcg'].mean().sort_index())
        
        print(f"\n{strategy}:")
        
        # Calculate marginal change between consecutive sampling rates
        for i in range(len(strategy_data) - 1):
            rate1 = strategy_data.index[i]
            rate2 = strategy_data.index[i+1]
            ndcg1 = strategy_data.iloc[i]
            ndcg2 = strategy_data.iloc[i+1]
            
            rate_diff = rate2 - rate1
            ndcg_diff = ndcg2 - ndcg1
            
            # Normalize to per-10% change
            ndcg_diff_normalized = ndcg_diff * (10.0 / rate_diff) if rate_diff > 0 else 0
            
            print(f"  {rate1:.0f}% → {rate2:.0f}%: ΔNDCG = {ndcg_diff_normalized:+.4f} per 10%")
else:
    print("No RPA data available for efficiency analysis")

# Section 5: Comparative Analysis

Direct comparisons across algorithms, datasets, and strategies.
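
The win-rate comparison in this section can also be written without an explicit loop. The sketch below is one possible vectorized form, assuming the notebook's column names; `win_rates` is a hypothetical helper, ties go to whichever row `idxmax` encounters first, and the rows are toy data.

```python
import pandas as pd

def win_rates(results, metric="ndcg"):
    """Share of configurations in which each strategy has the highest metric."""
    keys = ["dataset", "algorithm", "sampling_rate"]
    scored = results.dropna(subset=[metric])
    # Keep only configurations where at least two strategies were run
    n_strategies = scored.groupby(keys)["strategy"].transform("nunique")
    scored = scored[n_strategies > 1]
    # Row label of the best score in each configuration, mapped to its strategy
    winners = scored.loc[scored.groupby(keys)[metric].idxmax(), "strategy"]
    return winners.value_counts(normalize=True)

# Toy rows for illustration, not experiment output
toy = pd.DataFrame({
    "dataset": ["toy"] * 5,
    "algorithm": ["BPR"] * 5,
    "strategy": ["difficult", "random", "difficult", "random", "temporal"],
    "sampling_rate": [50, 50, 100, 100, 30],
    "ndcg": [0.08, 0.06, 0.10, 0.11, 0.05],
})
wr = win_rates(toy)
print(wr)
```

The lone `temporal` row at 30% is dropped because no other strategy was run for that configuration, so it cannot win or lose a comparison.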

In [None]:
# Algorithm comparison
print("=" * 80)
print("ALGORITHM COMPARISON")
print("=" * 80)

# Average performance by algorithm
algo_avg = df.groupby('algorithm')[['precision', 'ndcg', 'map']].mean()

print("\nAverage metrics by algorithm:")
display(algo_avg)

# Plot comparison
fig, axes = plt.subplots(1, 3, figsize=FIGSIZE_TRIPLE)
metrics = ['precision', 'ndcg', 'map']
metric_labels = ['Precision@10', 'NDCG@10', 'MAP@10']

for ax, metric, label in zip(axes, metrics, metric_labels):
    # Group by algorithm and sampling rate
    for algorithm in sorted(df['algorithm'].unique()):
        algo_data = df[df['algorithm'] == algorithm].groupby('sampling_rate')[metric].mean()
        ax.plot(algo_data.index, algo_data.values, marker='o', label=algorithm, linewidth=2)
    
    ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
    ax.set_ylabel(label, fontsize=LABEL_SIZE)
    ax.set_title(f'{label} by Algorithm', fontsize=TITLE_SIZE)
    ax.legend(fontsize=TICK_SIZE)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(PLOTS_DIR / 'algorithm_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Dataset characteristics
print("=" * 80)
print("DATASET CHARACTERISTICS")
print("=" * 80)

# Performance at 100% (baseline predictability)
df_100 = df[df['sampling_rate'] == 100].copy()

if len(df_100) > 0:
    dataset_baseline = df_100.groupby('dataset')['ndcg'].mean().sort_values(ascending=False)
    
    print("\nDataset ranking by NDCG@10 at 100% sampling:")
    print("(Higher = more predictable)")
    print("-" * 80)
    for i, (dataset, ndcg) in enumerate(dataset_baseline.items(), 1):
        print(f"{i:2d}. {dataset:40s}: {ndcg:.4f}")
    
    # Visualize
    fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)
    ax.barh(range(len(dataset_baseline)), dataset_baseline.values)
    ax.set_yticks(range(len(dataset_baseline)))
    ax.set_yticklabels(dataset_baseline.index)
    ax.set_xlabel('NDCG@10 at 100% Sampling', fontsize=LABEL_SIZE)
    ax.set_title('Dataset Predictability Ranking', fontsize=TITLE_SIZE)
    ax.grid(True, alpha=0.3, axis='x')
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'dataset_ranking.png', dpi=300, bbox_inches='tight')
    plt.show()
else:
    print("No 100% sampling data available for baseline comparison")

# Performance loss sensitivity
if len(df_rpa) > 0:
    print("\n" + "=" * 80)
    print("DATASET SENSITIVITY TO SAMPLING")
    print("=" * 80)
    print("\nAverage NDCG RPA by dataset (closer to 0 = less sensitive; most sensitive first):")
    print("-" * 80)
    
    dataset_sensitivity = df_rpa.groupby('dataset')['ndcg_rpa'].mean().sort_values()
    
    for i, (dataset, rpa) in enumerate(dataset_sensitivity.items(), 1):
        print(f"{i:2d}. {dataset:40s}: {rpa:+.2f}%")
    
    # Visualize
    fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)
    colors_bar = ['green' if x > -10 else 'orange' if x > -20 else 'red' 
                  for x in dataset_sensitivity.values]
    ax.barh(range(len(dataset_sensitivity)), dataset_sensitivity.values, color=colors_bar, alpha=0.7)
    ax.set_yticks(range(len(dataset_sensitivity)))
    ax.set_yticklabels(dataset_sensitivity.index)
    ax.set_xlabel('Average NDCG RPA (%)', fontsize=LABEL_SIZE)
    ax.set_title('Dataset Sensitivity to Sampling\n(Closer to 0 = Less Sensitive)', fontsize=TITLE_SIZE)
    ax.axvline(x=0, color='black', linestyle='--', linewidth=1)
    ax.grid(True, alpha=0.3, axis='x')
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'dataset_sensitivity.png', dpi=300, bbox_inches='tight')
    plt.show()

In [None]:
# Strategy rankings
print("=" * 80)
print("STRATEGY RANKINGS")
print("=" * 80)

# Overall average performance by strategy
strategy_ranking = df.groupby('strategy')[['precision', 'ndcg', 'map']].mean().sort_values('ndcg', ascending=False)

print("\nOverall strategy ranking (by average NDCG@10):")
display(strategy_ranking)

# Win-rate analysis: How often does each strategy perform best?
print("\n" + "=" * 80)
print("STRATEGY WIN-RATE ANALYSIS")
print("=" * 80)

# For each dataset-algorithm-sampling_rate combination, find the best strategy
grouped = df.groupby(['dataset', 'algorithm', 'sampling_rate'])
wins = {strategy: 0 for strategy in df['strategy'].unique()}
total_comparisons = 0

for name, group in grouped:
    if len(group) > 1:  # Only count if multiple strategies present
        best_strategy = group.loc[group['ndcg'].idxmax(), 'strategy']
        wins[best_strategy] += 1
        total_comparisons += 1

print(f"\nTotal comparisons: {total_comparisons}")
print("\nWin counts by strategy (how many times it had the best NDCG):")
for strategy, count in sorted(wins.items(), key=lambda x: x[1], reverse=True):
    win_rate = count / total_comparisons * 100 if total_comparisons > 0 else 0
    print(f"{strategy:20s}: {count:4d} wins ({win_rate:5.1f}%)")

# Visualize win rates
if total_comparisons > 0:
    fig, ax = plt.subplots(figsize=(10, 6))
    strategies = sorted(wins.keys())
    win_counts = [wins[s] for s in strategies]
    colors_bars = [COLORS.get(s, 'gray') for s in strategies]
    
    ax.bar(strategies, win_counts, color=colors_bars, alpha=0.7)
    ax.set_ylabel('Number of Wins', fontsize=LABEL_SIZE)
    ax.set_title('Strategy Win-Rate: How Often Each Strategy Performs Best', fontsize=TITLE_SIZE)
    ax.grid(True, alpha=0.3, axis='y')
    
    # Add percentage labels on bars
    for i, (strategy, count) in enumerate(zip(strategies, win_counts)):
        percentage = count / total_comparisons * 100 if total_comparisons > 0 else 0
        ax.text(i, count + total_comparisons*0.01, f'{percentage:.1f}%', 
               ha='center', va='bottom', fontsize=TICK_SIZE)
    
    plt.tight_layout()
    plt.savefig(PLOTS_DIR / 'strategy_win_rate.png', dpi=300, bbox_inches='tight')
    plt.show()

In [None]:
# Best strategy by sampling rate
print("=" * 80)
print("BEST STRATEGY BY SAMPLING RATE")
print("=" * 80)

# For each sampling rate, find which strategy performs best on average
rate_strategy_avg = df.groupby(['sampling_rate', 'strategy'])['ndcg'].mean().reset_index()

best_by_rate = rate_strategy_avg.loc[rate_strategy_avg.groupby('sampling_rate')['ndcg'].idxmax()]

print("\nBest strategy at each sampling rate:")
print("-" * 80)
for _, row in best_by_rate.sort_values('sampling_rate').iterrows():
    print(f"{row['sampling_rate']:3.0f}%: {row['strategy']:20s} (NDCG = {row['ndcg']:.4f})")

# Visualize
fig, ax = plt.subplots(figsize=FIGSIZE_SINGLE)

for strategy in sorted(df['strategy'].unique()):
    strategy_data = rate_strategy_avg[rate_strategy_avg['strategy'] == strategy].sort_values('sampling_rate')
    ax.plot(strategy_data['sampling_rate'], strategy_data['ndcg'],
           marker='o', label=strategy, color=COLORS.get(strategy, 'gray'),
           linewidth=2, markersize=6)

ax.set_xlabel('Sampling Rate (%)', fontsize=LABEL_SIZE)
ax.set_ylabel('Average NDCG@10', fontsize=LABEL_SIZE)
ax.set_title('Strategy Performance by Sampling Rate', fontsize=TITLE_SIZE)
ax.legend(fontsize=TICK_SIZE)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(PLOTS_DIR / 'best_strategy_by_rate.png', dpi=300, bbox_inches='tight')
plt.show()

# Section 6: Interactive Visualizations (Optional)

Interactive plots using Plotly (if available).

In [None]:
if PLOTLY_AVAILABLE:
    print("=" * 80)
    print("INTERACTIVE VISUALIZATIONS")
    print("=" * 80)
    
    # Interactive line plot: NDCG by strategy (sorted so each line runs left to right)
    fig = px.line(df.sort_values('sampling_rate'), x='sampling_rate', y='ndcg', color='strategy',
                  facet_col='algorithm', facet_row='dataset',
                  color_discrete_map=COLORS,
                  title='Interactive NDCG@10 by Strategy',
                  labels={'sampling_rate': 'Sampling Rate (%)', 'ndcg': 'NDCG@10'})
    
    fig.update_traces(mode='lines+markers')
    fig.update_layout(height=300*len(df['dataset'].unique()))
    fig.show()
    
    # Interactive scatter: Precision vs NDCG
    fig = px.scatter(df, x='precision', y='ndcg', color='strategy',
                    size='sampling_rate', hover_data=['dataset', 'algorithm'],
                    color_discrete_map=COLORS,
                    title='Interactive Scatter: Precision vs NDCG',
                    labels={'precision': 'Precision@10', 'ndcg': 'NDCG@10'})
    fig.show()
    
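    # Interactive RPA plot (averaged over datasets and algorithms so that
    # each strategy is drawn as a single line)
    if len(df_rpa) > 0:
        rpa_avg = (df_rpa.groupby(['strategy', 'sampling_rate'], as_index=False)['ndcg_rpa']
                   .mean()
                   .sort_values('sampling_rate'))
        fig = px.line(rpa_avg, x='sampling_rate', y='ndcg_rpa', color='strategy',
                     color_discrete_map=COLORS,
                     title='Interactive RPA: Average NDCG Performance Loss',
                     labels={'sampling_rate': 'Sampling Rate (%)', 'ndcg_rpa': 'NDCG RPA (%)'})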
        fig.update_traces(mode='lines+markers')
        fig.add_hline(y=0, line_dash="dash", line_color="black", annotation_text="Baseline")
        fig.show()
else:
    print("Plotly not available - skipping interactive visualizations")
    print("Install with: pip install plotly")

# Section 7: Summary & Key Findings

Synthesize insights and export recommendations.
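
The critical sampling threshold in the summary cell is the lowest rate whose average RPA stays above -10%. If the RPA curve is not monotone, that rate can be followed by higher rates that fall below the budget again. The sketch below is a stricter alternative, not the notebook's method: it returns the lowest rate from which every higher rate also stays within the budget. The function name `critical_threshold` and the example values are hypothetical, and it assumes that negative RPA means a loss relative to the 100% baseline.

```python
def critical_threshold(rpa_by_rate, max_loss_pct=10.0):
    """Lowest rate from which all higher rates stay within the loss budget.

    rpa_by_rate: mapping of sampling rate (%) -> average RPA (%).
    Negative RPA is assumed to mean a loss relative to the 100% baseline.
    Returns None when even the highest rate exceeds the budget.
    """
    threshold = None
    # Walk from the highest rate downwards and stop at the first violation.
    for rate, rpa in sorted(rpa_by_rate.items(), reverse=True):
        if rpa <= -max_loss_pct:
            break
        threshold = rate
    return threshold


# Illustrative values only: 20% looks acceptable, but 30% dips below -10% again.
example = {10: -35.0, 20: -8.0, 30: -12.0, 40: -9.0, 50: -6.0, 100: 0.0}
print(critical_threshold(example))  # 40
```

Because a pandas Series also provides `.items()`, the same function should accept `avg_rpa_by_rate` from the summary cell directly.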

In [None]:
print("=" * 80)
print("KEY FINDINGS SUMMARY")
print("=" * 80)

# 1. Best overall strategy
if len(df) > 0:
    strategy_means = df.groupby('strategy')['ndcg'].mean()
    overall_best = strategy_means.idxmax()
    overall_best_score = strategy_means.max()
    
    print(f"\n1. BEST OVERALL STRATEGY")
    print(f"   '{overall_best}' achieves highest average NDCG@10: {overall_best_score:.4f}")

# 2. Strategy with minimal performance loss
if len(df_rpa) > 0:
    best_rpa_strategy = df_rpa.groupby('strategy')['ndcg_rpa'].mean().idxmax()
    best_rpa_score = df_rpa.groupby('strategy')['ndcg_rpa'].mean().max()
    
    print(f"\n2. STRATEGY WITH MINIMAL PERFORMANCE LOSS")
    print(f"   '{best_rpa_strategy}' has the smallest average loss (mean NDCG RPA: {best_rpa_score:+.2f}%)")

# 3. Critical sampling threshold
if len(df_rpa) > 0:
    # Find minimum sampling rate where average RPA > -10%
    avg_rpa_by_rate = df_rpa.groupby('sampling_rate')['ndcg_rpa'].mean().sort_index()
    critical_rates = avg_rpa_by_rate[avg_rpa_by_rate > -10.0]
    
    if len(critical_rates) > 0:
        critical_threshold = critical_rates.index.min()
        print(f"\n3. CRITICAL SAMPLING THRESHOLD")
        print(f"   {critical_threshold:.0f}% sampling achieves <10% performance loss on average")
    else:
        print(f"\n3. CRITICAL SAMPLING THRESHOLD")
        print(f"   No sampling rate achieves <10% loss (more data needed)")

# 4. Algorithm performance
if 'algorithm' in df.columns:
    best_algorithm = df.groupby('algorithm')['ndcg'].mean().idxmax()
    best_algo_score = df.groupby('algorithm')['ndcg'].mean().max()
    
    print(f"\n4. BEST ALGORITHM")
    print(f"   '{best_algorithm}' achieves highest average NDCG@10: {best_algo_score:.4f}")

# 5. Temporal strategy effectiveness (relative to the random baseline)
if {'temporal', 'random'} <= set(df['strategy'].unique()):
    temporal_avg = df[df['strategy'] == 'temporal']['ndcg'].mean()
    random_avg = df[df['strategy'] == 'random']['ndcg'].mean()
    improvement = (temporal_avg - random_avg) / random_avg * 100
    
    print(f"\n5. TEMPORAL STRATEGY EFFECTIVENESS")
    print(f"   Temporal sampling: NDCG = {temporal_avg:.4f}")
    print(f"   Random sampling:   NDCG = {random_avg:.4f}")
    print(f"   Improvement over random: {improvement:+.2f}%")
else:
    print(f"\n5. TEMPORAL STRATEGY EFFECTIVENESS")
    print(f"   Temporal and/or random strategy data not available")

print("\n" + "=" * 80)

In [None]:
# Generate recommendations table
print("=" * 80)
print("RECOMMENDATIONS TABLE")
print("=" * 80)

# For each dataset-algorithm pair, record:
# 1. Best strategy at 50% sampling (and its NDCG@10)
# 2. Minimum sampling rate with <10% RPA loss
# 3. Baseline NDCG@10 at 100% sampling

recommendations = []

for dataset in df['dataset'].unique():
    for algorithm in df['algorithm'].unique():
        subset = df[(df['dataset'] == dataset) & (df['algorithm'] == algorithm)]
        
        if len(subset) == 0:
            continue
        
        # Best strategy at 50%
        subset_50 = subset[subset['sampling_rate'] == 50]
        if len(subset_50) > 0:
            best_strategy_50 = subset_50.loc[subset_50['ndcg'].idxmax(), 'strategy']
            best_ndcg_50 = subset_50['ndcg'].max()
        else:
            best_strategy_50 = 'N/A'
            best_ndcg_50 = np.nan
        
        # Minimum rate for <10% loss
        subset_rpa = subset[(subset['sampling_rate'] != 100) & (subset['ndcg_rpa'] > -10.0)]
        if len(subset_rpa) > 0:
            min_rate = subset_rpa['sampling_rate'].min()
        else:
            min_rate = np.nan
        
        # Performance at 100%
        subset_100 = subset[subset['sampling_rate'] == 100]
        if len(subset_100) > 0:
            ndcg_100 = subset_100['ndcg'].mean()
        else:
            ndcg_100 = np.nan
        
        recommendations.append({
            'dataset': dataset,
            'algorithm': algorithm,
            'best_strategy_at_50pct': best_strategy_50,
            'ndcg_at_50pct': best_ndcg_50,
            'min_rate_for_10pct_loss': min_rate,
            'baseline_ndcg_100pct': ndcg_100
        })

recommendations_df = pd.DataFrame(recommendations)

print("\nRecommendations:")
display(recommendations_df)

# Export to CSV
recommendations_path = RESULTS_DIR / 'recommendations.csv'
recommendations_df.to_csv(recommendations_path, index=False)
print(f"\nExported recommendations to: {recommendations_path}")

In [None]:
# Data completeness report
print("=" * 80)
print("DATA COMPLETENESS REPORT")
print("=" * 80)

# Expected experiments
expected_datasets = 12
expected_algorithms = 3
expected_strategies = 4
expected_rates = 10
expected_total = expected_datasets * expected_algorithms * expected_strategies * expected_rates

actual_total = len(df)
completion_pct = actual_total / expected_total * 100

print(f"\nData Completeness:")
print(f"  Expected experiments: {expected_total}")
print(f"  Actual experiments:   {actual_total}")
print(f"  Completion:           {completion_pct:.1f}%")
print(f"  Missing:              {expected_total - actual_total}")

# Coverage by dimension
print(f"\nAvailable dimensions:")
print(f"  Datasets:   {df['dataset'].nunique()}/{expected_datasets}")
print(f"  Algorithms: {df['algorithm'].nunique()}/{expected_algorithms}")
print(f"  Strategies: {df['strategy'].nunique()}/{expected_strategies}")
print(f"  Rates:      {df['sampling_rate'].nunique()}/{expected_rates}")

# Missing datasets
all_expected_datasets = ['Amazon_Health_and_Personal_Care', 'Amazon_Grocery_and_Gourmet_Food',
                        'book-crossing', 'lastfm', 'ModCloth', 'pinterest', 'RateBeer',
                        'steam', 'yelp2022', 'jester', 'Behance', 'mind']
missing_datasets = set(all_expected_datasets) - set(df['dataset'].unique())

if missing_datasets:
    print(f"\nMissing datasets ({len(missing_datasets)}):")
    for dataset in sorted(missing_datasets):
        print(f"  - {dataset}")

# Missing strategies
all_expected_strategies = ['difficult', 'random', 'difficult_inverse', 'temporal']
missing_strategies = set(all_expected_strategies) - set(df['strategy'].unique())

if missing_strategies:
    print(f"\nMissing strategies ({len(missing_strategies)}):")
    for strategy in sorted(missing_strategies):
        print(f"  - {strategy}")

# Suggestions
print(f"\n" + "=" * 80)
print("SUGGESTIONS FOR ADDITIONAL EXPERIMENTS")
print("=" * 80)

# Number suggestions consecutively, even when some of them do not apply
suggestion_no = 0

if missing_datasets:
    suggestion_no += 1
    print(f"\n{suggestion_no}. Run experiments on {len(missing_datasets)} missing datasets")

if missing_strategies:
    suggestion_no += 1
    print(f"\n{suggestion_no}. Implement and test {len(missing_strategies)} missing strategies")

# Incomplete dataset-algorithm combinations
incomplete_combos = []
for dataset in df['dataset'].unique():
    for algorithm in df['algorithm'].unique():
        subset = df[(df['dataset'] == dataset) & (df['algorithm'] == algorithm)]
        expected_for_combo = expected_strategies * expected_rates
        if len(subset) < expected_for_combo:
            incomplete_combos.append((dataset, algorithm, len(subset), expected_for_combo))

if incomplete_combos:
    suggestion_no += 1
    print(f"\n{suggestion_no}. Complete {len(incomplete_combos)} partially-finished dataset-algorithm combinations:")
    for dataset, algorithm, actual, expected in incomplete_combos[:5]:
        print(f"   - {dataset} - {algorithm}: {actual}/{expected} experiments")
    if len(incomplete_combos) > 5:
        print(f"   ... and {len(incomplete_combos) - 5} more")

print("\n" + "=" * 80)

# Conclusion

This notebook provides a comprehensive analysis of the sampling experiment results. Key outputs:

1. **Individual plots**: Metric and RPA plots for each dataset-algorithm combination
2. **Aggregated analysis**: Strategy comparison, heatmaps, and RPA trends
3. **Statistical insights**: Distributions, correlations, efficiency analysis
4. **Comparative analysis**: Algorithm comparison, dataset rankings, strategy win-rates
5. **Recommendations**: Exported to `results/recommendations.csv`
6. **Data quality report**: Completeness analysis and suggestions

All static (Matplotlib) plots are saved to the `plots/` directory at 300 DPI for publication quality. The interactive Plotly figures are displayed inline only.
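
The exported recommendations file can be summarized without re-running the notebook. The sketch below tallies how often each strategy is best at 50% sampling, using the column names written by the recommendations cell. The helper name `tally_best_strategies` and the sample rows are hypothetical and for illustration only. It assumes that pairs without a 50% run carry the placeholder `N/A`, as in the cell above.

```python
import csv
import io
from collections import Counter


def tally_best_strategies(csv_file):
    """Count how often each strategy is best at 50% sampling.

    Rows with an empty value or the 'N/A' placeholder are skipped.
    """
    reader = csv.DictReader(csv_file)
    return Counter(
        row["best_strategy_at_50pct"]
        for row in reader
        if row["best_strategy_at_50pct"] not in ("", "N/A")
    )


# Made-up rows for illustration; real values come from recommendations.csv.
sample = io.StringIO(
    "dataset,algorithm,best_strategy_at_50pct,ndcg_at_50pct,"
    "min_rate_for_10pct_loss,baseline_ndcg_100pct\n"
    "lastfm,BPR,random,0.10,40,0.12\n"
    "lastfm,NeuMF,temporal,0.09,60,0.11\n"
    "steam,BPR,random,0.05,,0.07\n"
    "steam,LightGCN,N/A,,,0.08\n"
)
counts = tally_best_strategies(sample)
print(counts.most_common())
```

To run it on the real export, pass an open file handle instead of the sample, for example `open(recommendations_path, newline="")`.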

## Next Steps

1. Review the recommendations table for best strategies per dataset-algorithm
2. Check the data completeness report for missing experiments
3. Run additional experiments as suggested
4. Re-run this notebook after collecting more data for updated analysis
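
The completeness report counts missing experiments but does not list them individually. As a starting point for scheduling the additional runs, the sketch below enumerates the full experiment grid and returns the combinations that have no result yet. The helper name `missing_experiments` and the tiny example grid are hypothetical.

```python
from itertools import product


def missing_experiments(completed, datasets, algorithms, strategies, rates):
    """Return (dataset, algorithm, strategy, rate) tuples that have no result yet."""
    done = set(completed)
    return [
        combo
        for combo in product(datasets, algorithms, strategies, rates)
        if combo not in done
    ]


# Tiny illustrative grid: two strategies at two rates, with only the 50% runs finished.
completed = [("lastfm", "BPR", "random", 50), ("lastfm", "BPR", "temporal", 50)]
missing = missing_experiments(
    completed, ["lastfm"], ["BPR"], ["random", "temporal"], [50, 100]
)
print(missing)
```

In the notebook, `completed` could be built from the results frame with `df[['dataset', 'algorithm', 'strategy', 'sampling_rate']].itertuples(index=False, name=None)`, assuming those column names match the loaded data.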