# 📊 Model Results Deep Analysis

**Goal**: In-depth analysis of the results of the trained models.

**Prerequisites**: Run `python main.py --config config/config.yaml --steps train` first.

**Analyses**:
1. **Performance Metrics**: MAE, RMSE, MAPE and R² on train and test
2. **Model Comparison**: Individual models vs. ensembles
3. **Overfitting Analysis**: Train-test gap metrics
4. **Group Performance**: Errors by cadastral category, OMI zone and building type
5. **Worst Predictions**: Analysis of the worst predictions
6. **Residual Plots**: Residual distributions from the saved plots
7. **Prediction Intervals**: Interval coverage and width

**Output**: `model_analysis_outputs/`
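For reference, the four headline metrics can be computed from arrays of actual and predicted prices as in the minimal NumPy sketch below. It assumes MAPE is stored as a fraction (the plots in this notebook multiply it by 100 for display) and that no actual price is zero; `regression_metrics` is an illustrative name, not a function from the project's `src`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, MAPE (as a fraction) and R² for one split."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE as a fraction (0.10 = 10%); undefined if any y_true is 0
    mape = np.mean(np.abs(err / y_true))
    # R² = 1 - SS_res / SS_tot
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}

m = regression_metrics([100_000, 200_000, 300_000], [110_000, 190_000, 300_000])
print({k: round(float(v), 4) for k, v in m.items()})
```

MAE and RMSE are in the same unit as the price (euros here), while MAPE and R² are unitless, which is why the comparison plots put them on separate axes.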

## 🔧 Setup

In [None]:
# Imports
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / "src"))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import json
import warnings
from IPython.display import Image, display

warnings.filterwarnings('ignore')

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✅ Setup complete")

In [None]:
# Configuration
MODELS_DIR = Path("../models")
SUMMARY_PATH = MODELS_DIR / "summary.json"
VALIDATION_PATH = MODELS_DIR / "validation_results.csv"
OUTPUT_DIR = Path("model_analysis_outputs")
OUTPUT_DIR.mkdir(exist_ok=True)

def save_plot(name, dpi=120):
    """Save the current figure to OUTPUT_DIR/<name>.png."""
    plt.tight_layout()
    plt.savefig(OUTPUT_DIR / f"{name}.png", dpi=dpi, bbox_inches='tight')
    print(f"💾 Saved: {name}.png")

print(f"📂 Output directory: {OUTPUT_DIR}")
print(f"📂 Models directory: {MODELS_DIR}")

## 📦 1. Load Results Summary
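The cells below read a fixed set of keys from `summary.json`. The sketch shows the structure those cells assume (inferred from the keys they access, not from the training code) plus a hypothetical helper, `find_missing_metrics`, that lists entries lacking a required test metric. This is worth checking because the comparison table fills missing metrics with 0, so a model with no recorded MAE would look perfect.

```python
# Shape of summary.json as read by this notebook (inferred from the keys it accesses).
# Names and numbers are placeholders, not real results.
example_summary = {
    "models": {
        "model_a": {
            "metrics_test_original": {"r2": 0.0, "mae": 0.0, "rmse": 0.0, "mape": 0.0},
            "metrics_train_original": {"r2": 0.0, "mae": 0.0},
            "overfit": {"gap_r2": 0.0},
        },
    },
    "ensembles": {
        # Deliberately incomplete to show what the check reports
        "ensemble_a": {"metrics_test_original": {"r2": 0.0, "mae": 0.0}},
    },
}

REQUIRED_TEST_METRICS = ("r2", "mae", "rmse", "mape")

def find_missing_metrics(summary):
    """Return (section, name, metric) for every required test metric that is absent."""
    missing = []
    for section in ("models", "ensembles"):
        for name, data in summary.get(section, {}).items():
            test = data.get("metrics_test_original", {})
            for metric in REQUIRED_TEST_METRICS:
                if metric not in test:
                    missing.append((section, name, metric))
    return missing

print(find_missing_metrics(example_summary))
# [('ensembles', 'ensemble_a', 'rmse'), ('ensembles', 'ensemble_a', 'mape')]
```

In the notebook, calling `find_missing_metrics(summary)` right after loading would surface gaps before they are silently replaced by zeros.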

In [None]:
# Check that the results exist
if not SUMMARY_PATH.exists():
    print("❌ ERROR: results not found!")
    print("\nRun training first:")
    print("  python main.py --config config/config.yaml --steps train")
    raise FileNotFoundError(f"Results not found: {SUMMARY_PATH}")

# Load summary
with open(SUMMARY_PATH, 'r') as f:
    summary = json.load(f)

print("✅ Results summary loaded")
print(f"\n📊 Available models: {list(summary['models'].keys())}")
print(f"📊 Available ensembles: {list(summary.get('ensembles', {}).keys())}")

## 📊 2. Performance Comparison - All Models

In [None]:
# Extract metrics for every model (missing metrics default to 0)
models_metrics = []

# Individual models
for model_name, model_data in summary['models'].items():
    test_metrics = model_data.get('metrics_test_original', {})
    train_metrics = model_data.get('metrics_train_original', {})
    overfit = model_data.get('overfit', {})
    
    models_metrics.append({
        'Model': model_name.upper(),
        'Type': 'Individual',
        'Test_R2': test_metrics.get('r2', 0),
        'Test_MAE': test_metrics.get('mae', 0),
        'Test_RMSE': test_metrics.get('rmse', 0),
        'Test_MAPE': test_metrics.get('mape', 0),
        'Train_R2': train_metrics.get('r2', 0),
        'Train_MAE': train_metrics.get('mae', 0),
        'Overfit_Gap_R2': overfit.get('gap_r2', 0),
    })

# Ensemble
for ensemble_name, ensemble_data in summary.get('ensembles', {}).items():
    test_metrics = ensemble_data.get('metrics_test_original', {})
    train_metrics = ensemble_data.get('metrics_train_original', {})
    overfit = ensemble_data.get('overfit', {})
    
    models_metrics.append({
        'Model': ensemble_name.upper(),
        'Type': 'Ensemble',
        'Test_R2': test_metrics.get('r2', 0),
        'Test_MAE': test_metrics.get('mae', 0),
        'Test_RMSE': test_metrics.get('rmse', 0),
        'Test_MAPE': test_metrics.get('mape', 0),
        'Train_R2': train_metrics.get('r2', 0),
        'Train_MAE': train_metrics.get('mae', 0),
        'Overfit_Gap_R2': overfit.get('gap_r2', 0),
    })

metrics_df = pd.DataFrame(models_metrics).sort_values('Test_R2', ascending=False)

print("=" * 100)
print("MODEL PERFORMANCE COMPARISON")
print("=" * 100)
print("\n", metrics_df.to_string(index=False))

# Save
metrics_df.to_csv(OUTPUT_DIR / "01_model_comparison.csv", index=False)
print("\n💾 Saved: 01_model_comparison.csv")

In [None]:
# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Colors by type
colors = ['steelblue' if t == 'Individual' else 'orange' for t in metrics_df['Type']]

# R2
ax = axes[0, 0]
ax.barh(metrics_df['Model'], metrics_df['Test_R2'], color=colors, edgecolor='black')
ax.set_xlabel('R² Score')
ax.set_title('Test R¬≤ by Model')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

# MAE
ax = axes[0, 1]
ax.barh(metrics_df['Model'], metrics_df['Test_MAE'], color=colors, edgecolor='black')
ax.set_xlabel('MAE (€)')
ax.set_title('Test MAE by Model')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

# RMSE
ax = axes[1, 0]
ax.barh(metrics_df['Model'], metrics_df['Test_RMSE'], color=colors, edgecolor='black')
ax.set_xlabel('RMSE (€)')
ax.set_title('Test RMSE by Model')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

# MAPE
ax = axes[1, 1]
ax.barh(metrics_df['Model'], metrics_df['Test_MAPE'] * 100, color=colors, edgecolor='black')
ax.set_xlabel('MAPE (%)')
ax.set_title('Test MAPE by Model')
ax.grid(True, alpha=0.3, axis='x')
ax.invert_yaxis()

plt.suptitle('Model Performance Comparison', fontsize=16, fontweight='bold')
save_plot("02_model_comparison")
plt.show()

## 📊 3. Overfitting Analysis
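The gap plotted in this section is train R² minus test R², bucketed with the same thresholds used by the bar colors and the interpretation printout (0.05 and 0.10). The notebook reads the gap precomputed from `summary.json`; the sketch below only restates the rule, and `gap_r2` and `classify_gap` are illustrative names.

```python
def gap_r2(train_r2, test_r2):
    """Overfitting gap: positive when the model fits train better than test."""
    return train_r2 - test_r2

def classify_gap(gap, moderate=0.05, high=0.10):
    """Same buckets as the bar colors: green ('good'), orange, red."""
    if gap > high:
        return "high"
    if gap > moderate:
        return "moderate"
    return "good"

for train, test in [(0.92, 0.90), (0.95, 0.88), (0.99, 0.80)]:
    g = gap_r2(train, test)
    print(f"train={train:.2f} test={test:.2f} gap={g:.2f} -> {classify_gap(g)}")
```

A negative gap (test better than train) falls into "good" under this rule; it is usually a sign of a small or unrepresentative test split rather than of excellent generalization.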

In [None]:
# Train vs Test comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# R2 comparison
ax = axes[0]
x = np.arange(len(metrics_df))
width = 0.35
ax.bar(x - width/2, metrics_df['Train_R2'], width, label='Train', color='lightgreen', edgecolor='black')
ax.bar(x + width/2, metrics_df['Test_R2'], width, label='Test', color='steelblue', edgecolor='black')
ax.set_xticks(x)
ax.set_xticklabels(metrics_df['Model'], rotation=45, ha='right')
ax.set_ylabel('R² Score')
ax.set_title('Train vs Test R¬≤')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Overfitting gap
ax = axes[1]
colors_overfit = ['red' if gap > 0.1 else 'orange' if gap > 0.05 else 'green' 
                  for gap in metrics_df['Overfit_Gap_R2']]
ax.bar(x, metrics_df['Overfit_Gap_R2'], color=colors_overfit, edgecolor='black')
ax.axhline(0.05, color='orange', linestyle='--', linewidth=2, alpha=0.5, label='Moderate (0.05)')
ax.axhline(0.10, color='red', linestyle='--', linewidth=2, alpha=0.5, label='High (0.10)')
# Set tick positions before labels (avoids matplotlib's FixedFormatter warning)
ax.set_xticks(x)
ax.set_xticklabels(metrics_df['Model'], rotation=45, ha='right')
ax.set_ylabel('Gap R² (Train - Test)')
ax.set_title('Overfitting Gap')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.suptitle('Overfitting Analysis', fontsize=16, fontweight='bold')
save_plot("03_overfitting_analysis")
plt.show()

print("\n🔍 Overfitting Interpretation:")
print("  🟢 Gap < 0.05: Good generalization")
print("  🟠 Gap 0.05-0.10: Moderate overfitting")
print("  🔴 Gap > 0.10: High overfitting")

## 📊 4. Group Performance Analysis
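The `group_metrics_*.csv` files are produced during training; this notebook only reads their first column (assumed to be the group key), `test_mape` and `count`. The sketch below shows one way such a table could be computed from a test-set frame with pandas. The column names `y_true`, `y_pred` and `AI_ZonaOmi` in the toy frame are assumptions for illustration, as is the data.

```python
import pandas as pd

def group_metrics(df, group_col, y_true="y_true", y_pred="y_pred"):
    """Per-group MAPE (as a fraction) and sample count, worst groups first."""
    ape = (df[y_true] - df[y_pred]).abs() / df[y_true]
    return (
        df.assign(ape=ape)
        .groupby(group_col)
        .agg(test_mape=("ape", "mean"), count=("ape", "size"))
        .reset_index()  # group key becomes the first column
        .sort_values("test_mape", ascending=False)
    )

toy = pd.DataFrame({
    "AI_ZonaOmi": ["B1", "B1", "C2", "C2", "C2"],
    "y_true": [100.0, 200.0, 100.0, 100.0, 100.0],
    "y_pred": [110.0, 180.0, 100.0, 90.0, 105.0],
})
print(group_metrics(toy, "AI_ZonaOmi"))
```

Reading `test_mape` together with `count` matters: a group with very few samples can top the MAPE ranking by chance, which is why the notebook plots both side by side.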

In [None]:
# Find all model directories
model_dirs = [d for d in MODELS_DIR.iterdir() if d.is_dir() and d.name not in ['__pycache__']]

print(f"📂 Model directories found: {[d.name for d in model_dirs]}")

# Analyze the best model (first row of the R²-sorted table).
# The directory name is assumed to be the lowercase model name.
best_model = metrics_df.iloc[0]['Model'].lower()
best_model_dir = MODELS_DIR / best_model

print(f"\n🏆 Analyzing best model: {best_model.upper()}")
if not best_model_dir.exists():
    print(f"⚠️  Directory not found for {best_model}: {best_model_dir}")

# Look for group metrics files
group_files = {
    'Categoria Catastale': best_model_dir / 'group_metrics_AI_IdCategoriaCatastale.csv',
    'Tipologia Edilizia': best_model_dir / 'group_metrics_AI_IdTipologiaEdilizia.csv',
    'Zona OMI': best_model_dir / 'group_metrics_AI_ZonaOmi.csv',
    'Price Band': best_model_dir / 'group_metrics_price_band.csv',
}

available_groups = {k: v for k, v in group_files.items() if v.exists()}

if not available_groups:
    print("⚠️  No group metrics found for this model")
else:
    print(f"\n📊 Available group metrics: {list(available_groups.keys())}")

In [None]:
# Visualize group performance
if available_groups:
    n_groups = len(available_groups)
    # squeeze=False keeps `axes` 2-D even when there is only one group
    fig, axes = plt.subplots(n_groups, 2, figsize=(16, 5 * n_groups), squeeze=False)
    
    for idx, (group_name, group_file) in enumerate(available_groups.items()):
        df_group = pd.read_csv(group_file)
        
        # Take the 10 groups with the highest (worst) MAPE; the first column is assumed to be the group key
        df_group_sorted = df_group.sort_values('test_mape', ascending=False).head(10)
        
        # MAPE
        ax = axes[idx, 0]
        ax.barh(df_group_sorted.iloc[:, 0].astype(str), df_group_sorted['test_mape'] * 100, 
                color='coral', edgecolor='black')
        ax.set_xlabel('MAPE (%)')
        ax.set_title(f'{group_name} - Top 10 MAPE (Worst)')
        ax.grid(True, alpha=0.3, axis='x')
        ax.invert_yaxis()
        
        # Count
        ax = axes[idx, 1]
        df_group_count = df_group.sort_values('count', ascending=False).head(10)
        ax.barh(df_group_count.iloc[:, 0].astype(str), df_group_count['count'], 
                color='skyblue', edgecolor='black')
        ax.set_xlabel('Count')
        ax.set_title(f'{group_name} - Top 10 by Sample Count')
        ax.grid(True, alpha=0.3, axis='x')
        ax.invert_yaxis()
    
    plt.suptitle(f'Group Performance Analysis - {best_model.upper()}', fontsize=16, fontweight='bold')
    save_plot("04_group_performance")
    plt.show()

## 📊 5. Worst Predictions Analysis
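The `*_worst_predictions.csv` file is read as-is (columns `y_true` and `y_pred`), and the plot assumes its rows are already sorted by error. If you need to rebuild or re-sort it, ranking by absolute error with `DataFrame.nlargest` is one straightforward option. Using absolute rather than percentage error as the ranking key is an assumption, since the training code's criterion is not shown here; `worst_predictions` is an illustrative name.

```python
import pandas as pd

def worst_predictions(df, k=20, y_true="y_true", y_pred="y_pred"):
    """Return the k rows with the largest absolute error, largest first."""
    scored = df.assign(
        abs_error=(df[y_true] - df[y_pred]).abs(),
        pct_error=(df[y_true] - df[y_pred]).abs() / df[y_true],
    )
    return scored.nlargest(k, "abs_error")

preds = pd.DataFrame({"y_true": [100.0, 250.0, 80.0, 400.0],
                      "y_pred": [120.0, 200.0, 82.0, 390.0]})
print(worst_predictions(preds, k=2))
```

Ranking by `pct_error` instead would tend to surface cheap properties, while `abs_error` favors expensive ones; keeping both columns makes it easy to compare the two views.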

In [None]:
# Load worst predictions
worst_pred_file = best_model_dir / f"{best_model}_worst_predictions.csv"

if worst_pred_file.exists():
    worst_df = pd.read_csv(worst_pred_file)
    
    print("=" * 100)
    print(f"WORST PREDICTIONS - {best_model.upper()}")
    print("=" * 100)
    print("\n", worst_df.head(20).to_string(index=False))
    
    # Save
    worst_df.to_csv(OUTPUT_DIR / "05_worst_predictions.csv", index=False)
    print("\n💾 Saved: 05_worst_predictions.csv")
    
    # Visualize
    fig, ax = plt.subplots(figsize=(12, 6))
    top20 = worst_df.head(20)
    x = range(len(top20))
    
    ax.scatter(x, top20['y_true'], color='green', s=100, alpha=0.6, label='Actual', marker='o')
    ax.scatter(x, top20['y_pred'], color='red', s=100, alpha=0.6, label='Predicted', marker='x')
    
    for i in x:
        ax.plot([i, i], [top20.iloc[i]['y_true'], top20.iloc[i]['y_pred']], 'k--', alpha=0.3)
    
    ax.set_xlabel('Prediction Index (sorted by error)')
    ax.set_ylabel('Price (€)')
    ax.set_title('Top 20 Worst Predictions: Actual vs Predicted')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    save_plot("06_worst_predictions")
    plt.show()
else:
    print(f"⚠️  Worst predictions file not found: {worst_pred_file}")

## 📊 6. Residual Plots (from saved images)
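This section only displays PNGs saved during training. When actual and predicted values are at hand (for example the columns of the worst-predictions file), a few numeric summaries complement the plots. The NumPy-only sketch below assumes the residual convention `y_true - y_pred`; check it against the axis labels of the saved plots. `residual_summary` is an illustrative name.

```python
import numpy as np

def residual_summary(y_true, y_pred):
    """Basic residual diagnostics: bias, spread, skewness, share within ±1 std."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mean, std = r.mean(), r.std()
    # Sample skewness (biased estimator); 0 for a symmetric distribution
    skew = np.mean(((r - mean) / std) ** 3) if std > 0 else 0.0
    # Fraction of residuals within one standard deviation of the mean
    within_1std = np.mean(np.abs(r - mean) <= std)
    return {"mean": mean, "std": std, "skew": skew, "within_1std": within_1std}

print(residual_summary([10, 10, 10, 10, 10], [8, 9, 10, 11, 12]))
```

With this convention, a mean clearly above 0 means the model underpredicts on average, and a large positive skew points to a tail of properties priced well above the prediction.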

In [None]:
# Show residual plots
residual_dir = best_model_dir / f"{best_model}_residual_plots"

if residual_dir.exists():
    residual_plots = sorted(residual_dir.glob("*.png"))
    
    print(f"📊 Available residual plots: {len(residual_plots)}")
    
    for plot_path in residual_plots:
        print(f"\n{'='*60}")
        print(f"📈 {plot_path.stem.replace('_', ' ').title()}")
        print(f"{'='*60}")
        display(Image(filename=str(plot_path)))
else:
    print(f"⚠️  Residual plots not found: {residual_dir}")

## 📊 7. Prediction Intervals Analysis
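For each confidence level, the JSON file stores the target coverage, the empirical coverage and the average interval width in euros and in percent. Assuming arrays of lower and upper bounds are available, those quantities could be computed as below. The function name is hypothetical, and expressing the percentage width relative to the actual price (rather than the predicted one) is an assumption.

```python
import numpy as np

def interval_metrics(y_true, lower, upper, target_coverage):
    """Empirical coverage and width of prediction intervals at one confidence level."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    width = upper - lower
    coverage = float(inside.mean())
    return {
        "target_coverage": target_coverage,
        "coverage": coverage,
        "coverage_gap": coverage - target_coverage,  # < 0: intervals too narrow
        "average_width": float(width.mean()),
        # Assumption: width expressed relative to the actual price, in percent
        "average_width_pct": float(np.mean(width / y_true) * 100),
    }

m = interval_metrics(y_true=[100, 200, 300, 400],
                     lower=[90, 210, 280, 380],
                     upper=[110, 230, 320, 420],
                     target_coverage=0.8)
print(m)
```

Here 3 of 4 actual values fall inside their interval, so coverage is 0.75 against a target of 0.8; the diagnostics below would flag a gap of about -0.05 as under-coverage.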

In [None]:
# Load prediction intervals
pred_intervals_file = best_model_dir / f"{best_model}_prediction_intervals.json"

if pred_intervals_file.exists():
    with open(pred_intervals_file, 'r') as f:
        pred_intervals = json.load(f)
    
    print("=" * 100)
    print(f"PREDICTION INTERVALS - {best_model.upper()}")
    print("=" * 100)
    
    # Convert to DataFrame
    intervals_data = []
    for level, metrics in pred_intervals.items():
        intervals_data.append({
            'Confidence_Level': level,
            'Target_Coverage': metrics['target_coverage'],
            'Actual_Coverage': metrics['coverage'],
            'Coverage_Gap': metrics['coverage'] - metrics['target_coverage'],
            'Avg_Width_EUR': metrics['average_width'],
            'Avg_Width_PCT': metrics['average_width_pct'],
        })
    
    intervals_df = pd.DataFrame(intervals_data)
    print("\n", intervals_df.to_string(index=False))
    
    # Save
    intervals_df.to_csv(OUTPUT_DIR / "07_prediction_intervals.csv", index=False)
    print("\n💾 Saved: 07_prediction_intervals.csv")
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Coverage
    ax = axes[0]
    x = range(len(intervals_df))
    width = 0.35
    ax.bar([i - width/2 for i in x], intervals_df['Target_Coverage'], width, 
           label='Target', color='lightgreen', edgecolor='black')
    ax.bar([i + width/2 for i in x], intervals_df['Actual_Coverage'], width, 
           label='Actual', color='steelblue', edgecolor='black')
    ax.set_xticks(x)
    ax.set_xticklabels(intervals_df['Confidence_Level'])
    ax.set_ylabel('Coverage')
    ax.set_title('Prediction Interval Coverage')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')
    
    # Width
    ax = axes[1]
    ax.bar(intervals_df['Confidence_Level'], intervals_df['Avg_Width_EUR'], 
           color='coral', edgecolor='black')
    ax.set_ylabel('Average Width (€)')
    ax.set_title('Prediction Interval Width')
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.suptitle(f'Prediction Intervals Analysis - {best_model.upper()}', fontsize=16, fontweight='bold')
    save_plot("08_prediction_intervals")
    plt.show()
    
    print("\n📊 Interpretation:")
    print("  - Coverage: share of actual observations that fall inside the interval")
    print("  - Coverage Gap: actual coverage minus target coverage (ideally ~0)")
    print("  - Avg Width: mean interval width in euros")
    print("  - Avg Width %: mean interval width as a percentage of the price")
    
    # Diagnostics
    print("\n🔍 Diagnostics:")
    for _, row in intervals_df.iterrows():
        level = row['Confidence_Level']
        gap = row['Coverage_Gap']
        if abs(gap) < 0.02:
            status = "🟢 Well calibrated"
        elif gap < -0.05:
            status = "🔴 Under-coverage (interval too narrow)"
        elif gap > 0.05:
            status = "🟠 Over-coverage (interval too wide)"
        else:
            status = "🟡 Acceptable"
        
        print(f"  {level}: {status} (gap={gap:.4f})")
    
else:
    print(f"⚠️  Prediction intervals file not found: {pred_intervals_file}")

## 📋 8. Summary Report
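The report casts each metric with `float(...)` before `json.dump`. The standard `json` module rejects NumPy scalar types such as `np.int64` or `np.float32` (`np.float64` subclasses Python `float`, so it happens to pass). If more fields are added to the report, a `default=` hook is a more general alternative; `to_builtin` is an illustrative name and the values below are toy data.

```python
import json
import numpy as np

def to_builtin(obj):
    """json.dump(s) fallback: convert NumPy scalars and arrays to built-in types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

report = {"models_compared": np.int64(7), "test_mape": np.float32(0.125),
          "flags": np.array([True, False])}
print(json.dumps(report, default=to_builtin))
# {"models_compared": 7, "test_mape": 0.125, "flags": [true, false]}
```

`json.dump` calls `default` only for objects it cannot serialize natively, so built-in values in the report are unaffected.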

In [None]:
# Final report
report = {
    'best_model': best_model.upper(),
    'best_model_metrics': {
        'test_r2': float(metrics_df.iloc[0]['Test_R2']),
        'test_mae': float(metrics_df.iloc[0]['Test_MAE']),
        'test_rmse': float(metrics_df.iloc[0]['Test_RMSE']),
        'test_mape': float(metrics_df.iloc[0]['Test_MAPE']),
        'overfitting_gap': float(metrics_df.iloc[0]['Overfit_Gap_R2']),
    },
    'models_compared': len(metrics_df),
    'group_metrics_available': list(available_groups.keys()) if available_groups else [],
}

# Add prediction intervals if available
if pred_intervals_file.exists():
    report['prediction_intervals'] = pred_intervals

# Save JSON
with open(OUTPUT_DIR / "00_analysis_summary.json", 'w') as f:
    json.dump(report, f, indent=2)

print("\n" + "=" * 100)
print("📋 ANALYSIS SUMMARY")
print("=" * 100)
print(json.dumps(report, indent=2))
print("\n💾 Saved: 00_analysis_summary.json")

## ✅ Conclusions

### Generated Files

1. `00_analysis_summary.json` - Full report
2. `01_model_comparison.csv` - Metric comparison across models
3. `02_model_comparison.png` - Model comparison plots
4. `03_overfitting_analysis.png` - Overfitting analysis
5. `04_group_performance.png` - Per-group performance
6. `05_worst_predictions.csv` - Top worst predictions
7. `06_worst_predictions.png` - Worst predictions plot
8. `07_prediction_intervals.csv` - Prediction interval analysis
9. `08_prediction_intervals.png` - Prediction interval plots

Items 5-9 are written only when the corresponding training artifacts (group metrics, worst predictions, prediction intervals) are found for the best model.

### Key Insights

- **Best Model**: the model with the highest test R²
- **Overfitting**: the train-test gap shows how well each model generalizes
- **Group Performance**: identifies problematic categories
- **Prediction Intervals**: quantify the model's uncertainty
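The notebook picks the best model as the first row after sorting by test R². When two models tie, a secondary key makes the choice deterministic. Using test MAE as the tie-breaker, as in the sketch below, is an assumption rather than something the pipeline does; the frame mirrors the `Test_R2` and `Test_MAE` columns of `metrics_df` with toy values.

```python
import pandas as pd

def rank_models(df):
    """Sort by test R² (descending), then test MAE (ascending) to break ties."""
    return (df.sort_values(["Test_R2", "Test_MAE"], ascending=[False, True])
              .reset_index(drop=True))

toy_models = pd.DataFrame({
    "Model": ["A", "B", "C"],
    "Test_R2": [0.85, 0.87, 0.87],
    "Test_MAE": [15_000.0, 14_000.0, 12_500.0],
})
print(rank_models(toy_models)["Model"].tolist())
# ['C', 'B', 'A']
```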

### Next Steps

1. If overfitting is high (gap > 0.10): increase regularization
2. If interval coverage is low: revisit the calibration
3. If errors are high for specific groups: apply targeted feature engineering
4. Analyze the worst predictions for common patterns
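For step 4, one simple way to look for common patterns is to compare how often each category appears among the worst predictions versus the whole test set: a ratio well above 1 flags an over-represented group. This sketch assumes both frames carry a categorical column such as `AI_IdCategoriaCatastale` (its presence in the worst-predictions file is an assumption), and the data are toy values; `overrepresentation` is an illustrative name.

```python
import pandas as pd

def overrepresentation(worst, full, col):
    """Share of each category among the worst predictions divided by its overall share."""
    worst_share = worst[col].value_counts(normalize=True)
    full_share = full[col].value_counts(normalize=True)
    # Categories absent from `worst` get ratio 0
    ratio = (worst_share / full_share).fillna(0.0)
    return ratio.sort_values(ascending=False)

full = pd.DataFrame({"AI_IdCategoriaCatastale": ["A2"] * 6 + ["A3"] * 3 + ["C6"]})
worst = pd.DataFrame({"AI_IdCategoriaCatastale": ["C6", "A3", "A2", "A3"]})
print(overrepresentation(worst, full, "AI_IdCategoriaCatastale"))
```

In this toy case `C6` makes up 10% of all rows but 25% of the worst predictions (ratio 2.5), making it the first group to inspect.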