# Complete Guide to Regression Evaluation Metrics
## MSE, RMSE, MAE, R², and More

## Learning Objectives
By the end of this notebook, you will understand:
1. Why we need evaluation metrics
2. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
3. Mean Absolute Error (MAE)
4. R² Score (Coefficient of Determination)
5. When to use which metric
6. Advanced metrics (MAPE, MSLE, Adjusted R²)
7. Residual analysis
8. Best practices for model evaluation

---

## 1. Why Do We Need Evaluation Metrics?

### The Core Question

**How do we measure if our model is good?**

We need metrics to:
1. **Quantify performance**: Convert "good" into numbers
2. **Compare models**: Which model is better?
3. **Track improvements**: Is our new model better than the old one?
4. **Set goals**: "We need R² > 0.8"
5. **Communicate results**: Report to stakeholders

### Prediction Error

For a single prediction:
$$\text{Error}_i = y_i - \hat{y}_i$$

Where:
- $y_i$ : true value
- $\hat{y}_i$ : predicted value
- Positive error: underprediction
- Negative error: overprediction

But we need to aggregate errors across all predictions!
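
A minimal sketch of why simple averaging of signed errors is not enough. The values below are made up for illustration: the signed errors cancel to zero even though every prediction is off by 5.

```python
import numpy as np

# Hypothetical values chosen for illustration (not the notebook's data)
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([15.0, 15.0, 35.0, 35.0])

errors = y_true - y_pred                  # [-5, 5, -5, 5]
mean_error = errors.mean()                # signed errors cancel out
mean_abs_error = np.abs(errors).mean()    # magnitudes do not cancel
mean_sq_error = (errors ** 2).mean()      # squares do not cancel either

print(f"Mean error:          {mean_error:.1f}")
print(f"Mean absolute error: {mean_abs_error:.1f}")
print(f"Mean squared error:  {mean_sq_error:.1f}")
```

The absolute and squared versions are the building blocks of MAE and MSE in the sections that follow.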

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import (
    mean_squared_error, 
    mean_absolute_error, 
    r2_score,
    mean_absolute_percentage_error,
    max_error
)
import pandas as pd
from scipy import stats

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)

# Generate sample data
X = np.linspace(0, 10, 50).reshape(-1, 1)
y_true = 2 * X.flatten() + 3
y_pred = y_true + np.random.randn(50) * 1.5

# Visualize predictions
plt.figure(figsize=(12, 5))

# Predictions vs True
plt.subplot(1, 2, 1)
plt.scatter(X, y_true, color='green', s=50, alpha=0.6, label='True values')
plt.scatter(X, y_pred, color='blue', s=50, alpha=0.6, label='Predictions')
for i in range(len(X)):
    plt.plot([X[i], X[i]], [y_true[i], y_pred[i]], 'r--', alpha=0.3, linewidth=1)
plt.xlabel('Feature (x)', fontsize=11)
plt.ylabel('Target (y)', fontsize=11)
plt.title('Predictions vs True Values\n(Red lines = errors)', fontsize=12, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)

# Error distribution
plt.subplot(1, 2, 2)
errors = y_true - y_pred
plt.hist(errors, bins=20, edgecolor='black', alpha=0.7)
plt.axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero error')
plt.axvline(x=np.mean(errors), color='green', linestyle='--', linewidth=2, 
           label=f'Mean error: {np.mean(errors):.2f}')
plt.xlabel('Error (True - Predicted)', fontsize=11)
plt.ylabel('Frequency', fontsize=11)
plt.title('Distribution of Errors', fontsize=12, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("Key Insight:")
print("• Individual errors vary (some positive, some negative)")
print("• We need a single number to summarize overall performance")
print("• Different metrics emphasize different aspects of errors")

---
## 2. Mean Squared Error (MSE)

### Definition

**MSE** is the average of squared errors:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

### Properties:
1. **Never negative** (errors are squared)
2. **Units**: squared units of target (e.g., dollars²)
3. **Penalizes large errors more** than small errors
4. **Differentiable**: Good for optimization
5. **Lower is better**: 0 = perfect predictions

### Why Square Errors?
- Positive and negative errors don't cancel out
- Penalizes outliers heavily
- Mathematically convenient (differentiable)
- Related to variance

### Advantages:
- ✅ Most common metric
- ✅ Good for optimization
- ✅ Heavily penalizes large errors
- ✅ Theoretical foundation (minimizing MSE is maximum likelihood under Gaussian noise)

### Disadvantages:
- ❌ Not in original units
- ❌ Sensitive to outliers
- ❌ Hard to interpret magnitude
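
The "related to variance" point can be made concrete: MSE equals the variance of the errors plus the squared mean error (the squared bias). The sketch below checks this identity on synthetic data; the offset of 2 and the noise levels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(loc=50.0, scale=10.0, size=200)
# Synthetic predictions: a systematic offset of 2 plus random noise
y_pred = y_true + 2.0 + rng.normal(scale=3.0, size=200)

errors = y_true - y_pred
mse = np.mean(errors ** 2)
bias_sq = np.mean(errors) ** 2      # squared mean error
error_var = np.var(errors)          # population variance (ddof=0)

print(f"MSE:               {mse:.4f}")
print(f"Variance + bias^2: {error_var + bias_sq:.4f}")
```

A model can therefore have a high MSE either because its errors are widely spread or because it is systematically off in one direction.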

In [None]:
# MSE from scratch
def mse_from_scratch(y_true, y_pred):
    """
    Calculate Mean Squared Error
    """
    errors = y_true - y_pred
    squared_errors = errors ** 2
    mse = np.mean(squared_errors)
    return mse

# Test implementation
print("=== MEAN SQUARED ERROR (MSE) ===")

# Simple example
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

print("\nSimple Example:")
print(f"True values:     {y_true}")
print(f"Predictions:     {y_pred}")
print(f"Errors:          {y_true - y_pred}")
print(f"Squared errors:  {(y_true - y_pred)**2}")

mse_manual = mse_from_scratch(y_true, y_pred)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(f"\nMSE (manual):    {mse_manual:.4f}")
print(f"MSE (sklearn):   {mse_sklearn:.4f}")
print(f"Match: {np.isclose(mse_manual, mse_sklearn)}")

# Visualize how MSE penalizes errors
errors_range = np.linspace(-5, 5, 100)
squared_errors = errors_range ** 2
absolute_errors = np.abs(errors_range)

plt.figure(figsize=(12, 5))

# Loss functions
plt.subplot(1, 2, 1)
plt.plot(errors_range, squared_errors, 'b-', linewidth=2, label='Squared (MSE)')
plt.plot(errors_range, absolute_errors, 'r-', linewidth=2, label='Absolute (MAE)')
plt.xlabel('Error', fontsize=11)
plt.ylabel('Loss', fontsize=11)
plt.title('MSE vs MAE: How They Penalize Errors', fontsize=12, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.axvline(x=0, color='black', linestyle='--', linewidth=1)
plt.axhline(y=0, color='black', linestyle='--', linewidth=1)

# MSE calculation breakdown
plt.subplot(1, 2, 2)
x_pos = np.arange(len(y_true))
errors = y_true - y_pred
squared_errors = errors ** 2

plt.bar(x_pos, squared_errors, alpha=0.7, edgecolor='black')
plt.axhline(y=mse_manual, color='red', linestyle='--', linewidth=2,
           label=f'MSE = {mse_manual:.3f}')
plt.xlabel('Sample Index', fontsize=11)
plt.ylabel('Squared Error', fontsize=11)
plt.title('Squared Errors per Sample', fontsize=12, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("• Squared loss penalizes large errors much more than small ones")
print("• Error of 2 contributes 4 to MSE")
print("• Error of 4 contributes 16 to MSE (4× worse)")
print("• MSE is very sensitive to outliers!")

---
## 3. Root Mean Squared Error (RMSE)

### Definition

**RMSE** is the square root of MSE:

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

### Properties:
1. **Same units as target**: If predicting dollars, RMSE is in dollars
2. **Still penalizes large errors**: Due to squaring before averaging
3. **More interpretable** than MSE
4. **Lower is better**: 0 = perfect predictions

### Interpretation:
RMSE summarizes the **typical size of a prediction error** in the original units, with large errors weighted more heavily than in a plain average.

Example: RMSE = \$5,000 for house prices
- A typical prediction is off by roughly \$5,000

### Advantages:
- ✅ In original units (interpretable)
- ✅ Penalizes large errors
- ✅ More intuitive than MSE

### Disadvantages:
- ❌ Still sensitive to outliers
- ❌ Not scale-invariant
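
The "not scale-invariant" point can be checked directly: expressing the same prices in thousands of dollars shrinks RMSE by the same factor. This is a small sketch with made-up prices; the helper `rmse` is defined here only for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative house prices in dollars (made-up values)
prices_true = np.array([300_000, 450_000, 250_000, 600_000, 350_000])
prices_pred = np.array([310_000, 440_000, 260_000, 580_000, 355_000])

rmse_dollars = rmse(prices_true, prices_pred)
rmse_thousands = rmse(prices_true / 1000, prices_pred / 1000)

print(f"RMSE in dollars:           {rmse_dollars:,.2f}")
print(f"RMSE in thousands of USD:  {rmse_thousands:,.2f}")
```

Because of this, RMSE values are only comparable between models evaluated on the same target in the same units.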

In [None]:
# RMSE from scratch
def rmse_from_scratch(y_true, y_pred):
    """
    Calculate Root Mean Squared Error
    """
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    return rmse

print("=== ROOT MEAN SQUARED ERROR (RMSE) ===")

# House price example
y_true_prices = np.array([300000, 450000, 250000, 600000, 350000])
y_pred_prices = np.array([310000, 440000, 260000, 580000, 355000])

mse_prices = mean_squared_error(y_true_prices, y_pred_prices)
rmse_prices = np.sqrt(mse_prices)

print("\nHouse Price Predictions:")
print(f"True prices: {y_true_prices}")
print(f"Predictions: {y_pred_prices}")
print(f"Errors:      {y_true_prices - y_pred_prices}")

print(f"\nMSE:  ${mse_prices:,.0f}² (hard to interpret!)")
print(f"RMSE: ${rmse_prices:,.0f} (typical error)")

print(f"\nInterpretation:")
print(f"  A typical prediction is off by roughly ${rmse_prices:,.0f}")

# Demonstrate outlier sensitivity
print("\n=== OUTLIER SENSITIVITY ===")

y_true_no_outlier = np.array([10, 12, 11, 13, 12])
y_pred_no_outlier = np.array([10.5, 11.5, 11.5, 12.5, 11.5])

y_true_with_outlier = np.array([10, 12, 11, 13, 100])  # Added outlier
y_pred_with_outlier = np.array([10.5, 11.5, 11.5, 12.5, 11.5])

rmse_no_outlier = rmse_from_scratch(y_true_no_outlier, y_pred_no_outlier)
rmse_with_outlier = rmse_from_scratch(y_true_with_outlier, y_pred_with_outlier)

print(f"\nWithout outlier: RMSE = {rmse_no_outlier:.3f}")
print(f"With outlier:    RMSE = {rmse_with_outlier:.3f}")
print(f"\nOutlier increased RMSE by {(rmse_with_outlier/rmse_no_outlier - 1)*100:.1f}%!")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# No outlier
axes[0].scatter(range(len(y_true_no_outlier)), y_true_no_outlier, 
               color='green', s=100, label='True', zorder=3)
axes[0].scatter(range(len(y_pred_no_outlier)), y_pred_no_outlier,
               color='blue', s=100, label='Predicted', zorder=3)
for i in range(len(y_true_no_outlier)):
    axes[0].plot([i, i], [y_true_no_outlier[i], y_pred_no_outlier[i]], 
                'r--', linewidth=2, alpha=0.5)
axes[0].set_xlabel('Sample Index', fontsize=11)
axes[0].set_ylabel('Value', fontsize=11)
axes[0].set_title(f'Without Outlier\nRMSE = {rmse_no_outlier:.3f}',
                 fontsize=12, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)

# With outlier
axes[1].scatter(range(len(y_true_with_outlier)), y_true_with_outlier,
               color='green', s=100, label='True', zorder=3)
axes[1].scatter(range(len(y_pred_with_outlier)), y_pred_with_outlier,
               color='blue', s=100, label='Predicted', zorder=3)
for i in range(len(y_true_with_outlier)):
    axes[1].plot([i, i], [y_true_with_outlier[i], y_pred_with_outlier[i]],
                'r--', linewidth=2, alpha=0.5)
axes[1].scatter([4], [100], color='red', s=200, marker='*', 
               label='Outlier!', zorder=4)
axes[1].set_xlabel('Sample Index', fontsize=11)
axes[1].set_ylabel('Value', fontsize=11)
axes[1].set_title(f'With Outlier\nRMSE = {rmse_with_outlier:.3f} (Much Higher!)',
                 fontsize=12, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n⚠️  RMSE is very sensitive to outliers due to squaring!")

---
## 4. Mean Absolute Error (MAE)

### Definition

**MAE** is the average of absolute errors:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$

### Properties:
1. **Same units as target**: Direct interpretation
2. **Linear penalty**: Each error contributes in proportion to its size
3. **More robust to outliers**: Less sensitive than MSE/RMSE
4. **Lower is better**: 0 = perfect predictions

### MAE vs RMSE:

| Property | MAE | RMSE |
|----------|-----|------|
| Outlier sensitivity | Low | High |
| Error penalty | Linear | Quadratic |
| Interpretability | High | High |
| Differentiability | Not at zero | Everywhere |

### Advantages:
- ✅ Robust to outliers
- ✅ Easy to interpret
- ✅ In original units
- ✅ Errors contribute in proportion to their size

### Disadvantages:
- ❌ Not differentiable at zero
- ❌ Doesn't penalize large errors heavily
- ❌ Less commonly used in optimization
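
One way to see MAE's robustness: the best constant prediction under MAE is the median, while under MSE it is the mean. The sketch below searches a grid of candidate constants on a small sample with one outlier; the grid and the data are assumptions made for illustration.

```python
import numpy as np

# Small sample with one outlier (illustrative values)
y = np.array([10.0, 12.0, 11.0, 13.0, 100.0])

# Candidate constant predictions from 0 to 100 in steps of 0.1
candidates = np.linspace(0, 100, 1001)

# Loss of each candidate against every observation (rows = candidates)
diffs = y[None, :] - candidates[:, None]
mae_per_candidate = np.abs(diffs).mean(axis=1)
mse_per_candidate = (diffs ** 2).mean(axis=1)

best_for_mae = candidates[np.argmin(mae_per_candidate)]
best_for_mse = candidates[np.argmin(mse_per_candidate)]

print(f"Best constant under MAE: {best_for_mae:.1f} (median = {np.median(y):.1f})")
print(f"Best constant under MSE: {best_for_mse:.1f} (mean   = {np.mean(y):.1f})")
```

The outlier drags the MSE-optimal constant far from the bulk of the data, while the MAE-optimal constant stays with the majority of points.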

In [None]:
# MAE from scratch
def mae_from_scratch(y_true, y_pred):
    """
    Calculate Mean Absolute Error
    """
    errors = y_true - y_pred
    absolute_errors = np.abs(errors)
    mae = np.mean(absolute_errors)
    return mae

print("=== MEAN ABSOLUTE ERROR (MAE) ===")

# Compare MAE and RMSE
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mae = mae_from_scratch(y_true, y_pred)
rmse = rmse_from_scratch(y_true, y_pred)

print("\nSimple Example:")
print(f"True values:      {y_true}")
print(f"Predictions:      {y_pred}")
print(f"Errors:           {y_true - y_pred}")
print(f"Absolute errors:  {np.abs(y_true - y_pred)}")

print(f"\nMAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"\nRMSE ≥ MAE always (equal only when all absolute errors are the same)")

# Outlier comparison
print("\n=== MAE vs RMSE: OUTLIER ROBUSTNESS ===")

# Without outlier
y_true_clean = np.array([10, 12, 11, 13, 12])
y_pred_clean = np.array([10.5, 11.5, 11.5, 12.5, 11.5])

# With outlier
y_true_outlier = np.array([10, 12, 11, 13, 100])
y_pred_outlier = np.array([10.5, 11.5, 11.5, 12.5, 11.5])

# Calculate metrics
mae_clean = mae_from_scratch(y_true_clean, y_pred_clean)
rmse_clean = rmse_from_scratch(y_true_clean, y_pred_clean)

mae_outlier = mae_from_scratch(y_true_outlier, y_pred_outlier)
rmse_outlier = rmse_from_scratch(y_true_outlier, y_pred_outlier)

print("\nWithout outlier:")
print(f"  MAE:  {mae_clean:.3f}")
print(f"  RMSE: {rmse_clean:.3f}")

print("\nWith outlier:")
print(f"  MAE:  {mae_outlier:.3f}")
print(f"  RMSE: {rmse_outlier:.3f}")

mae_increase = (mae_outlier / mae_clean - 1) * 100
rmse_increase = (rmse_outlier / rmse_clean - 1) * 100

print(f"\nIncrease due to outlier:")
print(f"  MAE:  +{mae_increase:.1f}%")
print(f"  RMSE: +{rmse_increase:.1f}%")
print(f"\n✓ MAE is more robust to outliers!")

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Metric comparison
metrics = ['MAE', 'RMSE']
clean_values = [mae_clean, rmse_clean]
outlier_values = [mae_outlier, rmse_outlier]

x_pos = np.arange(len(metrics))
width = 0.35

axes[0].bar(x_pos - width/2, clean_values, width, label='Without outlier', alpha=0.8)
axes[0].bar(x_pos + width/2, outlier_values, width, label='With outlier', alpha=0.8)
axes[0].set_xlabel('Metric', fontsize=11)
axes[0].set_ylabel('Value', fontsize=11)
axes[0].set_title('MAE vs RMSE: Outlier Sensitivity', fontsize=12, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(metrics)
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3, axis='y')

# Percentage increase
increases = [mae_increase, rmse_increase]
colors = ['green' if i < 50 else 'orange' if i < 100 else 'red' for i in increases]

axes[1].bar(metrics, increases, color=colors, alpha=0.7, edgecolor='black')
axes[1].set_xlabel('Metric', fontsize=11)
axes[1].set_ylabel('Percentage Increase (%)', fontsize=11)
axes[1].set_title('Impact of Outlier', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

# Add value labels
for i, (metric, increase) in enumerate(zip(metrics, increases)):
    axes[1].text(i, increase + 5, f'+{increase:.0f}%', 
                ha='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

---
## 5. R² Score (Coefficient of Determination)

### Definition

**R²** measures the proportion of variance explained by the model:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

Where:
- $SS_{res}$ : Residual sum of squares (model error)
- $SS_{tot}$ : Total sum of squares (variance in data)
- $\bar{y}$ : Mean of true values

### Interpretation:

- **R² = 1**: Perfect predictions (all variance explained)
- **R² = 0**: Model no better than predicting the mean
- **R² < 0**: Model worse than predicting the mean!
- **R² = 0.8**: Model explains 80% of variance

### Properties:
1. **Scale-invariant**: Same R² regardless of units
2. **Relative measure**: Compares to baseline (mean)
3. **Easy to interpret**: Percentage of variance explained
4. **Can be negative**: For very bad models

### Advantages:
- ✅ Normalized (ranges from -∞ to 1)
- ✅ Easy to interpret
- ✅ Scale-invariant
- ✅ Compares to baseline

### Disadvantages:
- ❌ Can be misleading with small samples
- ❌ Never decreases on training data when features are added to a least-squares model
- ❌ Not suitable for all regression problems
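
The behavior of R² as features are added can be sketched with plain NumPy least squares. The setup is an assumption for illustration: one informative feature plus ten pure-noise columns, with R² measured on the training data itself.

```python
import numpy as np

def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)            # only x carries signal
noise_features = rng.normal(size=(n, 10))   # columns unrelated to y

train_r2 = []
for k in range(0, 11):
    # Design matrix: intercept, x, and the first k noise columns
    A = np.column_stack([np.ones(n), x, noise_features[:, :k]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    train_r2.append(r2(y, A @ coef))

for k, value in enumerate(train_r2):
    print(f"{k:2d} noise features: training R² = {value:.4f}")
```

Training R² creeps upward even though the added columns are pure noise, which is why held-out R² or adjusted R² is the safer basis for comparing models of different sizes.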

In [None]:
# R² from scratch
def r2_from_scratch(y_true, y_pred):
    """
    Calculate R² Score
    """
    # Residual sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    
    # Total sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    
    # R²
    r2 = 1 - (ss_res / ss_tot)
    
    return r2, ss_res, ss_tot

print("=== R² SCORE (COEFFICIENT OF DETERMINATION) ===")

# Example with different model qualities
np.random.seed(42)
X_demo = np.linspace(0, 10, 50)
y_true_demo = 2 * X_demo + 3

# Perfect model
y_pred_perfect = y_true_demo

# Good model
y_pred_good = y_true_demo + np.random.randn(50) * 1

# Poor model
y_pred_poor = y_true_demo + np.random.randn(50) * 5

# Baseline (mean)
y_pred_baseline = np.full_like(y_true_demo, np.mean(y_true_demo))

# Terrible model
y_pred_terrible = y_true_demo + np.random.randn(50) * 10

# Calculate R² for each
r2_perfect, _, _ = r2_from_scratch(y_true_demo, y_pred_perfect)
r2_good, _, _ = r2_from_scratch(y_true_demo, y_pred_good)
r2_poor, _, _ = r2_from_scratch(y_true_demo, y_pred_poor)
r2_baseline, _, _ = r2_from_scratch(y_true_demo, y_pred_baseline)
r2_terrible, _, _ = r2_from_scratch(y_true_demo, y_pred_terrible)

print("\nR² for Different Model Qualities:")
print(f"Perfect model:    R² = {r2_perfect:.4f} (explains 100% of variance)")
print(f"Good model:       R² = {r2_good:.4f} (explains {r2_good*100:.1f}% of variance)")
print(f"Poor model:       R² = {r2_poor:.4f} (explains {r2_poor*100:.1f}% of variance)")
print(f"Baseline (mean):  R² = {r2_baseline:.4f} (no better than guessing mean)")
print(f"Terrible model:   R² = {r2_terrible:.4f} (WORSE than guessing mean!)")

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

models = [
    ('Perfect', y_pred_perfect, r2_perfect),
    ('Good', y_pred_good, r2_good),
    ('Poor', y_pred_poor, r2_poor),
    ('Baseline (Mean)', y_pred_baseline, r2_baseline),
    ('Terrible', y_pred_terrible, r2_terrible)
]

for ax, (name, y_pred, r2) in zip(axes[:-1], models):
    ax.scatter(y_true_demo, y_pred, alpha=0.6, s=30)
    ax.plot([y_true_demo.min(), y_true_demo.max()],
           [y_true_demo.min(), y_true_demo.max()],
           'r--', linewidth=2, label='Perfect prediction')
    ax.set_xlabel('True Values', fontsize=10)
    ax.set_ylabel('Predictions', fontsize=10)
    ax.set_title(f'{name} Model\nR² = {r2:.3f}',
                fontsize=11, fontweight='bold')
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)

# R² interpretation guide
axes[-1].axis('off')
interpretation = """
R² INTERPRETATION GUIDE

R² = 1.0 → Perfect predictions
           All variance explained

R² = 0.9 → Excellent model
           90% variance explained

R² = 0.7 → Good model
           70% variance explained

R² = 0.5 → Moderate model
           50% variance explained

R² = 0.0 → Baseline (mean)
           No better than average

R² < 0.0 → Bad model
           Worse than guessing mean!

Context matters:
• Social sciences: R²>0.3 good
• Physical sciences: R²>0.9 expected
• Finance: R²>0.1 can be useful
"""
axes[-1].text(0.1, 0.95, interpretation, transform=axes[-1].transAxes,
             fontsize=10, verticalalignment='top', family='monospace',
             bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))

plt.tight_layout()
plt.show()

### Decomposing R²
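
The decomposition SS_tot = SS_reg + SS_res is only guaranteed for a least-squares fit with an intercept, evaluated on the data it was fitted to. The sketch below contrasts such a fit with hand-picked predictions; the first data set is made up and `sums_of_squares` is a helper defined here for illustration.

```python
import numpy as np

def sums_of_squares(y_true, y_pred):
    """Return (SS_tot, SS_reg, SS_res) for the given predictions."""
    y_mean = np.mean(y_true)
    ss_tot = np.sum((y_true - y_mean) ** 2)
    ss_reg = np.sum((y_pred - y_mean) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    return ss_tot, ss_reg, ss_res

# Case 1: least-squares line with intercept (made-up data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 16.0, 19.0, 26.0, 30.0])
slope, intercept = np.polyfit(x, y, deg=1)
tot_fit, reg_fit, res_fit = sums_of_squares(y, slope * x + intercept)

# Case 2: hand-picked predictions that are not a least-squares fit
y_true = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y_hand = np.array([12.0, 14.0, 21.0, 24.0, 29.0])
tot_hand, reg_hand, res_hand = sums_of_squares(y_true, y_hand)

print(f"Least-squares fit: SS_tot = {tot_fit:.2f}, SS_reg + SS_res = {reg_fit + res_fit:.2f}")
print(f"Hand-picked:       SS_tot = {tot_hand:.2f}, SS_reg + SS_res = {reg_hand + res_hand:.2f}")
```

For arbitrary predictions, R² should be computed as 1 - SS_res / SS_tot rather than as SS_reg / SS_tot, since the two only coincide when the decomposition holds.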

In [None]:
# Visualize R² decomposition
print("=== R² DECOMPOSITION ===")

# Simple example
y_true_simple = np.array([10, 15, 20, 25, 30])
y_pred_simple = np.array([12, 14, 21, 24, 29])
y_mean = np.mean(y_true_simple)

# Calculate components
ss_tot = np.sum((y_true_simple - y_mean) ** 2)
ss_res = np.sum((y_true_simple - y_pred_simple) ** 2)
ss_reg = np.sum((y_pred_simple - y_mean) ** 2)
r2 = 1 - (ss_res / ss_tot)

print(f"\nData: {y_true_simple}")
print(f"Predictions: {y_pred_simple}")
print(f"Mean: {y_mean:.1f}")

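print(f"\nSum of Squares:")
print(f"  SS_tot (Total):      {ss_tot:.1f} (total variation around the mean)")
print(f"  SS_reg (Explained):  {ss_reg:.1f} (variation of predictions around the mean)")
print(f"  SS_res (Residual):   {ss_res:.1f} (unexplained variation)")

# SS_tot = SS_reg + SS_res is only guaranteed for a least-squares fit with an
# intercept, evaluated on its training data. These hand-picked predictions are
# not such a fit, so report whether the identity holds instead of assuming it.
identity_holds = np.isclose(ss_tot, ss_reg + ss_res)
print(f"\n  SS_reg + SS_res = {ss_reg + ss_res:.1f}, SS_tot = {ss_tot:.1f}")
print(f"  SS_tot = SS_reg + SS_res holds here: {identity_holds}")

print(f"\nR² = 1 - (SS_res / SS_tot)")
print(f"R² = 1 - ({ss_res:.1f} / {ss_tot:.1f})")
print(f"R² = {r2:.4f}")

print(f"\nInterpretation:")
print(f"  Model explains {r2*100:.1f}% of the variance")
print(f"  {(1-r2)*100:.1f}% remains unexplained (residual)")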

# Visualize decomposition
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

x_pos = np.arange(len(y_true_simple))

# Total variance (from mean)
axes[0].bar(x_pos, y_true_simple - y_mean, alpha=0.7, color='blue')
axes[0].axhline(y=0, color='red', linestyle='--', linewidth=2, label=f'Mean = {y_mean:.1f}')
axes[0].set_xlabel('Sample', fontsize=11)
axes[0].set_ylabel('Deviation from Mean', fontsize=11)
axes[0].set_title(f'Total Variance (SS_tot = {ss_tot:.1f})',
                 fontsize=12, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3, axis='y')

# Explained variance (predictions from mean)
axes[1].bar(x_pos, y_pred_simple - y_mean, alpha=0.7, color='green')
axes[1].axhline(y=0, color='red', linestyle='--', linewidth=2, label=f'Mean = {y_mean:.1f}')
axes[1].set_xlabel('Sample', fontsize=11)
axes[1].set_ylabel('Prediction Deviation from Mean', fontsize=11)
axes[1].set_title(f'Explained by Model (SS_reg = {ss_reg:.1f})',
                 fontsize=12, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3, axis='y')

# Residual variance (true - predicted)
axes[2].bar(x_pos, y_true_simple - y_pred_simple, alpha=0.7, color='red')
axes[2].axhline(y=0, color='black', linestyle='--', linewidth=2)
axes[2].set_xlabel('Sample', fontsize=11)
axes[2].set_ylabel('Residual (Error)', fontsize=11)
axes[2].set_title(f'Unexplained (SS_res = {ss_res:.1f})',
                 fontsize=12, fontweight='bold')
axes[2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

---
## 6. When to Use Which Metric?

### Decision Guide

In [None]:
print("=" * 80)
print("METRIC SELECTION GUIDE")
print("=" * 80)

print("\n📊 METRIC COMPARISON")
print("-" * 80)
print(f"{'Metric':<12} {'Units':<18} {'Outliers':<12} {'Interpret':<12} {'Use When'}")
print("-" * 80)
print(f"{'MSE':<12} {'Target²':<18} {'Sensitive':<12} {'Hard':<12} {'Optimization'}")
print(f"{'RMSE':<12} {'Same as target':<18} {'Sensitive':<12} {'Easy':<12} {'Reporting'}")
print(f"{'MAE':<12} {'Same as target':<18} {'Robust':<12} {'Easy':<12} {'Outliers present'}")
print(f"{'R²':<12} {'Normalized':<18} {'Sensitive':<12} {'Easy':<12} {'Compare models'}")
print("-" * 80)

print("\n🎯 USE MSE/RMSE WHEN:")
print("  ✓ Training models (optimization)")
print("  ✓ Large errors are especially bad")
print("  ✓ Data is clean (few outliers)")
print("  ✓ Normal distribution of errors expected")
print("  ✓ Need differentiable loss")

print("\n🎯 USE MAE WHEN:")
print("  ✓ Outliers are present")
print("  ✓ Errors should count in proportion to their size")
print("  ✓ Robust metric needed")
print("  ✓ Reporting to non-technical audience")
print("  ✓ Median-based predictions")

print("\n🎯 USE R² WHEN:")
print("  ✓ Comparing different models")
print("  ✓ Comparing targets measured on different scales")
print("  ✓ Want normalized metric")
print("  ✓ Need to communicate % variance explained")
print("  ✓ Comparing to baseline (mean)")

print("\n⚖️  BEST PRACTICE:")
print("  → Report multiple metrics!")
print("  → Use different metrics for different purposes")
print("  → Consider your specific problem context")

print("\n" + "=" * 80)

# Practical example comparing all metrics
print("\n=== PRACTICAL EXAMPLE ===")

# Generate realistic data
np.random.seed(42)
X, y = make_regression(n_samples=100, n_features=10, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate all metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nModel Evaluation on Test Set:")
print(f"  MSE:  {mse:.4f}")
print(f"  RMSE: {rmse:.4f} (typical error)")
print(f"  MAE:  {mae:.4f} (average absolute error)")
print(f"  R²:   {r2:.4f} ({r2*100:.1f}% variance explained)")

print("\nWhat each metric tells us:")
print(f"  • On average, predictions are off by {mae:.2f} units (MAE)")
print(f"  • Typical prediction error is {rmse:.2f} units (RMSE)")
print(f"  • Model explains {r2*100:.1f}% of variance in the data (R²)")
print(f"  • RMSE ≥ MAE always; a larger gap points to a few large errors")

---
## 7. Advanced Metrics

### Mean Absolute Percentage Error (MAPE)

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

- **Scale-independent**: Percentage error
- **Interpretable**: "On average, predictions are off by X%"
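- **Problem**: Undefined when $y_i = 0$ and unstable when $y_i$ is near zero. It is also asymmetric: for positive targets and non-negative predictions, an underprediction can cost at most 100% while an overprediction is unbounded, so minimizing MAPE favors models that underpredict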
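
A small sketch of MAPE's asymmetry, using made-up targets of 100. The helper `mape` is defined here for illustration and assumes no zeros in the true values.

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE in percent; assumes y_true contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([100.0, 100.0, 100.0])

mape_half = mape(y_true, y_true / 2)      # predictions half the target
mape_double = mape(y_true, y_true * 2)    # predictions double the target
mape_zero = mape(y_true, np.zeros(3))     # predicting zero everywhere

print(f"Predicting half the target:   MAPE = {mape_half:.0f}%")
print(f"Predicting double the target: MAPE = {mape_double:.0f}%")
print(f"Predicting zero everywhere:   MAPE = {mape_zero:.0f}%")
```

Being off by a factor of two costs twice as much when the model overshoots, and a model that always predicts zero never exceeds 100%.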

### Mean Squared Logarithmic Error (MSLE)

$$MSLE = \frac{1}{n}\sum_{i=1}^{n}(\log(1 + y_i) - \log(1 + \hat{y}_i))^2$$

- **Penalizes underestimation** more than overestimation
- **Good for**: Positive targets with large range
- **Example**: Counts, prices
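
MSLE is the MSE of log-transformed values, which is where both properties above come from. The sketch below uses a NumPy helper `msle` (defined here for illustration, assuming non-negative inputs) and made-up numbers.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared logarithmic error; assumes all values are >= 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# Same absolute miss of 50, below and above the target
msle_under = msle([100.0], [50.0])
msle_over = msle([100.0], [150.0])

# Same absolute miss of 10 at two very different scales
msle_small = msle([10.0], [20.0])
msle_large = msle([1000.0], [1010.0])

print(f"Under by 50 (target 100):  MSLE = {msle_under:.4f}")
print(f"Over by 50 (target 100):   MSLE = {msle_over:.4f}")
print(f"Off by 10 at target 10:    MSLE = {msle_small:.4f}")
print(f"Off by 10 at target 1000:  MSLE = {msle_large:.6f}")
```

Because the log compresses large values, MSLE responds to relative rather than absolute differences, which suits targets spanning several orders of magnitude.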

### Adjusted R²

$$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$$

Where:
- $n$ : number of samples
- $p$ : number of features

- **Penalizes** adding more features
- **Better for model comparison** with different numbers of features
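
A worked example of the adjusted R² formula with hypothetical numbers: the same R² of 0.80 on 100 samples, once with 5 features and once with 20. The guard on `n - p - 1` is an addition for illustration, since the formula is not meaningful when that term is zero or negative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n samples and p features."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R² needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical numbers: same R² and sample size, different feature counts
r2, n = 0.80, 100
adj_5 = adjusted_r2(r2, n, p=5)      # 1 - 0.20 * 99 / 94
adj_20 = adjusted_r2(r2, n, p=20)    # 1 - 0.20 * 99 / 79

print(f"p = 5:  adjusted R² = {adj_5:.4f}")
print(f"p = 20: adjusted R² = {adj_20:.4f}")
```

With identical R², the 20-feature model receives the lower adjusted score (about 0.749 versus 0.789), reflecting the extra parameters it used to reach the same fit.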

In [None]:
# Demonstrate advanced metrics
print("=== ADVANCED METRICS ===")

# MAPE example
print("\n1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)")
print("-" * 50)

y_true_price = np.array([100, 200, 300, 400, 500])
y_pred_price = np.array([110, 190, 320, 380, 510])

# Calculate MAPE manually
mape = np.mean(np.abs((y_true_price - y_pred_price) / y_true_price)) * 100

print(f"True prices:    {y_true_price}")
print(f"Predicted:      {y_pred_price}")
print(f"\nMAPE: {mape:.2f}%")
print(f"\nInterpretation: On average, predictions are off by {mape:.1f}%")

# MSLE example
print("\n2. MEAN SQUARED LOGARITHMIC ERROR (MSLE)")
print("-" * 50)

from sklearn.metrics import mean_squared_log_error

y_true_count = np.array([10, 100, 1000, 10000])
y_pred_under = np.array([5, 50, 500, 5000])  # Underestimate
y_pred_over = np.array([15, 150, 1500, 15000])  # Overestimate

msle_under = mean_squared_log_error(y_true_count, y_pred_under)
msle_over = mean_squared_log_error(y_true_count, y_pred_over)

print(f"True counts: {y_true_count}")
print(f"\nUnderestimating by 50%: MSLE = {msle_under:.4f}")
print(f"Overestimating by 50%:  MSLE = {msle_over:.4f}")
print(f"\n✓ MSLE penalizes underestimation more!")

# Adjusted R²
print("\n3. ADJUSTED R²")
print("-" * 50)

def adjusted_r2(r2, n, p):
    """Calculate adjusted R² for n samples and p features (needs n > p + 1)"""
    return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

# Generate data
np.random.seed(42)
n_samples = 100

# Adjusted R² corrects the training-set R², so it is computed on the training
# data here. On the 20-sample test set, the 20-feature model would give
# n - p - 1 = -1 and a meaningless value.

# Model with 5 features
X_5, y_5 = make_regression(n_samples=n_samples, n_features=5, noise=10, random_state=42)
X_train_5, X_test_5, y_train_5, y_test_5 = train_test_split(X_5, y_5, test_size=0.2)

model_5 = LinearRegression()
model_5.fit(X_train_5, y_train_5)
r2_5 = model_5.score(X_train_5, y_train_5)
adj_r2_5 = adjusted_r2(r2_5, len(X_train_5), 5)

# Model with 20 features (some irrelevant)
X_20, y_20 = make_regression(n_samples=n_samples, n_features=20, 
                             n_informative=5, noise=10, random_state=42)
X_train_20, X_test_20, y_train_20, y_test_20 = train_test_split(X_20, y_20, test_size=0.2)

model_20 = LinearRegression()
model_20.fit(X_train_20, y_train_20)
r2_20 = model_20.score(X_train_20, y_train_20)
adj_r2_20 = adjusted_r2(r2_20, len(X_train_20), 20)

print("\nModel with 5 features (training set):")
print(f"  R²:          {r2_5:.4f}")
print(f"  Adjusted R²: {adj_r2_5:.4f}")

print("\nModel with 20 features (15 irrelevant, training set):")
print(f"  R²:          {r2_20:.4f}")
print(f"  Adjusted R²: {adj_r2_20:.4f}")

print("\n✓ Adjusted R² penalizes unnecessary features!")
print("  Use for comparing models with different numbers of features")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# R² vs Adjusted R²
models = ['5 features', '20 features']
r2_vals = [r2_5, r2_20]
adj_r2_vals = [adj_r2_5, adj_r2_20]

x_pos = np.arange(len(models))
width = 0.35

axes[0].bar(x_pos - width/2, r2_vals, width, label='R²', alpha=0.8)
axes[0].bar(x_pos + width/2, adj_r2_vals, width, label='Adjusted R²', alpha=0.8)
axes[0].set_ylabel('Score', fontsize=11)
axes[0].set_title('R² vs Adjusted R²', fontsize=12, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(models)
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3, axis='y')

# MAPE visualization
percentage_errors = np.abs((y_true_price - y_pred_price) / y_true_price) * 100
axes[1].bar(range(len(percentage_errors)), percentage_errors, alpha=0.7, edgecolor='black')
axes[1].axhline(y=mape, color='red', linestyle='--', linewidth=2, 
               label=f'MAPE = {mape:.1f}%')
axes[1].set_xlabel('Sample', fontsize=11)
axes[1].set_ylabel('Absolute Percentage Error (%)', fontsize=11)
axes[1].set_title('MAPE: Per-Sample Percentage Errors', fontsize=12, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

---
## 8. Residual Analysis

### Why Analyze Residuals?

Metrics give us numbers, but residuals show us **patterns**:
- Are errors random or systematic?
- Do errors depend on input values?
- Are there outliers?
- Is the model appropriate?

### Key Plots:
1. **Residuals vs Predicted**: Check for patterns
2. **Residual distribution**: Should be centered on zero and roughly normal
3. **Q-Q plot**: Check normality
4. **Residuals vs Features**: Check relationships
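
Summary metrics can hide a systematic pattern that residuals reveal. The sketch below deliberately fits a straight line to synthetic quadratic data (an assumption made for illustration): the mean residual is essentially zero, yet the residuals are strongly correlated with x².

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)   # truly quadratic relationship

# Fit a straight line (deliberately misspecified model)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

mean_resid = residuals.mean()
corr_with_x2 = np.corrcoef(residuals, x ** 2)[0, 1]

print(f"Mean residual:                 {mean_resid:.6f}")
print(f"Correlation of residuals, x^2: {corr_with_x2:.3f}")
```

A mean residual near zero is therefore not enough on its own; a residuals-vs-feature plot of this fit would show a clear U shape, signaling that a nonlinear term is missing.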

In [None]:
# Comprehensive residual analysis
print("=== RESIDUAL ANALYSIS ===")

# Load real dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Residuals
train_residuals = y_train - y_train_pred
test_residuals = y_test - y_test_pred

# Calculate metrics
print("\nTest Set Metrics:")
print(f"  RMSE: {np.sqrt(mean_squared_error(y_test, y_test_pred)):.2f}")
print(f"  MAE:  {mean_absolute_error(y_test, y_test_pred):.2f}")
print(f"  R²:   {r2_score(y_test, y_test_pred):.4f}")

# Residual statistics
print("\nResidual Statistics:")
print(f"  Mean:   {np.mean(test_residuals):.4f} (should be ~0)")
print(f"  Std:    {np.std(test_residuals):.2f}")
print(f"  Min:    {np.min(test_residuals):.2f}")
print(f"  Max:    {np.max(test_residuals):.2f}")

# Comprehensive residual plots
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# 1. Residuals vs Predicted
axes[0, 0].scatter(y_test_pred, test_residuals, alpha=0.6, s=30)
axes[0, 0].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[0, 0].set_xlabel('Predicted Values', fontsize=10)
axes[0, 0].set_ylabel('Residuals', fontsize=10)
axes[0, 0].set_title('Residuals vs Predicted\n(Should be random)', 
                    fontsize=11, fontweight='bold')
axes[0, 0].grid(True, alpha=0.3)

# 2. Residual Distribution
axes[0, 1].hist(test_residuals, bins=20, edgecolor='black', alpha=0.7)
axes[0, 1].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[0, 1].axvline(x=np.mean(test_residuals), color='green', linestyle='--', 
                  linewidth=2, label=f'Mean={np.mean(test_residuals):.2f}')
axes[0, 1].set_xlabel('Residuals', fontsize=10)
axes[0, 1].set_ylabel('Frequency', fontsize=10)
axes[0, 1].set_title('Residual Distribution\n(Should be normal)', 
                    fontsize=11, fontweight='bold')
axes[0, 1].legend(fontsize=9)
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. Q-Q Plot
stats.probplot(test_residuals, dist="norm", plot=axes[0, 2])
axes[0, 2].set_title('Q-Q Plot\n(Check normality)', fontsize=11, fontweight='bold')
axes[0, 2].grid(True, alpha=0.3)

# 4. Scale-Location (sqrt of standardized residuals vs predicted)
standardized_residuals = test_residuals / np.std(test_residuals)
axes[1, 0].scatter(y_test_pred, np.sqrt(np.abs(standardized_residuals)), 
                  alpha=0.6, s=30)
axes[1, 0].set_xlabel('Predicted Values', fontsize=10)
axes[1, 0].set_ylabel('√|Standardized Residuals|', fontsize=10)
axes[1, 0].set_title('Scale-Location Plot\n(Check homoscedasticity)', 
                    fontsize=11, fontweight='bold')
axes[1, 0].grid(True, alpha=0.3)

# 5. Residuals vs Actual
axes[1, 1].scatter(y_test, test_residuals, alpha=0.6, s=30)
axes[1, 1].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[1, 1].set_xlabel('True Values', fontsize=10)
axes[1, 1].set_ylabel('Residuals', fontsize=10)
axes[1, 1].set_title('Residuals vs Actual', fontsize=11, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

# 6. Ordered residuals
axes[1, 2].plot(sorted(test_residuals), 'o-', alpha=0.6, markersize=4)
axes[1, 2].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[1, 2].set_xlabel('Ordered Index', fontsize=10)
axes[1, 2].set_ylabel('Residual', fontsize=10)
axes[1, 2].set_title('Ordered Residuals\n(Spot outliers)', 
                    fontsize=11, fontweight='bold')
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n=== RESIDUAL ANALYSIS CHECKLIST ===")
print("\n✓ GOOD SIGNS:")
print("  • Residuals randomly scattered around zero")
print("  • No clear patterns in residual plots")
print("  • Residuals roughly normally distributed")
print("  • Mean residual close to zero")
print("  • Constant variance (homoscedasticity)")

print("\n⚠️  WARNING SIGNS:")
print("  • Curved pattern in residuals vs predicted")
print("  • Funnel shape (heteroscedasticity)")
print("  • Heavy-tailed or skewed distribution")
print("  • Clear outliers")
print("  • Systematic over/under prediction")
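
The checklist above is visual, so a couple of rough numeric companions can help when the plots are ambiguous. The sketch below is an illustration rather than part of the original notebook: the helper name `residual_checks` is hypothetical, the Shapiro-Wilk test stands in for "roughly normally distributed", and a Spearman rank correlation between the predictions and the absolute residuals is used as a simple hint of a funnel shape. The data is synthetic, with noise that grows with the prediction.

```python
import numpy as np
from scipy import stats


def residual_checks(y_true, y_pred):
    """Rough numeric companions to the visual residual checklist (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Normality: Shapiro-Wilk test; a small p-value suggests non-normal residuals
    _, shapiro_p = stats.shapiro(residuals)

    # Funnel-shape hint: do absolute residuals grow (or shrink) with the prediction?
    spread_corr, spread_p = stats.spearmanr(y_pred, np.abs(residuals))

    return {
        "mean_residual": float(np.mean(residuals)),
        "shapiro_p": float(shapiro_p),
        "spread_corr": float(spread_corr),
        "spread_p": float(spread_p),
    }


# Synthetic example (assumed data): noise standard deviation grows with the prediction
rng = np.random.default_rng(0)
y_hat = np.linspace(1, 10, 200)
y_obs = y_hat + rng.normal(0, 0.3 * y_hat)

checks = residual_checks(y_obs, y_hat)
for name, value in checks.items():
    print(f"{name:>14}: {value:.4f}")
```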

---
## 9. Complete Evaluation Framework

In [None]:
# Comprehensive evaluation function
def comprehensive_evaluation(y_true, y_pred, model_name="Model"):
    """
    Comprehensive model evaluation with all metrics and visualizations
    """
    print("=" * 70)
    print(f"COMPREHENSIVE EVALUATION: {model_name}")
    print("=" * 70)
    
    # Calculate all metrics
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    max_err = max_error(y_true, y_pred)
    
    # Residuals
    residuals = y_true - y_pred
    
    # Print metrics
    print("\n📊 PERFORMANCE METRICS")
    print("-" * 70)
    print(f"  MSE:        {mse:.4f}")
    print(f"  RMSE:       {rmse:.4f}  (typical error)")
    print(f"  MAE:        {mae:.4f}  (average absolute error)")
    print(f"  R²:         {r2:.4f}  ({r2*100:.1f}% variance explained)")
    print(f"  Max Error:  {max_err:.4f}  (worst prediction)")
    
    # Residual statistics
    print("\n📈 RESIDUAL STATISTICS")
    print("-" * 70)
    print(f"  Mean:       {np.mean(residuals):.4f}  (should be ~0)")
    print(f"  Std Dev:    {np.std(residuals):.4f}")
    print(f"  Min:        {np.min(residuals):.4f}")
    print(f"  Max:        {np.max(residuals):.4f}")
    print(f"  Median:     {np.median(residuals):.4f}")
    
    # Interpretation
    print("\n💡 INTERPRETATION")
    print("-" * 70)
    
    if r2 > 0.9:
        quality = "Excellent"
    elif r2 > 0.7:
        quality = "Good"
    elif r2 > 0.5:
        quality = "Moderate"
    elif r2 > 0:
        quality = "Poor"
    else:
        quality = "Very Poor (worse than mean)"
    
    print(f"  Model Quality: {quality}")
    print(f"  • Model explains {r2*100:.1f}% of variance in target")
    print(f"  • Typical prediction is off by {rmse:.2f} units")
    print(f"  • Average absolute error is {mae:.2f} units")
    
    if rmse > mae * 1.5:
        print("  ⚠️  RMSE >> MAE suggests presence of outliers")
    
    if abs(np.mean(residuals)) > mae * 0.1:
        print("  ⚠️  Non-zero mean residual suggests bias")
    
    # Visualizations
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Predictions vs Actual
    axes[0, 0].scatter(y_true, y_pred, alpha=0.6, s=30)
    axes[0, 0].plot([y_true.min(), y_true.max()], 
                    [y_true.min(), y_true.max()], 
                    'r--', linewidth=2, label='Perfect prediction')
    axes[0, 0].set_xlabel('True Values', fontsize=11)
    axes[0, 0].set_ylabel('Predictions', fontsize=11)
    axes[0, 0].set_title(f'{model_name}\nR² = {r2:.3f}', 
                        fontsize=12, fontweight='bold')
    axes[0, 0].legend(fontsize=10)
    axes[0, 0].grid(True, alpha=0.3)
    
    # Residuals vs Predicted
    axes[0, 1].scatter(y_pred, residuals, alpha=0.6, s=30)
    axes[0, 1].axhline(y=0, color='red', linestyle='--', linewidth=2)
    axes[0, 1].set_xlabel('Predicted Values', fontsize=11)
    axes[0, 1].set_ylabel('Residuals', fontsize=11)
    axes[0, 1].set_title('Residual Plot', fontsize=12, fontweight='bold')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Residual Distribution
    axes[1, 0].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
    axes[1, 0].axvline(x=0, color='red', linestyle='--', linewidth=2)
    axes[1, 0].set_xlabel('Residuals', fontsize=11)
    axes[1, 0].set_ylabel('Frequency', fontsize=11)
    axes[1, 0].set_title('Residual Distribution', fontsize=12, fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3, axis='y')
    
    # Metrics summary
    axes[1, 1].axis('off')
    summary_text = f"""
    METRIC SUMMARY
    
    MSE:   {mse:.3f}
    RMSE:  {rmse:.3f}
    MAE:   {mae:.3f}
    R²:    {r2:.3f}
    
    Quality: {quality}
    
    Variance Explained: {r2*100:.1f}%
    Unexplained: {(1-r2)*100:.1f}%
    
    Typical Error: {rmse:.2f}
    Worst Error: {max_err:.2f}
    """
    axes[1, 1].text(0.1, 0.9, summary_text, transform=axes[1, 1].transAxes,
                   fontsize=11, verticalalignment='top', family='monospace',
                   bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))
    
    plt.tight_layout()
    plt.show()
    
    print("\n" + "=" * 70)
    
    return {
        'mse': mse,
        'rmse': rmse,
        'mae': mae,
        'r2': r2,
        'max_error': max_err
    }

# Test comprehensive evaluation
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

results = comprehensive_evaluation(y_test, y_pred, "Ridge Regression (α=1.0)")

---
## 10. Key Takeaways

### Core Concepts:
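1. ✅ **Multiple metrics** give a more complete picture than any single one
2. ✅ **MSE/RMSE** penalize large errors heavily because errors are squared
3. ✅ **MAE** is more robust to outliers than MSE/RMSE
4. ✅ **R²** shows the fraction of variance explained (it can be negative when the model is worse than predicting the mean)
5. ✅ **Residual analysis** reveals patterns that summary metrics hide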

### Metric Formulas:

**MSE**: $\frac{1}{n}\sum(y_i - \hat{y}_i)^2$

**RMSE**: $\sqrt{MSE}$

**MAE**: $\frac{1}{n}\sum|y_i - \hat{y}_i|$

**R²**: $1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res} = \sum(y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum(y_i - \bar{y})^2$
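
To make the formulas above concrete, here is a minimal sketch that computes each metric by hand with NumPy and checks the result against the scikit-learn functions used earlier in this notebook. The four data points are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Tiny made-up example so the arithmetic is easy to follow
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 10.0])

errors = y_true - y_pred                          # y_i - y_hat_i

mse = np.mean(errors ** 2)                        # (1/n) * sum of squared errors
rmse = np.sqrt(mse)                               # square root of MSE
mae = np.mean(np.abs(errors))                     # (1/n) * sum of absolute errors

ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE:  {mae:.4f}")
print(f"R²:   {r2:.4f}")

# The hand-computed values should agree with scikit-learn
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
```

Here the errors are 0.5, -0.5, 1.0 and -1.0, so MSE = 2.5 / 4 = 0.625, MAE = 3.0 / 4 = 0.75, and R² = 1 - 2.5 / 20 = 0.875.
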

### Best Practices:
1. **Report multiple metrics**: MSE (or RMSE), MAE, and R²
2. **Always check residuals**: Look for patterns
3. **Consider context**: What matters for your problem?
4. **Use an appropriate metric**: Outliers → MAE, Optimization → MSE
5. **Compare to a baseline**: Is the model better than predicting the mean?
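
One way to act on the baseline comparison is sketched below, assuming scikit-learn's `DummyRegressor` with `strategy="mean"` as the "predict the mean" baseline. That choice is an illustration and not something the notebook prescribes. It reuses the diabetes dataset, split, and Ridge settings from the evaluation framework section.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = Ridge(alpha=1.0).fit(X_train, y_train)

scores = {}
for name, estimator in [("Mean baseline", baseline), ("Ridge", model)]:
    y_hat = estimator.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, y_hat))),
        "r2": float(r2_score(y_test, y_hat)),
    }
    print(f"{name:>14}: RMSE = {scores[name]['rmse']:.2f}, "
          f"R² = {scores[name]['r2']:.4f}")
```

A constant prediction can never score above R² = 0 on the data it is evaluated on, so the baseline row gives a floor that any useful model should clear.
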

### Quick Guide:
- **Training**: Use MSE (differentiable)
- **Reporting**: Use RMSE and R² (interpretable)
- **Outliers**: Use MAE (robust)
- **Comparing**: Use R² (normalized; compare models on the same test data)

---

**Congratulations! You understand regression evaluation metrics! 🎉**

**You can now properly assess model performance!**