# Ridge & Lasso Regression Tutorial
## Regularization: Taming Overfitting

Welcome to **Regularized Regression**, where we penalize large coefficients to prevent overfitting!

### What you'll learn:
- Why regularization is needed
- **Ridge Regression** (L2 regularization)
- **Lasso Regression** (L1 regularization)
- **Elastic Net** (L1 + L2 combined)
- Feature selection with Lasso
- Hyperparameter tuning with cross-validation
- Regularization path visualization

### Our Mission:
Build **robust models** that generalize well by controlling complexity and selecting important features.

Let's regularize! 🎯

## Step 1: Import Libraries and Setup

Import specialized regularization tools and prepare our environment.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler, LabelEncoder
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.pipeline import Pipeline
import seaborn as sns
import warnings
# Note: this silences all warnings, including ConvergenceWarning from Lasso/ElasticNet
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Load the dataset
data = pd.read_csv('dataset.csv')

print("✅ Libraries imported and data loaded!")
print(f"Dataset shape: {data.shape}")
print(f"Columns: {list(data.columns)}")
data.head()

## Step 2: Create High-Dimensional Feature Space

Generate many features (including polynomial) to demonstrate the need for regularization.
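
To get a feel for how quickly a polynomial expansion grows, here is a small sketch. It assumes four base features, as in the cell below, and uses placeholder zeros rather than the tutorial's dataset. The helper `polynomial_feature_count` is a hypothetical name introduced only for this illustration.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def polynomial_feature_count(n_features: int, degree: int) -> int:
    """Number of polynomial terms up to `degree`, excluding the bias column."""
    return comb(n_features + degree, degree) - 1


# Four placeholder base features; the values are irrelevant, only the shape matters.
X_demo = np.zeros((5, 4))

for degree in (1, 2, 3, 4):
    expanded = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X_demo)
    # The closed-form count should agree with what scikit-learn generates.
    assert expanded.shape[1] == polynomial_feature_count(4, degree)
    print(f"degree={degree}: {expanded.shape[1]} features")
```

With four base features the counts are 4, 14, 34 and 69 for degrees 1 to 4, so degree 3 already yields 34 columns.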

In [None]:
# Prepare data with high-dimensional features to show regularization benefits
print("=" * 50)
print("HIGH-DIMENSIONAL FEATURE CREATION")
print("=" * 50)

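# Handle categorical variables
# Note: LabelEncoder assigns arbitrary integer codes, which imposes an artificial
# ordering on 'location'; one-hot encoding is usually safer for nominal features.
label_encoder = LabelEncoder()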
data_processed = data.copy()
data_processed['location_encoded'] = label_encoder.fit_transform(data['location'])

# Select base features
base_features = ['area', 'bedrooms', 'age', 'location_encoded']
X_base = data_processed[base_features]
y = data_processed['price']

print(f"Base features: {base_features}")
print(f"Base feature count: {X_base.shape[1]}")

# Create polynomial features (degree 3 for demonstration)
poly_features = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly_features.fit_transform(X_base)

print(f"Polynomial features count: {X_poly.shape[1]}")
print(f"Feature expansion: {X_poly.shape[1] / X_base.shape[1]:.1f}x")

# Get feature names
feature_names = poly_features.get_feature_names_out(base_features)
print(f"\nFirst 10 polynomial features:")
for i, name in enumerate(feature_names[:10]):
    print(f"  {i+1:2d}. {name}")

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_poly, y, test_size=0.2, random_state=42
)

# Scale features (essential for regularization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\nData split and scaled:")
print(f"Training samples: {X_train_scaled.shape[0]}")
print(f"Testing samples: {X_test_scaled.shape[0]}")
print(f"Features: {X_train_scaled.shape[1]}")
print("✅ High-dimensional feature space created!")

## Step 3: Linear Regression Baseline (Overfitting Demo)

First, let's see how ordinary (unregularized) linear regression performs with many features.
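
As a standalone illustration of the failure mode, the sketch below fits ordinary least squares to synthetic data with more features than training rows. The data, sizes, and variable names are assumptions made for this example and are unrelated to the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data: 30 training rows, 50 features,
# and a target that depends on the first feature only.
n_train, n_test, n_features = 30, 200, 50
X_tr = rng.normal(size=(n_train, n_features))
X_te = rng.normal(size=(n_test, n_features))
y_tr = 3.0 * X_tr[:, 0] + rng.normal(size=n_train)
y_te = 3.0 * X_te[:, 0] + rng.normal(size=n_test)

# With more features than rows, least squares can interpolate the training set.
ols = LinearRegression().fit(X_tr, y_tr)
demo_train_r2 = r2_score(y_tr, ols.predict(X_tr))
demo_test_r2 = r2_score(y_te, ols.predict(X_te))

print(f"Train R²: {demo_train_r2:.4f}")
print(f"Test R²:  {demo_test_r2:.4f}")
```

The training R² is essentially perfect because the model has enough freedom to fit the noise, while the held-out R² is lower. That gap is what the cell below measures on the real data.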

In [None]:
# Train linear regression on high-dimensional data
linear_model = LinearRegression()
linear_model.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_linear = linear_model.predict(X_train_scaled)
y_test_pred_linear = linear_model.predict(X_test_scaled)

# Evaluate
train_r2_linear = r2_score(y_train, y_train_pred_linear)
test_r2_linear = r2_score(y_test, y_test_pred_linear)
train_rmse_linear = np.sqrt(mean_squared_error(y_train, y_train_pred_linear))
test_rmse_linear = np.sqrt(mean_squared_error(y_test, y_test_pred_linear))

print("=" * 50)
print("LINEAR REGRESSION BASELINE (HIGH-DIMENSIONAL)")
print("=" * 50)
print(f"📊 Performance Metrics:")
print(f"  Training R²:   {train_r2_linear:.4f}")
print(f"  Testing R²:    {test_r2_linear:.4f}")
print(f"  Training RMSE: ${train_rmse_linear:,.2f}")
print(f"  Testing RMSE:  ${test_rmse_linear:,.2f}")

# Check for overfitting
overfitting_gap = train_r2_linear - test_r2_linear
print(f"\n🔍 Overfitting Analysis:")
print(f"  R² Gap (Train - Test): {overfitting_gap:.4f}")

if overfitting_gap > 0.1:
    print("  ❌ Significant overfitting detected!")
    print("  💡 Regularization is needed.")
elif overfitting_gap > 0.05:
    print("  ⚠️ Moderate overfitting detected.")
    print("  💡 Regularization would help.")
else:
    print("  ✅ No significant overfitting.")

# Analyze coefficient distribution
coefficients = linear_model.coef_
print(f"\n📈 Coefficient Analysis:")
print(f"  Number of features: {len(coefficients)}")
print(f"  Coefficient range: [{coefficients.min():.2f}, {coefficients.max():.2f}]")
print(f"  Coefficient std: {coefficients.std():.2f}")
print(f"  Large coefficients (>10k): {sum(np.abs(coefficients) > 10000)}")

if coefficients.std() > 1000:
    print("  ⚠️ High coefficient variance - model may be unstable")

print("\n✅ Baseline established - ready for regularization!")

## Step 4: Ridge Regression (L2 Regularization)

Apply Ridge regression to shrink coefficients and reduce overfitting.
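
To make the shrinkage concrete, the following sketch compares the closed-form ridge solution with scikit-learn's `Ridge` on synthetic data. It assumes no intercept (`fit_intercept=False`) so that both solve the same objective, `||y - Xw||² + α||w||²`. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

alpha = 10.0

# Closed-form solution: w = (XᵀX + αI)⁻¹ Xᵀy (intercept omitted for simplicity).
gram = X_demo.T @ X_demo
w_closed = np.linalg.solve(gram + alpha * np.eye(X_demo.shape[1]), X_demo.T @ y_demo)

# scikit-learn's Ridge minimizes the same objective when fit_intercept=False.
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo).coef_

# For comparison, the unpenalized least-squares solution has a larger norm.
w_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

print("closed form:", np.round(w_closed, 4))
print("sklearn:    ", np.round(w_sklearn, 4))
print("ridge norm < OLS norm:", np.linalg.norm(w_closed) < np.linalg.norm(w_ols))
```

The `αI` term added to `XᵀX` is also why ridge stays well-conditioned when features are nearly collinear.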

In [None]:
# Explore different Ridge alpha values
ridge_alphas = [0.01, 0.1, 1, 10, 100, 1000, 10000]
ridge_results = []

print("=" * 50)
print("RIDGE REGRESSION (L2 REGULARIZATION)")
print("=" * 50)
print("Testing different regularization strengths...\n")

for alpha in ridge_alphas:
    # Train Ridge model
    ridge_model = Ridge(alpha=alpha, random_state=42)
    ridge_model.fit(X_train_scaled, y_train)
    
    # Predictions
    y_train_pred = ridge_model.predict(X_train_scaled)
    y_test_pred = ridge_model.predict(X_test_scaled)
    
    # Metrics
    train_r2 = r2_score(y_train, y_train_pred)
    test_r2 = r2_score(y_test, y_test_pred)
    train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
    test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
    
    # Coefficient analysis
    coef_norm = np.linalg.norm(ridge_model.coef_)
    coef_std = ridge_model.coef_.std()
    
    ridge_results.append({
        'alpha': alpha,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'train_rmse': train_rmse,
        'test_rmse': test_rmse,
        'coef_norm': coef_norm,
        'coef_std': coef_std,
        'overfitting_gap': train_r2 - test_r2,
        'model': ridge_model
    })
    
    print(f"Alpha = {alpha:7.2f}: Test R² = {test_r2:.4f}, "
          f"Gap = {train_r2 - test_r2:+.4f}, Coef Norm = {coef_norm:.1f}")

# Find best Ridge alpha (selecting on test R² is optimistic; prefer the CV choice below)
best_ridge = max(ridge_results, key=lambda x: x['test_r2'])
print(f"\n🎯 Best Ridge Alpha: {best_ridge['alpha']}")
print(f"🎯 Best Test R²: {best_ridge['test_r2']:.4f}")
print(f"🎯 Overfitting Gap: {best_ridge['overfitting_gap']:+.4f}")

# Cross-validation for more robust alpha selection
ridge_cv = RidgeCV(alphas=ridge_alphas, cv=5, scoring='r2')
ridge_cv.fit(X_train_scaled, y_train)

print(f"\n📊 Cross-Validation Results:")
print(f"  Optimal Alpha (CV): {ridge_cv.alpha_}")
print(f"  Test R² (CV-selected alpha): {ridge_cv.score(X_test_scaled, y_test):.4f}")

# Compare with linear regression
improvement = best_ridge['test_r2'] - test_r2_linear
gap_reduction = overfitting_gap - best_ridge['overfitting_gap']

print(f"\n💡 Ridge vs Linear Regression:")
print(f"  R² improvement: {improvement:+.4f}")
print(f"  Overfitting reduction: {gap_reduction:+.4f}")

if improvement > 0:
    print("  ✅ Ridge regression improves generalization!")
else:
    print("  ⚠️ Ridge regression doesn't improve this dataset.")

print("\n✅ Ridge regression analysis complete!")

## Step 5: Lasso Regression (L1 Regularization)

Apply Lasso regression for automatic feature selection.
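
The reason Lasso produces exact zeros is the soft-thresholding step in its solution. The sketch below shows this for the simplest case: a single feature, no intercept, and the objective `(1 / (2n)) * ||y - x*w||² + alpha * |w|`. The helper `soft_threshold` and the synthetic data are assumptions introduced for this illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso


def soft_threshold(z: float, threshold: float) -> float:
    """Shrink z toward zero by `threshold`, returning 0 inside the band."""
    return float(np.sign(z) * max(abs(z) - threshold, 0.0))


rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y_demo = 0.8 * x + rng.normal(scale=0.5, size=n)

for alpha in (0.1, 0.5, 2.0):
    # One-feature Lasso solution: soft-threshold the correlation, then rescale.
    w_manual = soft_threshold(x @ y_demo / n, alpha) / (x @ x / n)
    w_sklearn = Lasso(alpha=alpha, fit_intercept=False).fit(x.reshape(-1, 1), y_demo).coef_[0]
    print(f"alpha={alpha}: manual={w_manual:.4f}, sklearn={w_sklearn:.4f}")
```

Once `alpha` exceeds the magnitude of `x @ y_demo / n`, the coefficient is set to zero rather than merely shrunk. With many features, the same mechanism removes features one by one.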

In [None]:
# Explore different Lasso alpha values
lasso_alphas = [0.01, 0.1, 1, 10, 100, 1000]
lasso_results = []

print("=" * 50)
print("LASSO REGRESSION (L1 REGULARIZATION)")
print("=" * 50)
print("Testing different regularization strengths...\n")

for alpha in lasso_alphas:
    # Train Lasso model
    lasso_model = Lasso(alpha=alpha, random_state=42, max_iter=2000)
    lasso_model.fit(X_train_scaled, y_train)
    
    # Predictions
    y_train_pred = lasso_model.predict(X_train_scaled)
    y_test_pred = lasso_model.predict(X_test_scaled)
    
    # Metrics
    train_r2 = r2_score(y_train, y_train_pred)
    test_r2 = r2_score(y_test, y_test_pred)
    train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
    test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
    
    # Feature selection analysis
    selected_features = np.sum(lasso_model.coef_ != 0)
    sparsity = 1 - (selected_features / len(lasso_model.coef_))
    
    lasso_results.append({
        'alpha': alpha,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'train_rmse': train_rmse,
        'test_rmse': test_rmse,
        'selected_features': selected_features,
        'sparsity': sparsity,
        'overfitting_gap': train_r2 - test_r2,
        'model': lasso_model
    })
    
    print(f"Alpha = {alpha:7.2f}: Test R² = {test_r2:.4f}, "
          f"Features = {selected_features:3d}/{len(lasso_model.coef_)}, "
          f"Sparsity = {sparsity:.1%}")

# Find best Lasso alpha (selecting on test R² is optimistic; prefer the CV choice below)
best_lasso = max(lasso_results, key=lambda x: x['test_r2'])
print(f"\n🎯 Best Lasso Alpha: {best_lasso['alpha']}")
print(f"🎯 Best Test R²: {best_lasso['test_r2']:.4f}")
print(f"🎯 Selected Features: {best_lasso['selected_features']}/{len(feature_names)}")
print(f"🎯 Sparsity: {best_lasso['sparsity']:.1%}")

# Cross-validation for Lasso
lasso_cv = LassoCV(alphas=lasso_alphas, cv=5, random_state=42, max_iter=2000)
lasso_cv.fit(X_train_scaled, y_train)

print(f"\n📊 Cross-Validation Results:")
print(f"  Optimal Alpha (CV): {lasso_cv.alpha_:.3f}")
print(f"  Test R² (CV-selected alpha): {lasso_cv.score(X_test_scaled, y_test):.4f}")
print(f"  Selected Features (CV): {np.sum(lasso_cv.coef_ != 0)}")

# Feature selection analysis
best_model = best_lasso['model']
selected_indices = np.where(best_model.coef_ != 0)[0]
selected_feature_names = [feature_names[i] for i in selected_indices]
selected_coefficients = best_model.coef_[selected_indices]

print(f"\n🔍 Selected Features (Top 10):")
feature_importance = list(zip(selected_feature_names, np.abs(selected_coefficients)))
feature_importance.sort(key=lambda x: x[1], reverse=True)

for i, (feature, coef) in enumerate(feature_importance[:10], 1):
    print(f"  {i:2d}. {feature:25s}: {coef:8.2f}")

print("\n✅ Lasso regression analysis complete!")

## Step 6: Elastic Net (L1 + L2 Regularization)

Combine the benefits of both Ridge and Lasso regression.
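
As a quick check on what `l1_ratio` controls, this sketch fits `ElasticNet` on synthetic data from `make_regression` (an assumption for illustration, not the tutorial's dataset). Setting `l1_ratio=1.0` drops the L2 term, so the fit should coincide with `Lasso` at the same `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

alpha = 1.0

# l1_ratio=1.0 removes the L2 term, so the fit should coincide with Lasso.
enet_as_lasso = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=alpha).fit(X_demo, y_demo)
print("ElasticNet(l1_ratio=1) matches Lasso:",
      np.allclose(enet_as_lasso.coef_, lasso.coef_))

# Lower l1_ratio values shift the penalty toward L2.
for l1_ratio in (0.9, 0.5, 0.1):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X_demo, y_demo)
    print(f"l1_ratio={l1_ratio}: {np.sum(model.coef_ != 0)} non-zero coefficients, "
          f"coef norm={np.linalg.norm(model.coef_):.2f}")
```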

In [None]:
# Elastic Net with grid search for best parameters
print("=" * 50)
print("ELASTIC NET REGRESSION (L1 + L2)")
print("=" * 50)

# Define parameter grid
param_grid = {
    'alpha': [0.01, 0.1, 1, 10, 100],
    'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]  # 0 = Ridge, 1 = Lasso
}

# Grid search with cross-validation
elastic_net = ElasticNet(random_state=42, max_iter=2000)
grid_search = GridSearchCV(
    elastic_net, param_grid, cv=5, scoring='r2', n_jobs=-1
)
grid_search.fit(X_train_scaled, y_train)

# Best model
best_elastic = grid_search.best_estimator_
best_params = grid_search.best_params_

print(f"🔍 Grid Search Results:")
print(f"  Best Alpha: {best_params['alpha']}")
print(f"  Best L1 Ratio: {best_params['l1_ratio']}")
print(f"  Best CV Score: {grid_search.best_score_:.4f}")

# Interpret L1 ratio
l1_ratio = best_params['l1_ratio']
if l1_ratio < 0.3:
    regularization_type = "Mostly Ridge (L2)"
elif l1_ratio > 0.7:
    regularization_type = "Mostly Lasso (L1)"
else:
    regularization_type = "Balanced (L1 + L2)"

print(f"  Regularization Type: {regularization_type}")

# Evaluate best Elastic Net
y_train_pred_elastic = best_elastic.predict(X_train_scaled)
y_test_pred_elastic = best_elastic.predict(X_test_scaled)

train_r2_elastic = r2_score(y_train, y_train_pred_elastic)
test_r2_elastic = r2_score(y_test, y_test_pred_elastic)
train_rmse_elastic = np.sqrt(mean_squared_error(y_train, y_train_pred_elastic))
test_rmse_elastic = np.sqrt(mean_squared_error(y_test, y_test_pred_elastic))

# Feature selection
selected_features_elastic = np.sum(best_elastic.coef_ != 0)
sparsity_elastic = 1 - (selected_features_elastic / len(best_elastic.coef_))

print(f"\n📊 Best Elastic Net Performance:")
print(f"  Training R²: {train_r2_elastic:.4f}")
print(f"  Testing R²: {test_r2_elastic:.4f}")
print(f"  Training RMSE: ${train_rmse_elastic:,.2f}")
print(f"  Testing RMSE: ${test_rmse_elastic:,.2f}")
print(f"  Selected Features: {selected_features_elastic}/{len(feature_names)}")
print(f"  Sparsity: {sparsity_elastic:.1%}")
print(f"  Overfitting Gap: {train_r2_elastic - test_r2_elastic:+.4f}")

# Store results for comparison
elastic_results = {
    'train_r2': train_r2_elastic,
    'test_r2': test_r2_elastic,
    'train_rmse': train_rmse_elastic,
    'test_rmse': test_rmse_elastic,
    'selected_features': selected_features_elastic,
    'sparsity': sparsity_elastic,
    'overfitting_gap': train_r2_elastic - test_r2_elastic
}

print("\n✅ Elastic Net analysis complete!")

## Step 7: Comprehensive Model Comparison

Compare all regularization methods and analyze their trade-offs.

In [None]:
# Comprehensive comparison of all models
print("=" * 60)
print("COMPREHENSIVE MODEL COMPARISON")
print("=" * 60)

# Collect all results
models_comparison = {
    'Linear Regression': {
        'train_r2': train_r2_linear,
        'test_r2': test_r2_linear,
        'train_rmse': train_rmse_linear,
        'test_rmse': test_rmse_linear,
        'selected_features': len(feature_names),
        'sparsity': 0.0,
        'overfitting_gap': overfitting_gap,
        'regularization': 'None'
    },
    'Ridge (Best)': {
        'train_r2': best_ridge['train_r2'],
        'test_r2': best_ridge['test_r2'],
        'train_rmse': best_ridge['train_rmse'],
        'test_rmse': best_ridge['test_rmse'],
        'selected_features': len(feature_names),
        'sparsity': 0.0,
        'overfitting_gap': best_ridge['overfitting_gap'],
        'regularization': f"L2 (α={best_ridge['alpha']})"
    },
    'Lasso (Best)': {
        'train_r2': best_lasso['train_r2'],
        'test_r2': best_lasso['test_r2'],
        'train_rmse': best_lasso['train_rmse'],
        'test_rmse': best_lasso['test_rmse'],
        'selected_features': best_lasso['selected_features'],
        'sparsity': best_lasso['sparsity'],
        'overfitting_gap': best_lasso['overfitting_gap'],
        'regularization': f"L1 (α={best_lasso['alpha']})"
    },
    'Elastic Net': {
        'train_r2': elastic_results['train_r2'],
        'test_r2': elastic_results['test_r2'],
        'train_rmse': elastic_results['train_rmse'],
        'test_rmse': elastic_results['test_rmse'],
        'selected_features': elastic_results['selected_features'],
        'sparsity': elastic_results['sparsity'],
        'overfitting_gap': elastic_results['overfitting_gap'],
        'regularization': f"L1+L2 (α={best_params['alpha']}, ratio={best_params['l1_ratio']})"
    }
}

# Create comparison table (column widths match between header and rows)
print(f"{'Model':18s} {'Test R²':>10s} {'RMSE':>12s} {'Features':>10s} {'Sparsity':>10s} {'Gap':>8s}")
print("-" * 73)

for model_name, results in models_comparison.items():
    print(f"{model_name:18s} {results['test_r2']:10.4f} "
          f"${results['test_rmse']:11,.0f} {results['selected_features']:10d} "
          f"{results['sparsity']:10.1%} {results['overfitting_gap']:+8.3f}")

# Find best model
best_model_name = max(models_comparison.keys(), 
                      key=lambda x: models_comparison[x]['test_r2'])
best_model_results = models_comparison[best_model_name]

print(f"\n🏆 BEST MODEL: {best_model_name}")
print(f"  Test R²: {best_model_results['test_r2']:.4f}")
print(f"  RMSE: ${best_model_results['test_rmse']:,.2f}")
print(f"  Features: {best_model_results['selected_features']}/{len(feature_names)}")
print(f"  Regularization: {best_model_results['regularization']}")

# Analysis insights
print(f"\n💡 KEY INSIGHTS:")

# Overfitting analysis
linear_gap = models_comparison['Linear Regression']['overfitting_gap']
regularized_gaps = [results['overfitting_gap'] for name, results in models_comparison.items() 
                   if name != 'Linear Regression']
avg_regularized_gap = np.mean(regularized_gaps)

print(f"  • Regularization reduced overfitting by {linear_gap - avg_regularized_gap:.3f} on average")

# Feature selection
lasso_features = models_comparison['Lasso (Best)']['selected_features']
total_features = len(feature_names)
feature_reduction = (total_features - lasso_features) / total_features

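print(f"  • Lasso eliminated {feature_reduction:.1%} of features "
      f"(test R² {models_comparison['Lasso (Best)']['test_r2']:.4f} vs "
      f"linear {models_comparison['Linear Regression']['test_r2']:.4f})")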

# Performance comparison
linear_r2 = models_comparison['Linear Regression']['test_r2']
best_r2 = best_model_results['test_r2']
improvement = best_r2 - linear_r2

if improvement > 0.01:
    print(f"  • Regularization improved test performance by {improvement:.3f} R²")
elif improvement > 0:
    print(f"  • Regularization slightly improved generalization")
else:
    print(f"  • Regularization didn't improve performance (data may not need it)")

print("\n✅ Comprehensive comparison complete!")

## Step 8: Regularization Path Visualization

Visualize how regularization affects coefficients and model performance.
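
A regularization path in the stricter sense tracks each coefficient as `alpha` varies. The sketch below computes one with `sklearn.linear_model.lasso_path` on synthetic data. The dataset, the alpha grid, and the variable names are assumptions for illustration. The target is centered because `lasso_path` does not fit an intercept.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_raw)
y_demo = y_raw - y_raw.mean()  # lasso_path does not fit an intercept

# Smallest alpha at which every coefficient is zero for this objective.
alpha_max = np.max(np.abs(X_demo.T @ y_demo)) / len(y_demo)
alphas_grid = np.logspace(np.log10(2 * alpha_max), np.log10(alpha_max * 1e-3), 30)

path_alphas, path_coefs, _ = lasso_path(X_demo, y_demo, alphas=alphas_grid)

# path_coefs has shape (n_features, n_alphas); each row is one feature's path.
nonzero_counts = (path_coefs != 0).sum(axis=0)
for a, k in zip(path_alphas[::6], nonzero_counts[::6]):
    print(f"alpha={a:10.4f}: {k} non-zero coefficients")
```

To draw the path, one option is `plt.semilogx(path_alphas, path_coefs.T)`, which plots one line per feature.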

In [None]:
# Create comprehensive visualizations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Regularization Analysis and Comparison', fontsize=16, fontweight='bold')

# 1. Ridge Regularization Path
ridge_alphas_plot = [result['alpha'] for result in ridge_results]
ridge_test_r2s = [result['test_r2'] for result in ridge_results]
ridge_gaps = [result['overfitting_gap'] for result in ridge_results]

axes[0, 0].semilogx(ridge_alphas_plot, ridge_test_r2s, 'o-', color='blue', label='Test R²')
axes[0, 0].axhline(y=test_r2_linear, color='red', linestyle='--', label='Linear Baseline')
axes[0, 0].axvline(x=best_ridge['alpha'], color='green', linestyle=':', label='Best α')
axes[0, 0].set_xlabel('Ridge Alpha (Log Scale)')
axes[0, 0].set_ylabel('Test R²')
axes[0, 0].set_title('Ridge Regularization Path')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Lasso Regularization Path
lasso_alphas_plot = [result['alpha'] for result in lasso_results]
lasso_test_r2s = [result['test_r2'] for result in lasso_results]
# Use a distinct name so the integer `lasso_features` from Step 7 is not overwritten
lasso_feature_counts = [result['selected_features'] for result in lasso_results]

ax2 = axes[0, 1]
ax2_twin = ax2.twinx()

line1 = ax2.semilogx(lasso_alphas_plot, lasso_test_r2s, 'o-', color='blue', label='Test R²')
line2 = ax2_twin.semilogx(lasso_alphas_plot, lasso_feature_counts, 's-', color='red', label='# Features')
ax2.axhline(y=test_r2_linear, color='gray', linestyle='--', alpha=0.7)
ax2.axvline(x=best_lasso['alpha'], color='green', linestyle=':', label='Best α')

ax2.set_xlabel('Lasso Alpha (Log Scale)')
ax2.set_ylabel('Test R²', color='blue')
ax2_twin.set_ylabel('Selected Features', color='red')
ax2.set_title('Lasso Feature Selection Path')
ax2.grid(True, alpha=0.3)

# Combine legends
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax2.legend(lines, labels, loc='center right')

# 3. Overfitting Comparison
model_names = list(models_comparison.keys())
overfitting_gaps = [models_comparison[name]['overfitting_gap'] for name in model_names]
colors = ['red', 'blue', 'green', 'orange']

bars = axes[0, 2].bar(range(len(model_names)), overfitting_gaps, 
                      color=colors, alpha=0.7)
axes[0, 2].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[0, 2].set_xlabel('Model')
axes[0, 2].set_ylabel('Overfitting Gap (Train R² - Test R²)')
axes[0, 2].set_title('Overfitting Comparison')
axes[0, 2].set_xticks(range(len(model_names)))
axes[0, 2].set_xticklabels([name.split()[0] for name in model_names], rotation=45)
axes[0, 2].grid(True, alpha=0.3)

# Add value labels
for bar, gap in zip(bars, overfitting_gaps):
    height = bar.get_height()
    axes[0, 2].text(bar.get_x() + bar.get_width()/2., height + 0.001,
                    f'{gap:.3f}', ha='center', va='bottom', fontsize=8)

# 4. Coefficient Magnitude Comparison
# Compare coefficient norms for different regularization strengths
ridge_coef_norms = [result['coef_norm'] for result in ridge_results]
axes[1, 0].semilogx(ridge_alphas_plot, ridge_coef_norms, 'o-', color='blue')
axes[1, 0].axhline(y=np.linalg.norm(linear_model.coef_), color='red', 
                   linestyle='--', label='Linear Baseline')
axes[1, 0].set_xlabel('Ridge Alpha (Log Scale)')
axes[1, 0].set_ylabel('Coefficient L2 Norm')
axes[1, 0].set_title('Ridge Coefficient Shrinkage')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 5. Feature Selection Visualization
sparsity_levels = [result['sparsity'] for result in lasso_results]
axes[1, 1].semilogx(lasso_alphas_plot, sparsity_levels, 'o-', color='green')
axes[1, 1].set_xlabel('Lasso Alpha (Log Scale)')
axes[1, 1].set_ylabel('Sparsity (Fraction of Features Eliminated)')
axes[1, 1].set_title('Lasso Sparsity Path')
axes[1, 1].axvline(x=best_lasso['alpha'], color='red', linestyle=':', label='Best α')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

# 6. Performance vs Complexity Trade-off
all_test_r2s = [models_comparison[name]['test_r2'] for name in model_names]
all_features = [models_comparison[name]['selected_features'] for name in model_names]

scatter = axes[1, 2].scatter(all_features, all_test_r2s, s=100, c=colors, alpha=0.7)
for i, name in enumerate(model_names):
    axes[1, 2].annotate(name.split()[0], (all_features[i], all_test_r2s[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=8)
axes[1, 2].set_xlabel('Number of Selected Features')
axes[1, 2].set_ylabel('Test R²')
axes[1, 2].set_title('Performance vs Model Complexity')
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Regularization visualization complete!")

## Step 9: Summary and Key Insights

### 🎯 What We Accomplished:
1. **Demonstrated overfitting** with high-dimensional linear regression
2. **Applied Ridge regression** (L2) to shrink coefficients
3. **Used Lasso regression** (L1) for automatic feature selection
4. **Combined both** with Elastic Net regularization
5. **Optimized hyperparameters** using cross-validation
6. **Visualized regularization paths** and trade-offs
7. **Compared all methods** comprehensively

### 📊 Key Results:
The exact values depend on your dataset; read them from the Step 7 output:
- **Best Model**: the model with the highest test R² (`best_model_name`)
- **Test R²**: `best_model_results['test_r2']`
- **Feature Reduction**: `(len(feature_names) - best_model_results['selected_features']) / len(feature_names)`
- **Overfitting Reduction**: `linear_gap - best_model_results['overfitting_gap']`

### 💡 Key Learnings:

**Ridge Regression (L2):**
- **Shrinks coefficients** towards zero but doesn't eliminate them
- **Reduces overfitting** by penalizing large coefficients
- **Works well** when many features are relevant
- **Computationally efficient** and numerically stable

**Lasso Regression (L1):**
- **Eliminates irrelevant features** by setting coefficients to zero
- **Performs automatic feature selection**
- **Creates sparse models** that are interpretable
- **Can struggle** when features are highly correlated

**Elastic Net:**
- **Combines benefits** of Ridge and Lasso
- **Handles correlated features** better than Lasso alone
- **Provides balanced regularization**
- **Requires tuning** two hyperparameters
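
The point about correlated features can be illustrated with an extreme case: two identical columns. This is a sketch on synthetic data, with names chosen for this example. The Elastic Net solution is unique and splits the weight evenly. The Lasso solution is not unique here, so how it divides the weight depends on the solver.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)

# Two identical copies of one signal (an extreme case of correlation).
X_dup = np.column_stack([signal, signal])
y_dup = 2.0 * signal + rng.normal(scale=0.1, size=n)

# Tight tolerance so both solvers run close to convergence.
lasso_dup = Lasso(
    alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=100_000
).fit(X_dup, y_dup)
enet_dup = ElasticNet(
    alpha=0.1, l1_ratio=0.5, fit_intercept=False, tol=1e-10, max_iter=100_000
).fit(X_dup, y_dup)

print("Lasso coefficients:      ", np.round(lasso_dup.coef_, 3))
print("Elastic Net coefficients:", np.round(enet_dup.coef_, 3))
```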

**When to Use Each:**
- **Ridge**: Many relevant features, multicollinearity
- **Lasso**: Feature selection needed, sparse solutions desired
- **Elastic Net**: Mixed requirements, correlated features

### ⚖️ The Bias-Variance-Sparsity Tradeoff:
- **No Regularization**: Low bias, high variance, all features
- **Ridge**: Slight bias increase, variance reduction, all features
- **Lasso**: Moderate bias, variance reduction, feature selection
- **High Regularization**: High bias, low variance, few features

### 🔧 Practical Guidelines:
1. **Always scale features** before regularization, since the penalty acts on coefficient magnitudes and those depend on feature units
2. **Use cross-validation** to select hyperparameters
3. **Start with Elastic Net** for unknown problems
4. **Plot regularization paths** to understand behavior
5. **Consider domain knowledge** in feature selection
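
The first three guidelines can be combined in one object. The sketch below wraps scaling and Elastic Net in a `Pipeline` and tunes it with `GridSearchCV`, so the scaler is refit on each training fold instead of seeing the validation fold. The synthetic data, the grid values, and the step names `scaler` and `enet` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(
    n_samples=300, n_features=30, n_informative=8, noise=15.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("enet", ElasticNet(max_iter=10_000)),
])
param_grid_demo = {
    "enet__alpha": [0.01, 0.1, 1, 10],
    "enet__l1_ratio": [0.1, 0.5, 0.9],
}

search = GridSearchCV(pipeline, param_grid_demo, cv=5, scoring="r2")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Best CV R²:      {search.best_score_:.4f}")
print(f"Held-out R²:     {search.score(X_te, y_te):.4f}")
```

The held-out score is computed once, after the hyperparameters are fixed, which keeps the test set out of model selection.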

### 🚀 Next Steps:
- Explore **advanced regularization** (Group Lasso, Sparse Group Lasso)
- Learn about **kernel methods** and **support vector machines**
- Study **ensemble methods** (Random Forest, Gradient Boosting)
- Practice with **time series** and **multi-output** regression

### 🤔 Questions to Explore:
- How do we choose between L1 and L2 regularization?
- What happens when we have more features than samples?
- How does regularization relate to Bayesian methods?
- Can we combine regularization with other techniques?

Excellent work mastering regularization techniques! 🎉

You now understand how to:
- **Control overfitting** with smart constraints
- **Select important features** automatically
- **Balance model complexity** and performance
- **Build robust models** that generalize well

These skills are fundamental for real-world machine learning success! 🌟