# Tier 2: Ridge & Lasso Regression

---

- **Author:** Brandon Deloatch
- **Affiliation:** Quipu Research Labs, LLC
- **Date:** 2025-10-02
- **Version:** v1.3
- **License:** MIT
- **Notebook ID:** 56f0a5c5-f6a1-4ea3-ba51-bf3222d84127

---

## Citation
Brandon Deloatch, "Tier 2: Ridge & Lasso Regression," Quipu Research Labs, LLC, v1.3, 2025-10-02.

Please cite this notebook if used or adapted in publications, presentations, or derivative work.

---

## Contributors / Acknowledgments
- **Primary Author:** Brandon Deloatch (Quipu Research Labs, LLC)
- **Institutional Support:** Quipu Research Labs, LLC - Advanced Analytics Division
- **Technical Framework:** Built on scikit-learn, pandas, numpy, and plotly ecosystems
- **Methodological Foundation:** Statistical learning principles and modern data science best practices

---

## Version History
| Version | Date | Notes |
|---------|------|-------|
| v1.3 | 2025-10-02 | Enhanced professional formatting, comprehensive documentation, interactive visualizations |
| v1.2 | 2024-09-15 | Updated analysis methods, improved data generation algorithms |
| v1.0 | 2024-06-10 | Initial release with core analytical framework |

---

## Environment Dependencies
- **Python:** 3.8+
- **Core Libraries:** pandas 2.0+, numpy 1.24+, scikit-learn 1.3+
- **Visualization:** plotly 5.0+, matplotlib 3.7+, seaborn
- **Statistical:** scipy 1.10+, statsmodels 0.14+
- **Development:** jupyter-lab 4.0+, ipywidgets 8.0+

> **Reproducibility Note:** Use requirements.txt or environment.yml for exact dependency matching.

---

## Data Provenance
| Dataset | Source | License | Notes |
|---------|--------|---------|-------|
| Synthetic Data | Generated in-notebook | MIT | Gaussian features with a linear target (15 informative, 35 noise) plus 4 derived business-context proxies |
| Statistical Distributions | NumPy/SciPy | BSD-3-Clause | Standard library implementations |
| ML Algorithms | Scikit-learn | BSD-3-Clause | Industry-standard implementations |
| Visualization Schemas | Plotly | MIT | Interactive dashboard frameworks |

---

## Execution Provenance Logs
- **Created:** 2025-10-02
- **Notebook ID:** 56f0a5c5-f6a1-4ea3-ba51-bf3222d84127
- **Execution Environment:** Jupyter Lab / VS Code
- **Computational Requirements:** Standard laptop/workstation (2GB+ RAM recommended)

> **Auto-tracking:** Execution metadata can be programmatically captured for reproducibility.
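
The notebook does not say how this capture is done. Below is a minimal sketch that uses only the standard library. The helper name `capture_environment` and the default package list are illustrative assumptions. It records a UTC timestamp, the interpreter and platform, and installed package versions:

```python
import datetime
import platform
import sys
from importlib import metadata  # available from Python 3.8


def capture_environment(packages=("numpy", "pandas", "scikit-learn", "plotly")):
    """Return a dict of execution metadata (hypothetical helper, standard library only)."""
    info = {
        "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    for package in packages:
        try:
            # Use distribution names as published on PyPI, e.g. "scikit-learn", not "sklearn"
            info[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            info[package] = "not installed"
    return info


env_info = capture_environment()
for key, value in env_info.items():
    print(f"{key}: {value}")
```

Saving the returned dict next to the results, for example as JSON, is one simple way to tie a run to its environment.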

---

## Disclaimer & Responsible Use
This notebook is provided "as-is" for educational, research, and professional development purposes. Users assume full responsibility for any results, applications, or decisions derived from this analysis.

**Professional Standards:**
- Validate all results against domain expertise and additional data sources
- Respect licensing and attribution requirements for all dependencies
- Follow ethical guidelines for data analysis and algorithmic decision-making
- Credit all methodological sources and derivative frameworks appropriately

**Academic & Commercial Use:**
- Permitted under MIT license with proper attribution
- Suitable for educational curriculum and professional training
- Appropriate for commercial adaptation (retain the MIT copyright and license notice; citation is requested)
- Recommended for reproducible research and transparent analytics

---



In [None]:
# Import Essential Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff

# Scikit-learn imports
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression

# Statistical libraries
import scipy.stats as stats
from scipy.optimize import minimize_scalar

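import warnings
# Suppresses all warnings, including scikit-learn ConvergenceWarnings from Lasso/ElasticNet;
# comment this out when checking whether the solvers converged.
warnings.filterwarnings('ignore')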

print("Tier 2: Ridge & Lasso Regression - Libraries Loaded Successfully!")
print("=" * 70)
print("Available Regularization Techniques:")
print("• Ridge Regression (L2) - Shrinks coefficients toward zero")
print("• Lasso Regression (L1) - Can set coefficients exactly to zero (feature selection)")
print("• Elastic Net - Combines L1 + L2 penalties")
print("• Cross-Validation - Optimal hyperparameter selection")
print("• Regularization Path - Coefficient evolution analysis")
print("• Feature Importance - Variable significance ranking")

In [None]:
# Generate Comprehensive Dataset for Regularization Demonstration
np.random.seed(42)

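def generate_regularization_dataset(n_samples=1000, n_features=50, n_informative=15, noise_level=0.1):
    """
    Generate a dataset with graded feature importance for regularization analysis.

    The first `n_informative` columns drive the target through decreasing true
    coefficients; the remaining `n_features - n_informative` columns are pure noise.
    Four business-context columns are then derived as noisy linear proxies of
    Important_1..Important_4, which deliberately adds multicollinearity.
    """
    # True coefficients with decreasing importance (defined for up to 15 informative features)
    base_coefficients = np.array([10, 8, 6, 5, 4, 3, 2, 1.5, 1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1])
    if not 4 <= n_informative <= len(base_coefficients):
        # At least 4 are needed because the business proxies are built from Important_1..4
        raise ValueError(f"n_informative must be between 4 and {len(base_coefficients)}")
    if n_features < n_informative:
        raise ValueError("n_features must be at least n_informative")
    true_coefficients = base_coefficients[:n_informative]

    # Informative features: independent standard normal columns
    X_informative = np.random.randn(n_samples, n_informative)

    # Noise-free target signal from the informative features
    y_signal = X_informative @ true_coefficients

    # Noise features (unrelated to the target; ideally eliminated by Lasso)
    X_noise = np.random.randn(n_samples, n_features - n_informative)

    # Combine informative and noise features
    X = np.hstack([X_informative, X_noise])

    # Target noise with standard deviation = noise_level * std(signal)
    noise = np.random.normal(0, noise_level * np.std(y_signal), n_samples)
    y = y_signal + noise

    # Feature names
    feature_names = ([f'Important_{i+1}' for i in range(n_informative)] +
                     [f'Noise_{i+1}' for i in range(n_features - n_informative)])

    df = pd.DataFrame(X, columns=feature_names)
    df['target'] = y

    # Business-context proxies: noisy rescalings of Important_1..4 (not listed in feature_names)
    df['marketing_spend'] = df['Important_1'] * 1000 + 5000 + np.random.normal(0, 200, n_samples)
    df['product_quality'] = df['Important_2'] * 0.5 + 7 + np.random.normal(0, 0.3, n_samples)
    df['customer_satisfaction'] = df['Important_3'] * 0.3 + 8 + np.random.normal(0, 0.2, n_samples)
    df['competition_index'] = -df['Important_4'] * 0.2 + 5 + np.random.normal(0, 0.5, n_samples)

    return df, true_coefficients, feature_names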

# Generate dataset
print(" Generating regularization demonstration dataset...")
df, true_coefficients, feature_names = generate_regularization_dataset()

print(f"Dataset Shape: {df.shape}")
print(f"Informative Features: {len([f for f in feature_names if 'Important' in f])}")
print(f"Noise Features: {len([f for f in feature_names if 'Noise' in f])}")

print("\nDataset Overview:")
print(df.head())
print("\nBasic Statistics:")
print(df.describe().round(3))
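
The next cell standardizes features before fitting. The reason this matters is that the penalty acts on coefficient size, so a penalized fit is not invariant to the units of a feature. The sketch below uses a hand-written closed-form ridge (`ridge_closed_form`, an illustration rather than scikit-learn's solver) on two equally important stand-in features. It then multiplies one feature by 1000, a pure change of units. Ordinary least squares (alpha = 0) gives the same model either way. Ridge barely shrinks the rescaled feature, because its coefficient, and therefore its penalty, becomes tiny.

```python
import numpy as np


def ridge_closed_form(X, y, alpha):
    """Ridge minimising ||y - Xw - b||^2 + alpha * ||w||^2 with an unpenalised intercept b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(200, 2))
y_demo = 3 * X_demo[:, 0] + 3 * X_demo[:, 1] + rng_demo.normal(scale=0.5, size=200)

unit_change = np.array([1.0, 1000.0])        # second feature expressed in different units
X_demo_rescaled = X_demo * unit_change

for alpha in (0.0, 50.0):
    w_orig, _ = ridge_closed_form(X_demo, y_demo, alpha)
    w_resc, _ = ridge_closed_form(X_demo_rescaled, y_demo, alpha)
    # Convert the rescaled coefficients back to the original units before comparing
    print(f"alpha={alpha:>5}: original {np.round(w_orig, 3)}, "
          f"rescaled {np.round(w_resc * unit_change, 3)}")
```

After `StandardScaler`, every column has unit variance. A single alpha then penalizes all coefficients on a comparable scale.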

In [None]:
# 1. DATA PREPARATION AND FEATURE ANALYSIS
print(" 1. DATA PREPARATION AND FEATURE ANALYSIS")
print("=" * 45)

# Prepare features and target (feature_cols also includes the four business-context proxies)
feature_cols = [col for col in df.columns if col != 'target']
X = df[feature_cols].values
y = df['target'].values

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (important for regularization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")

# Analyze feature correlations; the heatmap shows only the first 20 features for readability
correlation_matrix = df[feature_cols].corr()
corr_subset = correlation_matrix.iloc[:20, :20]

# Create correlation heatmap
fig_corr = go.Figure(data=go.Heatmap(
    z=corr_subset.values,
    x=corr_subset.columns,
    y=corr_subset.columns,
    colorscale='RdBu_r',
    zmid=0,
    text=corr_subset.round(2).values,
    texttemplate="%{text}",
    textfont={"size": 8},
    hoverongaps=False
))

fig_corr.update_layout(
    title="Feature Correlation Matrix (First 20 Features)",
    width=800,
    height=600,
    xaxis_title="Features",
    yaxis_title="Features"
)

fig_corr.show()

# Feature importance analysis using each feature's Pearson correlation with the target
# (computed on the full dataset, so this exploratory screen also sees the test rows)
target_correlations = df[feature_cols].corrwith(df['target'])
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'correlation_with_target': target_correlations.values,
    'abs_correlation': target_correlations.abs().values
})

feature_importance = feature_importance.sort_values('abs_correlation', ascending=False)

# Plot feature importance
fig_importance = go.Figure()

# Color by feature type: red = informative, orange = business proxy, blue = noise
top_features = feature_importance.head(20)
colors = ['red' if 'Important' in f else ('blue' if 'Noise' in f else 'orange')
          for f in top_features['feature']]

fig_importance.add_trace(
    go.Bar(
        x=top_features['feature'],
        y=top_features['abs_correlation'],
        marker_color=colors,
        name='Feature Importance',
        hovertemplate="<b>%{x}</b><br>|r|: %{y:.3f}<extra></extra>"
    )
)

fig_importance.update_layout(
 title="Feature Importance: Correlation with Target (Top 20)",
 xaxis_title="Features",
 yaxis_title="Absolute Correlation with Target",
 height=500,
 xaxis_tickangle=-45
)
fig_importance.show()

print("Feature Analysis Summary:")
print(f"• Strongest predictor: {feature_importance.iloc[0]['feature']} (r={feature_importance.iloc[0]['correlation_with_target']:.3f})")
print(f"• Number of features with |r| > 0.1: {(feature_importance['abs_correlation'] > 0.1).sum()}")
print(f"• Number of features with |r| < 0.05: {(feature_importance['abs_correlation'] < 0.05).sum()}")

# Compare the true informative features with those passing the |r| > 0.1 screen
print("\nTrue vs Discovered Important Features:")
important_features = feature_importance[feature_importance['abs_correlation'] > 0.1]['feature'].tolist()
true_important = [f for f in feature_cols if 'Important' in f]
discovered_important = [f for f in important_features if 'Important' in f]

print(f"• True important features: {len(true_important)}")
print(f"• Discovered important features: {len(discovered_important)}")
print(f"• Recall (share of true important features discovered): {len(discovered_important)/len(true_important):.1%}")
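
The generator is fully specified, so the population values behind this screen can be worked out directly. With independent standard-normal columns, `Var(signal) = sum(beta_i^2)`. The noise variance is approximately `noise_level^2` times that; the generator uses the sample standard deviation of the signal. The population correlation of `Important_i` with the target is therefore `beta_i / sqrt(sum(beta^2) * (1 + noise_level^2))`, and the best achievable R² is `1 / (1 + noise_level^2)`. The sketch below evaluates both quantities for the default settings, which are copied in as assumptions so the block stands alone. It ignores sampling error, which is roughly ±0.03 for a correlation at n = 1000. Features near the 0.1 cut-off may therefore fall on either side in the sampled data.

```python
import numpy as np

# Default settings of generate_regularization_dataset (copied so the block is self-contained)
true_coefficients_demo = np.array([10, 8, 6, 5, 4, 3, 2, 1.5, 1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1])
noise_level_demo = 0.1

signal_var = np.sum(true_coefficients_demo ** 2)      # Var(X @ beta) for independent N(0, 1) columns
target_sd = np.sqrt(signal_var * (1 + noise_level_demo ** 2))

expected_r = true_coefficients_demo / target_sd       # population corr(Important_i, target)
r2_ceiling = 1 / (1 + noise_level_demo ** 2)          # R^2 of the true model

for i, r in enumerate(expected_r, start=1):
    flag = "above" if r > 0.1 else "below"
    print(f"Important_{i}: expected r = {r:.3f} ({flag} the 0.1 screening threshold)")
print(f"Expected features passing |r| > 0.1: {(expected_r > 0.1).sum()} of {len(expected_r)}")
print(f"Best achievable R^2 (noise-limited): {r2_ceiling:.4f}")
```

The test R² values reported later can be read against this ceiling of about 0.990.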

In [None]:
# 2. BASELINE LINEAR REGRESSION ANALYSIS
print("\n 2. BASELINE LINEAR REGRESSION ANALYSIS")
print("=" * 43)

# Fit standard linear regression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_lr = lr.predict(X_train_scaled)
y_test_pred_lr = lr.predict(X_test_scaled)

# Calculate metrics
train_mse_lr = mean_squared_error(y_train, y_train_pred_lr)
test_mse_lr = mean_squared_error(y_test, y_test_pred_lr)
train_r2_lr = r2_score(y_train, y_train_pred_lr)
test_r2_lr = r2_score(y_test, y_test_pred_lr)

print(" Linear Regression Performance:")
print(f"• Training MSE: {train_mse_lr:.4f}")
print(f"• Test MSE: {test_mse_lr:.4f}")
print(f"• Training R²: {train_r2_lr:.4f}")
print(f"• Test R²: {test_r2_lr:.4f}")
print(f"• Overfitting indicator: {train_r2_lr - test_r2_lr:.4f}")

# Analyze coefficient magnitudes
lr_coefficients = pd.DataFrame({
 'feature': feature_cols,
 'coefficient': lr.coef_,
 'abs_coefficient': np.abs(lr.coef_)
})
lr_coefficients = lr_coefficients.sort_values('abs_coefficient', ascending=False)

# Plot coefficient magnitudes
fig_coef = go.Figure()

colors_coef = ['red' if 'Important' in feat else ('lightblue' if 'Noise' in feat else 'orange')
               for feat in lr_coefficients['feature']]

fig_coef.add_trace(
 go.Bar(
 x=lr_coefficients['feature'][:20],
 y=lr_coefficients['abs_coefficient'][:20],
 marker_color=colors_coef[:20],
 name='Coefficient Magnitude',
 hovertemplate="<b>%{x}</b><br>|Coefficient|: %{y:.3f}<extra></extra>"
 )
)

fig_coef.update_layout(
 title="Linear Regression: Coefficient Magnitudes (Top 20)",
 xaxis_title="Features",
 yaxis_title="Absolute Coefficient Value",
 height=500,
 xaxis_tickangle=-45
)
fig_coef.show()

# Check for overfitting indicators
print(f"\n Overfitting Analysis:")
coefficient_variance = np.var(lr.coef_)
large_coefficients = (np.abs(lr.coef_) > 1).sum()
print(f"• Coefficient variance: {coefficient_variance:.4f}")
print(f"• Number of large coefficients (|coef| > 1): {large_coefficients}")
print(f"• Max coefficient magnitude: {np.max(np.abs(lr.coef_)):.4f}")

if train_r2_lr - test_r2_lr > 0.1:
 print(" High overfitting detected - regularization recommended!")
elif large_coefficients > 10:
 print(" Many large coefficients - potential instability!")
else:
 print(" Model appears reasonably stable")

# Residual analysis
residuals_train = y_train - y_train_pred_lr
residuals_test = y_test - y_test_pred_lr

# Create residual plots
fig_residuals = make_subplots(
 rows=1, cols=2,
 subplot_titles=("Training Residuals", "Test Residuals")
)

# Training residuals
fig_residuals.add_trace(
 go.Scatter(
 x=y_train_pred_lr,
 y=residuals_train,
 mode='markers',
 marker=dict(color='blue', opacity=0.6),
 name='Training',
 hovertemplate="Predicted: %{x:.2f}<br>Residual: %{y:.2f}<extra></extra>"
 ),
 row=1, col=1
)

# Test residuals
fig_residuals.add_trace(
 go.Scatter(
 x=y_test_pred_lr,
 y=residuals_test,
 mode='markers',
 marker=dict(color='red', opacity=0.6),
 name='Test',
 hovertemplate="Predicted: %{x:.2f}<br>Residual: %{y:.2f}<extra></extra>"
 ),
 row=1, col=2
)

# Add zero lines
for col in [1, 2]:
 fig_residuals.add_hline(y=0, line_dash="dash", line_color="black", row=1, col=col)

fig_residuals.update_layout(
 title="Linear Regression: Residual Analysis",
 height=400
)
fig_residuals.show()

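print("\nResidual Analysis:")
print(f"• Training residual std: {np.std(residuals_train):.4f}")
print(f"• Test residual std: {np.std(residuals_test):.4f}")
# Train and test residuals have different lengths (800 vs 200), so they cannot be correlated
# element-wise; compare their spread instead (a ratio near 1 suggests little overfitting).
print(f"• Test/train residual std ratio: {np.std(residuals_test) / np.std(residuals_train):.4f}")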
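
The instability check in this cell only counts large coefficients. A more direct diagnostic for the multicollinearity introduced by the business-context proxies is the variance inflation factor, `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature j on all the others. Take `marketing_spend = 1000 * Important_1 + 5000 + N(0, 200^2)`. Here `R^2 = 1000^2 / (1000^2 + 200^2) = 1 / 1.04`, which gives a VIF of 26 for both columns. The helper `variance_inflation_factors` is a plain-NumPy sketch; statsmodels also offers a VIF function, not used here. The three-column design is a reduced, hypothetical stand-in for the notebook's feature matrix.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        residual = X[:, j] - design @ coef
        r_squared = 1.0 - residual.var() / X[:, j].var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


rng_vif = np.random.default_rng(0)
n_vif = 100_000
important_1 = rng_vif.standard_normal(n_vif)
marketing_spend = important_1 * 1000 + 5000 + rng_vif.normal(0, 200, n_vif)  # same recipe as the generator
noise_1 = rng_vif.standard_normal(n_vif)

vifs_demo = variance_inflation_factors(np.column_stack([important_1, marketing_spend, noise_1]))
print(dict(zip(["Important_1", "marketing_spend", "Noise_1"], np.round(vifs_demo, 2))))
```

You could apply the same helper to the notebook's scaled training matrix, for example `variance_inflation_factors(X_train_scaled)`. VIFs above the commonly cited rule-of-thumb range of 5 to 10 would flag the columns where ridge shrinkage is expected to help most.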

In [None]:
# 3. RIDGE REGRESSION (L2 REGULARIZATION)
print("\n 3. RIDGE REGRESSION (L2 REGULARIZATION)")
print("=" * 42)

# Ridge regression with cross-validation for alpha selection
from sklearn.linear_model import RidgeCV

# Define alpha range for Ridge
alphas_ridge = np.logspace(-4, 4, 50) # From 0.0001 to 10000

# Fit Ridge with cross-validation
ridge_cv = RidgeCV(alphas=alphas_ridge, cv=5, scoring='neg_mean_squared_error')
ridge_cv.fit(X_train_scaled, y_train)

optimal_alpha_ridge = ridge_cv.alpha_
print(f" Optimal Ridge alpha: {optimal_alpha_ridge:.6f}")

# Fit Ridge with optimal alpha
ridge = Ridge(alpha=optimal_alpha_ridge)
ridge.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_ridge = ridge.predict(X_train_scaled)
y_test_pred_ridge = ridge.predict(X_test_scaled)

# Calculate metrics
train_mse_ridge = mean_squared_error(y_train, y_train_pred_ridge)
test_mse_ridge = mean_squared_error(y_test, y_test_pred_ridge)
train_r2_ridge = r2_score(y_train, y_train_pred_ridge)
test_r2_ridge = r2_score(y_test, y_test_pred_ridge)

print(" Ridge Regression Performance:")
print(f"• Training MSE: {train_mse_ridge:.4f}")
print(f"• Test MSE: {test_mse_ridge:.4f}")
print(f"• Training R²: {train_r2_ridge:.4f}")
print(f"• Test R²: {test_r2_ridge:.4f}")
print(f"• Overfitting indicator: {train_r2_ridge - test_r2_ridge:.4f}")

# Compare with linear regression
print(f"\n Improvement over Linear Regression:")
print(f"• Test MSE improvement: {((test_mse_lr - test_mse_ridge) / test_mse_lr * 100):.2f}%")
print(f"• Test R² improvement: {test_r2_ridge - test_r2_lr:.4f}")
print(f"• Overfitting reduction: {(train_r2_lr - test_r2_lr) - (train_r2_ridge - test_r2_ridge):.4f}")

# Ridge regularization path (test R² along the path is for illustration only; alpha was chosen by CV on the training set)
alphas_path = np.logspace(-4, 4, 100)
ridge_coefficients = []
ridge_scores = []

for alpha in alphas_path:
 ridge_temp = Ridge(alpha=alpha)
 ridge_temp.fit(X_train_scaled, y_train)
 ridge_coefficients.append(ridge_temp.coef_)
 ridge_scores.append(ridge_temp.score(X_test_scaled, y_test))

ridge_coefficients = np.array(ridge_coefficients)

# Plot regularization path
fig_path = go.Figure()

# Plot coefficient paths for important features
important_indices = [i for i, name in enumerate(feature_cols) if 'Important' in name][:10]

for idx in important_indices:
 fig_path.add_trace(
 go.Scatter(
 x=alphas_path,
 y=ridge_coefficients[:, idx],
 mode='lines',
 name=feature_cols[idx],
 line=dict(width=2),
 hovertemplate=f"<b>{feature_cols[idx]}</b><br>Alpha: %{{x:.6f}}<br>Coefficient: %{{y:.3f}}<extra></extra>"
 )
 )

fig_path.update_layout(
 title="Ridge Regression: Regularization Path (Important Features)",
 xaxis_title="Alpha (log scale)",
 yaxis_title="Coefficient Value",
 xaxis_type="log",
 height=500,
 showlegend=True
)
fig_path.show()

# Plot alpha vs model performance
fig_alpha = go.Figure()

fig_alpha.add_trace(
 go.Scatter(
 x=alphas_path,
 y=ridge_scores,
 mode='lines',
 name='Test R² Score',
 line=dict(color='blue', width=3),
 hovertemplate="Alpha: %{x:.6f}<br>R² Score: %{y:.4f}<extra></extra>"
 )
)

# Mark optimal alpha
fig_alpha.add_vline(
 x=optimal_alpha_ridge,
 line_dash="dash",
 line_color="red",
 annotation_text=f"Optimal α = {optimal_alpha_ridge:.6f}"
)

fig_alpha.update_layout(
 title="Ridge Regression: Alpha vs Model Performance",
 xaxis_title="Alpha (log scale)",
 yaxis_title="Test R² Score",
 xaxis_type="log",
 height=500
)
fig_alpha.show()

# Coefficient comparison
ridge_coefficients_df = pd.DataFrame({
 'feature': feature_cols,
 'linear_coef': lr.coef_,
 'ridge_coef': ridge.coef_,
 'abs_linear': np.abs(lr.coef_),
 'abs_ridge': np.abs(ridge.coef_),
 'shrinkage': np.abs(lr.coef_) - np.abs(ridge.coef_)
})
ridge_coefficients_df = ridge_coefficients_df.sort_values('abs_linear', ascending=False)

# Plot coefficient shrinkage
fig_shrink = make_subplots(
 rows=1, cols=2,
 subplot_titles=("Coefficient Comparison", "Shrinkage Effect")
)

# Coefficient comparison
fig_shrink.add_trace(
 go.Scatter(
 x=ridge_coefficients_df['abs_linear'][:20],
 y=ridge_coefficients_df['abs_ridge'][:20],
 mode='markers',
 marker=dict(size=8, color='blue'),
 name='Ridge vs Linear',
 text=ridge_coefficients_df['feature'][:20],
 hovertemplate="<b>%{text}</b><br>Linear: %{x:.3f}<br>Ridge: %{y:.3f}<extra></extra>"
 ),
 row=1, col=1
)

# Add diagonal line (no shrinkage)
max_coef = max(ridge_coefficients_df['abs_linear'].max(), ridge_coefficients_df['abs_ridge'].max())
fig_shrink.add_trace(
 go.Scatter(
 x=[0, max_coef],
 y=[0, max_coef],
 mode='lines',
 line=dict(color='red', dash='dash'),
 name='No Shrinkage',
 showlegend=False
 ),
 row=1, col=1
)

# Shrinkage magnitude
colors_shrink = ['red' if 'Important' in feat else ('lightblue' if 'Noise' in feat else 'orange')
                 for feat in ridge_coefficients_df['feature'][:20]]

fig_shrink.add_trace(
 go.Bar(
 x=ridge_coefficients_df['feature'][:20],
 y=ridge_coefficients_df['shrinkage'][:20],
 marker_color=colors_shrink,
 name='Shrinkage',
 showlegend=False,
 hovertemplate="<b>%{x}</b><br>Shrinkage: %{y:.3f}<extra></extra>"
 ),
 row=1, col=2
)

fig_shrink.update_layout(
 title="Ridge Regression: Coefficient Shrinkage Analysis",
 height=500
)
fig_shrink.update_xaxes(title_text="Linear Regression |Coef|", row=1, col=1)
fig_shrink.update_yaxes(title_text="Ridge Regression |Coef|", row=1, col=1)
fig_shrink.update_xaxes(title_text="Features", row=1, col=2, tickangle=-45)
fig_shrink.update_yaxes(title_text="Shrinkage Amount", row=1, col=2)

fig_shrink.show()

print(f"\n Ridge Coefficient Analysis:")
avg_shrinkage = ridge_coefficients_df['shrinkage'].mean()
max_shrinkage = ridge_coefficients_df['shrinkage'].max()
print(f"• Average coefficient shrinkage: {avg_shrinkage:.4f}")
print(f"• Maximum coefficient shrinkage: {max_shrinkage:.4f}")
print(f"• Coefficient variance reduction: {np.var(lr.coef_) - np.var(ridge.coef_):.4f}")
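
One caveat on the cross-validation in this cell: the scaler was fit on the full training set during data preparation, before `RidgeCV` split that set into folds. Each validation fold has therefore already influenced the scaling. The effect is usually small. Putting the scaler and the model in a single `Pipeline` removes it, because the scaler is then refit inside every fold. Here is a minimal sketch on stand-in data; the data, alpha grid, and fold count are assumptions, not the notebook's settings:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng_cv = np.random.default_rng(0)
# Stand-in data: 10 features on very different scales, 2 of them informative
X_cv = rng_cv.normal(size=(300, 10)) * rng_cv.uniform(0.1, 100.0, size=10)
y_cv = (X_cv[:, 0] / X_cv[:, 0].std() + 0.5 * X_cv[:, 1] / X_cv[:, 1].std()
        + rng_cv.normal(scale=0.5, size=300))

# make_pipeline names the steps "standardscaler" and "ridge"
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipe,
    param_grid={"ridge__alpha": np.logspace(-4, 4, 17)},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_cv, y_cv)  # the scaler is refit on the training part of every fold

best_alpha_cv = search.best_params_["ridge__alpha"]
print(f"CV-selected alpha: {best_alpha_cv:.4g}, CV MSE: {-search.best_score_:.4f}")
```

With the default `refit=True`, `search.predict(...)` applies the scaler learned on the full training data automatically, so you do not need to keep a separate scaler in sync by hand.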

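The regularization-path loop in the Ridge cell refits `Ridge` once per alpha. The ridge solution has a closed form in terms of the SVD `Xc = U diag(s) V^T` of the centred design: `w(alpha) = V diag(s_i / (s_i^2 + alpha)) U^T y`. The whole path can therefore be computed from a single decomposition. The formula also shows the mechanism: each singular direction is shrunk by the factor `s_i^2 / (s_i^2 + alpha)`, so low-variance directions shrink first. Here is a NumPy sketch; it assumes scikit-learn's ridge objective `||y - Xw||^2 + alpha * ||w||^2` with a fitted intercept, and `ridge_path_svd` is a hypothetical helper name.

```python
import numpy as np


def ridge_path_svd(X, y, alphas):
    """Ridge coefficients for every alpha (one row each) from one SVD of the centred design."""
    Xc = X - X.mean(axis=0)                  # centring = unpenalised intercept
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uty = U.T @ yc
    alphas = np.asarray(alphas, dtype=float)[:, None]
    filter_factors = s / (s ** 2 + alphas)   # shape (n_alphas, n_components)
    return (filter_factors * Uty) @ Vt       # shape (n_alphas, n_features)


rng_path = np.random.default_rng(1)
X_path = rng_path.standard_normal((120, 8))
y_path = X_path @ rng_path.standard_normal(8) + rng_path.normal(size=120)

alphas_demo = np.logspace(-4, 4, 100)
path_demo = ridge_path_svd(X_path, y_path, alphas_demo)
print(path_demo.shape)  # (100, 8)
```

With the notebook's arrays, `ridge_path_svd(X_train_scaled, y_train, alphas_path)` should reproduce the `ridge_coefficients` array up to solver tolerance.
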
In [None]:
# 4. LASSO REGRESSION (L1 REGULARIZATION)
print("\n 4. LASSO REGRESSION (L1 REGULARIZATION)")
print("=" * 43)

# Lasso regression with cross-validation for alpha selection
from sklearn.linear_model import LassoCV

# Define alpha range for Lasso
alphas_lasso = np.logspace(-4, 2, 50) # From 0.0001 to 100

# Fit Lasso with cross-validation
lasso_cv = LassoCV(alphas=alphas_lasso, cv=5, random_state=42, max_iter=2000)
lasso_cv.fit(X_train_scaled, y_train)

optimal_alpha_lasso = lasso_cv.alpha_
print(f" Optimal Lasso alpha: {optimal_alpha_lasso:.6f}")

# Fit Lasso with optimal alpha
lasso = Lasso(alpha=optimal_alpha_lasso, random_state=42, max_iter=2000)
lasso.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_lasso = lasso.predict(X_train_scaled)
y_test_pred_lasso = lasso.predict(X_test_scaled)

# Calculate metrics
train_mse_lasso = mean_squared_error(y_train, y_train_pred_lasso)
test_mse_lasso = mean_squared_error(y_test, y_test_pred_lasso)
train_r2_lasso = r2_score(y_train, y_train_pred_lasso)
test_r2_lasso = r2_score(y_test, y_test_pred_lasso)

print(" Lasso Regression Performance:")
print(f"• Training MSE: {train_mse_lasso:.4f}")
print(f"• Test MSE: {test_mse_lasso:.4f}")
print(f"• Training R²: {train_r2_lasso:.4f}")
print(f"• Test R²: {test_r2_lasso:.4f}")
print(f"• Overfitting indicator: {train_r2_lasso - test_r2_lasso:.4f}")

# Feature selection analysis
n_selected_features = np.sum(lasso.coef_ != 0)
n_zero_coefficients = np.sum(lasso.coef_ == 0)

print("\nLasso Feature Selection:")
print(f"• Selected features: {n_selected_features}/{len(feature_cols)}")
print(f"• Eliminated features: {n_zero_coefficients}/{len(feature_cols)}")
print(f"• Elimination rate: {n_zero_coefficients/len(feature_cols):.1%}")

# Analyze which features were selected
selected_features = [feature_cols[i] for i, coef in enumerate(lasso.coef_) if coef != 0]
eliminated_features = [feature_cols[i] for i, coef in enumerate(lasso.coef_) if coef == 0]

selected_important = [f for f in selected_features if 'Important' in f]
eliminated_important = [f for f in eliminated_features if 'Important' in f]
selected_noise = [f for f in selected_features if 'Noise' in f]
eliminated_noise = [f for f in eliminated_features if 'Noise' in f]

print("\nFeature Selection Accuracy:")
print(f"• Important features selected: {len(selected_important)}/{len([f for f in feature_cols if 'Important' in f])}")
print(f"• Noise features eliminated: {len(eliminated_noise)}/{len([f for f in feature_cols if 'Noise' in f])}")
print(f"• Important features incorrectly eliminated: {len(eliminated_important)}")
print(f"• Noise features incorrectly selected: {len(selected_noise)}")
print(f"• Business-context proxies selected: {len([f for f in selected_features if 'Important' not in f and 'Noise' not in f])}")

# Lasso regularization path
alphas_lasso_path = np.logspace(-4, 2, 100)
lasso_coefficients = []
lasso_scores = []
n_features_selected = []

for alpha in alphas_lasso_path:
 lasso_temp = Lasso(alpha=alpha, random_state=42, max_iter=2000)
 lasso_temp.fit(X_train_scaled, y_train)
 lasso_coefficients.append(lasso_temp.coef_)
 lasso_scores.append(lasso_temp.score(X_test_scaled, y_test))
 n_features_selected.append(np.sum(lasso_temp.coef_ != 0))

lasso_coefficients = np.array(lasso_coefficients)

# Plot Lasso regularization path
fig_lasso_path = go.Figure()

# Plot coefficient paths for important features
for idx in important_indices[:10]:
 fig_lasso_path.add_trace(
 go.Scatter(
 x=alphas_lasso_path,
 y=lasso_coefficients[:, idx],
 mode='lines',
 name=feature_cols[idx],
 line=dict(width=2),
 hovertemplate=f"<b>{feature_cols[idx]}</b><br>Alpha: %{{x:.6f}}<br>Coefficient: %{{y:.3f}}<extra></extra>"
 )
 )

fig_lasso_path.update_layout(
 title="Lasso Regression: Regularization Path (Important Features)",
 xaxis_title="Alpha (log scale)",
 yaxis_title="Coefficient Value",
 xaxis_type="log",
 height=500,
 showlegend=True
)
fig_lasso_path.show()

# Plot alpha vs number of features and performance
fig_lasso_analysis = make_subplots(
 rows=1, cols=2,
 subplot_titles=("Alpha vs Performance", "Alpha vs Feature Count")
)

# Performance vs alpha
fig_lasso_analysis.add_trace(
 go.Scatter(
 x=alphas_lasso_path,
 y=lasso_scores,
 mode='lines',
 name='Test R² Score',
 line=dict(color='blue', width=3),
 hovertemplate="Alpha: %{x:.6f}<br>R² Score: %{y:.4f}<extra></extra>"
 ),
 row=1, col=1
)

# Mark optimal alpha
fig_lasso_analysis.add_vline(
 x=optimal_alpha_lasso,
 line_dash="dash",
 line_color="red",
 annotation_text=f"Optimal α = {optimal_alpha_lasso:.6f}",
 row=1, col=1
)

# Feature count vs alpha
fig_lasso_analysis.add_trace(
 go.Scatter(
 x=alphas_lasso_path,
 y=n_features_selected,
 mode='lines',
 name='Selected Features',
 line=dict(color='green', width=3),
 hovertemplate="Alpha: %{x:.6f}<br>Features: %{y}<extra></extra>"
 ),
 row=1, col=2
)

fig_lasso_analysis.add_vline(
 x=optimal_alpha_lasso,
 line_dash="dash",
 line_color="red",
 row=1, col=2
)

fig_lasso_analysis.update_layout(
 title="Lasso Regression: Alpha Analysis",
 height=500
)
fig_lasso_analysis.update_xaxes(title_text="Alpha (log scale)", type="log")
fig_lasso_analysis.update_yaxes(title_text="Test R² Score", row=1, col=1)
fig_lasso_analysis.update_yaxes(title_text="Number of Selected Features", row=1, col=2)

fig_lasso_analysis.show()

# Feature selection visualization
lasso_coefficients_df = pd.DataFrame({
 'feature': feature_cols,
 'lasso_coef': lasso.coef_,
 'abs_lasso': np.abs(lasso.coef_),
 'selected': lasso.coef_ != 0,
    'feature_type': ['Important' if 'Important' in f else ('Noise' if 'Noise' in f else 'Business proxy')
                     for f in feature_cols]
})

# Sort by absolute coefficient value
lasso_coefficients_df = lasso_coefficients_df.sort_values('abs_lasso', ascending=False)

# Plot selected features
selected_df = lasso_coefficients_df[lasso_coefficients_df['selected']].head(20)

fig_selected = go.Figure()

# red = informative, orange = business proxy, light blue = noise
type_colors = {'Important': 'red', 'Business proxy': 'orange', 'Noise': 'lightblue'}
colors_selected = [type_colors[ft] for ft in selected_df['feature_type']]

fig_selected.add_trace(
 go.Bar(
 x=selected_df['feature'],
 y=selected_df['abs_lasso'],
 marker_color=colors_selected,
 name='Selected Features',
 hovertemplate="<b>%{x}</b><br>|Coefficient|: %{y:.3f}<extra></extra>"
 )
)

fig_selected.update_layout(
 title="Lasso Regression: Selected Features and Coefficients",
 xaxis_title="Features",
 yaxis_title="Absolute Coefficient Value",
 height=500,
 xaxis_tickangle=-45
)
fig_selected.show()

print(f"\n Lasso vs Other Methods Comparison:")
print(f"• Lasso vs Linear - Test MSE improvement: {((test_mse_lr - test_mse_lasso) / test_mse_lr * 100):.2f}%")
print(f"• Lasso vs Ridge - Test MSE: {test_mse_lasso:.4f} vs {test_mse_ridge:.4f}")
print(f"• Feature reduction: {(1 - n_selected_features/len(feature_cols)):.1%}")
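
The exact zeros counted in this cell come from the soft-thresholding step of coordinate descent, the algorithm scikit-learn's `Lasso` uses. For the objective `(1/(2n)) ||y - Xw||^2 + alpha * ||w||_1`, each coordinate update is `w_j = S(rho_j, alpha) / ((1/n) ||x_j||^2)`. Here `rho_j` is `(1/n) x_j^T` times the residual that ignores feature j, and `S(z, t) = sign(z) * max(|z| - t, 0)`. Whenever `|rho_j| <= alpha`, the coefficient is set to exactly zero. What follows is a bare-bones educational sketch, not scikit-learn's implementation, which among other things checks convergence with a duality gap. The data are stand-in assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0) for scalar z: exactly 0 whenever |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, max_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))||y - Xw||^2 + alpha*||w||_1 on centred data."""
    Xc = X - X.mean(axis=0)                  # centring = unpenalised intercept
    yc = y - y.mean()
    n, p = Xc.shape
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n       # (1/n) ||x_j||^2
    residual = yc.copy()                     # kept equal to yc - Xc @ w
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # (1/n) x_j^T (residual with feature j's contribution added back)
            rho = Xc[:, j] @ residual / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                residual -= Xc[:, j] * delta
                w[j] = w_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


rng_cd = np.random.default_rng(0)
X_cd = rng_cd.standard_normal((500, 8))
beta_cd = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])   # 3 informative, 5 noise
y_cd = X_cd @ beta_cd + rng_cd.normal(scale=0.5, size=500)

w_cd = lasso_coordinate_descent(X_cd, y_cd, alpha=0.1)
print(np.round(w_cd, 3))
```

`sklearn.linear_model.Lasso(alpha=0.1)` minimises the same objective, so its `coef_` on this data should agree to within solver tolerance.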

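The selection-accuracy counts printed in the Lasso cell can be condensed into two numbers. Precision is the share of selected features that are truly informative. Recall is the share of informative features that were selected. The helper `selection_precision_recall` below is a hypothetical addition. It treats only `Important_*` columns as relevant, so a selected business proxy counts as a false positive even though it carries real signal; that convention is a judgment call.

```python
def selection_precision_recall(selected, relevant):
    """Precision and recall of a selected feature set against the truly relevant features."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else float("nan")
    recall = true_positives / len(relevant) if relevant else float("nan")
    return precision, recall


# Toy example with hypothetical feature names
demo_selected = ["Important_1", "Important_2", "Noise_7"]
demo_relevant = ["Important_1", "Important_2", "Important_3"]
precision_demo, recall_demo = selection_precision_recall(demo_selected, demo_relevant)
print(f"precision={precision_demo:.3f}, recall={recall_demo:.3f}")  # precision=0.667, recall=0.667
```

In the notebook, this would be called as `selection_precision_recall(selected_features, [f for f in feature_cols if 'Important' in f])`. The same call works for Elastic Net's nonzero coefficients.
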
In [None]:
# 5. ELASTIC NET REGRESSION (L1 + L2 REGULARIZATION)
print("\n 5. ELASTIC NET REGRESSION (L1 + L2 REGULARIZATION)")
print("=" * 52)

# Elastic Net with cross-validation
from sklearn.linear_model import ElasticNetCV

# Define parameter ranges
l1_ratios = [0.1, 0.3, 0.5, 0.7, 0.9] # Share of the penalty that is L1 (1.0 = Lasso; values near 0 approach Ridge)
alphas_elasticnet = np.logspace(-4, 2, 50)

# Fit Elastic Net with cross-validation
elasticnet_cv = ElasticNetCV(
 l1_ratio=l1_ratios,
 alphas=alphas_elasticnet,
 cv=5,
 random_state=42,
 max_iter=2000
)
elasticnet_cv.fit(X_train_scaled, y_train)

optimal_alpha_en = elasticnet_cv.alpha_
optimal_l1_ratio_en = elasticnet_cv.l1_ratio_

print("Optimal Elastic Net parameters:")
print(f"• Alpha: {optimal_alpha_en:.6f}")
print(f"• L1 ratio: {optimal_l1_ratio_en:.3f}")
# scikit-learn's penalty: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
print(f"• Weight on ||w||_1: {optimal_alpha_en * optimal_l1_ratio_en:.6f}")
print(f"• Weight on ||w||_2^2: {0.5 * optimal_alpha_en * (1 - optimal_l1_ratio_en):.6f}")

# Fit Elastic Net with optimal parameters
elasticnet = ElasticNet(
 alpha=optimal_alpha_en,
 l1_ratio=optimal_l1_ratio_en,
 random_state=42,
 max_iter=2000
)
elasticnet.fit(X_train_scaled, y_train)

# Predictions
y_train_pred_en = elasticnet.predict(X_train_scaled)
y_test_pred_en = elasticnet.predict(X_test_scaled)

# Calculate metrics
train_mse_en = mean_squared_error(y_train, y_train_pred_en)
test_mse_en = mean_squared_error(y_test, y_test_pred_en)
train_r2_en = r2_score(y_train, y_train_pred_en)
test_r2_en = r2_score(y_test, y_test_pred_en)

print(" Elastic Net Performance:")
print(f"• Training MSE: {train_mse_en:.4f}")
print(f"• Test MSE: {test_mse_en:.4f}")
print(f"• Training R²: {train_r2_en:.4f}")
print(f"• Test R²: {test_r2_en:.4f}")
print(f"• Overfitting indicator: {train_r2_en - test_r2_en:.4f}")

# Feature selection analysis
n_selected_en = np.sum(elasticnet.coef_ != 0)
print("\nElastic Net Feature Selection:")
print(f"• Selected features: {n_selected_en}/{len(feature_cols)}")
print(f"• Elimination rate: {(1 - n_selected_en/len(feature_cols)):.1%}")

# Compare all methods
comparison_df = pd.DataFrame({
 'Method': ['Linear Regression', 'Ridge', 'Lasso', 'Elastic Net'],
 'Train_MSE': [train_mse_lr, train_mse_ridge, train_mse_lasso, train_mse_en],
 'Test_MSE': [test_mse_lr, test_mse_ridge, test_mse_lasso, test_mse_en],
 'Train_R2': [train_r2_lr, train_r2_ridge, train_r2_lasso, train_r2_en],
 'Test_R2': [test_r2_lr, test_r2_ridge, test_r2_lasso, test_r2_en],
 'Overfitting': [train_r2_lr - test_r2_lr, train_r2_ridge - test_r2_ridge,
 train_r2_lasso - test_r2_lasso, train_r2_en - test_r2_en],
 'Features_Used': [len(feature_cols), len(feature_cols), n_selected_features, n_selected_en]
})

print(f"\n MODEL COMPARISON SUMMARY:")
print("=" * 40)
print(comparison_df.round(4))

# Visualize model comparison
fig_comparison = make_subplots(
 rows=2, cols=2,
 subplot_titles=("Test Performance", "Overfitting Analysis",
 "Feature Usage", "Train vs Test R²")
)

# Test performance
fig_comparison.add_trace(
 go.Bar(
 x=comparison_df['Method'],
 y=comparison_df['Test_R2'],
 name='Test R²',
 marker_color=['blue', 'green', 'red', 'purple'],
 text=comparison_df['Test_R2'].round(3),
 textposition='auto'
 ),
 row=1, col=1
)

# Overfitting
fig_comparison.add_trace(
 go.Bar(
 x=comparison_df['Method'],
 y=comparison_df['Overfitting'],
 name='Overfitting',
 marker_color=['blue', 'green', 'red', 'purple'],
 text=comparison_df['Overfitting'].round(3),
 textposition='auto'
 ),
 row=1, col=2
)

# Feature usage
fig_comparison.add_trace(
 go.Bar(
 x=comparison_df['Method'],
 y=comparison_df['Features_Used'],
 name='Features Used',
 marker_color=['blue', 'green', 'red', 'purple'],
 text=comparison_df['Features_Used'],
 textposition='auto'
 ),
 row=2, col=1
)

# Train vs Test R²
fig_comparison.add_trace(
 go.Scatter(
 x=comparison_df['Train_R2'],
 y=comparison_df['Test_R2'],
 mode='markers+text',
 text=comparison_df['Method'],
 textposition="top center",
 marker=dict(size=12, color=['blue', 'green', 'red', 'purple']),
 name='Methods',
 hovertemplate="<b>%{text}</b><br>Train R²: %{x:.3f}<br>Test R²: %{y:.3f}<extra></extra>"
 ),
 row=2, col=2
)

# Add diagonal line for perfect generalization
fig_comparison.add_trace(
 go.Scatter(
 x=[0, 1],
 y=[0, 1],
 mode='lines',
 line=dict(color='black', dash='dash'),
 name='Perfect Generalization',
 showlegend=False
 ),
 row=2, col=2
)

fig_comparison.update_layout(
 title="Regularization Methods: Comprehensive Comparison",
 height=700,
 showlegend=False
)

fig_comparison.show()

# Best method by test R² (selecting on the test set makes this R² optimistic; CV scores on the training data are a safer basis)
best_method_idx = comparison_df['Test_R2'].idxmax()
best_method = comparison_df.loc[best_method_idx, 'Method']
best_r2 = comparison_df.loc[best_method_idx, 'Test_R2']
best_overfitting = comparison_df.loc[best_method_idx, 'Overfitting']

print(f"\n RECOMMENDATION:")
print(f"• Best performing method: {best_method}")
print(f"• Test R²: {best_r2:.4f}")
print(f"• Overfitting level: {best_overfitting:.4f}")

# Additional analysis
if best_method == 'Lasso':
 print(f"• Features eliminated: {len(feature_cols) - n_selected_features}")
 print("• Advantage: Automatic feature selection")
elif best_method == 'Ridge':
 print("• Advantage: Handles multicollinearity well")
 print("• All features retained with shrinkage")
elif best_method == 'Elastic Net':
 print(f"• Features eliminated: {len(feature_cols) - n_selected_en}")
 print("• Advantage: Balanced L1/L2 regularization")
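
The Elastic Net cell reports the L1 and L2 parts of the penalty, so it helps to write out scikit-learn's objective: `(1/(2n)) ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2`. Two consequences are easy to check numerically. First, `l1_ratio = 1` gives exactly the Lasso objective. Second, with `l1_ratio = 0` the minimiser solves `(X^T X + n * alpha * I) w = X^T y`, which is `Ridge` with `alpha = n * alpha`. The optimal alphas reported for Ridge and for Lasso/Elastic Net in this notebook are therefore not on the same scale. Here is a NumPy sketch; the data are arbitrary, and the intercept is handled by centring.

```python
import numpy as np


def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """scikit-learn's ElasticNet objective (no intercept term; use centred X and y)."""
    n = X.shape[0]
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n)
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(w))
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return data_fit + l1_penalty + l2_penalty


rng_en = np.random.default_rng(2)
X_en = rng_en.standard_normal((200, 5))
X_en -= X_en.mean(axis=0)
y_en = X_en @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng_en.normal(size=200)
y_en -= y_en.mean()
n_en, p_en = X_en.shape
alpha_en = 0.05

# l1_ratio = 0: the minimiser equals Ridge with alpha_ridge = n * alpha
w_l2_only = np.linalg.solve(X_en.T @ X_en + n_en * alpha_en * np.eye(p_en), X_en.T @ y_en)
# Gradient of the l1_ratio = 0 objective; it should vanish at the minimiser
gradient = -(X_en.T @ (y_en - X_en @ w_l2_only)) / n_en + alpha_en * w_l2_only
print("max |gradient| at the ridge solution:", np.abs(gradient).max())
```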

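The recommendation above picks the method with the highest test R². That reuses the test set for model selection, so the reported R² is slightly optimistic. A common alternative is to compare candidates by cross-validation on training data only. Each penalized model tunes its own alpha in an inner loop, which makes this nested CV. Here is a sketch that uses scikit-learn's `make_regression` as stand-in data; the grids and fold counts are assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: 50 features, 15 informative (not the notebook's generator)
X_cmp, y_cmp = make_regression(n_samples=800, n_features=50, n_informative=15,
                               noise=10.0, random_state=42)

alpha_grid = np.logspace(-4, 2, 30)
candidates = {
    "Linear Regression": LinearRegression(),
    "Ridge": RidgeCV(alphas=np.logspace(-4, 4, 30)),
    "Lasso": LassoCV(alphas=alpha_grid, max_iter=5000),
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=alpha_grid, max_iter=5000),
}

outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = {}
for name, model in candidates.items():
    # Scaling and alpha tuning both happen inside each outer training fold
    fold_scores = cross_val_score(make_pipeline(StandardScaler(), model),
                                  X_cmp, y_cmp, cv=outer_cv, scoring="r2")
    cv_r2[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name:<18} CV R^2 = {fold_scores.mean():.4f} ± {fold_scores.std():.4f}")

best_by_cv = max(cv_r2, key=lambda name: cv_r2[name][0])
print("Selected by CV:", best_by_cv)
```

The selected model would then be refit on the full training set and evaluated once on the held-out test set.
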
In [None]:
# 6. ADVANCED CROSS-VALIDATION AND HYPERPARAMETER TUNING
print("\n 6. ADVANCED CROSS-VALIDATION AND HYPERPARAMETER TUNING")
print("=" * 60)

from sklearn.model_selection import validation_curve, learning_curve
from sklearn.metrics import make_scorer

# 6.1 Validation curves for different regularization strengths
print("6.1 Validation Curves Analysis:")

# Ridge validation curve
ridge_alphas = np.logspace(-4, 4, 20)
ridge_train_scores, ridge_val_scores = validation_curve(
 Ridge(), X_train_scaled, y_train,
 param_name='alpha', param_range=ridge_alphas,
 cv=5, scoring='r2', n_jobs=-1
)

# Lasso validation curve
lasso_alphas = np.logspace(-4, 2, 20)
lasso_train_scores, lasso_val_scores = validation_curve(
 Lasso(max_iter=2000), X_train_scaled, y_train,
 param_name='alpha', param_range=lasso_alphas,
 cv=5, scoring='r2', n_jobs=-1
)

# Plot validation curves
fig_validation = make_subplots(
 rows=1, cols=2,
 subplot_titles=("Ridge Validation Curve", "Lasso Validation Curve")
)

# Ridge validation curve
ridge_train_mean = np.mean(ridge_train_scores, axis=1)
ridge_train_std = np.std(ridge_train_scores, axis=1)
ridge_val_mean = np.mean(ridge_val_scores, axis=1)
ridge_val_std = np.std(ridge_val_scores, axis=1)

fig_validation.add_trace(
 go.Scatter(
 x=ridge_alphas,
 y=ridge_train_mean,
 mode='lines',
 name='Training',
 line=dict(color='blue'),
 hovertemplate="Alpha: %{x:.6f}<br>R²: %{y:.3f}<extra></extra>"
 ),
 row=1, col=1
)

fig_validation.add_trace(
 go.Scatter(
 x=ridge_alphas,
 y=ridge_val_mean,
 mode='lines',
 name='Validation',
 line=dict(color='red'),
 hovertemplate="Alpha: %{x:.6f}<br>R²: %{y:.3f}<extra></extra>"
 ),
 row=1, col=1
)

# Lasso validation curve
lasso_train_mean = np.mean(lasso_train_scores, axis=1)
lasso_val_mean = np.mean(lasso_val_scores, axis=1)

fig_validation.add_trace(
 go.Scatter(
 x=lasso_alphas,
 y=lasso_train_mean,
 mode='lines',
 name='Training',
 line=dict(color='blue'),
 showlegend=False,
 hovertemplate="Alpha: %{x:.6f}<br>R²: %{y:.3f}<extra></extra>"
 ),
 row=1, col=2
)

fig_validation.add_trace(
 go.Scatter(
 x=lasso_alphas,
 y=lasso_val_mean,
 mode='lines',
 name='Validation',
 line=dict(color='red'),
 showlegend=False,
 hovertemplate="Alpha: %{x:.6f}<br>R²: %{y:.3f}<extra></extra>"
 ),
 row=1, col=2
)

fig_validation.update_layout(
 title="Validation Curves: Optimal Regularization Strength",
 height=500
)
fig_validation.update_xaxes(type="log", title_text="Alpha")
fig_validation.update_yaxes(title_text="R² Score")

fig_validation.show()

# 6.2 Learning curves to assess model complexity
print("\n6.2 Learning Curves Analysis:")

train_sizes = np.linspace(0.1, 1.0, 10)

# Linear regression learning curve
lr_train_sizes, lr_train_scores, lr_val_scores = learning_curve(
 LinearRegression(), X_train_scaled, y_train,
 train_sizes=train_sizes, cv=5, scoring='r2', n_jobs=-1
)

# Ridge learning curve
ridge_train_sizes, ridge_train_scores_lc, ridge_val_scores_lc = learning_curve(
 Ridge(alpha=optimal_alpha_ridge), X_train_scaled, y_train,
 train_sizes=train_sizes, cv=5, scoring='r2', n_jobs=-1
)

# Lasso learning curve
lasso_train_sizes, lasso_train_scores_lc, lasso_val_scores_lc = learning_curve(
 Lasso(alpha=optimal_alpha_lasso, max_iter=2000), X_train_scaled, y_train,
 train_sizes=train_sizes, cv=5, scoring='r2', n_jobs=-1
)

# Plot learning curves
fig_learning = go.Figure()

methods = ['Linear', 'Ridge', 'Lasso']
colors = ['blue', 'green', 'red']
train_scores_all = [lr_train_scores, ridge_train_scores_lc, lasso_train_scores_lc]
val_scores_all = [lr_val_scores, ridge_val_scores_lc, lasso_val_scores_lc]

for i, (method, color) in enumerate(zip(methods, colors)):
 train_mean = np.mean(train_scores_all[i], axis=1)
 val_mean = np.mean(val_scores_all[i], axis=1)

 # Training scores
 fig_learning.add_trace(
 go.Scatter(
 x=lr_train_sizes,
 y=train_mean,
 mode='lines',
 name=f'{method} (Train)',
 line=dict(color=color, dash='solid'),
 hovertemplate=f"<b>{method} Training</b><br>Size: %{{x}}<br>R²: %{{y:.3f}}<extra></extra>"
 )
 )

 # Validation scores
 fig_learning.add_trace(
 go.Scatter(
 x=lr_train_sizes,
 y=val_mean,
 mode='lines',
 name=f'{method} (Val)',
 line=dict(color=color, dash='dash'),
 hovertemplate=f"<b>{method} Validation</b><br>Size: %{{x}}<br>R²: %{{y:.3f}}<extra></extra>"
 )
 )

fig_learning.update_layout(
 title="Learning Curves: Model Performance vs Training Set Size",
 xaxis_title="Training Set Size",
 yaxis_title="R² Score",
 height=500
)
fig_learning.show()

# 6.3 Grid search for optimal hyperparameters
print("\n6.3 Grid Search Optimization:")

# Define parameter grids
ridge_param_grid = {'alpha': np.logspace(-4, 4, 20)}
lasso_param_grid = {'alpha': np.logspace(-4, 2, 20)}
elasticnet_param_grid = {
 'alpha': np.logspace(-4, 2, 10),
 'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]
}

# Grid search for each method
ridge_grid = GridSearchCV(Ridge(), ridge_param_grid, cv=5, scoring='r2', n_jobs=-1)
lasso_grid = GridSearchCV(Lasso(max_iter=2000), lasso_param_grid, cv=5, scoring='r2', n_jobs=-1)
elasticnet_grid = GridSearchCV(ElasticNet(max_iter=2000), elasticnet_param_grid, cv=5, scoring='r2', n_jobs=-1)

# Fit grid searches
ridge_grid.fit(X_train_scaled, y_train)
lasso_grid.fit(X_train_scaled, y_train)
elasticnet_grid.fit(X_train_scaled, y_train)

print("Grid Search Results:")
print(f"• Ridge - Best alpha: {ridge_grid.best_params_['alpha']:.6f}, CV Score: {ridge_grid.best_score_:.4f}")
print(f"• Lasso - Best alpha: {lasso_grid.best_params_['alpha']:.6f}, CV Score: {lasso_grid.best_score_:.4f}")
print(f"• Elastic Net - Best params: α={elasticnet_grid.best_params_['alpha']:.6f}, "
 f"l1_ratio={elasticnet_grid.best_params_['l1_ratio']:.3f}, CV Score: {elasticnet_grid.best_score_:.4f}")

# Compare grid search results
grid_comparison = pd.DataFrame({
 'Method': ['Ridge (Grid)', 'Lasso (Grid)', 'Elastic Net (Grid)'],
 'CV_Score': [ridge_grid.best_score_, lasso_grid.best_score_, elasticnet_grid.best_score_],
 'Best_Alpha': [ridge_grid.best_params_['alpha'],
 lasso_grid.best_params_['alpha'],
 elasticnet_grid.best_params_['alpha']],
 'Test_Score': [ridge_grid.score(X_test_scaled, y_test),
 lasso_grid.score(X_test_scaled, y_test),
 elasticnet_grid.score(X_test_scaled, y_test)]
})

print("\nGrid Search Performance Summary:")
print(grid_comparison.round(6))

# Visualize grid search results
fig_grid = go.Figure()

fig_grid.add_trace(
 go.Bar(
 x=grid_comparison['Method'],
 y=grid_comparison['Test_Score'],
 name='Test Score',
 marker_color=['green', 'red', 'purple'],
 text=grid_comparison['Test_Score'].round(4),
 textposition='auto',
 hovertemplate="<b>%{x}</b><br>Test R²: %{y:.4f}<extra></extra>"
 )
)

fig_grid.update_layout(
 title="Grid Search Results: Test Set Performance",
 xaxis_title="Method",
 yaxis_title="Test R² Score",
 height=400
)
fig_grid.show()

# Final model selection: choose on CV score so the test set is not used for tuning
best_grid_idx = grid_comparison['CV_Score'].idxmax()
best_grid_method = grid_comparison.loc[best_grid_idx, 'Method']
best_grid_score = grid_comparison.loc[best_grid_idx, 'Test_Score']

print("\nFINAL MODEL RECOMMENDATION:")
print(f"• Best method: {best_grid_method}")
print(f"• Test R² score: {best_grid_score:.4f}")
print(f"• Improvement over linear regression: {((best_grid_score - test_r2_lr) / test_r2_lr * 100):.1f}%")
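The grid search above picks alpha with 5-fold CV on the training set, so its `best_score_` is slightly optimistic: the same folds both chose alpha and scored it. A minimal sketch of nested cross-validation, using synthetic `make_regression` data as a stand-in (an assumption, not this notebook's dataset), wraps the search in an outer CV loop so every outer score comes from rows the tuner never saw. Placing the scaler inside the pipeline means it is refit on each training fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 20 features, 5 informative
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Inner loop: choose alpha by CV; the scaler is part of the pipeline
inner = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),
    param_grid={"ridge__alpha": np.logspace(-4, 4, 20)},
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="r2",
)

# Outer loop: each outer fold reruns the whole search on its training part
outer_scores = cross_val_score(
    inner, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=2),
    scoring="r2",
)
print(f"Nested CV R²: {outer_scores.mean():.4f} ± {outer_scores.std():.4f}")
```

The outer mean estimates how the whole "tune alpha, then fit" procedure generalizes, which is the quantity the test-set comparison above is trying to approximate.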

In [None]:
# 7. BUSINESS INSIGHTS AND FEATURE IMPORTANCE ANALYSIS
print("\n7. BUSINESS INSIGHTS AND FEATURE IMPORTANCE ANALYSIS")
print("=" * 58)

# Use the best performing model for business insights
if best_grid_method == 'Ridge (Grid)':
 best_model = ridge_grid.best_estimator_
 model_name = "Ridge Regression"
elif best_grid_method == 'Lasso (Grid)':
 best_model = lasso_grid.best_estimator_
 model_name = "Lasso Regression"
else:
 best_model = elasticnet_grid.best_estimator_
 model_name = "Elastic Net"

# Feature importance analysis
feature_importance_final = pd.DataFrame({
 'feature': feature_cols,
 'coefficient': best_model.coef_,
 'abs_coefficient': np.abs(best_model.coef_),
 'selected': best_model.coef_ != 0
})

# Add business context for key features
business_features = ['marketing_spend', 'product_quality', 'customer_satisfaction', 'competition_index']
business_importance = feature_importance_final[
 feature_importance_final['feature'].isin(business_features)
].sort_values('abs_coefficient', ascending=False)

print(f"{model_name} - Key Business Insights:")
print("=" * 40)

for _, row in business_importance.iterrows():
    feature = row['feature']
    coef = row['coefficient']

    if coef > 0:
        direction = "increases"
        impact = "positive"
    else:
        direction = "decreases"
        impact = "negative"

    print(f"• {feature.replace('_', ' ').title()}: {impact} impact (coef: {coef:.3f})")
    # Features were scaled before fitting, so coef is the effect of one *scaled* unit
    print(f"  - A 1-unit increase in the scaled feature {direction} the target by {abs(coef):.3f}")

# Create feature importance visualization for business features
fig_business = go.Figure()

fig_business.add_trace(
 go.Bar(
 x=business_importance['feature'],
 y=business_importance['coefficient'],
 marker_color=['red' if c < 0 else 'green' for c in business_importance['coefficient']],
 name='Coefficient',
 text=business_importance['coefficient'].round(3),
 textposition='auto',
 hovertemplate="<b>%{x}</b><br>Coefficient: %{y:.3f}<extra></extra>"
 )
)

fig_business.update_layout(
 title=f"{model_name}: Business Feature Impact Analysis",
 xaxis_title="Business Features",
 yaxis_title="Coefficient Value",
 height=400
)
fig_business.show()

# ROI and actionability analysis
print("\nBusiness ROI Analysis:")

# Simulate business scenarios. The model was fit on scaled features, so each change
# is expressed in scaled units (one standard deviation if StandardScaler was used),
# not raw units such as dollars of marketing spend.
scenarios = {
    'Increase Marketing': {'marketing_spend': 1},
    'Improve Quality': {'product_quality': 1},
    'Boost Satisfaction': {'customer_satisfaction': 1},
    'Reduce Competition': {'competition_index': -1}
}

# Coefficient lookup; business features absent from the model contribute 0
coef_lookup = dict(zip(business_importance['feature'], business_importance['coefficient']))

for scenario_name, changes in scenarios.items():
    total_impact = sum(coef_lookup.get(feature, 0.0) * change
                       for feature, change in changes.items())
    print(f"• {scenario_name}: Expected target change = {total_impact:.3f}")

# Model stability analysis
print("\nModel Stability Analysis:")

# Check score stability across CV folds
cv_scores = cross_val_score(best_model, X_train_scaled, y_train, cv=5, scoring='r2')
cv_mean = cv_scores.mean()
cv_std = cv_scores.std()

print(f"• Cross-validation R² mean: {cv_mean:.4f} ± {cv_std:.4f}")
print(f"• Coefficient of variation: {cv_std/cv_mean:.3f}")
print(f"• Model stability: {'High' if cv_std < 0.02 else 'Medium' if cv_std < 0.05 else 'Low'}")

# Feature selection stability (for Lasso/Elastic Net)
if hasattr(best_model, 'l1_ratio') or 'Lasso' in model_name:
 selected_features_count = np.sum(best_model.coef_ != 0)
 selection_rate = selected_features_count / len(feature_cols)

 print(f"• Feature selection rate: {selection_rate:.1%}")
 print(f"• Features eliminated: {len(feature_cols) - selected_features_count}")

 # Identify most stable features
 important_selected = [f for f in feature_cols if 'Important' in f and
 feature_importance_final[feature_importance_final['feature'] == f]['selected'].iloc[0]]
 noise_eliminated = [f for f in feature_cols if 'Noise' in f and
 not feature_importance_final[feature_importance_final['feature'] == f]['selected'].iloc[0]]

 print(f"• Important features retained: {len(important_selected)}")
 print(f"• Noise features eliminated: {len(noise_eliminated)}")

# Risk assessment
print("\nModel Risk Assessment:")

# Check for potential issues
max_coef = np.max(np.abs(best_model.coef_))
coef_range = np.max(best_model.coef_) - np.min(best_model.coef_)

if max_coef > 10:
 print("• HIGH RISK: Very large coefficients detected")
elif max_coef > 5:
 print("• MEDIUM RISK: Moderately large coefficients")
else:
 print("• LOW RISK: Coefficients well-controlled")

if coef_range > 20:
 print("• HIGH RISK: Very wide coefficient range")
elif coef_range > 10:
 print("• MEDIUM RISK: Moderate coefficient range")
else:
 print("• LOW RISK: Narrow coefficient range")

# Method-specific notes
if 'Ridge' in model_name:
 print("• Ridge handles multicollinearity well")
elif 'Lasso' in model_name:
 print("• Lasso provides automatic feature selection")
else:
 print("• Elastic Net balances feature selection and multicollinearity")
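The business insights above read coefficients from a model fit on scaled features, so each coefficient is the effect of one unit of the *scaled* feature (one standard deviation if a `StandardScaler` was used), not one raw unit. A minimal sketch of converting back to raw units, with two hypothetical features on very different scales (the feature meanings and data are assumptions for illustration only):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical raw features: a spend-like column and a 1-10 rating-like column
X_raw = np.column_stack([rng.normal(5000, 1500, 200), rng.normal(6, 2, 200)])
y = 0.002 * X_raw[:, 0] + 1.5 * X_raw[:, 1] + rng.normal(0, 1, 200)

scaler = StandardScaler().fit(X_raw)
model = Ridge(alpha=1.0).fit(scaler.transform(X_raw), y)

# y_hat = b0 + sum(c * (x - mean) / scale)  =>  raw slope is c / scale
raw_coef = model.coef_ / scaler.scale_
raw_intercept = model.intercept_ - np.sum(model.coef_ * scaler.mean_ / scaler.scale_)

print("Per scaled unit (1 SD):", model.coef_.round(3))
print("Per raw unit:          ", raw_coef.round(5))
```

The converted model makes identical predictions; only the units in which the coefficients are reported change.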

# LEARNING SUMMARY: Ridge & Lasso Regression

## Key Concepts Mastered

### 1. **Regularization Fundamentals**
- **Ridge (L2)**: Shrinks coefficients toward zero (but rarely to exactly zero); handles multicollinearity
- **Lasso (L1)**: Drives some coefficients to exactly zero, performing automatic feature selection
- **Elastic Net**: Mixes the L1 and L2 penalties (weighted by `l1_ratio`), balancing selection and shrinkage
- **Bias-Variance Tradeoff**: A stronger penalty adds bias but lowers variance; alpha sets the balance
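The Ridge/Lasso difference is easy to see directly. A minimal sketch on synthetic data (an assumption: `make_regression` with 20 features, only 5 informative) counts exact-zero coefficients under each penalty:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 20 features, only 5 carry signal
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=10.0).fit(X, y)  # L2: shrinks every coefficient
lasso = Lasso(alpha=5.0).fit(X, y)   # L1: can set coefficients exactly to 0

print("Ridge exact zeros:", int(np.sum(ridge.coef_ == 0)))
print("Lasso exact zeros:", int(np.sum(lasso.coef_ == 0)))
```

The alpha values are illustrative; Ridge and Lasso alphas are not on comparable scales.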

### 2. **Hyperparameter Optimization**
- **Cross-Validation**: Selecting alpha on held-out folds of the training data, not on the test set
- **Regularization Path**: Understanding coefficient evolution with alpha
- **Grid Search**: Systematic hyperparameter optimization
- **Validation Curves**: Visualizing model complexity vs performance
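A regularization path records every coefficient as alpha moves from "everything zeroed" to "almost unpenalized". A minimal sketch using scikit-learn's `lasso_path` on synthetic data (an assumption); because `lasso_path` fits no intercept, `y` is centered first, and the largest alpha is set just above the value at which all coefficients vanish:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # centered, unit-variance columns
y = y - y.mean()                       # lasso_path fits no intercept

# Smallest alpha at which every Lasso coefficient is zero
alpha_max = np.max(np.abs(X.T @ y)) / len(y)
alphas = alpha_max * np.logspace(0.01, -3, 30)  # decreasing, starts just above alpha_max

alphas_out, coefs, _ = lasso_path(X, y, alphas=alphas)  # coefs: (n_features, n_alphas)
n_nonzero = np.sum(coefs != 0, axis=0)
for a, k in zip(alphas_out[::6], n_nonzero[::6]):
    print(f"alpha={a:10.4f}  nonzero coefficients={k}")
```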

### 3. **Feature Selection & Interpretation**
- **Automatic Selection**: Lasso's ability to eliminate irrelevant features
- **Coefficient Stability**: Measuring feature importance reliability
- **Business Impact**: Translating coefficients to actionable insights
- **Model Interpretability**: Understanding feature contributions
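The stability check in section 7 looks at the spread of CV scores. One complementary way to gauge how reliable Lasso's feature choices are, sketched here as a simplified bootstrap variant of stability selection rather than the notebook's own method, is to refit on resampled rows and count how often each feature survives. The data and `alpha=5.0` are assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

rng = np.random.default_rng(42)
n_boot = 50
selected = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, len(y), size=len(y))  # resample rows with replacement
    coef = Lasso(alpha=5.0).fit(X[idx], y[idx]).coef_
    selected += coef != 0                       # count features kept in this fit

freq = selected / n_boot
for j in np.argsort(-freq)[:8]:
    print(f"feature {j:2d}: selected in {freq[j]:.0%} of resamples")
```

Features selected in nearly every resample are safer to report than ones that appear only occasionally.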

## Business Applications

### Predictive Modeling
- **High-Dimensional Data**: Genomics, text analysis, sensor data
- **Feature Engineering**: Automated selection from large feature sets
- **Risk Modeling**: Financial credit scoring, insurance pricing
- **Marketing Analytics**: Attribution modeling, budget optimization

### Decision Support
- Regularized models provide:
 - Stable predictions with many variables
 - Clear feature importance rankings
 - Reduced overfitting risk
 - Interpretable business insights

## Next Steps

1. **Tier 3: Time Series Models** - Apply regularization to temporal data
2. **Advanced Regularization** - Group Lasso, adaptive methods
3. **Ensemble Methods** - Combine regularized models
4. **Deep Learning** - Regularization in neural networks

## Pro Tips

- **Always scale features** before applying regularization
- **Use cross-validation** for reliable hyperparameter selection
- **Ridge for multicollinearity**, Lasso for feature selection
- **Elastic Net for correlated features** when you still want a sparse model
- **Monitor coefficient paths** to understand model behavior
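Scaling and alpha selection combine naturally in a pipeline. A minimal sketch on synthetic data (an assumption): the scaler's statistics come from the training split only, so the test set never influences preprocessing, and `LassoCV` picks alpha by 5-fold CV on the scaled training data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaler is fit on X_train only; LassoCV chooses alpha with 5-fold CV
model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(X_train, y_train)

lasso = model[-1]  # last pipeline step
print(f"Chosen alpha: {lasso.alpha_:.4f}")
print(f"Test R²: {model.score(X_test, y_test):.4f}")
```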

## Common Pitfalls

- **Scale Sensitivity**: The penalty depends on each feature's scale, so unscaled features are shrunk unevenly
- **Alpha Selection**: Too high causes underfitting, too low allows overfitting
- **Feature Leakage**: Ensure temporal validity in time series data
- **Interpretation Limits**: Coefficients don't always imply causation
- **Stability Issues**: Feature selection can be unstable with small datasets
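Scale sensitivity is concrete: changing a feature's units changes how hard Lasso penalizes it. A minimal sketch with hypothetical data, where feature 0 is re-expressed in units 1000 times larger (as if grams became kilograms). Its coefficient would have to grow 1000-fold to fit the same signal, and the L1 penalty on that larger coefficient outweighs the gain, so Lasso drops it:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)

before = Lasso(alpha=0.1).fit(X, y).coef_

# Same information, but feature 0 measured in units 1000x larger
X_rescaled = X.copy()
X_rescaled[:, 0] *= 0.001
after = Lasso(alpha=0.1).fit(X_rescaled, y).coef_

print("coef on feature 0, original units:", round(before[0], 3))
print("coef on feature 0, rescaled units:", round(after[0], 3))
```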

## Advanced Considerations

### When to Use Each Method:
- **Ridge**: Multicollinear features, all features potentially useful
- **Lasso**: Need feature selection, sparse solutions preferred
- **Elastic Net**: Grouped variables, balanced selection and shrinkage
- **Linear Regression**: Many more observations than features, little multicollinearity, no need for selection
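When it is unclear whether Ridge-like or Lasso-like behavior suits the data, `ElasticNetCV` can search `l1_ratio` and alpha together (`l1_ratio=1.0` is pure Lasso). A minimal sketch on synthetic correlated features (an assumption: `make_regression` with `effective_rank=5`, which makes the columns strongly correlated):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Synthetic, strongly correlated features (assumption)
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       effective_rank=5, noise=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

l1_ratios = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
enet = ElasticNetCV(l1_ratio=l1_ratios, cv=5, max_iter=10000).fit(X, y)
print(f"Chosen l1_ratio: {enet.l1_ratio_}, alpha: {enet.alpha_:.4f}")
print(f"Nonzero coefficients: {np.sum(enet.coef_ != 0)} of {X.shape[1]}")
```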

### Performance Optimization:
- Use warm starts for regularization path computation
- Rely on coordinate descent (scikit-learn's solver for `Lasso` and `ElasticNet`) for large datasets
- Loosen `tol` or cap `max_iter` to stop early when full convergence is not needed
- Use cross-validation strategically to balance accuracy and speed
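Warm starts pay off when sweeping alpha from strong to weak: each fit starts from the previous solution, which is usually close. scikit-learn's `Lasso` and `ElasticNet` expose this through `warm_start=True`, and `lasso_path` already reuses each solution for the next alpha internally. A minimal hand-rolled sketch on synthetic data (an assumption):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

alphas = np.logspace(1, -2, 20)  # strong -> weak regularization
model = Lasso(warm_start=True, max_iter=10000)
path = []
for a in alphas:
    model.set_params(alpha=a)
    model.fit(X, y)  # initialized from the previous alpha's coefficients
    path.append(model.coef_.copy())
path = np.array(path)  # shape (n_alphas, n_features)

print("path shape (n_alphas, n_features):", path.shape)
print("iterations at last alpha:", model.n_iter_)
```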

**Remember**: *Regularization is about finding the sweet spot between model complexity and generalization - let the data guide your choice of regularization strength!*