## 1) Quick overview (the goal)

You want a model that predicts well on unseen data and is interpretable/stable.

Use a mix of:
- exploratory (filter) methods,
- embedded methods (Lasso, tree importances), and
- wrapper methods (RFE/RFECV),

always validated with cross-validation (preferably nested CV when you tune and select at the same time).

----

## 2) Step-by-step workflow

### 1. Data cleaning & EDA (mandatory)

Handle missing values, treat outliers (only in features, and only if justified), and correct data types.

Visual checks: histograms, boxplots, pairwise scatterplots for numeric features.

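Treat low-frequency category values (for example, group them into one bucket). Encode categoricals (one-hot, multi-label binarization, or target encoding where appropriate). Keep cardinality in mind: under one-hot encoding every category becomes its own column.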
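
One way to treat low-frequency categories is to collapse them into a single bucket before encoding. The sketch below is illustrative only: the `city` data, the helper `group_rare`, and the 5% cutoff are all assumptions, and the frequencies are measured on the training rows so that nothing is learned from other data.

```python
import pandas as pd

def group_rare(train_col, other_col, min_freq=0.05, label="Other"):
    """Replace categories rarer than min_freq (measured on train_col) with label."""
    freq = train_col.value_counts(normalize=True)
    keep = set(freq[freq >= min_freq].index)
    return (train_col.where(train_col.isin(keep), label),
            other_col.where(other_col.isin(keep), label))

# Hypothetical toy data: "C" and "D" are rare in training, "E" is unseen
train_city = pd.Series(["A"] * 50 + ["B"] * 45 + ["C"] * 3 + ["D"] * 2)
test_city = pd.Series(["A", "C", "E"])

train_grouped, test_grouped = group_rare(train_city, test_city)
print(train_grouped.value_counts().to_dict())
print(test_grouped.tolist())
```

Categories that never appeared in training fall into the same bucket as the rare ones, which keeps the encoded columns identical across splits.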

### 2. Split early (avoid leakage)

Split into train / validation / test (e.g. 60/20/20) or use CV. **Do not use test set during selection/tuning.**

For reproducibility use random_state.
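
A 60/20/20 split can be obtained with two calls to `train_test_split`. This is a minimal sketch on synthetic data from `make_regression`, which stands in for your own `X` and `y`:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your own X, y
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

# First carve off 20% as the test set, then split the remaining 80% into 60/20
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))
```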

### 3. Baseline model

Fit a simple baseline (mean predictor, then ordinary OLS / LinearRegression) to get a performance baseline (RMSE, MAE, R²).
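
The two baselines can be compared side by side. The sketch below assumes synthetic data and uses `DummyRegressor` as the mean predictor; any later model should beat both numbers to justify its complexity.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your own data
X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                       noise=15.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

baselines = {"mean": DummyRegressor(strategy="mean"), "ols": LinearRegression()}
results = {}
for name, model in baselines.items():
    pred = model.fit(X_train, y_train).predict(X_val)
    results[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_val, pred))),
        "MAE": float(mean_absolute_error(y_val, pred)),
        "R2": float(r2_score(y_val, pred)),
    }
    print(name, results[name])
```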

### 4. Filter methods (fast, cheap, initial cut)

**Correlation matrix**: drop one of any pair with very high correlation (e.g. |corr| > 0.8–0.9) unless both are needed for interpretation.

**Univariate tests**: f_regression (a univariate linear F-test per feature) and mutual information (mutual_info_regression, which can also pick up non-linear dependence) to rank features.

**Business/domain check**: sometimes domain knowledge > statistics — keep features that matter even if stats are weak.

Example of correlation and mutual info:
```python
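import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression, f_regression

# Assumes X_train is an all-numeric DataFrame without missing values
corr = X_train.corr().abs()

# Keep only the upper triangle so each pair is listed once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_corr_pairs = [(c1, c2) for c1 in upper.index for c2 in upper.columns
                   if upper.loc[c1, c2] > 0.9]

# Mutual information is estimated with randomness, so fix the seed
mi = mutual_info_regression(X_train, y_train, random_state=42)
mi_series = pd.Series(mi, index=X_train.columns).sort_values(ascending=False)

f_vals, p_vals = f_regression(X_train, y_train)
p_series = pd.Series(p_vals, index=X_train.columns).sort_values()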
```

### 5. Multicollinearity diagnostics (VIF)

Compute VIF; drop or combine features with high VIF (common rules: VIF > 5 or > 10 are suspicious).

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_const = X.assign(const=1)
vif = pd.DataFrame({'feature': X_const.columns,
                    'VIF': [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])]})
vif = vif[vif.feature != 'const']
```

If features are collinear, Ridge often handles prediction better than OLS; Lasso may arbitrarily keep one and drop others (unstable when collinear).

### 6. Embedded methods (regularization & tree importances)

**Lasso (L1)** performs selection: many coefficients → exactly zero.

**Ridge (L2)** shrinks coefficients but rarely to zero; use it for multicollinearity / better predictive stability.

**ElasticNet mixes L1/L2** — good when correlated predictors + need sparsity.

**Tree-based models (RandomForest, XGBoost)** give feature importances (useful for ranking, but they do not drop any feature unless you add a threshold, e.g. with SelectFromModel).

Lasso model with automatic alpha selection:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel

pipe = Pipeline([('scaler', StandardScaler()),
                 ('lasso', LassoCV(cv=5, random_state=42))])
pipe.fit(X_train, y_train)

# Selected features
sel = SelectFromModel(pipe.named_steps['lasso'], prefit=True)
selected_features = X.columns[sel.get_support()]
```

Ridge for coefficient stability:
```python
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=[0.01,0.1,1,10], cv=5))
ridge.fit(X_train, y_train)
coef = ridge.named_steps['ridgecv'].coef_
```

To eliminate features with Ridge you can use magnitude thresholds or SelectFromModel(Ridge(...), threshold='median'), but be careful: Ridge never gives exact zeros, so any cutoff is a judgment call.
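
A minimal sketch of the median-threshold variant, assuming synthetic data and an untuned `alpha=1.0`:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 10 features, only 3 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# threshold="median" keeps features whose |coef| is at or above the median |coef|
selector = make_pipeline(
    StandardScaler(),
    SelectFromModel(Ridge(alpha=1.0), threshold="median"),
)
X_reduced = selector.fit_transform(X, y)
mask = selector.named_steps["selectfrommodel"].get_support()
print(X_reduced.shape, mask)
```

Note that the median rule keeps about half of the features regardless of how many actually matter; here it retains 5 columns even though only 3 are informative.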

### 7. Wrapper methods (RFE / RFECV / sequential)

RFE: recursive elimination based on estimator's coefficients/importance.

RFECV: RFE + cross-validation chooses the number of features automatically.

These are computationally heavier but often yield a compact set that empirically performs well.

```python
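from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Scale first so coefficient magnitudes are comparable across features
X_train_scaled = StandardScaler().fit_transform(X_train)

est = LinearRegression()
rfecv = RFECV(estimator=est, step=1, cv=KFold(5), scoring='neg_mean_squared_error')
rfecv.fit(X_train_scaled, y_train)
selected = X_train.columns[rfecv.support_]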
```

### 8. Nested CV for honest evaluation (important)

When doing feature selection and hyperparameter tuning, use nested CV:

Outer loop: estimate generalization error.

Inner loop: perform tuning (alpha for Lasso/Ridge) and feature selection (SelectFromModel or RFECV).

```python
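from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
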
outer_cv = KFold(5, shuffle=True, random_state=42)
inner_cv = KFold(5, shuffle=True, random_state=1)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('selector', SelectFromModel(LassoCV(cv=inner_cv))),
                 ('est', LinearRegression())])

# Wrap pipe in GridSearchCV if needed, then cross_val_score on outer_cv
scores = cross_val_score(pipe, X, y, cv=outer_cv, scoring='neg_mean_squared_error')
```
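
When the pipeline has a hyperparameter that must be tuned explicitly, the whole search object can be passed to the outer loop. The following is a sketch under assumptions: synthetic data, a plain `Lasso` inside `SelectFromModel`, and an illustrative three-value alpha grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for your own data
X, y = make_regression(n_samples=150, n_features=12, n_informative=4,
                       noise=10.0, random_state=0)

outer_cv = KFold(5, shuffle=True, random_state=42)
inner_cv = KFold(3, shuffle=True, random_state=1)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("selector", SelectFromModel(Lasso(max_iter=10000))),
    ("est", LinearRegression()),
])

# Inner loop: tune the Lasso alpha that drives the selection
param_grid = {"selector__estimator__alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="neg_mean_squared_error")

# Outer loop: every fold reruns the full search, so tuning never sees its test fold
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="neg_mean_squared_error")
print("Nested CV RMSE per fold:", np.sqrt(-scores))
```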

### 9. Final model & diagnostics

After selecting features, retrain on train+val, test on held-out test set.

Diagnostics: residual plots, QQ plot (normality), heteroscedasticity (Breusch-Pagan), influence points, partial dependence for non-linear features.

If you used statsmodels OLS you can inspect p-values:

```python
import statsmodels.api as sm
X_selected = X_train[selected_features]  # features chosen in the earlier steps
X_sm = sm.add_constant(X_selected)
model = sm.OLS(y_train, X_sm).fit()
print(model.summary())
```

### 10. Iteration & domain sanity check

If performance drops after removing features, reconsider: maybe keep correlated features (Ridge), create interactions, or engineer new features.

---

## 3) Practical heuristics & thresholds

- Correlation threshold to consider removing one: |corr| > 0.8–0.9 (context dependent).

- VIF threshold: VIF > 5 (warning) or > 10 (strong multicollinearity).

- p-value (statsmodels): p > 0.05 suggests non-significance — but don’t eliminate blindly; check predictive effect and multicollinearity.

- Lasso: non-zero coefficients → keep; zero → candidate to drop.

- RFECV: choose the number of features where CV error is minimal (or within 1-SE rule).

- If many correlated features: prefer Ridge or ElasticNet over Lasso alone.

---

## 4) Pitfalls to avoid

- Data leakage: Do feature selection inside CV folds, not on full dataset before CV.

- Scaling: Regularization needs features scaled (StandardScaler) — do scaling in a Pipeline.

- Too few samples: heavy feature selection can overfit when n_samples ≈ n_features. Use regularization and/or dimensionality reduction (PCA) with caution.

- Unstable Lasso selection: with correlated predictors, Lasso may pick a different member of a correlated group each time the data changes slightly. Use ElasticNet or stability selection (bootstrap + Lasso) to find robust features.
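
Stability selection can be approximated with a short bootstrap loop. The sketch below is a simplified illustration, not a full implementation of the published method: the data are synthetic, and `alpha=5.0`, 100 resamples, and the 80% cutoff are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; coef=True also returns the true coefficients
X, y, true_coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                                  noise=10.0, coef=True, random_state=0)
X = StandardScaler().fit_transform(X)

rng = np.random.default_rng(0)
n_boot, alpha = 100, 5.0   # illustrative choices, not tuned values
counts = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))           # bootstrap resample
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
    counts += coef != 0                                   # count non-zero coefficients

selection_freq = counts / n_boot
stable = np.flatnonzero(selection_freq >= 0.8)            # hypothetical 80% cutoff
print(selection_freq.round(2), stable)
```

Features that are selected in most resamples are more trustworthy than those that appear only occasionally.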

---

## 5) Short checklist (actionable)

- Clean data.

- EDA: distributions, correlations, domain checks.

- Split (train/val/test).

- Baseline OLS model → metric.

- Remove trivially bad features (missingness, zero variance).

- Run filter methods (corr, MI, f_regression) — drop obvious redundancies.

- Check VIF; resolve collinearity.

- Use embedded (LassoCV, RidgeCV, ElasticNetCV) inside a Pipeline (StandardScaler first).

- Optionally run RFECV to confirm compact subset.

- Evaluate with nested CV if you tune + select.

- Retrain on full train+val; test on holdout; run diagnostics.

---

## 6) Minimal recommended code template (put together)

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV, RidgeCV, LinearRegression
from sklearn.feature_selection import SelectFromModel, RFECV
from sklearn.model_selection import KFold, cross_val_score

# Example pipeline with Lasso selection then LinearRegression refit
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', SelectFromModel(LassoCV(cv=5, random_state=42))),
    ('est', LinearRegression())
])

cv = KFold(5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring='neg_mean_squared_error')
print("CV RMSE", (-scores.mean())**0.5)


----


## 7) Full notebook

In [None]:
# 📘 Feature Selection Notebook for Linear, Ridge, and Lasso Regression
# ===============================================================

# 1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.feature_selection import RFECV, SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor


In [None]:
# 2. Load Data
df = pd.read_csv("your_dataset.csv")

# Replace target column name with yours
target = "price"  
X = df.drop(columns=[target])
y = df[target]

print(df.shape)
df.head()


In [None]:
# 2b. Exploratory Data Analysis (EDA)
print("Dataset shape:", df.shape)
print("\nBasic Info:")
print(df.info())
print("\nMissing values:")
print(df.isna().sum())

print("\nSummary stats:")
display(df.describe(include="all"))

# Correlation heatmap (numeric features only)
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(numeric_only=True), annot=False, cmap="coolwarm", center=0)
plt.title("Correlation Heatmap")
plt.show()

# Target distribution
sns.histplot(df[target], kde=True)
plt.title("Target Distribution")
plt.show()


In [None]:
# 2c. Handle Categorical Variables

from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Separate numerical & categorical
num_cols = df.select_dtypes(include=np.number).columns.drop(target)
cat_cols = df.select_dtypes(exclude=np.number).columns

print("Numeric columns:", list(num_cols))
print("Categorical columns:", list(cat_cols))

# --- Multi-label columns (e.g. "Action,Drama") are binarized first; otherwise
# get_dummies would treat every distinct string as its own category ---
if 'genres' in df.columns:  # replace with your multi-label column name
    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    genre_dummies = pd.DataFrame(mlb.fit_transform(df['genres'].str.split(',')),
                                 columns=mlb.classes_,
                                 index=df.index)
    df = df.drop(columns=['genres']).join(genre_dummies)
    cat_cols = cat_cols.drop('genres')

# --- One-hot encoding (for nominal categories) ---
df_encoded = pd.get_dummies(df, columns=list(cat_cols), drop_first=True, dtype=float)

# Rebuild X and y so that the following cells work on the encoded features
X = df_encoded.drop(columns=[target])
y = df_encoded[target]

print("Encoded dataset shape:", df_encoded.shape)
df_encoded.head()


In [None]:
# 3. Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [None]:
# 4. Variance Inflation Factor (VIF) for Multicollinearity
X_vif = X_train.copy()
X_vif_const = sm.add_constant(X_vif)

vif = pd.DataFrame()
vif["feature"] = X_vif_const.columns
vif["VIF"] = [variance_inflation_factor(X_vif_const.values, i)
              for i in range(X_vif_const.shape[1])]

print(vif)


In [None]:
# 5. Baseline Linear Regression
lr = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
])
lr.fit(X_train, y_train)

y_pred = lr.predict(X_test)
print("Baseline R²:", r2_score(y_test, y_pred))
print("Baseline RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))


In [None]:
# 6a. LassoCV for Feature Selection
lasso = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LassoCV(cv=5, random_state=42))
])
lasso.fit(X_train, y_train)

print("Chosen alpha:", lasso.named_steps['model'].alpha_)
coef = pd.Series(lasso.named_steps['model'].coef_, index=X.columns)
print("Selected features:", list(coef[coef != 0].index))

# Plot coefficients
plt.figure(figsize=(10,6))
coef.plot(kind='bar')
plt.title("Lasso Coefficients")
plt.show()


In [None]:
# 6b. RidgeCV for Coefficient Shrinkage
ridge = Pipeline([
    ('scaler', StandardScaler()),
    ('model', RidgeCV(alphas=[0.01, 0.1, 1, 10, 100], cv=5))
])
ridge.fit(X_train, y_train)

print("Chosen alpha (Ridge):", ridge.named_steps['model'].alpha_)
coef_ridge = pd.Series(ridge.named_steps['model'].coef_, index=X.columns)




In [None]:
# 6c. Compare Ridge vs Lasso vs Linear
lr_model = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
]).fit(X_train, y_train)

coef_lr = pd.Series(lr_model.named_steps['model'].coef_, index=X.columns)
coef_lasso = pd.Series(lasso.named_steps['model'].coef_, index=X.columns)

coef_df = pd.DataFrame({
    "Linear": coef_lr,
    "Ridge": coef_ridge,
    "Lasso": coef_lasso
})

coef_df.plot(kind="bar", figsize=(12,6))
plt.title("Coefficient Comparison: Linear vs Ridge vs Lasso")
plt.axhline(0, color="black", linewidth=1)
plt.show()

In [None]:
# 7. RFECV with Linear Regression
rfecv = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', RFECV(estimator=LinearRegression(), step=1, cv=5,
                       scoring='neg_mean_squared_error'))
])
rfecv.fit(X_train, y_train)

print("Optimal number of features:", rfecv.named_steps['selector'].n_features_)
selected_features = X.columns[rfecv.named_steps['selector'].support_]
print("Selected features:", selected_features)

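# grid_scores_ was removed in scikit-learn 1.2; cv_results_ holds the per-size CV scores
mean_scores = rfecv.named_steps['selector'].cv_results_['mean_test_score']
plt.plot(range(1, len(mean_scores) + 1), mean_scores)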
plt.xlabel("Number of Features")
plt.ylabel("CV Score")
plt.title("RFECV Performance")
plt.show()


In [None]:
# 8. Nested CV with Lasso for Honest Evaluation
outer_cv = KFold(5, shuffle=True, random_state=42)
inner_cv = KFold(3, shuffle=True, random_state=1)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', SelectFromModel(LassoCV(cv=inner_cv, random_state=1))),
    ('est', LinearRegression())
])

scores = cross_val_score(pipe, X, y, cv=outer_cv, scoring='neg_mean_squared_error')
print("Nested CV RMSE:", np.mean(np.sqrt(-scores)))


In [None]:
# 9. Final Diagnostics (Residual Plots)
final_model = lasso.fit(X_train, y_train)
y_pred_final = final_model.predict(X_test)

residuals = y_test - y_pred_final

plt.figure(figsize=(8,6))
sns.scatterplot(x=y_pred_final, y=residuals)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()

sns.histplot(residuals, kde=True)
plt.title("Residual Distribution")
plt.show()


----

## Do you encode before or after train-test split?

**Option 1: Encode before train–test split**

- You run pd.get_dummies() or LabelEncoder on the full dataset first, then split.

- Problem: you risk data leakage. The test set "influences" the feature space, because the encoder has seen every category, including those that occur only in test rows. That leaks information about the test distribution.

- It also hides a real-world problem. Imagine a categorical variable "city" with 100 categories. If some cities appear only in new data, an encoding learned from the training rows has no column for them, and encoding the two sets separately produces mismatched columns.

**Option 2: Encode inside a pipeline**

- Use OneHotEncoder (or similar) as a pipeline step after splitting.

- This way, the encoder is fit only on the training data.

- When applied to the test data, an unseen category raises an error by default; with handle_unknown="ignore" it is encoded as all zeros instead.

- This avoids leakage and keeps the workflow reproducible.

In [None]:
# Example with ColumnTransformer and Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.pipeline import Pipeline

# Identify column types
num_cols = X.select_dtypes(include=['int64','float64']).columns
cat_cols = X.select_dtypes(exclude=['int64','float64']).columns

# Preprocessor
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), num_cols),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)
    ]
)

# Full pipeline
model = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('regressor', LassoCV(cv=5, random_state=42))
])

model.fit(X_train, y_train)
print("R²:", model.score(X_test, y_test))


---

In [None]:

# 2. Load Data
df = pd.read_csv("your_dataset.csv")

target = "price"   # <-- replace with your target column
X = df.drop(columns=[target])
y = df[target]

# Split before preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: Identify numeric & categorical columns

# Identify column types
num_cols = X_train.select_dtypes(include=['int64','float64']).columns
cat_cols = X_train.select_dtypes(exclude=['int64','float64']).columns

print("Numeric columns:", list(num_cols))
print("Categorical columns:", list(cat_cols))


In [None]:
# Step 3: Preprocessor with ColumnTransformer

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Define preprocessing
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), num_cols),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)
    ]
)


In [None]:
# Step 4: Pipelines for each model

# Linear Regression
lr_pipe = Pipeline([
    ('preprocess', preprocessor),
    ('model', LinearRegression())
])

# LassoCV
lasso_pipe = Pipeline([
    ('preprocess', preprocessor),
    ('model', LassoCV(cv=5, random_state=42))
])

# RidgeCV
ridge_pipe = Pipeline([
    ('preprocess', preprocessor),
    ('model', RidgeCV(alphas=[0.01, 0.1, 1, 10, 100], cv=5))
])


In [None]:
# Step 5: Fit and compare models
models = {
    "Linear": lr_pipe,
    "Lasso": lasso_pipe,
    "Ridge": ridge_pipe
}

results = {}
for name, pipe in models.items():
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    results[name] = {
        "R2": r2_score(y_test, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, y_pred))
    }

results_df = pd.DataFrame(results).T
print(results_df)


In [None]:
# Step 6: Extract coefficients after pipeline

# One subtlety: after encoding, feature names expand. You can recover them:

# Example: get feature names from Lasso
lasso_pipe.fit(X_train, y_train)

ohe = lasso_pipe.named_steps['preprocess'].named_transformers_['cat']
encoded_cat_cols = ohe.get_feature_names_out(cat_cols)
all_features = np.concatenate([num_cols, encoded_cat_cols])

coef = pd.Series(lasso_pipe.named_steps['model'].coef_, index=all_features)

print("Selected features:", list(coef[coef != 0].index))
coef.sort_values().plot(kind="bar", figsize=(12,6))
plt.title("Lasso Coefficients")
plt.show()


✅ Benefits of this version

- Safe: no leakage (encoding/scaling fit only on train).

- Flexible: handles unknown categories in test data.

- Clean: same pipeline can be saved and reused (joblib.dump / joblib.load).

- Transparent: easy to extract feature names and coefficients.

----

## Saving and reusing the pipeline

When we train a pipeline, it’s not just the model (like Ridge or Lasso) that gets fitted — the whole pipeline stores the preprocessing steps too (imputers, encoders, scalers, feature selectors, etc.).

That’s powerful, because:

- You don’t need to repeat imputation, one-hot encoding, scaling, or feature selection by hand later.

- The pipeline ensures that new/unseen data goes through exactly the same transformations as the training data.

- You can save the whole pipeline to disk with joblib.dump and reload it for predictions with joblib.load.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
import joblib

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocessor
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)

# Full pipeline
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", Ridge(alpha=1.0))
])

# Train
model.fit(X_train, y_train)

# Save pipeline
joblib.dump(model, "ridge_pipeline.pkl")

# Later / elsewhere
loaded_model = joblib.load("ridge_pipeline.pkl")
y_pred = loaded_model.predict(X_test)


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV, LinearRegression, Ridge
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error, r2_score
import joblib

# Example dataset (replace with your own)
X = pd.DataFrame({
    "age": [23, 45, 31, 35, 40, 50, 29, 33],
    "salary": [40000, 80000, 50000, 60000, 70000, 120000, 45000, 55000],
    "city": ["A", "B", "A", "C", "B", "C", "A", "C"]
})
y = [200, 400, 250, 300, 350, 600, 220, 280]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Preprocessing
numeric_features = ["age", "salary"]
categorical_features = ["city"]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)

# Feature selector + model
feature_selector = RFECV(estimator=LinearRegression(), cv=3, scoring="r2")

# Final pipeline
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("feature_selection", feature_selector),
    ("regressor", LassoCV(cv=5, random_state=42))  # Could switch to Ridge, LinearRegression, etc.
])

# Fit pipeline
pipeline.fit(X_train, y_train)

# Evaluate
y_pred = pipeline.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

# Save entire pipeline
joblib.dump(pipeline, "regression_pipeline.pkl")

# Load later
loaded_pipeline = joblib.load("regression_pipeline.pkl")
print("Predictions (loaded):", loaded_pipeline.predict(X_test))


👉 So instead of saving just the model (which would forget how to encode or scale inputs), you save the pipeline, which knows how to:

- Impute missing values
- Encode categories
- Scale numerics
- Apply regression

That makes it clean, consistent, and production-ready.

----

## GridSearchCV vs. RidgeCV / LassoCV

| Feature                  | GridSearchCV                             | RidgeCV / LassoCV                                                    |
|--------------------------|------------------------------------------|----------------------------------------------------------------------|
| **Applies to**           | Any scikit-learn estimator or pipeline   | Only Ridge or Lasso regression                                       |
| **Hyperparameters**      | Any (e.g., alpha, max_iter, solver, …)   | Only alpha (regularization strength)                                 |
| **Cross-validation**     | Yes (user-specified; 5 folds by default) | Yes (built in; LassoCV defaults to 5 folds, RidgeCV to leave-one-out) |
| **Efficiency**           | Brute-force search, can be slow          | Optimized solvers, usually much faster                               |
| **Ease of use**          | Flexible but more boilerplate code       | Very simple, just pass `alphas`                                      |
| **Output**               | Best estimator + CV scores               | Best alpha + fitted model                                            |
| **When to use**          | General hyperparameter tuning            | Regularization tuning (Ridge/Lasso)                                  |
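
The two approaches in the table can be run side by side on the same alpha grid. This is a sketch on synthetic data; with an identical grid, folds, and scoring, both would be expected to settle on the same alpha.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for your own data
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
alphas = [0.01, 0.1, 1, 10, 100]
cv = KFold(5, shuffle=True, random_state=0)

# Specialized estimator: only alpha is searched
ridge_cv = RidgeCV(alphas=alphas, cv=cv, scoring="neg_mean_squared_error").fit(X, y)

# General search: the grid could hold any other Ridge parameter as well
grid = GridSearchCV(Ridge(), {"alpha": alphas}, cv=cv,
                    scoring="neg_mean_squared_error").fit(X, y)

print("RidgeCV alpha:", ridge_cv.alpha_)
print("GridSearchCV alpha:", grid.best_params_["alpha"])
```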
