
# 3.4 Regularization-based Feature Importance
-- Penalty-based shrinkage and sparsity methods

- Lasso Regression (L1 penalty)
- Ridge Regression (L2 - not for selection but for shrinkage)
- Elastic Net
- Group Lasso (if features are grouped)
- LARS (Least Angle Regression)
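For reference, the penalized objectives behind the methods listed above can be written as follows. The Lasso, Ridge and Elastic Net lines follow scikit-learn's conventions ($n$ samples, $\rho$ = `l1_ratio`); the Group Lasso line is the standard textbook form (with $w_g$ the coefficients of group $g$ and $p_g$ its size), and the `group_lasso` package may scale its terms differently:

$$
\begin{aligned}
\text{Lasso:} \quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Ridge:} \quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Elastic Net:} \quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\rho \lVert w \rVert_1 + \frac{\alpha(1-\rho)}{2} \lVert w \rVert_2^2 \\
\text{Group Lasso:} \quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert w_g \rVert_2
\end{aligned}
$$

The L1 norm and the unsquared group norm are non-differentiable at zero, which is what lets single coefficients (or whole groups) land exactly on zero; the squared L2 norm is smooth, so it shrinks coefficients without zeroing them.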

--------

## 3.4.1 Lasso Regression (L1 penalty)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Lasso Regression (L1) for Feature Selection } </h2>
</summary>
<h3> What is Lasso Regression? </h3>
<p> Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression technique with L1 regularization: it adds a penalty proportional to the sum of the absolute values of the coefficients to the least-squares loss.</p>
<h3> Its role in Feature Importance / Selection: </h3>
<ul>
    <li> The L1 penalty forces some coefficients to exactly zero, so fitting the model also performs feature selection.</li>
    <li> Effective when there are many irrelevant features.</li>
    <li> Helps reduce overfitting and makes models easier to interpret, since only the features with non-zero coefficients remain.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=NGf0voTMlcs" target="_blank">Lasso Regression Explained</a></li>
</ol>
</details>
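A minimal sketch of the selection effect, assuming synthetic data in which only the first three of ten features drive the target (the data, coefficient values and `alpha=0.2` are illustrative choices, not taken from the text above). Lasso should keep those three features and set the remaining coefficients exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 samples, 10 standard-normal features,
# only features 0, 1 and 2 influence y.
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.standard_normal(200)

lasso = Lasso(alpha=0.2).fit(X, y)

# Indices of features whose coefficient survived the L1 penalty
selected = np.flatnonzero(lasso.coef_)
print("selected features:", selected)
print("coefficients:", np.round(lasso.coef_, 3))
```

The irrelevant coefficients are exact zeros (not merely small), because coordinate descent applies soft-thresholding at each update.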

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (array-like)
- alpha: Regularization strength (higher = more shrinkage, default=0.01)
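- scale_data: Whether to standardize features before fitting (recommended, since the penalty depends on feature scale)
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature importances
- plot_size: Tuple indicating plot size (width, height)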

##### Returns:

- Fitted Lasso model (for inspection or reuse)
- DataFrame of selected features (non-zero coefficients) with their corresponding coefficients, sorted by absolute value
- Displays a plot of feature importance (if show_plot=True)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

def lasso_feature_importance(X,
                              y,
                              alpha=0.01,
                              scale_data=True,
                              random_state=42,
                              show_plot=True,
                              plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if scale_data:
        scaler = StandardScaler()
        X_processed = scaler.fit_transform(X_processed)
    else:
        X_processed = X_processed.values

    # Fit Lasso model
    lasso = Lasso(alpha=alpha, random_state=random_state, max_iter=10000)
    lasso.fit(X_processed, y)

    coefs = lasso.coef_

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Coefficient': coefs,
        'Absolute_Coefficient': np.abs(coefs)
    }).sort_values('Absolute_Coefficient', ascending=False)

    selected_features_df = importance_df[importance_df['Absolute_Coefficient'] > 0].copy()

    if show_plot and not selected_features_df.empty:
        plt.figure(figsize=plot_size)
        selected_features_sorted = selected_features_df.sort_values('Absolute_Coefficient', ascending=True)

        colors = plt.cm.inferno(np.linspace(0.2, 1, len(selected_features_sorted)))
        # Cast names to str so integer column names are drawn as categorical labels
        bars = plt.barh(selected_features_sorted['Feature'].astype(str),
                        selected_features_sorted['Absolute_Coefficient'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Lasso Coefficient', fontsize=12)
        plt.title('LASSO Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: LASSO (L1) Regression\n"
            f"Alpha: {alpha}\n"
            f"Scale Data: {scale_data}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return lasso, selected_features_df
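
The function above leaves `alpha` to the caller. One common way to choose it, sketched here as an assumption rather than part of the original workflow, is scikit-learn's `LassoCV` placed after `StandardScaler`, so the chosen `alpha` and coefficients refer to standardized features as in `lasso_feature_importance`. Note that `LassoCV` runs its cross-validation on the already-scaled data; re-fitting the scaler inside each fold would require something like `GridSearchCV` over the whole pipeline:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Illustrative data (assumption): features 0-2 are informative, 3-7 are noise
X = rng.standard_normal((300, 8))
y = 2.5 * X[:, 0] + 1.0 * X[:, 1] - 1.5 * X[:, 2] + 0.5 * rng.standard_normal(300)

# 5-fold CV over scikit-learn's default grid of 100 alphas
pipe = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
pipe.fit(X, y)

cv_model = pipe.named_steps["lassocv"]
print("chosen alpha:", cv_model.alpha_)
print("non-zero features:", np.flatnonzero(cv_model.coef_))
```

The selected `alpha_` could then be passed to `lasso_feature_importance`; cross-validation optimizes prediction error, so it may keep a few weak features that a larger `alpha` would drop.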

----------

## 3.4.2 Ridge Regression (L2 - not for selection but for shrinkage)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Ridge Regression (L2 Penalty) } </h2>
</summary>
<h3> What is Ridge Regression? </h3>
<p> Ridge regression is a linear model that adds an L2 penalty (proportional to the sum of squared coefficients) to the least-squares loss to reduce overfitting.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Does not perform feature selection: it shrinks all coefficients toward zero but does not set them exactly to zero.</li>
    <li> Useful when features are multicollinear, because the penalty stabilizes the coefficient estimates and improves generalization.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=Q81RR3yKn30" target="_blank">Ridge Regression (StatQuest)</a></li>
</ol>
</details>
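Because the L2 penalty is smooth, Ridge has a closed-form solution, $w = (X^\top X + \alpha I)^{-1} X^\top y$, for scikit-learn's objective without an intercept. A minimal check, assuming random illustrative data and `fit_intercept=False` so the formula applies directly:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(100)
alpha = 10.0

# Closed-form Ridge solution: solve (X^T X + alpha I) w = X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2 (no 1/(2n) factor)
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print(np.round(w_closed, 4))
print(np.round(w_sklearn, 4))
```

The added $\alpha I$ also keeps $X^\top X + \alpha I$ invertible when features are collinear, which is why Ridge copes with multicollinearity.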

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- alpha: Regularization strength (default=1.0; higher = more shrinkage)
- scale_data: Whether to standardize features before fitting
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Fitted Ridge model (for inspection or reuse)
- DataFrame of all features with their corresponding coefficients (none are dropped, since Ridge does not zero coefficients out)
- Displays a plot of feature importance (if show_plot=True)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

def ridge_feature_importance(X,
                              y,
                              alpha=1.0,
                              scale_data=True,
                              random_state=42,
                              show_plot=True,
                              plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if scale_data:
        scaler = StandardScaler()
        X_processed = scaler.fit_transform(X_processed)
    else:
        X_processed = X_processed.values

    # Fit Ridge model
    ridge = Ridge(alpha=alpha, random_state=random_state, max_iter=10000)
    ridge.fit(X_processed, y)

    coefs = ridge.coef_

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Coefficient': coefs,
        'Absolute_Coefficient': np.abs(coefs)
    }).sort_values('Absolute_Coefficient', ascending=False)

    if show_plot and not importance_df.empty:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Absolute_Coefficient', ascending=True)

        colors = plt.cm.cividis(np.linspace(0.2, 1, len(importance_df_sorted)))
        # Cast names to str so integer column names are drawn as categorical labels
        bars = plt.barh(importance_df_sorted['Feature'].astype(str),
                        importance_df_sorted['Absolute_Coefficient'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Ridge Coefficient', fontsize=12)
        plt.title('Ridge Feature Importance (Shrinkage View)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Ridge (L2) Regression\n"
            f"Alpha: {alpha}\n"
            f"Scale Data: {scale_data}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return ridge, importance_df
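
To see shrinkage without selection, one can refit Ridge over a grid of `alpha` values (illustrative data and grid, assumed here) and track the coefficient norm and the number of exact zeros. The norm should fall steadily as `alpha` grows, while no coefficient reaches exactly zero, even for features with no true effect:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 6))
# Features 3 and 5 have no true effect (assumption), yet Ridge will not zero them out
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5, 0.0]) + 0.2 * rng.standard_normal(150)

alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
norms, n_zeros = [], []
for a in alphas:
    coef = Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))          # overall size of the coefficient vector
    n_zeros.append(int(np.sum(coef == 0.0)))    # count of exact zeros

for a, nrm, nz in zip(alphas, norms, n_zeros):
    print(f"alpha={a:>7}: ||w||_2={nrm:.4f}, exact zeros={nz}")
```

This is why the Ridge importance table keeps every feature: the ranking reflects relative coefficient size, not a keep/drop decision.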

----------

## 3.4.3 Elastic Net

<details>
<summary style="cursor: pointer">
<h2> { Understanding Elastic Net Regression } </h2>
</summary>
<h3> What is Elastic Net? </h3>
<p> Elastic Net is a linear regression model that combines the L1 and L2 penalties, so it performs variable selection and shrinkage at the same time.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
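    <li> Overcomes a limitation of Lasso with correlated features: Lasso tends to keep one feature from a correlated group and drop the rest, while the L2 part of Elastic Net spreads weight across the group.</li>
    <li> Tends to select groups of related variables together, which can improve prediction accuracy.</li>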
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=2IAfQdAPdLY" target="_blank">Elastic Net Explanation</a></li>
</ol>
</details>
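A small sketch of the correlated-feature behaviour, assuming the extreme case of two identical columns (illustrative data and penalty values). Because the L2 part makes the Elastic Net objective strictly convex, its unique solution must split the weight equally between identical columns. Lasso's objective has infinitely many equally good splits (any split with the same total), and scikit-learn's coordinate descent typically returns a lopsided one:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
X = np.column_stack([x, x])               # two perfectly correlated (duplicate) features
y = 2.0 * x + 0.1 * rng.standard_normal(200)

# Tight tolerance so the reported coefficients are close to the exact optimum
enet = ElasticNet(alpha=0.5, l1_ratio=0.5, tol=1e-8, max_iter=100_000).fit(X, y)
lasso = Lasso(alpha=0.5, tol=1e-8, max_iter=100_000).fit(X, y)

print("Elastic Net coefficients:", np.round(enet.coef_, 4))  # equal split expected
print("Lasso coefficients:      ", np.round(lasso.coef_, 4))  # which split is returned is solver-dependent
```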

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (array-like)
- alpha: Overall regularization strength (default=1.0)
- l1_ratio: Mix ratio (0 = pure Ridge, 1 = pure Lasso; default=0.5)
- scale_data: Whether to standardize features before fitting
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:

- Fitted ElasticNet model (for inspection or reuse)
- DataFrame of selected features (non-zero coefficients) with their corresponding coefficients
- Displays a plot of feature importance (if show_plot=True)

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet

def elasticnet_feature_importance(X,
                                   y,
                                   alpha=1.0,
                                   l1_ratio=0.5,
                                   scale_data=True,
                                   random_state=42,
                                   show_plot=True,
                                   plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if scale_data:
        scaler = StandardScaler()
        X_processed = scaler.fit_transform(X_processed)
    else:
        X_processed = X_processed.values

    # Fit ElasticNet model
    elastic = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=random_state, max_iter=10000)
    elastic.fit(X_processed, y)

    coefs = elastic.coef_

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Coefficient': coefs,
        'Absolute_Coefficient': np.abs(coefs)
    }).sort_values('Absolute_Coefficient', ascending=False)

    selected_features_df = importance_df[importance_df['Absolute_Coefficient'] > 0].copy()

    if show_plot and not selected_features_df.empty:
        plt.figure(figsize=plot_size)
        selected_features_sorted = selected_features_df.sort_values('Absolute_Coefficient', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(selected_features_sorted)))
        # Cast names to str so integer column names are drawn as categorical labels
        bars = plt.barh(selected_features_sorted['Feature'].astype(str),
                        selected_features_sorted['Absolute_Coefficient'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute ElasticNet Coefficient', fontsize=12)
        plt.title('ElasticNet Feature Importance (Shrinkage + Selection)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: ElasticNet Regression\n"
            f"Alpha: {alpha} | L1 Ratio: {l1_ratio}\n"
            f"Scale Data: {scale_data}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return elastic, selected_features_df
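
A quick sanity check of the `l1_ratio` convention in the parameter list, on assumed random data: with `l1_ratio=1.0`, `ElasticNet` optimizes the same objective as `Lasso`, so both should return the same coefficients (up to solver tolerance):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 6))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.7, 0.0]) + 0.3 * rng.standard_normal(120)

# Tight tolerance so both coordinate-descent runs converge to the same point
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0, tol=1e-10, max_iter=100_000).fit(X, y)
lasso = Lasso(alpha=0.1, tol=1e-10, max_iter=100_000).fit(X, y)

print(np.round(enet_l1.coef_, 5))
print(np.round(lasso.coef_, 5))
```

The other end of the range is less convenient in practice: scikit-learn's documentation notes that `ElasticNet` is not reliable for `l1_ratio` values very close to 0, where `Ridge` is the better tool.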

--------

## 3.4.4 Group Lasso (if features are grouped)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Group Lasso } </h2>
</summary>
<h3> What is Group Lasso? </h3>
<p> Group Lasso extends Lasso by allowing selection of entire groups of features, rather than individual ones.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Useful when features are naturally grouped (e.g., the dummy columns of one categorical variable, polynomial terms of one variable, or embedding dimensions).</li>
    <li> Selects or discards entire groups, promoting interpretability.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://statisticseasily.com/glossario/what-is-group-lasso-a-comprehensive-guide/" target="_blank">Group Lasso Article</a></li>
</ol>
</details>
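Why whole groups vanish can be seen from the proximal step of the group penalty, block soft-thresholding: a group's coefficient vector is set entirely to zero when its L2 norm is at or below the threshold, and otherwise shrunk toward zero as a unit. The helper below is a hypothetical illustration of that operator, not code from the `group_lasso` package:

```python
import numpy as np

def group_soft_threshold(w_group, threshold):
    """Block soft-thresholding, the prox of threshold * ||w_group||_2 (hypothetical helper)."""
    norm = np.linalg.norm(w_group)
    if norm <= threshold:
        return np.zeros_like(w_group)          # the entire group is switched off
    return (1.0 - threshold / norm) * w_group  # the group is shrunk as a whole

weak_group = np.array([0.3, -0.2, 0.1])   # L2 norm is about 0.374
strong_group = np.array([3.0, 4.0])       # L2 norm is 5.0

print(group_soft_threshold(weak_group, 0.5))    # below the threshold: whole group zeroed
print(group_soft_threshold(strong_group, 0.5))  # scaled by 1 - 0.5 / 5 = 0.9
```

Unlike the element-wise soft-thresholding behind Lasso, the decision here depends on the group's joint norm, so individual members of a surviving group are generally not zeroed (unless an extra L1 term is added, as in the sparse group lasso).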

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
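- groups: List/array defining group memberships (same length as number of features); use -1 for features that should be left unregularized, which are excluded from the group importances
- group_reg: Regularization strength for group penalty (default=0.05)
- l1_reg: Regularization strength for individual features inside groups (default=0.01; a non-zero value makes this the sparse group lasso)
- scale_data: Whether to standardize features before fitting
- max_iter: Maximum number of iterations
- random_state: Random seed for reproducibility
- show_plot: Whether to plot group importances
- plot_size: Tuple indicating plot size (width, height)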

##### Returns:

- Fitted GroupLasso model (for inspection or reuse)
- DataFrame of group-level importances (the L2 norm of each group's coefficients)
- Displays a plot of group importance (if show_plot=True)

In [5]:
!pip install group_lasso


In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from group_lasso import GroupLasso

def group_lasso_feature_importance(X,
                                    y,
                                    groups,
                                    group_reg=0.05,
                                    l1_reg=0.01,
                                    scale_data=True,
                                    random_state=42,
                                    max_iter=1000,
                                    show_plot=True,
                                    plot_size=(12, 8)):
    

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if scale_data:
        scaler = StandardScaler()
        X_processed = scaler.fit_transform(X_processed)
    else:
        X_processed = X_processed.values

    groups_array = np.array(groups)

    # Fit Group Lasso
    model = GroupLasso(groups=groups_array,
                       group_reg=group_reg,
                       l1_reg=l1_reg,
                       n_iter=max_iter,
                       scale_reg='group_size',
                       random_state=random_state,
                       supress_warning=True)
    model.fit(X_processed, y)

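    # group_lasso may store coef_ as 2-D (n_features, n_targets); the default
    # norm below handles both 1-D and 2-D coefficient arrays
    coefs = model.coef_

    group_importances = {}
    for group_id in np.unique(groups_array):
        if group_id < 0:
            continue  # Negative group ids mark unregularized features; skip them
        group_indices = np.where(groups_array == group_id)[0]
        # Euclidean (Frobenius for 2-D) norm over all coefficients in the group
        group_coef_norm = np.linalg.norm(coefs[group_indices])
        group_importances[group_id] = group_coef_norm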

    importance_df = pd.DataFrame({
        'Group': list(group_importances.keys()),
        'Importance': list(group_importances.values())
    }).sort_values('Importance', ascending=False)

    if show_plot and not importance_df.empty:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.inferno(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Group'].astype(str),
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Group Coefficient L2 Norm', fontsize=12)
        plt.title('Group Lasso Feature Group Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Group Lasso\n"
            f"Group Reg: {group_reg} | L1 Reg: {l1_reg}\n"
            f"Scale Data: {scale_data}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return model, importance_df
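
The `groups` argument has to be built by hand. When categorical variables were one-hot encoded with `pd.get_dummies` (default `prefix_sep='_'`), one possible convention, assumed here, is to treat everything before the first underscore as the group key. The helper `groups_from_prefixes` is hypothetical and will mis-group original column names that themselves contain underscores:

```python
import numpy as np
import pandas as pd

def groups_from_prefixes(columns, sep="_"):
    """Map each column to an integer group id based on its name prefix (hypothetical helper)."""
    group_ids = {}
    groups = []
    for col in columns:
        key = str(col).split(sep, 1)[0]
        if key not in group_ids:
            group_ids[key] = len(group_ids)   # new id, in order of first appearance
        groups.append(group_ids[key])
    return np.array(groups)

raw = pd.DataFrame({"age": [25, 32, 47],
                    "color": ["red", "blue", "red"],
                    "size": ["S", "L", "M"]})
encoded = pd.get_dummies(raw, columns=["color", "size"])
groups = groups_from_prefixes(encoded.columns)

print(list(encoded.columns))
print(groups)
```

The resulting array (one id per column of `encoded`) has the shape expected by the `groups` parameter of `group_lasso_feature_importance`, so each categorical variable is kept or dropped as a whole.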

--------

## 3.4.5 LARS (Least Angle Regression)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Least Angle Regression (LARS) } </h2>
</summary>
<h3> What is LARS? </h3>
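<p> LARS is a forward stepwise regression algorithm: at each step it moves the coefficients of the active features in a direction equiangular to their correlations with the residual, and adds a new feature once that feature is as correlated with the residual as the active ones. It is computationally efficient when the number of features is much larger than the number of samples.</p>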
<h3> Its role in Feature Selection: </h3>
<ul>
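    <li> Adds features one at a time in order of their correlation with the current residual, which gives a natural ranking of the most influential features.</li>
    <li> Closely related to Lasso: with a small modification (scikit-learn's <code>LassoLars</code>), it computes the exact Lasso solution path.</li>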
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/linear_model.html#least-angle-regression" target="_blank">scikit-learn LARS Docs</a></li>
</ol>
</details>
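The stepwise behaviour can be inspected with scikit-learn's `lars_path`, whose `active` output lists the selected feature indices; in plain LARS mode no feature is ever dropped, so the list reflects the order in which features entered. A minimal sketch on assumed synthetic data where only features 0, 1 and 2 matter:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.standard_normal(200)

# method="lar" runs plain LARS (no Lasso modification)
alphas, active, coefs = lars_path(X, y, method="lar")

print("entry order:", active)
print("coefficients at the end of the path:", np.round(coefs[:, -1], 3))
```

The informative features should enter first; the noise features only join late in the path, when the remaining correlations are small.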

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- normalize: Whether to standardize features before fitting
- n_nonzero_coefs: Target number of non-zero coefficients (optional, controls sparsity)
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Fitted LARS model
- DataFrame of feature importances (absolute coefficient values)
- Displays a plot of feature importances if show_plot=True

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lars

def lars_feature_importance(X,
                             y,
                             normalize=True,
                             n_nonzero_coefs=None,
                             random_state=42,
                             show_plot=True,
                             plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if normalize:
        scaler = StandardScaler()
        X_processed = scaler.fit_transform(X_processed)
    else:
        X_processed = X_processed.values

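    # Fit LARS; Lars does not accept n_nonzero_coefs=None, so only pass it when
    # given (otherwise scikit-learn's default of 500 applies)
    lars_kwargs = {} if n_nonzero_coefs is None else {'n_nonzero_coefs': n_nonzero_coefs}
    model = Lars(**lars_kwargs)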
    model.fit(X_processed, y)

    coefs = model.coef_

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': np.abs(coefs)
    }).sort_values('Importance', ascending=False)

    if show_plot and not importance_df.empty:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        # Cast names to str so integer column names are drawn as categorical labels
        bars = plt.barh(importance_df_sorted['Feature'].astype(str),
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Coefficient Value', fontsize=12)
        plt.title('LARS Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: LARS Regression\n"
            f"Normalize: {normalize}\n"
            f"Target Non-Zero Coefs: {n_nonzero_coefs}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return model, importance_df

------------