--> To put content in a collapsible format, we can use the following snippet in Markdown cells:

<details>
<summary style="cursor: pointer">
<b> double click this markdown to see the code instead of this </b>
</summary>
    <p> Collapsible content is marked with {} </p>
</details>

--------

# 3.1 Model-based Feature Importance
Importance scores derived from model internals (weights, impurity, gain, etc.):
- Tree-based models (Decision Tree / Random Forest / Extra Trees importance)
- XGBoost/LightGBM/CatBoost feature importance
- Logistic regression (coefficients)
- SVM (Linear)

---------

## 3.1.1 Tree-based models

#### Best used when:

- Your data has nonlinear relationships and interactions between variables.
- You have mixed data types (categorical + numerical).
- You're looking for quick, intuitive feature rankings.
- Caveat: not ideal for high-cardinality categorical features without proper encoding (e.g., too many labels in a single feature).
- Caveat: impurity-based scores may be biased toward features with more levels or higher variance.

#### Use for:

- Feature ranking.
- Pre-filtering before final model training.
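
The pre-filtering step could look roughly like the sketch below, which uses scikit-learn's `SelectFromModel`. The synthetic dataset, the `threshold='median'` cutoff, and the variable names are illustrative assumptions rather than part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption): 20 features, only 5 of them informative
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Keep features whose impurity-based importance is at or above the median
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold='median'
)
X_reduced = selector.fit_transform(X, y)

print(f"Kept {X_reduced.shape[1]} of {X.shape[1]} features")
```

The reduced matrix `X_reduced` can then be passed to whichever final model is being trained; the cutoff is a tuning choice and should be validated on held-out data.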

---------

### 3.1.1.1 Decision Trees

<details>
<summary style="cursor: pointer">
<h2> { Understanding Decision Trees and their role in Feature Importance } </h2>
    </summary>
    <h3> What are Decision Trees? </h3>
    <p> Decision trees are a type of supervised learning algorithm used for both classification and regression tasks. They work by recursively splitting the dataset into subsets based on attribute values.</p>
    <h3> Their role in Feature Importance:</h3>
    <p> Decision trees evaluate feature importance by assessing how much each feature contributes to the reduction of impurity or disorder in the dataset.</p>
    <ul>
        <li> This is often measured using metrics like Gini impurity or entropy </li>
        <li> Feature importance scores can be derived from the decision tree by summing up the weighted impurity reductions for each feature across all the nodes where that feature is used. </li>
        <li> They choose the most informative features for splitting.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://www.youtube.com/watch?v=7VeUPuFGJHk" target="_blank">Decision Trees Explained (StatQuest)</a></li>
        <li><a href="https://www.youtube.com/watch?v=ZVR2Way4nwQ" target="_blank">Feature Importance in Decision Trees (StatQuest)</a></li>
    </ol>
</details>
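
The "sum of weighted impurity reductions" described above can be reproduced by hand from a fitted scikit-learn tree. The sketch below is a minimal illustration, assuming the Iris dataset and a shallow tree; it walks the nodes of `clf.tree_` and compares the result with `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_

manual = np.zeros(X.shape[1])
n = tree.weighted_n_node_samples
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity reduction
        continue
    # Impurity reduction of this split, weighted by the samples reaching each node
    decrease = (n[node] * tree.impurity[node]
                - n[left] * tree.impurity[left]
                - n[right] * tree.impurity[right])
    manual[tree.feature[node]] += decrease

# Normalize so the scores sum to 1, as scikit-learn does
manual /= manual.sum()

print(np.round(manual, 4))
print(np.round(clf.feature_importances_, 4))
```

Both printed arrays should agree, which confirms that the built-in score is simply the normalized total impurity reduction attributed to each feature.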

#### Decision Tree-specific feature importance with unique tree parameters.
    
##### Parameters:
    
- max_depth: Maximum tree depth (None for unlimited)
- min_samples_split: Minimum samples required to split a node
- min_samples_leaf: Minimum samples required at each leaf node
- criterion: 'gini' or 'entropy' (classification); 'squared_error' or 'friedman_mse' (regression; 'mse' was the name used in older scikit-learn releases)
- plot_tree_visualization: Whether to plot the full decision tree
    
##### Returns:

- DataFrame with feature importances
- Optionally displays the decision tree structure

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.tree import plot_tree  # For tree visualization

def decision_tree_feature_importance(x, y, task_type='classification',
                                    max_depth=None, min_samples_split=2,
                                    min_samples_leaf=1, criterion='gini',
                                    random_state=None, figsize=(15, 8),
                                    plot_tree_visualization=False):
    
    # Convert x to DataFrame if it isn't already
    if not isinstance(x, pd.DataFrame):
        x = pd.DataFrame(x)
    
    # Initialize model with decision tree-specific parameters
    if task_type == 'classification':
        model = DecisionTreeClassifier(
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            criterion=criterion,
            random_state=random_state
        )
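    elif task_type == 'regression':
        # 'gini' and 'entropy' are classification-only criteria, so fall back
        # to squared error when one of them is passed for a regression task
        reg_criterion = criterion if criterion not in ('gini', 'entropy', 'log_loss') else 'squared_error'
        model = DecisionTreeRegressor(
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            criterion=reg_criterion,
            random_state=random_state
        )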
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")
    
    # Fit model
    model.fit(x, y)
    
    # Get feature importance
    importance = model.feature_importances_
    feature_names = x.columns
    
    # Create importance DataFrame
    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importance
    }).sort_values('Importance', ascending=False)
    
    # Plot feature importance
    plt.figure(figsize=(figsize[0], figsize[1]//2))
    bars = plt.barh(importance_df['Feature'], importance_df['Importance'], color='darkcyan')
    plt.xlabel('Mean Decrease in Impurity')
    plt.title('Decision Tree Feature Importance')
    plt.gca().invert_yaxis()
    
    # Add importance values on bars (`bars` holds the rectangle object for each bar)
    for bar in bars:
        # For horizontal bars, the width is the x-axis value, i.e. the importance score
        width = bar.get_width()
        # x: just to the right of the bar end
        # y: vertical middle of the bar (bar.get_y() is its bottom edge, so add half the height)
        plt.text(width + 0.01, bar.get_y() + bar.get_height()/2,
                f'{width:.4f}',  # show the score with 4 decimal places
                va='center')     # keep the text vertically centered on the bar
    
    # plt.savefig("plot.png", bbox_inches="tight")  # Alternative for saving
    plt.tight_layout()
    plt.show()
    
    # Optional tree visualization
    if plot_tree_visualization:
        plt.figure(figsize=figsize)
        plot_tree(model, 
                 feature_names=[str(name) for name in feature_names],
                 filled=True, 
                 rounded=True,
                 proportion=True,
                 impurity=True,
                 # plot_tree expects class names as strings
                 class_names=[str(c) for c in model.classes_] if task_type == 'classification' else None)
        plt.title('Decision Tree Structure')
        plt.show()
    
    return importance_df

### 3.1.1.2 Random Forests

<details>
<summary style="cursor: pointer">
<h2> { Understanding Random Forests and their role in Feature Importance } </h2>
</summary>
    <h3> What are Random Forests? </h3>
    <p> Random Forests are a supervised learning algorithm used for both classification and regression tasks. A Random Forest trains many decision trees, each on a bootstrap sample of the data and with a random subset of features considered at each split. It then aggregates the trees' predictions (majority vote for classification, averaging for regression) to produce the final prediction.</p>
    <h3> Their role in Feature Importance:</h3>
    <p>Random Forests excel at feature selection by providing robust importance rankings that can help identify the most relevant variables in a dataset.</p>
    <ul>
        <li>They calculate feature importance by averaging the importance scores across all trees in the forest, making the rankings more stable than single decision trees.</li>
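        <li>Random Forests can also use Out-of-Bag (OOB) samples to compute permutation importance, which measures how much prediction error increases when a feature is randomly shuffled. Note that scikit-learn's <code>feature_importances_</code> attribute, used in the code below, reports impurity-based (MDI) importance instead.</li>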
        <li>For a dataset with many features, Random Forests can identify the top variables that contribute most to predictive power, allowing you to reduce dimensionality while preserving model performance.</li>
        <li>They naturally handle feature interactions, making them effective at identifying variables that may not be important individually but are valuable in combination with others.</li>
        <li>When working with high-dimensional data, you can use Random Forest importance scores to create a reduced feature set before training your final model.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://www.youtube.com/watch?v=J4Wdy0Wc_xQ" target="_blank">Random Forest Explained (StatQuest)</a></li>
        <li><a href="https://www.youtube.com/watch?v=XLFeN8GfD74" target="_blank">Feature Importance in Random Forests (StatQuest)</a></li>
        <li><a href="https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html" target="_blank">Scikit-learn: Feature Importance using Random Forest</a></li>
        <li><a href="https://towardsdatascience.com/random-forest-feature-importance-how-to-interpret-it-and-why-it-matters-cc6c2f4a9f42" target="_blank">Random Forest Feature Importance – How to Interpret It (TDS article)</a></li>
        <li><a href="https://explained.ai/rf-importance/index.html" target="_blank">Explained.ai – Visual Introduction to Random Forest Feature Importance</a></li>
    </ol>
</details>
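
To make the difference between impurity-based and permutation importance concrete, here is a minimal sketch using scikit-learn's `permutation_importance`. It is an illustration under assumptions: the breast cancer dataset, a held-out test split (rather than OOB samples), and `n_repeats=5` are choices made for this example only.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Shuffle each feature on held-out data and record the drop in accuracy
result = permutation_importance(rf, X_test, y_test,
                                n_repeats=5, random_state=0)

comparison = pd.DataFrame({
    'Feature': X.columns,
    'MDI': rf.feature_importances_,          # impurity-based, from training
    'Permutation': result.importances_mean   # score drop on held-out data
}).sort_values('Permutation', ascending=False)

print(comparison.head(10))
```

When the two columns disagree strongly for a feature, that is often a hint that the impurity-based score is inflated (for example by high cardinality) or that the feature is correlated with others.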

#### Random Forest-specific feature importance with key RF hyperparameters.
    
##### Parameters:

- n_estimators: Number of trees in the forest
- max_depth: Maximum tree depth
- min_samples_split: Minimum samples to split a node
- min_samples_leaf: Minimum samples at leaf nodes
- max_features: Features considered at each split ('sqrt', 'log2', None, or int/float; 'auto' was removed in scikit-learn 1.3)
- bootstrap: Whether bootstrap samples are used

##### Returns:
- DataFrame with feature importances
- Displays a styled importance plot (if show_plot=True)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

def random_forest_feature_importance(X, y, task_type='classification',
                                    n_estimators=100, max_depth=None,
                                    min_samples_split=2, min_samples_leaf=1,
                                    max_features='sqrt', bootstrap=True,  # 'auto' was removed in scikit-learn 1.3
                                    random_state=42, top_n_features=20,
                                    show_plot=True, plot_size=(12, 8)):
    
    # Input validation
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    # Train-test split (the model and its importances use the training portion only)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)
    
    # Initialize Random Forest with RF-specific parameters
    if task_type == 'classification':
        model = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            max_features=max_features,
            bootstrap=bootstrap,
            random_state=random_state,
            n_jobs=-1  # Use all cores
        )
    elif task_type == 'regression':
        model = RandomForestRegressor(
            n_estimators=n_estimators,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            max_features=max_features,
            bootstrap=bootstrap,
            random_state=random_state,
            n_jobs=-1
        )
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")
    
    # Train model
    model.fit(X_train, y_train)
    
    # Get importance scores
    importances = model.feature_importances_
    feature_names = X.columns
    std = np.std([tree.feature_importances_ for tree in model.estimators_], axis=0)
    
    # Create importance DataFrame
    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importances,
        'Std_Deviation': std
    }).sort_values('Importance', ascending=False)
    
    # Select top N features
    if top_n_features:
        importance_df = importance_df.head(top_n_features)
    
    # Enhanced visualization
    if show_plot:
        plt.figure(figsize=plot_size)
        # Sort a copy for plotting so the returned DataFrame keeps its descending order
        plot_df = importance_df.sort_values('Importance', ascending=True)
        
        # Create colored bars (gradient from low to high importance)
        colors = plt.cm.viridis(np.linspace(0.2, 1, len(plot_df)))
        
        bars = plt.barh(plot_df['Feature'], 
                       plot_df['Importance'], 
                       color=colors,
                       xerr=plot_df['Std_Deviation'],
                       capsize=3)
        
        # Style the plot
        plt.xlabel('Mean Decrease in Impurity (MDI)', fontsize=12)
        plt.title('Random Forest Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', alpha=0.3)
        
        # Add importance values on bars
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005, 
                    bar.get_y() + bar.get_height()/2, 
                    f'{width:.4f}',
                    va='center',
                    fontsize=10)
        
        # Add RF hyperparameters as annotation
        params_text = (
            f"RF Hyperparameters:\n"
            f"n_estimators={n_estimators}, max_features={max_features}\n"
            f"max_depth={max_depth}, min_samples_split={min_samples_split}"
        )
        plt.annotate(params_text,
                    xy=(0.98, 0.02),
                    xycoords='axes fraction',
                    ha='right',
                    va='bottom',
                    bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()
    
    return importance_df

### 3.1.1.3 Extra Trees (Extremely Randomized Trees)

<details>
<summary style="cursor: pointer">
<h2>{ Understanding Extra Trees and their role in Feature Importance }</h2>
</summary>
    <h3>What are Extra Trees?</h3>
    <p>Extra Trees (Extremely Randomized Trees) are an ensemble learning method that builds upon the Random Forest concept but introduces additional randomization. Unlike Random Forests, Extra Trees randomly select splitting points for each feature rather than searching for the optimal split, and they often use the entire original dataset instead of bootstrap samples.</p>
    <h3>Its role in Feature Importance:</h3>
    <p>Extra Trees provide a unique approach to feature selection by introducing more randomization, which can help identify consistently important features in noisy datasets.</p>
    <ul>
        <li>They calculate feature importance through Mean Decrease in Impurity (MDI), averaging importance scores across all trees with less bias towards high-cardinality features compared to standard decision trees.</li>
        <li>Due to their extreme randomization in splitting, features that consistently perform well across many random splits are truly informative, making the importance rankings more robust against overfitting.</li>
        <li>Unlike Random Forests, Extra Trees typically use bootstrap=False by default, which means they use the entire dataset for each tree, potentially capturing more subtle feature relationships.</li>
        <li>They're particularly effective at feature selection when dealing with datasets containing many correlated features, as the randomization helps reduce the preference for any particular feature among correlated groups.</li>
        <li>The standard deviation of feature importance across trees provides valuable insight into the stability of feature rankings, helping identify which features are consistently important across model variations.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://people.montefiore.uliege.be/ernst/uploads/news/id63/extremely-randomized-trees.pdf" target="_blank">Extremely Randomized Trees Paper (PDF)</a></li>
<li><a href="https://www.youtube.com/watch?v=Fh4aqbdfFvs" target="_blank">Extremely Randomized Trees Explained (YouTube)</a></li>
    </ol>
</details>
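
The bootstrap default and the per-tree variability mentioned above can be checked directly. The following is a small sketch on synthetic data (the dataset and the number of trees are assumptions for illustration); it compares the `bootstrap` defaults of the two ensembles and computes the spread of importance scores across the Extra Trees estimators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

et = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Extra Trees train on the full dataset by default; Random Forests bootstrap
print("Extra Trees bootstrap:", et.bootstrap)
print("Random Forest bootstrap:", rf.bootstrap)

# One row of importance scores per tree; the column-wise std shows stability
per_tree = np.array([tree.feature_importances_ for tree in et.estimators_])
et_std = per_tree.std(axis=0)

for i in np.argsort(et.feature_importances_)[::-1][:5]:
    print(f"feature {i}: importance={et.feature_importances_[i]:.4f}, "
          f"std across trees={et_std[i]:.4f}")
```

A feature with a high mean importance and a low standard deviation is ranked consistently by the individual trees, which is the stability signal described in the last bullet.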

#### Extra Trees-specific feature importance with unique ET parameters.
    
##### Parameters:

- n_estimators: Number of trees
- max_depth: Maximum tree depth
- min_samples_split: Minimum samples to split a node
- min_samples_leaf: Minimum samples at leaf nodes
- max_features: Features considered at each split ('sqrt', 'log2', None, or int/float; 'auto' was removed in scikit-learn 1.3)
- bootstrap: Whether bootstrap samples are used (typically False for Extra Trees)

##### Returns:

- DataFrame with feature importances and variability
- Displays a styled importance plot (if show_plot=True)

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier, ExtraTreesRegressor
from sklearn.model_selection import train_test_split

def extra_trees_feature_importance(X, y, task_type='classification',
                                  n_estimators=100, max_depth=None,
                                  min_samples_split=2, min_samples_leaf=1,
                                  max_features='sqrt', bootstrap=False,  # Key difference: often bootstrap=False
                                  random_state=42, top_n_features=None,
                                  show_plot=True, plot_size=(12, 8)):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    # Initialize Extra Trees model with ET-specific defaults
    if task_type == 'classification':
        model = ExtraTreesClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            max_features=max_features,
            bootstrap=bootstrap,  # Often False for Extra Trees
            random_state=random_state,
            n_jobs=-1
        )
    elif task_type == 'regression':
        model = ExtraTreesRegressor(
            n_estimators=n_estimators,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            max_features=max_features,
            bootstrap=bootstrap,
            random_state=random_state,
            n_jobs=-1
        )
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")
    
    # Train model
    model.fit(X, y)
    
    # Get importance scores and variability
    importances = model.feature_importances_
    std = np.std([tree.feature_importances_ for tree in model.estimators_], axis=0)
    
    # Create importance DataFrame
    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': importances,
        'Std_Deviation': std
    }).sort_values('Importance', ascending=False)
    
    # Select top N features
    if top_n_features:
        importance_df = importance_df.head(top_n_features)
    
    # Enhanced visualization
    if show_plot:
        plt.figure(figsize=plot_size)
        # Sort a copy for plotting so the returned DataFrame keeps its descending order
        plot_df = importance_df.sort_values('Importance', ascending=True)
        
        # Create colored bars with error bars
        colors = plt.cm.plasma(np.linspace(0.2, 1, len(plot_df)))
        bars = plt.barh(plot_df['Feature'], 
                       plot_df['Importance'], 
                       color=colors,
                       xerr=plot_df['Std_Deviation'],
                       capsize=4,
                       alpha=0.7)
        
        # Style the plot
        plt.xlabel('Mean Decrease in Impurity (MDI)', fontsize=12)
        plt.title('Extra Trees Feature Importance\n(More Randomized Splits Than Random Forest)', 
                fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.4)
        
        # Add importance values
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005, 
                    bar.get_y() + bar.get_height()/2, 
                    f'{width:.4f}',
                    va='center',
                    fontsize=10)
        
        # Add ET-specific parameters as annotation
        params_text = (
            f"Extra Trees Parameters:\n"
            f"n_estimators={n_estimators}, max_features={max_features}\n"
            f"bootstrap={bootstrap}, min_samples_split={min_samples_split}"
        )
        plt.annotate(params_text,
                   xy=(0.98, 0.02),
                   xycoords='axes fraction',
                   ha='right',
                   va='bottom',
                   bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()
    
    return importance_df

-------------------------------------------------------------------------

## 3.1.2 XGBoost/LightGBM/CatBoost Feature Importance

#### Best used when:

- You want high-performance models with built-in interpretability.
- Your data has missing values, imbalanced classes, or is sparse.
- You want multiple importance views (gain vs. cover vs. permutation).
- Caveat: rankings may be sensitive to model tuning; different hyperparameters may shift feature rankings.
- Caveat: scores reflect predictive contribution only, not causal impact.

#### Use for:

- Feature selection in high-dimensional, nonlinear tasks.

- Detecting redundant or low-contribution variables.
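
The sensitivity of rankings to hyperparameters can be checked with a quick experiment. The sketch below uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost/LightGBM/CatBoost (so it runs without extra installs); the synthetic data and the two `max_depth` settings are assumptions chosen for illustration.

```python
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=15,
                           n_informative=5, random_state=0)

# Same data, two different tree depths
shallow = GradientBoostingClassifier(max_depth=1, n_estimators=50,
                                     random_state=0).fit(X, y)
deep = GradientBoostingClassifier(max_depth=5, n_estimators=50,
                                  random_state=0).fit(X, y)

# Rank correlation between the two importance vectors (1.0 = identical ranking)
rho, _ = spearmanr(shallow.feature_importances_, deep.feature_importances_)
print(f"Spearman rank correlation between rankings: {rho:.3f}")
```

A correlation well below 1 suggests that the ranking depends on the tuning, in which case it is safer to compare importances across several configurations before dropping features.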

---------------

### 3.1.2.1 XGBoost

<details>
<summary style="cursor: pointer">
<h2> { Understanding XGBoost and its Role in Feature Importance } </h2>
</summary>
<div>
    <h3> What is XGBoost? </h3>
    <p> XGBoost (Extreme Gradient Boosting) is an optimized gradient boosting framework that uses decision trees in an ensemble method. It is designed to be highly efficient, flexible, and portable, delivering state-of-the-art results on many machine learning tasks with speed and performance optimizations.</p>
    <h3> Its Role in Feature Importance:</h3>
    <p>XGBoost provides several built-in techniques to assess the importance of features in your data, helping practitioners understand which variables are driving model predictions.</p>
    <ul>
        <li>XGBoost calculates feature importance using three main metrics: weight (number of times a feature is used to split), gain (average loss reduction from the splits that use the feature), and cover (roughly, the average number of observations affected by those splits).</li>
        <li>The <code>plot_importance()</code> function in XGBoost helps visualize the most relevant features ranked by your chosen metric.</li>
        <li>Its built-in scores are split-based (weight, gain, cover); permutation-based importance can be computed on top of a fitted model, for example with scikit-learn's <code>permutation_importance</code>, for deeper insight into feature relevance and stability.</li>
        <li>Regularization in XGBoost helps prevent overfitting, which can in turn lead to more stable feature importance scores than unregularized tree methods.</li>
        <li>In feature selection workflows, XGBoost can serve as a first-pass filter to prune irrelevant or redundant features before model refinement.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://arxiv.org/abs/1706.06060" target="_blank">Consistent Feature Attribution for Tree Ensembles (arXiv)</a></li>
        <li><a href="https://arxiv.org/abs/2105.05328" target="_blank">Comparing Interpretability and Explainability for Feature Selection (arXiv)</a></li>
        <li><a href="https://arxiv.org/abs/1901.08433" target="_blank">A XGBoost Risk Model via Feature Selection and Bayesian Hyper-parameter Optimization (arXiv)</a></li>
        <li><a href="https://www.youtube.com/watch?v=OtD8wVaFm6E" target="_blank">XGBoost Tutorial for Beginners (YouTube)</a></li>
        <li><a href="https://www.youtube.com/watch?v=3CC4N4z3GJc" target="_blank">XGBoost Feature Importance Explained (YouTube)</a></li>
    </ol>
</div>
</details>
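
The relationship between weight, gain, cover, and their `total_` variants can be illustrated without any library. The toy sketch below is an assumption-laden example: the split records are made up, and real XGBoost values come from the trained booster, not from a list like this.

```python
from collections import defaultdict

# Hypothetical split records gathered over all trees: (feature, gain, cover)
splits = [
    ("age", 4.0, 100.0),
    ("age", 2.0, 60.0),
    ("income", 9.0, 80.0),
]

weight = defaultdict(int)         # number of splits using the feature
total_gain = defaultdict(float)   # summed gain over those splits
total_cover = defaultdict(float)  # summed cover over those splits

for feature, split_gain, split_cover in splits:
    weight[feature] += 1
    total_gain[feature] += split_gain
    total_cover[feature] += split_cover

# 'gain' and 'cover' are the per-split averages of the totals
gain = {f: total_gain[f] / weight[f] for f in weight}
cover = {f: total_cover[f] / weight[f] for f in weight}

print(dict(weight))  # {'age': 2, 'income': 1}
print(gain)          # {'age': 3.0, 'income': 9.0}
print(cover)         # {'age': 80.0, 'income': 80.0}
```

Here `age` ranks first by weight while `income` ranks first by gain, which is why it is worth inspecting more than one importance type before drawing conclusions.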

#### XGBoost-specific feature importance with multiple importance types.
    
##### Parameters:
    
- n_estimators: Number of boosting rounds
- learning_rate: Boosting learning rate
- max_depth: Maximum tree depth
- gamma: Minimum loss reduction to make a split
- subsample: Subsample ratio of training instances
- colsample_bytree: Subsample ratio of features
- reg_alpha: L1 regularization (alpha)
- reg_lambda: L2 regularization (lambda)
- random_state: Random seed
- importance_type: 'weight', 'gain', 'cover', 'total_gain', or 'total_cover'
    
##### Returns:

- Dictionary of DataFrames for all importance types
- Displays a styled importance plot (if show_plot=True)

In [4]:
import numpy as np  # needed for np.linspace in the plotting code
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBClassifier, XGBRegressor
from sklearn.model_selection import train_test_split

def xgboost_feature_importance(X, y, task_type='classification',
                             n_estimators=100, learning_rate=0.1,
                             max_depth=3, gamma=0, subsample=0.8,
                             colsample_bytree=0.8, reg_alpha=0, reg_lambda=1,
                             random_state=42, importance_type='weight',
                             top_n_features=20, show_plot=True,
                             plot_size=(12, 8)):
   
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    # Initialize XGBoost model with XGB-specific parameters
    if task_type == 'classification':
        model = XGBClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_depth=max_depth,
            gamma=gamma,
            subsample=subsample,
            colsample_bytree=colsample_bytree,
            reg_alpha=reg_alpha,
            reg_lambda=reg_lambda,
            random_state=random_state,
            n_jobs=-1,
            eval_metric='logloss' if len(set(y)) == 2 else 'mlogloss'
        )
    elif task_type == 'regression':
        model = XGBRegressor(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_depth=max_depth,
            gamma=gamma,
            subsample=subsample,
            colsample_bytree=colsample_bytree,
            reg_alpha=reg_alpha,
            reg_lambda=reg_lambda,
            random_state=random_state,
            n_jobs=-1,
            eval_metric='rmse'
        )
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")
    
    # Train model
    model.fit(X, y)
    
    # Get all importance types
    importance_types = ['weight', 'gain', 'cover', 'total_gain', 'total_cover']
    importance_dfs = {}
    
    for imp_type in importance_types:
        importance_scores = model.get_booster().get_score(importance_type=imp_type)
        importance_df = pd.DataFrame({
            'Feature': list(importance_scores.keys()),
            imp_type: list(importance_scores.values())
        }).sort_values(imp_type, ascending=False)
        
        if top_n_features:
            importance_df = importance_df.head(top_n_features)
        
        importance_dfs[imp_type] = importance_df
    
    # Plot specified importance type
    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df = importance_dfs[importance_type].sort_values(importance_type, ascending=True)
        
        # Create gradient-colored bars
        colors = plt.cm.viridis(np.linspace(0.2, 1, len(importance_df)))
        bars = plt.barh(importance_df['Feature'], 
                       importance_df[importance_type], 
                       color=colors,
                       alpha=0.7)
        
        # Style the plot
        plt.xlabel(f'XGBoost Importance ({importance_type})', fontsize=12)
        plt.title(f'XGBoost Feature Importance\n(Importance Type: {importance_type})', 
                 fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)
        
        # Add values on bars
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005, 
                    bar.get_y() + bar.get_height()/2, 
                    f'{width:.2f}',
                    va='center',
                    fontsize=10)
        
        # Add XGB-specific parameters
        params_text = (
            f"XGBoost Parameters:\n"
            f"learning_rate={learning_rate}, max_depth={max_depth}\n"
            f"subsample={subsample}, colsample_bytree={colsample_bytree}\n"
            f"reg_alpha={reg_alpha}, reg_lambda={reg_lambda}"
        )
        plt.annotate(params_text,
                   xy=(0.98, 0.02),
                   xycoords='axes fraction',
                   ha='right',
                   va='bottom',
                   fontsize=9,
                   bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()
    
    return importance_dfs

### 3.1.2.2 LightGBM

<details>
<summary style="cursor: pointer">
<h2>{ Understanding LightGBM and its role in Feature Importance }</h2>
</summary>
<h3>What is LightGBM?</h3>
<p>LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework that uses tree-based learning algorithms. It's designed for efficiency and speed, particularly with large datasets, through its unique histogram-based approach and leaf-wise tree growth strategy rather than the level-wise approach used by other boosting algorithms.</p>

<h3>Its role in Feature Importance:</h3>
<p>LightGBM offers powerful built-in capabilities for feature selection that can significantly improve model performance while reducing dimensionality.</p>
<ul>
    <li>It provides multiple feature importance metrics: 'split' counts how many times a feature is used in splits; 'gain' measures the total reduction of loss contributed by splits on a feature.</li>
    <li>LightGBM's exclusive leaf-wise growth strategy focuses on the most promising leaves rather than expanding all nodes at the same level, naturally prioritizing the most informative features.</li>
    <li>It handles categorical features natively through efficient binning, providing more accurate importance measurements for categorical variables without requiring preprocessing like one-hot encoding.</li>
    <li>For high-dimensional datasets, LightGBM includes built-in feature selection capabilities via its 'feature_fraction' parameter, which randomly selects a subset of features for each tree iteration.</li>
    <li>It offers Gradient-based One-Side Sampling (GOSS) that focuses on instances with larger gradients while randomly sampling instances with smaller gradients, implicitly highlighting features that are most relevant for difficult-to-predict samples.</li>
</ul>
<h3>Resources:</h3>
<ol>
    <li><a href="https://lightgbm.readthedocs.io/en/latest/Features.html" target="_blank">LightGBM Official Documentation</a></li>
    <li><a href="https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf" target="_blank">LightGBM: A Highly Efficient Gradient Boosting Decision Tree (Original Paper)</a></li>
    <li><a href="https://www.youtube.com/watch?v=n_ZMQj09S6w" target="_blank">Understanding LightGBM Parameters (Video Tutorial)</a></li>
</ol>
</details>

#### LightGBM-specific feature importance with unique boosting parameters.
    
##### Parameters:
   
- num_leaves: Maximum number of leaves in one tree
- max_depth: Limit tree depth (-1 for no limit)
- learning_rate: Shrinkage rate
- n_estimators: Number of boosting iterations
- min_child_samples: Minimum data in one leaf
- subsample: Row subsample ratio
- colsample_bytree: Column subsample ratio
- reg_alpha: L1 regularization
- reg_lambda: L2 regularization
- importance_type: 'split' (number of times the feature is used) or 'gain' (total gain of the splits that use the feature)
    
##### Returns:

- DataFrame with feature importances
- Displays a styled importance plot (if show_plot=True)

In [5]:
!pip install lightgbm

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import lightgbm as lgb
from sklearn.model_selection import train_test_split
import numpy as np

def lightgbm_feature_importance(X, y, task_type='classification',
                              num_leaves=31, max_depth=-1,
                              learning_rate=0.1, n_estimators=100,
                              min_child_samples=20, subsample=1.0,
                              colsample_bytree=1.0, reg_alpha=0.0,
                              reg_lambda=0.0, random_state=42,
                              importance_type='split', top_n_features=20,
                              show_plot=True, plot_size=(12, 8)):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    # LightGBM dataset format
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=random_state)
    train_data = lgb.Dataset(X_train, label=y_train)
    val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
    
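    # LightGBM-specific parameters
    if task_type != 'classification':
        objective = 'regression'
    elif len(np.unique(y)) == 2:
        objective = 'binary'
    else:
        objective = 'multiclass'

    params = {
        'objective': objective,
        'num_leaves': num_leaves,
        'max_depth': max_depth,
        'learning_rate': learning_rate,
        'feature_fraction': colsample_bytree,
        'bagging_fraction': subsample,
        # Row subsampling is only applied when bagging_freq > 0
        'bagging_freq': 1 if subsample < 1.0 else 0,
        'min_child_samples': min_child_samples,
        'lambda_l1': reg_alpha,
        'lambda_l2': reg_lambda,
        'verbosity': -1,
        'seed': random_state,
        'num_threads': -1
    }
    if objective == 'multiclass':
        # Required by the multiclass objective; labels must be integers 0..num_class-1
        params['num_class'] = len(np.unique(y))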
    
    # Train model
    model = lgb.train(
        params,
        train_data,
        num_boost_round=n_estimators,
        valid_sets=[val_data],
        callbacks=[lgb.early_stopping(stopping_rounds=20, verbose=False)]
    )
    
    # Get importance scores
    importance = model.feature_importance(importance_type=importance_type)
    feature_names = model.feature_name()
    
    # Create importance DataFrame
    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importance
    }).sort_values('Importance', ascending=False)
    
    # Select top N features
    if top_n_features:
        importance_df = importance_df.head(top_n_features)
    
    # Enhanced visualization
    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df = importance_df.sort_values('Importance', ascending=True)
        
        # Gradient coloring
        colors = plt.cm.coolwarm(np.linspace(0, 1, len(importance_df)))
        bars = plt.barh(importance_df['Feature'], 
                       importance_df['Importance'], 
                       color=colors,
                       alpha=0.7)
        
        # Style the plot
        plt.xlabel(f'Importance ({importance_type})', fontsize=12)
        title_type = 'Split Count' if importance_type == 'split' else 'Total Gain'
        plt.title(f'LightGBM Feature Importance\n({title_type})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)
        
        # Add values on bars
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005, 
                    bar.get_y() + bar.get_height()/2, 
                    f'{width:.0f}' if importance_type == 'split' else f'{width:.2f}',
                    va='center',
                    fontsize=10)
        
        # Add LightGBM-specific parameters
        params_text = (
            f"LightGBM Parameters:\n"
            f"num_leaves={num_leaves}, max_depth={max_depth}\n"
            f"learning_rate={learning_rate}, min_child_samples={min_child_samples}\n"
            f"subsample={subsample}, colsample_bytree={colsample_bytree}"
        )
        plt.annotate(params_text,
                    xy=(0.98, 0.02),
                    xycoords='axes fraction',
                    ha='right',
                    va='bottom',
                    fontsize=9,
                    bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()
    
    return importance_df

### 3.1.2.3 CatBoost

<details>
<summary style="cursor: pointer">
<h2> { Understanding CatBoost and its Role in Feature Importance }</h2>
</summary>
<div>
    <h3> What is CatBoost? </h3>
    <p> CatBoost is an open-source gradient boosting library developed by Yandex, designed to handle categorical features efficiently without extensive preprocessing. It employs techniques like ordered boosting and symmetric trees to reduce overfitting and enhance performance across various machine learning tasks, including classification, regression, and ranking. </p>
    <h3> Its Role in Feature Importance:</h3>
    <p>CatBoost offers multiple methods to evaluate feature importance, aiding in understanding model behavior and refining feature selection:</p>
    <ul>
        <li><b>PredictionValuesChange</b>: Measures the average change in predictions when a feature's value changes, indicating its impact on the model's output.</li>
        <li><b>LossFunctionChange</b>: Assesses the change in the loss function when a feature is excluded, providing insight into its contribution to model performance.</li>
        <li><b>ShapValues</b>: Utilizes SHAP (SHapley Additive exPlanations) to attribute contributions of each feature to individual predictions, offering a detailed interpretability framework.</li>
        <li><b>Interaction</b>: Evaluates how combinations of features contribute to the model's predictions, identifying synergistic effects between features.</li>
        <li>Feature importance values can be accessed using the <code>get_feature_importance()</code> method, allowing for analysis and visualization of feature contributions.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://catboost.ai/docs/en/features/feature-importances-calculation" target="_blank">CatBoost Feature Importance Documentation</a></li>
        <li><a href="https://www.geeksforgeeks.org/catboost-feature-importance/" target="_blank">CatBoost Feature Importance | GeeksforGeeks</a></li>
        <li><a href="https://www.rasgoml.com/feature-engineering-tutorials/how-to-generate-feature-importance-plots-using-catboost" target="_blank">How To Generate Feature Importance Plots Using CatBoost - Rasgo</a></li>
        <li><a href="https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Catboost%20tutorial.html" target="_blank">CatBoost Tutorial — SHAP Documentation</a></li>
        <li><a href="https://www.youtube.com/watch?v=9rXW1uHhZxI" target="_blank">CatBoost Tutorial for Beginners (YouTube)</a></li>
    </ol>
</div>
</details>

#### CatBoost-specific feature importance with multiple importance types.
    
##### Parameters:

- iterations: Number of boosting rounds
- learning_rate: Boosting learning rate
- depth: Depth of the trees
- l2_leaf_reg: L2 regularization term
- random_strength: Randomness strength to use at score calculation
- bagging_temperature: Controls intensity of Bayesian bagging
- border_count: Number of splits for numerical features
- random_state: Random seed
- importance_type: Importance type shown in the plot ('PredictionValuesChange', 'LossFunctionChange', or 'ShapValues'); all three are always computed and returned

    
##### Returns:

- Dictionary of DataFrames for all importance types
- Displays a styled importance plot (if show_plot=True)
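
The 'ShapValues' type returns one row per sample rather than one score per feature, so it has to be aggregated before it can be ranked. The sketch below shows the usual aggregation (mean absolute SHAP value per feature) on a small made-up matrix; the numbers are illustrative only. It assumes the binary or regression layout, where the last column holds the expected value (bias) rather than a feature.

```python
import numpy as np

# Hypothetical SHAP output for 4 samples and 3 features.
# The last column is the expected value (bias), not a feature.
shap_values = np.array([
    [ 0.5, -0.1, 0.0, 0.2],
    [-0.3,  0.1, 0.0, 0.2],
    [ 0.7, -0.3, 0.0, 0.2],
    [-0.5,  0.1, 0.0, 0.2],
])

feature_part = shap_values[:, :-1]                  # drop the bias column
mean_abs_shap = np.abs(feature_part).mean(axis=0)   # one score per feature
ranking = np.argsort(-mean_abs_shap)                # most important first

print(mean_abs_shap)
print(ranking)
```

Taking the absolute value first matters: the first feature has contributions of both signs that would largely cancel in a plain mean.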

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from catboost import CatBoostClassifier, CatBoostRegressor, Pool
from sklearn.model_selection import train_test_split

def catboost_feature_importance(X, y, task_type='classification',
                                 iterations=500, learning_rate=0.03,
                                 depth=6, l2_leaf_reg=3,
                                 random_strength=1,
                                 bagging_temperature=1,
                                 border_count=128,
                                 random_state=42,
                                 importance_type='PredictionValuesChange',
                                 top_n_features=20,
                                 show_plot=True,
                                 plot_size=(12, 8)):

    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    # Initialize CatBoost model with CatBoost-specific parameters
    model_params = {
        'iterations': iterations,
        'learning_rate': learning_rate,
        'depth': depth,
        'l2_leaf_reg': l2_leaf_reg,
        'random_strength': random_strength,
        'bagging_temperature': bagging_temperature,
        'border_count': border_count,
        'random_state': random_state,
        'verbose': 0
    }
    
    if task_type == 'classification':
        # Logloss supports two classes only; use MultiClass for more
        loss_function = 'Logloss' if len(np.unique(y)) <= 2 else 'MultiClass'
        model = CatBoostClassifier(**model_params, loss_function=loss_function)
    elif task_type == 'regression':
        model = CatBoostRegressor(**model_params, loss_function='RMSE')
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")
    
    # Train model
    model.fit(X, y)
    
    # Importance Types
    importance_types = ['PredictionValuesChange', 'LossFunctionChange', 'ShapValues']
    importance_dfs = {}
    pool = Pool(X, label=y)  # LossFunctionChange needs the labels
    
    for imp_type in importance_types:
        if imp_type == 'ShapValues':
            shap_values = model.get_feature_importance(pool, type=imp_type)
            # Drop the last entry (bias), then average |SHAP| over samples
            # (and over classes, if the output has a class axis)
            abs_shap = np.abs(shap_values[..., :-1])
            feature_importance = abs_shap.reshape(-1, abs_shap.shape[-1]).mean(axis=0)
        else:
            feature_importance = model.get_feature_importance(pool, type=imp_type)
        
        importance_df = pd.DataFrame({
            'Feature': X.columns,
            imp_type: feature_importance
        }).sort_values(imp_type, ascending=False)
        
        if top_n_features:
            importance_df = importance_df.head(top_n_features)
        
        importance_dfs[imp_type] = importance_df
    
    # Plot specified importance type
    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df = importance_dfs[importance_type].sort_values(importance_type, ascending=True)
        
        # Gradient color bars
        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df)))
        bars = plt.barh(importance_df['Feature'],
                        importance_df[importance_type],
                        color=colors,
                        alpha=0.8)
        
        # Style
        plt.xlabel(f'CatBoost Importance ({importance_type})', fontsize=12)
        plt.title(f'CatBoost Feature Importance\n(Importance Type: {importance_type})',
                  fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)
        
        # Annotate values
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height()/2,
                     f'{width:.2f}',
                     va='center',
                     fontsize=10)
        
        # Add CatBoost-specific hyperparameters
        params_text = (
            f"CatBoost Parameters:\n"
            f"learning_rate={learning_rate}, depth={depth},\n"
            f"l2_leaf_reg={l2_leaf_reg}, random_strength={random_strength}\n"
            f"bagging_temperature={bagging_temperature}, border_count={border_count}"
        )
        plt.annotate(params_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()
    
    return importance_dfs

----------

## 3.1.3 Logistic Regression (Coefficients)

#### Best used when:

- The relationship between the features and the log-odds of the target is approximately linear.
- Your features are standardized or scaled.
- You need interpretable models (e.g., healthcare, finance).
- You want to understand feature direction (positive/negative influence).
- Not ideal for capturing nonlinear interactions.
- Correlated features can distort the interpretation (use L1 regularization for selection).

#### Use for:

- Feature selection for interpretable and simple models.
- Feature ranking when the direction of impact matters.

--------------------

<details>
<summary style="cursor: pointer">
<h2>{ Understanding Logistic Regression and its Role in Feature Importance }</h2>
</summary>
<div>
    <h3> What is Logistic Regression? </h3>
    <p> Logistic Regression is a linear model used for binary (and multiclass) classification tasks. It estimates the probability that a given input belongs to a particular class using the logistic (sigmoid) function and models the relationship between features and the log-odds of the outcome. </p>
    <h3> Its Role in Feature Importance:</h3>
    <p>Logistic Regression directly provides feature importance through its learned coefficients, which reflect the strength and direction of each feature's impact on the target prediction:</p>
    <ul>
        <li>Each feature is assigned a coefficient representing the change in the log-odds of the outcome for a one-unit increase in that feature, holding others constant.</li>
        <li>Positive coefficients indicate that increasing the feature value increases the predicted probability, while negative coefficients indicate the opposite.</li>
        <li>By examining the magnitude of standardized coefficients, one can assess which features have the most influence on predictions.</li>
        <li>Regularization techniques like L1 (Lasso) can be applied to perform automatic feature selection by shrinking less important coefficients to zero.</li>
        <li>Logistic Regression is especially valuable when interpretability is critical, as it provides a transparent mapping between inputs and outputs.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_blank">LogisticRegression — scikit-learn Documentation</a></li>
        <li><a href="https://www.statsmodels.org/stable/examples/notebooks/generated/logit.html" target="_blank">Logistic Regression Example — Statsmodels</a></li>
        <li><a href="https://towardsdatascience.com/interpreting-coefficients-in-logistic-regression-ebd6d4a6f2b0" target="_blank">Interpreting Coefficients in Logistic Regression (Towards Data Science)</a></li>
        <li><a href="https://www.youtube.com/watch?v=yIYKR4sgzI8" target="_blank">StatQuest: Logistic Regression Clearly Explained (YouTube)</a></li>
        <li><a href="https://www.youtube.com/watch?v=zAULhNrnuL4" target="_blank">Feature Importance with Logistic Regression (YouTube)</a></li>
    </ol>
</div>
</details>
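
The log-odds interpretation above can be checked with a few lines of arithmetic. The intercept and coefficient below are made-up values for a single-feature model, chosen only for illustration: increasing the feature by one unit multiplies the odds by `exp(coefficient)`.

```python
import math

def sigmoid(z):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted values for a one-feature model
intercept = -1.0
coef = 0.7

p_before = sigmoid(intercept + coef * 0.0)   # feature value 0
p_after = sigmoid(intercept + coef * 1.0)    # feature value 1

odds_before = p_before / (1.0 - p_before)
odds_after = p_after / (1.0 - p_after)

odds_ratio = math.exp(coef)
print(round(odds_after / odds_before, 4), round(odds_ratio, 4))
```

Both printed numbers agree, which is why `exp(coefficient)` is commonly reported as the odds ratio for a feature.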

#### Logistic Regression-specific feature importance using model coefficients.
    
##### Parameters:
    
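- penalty: Regularization type ('l1', 'l2', 'elasticnet', or no penalty; newer scikit-learn versions expect None rather than the string 'none')
- C: Inverse of regularization strength (smaller -> stronger regularization)
- solver: Optimization algorithm ('liblinear', 'saga', 'lbfgs', etc.); it must support the chosen penalty (e.g., 'lbfgs' handles only 'l2' or no penalty, while 'l1' needs 'liblinear' or 'saga')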
- max_iter: Maximum number of iterations
- random_state: Random seed
- scale_features: Whether to standardize features before fitting
    
    
##### Returns:
    
- DataFrame with features and their absolute importance (coefficients)
- Displays a sorted bar plot if show_plot=True
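
As a minimal sketch of L1-based selection with the parameters listed above, the snippet below fits an L1-penalized model on synthetic data and lists the features whose coefficients were not shrunk to zero. The dataset and the value `C=0.05` are assumptions picked for illustration; a suitable `C` would normally be chosen by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 3 of them informative (illustrative setup)
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# 'liblinear' is one of the solvers that supports the L1 penalty
model = LogisticRegression(penalty='l1', C=0.05, solver='liblinear',
                           random_state=0)
model.fit(X_scaled, y)

coef = model.coef_.ravel()
selected = np.flatnonzero(coef)   # indices of features that survived
print(f"{len(selected)} of {len(coef)} features kept:", selected)
```

Lowering `C` strengthens the penalty and removes more features; raising it keeps more.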

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def logistic_regression_feature_importance(X, y,
                                            penalty='l2', C=1.0,
                                            solver='lbfgs', max_iter=1000,
                                            random_state=42,
                                            scale_features=True,
                                            top_n_features=20,
                                            show_plot=True,
                                            plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    feature_names = X.columns.tolist()

    # Standardize features if required
    if scale_features:
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
    else:
        X_scaled = X.values

    # Initialize Logistic Regression model
    model = LogisticRegression(
        penalty=penalty,
        C=C,
        solver=solver,
        max_iter=max_iter,
        random_state=random_state
    )

    # Fit model
    model.fit(X_scaled, y)

    # Get feature importance (absolute value of coefficients)
    if len(np.array(model.coef_).shape) > 1 and model.coef_.shape[0] > 1:
        # For multiclass, take mean absolute value across classes
        coef_importance = np.mean(np.abs(model.coef_), axis=0)
    else:
        coef_importance = np.abs(model.coef_.flatten())

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': coef_importance
    }).sort_values('Importance', ascending=False)

    # Keep top N features
    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    # Plot feature importance
    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        # Create gradient-colored bars
        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.8)

        plt.xlabel('Absolute Coefficient Value', fontsize=12)
        plt.title('Logistic Regression Feature Importance\n(Based on Coefficients)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        # Annotate coefficient values
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.01, bar.get_y() + bar.get_height()/2,
                     f'{width:.4f}', va='center', fontsize=9)

        # Annotate model parameters
        params_text = (
            f"Logistic Regression Parameters:\n"
            f"penalty={penalty}, C={C}, solver={solver}\n"
            f"scale_features={scale_features}"
        )
        plt.annotate(params_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()

    return importance_df

-----------------


## 3.1.4 SVM (Linear)

#### Best used when:

- You have high-dimensional data (e.g., text classification, genomics).
- You need sparse and interpretable models (use L1 regularization).
- You want to evaluate directional influence of features.
- Doesn’t handle nonlinear interactions unless kernelized.
- Can be sensitive to feature scaling—standardization is a must.

#### Use for:

- Feature ranking in high-dimensional linear problems.
- Pre-filtering for downstream nonlinear models.

----------

<details>
<summary style="cursor: pointer">
<h2>{ Understanding Linear SVM and its Role in Feature Importance }</h2>
</summary>
<div>
    <h3> What is Linear SVM? </h3>
    <p> Linear Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification tasks. It finds the optimal hyperplane that separates data into different classes by maximizing the margin between the closest points (support vectors) of each class.</p>
    <h3> Its Role in Feature Importance:</h3>
    <p>Linear SVMs can provide insight into feature importance through the weights of the hyperplane coefficients:</p>
    <ul>
        <li>The model assigns a weight to each feature, which corresponds to the feature’s contribution to the separating hyperplane.</li>
        <li>Larger absolute values of weights indicate features that have a stronger influence on the decision boundary.</li>
        <li>Feature importance in Linear SVMs is direction-sensitive; positive weights push classification in one direction, negative in the other.</li>
        <li>Regularization (e.g., L1 or L2 penalties) can be applied to induce sparsity or smoothness, helping identify relevant features and reduce overfitting.</li>
        <li>Linear SVMs are particularly effective in high-dimensional settings like text classification, where feature interpretability is important.</li>
    </ul>
    <h3> Resources: </h3>
    <ol>
        <li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html" target="_blank">LinearSVC — scikit-learn Documentation</a></li>
        <li><a href="https://www.csie.ntu.edu.tw/~cjlin/papers/linear.pdf" target="_blank">A Practical Guide to Support Vector Classification (PDF)</a></li>
        <li><a href="https://towardsdatascience.com/svm-feature-importance-and-implementation-in-python-1b5b80291d80" target="_blank">SVM Feature Importance and Implementation in Python (Medium)</a></li>
        <li><a href="https://www.youtube.com/watch?v=efR1C6CvhmE" target="_blank">StatQuest: Support Vector Machines Clearly Explained (YouTube)</a></li>
        <li><a href="https://www.youtube.com/watch?v=Y6RRHw9uN9o" target="_blank">SVM Intuition and Feature Impact (YouTube)</a></li>
    </ol>
</div>
</details>

#### SVM (Linear Kernel) specific feature importance based on model coefficients.

##### Parameters:
    
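- penalty: Regularization ('l1', 'l2') [classification only]; 'l1' requires loss='squared_hinge' and dual=False
- loss: Loss function for classification ('hinge' or 'squared_hinge')
- dual: Whether to solve the dual (True) or primal (False) optimization problem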
- tol: Tolerance for stopping criterion
- C: Regularization parameter
- fit_intercept: Whether to fit the intercept
- scale_data: Whether to standardize features
    
##### Returns:
    
- DataFrame of features and absolute coefficient importance
- Displays a styled bar plot (if show_plot=True)
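
One practical detail about the parameters above: in scikit-learn's `LinearSVC`, the L1 penalty is only available in the primal formulation. The sketch below, on synthetic data chosen for illustration, shows a working L1 configuration and confirms that the `dual=True` combination is rejected.

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic binary problem (illustrative data only)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Sparse linear SVM: L1 penalty needs squared hinge loss and dual=False
model = LinearSVC(penalty='l1', loss='squared_hinge', dual=False,
                  C=0.1, max_iter=5000, random_state=0)
model.fit(X_scaled, y)
signed_weights = model.coef_.ravel()   # sign shows the direction of influence
print(signed_weights)

# The same penalty with dual=True is an unsupported combination
try:
    LinearSVC(penalty='l1', loss='squared_hinge', dual=True).fit(X_scaled, y)
    l1_dual_supported = True
except ValueError:
    l1_dual_supported = False
print("L1 with dual=True supported:", l1_dual_supported)
```

Keeping the signed weights alongside their absolute values preserves the direction of each feature's effect, which is lost once only magnitudes are ranked.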

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC, LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def svm_linear_feature_importance(X, y, task_type='classification',
                                   penalty='l2', loss='squared_hinge',
                                   dual=True, tol=1e-4, C=1.0,
                                   fit_intercept=True, max_iter=1000,
                                   random_state=42,
                                   top_n_features=20, show_plot=True,
                                   scale_data=True, plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    # Feature scaling (SVM sensitive to scale)
    if scale_data:
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        X = pd.DataFrame(X_scaled, columns=X.columns)

    # Initialize SVM model
    if task_type == 'classification':
        model = LinearSVC(
            penalty=penalty,
            loss=loss,
            dual=dual,
            tol=tol,
            C=C,
            fit_intercept=fit_intercept,
            max_iter=max_iter,
            random_state=random_state
        )
    elif task_type == 'regression':
        model = LinearSVR(
            epsilon=0.0,
            tol=tol,
            C=C,
            fit_intercept=fit_intercept,
            max_iter=max_iter,
            random_state=random_state
        )
    else:
        raise ValueError("task_type must be 'classification' or 'regression'")

    # Train model
    model.fit(X, y)

    # Get feature importance (coefficients)
    if task_type == 'classification' and len(np.unique(y)) > 2:
        # For multi-class: take mean of absolute coefficients across classes
        coef = np.mean(np.abs(model.coef_), axis=0)
    else:
        coef = np.abs(model.coef_.ravel())

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': coef
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    # Plot
    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)
        
        # Gradient coloring
        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'], 
                        importance_df_sorted['Importance'], 
                        color=colors,
                        alpha=0.8)

        plt.xlabel('Absolute Coefficient Value', fontsize=12)
        plt.title(f'SVM (Linear) Feature Importance\n(Task: {task_type.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        # Add values on bars
        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.01, 
                     bar.get_y() + bar.get_height()/2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        # Annotate model hyperparameters
        params_text = (
            f"SVM Parameters:\n"
            f"C={C}, penalty={penalty if task_type == 'classification' else 'N/A'}\n"
            f"loss={loss if task_type == 'classification' else 'N/A'}, tol={tol}\n"
            f"scale_data={scale_data}"
        )
        plt.annotate(params_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))
        
        plt.tight_layout()
        plt.show()

    return importance_df

--------------