To put content in a collapsible block, use the following snippet in Markdown cells:

<details>
<summary style="cursor: pointer">
<b> double click the markdown to see the code instead of this </b>
</summary>

# 3.5 Advanced & Research-based Techniques
-- High-value academic & ensemble techniques

- Boruta Algorithm
- Recursive Feature Elimination (RFE)/ RFECV
- Stability Selection
- Drop-Column Importance
- SHAP (all types)
- LIME
- Permutation Importance
- PDP (Partial Dependence Plots)
- ReliefF
- MRMR (Minimum Redundancy Maximum Relevance)
- Leave-One-Covariate-Out (LOCO)
- Forward/Backward Sequential Feature Selection
- OMP (Orthogonal Matching Pursuit)

--------

## 3.5.1 Boruta Algorithm

<details>
<summary style="cursor: pointer">
<h2> { Understanding Boruta for Feature Selection } </h2>
</summary>
<h3> What is Boruta? </h3>
<p> Boruta is a wrapper method built around Random Forest to perform all-relevant feature selection.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
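    <li> Compares the importance of each real feature with that of shuffled "shadow" copies; a feature is kept only if it consistently outscores the best shadow.</li>
    <li> Aims to find every feature that carries signal rather than a minimal subset, which suits noisy, high-dimensional data.</li>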
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://github.com/scikit-learn-contrib/boruta_py" target="_blank">Boruta Python GitHub</a></li>
</ol>
</details>
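
One round of the shadow-feature comparison described above can be sketched as follows. This is a simplified illustration, not BorutaPy's implementation: the real algorithm repeats the comparison over many iterations and applies a statistical test to the accumulated hit counts. The synthetic data and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): features 0 and 1 drive the label, 2-4 are noise
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Shadow features: each column shuffled independently, which destroys any
# relationship with y while keeping the column's marginal distribution
X_shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
X_extended = np.hstack([X, X_shadow])

forest = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
forest.fit(X_extended, y)

n_real = X.shape[1]
real_importance = forest.feature_importances_[:n_real]
shadow_max = forest.feature_importances_[n_real:].max()

# A real feature scores a "hit" in this round if it beats the best shadow
hits = real_importance > shadow_max
print(hits)
```

A single round is noisy, which is why the full algorithm accumulates hits across many rounds before confirming or rejecting a feature.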

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- estimator: Base estimator for Boruta (must have feature_importances_ attribute)
- max_iter: Maximum number of Boruta iterations
- n_estimators: Number of trees for tree-based estimators (if estimator supports it)
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Fitted Boruta selector
- DataFrame of all features with their Boruta rank (1 = confirmed, 2 = tentative, higher = rejected)
- Displays a plot of feature importances if show_plot=True


In [7]:
!pip install boruta



In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from boruta import BorutaPy

def boruta_feature_importance(X,
                               y,
                               estimator=None,
                               max_iter=100,
                               n_estimators=100,
                               random_state=42,
                               show_plot=True,
                               plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if estimator is None:
        # Default to RandomForest based on problem type
        if len(np.unique(y)) > 10:
            estimator = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        else:
            estimator = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)

    # BorutaPy applies this tree count to the estimator before fitting
    boruta_selector = BorutaPy(estimator,
                               n_estimators=n_estimators,
                               max_iter=max_iter,
                               random_state=random_state,
                               verbose=0)

    boruta_selector.fit(X_processed.values, np.array(y))

    selected_features = np.array(feature_names)[boruta_selector.support_].tolist()
    weak_features = np.array(feature_names)[boruta_selector.support_weak_].tolist()
    all_features = selected_features + weak_features

    importances = boruta_selector.ranking_
    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance (Rank)': importances
    }).sort_values('Importance (Rank)', ascending=True)

    if show_plot and not importance_df.empty:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance (Rank)', ascending=False)

        colors = plt.cm.coolwarm(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        -importance_df_sorted['Importance (Rank)'],  # Inverted because lower rank is better
                        color=colors,
                        alpha=0.9)

        plt.xlabel('-(Feature Rank)', fontsize=12)
        plt.title('Boruta Feature Selection (Feature Importance via Ranks)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

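        for bar in bars:
            width = bar.get_width()  # negative, because ranks are plotted as -(rank)
            # Label sits just left of the bar end; dark text stays readable on the background
            plt.text(width - 0.05,
                     bar.get_y() + bar.get_height() / 2,
                     f'{-int(width)}',
                     va='center',
                     ha='right',
                     fontsize=9,
                     color='black')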

        method_text = (
            f"Method: Boruta Selection\n"
            f"Base Estimator: {type(estimator).__name__}\n"
            f"Max Iter: {max_iter}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return boruta_selector, importance_df

--------

## 3.5.2 Recursive Feature Elimination (RFE)/ RFECV

<details>
<summary style="cursor: pointer">
<h2> { Understanding Recursive Feature Elimination (RFE) / RFECV } </h2>
</summary>
<h3> What is RFE? </h3>
<p> RFE works by recursively removing the least important features and building the model on the remaining attributes.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Works with any estimator that exposes <code>coef_</code> or <code>feature_importances_</code> after fitting.</li>
    <li> RFECV extends RFE with cross-validation to automatically select the optimal number of features.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/feature_selection.html#rfe" target="_blank">scikit-learn RFE Docs</a></li>
</ol>
</details>
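
The elimination loop itself is short. The sketch below is a hand-rolled illustration using ordinary least squares in NumPy, not scikit-learn's `RFE` class; the synthetic data is an assumption, and the features share one scale so that coefficient magnitudes are comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): only features 0 and 3 matter
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

remaining = list(range(X.shape[1]))  # indices of features still in play
eliminated = []                      # features in the order they were dropped
n_features_to_select = 2

while len(remaining) > n_features_to_select:
    # Refit on the surviving features only
    coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
    # Drop the feature with the smallest absolute coefficient (step = 1)
    weakest = remaining[int(np.argmin(np.abs(coef)))]
    remaining.remove(weakest)
    eliminated.append(weakest)

print("selected:", remaining)
print("elimination order:", eliminated)
```

Features dropped earlier would receive a worse (higher) rank, which is the same idea as the `ranking_` attribute used in the function below.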

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- estimator: Base model to use (must have coef_ or feature_importances_ attribute)
- step: Number (or fraction) of features to remove at each iteration
- cv: Number of cross-validation folds or a CV splitter; if given, RFECV is used instead of RFE (optional)
- scoring: Scoring metric for cross-validation (optional)
- n_features_to_select: Desired number of features to select (optional, for RFE only)
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature ranking
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Fitted RFE or RFECV selector
- DataFrame of all features with their ranking (1 = selected) and a Selected flag
- Displays a plot of feature rankings if show_plot=True

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import RFE, RFECV

def rfe_feature_importance(X,
                            y,
                            estimator,
                            step=1,
                            cv=None,
                            scoring=None,
                            n_features_to_select=None,
                            random_state=42,
                            show_plot=True,
                            plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    feature_names = X.columns.tolist()
    X_processed = X.copy()

    if cv:
        selector = RFECV(estimator=estimator,
                         step=step,
                         cv=cv,
                         scoring=scoring,
                         n_jobs=-1)
    else:
        selector = RFE(estimator=estimator,
                       n_features_to_select=n_features_to_select,
                       step=step)

    selector.fit(X_processed, y)

    ranking = selector.ranking_
    support = selector.support_

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Ranking': ranking,
        'Selected': support
    }).sort_values('Ranking', ascending=True)

    if show_plot and not importance_df.empty:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Ranking', ascending=False)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        -importance_df_sorted['Ranking'],  # Lower rank = better
                        color=colors,
                        alpha=0.9)

        plt.xlabel('-(Feature Ranking)', fontsize=12)
        plt.title('Recursive Feature Elimination (Feature Importance by Ranking)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

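        for bar in bars:
            width = bar.get_width()  # negative, because rankings are plotted as -(ranking)
            # Label sits just left of the bar end; dark text stays readable on the background
            plt.text(width - 0.05,
                     bar.get_y() + bar.get_height() / 2,
                     f'{-int(width)}',
                     va='center',
                     ha='right',
                     fontsize=9,
                     color='black')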

        method_text = (
            f"Method: {'RFECV' if cv else 'RFE'}\n"
            f"Estimator: {type(estimator).__name__}\n"
            f"Step: {step}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return selector, importance_df

--------

## 3.5.3 Stability Selection

<details>
<summary style="cursor: pointer">
<h2> { Understanding Stability Selection } </h2>
</summary>
<h3> What is Stability Selection? </h3>
<p> Stability selection repeatedly fits a sparse feature selector such as Lasso on random resamples of the data and keeps the features that are selected in a large fraction of the runs.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Improves robustness and reduces overfitting by choosing stable features.</li>
    <li> Helps avoid selecting features that only appear due to random variation.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/whats_new/v0.24.html#id13" target="_blank">scikit-learn 0.24 release notes</a></li>
</ol>
</details>
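
A minimal sketch of the selection-frequency idea, assuming a Lasso selector and half-size subsamples drawn without replacement. The data, the `alpha` value, and the number of rounds are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): features 0 and 1 are informative, 2-7 are noise
n_samples, n_features = 200, 8
X = rng.normal(size=(n_samples, n_features))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

n_rounds = 50
counts = np.zeros(n_features)

for _ in range(n_rounds):
    # Subsample half of the rows without replacement
    idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
    model = Lasso(alpha=0.2).fit(X[idx], y[idx])
    # A feature counts as selected when its coefficient is non-zero
    counts += (model.coef_ != 0)

selection_frequency = counts / n_rounds
print(np.round(selection_frequency, 2))
```

Features whose frequency stays above a chosen threshold are treated as stable. The function below follows the same pattern but draws its resamples with replacement.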

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- base_estimator: Base model exposing coef_ or feature_importances_; a feature counts as selected when its absolute value exceeds 1e-6, so sparse models (e.g., L1-penalized) work best
- n_bootstrap_iterations: Number of bootstrap resampling iterations
- sample_fraction: Fraction of samples used in each bootstrap sample
- selection_threshold: Minimum selection frequency for a feature to be considered important
- random_state: Random seed for reproducibility
- show_plot: Whether to plot feature stability scores
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- DataFrame of features with their selection frequencies
- List of selected features (based on selection threshold)
- Displays a plot of feature stability scores if show_plot=True


In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.base import clone

def stability_selection(X,
                         y,
                         base_estimator,
                         n_bootstrap_iterations=100,
                         sample_fraction=0.75,
                         selection_threshold=0.5,
                         random_state=42,
                         show_plot=True,
                         plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    n_samples, n_features = X.shape
    feature_names = X.columns.tolist()
    selection_counts = np.zeros(n_features)

    for iteration in range(n_bootstrap_iterations):
        bootstrap_idx = np.random.choice(n_samples,
                                         size=int(sample_fraction * n_samples),
                                         replace=True)
        X_sample = X.iloc[bootstrap_idx]
        y_sample = np.array(y)[bootstrap_idx]

        estimator = clone(base_estimator)
        estimator.fit(X_sample, y_sample)

        if hasattr(estimator, "coef_"):
            coefs = np.abs(estimator.coef_)
            if coefs.ndim > 1:
                # Multi-class or multi-output models have one row per class/output;
                # count a feature as used if any row gives it a non-zero weight
                coefs = coefs.max(axis=0)
        elif hasattr(estimator, "feature_importances_"):
            coefs = estimator.feature_importances_
        else:
            raise ValueError("Estimator must have coef_ or feature_importances_ attribute.")

        selected = np.abs(coefs) > 1e-6  # Threshold for non-zero importance
        selection_counts += selected

    stability_scores = selection_counts / n_bootstrap_iterations

    stability_df = pd.DataFrame({
        'Feature': feature_names,
        'Stability_Score': stability_scores
    }).sort_values('Stability_Score', ascending=False)

    selected_features = stability_df[stability_df['Stability_Score'] >= selection_threshold]['Feature'].tolist()

    if show_plot:
        plt.figure(figsize=plot_size)
        stability_df_sorted = stability_df.sort_values('Stability_Score', ascending=True)

        colors = plt.cm.cividis(np.linspace(0.2, 1, len(stability_df_sorted)))
        bars = plt.barh(stability_df_sorted['Feature'],
                        stability_df_sorted['Stability_Score'],
                        color=colors,
                        alpha=0.9)

        plt.axvline(selection_threshold, color='red', linestyle='--', label=f'Threshold ({selection_threshold})')
        plt.xlabel('Stability Score (Selection Frequency)', fontsize=12)
        plt.title('Stability Selection Feature Importance', fontsize=14, pad=20)
        plt.legend()
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.01,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.2f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Stability Selection\n"
            f"Bootstrap Iterations: {n_bootstrap_iterations}\n"
            f"Sample Fraction: {sample_fraction}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return stability_df, selected_features

--------

## 3.5.4 Drop-Column Importance

<details>
<summary style="cursor: pointer">
<h2> { Understanding Drop-Column Feature Importance } </h2>
</summary>
<h3> What is Drop-Column Importance? </h3>
<p> This technique assesses the importance of a feature by training a model with and without the feature and measuring the performance difference.</p>
<h3> Its role in Feature Importance: </h3>
<ul>
    <li> Measures real impact of each feature on model performance.</li>
    <li> Model-agnostic but computationally expensive.</li>
</ul>
</details>
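
The retrain-and-compare idea can be sketched with a plain least-squares model. This example scores on a held-out split, which is one reasonable design choice and an assumption of the sketch; the synthetic data and the helper `fit_and_score` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): feature 0 is strong, feature 1 is weak, feature 2 is noise
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)
X_train, X_val = X[:300], X[300:]
y_train, y_val = y[:300], y[300:]

def fit_and_score(columns):
    """Fit least squares on the given columns and return R^2 on the validation split."""
    A_train = np.column_stack([X_train[:, columns], np.ones(len(X_train))])
    A_val = np.column_stack([X_val[:, columns], np.ones(len(X_val))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = y_val - A_val @ coef
    return 1.0 - np.mean(residual ** 2) / np.var(y_val)

all_columns = [0, 1, 2]
baseline = fit_and_score(all_columns)

# Importance = baseline score minus the score of a model retrained without the feature
importance = {
    col: baseline - fit_and_score([c for c in all_columns if c != col])
    for col in all_columns
}
print(importance)
```

Each feature costs one full retraining, which is where the computational expense comes from.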

##### Parameters:
- X: Features (DataFrame)
- y: Target variable (array-like)
- model: An estimator with fit and score methods (and predict when scoring is given); it is cloned and refitted internally, so it does not need to be fitted beforehand
- scoring: Metric function called as scoring(y_true, y_pred) (e.g., accuracy_score, r2_score); if None, model.score is used
- greater_is_better: Whether higher scores are better (True for accuracy, False for error metrics)
- random_state: Random seed for reproducibility (applied to models that expose a random_state parameter)
- show_plot: Whether to plot drop-column importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- DataFrame with feature importance scores (performance drop when the feature is removed, measured on the same data used for fitting)
- List of all features sorted by importance (descending order)
- Displays a plot of drop-column importances if show_plot=True

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.base import clone
from sklearn.utils import check_random_state

def drop_column_importance(X,
                            y,
                            model,
                            scoring=None,
                            greater_is_better=True,
                            random_state=42,
                            show_plot=True,
                            plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    # Draw a single seed and reuse it for the baseline and every refit, so that
    # score differences reflect the dropped feature rather than model randomness
    seed = int(check_random_state(random_state).randint(0, 10000))

    def make_model():
        model_clone = clone(model)
        if 'random_state' in model_clone.get_params():
            model_clone.set_params(random_state=seed)
        return model_clone

    base_model = make_model()
    base_model.fit(X, y)

    if scoring is not None:
        baseline_score = scoring(y, base_model.predict(X))
    else:
        baseline_score = base_model.score(X, y)

    feature_names = X.columns.tolist()
    importances = []

    for feature in feature_names:
        X_dropped = X.drop(columns=[feature])
        model_clone = make_model()
        model_clone.fit(X_dropped, y)

        if scoring is not None:
            dropped_score = scoring(y, model_clone.predict(X_dropped))
        else:
            dropped_score = model_clone.score(X_dropped, y)

        # Importance is how much the performance drops
        if greater_is_better:
            importance = baseline_score - dropped_score
        else:
            importance = dropped_score - baseline_score

        importances.append(importance)

    importance_df = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importances
    }).sort_values('Importance', ascending=False)

    important_features = importance_df['Feature'].tolist()

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Performance Drop After Feature Removal', fontsize=12)
        plt.title('Drop-Column Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Drop-Column Importance\n"
            f"Greater is Better: {greater_is_better}\n"
            f"Scoring: {'Custom' if scoring is not None else 'model.score'}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df, important_features

--------

## 3.5.5 SHAP (all types)

<details>
<summary style="cursor: pointer">
<h2> { Understanding SHAP (Tree, Kernel, Deep, Linear) } </h2>
</summary>
<h3> What is SHAP? </h3>
<p> SHAP is a game-theoretic approach to explain the output of any machine learning model by assigning each feature an importance value for a particular prediction.</p>
<h3> Its role in Feature Importance: </h3>
<ul>
    <li> Provides consistent local (per-prediction) attributions that can be averaged into global feature importance.</li>
    <li> Supports different models via SHAP variants:
        <ul>
            <li><b>TreeSHAP</b>: For tree-based models.</li>
            <li><b>DeepSHAP</b>: For deep learning models.</li>
            <li><b>KernelSHAP</b>: Model-agnostic.</li>
            <li><b>LinearSHAP</b>: For linear models.</li>
        </ul>
    </li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://shap.readthedocs.io/en/latest/index.html" target="_blank">SHAP Documentation</a></li>
    <li><a href="https://christophm.github.io/interpretable-ml-book/shap.html" target="_blank">Interpretable ML Book - SHAP</a></li>
</ol>
</details>
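
To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by enumerating every coalition for a tiny hypothetical linear model. It does not use the `shap` library, and treating "absent" features as taking a single baseline value is a simplifying assumption. Enumeration is only feasible for a handful of features, which is why SHAP relies on model-specific algorithms and approximations.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear model f(x) = w . x + b and a single reference point
weights = [2.0, -1.0, 0.5]
bias = 0.3
baseline = [1.0, 0.0, 2.0]  # values used when a feature is "absent"
x = [3.0, 1.0, 0.0]         # instance to explain

def predict(values):
    return sum(w * v for w, v in zip(weights, values)) + bias

def coalition_value(subset):
    """Prediction when features in `subset` take their real values and the rest take the baseline."""
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return predict(mixed)

def shapley_values():
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                gain = coalition_value(set(subset) | {i}) - coalition_value(set(subset))
                total += weight * gain
        phi.append(total)
    return phi

phi = shapley_values()
print(phi)
```

For a linear model each value reduces to `w_i * (x_i - baseline_i)`, and the values sum to the difference between the prediction for `x` and the prediction for the baseline.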

##### Parameters:
- X: Features (DataFrame)
- model: A fitted model (must be compatible with SHAP)
- model_type: Type of model ("tree", "linear", "deep", "kernel", or "auto")
- background_sample_size: Number of background samples to use for KernelExplainer (if needed)
- show_plot: Whether to plot global feature importance
- plot_size: Tuple indicating plot size (width, height)
- max_display: Max number of features to display in plot (optional)
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame of mean absolute SHAP values per feature (global importance)
- SHAP explainer object (for further local explanations if needed)
- Displays a SHAP summary bar plot if show_plot=True

In [14]:
!pip install shap



In [15]:
import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
from sklearn.utils import check_random_state

def shap_feature_importance(X,
                             model,
                             model_type="auto",
                             background_sample_size=100,
                             show_plot=True,
                             plot_size=(12, 8),
                             max_display=None,
                             random_state=42):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

    X_sampled = X.copy()

    # Determine appropriate SHAP Explainer
    if model_type == "tree":
        explainer = shap.TreeExplainer(model)
    elif model_type == "linear":
        explainer = shap.LinearExplainer(model, X_sampled)
    elif model_type == "deep":
        explainer = shap.DeepExplainer(model, X_sampled)
    elif model_type == "kernel":
        background = X_sampled.sample(n=min(background_sample_size, len(X_sampled)),
                                      random_state=random_state)
        explainer = shap.KernelExplainer(model.predict, background)
    elif model_type == "auto":
        try:
            explainer = shap.Explainer(model, X_sampled)
        except Exception as e:
            raise ValueError(f"Auto Explainer could not initialize. Please specify model_type manually. Error: {e}")
    else:
        raise ValueError(f"Unsupported model_type '{model_type}'. Choose from 'tree', 'linear', 'deep', 'kernel', 'auto'.")

    # Compute SHAP values
    shap_values = explainer(X_sampled)

    # If the model has several outputs (e.g. one per class), keep the first one
    values = shap_values.values
    if isinstance(values, list):
        shap_array = np.abs(values[0])
    else:
        shap_array = np.abs(values)
        if shap_array.ndim == 3:
            # Shape is (n_samples, n_features, n_outputs)
            shap_array = shap_array[:, :, 0]

    mean_abs_shap = np.mean(shap_array, axis=0)

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'MeanAbsSHAP': mean_abs_shap
    }).sort_values('MeanAbsSHAP', ascending=False)

    if show_plot:
        plt.figure(figsize=plot_size)

        display_df = importance_df if max_display is None else importance_df.head(max_display)
        importance_df_sorted = display_df.sort_values('MeanAbsSHAP', ascending=True)

        colors = plt.cm.magma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['MeanAbsSHAP'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Mean |SHAP value|', fontsize=12)
        plt.title('SHAP Feature Importance (Global)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.001,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: SHAP\n"
            f"Model Type: {model_type}\n"
            f"Samples: {len(X_sampled)}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df, explainer

--------

## 3.5.6 LIME

<details>
<summary style="cursor: pointer">
<h2> { Understanding LIME (Local Interpretable Model-Agnostic Explanations) } </h2>
</summary>
<h3> What is LIME? </h3>
<p> LIME explains individual predictions by locally approximating the model with an interpretable one (like a linear model).</p>
<h3> Its role in Feature Importance: </h3>
<ul>
    <li> Provides local feature importance for single predictions.</li>
    <li> Useful for debugging or explaining black-box models.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://github.com/marcotcr/lime" target="_blank">LIME GitHub</a></li>
</ol>
</details>
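
The local-surrogate idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the `lime` package's procedure (which, among other things, can discretize continuous features); the `black_box` function, noise scale, and kernel width are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    """Hypothetical non-linear model standing in for a real predictor."""
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

instance = np.array([1.0, 0.0])

# 1. Perturb the instance with small Gaussian noise
samples = instance + rng.normal(scale=0.1, size=(2000, 2))
predictions = black_box(samples)

# 2. Weight each sample by its proximity to the instance (Gaussian kernel)
distances = np.linalg.norm(samples - instance, axis=1)
kernel_width = 0.25
sample_weights = np.exp(-(distances ** 2) / kernel_width ** 2)

# 3. Fit a weighted linear surrogate: scale rows by sqrt(weight), then least squares
design = np.column_stack([samples - instance, np.ones(len(samples))])
root_w = np.sqrt(sample_weights)[:, None]
coef, *_ = np.linalg.lstsq(design * root_w, predictions * root_w[:, 0], rcond=None)

local_contributions = coef[:2]  # local slope of the model for each feature
print(np.round(local_contributions, 2))
```

The fitted slopes approximate the model's behaviour only near the chosen instance; a different instance would generally give different coefficients.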

##### Parameters:
- X: Features (DataFrame)
- model: A fitted model
- num_samples: Number of synthetic samples for LIME explanation
- instance_index: Positional (0-based) row index of the single instance to explain
- mode: "classification" or "regression" depending on the model type
- show_plot: Whether to plot local feature contributions
- plot_size: Tuple indicating plot size (width, height)
- feature_names: Custom feature names (optional)
- class_names: Custom class names (for classification, optional)
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame of feature contributions for the explained instance
- LIME explanation object
- Displays a plot of local feature importance if show_plot=True


In [17]:
!pip install lime



In [18]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lime
import lime.lime_tabular
from sklearn.utils import check_random_state

def lime_feature_importance(X,
                             model,
                             num_samples=5000,
                             instance_index=0,
                             mode="regression",
                             show_plot=True,
                             plot_size=(10, 6),
                             feature_names=None,
                             class_names=None,
                             random_state=42):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

    feature_names = feature_names if feature_names else X.columns.tolist()
    X_values = X.values

    # Define LIME explainer
    explainer = lime.lime_tabular.LimeTabularExplainer(
        training_data=X_values,
        feature_names=feature_names,
        class_names=class_names,
        mode=mode,
        discretize_continuous=True,
        random_state=random_state
    )

    # Explain the specific instance (positional row index into X)
    instance = X_values[instance_index]
    # Classification needs class probabilities; regression uses raw predictions
    predict_fn = model.predict_proba if mode == "classification" else model.predict
    # num_features defaults to 10 in LIME; request all features so none are silently omitted
    explanation = explainer.explain_instance(instance,
                                             predict_fn,
                                             num_features=len(feature_names),
                                             num_samples=num_samples)

    # Extract feature contributions
    contributions = dict(explanation.as_list())
    contrib_df = pd.DataFrame({
        'Feature': list(contributions.keys()),
        'Contribution': list(contributions.values())
    }).sort_values('Contribution', ascending=True)

    if show_plot:
        plt.figure(figsize=plot_size)
        colors = plt.cm.coolwarm(np.linspace(0.2, 0.8, len(contrib_df)))
        bars = plt.barh(contrib_df['Feature'],
                        contrib_df['Contribution'],
                        color=colors,
                        alpha=0.9)

        plt.axvline(x=0, color='black', linestyle='--', linewidth=1)
        plt.xlabel('Contribution to Prediction', fontsize=12)
        plt.title(f'LIME Feature Contributions (Instance {instance_index})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + np.sign(width) * 0.01,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: LIME\n"
            f"Samples: {num_samples}\n"
            f"Mode: {mode}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return contrib_df, explanation

--------

## 3.5.7 Permutation Importance

<details>
<summary style="cursor: pointer">
<h2> { Understanding Permutation Importance } </h2>
</summary>
<h3> What is Permutation Importance? </h3>
<p> Permutation importance measures how much model performance changes after the values of one feature are shuffled, which breaks the relationship between that feature and the outcome.</p>
<h3> Its role in Feature Importance: </h3>
<ul>
    <li> Model-agnostic: it needs only a fitted model and a scoring metric, with no retraining.</li>
    <li> Measures a feature’s effect on actual performance metrics.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/permutation_importance.html" target="_blank">sklearn Permutation Importance</a></li>
</ol>
</details>
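
The shuffle-and-rescore loop can be written by hand as below. This is a sketch with a least-squares model in NumPy rather than scikit-learn's `permutation_importance`; the synthetic data, the held-out split, and the `r2` helper are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): feature 0 is strong, feature 1 is weak, feature 2 is noise
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)
X_train, X_test = X[:350], X[350:]
y_train, y_test = y[:350], y[350:]

# Fit once; permutation importance never retrains the model
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r2(X_eval):
    residual = y_test - X_eval @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((y_test - y_test.mean()) ** 2)

baseline = r2(X_test)
n_repeats = 10
importances = np.zeros((X.shape[1], n_repeats))

for j in range(X.shape[1]):
    for r in range(n_repeats):
        X_permuted = X_test.copy()
        # Shuffle one column to break its link with the target
        X_permuted[:, j] = rng.permutation(X_permuted[:, j])
        importances[j, r] = baseline - r2(X_permuted)

importance_mean = importances.mean(axis=1)
print(importance_mean)
```

Repeating the shuffle several times and averaging reduces the noise of any single permutation, which is the role of `n_repeats` in the function below.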

##### Parameters:
- X: Features (DataFrame)
- y: Target values (Series or array)
- model: A fitted model
- scoring: Scoring function or string (e.g., 'r2', 'accuracy') for evaluation
- n_repeats: Number of times to permute a feature
- random_state: Random seed for reproducibility
- show_plot: Whether to plot permutation feature importances
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- DataFrame of feature importances (mean and standard deviation of importance scores)
- Displays a feature importance plot if show_plot=True

In [19]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance
from sklearn.utils import check_random_state

def permutation_feature_importance(X,
                                    y,
                                    model,
                                    scoring=None,
                                    n_repeats=10,
                                    random_state=42,
                                    show_plot=True,
                                    plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

    # Compute permutation importance
    result = permutation_importance(
        estimator=model,
        X=X,
        y=y,
        scoring=scoring,
        n_repeats=n_repeats,
        random_state=random_state,
        n_jobs=-1
    )

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance_Mean': result.importances_mean,
        'Importance_Std': result.importances_std
    }).sort_values('Importance_Mean', ascending=True)

    if show_plot:
        plt.figure(figsize=plot_size)
        colors = plt.cm.plasma(np.linspace(0.2, 0.8, len(importance_df)))

        bars = plt.barh(
            importance_df['Feature'],
            importance_df['Importance_Mean'],
            xerr=importance_df['Importance_Std'],
            color=colors,
            alpha=0.9,
            capsize=4
        )

        plt.xlabel('Mean Importance Score', fontsize=12)
        plt.title('Permutation Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + np.sign(width) * 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Permutation Importance\n"
            f"Scoring: {scoring}\n"
            f"Repeats: {n_repeats}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

--------

## 3.5.8 PDP

<details>
<summary style="cursor: pointer">
<h2> { Understanding Partial Dependence Plots (PDP) } </h2>
</summary>
<h3> What is PDP? </h3>
<p> PDPs show the marginal effect of a feature (or a pair) on the predicted outcome, averaging over the values of all other features.</p>
<h3> Its role in Feature Interpretation: </h3>
<ul>
    <li> Not a selection method in itself, but it helps visualize whether a feature's relationship with the prediction is linear, monotonic, or more complex.</li>
    <li> Useful for understanding global feature effect.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://christophm.github.io/interpretable-ml-book/pdp.html" target="_blank">Interpretable ML Book - PDP</a></li>
</ol>
</details>
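
The "averaging over the values of all other features" step can be sketched directly. The snippet below is a minimal illustration, not scikit-learn's implementation: `predict` is a hypothetical stand-in for a fitted model's prediction function, and `partial_dependence_1d` is a helper name invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))

def predict(data):
    """Hypothetical fitted model: quadratic in feature 0, linear in feature 1."""
    return data[:, 0] ** 2 + 2.0 * data[:, 1]

def partial_dependence_1d(predict_fn, data, feature, grid):
    """Average prediction with `feature` forced to each grid value in turn."""
    averaged = []
    for value in grid:
        data_mod = data.copy()
        data_mod[:, feature] = value                  # overwrite the feature in every row
        averaged.append(predict_fn(data_mod).mean())  # average over the other features
    return np.array(averaged)

grid = np.linspace(-1.0, 1.0, 5)
pd_curve = partial_dependence_1d(predict, X, feature=0, grid=grid)
print(pd_curve)
```

Because this toy model is additive, the curve is the quadratic in feature 0 shifted by a constant (the average contribution of feature 1), which is the shape a PDP is meant to reveal.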

##### Parameters:
- X: Features (DataFrame)
- model: A fitted model
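- features: List of feature names. Note that `partial_dependence` treats the whole list as one set of interacting features, while `PartialDependenceDisplay.from_estimator` draws a one-way plot per list entry and expects a tuple such as ('a', 'b') inside the list for a 2D PDP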
- grid_resolution: Number of points to plot along feature axes
- kind: 'average' (default) or 'individual' for ICE plots
- show_plot: Whether to display the PDP plot
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- PDP results (as Bunch object)
- Displays the PDP plot if show_plot=True

In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay, partial_dependence

def partial_dependence_plot(X,
                             model,
                             features,
                             grid_resolution=100,
                             kind='average',
                             show_plot=True,
                             plot_size=(12, 8)):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    # Calculate Partial Dependence
    pdp_result = partial_dependence(
        estimator=model,
        X=X,
        features=features,
        grid_resolution=grid_resolution,
        kind=kind
    )

    if show_plot:
        fig, ax = plt.subplots(figsize=plot_size)
        display = PartialDependenceDisplay.from_estimator(
            estimator=model,
            X=X,
            features=features,
            grid_resolution=grid_resolution,
            kind=kind,
            ax=ax
        )

        title_text = (
            f"Method: Partial Dependence Plot\n"
            f"Features: {features}\n"
            f"Kind: {kind}"
        )
        plt.suptitle(title_text, fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()

    return pdp_result

--------

## 3.5.9 ReliefF

<details>
<summary style="cursor: pointer">
<h2> { Understanding ReliefF Algorithm } </h2>
</summary>
<h3> What is ReliefF? </h3>
<p> ReliefF evaluates the importance of features based on how well they differentiate between instances that are near each other.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Works well with noisy, multi-class, and non-linear data.</li>
    <li> Ranks features by how well they separate similar/dissimilar instances.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://medium.datadriveninvestor.com/feature-selection-with-relieff-algorithm-96f8cd30c5e3" target="_blank">ReliefF Explained</a></li>
</ol>
</details>
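
To make the "near each other" idea concrete, here is a sketch of the simpler original Relief scheme (one nearest hit and one nearest miss per instance, binary target). It is an assumption-laden illustration, not the ReliefF variant from `skrebate` used below, which uses several neighbors and handles multi-class targets. The function name `relief_scores` is hypothetical.

```python
import numpy as np

def relief_scores(X, y):
    """Basic Relief: one nearest hit and one nearest miss per instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    n_samples, n_features = Xs.shape
    weights = np.zeros(n_features)
    for i in range(n_samples):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every row
        dist[i] = np.inf                       # never pick the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class row
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class row
        # Reward features that differ on the miss, penalise those that differ on the hit
        weights += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return weights / n_samples

rng = np.random.default_rng(0)
informative = rng.normal(size=300)
noise = rng.normal(size=300)
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)  # the label depends only on the first column

scores = relief_scores(X, y)
print(scores.round(3))
```

On this synthetic data the first column should receive the clearly larger weight, since nearest misses always lie across the decision boundary along that column.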

##### Parameters:
- X: Features (DataFrame)
- y: Target values (Series or array)
- n_neighbors: Number of neighbors to consider in ReliefF
- n_features_to_select: Number of top features to keep (optional; if None, return all scores)
- discrete_threshold: Threshold to treat features as discrete (optional)
- show_plot: Whether to display feature importance plot
- plot_size: Tuple indicating plot size (width, height)
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame of feature importances (sorted)
- Displays a feature importance plot if show_plot=True

In [22]:
!pip install skrebate



In [23]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.utils import check_random_state
from skrebate import ReliefF  # Requires `scikit-rebate` library

def relieff_feature_importance(X,
                                y,
                                n_neighbors=100,
                                n_features_to_select=None,
                                discrete_threshold=None,
                                show_plot=True,
                                plot_size=(12, 8),
                                random_state=42):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

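    # skrebate compares discrete_threshold numerically, so pass it only when a
    # value is given and otherwise keep the library default
    relieff_kwargs = {'n_neighbors': n_neighbors}
    if discrete_threshold is not None:
        relieff_kwargs['discrete_threshold'] = discrete_threshold
    relieff = ReliefF(**relieff_kwargs)

    # Plain NumPy arrays avoid index-alignment surprises with pandas objects
    relieff.fit(X.values, np.asarray(y))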

    importance_scores = relieff.feature_importances_
    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': importance_scores
    }).sort_values('Importance', ascending=True)

    if n_features_to_select:
        importance_df = importance_df.tail(n_features_to_select)

    if show_plot:
        plt.figure(figsize=plot_size)
        colors = plt.cm.cividis(np.linspace(0.2, 0.8, len(importance_df)))

        bars = plt.barh(
            importance_df['Feature'],
            importance_df['Importance'],
            color=colors,
            alpha=0.9
        )

        plt.xlabel('ReliefF Importance Score', fontsize=12)
        plt.title('ReliefF Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + np.sign(width) * 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: ReliefF\n"
            f"n_neighbors: {n_neighbors}\n"
            f"Selected Features: {n_features_to_select if n_features_to_select else 'All'}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

--------

## 3.5.10 MRMR (Minimum Redundancy Maximum Relevance)

<details>
<summary style="cursor: pointer">
<h2> { Understanding MRMR (Minimum Redundancy Maximum Relevance) } </h2>
</summary>
<h3> What is MRMR? </h3>
<p> MRMR selects features that are highly relevant to the target and minimally redundant with each other.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Balances relevance and diversity among features.</li>
    <li> Often used in high-dimensional datasets like genomics.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://elliot-weissberg.medium.com/another-feature-selection-algorithm-mrmr-3827b6b19e33" target="_blank">MRMR</a></li>
</ol>
</details>
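
The trade-off between relevance and redundancy can be seen on a toy example. The sketch below is a simplification under stated assumptions: absolute Pearson correlation stands in for both the relevance and redundancy measures, and the feature names `a`, `a_copy`, and `b` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
a_copy = a + rng.normal(scale=0.01, size=n)  # near-duplicate of `a`
y = 2.0 * a + 1.0 * b + rng.normal(scale=0.1, size=n)

features = {"a": a, "a_copy": a_copy, "b": b}
names = list(features)

def abs_corr(u, v):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(u, v)[0, 1])

# Relevance: |correlation| with the target (a stand-in for mutual information)
relevance = {name: abs_corr(features[name], y) for name in names}

selected = []
while len(selected) < 2:
    best_name, best_score = None, -np.inf
    for name in names:
        if name in selected:
            continue
        # Redundancy: mean |correlation| with the features already chosen
        if selected:
            redundancy = np.mean([abs_corr(features[name], features[s]) for s in selected])
        else:
            redundancy = 0.0
        score = relevance[name] - redundancy
        if score > best_score:
            best_name, best_score = name, score
    selected.append(best_name)

print(selected)
```

A ranking by relevance alone would put `a` and `a_copy` in the top two. The redundancy penalty makes `b` the second pick instead, because the near-duplicate adds almost no new information.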

##### Parameters:
- X: Features (DataFrame)
- y: Target values (Series or array)
- n_features_to_select: Number of top features to select
- method: Scoring method for relevance ('mi' for mutual information or 'f' for F-statistic; any other value raises ValueError)
- discrete_features: Whether features are discrete (important for mutual information)
- show_plot: Whether to display feature importance plot
- plot_size: Tuple indicating plot size (width, height)
- random_state: Random seed for reproducibility

##### Returns:
- List of selected feature names (ranked by MRMR)
- DataFrame of feature scores (relevance, redundancy, combined score)
- Displays a feature importance plot if show_plot=True

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression, f_classif, f_regression
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import check_random_state

def mrmr_feature_selection(X,
                            y,
                            n_features_to_select=10,
                            method='mi',
                            discrete_features='auto',
                            show_plot=True,
                            plot_size=(12, 8),
                            random_state=42):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

    if method == 'mi':
        # Heuristic: a target with at most 10 distinct values is treated as classification
        if len(np.unique(y)) <= 10:
            relevance = mutual_info_classif(X, y, discrete_features=discrete_features, random_state=random_state)
        else:
            relevance = mutual_info_regression(X, y, discrete_features=discrete_features, random_state=random_state)
    elif method == 'f':
        if len(np.unique(y)) <= 10:
            relevance, _ = f_classif(X, y)
        else:
            relevance, _ = f_regression(X, y)
    else:
        raise ValueError("Invalid method. Choose 'mi' or 'f'.")

    relevance_scores = pd.Series(relevance, index=X.columns)

    selected_features = []
    candidate_features = list(X.columns)
    selection_records = []  # scores as they stood when each feature was picked

    # Redundancy proxy: absolute Pearson correlation between features
    redundancy_matrix = X.corr().abs()

    while len(selected_features) < n_features_to_select and candidate_features:
        scores = {}
        redundancies = {}
        for feature in candidate_features:
            if selected_features:
                redundancy = redundancy_matrix.loc[feature, selected_features].mean()
            else:
                redundancy = 0.0

            redundancies[feature] = redundancy
            scores[feature] = relevance_scores[feature] - redundancy

        best_feature = max(scores, key=scores.get)
        selection_records.append({
            'Feature': best_feature,
            'Relevance': relevance_scores[best_feature],
            'Redundancy': redundancies[best_feature],
            'MRMR_Score': scores[best_feature]
        })
        selected_features.append(best_feature)
        candidate_features.remove(best_feature)

    # Report the scores used during selection, which exclude each feature's
    # correlation with itself
    final_scores = pd.DataFrame(
        selection_records,
        columns=['Feature', 'Relevance', 'Redundancy', 'MRMR_Score']
    ).sort_values('MRMR_Score', ascending=True)

    if show_plot:
        plt.figure(figsize=plot_size)
        colors = plt.cm.plasma(np.linspace(0.3, 0.9, len(final_scores)))

        bars = plt.barh(
            final_scores['Feature'],
            final_scores['MRMR_Score'],
            color=colors,
            alpha=0.9
        )

        plt.xlabel('MRMR Score (Relevance - Redundancy)', fontsize=12)
        plt.title('MRMR Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + np.sign(width) * 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: MRMR ({method.upper()})\n"
            f"Features Selected: {n_features_to_select}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return selected_features, final_scores

--------

## 3.5.11 Leave-One-Covariate-Out (LOCO)

<details>
<summary style="cursor: pointer">
<h2> { Understanding LOCO (Leave-One-Covariate-Out) } </h2>
</summary>
<h3> What is LOCO? </h3>
<p> LOCO measures a feature's importance by training a model without it and checking how much the performance drops.</p>
<h3> Its role in Feature Importance: </h3>
<ul>
    <li> Similar to drop-column importance, but framed as a statistical inference procedure.</li>
    <li> Useful for creating confidence intervals for importance.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://arxiv.org/abs/1806.03827" target="_blank">LOCO Paper</a></li>
</ol>
</details>
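
The retrain-without-one-feature loop can be sketched with NumPy alone. This is an illustration under simplifying assumptions: synthetic data, a least-squares fit in place of an arbitrary model, and in-sample mean squared error as the score (the LOCO paper works with held-out data). The helper name `fit_and_score` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
# Synthetic target (assumption): feature 0 matters most, feature 2 not at all
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

def fit_and_score(X_fit, y_fit):
    """Refit least squares on the given columns and return its mean squared error."""
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return float(np.mean((y_fit - X_fit @ coef) ** 2))

baseline_mse = fit_and_score(X, y)

loco = {}
for j in range(X.shape[1]):
    X_reduced = np.delete(X, j, axis=1)                    # leave covariate j out
    loco[j] = fit_and_score(X_reduced, y) - baseline_mse   # error increase after refitting

print({j: round(v, 3) for j, v in loco.items()})
```

Each importance is the increase in error after a full refit, so a feature whose information can be recovered from the remaining columns scores low even if the original model relied on it.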

##### Parameters:
- model: A fitted predictive model (must implement predict or predict_proba)
- X: Features (DataFrame)
- y: Target values (Series or array)
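- scoring: Scoring callable invoked as scoring(y_true, y_pred, sample_weight=...). Classifiers that expose predict_proba are scored on class probabilities (e.g., log_loss with higher_is_better=False); other models are scored on predict output (e.g., r2_score)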
- task_type: 'classification' or 'regression'
- higher_is_better: Whether a higher score indicates better model performance
- show_plot: Whether to plot LOCO feature importances
- plot_size: Tuple indicating plot size (width, height)
- sample_weight: Optional sample weights for scoring
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame with feature importance scores (performance drop after removing feature)
- Displays LOCO feature importance plot if show_plot=True

In [25]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.utils import check_random_state
from copy import deepcopy

def loco_feature_importance(model,
                             X,
                             y,
                             scoring,
                             task_type='classification',
                             higher_is_better=True,
                             show_plot=True,
                             plot_size=(12, 8),
                             sample_weight=None,
                             random_state=42):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    random_state = check_random_state(random_state)

    X = X.reset_index(drop=True)
    y = pd.Series(y).reset_index(drop=True)

    # Helper: probabilities for classifiers that expose them, otherwise predictions
    def _predict(fitted_model, data):
        if task_type == 'classification' and hasattr(fitted_model, "predict_proba"):
            return fitted_model.predict_proba(data)
        return fitted_model.predict(data)

    y_true = y

    # Baseline score from the supplied fitted model (full feature set)
    baseline_score = scoring(y_true, _predict(model, X), sample_weight=sample_weight)

    importance_dict = {}

    for feature in X.columns:
        X_temp = X.drop(columns=[feature])

        # LOCO retrains the model without the feature; the original model
        # cannot predict on a reduced column set. deepcopy leaves the caller's
        # model untouched, and fit() retrains the copy.
        reduced_model = deepcopy(model)
        reduced_model.fit(X_temp, y)
        score_temp = scoring(y_true, _predict(reduced_model, X_temp), sample_weight=sample_weight)

        importance = (baseline_score - score_temp) if higher_is_better else (score_temp - baseline_score)
        importance_dict[feature] = importance

    importance_df = pd.DataFrame({
        'Feature': list(importance_dict.keys()),
        'LOCO_Importance': list(importance_dict.values())
    }).sort_values('LOCO_Importance', ascending=False).reset_index(drop=True)

    if show_plot:
        plt.figure(figsize=plot_size)
        colors = plt.cm.inferno(np.linspace(0.3, 0.9, len(importance_df)))

        bars = plt.barh(
            importance_df['Feature'],
            importance_df['LOCO_Importance'],
            color=colors,
            alpha=0.9
        )

        plt.xlabel('Decrease in Score After Removing Feature', fontsize=12)
        plt.title('LOCO Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + np.sign(width) * 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Leave-One-Covariate-Out (LOCO)\n"
            f"Task: {task_type.title()}\n"
            f"Higher Score Better: {higher_is_better}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

--------

## 3.5.12 Forward/Backward Sequential Feature Selection

<details>
<summary style="cursor: pointer">
<h2> { Understanding Forward and Backward Selection } </h2>
</summary>
<h3> What is Forward/Backward Selection? </h3>
<p> Forward selection starts with no features and adds the best one at each step. Backward elimination starts with all features and removes the least useful one at each step.</p>
<h3> Its role in Feature Selection: </h3>
<ul>
    <li> Simple and greedy heuristic for selecting a subset of features.</li>
    <li> Often used with cross-validation for scoring.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/whats_new/v0.24.html#id13" target="_blank">sklearn Sequential Feature Selector</a></li>
</ol>
</details>
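
The greedy forward step can be sketched in a few lines. The example below rests on several assumptions: synthetic data, least squares as the model, and in-sample mean squared error as the score (cross-validated scoring would be preferable in practice). The helper name `mse_with` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
# Synthetic target (assumption): driven mainly by column 2, then column 0
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

def mse_with(columns):
    """In-sample mean squared error of least squares on the chosen columns."""
    X_sub = X[:, columns]
    coef, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    return float(np.mean((y - X_sub @ coef) ** 2))

selected, remaining = [], list(range(X.shape[1]))
while len(selected) < 2:
    # Greedy step: try each remaining feature and keep the one with the lowest error
    best = min(remaining, key=lambda j: mse_with(selected + [j]))
    selected.append(best)
    remaining.remove(best)

print(selected)
```

Backward elimination mirrors this loop: start from all columns and, at each step, drop the column whose removal hurts the score least.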

##### Parameters:
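- model: An estimator implementing fit and predict/predict_proba (it is cloned and refit on every candidate subset, so it does not need to be fitted beforehand)
- X: Features (DataFrame)
- y: Target values (Series or array)
- direction: 'forward' or 'backward' selection
- scoring: Scoring callable (required) invoked as scoring(y_true, y_pred, sample_weight=...); classifiers with predict_proba are scored on class probabilities (e.g., log_loss with higher_is_better=False), other models on predictions (e.g., r2_score)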
- n_features_to_select: Number of features to select (optional, defaults to half)
- task_type: 'classification' or 'regression'
- higher_is_better: Whether a higher score indicates better model performance
- show_plot: Whether to plot feature selection process
- plot_size: Tuple indicating plot size (width, height)
- sample_weight: Optional sample weights for scoring
- random_state: Random seed for reproducibility

##### Returns:
- List of selected feature names
- DataFrame of feature selection progression (feature, score)
- Displays the feature selection plot if show_plot=True

In [26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.base import clone
from sklearn.utils import check_random_state

def sequential_feature_selection(model,
                                  X,
                                  y,
                                  direction='forward',
                                  scoring=None,
                                  n_features_to_select=None,
                                  task_type='classification',
                                  higher_is_better=True,
                                  show_plot=True,
                                  plot_size=(12, 8),
                                  sample_weight=None,
                                  random_state=42):

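    # scoring is called directly below, so the default None cannot be used
    if scoring is None:
        raise ValueError("scoring must be a callable: scoring(y_true, y_pred, sample_weight=None).")

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)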

    random_state = check_random_state(random_state)
    X = X.reset_index(drop=True)
    y = pd.Series(y).reset_index(drop=True)

    all_features = list(X.columns)
    selected_features = [] if direction == 'forward' else all_features.copy()
    remaining_features = all_features.copy()

    if n_features_to_select is None:
        n_features_to_select = max(1, int(len(all_features) / 2))

    selection_progress = []

    while (len(selected_features) < n_features_to_select) if direction == 'forward' else (len(selected_features) > n_features_to_select):
        best_score = -np.inf if higher_is_better else np.inf
        best_feature = None

        search_features = remaining_features if direction == 'forward' else selected_features

        for feature in search_features:
            if direction == 'forward':
                trial_features = selected_features + [feature]
            else:
                trial_features = [f for f in selected_features if f != feature]

            X_trial = X[trial_features]

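            temp_model = clone(model)
            # Pass sample_weight only when given; not every estimator's fit accepts it
            if sample_weight is not None:
                temp_model.fit(X_trial, y, sample_weight=sample_weight)
            else:
                temp_model.fit(X_trial, y)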
            
            if task_type == 'classification' and hasattr(temp_model, "predict_proba"):
                y_pred = temp_model.predict_proba(X_trial)
            else:
                y_pred = temp_model.predict(X_trial)

            trial_score = scoring(y, y_pred, sample_weight=sample_weight)

            is_better = (trial_score > best_score) if higher_is_better else (trial_score < best_score)

            if is_better:
                best_score = trial_score
                best_feature = feature

        if best_feature is not None:
            if direction == 'forward':
                selected_features.append(best_feature)
                remaining_features.remove(best_feature)
            else:
                selected_features.remove(best_feature)

            selection_progress.append((best_feature, best_score))
        else:
            break

    progression_df = pd.DataFrame(selection_progress, columns=['Feature', 'Score'])

    if show_plot:
        plt.figure(figsize=plot_size)
        plt.plot(range(1, len(progression_df) + 1),
                 progression_df['Score'],
                 marker='o', linestyle='-',
                 color='teal')
        plt.xticks(range(1, len(progression_df) + 1), progression_df['Feature'], rotation=90)
        plt.xlabel('Features Selected', fontsize=12)
        plt.ylabel('Score', fontsize=12)
        plt.title(f"Sequential Feature Selection ({direction.title()})", fontsize=14, pad=20)
        plt.grid(True, linestyle='--', alpha=0.6)

        method_text = (
            f"Method: Sequential Feature Selection\n"
            f"Direction: {direction.title()}\n"
            f"Selected Features: {len(selected_features)}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return selected_features, progression_df

--------