--> To put content in a collapsible format, we can use the following snippet in Markdown cells:

<details>
<summary style="cursor: pointer">
<b> double click the markdown to see the code instead of this </b>
</summary>

# 3.8 Meta / Consensus Techniques
-- Aggregate rankings across multiple methods

- Rank Aggregation Across Methods
- Weighted Voting of Feature Ranks
- Clustering Importance Scores
- Consensus Stability Score across folds/methods
- SHAP + RFE + Correlation Overlay (hybrid consensus)

----------

## 3.8.1 Rank Aggregation Across Methods

<details>
<summary style="cursor: pointer">
<h2> { Understanding Rank Aggregation Across Methods } </h2>
</summary>
<h3> What is Rank Aggregation? </h3>
<p> Rank aggregation refers to the process of combining feature importance rankings from multiple methods (e.g., SHAP, Decision Trees, Lasso Regression) to obtain a final consensus ranking.</p>
<h3> Role in Feature Selection: </h3>
<ul>
    <li> Improves robustness by combining different perspectives on feature importance.</li>
    <li> Reduces bias that may arise from a single method.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.arxiv.org/abs/2204.12563" target="_blank">Combining multiple feature selection algorithms</a></li>
</ol>
</details>
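
As a rough illustration of the idea, the sketch below ranks four features under three methods and averages the ranks. The feature names, method names, and scores are made up for the example; only the rank-then-average step reflects the technique described above.

```python
import pandas as pd

# Hypothetical importance scores from three methods (illustrative values only)
scores = pd.DataFrame(
    {
        "shap":  [0.40, 0.30, 0.20, 0.10],
        "tree":  [0.15, 0.50, 0.25, 0.10],
        "lasso": [0.35, 0.40, 0.05, 0.20],
    },
    index=["age", "income", "tenure", "region"],
)

# Rank within each method: 1 = most important, ties share the average rank
ranks = scores.rank(ascending=False, method="average")

# Consensus ranking = mean rank across methods (lower is better)
consensus = ranks.mean(axis=1).sort_values()
print(consensus)
```

Here `income` comes out first because it is ranked 2, 1, and 1 by the three methods, even though `age` has the highest score under one of them.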

##### Parameters:
- feature_importances_list: List of DataFrames (one per method), each with 'Feature' and 'Importance' columns, where a higher importance means a more important feature
- aggregation_method: Strategy to combine ranks (choices: 'mean', 'median', 'geometric_mean', 'borda_count')
- normalize: Whether to normalize individual importance scores before ranking
- show_plot: Whether to plot the final aggregated feature importance
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Aggregated feature importance DataFrame
- Displays a plot of aggregated feature rankings if show_plot=True
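
A small sketch of what the `normalize` option does, assuming the min-max scaling used in this section. The scores are hypothetical. Because min-max scaling is monotonic, it should not change the order of features within a single method; it mainly puts methods on a common 0 to 1 scale before any missing features are filled with 0.

```python
import pandas as pd

# Hypothetical raw importance scores from one method
imp = pd.Series({"age": 12.0, "income": 30.0, "tenure": 3.0})

# Min-max scaling to [0, 1); the small constant guards against division by zero
scaled = (imp - imp.min()) / (imp.max() - imp.min() + 1e-9)

# The within-method ranking is unchanged by the scaling
same_order = imp.rank(ascending=False).equals(scaled.rank(ascending=False))
print(scaled)
print(same_order)
```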

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gmean

def rank_aggregation_feature_importance(feature_importances_list,
                                         aggregation_method='mean',
                                         normalize=True,
                                         show_plot=True,
                                         plot_size=(12, 8)):

    if not feature_importances_list:
        raise ValueError("The feature_importances_list cannot be empty.")

    all_features = feature_importances_list[0]['Feature'].tolist()

    importance_matrix = pd.DataFrame(index=all_features)

    for idx, df in enumerate(feature_importances_list):
        imp = df.set_index('Feature')['Importance']
        if normalize:
            imp = (imp - imp.min()) / (imp.max() - imp.min() + 1e-9)
        importance_matrix[f'Method_{idx+1}'] = imp

    # Replace missing values with 0 (if any feature missing in a method)
    importance_matrix = importance_matrix.fillna(0)

    # Ranking
    ranks = importance_matrix.rank(ascending=False, method='average')

    # Aggregation
    if aggregation_method == 'mean':
        aggregated_scores = ranks.mean(axis=1)
    elif aggregation_method == 'median':
        aggregated_scores = ranks.median(axis=1)
    elif aggregation_method == 'geometric_mean':
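        # gmean returns a NumPy array, so wrap it in a Series to keep the feature index;
        # ranks are always >= 1, so no zero guard is needed
        aggregated_scores = pd.Series(gmean(ranks, axis=1), index=ranks.index)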
    elif aggregation_method == 'borda_count':
        aggregated_scores = ranks.sum(axis=1)
    else:
        raise ValueError("Unsupported aggregation_method. Choose from 'mean', 'median', 'geometric_mean', 'borda_count'.")

    final_importance_df = pd.DataFrame({
        'Feature': aggregated_scores.index,
        'Aggregated_Rank': aggregated_scores
    }).sort_values('Aggregated_Rank', ascending=True).reset_index(drop=True)

    if show_plot and not final_importance_df.empty:
        plt.figure(figsize=plot_size)
        colors = plt.cm.cividis(np.linspace(0.2, 1, len(final_importance_df)))

        bars = plt.barh(final_importance_df['Feature'],
                        -final_importance_df['Aggregated_Rank'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('-(Aggregated Rank)', fontsize=12)
        plt.title('Rank Aggregation (Feature Importance Across Methods)', fontsize=14, pad=20)
        plt.gca().invert_yaxis()
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width - 0.2,
                     bar.get_y() + bar.get_height() / 2,
                     f'{-width:.2f}',  # keep decimals: aggregated ranks are rarely integers
                     va='center',
                     fontsize=9,
                     color='white')

        method_text = (
            f"Aggregation: {aggregation_method.capitalize()}\n"
            f"Normalization: {'Yes' if normalize else 'No'}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return final_importance_df

-------

## 3.8.2 Weighted Voting of Feature Ranks

<details>
<summary style="cursor: pointer">
<h2> { Understanding Weighted Voting of Feature Ranks } </h2>
</summary>
<h3> What is Weighted Voting of Feature Ranks? </h3>
<p> Weighted voting of feature ranks combines rankings from several feature selection techniques, giving each method a weight that reflects its performance or reliability.</p>
<h3> Role in Feature Selection: </h3>
<ul>
    <li> Assigns greater importance to methods that have shown higher predictive performance.</li>
    <li> Provides a more balanced and context-sensitive final ranking of features.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://en.wikipedia.org/wiki/Weighted_voting" target="_blank">Feature Selection Techniques: Weighted Voting</a></li>
</ol>
</details>
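
The sketch below shows one way the weighting could work on a toy example. The ranks, the weights, and the `weighted_median` helper are assumptions made for illustration, not part of any library.

```python
import numpy as np
import pandas as pd

# Hypothetical per-method ranks (1 = best) and assumed method reliabilities
ranks = pd.DataFrame(
    {"shap": [1, 2, 3], "rfe": [3, 1, 2], "corr": [2, 3, 1]},
    index=["age", "income", "tenure"],
)
weights = np.array([0.6, 0.3, 0.1])

# Weighted mean rank: methods with larger weights pull the consensus harder
weighted_mean_rank = (ranks * weights).sum(axis=1) / weights.sum()


def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total."""
    order = np.argsort(values)
    sorted_values = np.asarray(values)[order]
    cum_weights = np.cumsum(np.asarray(weights, dtype=float)[order])
    return sorted_values[np.searchsorted(cum_weights, 0.5 * cum_weights[-1])]


weighted_median_rank = ranks.apply(
    lambda row: weighted_median(row.values, weights), axis=1
)
print(weighted_mean_rank.sort_values())
print(weighted_median_rank)
```

With these weights the heavily weighted `shap` column dominates: `age` keeps the best weighted mean rank (1.7) although the other two methods rank it second and third.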

##### Parameters:
- feature_importances_list: List of DataFrames, each containing 'Feature' and 'Importance' columns
- method_weights: List of floats assigning relative importance to each method (must match feature_importances_list length)
- normalize: Whether to normalize individual importance scores before ranking
- aggregation_rule: How to combine (choices: 'weighted_mean', 'weighted_median')
- show_plot: Whether to plot the final weighted voting feature ranking
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- Final weighted aggregated feature importance DataFrame
- Displays a plot of weighted feature rankings if show_plot=True

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def weighted_voting_feature_importance(feature_importances_list,
                                        method_weights,
                                        normalize=True,
                                        aggregation_rule='weighted_mean',
                                        show_plot=True,
                                        plot_size=(12, 8)):

    if len(feature_importances_list) != len(method_weights):
        raise ValueError("Length of method_weights must match number of feature importance lists.")

    all_features = feature_importances_list[0]['Feature'].tolist()
    importance_matrix = pd.DataFrame(index=all_features)

    for idx, df in enumerate(feature_importances_list):
        imp = df.set_index('Feature')['Importance']
        if normalize:
            imp = (imp - imp.min()) / (imp.max() - imp.min() + 1e-9)
        importance_matrix[f'Method_{idx+1}'] = imp

    importance_matrix = importance_matrix.fillna(0)

    # Rank each method
    ranks = importance_matrix.rank(ascending=False, method='average')

    # Apply weights
    weighted_ranks = ranks.multiply(method_weights, axis=1)

    if aggregation_rule == 'weighted_mean':
        aggregated_scores = weighted_ranks.sum(axis=1) / np.sum(method_weights)
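    elif aggregation_rule == 'weighted_median':
        # True weighted median of the raw ranks: the rank at which the cumulative
        # method weight first reaches half of the total weight
        weights_arr = np.asarray(method_weights, dtype=float)

        def _weighted_median(row):
            order = np.argsort(row.values)
            cum_weights = np.cumsum(weights_arr[order])
            return row.values[order][np.searchsorted(cum_weights, 0.5 * cum_weights[-1])]

        aggregated_scores = ranks.apply(_weighted_median, axis=1)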
    else:
        raise ValueError("aggregation_rule must be either 'weighted_mean' or 'weighted_median'.")

    final_importance_df = pd.DataFrame({
        'Feature': aggregated_scores.index,
        'Weighted_Aggregated_Rank': aggregated_scores
    }).sort_values('Weighted_Aggregated_Rank', ascending=True).reset_index(drop=True)

    if show_plot and not final_importance_df.empty:
        plt.figure(figsize=plot_size)
        colors = plt.cm.viridis(np.linspace(0.2, 1, len(final_importance_df)))

        bars = plt.barh(final_importance_df['Feature'],
                        -final_importance_df['Weighted_Aggregated_Rank'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('-(Weighted Aggregated Rank)', fontsize=12)
        plt.title('Weighted Voting (Feature Importance by Aggregated Ranks)', fontsize=14, pad=20)
        plt.gca().invert_yaxis()
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width - 0.2,
                     bar.get_y() + bar.get_height() / 2,
                     f'{-width:.2f}',  # keep decimals: weighted ranks are rarely integers
                     va='center',
                     fontsize=9,
                     color='white')

        method_text = (
            f"Aggregation: {aggregation_rule.capitalize()}\n"
            f"Normalization: {'Yes' if normalize else 'No'}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return final_importance_df

-------

## 3.8.3 Clustering Importance Scores

<details>
<summary style="cursor: pointer">
<h2> { Understanding Clustering Importance Scores } </h2>
</summary>
<h3> What is Clustering Importance? </h3>
<p> Clustering importance scores quantify the contribution of each feature by measuring how well a feature helps in distinguishing or clustering the data points into different groups.</p>
<h3> Role in Feature Ranking: </h3>
<ul>
    <li> Helps in identifying features that significantly impact clustering quality.</li>
    <li> Can be derived from clustering algorithms like K-means or DBSCAN, by evaluating how changes in features affect cluster separation.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519994/" target="_blank">Clustering Algorithms for Feature Importance</a></li>
</ol>
</details>
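
One possible reading of this idea, sketched on made-up data: treat each feature as a point whose coordinates are its importance scores under several methods, cluster those points, and keep the cluster with the highest average importance. The feature names, method names, and scores below are hypothetical, and agglomerative clustering is used only because it is deterministic.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical normalized importance scores from two methods
scores = pd.DataFrame(
    {"method_a": [0.90, 0.85, 0.10, 0.05], "method_b": [0.80, 0.95, 0.15, 0.10]},
    index=["age", "income", "tenure", "region"],
)

# Group features whose importance profiles look alike
labels = AgglomerativeClustering(n_clusters=2).fit_predict(scores.values)

# Average importance per cluster, then keep the strongest cluster
cluster_mean = scores.mean(axis=1).groupby(labels).mean()
top_cluster = cluster_mean.idxmax()
selected = scores.index[labels == top_cluster].tolist()
print(selected)
```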

##### Parameters:
- feature_importances_list: List of DataFrames, each containing 'Feature' and 'Importance' columns
- clustering_method: Clustering algorithm to use (choices: 'kmeans', 'agglomerative')
- n_clusters: Number of clusters to form
- cluster_selection_criteria: Which cluster to keep (choices: 'highest_mean', 'lowest_mean'), judged by the cluster's average importance across methods
- normalize: Whether to normalize importances before clustering
- show_plot: Whether to plot clustered feature importance
- plot_size: Tuple indicating plot size (width, height)
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame of clustered feature importances
- Displays cluster membership and feature ranking if show_plot=True

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

def clustering_feature_importance(feature_importances_list,
                                   clustering_method='kmeans',
                                   n_clusters=3,
                                   cluster_selection_criteria='highest_mean',
                                   normalize=True,
                                   show_plot=True,
                                   plot_size=(12, 8),
                                   random_state=42):

    if len(feature_importances_list) == 0:
        raise ValueError("feature_importances_list must not be empty.")

    all_features = feature_importances_list[0]['Feature'].tolist()
    importance_matrix = pd.DataFrame(index=all_features)

    for idx, df in enumerate(feature_importances_list):
        imp = df.set_index('Feature')['Importance']
        if normalize:
            imp = (imp - imp.min()) / (imp.max() - imp.min() + 1e-9)
        importance_matrix[f'Method_{idx+1}'] = imp

    importance_matrix = importance_matrix.fillna(0)

    if normalize:
        scaler = StandardScaler()
        data_for_clustering = scaler.fit_transform(importance_matrix)
    else:
        data_for_clustering = importance_matrix.values

    if clustering_method == 'kmeans':
        clustering_model = KMeans(n_clusters=n_clusters, random_state=random_state)
    elif clustering_method == 'agglomerative':
        clustering_model = AgglomerativeClustering(n_clusters=n_clusters)
    else:
        raise ValueError("clustering_method must be either 'kmeans' or 'agglomerative'.")

    cluster_labels = clustering_model.fit_predict(data_for_clustering)

    importance_matrix['Cluster'] = cluster_labels

    # Compute cluster importance means
    cluster_stats = importance_matrix.groupby('Cluster').mean().mean(axis=1)

    if cluster_selection_criteria == 'highest_mean':
        selected_cluster = cluster_stats.idxmax()
    elif cluster_selection_criteria == 'lowest_mean':
        selected_cluster = cluster_stats.idxmin()
    else:
        raise ValueError("cluster_selection_criteria must be 'highest_mean' or 'lowest_mean'.")

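    selected_features = importance_matrix[importance_matrix['Cluster'] == selected_cluster]

    # Name the index so reset_index() yields a 'Feature' column (the unnamed index
    # would otherwise become 'index'), and keep each feature's mean importance
    # across methods so the result can be ranked and plotted
    final_importance_df = (
        selected_features.drop(columns='Cluster').mean(axis=1)
        .rename('Mean_Importance').rename_axis('Feature').reset_index()
        .assign(Cluster=selected_cluster)
        .sort_values('Mean_Importance', ascending=False).reset_index(drop=True)
    )

    if show_plot and not final_importance_df.empty:
        plt.figure(figsize=plot_size)
        colors = plt.cm.tab10(np.linspace(0, 1, n_clusters))

        # Plot the mean importance; a bar of the cluster label carries no information
        plt.barh(final_importance_df['Feature'],
                 final_importance_df['Mean_Importance'],
                 color=[colors[label % 10] for label in final_importance_df['Cluster']],
                 alpha=0.9)

        plt.xlabel('Mean Importance Across Methods', fontsize=12)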
        plt.title(f'Feature Importance by Clustering ({clustering_method.capitalize()})', fontsize=14, pad=20)
        plt.gca().invert_yaxis()
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        plt.tight_layout()
        plt.show()

    return final_importance_df

-------

## 3.8.4 Consensus Stability Score across folds/methods

<details>
<summary style="cursor: pointer">
<h2> { Understanding Consensus Stability Score Across Folds/Methods } </h2>
</summary>
<h3> What is Consensus Stability Score? </h3>
<p> The Consensus Stability Score measures the consistency of feature importance rankings across different data splits (cross-validation folds) or feature selection methods. A high stability score indicates a reliable ranking.</p>
<h3> Role in Feature Selection: </h3>
<ul>
    <li> Helps validate feature importance rankings by testing them across different settings.</li>
    <li> Enhances confidence in the final feature set chosen for model training.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5872818/" target="_blank">Consensus Stability Score</a></li>
</ol>
</details>
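
To make the notion of consistency concrete, here is a minimal sketch of two common ways to score it, assuming hypothetical importance scores from three folds: the Jaccard overlap of the top-k selected sets, and Spearman correlation between rankings (computed here as the Pearson correlation of ranks). The fold names, feature names, and `k=2` are illustrative choices.

```python
from itertools import combinations

import pandas as pd

# Hypothetical importance scores from three cross-validation folds
scores = pd.DataFrame(
    {
        "fold_1": [0.9, 0.7, 0.2, 0.1],
        "fold_2": [0.8, 0.6, 0.3, 0.2],
        "fold_3": [0.7, 0.2, 0.6, 0.1],
    },
    index=["a", "b", "c", "d"],
)
pairs = list(combinations(scores.columns, 2))


def top_k(column, k=2):
    """Set of the k highest-scoring features in one fold."""
    return set(scores[column].nlargest(k).index)


# Jaccard overlap of selected sets, averaged over all pairs of folds
jaccards = [len(top_k(x) & top_k(y)) / len(top_k(x) | top_k(y)) for x, y in pairs]
mean_jaccard = sum(jaccards) / len(jaccards)

# Spearman correlation = Pearson correlation of the within-fold ranks
rho = scores.rank().corr()
mean_spearman = sum(rho.loc[x, y] for x, y in pairs) / len(pairs)

print(round(mean_jaccard, 3), round(mean_spearman, 3))
```

Both numbers approach 1.0 when the folds agree. Jaccard looks only at which features make the cut, while Spearman is sensitive to the full ordering.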

##### Parameters:
- feature_importances_list: List of DataFrames, each with 'Feature' and 'Importance' columns
- threshold: Minimum importance to consider a feature "selected" (optional; 0.5 is used when None)
- stability_metric: Stability computation method ('jaccard', 'spearman')
- normalize: Whether to normalize importance scores before thresholding
- plot_stability: Whether to show a plot of stability scores
- plot_size: Tuple indicating plot size (width, height)
- random_state: Random seed for reproducibility

##### Returns:
- DataFrame with feature stability scores
- Displays a stability plot if plot_stability=True

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
from itertools import combinations

def consensus_stability_score(feature_importances_list,
                               threshold=None,
                               stability_metric='jaccard',
                               normalize=True,
                               plot_stability=True,
                               plot_size=(12, 8),
                               random_state=42):

    if len(feature_importances_list) < 2:
        raise ValueError("Need at least two importance lists for consensus stability.")

    np.random.seed(random_state)

    features = feature_importances_list[0]['Feature'].tolist()

    # Build matrix
    importance_matrix = pd.DataFrame(index=features)

    for idx, df in enumerate(feature_importances_list):
        imp = df.set_index('Feature')['Importance']
        if normalize:
            imp = (imp - imp.min()) / (imp.max() - imp.min() + 1e-9)
        importance_matrix[f'Method_{idx+1}'] = imp

    importance_matrix = importance_matrix.fillna(0)

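    stability_scores = []

    if stability_metric not in ('jaccard', 'spearman'):
        raise ValueError("stability_metric must be 'jaccard' or 'spearman'.")

    # Within-method ranks, used only by the rank-based ('spearman') option
    rank_matrix = importance_matrix.rank(ascending=False, method='average')
    max_rank_gap = max(len(features) - 1, 1)
    cutoff = threshold if threshold is not None else 0.5

    # Compute pairwise stability
    for feature in features:
        values = importance_matrix.loc[feature].values
        feature_ranks = rank_matrix.loc[feature].values
        pairwise_stability = []

        for i, j in combinations(range(len(values)), 2):
            if stability_metric == 'jaccard':
                # Agreement between two methods on whether the feature is selected
                selected1 = values[i] >= cutoff
                selected2 = values[j] >= cutoff
                intersection = int(selected1 and selected2)
                union = int(selected1 or selected2)
                score = intersection / union if union != 0 else 1.0
            else:
                # Spearman's rho is undefined for a single pair of values (it is NaN),
                # so score the feature by its normalized squared rank difference instead
                rank_gap = (feature_ranks[i] - feature_ranks[j]) / max_rank_gap
                score = 1.0 - rank_gap ** 2

            pairwise_stability.append(score)

        stability_scores.append(np.mean(pairwise_stability))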

    stability_df = pd.DataFrame({
        'Feature': features,
        'Stability_Score': stability_scores
    }).sort_values('Stability_Score', ascending=False)

    if plot_stability and not stability_df.empty:
        plt.figure(figsize=plot_size)
        colors = plt.cm.viridis(np.linspace(0.2, 1, len(stability_df)))

        bars = plt.barh(stability_df['Feature'],
                        stability_df['Stability_Score'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Consensus Stability Score', fontsize=12)
        plt.title(f'Consensus Feature Stability ({stability_metric.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)
        plt.gca().invert_yaxis()

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.01,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.2f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: {stability_metric.capitalize()}\n"
            f"Threshold: {threshold if threshold is not None else '0.5'}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return stability_df

-------

## 3.8.5 SHAP + RFE + Correlation Overlay (hybrid consensus)

<details>
<summary style="cursor: pointer">
<h2> { Understanding SHAP + RFE + Correlation Overlay (Hybrid Consensus) } </h2>
</summary>
<h3> What is Hybrid Consensus? </h3>
<p> This method involves combining multiple feature ranking techniques (SHAP, Recursive Feature Elimination (RFE), and correlation-based methods) to obtain a final feature ranking that reflects both model-specific and statistical perspectives.</p>
<h3> Role in Feature Selection: </h3>
<ul>
    <li> Combines global model interpretability (SHAP), recursive feature selection (RFE), and feature redundancy assessment (Correlation) to provide a well-rounded ranking.</li>
    <li> Offers both local (SHAP) and global (RFE, Correlation) insights into feature importance.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://shap.readthedocs.io/en/latest/" target="_blank">SHAP Documentation</a></li>
    <li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html" target="_blank">Scikit-learn RFE</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Correlation" target="_blank">Correlation Methods Overview</a></li>
</ol>
</details>
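
A minimal sketch of how the three signals might be combined, assuming the SHAP importances, RFE ranks, and correlation flags have already been computed. All values below are hypothetical, and the 0.8 penalty factor is an arbitrary illustrative choice. Both components are converted to points where higher means better, so the penalty lowers a feature's standing.

```python
import pandas as pd

# Hypothetical per-feature summaries from the three techniques
summary = pd.DataFrame(
    {
        "Feature": ["age", "income", "tenure", "region"],
        "SHAP_Importance": [0.40, 0.45, 0.20, 0.10],
        "RFE_Rank": [1, 1, 2, 3],  # 1 = kept until the end by RFE
        "Highly_Correlated": [False, True, False, False],
    }
)

# Points: the largest SHAP value and the smallest RFE rank earn the most points
shap_points = summary["SHAP_Importance"].rank(ascending=True)
rfe_points = summary["RFE_Rank"].rank(ascending=False)
summary["Hybrid_Score"] = shap_points + rfe_points

# Redundancy penalty for features flagged as highly correlated
summary.loc[summary["Highly_Correlated"], "Hybrid_Score"] *= 0.8

ranking = summary.sort_values("Hybrid_Score", ascending=False)["Feature"].tolist()
print(ranking)
```

In this toy case `income` has the highest SHAP importance, but the correlation penalty moves it behind `age`.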

##### Parameters:
- model: Trained model (must be compatible with the SHAP explainer and expose coef_ or feature_importances_ so that RFE can rank features)
- X: Features (DataFrame)
- y: Target variable (array-like)
- correlation_threshold: Maximum allowed feature inter-correlation (e.g., 0.9)
- step: Number or fraction of features to remove at each RFE iteration
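- scoring: Scoring metric for RFE (optional; reserved, not used by the current implementation)
- cv: Number of cross-validation folds for RFE (optional; reserved, not used by the current implementation)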
- shap_sample_size: Subsample size for SHAP value computation (optional)
- random_state: Random seed for reproducibility
- plot_result: Whether to plot final hybrid feature importance
- plot_size: Tuple indicating plot size (width, height)

##### Returns:
- DataFrame of hybrid consensus feature importance
- Displays hybrid importance plot if plot_result=True

In [7]:
import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

def hybrid_shap_rfe_correlation(model,
                                 X,
                                 y,
                                 correlation_threshold=0.9,
                                 step=1,
                                 scoring=None,
                                 cv=5,
                                 shap_sample_size=1000,
                                 random_state=42,
                                 plot_result=True,
                                 plot_size=(14, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    np.random.seed(random_state)

    # Step 1: Compute SHAP Importances
    explainer = shap.Explainer(model, X)
    shap_values = explainer(X.sample(n=min(shap_sample_size, len(X)), random_state=random_state))
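    shap_abs = np.abs(shap_values.values)
    if shap_abs.ndim == 3:  # multi-output models: (samples, features, outputs)
        shap_abs = shap_abs.mean(axis=2)
    shap_importance = shap_abs.mean(axis=0)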
    shap_df = pd.DataFrame({
        'Feature': X.columns,
        'SHAP_Importance': shap_importance
    }).sort_values('SHAP_Importance', ascending=False)

    # Step 2: Recursive Feature Elimination
    estimator = clone(model)
    rfe = RFE(estimator=estimator, step=step)
    rfe.fit(X, y)
    rfe_ranking = rfe.ranking_

    rfe_df = pd.DataFrame({
        'Feature': X.columns,
        'RFE_Rank': rfe_ranking
    })

    # Step 3: Correlation Overlay
    corr_matrix = X.corr().abs()
    upper_tri = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    correlated_features = [column for column in upper_tri.columns if any(upper_tri[column] > correlation_threshold)]

    correlation_df = pd.DataFrame({
        'Feature': X.columns,
        'Highly_Correlated': X.columns.isin(correlated_features)
    })

    # Step 4: Combine All
    hybrid_df = shap_df.merge(rfe_df, on='Feature').merge(correlation_df, on='Feature')
    
    # Final scoring: convert both components to points where higher = better
    # (largest SHAP importance and smallest RFE rank earn the most points)
    hybrid_df['Hybrid_Score'] = (
        hybrid_df['SHAP_Importance'].rank(ascending=True) +
        (1 / hybrid_df['RFE_Rank']).rank(ascending=True)
    )

    # Apply a 20% penalty to features flagged as highly correlated
    hybrid_df.loc[hybrid_df['Highly_Correlated'], 'Hybrid_Score'] *= 0.8

    # Best features first
    hybrid_df = hybrid_df.sort_values('Hybrid_Score', ascending=False).reset_index(drop=True)

    if plot_result and not hybrid_df.empty:
        plt.figure(figsize=plot_size)
        colors = plt.cm.coolwarm(np.linspace(0.2, 1, len(hybrid_df)))

        bars = plt.barh(hybrid_df['Feature'],
                        hybrid_df['Hybrid_Score'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Hybrid Feature Score', fontsize=12)
        plt.title('SHAP + RFE + Correlation Overlay (Hybrid Importance)', fontsize=15, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)
        plt.gca().invert_yaxis()

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.01,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.2f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Hybrid Method: SHAP + RFE + Correlation\n"
            f"Correlation Threshold: {correlation_threshold}"
        )
        plt.annotate(method_text,
                     xy=(0.02, 0.02),
                     xycoords='axes fraction',
                     ha='left',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return hybrid_df

-------