--> To put content in a collapsible format, use the following snippet in Markdown cells:

<details>
<summary style="cursor: pointer">
<b> double click the markdown to see the code instead of this </b>
</summary>

-------

# 3.2 Statistical Feature Importance (Correlational Methods)

-- Purely data-driven, pre-modeling filters

- Correlation (Pearson, Spearman, Kendall)
- Mutual Information
- Chi-Square Test (for categorical target)
- ANOVA F-test
- Variance Threshold
- Maximal Information Coefficient (MIC)
- Kolmogorov-Smirnov Statistic

-------------

## 3.2.1 Correlation (Pearson, Spearman, Kendall)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Correlation-based Feature Selection } </h2>
</summary>
<h3> What is Correlation? </h3>
<p> Correlation measures the strength and direction of the relationship between two variables. It helps to determine how changes in one feature reflect in another.</p>
<h3> Types of Correlation Coefficients: </h3>
<ul>
    <li><strong>Pearson:</strong> Measures linear relationships between continuous variables.</li>
    <li><strong>Spearman:</strong> Rank-based; captures monotonic relationships, whether linear or not.</li>
    <li><strong>Kendall:</strong> Rank correlation coefficient that evaluates the ordinal association between two quantities.</li>
</ul>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> Helps identify multicollinearity and remove redundant features.</li>
    <li> A strong correlation with the target variable suggests predictive power (for regression/classification), but only for the kind of relationship the chosen coefficient measures.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=Vfo5le26IhY" target="_blank">Pearson, Spearman, Kendall Explained</a></li>
</ol>
</details>
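
To make the difference between the three coefficients concrete, here is a minimal sketch on synthetic data (the data and variable names are assumptions for illustration, not part of the notebook's pipeline). The target grows monotonically but non-linearly with the feature, so the rank-based coefficients should report a perfect association while Pearson reports a weaker one.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Synthetic data (assumed for illustration): y increases monotonically
# but non-linearly with x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=200)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)        # linear association
spearman_rho, _ = spearmanr(x, y)    # monotonic association (ranks)
kendall_tau, _ = kendalltau(x, y)    # concordant vs. discordant pairs

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_rho:.3f}")
print(f"Kendall:  {kendall_tau:.3f}")
```

If a feature ranks low under Pearson but high under Spearman or Kendall, that is a hint the relationship is monotonic but not linear.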

---------

### 3.2.1.1 Pearson Correlation

Ranks features by the absolute Pearson correlation between each feature and the target.

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like)
- task_type: 'classification' or 'regression'
- top_n_features: Number of top features to display
- handle_categorical: Encode categorical features automatically
- show_plot: Whether to display the bar plot
- plot_size: Size of the bar plot (width, height)

##### Returns:

- DataFrame containing features and their absolute Pearson correlation
- Displays a styled bar plot (if show_plot=True)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from sklearn.preprocessing import LabelEncoder

def pearson_feature_importance(X, y,
                                task_type='classification',
                                top_n_features=20,
                                handle_categorical=True,
                                show_plot=True,
                                plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    if task_type == 'classification' and not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

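    importance_scores = []
    for feature in X.columns:
        try:
            corr, _ = pearsonr(X[feature], y)
        except (ValueError, TypeError):
            # Non-numeric or mismatched input: treat the feature as uninformative
            corr = np.nan
        # Constant features produce NaN rather than an exception; score them as 0.0
        importance_scores.append(0.0 if np.isnan(corr) else abs(corr))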

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': importance_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.coolwarm(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Pearson Correlation', fontsize=12)
        plt.title(f'Pearson Correlation Feature Importance\n(Task: {task_type.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: |Pearson Correlation|\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

### 3.2.1.2 Spearman Correlation

<details>
<summary style="cursor: pointer">
<b> Understanding Spearman Correlation and its role in Feature Importance </b>
</summary>

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like)
- task_type: 'classification' or 'regression'
- top_n_features: Number of top features to display
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their absolute Spearman correlation
- Displays a styled bar plot (if show_plot=True)

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
from sklearn.preprocessing import LabelEncoder

def spearman_feature_importance(X, y,
                                 task_type='classification',
                                 top_n_features=20,
                                 handle_categorical=True,
                                 show_plot=True,
                                 plot_size=(12, 8)):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    if task_type == 'classification' and not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

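    importance_scores = []
    for feature in X.columns:
        try:
            corr, _ = spearmanr(X[feature], y)
        except (ValueError, TypeError):
            # Non-numeric or mismatched input: treat the feature as uninformative
            corr = np.nan
        # Constant features produce NaN rather than an exception; score them as 0.0
        importance_scores.append(0.0 if np.isnan(corr) else abs(corr))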

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': importance_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Spearman Correlation', fontsize=12)
        plt.title(f'Spearman Correlation Feature Importance\n(Task: {task_type.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: |Spearman Correlation|\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

### 3.2.1.3 Kendall Correlation

<details>
<summary style="cursor: pointer">
<b> Understanding Kendall Correlation and its role in Feature Importance </b>
</summary>

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like)
- task_type: 'classification' or 'regression'
- top_n_features: Number of top features to display
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their absolute Kendall correlation
- Displays a styled bar plot (if show_plot=True)

In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import kendalltau
from sklearn.preprocessing import LabelEncoder

def kendall_feature_importance(X, y,
                                task_type='classification',
                                top_n_features=20,
                                handle_categorical=True,
                                show_plot=True,
                                plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    if task_type == 'classification' and not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

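    importance_scores = []
    for feature in X.columns:
        try:
            corr, _ = kendalltau(X[feature], y)
        except (ValueError, TypeError):
            # Non-numeric or mismatched input: treat the feature as uninformative
            corr = np.nan
        # Constant features produce NaN rather than an exception; score them as 0.0
        importance_scores.append(0.0 if np.isnan(corr) else abs(corr))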

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': importance_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.viridis(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Absolute Kendall Correlation', fontsize=12)
        plt.title(f'Kendall Correlation Feature Importance\n(Task: {task_type.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: |Kendall Correlation|\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

-----------

## 3.2.2 Mutual Information

<details>
<summary style="cursor: pointer">
<h2> { Understanding Mutual Information for Feature Selection } </h2>
</summary>
<h3> What is Mutual Information? </h3>
<p> Mutual Information (MI) measures how much knowing a feature reduces uncertainty about the target. It is zero only when the two are independent, so it captures any kind of statistical dependence, not just linear relationships.</p>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> High MI implies the feature provides a lot of useful information about the target variable.</li>
    <li> Works for both classification and regression problems.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/feature_selection.html#mutual-information" target="_blank">Mutual Information - scikit-learn</a></li>
</ol>
</details>
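
The claim that MI sees dependence that correlation misses can be checked with a small sketch. The data below is synthetic and assumed for illustration: the target is the square of the feature, a symmetric relationship for which Pearson correlation is close to zero, alongside an unrelated noise feature for comparison.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
noise = rng.normal(size=1000)   # feature unrelated to the target
y = x ** 2                      # symmetric, non-monotonic dependence on x

X = np.column_stack([x, noise])
mi = mutual_info_regression(X, y, random_state=0)
r, _ = pearsonr(x, y)

print(f"Pearson r (x vs y): {r:.3f}")
print(f"MI (x vs y):        {mi[0]:.3f}")
print(f"MI (noise vs y):    {mi[1]:.3f}")
```

MI scores are estimates (scikit-learn uses a nearest-neighbour estimator for continuous data), so they are best read as a ranking rather than as exact quantities.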

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like)
- task_type: 'classification' or 'regression'
- top_n_features: Number of top features to display
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- discrete_features: Whether to treat features as discrete (default 'auto')
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their mutual information scores
- Displays a styled bar plot (if show_plot=True)

In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.preprocessing import LabelEncoder

def mutual_info_feature_importance(X, y,
                                    task_type='classification',
                                    top_n_features=20,
                                    handle_categorical=True,
                                    discrete_features='auto',
                                    show_plot=True,
                                    plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    if task_type == 'classification' and not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

    if task_type == 'classification':
        mi_scores = mutual_info_classif(X, y, discrete_features=discrete_features, random_state=42)
    elif task_type == 'regression':
        mi_scores = mutual_info_regression(X, y, discrete_features=discrete_features, random_state=42)
    else:
        raise ValueError("task_type must be either 'classification' or 'regression'")

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': mi_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.cividis(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Mutual Information Score', fontsize=12)
        plt.title(f'Mutual Information Feature Importance\n(Task: {task_type.capitalize()})', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Mutual Information\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df


----------------

## 3.2.3 Chi-Square Test (for categorical target)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Chi-Square Test for Feature Selection } </h2>
</summary>
<h3> What is the Chi-Square Test? </h3>
<p> The Chi-Square test evaluates if two categorical variables are independent. It compares observed vs. expected frequencies in a contingency table.</p>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> Useful for categorical input and output variables.</li>
    <li> A higher chi-square statistic suggests a stronger dependence between the feature and the target.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=mjSnOaoqeGo&t=107s" target="_blank">Chi-Square Test Explained</a></li>
</ol>
</details>
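
The observed-versus-expected comparison described above can be sketched on a toy contingency table. The column names and counts are hypothetical, and this uses `scipy.stats.chi2_contingency`, the textbook form of the test. The function in this section uses `sklearn.feature_selection.chi2` instead, which computes its statistic from per-class feature sums, so the two values are not directly comparable.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data (hypothetical): is the subscription plan related to churn?
df = pd.DataFrame({
    "plan":    ["basic"] * 50 + ["premium"] * 50,
    "churned": ["yes"] * 35 + ["no"] * 15 + ["yes"] * 10 + ["no"] * 40,
})

# Observed frequencies for every (plan, churned) combination
observed = pd.crosstab(df["plan"], df["churned"])

# correction=False disables Yates' continuity correction for 2x2 tables
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(observed)
print("Expected under independence:\n", expected)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.2e}")
```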

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like) — must be categorical
- top_n_features: Number of top features to display
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their chi-square statistic scores
- Displays a styled bar plot (if show_plot=True)

In [16]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import chi2
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

def chi2_feature_importance(X, y,
                             top_n_features=20,
                             handle_categorical=True,
                             show_plot=True,
                             plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    # chi2 requires non-negative inputs, so scale features to [0, 1];
    # note the statistic is designed for counts or frequencies
    scaler = MinMaxScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

    # Encode target if not numeric
    if not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

    chi2_scores, _ = chi2(X_scaled, y)

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': chi2_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Chi-Square Statistic', fontsize=12)
        plt.title('Chi-Square Feature Importance\n(Classification Only)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.2f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Chi-Square Test\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df


-------------------

## 3.2.4 ANOVA F-test

<details>
<summary style="cursor: pointer">
<h2> { Understanding ANOVA F-Test for Feature Selection } </h2>
</summary>
<h3> What is ANOVA F-Test? </h3>
<p> The ANOVA (Analysis of Variance) F-test measures whether the means of two or more groups are significantly different from each other.</p>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> Used for continuous input features and a categorical output.</li>
    <li> A higher F-score means the feature is more discriminatory with respect to the target class.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/feature_selection.html#f-test-for-feature-selection" target="_blank">ANOVA F-Test - scikit-learn</a></li>
</ol>
</details>
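
As a quick check on what the F-score means, the sketch below (using the Iris dataset bundled with scikit-learn, chosen here only for illustration) recomputes the score of one feature by splitting its values by class and running a one-way ANOVA with `scipy.stats.f_oneway`. The result should match what `f_classif` reports for that feature.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

X, y = load_iris(return_X_y=True)

# One F-statistic (and p-value) per feature
f_scores, p_values = f_classif(X, y)

# Recompute the F-statistic for the first feature by grouping its values by class
groups = [X[y == label, 0] for label in np.unique(y)]
f_manual, _ = f_oneway(*groups)

print("f_classif scores:", np.round(f_scores, 2))
print(f"Manual one-way ANOVA for feature 0: {f_manual:.2f}")
```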

##### Parameters:

- X: Features (DataFrame)
- y: Target variable (Series or array-like) — must be categorical
- top_n_features: Number of top features to display
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their ANOVA F-statistic scores
- Displays a styled bar plot (if show_plot=True)

In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import LabelEncoder, StandardScaler

def anova_f_feature_importance(X, y,
                                top_n_features=20,
                                handle_categorical=True,
                                show_plot=True,
                                plot_size=(12, 8)):
    
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    # Standardize features (the F-statistic is invariant to per-feature scaling, so scores are unchanged)
    scaler = StandardScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

    # Encode target if not numeric
    if not np.issubdtype(np.array(y).dtype, np.number):
        y = LabelEncoder().fit_transform(y)

    f_scores, _ = f_classif(X_scaled, y)

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': f_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.viridis(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('ANOVA F-Statistic', fontsize=12)
        plt.title('ANOVA F-Test Feature Importance\n(Classification Only)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.5,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.1f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: ANOVA F-Test\n"
            f"Handle Categorical: {handle_categorical}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

-----------------

## 3.2.5 Variance Threshold

<details>
<summary style="cursor: pointer">
<h2> { Understanding Variance Threshold for Feature Selection } </h2>
</summary>
<h3> What is Variance Threshold? </h3>
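<p> A simple baseline method that removes features with low variance. A feature that is nearly constant across samples carries little information for telling them apart. Variance depends on a feature's scale, so it is only comparable across features measured on similar scales.</p>
<h3> Its role in Feature Importance:</h3>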
<ul>
    <li> Eliminates features that are mostly constant.</li>
    <li> Especially useful in sparse datasets like text data (bag of words).</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://scikit-learn.org/stable/modules/feature_selection.html#variance-threshold" target="_blank">Variance Threshold - scikit-learn</a></li>
</ol>
</details>
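
A minimal sketch of the filter on a toy frame (the column names and the 0.2 threshold are assumptions for illustration): one constant column, one nearly constant column, and one that varies. `VarianceThreshold` keeps only features whose variance is strictly above the threshold.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical)
X = pd.DataFrame({
    "constant":    [1, 1, 1, 1, 1, 1],
    "mostly_zero": [0, 0, 0, 0, 0, 1],
    "varied":      [3, 7, 1, 9, 4, 6],
})

selector = VarianceThreshold(threshold=0.2)
selector.fit(X)

# Per-feature variance as computed by the selector
variances = dict(zip(X.columns, selector.variances_))
# Names of the features that pass the threshold
kept = list(X.columns[selector.get_support()])

print(variances)
print("Kept:", kept)
```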

##### Parameters:

- X: Features (DataFrame)
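- threshold: Variance threshold below which features are considered low-importance (passed to the selector and shown in the plot annotation; features are not dropped from the returned DataFrame)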
- top_n_features: Number of top features to display (highest variance first)
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features and their variance scores
- Displays a styled bar plot (if show_plot=True)

In [18]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import LabelEncoder

def variance_threshold_feature_importance(X,
                                           threshold=0.0,
                                           top_n_features=20,
                                           handle_categorical=True,
                                           show_plot=True,
                                           plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)

    if handle_categorical:
        X = X.copy()
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

    # Compute variance for each feature
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(X)

    variance_scores = selector.variances_

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': variance_scores
    }).sort_values('Importance', ascending=False)

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('Importance', ascending=True)

        colors = plt.cm.cividis(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['Importance'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('Variance', fontsize=12)
        plt.title('Variance Threshold Feature Importance\n(Unsupervised)', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.0005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

        method_text = (
            f"Method: Variance Threshold\n"
            f"Handle Categorical: {handle_categorical}\n"
            f"Threshold: {threshold}"
        )
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df

-------------------

## 3.2.6 Maximal Information Coefficient (MIC)

<details>
<summary style="cursor: pointer">
<h2> { Understanding Maximal Information Coefficient (MIC) } </h2>
</summary>
<h3> What is MIC? </h3>
<p> MIC captures both linear and non-linear relationships between features and targets. It is part of the Maximal Information-based Nonparametric Exploration (MINE) statistics.</p>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> Unlike correlation, MIC can detect a wide range of functional relationships.</li>
    <li> Particularly useful when relationships are complex or unknown.</li>
</ul>
<h3> Resources: </h3>
<ol>
    <li><a href="https://minepy.readthedocs.io/en/latest/" target="_blank">MINE Documentation</a></li>
</ol>
</details>

##### Parameters:

- X: Features (DataFrame)
- y: Target (Series or 1D array)
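- model: A fitted model (optional); if provided, its feature_importances_ or coef_ values are added as a Model_Importance column for comparison
- top_n_features: Number of top features to display (highest MIC scores first)
- normalize: Whether to rescale MIC scores by the maximum score so the top feature scores 1 (raw MIC already lies between 0 and 1)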
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:

- DataFrame containing features, MIC scores, and optionally model feature importances (if model provided)
- Displays a styled bar plot (if show_plot=True)

In [20]:
!pip install minepy


In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from minepy import MINE
from sklearn.preprocessing import LabelEncoder

def mic_feature_importance(X,
                            y,
                            model=None,
                            top_n_features=20,
                            normalize=True,
                            show_plot=True,
                            plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    if isinstance(y, (pd.Series, pd.DataFrame)):
        y = y.values.ravel()

    X = X.copy()
    # Label encode categorical features if necessary
    for col in X.select_dtypes(include=['object', 'category']).columns:
        X[col] = LabelEncoder().fit_transform(X[col])

    mic_scores = []
    mic = MINE()

    for col in X.columns:
        mic.compute_score(X[col], y)
        score = mic.mic()
        mic_scores.append(score)

    mic_scores = np.array(mic_scores)

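    # Rescale by the maximum only when it is non-zero, to avoid dividing by zero
    if normalize and mic_scores.max() > 0:
        mic_scores = mic_scores / mic_scores.max()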

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'MIC_Score': mic_scores
    }).sort_values('MIC_Score', ascending=False)

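    # Optionally add model feature importances if provided.
    # importance_df is already sorted by MIC, so index the model's array by the
    # original column positions (kept in importance_df.index) to stay aligned.
    order = importance_df.index.to_numpy()
    if model is not None and hasattr(model, 'feature_importances_'):
        importance_df['Model_Importance'] = np.asarray(model.feature_importances_)[order]
    elif model is not None and hasattr(model, 'coef_'):
        # Average |coef| across classes so multi-class models yield one value per feature
        importance_df['Model_Importance'] = np.abs(np.atleast_2d(model.coef_)).mean(axis=0)[order]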

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('MIC_Score', ascending=True)

        colors = plt.cm.plasma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['MIC_Score'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('MIC Score', fontsize=12)
        plt.title('Maximal Information Coefficient (MIC) Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

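        # Compare against None explicitly: some estimators define __len__,
        # so truth-testing the model itself can be misleading or raise
        method_text = (
            f"Method: MIC\n"
            f"Model-Specific: {'Yes' if model is not None else 'No'}\n"
            f"Normalize Scores: {normalize}"
        )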
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df


--------------

## 3.2.7 Kolmogorov-Smirnov Statistic

<details>
<summary style="cursor: pointer">
<h2> { Understanding Kolmogorov-Smirnov (KS) Statistic } </h2>
</summary>
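<h3> What is the KS Statistic? </h3>
<p> The two-sample KS statistic is the maximum vertical distance between the empirical cumulative distribution functions (ECDFs) of two samples. For feature importance, the two samples are a feature's values in each class, so the statistic measures how well that feature separates the classes. It ranges from 0 (identical distributions) to 1 (no overlap).</p>
<h3> Its role in Feature Importance:</h3>
<ul>
    <li> Common in binary classification, especially in credit scoring.</li>
    <li> A higher KS value means better separability between the two classes based on that feature.</li>
</ul>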
<h3> Resources: </h3>
<ol>
    <li><a href="https://www.youtube.com/watch?v=ZO2RmSkXK3c" target="_blank">KS Statistic Explained</a></li>
</ol>
</details>
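
To make the definition concrete, here is a minimal sketch that computes the KS statistic directly from the two ECDFs and compares it with `scipy.stats.ks_2samp`. The helper name `ks_statistic_manual` and the synthetic class samples are assumptions made for illustration; they are not part of the function defined below.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic_manual(sample_a, sample_b):
    """Maximum vertical gap between the two empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # The gap can only change at observed values, so evaluate both ECDFs there
    grid = np.concatenate([a, b])
    ecdf_a = np.searchsorted(a, grid, side="right") / a.size
    ecdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(ecdf_a - ecdf_b)))


rng = np.random.default_rng(0)
# Hypothetical feature values for the two classes: same spread, shifted mean
class_0 = rng.normal(loc=0.0, scale=1.0, size=500)
class_1 = rng.normal(loc=1.0, scale=1.0, size=500)

manual = ks_statistic_manual(class_0, class_1)
reference = ks_2samp(class_0, class_1).statistic
print(f"manual KS = {manual:.4f}, scipy KS = {reference:.4f}")
```

The two values should agree, since both take the largest ECDF gap over the pooled observations. Identical samples give 0 and fully separated samples give 1.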

##### Parameters:
    
- X: Features (DataFrame)
- y: Target (Series or 1D array, binary classification only)
- model: A fitted classification model (optional, for combining model-specific feature importances)
- top_n_features: Number of top features to display (highest KS scores first)
- handle_categorical: Automatically encode categorical features if necessary
- show_plot: Whether to display the feature importance plot
- plot_size: Tuple indicating the size of the plot (width, height)

##### Returns:
    
- DataFrame containing features, KS statistic scores, and optionally model feature importances
- Displays a styled bar plot (if show_plot=True)

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from scipy.stats import ks_2samp

def ks_feature_importance(X,
                           y,
                           model=None,
                           top_n_features=20,
                           handle_categorical=True,
                           show_plot=True,
                           plot_size=(12, 8)):

    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    
    if isinstance(y, (pd.Series, pd.DataFrame)):
        y = y.values.ravel()

    X = X.copy()

    if handle_categorical:
        for col in X.select_dtypes(include=['object', 'category']).columns:
            X[col] = LabelEncoder().fit_transform(X[col])

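    # The two-sample KS test compares exactly two groups, so require a binary target
    y_arr = np.asarray(y).ravel()
    classes = np.unique(y_arr)
    if len(classes) != 2:
        raise ValueError(f"Expected a binary target, found {len(classes)} classes.")

    ks_scores = []

    for col in X.columns:
        # Compare the feature's distribution in one class against the other
        class_0 = X.loc[y_arr == classes[0], col]
        class_1 = X.loc[y_arr == classes[1], col]
        ks_stat, _ = ks_2samp(class_0, class_1)
        ks_scores.append(ks_stat)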

    ks_scores = np.array(ks_scores)

    importance_df = pd.DataFrame({
        'Feature': X.columns,
        'KS_Score': ks_scores
    }).sort_values('KS_Score', ascending=False)

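    # Optionally add model feature importances if provided.
    # importance_df is already sorted, so a raw array would be assigned by position
    # and land on the wrong rows; a Series (0..n-1 index) aligns by row label instead.
    if model is not None and hasattr(model, 'feature_importances_'):
        importance_df['Model_Importance'] = pd.Series(np.asarray(model.feature_importances_))
    elif model is not None and hasattr(model, 'coef_'):
        # Mean absolute coefficient per feature (covers 1D and 2D coef_)
        importance_df['Model_Importance'] = pd.Series(np.abs(np.atleast_2d(model.coef_)).mean(axis=0))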

    if top_n_features:
        importance_df = importance_df.head(top_n_features)

    if show_plot:
        plt.figure(figsize=plot_size)
        importance_df_sorted = importance_df.sort_values('KS_Score', ascending=True)

        colors = plt.cm.magma(np.linspace(0.2, 1, len(importance_df_sorted)))
        bars = plt.barh(importance_df_sorted['Feature'],
                        importance_df_sorted['KS_Score'],
                        color=colors,
                        alpha=0.9)

        plt.xlabel('KS Statistic', fontsize=12)
        plt.title('Kolmogorov-Smirnov (KS) Feature Importance', fontsize=14, pad=20)
        plt.grid(axis='x', linestyle='--', alpha=0.3)

        for bar in bars:
            width = bar.get_width()
            plt.text(width + 0.005,
                     bar.get_y() + bar.get_height() / 2,
                     f'{width:.4f}',
                     va='center',
                     fontsize=9)

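        # Compare against None explicitly: some estimators define __len__,
        # so truth-testing the model itself can be misleading or raise
        method_text = (
            f"Method: Kolmogorov-Smirnov (KS)\n"
            f"Model-Specific: {'Yes' if model is not None else 'No'}\n"
            f"Handle Categorical: {handle_categorical}"
        )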
        plt.annotate(method_text,
                     xy=(0.98, 0.02),
                     xycoords='axes fraction',
                     ha='right',
                     va='bottom',
                     fontsize=9,
                     bbox=dict(boxstyle='round', alpha=0.1))

        plt.tight_layout()
        plt.show()

    return importance_df
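

As a quick sanity check, the sketch below applies the same per-feature KS comparison to a small synthetic dataset, without the plotting. The feature names `informative` and `noise`, the sample size, and the class shift of 1.5 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
n = 1000
y = rng.integers(0, 2, size=n)  # synthetic binary target (0/1)

# Hypothetical features: one whose mean shifts with the class, one pure noise
X = pd.DataFrame({
    "informative": rng.normal(loc=y * 1.5, scale=1.0),
    "noise": rng.normal(size=n),
})

# Per-feature comparison of the class-0 and class-1 distributions
ks_scores = {
    col: ks_2samp(X.loc[y == 0, col], X.loc[y == 1, col]).statistic
    for col in X.columns
}
ranking = pd.Series(ks_scores, name="KS_Score").sort_values(ascending=False)
print(ranking)
```

The class-dependent feature should rank well above the noise feature. With the function above defined, `ks_feature_importance(X, y, show_plot=False)` should produce the same ordering.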

--------------