<font color="red" size="6"><b>Filter Methods</b></font>
<p><font color="Yellow" size="5"><b>8_Dispersion_Ratio</b></font>

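The Dispersion Ratio measures how much each feature varies relative to its own typical magnitude. Here it is calculated as the ratio of the standard deviation to the mean of each feature, a quantity also known as the coefficient of variation. Because the ratio is unitless, features measured on different scales can be ranked against each other. As a filter method, it scores features without training a model and, in this case, without using the target variable.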

The formula for the Dispersion Ratio is:
<font color="skyblue">Dispersion Ratio = σ / μ</font>
where σ is the standard deviation of the feature and μ is its mean.
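
To make the formula concrete, here is a minimal sketch on a toy array (the numbers and the helper are illustrative and do not come from the Wine dataset). It assumes the population standard deviation, which is the `np.std` default. It suggests two properties worth keeping in mind: rescaling a feature leaves the ratio unchanged, while adding a constant offset changes it.

```python
import numpy as np

def dispersion_ratio(values):
    """Illustrative helper: population standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return np.std(values) / np.mean(values)

# Toy feature with mean 5 and population standard deviation 2
base = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

ratio_base = dispersion_ratio(base)           # 2 / 5
ratio_scaled = dispersion_ratio(base * 100)   # same data in different units
ratio_shifted = dispersion_ratio(base + 100)  # same spread, much larger mean

print(ratio_base, ratio_scaled, ratio_shifted)
```

The shifted case is why the ratio is most meaningful for strictly positive features measured from a true zero: an arbitrary offset in the mean shrinks or inflates the score without any change in the spread.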


In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine

# Load the Wine dataset
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)  # Features as a DataFrame
y = data.target  # Target variable (not used below: the Dispersion Ratio is computed from the features alone)

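# Step 1: Calculate the Dispersion Ratio for each feature
def dispersion_ratio(column):
    std_dev = np.std(column, ddof=0)  # Population standard deviation
    mean = np.mean(column)  # Mean
    # The ratio is undefined for a zero mean, so return NaN rather than a misleading 0.
    # abs() keeps the ratio non-negative for features with a negative mean.
    return std_dev / abs(mean) if mean != 0 else np.nan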

# Apply the Dispersion Ratio calculation to each feature
dispersion_ratios = X.apply(dispersion_ratio, axis=0)

# Step 2: Create a DataFrame for Dispersion Ratios and rank features
dispersion_df = pd.DataFrame({
    'Feature': X.columns,
    'Dispersion Ratio': dispersion_ratios
})
dispersion_df = dispersion_df.sort_values(by='Dispersion Ratio', ascending=False).reset_index(drop=True)

# Display features ranked by Dispersion Ratio
print("Features Ranked by Dispersion Ratio:")
print(dispersion_df)

# Step 3: Select features based on a threshold (e.g., top N features or a dispersion ratio threshold)
threshold = dispersion_ratios.mean()  # Use mean Dispersion Ratio as a threshold
selected_features = dispersion_df[dispersion_df['Dispersion Ratio'] > threshold]['Feature'].tolist()

print("\nSelected Features Based on Dispersion Ratio Threshold:")
print(selected_features)


Features Ranked by Dispersion Ratio:
                         Feature  Dispersion Ratio
0                     flavanoids          0.490841
1                     malic_acid          0.476814
2                color_intensity          0.457043
3                        proline          0.420437
4                proanthocyanins          0.358759
5           nonflavanoid_phenols          0.342965
6                  total_phenols          0.271922
7   od280/od315_of_diluted_wines          0.271087
8                            hue          0.238058
9              alcalinity_of_ash          0.170822
10                     magnesium          0.142792
11                           ash          0.115601
12                       alcohol          0.062270

Selected Features Based on Dispersion Ratio Threshold:
['flavanoids', 'malic_acid', 'color_intensity', 'proline', 'proanthocyanins', 'nonflavanoid_phenols']
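
Step 3 mentions keeping the top N features as an alternative to a threshold. A minimal sketch of that variant follows. The ratios are copied from the ranked output above so the block runs on its own, and `select_top_n` and `n=5` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Dispersion ratios copied from the ranked output above (Wine dataset)
dispersion_ratios = pd.Series({
    "flavanoids": 0.490841,
    "malic_acid": 0.476814,
    "color_intensity": 0.457043,
    "proline": 0.420437,
    "proanthocyanins": 0.358759,
    "nonflavanoid_phenols": 0.342965,
    "total_phenols": 0.271922,
    "od280/od315_of_diluted_wines": 0.271087,
    "hue": 0.238058,
    "alcalinity_of_ash": 0.170822,
    "magnesium": 0.142792,
    "ash": 0.115601,
    "alcohol": 0.062270,
})

def select_top_n(ratios, n):
    """Hypothetical helper: names of the n features with the largest ratios."""
    return ratios.nlargest(n).index.tolist()

top_features = select_top_n(dispersion_ratios, n=5)
print(top_features)
```

A fixed N gives a predictable number of features, whereas the mean-based threshold adapts to how the ratios are distributed. Which is preferable depends on the downstream model.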
