<font color="red" size="6">Filter Methods</font>
<p><font color="Yellow" size="4">1. Variance Threshold</font></p>

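<b><font color="blue">Variance Threshold:</font></b>
The Variance Threshold feature selection method removes features with low variance, on the assumption that a feature whose values barely change across samples carries little information for the model. It is unsupervised: only the feature values are examined, never the target.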

<b>Steps:</b>
<ol>
   <li> Calculate the variance of each feature in the dataset.</li>
   <li> Remove features with variance below a predefined threshold.</li></ol>
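
The two steps above can be sketched by hand with NumPy. This is a minimal illustration rather than scikit-learn's implementation: the helper name `variance_filter` is hypothetical, and it assumes the population variance (`ddof=0`) with features kept only when their variance exceeds the threshold.

```python
import numpy as np

def variance_filter(X, threshold=0.0):
    """Hypothetical helper: keep columns whose population variance exceeds threshold."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)      # Step 1: per-feature variance (ddof=0)
    keep = variances > threshold   # Step 2: boolean mask of features to keep
    return X[:, keep], variances, keep

# Small made-up matrix: columns 0 and 2 are constant
X = np.array([
    [1, 1, 5, 1],
    [1, 2, 5, 0],
    [1, 3, 5, 1],
    [1, 4, 5, 0],
    [1, 5, 5, 1],
])

X_reduced, variances, keep = variance_filter(X, threshold=0.1)
print("Variances:", variances)
print("Kept column indices:", np.flatnonzero(keep))
print("Reduced shape:", X_reduced.shape)
```

Returning the mask alongside the reduced array makes it easy to map the surviving columns back to their names.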

In [1]:
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

In [2]:
# Create a sample dataset
# (variances are population variances, ddof=0, as VarianceThreshold computes them)
data = {
    "Feature1": [1, 1, 1, 1, 1],   # Variance = 0
    "Feature2": [1, 2, 3, 4, 5],   # Variance = 2.0
    "Feature3": [5, 5, 5, 5, 5],   # Variance = 0
    "Feature4": [1, 0, 1, 0, 1]    # Variance = 0.24
}

In [3]:
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)

Original Dataset:
   Feature1  Feature2  Feature3  Feature4
0         1         1         5         1
1         1         2         5         0
2         1         3         5         1
3         1         4         5         0
4         1         5         5         1


In [4]:
# Apply Variance Threshold
selector = VarianceThreshold(threshold=0.1)  # Set threshold
X_selected = selector.fit_transform(df)

In [5]:
# Get the selected features
selected_columns = df.columns[selector.get_support()]
print("\nSelected Features:")
print(selected_columns)


Selected Features:
Index(['Feature2', 'Feature4'], dtype='object')


In [6]:
# Display the resulting dataset
df_selected = pd.DataFrame(X_selected, columns=selected_columns)
print("\nDataset After Applying Variance Threshold:")
print(df_selected)


Dataset After Applying Variance Threshold:
   Feature2  Feature4
0         1         1
1         2         0
2         3         1
3         4         0
4         5         1


<b><font color="blue">Variance Threshold on the Iris Dataset</font></b>

In [7]:
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

# Load the Iris dataset
iris = load_iris()
X = iris.data
feature_names = iris.feature_names

# Apply Variance Threshold
selector = VarianceThreshold(threshold=0.2)  # Example threshold
X_selected = selector.fit_transform(X)

# Selected feature indices and names
selected_indices = selector.get_support(indices=True)
selected_features = [feature_names[i] for i in selected_indices]

print("Original Features:", feature_names)
print("Selected Features:", selected_features)

# Display the shape of the datasets
print("Original shape:", X.shape)
print("Reduced shape:", X_selected.shape)


Original Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Selected Features: ['sepal length (cm)', 'petal length (cm)', 'petal width (cm)']
Original shape: (150, 4)
Reduced shape: (150, 3)


<b><font color="pink">Threshold Value:</font></b>
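<ol>
    <li>Adjust the threshold parameter to control which features are removed.</li>
    <li>Features whose variance does not exceed this value are dropped.</li>
    <li>There is no universal value: variance depends on each feature's units, so values such as 0.01 (low threshold) to 0.1 (moderate) are only meaningful when the features are on comparable scales.</li></ol>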
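
One way to choose a threshold is to look at the fitted variances first. The sketch below fits `VarianceThreshold` on Iris with its default threshold of 0, reads the `variances_` attribute, and counts how many features would survive a few candidate thresholds. The candidate values are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

iris = load_iris()
X = iris.data

# Fit once with the default threshold (0.0) just to obtain per-feature variances
probe = VarianceThreshold().fit(X)
for name, var in zip(iris.feature_names, probe.variances_):
    print(f"{name}: {var:.3f}")

# Count surviving features for a few illustrative thresholds
kept_counts = {}
for t in [0.1, 0.2, 0.6, 1.0]:
    kept_counts[t] = int((probe.variances_ > t).sum())
    print(f"threshold={t}: {kept_counts[t]} features kept")
```

Inspecting the variances before fixing a threshold shows where the natural gaps between features lie, which is usually more informative than picking a value in advance.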

<b><font color="pink">Scaling Impact:</font></b>
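<ol>
    <li>Variance depends on a feature's units, so on unscaled data features with larger magnitudes have larger variances and pass the threshold more easily.</li>
    <li>Consider rescaling the features to a common range (e.g., MinMaxScaler) before applying the threshold.</li>
    <li>Avoid StandardScaler for this purpose: it gives every non-constant feature a variance of 1, so the threshold can no longer distinguish between features.</li></ol>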
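
The effect of scaling can be seen on a small made-up example. The sketch below uses hypothetical data (the same heights expressed in metres and in centimetres, plus a binary flag) and a hypothetical helper `kept`. It compares which features pass a fixed threshold on the raw data and after `MinMaxScaler`, and it also shows that `StandardScaler` leaves every feature with unit variance.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: identical heights in two units, plus a binary flag
height_m = np.array([1.60, 1.72, 1.81, 1.68, 1.75, 1.90])
X = np.column_stack([height_m, height_m * 100, [0, 1, 0, 1, 1, 0]])
names = np.array(["height_m", "height_cm", "flag"])

def kept(X, threshold):
    """Names of the features that VarianceThreshold keeps at this threshold."""
    mask = VarianceThreshold(threshold=threshold).fit(X).get_support()
    return names[mask].tolist()

# Raw data: the outcome depends on the unit the height was recorded in
print("Raw variances:", X.var(axis=0))
print("Kept (raw, threshold=0.05):", kept(X, 0.05))

# Min-max scaling puts both height columns on the same [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)
print("Min-max variances:", X_minmax.var(axis=0))
print("Kept (min-max, threshold=0.05):", kept(X_minmax, 0.05))

# Standardizing forces every variance to 1, so thresholding becomes uninformative
X_std = StandardScaler().fit_transform(X)
print("Standardized variances:", X_std.var(axis=0))
```

On the raw data the metre column falls below the threshold while the centimetre column passes, even though they hold the same information. After min-max scaling the two are treated identically.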