# Feature Selection Techniques in Machine Learning

In [1]:
import warnings
warnings.filterwarnings("ignore")

## Removing features with low variance

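The VarianceThreshold class from scikit-learn selects features based on their variance alone, without looking at the target. It removes every feature whose variance is lower than a given threshold; with the default threshold of 0 it removes only features that have the same value in all samples.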

In [10]:
from sklearn.feature_selection import VarianceThreshold

# Input features
X = [[0, 0, 1], 
     [0, 1, 0], 
     [1, 0, 0], 
     [0, 1, 1], 
     [0, 1, 0], 
     [0, 1, 1]]

# Initialize VarianceThreshold with threshold
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))

# Fit and transform the input features
selected_features = sel.fit_transform(X)

print(selected_features)


[[0 1]
 [1 0]
 [0 0]
 [1 1]
 [1 0]
 [1 1]]
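
The threshold `.8 * (1 - .8)` comes from the variance of a Bernoulli variable, Var[X] = p(1 - p): it removes Boolean features that are either one or zero in more than 80% of the samples. A minimal sketch, reusing the toy matrix above, compares the per-column variances with that threshold. The names `variances` and `kept_columns` are introduced here for illustration only.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

threshold = 0.8 * (1 - 0.8)  # about 0.16

# Population variance of each column, the quantity compared with the threshold
variances = X.var(axis=0)
print("Variances:", variances)

sel = VarianceThreshold(threshold=threshold).fit(X)
kept_columns = np.where(sel.get_support())[0]
print("Kept columns:", kept_columns)
```

The first column is zero in five of six rows, so its variance (5/36) falls below the threshold and it is dropped.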


## Filter Methods

Filter methods score and select features using statistical properties of the data alone, independently of any machine learning algorithm.

### Information gain

Information gain measures the reduction in entropy of the target variable obtained by knowing the value of a feature. For feature selection, the information gain of each feature with respect to the target is computed and the highest-scoring features are kept.

In [6]:
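import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Load the Iris dataset (the toy matrix above has no target)
X, y = load_iris(return_X_y=True)

# Estimate the information gain (mutual information) of each feature
information_gain = mutual_info_classif(X, y)

# Indices of the two highest-scoring features
selected_features = np.argsort(information_gain)[::-1][:2]
print("Selected features:", selected_features)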

Selected features: [3 2]
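
For a discrete feature, information gain is the entropy of the target minus the conditional entropy of the target given the feature: IG(Y, X) = H(Y) - H(Y | X). The sketch below computes it by hand on a small made-up dataset. The helper names `entropy` and `information_gain_score` and the toy arrays are assumptions for illustration, not part of scikit-learn. `mutual_info_classif` estimates the same quantity with a nearest-neighbor method for continuous features, so its numbers will generally differ from this direct calculation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain_score(feature, target):
    """H(target) - H(target | feature) for a discrete feature."""
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each subset's entropy by the fraction of samples it covers
        conditional += mask.mean() * entropy(target[mask])
    return entropy(target) - conditional

# Toy data (illustrative only)
target = np.array([0, 0, 1, 1])
informative = np.array([0, 0, 1, 1])    # perfectly predicts the target
uninformative = np.array([0, 1, 0, 1])  # independent of the target

ig_informative = information_gain_score(informative, target)
ig_uninformative = information_gain_score(uninformative, target)
print("Informative feature:  ", ig_informative)
print("Uninformative feature:", ig_uninformative)
```

The informative feature removes all uncertainty about the target (a gain of one bit), while the uninformative one removes none.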


### Univariate feature selection

#### SelectKBest

Selects the K highest scoring features based on univariate statistical tests. 

In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Feature selection using ANOVA F-value
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

# selected features
selected_features = np.where(selector.get_support())[0]
print("Selected features:", selected_features)



Selected features: [2 3]
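
To see why those two features were chosen, it can help to inspect the fitted selector's `scores_` and `pvalues_` attributes. The following sketch repeats the selection and prints the ANOVA F-score of every Iris feature; the loop and the name `top_two` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Print each feature's F-score, p-value, and whether it was kept
rows = zip(iris.feature_names, selector.scores_,
           selector.pvalues_, selector.get_support())
for name, score, pvalue, kept in rows:
    print(f"{name:20s} F={score:9.2f}  p={pvalue:.2e}  selected={kept}")

# The k best features are simply the k largest scores
top_two = np.argsort(selector.scores_)[::-1][:2]
```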


#### SelectPercentile

Selects a user-specified percentage of the highest scoring features.

In [12]:
from sklearn.feature_selection import SelectPercentile, chi2

# Select top 10% features using chi-square test
selector = SelectPercentile(score_func=chi2, percentile=10)
X_new = selector.fit_transform(X, y)

#### SelectFpr

Selects features based on a false positive rate criterion: a feature is kept when the p-value of its test is below `alpha`.

In [13]:
from sklearn.feature_selection import SelectFpr, chi2

# Select features with false positive rate < 0.01 using chi-square test
selector = SelectFpr(score_func=chi2, alpha=0.01)
X_new = selector.fit_transform(X, y)


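#### SelectFdr

Selects features based on a false discovery rate criterion, using the Benjamini-Hochberg procedure to bound the expected proportion of false discoveries by `alpha`.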

In [14]:
from sklearn.feature_selection import SelectFdr, chi2

# Select features with false discovery rate < 0.01 using chi-square test
selector = SelectFdr(score_func=chi2, alpha=0.01)
X_new = selector.fit_transform(X, y)


#### SelectFwe

Selects features based on a family-wise error rate criterion: a feature is kept when its p-value is below `alpha` divided by the number of features.

In [15]:
from sklearn.feature_selection import SelectFwe, chi2

# Select features with family wise error < 0.01 using chi-square test
selector = SelectFwe(score_func=chi2, alpha=0.01)
X_new = selector.fit_transform(X, y)
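
The three p-value-based selectors differ only in how strictly they correct for multiple comparisons. As a rough sketch on the Iris data (the dictionaries `selectors` and `kept` are illustrative names), the code below fits each one with the same `alpha` and prints the feature indices it keeps. Controlling the family-wise error rate is the most conservative of the three criteria, so `SelectFwe` typically keeps no more features than `SelectFdr`, which in turn typically keeps no more than `SelectFpr`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFpr, SelectFdr, SelectFwe, chi2

X, y = load_iris(return_X_y=True)

# Same score function and alpha, three different error criteria
selectors = {
    "SelectFpr": SelectFpr(score_func=chi2, alpha=0.01),
    "SelectFdr": SelectFdr(score_func=chi2, alpha=0.01),
    "SelectFwe": SelectFwe(score_func=chi2, alpha=0.01),
}

kept = {}
for name, selector in selectors.items():
    selector.fit(X, y)
    kept[name] = np.where(selector.get_support())[0]
    print(f"{name}: {kept[name]}")
```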


#### GenericUnivariateSelect

Allows performing univariate feature selection with a configurable strategy. 

In [16]:
from sklearn.feature_selection import GenericUnivariateSelect, chi2

# Select top 2 features using chi-square test
# (Iris has only 4 features, so param must not exceed 4 in 'k_best' mode)
selector = GenericUnivariateSelect(score_func=chi2, mode='k_best', param=2)
X_new = selector.fit_transform(X, y)
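
`GenericUnivariateSelect` exposes the selectors above through a `mode` argument (`'percentile'`, `'k_best'`, `'fpr'`, `'fdr'`, or `'fwe'`), with `param` playing the role of `percentile`, `k`, or `alpha`. This makes the selection strategy itself something a hyperparameter search can tune. The sketch below, using illustrative variable names, checks that `mode='k_best'` with `param=2` picks the same Iris features as `SelectKBest(k=2)`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import GenericUnivariateSelect, SelectKBest, chi2

X, y = load_iris(return_X_y=True)

generic = GenericUnivariateSelect(score_func=chi2, mode="k_best", param=2).fit(X, y)
k_best = SelectKBest(score_func=chi2, k=2).fit(X, y)

# Both selectors should report the same Boolean mask over the features
same_selection = np.array_equal(generic.get_support(), k_best.get_support())
print("Generic mask:", generic.get_support())
print("KBest mask:  ", k_best.get_support())
print("Identical selection:", same_selection)
```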


## Wrapper Methods

### Recursive Feature Elimination 

The goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. 

First, the estimator is trained on the initial set of features, and the importance of each feature is obtained through a specific attribute (such as `coef_` or `feature_importances_`) or a callable. Then, the least important features are pruned from the current set. This procedure is repeated recursively on the pruned set until the desired number of features is reached.

In [17]:
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Generate example data
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Create a linear regression estimator
estimator = LinearRegression()

# Create RFE selector
selector = RFE(estimator, n_features_to_select=5, step=1)

# Fit RFE selector
selector = selector.fit(X, y)

# Selected features and ranks of features
print("Selected features:", selector.support_)
print("Feature ranking:", selector.ranking_)


Selected features: [ True  True  True  True  True False False False False False]
Feature ranking: [1 1 1 1 1 2 3 6 5 4]
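
The elimination loop described at the start of this section can be sketched by hand. This is a simplified illustration, not scikit-learn's implementation: it assumes a linear model, uses the absolute value of `coef_` as the importance, and removes one feature per iteration. The names `remaining`, `weakest`, and `manual_support` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
n_features_to_select = 5

remaining = list(range(X.shape[1]))  # indices of features still in play
while len(remaining) > n_features_to_select:
    model = LinearRegression().fit(X[:, remaining], y)
    # The feature with the smallest absolute coefficient is pruned
    weakest = int(np.argmin(np.abs(model.coef_)))
    del remaining[weakest]

manual_support = np.zeros(X.shape[1], dtype=bool)
manual_support[remaining] = True
print("Manual selection:", manual_support)

# Compare with scikit-learn's RFE on the same data
rfe = RFE(LinearRegression(), n_features_to_select=n_features_to_select, step=1).fit(X, y)
print("RFE selection:   ", rfe.support_)
```

Ranking by coefficient magnitude is only meaningful when the features are on comparable scales, as they are in this synthetic dataset.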


In [4]:
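import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Logistic regression is a classifier, so switch to the Iris classification data
X, y = load_iris(return_X_y=True)

# Feature selection using RFE with Logistic Regression
estimator = LogisticRegression()
selector = RFE(estimator, n_features_to_select=2, step=1)
selector = selector.fit(X, y)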

# selected features
selected_features = np.where(selector.support_)[0]
print("Selected features:", selected_features)

Selected features: [2 3]


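### Recursive Feature Elimination with Cross-Validation (RFECV)

RFECV runs recursive feature elimination inside a cross-validation loop and keeps the number of features that gives the best cross-validated score, so the number of features to select does not have to be fixed in advance.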

In [18]:
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Generate example data
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Create a linear regression estimator
estimator = LinearRegression()

# Create RFECV selector
selector = RFECV(estimator, step=1, cv=5)

# Fit RFECV selector
selector = selector.fit(X, y)

# Selected features
print("Selected features:", selector.support_)

# Rank of features
print("Feature ranking:", selector.ranking_)
print("Optimal number of features:", selector.n_features_)

Selected features: [ True  True  True  True  True  True False False False False]
Feature ranking: [1 1 1 1 1 1 2 5 4 3]
Optimal number of features: 6


## Embedded Methods

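### Feature selection using SelectFromModel

SelectFromModel is a meta-transformer that works with any estimator that exposes feature importances after fitting, for example through `coef_` or `feature_importances_`. Features whose importance falls below a threshold are removed.

#### L1-based feature selection

Linear models penalized with the L1 norm produce sparse solutions in which many coefficients are exactly zero. Used together with SelectFromModel, they reduce the dimensionality of the data by keeping only the features with non-zero coefficients.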

In [21]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel

# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create LassoCV model
lasso = LassoCV(cv=5, random_state=0)

# Fit the LassoCV model
lasso.fit(X_scaled, y)

# Create SelectFromModel instance with the trained LassoCV model
model = SelectFromModel(lasso, prefit=True)

# Transform the standardized feature matrix, keeping features with non-zero coefficients
X_selected = model.transform(X_scaled)

# Print selected features
print("Selected features shape:", X_selected.shape)


Selected features shape: (442, 9)
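
To see which features were kept, one can compare the fitted Lasso coefficients with the mask returned by `get_support()`. The sketch below is illustrative: it relies on the `feature_names` field of the diabetes dataset, and the exact coefficients depend on the regularization strength that `LassoCV` selects.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

diabetes = load_diabetes()
X_scaled = StandardScaler().fit_transform(diabetes.data)
y = diabetes.target

lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)
model = SelectFromModel(lasso, prefit=True)
support = model.get_support()

# Features whose coefficient was shrunk to (near) zero are the ones dropped
for name, coef, kept in zip(diabetes.feature_names, lasso.coef_, support):
    print(f"{name:4s} coef={coef:8.3f}  selected={kept}")

X_selected = model.transform(X_scaled)
```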


### Tree-based Feature Importance

Random forest models, both classifiers and regressors, provide built-in feature importances through the `feature_importances_` attribute. This attribute captures the average decrease in impurity that a feature produces across all trees in the forest, indicating how much the feature contributes to the model's predictions.

In [5]:
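import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Use the Iris classification data (the diabetes target above is continuous)
X, y = load_iris(return_X_y=True)

# Feature selection using Random Forest feature importance
rf = RandomForestClassifier()
rf.fit(X, y)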

# feature importances
feature_importances = rf.feature_importances_
selected_features = np.argsort(feature_importances)[::-1][:2]

# selected features
print("Selected features:", selected_features)

Selected features: [2 3]
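
Impurity-based importances are normalized to sum to one, so they can be read as relative contributions. The sketch below prints them for the Iris data, from most to least important. Fixing `random_state` is an assumption added here so that repeated runs give the same ranking; the name `top_two` is illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_

# List features from most to least important
for idx in np.argsort(importances)[::-1]:
    print(f"{iris.feature_names[idx]:20s} {importances[idx]:.3f}")

top_two = np.argsort(importances)[::-1][:2]
print("Importances sum to:", importances.sum())
```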
