# Univariate Feature Selection

- f_classif <br>
ANOVA F-value between label/feature for classification tasks.

- mutual_info_classif <br>
Mutual information for a discrete target.

- chi2 <br>
Chi-squared statistics of non-negative features for classification tasks.

- f_regression <br>
F-value between label/feature for regression tasks.

- mutual_info_regression <br>
Mutual information for a continuous target.

- SelectKBest <br>
Select features based on the k highest scores.

- SelectPercentile <br>
Select features based on percentile of the highest scores.

- SelectFpr <br>
Select features based on a false positive rate test.

- SelectFdr <br>
Select features based on an estimated false discovery rate.

- SelectFwe <br>
Select features based on family-wise error rate.

- GenericUnivariateSelect <br>
Univariate feature selector with configurable mode.
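
The score functions listed above can also be called directly, which may help when inspecting per-feature scores before choosing a selector. The following is a minimal sketch on the iris data; the variable names are illustrative, and `random_state=0` is an assumption made only to keep the mutual information estimate repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)

# f_classif and chi2 each return a (scores, p_values) pair,
# with one entry per feature.
f_scores, f_pvalues = f_classif(X, y)
chi2_scores, chi2_pvalues = chi2(X, y)

# mutual_info_classif returns only the estimated scores (no p-values).
# The estimator is randomized, so random_state is fixed here.
mi_scores = mutual_info_classif(X, y, random_state=0)

print(f_scores.shape, chi2_scores.shape, mi_scores.shape)
```

Because mutual information comes without p-values, it fits score-based selectors such as SelectKBest and SelectPercentile rather than the p-value-based ones (SelectFpr, SelectFdr, SelectFwe).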

In [30]:
regression_class = ['f_regression', 'mutual_info_regression']
classification_class = ['chi2', 'f_classif', 'mutual_info_classif']

For regression: f_regression, mutual_info_regression

For classification: chi2, f_classif, mutual_info_classif
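
The cells below only cover classification, so here is a hedged sketch of the regression case. It assumes a synthetic dataset from `make_regression`; the sample and feature counts and the names `X_reg` and `y_reg` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic regression data: 10 features, of which 3 are informative.
X_reg, y_reg = make_regression(
    n_samples=100, n_features=10, n_informative=3, random_state=0
)

# Keep the 3 features with the highest F-values against the target.
X_reg_new = SelectKBest(f_regression, k=3).fit_transform(X_reg, y_reg)
print(X_reg_new.shape)
```

Passing `mutual_info_regression` in place of `f_regression` should work the same way.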

In [20]:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import mutual_info_classif

In [5]:
X, y = load_iris(return_X_y=True)
X.shape

(150, 4)

# SelectKBest

In [12]:
classification_class

['chi2', 'f_classif', 'mutual_info_classif']

In [17]:
X_new = SelectKBest(chi2, k=4).fit_transform(X, y)
X_new.shape

(150, 4)

In [18]:
X_new = SelectKBest(f_classif, k=4).fit_transform(X, y)
X_new.shape

(150, 4)

In [21]:
X_new = SelectKBest(mutual_info_classif, k=4).fit_transform(X, y)
X_new.shape

(150, 4)
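
With `k=4` on a four-feature dataset, every column is kept, so the shape does not change. The sketch below uses `k=2` instead (an illustrative choice, not from the original cells) and inspects which columns survive through `get_support`.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep only the 2 features with the highest chi-squared scores.
selector = SelectKBest(chi2, k=2)
X_top2 = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns;
# scores_ holds the chi-squared score of each feature.
mask = selector.get_support()
print(X_top2.shape)
print(mask)
print(selector.scores_)
```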

# SelectPercentile

In [23]:
X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
X_new.shape

(150, 1)

In [29]:
X_new = SelectPercentile(f_classif, percentile=10).fit_transform(X, y)
X_new.shape

(150, 1)
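
To see how the `percentile` argument changes the result, here is a small sketch that keeps the top 50% of features. The value 50 is an illustrative assumption; on the four iris features it should retain two columns.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep the features whose F-values fall in the top 50 percent.
pct_selector = SelectPercentile(f_classif, percentile=50)
X_half = pct_selector.fit_transform(X, y)

print(X_half.shape)
# Column indices of the retained features in the original matrix.
print(pct_selector.get_support(indices=True))
```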

# GenericUnivariateSelect

In [27]:
transformer = GenericUnivariateSelect(chi2, mode='k_best', param=4)
X_new = transformer.fit_transform(X, y)
X_new.shape

(150, 4)
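
The `mode` argument switches GenericUnivariateSelect between the strategies listed at the top (`'percentile'`, `'k_best'`, `'fpr'`, `'fdr'`, `'fwe'`), with `param` interpreted accordingly. The sketch below tries two of them; the `param` values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import GenericUnivariateSelect, f_classif

X, y = load_iris(return_X_y=True)

# mode='percentile': param is the percentage of features to keep.
pct = GenericUnivariateSelect(f_classif, mode='percentile', param=50)
X_pct = pct.fit_transform(X, y)

# mode='fpr': param is the p-value threshold (alpha) for keeping a feature.
fpr = GenericUnivariateSelect(f_classif, mode='fpr', param=0.05)
X_fpr = fpr.fit_transform(X, y)

print(X_pct.shape, X_fpr.shape)
```

The p-value-based modes need a score function that returns p-values, such as `f_classif` or `chi2`.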

# References
1. https://scikit-learn.org/stable/modules/feature_selection.html
2. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html