#### Univariate Selection Methods

- SelectKBest
- SelectPercentile

1.1 **SelectKBest**

- This method selects features according to the k highest scores.
- We can apply a chi-square test to the samples to retrieve only the two best features from the iris dataset.

In [2]:
# Let's import our dataset

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
X.shape

(150, 4)

In [3]:
# Select the 2 best features

X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
X_new.shape

(150, 2)

- Thus we have reduced the iris dataset from four features to the two with the highest chi-square scores.
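
The transformed array does not say which columns survived. As a minimal sketch (the variable names `selector`, `mask`, and `selected_names` are illustrative and not part of the original notebook), fitting the selector separately exposes the per-feature scores through `scores_` and the kept columns through `get_support()`:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

# Fit only, so the selector object stays available for inspection
selector = SelectKBest(chi2, k=2).fit(X, y)

# Boolean mask over the original columns: True where a feature was kept
mask = selector.get_support()
selected_names = [name for name, keep in zip(iris.feature_names, mask) if keep]

print(selector.scores_)  # one chi-square score per original feature
print(selected_names)
```

Calling `selector.transform(X)` afterwards should give the same array as the `fit_transform` call above.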

1.2 **SelectPercentile**

- Selects features according to a percentile of the highest scores

In [4]:
# Reading the data

from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2
X, y = load_digits(return_X_y=True)
X.shape

(1797, 64)

In [5]:
# Keep only the features whose scores fall in the top 10 percent

X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
X_new.shape

(1797, 7)
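
To see which of the 64 pixel features were retained, one option is to fit the selector first and ask it for the kept column indices. This is a sketch under the same setup as the cell above; `selector` and `kept_idx` are names introduced here for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2

X, y = load_digits(return_X_y=True)

selector = SelectPercentile(chi2, percentile=10).fit(X, y)

# Column indices (0 to 63, one per pixel) of the features that were kept
kept_idx = selector.get_support(indices=True)
X_new = selector.transform(X)

print(kept_idx)
print(len(kept_idx), X_new.shape)
```

The number of kept indices matches the second dimension of `X_new`, so the list can be mapped back to pixel positions in the 8x8 digit images.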

**Important Information**

- These objects take as input a scoring function that returns univariate scores and p-values (or only scores in the case of SelectKBest and SelectPercentile).

- For regression tasks: f_regression, mutual_info_regression
- For classification tasks: chi2, f_classif, mutual_info_classif

- If you use sparse data (i.e. data represented as sparse matrices), chi2, mutual_info_regression, and mutual_info_classif will deal with the data without making it dense.

- Do not use a regression scoring function with a classification problem; you will get useless results.
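
The points above can be checked with a short sketch. It assumes SciPy is installed and converts the iris data to a CSR matrix purely for illustration; the names `f_scores`, `p_values`, and `X_sparse` are not from the original notebook. It calls a classification scoring function directly to show the (scores, p-values) pair, then passes sparse input through chi2:

```python
from scipy import sparse
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

X, y = load_iris(return_X_y=True)

# A scoring function can be called on its own; f_classif returns two arrays,
# the F-scores and the p-values, each with one entry per feature
f_scores, p_values = f_classif(X, y)
print(f_scores.shape, p_values.shape)

# chi2 accepts a sparse matrix, and the selector's output stays sparse
X_sparse = sparse.csr_matrix(X)
X_new = SelectKBest(chi2, k=2).fit_transform(X_sparse, y)
print(sparse.issparse(X_new), X_new.shape)
```

Because the iris target is categorical, a classification scorer (chi2 or f_classif) is the appropriate choice here; for a continuous target, f_regression or mutual_info_regression would take its place.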