# Comprehensive Guide on Feature Selection

Reference: https://www.kaggle.com/code/prashant111/comprehensive-guide-on-feature-selection/notebook

## 1. Remove constant/quasi-constant features
Use scikit-learn's `VarianceThreshold` transformer.<br>
sklearn.feature_selection.VarianceThreshold(threshold=0.0) # the default threshold is 0, which removes the features that have the same value in all samples.<br>
Set `threshold` to a larger value to also filter out low-variance (quasi-constant) features.<br>
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html <br>
**Example code:**

In [3]:
from sklearn.feature_selection import VarianceThreshold
X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
print(X)
selector = VarianceThreshold(threshold=0)
selector.fit_transform(X)

[[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]


array([[2, 0],
       [1, 4],
       [1, 1]])
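
The heading also mentions quasi-constant features, which the example above does not cover. The following is a minimal sketch, assuming boolean features and an illustrative cutoff of 80% (both the toy data and the cutoff are assumptions, not taken from the referenced notebook). For a Bernoulli feature the variance is p(1 - p), so a threshold of `0.8 * (1 - 0.8)` targets columns in which one value appears in more than 80% of the samples.

```python
from sklearn.feature_selection import VarianceThreshold

# Toy boolean data (assumed for illustration): column 0 is 0 in 5 of 6 rows
X_bool = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

# Variance of a Bernoulli feature is p * (1 - p)
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = selector.fit_transform(X_bool)

print(selector.variances_)     # per-column variance computed during fit
print(selector.get_support())  # boolean mask of the columns that were kept
print(X_reduced.shape)
```

Here `get_support()` makes it easy to check which columns survived instead of comparing arrays by eye.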

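## 2. Select by percentile: SelectPercentile
Keep the highest-scoring `percentile` percent of features, ranked by a univariate score function.<br>
sklearn.feature_selection.SelectPercentile(f_classif, percentile=10)<br>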
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html<br>

For regression tasks: f_regression, mutual_info_regression<br>
For classification tasks: chi2, f_classif, mutual_info_classif<br>
**Example code:**

In [7]:
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2
X, y = load_digits(return_X_y=True)
X.shape

(1797, 64)

In [8]:
X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
X_new.shape

(1797, 7)
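
The example above covers a classification task. For the regression score functions listed earlier, a comparable sketch might look like the following. The synthetic dataset from `make_regression`, its parameters, and the 25% cutoff are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectPercentile, f_regression

# Synthetic regression data (assumed for illustration):
# 20 features, of which 5 actually drive the target
X_reg, y_reg = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=0.1, random_state=0
)

# Keep the top 25% of features ranked by the univariate F-statistic
reg_selector = SelectPercentile(f_regression, percentile=25)
X_reg_new = reg_selector.fit_transform(X_reg, y_reg)

print(X_reg_new.shape)
print(reg_selector.get_support(indices=True))  # column indices that were kept
```

With 20 input features, a percentile of 25 keeps 5 columns.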

## 3. SelectKBest: select features according to the k highest scores
sklearn.feature_selection.SelectKBest(chi2, k=10)<br>
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html <br>

For regression tasks: f_regression, mutual_info_regression<br>
For classification tasks: chi2, f_classif, mutual_info_classif<br>
**Example code:**

In [9]:
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
X.shape

(1797, 64)

In [10]:
X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
X_new.shape

(1797, 20)
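
`fit_transform` returns only the reduced array, so it does not show which of the 64 pixel features were kept. One possible way to inspect the selection, sketched here with the same digits data, is to fit the selector first and then read `get_support` and `scores_`. The variable names below are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)

# Fit first so the selector object stays available for inspection
kbest = SelectKBest(chi2, k=20).fit(X, y)

kept = kbest.get_support(indices=True)  # sorted column indices of the kept features
X_new = kbest.transform(X)              # same columns, in their original order

print(kept)
print(kbest.scores_[kept])              # chi2 score of each kept feature
```

The kept indices can then be mapped back to column names when the data comes from a DataFrame.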