# Feature Selection

Reference: [5 Feature Selection Method from Scikit-Learn you should know](https://towardsdatascience.com/5-feature-selection-method-from-scikit-learn-you-should-know-ed4d116e4172)

### Table of Contents

* [Variance Threshold Feature Selection](#variance_threshold)
* [Univariate Feature Selection with SelectKBest](#univariate_feature)
* [Recursive Feature Elimination (RFE)](#RFE)
* [Feature Selection via SelectFromModel](#select_from_model)
* [Sequential Feature Selection (SFS)](#SFS)

In [1]:
import pandas as pd
import seaborn as sns

### Variance Threshold Feature Selection <a class="anchor" id="variance_threshold"></a>

A feature with high variance has values that are widely spread. A low variance means the values within the feature are similar, and zero variance means the feature has the same value in every row.

Variance Threshold feature selection looks only at the input features (X) and ignores the dependent variable (y). Because it never sees y, it fits naturally into __Unsupervised Modelling__; in Supervised Modelling it can remove constant or near-constant features, but it cannot judge how useful a feature is for predicting y.
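As a minimal sketch of the idea, the toy matrix below (made up for illustration, not taken from the mpg data) has one constant column. With the default threshold of 0, `VarianceThreshold` should drop only that column.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix (hypothetical data): column 1 is constant, columns 0 and 2 vary.
X_toy = np.array([
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 5.0, 30.0],
    [4.0, 5.0, 40.0],
])

# Per-column variance, computed directly with NumPy for reference.
print(X_toy.var(axis=0))

# The default threshold of 0.0 removes only zero-variance features.
vt = VarianceThreshold()
X_reduced = vt.fit_transform(X_toy)

print(vt.get_support())   # boolean mask of the features that were kept
print(X_reduced.shape)    # one column fewer than X_toy
```

Note that `y` never appears in this snippet, which is exactly why the method cannot tell a predictive feature from an irrelevant one.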

In [2]:
mpg = sns.load_dataset('mpg').select_dtypes('number')
mpg.head()

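| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 |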


We need to __transform__ all of these numerical features onto a common scale before we use Variance Threshold Feature Selection, because variance depends on the numerical scale (and units) of each feature.
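To see why scale matters, here is a small sketch using a synthetic weight column (the numbers are invented for illustration). Expressing the same quantity in grams instead of kilograms multiplies the variance by 1000 squared, even though the information content is identical.

```python
import numpy as np

# Synthetic weights in kilograms (hypothetical values, fixed seed for repeatability).
rng = np.random.default_rng(0)
weight_kg = rng.normal(loc=1500, scale=300, size=1000)

# The same measurements expressed in grams.
weight_g = weight_kg * 1000

var_kg = weight_kg.var()
var_g = weight_g.var()

# Variance scales with the square of the unit conversion factor.
print(var_g / var_kg)
```

A variance threshold applied to raw features would therefore favor whichever column happens to be recorded in the smallest unit.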

In [3]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [4]:
mpg = pd.DataFrame(scaler.fit_transform(mpg), columns = mpg.columns)
mpg.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
|---|---|---|---|---|---|---|---|
| 0 | -0.706439 | 1.498191 | 1.090604 | 0.664133 | 0.63087 | -1.295498 | -1.627426 |
| 1 | -1.090751 | 1.498191 | 1.503514 | 1.574594 | 0.854333 | -1.477038 | -1.627426 |
| 2 | -0.706439 | 1.498191 | 1.196232 | 1.184397 | 0.55047 | -1.658577 | -1.627426 |
| 3 | -0.962647 | 1.498191 | 1.061796 | 1.184397 | 0.546923 | -1.295498 | -1.627426 |
| 4 | -0.834543 | 1.498191 | 1.042591 | 0.924265 | 0.565841 | -1.840117 | -1.627426 |


With all the features on the same scale, let’s try to select only the features we want using the Variance Threshold method. 

In [5]:
from sklearn.feature_selection import VarianceThreshold
# Keep only the features whose variance is strictly greater than 1
selector = VarianceThreshold(threshold=1)

In [6]:
selector.fit(mpg)
mpg.columns[selector.get_support()]

Index(['weight'], dtype='object')

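Only the weight feature is selected by the Variance Threshold we set. Note that standardization gives every feature a variance of approximately 1, so which features exceed a threshold of 1 here is decided by floating-point rounding rather than by real differences in spread.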
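The effect of the scaler on variance can be checked directly. The sketch below uses synthetic columns (invented for illustration) and compares `StandardScaler` with `MinMaxScaler`. Min-max scaling is offered here only as one possible alternative, not as the method used in this notebook: it puts features on a common [0, 1] range while leaving their variances different, so a threshold can still discriminate between them.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Three synthetic features with very different raw scales (hypothetical data).
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(0, 1, 500),
    rng.normal(50, 10, 500),
    rng.uniform(0, 1000, 500),
])

# After standardization every column has variance ~1.
std_var = StandardScaler().fit_transform(X_demo).var(axis=0)

# After min-max scaling the columns share a range but keep distinct variances.
mm_var = MinMaxScaler().fit_transform(X_demo).var(axis=0)

print(std_var)
print(mm_var)
```

With min-max scaled data, a threshold would need to be chosen well below 1, since a variable confined to [0, 1] cannot have a variance above 0.25.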

### Univariate Feature Selection with SelectKBest<a class="anchor" id="univariate_feature"></a>



Univariate Feature Selection selects features based on a univariate statistical test, e.g., chi2, Pearson correlation, and many more.

It is intended for __Supervised Learning__.

__SelectKBest combines a univariate statistical test between each feature in X and y with keeping the K highest-scoring features.__
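The following sketch illustrates that premise on synthetic data where the target is built from two known columns. It uses `f_regression` as the scoring function instead of mutual information, purely so the scores are deterministic; the data and variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic regression data (hypothetical): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
n = 300
X_syn = rng.normal(size=(n, 4))
y_syn = 3.0 * X_syn[:, 0] - 2.0 * X_syn[:, 2] + rng.normal(scale=0.5, size=n)

# Score each feature against y independently, then keep the 2 best.
kbest = SelectKBest(score_func=f_regression, k=2)
kbest.fit(X_syn, y_syn)

print(kbest.scores_)        # one univariate score per feature
print(kbest.get_support())  # mask of the k highest-scoring features
```

Because each feature is scored on its own, this approach does not account for redundancy between features: two highly correlated columns can both end up in the top K.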

In [7]:
mpg = sns.load_dataset('mpg')
mpg = mpg.select_dtypes('number').dropna()

In [8]:
# Split the data into independent variables (X) and the dependent variable (y)
X = mpg.drop('mpg', axis=1)
y = mpg['mpg']

Select the features using SelectKBest with __mutual information regression__ as the scoring function:

In [9]:
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Select the top 2 features based on mutual info regression
selector = SelectKBest(mutual_info_regression, k = 2)
selector.fit(X, y)
X.columns[selector.get_support()]

Index(['displacement', 'weight'], dtype='object')

### Recursive Feature Elimination (RFE)<a class="anchor" id="RFE"></a>

RFE is a backward selection method.

RFE uses a machine learning model to select features by training it repeatedly and eliminating the least important feature(s) after each round.

__tl;dr__ RFE selects the top K features using any model that exposes a `coef_` or `feature_importances_` attribute after fitting (this covers most linear and tree-based models). RFE eliminates the least important features, then retrains the model, and repeats until only the K features you want remain.
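As a rough illustration of the elimination order, the sketch below runs RFE on synthetic data in which only two columns drive the label (the data is invented for this example). The `ranking_` attribute records when each feature was dropped: selected features get rank 1, and larger ranks were eliminated earlier.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic classification data (hypothetical): the label depends on columns 1 and 3.
rng = np.random.default_rng(0)
n = 400
X_syn = rng.normal(size=(n, 5))
logits = 2.5 * X_syn[:, 1] - 2.0 * X_syn[:, 3]
y_syn = (logits + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Remove one feature per round (step=1) until 2 remain.
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2, step=1)
rfe.fit(X_syn, y_syn)

print(rfe.support_)  # mask of the features that survived
print(rfe.ranking_)  # 1 = selected; higher numbers were eliminated earlier
```

Since the importance here comes from coefficient magnitudes, features on very different scales may be ranked unfairly; scaling the inputs first is a common precaution.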

In [10]:
# Load the dataset and select only the numerical features for example purposes
titanic = sns.load_dataset('titanic')[['survived', 'pclass', 'age', 'parch', 'sibsp', 'fare']].dropna()
X = titanic.drop('survived', axis = 1)
y = titanic['survived']

In [11]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Select the most important features according to Logistic Regression
rfe_selector = RFE(estimator = LogisticRegression(), n_features_to_select = 2, step = 1)
rfe_selector.fit(X, y)
X.columns[rfe_selector.get_support()]

Index(['pclass', 'parch'], dtype='object')

### Feature Selection via SelectFromModel<a class="anchor" id="select_from_model"></a>
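Similar to RFE in that it relies on a model's importance attribute, but the model is fitted once rather than recursively.

The difference is that SelectFromModel keeps the features whose __importance attribute__ (often `coef_` or `feature_importances_`, but it could come from any callable) reaches a __threshold__, instead of keeping a fixed number of features.

By default, for most estimators the threshold is __the mean__ of the feature importances.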
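The thresholding rule can be reproduced by hand. The sketch below, on synthetic data invented for this example, fits `SelectFromModel` with a default `LogisticRegression` and then rebuilds the mask from the absolute coefficients and their mean. It assumes a binary problem, where `coef_` has a single row.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic classification data (hypothetical): the label depends on columns 1 and 3.
rng = np.random.default_rng(0)
n = 400
X_syn = rng.normal(size=(n, 5))
logits = 2.5 * X_syn[:, 1] - 2.0 * X_syn[:, 3]
y_syn = (logits + rng.normal(scale=0.5, size=n) > 0).astype(int)

sfm = SelectFromModel(estimator=LogisticRegression())
sfm.fit(X_syn, y_syn)

# For a binary logistic regression, importance is the absolute coefficient.
importances = np.abs(sfm.estimator_.coef_).ravel()

# Rebuild the selection: keep features at or above the mean importance.
manual_mask = importances >= importances.mean()

print(sfm.threshold_)      # threshold actually used by the selector
print(manual_mask)
print(sfm.get_support())
```

Unlike RFE, the number of selected features is not fixed in advance; it depends on how many importances clear the threshold.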

In [25]:
from sklearn.feature_selection import SelectFromModel

# Select the most important features according to Logistic Regression using SelectFromModel
sfm_selector = SelectFromModel(estimator = LogisticRegression())
sfm_selector.fit(X, y)
X.columns[sfm_selector.get_support()]

Index(['pclass'], dtype='object')

Using SelectFromModel, we found out that only one feature passed the threshold: the ‘pclass’ feature.

### Sequential Feature Selection (SFS)<a class="anchor" id="SFS"></a>
Forward or backward selection.

SFS is a __greedy__ algorithm that finds the best features by going either forward or backward, based on the __cross-validation score__ of an estimator.

SFS-Forward starts with zero features and finds the single feature that maximizes the cross-validated score when a machine learning model is trained on that feature alone. Once that first feature is selected, the procedure is repeated, each time adding the one new feature that improves the score the most. The procedure stops when the desired number of features is reached.

SFS-Backward follows the same idea but works in the opposite direction: it starts with all the features and greedily removes them one at a time until it reaches the desired number of features.
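The forward procedure described above can be sketched as a short loop. The helper `forward_select` below is hypothetical (it is not part of scikit-learn) and is meant only to make the greedy steps explicit; the synthetic data is likewise invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def forward_select(estimator, X, y, n_features, cv=5):
    """Greedy forward selection (hypothetical helper, not a scikit-learn API)."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = {}
        for j in remaining:
            # Score the current selection plus one candidate feature.
            cols = selected + [j]
            scores[j] = cross_val_score(estimator, X[:, cols], y, cv=cv).mean()
        # Keep the candidate with the best mean cross-validated score.
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic classification data (hypothetical): the label depends on columns 1 and 3.
rng = np.random.default_rng(0)
n = 400
X_syn = rng.normal(size=(n, 5))
logits = 2.5 * X_syn[:, 1] - 2.0 * X_syn[:, 3]
y_syn = (logits + rng.normal(scale=0.5, size=n) > 0).astype(int)

chosen = forward_select(LogisticRegression(), X_syn, y_syn, n_features=2)
print(chosen)  # column indices in the order they were added
```

In practice `SequentialFeatureSelector` does this for you and also supports the backward direction; the loop is only a way to see why the method is more expensive than RFE, since every candidate requires a full round of cross-validation.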

In [40]:
from sklearn.feature_selection import SequentialFeatureSelector

# Select the best features according to Logistic Regression, removing one at a time (backward)
sfs_selector = SequentialFeatureSelector(estimator = LogisticRegression(), n_features_to_select = 3, cv = 10, direction = 'backward')
sfs_selector.fit(X, y)
X.columns[sfs_selector.get_support()]