# Powerful Feature Selection with Recursive Feature Elimination (RFE) of Sklearn
## Feature selection based on single model performance
<img src='images/unsplash.jpg'></img>
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@victoriano?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Victoriano Izquierdo</a>
        on 
        <a href='https://unsplash.com/s/photos/selection?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash</a>
    </strong>
</figcaption>

### Setup

In [1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

warnings.filterwarnings("ignore")

import datetime
import time

The basic methods of feature selection mostly look at individual properties of features and how features interact with each other. [*Variance thresholding*](https://towardsdatascience.com/how-to-use-variance-thresholding-for-robust-feature-selection-a4503f2b5c3f?source=your_stories_page-------------------------------------) and [*pairwise feature selection*](https://towardsdatascience.com/how-to-use-pairwise-correlation-for-robust-feature-selection-20a60ef7d10?source=your_stories_page-------------------------------------) are two examples: they remove unnecessary features based on the amount of variance and on the correlation between features, respectively. However, a more pragmatic approach is to select features based on how they affect a particular model's performance. One such technique offered by Sklearn is Recursive Feature Elimination (RFE). It reduces model complexity by removing the least important features one by one until the desired number of features is left.
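
As a quick preview, here is a minimal sketch of what that looks like in code. It uses synthetic data from `make_regression` and a plain `LinearRegression` rather than the dataset and model used in this article, so the names `X_demo`, `y_demo`, and `rfe_demo` and all parameter values are assumptions made purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): 10 features, only 3 informative
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=0.1, random_state=0
)

# Refit the model and drop one feature per iteration until 3 remain
rfe_demo = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
rfe_demo.fit(X_demo, y_demo)

print(rfe_demo.support_)  # boolean mask: True for the features that were kept
print(rfe_demo.ranking_)  # 1 for kept features; larger values were dropped earlier
```

The `support_` mask can be used to subset the columns of the original data, while `ranking_` records the order in which features were discarded.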

### The idea behind Recursive Feature Elimination

Consider this subset of the [Ansur Male dataset](https://www.kaggle.com/seshadrikolluri/ansur-ii):

In [6]:
ansur = pd.read_csv("data/ansur_male.csv", encoding="latin").select_dtypes(
    include="number"
)
ansur.iloc[:, -7:].head()

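| | wristcircumference | wristheight | SubjectNumericRace | DODRace | Age | Heightin | Weightlbs |
|---|---|---|---|---|---|---|---|
| 0 | 175 | 853 | 1 | 1 | 41 | 71 | 180 |
| 1 | 167 | 815 | 1 | 1 | 35 | 68 | 160 |
| 2 | 180 | 831 | 2 | 2 | 42 | 68 | 205 |
| 3 | 176 | 793 | 1 | 1 | 31 | 66 | 175 |
| 4 | 188 | 954 | 2 | 2 | 21 | 77 | 213 |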


It records more than 100 different types of body measurements of more than 6000 US Army personnel. Our goal is to predict the weight in pounds using only the numeric features (there are 98) for simplicity.

Let's establish a base performance with Random Forest Regressor. We will first build the feature and target arrays and divide them into train and test sets. Then, we will fit the estimator and score its performance using R-squared:

In [52]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Feature, target arrays
X, y = ansur.iloc[:, :-1], ansur.iloc[:, -1]

# Train/test set generation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1121218
)

# Scale the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Keep the targets as 1-D arrays, which is what the estimator expects
y_train = y_train.values
y_test = y_test.values

# Init, fit, and score the Random Forest Regressor
forest = RandomForestRegressor()
_ = forest.fit(X_train_std, y_train)
forest.score(X_test_std, y_test)

0.9485677708139042

We achieved a really good R-squared of 0.949. However, we did so using all 98 features, which is far more than we might need. Many Sklearn estimators expose the fitted feature weights through a special attribute: `coef_` for linear models and `feature_importances_` for tree-based models. Let's see the computed feature importances for our Random Forest Regressor model:

In [58]:
pd.DataFrame(
    zip(X_train.columns, abs(forest.feature_importances_)),
    columns=["feature", "weight"],
).sort_values("weight").reset_index(drop=True)

| | feature | weight |
|---|---|---|
| 0 | Heightin | 0.000097 |
| 1 | suprasternaleheight | 0.000170 |
| 2 | crotchheight | 0.000180 |
| 3 | DODRace | 0.000182 |
| 4 | cervicaleheight | 0.000212 |
| ... | ... | ... |
| 93 | forearmforearmbreadth | 0.001464 |
| 94 | elbowrestheight | 0.001519 |
| 95 | forearmcircumferenceflexed | 0.003284 |
| 96 | bideltoidbreadth | 0.005946 |


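To reduce model complexity, a natural first step is to remove features whose weights are close to 0. In a linear model, each weight is multiplied by the value of its feature, so a very small weight contributes very little to the overall prediction. Similarly, in a tree-based model, an importance close to 0 means the feature is rarely used for splits that reduce the error. Looking at the above weights, we can see that many of them are close to 0.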
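
To see what near-zero weights look like in both kinds of model, here is a small sketch on made-up data. The arrays `X_toy` and `y_toy`, the coefficients 5.0 and -3.0, and the model names are all hypothetical choices for illustration and are not taken from the Ansur data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 4))

# The target depends only on columns 0 and 1; columns 2 and 3 are pure noise
y_toy = 5.0 * X_toy[:, 0] - 3.0 * X_toy[:, 1] + rng.normal(scale=0.1, size=500)

linear = LinearRegression().fit(X_toy, y_toy)
forest_toy = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_toy, y_toy)

# Linear models expose signed weights through coef_
print(np.round(linear.coef_, 3))

# Tree ensembles expose non-negative importances that sum to 1
print(np.round(forest_toy.feature_importances_, 3))
```

In both printouts, the last two entries should come out close to 0, which marks those columns as candidates for removal.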

We could set a low threshold and filter out features based on it. But we have to remember that removing even a single feature forces the other weights to change. So, we have to eliminate features step by step: fit the model, drop the feature with the lowest weight, and refit on what remains. Doing this manually for 98 features would be cumbersome, but thankfully Sklearn provides the `RFE` class to do the task.
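
To make the step-by-step elimination concrete, here is a rough hand-written sketch of the loop. It is only an illustration of the idea under simplifying assumptions, not Sklearn's actual implementation: it uses synthetic data, a `LinearRegression` model, and hypothetical names such as `X_syn`, `remaining`, and `n_to_keep`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): 8 features, 3 of them informative
X_syn, y_syn = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=1.0, random_state=42
)

remaining = list(range(X_syn.shape[1]))  # column indices still in the model
n_to_keep = 3  # hypothetical target number of features

while len(remaining) > n_to_keep:
    # Refit on the surviving columns, because weights change after each removal
    model = LinearRegression().fit(X_syn[:, remaining], y_syn)

    # Position (within `remaining`) of the smallest absolute weight
    weakest = int(np.argmin(np.abs(model.coef_)))

    # Drop that column and repeat
    remaining.pop(weakest)

print(remaining)  # indices of the columns that survived
```

Each pass refits the model before deciding what to drop, which is exactly the part that a one-shot threshold would miss.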