# Recursive Feature Elimination
Recursive feature elimination is another wrapper method for feature selection. It __starts by training a model with all available features__. It then __ranks each feature according to an importance metric__ and __removes the least important feature__. The algorithm then __trains the model on the smaller feature set__, ranks those features, and removes the least important one. The process stops when the desired number of features is reached.
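The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not scikit-learn's implementation. It makes several assumptions:

- Coefficient magnitudes of a logistic regression serve as the importance metric.
- Exactly one feature is removed per round.
- The data is synthetic, generated with `make_classification`.

The function name `recursive_feature_elimination` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def recursive_feature_elimination(X, y, n_features_to_select):
    """Hypothetical helper: drop the feature with the smallest coefficient magnitude until n remain."""
    remaining = list(range(X.shape[1]))  # column indices still in the model
    eliminated = []  # column indices in the order they were removed
    while len(remaining) > n_features_to_select:
        # Retrain on the current (smaller) feature set
        model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
        # Importance = coefficient magnitude (norm across classes; equals |coef| for binary y)
        importances = np.linalg.norm(model.coef_, axis=0)
        weakest = remaining[int(np.argmin(importances))]
        remaining.remove(weakest)
        eliminated.append(weakest)
    # Rank 1 = kept; the last feature removed gets rank 2, the first removed gets the highest rank
    ranking = np.ones(X.shape[1], dtype=int)
    for rank, idx in enumerate(reversed(eliminated), start=2):
        ranking[idx] = rank
    return remaining, ranking


# Synthetic data stands in for a real dataset (assumption)
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)
X = StandardScaler().fit_transform(X)
selected, ranking = recursive_feature_elimination(X, y, n_features_to_select=3)
print("kept columns:", selected)
print("ranking:", ranking)
```

The model is retrained after every removal, because coefficients can shift once a correlated feature is gone.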

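For linear models such as linear or logistic regression, features are typically ranked by the absolute value of their coefficients. The feature with the smallest-magnitude coefficient is the first to be removed.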
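A single ranking step for a regression model might look like the sketch below. It assumes synthetic data from `make_regression`, whose features all share the same standard-normal scale, so their coefficient magnitudes are directly comparable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data: 5 features on the same (standard normal) scale, only 2 of which affect y
X, y, true_coef = make_regression(
    n_samples=100, n_features=5, n_informative=2, noise=1.0, coef=True, random_state=0
)
model = LinearRegression().fit(X, y)

# Rank features by |coefficient|, most important first
importance = np.abs(model.coef_)
order = np.argsort(importance)[::-1]
print("true coefficients:     ", np.round(true_coef, 2))
print("estimated coefficients:", np.round(model.coef_, 2))
print("most to least important:", order)
print("first feature RFE would remove:", order[-1])
```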

Note that you may __need to standardize data before__ running recursive feature elimination. This matters most when features are ranked by coefficient size, as in regression models. A coefficient's magnitude depends on the units of its feature, so a feature measured on a large scale can look unimportant simply because its coefficient is small.
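To see why scale matters, the sketch below expresses one feature in different units (multiplied by 1000) and compares ordinary least-squares coefficients before and after standardization. The data is synthetic and both features have the same true effect on `y`; the setup is an illustrative assumption, not taken from the dataset used later.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Both features have the same true effect on y
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Express feature 0 in different units (e.g. grams instead of kilograms)
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000

raw = LinearRegression().fit(X_rescaled, y).coef_
scaled = LinearRegression().fit(StandardScaler().fit_transform(X_rescaled), y).coef_
print("raw coefficients:", np.round(raw, 4))  # feature 0 looks about 1000x less important
print("standardized coefficients:", np.round(scaled, 4))  # comparable again
```

Without standardization, ranking by coefficient size would remove feature 0 first even though it matters as much as feature 1.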

Note that recursive feature elimination is different from sequential backward selection. Sequential backward selection removes features by training a model on a collection of subsets (one for each possible feature removal) and greedily proceeding with whatever subset performs best. Recursive feature elimination, on the other hand, only trains a model on one feature subset before deciding which feature to remove next.

This is one advantage of recursive feature elimination. It trains just one model per feature removal and relies on that model's own importance scores, rather than evaluating every candidate subset. As a result, it can be much faster than the sequential selection methods we've covered.
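A rough count of model fits makes the difference concrete. The sketch makes these assumptions:

- One feature is removed per round.
- Cross-validation is ignored; scoring each subset with k-fold CV would multiply the sequential count by k.
- Any final refit is ignored.

`sbs_fits` and `rfe_fits` are hypothetical helper names.

```python
def sbs_fits(n_features, n_target):
    """Models trained by sequential backward selection (one per candidate removal)."""
    # With m features left, SBS trains a model on each of the m subsets of size m - 1
    return sum(m for m in range(n_target + 1, n_features + 1))


def rfe_fits(n_features, n_target):
    """Models trained by RFE when it removes one feature per round (one fit per round)."""
    return n_features - n_target


print(sbs_fits(9, 3), rfe_fits(9, 3))  # 39 6
```

The example later in this section goes from 9 features to 3. Under these assumptions, sequential backward selection would train 39 models and recursive feature elimination 6. The gap grows roughly quadratically with the number of features.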

## Recursive Feature Elimination with scikit-learn

We can use scikit-learn to implement recursive feature elimination. Since we're using a logistic regression model, whose coefficients determine the feature ranking, it's important to standardize the data before we proceed.

We can standardize features using scikit-learn’s `StandardScaler()`.
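As a quick illustration of what `StandardScaler()` does, the toy array below (made up for this sketch) is standardized column by column. Each column has its mean subtracted and is divided by its population standard deviation, so both columns end up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_std = StandardScaler().fit_transform(X)

# Same result by hand: subtract each column's mean, divide by its (population) standard deviation
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.round(X_std, 4))
print(np.allclose(X_std, X_manual))  # True
```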

Once the data is standardized, you can train the model and perform recursive feature elimination using `RFE()` from scikit-learn. As with the sequential feature selection methods, you pass a scikit-learn model as the `estimator` parameter (here, `lr`, our logistic regression model). `n_features_to_select` is self-explanatory: set it to the number of features you want to keep.

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

# Load the data (all columns but the last are features; the last is the target)
health = pd.read_csv("dataR2.csv")
X = np.array(health.iloc[:, :-1])
y = np.array(health.iloc[:, -1])

# Standardize the data
X = StandardScaler().fit_transform(X)

# Logistic regression model
lr = LogisticRegression(max_iter=1000)

# Recursive feature elimination: keep the 3 highest-ranked features
rfe = RFE(lr, n_features_to_select=3)
rfe.fit(X, y)
```

## Evaluating the Result of Recursive Feature Elimination

You can inspect the results of recursive feature elimination by looking at `rfe.ranking_` and `rfe.support_`.

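`rfe.ranking_` is an array that contains __the rank of each feature__. Selected features have rank 1. The other features are numbered in reverse order of elimination: the feature removed last has rank 2, and the feature removed first has the highest rank.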

```python
print(rfe.ranking_)
```

```
[4 1 1 2 3 5 7 1 6]
```
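The ranking can be read back into an elimination order. The sketch below hard-codes the array printed above (copied from that output, not recomputed) and works with column indices only.

```python
import numpy as np

# The ranking printed above
ranking = np.array([4, 1, 1, 2, 3, 5, 7, 1, 6])

kept = np.flatnonzero(ranking == 1)
# Higher rank = eliminated earlier, so sort the remaining columns by descending rank
dropped = np.flatnonzero(ranking > 1)
elimination_order = dropped[np.argsort(-ranking[dropped])]
print("kept column indices:", kept)  # [1 2 7]
print("elimination order:", elimination_order)  # [6 8 5 0 4 3]
```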


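`rfe.support_` is a boolean mask over the features. It is `True` for each selected feature, which is exactly where `rfe.ranking_` equals 1.

```python
print(rfe.support_)
```

```
[False  True  True False False False False  True False]
```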
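Because `support_` is a boolean mask, it can index the feature columns directly, and `rfe.transform()` applies the same selection. The sketch uses synthetic data from `make_classification` as a stand-in for the health data (an assumption), with 9 features and 3 kept to mirror the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the health data: 9 features, 3 to keep
X, y = make_classification(n_samples=150, n_features=9, n_informative=3, random_state=1)
X = StandardScaler().fit_transform(X)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

# support_ is a boolean mask, so it can select columns directly
X_selected = X[:, rfe.support_]
print(X_selected.shape)  # (150, 3)
# rfe.transform applies the same column selection
print(np.array_equal(X_selected, rfe.transform(X)))  # True
# support_ is True exactly where ranking_ == 1
print(np.array_equal(rfe.support_, rfe.ranking_ == 1))  # True
```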


To see the selected features by name, save the column names before standardizing, because `StandardScaler` returns a NumPy array without them. Then pair the names with `rfe.support_`.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

# Load the data
health = pd.read_csv("dataR2.csv")
X = health.iloc[:, :-1]
y = health.iloc[:, -1]

# Save the feature names before standardizing (the scaler returns a NumPy array)
feature_list = list(X.columns)

# Standardize the data
X = StandardScaler().fit_transform(X)

# Logistic regression
lr = LogisticRegression(max_iter=1000)

# Recursive feature elimination
rfe = RFE(estimator=lr, n_features_to_select=3)
rfe.fit(X, y)

# Names of the features chosen by recursive feature elimination
rfe_features = [f for (f, support) in zip(feature_list, rfe.support_) if support]
print(rfe_features)

# Accuracy of the model fitted on the selected features, measured on its own training data
# (an optimistic estimate of how it would perform on new data)
print(rfe.score(X, y))
```

```
['BMI', 'Glucose', 'Resistin']
0.7327586206896551
```
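The score above is computed on the same rows used for fitting, so it tends to overestimate performance on unseen data. One common alternative is to wrap standardization and RFE in a pipeline and cross-validate it. Standardization and feature selection are then refit inside each training fold, so the held-out fold never influences which features are chosen. The sketch uses synthetic data from `make_classification` as an assumed stand-in; with the real data you would pass the unscaled `X` and `y` instead. Because `RFE` exposes its fitted estimator's `score`, it can be the final pipeline step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (unscaled) health data: 9 features
X, y = make_classification(n_samples=120, n_features=9, n_informative=3, random_state=0)

# Scaling and feature elimination are refit on each training fold only
pipe = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=3),
)
scores = cross_val_score(pipe, X, y, cv=5)  # accuracy on each held-out fold
print(scores.mean(), scores.std())
```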
