<b>Recursive Feature Elimination (RFE) is a feature selection method commonly used in machine learning. It repeatedly fits a model, ranks the features by their importance to that model, and removes the least important ones. The process continues until a predetermined number of features remains or until a performance metric reaches a chosen threshold.</b>

Here's a basic outline of how Recursive Feature Elimination works:

<b>Train a Model:</b>

*   Train a model on the original set of features.

<b>Rank Features:</b>

*   Evaluate the importance of each feature in the model.

<b>Remove Features:</b>

*   Remove the least important feature(s).

<b>Repeat:</b>

*   Repeat the three steps above (train, rank, remove) on the reduced set of features until a stopping criterion is met.

<b>Evaluate:</b>

*   Assess the model performance using a predefined metric.

<b>Stop Criterion:</b>

*   Decide when to stop the recursive elimination process (e.g., reaching a
    specific number of features, achieving a certain level of performance).
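
The outline above can be sketched as a short loop. This is a minimal illustration rather than scikit-learn's own implementation: the helper name `manual_rfe` is hypothetical, logistic regression is an assumed choice of model, and importance is taken to be the absolute value of each coefficient. It stops at a fixed number of features and removes one feature per round.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def manual_rfe(X, y, n_features_to_select):
    """Hypothetical helper: drop the weakest feature each round."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Train a model on the current feature subset.
        model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
        # Rank features by the absolute size of their coefficient.
        importance = np.abs(model.coef_).ravel()
        # Remove the least important feature, then repeat.
        remaining.pop(int(np.argmin(importance)))
    return remaining


X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put coefficients on a comparable scale
kept = manual_rfe(X, y, n_features_to_select=5)
print(sorted(kept))  # indices of the five surviving columns
```

Scaling the features first matters here because coefficient magnitudes are only comparable when the inputs share a scale.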

*RFE works with any estimator that exposes a per-feature importance after fitting, through either `coef_` or `feature_importances_`. Common choices are linear models (like linear regression or support vector machines with a linear kernel) and tree-based models (like decision trees or random forests). The choice of algorithm depends on the nature of the data and the problem you're trying to solve.*
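
As an illustration of the tree-based option, the sketch below passes a random forest to scikit-learn's `RFE`. The settings are assumptions chosen to keep the run short (50 trees, five features removed per round); they are not values used elsewhere in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_breast_cancer(return_X_y=True)

# Tree ensembles expose feature_importances_, so no scaling is needed here.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# step=5 removes five features per round instead of one, which cuts the number of refits.
rf_selector = RFE(forest, n_features_to_select=5, step=5).fit(X, y)

print(rf_selector.n_features_)  # number of features kept
print(rf_selector.support_)     # boolean mask of the kept columns
```

A different estimator can select a different subset, so the result need not match the one obtained with a linear model.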

## Load the breast cancer dataset from sklearn

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
import numpy as np
import pandas as pd


In [None]:
cancer = load_breast_cancer()

data = pd.DataFrame(cancer.data,columns=cancer.feature_names)
data['Target'] = cancer.target
data.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,Target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


## Separate the input features (x) and the target (y)

In [None]:
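from sklearn.preprocessing import StandardScaler

# Split the DataFrame into the feature matrix and the target column
x = data.drop(['Target'], axis=1)
y = data['Target']

# Standardize the features; fit_transform returns a NumPy array, so column names are lost
scaler = StandardScaler()
x = scaler.fit_transform(x)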

In [None]:
x.shape

(569, 30)

## Train the model

In [None]:
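# A linear kernel exposes coef_, which RFE uses to rank features.
# Note: SVR is a regressor, so the 0/1 target is treated as a numeric value here.
estimator = SVR(kernel="linear")
# Keep 5 features, removing one feature per iteration
selector = RFE(estimator, n_features_to_select=5, step=1)
selector = selector.fit(x, y)
# Boolean mask over the 30 columns: True marks a selected feature
selector.support_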

array([False, False,  True,  True, False, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False,  True, False, False, False,
       False, False, False])

## Feature ranking

In [None]:
selector.ranking_

array([ 7, 15,  1,  1, 23,  6,  1, 11, 25, 19,  4, 20, 13,  3, 16, 24,  2,
        8, 12, 18,  1, 21, 14,  1,  9,  5, 22, 26, 17, 10])
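
To read this array: rank 1 marks a selected feature, and with `step=1` larger ranks mean earlier elimination. The sketch below copies the ranking printed above into a plain NumPy array (so it runs on its own) and pulls out the selected columns and the first column to be dropped. The variable names are illustrative.

```python
import numpy as np

# Ranking copied from the selector.ranking_ output above.
ranking = np.array([7, 15, 1, 1, 23, 6, 1, 11, 25, 19, 4, 20, 13, 3, 16, 24, 2,
                    8, 12, 18, 1, 21, 14, 1, 9, 5, 22, 26, 17, 10])

# Columns with rank 1 are the ones RFE kept.
selected = np.flatnonzero(ranking == 1)

# With one feature removed per round, the highest rank was eliminated first.
first_dropped = int(np.argmax(ranking))

# Column indices ordered from most to least important according to RFE.
order = np.argsort(ranking, kind="stable")

print(selected)       # column indices with rank 1
print(first_dropped)  # column index removed in the first round
```

The rank-1 indices agree with the `True` positions in `selector.support_`.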

In [None]:
selector.get_feature_names_out()

array(['x2', 'x3', 'x6', 'x20', 'x23'], dtype=object)
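
The generic names `x2`, `x3`, ... appear because the scaler returned a NumPy array, so the selector never saw the column names. One way to recover readable names is to index the dataset's `feature_names` with the support mask. The sketch below rebuilds that mask from the indices shown above so it can run on its own; in the notebook, `selector.support_` would take its place.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

feature_names = load_breast_cancer().feature_names  # NumPy array of 30 names

# Mask rebuilt from the selected indices above (stand-in for selector.support_).
support = np.zeros(len(feature_names), dtype=bool)
support[[2, 3, 6, 20, 23]] = True

# Boolean indexing keeps only the names of the selected columns.
selected_names = feature_names[support]
print(list(selected_names))
# ['mean perimeter', 'mean area', 'mean concavity', 'worst radius', 'worst area']
```

Passing the names directly, as in `selector.get_feature_names_out(cancer.feature_names)`, should give the same result, assuming the installed scikit-learn version accepts `input_features` for array input.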