## Recursive Feature Elimination (RFE)

Recursive Feature Elimination (RFE) is a feature selection technique that identifies the most important features for a machine learning model. It repeatedly removes the least important features from the dataset until the desired number of features remains. The steps are:

1. **Initialization**:
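   - The process begins by initializing a machine learning model, usually a classifier or regressor, that serves as the estimator for ranking features. The estimator must expose feature weights or importances after fitting (in scikit-learn, a `coef_` or `feature_importances_` attribute). The code below uses an SVM (Support Vector Machine) classifier with a linear kernel.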

2. **Ranking Features**:
   - RFE starts with all features in the dataset. The estimator (here, the SVM) is trained on the full feature set, and an importance score is computed for each feature; for a linear SVM, the score is derived from the learned weights. These scores are used to rank the features.

3. **Feature Elimination**:
   - The feature with the lowest importance score is removed from the dataset. In scikit-learn's `ranking_` output, eliminated features receive a rank greater than 1, and features removed earlier receive larger rank values.

4. **Iteration**:
   - The process repeats. In each iteration, the model is retrained on the reduced dataset (one fewer feature by default; scikit-learn's `step` parameter controls how many are removed per iteration), the importance scores are recomputed, and the lowest-scoring feature is removed.

5. **Stopping Criterion**:
   - The iterations continue until the desired number of features remains. In the code below, `n_features_to_select` is set to 2, so the algorithm stops when two features are left.

6. **Result**:
   - At the end of the process, the algorithm provides two pieces of information:
     - **Selected Features**: These are the features that have not been eliminated and are considered the most important for the model.
     - **Feature Rankings**: Each feature is assigned a rank. Selected features have rank 1; eliminated features have rank 2, 3, and so on, where a larger rank means the feature was eliminated earlier.
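
The steps above can be sketched as a short manual loop. This is an illustrative approximation, not scikit-learn's actual implementation: the function name `manual_rfe` is hypothetical, and scoring each feature by the sum of squared linear-SVM weights across class pairs is an assumption chosen to mirror how a multi-row `coef_` is commonly reduced to one score per feature.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC


def manual_rfe(X, y, n_features_to_select):
    """Illustrative backward elimination (hypothetical helper, one feature per round)."""
    n_features = X.shape[1]
    support = np.ones(n_features, dtype=bool)   # True while a feature is still kept
    ranking = np.ones(n_features, dtype=int)    # kept features end with rank 1

    while support.sum() > n_features_to_select:
        remaining = np.flatnonzero(support)
        # Retrain on the features that are still in play
        estimator = SVC(kernel="linear").fit(X[:, remaining], y)
        # coef_ has one row per pair of classes; sum squared weights per feature
        importance = (estimator.coef_ ** 2).sum(axis=0)
        # Drop the feature with the smallest score
        weakest = remaining[np.argmin(importance)]
        support[weakest] = False
        # Every eliminated feature moves down one rank each round,
        # so the first feature removed ends with the largest rank value
        ranking[~support] += 1

    return support, ranking


X, y = load_iris(return_X_y=True)
support, ranking = manual_rfe(X, y, n_features_to_select=2)
print("support:", support)
print("ranking:", ranking)
```

The rank bookkeeping is the part worth noticing: ranks are not importance scores, they only record the order in which features were eliminated.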

RFE is a backward feature selection technique, meaning it starts with all features and progressively removes the least important ones. This process is based on the idea that removing irrelevant or redundant features can lead to a simpler and more interpretable model while preserving or even improving predictive performance.

The selected features can be used to train a final model with a reduced feature set, which may result in better model generalization and improved efficiency, especially when dealing with high-dimensional datasets.
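
Whether a reduced feature set actually helps depends on the data, so it is worth checking. The sketch below compares cross-validated accuracy with all features against a pipeline that runs RFE before the final classifier. Wrapping RFE in a pipeline, using 5 folds, and reusing a linear SVC as the final model are assumptions made for this illustration; placing RFE inside the pipeline keeps feature selection within each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Baseline: linear SVM on all four features
full_model = SVC(kernel="linear")

# Reduced: select two features with RFE, then fit a linear SVM on them
reduced_model = make_pipeline(
    RFE(estimator=SVC(kernel="linear"), n_features_to_select=2),
    SVC(kernel="linear"),
)

# 5-fold cross-validated accuracy for each setup
full_scores = cross_val_score(full_model, X, y, cv=5)
reduced_scores = cross_val_score(reduced_model, X, y, cv=5)

print(f"All features:     {full_scores.mean():.3f}")
print(f"Two RFE features: {reduced_scores.mean():.3f}")
```

If the two scores are close, the smaller model may be preferable for its simplicity; if the reduced score drops noticeably, a larger `n_features_to_select` is worth trying.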

In [1]:
# Import necessary libraries
# This example uses a linear SVC as the estimator

from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.datasets import load_iris

In [2]:
# Load a sample dataset (Iris dataset)
data = load_iris()
# Feature matrix (X) and target vector (y)
X = data.data
y = data.target

In [3]:
# Create an SVM classifier as the initial model
model = SVC(kernel="linear")

In [28]:
# Create an RFE model with the SVM classifier and specify the number of features to select
n_features_to_select = 2
rfe = RFE(estimator=model, n_features_to_select=n_features_to_select)

In [29]:
# Fit the RFE model to the data
rfe.fit(X, y)

In [30]:
# Get the ranking of features (1 means selected; larger values mean eliminated earlier)
feature_ranking = rfe.ranking_
feature_ranking

array([3, 2, 1, 1])

In [31]:
# Get the support for selected features (True for selected features, False for non-selected)
selected_features = rfe.support_
selected_features

array([False, False,  True,  True])

In [32]:
# Print the results
print("Selected Features:")
for i, feature in enumerate(data.feature_names):
    if selected_features[i]:
        print(f"{feature} (Rank: {feature_ranking[i]})")

Selected Features:
petal length (cm) (Rank: 1)
petal width (cm) (Rank: 1)
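
Once RFE is fitted, the selection can be applied to the data directly. The following self-contained sketch refits the same setup and uses `transform` and `get_support` to build the reduced feature matrix and recover the selected column names; the variable names (`X_reduced`, `selected_names`, `order`) are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

data = load_iris()
X, y = data.data, data.target

rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=2).fit(X, y)

# Keep only the selected columns
X_reduced = rfe.transform(X)

# Map the selected column indices back to feature names
selected_idx = rfe.get_support(indices=True)
selected_names = [data.feature_names[i] for i in selected_idx]
print("Reduced shape:", X_reduced.shape)
print("Selected:", selected_names)

# List every feature in order of rank (rank 1 first)
order = np.argsort(rfe.ranking_, kind="stable")
for i in order:
    print(f"rank {rfe.ranking_[i]}: {data.feature_names[i]}")
```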
