<a href="https://colab.research.google.com/github/asifahsaan/data-preprocessing-beginners/blob/main/notebooks/10_feature_selection_wrapper_methods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 10 — Feature Selection: Wrapper Methods
In this notebook, we'll explore **wrapper-based feature selection**, where feature subsets are chosen according to how well a model performs on them.

We'll cover:
- What wrapper methods are and how they differ
- Recursive Feature Elimination (RFE)
- RFE + Cross-Validation (RFECV)
- Pipeline integration for clean workflows

## 1. What Are Wrapper Methods?
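**Wrapper methods** use a predictive model to evaluate which feature subsets work best. Filter methods score each feature with a statistic that is independent of any model. Wrapper methods instead train a model on candidate subsets, so they account for feature interactions and for how the chosen model actually behaves.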

Common techniques:
- `RFE` (Recursive Feature Elimination): Repeatedly fits the model and removes the least important features, as ranked by the model's coefficients or feature importances
- `RFECV`: Combines RFE with cross-validation to choose the best number of features

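⚠️ **Note**: Wrapper methods are more computationally expensive than filter methods, because the model is refit many times (once per elimination step, and once per fold when cross-validation is used).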
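
To make the elimination idea concrete, here is a minimal hand-rolled sketch of the loop that `RFE` automates. It is illustrative only, not scikit-learn's implementation: the target size `n_keep`, the up-front standardization, and all variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Assumption: standardize so coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(data.data)
y_target = data.target

remaining = list(range(X_scaled.shape[1]))  # indices of features still in play
n_keep = 5  # hypothetical target subset size

while len(remaining) > n_keep:
    # Refit on the current subset
    clf = LogisticRegression(max_iter=1000).fit(X_scaled[:, remaining], y_target)
    importance = np.abs(clf.coef_).ravel()  # one weight per remaining feature
    weakest = int(np.argmin(importance))    # position of the least important feature
    remaining.pop(weakest)                  # drop it, then repeat

manual_selection = data.feature_names[remaining].tolist()
print("Manually selected features:", manual_selection)
```

Each pass costs one full model fit, which is why the cost grows quickly with the number of features.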

## 2. Load Dataset

In [None]:
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load a toy binary-classification dataset (569 samples, 30 numeric features)
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)  # feature matrix
y = data.target                                          # class labels (0 or 1)

X.shape

## 3. Recursive Feature Elimination (RFE)

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

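# Base estimator: RFE ranks features by the magnitude of its coefficients.
# Note: on unscaled data, coefficient sizes also reflect feature units,
# and the solver may emit a ConvergenceWarning.
model = LogisticRegression(max_iter=500)
rfe = RFE(estimator=model, n_features_to_select=5)  # keep the 5 top-ranked features
X_rfe = rfe.fit_transform(X, y)  # array containing only the selected columns

# support_ is a boolean mask over the original columns
selected_features = X.columns[rfe.support_]
print("Selected Features:", selected_features.tolist())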
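
Beyond the boolean mask, a fitted `RFE` also exposes `ranking_`, which records the order of elimination. The sketch below shows one way to inspect it. Standardizing the features first is an assumption added here (it is not part of the cell above), and the names `X_std`, `y_std`, and `rfe_std` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Assumption: standardize so coefficient magnitudes are comparable across features
X_std = pd.DataFrame(
    StandardScaler().fit_transform(data.data), columns=data.feature_names
)
y_std = data.target

rfe_std = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe_std.fit(X_std, y_std)

# ranking_ is 1 for every selected feature; larger values were eliminated earlier
ranking = pd.Series(rfe_std.ranking_, index=X_std.columns).sort_values()
print(ranking.head(10))
```

Features with rank 2 or 3 were the last to be dropped, so they are natural candidates if you later decide to keep a slightly larger subset.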

## 4. RFECV — RFE + Cross-Validation

In [None]:
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

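# Stratified folds preserve the class balance in each split
cv = StratifiedKFold(n_splits=5)
# RFECV clones `model`, runs RFE inside each fold, and keeps the feature count
# with the best mean cross-validation score
rfecv = RFECV(estimator=model, cv=cv)
rfecv.fit(X, y)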

best_features = X.columns[rfecv.support_]
print("Optimal features selected by RFECV:", best_features.tolist())
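
`RFECV` accepts a few parameters that control the search, and it reports the chosen subset size in `n_features_`. The following is a sketch under assumed settings: the values for `scoring`, `min_features_to_select`, and the shuffled folds are illustrative choices, and the data is scaled up front only for brevity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

X_raw, y_cv = load_breast_cancer(return_X_y=True)
# Assumption: scaling the full dataset before cross-validation keeps the
# example short; a pipeline would avoid leaking fold statistics
X_cv = StandardScaler().fit_transform(X_raw)

rfecv_demo = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                    # remove one feature per iteration
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",        # metric used to compare subset sizes
    min_features_to_select=3,  # never go below this many features
)
rfecv_demo.fit(X_cv, y_cv)

print("Number of features chosen:", rfecv_demo.n_features_)
```

Raising `step` removes several features per iteration, which trades some precision in the chosen subset size for fewer model fits.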

## 5. Visualize RFECV Scores

In [None]:
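import matplotlib.pyplot as plt

# `grid_scores_` was removed in scikit-learn 1.2; use `cv_results_` instead
mean_scores = rfecv.cv_results_["mean_test_score"]

plt.figure(figsize=(8, 5))
# The x-axis assumes the RFECV defaults step=1 and min_features_to_select=1
plt.plot(range(1, len(mean_scores) + 1), mean_scores)
plt.xlabel("Number of Features Selected")
plt.ylabel("Mean Cross-Validation Score")
plt.title("RFECV Feature Selection")
plt.grid(True)
plt.show()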

## 6. Use RFE in a Pipeline

In [None]:
from sklearn.pipeline import Pipeline

rfe_pipeline = Pipeline([
    ("feature_selection", RFE(estimator=LogisticRegression(max_iter=500), n_features_to_select=5)),
    ("classifier", LogisticRegression(max_iter=500))
])

rfe_pipeline.fit(X, y)
print("Pipeline trained with RFE-selected features.")
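
One benefit of putting selection inside a pipeline is that cross-validation then repeats the selection within each training fold, so the held-out fold never influences which features are kept. The sketch below illustrates this with `cross_val_score`. The added `StandardScaler` step and the names `eval_pipeline` and `scores` are assumptions for this example, not part of the pipeline above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_eval, y_eval = data.data, data.target

eval_pipeline = Pipeline([
    ("scaler", StandardScaler()),  # assumption: scale before the linear model
    ("feature_selection", RFE(estimator=LogisticRegression(max_iter=1000),
                              n_features_to_select=5)),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Scaling, selection, and training are all refit on each training fold
scores = cross_val_score(eval_pipeline, X_eval, y_eval, cv=5, scoring="accuracy")
print("Mean CV accuracy:", scores.mean())

# After fitting on all data, read the chosen features from the RFE step
eval_pipeline.fit(X_eval, y_eval)
mask = eval_pipeline.named_steps["feature_selection"].support_
print("Features kept by the pipeline:", data.feature_names[mask].tolist())
```

Running selection on the full dataset first and cross-validating afterwards would tend to give an optimistic score, since the selector would already have seen the test folds.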

## Summary
- Wrapper methods evaluate feature subsets using model performance.
- `RFE` removes least useful features iteratively.
- `RFECV` automates feature count using cross-validation.
- Pipelines keep wrapper selection integrated and reusable.

## What’s Next?
In the next notebook:
**`11_feature_selection_embedded_methods.ipynb`**
We'll explore **embedded methods**, which perform selection during model training, such as L1 regularization (Lasso), decision trees, and ensemble models.