# 11 — Feature Selection: Embedded Methods
In this notebook, we’ll explore **embedded methods**: techniques that perform feature selection **during model training**, using the coefficients or importance scores the model learns along the way.

We’ll cover:
- What embedded methods are and how they differ from filter and wrapper methods
- Feature selection using L1 regularization (Lasso)
- Tree-based feature importances
- Integration in pipelines

## 1. What Are Embedded Methods?
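**Embedded methods** select features as part of model training: the learning algorithm either penalizes features that add little (as L1 regularization does) or scores features by how much they contribute (as tree-based models do). They sit between filter methods, which score features without training the final model, and wrapper methods, which retrain a model on many candidate feature subsets.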

Examples include:
- **Lasso (L1)**: Shrinks the coefficients of less useful features to exactly zero
- **Tree-based models**: Provide natural feature importance scores

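They are usually cheaper than wrappers, because selection happens in a single training run, and often more informative than filters, because the scores reflect how the model uses each feature alongside the others.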
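
To make the comparison concrete, here is a minimal sketch that applies one representative of each family to the same data. The choices are assumptions made for illustration only: `SelectKBest` with an ANOVA F-test as the filter, `RFE` as the wrapper, an L1-penalized `LogisticRegression` inside `SelectFromModel` as the embedded method, a target of 10 features, and `C=0.1`. The `*_demo` and `*_sel` names are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Standardize so that coefficient sizes are comparable across features
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# Filter: score each feature on its own with an ANOVA F-test; no model is trained
filter_sel = SelectKBest(score_func=f_classif, k=10).fit(X_demo, y_demo)

# Wrapper: refit the model repeatedly, dropping the weakest feature each round
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X_demo, y_demo)

# Embedded: one L1-penalized fit; features whose coefficients are (near) zero are dropped
embedded_sel = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X_demo, y_demo)

n_filter = int(filter_sel.get_support().sum())
n_wrapper = int(wrapper_sel.get_support().sum())
n_embedded = int(embedded_sel.get_support().sum())
print("filter:", n_filter, "| wrapper:", n_wrapper, "| embedded:", n_embedded)
```

The filter and wrapper are told how many features to keep, while the embedded method lets the penalty decide. The wrapper needs many model fits to get there; the embedded method needs one.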

## 2. Load Dataset

In [None]:
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load a small built-in binary classification dataset (569 samples, 30 features)
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target  # 0 = malignant, 1 = benign

## 3. Lasso (L1) for Feature Selection

In [None]:
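from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# The L1 penalty acts on coefficient size, so put all features on the same scale first
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# L1-penalized logistic regression: the classification counterpart of Lasso
lasso = LogisticRegression(penalty="l1", solver="liblinear")
lasso.fit(X_scaled, y)

# Features whose coefficient was not shrunk to exactly zero are the selected ones
coeff = pd.Series(lasso.coef_[0], index=X.columns)
selected = coeff[coeff != 0].index

print("Selected features by Lasso:")
print(selected.tolist())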
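
How many features survive depends on how strong the penalty is. The sketch below is one way to see this, under some assumptions: the features are standardized first, and the three values of `C` are arbitrary picks rather than tuned settings. The names `X_std`, `y_path` and `n_selected` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_raw, y_path = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X_raw)

# In scikit-learn, C is the inverse of the regularization strength:
# a smaller C means a stronger L1 penalty and usually fewer surviving features.
n_selected = {}
for C in (0.01, 0.1, 1.0):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X_std, y_path)
    n_selected[C] = int(np.count_nonzero(model.coef_[0]))
    print(f"C={C}: {n_selected[C]} of {X_std.shape[1]} features kept")
```

In practice `C` would be chosen by cross-validation rather than by eye, since it controls both the sparsity and the accuracy of the model.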

## 4. Feature Importances from Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(random_state=42)
forest.fit(X, y)

# Impurity-based importances, averaged over all trees; they sum to 1
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print("Top 5 features:")
print(importances.head())
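
A ranking alone does not say where to cut. One possible rule, shown here as a sketch, is to keep the shortest prefix of the ranking that covers a fixed share of the total importance. The 90% cutoff is an arbitrary assumption, and the names `rf`, `ranked`, `cumulative` and `n_keep` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

cancer = load_breast_cancer()
rf = RandomForestClassifier(random_state=42).fit(cancer.data, cancer.target)

ranked = pd.Series(rf.feature_importances_, index=cancer.feature_names)
ranked = ranked.sort_values(ascending=False)

# Importances are normalized, so the running total ends at (approximately) 1.0
cumulative = ranked.cumsum()

# Index of the first position where the running total reaches 90%, plus one,
# gives the number of top-ranked features needed to cover that share
n_keep = int(np.searchsorted(cumulative.to_numpy(), 0.90) + 1)
top_features = ranked.index[:n_keep].tolist()
print(f"{n_keep} features cover {cumulative.iloc[n_keep - 1]:.1%} of total importance")
```

Impurity-based importances can be split between correlated features, so a cutoff like this is a starting point to validate, not a final answer.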

## 5. Visualize Feature Importances

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
importances.head(10).plot(kind="barh")
plt.title("Top 10 Important Features (Random Forest)")
plt.xlabel("Importance Score")
plt.gca().invert_yaxis()
plt.grid(True)
plt.show()

## 6. Use Embedded Method in Pipeline

In [None]:
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

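pipeline = Pipeline([
    # Keep features whose importance is at or above the mean importance
    # (the default threshold when the estimator has no L1 penalty)
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42))),
    ("clf", RandomForestClassifier(random_state=42))
])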

pipeline.fit(X, y)
print("Pipeline trained using embedded feature selection.")
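
Fitting on the full dataset confirms that the pipeline runs, but not how well it generalizes or which features it kept. The sketch below shows one way to check both, assuming 5-fold cross-validation with the default accuracy score. The names `cv_pipeline`, `X_cv`, `y_cv` and `kept` are introduced only for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

bc = load_breast_cancer()
X_cv, y_cv = bc.data, bc.target

cv_pipeline = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("clf", RandomForestClassifier(random_state=0)),
])

# The selector is refit inside every fold, so the held-out fold
# never influences which features are kept
scores = cross_val_score(cv_pipeline, X_cv, y_cv, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")

# Refit on all data to inspect which features the selector keeps
cv_pipeline.fit(X_cv, y_cv)
mask = cv_pipeline.named_steps["select"].get_support()
kept = bc.feature_names[mask].tolist()
print(f"{len(kept)} features kept:", kept)
```

Keeping the selector inside the pipeline is the point of the design: selecting features on the full dataset before cross-validating would leak information from the validation folds.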

## Summary
- Embedded methods select features during model training
- L1 regularization (Lasso) can shrink the coefficients of less useful features to exactly zero
- Tree-based models output natural importance scores
- `SelectFromModel` plugs either approach into a pipeline

## What’s Next?
In the next notebook:
**`12_combining_with_pipeline.ipynb`**
We’ll combine all preprocessing steps into a single `Pipeline`, ready for deployment or cross-validation.