<h4 style="color:#1a73e8;">2.3.6 Feature Selection: Beyond Naive Approaches</h4>

Encoding prepares features for modeling; **feature selection** then reduces dimensionality, improves interpretability, and mitigates overfitting. This section expands on the earlier coverage by walking through the three main families of selection methods.

---

### **Why Feature Selection Matters**

- **Curse of Dimensionality**: With a fixed number of samples, more features → sparser data → higher-variance estimates.
- **Computational Efficiency**: Fewer features → faster training.
- **Model Interpretability**: Simpler models are easier to explain.
- **Noise Reduction**: Irrelevant features add variance without signal.
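
The effect of irrelevant features can be illustrated with a small experiment. The sketch below is an assumption-laden toy setup, not a benchmark: it uses synthetic data from `make_classification`, appends 200 pure-noise columns, and compares a k-nearest-neighbors classifier with and without them. Distance-based models are used here because they tend to be especially sensitive to irrelevant dimensions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 5 features, all informative.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=5,
    n_redundant=0, random_state=0,
)

# Append 200 columns of Gaussian noise that carry no information about y.
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 200))])

knn = KNeighborsClassifier(n_neighbors=5)
acc_clean = cross_val_score(knn, X, y, cv=5).mean()
acc_noisy = cross_val_score(knn, X_noisy, y, cv=5).mean()

print(f"informative features only: {acc_clean:.3f}")
print(f"with 200 noise features:   {acc_noisy:.3f}")
```

On this toy data the noisy variant should score noticeably lower, since the noise columns dominate the distance computation.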

---

### **Three Paradigms of Feature Selection**

1. **Filter Methods**:  
   Use statistical metrics **independent of any model**. Fast and scalable.
   - **Chi-Square Test**: For **categorical input vs. categorical target** (classification). scikit-learn's `chi2` expects non-negative feature values, such as counts or one-hot indicators.
     

In [None]:
from sklearn.feature_selection import chi2, SelectKBest
selector = SelectKBest(score_func=chi2, k=5)
X_selected = selector.fit_transform(X_train_cat, y_train_cat)
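
For a self-contained view of the chi-square filter, the sketch below builds a small synthetic dataset. The column names `plan` and `region` are hypothetical: `plan` is constructed to depend on the target and `region` is not. The columns are one-hot encoded so that the matrix passed to `chi2` is non-negative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)

# Hypothetical categorical columns: `plan` depends on y, `region` does not.
plan = np.where(
    y == 1,
    rng.choice(["pro", "free"], size=n, p=[0.8, 0.2]),
    rng.choice(["pro", "free"], size=n, p=[0.2, 0.8]),
)
region = rng.choice(["north", "south", "east"], size=n)

# One-hot encoding yields the non-negative indicator matrix chi2 expects.
X_cat = pd.get_dummies(pd.DataFrame({"plan": plan, "region": region})).astype(int)

selector = SelectKBest(score_func=chi2, k=2).fit(X_cat, y)
scores = pd.Series(selector.scores_, index=X_cat.columns).sort_values(ascending=False)
kept = list(X_cat.columns[selector.get_support()])

print(scores)
print("kept:", kept)
```

Inspecting `selector.scores_` (and `selector.pvalues_`) before fixing `k` is often more informative than picking a cutoff blindly.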

- **ANOVA F-test**: For **numerical input vs. categorical target**.
    

In [None]:
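from sklearn.feature_selection import SelectKBest, f_classif
# Keep the 10 numeric features with the highest ANOVA F-statistic.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X_train, y_train)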

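- **Mutual Information**: Model-agnostic and able to capture **nonlinear as well as linear dependence**. scikit-learn provides separate scorers for classification and regression targets; inputs must be numeric, and discrete columns can be flagged via `discrete_features`.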
     

In [None]:
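from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
# Use the scorer that matches the target type; each returns one score per column.
# For classification (discrete target):
mi_scores = mutual_info_classif(X_train, y_train, random_state=42)
# For regression (continuous target), use this line instead:
# mi_scores = mutual_info_regression(X_train, y_train, random_state=42)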
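
To see how mutual information differs from the ANOVA F-test, the following sketch uses a synthetic feature whose relationship to the target is purely nonlinear: the label depends on `|x|`, so both classes have roughly the same mean. The variable names and the threshold of 1.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

rng = np.random.default_rng(0)
n = 2000

x_nonlinear = rng.normal(size=n)   # determines y through |x|, not through its mean
x_noise = rng.normal(size=n)       # unrelated to y
y = (np.abs(x_nonlinear) > 1.0).astype(int)
X = np.column_stack([x_nonlinear, x_noise])

# The F-test compares class means, which are nearly equal for both columns.
f_scores, p_values = f_classif(X, y)

# Mutual information is estimated from nearest-neighbor statistics,
# so it can pick up the dependence on |x|.
mi_scores = mutual_info_classif(X, y, random_state=0)

print("F-test p-values:   ", p_values.round(3))
print("mutual information:", mi_scores.round(3))
```

Because the class means barely differ, the F-test tends to give the first column little credit, while its mutual information score should sit well above that of the noise column.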

2. **Wrapper Methods**:  
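   Use a **specific model** to evaluate feature subsets. Because they account for feature interactions, they often find better subsets for that model, but every candidate subset requires a refit, which makes them computationally heavy.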
   - **Recursive Feature Elimination (RFE)**:

In [None]:
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

estimator = RandomForestRegressor(n_estimators=50, random_state=42)
rfe = RFE(estimator, n_features_to_select=8, step=1)
X_rfe = rfe.fit_transform(X_train, y_train)
selected = X_train.columns[rfe.support_]
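
Fixing `n_features_to_select` by hand is a judgment call. One possible alternative is `RFECV`, which runs the same elimination loop under cross-validation and keeps the subset size with the best score. The sketch below runs on synthetic data and swaps in `LinearRegression` purely to keep it fast; the dataset and scoring choice are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic regression problem: 10 features, 4 of them informative.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=4,
    noise=10.0, random_state=0,
)

# Eliminate one feature per step and score each subset size with 5-fold CV.
rfecv = RFECV(
    LinearRegression(), step=1, cv=5,
    scoring="r2", min_features_to_select=1,
)
rfecv.fit(X, y)

X_reduced = rfecv.transform(X)
print("features kept:", rfecv.n_features_)
print("support mask: ", rfecv.support_)
```

The same pattern works with a tree-based estimator such as the `RandomForestRegressor` used above, at a higher computational cost.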

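- **Forward/Backward Selection**: Available in scikit-learn as `SequentialFeatureSelector` (version 0.24 and later); `mlxtend` offers an alternative implementation that also supports floating variants.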
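
Recent scikit-learn releases (0.24 and later) include `SequentialFeatureSelector`, which covers both directions. A minimal sketch of forward selection on synthetic data follows; the estimator and the choice of three features are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3,
    noise=10.0, random_state=0,
)

# Greedily add the feature that most improves the 5-fold CV score,
# stopping once three features are selected.
sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,
    direction="forward",   # use "backward" to start from all features
    cv=5,
)
sfs.fit(X, y)

mask = sfs.get_support()
X_sfs = sfs.transform(X)
print("selected column indices:", mask.nonzero()[0])
```

Forward selection is usually cheaper when only a few features are wanted; backward selection is cheaper when most are kept.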

3. **Embedded Methods**:  
   Feature selection **built into the model training**.
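   - **Lasso (L1 regularization)**: The L1 penalty drives the coefficients of weak or redundant features exactly to zero. Standardize features first so the penalty treats them on a common scale.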

In [None]:
from sklearn.linear_model import LassoCV
lasso = LassoCV(cv=5).fit(X_train_scaled, y_train)
selected = X_train.columns[lasso.coef_ != 0]
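
The snippet above assumes `X_train_scaled` already exists. As a sketch of one way to keep scaling and selection together, the example below wraps `StandardScaler` and `LassoCV` in a pipeline. The data is synthetic: only `f0` and `f1` influence the target, and `f1` is deliberately put on a much larger scale to show why standardization matters.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 10)), columns=[f"f{i}" for i in range(10)])
X["f1"] *= 100  # same information, very different scale

# Only f0 and f1 drive the target; the other eight columns are noise.
y = 3.0 * X["f0"] - 0.02 * X["f1"] + rng.normal(scale=1.0, size=300)

model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)).fit(X, y)

# Coefficients are expressed in standardized units, one per input column.
coefs = pd.Series(model.named_steps["lassocv"].coef_, index=X.columns)
selected = list(coefs[coefs != 0].index)

print(coefs.round(3))
print("selected:", selected)
```

The cross-validated penalty may still leave a few noise columns with small nonzero coefficients, so the selected set is best read as a candidate list rather than a final answer.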

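- **Tree-based importance**: From Random Forest or XGBoost. Impurity-based importances tend to favor high-cardinality features, so treat the ranking as a guide rather than a verdict.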

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
# Impurity-based importances: one value per column, summing to 1.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
top_features = importances.nlargest(10).index
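
Impurity-based importances are computed on the training data, so a common cross-check is permutation importance on a held-out set. The sketch below uses scikit-learn's `permutation_importance` on synthetic data; the column names `signal`, `noise_a`, and `noise_b` are hypothetical, and only `signal` is related to the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)
X = pd.DataFrame({
    "signal": signal,
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
# The label depends on `signal` plus some noise; the other columns are irrelevant.
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Impurity-based importances come from the fitted trees (training data).
impurity = pd.Series(rf.feature_importances_, index=X.columns)

# Permutation importance: drop in validation accuracy when a column is shuffled.
perm = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
perm_mean = pd.Series(perm.importances_mean, index=X.columns)

print(pd.DataFrame({"impurity": impurity, "permutation": perm_mean}).round(3))
```

When the two rankings disagree, the permutation result is generally the safer basis for dropping a feature, since it reflects held-out performance.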