# Feature Selection in Machine Learning

- Feature selection is the process of choosing a subset of relevant features (predictor variables) to use when building a model.
- It can improve model performance, reduce overfitting, and shorten training time.

# Why Feature Selection is Important
| Benefit               | Description                            |
| --------------------- | -------------------------------------- |
| 🎯 Improves Accuracy  | Removes noise and irrelevant data      |
| ⚡ Reduces Overfitting | Less chance to learn spurious patterns |
| 🚀 Speeds up Training | Fewer features = faster computation    |
| 📉 Simplifies Models  | Easier to interpret and visualize      |
| 💾 Reduces Storage    | Smaller dataset size                   |


# 🔍 Types of Feature Selection Methods

### 1. Filter Methods (Statistical Tests)
- These methods use statistical techniques to score the relevance of features with respect to the target.
- 📌 Common Techniques:


| Method                                  | Description                                       | Suitable For                       |
| --------------------------------------- | ------------------------------------------------- | ---------------------------------- |
| **Variance Threshold**                  | Removes low-variance features                     | All data types                     |
| **Correlation Coefficient (Pearson)**   | Drops one of each highly correlated feature pair  | Numeric-Numeric                    |
| **Chi-Square Test**                     | Measures dependence between variables             | Categorical input & output         |
| **ANOVA F-test**                        | Checks if mean values differ significantly        | Numeric input & categorical output |
| **Mutual Information**                  | Measures shared information between variables     | All types                          |
| **Information Gain**                    | Reduction in entropy after splitting on a feature | Classification                     |
| **Fisher Score**                        | Measures the discriminative power of a feature    | Classification                     |
| **Kendall / Spearman Rank Correlation** | Non-parametric rank-based correlation             | Ordinal/Numeric                    |
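
The first two rows of the table (variance threshold and Pearson correlation) do not need the target at all. The sketch below shows one way they might be combined. The synthetic data, the `cutoff` value of 0.95, and all variable names are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Synthetic data (an assumption for illustration): 200 samples, 4 features.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_demo = np.column_stack([
    base,                                      # informative feature
    base + rng.normal(scale=0.01, size=200),   # near-duplicate of column 0
    rng.normal(size=200),                      # independent feature
    np.full(200, 3.0),                         # constant feature, zero variance
])

# Step 1: drop features whose variance is not above the threshold (0.0 drops constants).
vt = VarianceThreshold(threshold=0.0)
X_var = vt.fit_transform(X_demo)
kept_after_variance = np.flatnonzero(vt.get_support())

# Step 2: for each pair with |Pearson r| above the cutoff, drop the later feature.
corr = np.abs(np.corrcoef(X_var, rowvar=False))
cutoff = 0.95  # hypothetical cutoff; tune for your data
to_drop = set()
for i in range(corr.shape[0]):
    if i in to_drop:
        continue
    for j in range(i + 1, corr.shape[1]):
        if corr[i, j] > cutoff:
            to_drop.add(j)

keep = [k for k in range(X_var.shape[1]) if k not in to_drop]
X_filtered = X_var[:, keep]
print(kept_after_variance, keep, X_filtered.shape)
```

On this toy data the constant column is removed in step 1 and the near-duplicate column in step 2. Which member of a correlated pair to keep is a design choice; this sketch simply keeps the earlier column.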



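```python
# Example: chi-square filter with scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Example data (an assumption); chi2 requires non-negative feature values
X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=10)  # keep the 10 top-scoring features
X_new = selector.fit_transform(X, y)
```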
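
The ANOVA F-test and mutual information rows of the table can be scored in the same way. The following is a minimal sketch, assuming the breast cancer dataset bundled with scikit-learn as stand-in data; the variable names and the choice of showing the top 5 features are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Stand-in data (an assumption): 569 samples, 30 numeric features, binary target.
data = load_breast_cancer()
X_bc, y_bc = data.data, data.target
names = np.array(data.feature_names)

# ANOVA F-test: one F statistic and one p-value per feature.
f_scores, p_values = f_classif(X_bc, y_bc)

# Mutual information: a non-negative estimate that can also pick up non-linear dependence.
mi_scores = mutual_info_classif(X_bc, y_bc, random_state=0)

# Rank features by each score, highest first.
top_f = names[np.argsort(f_scores)[::-1][:5]]
top_mi = names[np.argsort(mi_scores)[::-1][:5]]
print("Top 5 by F-test:", list(top_f))
print("Top 5 by mutual information:", list(top_mi))

# Either score function can be passed to SelectKBest.
selector_f = SelectKBest(score_func=f_classif, k=10)
X_top_f = selector_f.fit_transform(X_bc, y_bc)
```

The two rankings need not agree: the F-test only reflects differences in class means, while mutual information is estimated from nearest neighbours and depends on `random_state`.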

### 2. Wrapper Methods (Model-based Search)
- These methods evaluate feature subsets by training a model on them and selecting the best performing combination.
- Common Techniques:

| Method                                  | Description                                       | Process                |
| --------------------------------------- | ------------------------------------------------- | ---------------------- |
| **Forward Selection**                   | Start with no features; add features one by one   | Greedy inclusion       |
| **Backward Elimination**                | Start with all features; remove least significant | Greedy removal         |
| **Stepwise Selection**                  | Combination of forward and backward methods       | Bidirectional search   |
| **Recursive Feature Elimination (RFE)** | Repeatedly refits and drops the weakest features  | Ranks by model weights |
| **Exhaustive Feature Selection**        | Try all combinations of features                  | Very slow but optimal  |
| **Sequential Feature Selector (SFS)**   | Sklearn’s sequential wrapper for forward/backward | Uses cross-validation  |

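```python
# Example: RFE with logistic regression
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Example data (an assumption); scaled so coefficient sizes are comparable
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=5)  # removes one feature per round by default
X_rfe = rfe.fit_transform(X, y)
```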
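
Forward selection from the table above is available in scikit-learn as `SequentialFeatureSelector`. Unlike RFE, it ranks candidate features by cross-validated score instead of model weights. This is a rough sketch, assuming the bundled breast cancer dataset as example data; selecting 3 features with 3-fold cross-validation is an arbitrary choice made to keep the run short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data (an assumption); substitute your own feature matrix and target.
X_bc, y_bc = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training split.
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

sfs = SequentialFeatureSelector(
    estimator,
    n_features_to_select=3,   # illustrative; small to keep the search fast
    direction="forward",      # use "backward" for backward elimination
    scoring="accuracy",
    cv=3,
)
sfs.fit(X_bc, y_bc)

selected_mask = sfs.get_support()   # boolean mask over the original columns
X_sfs = sfs.transform(X_bc)
print(selected_mask.sum(), X_sfs.shape)
```

Each forward step refits the model once per remaining feature per fold, so the cost grows quickly with the number of features. That cost is the main trade-off of wrapper methods against filter methods.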

### 3. Embedded Methods (Built into Models)
- These methods perform feature selection as part of the model training process.
- Examples:

| Method                                             | Description                                               | Suitable Models          |
| -------------------------------------------------- | --------------------------------------------------------- | ------------------------ |
| **Lasso Regression (L1 Regularization)**           | Shrinks some coefficients to zero                         | Linear Models            |
| **Ridge Regression (L2 Regularization)**           | Shrinks coefficients but never sets them to zero          | Linear Models            |
| **ElasticNet**                                     | Combination of L1 and L2                                  | Linear Models            |
| **Tree-based Feature Importance**                  | Extract importance from decision trees                    | RF, XGBoost, etc.        |
| **Embedded Feature Selection in LightGBM/XGBoost** | Feature importance via gain, split, cover                 | Gradient Boosting Models |
| **Stability Selection**                            | Selects features that appear frequently across bootstraps | Regularized Models       |


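```python
# Example: Lasso (L1) coefficients as feature importance
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)  # example regression data (an assumption)

model = Lasso(alpha=0.01)
model.fit(X, y)
importance = model.coef_  # a coefficient of exactly zero means the feature was dropped
```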
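
To turn embedded importances into an actual feature subset, one option is a non-zero mask for Lasso and scikit-learn's `SelectFromModel` for tree ensembles. The sketch below assumes the bundled diabetes dataset; `alpha=1.0`, `n_estimators=200`, and the `"mean"` threshold are illustrative settings, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Example regression data (an assumption): 442 samples, 10 features.
X_db, y_db = load_diabetes(return_X_y=True)

# L1 route: keep features whose Lasso coefficient is non-zero.
lasso = Lasso(alpha=1.0).fit(X_db, y_db)   # a larger alpha zeroes out more coefficients
lasso_mask = lasso.coef_ != 0
X_lasso = X_db[:, lasso_mask]

# Tree route: keep features whose impurity-based importance is at least the mean importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
sfm = SelectFromModel(forest, threshold="mean")
X_forest = sfm.fit_transform(X_db, y_db)
forest_mask = sfm.get_support()

print("Lasso kept:", np.flatnonzero(lasso_mask))
print("Forest kept:", np.flatnonzero(forest_mask))
```

The two routes can select different features, because Lasso measures linear contribution under a penalty while forest importances reflect how often and how usefully a feature is split on.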