# Day-18: Feature Selection Techniques

 In today's session, we're diving into a crucial part of the machine learning pipeline: **feature selection**. It's all about picking the most relevant features to improve your model's performance and efficiency. Let's get started! 🚀


## Topics Covered

- What is Feature Selection?
- When to Use It?
- Relation with Feature Engineering
- Types of Feature Selection Techniques
- Use Case + Code Example

## What is Feature Selection?

Feature selection is the process of choosing a subset of **relevant features** (variables, predictors) from a larger set to use in model construction. Think of it like this: if you're baking a cake, you don't use every single ingredient in your pantry; you only pick the ones that will make the cake taste good. Similarly, feature selection helps us find the "ingredients" that will make our model perform best.

Just remember: “A feature being in the dataset doesn’t mean it’s helpful.”
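One way to see this is to plant a feature we know is useless. The sketch below is our own illustration, not part of the original notebook. It appends a hypothetical `noise` column of random numbers (seed chosen arbitrarily) to the Iris measurements and asks `SelectKBest` with the ANOVA F-test (`f_classif`) to keep the four highest-scoring columns. The noise has no relationship with the species label, so its F-score should be far below the real measurements' scores and it should be the column left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load Iris as a DataFrame: 4 real measurements plus the species label
iris = load_iris(as_frame=True)
X, y = iris.data.copy(), iris.target

# Hypothetical extra column: random numbers unrelated to the label
rng = np.random.default_rng(0)
X["noise"] = rng.normal(size=len(X))

# Keep the 4 columns with the highest ANOVA F-score against the label
selector = SelectKBest(score_func=f_classif, k=4)
selector.fit(X, y)

kept = list(X.columns[selector.get_support()])
print("Kept:", kept)
print("F-scores:", dict(zip(X.columns, selector.scores_.round(1))))
```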

## When to Use It?


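- **High-dimensional data:** When you have a very large number of features (e.g., thousands of genes in a bioinformatics dataset), especially when there are more features than samples.
- **Overfitting risk:** Irrelevant or redundant features give the model more ways to fit noise in the training data.
- **Training and inference cost:** Fewer features mean less data to collect, store, and process.
- **Model interpretability:** Fewer features make the model easier to understand and explain.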

## Relation with Feature Engineering


- **Feature Engineering** is about creating **new features** from existing ones. This is a creative process where you might use domain knowledge to extract more information.
- **Feature Selection** is about **choosing the best features** from a given set (which might include features you've just engineered). They often go hand-in-hand: you engineer new features and then select the best ones from the combined set.
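A minimal sketch of that engineer-then-select workflow, assuming a hypothetical engineered column `petal area (cm^2)` approximated as petal length times petal width (our own example, not a feature of the dataset): the original and engineered columns are scored together with the same ANOVA F-test filter, and the top three are kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris(as_frame=True)
X, y = iris.data.copy(), iris.target

# Feature engineering (hypothetical): approximate petal area as length times width
X["petal area (cm^2)"] = X["petal length (cm)"] * X["petal width (cm)"]

# Feature selection: score original and engineered columns together, keep the top 3
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
for name, score in sorted(zip(X.columns, selector.scores_), key=lambda item: -item[1]):
    print(f"{name:20s} F = {score:.1f}")
selected = list(X.columns[selector.get_support()])
print("Selected:", selected)
```

The engineered column competes on equal footing with the originals. Because it is built from two existing columns, it is likely to be strongly correlated with them, so a univariate filter like this one may keep near-duplicates; a redundancy check or a model-based method can help there.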


## Types of Feature Selection Techniques

### **Filter Methods:** 

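These methods score each feature with a statistic and keep the best-scoring ones, **independent of the machine learning algorithm**. They act like a "pre-filter" for your data. Some scores compare a feature with the target variable (for example, correlation, or the ANOVA F-test used below), while others look at the feature alone (for example, **Variance Threshold**, which ignores the target entirely).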

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load Iris as a DataFrame so the column names are available
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Select top 3 features with highest ANOVA F-value
selector = SelectKBest(score_func=f_classif, k=3)
X_new = selector.fit_transform(X, y)
print(X.columns[selector.get_support()])
```
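The correlation-based methods mentioned above come in several flavors. One common variant, sketched here under our own assumptions (a cutoff of 0.9 on absolute Pearson correlation, and keeping the first column of each highly correlated pair), removes *redundant* features: if two features are almost perfectly correlated, the second adds little. Note that this variant compares features with each other rather than with the target.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
X = iris.data

# Absolute pairwise Pearson correlations between features
corr = X.corr().abs()

# Keep only the part above the diagonal so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop a column if it is highly correlated with any column that comes before it
threshold = 0.9
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X_reduced = X.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Kept:", list(X_reduced.columns))
```

On Iris, petal length and petal width are correlated at roughly 0.96, so with this cutoff petal width is dropped. Which member of a pair survives is arbitrary here; a more careful version might keep the one more strongly related to the target.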

### **Wrapper Methods:** 


These methods "wrap" around a specific machine learning algorithm. They use a **search strategy** to find the best subset of features by repeatedly training and evaluating the model. **Recursive Feature Elimination (RFE)** is a classic example. Because the model is retrained many times, wrapper methods are computationally expensive, but they often find a better subset for that particular model than a filter method would.

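```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# RFE repeatedly fits the model and drops the weakest feature
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=3)
X_selected = rfe.fit_transform(X, y)
print(X.columns[rfe.support_])
```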
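RFE is a *backward* search. scikit-learn also offers a greedy *forward* search, `SequentialFeatureSelector` (scikit-learn 0.24 or later), which starts with no features and adds, one at a time, the feature that most improves the cross-validated score. The sketch below applies it to Iris with our own choices: 2 features, 5-fold cross-validation, and `max_iter=1000` to give the default solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Forward selection: start empty, add the feature that most improves
# 5-fold CV accuracy, and repeat until 2 features are chosen
model = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="forward", scoring="accuracy", cv=5
)
sfs.fit(X, y)

selected = list(X.columns[sfs.get_support()])
print("Forward selection picked:", selected)
```

Each step runs one cross-validated fit per remaining candidate feature, which is exactly the computational cost that makes wrapper methods expensive on wide datasets.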

### **Embedded Methods:**


 These methods perform feature selection as an **integral part of the model training process**. The algorithm itself has a built-in mechanism to select the most important features. **Lasso (L1 regularization)** is a prime example, as it can shrink some feature coefficients to exactly zero.

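```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

# LassoCV is a regressor, so use a regression dataset
# (diabetes disease progression) rather than class labels
diabetes = load_diabetes(as_frame=True)
X, y = diabetes.data, diabetes.target

# Cross-validation chooses the penalty strength (alpha) automatically
lasso = LassoCV(cv=5)
lasso.fit(X, y)

# Keep features where coefficient is non-zero
selected_features = X.columns[lasso.coef_ != 0]
print(selected_features)
```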
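Lasso is not the only built-in mechanism. scikit-learn's `SelectFromModel` turns any fitted estimator that exposes `coef_` or `feature_importances_` into a selector. The sketch below swaps in a random forest, whose impurity-based `feature_importances_` are computed as a by-product of training, and keeps features whose importance is at least the mean (`threshold="mean"`). The forest size and random seed are arbitrary choices of ours.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Tree ensembles compute feature_importances_ while they train
forest = RandomForestClassifier(n_estimators=200, random_state=42)

# Keep features whose importance is at least the mean importance
selector = SelectFromModel(forest, threshold="mean")
selector.fit(X, y)

importances = selector.estimator_.feature_importances_
for name, imp in zip(X.columns, importances):
    print(f"{name:20s} importance = {imp:.3f}")
selected = list(X.columns[selector.get_support()])
print("Selected:", selected)
```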

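## Use Case + Code Example

To compare one method from each family, we apply all three to the classic Iris dataset (150 flowers, 4 measurements, 3 species), using 70% of the rows for training and 30% for testing.
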
```python
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# --- 1. Filter Method: Variance Threshold ---
# This method removes features with low variance.
# Low variance means a feature's value is nearly constant, so it carries little information.
print("--- Variance Threshold ---")
selector_vt = VarianceThreshold(threshold=0.5)
X_train_vt = selector_vt.fit_transform(X_train)
# Let's see which features were kept
features_vt = X.columns[selector_vt.get_support()]
print(f"Features selected by Variance Threshold: {list(features_vt)}")
print("-" * 30)

# --- 2. Wrapper Method: Recursive Feature Elimination (RFE) ---
# RFE works by recursively considering smaller and smaller sets of features.
# It starts with all features and removes the least important ones until the specified number of features is reached.
print("--- Recursive Feature Elimination (RFE) ---")
model_rfe = LogisticRegression(solver='liblinear')
# Let's select the top 2 features
selector_rfe = RFE(model_rfe, n_features_to_select=2, step=1)
selector_rfe.fit(X_train, y_train)
X_train_rfe = selector_rfe.transform(X_train)
X_test_rfe = selector_rfe.transform(X_test)
features_rfe = X.columns[selector_rfe.get_support()]
print(f"Features selected by RFE: {list(features_rfe)}")

# Train a model on the selected features
# (RFE fits a clone internally, so model_rfe itself is still unfitted here)
model_rfe.fit(X_train_rfe, y_train)
predictions_rfe = model_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, predictions_rfe)
print(f"Accuracy with RFE-selected features: {accuracy_rfe:.2f}")
print("-" * 30)

# --- 3. Embedded Method: Lasso (L1 regularization) ---
# Lasso adds a penalty term that can force some coefficients to become zero.
# Features with a non-zero coefficient are considered important.
print("--- Embedded Method (Lasso) ---")
# Lasso is for regression, but we can use Logistic Regression with L1 penalty for classification
model_lasso = LogisticRegression(penalty='l1', solver='liblinear', random_state=42)
model_lasso.fit(X_train, y_train)

# Get the coefficients: one row per class, one column per feature
coefficients = model_lasso.coef_
# Keep a feature if any class gives it a non-zero coefficient
features_lasso = X.columns[np.any(coefficients != 0, axis=0)]
print(f"Features selected by Lasso: {list(features_lasso)}")

# Evaluate the L1 model (trained on all features; zeroed coefficients simply drop out)
predictions_lasso = model_lasso.predict(X_test)
accuracy_lasso = accuracy_score(y_test, predictions_lasso)
print(f"Accuracy with Lasso-selected features: {accuracy_lasso:.2f}")
print("-" * 30)
```

```
--- Variance Threshold ---
Features selected by Variance Threshold: ['sepal length (cm)', 'petal length (cm)', 'petal width (cm)']
------------------------------
--- Recursive Feature Elimination (RFE) ---
Features selected by RFE: ['sepal width (cm)', 'petal width (cm)']
Accuracy with RFE-selected features: 0.91
------------------------------
--- Embedded Method (Lasso) ---
Features selected by Lasso: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Accuracy with Lasso-selected features: 1.00
------------------------------
```

### Interpreting the Code Example Output

#### Filter Method: Variance Threshold

##### Output:

    Features selected by Variance Threshold: ['sepal length (cm)', 'petal length (cm)', 'petal width (cm)']

##### Interpretation:

- The ***Variance Threshold*** method computed the variance of each feature on the training set.

- Only ***'sepal width (cm)'*** had a variance below the threshold we set (0.5 in our code), meaning its values are spread less widely around their mean than the other measurements. Because of this, it was filtered out.

- The remaining features ***('sepal length (cm)', 'petal length (cm)', and 'petal width (cm)')*** were kept because their variance exceeded 0.5. Keep in mind that Variance Threshold never looks at the target: a higher variance means a feature varies more, not that it is more useful for predicting the species. Variance also depends on units, so the threshold is only meaningful for features on comparable scales.
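To see the numbers behind this decision, the sketch below (our own addition) reproduces the notebook's split, reads the per-feature variances that `VarianceThreshold` stores in its `variances_` attribute, and then standardizes the features with `StandardScaler`. After standardization every column has variance 1, so a fixed threshold such as 0.5 would keep every feature. That is why Variance Threshold is most useful for features in comparable units, or for dropping constant and near-constant columns.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
# Same split as the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

selector = VarianceThreshold(threshold=0.5).fit(X_train)
# variances_ holds the variance of each training column
for name, var in zip(X_train.columns, selector.variances_):
    print(f"{name:20s} variance = {var:.3f}")
kept = list(X_train.columns[selector.get_support()])
print("Kept:", kept)

# After standardization every column has variance 1 (up to rounding),
# so the same 0.5 threshold would keep all four features
X_train_scaled = StandardScaler().fit_transform(X_train)
print("Variances after scaling:", X_train_scaled.var(axis=0).round(3))
```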

#### Wrapper Method: Recursive Feature Elimination (RFE)

##### Output:

    Features selected by RFE: ['sepal width (cm)', 'petal width (cm)']
    Accuracy with RFE-selected features: 0.91

##### Interpretation:

- **RFE** works by repeatedly training a model (in our case, **LogisticRegression**) and removing the least important feature until it reaches a desired number of features (2 in our code example).

- The selected features, **'sepal width (cm)' and 'petal width (cm)'**, are the two that survived elimination. In each round, RFE ranked the remaining features by the magnitude of the logistic regression coefficients and dropped the smallest. Because the features were not standardized, those magnitudes also reflect each feature's units, not only its importance.

- The **accuracy score** of **0.91** means a model trained on only these two features classified 91% of the 45 test samples correctly. This suggests RFE can find a small subset of features that still performs well, although a single split this small gives only a rough estimate.
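RFE records the full elimination order in its `ranking_` attribute: selected features get rank 1, and the feature removed first gets the highest rank. The sketch below fits RFE on the full Iris data with the default `lbfgs` solver (`max_iter=1000`) rather than the notebook's `liblinear` solver on the training split, so the order it prints may differ from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Eliminate one feature per round until 2 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
rfe.fit(X, y)

# ranking_ is 1 for kept features; higher numbers were eliminated earlier
for name, rank in sorted(zip(X.columns, rfe.ranking_), key=lambda item: item[1]):
    print(f"rank {rank}: {name}")
```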


#### Embedded Method: Lasso (L1 regularization)

##### Output:

    Features selected by Lasso: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
    Accuracy with Lasso-selected features: 1.00

##### Interpretation:

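- **Lasso (L1 regularization)** adds a penalty proportional to the sum of the absolute values of the coefficients, which can push the coefficients of less useful features to exactly zero. Since Iris is a classification task, the code used Lasso's classification counterpart: logistic regression with an L1 penalty.

- In this specific run, no feature was dropped: each of the four features received a non-zero coefficient for at least one class, so the model treated all of them as useful. At the default penalty strength (C=1.0) the penalty was not strong enough to zero any feature out; a sufficiently small C (a stronger penalty) would start removing features.

- The model reached a perfect **1.00 accuracy** on the test set. Note that it used all four features, so unlike RFE it did not shrink the feature set. A perfect score on 45 test samples from an easy dataset like Iris is encouraging but noisy: a single misclassified flower would drop it to about 0.98. In a real-world scenario, we'd confirm it with cross-validation before trusting it.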
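How many coefficients reach zero depends on the penalty strength. In scikit-learn's `LogisticRegression`, `C` is the *inverse* of regularization strength, so a smaller `C` means a stronger L1 penalty. The sketch below uses our own choice of `C` values, standardizes the features first (an L1 penalty treats all coefficients alike, so it is sensitive to feature scale), and uses the `saga` solver, which supports the L1 penalty with the multinomial loss. It prints how many features keep a non-zero coefficient for at least one class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Standardize so the L1 penalty sees every feature on the same scale
X = StandardScaler().fit_transform(iris.data)
y = iris.target

counts = {}
for C in [0.01, 0.05, 0.1, 1.0]:
    # Smaller C means a stronger L1 penalty
    model = LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=10000)
    model.fit(X, y)
    # coef_ has one row per class; a feature is "used" if any class weights it
    used = np.any(model.coef_ != 0, axis=0)
    counts[C] = int(used.sum())
    names = [iris.feature_names[i] for i in np.flatnonzero(used)]
    print(f"C={C}: {counts[C]} feature(s) used -> {names}")
```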

#### Summary of Results:

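- RFE was the most aggressive method, keeping only two of the four features while still reaching 91% test accuracy, which shows that not all features are equally important.

- Variance Threshold removed one feature (sepal width) without looking at the target at all, while RFE kept that same feature. Both decisions are sensitive to feature scale (variance depends on units, and so does the size of a coefficient): coefficient-based methods like RFE and Lasso usually work best on standardized features, while Variance Threshold only makes sense for features measured in comparable units.

- The L1-penalized (Lasso-style) model kept all four features and scored 1.00 on the test set.

By comparing these results, we can see how different methods yield different insights. RFE produced the smallest feature set with strong performance, while the L1 model suggests that, at its default penalty strength, all four features contribute. With only 45 test samples, these differences should be confirmed with cross-validation before we decide which features to move forward with for our final model.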
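A more reliable comparison uses cross-validation and puts the selection step *inside* a `Pipeline`, so features are chosen from each training fold only and the held-out fold never influences the choice. The sketch below is our own setup (5 stratified folds, standardized features, RFE keeping 2 features, and an L1 model with the `saga` solver); it prints the mean and standard deviation of accuracy for each candidate rather than relying on one 45-sample test set.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "all features": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # RFE is a pipeline step, so it is refit on each training fold only
    "RFE (2 features)": make_pipeline(
        StandardScaler(),
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=2),
        LogisticRegression(max_iter=1000),
    ),
    # The L1 penalty selects during training, also once per fold
    "L1 logistic": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="saga", max_iter=10000),
    ),
}

results = {}
for name, pipe in candidates.items():
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name:18s} mean = {scores.mean():.3f}  std = {scores.std():.3f}")
```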

## Summary


Today, we explored the world of feature selection! We learned that it's a critical step for reducing dimensionality, lowering the risk of overfitting, and improving model efficiency. We covered three main types of techniques:

- Filter Methods:
    - Fast and simple, they use statistical metrics to rank features. We saw examples with SelectKBest (ANOVA F-test) and VarianceThreshold.

- Wrapper Methods:
    - They use a machine learning model to evaluate feature subsets. We used RFE with LogisticRegression to find a strong two-feature subset.

- Embedded Methods:
    - They perform feature selection during the model training process itself, with Lasso (and L1-penalized logistic regression for classification) as the main example.

Each method has its pros and cons, and the best choice often depends on your specific problem and dataset.



## What's Next?


On Day 19, we'll shift gears from improving models to understanding them. We'll dive into the fascinating world of model interpretability. We'll learn about powerful tools like SHAP, LIME, and Permutation Importance that help us answer the question, "Why did my model make that prediction?"