#  Forward Feature Selection
Forward Elimination, more commonly called forward selection, is a wrapper-based feature selection method in which features are added to the model one at a time, based on how much each one improves performance (e.g., accuracy, F1 score, R²).

### What is Forward Elimination?
- Forward Elimination starts with no features and iteratively adds the most significant features, one by one, until adding more features does not improve model performance significantly.

### Step-by-Step Working of Forward Elimination
- Step 1: Start with zero features.
- Step 2: Train the model once per candidate feature, each time using the already selected features plus that one candidate.
- Step 3: Select the candidate that improves the model the most (based on the evaluation metric).
- Step 4: Add that feature to the selected set.
- Step 5: Repeat steps 2–4, adding one new feature at a time from the remaining pool.
- Step 6: Stop when: a) there is no further improvement; b) the maximum number of features is reached; or c) a performance threshold is met.
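
The steps above can be sketched from scratch with scikit-learn. This is a minimal illustration, not how MLxtend implements it: the helper name `forward_select`, its `tol` parameter, and the choice of the diabetes dataset with `LinearRegression` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def forward_select(estimator, X, y, max_features, scoring="r2", cv=5, tol=0.0):
    """Greedy forward selection driven by the mean cross-validation score."""
    selected = []                        # Step 1: start with zero features
    remaining = list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        # Steps 2-3: score the current set plus each remaining candidate
        scores = {
            j: cross_val_score(estimator, X[:, selected + [j]], y,
                               scoring=scoring, cv=cv).mean()
            for j in remaining
        }
        best_j = max(scores, key=scores.get)
        # Step 6a: stop when the best candidate no longer improves the score
        if scores[best_j] - best_score <= tol:
            break
        # Step 4: add the winning feature to the selected set
        selected.append(best_j)
        remaining.remove(best_j)
        best_score = scores[best_j]
    return selected, best_score


data = load_diabetes()
selected, score = forward_select(LinearRegression(), data.data, data.target,
                                 max_features=6)
print([data.feature_names[j] for j in selected], round(score, 3))
```

The loop retrains the model for every remaining candidate at every step, so the cost grows quickly with the number of features; that is the usual trade-off of wrapper methods.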

#  Forward Feature Selection Using MLxtend
MLxtend (Machine Learning Extensions) is a Python library that provides powerful tools for feature selection, including Forward Feature Selection (Forward Elimination) with cross-validation support.

###  What is Forward Elimination in MLxtend?
In MLxtend, forward selection is implemented using:
```
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
```
- This class adds one feature at a time to the model, each time choosing the feature that improves the cross-validated score the most (based on the `scoring` metric), until the requested number of features (`k_features`) is selected. With `k_features='best'`, every subset size is evaluated and the best-scoring one is kept.

``` python
# Example Format
# Initialize Forward Selector
sfs = SFS(log_reg,                # the model (estimator), defined earlier as log_reg
          k_features='best',      # select best number of features (can also be an int like 3)
          forward=True,           # Forward selection
          floating=False,         # No backward steps
          scoring='accuracy',     # Metric to evaluate performance
          cv=5,                   # 5-fold cross-validation
          n_jobs=-1               # Use all CPUs
         )
```
- The arguments set most often are `estimator` (the model), `k_features` (how many features to select), and `forward` (`True` for forward selection, `False` for backward elimination).

---

# Backward Feature Selection
Backward Feature Selection is a wrapper-based technique for feature selection where the process begins with all features, and features are eliminated one at a time based on their insignificance until the best subset remains.

### What is Backward Feature Selection?
- It is an iterative method that starts with all features in the dataset and removes the least important feature at each step, judged by a statistical test (e.g., p-value) or by model performance (e.g., accuracy, R²), until a stopping criterion is met.

### Steps in Backward Feature Selection
- Step 1: Start with all features.
- Step 2: Fit the model on the training data.
- Step 3: Evaluate a performance metric or statistical test for each feature.
- Step 4: Remove the least important feature (e.g., highest p-value or least impact).
- Step 5: Repeat steps 2–4 until: a) a desired number of features is reached; b) model performance stops improving; or c) all remaining features are statistically significant.
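
A from-scratch sketch of these steps, using the cross-validated score (rather than p-values) to decide which feature to drop. The helper name `backward_eliminate`, the fixed target of six features, and the diabetes dataset with `LinearRegression` are assumptions chosen for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def backward_eliminate(estimator, X, y, n_features, scoring="r2", cv=5):
    """Greedy backward elimination down to n_features columns."""
    selected = list(range(X.shape[1]))   # Step 1: start with all features
    while len(selected) > n_features:
        # Steps 2-3: score the model with each feature left out in turn
        scores = {}
        for j in selected:
            subset = [k for k in selected if k != j]
            scores[j] = cross_val_score(estimator, X[:, subset], y,
                                        scoring=scoring, cv=cv).mean()
        # Step 4: drop the feature whose removal leaves the highest score
        least_useful = max(scores, key=scores.get)
        selected.remove(least_useful)
    final_score = cross_val_score(estimator, X[:, selected], y,
                                  scoring=scoring, cv=cv).mean()
    return selected, final_score


data = load_diabetes()
kept, kept_score = backward_eliminate(LinearRegression(), data.data,
                                      data.target, n_features=6)
print([data.feature_names[j] for j in kept], round(kept_score, 3))
```

Forward and backward searches are both greedy, so they can end on different subsets for the same data and the same target size.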

# Backward Feature Selection using MLxtend

- Backward Feature Selection, also called Backward Elimination, is supported in MLxtend through the SequentialFeatureSelector class by setting forward=False.
- MLxtend allows this process to be done easily, with cross-validation, any estimator, and support for performance visualization.

- In MLxtend, backward selection is implemented using:
```
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
```

``` python
# Example Format
sfs = SFS(estimator=model,
          k_features='best',      # Can also be an integer like 5
          forward=False,          # False = backward selection
          floating=False,         # Set True for the floating variant (optional)
          scoring='r2',           # You can use 'neg_mean_squared_error' or others
          cv=5,                   # 5-fold cross-validation
          n_jobs=-1)              # Use all CPU cores
```
- As with forward selection, the arguments set most often are `estimator`, `k_features`, and `forward`; for backward elimination, `forward=False`.

---

### Key Parameters of SequentialFeatureSelector

| Parameter    | Description                                                |
| ------------ | ---------------------------------------------------------- |
| `estimator`  | Your ML model (LogisticRegression, RandomForest, etc.)     |
| `k_features` | Number of features to select or `'best'`                   |
| `forward`    | `True` = forward selection; `False` = backward elimination |
| `floating`   | Floating variant: may also undo an earlier add or remove   |
| `scoring`    | Scoring metric (`'accuracy'`, `'r2'`, etc.)                |
| `cv`         | Number of cross-validation folds                           |
| `n_jobs`     | Parallel processing (set to -1 for all cores)              |


---

# Example

In [13]:
# Import Libraries
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np
import pandas as pd

In [14]:
# Load The dataset
ds = pd.read_csv('Sales_data.csv')
ds.head()

Unnamed: 0,Group,Customer_Segment,Sales_Before,Sales_After,Customer_Satisfaction_Before,Customer_Satisfaction_After,Purchase_Made
0,Control,High Value,240.548359,300.007568,74.684767,,No
1,Treatment,High Value,246.862114,381.337555,100.0,100.0,Yes
2,Control,High Value,156.978084,179.330464,98.780735,100.0,No
3,Control,Medium Value,192.126708,229.278031,49.333766,39.811841,Yes
4,,High Value,229.685623,,83.974852,87.738591,Yes


In [15]:
# Split the data into features (x) and target (y)
x = ds.iloc[:, :-1]
y = ds['Purchase_Made']
x.shape   # (number of rows, number of feature columns)

(10000, 6)

##### Have to preprocess the data (missing values, text columns) before it can be used with the model (will do it later)
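
One possible shape for that preprocessing, sketched on the five rows printed by `ds.head()` (re-typed by hand here so the block runs without `Sales_data.csv`). The choices below, dropping `Unnamed: 0`, median/mode imputation, and one-hot encoding, are assumptions for illustration and not the author's chosen method.

```python
import numpy as np
import pandas as pd

# The five rows shown by ds.head(), re-typed for illustration
ds = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Group": ["Control", "Treatment", "Control", "Control", np.nan],
    "Customer_Segment": ["High Value", "High Value", "High Value",
                         "Medium Value", "High Value"],
    "Sales_Before": [240.548359, 246.862114, 156.978084, 192.126708, 229.685623],
    "Sales_After": [300.007568, 381.337555, 179.330464, 229.278031, np.nan],
    "Customer_Satisfaction_Before": [74.684767, 100.0, 98.780735, 49.333766, 83.974852],
    "Customer_Satisfaction_After": [np.nan, 100.0, 100.0, 39.811841, 87.738591],
    "Purchase_Made": ["No", "Yes", "No", "Yes", "Yes"],
})

# Drop the leftover index column and any row without a target value
ds = ds.drop(columns=["Unnamed: 0"]).dropna(subset=["Purchase_Made"])

# Encode the target as 0/1 and separate it from the features
y = ds["Purchase_Made"].map({"No": 0, "Yes": 1})
x = ds.drop(columns=["Purchase_Made"])

# Fill numeric gaps with the column median, categorical gaps with the mode
num_cols = x.select_dtypes(include="number").columns
cat_cols = x.columns.difference(num_cols)
x[num_cols] = x[num_cols].fillna(x[num_cols].median())
x[cat_cols] = x[cat_cols].fillna(x[cat_cols].mode().iloc[0])

# One-hot encode the categorical columns so every feature is numeric
x = pd.get_dummies(x, columns=list(cat_cols), dtype=int)
print(x.shape, int(x.isna().sum().sum()))
```

Imputing on the full dataset before cross-validation lets information leak between folds; wrapping the imputation and the model in a scikit-learn `Pipeline` is a common way to avoid that.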

In [16]:
# loading the model
lr = LogisticRegression()

In [None]:
# Feature Selection
sfs = SequentialFeatureSelector(lr, k_features=5, forward=True)  # forward=True selects the forward feature selection method
sfs.fit(x, y)

In [None]:
# Get Features (selected and original)
print(sfs.feature_names)  # original features
print(sfs.k_feature_names_)   # features selected by forward feature selection
sfs.k_score_   # mean cross-validation score of the selected subset (accuracy by default for a classifier)

---

### ChatGPT Based Dataset

In [1]:
# Import Libraries
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
import numpy as np

In [2]:
# Load Dataset
data = load_diabetes()
X = data.data
y = data.target
feature_names = data.feature_names

In [3]:
# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [4]:
# Initialize Model
model = LinearRegression()

In [None]:
# Feature Selection with MLxtend
# sfs = SFS(model, k_features=6, forward=False)  # Backward Feature Selection
sfs = SFS(model, k_features=6, forward=True)   # Forward Feature Selection

In [12]:
sfs = sfs.fit(X_train, y_train)

In [13]:
# Original Feature names
feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [14]:
# Selected Feature Names
[feature_names[i] for i in sfs.k_feature_idx_]

['sex', 'bmi', 'bp', 's1', 's3', 's5']

In [15]:
sfs.k_score_   # Mean cross-validation score of the selected subset; try several k_features values and keep the one with the highest score

np.float64(0.45273020610465514)

In [16]:
# Transform training and testing sets
X_train_selected = sfs.transform(X_train)
X_test_selected = sfs.transform(X_test)

# Train model on selected features
model.fit(X_train_selected, y_train)
y_pred = model.predict(X_test_selected)

print("Test R² Score:", r2_score(y_test, y_pred))

Test R² Score: 0.48163855615378837
