## 🔁 Backward Elimination (via MLxtend)
✅ Definition:
Backward Elimination is a feature selection technique that starts with all features and removes the least significant one at a time until removing more features would harm model performance.

⚙️ How it works (with MLxtend):
- Start with all features.
- Temporarily remove each remaining feature in turn.
- Evaluate model performance without it.
- Permanently eliminate the feature whose removal improves performance the most or degrades it the least.
- Repeat until a stopping criterion is met (e.g., fixed number of features or no improvement).
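
The loop above can be sketched in plain Python. This is a minimal illustration of the greedy idea, not MLxtend's implementation: `backward_eliminate` and `toy_score` are hypothetical names, and the toy score uses made-up per-feature contributions in place of a real cross-validated metric.

```python
from typing import Callable, List, Sequence, Tuple


def backward_eliminate(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    k_features: int,
) -> Tuple[List[str], float]:
    """Greedily drop one feature per round until k_features remain."""
    selected = list(features)
    while len(selected) > k_features:
        # Score each subset obtained by leaving out exactly one feature.
        trials = [
            (score_fn([f for f in selected if f != dropped]), dropped)
            for dropped in selected
        ]
        # Remove the feature whose absence leaves the highest score.
        _, to_drop = max(trials, key=lambda t: t[0])
        selected.remove(to_drop)
    return selected, score_fn(selected)


def toy_score(subset: List[str]) -> float:
    """Made-up score: a base value plus a fixed contribution per feature."""
    gain = {"a": 0.20, "b": 0.10, "c": 0.05, "d": 0.02, "e": -0.01}
    return 0.5 + sum(gain[f] for f in subset)


kept, score = backward_eliminate(["a", "b", "c", "d", "e"], toy_score, k_features=4)
print(kept, round(score, 2))
```

In practice `score_fn` would fit the model on the given columns and return a cross-validated metric, which is the expensive part of the search.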

🎯 Why use it?
- Reduces overfitting by eliminating irrelevant or redundant features.
- Makes the model simpler and faster.
- Helps interpret the model by reducing the number of predictors.

## ➕ Forward Elimination (via MLxtend)
✅ Definition:
Forward Elimination, more commonly called forward selection, is a feature selection technique that starts with no features and adds the most useful feature one at a time until performance stops improving or a target number of features is reached.

⚙️ How it works (with MLxtend):
- Start with no features.
- Tentatively add each remaining feature in turn.
- Evaluate model performance with it included.
- Permanently keep the feature that improves performance the most.
- Repeat until the model performance stops improving or a desired number of features is reached.
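
The same steps can be sketched as a greedy loop in plain Python. Again this is only an illustration under stated assumptions: `forward_select` and `toy_score` are hypothetical names, and the toy score is invented so that features "a" and "b" overlap, which shows why the second-best individual feature is not always the second one chosen.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    k_features: int,
) -> Tuple[List[str], float]:
    """Greedily add one feature per round until k_features are selected."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k_features:
        # Score the current subset extended by each remaining feature.
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        # Keep the feature that yields the highest score when added.
        _, best = max(trials, key=lambda t: t[0])
        selected.append(best)
        remaining.remove(best)
    return selected, score_fn(selected)


def toy_score(subset: List[str]) -> float:
    """Made-up score in which 'a' and 'b' carry overlapping information."""
    gain = {"a": 0.20, "b": 0.18, "c": 0.05, "d": 0.02}
    score = 0.5 + sum(gain[f] for f in subset)
    if "a" in subset and "b" in subset:
        score -= 0.15  # penalty for redundancy between 'a' and 'b'
    return score


chosen, score = forward_select(["a", "b", "c", "d"], toy_score, k_features=2)
print(chosen, round(score, 2))
```

Here "b" scores well on its own but adds little once "a" is in the subset, so the second round picks "c" instead.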

🎯 Why use it?
- Useful when you want to build a model from scratch with only the most important features.
- Helps reduce model complexity.
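- Often preferred when the number of features is very large, because the early rounds fit small models, whereas backward elimination has to start from the full feature set.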



## 🧠 Code Walkthrough

In [1]:
import pandas as pd
from mlxtend.feature_selection import SequentialFeatureSelector

## 🟢 Step 1: Load Dataset

In [2]:
# Read only specific columns from the CSV
df = pd.read_csv("diabetes.csv", usecols=["Glucose", "BloodPressure","SkinThickness","BMI","Age","Outcome"])
df.head(5)

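|   | Glucose | BloodPressure | SkinThickness | BMI | Age | Outcome |
|---|---|---|---|---|---|---|
| 0 | 148 | 72 | 35 | 33.6 | 50 | 1 |
| 1 | 85 | 66 | 29 | 26.6 | 31 | 0 |
| 2 | 183 | 64 | 0 | 23.3 | 32 | 1 |
| 3 | 89 | 66 | 23 | 28.1 | 21 | 0 |
| 4 | 137 | 40 | 35 | 43.1 | 33 | 1 |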


- Loads selected columns from diabetes.csv.
- The target variable is Outcome.

## 🟢 Step 2: Split Features and Target

In [3]:
X = df.iloc[:, :-1]  # all columns except 'Outcome'
y = df['Outcome']    # target column

- X contains features: Glucose, BloodPressure, SkinThickness, BMI, Age.
- y is the target: Outcome.

In [4]:
X.shape

(768, 5)

## 🟢 Step 3: Logistic Regression Model

In [5]:
from sklearn.linear_model import LogisticRegression

In [6]:
lr = LogisticRegression()

- Initializes the logistic regression model.

## 🔼 Forward Elimination

In [7]:
sfs = SequentialFeatureSelector(lr,
                                k_features=4,        # You can change this number
                                forward=True,
                                scoring='accuracy',
                                cv=5)

sfs = sfs.fit(X, y)

- Runs forward selection, scoring each candidate subset by 5-fold cross-validated accuracy, and stops once 4 features are selected.

In [8]:
sfs.feature_names

['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Age']

In [9]:
sfs.k_feature_names_

('Glucose', 'BloodPressure', 'BMI', 'Age')

In [10]:
sfs.k_score_

np.float64(0.7682709447415329)

## 🔽 Backward Elimination

In [12]:
sfs = SequentialFeatureSelector(lr,
                                k_features=4,        # You can change this number
                                forward=False,
                                scoring='accuracy',
                                cv=5)

sfs = sfs.fit(X, y)

- Runs the search in the opposite direction: starts with all 5 features and removes the least useful one, leaving 4.

In [13]:
sfs.feature_names

['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Age']

In [14]:
sfs.k_feature_names_

('Glucose', 'BloodPressure', 'BMI', 'Age')

In [15]:
sfs.k_score_

np.float64(0.7682709447415329)
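
Both directions end on the same four features, leaving out SkinThickness. A minimal sketch of using that result to subset a DataFrame follows; it rebuilds only the five rows shown by `df.head(5)` earlier, and `df_head`, `selected`, and `X_selected` are names assumed for the example.

```python
import pandas as pd

# The five rows displayed by df.head(5) earlier in the notebook.
df_head = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

# Tuple reported by sfs.k_feature_names_ for both search directions.
selected = ("Glucose", "BloodPressure", "BMI", "Age")

# Convert the tuple to a list: pandas treats a tuple as a single column key.
X_selected = df_head[list(selected)]
print(X_selected.shape)
```

With the full dataset, the same indexing on `X` gives the reduced feature matrix to train the final model on.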