## Feature Selection Using Forward Selection and Backward Selection

### Concepts:

**Forward Selection:**
- Starts with an empty model (no features).
- Adds one feature at a time based on a selection criterion (e.g., highest accuracy, lowest p-value).
- The feature that improves the model the most is added at each step.
- Stops when no significant improvement is observed by adding more features.
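
The forward procedure above can be sketched in plain Python. This is a minimal illustration, not the implementation used by any library: `forward_select`, `score_fn`, `min_gain`, and the toy `GAIN` table are hypothetical names introduced here, and the scores are made up. In practice `score_fn` would typically wrap something like cross-validated accuracy.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(
    features: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> List[str]:
    """Greedy forward selection; returns features in the order they were added."""
    selected: List[str] = []
    remaining = list(features)
    best_score = score_fn(frozenset())  # score of the empty model
    while remaining:
        # Score every candidate obtained by adding one remaining feature.
        scored = [(score_fn(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(scored, key=lambda t: t[0])
        # Stop when the best addition no longer improves the score enough.
        if cand_score - best_score < min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected


# Hypothetical, made-up scores: each feature adds a fixed amount,
# and "noise" adds nothing.
GAIN = {"petal_length": 0.60, "petal_width": 0.30, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(GAIN[f] for f in subset)


chosen = forward_select(list(GAIN), toy_score)
print(chosen)
```

With these toy scores the loop adds the two informative features and stops before adding `noise`, because that addition yields no gain.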

**Backward Selection:**
- Starts with all features in the model.
- Removes one feature at a time based on a selection criterion (e.g., lowest importance, highest p-value).
- The feature whose removal least affects the model performance is removed at each step.
- Stops when removing any further features would degrade performance.
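
The backward procedure can be sketched the same way. Again, this is only an illustration under stated assumptions: `backward_eliminate`, `score_fn`, `max_loss`, and the toy `GAIN` table are hypothetical, and the scores are invented for the example.

```python
from typing import Callable, FrozenSet, List, Sequence


def backward_eliminate(
    features: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    max_loss: float = 1e-9,
) -> List[str]:
    """Greedy backward elimination; returns the features that survive."""
    selected = list(features)
    best_score = score_fn(frozenset(selected))  # score of the full model
    while len(selected) > 1:
        # Score every candidate obtained by dropping one current feature.
        scored = [
            (score_fn(frozenset(g for g in selected if g != f)), f)
            for f in selected
        ]
        cand_score, drop = max(scored, key=lambda t: t[0])
        # Stop when even the least harmful removal costs more than max_loss.
        if best_score - cand_score > max_loss:
            break
        selected.remove(drop)
        best_score = cand_score
    return selected


# Hypothetical, made-up scores: each feature adds a fixed amount,
# and "noise" adds nothing.
GAIN = {"petal_length": 0.60, "petal_width": 0.30, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(GAIN[f] for f in subset)


kept = backward_eliminate(list(GAIN), toy_score)
print(kept)
```

Here `noise` is removed first because dropping it costs nothing, and the loop stops once every further removal would lower the score.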

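Both methods are greedy searches: they aim to find a well-performing subset of features without evaluating every possible combination, so the result is not guaranteed to be the globally optimal set.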


In [3]:
# Install mlxtend first if it is not already available in the environment
! pip install mlxtend

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

In [4]:
# Load the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


#### Forward Selection

In [5]:
# Initialize the model
rf_model = RandomForestClassifier(random_state=42)

# Perform Forward Selection
forward_selector = SFS(rf_model, 
                       k_features='best', 
                       forward=True, 
                       floating=False, 
                       scoring='accuracy', 
                       cv=5)

forward_selector = forward_selector.fit(X_train, y_train)

# Display Selected Features
print("Forward Selected Features:", forward_selector.k_feature_names_)


Forward Selected Features: ('petal length (cm)', 'petal width (cm)')


#### Backward Selection

In [6]:
# Perform Backward Selection
backward_selector = SFS(rf_model, 
                        k_features='best', 
                        forward=False, 
                        floating=False, 
                        scoring='accuracy', 
                        cv=5)

backward_selector = backward_selector.fit(X_train, y_train)

# Display Selected Features
print("Backward Selected Features:", backward_selector.k_feature_names_)


Backward Selected Features: ('petal length (cm)', 'petal width (cm)')


### Explanation of Code:

**SequentialFeatureSelector:**

- `k_features='best'`: Returns the feature subset with the best cross-validation score among all subset sizes evaluated during the search.
- `forward=True/False`: Determines whether forward or backward selection is applied.
- `scoring='accuracy'`: Uses accuracy as the selection criterion.
- `cv=5`: Performs 5-fold cross-validation.

**Selected Features:**

- After fitting, `k_feature_names_` provides the names of selected features.
- These features form the best-scoring subset found by the greedy search under cross-validation; they are not guaranteed to be the best of all possible subsets.


#### Evaluation with Selected Features

In [7]:
# Use Forward-Selected Features
selected_features_forward = list(forward_selector.k_feature_idx_)
X_train_forward = X_train.iloc[:, selected_features_forward]
X_test_forward = X_test.iloc[:, selected_features_forward]

rf_model.fit(X_train_forward, y_train)
accuracy_forward = rf_model.score(X_test_forward, y_test)
print("Accuracy with Forward Selection:", accuracy_forward)

# Use Backward-Selected Features
selected_features_backward = list(backward_selector.k_feature_idx_)
X_train_backward = X_train.iloc[:, selected_features_backward]
X_test_backward = X_test.iloc[:, selected_features_backward]

rf_model.fit(X_train_backward, y_train)
accuracy_backward = rf_model.score(X_test_backward, y_test)
print("Accuracy with Backward Selection:", accuracy_backward)


Accuracy with Forward Selection: 1.0
Accuracy with Backward Selection: 1.0


### Advantages and Disadvantages:

#### Forward Selection:
- **Advantages:**
  - Computationally cheap when only a few features are needed, because the early steps fit models on small feature subsets.
  - Can stop early if no improvement is observed.
- **Disadvantages:**
  - May miss interactions between features: a feature that is only useful in combination with another may never be added.

#### Backward Selection:
- **Advantages:**
  - Considers all features initially, which reduces the risk of missing interactions.
- **Disadvantages:**
  - Computationally expensive when there are many features, because it starts from the full model and removes features one at a time.
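
The interaction point can be illustrated with a small constructed example. The `SCORES` table below is entirely hypothetical: features `a` and `b` are weak alone but strong together, while `c` is the best single feature. The helper names `forward_k` and `backward_k` are introduced here for illustration, and the example is built to favour backward selection; other score tables can fool it instead.

```python
# Hypothetical, made-up scores for every subset of three features.
SCORES = {
    frozenset(): 0.33,
    frozenset({"a"}): 0.50,
    frozenset({"b"}): 0.50,
    frozenset({"c"}): 0.70,
    frozenset({"a", "b"}): 0.95,
    frozenset({"a", "c"}): 0.72,
    frozenset({"b", "c"}): 0.72,
    frozenset({"a", "b", "c"}): 0.95,
}
FEATURES = ["a", "b", "c"]


def forward_k(features, k):
    """Greedily add the best feature until k features are selected."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: SCORES[frozenset(selected + [f])])
        selected.append(best)
    return frozenset(selected)


def backward_k(features, k):
    """Greedily drop the least harmful feature until k features remain."""
    selected = list(features)
    while len(selected) > k:
        drop = max(selected, key=lambda f: SCORES[frozenset(selected) - {f}])
        selected.remove(drop)
    return frozenset(selected)


fwd = forward_k(FEATURES, 2)
bwd = backward_k(FEATURES, 2)
print("forward :", sorted(fwd), SCORES[fwd])
print("backward:", sorted(bwd), SCORES[bwd])
```

Forward selection commits to `c` in the first step and never discovers that `a` and `b` work well together, while backward selection starts with the pair already present and drops `c`.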

### Use Case:
- Use **Forward Selection** when you have a large number of features and want a quick solution.
- Use **Backward Selection** when you can afford more computation and want a more comprehensive analysis.
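
A rough count of how many candidate subsets each method scores makes the trade-off concrete. The functions below are a back-of-the-envelope sketch for the plain greedy procedures described earlier, assuming the search runs until exactly `k` features remain; they are not a measurement of any library, and each scored candidate costs one model fit per cross-validation fold.

```python
def forward_evaluations(n_features: int, k: int) -> int:
    """Candidate subsets scored when growing from 0 to k features."""
    return sum(n_features - i for i in range(k))


def backward_evaluations(n_features: int, k: int) -> int:
    """Candidate subsets scored when shrinking from n_features to k features."""
    return sum(range(k + 1, n_features + 1))


n, k, cv = 100, 5, 5
print("forward candidates :", forward_evaluations(n, k))
print("backward candidates:", backward_evaluations(n, k))
print("forward model fits :", forward_evaluations(n, k) * cv)
print("backward model fits:", backward_evaluations(n, k) * cv)
```

With 100 features and a target of 5, forward selection scores 490 candidates while backward selection scores 5035, and the backward candidates are also fitted on much larger feature sets.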
