## Wrapper methods - Backward Selection

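Backward selection (also called backward elimination) is another wrapper-style feature selection technique and is essentially the reverse of forward selection. You start with all available features and remove one feature at a time: at each step, the model is refit with each remaining feature left out in turn, and the feature whose removal hurts performance the least (or helps it the most) is dropped. The process stops when a target number of features remains or when further removals degrade performance. Here are the advantages and disadvantages of backward selection: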

**Advantages of Backward Selection:**

1. **Simplicity:** Similar to forward selection, backward selection is relatively easy to understand and implement. It starts with all features and eliminates them one by one, making it accessible to users with varying levels of expertise.

2. **Guaranteed Subset Size:** Backward selection allows you to specify the desired size of the feature subset. This can be useful if you want to limit the number of features for practical or interpretability reasons.

3. **Model Generalization:** By iteratively removing features, backward selection can help in reducing overfitting, which is particularly valuable when working with complex models or small datasets.

4. **Features Are Judged in Full Context:** Because it starts from the complete feature set, backward selection evaluates each feature in the presence of all the others. Features that are only useful in combination are therefore less likely to be overlooked than in forward selection, where they may never be added. The procedure is still greedy, however: once a feature is removed, it is never reconsidered.
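The fixed subset size mentioned above can be illustrated with scikit-learn's `SequentialFeatureSelector`. This is a minimal sketch, not the method used later in this notebook: the choice of two features, 5-fold cross-validation, and accuracy as the score are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Load Iris as a DataFrame so the feature names are preserved
X, y = load_iris(return_X_y=True, as_frame=True)

estimator = RandomForestClassifier(n_estimators=100, random_state=42)

# direction="backward" starts from all features and removes one per step
# until n_features_to_select remain (2 is an arbitrary choice here).
selector = SequentialFeatureSelector(
    estimator,
    n_features_to_select=2,
    direction="backward",
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)

# get_support() returns a boolean mask over the original columns
kept = list(X.columns[selector.get_support()])
print("Features kept:", kept)
```

Because each candidate subset is scored by cross-validation, the result does not depend on a single train/test split.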

**Disadvantages of Backward Selection:**

1. **Not Guaranteed to Find the Best Subset:** Like forward selection, backward selection does not guarantee that it will find the optimal subset of features. It may miss important feature interactions and fail to explore all possible combinations.

2. **Computational Intensity:** Backward selection can be computationally intensive for datasets with a large number of features. Every step refits the model once for each remaining feature, and the earliest steps are the most expensive because they train on nearly the full feature set.

3. **Loss of Information:** Removing features may lead to a loss of potentially valuable information, especially if there are interactions or dependencies between features.

4. **Over-Pruning:** If not used carefully, backward selection can lead to overly simplified models that might underperform in capturing the underlying patterns in the data.

5. **Limited Interpretability:** The final subset of features selected by backward selection may not be as interpretable as those selected by forward selection, especially if it includes complex interactions or dependencies between features.
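To put the computational cost in perspective, the sketch below counts how many candidate subsets a greedy backward pass scores compared with an exhaustive search. The function names are hypothetical helpers, and the count assumes one evaluation per candidate removal (multiply by the number of folds if cross-validation is used). For the four Iris features, shrinking down to a single feature scores 9 candidates, whereas an exhaustive search would score 15 non-empty subsets.

```python
def backward_evaluations(p: int, k: int) -> int:
    """Candidate subsets scored when greedily shrinking from p to k features."""
    # With m features left, each of the m single-feature removals is scored once,
    # for m = p, p - 1, ..., k + 1.
    return sum(range(k + 1, p + 1))


def exhaustive_evaluations(p: int) -> int:
    """Non-empty subsets an exhaustive search would have to score."""
    return 2 ** p - 1


for p in (4, 20, 50):
    print(
        f"p={p}: backward={backward_evaluations(p, 1)}, "
        f"exhaustive={exhaustive_evaluations(p)}"
    )
```

The greedy count grows quadratically in the number of features while the exhaustive count grows exponentially, which is why backward selection is practical but not guaranteed to find the best subset.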

In summary, backward selection can be a useful feature selection technique, especially when you want to limit the number of features and reduce overfitting. However, it shares some limitations with forward selection, such as not guaranteeing the best subset and potential computational intensity. The choice between forward and backward selection often depends on the specific characteristics of your dataset and the goals of your feature selection process.

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [6]:
# Load the Iris dataset from scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['species'] = iris.target_names[iris.target]

In [7]:
# Split the dataset into training and testing sets
X = data.drop('species', axis=1)
y = data['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [8]:
# Create a Random Forest classifier
classifier = RandomForestClassifier(n_estimators=100, random_state=42)

In [9]:
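# Backward Selection for feature selection
# Caveat: candidate subsets are scored on the test split here, so the reported
# accuracy is optimistic; a validation split or cross-validation is safer.
selected_features = list(X_train.columns)
best_accuracy = 0.0  # starts at 0 rather than at the accuracy with all features

for _ in range(X_train.shape[1]):
    if len(selected_features) <= 1:
        break  # never try to fit a model on zero features
    worst_feature = None
    for feature in selected_features:
        # Refit and score the model with this one feature left out
        temp_features = selected_features.copy()
        temp_features.remove(feature)
        classifier.fit(X_train[temp_features], y_train)
        y_pred = classifier.predict(X_test[temp_features])
        accuracy = accuracy_score(y_test, y_pred)
        # Only a strict improvement over the best accuracy so far counts
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            worst_feature = feature
    if worst_feature is None:
        break  # no removal improved accuracy, so stop early
    selected_features.remove(worst_feature)

print("Selected features using Backward Selection:", selected_features)
print("Best Accuracy using Backward Selection:", best_accuracy)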

Selected features using Backward Selection: ['sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Best Accuracy using Backward Selection: 1.0
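The loop above scores candidate subsets on the test split and starts from `best_accuracy = 0.0`, so the reported accuracy may be optimistic. A possible variant, sketched below under stated assumptions, scores each subset with 5-fold cross-validation on the training split only, begins from the all-features baseline, and touches the test split once at the end. The helper `cv_accuracy` and the rule of accepting ties (to prefer smaller subsets) are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)


def cv_accuracy(features):
    """Mean 5-fold accuracy, computed on the training split only."""
    return cross_val_score(clf, X_train[features], y_train, cv=5).mean()


selected = list(X_train.columns)
best_score = cv_accuracy(selected)  # baseline with every feature

while len(selected) > 1:
    # Score the subset obtained by leaving out each remaining feature in turn
    scores = {f: cv_accuracy([c for c in selected if c != f]) for f in selected}
    candidate = max(scores, key=scores.get)
    if scores[candidate] < best_score:
        break  # every possible removal lowers the score, so stop
    best_score = scores[candidate]
    selected.remove(candidate)

# The test split is used once, after selection has finished
clf.fit(X_train[selected], y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test[selected]))

print("Selected features:", selected)
print("Cross-validated accuracy:", best_score)
print("Held-out test accuracy:", test_accuracy)
```

Keeping the test split out of the selection loop means the final accuracy is an estimate of performance on unseen data rather than a number the selection procedure was tuned against.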


#### NOTE:
Forward selection and backward selection can wrap any machine learning algorithm, not only a Random Forest classifier. The right choice depends on the problem and the dataset.
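As a rough illustration of this note, the sketch below runs backward selection with two other estimators through scikit-learn's `SequentialFeatureSelector`. The estimators, their hyperparameters, and the target of two features are assumptions chosen only for the example; different models may keep different features.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

# Any estimator with fit/predict can be wrapped; these two are arbitrary picks
estimators = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, est in estimators.items():
    sfs = SequentialFeatureSelector(
        est, n_features_to_select=2, direction="backward", cv=5
    )
    sfs.fit(X, y)
    results[name] = list(X.columns[sfs.get_support()])
    print(f"{name}: {results[name]}")
```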