# Step Forward Feature Selection by `Mr. Harshit Dawar!`

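* This algorithm is a Wrapper Method: it searches for a well-performing subset of features for a particular Machine Learning algorithm by repeatedly training and scoring that algorithm. Because the search is greedy, it does not guarantee the globally best subset.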

* The algorithm starts by training the model on each feature individually and keeps the one that performs best. In the next round it pairs that feature with each of the remaining features and keeps the best-performing pair. It keeps adding one feature per round in this way until a stopping condition is met.

* The stopping condition can be a predefined number of features or a model performance threshold.
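
The loop described above can be sketched in plain Python. This is only a minimal illustration, not the implementation used by `mlxtend`: `forward_select`, `score_fn`, `min_score`, and the toy weights are all hypothetical names and values made up for this example, and `score_fn` stands in for "train the model on this subset and return its cross-validated score".

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], float],
    k_features: Optional[int] = None,
    min_score: Optional[float] = None,
) -> Tuple[Tuple[str, ...], float]:
    """Greedy step forward selection over feature names (higher score is better)."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if k_features is None else min(k_features, len(remaining))

    while remaining and len(selected) < limit:
        # Score every subset formed by adding one remaining feature.
        scored = [(score_fn(tuple(selected + [f])), f) for f in remaining]
        round_score, round_feature = max(scored, key=lambda pair: pair[0])
        selected.append(round_feature)
        remaining.remove(round_feature)
        best_score = round_score
        # Optional stopping condition: performance threshold reached.
        if min_score is not None and best_score >= min_score:
            break

    return tuple(selected), best_score


# Toy scoring function (invented for illustration): each feature has a fixed
# contribution, and "a" and "b" are partly redundant when used together.
WEIGHTS = {"a": 0.5, "b": 0.4, "c": 0.3, "d": 0.0}


def toy_score(subset: Tuple[str, ...]) -> float:
    score = sum(WEIGHTS[f] for f in subset)
    if "a" in subset and "b" in subset:
        score -= 0.3
    return score


chosen, score = forward_select(list(WEIGHTS), toy_score, k_features=2)
print(chosen, round(score, 2))  # ('a', 'c') 0.8
```

In the first round `a` wins on its own; in the second round `c` is added instead of `b`, because `b` adds little once `a` is already selected. Passing `min_score` instead of `k_features` would stop the loop as soon as the score reaches that threshold.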

#### With that said, let's move on to the practical part.

In [1]:
# Importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import roc_auc_score, r2_score
from mlxtend.feature_selection import SequentialFeatureSelector

In [2]:
# Loading the Dataset!

data = pd.read_csv("../ds/Titanic.csv")

In [3]:
data.head()

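|   | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | -0.590495 | 1 | 0 | -0.50024 | S |
| 1 | 1 | 1 | female | 0.643971 | 1 | 0 | 0.788947 | C |
| 2 | 1 | 3 | female | -0.281878 | 0 | 0 | -0.48665 | S |
| 3 | 1 | 1 | female | 0.412509 | 1 | 0 | 0.422861 | S |
| 4 | 0 | 3 | male | 0.412509 | 0 | 0 | -0.484133 | S |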


In [6]:
X = data.drop("Survived", axis = 1)
y = data.Survived

In [7]:
X.head()

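|   | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| 0 | 3 | male | -0.590495 | 1 | 0 | -0.50024 | S |
| 1 | 1 | female | 0.643971 | 1 | 0 | 0.788947 | C |
| 2 | 3 | female | -0.281878 | 0 | 0 | -0.48665 | S |
| 3 | 1 | female | 0.412509 | 1 | 0 | 0.422861 | S |
| 4 | 3 | male | 0.412509 | 0 | 0 | -0.484133 | S |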


In [8]:
y.head()

0    0
1    1
2    1
3    1
4    0
Name: Survived, dtype: int64

In [25]:
# Label Encoding the Categorical Variables

from sklearn.preprocessing import LabelEncoder

In [26]:
X.Sex = LabelEncoder().fit_transform(X.Sex)

In [28]:
X.Embarked = LabelEncoder().fit_transform(X.Embarked)

In [29]:
X.head()

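|   | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | -0.590495 | 1 | 0 | -0.50024 | 2 |
| 1 | 1 | 0 | 0.643971 | 1 | 0 | 0.788947 | 0 |
| 2 | 3 | 0 | -0.281878 | 0 | 0 | -0.48665 | 2 |
| 3 | 1 | 0 | 0.412509 | 1 | 0 | 0.422861 | 2 |
| 4 | 3 | 1 | 0.412509 | 0 | 0 | -0.484133 | 2 |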
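
To make the encoded values above easier to read, here is a small plain-Python sketch of what label encoding does: it sorts the distinct labels and replaces each one with its position in that sorted order. `label_encode` is a hypothetical helper written for this illustration, not part of scikit-learn, and the `"Q"` entry in the second sample is an assumption added so that `S` maps to 2, as it does in the output above.

```python
def label_encode(values):
    """Map each distinct label to an integer by sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# First five values of the Sex column shown earlier.
sex_codes, sex_map = label_encode(["male", "female", "female", "female", "male"])
print(sex_map)    # {'female': 0, 'male': 1}
print(sex_codes)  # [1, 0, 0, 0, 1]

# First five Embarked values, plus an assumed "Q" so three classes exist.
emb_codes, emb_map = label_encode(["S", "C", "S", "S", "S", "Q"])
print(emb_map)    # {'C': 0, 'Q': 1, 'S': 2}
```

The integers carry no real ordering for a column such as `Embarked`; tree-based models like the Random Forest used here are usually tolerant of that, but other estimators may treat the codes as magnitudes.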


In [35]:
# Creating the Feature Selector!

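Feature_Selector = SequentialFeatureSelector(
                        RandomForestClassifier(n_estimators = 15, n_jobs = 2),
                        scoring = "roc_auc",   # metric used to compare candidate subsets
                        cv = 3,                # 3-fold cross-validation per candidate
                        floating = False,      # plain sequential search, no floating variant
                        forward = True,        # add features (step forward) rather than remove
                        k_features = 3,        # stop once 3 features are selected
                        verbose = 2            # log progress for every round
                        )

# Run the search on the encoded features
Feature_Selector.fit(X, y)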

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.5s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    3.5s finished

[2021-03-13 23:44:54] Features: 1/3 -- score: 0.7659024368309915
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    3.8s finished

[2021-03-13 23:44:58] Features: 2/3 -- score: 0.832071353004107
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.7s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    2.6s finished

[2021-03-13 23:45:00] Features: 3/3 -- score: 0.8413103215992491

SequentialFeatureSelector(cv=3,
                          estimator=RandomForestClassifier(n_estimators=15,
                                                           n_jobs=2),
                          k_features=3, scoring='roc_auc', verbose=2)
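
The log above reports 7, then 6, then 5 completed tasks, which appears to correspond to the number of candidate subsets scored in each round (one per feature not yet selected). The short sketch below works out that arithmetic and compares it with scoring every possible 3-feature subset. The helper name `forward_candidates` is hypothetical, and the 30-feature case is an invented example to show how the gap grows.

```python
from math import comb


def forward_candidates(n_features: int, k_features: int) -> int:
    """Number of candidate subsets scored by step forward selection."""
    return sum(n_features - i for i in range(k_features))


n, k, cv = 7, 3, 3
per_round = [n - i for i in range(k)]
total_candidates = forward_candidates(n, k)

print(per_round)              # [7, 6, 5]
print(total_candidates)       # 18 candidate subsets
print(total_candidates * cv)  # 54 model fits with 3-fold cross-validation
print(comb(n, k))             # 35 subsets if every 3-feature combination were scored

# Invented larger case: 30 features, select 10.
print(forward_candidates(30, 10), comb(30, 10))  # 255 30045015
```

With only 7 features the saving is modest, but it grows quickly as the number of features increases.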

In [36]:
Feature_Selector.k_feature_names_

('Pclass', 'Sex', 'Parch')

In [40]:
# Printing the score of the selected feature subset
Feature_Selector.k_score_

0.8413103215992491
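
The per-round scores logged during the fit can also be used to reason about a performance-based stopping condition. The sketch below takes the three scores from the log and reports how many features would be kept if the search stopped once an extra feature improved the score by less than a chosen margin. `features_to_keep` is a hypothetical helper, and the margin of 0.02 is an arbitrary value picked for illustration.

```python
# Scores copied from the verbose log: 1, 2, and 3 selected features.
round_scores = [0.7659024368309915, 0.832071353004107, 0.8413103215992491]


def features_to_keep(scores, min_gain):
    """Return how many features to keep before the gain drops below min_gain."""
    for k in range(1, len(scores)):
        if scores[k] - scores[k - 1] < min_gain:
            return k
    return len(scores)


gains = [later - earlier for earlier, later in zip(round_scores, round_scores[1:])]
print([round(g, 4) for g in gains])          # [0.0662, 0.0092]
print(features_to_keep(round_scores, 0.02))  # 2
```

Under that assumed margin the third feature would not be added, since it raises the ROC AUC by less than 0.01.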