# Exhaustive Feature Selection by `Mr. Harshit Dawar!`

* This algorithm belongs to the Wrapper Methods family. Because it evaluates every candidate subset, it is guaranteed to find the best-scoring subset of features (under the chosen scoring metric and cross-validation setup) for a particular Machine Learning algorithm!

* It works by training and scoring the model on every possible subset of the features. This makes it the most computationally expensive way to select features, but in return it finds the best feature subset for that particular algorithm.

* To keep it practical, the subset size can be restricted to a custom range such as 1 to 5. The algorithm then builds every combination containing 1 to 5 features and returns the best-scoring one.
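
The idea described in the bullets above can be sketched in a few lines of plain Python. This is only an illustration of the search loop, not how `mlxtend` implements it: `exhaustive_search`, `score_fn` and `toy_score` are hypothetical names, and the toy scorer stands in for what would normally be a cross-validated model score.

```python
from itertools import combinations


def exhaustive_search(features, score_fn, min_features=1, max_features=None):
    """Score every feature subset whose size lies in [min_features, max_features].

    `score_fn` is a hypothetical callable: it takes a tuple of feature names
    and returns a number where higher is better (for example, the mean
    cross-validation score of a model trained on just those features).
    """
    if max_features is None:
        max_features = len(features)
    best_subset, best_score = None, float("-inf")
    n_evaluated = 0
    for k in range(min_features, max_features + 1):
        for subset in combinations(features, k):
            score = score_fn(subset)
            n_evaluated += 1
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


# Toy scorer (made up for illustration): "a" and "c" are useful,
# and every feature in the subset costs a small penalty.
def toy_score(subset):
    useful = {"a": 0.3, "c": 0.2}
    return sum(useful.get(name, 0.0) for name in subset) - 0.01 * len(subset)


best_subset, best_score, n_evaluated = exhaustive_search(
    ["a", "b", "c", "d"], toy_score
)
print(best_subset, round(best_score, 2), n_evaluated)
```

With 4 features the loop scores 15 subsets. Each added feature roughly doubles that number, which is why the size range matters in practice.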

#### With that said, let's move on to the practical part.

In [2]:
# Importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import roc_auc_score, r2_score
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston
from mlxtend.feature_selection import ExhaustiveFeatureSelector

### Classification Use-Case

In [3]:
# Loading the Dataset!

data = pd.read_csv("../ds/Titanic.csv")
data.head()

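|   | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | -0.590495 | 1 | 0 | -0.50024 | S |
| 1 | 1 | 1 | female | 0.643971 | 1 | 0 | 0.788947 | C |
| 2 | 1 | 3 | female | -0.281878 | 0 | 0 | -0.48665 | S |
| 3 | 1 | 1 | female | 0.412509 | 1 | 0 | 0.422861 | S |
| 4 | 0 | 3 | male | 0.412509 | 0 | 0 | -0.484133 | S |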


In [7]:
# Dividing the Dataset into Target & Features!
X = data.drop("Survived", axis = 1)
y = data.Survived

In [8]:
X.head()

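|   | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| 0 | 3 | male | -0.590495 | 1 | 0 | -0.50024 | S |
| 1 | 1 | female | 0.643971 | 1 | 0 | 0.788947 | C |
| 2 | 3 | female | -0.281878 | 0 | 0 | -0.48665 | S |
| 3 | 1 | female | 0.412509 | 1 | 0 | 0.422861 | S |
| 4 | 3 | male | 0.412509 | 0 | 0 | -0.484133 | S |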


In [9]:
y

0      0
1      1
2      1
3      1
4      0
      ..
884    0
885    1
886    0
887    1
888    0
Name: Survived, Length: 889, dtype: int64

In [12]:
# Label Encoding the Categorical Variables

from sklearn.preprocessing import LabelEncoder
X.Sex = LabelEncoder().fit_transform(X.Sex)
X.Embarked = LabelEncoder().fit_transform(X.Embarked)
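
To make the encoding step less of a black box, here is a minimal plain-Python sketch of what label encoding does: the distinct values are sorted and each one is replaced by its position in that order. `label_encode` is a hypothetical helper written for illustration and is not part of scikit-learn. As a side note, scikit-learn's documentation describes `LabelEncoder` as intended for target labels and points to `OrdinalEncoder` for input features, although for a single column both yield the same kind of integer codes.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order of the values."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# The Sex values of the first five rows shown earlier.
sex_codes, sex_mapping = label_encode(["male", "female", "female", "female", "male"])
print(sex_mapping)
print(sex_codes)
```

Since "female" sorts before "male", it receives code 0 and "male" receives code 1. Keep in mind that the codes imply an ordering that has no real meaning for a column such as `Embarked`; tree-based models like the random forest used here are usually tolerant of that.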

In [13]:
# Creating the Feature Selector!

Feature_Selector = ExhaustiveFeatureSelector(
                        RandomForestClassifier(n_estimators = 15, n_jobs = 2),
                        scoring = "roc_auc",
                        cv = 3,
                        print_progress = True,
                        min_features = 1,
                        max_features = 7
                        )

Feature_Selector.fit(X, y)

Features: 127/127

ExhaustiveFeatureSelector(cv=3,
                          estimator=RandomForestClassifier(n_estimators=15,
                                                           n_jobs=2),
                          max_features=7, scoring='roc_auc')

***The counter above shows that the algorithm evaluated 127 different feature combinations, i.e. every non-empty subset of the 7 features (2^7 - 1 = 127)!***
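
The `Features: 127/127` counter can be reproduced by summing binomial coefficients over the allowed subset sizes. The short sketch below does that; `n_subsets` is a hypothetical helper name, and the feature counts are taken from the cell above (7 features, sizes 1 to 7).

```python
from math import comb


def n_subsets(n_features, min_features, max_features):
    """Number of feature subsets with size in [min_features, max_features]."""
    return sum(comb(n_features, k) for k in range(min_features, max_features + 1))


total = n_subsets(7, 1, 7)     # setting used in this notebook -> 127
capped = n_subsets(7, 1, 5)    # same data with max_features = 5 -> 119
larger = n_subsets(20, 1, 20)  # 20 features, no cap -> 1048575
print(total, capped, larger)
```

With `cv = 3`, each of the 127 subsets is scored on 3 folds, so the run above amounts to roughly 381 model fits. The 20-feature figure shows how quickly the search becomes impractical without a cap on `max_features`.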

In [14]:
# Printing the Names of the Selected Features
Feature_Selector.best_feature_names_

('Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare')

***6 of the 7 features are selected for the best performance; only `Embarked` is left out!***

In [15]:
# Printing the Index of the Selected Features
Feature_Selector.best_idx_

(0, 1, 2, 3, 4, 5)
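
The indices refer to column positions in `X`. As a small sanity check, the sketch below maps them back to names using the column order shown by `X.head()` and lists whatever was left out. The column list and index tuple are copied by hand from the outputs above rather than read from the fitted selector.

```python
# Column order of X, as shown by X.head() above.
columns = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]

# Value of Feature_Selector.best_idx_, copied from the output above.
best_idx = (0, 1, 2, 3, 4, 5)

chosen = set(best_idx)
selected = [columns[i] for i in best_idx]
dropped = [name for i, name in enumerate(columns) if i not in chosen]

print("selected:", selected)
print("dropped:", dropped)
```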

In [16]:
# Best ROC-AUC score (cross-validation average for the selected subset)
Feature_Selector.best_score_

0.8470842527761446

In [17]:
# Keeping only the selected features (the result is a NumPy array)
X = Feature_Selector.transform(X)

In [18]:
X

array([[ 3.        ,  1.        , -0.59049493,  1.        ,  0.        ,
        -0.50023975],
       [ 1.        ,  0.        ,  0.64397101,  1.        ,  0.        ,
         0.78894661],
       [ 3.        ,  0.        , -0.28187844,  0.        ,  0.        ,
        -0.48664993],
       ...,
       [ 3.        ,  0.        ,  0.00352373,  1.        ,  2.        ,
        -0.17408416],
       [ 1.        ,  1.        , -0.28187844,  0.        ,  0.        ,
        -0.0422126 ],
       [ 3.        ,  1.        ,  0.18104628,  0.        ,  0.        ,
        -0.49017322]])