# Exhaustive Feature Selection 

In exhaustive feature selection, the best subset of features is chosen from all possible feature subsets by optimizing a specified performance metric for a given machine learning algorithm. For example, if the classifier is a logistic regression and the dataset has 4 features, the algorithm evaluates all 15 feature combinations:
<br><br>
all possible combinations of 1 feature (4 subsets)<br>
all possible combinations of 2 features (6 subsets)<br>
all possible combinations of 3 features (4 subsets)<br>
all 4 features together (1 subset)<br>
It then selects the combination that gives the best performance (e.g., classification accuracy) of the logistic regression classifier.
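
The count of 15 can be checked by enumerating the subsets directly. The following is a minimal sketch using only the standard library; the feature names `f0` to `f3` are placeholders chosen for illustration and do not come from any dataset.

```python
from itertools import combinations

# Placeholder feature names, used only for illustration
features = ["f0", "f1", "f2", "f3"]

# Build every non-empty subset, from size 1 up to all features
subsets = [
    combo
    for size in range(1, len(features) + 1)
    for combo in combinations(features, size)
]

# Report how many subsets exist for each size
for size in range(1, len(features) + 1):
    count = sum(1 for s in subsets if len(s) == size)
    print(f"subsets of size {size}: {count}")

print("total:", len(subsets))
```

An exhaustive selector would train and score the model once for each of these subsets and keep the best one.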
<br><br>
This is a brute-force search rather than a greedy one, because it evaluates every possible feature combination. It is computationally expensive and, when the feature space is large, can become infeasible.
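
To make the cost concrete: a set of n features has 2^n - 1 non-empty subsets, and with k-fold cross-validation each subset requires k model fits. The sketch below only does this arithmetic; the helper names `n_subsets` and `n_model_fits` are hypothetical and are not part of any library.

```python
def n_subsets(n_features: int) -> int:
    """Number of non-empty subsets of n_features features."""
    return 2 ** n_features - 1


def n_model_fits(n_features: int, cv_folds: int) -> int:
    """Model fits needed when each subset is scored with k-fold CV."""
    return n_subsets(n_features) * cv_folds


# Show how quickly the search space grows with the number of features
for n in (4, 10, 20, 30):
    print(f"{n:>2} features -> {n_subsets(n):,} subsets, "
          f"{n_model_fits(n, cv_folds=5):,} fits with 5-fold CV")
```

Each additional feature roughly doubles the work, which is why exhaustive search is usually reserved for small feature sets.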
<br><br>
The Python package mlxtend implements this type of feature selection.
<br>
<br>
In the mlxtend implementation of exhaustive feature selection, the search is bounded by a minimum and a maximum number of features that we set ourselves. Every subset whose size falls within that range is evaluated, and the best-scoring one is returned.
<br><br>
These bounds are somewhat arbitrary: a range that is too narrow may exclude the best subset size, while a range that is too wide makes the search unnecessarily expensive.
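
The example in this section passes `min_features=1` and `max_features=4`, and its progress output reports 15 evaluated subsets. The following is a minimal sketch of how such bounds change the number of candidates, assuming every subset whose size lies between the two bounds (inclusive) is evaluated. The helper `n_candidates` is hypothetical and not part of mlxtend.

```python
from math import comb


def n_candidates(n_features: int, min_features: int, max_features: int) -> int:
    """Count the subsets whose size is between min_features and max_features."""
    return sum(comb(n_features, k) for k in range(min_features, max_features + 1))


print(n_candidates(4, 1, 4))    # all sizes for 4 features: 4 + 6 + 4 + 1 = 15
print(n_candidates(4, 2, 3))    # only sizes 2 and 3: 6 + 4 = 10
print(n_candidates(20, 1, 3))   # 20 features, at most 3 at a time: 1350
```

Capping the maximum subset size is the main lever for keeping the search tractable when there are many features.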

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

# Load the iris dataset (4 features, 3 classes)
iris = load_iris()
X = iris.data
y = iris.target

# Classifier whose cross-validated accuracy is optimized
knn = KNeighborsClassifier(n_neighbors=2)

# Evaluate every subset of 1 to 4 features with 5-fold cross-validation
efs1 = EFS(knn,
           min_features=1,
           max_features=4,
           scoring='accuracy',
           print_progress=True,
           cv=5)

efs1 = efs1.fit(X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset (indices):', efs1.best_idx_)
print('Best subset (corresponding names):', efs1.best_feature_names_)
```

```
Features: 15/15

Best accuracy score: 0.97
Best subset (indices): (1, 2, 3)
Best subset (corresponding names): ('1', '2', '3')
```


```python
import pandas as pd

# One row per evaluated subset, sorted from best to worst average score
df = pd.DataFrame.from_dict(efs1.get_metric_dict()).T
df.sort_values('avg_score', inplace=True, ascending=False)
df
```

| | feature_idx | cv_scores | avg_score | feature_names | ci_bound | std_dev | std_err |
|---|---|---|---|---|---|---|---|
| 13 | (1, 2, 3) | [0.9666666666666667, 0.9666666666666667, 0.966... | 0.966667 | (1, 2, 3) | 1.42696e-16 | 1.11022e-16 | 5.55112e-17 |
| 9 | (2, 3) | [0.9666666666666667, 0.9666666666666667, 0.966... | 0.96 | (2, 3) | 0.0171372 | 0.0133333 | 0.00666667 |
| 11 | (0, 1, 3) | [0.9666666666666667, 0.9666666666666667, 0.933... | 0.96 | (0, 1, 3) | 0.0320608 | 0.0249444 | 0.0124722 |
| 3 | (3,) | [0.9666666666666667, 0.9333333333333333, 0.933... | 0.953333 | (3,) | 0.0342744 | 0.0266667 | 0.0133333 |
| 8 | (1, 3) | [0.9666666666666667, 0.9666666666666667, 0.9, ... | 0.953333 | (1, 3) | 0.0436915 | 0.0339935 | 0.0169967 |
| 14 | (0, 1, 2, 3) | [0.9666666666666667, 0.9333333333333333, 0.933... | 0.946667 | (0, 1, 2, 3) | 0.0436915 | 0.0339935 | 0.0169967 |
| 12 | (0, 2, 3) | [0.9666666666666667, 0.9666666666666667, 0.9, ... | 0.94 | (0, 2, 3) | 0.0419774 | 0.0326599 | 0.0163299 |
| 6 | (0, 3) | [0.9333333333333333, 0.9666666666666667, 0.933... | 0.94 | (0, 3) | 0.0320608 | 0.0249444 | 0.0124722 |
| 5 | (0, 2) | [0.9666666666666667, 0.9666666666666667, 0.933... | 0.933333 | (0, 2) | 0.0469322 | 0.0365148 | 0.0182574 |
| 10 | (0, 1, 2) | [0.9666666666666667, 0.9333333333333333, 0.9, ... | 0.926667 | (0, 1, 2) | 0.0320608 | 0.0249444 | 0.0124722 |


### This is not the complete code; for more examples, see the mlxtend user guide:
<br>
<br>

http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/
