### **Wrapper Methods**

- In wrapper methods, we train a model on a subset of features and, based on how that model performs, decide which features to add to or remove from the subset. Feature selection is thereby reduced to a search problem over feature subsets. Because every candidate subset requires training a model, these methods are usually computationally very expensive.

- There are various wrapper methods. Common ones are:
1. Forward Selection
2. Backward Elimination
3. Exhaustive Feature Selection
4. Recursive Feature Elimination
5. Recursive Feature Elimination with Cross Validation
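
The common building block of all these methods is "score one candidate subset by training a model on it". A minimal sketch of that step is shown here, assuming scikit-learn is available; the helper name `score_subset`, the linear model, and the synthetic data are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def score_subset(X, y, columns, cv=3):
    """Mean cross-validated R^2 of a linear model trained on the given columns only."""
    model = LinearRegression()
    return cross_val_score(model, X[:, columns], y, scoring="r2", cv=cv).mean()


# Synthetic data (an assumption for illustration): only columns 0 and 1 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

informative_score = score_subset(X, y, [0, 1])  # subset containing the real signal
noise_score = score_subset(X, y, [2, 3])        # subset containing only noise columns
print(informative_score, noise_score)
```

A wrapper method calls a scoring step like this many times; the individual methods differ only in which subsets they choose to evaluate.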

1.1 **Forward Selection**

- Forward selection is an iterative method that starts with no features in the model. In each iteration, we add the feature that most improves the model, until adding a new variable no longer improves its performance.
- The procedure starts with an empty set of features (the reduced set). The best of the original features is determined and added to the reduced set. At each subsequent iteration, the best of the remaining original features is added to the set.
- Forward feature selection starts by evaluating all features individually and selects the one that generates the best-performing algorithm according to a pre-set evaluation criterion. In the second step, it evaluates all possible combinations of the selected feature with a second feature and selects the pair that produces the best-performing algorithm based on the same pre-set criterion.
- The pre-set criterion can be, for example, roc_auc for classification or R-squared for regression.
- This selection procedure is called greedy because at each step it commits to the feature that gives the best immediate improvement and never revisits earlier choices. It still has to train one model for every remaining candidate feature at every step, so it is computationally expensive and can become infeasible when the feature space is large.
- There is a Python package that implements this type of feature selection: **mlxtend**.
- In the mlxtend implementation of step forward feature selection, the stopping criterion is an arbitrarily set number of features, so the search finishes when the desired number of selected features is reached.
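
The procedure described above can be sketched in plain Python. This is a minimal illustration, not mlxtend's implementation: `forward_selection`, `toy_score`, and the `TOY_GAIN` values are hypothetical, and the scoring function stands in for "train a model on this subset and cross-validate it".

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    k_features: int,
) -> Tuple[List[str], int]:
    """Greedy step forward selection.

    Returns the selected features in the order they were added and the
    number of subsets that were scored.
    """
    selected: List[str] = []
    n_evaluations = 0
    while len(selected) < k_features:
        best_feature, best_score = None, float("-inf")
        for candidate in features:
            if candidate in selected:
                continue
            # Score the current selection extended by one candidate feature.
            score = score_fn(frozenset(selected + [candidate]))
            n_evaluations += 1
            if score > best_score:
                best_feature, best_score = candidate, score
        if best_feature is None:  # no features left to add
            break
        selected.append(best_feature)  # commit to the best candidate; never revisited
    return selected, n_evaluations


# Hypothetical per-feature contributions; "a" and "b" are largely redundant.
TOY_GAIN = {"a": 0.50, "b": 0.30, "c": 0.05, "d": 0.01}


def toy_score(subset: FrozenSet[str]) -> float:
    score = sum(TOY_GAIN[f] for f in subset)
    if {"a", "b"} <= subset:
        score -= 0.28  # penalty: "b" adds little once "a" is present
    return score


features = ["a", "b", "c", "d"]
selected, n_evaluations = forward_selection(features, toy_score, k_features=2)
print(selected, n_evaluations)
```

In this toy setup the search scores 4 subsets in the first step and 3 in the second, 7 in total, whereas an exhaustive search over 4 features would have to score all 2^4 - 1 = 15 non-empty subsets. With a real model, each scored subset corresponds to one full cross-validated training run.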



In [15]:
# step forward feature selection

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

In [16]:
# Load dataset

data = pd.read_csv("datasets\\house-prices-advanced-regression-techniques\\train.csv")
data.shape

(1460, 81)

- In practice, feature selection should be done after data preprocessing, so that all categorical variables are already encoded as numbers and you can assess how predictive they are of the target. Here, for simplicity, we will use only the numerical variables.
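
As a small illustration of the encoding step that is skipped here, the sketch below one-hot encodes a categorical column with pandas so that it could take part in feature selection. The toy frame and its column names (`area`, `street_type`) are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame with one numerical and one categorical column.
toy = pd.DataFrame({
    "area": [8450, 9600, 11250],
    "street_type": ["Pave", "Grvl", "Pave"],
})

# One-hot encode the categorical column; every resulting column is numeric.
encoded = pd.get_dummies(toy, columns=["street_type"], dtype=int)
encoded_columns = list(encoded.columns)
print(encoded_columns)
```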

In [17]:
# select the numerical columns

numerics = ["int16", "int32", "int64", "float16", "float32", "float64"]
numerical_vars = list(data.select_dtypes(include=numerics).columns)
data = data[numerical_vars]
data.shape

(1460, 38)

In [18]:
# separate train and test sets

X_train, X_test, y_train, y_test = train_test_split(data.drop(labels=["SalePrice"], axis = 1),
                                   data["SalePrice"], test_size = 0.3, random_state = 0)

X_train.shape, X_test.shape

((1022, 37), (438, 37))

In [19]:
# Find and remove correlated features

def correlation(dataset, threshold):
    """Return the names of columns whose absolute correlation with an earlier column exceeds threshold."""
    col_corr = set()  # names of the correlated columns to drop
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            # compare absolute values so strong negative correlations also count
            if abs(corr_matrix.iloc[i, j]) > threshold:
                col_corr.add(corr_matrix.columns[i])
    return col_corr

corr_features = correlation(X_train, 0.8)
print("correlated features:", len(corr_features))

correlated features: 3
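
The nested loop above can also be written without explicit indexing. The following is a sketch of an equivalent vectorised version on a toy frame; the function name `correlated_columns` and the toy data are illustrative. As in the loop version, for each correlated pair the later column is flagged and the earlier one is kept.

```python
import numpy as np
import pandas as pd


def correlated_columns(dataset: pd.DataFrame, threshold: float) -> list:
    """Columns whose absolute correlation with an earlier column exceeds threshold."""
    corr = dataset.corr().abs()
    # Keep only entries above the diagonal so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Toy data: "b" is an exact multiple of "a"; "c" is only weakly related to both.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})
to_drop = correlated_columns(toy, 0.8)
print(to_drop)
```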


In [20]:
# remove correlated features

X_train.drop(labels = corr_features, axis = 1, inplace = True)
X_test.drop(labels = corr_features, axis = 1, inplace = True)

X_train.shape, X_test.shape

((1022, 34), (438, 34))

In [21]:
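X_train.fillna(0, inplace = True)  # replace missing values with 0
X_test.fillna(0, inplace = True)   # apply the same fill to the test set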

In [23]:
# step forward feature selection
# (SFS is the alias for mlxtend's SequentialFeatureSelector imported above)

sfs1 = SFS(RandomForestRegressor(),
           k_features=10,    # stop once 10 features are selected
           forward=True,     # add features step by step
           floating=False,   # plain forward selection, no floating variant
           verbose=2,
           scoring="r2",
           cv=3)

sfs1 = sfs1.fit(np.array(X_train), y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.8s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  34 out of  34 | elapsed:   19.1s finished

[2022-01-18 04:16:33] Features: 1/10 -- score: 0.6678308761670556
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  33 out of  33 | elapsed:   20.9s finished

[2022-01-18 04:16:54] Features: 2/10 -- score: 0.722584705090004
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  32 out of  32 | elapsed:   20.8s finished

[2022-01-18 04:17:15] Features: 3/10 -- score: 0.7442189536873473
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   

In [24]:
sfs1.k_feature_idx_

(4, 5, 6, 9, 14, 16, 17, 18, 19, 24)

In [25]:
X_train.columns[list(sfs1.k_feature_idx_)]

Index(['OverallQual', 'OverallCond', 'YearBuilt', 'BsmtFinSF1', '2ndFlrSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'GarageCars'],
      dtype='object')
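
A natural next step after selecting features is to retrain on the selected columns only and score the model on the held-out set with `r2_score`. The sketch below shows that workflow end to end under stated assumptions: it uses scikit-learn's own `SequentialFeatureSelector` and a linear model on synthetic data instead of mlxtend and the house-price data, so its numbers are not comparable to the run above.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): only columns 0, 1 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Forward selection on the training set only, stopping at 3 features.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,
    direction="forward",
    scoring="r2",
    cv=3,
)
selector.fit(X_tr, y_tr)
selected_idx = selector.get_support(indices=True)

# Retrain on the selected columns and evaluate on the untouched test set.
model = LinearRegression().fit(X_tr[:, selected_idx], y_tr)
test_r2 = r2_score(y_te, model.predict(X_te[:, selected_idx]))
print(selected_idx, test_r2)
```

Running the selection on the training split and scoring on the test split keeps the reported R-squared free of the optimism that comes from choosing features on the same data used for evaluation.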