In forward selection, variables are progressively incorporated into larger and larger subsets. The
algorithm starts by training every possible single-variable machine learning model and selects
the feature that returns the best-performing classifier or regression model. In the second step, it
trains a model for each combination of the selected feature with one of the remaining variables
in the data, and selects the pair of features that produces the best-performing model. It continues
in this way, adding 1 feature at a time to the feature subset from the previous step, until a
stopping criterion is met.
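
The procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by Scikit-learn or MLXtend: the helper name `forward_select`, the toy dataset, and the choice of linear regression are all assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def forward_select(estimator, X, y, n_features, cv=3, scoring="r2"):
    """Greedy step forward selection (hypothetical helper).

    Returns the selected column indices, in the order they were added,
    and the best cross-validated score obtained at each step.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    history = []
    while len(selected) < n_features:
        # Evaluate the current subset plus each remaining feature, one at a time.
        scores = {
            j: cross_val_score(
                estimator, X[:, selected + [j]], y, cv=cv, scoring=scoring
            ).mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history


# Toy data, assumed for illustration: 8 features, 3 of them informative.
X_toy, y_toy = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
)
selected, history = forward_select(LinearRegression(), X_toy, y_toy, n_features=3)
print(selected)
print([round(score, 3) for score in history])
```

Because each step only appends to `selected`, the subsets are nested by construction: the subset of size 2 always contains the feature picked in step 1.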

The feature subsets are nested because they include the feature or features from the previous steps.
By progressively evaluating promising features, forward feature selection is more efficient than
exhaustive search.
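
To make the efficiency gain concrete, the following sketch counts how many candidate feature subsets each strategy would evaluate with 80 candidate features and a target subset size of 15, the values used in the notebook cells of this section. Each candidate subset still has to be multiplied by the number of cross-validation folds to obtain the number of model fits.

```python
from math import comb

n_features, k = 80, 15

# Step forward search: 80 candidates in step 1, 79 in step 2, ..., 66 in step 15.
forward_subsets = sum(n_features - step for step in range(k))

# Exhaustive search over all subsets of exactly 15 features.
exhaustive_subsets = comb(n_features, k)

print(forward_subsets)  # 1095
print(exhaustive_subsets > 10**15)  # True
```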

Forward feature selection has the advantage that, by starting with smaller feature subsets, it is
generally more computationally efficient than other wrapper methods. However, for this same
reason, it does not take feature interactions into account, or at least not until sufficient features
have been added to the subset.
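
A small synthetic example may help to show why interactions can be missed. In the sketch below, the target is the XOR of two binary features, so neither feature is informative on its own and the first step of a forward search has nothing to guide it. The data, the variable names, and the choice of a decision tree are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_xor = rng.integers(0, 2, size=(2000, 3))  # three binary features
y_xor = X_xor[:, 0] ^ X_xor[:, 1]  # target depends on features 0 and 1 jointly

tree = DecisionTreeClassifier(random_state=0)


def cv_accuracy(columns):
    """Mean 5-fold accuracy of the tree trained on the given columns only."""
    return cross_val_score(
        tree, X_xor[:, columns], y_xor, cv=5, scoring="accuracy"
    ).mean()


# Step 1 of a forward search: every single feature looks close to random guessing.
single_scores = {j: cv_accuracy([j]) for j in range(3)}

# The interacting pair, evaluated together, predicts the target.
pair_score = cv_accuracy([0, 1])

print(single_scores)
print(pair_score)
```

In a case like this, the feature chosen in the first step is essentially arbitrary, and the interaction only becomes visible if one of the two interacting features happens to be in the subset already.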

Forward feature selection needs a criterion to stop the search. The most obvious stopping condition
is when the performance of the classifier or regression model does not improve beyond a certain
threshold. This has the advantage of focusing the search on performance. On the downside, the
threshold for improvement is arbitrarily set by the user. Alternatively, we can stop the search after
a certain number of features have been selected.
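
The threshold-based stopping rule can be sketched as follows. The helper `n_features_to_keep` and the score values are hypothetical and only illustrate how the choice of threshold changes where the search stops.

```python
def n_features_to_keep(step_scores, tol):
    """Return how many features to keep under a threshold stopping rule.

    step_scores[i] is the best cross-validated score obtained with i + 1
    features. The search stops at the first step whose gain over the
    previous step is smaller than `tol`.
    """
    for i in range(1, len(step_scores)):
        if step_scores[i] - step_scores[i - 1] < tol:
            return i
    return len(step_scores)


# Hypothetical R2 values per step, for illustration only.
scores = [0.52, 0.68, 0.74, 0.745, 0.75]

print(n_features_to_keep(scores, tol=0.01))  # 3
print(n_features_to_keep(scores, tol=0.001))  # 5
```

The same score sequence yields 3 or 5 features depending on the threshold, which is why the threshold should be treated as a tuning decision rather than a fixed constant.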

In the coming paragraphs, we will implement forward feature selection with Scikit-learn and
MLXtend. The 2 Python implementations are very similar. Both can stop the search once a given
number of features has been selected. Scikit-learn also offers a threshold on performance
improvement as a stopping criterion.

In [1]:
# data handling and plotting
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.graph_objs as go
import plotly.io as pio
from plotly.subplots import make_subplots
from tabulate import tabulate

# to select the features
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.feature_selection import SelectKBest, SelectPercentile

# preprocessing, data splitting and evaluation
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score, mean_squared_error

# regression models
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

sns.set_theme(style="whitegrid")

In [2]:
data = pd.read_csv('train.csv')
data

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.00
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.00
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.00
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.00
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21258,4,106.957877,53.095769,82.515384,43.135565,1.177145,1.254119,146.88130,15.504479,65.764081,...,3.555556,3.223710,3.519911,1.377820,0.913658,1,2.168889,0.433013,0.496904,2.44
21259,5,92.266740,49.021367,64.812662,32.867748,1.323287,1.571630,188.38390,7.353333,69.232655,...,2.047619,2.168944,2.038991,1.594167,1.337246,1,0.904762,0.400000,0.212959,122.10
21260,2,99.663190,95.609104,99.433882,95.464320,0.690847,0.530198,13.51362,53.041104,6.756810,...,4.800000,4.472136,4.781762,0.686962,0.450561,1,3.200000,0.500000,0.400000,1.98
21261,2,99.663190,97.095602,99.433882,96.901083,0.690847,0.640883,13.51362,31.115202,6.756810,...,4.690000,4.472136,4.665819,0.686962,0.577601,1,2.210000,0.500000,0.462493,1.84


In [3]:
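# remove duplicated rows and the number_of_elements column
data = data.drop_duplicates()
data = data.drop(columns=['number_of_elements'])

# the last column (critical_temp) is the target; all other columns are features
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]

# separate the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0,
)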

X_train.shape, X_test.shape

((14835, 80), (6358, 80))

In [4]:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

In [5]:
# step forward feature selection

sfs = SFS(
    estimator=RandomForestRegressor(n_estimators=5, random_state=0),
    k_features=15,  # the number of features to retain
    forward=True,  # the direction of the search
    verbose=1,  # print out intermediate steps
    scoring='r2',  # the metric used to compare feature subsets
    cv=3,  # the number of cross-validation folds
)

sfs = sfs.fit(X_train, y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  80 out of  80 | elapsed:   18.2s finished
Features: 1/15[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  79 out of  79 | elapsed:   26.6s finished
Features: 2/15[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  78 out of  78 | elapsed:   38.5s finished
Features: 3/15[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  77 out of  77 | elapsed:   50.0s finished
Features: 4/15[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  76 out of  76 | elapsed:   55.0s finished
Features: 5/15[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:  1.1min finished
Features: 6/15[Parallel(

In [6]:
sfs.k_feature_names_

('mean_atomic_mass',
 'wtd_gmean_atomic_mass',
 'range_atomic_mass',
 'wtd_std_atomic_mass',
 'range_fie',
 'entropy_atomic_radius',
 'entropy_Density',
 'range_Density',
 'std_ElectronAffinity',
 'mean_ThermalConductivity',
 'wtd_range_ThermalConductivity',
 'mean_Valence',
 'wtd_mean_Valence',
 'gmean_Valence',
 'entropy_Valence')

In [7]:
# reduce the training and test sets to the selected features
X_train_t = sfs.transform(X_train)
X_test_t = sfs.transform(X_test)

X_test_t

array([[ 76.5177175 ,  65.41137811, 122.90607   , ...,   2.4625    ,
          2.21336384,   1.36892236],
       [ 67.4025    ,  66.40422577,  23.115     , ...,   4.5       ,
          4.24264069,   0.63651417],
       [ 89.33718   ,  69.13330879, 124.90825   , ...,   2.17142857,
          2.49146188,   1.56495725],
       ...,
       [ 90.30468   ,  64.31362755, 128.2426    , ...,   2.465     ,
          2.49146188,   1.56495725],
       [ 63.45766667,  66.73940342,  31.093     , ...,   5.24675325,
          3.63424119,   1.01140426],
       [ 69.17125   ,  35.42963674, 121.3276    , ...,   2.07103394,
          2.16894354,   1.5941667 ]])