# Forward feature selection

This function selects the best features for a machine learning algorithm using forward selection.<br>
One thing I find inconvenient is that, with one-hot encoded features for instance, the usual selection procedure tests each newly created column (one per label) separately. But the information about the original variable is carried by the entire vector of ones and zeros. With this function it is possible to test one-hot encoded and cyclically encoded features as a whole instead of column by column.
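
To make the idea concrete, here is a minimal sketch of greedy forward selection over groups of columns. It is not the code of the `forward_feature_selection` module: the name `grouped_forward_selection`, the `score_fn` callable, and the rule of stopping as soon as no group improves the score are all assumptions made for illustration.

```python
def grouped_forward_selection(candidates, score_fn, steps=None, greater_is_better=True):
    """Greedy forward selection where each candidate is a list of columns.

    candidates: list of lists of column names; a single feature is a one-item list.
    score_fn:   hypothetical callable that fits and scores a model on the given
                columns and returns a float.
    """
    remaining = [list(group) for group in candidates]
    selected, best_score = [], None
    max_rounds = len(remaining) if steps is None else min(steps, len(remaining))
    pick = max if greater_is_better else min

    for _ in range(max_rounds):
        # Score the current selection extended by each remaining group as a whole.
        scored = [(score_fn(selected + group), group) for group in remaining]
        score, group = pick(scored, key=lambda pair: pair[0])

        # Assumed stopping rule: quit when the best group no longer improves the score.
        if best_score is not None:
            improved = score > best_score if greater_is_better else score < best_score
            if not improved:
                break

        selected += group
        remaining.remove(group)
        best_score = score

    return selected, best_score


# Toy scorer with made-up contributions per column, only to exercise the loop.
gains = {"humidity": 0.5, "wind_N": 0.02, "wind_S": 0.03, "noise": -0.01}

def toy_score(columns):
    return sum(gains[column] for column in columns)

selected, score = grouped_forward_selection(
    [["humidity"], ["wind_N", "wind_S"], ["noise"]], toy_score
)
```

In the toy run the two `wind_*` columns enter or stay out together, which is the behaviour wanted for one-hot and sine/cosine columns. For a loss such as log loss, `greater_is_better=False` makes the loop pick the lowest score instead.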

[Data used](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package).

Modules:<br>
[forward_feature_selection](https://github.com/abreukuse/ml_utilities/blob/master/forward_feature_selection.py)

[feature_engineering_time_series](https://github.com/abreukuse/ml_utilities/blob/master/feature_engineering_time_series.py)

In [None]:
# Feature Engine is a very nice, easy to use and well documented library that helps a lot in the task of building 
# pipelines for data preprocessing: https://feature-engine.readthedocs.io/en/latest/quickstart.html
!pip install feature-engine

In [2]:
import pandas as pd
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer
from feature_engine.encoding import OneHotEncoder
from feature_engine.wrappers import SklearnTransformerWrapper
from sklearn.linear_model import LogisticRegression

pd.set_option('display.max_columns', 3000)

from forward_feature_selection import forward_feature_selection
from feature_engineering_time_series import seasonal_features # create cyclical features

import os
os.environ['KAGGLE_USERNAME'] = 'kaggle_username'
os.environ['KAGGLE_KEY'] = 'kaggle_api_key'

In [3]:
!kaggle datasets download -d jsphyg/weather-dataset-rattle-package
!unzip 'weather-dataset-rattle-package.zip'

weather-dataset-rattle-package.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  weather-dataset-rattle-package.zip
replace weatherAUS.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: weatherAUS.csv          


In [4]:
data = pd.read_csv('weatherAUS.csv')
data.head()

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,WindDir3pm,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,WNW,20.0,24.0,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,WSW,4.0,22.0,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,WSW,19.0,26.0,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,E,11.0,9.0,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,NW,7.0,20.0,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No


In [5]:
def select(X, variables):
    """Select just a few variables for the demonstration."""
    X = X[variables].copy()
    return X

def drop(X, variables):
    """Drop columns that are no longer needed (here, the raw date)."""
    X = X.drop(columns=variables)
    return X

In [6]:
pipeline = make_pipeline(

    FunctionTransformer(select, 
                        kw_args={
                                 'variables': [
                                               'Date',
                                               'Humidity3pm',
                                               'Pressure3pm',
                                               'WindGustDir',
                                               'WindDir9am',
                                               'WindDir3pm'
                                               ]
                                 }
                        ),

    SklearnTransformerWrapper(transformer=SimpleImputer(strategy='most_frequent'),
                              variables=[
                                         'Humidity3pm',
                                         'Pressure3pm',
                                         'WindGustDir',
                                         'WindDir9am',
                                         'WindDir3pm'
                                         ]
                              ),

    FunctionTransformer(seasonal_features, 
                        kw_args={
                                 'date_column': 'Date',
                                 'which_ones': [
                                                'day',
                                                'week',
                                                'month',
                                                'year',
                                                'dayofyear'
                                                ],
                                 'cyclical': True,
                                 'copy': True
                                }
                        ),
    
    OneHotEncoder(variables=[
                             'WindGustDir',
                             'WindDir9am',
                             'WindDir3pm'
                             ], 
                  drop_last=True
                  ),

    FunctionTransformer(drop, 
                        kw_args={
                                 'variables': ['Date']
                                 }
                        )                                        
                         
)
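
As a rough illustration of what the cyclical option produces, the sketch below maps a periodic value onto the unit circle as a cosine/sine pair. The helper `cyclical_encode` and the periods (31 for day of month, 12 for month) are assumptions, not taken from the `feature_engineering_time_series` module; they were chosen because they give values close to those the notebook shows for 2008-12-01.

```python
import math

def cyclical_encode(value, period):
    """Return (cos, sin) of the angle that `value` occupies within `period`."""
    angle = 2 * math.pi * value / period
    return math.cos(angle), math.sin(angle)

# Day 1 of the month with an assumed period of 31.
day_cos, day_sin = cyclical_encode(1, 31)

# Month 12 with an assumed period of 12: a full turn, back at the start of the circle.
month_cos, month_sin = cyclical_encode(12, 12)

# The cosine alone cannot tell day 1 from day 30; only the pair is unambiguous.
cos_day_30, sin_day_30 = cyclical_encode(30, 31)
```

Because one half of the pair is ambiguous on its own, the `_cos` and `_sin` columns of a date part only make sense when they are evaluated together.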

In [7]:
# The target has missing values; keep only the rows where it is present
data = data.loc[data['RainTomorrow'].notnull(), :].copy()

In [8]:
data['Date'] = pd.to_datetime(data['Date'])

In [9]:
# Simple split
train = data.query('Date < "2015-01-01"').copy()
validation = data.query('Date >= "2015-01-01"').copy()

In [10]:
X_train = pipeline.fit_transform(train)
X_validation = pipeline.transform(validation)

In [11]:
X_train.head()

Unnamed: 0,Humidity3pm,Pressure3pm,Date_day_cos,Date_day_sin,Date_week_cos,Date_week_sin,Date_month_cos,Date_month_sin,Date_year_cos,Date_year_sin,Date_dayofyear_cos,Date_dayofyear_sin,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindGustDir_NE,WindGustDir_NNW,WindGustDir_N,WindGustDir_NNE,WindGustDir_SW,WindGustDir_ENE,WindGustDir_SSE,WindGustDir_S,WindGustDir_NW,WindGustDir_SE,WindGustDir_ESE,WindGustDir_E,WindDir9am_W,WindDir9am_NNW,WindDir9am_SE,WindDir9am_ENE,WindDir9am_SW,WindDir9am_SSE,WindDir9am_S,WindDir9am_NE,WindDir9am_N,WindDir9am_SSW,WindDir9am_WSW,WindDir9am_ESE,WindDir9am_E,WindDir9am_NW,WindDir9am_WNW,WindDir3pm_WNW,WindDir3pm_WSW,WindDir3pm_E,WindDir3pm_NW,WindDir3pm_W,WindDir3pm_SSE,WindDir3pm_ESE,WindDir3pm_ENE,WindDir3pm_NNW,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_SE,WindDir3pm_N,WindDir3pm_S,WindDir3pm_NNE
0,22.0,1007.1,0.97953,0.201299,0.889657,-0.456629,1.0,-2.449294e-16,0.999825,-0.018717,0.870285,-0.492548,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,25.0,1007.8,0.918958,0.394356,0.889657,-0.456629,1.0,-2.449294e-16,0.999825,-0.018717,0.878612,-0.477536,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,30.0,1008.7,0.820763,0.571268,0.889657,-0.456629,1.0,-2.449294e-16,0.999825,-0.018717,0.88668,-0.462383,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
3,16.0,1012.8,0.688967,0.724793,0.889657,-0.456629,1.0,-2.449294e-16,0.999825,-0.018717,0.894487,-0.447094,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,33.0,1006.0,0.528964,0.848644,0.889657,-0.456629,1.0,-2.449294e-16,0.999825,-0.018717,0.90203,-0.431673,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [12]:
X_validation.head()

Unnamed: 0,Humidity3pm,Pressure3pm,Date_day_cos,Date_day_sin,Date_week_cos,Date_week_sin,Date_month_cos,Date_month_sin,Date_year_cos,Date_year_sin,Date_dayofyear_cos,Date_dayofyear_sin,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindGustDir_NE,WindGustDir_NNW,WindGustDir_N,WindGustDir_NNE,WindGustDir_SW,WindGustDir_ENE,WindGustDir_SSE,WindGustDir_S,WindGustDir_NW,WindGustDir_SE,WindGustDir_ESE,WindGustDir_E,WindDir9am_W,WindDir9am_NNW,WindDir9am_SE,WindDir9am_ENE,WindDir9am_SW,WindDir9am_SSE,WindDir9am_S,WindDir9am_NE,WindDir9am_N,WindDir9am_SSW,WindDir9am_WSW,WindDir9am_ESE,WindDir9am_E,WindDir9am_NW,WindDir9am_WNW,WindDir3pm_WNW,WindDir3pm_WSW,WindDir3pm_E,WindDir3pm_NW,WindDir3pm_W,WindDir3pm_SSE,WindDir3pm_ESE,WindDir3pm_ENE,WindDir3pm_NNW,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_SE,WindDir3pm_N,WindDir3pm_S,WindDir3pm_NNE
2133,14.0,1011.0,0.97953,0.201299,0.992981,0.118273,0.866025,0.5,0.999981,-0.00623,0.999853,0.017166,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2134,12.0,1012.4,0.918958,0.394356,0.992981,0.118273,0.866025,0.5,0.999981,-0.00623,0.999411,0.034328,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2135,19.0,1012.3,0.820763,0.571268,0.992981,0.118273,0.866025,0.5,0.999981,-0.00623,0.998674,0.051479,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
2136,37.0,1012.1,0.688967,0.724793,0.992981,0.118273,0.866025,0.5,0.999981,-0.00623,0.997643,0.068615,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2137,34.0,1014.7,0.528964,0.848644,0.972023,0.234886,0.866025,0.5,0.999981,-0.00623,0.996318,0.085731,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [13]:
# Encode the target as 0/1 (rows with a missing target were removed above)
y_train = train['RainTomorrow'].map({'No': 0, 'Yes': 1})
y_validation = validation['RainTomorrow'].map({'No': 0, 'Yes': 1})

In [14]:
X_train.shape, y_train.shape

((98988, 57), (98988,))

In [15]:
X_validation.shape, y_validation.shape

((43205, 57), (43205,))

In [16]:
logistic_regression = LogisticRegression(class_weight='balanced', random_state=50, max_iter=1000)

In [17]:
# A better approach would be to reduce the cardinality of the categorical variables, 
# but here I just want to showcase the functionality of the module.

together = [
            # Date_day
            ['Date_day_cos', 
             'Date_day_sin'],
            
            # Date_week
            ['Date_week_cos', 
             'Date_week_sin'],
            
            # Date_month
            ['Date_month_cos', 
             'Date_month_sin'],
            
            # Date_year
            ['Date_year_cos', 
             'Date_year_sin'],
            
            # Date_dayofyear
            ['Date_dayofyear_cos', 
             'Date_dayofyear_sin'],
            
            # WindGustDir
            ['WindGustDir_W',
            'WindGustDir_WNW',
            'WindGustDir_WSW',
            'WindGustDir_NE',
            'WindGustDir_NNW',
            'WindGustDir_N',
            'WindGustDir_NNE',
            'WindGustDir_SW',
            'WindGustDir_ENE',
            'WindGustDir_SSE',
            'WindGustDir_S',
            'WindGustDir_NW',
            'WindGustDir_SE',
            'WindGustDir_ESE',
            'WindGustDir_E'],

            # WindDir9am
            ['WindDir9am_W',
             'WindDir9am_NNW',
             'WindDir9am_SE',
             'WindDir9am_ENE',
             'WindDir9am_SW',
             'WindDir9am_SSE',
             'WindDir9am_S',
             'WindDir9am_NE',
             'WindDir9am_N',
             'WindDir9am_SSW',
             'WindDir9am_WSW',
             'WindDir9am_ESE',
             'WindDir9am_E',
             'WindDir9am_NW',
             'WindDir9am_WNW'],
            
            # WindDir3pm
            ['WindDir3pm_WNW',
             'WindDir3pm_WSW',
             'WindDir3pm_E',
             'WindDir3pm_NW',
             'WindDir3pm_W',
             'WindDir3pm_SSE',
             'WindDir3pm_ESE',
             'WindDir3pm_ENE',
             'WindDir3pm_NNW',
             'WindDir3pm_SSW',
             'WindDir3pm_SW',
             'WindDir3pm_SE',
             'WindDir3pm_N',
             'WindDir3pm_S',
             'WindDir3pm_NNE']
            ]
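
Writing these lists by hand gets tedious as the number of encoded variables grows. A possible shortcut, sketched below, is to collect the columns by their prefix. The helper `group_by_prefix` is hypothetical and not part of the module; it assumes the encoded columns are named `<variable>_<suffix>`, as they are in this notebook.

```python
def group_by_prefix(columns, prefixes):
    """Collect columns into one list per prefix, keeping the original column order."""
    groups = []
    for prefix in prefixes:
        # The trailing underscore keeps 'Date_day' from also matching 'Date_dayofyear_*'.
        members = [column for column in columns if column.startswith(prefix + "_")]
        if members:
            groups.append(members)
    return groups

# A shortened column list, just for the demonstration.
columns = [
    "Humidity3pm",
    "Date_day_cos", "Date_day_sin",
    "Date_dayofyear_cos", "Date_dayofyear_sin",
    "WindDir9am_W", "WindDir9am_NNW",
]
groups = group_by_prefix(columns, ["Date_day", "Date_dayofyear", "WindDir9am"])
```

With the real data, something like `group_by_prefix(list(X_train.columns), [...])` with the eight variable prefixes should give the same groups as the hand-written list, while columns such as `Humidity3pm` that match no prefix stay ungrouped.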

In [18]:
variables_selected = forward_feature_selection(X_train = X_train, 
                                               X_test = X_validation,
                                               y_train = y_train,
                                               y_test = y_validation,
                                               model = logistic_regression,
                                               task_type = 'classification',
                                               metric = 'f1_score', 
                                               probability = False,
                                               analyse_together = together,
                                               steps = 20,
                                               greater_is_better = True,
                                               pos_label = 1,
                                               average = 'binary')

Round: 1
Variable selected: ['Humidity3pm']
Score: 0.5146134935809888

Round: 2
Variable selected: ['WindDir3pm_WNW', 'WindDir3pm_WSW', 'WindDir3pm_E', 'WindDir3pm_NW', 'WindDir3pm_W', 'WindDir3pm_SSE', 'WindDir3pm_ESE', 'WindDir3pm_ENE', 'WindDir3pm_NNW', 'WindDir3pm_SSW', 'WindDir3pm_SW', 'WindDir3pm_SE', 'WindDir3pm_N', 'WindDir3pm_S', 'WindDir3pm_NNE']
Score: 0.533439744612929

Round: 3
Variable selected: ['WindGustDir_W', 'WindGustDir_WNW', 'WindGustDir_WSW', 'WindGustDir_NE', 'WindGustDir_NNW', 'WindGustDir_N', 'WindGustDir_NNE', 'WindGustDir_SW', 'WindGustDir_ENE', 'WindGustDir_SSE', 'WindGustDir_S', 'WindGustDir_NW', 'WindGustDir_SE', 'WindGustDir_ESE', 'WindGustDir_E']
Score: 0.5362679733311911

Round: 4
Variable selected: ['Pressure3pm']
Score: 0.5381198513971895

Round: 5
Variable selected: ['Date_week_cos', 'Date_week_sin']
Score: 0.5399910543650632

Round: 6
Variable selected: ['WindDir9am_W', 'WindDir9am_NNW', 'WindDir9am_SE', 'WindDir9am_ENE', 'WindDir9am_SW', 'WindDir9a

In [19]:
X_train_final = X_train[variables_selected]
X_validation_final = X_validation[variables_selected]

In [20]:
X_train_final.head()

Unnamed: 0,Humidity3pm,WindDir3pm_WNW,WindDir3pm_WSW,WindDir3pm_E,WindDir3pm_NW,WindDir3pm_W,WindDir3pm_SSE,WindDir3pm_ESE,WindDir3pm_ENE,WindDir3pm_NNW,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_SE,WindDir3pm_N,WindDir3pm_S,WindDir3pm_NNE,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindGustDir_NE,WindGustDir_NNW,WindGustDir_N,WindGustDir_NNE,WindGustDir_SW,WindGustDir_ENE,WindGustDir_SSE,WindGustDir_S,WindGustDir_NW,WindGustDir_SE,WindGustDir_ESE,WindGustDir_E,Pressure3pm,Date_week_cos,Date_week_sin,WindDir9am_W,WindDir9am_NNW,WindDir9am_SE,WindDir9am_ENE,WindDir9am_SW,WindDir9am_SSE,WindDir9am_S,WindDir9am_NE,WindDir9am_N,WindDir9am_SSW,WindDir9am_WSW,WindDir9am_ESE,WindDir9am_E,WindDir9am_NW,WindDir9am_WNW,Date_dayofyear_cos,Date_dayofyear_sin
0,22.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1007.1,0.889657,-0.456629,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.870285,-0.492548
1,25.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1007.8,0.889657,-0.456629,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.878612,-0.477536
2,30.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1008.7,0.889657,-0.456629,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.88668,-0.462383
3,16.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1012.8,0.889657,-0.456629,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0.894487,-0.447094
4,33.0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1006.0,0.889657,-0.456629,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0.90203,-0.431673


In [21]:
X_validation_final.head()

Unnamed: 0,Humidity3pm,WindDir3pm_WNW,WindDir3pm_WSW,WindDir3pm_E,WindDir3pm_NW,WindDir3pm_W,WindDir3pm_SSE,WindDir3pm_ESE,WindDir3pm_ENE,WindDir3pm_NNW,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_SE,WindDir3pm_N,WindDir3pm_S,WindDir3pm_NNE,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindGustDir_NE,WindGustDir_NNW,WindGustDir_N,WindGustDir_NNE,WindGustDir_SW,WindGustDir_ENE,WindGustDir_SSE,WindGustDir_S,WindGustDir_NW,WindGustDir_SE,WindGustDir_ESE,WindGustDir_E,Pressure3pm,Date_week_cos,Date_week_sin,WindDir9am_W,WindDir9am_NNW,WindDir9am_SE,WindDir9am_ENE,WindDir9am_SW,WindDir9am_SSE,WindDir9am_S,WindDir9am_NE,WindDir9am_N,WindDir9am_SSW,WindDir9am_WSW,WindDir9am_ESE,WindDir9am_E,WindDir9am_NW,WindDir9am_WNW,Date_dayofyear_cos,Date_dayofyear_sin
2133,14.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1011.0,0.992981,0.118273,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.999853,0.017166
2134,12.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1012.4,0.992981,0.118273,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.999411,0.034328
2135,19.0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1012.3,0.992981,0.118273,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0.998674,0.051479
2136,37.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1012.1,0.992981,0.118273,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.997643,0.068615
2137,34.0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1014.7,0.972023,0.234886,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0.996318,0.085731


In [22]:
X_train.shape, X_train_final.shape

((98988, 57), (98988, 51))

In [23]:
X_validation.shape, X_validation_final.shape

((43205, 57), (43205, 51))

In [24]:
# Testing with roc_auc_score
variables_selected = forward_feature_selection(X_train = X_train, 
                                               X_test = X_validation,
                                               y_train = y_train,
                                               y_test = y_validation,
                                               model = logistic_regression,
                                               task_type = 'classification',
                                               metric = 'roc_auc_score', 
                                               probability = True,
                                               analyse_together = together,
                                               steps = 20,
                                               greater_is_better = True)

Round: 1
Variable selected: ['Humidity3pm']
Score: 0.7813134134870414

Round: 2
Variable selected: ['WindDir3pm_WNW', 'WindDir3pm_WSW', 'WindDir3pm_E', 'WindDir3pm_NW', 'WindDir3pm_W', 'WindDir3pm_SSE', 'WindDir3pm_ESE', 'WindDir3pm_ENE', 'WindDir3pm_NNW', 'WindDir3pm_SSW', 'WindDir3pm_SW', 'WindDir3pm_SE', 'WindDir3pm_N', 'WindDir3pm_S', 'WindDir3pm_NNE']
Score: 0.7924537322055991

Round: 3
Variable selected: ['WindDir9am_W', 'WindDir9am_NNW', 'WindDir9am_SE', 'WindDir9am_ENE', 'WindDir9am_SW', 'WindDir9am_SSE', 'WindDir9am_S', 'WindDir9am_NE', 'WindDir9am_N', 'WindDir9am_SSW', 'WindDir9am_WSW', 'WindDir9am_ESE', 'WindDir9am_E', 'WindDir9am_NW', 'WindDir9am_WNW']
Score: 0.7945944832391705

Round: 4
Variable selected: ['Date_month_cos', 'Date_month_sin']
Score: 0.7965359871842287

Round: 5
Variable selected: ['WindGustDir_W', 'WindGustDir_WNW', 'WindGustDir_WSW', 'WindGustDir_NE', 'WindGustDir_NNW', 'WindGustDir_N', 'WindGustDir_NNE', 'WindGustDir_SW', 'WindGustDir_ENE', 'WindGustDir_S

In [25]:
# Testing with log_loss
variables_selected = forward_feature_selection(X_train = X_train, 
                                               X_test = X_validation,
                                               y_train = y_train,
                                               y_test = y_validation,
                                               model = logistic_regression,
                                               task_type = 'classification',
                                               metric = 'log_loss', 
                                               probability = True,
                                               analyse_together = together,
                                               steps = 20,
                                               greater_is_better = False)

Round: 1
Variable selected: ['Humidity3pm']
Score: 0.5488570177173386

Round: 2
Variable selected: ['WindDir3pm_WNW', 'WindDir3pm_WSW', 'WindDir3pm_E', 'WindDir3pm_NW', 'WindDir3pm_W', 'WindDir3pm_SSE', 'WindDir3pm_ESE', 'WindDir3pm_ENE', 'WindDir3pm_NNW', 'WindDir3pm_SSW', 'WindDir3pm_SW', 'WindDir3pm_SE', 'WindDir3pm_N', 'WindDir3pm_S', 'WindDir3pm_NNE']
Score: 0.5371464626294861

Round: 3
Variable selected: ['Date_week_cos', 'Date_week_sin']
Score: 0.5335247944233654

Round: 4
Variable selected: ['Pressure3pm']
Score: 0.5306889858396708

Round: 5
Variable selected: ['WindGustDir_W', 'WindGustDir_WNW', 'WindGustDir_WSW', 'WindGustDir_NE', 'WindGustDir_NNW', 'WindGustDir_N', 'WindGustDir_NNE', 'WindGustDir_SW', 'WindGustDir_ENE', 'WindGustDir_SSE', 'WindGustDir_S', 'WindGustDir_NW', 'WindGustDir_SE', 'WindGustDir_ESE', 'WindGustDir_E']
Score: 0.528158174096713

Round: 6
Variable selected: ['Date_dayofyear_cos', 'Date_dayofyear_sin']
Score: 0.5275556500411902

Round: 7
Variable selecte