### Hyperparameter tuning 

For the AdaBoost model, we tune the hyperparameters `learning_rate` (weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier) and `n_estimtors` (maximum number of estimators used). 

In [22]:
import numpy as np
import pandas as pd

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

## this is to suppress warnings I was getting in this code. 
import warnings
# Suppress FutureWarning messages
warnings.simplefilter(action='ignore', category=FutureWarning)


In [27]:
param_grid = {"n_estimators": np.arange(50,750,100),
              "learning_rate": [0.01, 0.1, 1]}

In [24]:
survey_df = pd.read_csv('survey_data_train.csv')

In [25]:
#features we want
features = ['S2', 'D4', 'Fan_magnitude']
#removed VL2, 'D6'
targets = ['VL1r1','VL1r2','VL1r4','VL1r5','VL1r7',
           'VL1r10','VL1r11','VL1r12','VL1r13' ,'VL1r14']

In [28]:
Ada = AdaBoostClassifier()
VL_dict = {}
for VL in targets:
    search = GridSearchCV(Ada, param_grid, cv=5).fit(survey_df[features], survey_df['VL1r1'])
    VL_dict[VL] = search.best_params_
    print("The best hyperparameters for", VL, "are ", search.best_params_)

The best hyperparameters for VL1r1 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r2 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r4 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r5 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r7 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r10 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r11 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r12 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r13 are  {'learning_rate': 1, 'n_estimators': 350}
The best hyperparameters for VL1r14 are  {'learning_rate': 1, 'n_estimators': 350}


In [29]:
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score


In [38]:
kfold = KFold(5, shuffle = True, random_state = 5555)

accuracy = {}

for VL in targets:
    accur = np.zeros((5,1))
    print('Working on ', VL)
    for i, (train_index, test_index) in enumerate(kfold.split(survey_df)):
        survey_tt = survey_df.iloc[train_index]
        survey_ho = survey_df.iloc[test_index]
    
        Ada = AdaBoostClassifier(n_estimators = 350)
    

        Ada.fit(survey_tt[features].values, survey_tt[VL].values)
        
        pred = Ada.predict(survey_ho[features].values)
        
        accur[i,0] = accuracy_score(survey_ho[VL].values, pred)
    
    accuracy[VL] = accur.mean()
    

Working on  VL1r1
Working on  VL1r2
Working on  VL1r4
Working on  VL1r5
Working on  VL1r7
Working on  VL1r10
Working on  VL1r11
Working on  VL1r12
Working on  VL1r13
Working on  VL1r14


In [40]:
accuracy

{'VL1r1': 0.6549670113194526,
 'VL1r2': 0.640969298651115,
 'VL1r4': 0.8010618402172893,
 'VL1r5': 0.6485714368912674,
 'VL1r7': 0.5882494341729679,
 'VL1r10': 0.7271075009482015,
 'VL1r11': 0.7648702401552648,
 'VL1r12': 0.917601970803398,
 'VL1r13': 0.7842918844861111,
 'VL1r14': 0.8442515686275935}