## Final model and predictions

### Hyperparameter tuning 

For the AdaBoost model, we tune two hyperparameters: `learning_rate`, the weight applied to each classifier at each boosting iteration (a higher learning rate increases the contribution of each classifier), and `n_estimators`, the maximum number of estimators used.
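To make the two hyperparameters concrete, here is a minimal sketch on synthetic data. The data from `make_classification`, the `random_state` values, and the `results` dictionary are assumptions for illustration only and are not part of the survey analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in with three features (NOT the survey data)
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

results = {}
for lr in [0.01, 0.1, 1]:
    # n_estimators is an upper bound: boosting can stop early on a perfect fit
    model = AdaBoostClassifier(n_estimators=50, learning_rate=lr, random_state=0)
    model.fit(X, y)
    # record how many estimators were actually fitted and the training accuracy
    results[lr] = (len(model.estimators_), model.score(X, y))
    print(lr, results[lr])
```

The two settings interact: a smaller `learning_rate` generally needs more estimators to reach the same fit, which is why they are tuned together.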

In [4]:
import numpy as np
import pandas as pd

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

## suppress the FutureWarning messages raised by this code
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


In [5]:
## the hyperparameter values we want to search over
## (n_estimators takes the values 50, 150, ..., 650)
param_grid = {"n_estimators": np.arange(50, 750, 100),
              "learning_rate": [0.01, 0.1, 1]}
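As a rough check on the cost of this search, the sketch below enumerates the grid with the standard library only. It assumes the 5-fold cross-validation and the 10 target columns used elsewhere in this notebook; the variable names are illustrative.

```python
from itertools import product

# same values as np.arange(50, 750, 100)
n_estimators_values = list(range(50, 750, 100))
learning_rate_values = [0.01, 0.1, 1]

# every (n_estimators, learning_rate) pair the grid search will try
combos = list(product(n_estimators_values, learning_rate_values))

n_folds = 5      # cv=5 in the grid search
n_targets = 10   # number of VL1r columns being predicted

fits_per_target = len(combos) * n_folds
print(n_estimators_values)
print(len(combos), fits_per_target, fits_per_target * n_targets)
```

With the default `refit=True`, `GridSearchCV` also fits one extra model per target on the full training data, so the real count is slightly higher than the cross-validation fits alone.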

In [6]:
#importing the clean survey training data to tune the model
survey_train = pd.read_csv('Data/survey_data_train.csv')

In [7]:
#features we are focusing on for our model
features = ['S2', 'D4', 'Fan_magnitude']

## the outputs we are predicting
targets = ['VL1r1', 'VL1r2', 'VL1r4', 'VL1r5', 'VL1r7',
           'VL1r10', 'VL1r11', 'VL1r12', 'VL1r13', 'VL1r14']

In [None]:
## initialize our model
Ada = AdaBoostClassifier()

## dictionary for our hyperparameters
VL_dict = {}

## for our outputs, we determine the best parameters and store those in a dictionary
## to use later.
for VL in targets:
    print(VL)
    search = GridSearchCV(Ada, param_grid, cv=5).fit(survey_train[features], survey_train[VL])
    VL_dict[VL] = search.best_params_

VL1r1


In [None]:
## viewing our dictionary of best parameters for each VL output
VL_dict
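Besides `best_params_`, a fitted `GridSearchCV` object also exposes the cross-validated score behind each choice. The sketch below shows this on synthetic data with a deliberately small grid so that it runs quickly. The data, `small_grid`, and the `random_state` values are assumptions for illustration, not the settings used above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for one target column (NOT the survey data)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# hypothetical reduced grid, chosen only to keep the example fast
small_grid = {"n_estimators": [10, 20], "learning_rate": [0.1, 1]}

search = GridSearchCV(AdaBoostClassifier(random_state=0), small_grid, cv=5).fit(X, y)

# best combination and its mean cross-validated accuracy
print(search.best_params_)
print(search.best_score_)

# mean cross-validated accuracy for every combination in the grid
mean_scores = search.cv_results_["mean_test_score"]
print(mean_scores)
```

Comparing `best_score_` with the other entries of `mean_test_score` gives a sense of how sensitive the model is to these hyperparameters.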

### Now, we run the final test on the chosen model

Using the best-performing hyperparameters found above, we refit the model on the training data and evaluate it on the test data. We store the accuracy score and the feature importances for each `VL1r` question in separate dictionaries.

In [None]:
from sklearn.metrics import accuracy_score

survey_test = pd.read_csv('Data/survey_data_test.csv')

In [None]:
accuracy = {}
ada_importance = {}

## reminder of our features and outputs
# features = ['S2', 'D4', 'Fan_magnitude']
# targets = ['VL1r1','VL1r2','VL1r4','VL1r5','VL1r7',
#            'VL1r10','VL1r11','VL1r12','VL1r13' ,'VL1r14']

for VL, best_params in VL_dict.items():
    ## initialize the model with the best_params_ found above
    Ada = AdaBoostClassifier(**best_params)

    ## fit the model with the training data
    Ada.fit(survey_train[features].values, survey_train[VL].values)

    ## predict the test data
    pred = Ada.predict(survey_test[features].values)

    ## store the accuracy score for the test VL values and predicted values
    accuracy[VL] = accuracy_score(survey_test[VL].values, pred)

    ## store the feature importance for each VL
    ada_importance[VL] = Ada.feature_importances_

In [None]:
## viewing our final accuracy scores
accuracy

In [None]:
## viewing the feature importance
# features = ['S2', 'D4', 'Fan_magnitude']
ada_importance
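The raw dictionary holds unlabeled arrays, so one possible way to read it is to put the values in a table with the feature names as columns. The sketch below uses made-up importance values in `example_importance` (an assumption for illustration, not results from this notebook); in practice `ada_importance` would take its place.

```python
import numpy as np
import pandas as pd

features = ['S2', 'D4', 'Fan_magnitude']

# hypothetical values standing in for ada_importance (NOT real results)
example_importance = {
    'VL1r1': np.array([0.2, 0.3, 0.5]),
    'VL1r2': np.array([0.6, 0.1, 0.3]),
}

# one row per VL question, one column per feature
importance_table = pd.DataFrame.from_dict(example_importance,
                                          orient='index',
                                          columns=features)
print(importance_table)
```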