# Tuning Models with Cloud Computing

This notebook walks through how I tuned the models I selected during my [Cross Validation Process](Cross_Validation.ipynb).

The following models were tuned: 
   - GradientBoostingClassifier
   - AdaBoostClassifier
   - RandomForestClassifier
   - BaggingClassifier

I followed the same process for each classifier: 
- Start with a wide range of values for each parameter.
- If the value selected as best sits at the high end of its range, add higher options for the next round; if it sits at the low end, add lower options.
- Repeat until model performance stops changing and the gridsearch keeps selecting the same parameters as "best".
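
The refinement rule above can be sketched for a single numeric parameter. This is a minimal illustration, not code from `src.modeling`: `next_candidates` and its `factor` argument are hypothetical names, and the values for each round in this notebook were chosen by hand.

```python
def next_candidates(tried, best, factor=1.5):
    """Suggest values to try next for one numeric hyperparameter.

    tried  -- values searched in the previous round
    best   -- the value the gridsearch selected as best
    factor -- hypothetical multiplier controlling how far to extend the range
    """
    tried = sorted(tried)
    if best == tried[-1]:
        # Best value sat on the high end: keep it and probe higher values.
        return [best, best * factor, best * factor ** 2]
    if best == tried[0]:
        # Best value sat on the low end: keep it and probe lower values.
        return [best / factor ** 2, best / factor, best]
    # Best value was interior: narrow the range around it.
    i = tried.index(best)
    return [(tried[i - 1] + best) / 2, best, (best + tried[i + 1]) / 2]


print(next_candidates([10, 30], 30))      # high end selected
print(next_candidates([10, 20, 40], 20))  # interior value selected
```

Integer-only parameters such as `n_estimators` would need the suggestions rounded before use.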

Model performance is evaluated on precision, with the goal of reducing the number of false positives. 
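
For reference, precision is TP / (TP + FP), so every false positive lowers it directly. Below is a minimal sketch with made-up labels; the `precision` helper is illustrative only, and scikit-learn's `precision_score` computes the same quantity for binary labels.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions at all: report 0.0 rather than dividing by zero.
        return 0.0
    return tp / (tp + fp)


# Made-up labels: 2 true positives and 1 false positive among the predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(precision(y_true, y_pred))
```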

<span style="color: red">**Warning:** </span>This is an extremely heavy notebook to run. I used Google Cloud with 8 CPUs. Running it on a personal computer with, for example, 2 CPUs will take an extremely long time. 
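
Before starting, it may be worth checking how many CPUs the machine exposes. The sketch below uses only the standard library; the threshold of 8 mirrors the machine used here, and `enough_cpus` is a hypothetical helper, not part of `src.modeling`.

```python
import os


def enough_cpus(required=8, available=None):
    """Return True if at least `required` CPUs are available."""
    if available is None:
        # os.cpu_count() can return None when the count is undetermined.
        available = os.cpu_count() or 1
    return available >= required


if not enough_cpus():
    print(f"Only {os.cpu_count()} CPUs detected; the gridsearches below will run slowly.")
```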

## Load Data, Packages, & Functions

In [1]:
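# The wildcard import is expected to supply every name used below that is not
# imported explicitly: the sklearn classifiers, pickle, run_gridsearch, and load_train_test_data.
from src.modeling import *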

%reload_ext autoreload
%autoreload 2

X_train, y_train, X_test, y_test = load_train_test_data()

## Tuning RandomForestClassifier

Vanilla Model (default parameters; `.score` reports mean accuracy):

In [7]:
RF_vanilla = RandomForestClassifier(random_state=6)

RF_vanilla.fit(X_train, y_train)

print(f"""RandomForest Vanilla Model
Train score: {RF_vanilla.score(X_train, y_train)}
      
Test score: {RF_vanilla.score(X_test, y_test)}
""")

RandomForest Vanilla Model
Train score: 0.988841611670719
      
Test score: 0.6935946047642478



### 1st RF Gridsearch

In [23]:
RF_Params_0 = {
    'n_estimators' : [10],
    'criterion' : ['gini' , 'entropy'],
    'max_depth' : [10, 30],
    'min_samples_split' : [2, 10],
    'min_samples_leaf' : [1, 10],
    'max_features' : ['auto', 'sqrt', 'log2'],  # for classifiers, 'auto' behaves the same as 'sqrt'
    'bootstrap' : [True, False], 
    'max_samples' : [None, 100],  # only applies when bootstrap=True
    'random_state' : [6]
}

RF_grid = run_gridsearch(RandomForestClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         RF_Params_0, verbose = 1, num_jobs = -1)

pickle.dump(RF_grid, open('models/RandomForest_Grid_0.p', 'wb'))

Fitting 5 folds for each of 192 candidates, totalling 960 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    5.8s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   27.0s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 784 tasks      | elapsed:  2.8min
[Parallel(n_jobs=-1)]: Done 960 out of 960 | elapsed:  4.0min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.80
---
Test Score: 0.72

Best Parameters:
{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30, 'max_features': 'auto', 'max_samples': None, 'min_samples_leaf': 10, 'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}
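
The candidate count in the log above follows directly from the grid: it is the product of the number of options per parameter, and each candidate is fitted once per fold. A quick check, with the grid restated so the snippet is self-contained (`count_fits` is an illustrative helper):

```python
from math import prod

# Same grid as RF_Params_0 above.
RF_Params_0 = {
    'n_estimators': [10],
    'criterion': ['gini', 'entropy'],
    'max_depth': [10, 30],
    'min_samples_split': [2, 10],
    'min_samples_leaf': [1, 10],
    'max_features': ['auto', 'sqrt', 'log2'],
    'bootstrap': [True, False],
    'max_samples': [None, 100],
    'random_state': [6],
}


def count_fits(param_grid, folds=5):
    """Return (number of candidates, total fits) for an exhaustive gridsearch."""
    candidates = prod(len(options) for options in param_grid.values())
    return candidates, candidates * folds


print(count_fits(RF_Params_0))  # (192, 960)
```

Running this on a planned grid before launching it gives a rough sense of how long the search will take.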



### 2nd RF Gridsearch

In [24]:
RF_Params_1 = {
    'n_estimators' : [5, 10],
    'max_depth' : [30, 50],
    'min_samples_split' : [2, 6],
    'min_samples_leaf' : [10, 20],
    'max_samples' : [None, 300],  # only applies when bootstrap=True, which this grid never sets
    
    'max_features' : ['auto'],
    'bootstrap' : [False], 
    'criterion' : ['entropy'],
    'random_state' : [6]
}

RF_grid = run_gridsearch(RandomForestClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         RF_Params_1, verbose = 1, num_jobs = -1)

pickle.dump(RF_grid, open('models/RandomForest_Grid_1.p', 'wb'))

Fitting 5 folds for each of 32 candidates, totalling 160 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   11.0s
[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed:   47.5s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.80
---
Test Score: 0.72

Best Parameters:
{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30, 'max_features': 'auto', 'max_samples': None, 'min_samples_leaf': 10, 'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}



### 3rd RF Gridsearch

In [29]:
RF_Params_2 = {
    'n_estimators' : [8, 10],
    'max_depth' : [30, 40],
    'min_samples_split' : [2, 4],
    'min_samples_leaf' : [10, 15],
    'max_samples' : [None, 1000],
    
    'max_features' : ['auto'],
    'bootstrap' : [False], 
    'criterion' : ['entropy'],
    'random_state' : [6]
}

RF_grid = run_gridsearch(RandomForestClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         RF_Params_2, verbose = 1, num_jobs = -1)

pickle.dump(RF_grid, open('models/RandomForest_Grid_2.p', 'wb'))

Fitting 5 folds for each of 32 candidates, totalling 160 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   15.7s
[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed:  1.0min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.78
---
Test Score: 0.72

Best Parameters:
{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30, 'max_features': 'auto', 'max_samples': None, 'min_samples_leaf': 15, 'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}



### 4th RF Gridsearch

In [30]:
RF_Params_3 = {
    'max_samples' : [None, 5000],
    
    'n_estimators' : [10],
    'criterion' : ['entropy', 'gini'],
    'max_depth' : [30],
    
    'min_samples_split' : [2],
    'min_samples_leaf' : [15, 18],
    'max_features' : ['auto'],
    'max_leaf_nodes' : [None, 50, 100],
    'bootstrap' : [False], 
    'random_state' : [6]
}

RF_grid = run_gridsearch(RandomForestClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         RF_Params_3, verbose = 1, num_jobs = -1)

pickle.dump(RF_grid, open('models/RandomForest_Grid_3.p', 'wb'))

Fitting 5 folds for each of 24 candidates, totalling 120 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   11.1s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:   31.9s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.78
---
Test Score: 0.72

Best Parameters:
{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_samples_leaf': 15, 'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}



### 5th RF Gridsearch (FINAL!) 

In [33]:
RF_Params_4 = {
    'max_samples' : [None, 10000],
    'max_leaf_nodes' : [None, 1000],
    
    'n_estimators' : [10],
    'criterion' : ['entropy'],
    'max_depth' : [30],
    'min_samples_split' : [2],
    'min_samples_leaf' : [15],
    'max_features' : ['auto'],
    'bootstrap' : [False], 
    'random_state' : [6]
}

RF_grid = run_gridsearch(RandomForestClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         RF_Params_4, verbose = 1, num_jobs = -1)

pickle.dump(RF_grid, open('models/RandomForest_Grid_4.p', 'wb'))

Fitting 5 folds for each of 4 candidates, totalling 20 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:    7.7s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.78
---
Test Score: 0.72

Best Parameters:
{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_samples_leaf': 15, 'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}
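
The stopping rule can be checked mechanically by comparing the best parameters of consecutive rounds. The sketch below uses the dictionaries printed by the 2nd through 5th RandomForest gridsearches; `changed_params` is an illustrative helper, not part of `src.modeling`.

```python
round_2_best = {'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30,
                'max_features': 'auto', 'max_samples': None, 'min_samples_leaf': 10,
                'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}
round_3_best = {'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30,
                'max_features': 'auto', 'max_samples': None, 'min_samples_leaf': 15,
                'min_samples_split': 2, 'n_estimators': 10, 'random_state': 6}
round_4_best = {'bootstrap': False, 'criterion': 'entropy', 'max_depth': 30,
                'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None,
                'min_samples_leaf': 15, 'min_samples_split': 2, 'n_estimators': 10,
                'random_state': 6}
round_5_best = dict(round_4_best)  # the 5th round printed the same dictionary


def changed_params(previous, current):
    """Return {name: (old, new)} for every parameter whose best value changed."""
    names = set(previous) | set(current)
    return {name: (previous.get(name), current.get(name))
            for name in names
            if previous.get(name) != current.get(name)}


print(changed_params(round_2_best, round_3_best))  # {'min_samples_leaf': (10, 15)}
print(changed_params(round_4_best, round_5_best))  # {} -> nothing changed, stop tuning
```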



## Tuning GradientBoostingClassifier

Vanilla Model:

In [3]:
GB_vanilla = GradientBoostingClassifier(random_state=6)
GB_vanilla.fit(X_train, y_train)

print(f"""GradientBoosting Vanilla Model
Train score: {GB_vanilla.score(X_train, y_train)}
      
Test score: {GB_vanilla.score(X_test, y_test)}
""")

GradientBoosting Vanilla Model
Train score: 0.7182398402222994
      
Test score: 0.7205707835248487



### 1st GB Gridsearch

In [39]:
GB_Params_0 = {
    'loss' : ['deviance', 'exponential'],  # 'deviance' is named 'log_loss' in scikit-learn 1.1 and later
    'learning_rate' : [1, 0.5],
    'n_estimators' : [20, 30], 
    'subsample' : [0.5, 1],
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [2, 10],
    'max_depth' : [1, 3],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_0, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_0.p', 'wb'))

Fitting 5 folds for each of 64 candidates, totalling 320 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    8.8s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:  1.0min
[Parallel(n_jobs=-1)]: Done 320 out of 320 | elapsed:  1.9min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.73
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 10, 'n_estimators': 30, 'random_state': 6, 'subsample': 1}



### 2nd GB Gridsearch

In [46]:
GB_Params_1 = {
    'loss' : ['deviance'],
    'learning_rate' : [1, 1.5],
    'n_estimators' : [30, 50], 
    'subsample' : [1, 1.5],  # subsample must lie in (0, 1], so the 1.5 candidates cannot be fitted
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [10, 15],
    'max_depth' : [3, 10],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_1, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_1.p', 'wb'))

Fitting 5 folds for each of 32 candidates, totalling 160 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   15.5s
[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed:  2.6min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.74
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 15, 'n_estimators': 50, 'random_state': 6, 'subsample': 1}



### 3rd GB Gridsearch

In [47]:
GB_Params_2 = {
    'loss' : ['deviance'],
    'learning_rate' : [1],
    'n_estimators' : [50, 100], 
    'subsample' : [1],
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [15, 20],
    'max_depth' : [3, 5],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_2, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_2.p', 'wb'))

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:  1.4min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.74
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 15, 'n_estimators': 100, 'random_state': 6, 'subsample': 1}



### 4th GB Gridsearch

In [49]:
GB_Params_3 = {
    'loss' : ['deviance'],
    'learning_rate' : [1],
    'n_estimators' : [100, 500], 
    'subsample' : [1],
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [15, 18],
    'max_depth' : [3, 5],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_3, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_3.p', 'wb'))

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:  5.8min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.74
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 15, 'n_estimators': 100, 'random_state': 6, 'subsample': 1}



### 5th GB Gridsearch

In [50]:
GB_Params_4 = {
    'loss' : ['deviance'],
    'learning_rate' : [1],
    'n_estimators' : [100, 200, 300], 
    'subsample' : [1],
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [15, 18],
    'max_depth' : [3, 5],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_4, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_4.p', 'wb'))

Fitting 5 folds for each of 12 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  2.5min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:  5.6min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.74
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 15, 'n_estimators': 100, 'random_state': 6, 'subsample': 1}



### 6th GB Gridsearch (FINAL!) 

In [51]:
GB_Params_5 = {
    'loss' : ['deviance'],
    'learning_rate' : [1],
    'n_estimators' : [100, 125, 150], 
    'subsample' : [1],
    'criterion' : ['friedman_mse'],
    'min_samples_split' : [15],
    'max_depth' : [3, 5],
    'random_state' : [6]
}

GB_Grid = run_gridsearch(GradientBoostingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         GB_Params_5, verbose = 1, num_jobs = -1)

pickle.dump(GB_Grid, open('models/GradientBoost_Grid_5.p', 'wb'))

Fitting 5 folds for each of 6 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:  1.8min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.74
---
Test Score: 0.73

Best Parameters:
{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'deviance', 'max_depth': 3, 'min_samples_split': 15, 'n_estimators': 100, 'random_state': 6, 'subsample': 1}



## Tuning BaggingClassifier

Vanilla Model:

In [5]:
Bag_vanilla = BaggingClassifier(random_state=6)

Bag_vanilla.fit(X_train, y_train)

print(f"""Bagging Vanilla Model
Train score: {Bag_vanilla.score(X_train, y_train)}
      
Test score: {Bag_vanilla.score(X_test, y_test)}
""")

Bagging Vanilla Model
Train score: 0.9724586083130716
      
Test score: 0.66971547656951



### 1st Bag Gridsearch

In [87]:
Bag_Params_0 = {
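    # Note: the two strings below are not estimator objects, so those candidates cannot be fitted;
    # pass instances such as LogisticRegression() or GaussianNB() to actually compare them.
    'base_estimator': [None, 'LogisticRegression', 'GaussianNB'],
    'n_estimators': [5, 10],
    'max_samples' : [1, 4],    # ints are absolute sample counts; use floats for fractions of the training set
    'max_features': [.5, 1],   # int 1 means a single feature; 1.0 would mean all features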
    'bootstrap' : [True, False],
    'warm_start' : [True, False],
    'random_state' : [6]
}

Bag_Grid = run_gridsearch(BaggingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         Bag_Params_0, verbose = 1, num_jobs = -1)

pickle.dump(Bag_Grid, open('models/Bagging_Grid_0.p', 'wb'))

Fitting 5 folds for each of 96 candidates, totalling 480 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    3.5s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:    8.8s
[Parallel(n_jobs=-1)]: Done 480 out of 480 | elapsed:   11.8s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.58
---
Test Score: 0.58

Best Parameters:
{'base_estimator': None, 'bootstrap': True, 'max_features': 0.5, 'max_samples': 4, 'n_estimators': 10, 'random_state': 6, 'warm_start': True}
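
One thing worth noting about this grid: in scikit-learn's `BaggingClassifier`, an integer `max_samples` is an absolute number of rows, while a float is a fraction of the training set (the same int/float distinction applies to `max_features`). The sketch below is a simplified model of that rule, not scikit-learn's own code; `resolve_max_samples` and the training-set size of 10,000 rows are assumptions for illustration.

```python
def resolve_max_samples(max_samples, n_rows):
    """Simplified model: how many rows each base estimator is trained on."""
    if isinstance(max_samples, int):
        # Integer: taken as an absolute number of rows.
        return max_samples
    # Float: taken as a fraction of the training set.
    return int(max_samples * n_rows)


n_rows = 10_000  # assumed size, for illustration only
for value in (1, 4, 0.5, 1.0):
    print(value, "->", resolve_max_samples(value, n_rows), "rows")
```

Under that reading, `'max_samples' : [1, 4]` trains each base estimator on 1 or 4 rows, which is likely far less data than intended; float values such as `0.5` or `1.0` would test fractions of the data instead.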



### 2nd Bag Gridsearch

In [88]:
Bag_Params_1 = {
    'base_estimator': [None, 'LogisticRegression', 'GaussianNB'],  # the strings are not estimator objects, so only None can be fitted
    'n_estimators': [10, 20, 40],
    'max_samples' : [4, 10],
    'max_features': [.5],
    'bootstrap' : [True],
    'warm_start' : [True],
    'oob_score' : [True, False],
    'random_state' : [6]
}

Bag_Grid = run_gridsearch(BaggingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         Bag_Params_1, verbose = 1, num_jobs = -1)

pickle.dump(Bag_Grid, open('models/Bagging_Grid_1.p', 'wb'))

Fitting 5 folds for each of 36 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  52 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done 165 out of 180 | elapsed:    5.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:    6.1s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.62
---
Test Score: 0.62

Best Parameters:
{'base_estimator': None, 'bootstrap': True, 'max_features': 0.5, 'max_samples': 10, 'n_estimators': 10, 'oob_score': False, 'random_state': 6, 'warm_start': True}



### 3rd Bag Gridsearch

In [89]:
Bag_Params_2 = {
    'base_estimator': [None],
    'n_estimators': [10, 15, 18],
    'max_samples' : [10, 20, 30],
    'max_features': [.5],
    'bootstrap' : [True],
    'warm_start' : [True],
    'oob_score' : [False],
    'random_state' : [6]
}

Bag_Grid = run_gridsearch(BaggingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         Bag_Params_2, verbose = 1, num_jobs = -1)

pickle.dump(Bag_Grid, open('models/Bagging_Grid_2.p', 'wb'))

Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:    4.4s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.63
---
Test Score: 0.63

Best Parameters:
{'base_estimator': None, 'bootstrap': True, 'max_features': 0.5, 'max_samples': 30, 'n_estimators': 18, 'oob_score': False, 'random_state': 6, 'warm_start': True}



### 4th Bag Gridsearch

In [90]:
Bag_Params_3 = {
    'base_estimator': [None],
    'n_estimators': [18, 19],
    'max_samples' : [35, 40, 45],
    'max_features': [0.1, .5, 1],
    'bootstrap' : [True],
    'warm_start' : [True],
    'oob_score' : [False],
    'random_state' : [6]
}

Bag_Grid = run_gridsearch(BaggingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         Bag_Params_3, verbose = 1, num_jobs = -1)

pickle.dump(Bag_Grid, open('models/Bagging_Grid_3.p', 'wb'))

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    3.0s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:    7.7s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.65
---
Test Score: 0.65

Best Parameters:
{'base_estimator': None, 'bootstrap': True, 'max_features': 0.5, 'max_samples': 45, 'n_estimators': 18, 'oob_score': False, 'random_state': 6, 'warm_start': True}



### 5th Bag Gridsearch

In [91]:
Bag_Params_4 = {
    'base_estimator': [None],
    'n_estimators': [18, 19],
    'max_samples' : [35, 40, 45],
    'max_features': [0.1, .5, 1],
    'bootstrap' : [True],
    'bootstrap_features' : [True, False], 
    'warm_start' : [True],
    'oob_score' : [False],
    'random_state' : [6]
}

Bag_Grid = run_gridsearch(BaggingClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         Bag_Params_4, verbose = 1, num_jobs = -1)

pickle.dump(Bag_Grid, open('models/Bagging_Grid_4.p', 'wb'))

Fitting 5 folds for each of 36 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    3.0s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:   15.1s finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.63
---
Test Score: 0.63

Best Parameters:
{'base_estimator': None, 'bootstrap': True, 'bootstrap_features': True, 'max_features': 0.5, 'max_samples': 35, 'n_estimators': 19, 'oob_score': False, 'random_state': 6, 'warm_start': True}



## Tuning AdaBoostClassifier

Vanilla Model:

In [6]:
ADA_vanilla = AdaBoostClassifier(random_state=6)
ADA_vanilla.fit(X_train, y_train)

print(f"""AdaBoost Vanilla Model
Train score: {ADA_vanilla.score(X_train, y_train)}
      
Test score: {ADA_vanilla.score(X_test, y_test)}
""")

AdaBoost Vanilla Model
Train score: 0.7000115780942456
      
Test score: 0.701091203797505



### 1st ADA Gridsearch

In [92]:
ADA_Params_0 = {
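    # Note: the string below is not an estimator object; pass DecisionTreeClassifier(max_depth=5) itself to test it.
    'base_estimator' : [None, 'DecisionTreeClassifier(max_depth = 5)'],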
    'n_estimators' : [25, 50, 100],
    'learning_rate' : [.5, 1, 1.5], 
    'algorithm' : ['SAMME', 'SAMME.R'],
    'random_state' : [6]
}

ADA_Grid = run_gridsearch(AdaBoostClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         ADA_Params_0, verbose = 1, num_jobs = -1)

pickle.dump(ADA_Grid, open('models/ADA_Grid_0.p', 'wb'))

Fitting 5 folds for each of 36 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   42.8s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:  2.1min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.72
---
Test Score: 0.72

Best Parameters:
{'algorithm': 'SAMME.R', 'base_estimator': None, 'learning_rate': 1.5, 'n_estimators': 100, 'random_state': 6}



### 2nd ADA Gridsearch

In [93]:
ADA_Params_1 = {
    'n_estimators' : [100, 200, 300],
    'learning_rate' : [1.5, 2, 3], 
    'algorithm' : ['SAMME.R'],
    'random_state' : [6]
}

ADA_Grid = run_gridsearch(AdaBoostClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         ADA_Params_1, verbose = 1, num_jobs = -1)

pickle.dump(ADA_Grid, open('models/ADA_Grid_1.p', 'wb'))

Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:  3.4min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.73
---
Test Score: 0.73

Best Parameters:
{'algorithm': 'SAMME.R', 'learning_rate': 1.5, 'n_estimators': 300, 'random_state': 6}



### 3rd ADA Gridsearch

In [94]:
ADA_Params_2 = {
    'n_estimators' : [300, 500, 750],
    'learning_rate' : [1.5], 
    'algorithm' : ['SAMME.R'],
    'random_state' : [6]
}

ADA_Grid = run_gridsearch(AdaBoostClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         ADA_Params_2, verbose = 1, num_jobs = -1)

pickle.dump(ADA_Grid, open('models/ADA_Grid_2.p', 'wb'))

Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:  3.4min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.73
---
Test Score: 0.73

Best Parameters:
{'algorithm': 'SAMME.R', 'learning_rate': 1.5, 'n_estimators': 750, 'random_state': 6}



### 4th ADA Gridsearch

In [95]:
ADA_Params_3 = {
    'n_estimators' : [50, 750, 1000],
    'learning_rate' : [1.5], 
    'algorithm' : ['SAMME.R'],
    'random_state' : [6]
}

ADA_Grid = run_gridsearch(AdaBoostClassifier(), 
                         X_train, y_train,
                         X_test, y_test,
                         ADA_Params_3, verbose = 1, num_jobs = -1)

pickle.dump(ADA_Grid, open('models/ADA_Grid_3.p', 'wb'))

Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:  4.6min finished


       Results
~~~~~~~~~~~~~~~~~~~~~

Train Score: 0.73
---
Test Score: 0.73

Best Parameters:
{'algorithm': 'SAMME.R', 'learning_rate': 1.5, 'n_estimators': 1000, 'random_state': 6}



## Saving the Best Parameters

In [98]:
# Final gridsearch object saved for each classifier
final_grid_paths = [
    'models/ADA_Grid_3.p',
    'models/Bagging_Grid_4.p',
    'models/GradientBoost_Grid_5.p',
    'models/RandomForest_Grid_4.p',
]

Best_Models = []
for path in final_grid_paths:
    with open(path, 'rb') as f:  # close each file once it is loaded
        Best_Models.append(pickle.load(f))

# Map each base estimator to the best parameters found for it
Best_Models_ = {}
for model in Best_Models: 
    Best_Models_[model.estimator] = model.best_params_
    
with open('models/BestTunedClassifiers.p', 'wb') as f:
    pickle.dump(Best_Models_, f)
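
To reuse saved parameters later, the dictionary can be loaded back and used to rebuild each model. The sketch below round-trips a small dictionary through `pickle` in memory. It keys by class name rather than by estimator object, which is an illustrative alternative to the layout written above, and `DummyClassifier` is a hypothetical stand-in so the snippet runs without scikit-learn.

```python
import pickle


class DummyClassifier:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, **params):
        self.params = params


# Best parameters keyed by class name (values here are illustrative).
best_params = {
    'DummyClassifier': {'n_estimators': 10, 'random_state': 6},
}

# Serialize and restore in memory instead of touching the models/ folder.
payload = pickle.dumps(best_params)
restored = pickle.loads(payload)

# Look up each class by name and rebuild it with its tuned parameters.
registry = {'DummyClassifier': DummyClassifier}
models = {name: registry[name](**params) for name, params in restored.items()}
print(models['DummyClassifier'].params)
```

String keys keep the pickle readable without needing the original estimator objects at load time.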