# tuning hyperparameters

We will walk through a simple example of tuning hyperparameters with sklearn's [model_selection.GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

## get data

We will use the SST-2 (Stanford Sentiment Treebank) data set.

The input features are short sentences, and each label gives the sentiment polarity of its sentence:
*    0 for negative
*    1 for positive

In [1]:
%%bash
python3 ./glue_examples/download_glue_data.py --data_dir ./glue_examples/glue_data --tasks SST

Downloading and extracting SST...
	Completed!


In [1]:
import os
import math
import random
import csv
import sys

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.metrics import classification_report

from bert_sklearn import BertClassifier
from bert_sklearn import BertRegressor
from bert_sklearn import load_model

DATADIR = './glue_examples/glue_data'


def get_sst_data(train_file=DATADIR + '/SST-2/train.tsv',
                 dev_file=DATADIR + '/SST-2/dev.tsv'):
    """Load the SST-2 train and dev splits as DataFrames with 'text' and 'label' columns."""
    # keep_default_na=False stops pandas from parsing strings such as "NA" as missing values
    train = pd.read_csv(train_file, sep='\t', encoding='utf8', keep_default_na=False)
    train.columns = ['text', 'label']
    print("SST-2 train data size: %d " % len(train))

    dev = pd.read_csv(dev_file, sep='\t', encoding='utf8', keep_default_na=False)
    dev.columns = ['text', 'label']
    print("SST-2 dev data size: %d " % len(dev))

    label_list = np.unique(train['label'])
    return train, dev, label_list

train, dev, label_list = get_sst_data()

# subsample the training data to keep the demo fast
train = train.sample(1000, random_state=42)

X_train = train['text']
y_train = train['label']

X_dev = dev['text']
y_dev = dev['label']

train.head()

SST-2 train data size: 67349 
SST-2 dev data size: 872 


| | text | label |
|---|---|---|
| 66730 | with outtakes in which most of the characters ... | 0 |
| 29890 | enigma is well-made | 1 |
| 45801 | is ) so stoked to make an important film about... | 0 |
| 29352 | the closest thing to the experience of space t... | 1 |
| 19858 | lose their luster | 0 |


## do grid search

Suppose we want to tune over some of the hyperparameters mentioned in the BERT paper:

* **`epochs`** in [3, 4]
* **`learning_rate`** in [2e-5, 3e-5, 5e-5]
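
To make the size of this search concrete, here is a minimal sketch that expands the two lists above into individual candidates using only the standard library. The helper name `expand_grid` is hypothetical and is not part of sklearn or bert_sklearn; the fold count of 3 mirrors the `cv=3` argument used in the search cell.

```python
from itertools import product

# Grid from the text: every combination of these values is one candidate.
param_grid = {"epochs": [3, 4], "learning_rate": [2e-5, 3e-5, 5e-5]}


def expand_grid(grid):
    """Return one dict per combination of the grid's values (hypothetical helper)."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


candidates = expand_grid(param_grid)
n_folds = 3  # mirrors cv=3 in the grid search cell
total_fits = len(candidates) * n_folds

for candidate in candidates:
    print(candidate)
print(len(candidates), "candidates,", total_fits, "cross-validation fits")
```

Two epoch settings times three learning rates give 6 candidates, and 3 folds per candidate give 18 fits, which is the count the search reports when it starts. Each fit fine-tunes a full BERT model, so the cost of a grid grows quickly as values are added.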

In [2]:
%%time
from sklearn.model_selection import GridSearchCV

params = {'epochs':[3, 4], 'learning_rate':[2e-5, 3e-5, 5e-5]}

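# wrap the classifier in GridSearchCV; validation_fraction=0 disables the
# classifier's own hold-out split, since cross-validation does the scoring here
clf = GridSearchCV(BertClassifier(validation_fraction=0, max_seq_length=64),
                   params,
                   cv=3,
                   scoring='accuracy',
                   verbose=True)

# fit the grid search
clf.fit(X_train, y_train)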

Building sklearn text classifier...
Building sklearn text classifier...
Fitting 3 folds for each of 6 candidates, totalling 18 fits
Building sklearn text classifier...


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:08<00:00,  2.48it/s, loss=0.66] 
Training  : 100%|██████████| 21/21 [00:08<00:00,  2.48it/s, loss=0.256]
Training  : 100%|██████████| 21/21 [00:08<00:00,  2.49it/s, loss=0.104]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 27.11it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 27.72it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.40it/s, loss=0.612]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.33it/s, loss=0.183]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.31it/s, loss=0.0299]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 26.64it/s]
Predicting: 100%|██████████| 84/84 [00:02<00:00, 28.79it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:08<00:00,  2.44it/s, loss=0.673]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.41it/s, loss=0.29] 
Training  : 100%|██████████| 21/21 [00:08<00:00,  2.43it/s, loss=0.117]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 29.16it/s]
Predicting: 100%|██████████| 84/84 [00:02<00:00, 28.01it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.25it/s, loss=0.664]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.30it/s, loss=0.416]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.33it/s, loss=0.189]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 24.68it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 26.68it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.29it/s, loss=0.66] 
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.41it/s, loss=0.245]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.40it/s, loss=0.0903]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 27.08it/s]
Predicting: 100%|██████████| 84/84 [00:02<00:00, 28.25it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.39it/s, loss=0.595]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.40it/s, loss=0.312]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.41it/s, loss=0.0778]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 26.67it/s]
Predicting: 100%|██████████| 84/84 [00:02<00:00, 28.03it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.44it/s, loss=0.711]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.44it/s, loss=0.699]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.36it/s, loss=0.691]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 26.19it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 27.56it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.41it/s, loss=0.677]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.36it/s, loss=0.444]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.36it/s, loss=0.14] 
Predicting: 100%|██████████| 42/42 [00:01<00:00, 25.64it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 24.98it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.36it/s, loss=0.656]
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.25it/s, loss=0.341]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.21it/s, loss=0.0817]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 25.65it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 25.22it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.38it/s, loss=0.662]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.21it/s, loss=0.295]
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.11it/s, loss=0.112]
Training  : 100%|██████████| 21/21 [00:09<00:00,  2.12it/s, loss=0.0482]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 23.38it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 23.09it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.20it/s, loss=0.614]
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.05it/s, loss=0.25] 
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.02it/s, loss=0.0705]
Training  : 100%|██████████| 21/21 [00:10<00:00,  1.75it/s, loss=0.0114]
Predicting: 100%|██████████| 42/42 [00:02<00:00, 16.62it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 21.29it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.21it/s, loss=0.619]
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.05it/s, loss=0.268]
Training  : 100%|██████████| 21/21 [00:10<00:00,  1.98it/s, loss=0.0857]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.94it/s, loss=0.0187]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 22.29it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 22.63it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.23it/s, loss=0.697]
Training  : 100%|██████████| 21/21 [00:10<00:00,  2.02it/s, loss=0.575]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.89it/s, loss=0.305]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.84it/s, loss=0.155]
Predicting: 100%|██████████| 42/42 [00:01<00:00, 22.15it/s]
Predicting: 100%|██████████| 84/84 [00:03<00:00, 22.83it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:10<00:00,  2.04it/s, loss=0.669]
Training  : 100%|██████████| 21/21 [00:10<00:00,  1.88it/s, loss=0.404]
Training  : 100%|██████████| 21/21 [00:12<00:00,  1.78it/s, loss=0.116]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.92it/s, loss=0.0353]
Predicting: 100%|██████████| 42/42 [00:02<00:00, 21.03it/s]
Predicting: 100%|██████████| 84/84 [00:04<00:00, 20.64it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:10<00:00,  2.10it/s, loss=0.608]
Training  : 100%|██████████| 21/21 [00:13<00:00,  1.66it/s, loss=0.32] 
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.76it/s, loss=0.118]
Training  : 100%|██████████| 21/21 [00:12<00:00,  1.74it/s, loss=0.0226]
Predicting: 100%|██████████| 42/42 [00:02<00:00, 19.44it/s]
Predicting: 100%|██████████| 84/84 [00:04<00:00, 22.29it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 666, validation data size: 0


Training  : 100%|██████████| 21/21 [00:11<00:00,  1.74it/s, loss=0.66] 
Training  : 100%|██████████| 21/21 [00:11<00:00,  2.10it/s, loss=0.536]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.93it/s, loss=0.382]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.94it/s, loss=0.16] 
Predicting: 100%|██████████| 42/42 [00:02<00:00, 18.55it/s]
Predicting: 100%|██████████| 84/84 [00:04<00:00, 19.45it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:09<00:00,  2.17it/s, loss=0.67] 
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.79it/s, loss=0.519]
Training  : 100%|██████████| 21/21 [00:11<00:00,  2.00it/s, loss=0.254]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.78it/s, loss=0.0843]
Predicting: 100%|██████████| 42/42 [00:02<00:00, 17.41it/s]
Predicting: 100%|██████████| 84/84 [00:04<00:00, 17.85it/s]


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 667, validation data size: 0


Training  : 100%|██████████| 21/21 [00:11<00:00,  2.08it/s, loss=0.643]
Training  : 100%|██████████| 21/21 [00:12<00:00,  1.75it/s, loss=0.345]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.73it/s, loss=0.0949]
Training  : 100%|██████████| 21/21 [00:11<00:00,  1.87it/s, loss=0.0323]
Predicting: 100%|██████████| 42/42 [00:02<00:00, 20.51it/s]
Predicting: 100%|██████████| 84/84 [00:04<00:00, 20.89it/s]
[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed: 13.6min finished


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 1000, validation data size: 0


Training  : 100%|██████████| 32/32 [00:17<00:00,  2.15it/s, loss=0.565]
Training  : 100%|██████████| 32/32 [00:18<00:00,  2.13it/s, loss=0.246]
Training  : 100%|██████████| 32/32 [00:19<00:00,  2.10it/s, loss=0.0835]
Training  : 100%|██████████| 32/32 [00:18<00:00,  2.08it/s, loss=0.0355]

CPU times: user 9min 12s, sys: 5min 12s, total: 14min 24s
Wall time: 14min 50s
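
The row and batch counts in the log above can be reproduced with some quick arithmetic. The sketch below assumes the 1000 subsampled rows are split into 3 folds that are as even as possible, and it assumes a training batch size of 32 and a prediction batch size of 8. Those two batch sizes are inferred from the progress bars, not set anywhere in this notebook, and the helper names are hypothetical.

```python
import math


def fold_sizes(n_samples, n_folds):
    """Sizes of n_folds held-out folds that differ by at most one row."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def n_batches(n_rows, batch_size):
    """Number of batches needed to cover n_rows, counting a partial last batch."""
    return math.ceil(n_rows / batch_size)


n_samples = 1000        # rows kept by train.sample(1000, ...)
train_batch_size = 32   # assumption inferred from the log
eval_batch_size = 8     # assumption inferred from the log

for held_out in fold_sizes(n_samples, 3):
    train_rows = n_samples - held_out
    print(train_rows, "train rows:",
          n_batches(train_rows, train_batch_size), "training batches,",
          n_batches(held_out, eval_batch_size), "batches to score the held-out fold,",
          n_batches(train_rows, eval_batch_size), "batches to score the training rows")

print("final refit on all rows:", n_batches(n_samples, train_batch_size), "training batches")
```

Under these assumptions each fit trains on 666 or 667 rows in 21 batches, and the two `Predicting` bars of 42 and 84 batches appear to correspond to scoring the held-out fold and the training rows respectively. The final block of 32 batches on 1000 rows is the refit of the best candidate on the whole subsample.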





## results

In [3]:
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']

for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

# best score and the parameters that produced it
print("\nBest score:", clf.best_score_, "with params:", clf.best_params_)

0.869 (+/-0.013) for {'epochs': 3, 'learning_rate': 2e-05}
0.868 (+/-0.034) for {'epochs': 3, 'learning_rate': 3e-05}
0.750 (+/-0.282) for {'epochs': 3, 'learning_rate': 5e-05}
0.880 (+/-0.005) for {'epochs': 4, 'learning_rate': 2e-05}
0.869 (+/-0.022) for {'epochs': 4, 'learning_rate': 3e-05}
0.856 (+/-0.000) for {'epochs': 4, 'learning_rate': 5e-05}

Best score: 0.88 with params: {'epochs': 4, 'learning_rate': 2e-05}
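
As a rough way to read this table, the sketch below copies the printed numbers into plain Python, picks the candidate with the highest mean accuracy, and flags candidates whose printed spread (two standard deviations across folds) is large. The 0.1 cutoff is an arbitrary illustrative choice, not something sklearn applies.

```python
# (mean accuracy, printed +/- spread, params), copied from the output above
results = [
    (0.869, 0.013, {"epochs": 3, "learning_rate": 2e-5}),
    (0.868, 0.034, {"epochs": 3, "learning_rate": 3e-5}),
    (0.750, 0.282, {"epochs": 3, "learning_rate": 5e-5}),
    (0.880, 0.005, {"epochs": 4, "learning_rate": 2e-5}),
    (0.869, 0.022, {"epochs": 4, "learning_rate": 3e-5}),
    (0.856, 0.000, {"epochs": 4, "learning_rate": 5e-5}),
]

# The best candidate is the one with the highest mean accuracy across folds.
best_mean, best_spread, best_params = max(results, key=lambda row: row[0])

# Flag candidates whose folds disagree a lot (cutoff chosen only for illustration).
SPREAD_CUTOFF = 0.1
unstable = [params for mean, spread, params in results if spread > SPREAD_CUTOFF]

print("best:", best_params, "mean accuracy", best_mean)
print("unstable:", unstable)
```

The one flagged candidate, 3 epochs at a learning rate of 5e-5, lines up with a fit in the training log whose loss stayed near 0.69 for all three epochs, which suggests that one fold failed to train and pulled the mean down. With only 1000 training rows and 3 folds, the differences between the remaining candidates are small relative to their spreads, so the ranking should be read as indicative only.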
