# Tuning Hyperparameters

This notebook walks through a simple example of tuning hyperparameters with scikit-learn's [model_selection.GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

## get data

We will use the SST-2 (Stanford Sentiment Treebank) data set.

The input features are short sentences, and each label gives the sentence's sentiment polarity:
*    0 for negative
*    1 for positive

In [2]:
%%bash
python3 ./glue_examples/download_glue_data.py --data_dir ./glue_examples/glue_data --tasks SST

Downloading and extracting SST...
	Completed!


In [1]:
import os
import math
import random
import csv
import sys

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.metrics import classification_report

from bert_sklearn import BertClassifier
from bert_sklearn import BertRegressor
from bert_sklearn import load_model

DATADIR = './glue_examples/glue_data'


def get_sst_data(train_file=DATADIR + '/SST-2/train.tsv',
                 dev_file=DATADIR + '/SST-2/dev.tsv'):
    """Load the SST-2 train and dev splits as DataFrames with 'text' and 'label' columns."""
    # keep_default_na=False stops pandas from parsing strings such as "NA" as missing values
    train = pd.read_csv(train_file, sep='\t', encoding='utf8', keep_default_na=False)
    train.columns = ['text', 'label']
    print("SST-2 train data size: %d " % len(train))

    dev = pd.read_csv(dev_file, sep='\t', encoding='utf8', keep_default_na=False)
    dev.columns = ['text', 'label']
    print("SST-2 dev data size: %d " % len(dev))

    # sorted array of the distinct labels (0 and 1 for SST-2)
    label_list = np.unique(train['label'])

    return train, dev, label_list

train, dev, label_list = get_sst_data()

# subsample data for demo
train = train.sample(1000, random_state=42)

X_train = train['text']
y_train = train['label']

X_dev = dev['text']
y_dev = dev['label']

train.head()

SST-2 train data size: 67349 
SST-2 dev data size: 872 


| | text | label |
|---|---|---|
| 66730 | with outtakes in which most of the characters ... | 0 |
| 29890 | enigma is well-made | 1 |
| 45801 | is ) so stoked to make an important film about... | 0 |
| 29352 | the closest thing to the experience of space t... | 1 |
| 19858 | lose their luster | 0 |


## do grid search

Suppose we want to tune over some of the hyperparameters mentioned in the paper:

* **`epochs`** in [3, 4]
* **`learning_rate`** in [2e-5, 3e-5, 5e-5]
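
To make the size of this search concrete, here is a minimal sketch that expands the two lists into individual candidates using only the standard library. It is an illustration, not how `GridSearchCV` is implemented internally (scikit-learn provides `sklearn.model_selection.ParameterGrid` for the same expansion). The names `param_grid`, `candidates`, and `n_folds` are illustrative, and the fold count of 3 is an assumption taken from the "Fitting 3 folds" line in the training log of this notebook.

```python
from itertools import product

# Search space copied from the list above.
param_grid = {"epochs": [3, 4], "learning_rate": [2e-5, 3e-5, 5e-5]}

# Each combination of one value per hyperparameter is one candidate.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]

n_folds = 3  # assumption: matches the 3-fold run logged in this notebook

for candidate in candidates:
    print(candidate)

# Every candidate is trained once per fold.
total_fits = len(candidates) * n_folds
print(len(candidates), "candidates x", n_folds, "folds =", total_fits, "fits")
```

Because every fit fine-tunes a full BERT model, the cost grows multiplicatively with each value added to either list, which is why the demo subsamples the training data.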

In [3]:
%%time
from sklearn.model_selection import GridSearchCV

params = {'epochs':[3, 4], 'learning_rate':[2e-5, 3e-5, 5e-5]}

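# wrap classifier/regressor in GridSearchCV
# cv=3 is set explicitly to match the 3-fold run logged below;
# scikit-learn 0.22 and later default to 5 folds
clf = GridSearchCV(BertClassifier(validation_fraction=0, max_seq_length=64),
                   params,
                   scoring='accuracy',
                   cv=3,
                   verbose=True)

# fit the grid search
clf.fit(X_train, y_train)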

Building sklearn text classifier...
Building sklearn text classifier...
Fitting 3 folds for each of 6 candidates, totalling 18 fits
Building sklearn text classifier...


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:09<00:00,  2.30it/s, loss=0.704]
Training: 100%|██████████| 21/21 [00:09<00:00,  2.48it/s, loss=0.696]
Training: 100%|██████████| 21/21 [00:09<00:00,  2.44it/s, loss=0.694]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:09<00:00,  2.37it/s, loss=0.607]
Training: 100%|██████████| 21/21 [00:09<00:00,  2.40it/s, loss=0.169]
Training: 100%|██████████| 21/21 [00:09<00:00,  2.43it/s, loss=0.0286]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:09<00:00,  2.38it/s, loss=0.649]
Training: 100%|██████████| 21/21 [00:09<00:00,  2.38it/s, loss=0.254]
Training: 100%|██████████| 21/21 [00:09<00:00,  1.99it/s, loss=0.0918]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:09<00:00,  2.43it/s, loss=0.668]
Training: 100%|██████████| 21/21 [00:10<00:00,  1.64it/s, loss=0.524]
Training: 100%|██████████| 21/21 [00:10<00:00,  2.05it/s, loss=0.212]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:09<00:00,  2.19it/s, loss=0.597]
Training: 100%|██████████| 21/21 [00:12<00:00,  1.77it/s, loss=0.246]
Training: 100%|██████████| 21/21 [00:12<00:00,  1.35it/s, loss=0.0574]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:10<00:00,  2.01it/s, loss=0.588]
Training: 100%|██████████| 21/21 [00:13<00:00,  1.42it/s, loss=0.233]
Training: 100%|██████████| 21/21 [00:11<00:00,  1.81it/s, loss=0.0708]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.71it/s, loss=0.705]
Training: 100%|██████████| 21/21 [00:12<00:00,  1.71it/s, loss=0.707]
Training: 100%|██████████| 21/21 [00:13<00:00,  1.47it/s, loss=0.695]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.84it/s, loss=0.585]
Training: 100%|██████████| 21/21 [00:13<00:00,  1.52it/s, loss=0.316]
Training: 100%|██████████| 21/21 [00:14<00:00,  1.53it/s, loss=0.121]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.56it/s, loss=0.711]
Training: 100%|██████████| 21/21 [00:14<00:00,  1.54it/s, loss=0.46] 
Training: 100%|██████████| 21/21 [00:14<00:00,  1.35it/s, loss=0.209]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:12<00:00,  1.68it/s, loss=0.641]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.47it/s, loss=0.265]
Training: 100%|██████████| 21/21 [00:13<00:00,  1.60it/s, loss=0.0989]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.53it/s, loss=0.0437]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.68it/s, loss=0.614]
Training: 100%|██████████| 21/21 [00:16<00:00,  1.41it/s, loss=0.229]
Training: 100%|██████████| 21/21 [00:14<00:00,  1.56it/s, loss=0.0683]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.33it/s, loss=0.0119]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:14<00:00,  1.30it/s, loss=0.597]
Training: 100%|██████████| 21/21 [00:14<00:00,  1.22it/s, loss=0.416]
Training: 100%|██████████| 21/21 [00:17<00:00,  1.40it/s, loss=0.132]
Training: 100%|██████████| 21/21 [00:16<00:00,  1.23it/s, loss=0.0791]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.69it/s, loss=0.693]
Training: 100%|██████████| 21/21 [00:17<00:00,  1.21it/s, loss=0.48] 
Training: 100%|██████████| 21/21 [00:16<00:00,  1.30it/s, loss=0.23] 
Training: 100%|██████████| 21/21 [00:16<00:00,  1.32it/s, loss=0.0888]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.72it/s, loss=0.675]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.35it/s, loss=0.565]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.58it/s, loss=0.254]
Training: 100%|██████████| 21/21 [00:17<00:00,  1.15it/s, loss=0.0718]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.62it/s, loss=0.62] 
Training: 100%|██████████| 21/21 [00:16<00:00,  1.29it/s, loss=0.251]
Training: 100%|██████████| 21/21 [00:13<00:00,  1.47it/s, loss=0.0747]
Training: 100%|██████████| 21/21 [00:17<00:00,  1.34it/s, loss=0.0156] 
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 666, validation data size: 0


Training: 100%|██████████| 21/21 [00:12<00:00,  1.81it/s, loss=0.693]
Training: 100%|██████████| 21/21 [00:16<00:00,  1.31it/s, loss=0.635]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.57it/s, loss=0.404]
Training: 100%|██████████| 21/21 [00:14<00:00,  1.57it/s, loss=0.233]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:11<00:00,  1.72it/s, loss=0.617]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.37it/s, loss=0.419]
Training: 100%|██████████| 21/21 [00:16<00:00,  1.36it/s, loss=0.187]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.32it/s, loss=0.0831]
                                                           

Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 667, validation data size: 0


Training: 100%|██████████| 21/21 [00:13<00:00,  1.52it/s, loss=0.709]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.42it/s, loss=0.536]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.43it/s, loss=0.192]
Training: 100%|██████████| 21/21 [00:15<00:00,  1.44it/s, loss=0.0655]
[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed: 18.4min finished


Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 1000, validation data size: 0


Training: 100%|██████████| 32/32 [00:21<00:00,  1.58it/s, loss=0.571]
Training: 100%|██████████| 32/32 [00:22<00:00,  1.84it/s, loss=0.248]
Training: 100%|██████████| 32/32 [00:25<00:00,  1.66it/s, loss=0.0716]
Training: 100%|██████████| 32/32 [00:21<00:00,  1.59it/s, loss=0.0343]

CPU times: user 12min 35s, sys: 7min 3s, total: 19min 39s
Wall time: 20min
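
The sizes in the log above can be checked with a little arithmetic. The sketch below reproduces the 666/667 training-split sizes and the 21 and 32 steps per epoch. The fold sizes follow the plain k-fold rule, which agrees with this log even though scikit-learn uses stratified splitting for classifiers. The batch size of 32 is an assumption inferred from the progress bars (32 steps for 1000 samples), not a value read from the `BertClassifier` configuration.

```python
import math

n_samples = 1000   # size of the subsampled training set
n_folds = 3        # from "Fitting 3 folds for each of 6 candidates"
batch_size = 32    # assumption: inferred from 32 steps on 1000 samples

# k-fold rule: the first (n_samples % n_folds) folds get one extra sample.
fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
              for i in range(n_folds)]

# Each fit trains on everything outside its held-out fold.
train_sizes = [n_samples - held_out for held_out in fold_sizes]

# The last, partially filled batch still counts as a step.
steps_per_epoch = [math.ceil(size / batch_size) for size in train_sizes]
refit_steps = math.ceil(n_samples / batch_size)

print("held-out fold sizes:", fold_sizes)
print("training split sizes:", train_sizes)
print("steps per epoch during CV:", steps_per_epoch)
print("steps per epoch for the final refit:", refit_steps)
```

The final block of the log, with a training size of 1000, is the refit of the best candidate on the full subsample, which `GridSearchCV` performs by default after cross-validation.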





## results

In [10]:
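means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']

# report mean accuracy and a +/- 2 standard deviation band for each candidate;
# the loop variable is named `candidate` so it does not overwrite the `params` grid
for mean, std, candidate in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, candidate))

# best score
print("\nBest score:", clf.best_score_, "with params:", clf.best_params_)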

0.767 (+/-0.306) for {'epochs': 3, 'learning_rate': 2e-05}
0.873 (+/-0.015) for {'epochs': 3, 'learning_rate': 3e-05}
0.751 (+/-0.284) for {'epochs': 3, 'learning_rate': 5e-05}
0.876 (+/-0.015) for {'epochs': 4, 'learning_rate': 2e-05}
0.866 (+/-0.040) for {'epochs': 4, 'learning_rate': 3e-05}
0.841 (+/-0.017) for {'epochs': 4, 'learning_rate': 5e-05}

Best score: 0.876 with params: {'epochs': 4, 'learning_rate': 2e-05}
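
The mean alone hides how much a candidate varies across folds. As a rough sketch, the snippet below ranks the six printed results and flags the candidates whose spread is large. The numbers are copied from the output above; the `SPREAD_LIMIT` cut-off of 0.10 is a hypothetical threshold chosen for illustration, not a standard value.

```python
# (mean accuracy, 2 * std, params), copied from the printed results above.
results = [
    (0.767, 0.306, {"epochs": 3, "learning_rate": 2e-05}),
    (0.873, 0.015, {"epochs": 3, "learning_rate": 3e-05}),
    (0.751, 0.284, {"epochs": 3, "learning_rate": 5e-05}),
    (0.876, 0.015, {"epochs": 4, "learning_rate": 2e-05}),
    (0.866, 0.040, {"epochs": 4, "learning_rate": 3e-05}),
    (0.841, 0.017, {"epochs": 4, "learning_rate": 5e-05}),
]

SPREAD_LIMIT = 0.10  # hypothetical cut-off for "unstable across folds"

# Rank candidates from best to worst mean accuracy.
ranked = sorted(results, key=lambda row: row[0], reverse=True)
for mean, spread, candidate in ranked:
    flag = "unstable" if spread > SPREAD_LIMIT else "stable"
    print(f"{mean:.3f} (+/-{spread:.3f}) {flag:<8} {candidate}")

best = ranked[0]
unstable = [candidate for _, spread, candidate in results if spread > SPREAD_LIMIT]
```

The two flagged candidates both use 3 epochs. The training log appears consistent with a single fold failing to converge in each case: the loss stays near 0.69 (roughly chance level for a binary task) across all epochs in the first and seventh fits. With only 1000 training examples, such differences may reflect run-to-run noise rather than a real effect of the learning rate. Since `GridSearchCV` refits the best candidate on the full training data by default, the fitted `clf` could then be scored on the held-out dev set with `clf.score(X_dev, y_dev)`.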
