## Simple Tutorial using `hypopt`

In this tutorial, we show how to use `hypopt` on the well-known Iris dataset from scikit-learn. We use a neural network as the model, but any other model could be used instead.

In [1]:
from __future__ import print_function, absolute_import, division, unicode_literals, with_statement

from hypopt import GridSearch
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler 
from sklearn.model_selection import train_test_split

# Neural Network imports (simple sklearn Neural Network)
from sklearn.neural_network import MLPClassifier
# Silence neural network SGD convergence warnings.
from sklearn.exceptions import ConvergenceWarning
import warnings
warnings.filterwarnings('ignore', category=ConvergenceWarning)

In [2]:
param_grid = {
    'learning_rate': ["constant", "adaptive"],
    'hidden_layer_sizes': [(100,20), (500,20), (20,10), (50,20), (4,2)],
    'alpha': [0.0001], # minimal effect
    'warm_start': [False], # minimal effect
    'momentum': [0.1, 0.9], # minimal effect
    'learning_rate_init': [0.001, 0.01, 1],
    'max_iter': [50],
    'random_state': [0],
    'activation': ['relu'],
}
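To see how large this search space is, the grid can be expanded by hand. The following is a minimal sketch using only the standard library; it is not how `hypopt` enumerates settings internally, and the names `keys` and `settings` are introduced here purely for illustration.

```python
from itertools import product

# The same grid as in cell In [2].
param_grid = {
    'learning_rate': ["constant", "adaptive"],
    'hidden_layer_sizes': [(100, 20), (500, 20), (20, 10), (50, 20), (4, 2)],
    'alpha': [0.0001],
    'warm_start': [False],
    'momentum': [0.1, 0.9],
    'learning_rate_init': [0.001, 0.01, 1],
    'max_iter': [50],
    'random_state': [0],
    'activation': ['relu'],
}

# Fix a key order, then take the Cartesian product of the value lists.
keys = sorted(param_grid)
settings = [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]

# Only four parameters have more than one value: 2 * 5 * 2 * 3 settings.
print(len(settings))
```

Parameters with a single value (such as `alpha` or `max_iter`) do not enlarge the grid; they only pin that setting for every candidate.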

In [3]:
# Get data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.3, random_state=0)
print('Set sizes:', len(X_train), '(train),', len(X_val), '(val),', len(X_test), '(test)')

# Normalize data 
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)  
X_val = scaler.transform(X_val)  
X_test = scaler.transform(X_test) 

Set sizes: 73 (train), 32 (val), 45 (test)


StandardScaler(copy=True, with_mean=True, with_std=True)
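The set sizes printed above follow from applying a 30% hold-out twice to the 150 Iris samples. Here is a small sketch of that arithmetic, assuming the held-out size is rounded up and the remainder goes to the other part; `split_sizes` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_kept, n_held_out), assuming the held-out size is rounded up."""
    n_held_out = int(math.ceil(test_fraction * n_samples))
    return n_samples - n_held_out, n_held_out

n_rest, n_test = split_sizes(150, 0.3)      # first split: hold out the test set
n_train, n_val = split_sizes(n_rest, 0.3)   # second split: hold out the validation set
print(n_train, n_val, n_test)
```

Under that assumption the sketch reproduces the 73 / 32 / 45 split reported by the notebook.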

### Grid-search time comparison: validation set versus cross-validation

The `hypopt` package automatically distributes work across all CPU threads, regardless of whether you use a validation set or cross-validation.
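A rough way to anticipate the time difference is to count model fits. The sketch below assumes one fit per parameter setting when a validation set is given, and one fit per setting per fold under cross-validation; any final refit of the best model is ignored. The values 60 and 6 are the grid size and `cv_folds` used in this tutorial.

```python
n_settings = 60   # parameter settings in the grid of this tutorial
cv_folds = 6      # number of folds used in the cross-validation cell

fits_with_validation_set = n_settings               # one fit per setting
fits_with_cross_validation = n_settings * cv_folds  # one fit per setting per fold

print(fits_with_validation_set, fits_with_cross_validation)
print(fits_with_cross_validation / fits_with_validation_set)
```

The measured wall-time ratio need not match this factor exactly, since each cross-validation fit trains on fewer samples and parallel scheduling adds its own overhead.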

In [4]:
print("First let's try the neural network with default parameters.")
default = MLPClassifier(max_iter=50, random_state=0)
default.fit(X_train, y_train)
test_score = round(default.score(X_test, y_test), 4)
val_score = round(default.score(X_val, y_val), 4)
print('\nTEST SCORE (default parameters):', test_score)
print('VALIDATION SCORE (default parameters):', val_score)

First let's try the neural network with default parameters.


MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=50, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=0, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)


TEST SCORE (default parameters): 0.8444
VALIDATION SCORE (default parameters): 0.9063
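`MLPClassifier.score` returns mean accuracy, so each score is a fraction of correctly classified samples. The counts below are inferred from the rounded scores and the set sizes (45 test, 32 validation); the notebook itself does not print them.

```python
def accuracy(n_correct, n_total):
    """Mean accuracy: the fraction of correctly classified samples."""
    return n_correct / n_total

test_accuracy = accuracy(38, 45)   # consistent with the reported 0.8444
val_accuracy = accuracy(29, 32)    # 0.90625, consistent with the reported 0.9063
print(round(test_accuracy, 4), val_accuracy)
```

With sets this small, a single misclassified sample moves the score by two to three percentage points, which is worth keeping in mind when comparing settings.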


In [5]:
gs_val = GridSearch(model = MLPClassifier(max_iter=50, random_state=0))
print("Grid-search using a validation set.\n","-"*79)
%time gs_val.fit(X_train, y_train, param_grid, X_val, y_val, scoring = 'accuracy')
test_score = round(gs_val.score(X_test, y_test), 4)
val_score = round(gs_val.score(X_val, y_val), 4)
print('\nTEST SCORE (hyper-parameter optimization with validation set):', test_score)
print('VALIDATION SCORE (hyper-parameter optimization with validation set):', val_score)

Grid-search using a validation set.
 -------------------------------------------------------------------------------
Comparing 60 parameter setting(s) using 4 CPU thread(s) ( 15 job(s) per thread ).
CPU times: user 20 ms, sys: 23.9 ms, total: 43.9 ms
Wall time: 844 ms


MLPClassifier(activation=u'relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100, 20), learning_rate=u'constant',
       learning_rate_init=0.01, max_iter=50, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=0, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)


TEST SCORE (hyper-parameter optimization with validation set): 0.9556
VALIDATION SCORE (hyper-parameter optimization with validation set): 0.9375
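Conceptually, grid search with a validation set trains one model per parameter setting on the training data and ranks the settings by their validation score. The sketch below illustrates that loop with a toy one-dimensional k-nearest-neighbour classifier standing in for `MLPClassifier`. It is not `hypopt`'s implementation, and every name in it (`knn_predict`, `validation_score`, `grid_search`) is hypothetical.

```python
from collections import Counter
from itertools import product

def knn_predict(X_train, y_train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_score(params, X_train, y_train, X_val, y_val):
    """Accuracy on the validation set for one parameter setting."""
    hits = sum(knn_predict(X_train, y_train, x, **params) == y
               for x, y in zip(X_val, y_val))
    return hits / len(y_val)

def grid_search(param_grid, X_train, y_train, X_val, y_val):
    """Return (params, score) pairs sorted from best to worst validation score."""
    keys = sorted(param_grid)
    scored = []
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = validation_score(params, X_train, y_train, X_val, y_val)
        scored.append((params, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data: class 0 near 0, class 1 near 10, plus one noisy label at x = 3.
X_train = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y_train = [0, 0, 0, 1, 1, 1, 1]
X_val = [0.5, 1.5, 3.5, 9.5]
y_val = [0, 0, 0, 1]

ranked = grid_search({'k': [1, 3, 5]}, X_train, y_train, X_val, y_val)
best_params, best_score = ranked[0]
print(best_params, best_score)
```

Because the sort is stable, settings with equal scores keep their grid order, so the first-ranked setting is only one of possibly several equally good candidates.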


In [6]:
gs_cv = GridSearch(model = MLPClassifier(max_iter=50, random_state=0), cv_folds=6)
print("\n\nLet's see how long grid-search takes to run when we don't use a validation set.")
print("Grid-search using cross-validation.\n","-"*79)
%time gs_cv.fit(X_train, y_train, param_grid)
test_score = round(gs_cv.score(X_test, y_test), 4)
val_score = round(gs_cv.score(X_val, y_val), 4)
print('\nTEST SCORE (hyper-parameter optimization with cross-validation):', test_score)
print('VALIDATION SCORE (hyper-parameter optimization with cross-validation):', val_score)
print('''\nNote that although it's slower, cross-validation has many benefits (e.g. uses all
your training data). That's why hypopt also supports cross-validation when no validation
set is provided as in the example above.''')



Let's see how long grid-search takes to run when we don't use a validation set.
Grid-search using cross-validation.
 -------------------------------------------------------------------------------
CPU times: user 1.07 s, sys: 76.3 ms, total: 1.15 s
Wall time: 4.01 s


MLPClassifier(activation=u'relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100, 20), learning_rate=u'constant',
       learning_rate_init=0.01, max_iter=50, momentum=0.1,
       nesterovs_momentum=True, power_t=0.5, random_state=0, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)


TEST SCORE (hyper-parameter optimization with cross-validation): 0.9556
VALIDATION SCORE (hyper-parameter optimization with cross-validation): 0.9375

Note that although it's slower, cross-validation has many benefits (e.g. uses all
your training data). That's why hypopt also supports cross-validation when no validation
set is provided as in the example above.
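To make the cross-validation idea concrete, here is a minimal sketch of plain k-fold index splitting for the 73 training samples and 6 folds used in this tutorial. It assumes contiguous, unshuffled folds; `hypopt` may shuffle or stratify its folds differently, and `kfold_indices` is a hypothetical helper written only for this illustration.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # Spread the remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(73, 6))
print([len(val_idx) for _, val_idx in folds])
```

Every sample serves as validation data exactly once and as training data in the other five folds, which is the sense in which cross-validation uses all of the training data.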


In [7]:
print('We can view the best performing parameters and their scores.')
for z in gs_val.get_param_scores()[:2]:
    p, s = z
    print(p)
    print('Score:', s)
print()
print('Verify that the lowest scoring parameters make sense.')
for z in gs_val.get_param_scores()[-2:]:
    p, s = z
    print(p)
    print('Score:', s)
print('\nAlas, these did poorly because the hidden layers are too small (4,2) and the learning rate is too high (1).')
print('Also, note that some of the scores are the same. With hyperopt, you can view all scores and settings to see which parameters really matter.')

We can view the best performing parameters and their scores.
{u'warm_start': False, u'hidden_layer_sizes': (100, 20), u'activation': u'relu', u'max_iter': 50, u'random_state': 0, u'momentum': 0.9, u'alpha': 0.0001, u'learning_rate': u'adaptive', u'learning_rate_init': 0.01}
Score: 0.9375
{u'warm_start': False, u'hidden_layer_sizes': (50, 20), u'activation': u'relu', u'max_iter': 50, u'random_state': 0, u'momentum': 0.1, u'alpha': 0.0001, u'learning_rate': u'adaptive', u'learning_rate_init': 0.01}
Score: 0.9375

Verify that the lowest scoring parameters make sense.
{u'warm_start': False, u'hidden_layer_sizes': (4, 2), u'activation': u'relu', u'max_iter': 50, u'random_state': 0, u'momentum': 0.9, u'alpha': 0.0001, u'learning_rate': u'constant', u'learning_rate_init': 1}
Score: 0.3125
{u'warm_start': False, u'hidden_layer_sizes': (4, 2), u'activation': u'relu', u'max_iter': 50, u'random_state': 0, u'momentum': 0.9, u'alpha': 0.0001, u'learning_rate': u'adaptive', u'learning_rate_init': 1}