### Introduction: using LALE wrappers for hyperparameter tuning with Hyperopt

In this notebook, we walk through using the LogisticRegression wrapper from LALE for hyperparameter tuning with Hyperopt. We demonstrate this on classification of the Iris dataset.

The main steps involved are as follows:

1. Set up the optimization objective function to be used with Hyperopt.
2. Access the hyperparameter search space for LALE's LogisticRegression and use it within the optimization objective.
3. Run Hyperopt on the given dataset to obtain the best hyperparameter combination.

In [1]:
from lale.lib.sklearn import LogisticRegression
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials, space_eval
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

#### Dataset
We use the Iris dataset from scikit-learn for this example and prepare the train and test data as follows:

In [2]:
from lale.datasets import load_iris_df
# Load Iris as train and test partitions
(X_train, y_train), (X_test, y_test) = load_iris_df()

print('Unique labels in test set:\n{}'.format(y_test.unique()))

Unique labels in test set:
[2 1 0]


#### Hyperopt objective function

This example uses classification accuracy on a random train/validation split of the training data as the performance metric. Hyperopt expects a minimization objective, so we use the negated accuracy as the loss to be minimized.

In [3]:
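def objective(params):
    # Pass a copy so the dictionary sampled by Hyperopt is not modified
    acc = hyperopt_train_test(params.copy())
    # Hyperopt minimizes the loss, so return the negated accuracy
    return {'loss': -acc, 'status': STATUS_OK}

def hyperopt_train_test(params):
    # 'name' identifies the operator; the remaining entries are hyperparameters
    t = params.pop('name')
    clf = get_classifier(t, params)
    # Hold out 20% of the training data as a validation set
    X_train_part, X_validation, y_train_part, y_validation = train_test_split(X_train, y_train, test_size=0.20)
    clf_trained = clf.fit(X_train_part, y_train_part)
    predictions = clf_trained.predict(X_validation)
    # Predictions are class labels, so they can be compared directly
    accuracy = accuracy_score(y_validation, predictions)
    return accuracy

def get_classifier(t, param_dict):
    if 'LogisticRegression' in t:
        return LogisticRegression(**param_dict)
    # Fail loudly instead of silently returning None
    raise ValueError('Unsupported classifier: {}'.format(t))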

#### Set the hyperparameter search space

In this step, the function 'hyperopt_search_space' from LALE is applied to the LogisticRegression wrapper to obtain its hyperparameters and their valid ranges as Hyperopt expressions.

In [4]:
from lale.search.op2hp import hyperopt_search_space
# Suppress warnings from sklearn and SciPy
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)
from scipy.optimize.linesearch import LineSearchWarning
warnings.filterwarnings("ignore", category=LineSearchWarning)

In [5]:
search_space = hp.choice('classifier', [hyperopt_search_space(LogisticRegression())])

In [6]:
trials = Trials()
fmin(objective, search_space, algo=tpe.suggest, max_evals=50, trials=trials)
best_params = space_eval(search_space, trials.argmin)
print('Best hyperparameter combination:', best_params)

Best hyperparameter combination: {'C': 20215.05909343411, 'class_weight': 'balanced', 'dual': False, 'fit_intercept': True, 'multi_class': 'ovr', 'name': 'lale.lib.sklearn.logistic_regression.LogisticRegression', 'penalty': 'l2', 'solver': 'liblinear', 'tol': 0.09315307697480668}


In [7]:
def eval_on_best(params):
    # Work on a copy so the caller's dictionary is left unchanged
    params = dict(params)
    t = params.pop('name')
    clf = get_classifier(t, params)
    # Retrain on the full training set, then evaluate on the held-out test set
    clf_trained = clf.fit(X_train, y_train)
    predictions = clf_trained.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    return accuracy

In [8]:
test_accuracy = eval_on_best(best_params)
print('Test Accuracy:', test_accuracy)

Test Accuracy: 1.0
