In [None]:
!pip install -r requirements.txt

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, Lasso, LogisticRegression, Ridge, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, r2_score, roc_curve
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
# from evolutionary_search import EvolutionaryAlgorithmSearchCV # due to version-conflicts we can't use this in our environment
from smac import HyperparameterOptimizationFacade, Scenario
from ConfigSpace import Configuration, ConfigurationSpace
from ConfigSpace import Float as FloatSMAC
from ConfigSpace import Categorical as CategoricalSMAC

In [3]:
# load  and preprocess data data
data = pd.read_csv('../data/credit-data/prepared_data.csv', index_col=0)

# extract labels
y = data['Credit_Score'].to_numpy()
X = data.drop(columns=['Credit_Score']).to_numpy()
y[y == 2] = 1
# split data into train- and test-set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
x_scaler = StandardScaler()

X_train = x_scaler.fit_transform(X_train)
X_test = x_scaler.transform(X_test)

# Hyperparameter Optimization
Machine Learning models come with hyperparameters that have to be set before we are able to train the model and which have influence on the final model performance. For example, a Random Forest has to know how many trees will be included in the ensemble, a Linear Regression has to know the learning rate GD will update the weights with and so on. In general, hyperparameters cannot be optimized during optimization of the model-parameters themselves. This is simply because hyperparameters *shape* the basic model and dictate how optimization algorithms should behave during optimization, thus optimizing hyperparameters and model-parameters jointly is not an easy task to do.

## The Objective
The objective in HO can be formalized as follows:
Given a hyperparameter-search space $\mathcal{H}$ and a task $T$ associated with some evaluation metric $e_T$, we aim to solve
\begin{equation}
 \arg \min_{h \in \mathcal{H}} e_T(h)
\end{equation}
This formlization is very general, but it has to be that general and is one of the reasons why HO is such a hard task to perform. For instance, $\mathcal{H}$ is often very heterogenous, e.g. even picking the right model to perform $T$ can be considered to be a hyperparameter we can optimize for. In the following we will consider a simpler problem with more assumptions:

We will assume we have given a model $M$ with hyperparameter-space $\mathcal{H}_M$, a task $\langle \mathbf{X}, \mathbf{y}, l\rangle$ where $\mathbf{X}$ are features, $\mathbf{y}$ are labels and $l$ is an evaluation metric (e.g. the loss of our model). This means we only consider supervised learning problems with a given model and want to optimize w.r.t. one evaluation metric.

Specifically, we will look at our Logistic Regression model again and try to find better hyperparameters for it! Our search-space will contain the learning rate, regularization parameter, number of epochs of training, 

In [14]:
model = LogisticRegression(solver='saga')
parameter_space = dict(C=uniform(1e-5, 4), penalty=['l1', 'l2', 'elasticnet'], max_iter=[50, 100, 200, 500], l1_ratio=uniform(0, 1))
clf = RandomizedSearchCV(model, parameter_space, n_iter=30, scoring='f1')
search = clf.fit(X, y) # we pass X and y instead of X_train and y_train here because sklearn will do cross-validation for us automatically
search.best_params_



{'C': 0.13760439444028533,
 'l1_ratio': 0.8844665691278802,
 'max_iter': 50,
 'penalty': 'l1'}

Let's see how it performs on our test set

In [16]:
lr = LogisticRegression(solver='saga', **search.best_params_)
lr.fit(X_train, y_train)
y_hat = lr.predict(X_test)
acc = accuracy_score(y_hat, y_test)
acc



0.7457180500658761

Indeed, we were able to achieve a slightly better accuracy (around +1.5%)!

In [17]:
f1 = f1_score(y_hat, y_test)
f1

0.8329004329004329

> **Task**
> 
> Try to incorporate more hyperparameters in the Random Search (RS) and play around with the parameters of the RS itself! Can you beat the result above?

## Bayesian Optimization
Another approach on HO is Bayesian Optimization (BO). The basic idea is to use information about the aim to optimize we have obtained in prior evaluations of this function. In contrast to RS which just randomly samples configurations from a search space and evaluates the model under these configurations, BO tries to estimate which configurations work best based on the current knowledge we have obtained. This works by initializing a configuration at random, evaluate it and update the prior distribution over configurations. This way we obtain a posterior distribution over the configuration space which can then be used to maximize a so called *utilization* function. For example one could use the Expected Improvement (EI) in order to choose the next configuration we want to evaluate. This way BO converges pretty fast toward good solutions.

In [12]:
lr = LogisticRegression(solver='saga')
parameter_space = dict(C=Real(1e-5, 4, prior='uniform'), penalty=Categorical(['l1', 'l2', 'elasticnet']), max_iter=Categorical([50, 100, 200, 500]), l1_ratio=Real(0, 1, prior='uniform'))
opt = BayesSearchCV(lr, parameter_space, n_iter=30)
search = opt.fit(X, y)
search.best_params_



OrderedDict([('C', 3.1629063275439733),
             ('l1_ratio', 0.8352478097808362),
             ('max_iter', 200),
             ('penalty', 'l2')])

In [18]:
lr = LogisticRegression(solver='saga', **search.best_params_)
lr.fit(X_train, y_train)
y_hat = lr.predict(X_test)
acc = accuracy_score(y_hat, y_test)
acc



0.7457180500658761

In [19]:
f1 = f1_score(y_hat, y_test)
f1

0.8329004329004329

## SMAC
Sequential Model-Based Algorothm Configuration (SMAC) is an algorithm developed to identify good configurations of parameterized algorithms. This is a more general problem than HO, but it is closely related. In HO the parameterized algorithm is the learning algorithm (which contains a model also being parameterized). SMAC is a Bayesian Optimization method that works as follows: First, a random configuration is evaluated by parameterizing and running the target algorithm. This returns a score that can be anything (e.g. loss, accuracy, R^2, ...). Then, a surrogate (Random Forest, short: RF) is fitted to represent the relation between configuration and scores obtained. As an extension one can alos include features $\mathbf{x}$ describing the task to be solved in either (1) the RF's feature space or (2) into the prediction of the RF by certain tricks (for details see [the original paper](https://ml.informatik.uni-freiburg.de/wp-content/uploads/papers/11-LION5-SMAC.pdf)). After a model is built, a subset of configurations is chosen from the configuration space is selected s.t. it maximizes the *Expected Improvement (EI)*. EI can be interpreted as the improvement that can be expected if we run the target algorithm with configurations $\theta^*$ instead of $\theta'$ where $\theta'$ is the best configuration obtained so far. Optimally, EI can be computed for each $\theta^*$. Sometimes, closed-formed solutions for this are available that can be computed efficiently, however, in SCMAC, EI is appromximated: First, EI is computed for each configuration tested so far (w.r.t. $\theta'$), then the top-$k$ configurations are chosen as well as their nearest neighbours (defined by some neighbour-function $n$). These neighbours have never been evaluated, thus we use the RF to predict a score that approximately represents the performance some configuration untested $\theta$ would achieve. Additionally, a local search is used to further improve configuration-candidates by manipulating them using $n$. This results in a set of configurations being tested in the next *intensify* round. Here, each configuration found using the local search is tested in a tournament-fashion: Each configuration parameterizes the target algorihtm multiple times. Once the configuration performs worse than the best one seen so far ($\theta'$), it is rejected. If not, it is accepted as the new best configuration seen so far.

In [5]:
def train(config, seed): 
    clf = LogisticRegression(solver='saga', C=config['C'], l1_ratio=config['l1_ratio'], max_iter=config['max_iter'], penalty=config['penalty'])
    clf.fit(X_train, y_train)
    y_hat = clf.predict(X_test)
    acc = accuracy_score(y_hat, y_test)
    return -acc # negative sign because SMAC minimizes objective
    
configspace = ConfigurationSpace()
C = FloatSMAC('C', (1e-5, 4))
penalty = CategoricalSMAC('penalty', ['l1', 'l2', 'elasticnet'])
max_iter = CategoricalSMAC('max_iter', [50, 100, 200, 500])
l1_ratio = FloatSMAC('l1_ratio', (0, 1))
configspace.add_hyperparameters([C, penalty, max_iter, l1_ratio])

scenario = Scenario(configspace, deterministic=True, n_trials=200)
smac = HyperparameterOptimizationFacade(scenario, train)
incumbent = smac.optimize()
incumbent

[INFO][abstract_initial_design.py:134] Using 40 initial design configurations and 0 additional configurations.
[INFO][abstract_intensifier.py:306] Using only one seed for deterministic scenario.
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


[INFO][abstract_intensifier.py:513] Added config d28a09 as new incumbent because there are no incumbents yet.
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jon

  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


[INFO]

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined




  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined




  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined




  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  diff_b_a = subtract(b, a)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + V

  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 158, in run
    rval = self(config_copy, target_function, kwargs)
  File "/home/jonas/anaconda3/envs/automl/lib/python3.9/site-packages/smac/runner/target_function_runner.py", line 231, in __call__
    return algorithm(config, **algorithm_kwargs)
  File "/tmp/ipykernel_74117/1513758354.py", line 3, in train
    clf.fit(X_train, y_train)
NameError: name 'X_train' is not defined


  File

  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
  preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)


Configuration(values={
  'C': 1.0675214154943264,
  'l1_ratio': 0.6205646954476833,
  'max_iter': 50,
  'penalty': 'l1',
})

## Genetic Algorithms
A third method to optimize hyperparameters is to use Genetic Algorithms (GA). GAs are general purpose optimization algorithms inspired by biology and work as follows: First, we define operations *mutate* and *cross-over*. These operations define how the genes, which we will define next, can be combined. Thus these operations, together with the genes, implicitely define the search space. Genes are objects of interest and can be e.g. elements from the real vector space. In our case genes represent hyperparameter-configurations under which a certain learning algorithm (e.g. Random Forest) will be run. As a last ingridient, GAs need a fitness score that evaluates how well a certain gene performs. In our case this will be the accuracy of our final model trained under some hyperaprameter-configuration (or gene). The algorithm then works as follows:

1. initialize population (randomy draw some configurations)
2. compute fitness of each configuration
3. check if stop criterion (e.g. max. number of iterations) is reached
4. select the best performing objects from the population
5. apply *mutate* and *cross-over* to a random subset of the remaining population and add the resulting objects to the population
6. recurse to 2.

In [4]:
lr = LogisticRegression(solver='saga')
params = {
    'C': np.arange(1e-5, 4),
    'penalty': ['l1', 'l2', 'elasticnet'],
    'max_iter': range(50, 500),
    'l1_ratio': np.arange(0, 1)
}
ga = EvolutionaryAlgorithmSearchCV(estimator=lr, params=params, scoring='accuracy', 
                                    cv=3, population_size=10, gene_crossover_prob=0.5, gene_mutation_prob=0.1, tournament_size=3, generations_number=5)
ga.fit(X, y)
print(ga.best_params_)
print("Accuracy:"+ str(ga.best_score_))



{'C': 2.00001, 'penalty': 'l1', 'max_iter': 290, 'l1_ratio': 0}
Accuracy:0.7013449367088608




## Conclusion
Hyperparameter Tuning is a very important step in your ML-pipeline and you should invest a good amount of time to identify proper hyperparameters! You can do it manually (which can be costly, but if you know parameters that will probably work well it's fine) or you can employ automatic methods that make life easier for you. Of course, Random Search is one of the most basic techniques you can think of and it might not work well for high-dimensional hyperparameter-search spaces (we randomly draw samples from a search-space). Also, if your model reaches a certain complexity, uninformed methods like Random Search wastes many computation resources. In such cases different methods like Bayesian Optimization (BO) might be worth a consideration since such methods don't sample completely at random, they incorporate the results of prior hyperparameters and adjust a distribution over the hyperparameter-search space with the goal to converge faster. 

However, since HO is not the main topic of this workshop, we won't dive in any further. If you are interested in such things, please stand by, we plan a workshop on AutoML as well!