<img src="./images/logo.png" alt="Drawing" style="width: 500px;"/>

<font size="4">

# Tutorial for Hyperactive
    
    
This is a tutorial to introduce you to the basic functionalities of Hyperactive and provide some interesting applications. It will also give an introduction to some optimization techniques. Hyperactive is a package that can optimize any python function and collect the search data. 

<br>
    

    
## Table of contents:
* [Introduction](#intro)
* [Convex Optimization](#convex)
* [Non-convex Optimization](#non_convex)
* [Data collection](#data_collect)
* [Multiple objectives](#multi_objectives)
* [Non-numerical search space](#search_space)
* [Deep Learning Optimization](#deep_learning)
* [Hyperactive memory](#memory)


In [28]:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

import time
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import Normalizer, MinMaxScaler
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.gaussian_process.kernels import Matern, WhiteKernel, RBF, ConstantKernel

from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston, load_iris

from hyperactive import Hyperactive, BayesianOptimizer, HillClimbingOptimizer
from gradient_free_objective_functions.visualize import plot_surface, plot_heatmap

color_scale = px.colors.sequential.Jet


def _create_grid(objective_function, search_space):
    def objective_function_np(*args):
        para = {}
        for arg, key in zip(args, search_space.keys()):
            para[key] = arg

        return objective_function(para)

    (x_all, y_all) = search_space.values()
    xi, yi = np.meshgrid(x_all, y_all)
    zi = objective_function_np(xi, yi)

    return xi, yi, zi


In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from keras.utils import np_utils
from tensorflow import keras

import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True

sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)

In [None]:
# load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

img_width = 28
img_height = 28

x_train = x_train.astype("float32")
x_train /= 255.0
x_test = x_test.astype("float32")
x_test /= 255.0

# reshape input data
x_train = x_train.reshape(x_train.shape[0], img_width, img_height, 1)
x_test = x_test.reshape(x_test.shape[0], img_width, img_height, 1)

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

<font size="4">
    
## Introduction <a class="anchor" id="intro"></a>
    
There are two things you need to define before starting your first optimization run:
    
    - the objective function: 
        Contains some kind of model. It always returns a score that will be maximized during
    - a search space: 
        Defines the parameter space in which the optimizer searches for the best parameter set
    
In this notebook you will see several different examples for objective functions. 
    
    
### How does Hyperactive help? <a class="anchor" id="intro"></a>

    
    

In [29]:
def objective_function(para):
    loss = para["x"]*para["x"]
    # -x*x is an inverted parabola 
    return -loss

# We have only one dimension here
search_space = {
    "x": list(np.arange(-5, 5, 0.01)),
}

<font size="4">

In the next step we will start the optimization run. The following code snippet shows the most basic example:
```python
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=70)
hyper.run()
```

You only need the objective_function, search_space and the number of iterations. Each iteration will evaluate the objective function. This will generate a score, that the optimization algorithm uses to determine which position in the search space to look next. All of the calculations will be done by Hyperactive in the background. You will receive the results of the optimization run when all iterations are done.

In [30]:
hyper_0 = Hyperactive(verbosity=False)
hyper_0.add_search(objective_function, search_space, n_iter=70, initialize={"random": 2, "vertices": 2})
hyper_0.run()

search_data_0 = hyper_0.results(objective_function)
search_data_0[["x", "score"]]

Unnamed: 0,x,score
0,-3.74,-13.9876
1,0.37,-0.1369
2,-5.00,-25.0000
3,4.99,-24.9001
4,4.25,-18.0625
...,...,...
65,-3.85,-14.8225
66,-2.75,-7.5625
67,-2.54,-6.4516
68,4.57,-20.8849


<font size="4">

In the table above you can see the 100 iterations performed during the run. This is called the **search data**. In each row you can see the parameter ```x``` and the corresponding score. As we previously discussed the optimization algorithm determines which position to select next based on the score from the evaluated objective function. 

When Hyperactive starts the optimization the **first iterations are initializations** from the ```initialize```-dictionary. In the example above there are 4 initializations (2 random and 2 vertices). They determine the initial positions in the search space that are used to evaluate the objective funtion. As you can see in the search data the 2. and 3. iteration are the vertices (edge points) of the search space. The 0. and 1.
The first rows of the search data are randomly selected. After those few initialization steps the optimization algorithm will select the next positions in the search space based on the score of the previous position(s).
    
The default algorithm for the optimization is the **random-search**. You can see the random pattern in the last few iterations of the search data. We can also see the random pattern if we plot the search data:

<font size="4">

### Random Search Optimizer
    
...

In [31]:
fig = px.scatter(search_data_0, x="x", y="score")
fig.show()

<font size="4">
    
The plot above shows the score of each parameter set (in this case just one parameter "x"). The random search explores the search space very well, so that we can see the inverse parabola.

<font size="4">
    
## Convex Optimization <a class="anchor" id="convex"></a>
    

    - knowing about obj func shape is important. What is convex opt?
    - when to expect convex opt?

In [32]:
def convex_function(para):
    loss = (para["x"]*para["x"] + para["y"]*para["y"])
    return -loss


search_space = {
    "x": list(np.arange(-5, 5, 0.01)),
    "y": list(np.arange(-5, 5, 0.01)),
}

In [33]:
hyper_convex_0 = Hyperactive(verbosity=False)
hyper_convex_0.add_search(convex_function, search_space, n_iter=2000)
hyper_convex_0.run()



In [34]:
search_data_convex_0 = hyper_convex_0.results(convex_function)

In [35]:
fig = px.scatter(search_data_convex_0, x="x", y="y", color="score", color_continuous_scale=color_scale)
fig.update_layout(width=900, height=800, xaxis_range=[-5, 5], yaxis_range=[-5, 5])
fig.show()

<font size="4">
    
The plot above shows the samples from the search data acquired from the convex-function in a 2-dimensional search space. The score is shown by the color of each point in the scatter plot. 

<font size="4">

We were able to see, that random search is a good optimization technique to explore the search space. But the goal is often to quickly find position in the search space with a high score. Therefore we should consider other optimization techniques like the hill climbing algorithm.

<font size="4">

### Hill Climbing Optimizer
    
The hill climbing optimization algorithm works by finding a random neighbour position close to the current position. If the score of the new position is better than the current one the algorithm makes a step to the new position and returns to finding the next position. This behaviour is like someone who tries to find the highest (highest score) position in a space by only moving up and never moves down. 
    
The hill climbing algorithm works very well with convex optimization problems, because the score continuously improves towards a direction. Hill climbing can find this direction by exploring the scores of its neighbours.
Hill climbing does not work of there are local optima. It tends to get "stuck" in certain regions, where the current position is surrounded by positions with worse scores. The algorithm would need to first "go down" and later "go up" again to find other (even better) positions in the search space.


In [36]:
optimizer = HillClimbingOptimizer(rand_rest_p=0)

hyper_convex_1 = Hyperactive(verbosity=False)
hyper_convex_1.add_search(convex_function, search_space, n_iter=90, optimizer=optimizer, initialize={"vertices":1})
hyper_convex_1.run()

search_data_convex_1 = hyper_convex_1.results(convex_function)

In [37]:
fig = px.scatter(search_data_convex_1, x="x", y="y", color="score", color_continuous_scale=color_scale)
fig.update_layout(width=900, height=800, xaxis_range=[-5, 5], yaxis_range=[-5, 5])
fig.show()

<font size="4">
    
The 2D-scatter plot above shows that the hill climbing algorithm convertes quickly to the optimum of the objective function in the search space. Hill climbing is specialized to find the optimum of convex functions quickly. It was able to find a good score in less than 100 iterations, while the random search used much more for a similar maximum score. 

<font size="4">
    
## Non-convex Optimization <a class="anchor" id="non_convex"></a>
    
    
    - convex vs non-convex opt
    - when to expect non-convex opt?

In [38]:
def ackley_function(para):
    x, y = para["x"], para["y"]

    loss = (
        -20 * np.exp(-0.2 * np.sqrt(0.5 * (x * x + y * y)))
        - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
        + np.exp(1)
        + 20
    )

    return -loss


search_space = {
    "x": list(np.arange(-5, 5, 0.01)),
    "y": list(np.arange(-5, 5, 0.01)),
}

<font size="4">
    
    ...

In [39]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

search_space_plot = {
    "x": list(np.arange(-5, 5, 0.2)),
    "y": list(np.arange(-5, 5, 0.2)),
}

xi_c, yi_c, zi_c = _create_grid(convex_function, search_space_plot)
xi_a, yi_a, zi_a = _create_grid(ackley_function, search_space_plot)

fig1 = go.Surface(x=xi_c, y=yi_c, z=zi_c, colorscale=color_scale)
fig2 = go.Surface(x=xi_a, y=yi_a, z=zi_a, colorscale=color_scale)

fig = make_subplots(rows=1, cols=2,
                    specs=[[{'is_3d': True}, {'is_3d': True}]],
                    subplot_titles=['Convex Function', 'Non-convex Function'],
                    )

fig.add_trace(fig1, 1, 1)
fig.add_trace(fig2, 1, 2)
fig.update_layout(title_text="Objective Function Surface")
fig.show()

<font size="4">
    
    ...

In [40]:
hyper_ackley_0 = Hyperactive(verbosity=False)
hyper_ackley_0.add_search(ackley_function, search_space, n_iter=2000)
hyper_ackley_0.run()

search_data_ackley_0 = hyper_ackley_0.results(ackley_function)

In [41]:
fig = px.scatter(search_data_ackley_0, x="x", y="y", color="score", color_continuous_scale=color_scale)
fig.update_layout(width=900, height=800)
fig.show()

<font size="4">

The plot above shows the random search exploring the ackley function. Random search is not affected by the many local optima in the search space. Lets try out the hill climbing algorithm on the ackley function and see the results.

In [42]:
optimizer = HillClimbingOptimizer(rand_rest_p=0)

hyper_ackley_1 = Hyperactive(verbosity=False)
hyper_ackley_1.add_search(ackley_function, 
                          search_space, 
                          n_iter=100, 
                          optimizer=optimizer, 
                          initialize={"vertices": 1})
hyper_ackley_1.run()

search_data_ackley_1 = hyper_ackley_1.results(ackley_function)

In [43]:
fig = px.scatter(search_data_ackley_1, x="x", y="y", color="score", color_continuous_scale=color_scale)
fig.update_layout(width=900, height=800, xaxis_range=[-5, 5], yaxis_range=[-5, 5])
fig.show()

<font size="4">

Maybe you already expected, that the hill climbing algorithm delivers bad results optimizing the ackley function. That does not mean, that hill climbing is a bad algorithm in general. It means that it is bad for this kind of objective functions. This is a very important idea in mathematical optimization. It is very useful to know about the properties of the objective function, because you can choose an optimization algorithm that works really well for this problem.

<font size="4">
    
### Repulsing Hill Climbing Optimizer
    
The repulsing hill climbing optimizer tries to improve how hill climbing solves non-convex objective functions. It does so by increasing the radius in which hill climbing selects a neighbour position if the last position wasn't an improvement over the current one. This means the hill climber will jump away from its current position of it does not find better position in its close environment.

In [44]:
from hyperactive import RepulsingHillClimbingOptimizer

optimizer = RepulsingHillClimbingOptimizer()

hyper_ackley_2 = Hyperactive(verbosity=False)
hyper_ackley_2.add_search(ackley_function, 
                          search_space, 
                          n_iter=100, 
                          optimizer=optimizer, 
                          initialize={"vertices": 1})
hyper_ackley_2.run()

search_data_ackley_2 = hyper_ackley_2.results(ackley_function)

In [45]:
fig = px.scatter(search_data_ackley_2, x="x", y="y", color="score", color_continuous_scale=color_scale)
fig.update_layout(width=900, height=800, xaxis_range=[-5, 5], yaxis_range=[-5, 5])
fig.show()

<font size="4">
    
The plot above shows how the repulsing hill climbing optimizer explored the search space of the ackley function. It does a much better job finding new optima in the space, while also exploring local regions.

<font size="4">
    
## Machine Learning Hyperparameter Optimization <a class="anchor" id="machine_learning"></a>

Until now we only optimized test functions to show how an objective function and the search space can look like. These problems were easy to solve, because the objective funtion evaluates very fast and the search space is very small. Real optimization problems often have one of those two problems:
    - The objective function is computationally expensive, so it takes a long time to evaluate. This increases the iteration time and slowes down the optimization progress.
    - The search space is very large. This can makes it very difficult to find positions with a high score.
    
In the first case you would want to use optimization algorithms that are very inteligent in finding new positions with high scores. You don't want to waste too much time exploring the search space, because each evaluation takes such a long time. You want to get to a good position with a high score in as few steps as possible.
    
In the second case you would want a fast algorithm, that looks for a good score but also explores the search space very well.
    

<font size="4">
    
Lets take a look at a (kind of) real optimization problem. We want to optimize the hyperparameters of a gradient boosting regressor that is trained on the boston housing regression dataset.

In [46]:
data = load_boston()
X_boston, y_boston = data.data, data.target

In [47]:
def gbr_model_0(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
    )
    scores = cross_val_score(gbr, X_boston, y_boston, cv=5)
    score = scores.mean()
    return score


search_space_gbr_0 = {
    "n_estimators": list(range(10, 100)),
    "max_depth": list(range(2, 12)),
}

In [48]:
hyper_gbr_0 = Hyperactive(verbosity=False)
hyper_gbr_0.add_search(gbr_model_0, search_space_gbr_0, n_iter=50)
hyper_gbr_0.run()

search_data_gbr_0 = hyper_gbr_0.results(gbr_model_0)
search_data_gbr_0

Unnamed: 0,n_estimators,max_depth,eval_time,iter_time,score
0,89,5,0.492801,0.492831,0.568722
1,46,11,0.449591,0.449608,0.408033
2,39,5,0.214391,0.214409,0.542755
3,39,8,0.313406,0.313423,0.464827
4,68,5,0.379446,0.379463,0.550212
5,68,8,0.529009,0.529031,0.476288
6,10,2,0.030105,0.030124,0.282314
7,99,11,0.953356,0.953377,0.414762
8,99,2,0.256666,0.256685,0.693617
9,10,11,0.098413,0.098431,0.339212


In [49]:
fig = px.scatter(search_data_gbr_0, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 color_continuous_scale=color_scale)

fig.update_layout(width=900, height=800)
fig.show()

<font size="4">
    
The scatter plot above contains the samples from the search data from the gbr-model. It seams that high ```max_depth``` delivers bad scores but we should explore higher values for ```n_estimators```.

<font size="4">

## Continuing the Search <a class="anchor" id="continuing-search"></a>

<br>

Hyperactive makes it very easy to continue a search. The search data you already used for data exploration can just be passed to Hyperactive. This is done in multiple ways:
    
    - You can extract the best parameters via the "best_para"-method. This can than be passed to "initialize" to start at this position in the search space
    - The search data from the "results"-method can be passed to "memory_warm_start". The search data is automaticaly added into the memory-dictionary.
    - You can also pass the search data to "warm_start_smbo". This has the effect that the Bayesian optimizer can do more precise approximations in the beginning of the optimization run.
   

In [50]:
best_para_gbr_0 = hyper_gbr_0.best_para(gbr_model_0)
initialize = {"random": 4, "warm_start": [best_para_gbr_0]}


search_space_gbr_01 = {
    "n_estimators": list(range(10, 250, 5)),
    "max_depth": list(range(2, 8)),
}


hyper_gbr_01 = Hyperactive(verbosity=False)
hyper_gbr_01.add_search(gbr_model_0, 
                        search_space_gbr_01, 
                        n_iter=50,
                        n_jobs=2,
                        memory_warm_start=search_data_gbr_0, 
                        initialize=initialize)
hyper_gbr_01.run()

search_data_gbr_01 = hyper_gbr_01.results(gbr_model_0)

<font size="4">

    
    ...

In [51]:
search_data_gbr_01_ = search_data_gbr_01.append(search_data_gbr_0, ignore_index=True)
search_data_gbr_01_

Unnamed: 0,n_estimators,max_depth,eval_time,iter_time,score
0,135,4,0.621825,0.621869,0.641329
1,105,7,0.755399,0.755419,0.479270
2,130,7,0.929600,0.929621,0.482900
3,25,5,0.138286,0.138304,0.463320
4,100,2,0.261741,0.261759,0.698099
...,...,...,...,...,...
145,95,4,0.425479,0.425505,0.656512
146,17,9,0.148550,0.148577,0.416634
147,96,9,0.818522,0.818549,0.425215
148,41,9,0.351822,0.351848,0.412612


In [52]:
fig = px.scatter(search_data_gbr_01_, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 color_continuous_scale=color_scale)

fig.update_layout(width=900, height=800)
fig.show()

<font size="4">

    
    ...

In [53]:
best_para_gbr_01 = hyper_gbr_01.best_para(gbr_model_0)
initialize = {"warm_start": [best_para_gbr_01]}


search_space_gbr_02 = {
    "n_estimators": list(range(150, 300, 2)),
    "max_depth": list(range(2, 5)),
}

optimizer = HillClimbingOptimizer(rand_rest_p=0)

hyper_gbr_02 = Hyperactive(verbosity=False)
hyper_gbr_02.add_search(gbr_model_0, 
                        search_space_gbr_02, 
                        n_iter=50,
                        n_jobs=1,
                        optimizer=optimizer,
                        memory_warm_start=search_data_gbr_01_, 
                        initialize=initialize)
hyper_gbr_02.add_search(gbr_model_0, 
                        search_space_gbr_02, 
                        n_iter=50,
                        n_jobs=1,
                        optimizer=optimizer,
                        memory_warm_start=search_data_gbr_01_, 
                        initialize={"random": 4})
hyper_gbr_02.add_search(gbr_model_0, 
                        search_space_gbr_02, 
                        n_iter=50,
                        n_jobs=2,
                        memory_warm_start=search_data_gbr_01_)
hyper_gbr_02.run()

search_data_gbr_02 = hyper_gbr_02.results(gbr_model_0)

In [54]:
search_data_gbr_02_ = search_data_gbr_02.append(search_data_gbr_01_, ignore_index=True)

In [55]:
fig = px.scatter(search_data_gbr_02_, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 color_continuous_scale=color_scale)

fig.update_layout(width=900, height=800)
fig.show()

In [56]:
search_data_gbr_02_f = search_data_gbr_02_[search_data_gbr_02_["score"] > 0.68]


fig = px.scatter(search_data_gbr_02_f, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 color_continuous_scale=color_scale)

fig.update_layout(width=900, height=800)
fig.show()

In [57]:
fig = px.scatter(search_data_gbr_02_f, 
                 x="n_estimators", 
                 y="score")

fig.update_layout(width=900, height=800)
fig.show()

<font size="4">
    
## Collecting more Data <a class="anchor" id="data_collect"></a>
    
Until now you have seen, that the objective function always returns only one variable: The score, which is always a real number. But Hyperactive has the capability to accept more variables. Those additional variables won't affect the score or the decision making of the optimization algorithm, but they will be collected in each iteration and accessed in the search data.
    
This feature can be very useful, because you can add any variable you want to the search data, which might help you understand the model better. To collect additional data in the objective function you just put it into a dictionary and return it alongside with the score. The key will be the column name in the search data and the value will be collected.

In [82]:
def gbr_model_1(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
    )
    c_time = time.time()
    scores = cross_val_score(gbr, X_boston, y_boston, cv=5)
    cv_time = time.time() - c_time
    
    # add the dictionary to collect more data
    return scores.mean(), {"cv_time": cv_time}


search_space_gbr_1 = {
    "n_estimators": list(range(10, 250, 5)),
    "max_depth": list(range(2, 8)),
}

In [61]:
hyper_gbr_1 = Hyperactive(verbosity=False)
hyper_gbr_1.add_search(gbr_model_1, search_space_gbr_1, n_iter=15, n_jobs=8, initialize={"random": 10})
hyper_gbr_1.run()

search_data_gbr_1 = hyper_gbr_1.results(gbr_model_1)
search_data_gbr_1.head()

Unnamed: 0,n_estimators,max_depth,cv_time,eval_time,iter_time,score
0,135,4,0.629156,0.6294,0.629438,0.641329
1,105,7,0.805923,0.805979,0.805996,0.47927
2,130,7,0.93077,0.930822,0.930839,0.4829
3,25,5,0.138219,0.138274,0.138292,0.46332
4,225,5,1.214869,1.214924,1.214941,0.54319


In [62]:
fig = px.scatter(search_data_gbr_1, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 size='cv_time', 
                 color_continuous_scale=color_scale)

fig.update_layout(width=900, height=800)
fig.show()

<font size="4">
    
The scatter plot above shows the samples od the search data, but adds a visualization of the cross-validation-time with the size of the scatter-points.

<font size="4">
    
## Managing multiple objectives <a class="anchor" id="multi_objectives"></a>
    
In the last chapter you were able to collect additional data during the optimization run. This data did not affect the score. But you can still try to create one score that represents information from multiple scores. In the following example we want to optimize a model to get a high score and at the same time a low training time.

In [83]:
def gbr_model_2(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
    )
    c_time = time.time()
    scores = cross_val_score(gbr, X_boston, y_boston, cv=5)
    cv_time = time.time() - c_time
    
    score_cv_avg = scores.mean()
    score_cv_std = scores.std()
    
    # the score is calculated from the cv-score and the cv-training time
    score = score_cv_avg / (cv_time**0.1)
    
    # independed from the score we want some additional data
    return score, {"cv_time": cv_time, 
                   "score_cv_avg": score_cv_avg,
                   "score_cv_std": score_cv_std,
                  "scores": scores,
                  }


search_space_gbr_2 = {
    "n_estimators": list(range(10, 250, 5)),
    "max_depth": list(range(2, 12)),
}

<font size="4">
    
The objective function above enables us to return a score that is composed of multiple variables. At the same time, we also want to collect data about the variables the score is composed from. This helps us understand the score later during the data visualization.

In [64]:
hyper_gbr_2 = Hyperactive(verbosity=False)
hyper_gbr_2.add_search(gbr_model_2, search_space_gbr_2, n_iter=15, n_jobs=8, initialize={"random": 10})
hyper_gbr_2.run()

search_data_gbr_2 = hyper_gbr_2.results(gbr_model_2)
search_data_gbr_2.head()

Unnamed: 0,n_estimators,max_depth,cv_time,eval_time,iter_time,score,score_cv_avg,score_cv_std,scores
0,135,7,1.021605,1.021913,1.021969,0.474938,0.475955,0.343816,"[0.7400021563460772, 0.6639906236271695, 0.736..."
1,105,8,0.844044,0.844133,0.844152,0.472121,0.464184,0.35534,"[0.7628159570399108, 0.6424397856186803, 0.720..."
2,230,2,0.606243,0.606324,0.606341,0.73919,0.703106,0.143578,"[0.812534064772756, 0.8784967652206191, 0.7569..."
3,155,8,1.246981,1.247068,1.247087,0.481294,0.492036,0.319898,"[0.7677796273449734, 0.6538988104392887, 0.738..."
4,105,5,0.580312,0.580401,0.580419,0.578936,0.548273,0.25262,"[0.7441004952124477, 0.7461935546586134, 0.652..."


In [86]:
fig = px.scatter(search_data_gbr_2, 
                 x="n_estimators", 
                 y="max_depth", 
                 color="score", 
                 size='cv_time', 
                 color_continuous_scale=color_scale)

fig.update_layout(width=800, height=700)
fig.show()

<font size="4">

    explain plot above

In [84]:
fig = px.scatter(search_data_gbr_2, 
                 x="cv_time", 
                 y="score_cv_avg", 
                 color="score", 
                 size='score_cv_std', 
                 color_continuous_scale=color_scale)

fig.update_layout(width=800, height=700)
fig.show()

<font size="4">
    
## Non-numerical Search Spaces <a class="anchor" id="search_space"></a>
    
This chapter describes a very unique and helpful feature of Hyperactive: non-numerical values in the search space. You are not constrained to use numeric values in your search space, but also strings or even functions. Because of this you can do some really interesting stuff like:
    
    - hyperparameter optimization of any parameter
    - preprocessing-optimization
    - neural architecture search
    
Lets take a look at the following example:

In [66]:
data = load_iris()
X_iris, y_iris = data.data, data.target

In [67]:
def mlp_model(opt): 
    scaler = MinMaxScaler()
    X_norm = scaler.fit_transform(X_iris)
    
    mlp = MLPClassifier(
        hidden_layer_sizes=opt["hidden_layer_sizes"],
        activation=opt["activation"],
        alpha=opt["alpha"],
        learning_rate_init=opt["learning_rate_init"],

    )
    scores = cross_val_score(mlp, X_norm, y_iris, cv=5)

    return scores.mean()


search_space_mlp = {
    "hidden_layer_sizes": list(range(10, 100, 10)),
    "activation": ["identity", "logistic", "tanh", "relu"],
    "solver":  ["lbfgs", "sgd", "adam"],
    "alpha": [1/(10**x) for x in range(1, 9)],
    "learning_rate_init": [1/(10**x) for x in range(1, 9)],

}

In [68]:
hyper_mlp_0 = Hyperactive(verbosity=False)
hyper_mlp_0.add_search(mlp_model, search_space_mlp, n_iter=40)
hyper_mlp_0.run()

mlp_search_data = hyper_mlp_0.results(mlp_model)
mlp_search_data.head()

Unnamed: 0,hidden_layer_sizes,activation,solver,alpha,learning_rate_init,eval_time,iter_time,score
0,70,tanh,sgd,1e-07,0.1,0.262645,0.262677,0.973333
1,80,relu,sgd,0.0001,0.0001,0.348676,0.348695,0.686667
2,50,logistic,sgd,0.0001,0.0001,0.240616,0.240635,0.526667
3,20,logistic,lbfgs,0.001,0.01,0.285496,0.285518,0.953333
4,90,tanh,lbfgs,0.1,1e-06,0.034304,0.034322,0.406667


In [69]:
parameter_names = list(search_space_mlp.keys())

mlp_search_data = mlp_search_data.sort_values('hidden_layer_sizes', ascending=False)

fig = px.parallel_categories(mlp_search_data, 
                             color="score", 
                             color_continuous_scale=color_scale, 
                             dimensions=parameter_names, 
                             )
fig.update_layout(width=950, height=700)
fig.show()

<font size="4">

    - explain plot above

<font size="4">
    
## Deep Learning Optimization <a class="anchor" id="deep_learning"></a>
    
The optimization of deep learning models can be very difficult because the evaluation times of the objective functions are very high, due to the long training times. There is also the challenge of finding a way to find the optimal structure/architecture of the neural network. Hyperactive can help with both of those problems.
    
The optimization of the structure/architecture of a neural network is called **n**eural **a**rchitecture **s**earch. Because Hyperactive can handle functions in its search spaces performing **nas** is very easy.
    

In [72]:
def deep_learning_model(params):
    filters_0 = params["filters.0"]
    kernel_size_0 = params["kernel_size.0"]
    
    model = Sequential()
    model.add(Conv2D(filters_0, (kernel_size_0, kernel_size_0), input_shape=(img_width, img_height, 1), activation="relu"))
    
    model.add(MaxPooling2D(pool_size=(2, 2)))
        
    # the next two lines are layers that are put in during the optimization run
    model = params["layer.0"](params, model)
    model = params["layer.1"](params, model)

    model.add(Flatten())
    model.add(Dense(params["dense.0"], activation="relu"))
    model.add(Dense(num_classes, activation="softmax"))

    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    model.fit(
        x_train,
        y_train,
        validation_data=(x_test, y_test),
        epochs=5,
        verbose=False,
    )
    _, score = model.evaluate(x=x_test, y=y_test, verbose=False)

    return score

<font size="4">
    
The following functions are the layers and layer-compositions that we will use in the search space. The ```params```-argument enables the optimization of parameters inside the layer-function. There is also a ```no_layer```-function because we want to test of it might be better for the score of the neural network if its number of layers is reduced.

In [None]:
def Conv2D_MaxPooling2D_layer(params, model):
    filters_1 = params["layer.0.filters"]
    kernel_size_1 = params["layer.0.kernel_size"]
    model.add(Conv2D(filters_1, (kernel_size_1, kernel_size_1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    return model

def Conv2D_layer(params, model):
    filters_1 = params["layer.0.filters"]
    kernel_size_1 = params["layer.0.kernel_size"]
    model.add(Conv2D(filters_1, (kernel_size_1, kernel_size_1), activation='relu'))
    return model

def Dropout_layer(params, model):
    model.add(Dropout(params["layer.1.rate"]))
    return model

def no_layer(params, model):
    return model

<font size="4">
    
In the search space you can see that the layer-functions are put inside lists. During the optimization run Hyperactive will select those layer-functions similar to any other variable inside the search space.

In [None]:
# you can put the layers into lists like any other variable
search_space_dl = {
    "filters.0": list(range(7, 15)),
    "kernel_size.0": list(range(3, 6)),
    
    "layer.0": [Conv2D_MaxPooling2D_layer, Conv2D_layer, no_layer],
    "layer.0.filters": list(range(5, 12)),
    "layer.0.kernel_size": list(range(3, 6)),
    
    "layer.1": [Dropout_layer, no_layer],
    "layer.1.rate": list(np.arange(0.2, 0.8, 0.1)),

    "dense.0": list(range(10, 200, 20)),
}

In [73]:
optimizer = BayesianOptimizer()

hyper_dl = Hyperactive(verbosity=False)
hyper_dl.add_search(deep_learning_model, search_space_dl, n_iter=30, optimizer=optimizer)
hyper_dl.run()

dl_search_data = hyper_dl.results(deep_learning_model)

In [98]:
# lets replace the functions with their names for the plot
def func2str(row):
    return row.__name__

dl_search_data["layer.0"] = dl_search_data["layer.0"].apply(func2str)
dl_search_data["layer.1"] = dl_search_data["layer.1"].apply(func2str)

dl_search_data = dl_search_data.drop(["eval_time", "iter_time"], axis=1)

AttributeError: 'str' object has no attribute '__name__'

In [106]:
score_max = np.amax(search_data_0["score"])
score_std = search_data_0["score"].std()
dl_search_data_f = dl_search_data[abs(search_data_0["score"]-score_max) < score_std*2]

In [107]:
parameter_names = list(dl_search_data_f.keys())

fig = px.parallel_categories(dl_search_data, 
                             color="score", 
                             color_continuous_scale=color_scale, 
                             dimensions=parameter_names, 
                             )
fig.update_layout(width=950, height=700)
fig.show()

ValueError: Value of 'color' is not the name of a column in 'data_frame'. Expected one of ['filters.0', 'kernel_size.0', 'layer.0', 'layer.0.filters', 'layer.0.kernel_size', 'layer.1', 'layer.1.rate', 'dense.0'] but received: score

<font size="4">
    
## Hyperactive memory <a class="anchor" id="memory"></a>
  
We already discussed, that some models (especially deep learning models) can take a long time to train, which slows down the optimization run. This means, that we want to avoid any unnecessary evaluation of the objective function. Unfortunately most optimization algorithms might won't avoid positions in the search space that were already evaluated. For example:
    
    - Random Search, which could select a position it already selected before
    - Hill climbing stuck in an optimum
    - Particle swarms that converge on one position
    
The bottom line is, that optimization algorithms don't "remember" already explored positions and won't avoid them. But Hyperactive has a feature that solves this problem, by saving the position and score in a memory-dictionary. If a position is selected Hyperactive will look up if this position is already known. If it knows the position and score it won't reevaluate the objective function, which saves time. This is very useful for computationally expensive objective functions.
    
You can even pass the search data from a previous optimization run into ```memory_warm_start```
    


In [76]:
def dtr_model(opt):
    dtr = DecisionTreeRegressor(
        max_depth=opt["max_depth"],
        min_samples_split=opt["min_samples_split"],
    )
    scores = cross_val_score(dtr, X_boston, y_boston, cv=5)

    return scores.mean()


search_space_dtr = {
    "max_depth": list(range(10, 35)),
    "min_samples_split": list(range(2, 35)),
}

In [77]:
c_time1 = time.time()

hyper_dtr_0 = Hyperactive(verbosity=False)
hyper_dtr_0.add_search(dtr_model, search_space_dtr, n_iter=300)
hyper_dtr_0.run()

d_time1 = time.time() - c_time1
print("Optimization time 1:", round(d_time1, 2))

# Hyperactive collects the search data
search_data_dtr_0 = hyper_dtr_0.results(dtr_model)

Optimization time 1: 2.54


In [78]:
# The next run will be faster, because Hyperactive knows parts of the search space

c_time2 = time.time()

hyper_dtr_1 = Hyperactive(verbosity=False)
hyper_dtr_1.add_search(dtr_model, search_space_dtr, n_iter=300, memory_warm_start=search_data_dtr_0)
hyper_dtr_1.run()

d_time2 = time.time() - c_time2
print("Optimization time 2:", round(d_time2, 2))

Optimization time 2: 1.84


In [79]:
search_data_dtr_1 = hyper_dtr_1.results(dtr_model)

search_data_dtr = search_data_dtr_1.append(search_data_dtr_0, ignore_index=True)


In [80]:
# times in seconds
eval_times = search_data_dtr_0["eval_time"]
eval_times_mem = search_data_dtr_1["eval_time"]

opt_times = search_data_dtr["iter_time"]-search_data_dtr["eval_time"]

In [81]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=eval_times, name="evaluation time", nbinsx=15))
fig.add_trace(go.Histogram(x=eval_times_mem, name="evaluation time second run", nbinsx=15))
fig.add_trace(go.Histogram(x=opt_times, name="optimization time", nbinsx=15))
fig.show()

<font size="4">

    describe plot above