<img src="../Figures/Deephyper.png" style="height: 200px;">

<!--<h1><center>Hyperparameter search for classification with Tabular data</center></h1>-->

<div class="alert alert-info">
    
<b>Reference</b>
    
This tutorial is based on materials from the Keras Documentation:
* [Structured data classification from scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/)
    
</div>


<div class="alert alert-warning">

<b>Warning</b>
    
By design, asyncio does not allow nested event loops. Jupyter uses Tornado, which already starts an event loop, so the following patch is required to run this tutorial.
    
This tutorial should be run with `tensorflow>=2.6`.
    
</div>

# Hyperparameter search for classification with Tabular data

In [None]:
!pip install nest_asyncio

import nest_asyncio
nest_asyncio.apply()
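
To see why the patch is needed, here is a minimal sketch using only the standard library. It is meant to be run as a plain script (outside Jupyter and without `nest_asyncio` applied); the coroutine names `inner` and `outer` are illustrative only. It shows that starting a second event loop from inside a running one raises a `RuntimeError`.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # An event loop is already running here (started by asyncio.run below),
    # which mimics the situation inside a Jupyter kernel.
    coro = inner()
    try:
        asyncio.run(coro)  # attempt to start a nested event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

With `nest_asyncio.apply()` in effect, the nested call is allowed instead of raising.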

<div class="alert alert-info">
    
<b>Note</b>
    
The following environment variables can be used to suppress **some** of the TensorFlow *DEBUG*, *INFO* and *WARNING* log statements.
    
</div>

In [None]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(3)
os.environ["AUTOGRAPH_VERBOSITY"] = str(0)

## Imports

<div class="alert alert-block alert-danger">
    
<b>Danger</b> 

The following cell contains the TensorFlow import `import tensorflow as tf`. It is important to follow this strategy instead of `from tensorflow.keras.layers import ...` to avoid non-serializable data, which causes crashes during the search. For example, the original Keras tutorial used the following set of imports, which caused a serialization error in our use case.
    
```python
from tensorflow import keras
from tensorflow.keras import layers
...
from tensorflow.keras.layers import IntegerLookup
from tensorflow.keras.layers import Normalization
from tensorflow.keras.layers import StringLookup
```
    
</div>

In [3]:
import ray
import pandas as pd
import tensorflow as tf

<div class="alert alert-info">
    
<b>Note</b>
    
The following cell detects whether **GPU** devices are available on the current host, so that this notebook automatically adapts the parallel execution to the resources available locally. However, this will not be the case if several compute nodes are requested.
    
</div>

In [4]:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == "GPU"]

n_gpus = len(get_available_gpus())
is_gpu_available = n_gpus > 0

if is_gpu_available:
    print(f"{n_gpus} GPU{'s are' if n_gpus > 1 else ' is'} available.")
else:
    print("No GPU available")

No GPU available


### The dataset (from Keras.io)

The [dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) is provided by the
Cleveland Clinic Foundation for Heart Disease.
It's a CSV file with 303 rows. Each row contains information about a patient (a
**sample**), and each column describes an attribute of the patient (a **feature**). We
use the features to predict whether a patient has a heart disease (**binary
classification**).

Here's the description of each feature:

Column| Description| Feature Type
------------|--------------------|----------------------
Age | Age in years | Numerical
Sex | (1 = male; 0 = female) | Categorical
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical
Chol | Serum cholesterol in mg/dl | Numerical
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical
Thalach | Maximum heart rate achieved | Numerical
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
Oldpeak | ST depression induced by exercise relative to rest | Numerical
Slope | Slope of the peak exercise ST segment | Numerical
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target

In [5]:
def load_data():
#     file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
    file_url = "heart.csv"
    dataframe = pd.read_csv(file_url)

    val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
    train_dataframe = dataframe.drop(val_dataframe.index)

    return train_dataframe, val_dataframe


def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds
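
As a quick sanity check on the split performed in `load_data`, the sizes of the two subsets can be worked out by hand. The sketch below assumes that `DataFrame.sample(frac=0.2)` rounds `frac * len(dataframe)` to the nearest integer and that the file has the 303 rows mentioned above. It also shows why the validation accuracy can only take a limited set of values on such a small validation set.

```python
n_rows = 303        # number of rows stated in the dataset description
val_fraction = 0.2  # value of `frac` used in load_data

# Assumption: the sample size is the rounded product frac * n_rows
n_val = round(val_fraction * n_rows)
n_train = n_rows - n_val

# With n_val validation samples, accuracy moves in steps of 1 / n_val
accuracy_step = 1 / n_val

print(n_train, n_val)          # 242 61
print(f"{accuracy_step:.4f}")  # 0.0164
print(f"{51 / n_val:.6f}")     # 0.836066 (51 correct predictions out of 61)
```

Because one prediction changes the accuracy by about 0.016, small differences between configurations should be interpreted with care.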

## Preprocessing & encoding of features

The next cells use `tf.keras.layers.Normalization()` to apply standard scaling on the features. Then, the `tf.keras.layers.StringLookup` and `tf.keras.layers.IntegerLookup` are used to encode categorical variables.

In [6]:
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = tf.keras.layers.Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature


def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = (
        tf.keras.layers.StringLookup if is_string else tf.keras.layers.IntegerLookup
    )
    # Create a lookup layer which will turn strings into integer indices
    lookup = lookup_class(output_mode="binary")

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible string values and assign them a fixed integer index
    lookup.adapt(feature_ds)

    # Turn the string input into integer indices
    encoded_feature = lookup(feature)
    return encoded_feature
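
To make the two encodings more concrete, here is a simplified, pure-Python stand-in for what the layers compute. It is not the Keras implementation: the helper names `standard_scale` and `multi_hot`, the sample values and the vocabulary are made up for illustration, and reserving index 0 for unknown values is an assumption about the lookup layers' default behaviour.

```python
import statistics


def standard_scale(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def multi_hot(value, vocabulary):
    """Encode a single categorical value as a 0/1 vector.

    Index 0 is reserved for out-of-vocabulary values (assumption).
    """
    encoded = [0] * (len(vocabulary) + 1)
    index = vocabulary.index(value) + 1 if value in vocabulary else 0
    encoded[index] = 1
    return encoded


ages = [63, 67, 37, 41, 56]  # made-up sample of a numerical feature
scaled_ages = standard_scale(ages)

thal_vocabulary = ["fixed", "normal", "reversible"]  # illustrative vocabulary
print(multi_hot("normal", thal_vocabulary))   # [0, 0, 1, 0]
print(multi_hot("unknown", thal_vocabulary))  # [1, 0, 0, 0]
```

In the tutorial itself, the statistics and the vocabulary are learned from the training dataset through the `adapt` calls.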

## Define the run-function

The run-function defines how the objective that we want to maximize is computed. It takes a `config` dictionary as input and usually returns a scalar value that we want to maximize. The `config` contains a sample of values for the hyperparameters that we want to tune. In this example we will search for:
* `units` (default value: `32`)
* `activation` (default value: `"relu"`)
* `dropout_rate` (default value: `0.5`)
* `num_epochs` (default value: `50`)
* `batch_size` (default value: `32`)
* `learning_rate` (default value: `1e-3`)

A hyperparameter value can be accessed easily in the dictionary through the corresponding key, for example `config["units"]`.

In [7]:
def run(config: dict):
    tf.autograph.set_verbosity(0)
    
    train_dataframe, val_dataframe = load_data()

    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)

    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )
    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    return history.history["val_accuracy"][-1]

<div class="alert alert-success">
    
<b>Important</b>
    
The objective maximised by DeepHyper is the scalar value returned by the `run`-function. In this tutorial it corresponds to the validation accuracy of the last epoch of training which we retrieve in the `History` object returned by the `model.fit(...)` call.
    
```python
...
history = model.fit(
    train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
)

return history.history["val_accuracy"][-1]
...
```

Using an objective like `max(history.history["val_accuracy"])` can have undesired effects: a training curve may pass through a local maximum and then drop, which does not yield a model able to make good use of more data in the future.
    
</div>
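
The difference between the two objectives can be illustrated with a toy history. The accuracy values below are made up for illustration; they mimic a curve that peaks and then degrades.

```python
# Made-up validation accuracies for five epochs (peak at epoch 3, then a drop)
history = {"val_accuracy": [0.70, 0.78, 0.85, 0.80, 0.76]}

# Objective used in this tutorial: accuracy at the end of training
last_epoch_objective = history["val_accuracy"][-1]

# Alternative objective: best accuracy seen at any epoch
best_epoch_objective = max(history["val_accuracy"])

print(last_epoch_objective)  # 0.76
print(best_epoch_objective)  # 0.85
```

The alternative objective would reward this configuration for a peak that the final model no longer reaches.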



## Evaluate a default configuration

We evaluate the performance of the default set of hyperparameters provided in the Keras tutorial.

In [8]:
# We define a dictionary for the default values
default_config = {
    "units": 32,
    "activation": "relu",
    "dropout_rate": 0.5,
    "num_epochs": 50,
    "batch_size": 32,
    "learning_rate": 1e-3,
}

# We launch the Ray runtime according to the detected local resources
# and execute the `run` function with the default configuration.
# WARNING: with GPUs it is important to follow this scheme so that
# multiple processes (Ray workers vs. the current process) do not lock
# the same GPU.
if is_gpu_available:
    
    if not(ray.is_initialized()):
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)
    
    run_default = ray.remote(num_cpus=1, num_gpus=1)(run)
    
    objective_default = ray.get(run_default.remote(default_config))
    
else:
    
    if not(ray.is_initialized()):
        ray.init(num_cpus=1, log_to_driver=False)
    
    run_default = run
    
    objective_default = run_default(default_config)
    
print(f"Accuracy Default Configuration:  {objective_default:.3f}")

2021-10-05 13:39:31,889	INFO services.py:1263 -- View the Ray dashboard at http://127.0.0.1:8265


{'node_ip_address': '10.128.9.67',
 'raylet_ip_address': '10.128.9.67',
 'redis_address': '10.128.9.67:6379',
 'object_store_address': '/tmp/ray/session_2021-10-05_13-39-29_641068_1971/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-10-05_13-39-29_641068_1971/sockets/raylet',
 'webui_url': '127.0.0.1:8265',
 'session_dir': '/tmp/ray/session_2021-10-05_13-39-29_641068_1971',
 'metrics_export_port': 35130,
 'node_id': '72abf59e47f2371c8b03327da5d2a53af3c3537da7b64b1683318e4a'}

Accuracy Default Configuration:  0.787


## Define the Hyperparameter optimization problem

Hyperparameter ranges are defined using the following syntax:

* Discrete integer ranges are generated from a tuple `(lower: int, upper: int)`
* Continuous parameters are generated from a tuple `(lower: float, upper: float)`
* Categorical or nonordinal hyperparameter ranges can be given as a list of possible values `[val1, val2, ...]`

We provide the default configuration of hyperparameters as a starting point of the problem.

In [9]:
from deephyper.problem import HpProblem


problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((8, 128), "units")

# Categorical hyperparameter (sampled with uniform prior)
ACTIVATIONS = ["elu", "gelu", "hard_sigmoid", "linear", "relu", "selu",
    "sigmoid", "softplus", "softsign", "swish", "tanh",
]
problem.add_hyperparameter(ACTIVATIONS, "activation")

# Real hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((0.0, 0.6), "dropout_rate")

problem.add_hyperparameter((10, 100), "num_epochs")

# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 256, "log-uniform"), "batch_size")
problem.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate")

# Add a starting point to try first
problem.add_starting_point(**default_config)

problem

units, Type: UniformInteger, Range: [8, 128], Default: 68

activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu

dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3

num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55

batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale

learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale

Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55
    units, Type: UniformInteger, Range: [8, 128], Default: 68


  Starting Point:
{0: {'activation': 'relu',
     'batch_size': 32,
     'dropout_rate': 0.5,
     'learning_rate': 0.001,
     'num_epochs': 50,
     'units': 32}}
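
To illustrate what the `"log-uniform"` prior means for `learning_rate` and `batch_size`, here is a minimal standard-library sketch. It is not DeepHyper's sampler; the helper `sample_log_uniform` is hypothetical. It draws values uniformly in log-space, and it also checks that the `Default` values printed above for the log-scale hyperparameters appear to coincide with the geometric midpoint of each range.

```python
import math
import random


def sample_log_uniform(lower, upper, rng):
    """Draw a value whose logarithm is uniformly distributed in [log(lower), log(upper)]."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


rng = random.Random(0)
learning_rates = [sample_log_uniform(1e-5, 1e-2, rng) for _ in range(5)]

# Geometric midpoints of the two log-scale ranges
lr_midpoint = math.sqrt(1e-5 * 1e-2)
batch_size_midpoint = round(math.sqrt(8 * 256))

print(f"{lr_midpoint:.10f}")  # 0.0003162278
print(batch_size_midpoint)    # 45
```

With such a prior, each decade of the learning rate (for example `1e-5` to `1e-4`) is sampled about as often as any other, which a plain uniform prior would not do.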

## Define the evaluator object

The `Evaluator` object makes it possible to change the parallelization backend used by DeepHyper. It is a standalone object which schedules the execution of remote tasks. All evaluators need a `run_function` to be instantiated. The `method` keyword then defines the backend (e.g., `"ray"`), and `method_kwargs` holds the keyword arguments of the chosen `method`.

```python
evaluator = Evaluator.create(run_function, method, method_kwargs)
```

Once created, `evaluator.num_workers` gives access to the number of available parallel workers.

Finally, to submit tasks to the evaluator and collect their results, one just needs to use the following interface:

```python
configs = [...]
evaluator.submit(configs)
...
tasks_done = evaluator.get("BATCH", size=1) # For asynchronous
tasks_done = evaluator.get("ALL") # For batch synchronous
```
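
The submit/collect pattern can be mimicked with the standard library. The `ToyEvaluator` class below is a hypothetical stand-in built on `concurrent.futures`, not DeepHyper's `Evaluator`: it ignores the `size` argument and only distinguishes "wait for at least one task" from "wait for all tasks".

```python
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait


class ToyEvaluator:
    """Hypothetical, simplified evaluator backed by a thread pool."""

    def __init__(self, run_function, num_workers=1):
        self.run_function = run_function
        self.num_workers = num_workers
        self._pool = ThreadPoolExecutor(max_workers=num_workers)
        self._pending = {}  # maps each future to its configuration

    def submit(self, configs):
        for config in configs:
            future = self._pool.submit(self.run_function, config)
            self._pending[future] = config

    def get(self, mode="ALL"):
        """Return (config, objective) pairs for finished tasks."""
        if not self._pending:
            return []
        when = ALL_COMPLETED if mode == "ALL" else FIRST_COMPLETED
        done, _ = wait(list(self._pending), return_when=when)
        return [(self._pending.pop(future), future.result()) for future in done]

    def close(self):
        self._pool.shutdown()


def toy_run(config):
    # Toy objective with its maximum at x = 2
    return -((config["x"] - 2) ** 2)


evaluator = ToyEvaluator(toy_run, num_workers=2)
evaluator.submit([{"x": 1}, {"x": 2}, {"x": 3}])
first_done = evaluator.get("BATCH")  # returns as soon as at least one task is done
remaining = evaluator.get("ALL")     # waits for everything still pending
evaluator.close()
print(len(first_done) + len(remaining))
```

Note that the pending tasks live inside the evaluator object, which is one way to picture why an evaluator carries state from one search to the next.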

<div class="alert alert-warning">

<b>Warning</b>

Each `Evaluator` saves its own state, therefore it is crucial to create a new evaluator when launching a fresh search.
    
</div>


In [10]:
from deephyper.evaluator.evaluate import Evaluator
from deephyper.evaluator.callback import LoggerCallback


def get_evaluator(run_function):
    
    # Default arguments for Ray: 1 worker and 1 worker per evaluation
    method_kwargs = {
        "num_cpus": 1, 
        "num_cpus_per_task": 1,
        "callbacks": [LoggerCallback()]
    }

    # If GPU devices are detected then it will create 'n_gpus' workers
    # and use 1 worker for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function, 
        method="ray", 
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}", )
    
    return evaluator

evaluator_1 = get_evaluator(run)

Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.LoggerCallback object at 0x7f0c4c0b5e80>]}


## Define and run the asynchronous model-based search

A primary pillar of hyperparameter search in DeepHyper is the asynchronous parallel model-based search paradigm (henceforth AMBS). AMBS can be described by the following algorithm:

&nbsp;   
<div class="float" style="font-size:2.5em;line-height:2.0;text-align:center;align:center;">
    <ol>
        <li> $\mathcal{X}_{S}\leftarrow$ <code>random_sample_configs</code>$(\mathcal{D})$ </li>
        <li><code>add_eval_batch</code> $(\mathcal{X}_{S})$</li>
        <li> <b><code>while</code></b> <code>stopping criterion not met</code> <b><code>do</code></b>:</li>
        <ul>
            <li> $(\mathcal{X}_{r}, \mathcal{Y}_{r})\leftarrow$ <code>get_finished_evals</code>$()$</li>
            <li> $s\leftarrow|\mathcal{Y}_{r}|$</li>
            <li> <b>if</b> $s>0$ <b>then</b>:</li>
            <ul>
                <li>$\mathcal{X}_{\mathrm{out}}\leftarrow \mathcal{X}_{\mathrm{out}}\bigcup\mathcal{X}_{r}; \mathcal{Y}_{\mathrm{out}}\leftarrow\mathcal{Y}_{\mathrm{out}}\bigcup\mathcal{Y}_{r}$</li>
                <li> $\mathcal{M}\leftarrow$<code>Fit</code>$(\mathcal{X}_{\mathrm{out}}, \mathcal{Y}_{\mathrm{out}})$</li>
                <li>$\mathcal{D}\leftarrow \mathcal{D}-\mathcal{X}_{r}$</li>
                <li>$\mathcal{X}_{s}\leftarrow$<code>sample_configs</code>$(\mathcal{M}, \mathcal{D})$</li>
                <li><code>add_eval_batch</code>$(\mathcal{X}_{s})$</li>
            </ul>
            <code>end <b>if</b></code>
        </ul>
        <code>end <b>while</b></code>
    </ol>
    <ul>
        <li> <b>Output:</b> Best hyperparameter configuration(s) from $\mathcal{X}_{\mathrm{out}}$</li>
    </ul>
</div>


<img src="./Figures/AMBS.png" width=40% align="center">

Following the parallelized evaluation of these configurations, a low-fidelity, high-efficiency model (henceforth "the surrogate") is fitted to reproduce the relationship between the inputs (i.e., the choice of hyperparameters) and the outputs (generally a measure of validation accuracy). With this surrogate of the validation accuracy in hand, we can use ideas from classical Bayesian optimization to adaptively sample the hyperparameter search space.

First, the surrogate is used to obtain an estimate for the mean value of the validation accuracy at a certain sampling location $x$ in addition to an estimated variance. The latter requirement restricts us to the use of high efficiency data-driven modeling strategies that have inbuilt variance estimates (such as a Gaussian process or Random Forest regressor). Regions where the mean is high represent opportunities for exploitation and regions where the variance is high represent opportunities for exploration. An optimistic acquisition function called UCB can be constructed using these two quantities:

$$L_{\text{UCB}}(x) = \mu(x) + \kappa \cdot \sigma(x)$$

The *unevaluated* hyperparameter configurations that *maximize* the acquisition function are chosen for the next batch of evaluations. Note that the variance weighting parameter $\kappa$ controls the degree of exploration in the hyperparameter search, with zero indicating pure exploitation (unseen configurations where the predicted accuracy is highest will be sampled). The top `s` configurations are selected for the new batch. The following schematic demonstrates this process:

<img src="Figures/BO_AF.png" width=50%>
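
The effect of $\kappa$ on the ranking can be sketched in a few lines. The candidate names and their predicted means and standard deviations below are made up for illustration; in AMBS they would come from the surrogate.

```python
def ucb(mu, sigma, kappa):
    """Upper confidence bound: predicted mean plus kappa times predicted std."""
    return mu + kappa * sigma


def top_s(candidates, kappa, s):
    """Return the names of the s candidates with the highest UCB value."""
    ranked = sorted(
        candidates, key=lambda name: ucb(*candidates[name], kappa), reverse=True
    )
    return ranked[:s]


# name -> (predicted mean accuracy, predicted standard deviation), made-up values
candidates = {"a": (0.80, 0.01), "b": (0.75, 0.10), "c": (0.78, 0.03)}

print(top_s(candidates, kappa=0.0, s=2))   # ['a', 'c'] (pure exploitation)
print(top_s(candidates, kappa=1.96, s=2))  # ['b', 'c'] (uncertainty is rewarded)
```

With a small value such as `kappa=0.001`, the ranking is essentially driven by the predicted mean alone.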

The process of obtaining `s` configurations relies on the "constant-liar" strategy where a sampled configuration is mapped to a dummy output given by a bulk metric of all the evaluated configurations thus far (such as the maximum, mean or median validation accuracy). Prior to sampling the next configuration by acquisition function maximization, the surrogate is retrained with the dummy output as a data point. As the true validation accuracy becomes available for one of the sampled configurations, the dummy output is replaced and the surrogate is updated. This allows for scalable asynchronous (or batch synchronous) sampling of new hyperparameter configurations. 
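
The constant-liar idea can be sketched on a one-dimensional toy problem. Everything here is an illustrative assumption rather than DeepHyper's implementation: the surrogate is a nearest-neighbour rule (mean = value of the closest evaluated point, uncertainty = distance to it), and "refitting" simply means adding the dummy output to the data the surrogate reads from.

```python
def nearest(x, observed):
    """Return (distance, value) of the evaluated point closest to x."""
    return min((abs(x - xo), yo) for xo, yo in observed.items())


def ucb(x, observed, kappa):
    # Toy surrogate: mean is the nearest value, uncertainty is the distance to it
    distance, value = nearest(x, observed)
    return value + kappa * distance


def constant_liar_batch(candidates, observed, s, kappa=1.0, lie=max):
    """Select s distinct configurations, lying about each one once selected."""
    fantasy = dict(observed)        # real results plus dummy outputs
    dummy = lie(observed.values())  # bulk metric of the evaluated objectives
    batch = []
    for _ in range(s):
        pool = [x for x in candidates if x not in fantasy]
        best = max(pool, key=lambda x: ucb(x, fantasy, kappa))
        batch.append(best)
        fantasy[best] = dummy       # the "lie": pretend this point was evaluated
    return batch


candidates = [i / 10 for i in range(11)]
observed = {0.0: 0.70, 0.5: 0.80, 1.0: 0.75}  # made-up evaluated configurations
batch = constant_liar_batch(candidates, observed, s=3)
print(batch)
```

Because each selected point immediately lowers the uncertainty around itself, the following picks are pushed elsewhere, which is what allows several configurations to be proposed before any true result comes back.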

#### Choice of surrogate model

Users should note that our surrogate of choice is the Random Forest regressor because of its ability to handle non-ordinal data (hyperparameter configurations may not be purely continuous or even numerical). Evidence that it can outperform other methods (such as Gaussian processes) is also available in [1].

<img src="Figures/RFR.png" width=45% align=left>

<img src="Figures/RFR_Superior.png" width=53% align=right>


In [11]:
from deephyper.search.hps import AMBS

# Uncomment the following line to show the arguments of AMBS.
# AMBS?

In [12]:
# Instantiate the search with the problem and a specific evaluator
search = AMBS(problem, evaluator_1)

<div class="alert alert-info">
    
<b>Note</b>
    
All of DeepHyper's search algorithms have two stopping criteria:

* `max_evals`: (*int*) defines the maximum number of evaluations to perform. Defaults to `-1` for an infinite number.
* `timeout`: (positive *int*) defines a time budget (in seconds) before stopping the search. Defaults to `None` for an infinite time budget.
    
</div>
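
One way to picture how the two criteria combine is the following sketch. The function `should_stop` is hypothetical and only mirrors the defaults described in the note above; it is not DeepHyper's internal logic.

```python
import time


def should_stop(num_evals_done, start_time, max_evals=-1, timeout=None, now=None):
    """Return True once either the evaluation budget or the time budget is spent."""
    now = time.time() if now is None else now
    if max_evals >= 0 and num_evals_done >= max_evals:
        return True  # evaluation budget exhausted
    if timeout is not None and now - start_time >= timeout:
        return True  # time budget exhausted
    return False     # with the defaults (-1, None) the search never stops by itself


print(should_stop(10, start_time=0.0, max_evals=10, now=1.0))  # True
print(should_stop(3, start_time=0.0, timeout=60, now=61.0))    # True
print(should_stop(3, start_time=0.0, now=1e9))                 # False
```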

In [13]:
results = search.search(max_evals=10)

[00001] -- best objective: 0.8360655903816223 -- received objective: 0.8360655903816223
[00002] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889
[00003] -- best objective: 0.8360655903816223 -- received objective: 0.8196721076965332
[00004] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889
[00005] -- best objective: 0.8360655903816223 -- received objective: 0.7704917788505554
[00006] -- best objective: 0.8360655903816223 -- received objective: 0.8360655903816223
[00007] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00008] -- best objective: 0.8360655903816223 -- received objective: 0.8196721076965332
[00009] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889
[00010] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889


<div class="alert alert-warning">

<b>Warning</b>
    
By itself, the `search` call does not output any information about the current status of the search (the log lines above come from the `LoggerCallback` passed to the evaluator). However, a `results.csv` file is created in the local directory and can be inspected to see the finished tasks.
    
</div>

The returned `results` is a pandas DataFrame whose columns are the hyperparameters plus information stored by the evaluator:

* `id` is a unique identifier corresponding to the order of creation of tasks.
* `objective` is the value returned by the run-function.
* `elapsed_sec` is the time (in seconds) elapsed between the creation of the evaluator and the completion of the task.
* `duration` is the time (in seconds) taken to compute the task.

In [14]:
results

Unnamed: 0,activation,batch_size,dropout_rate,learning_rate,num_epochs,units,id,objective,elapsed_sec,duration
0,relu,32,0.5,0.001,50,32,1,0.836066,16.854036,9.364318
1,elu,10,0.066552,0.004223,87,100,2,0.803279,26.910806,9.720668
2,selu,76,0.363856,0.00367,51,59,3,0.819672,31.909261,4.78683
3,relu,51,0.008552,0.001566,45,15,4,0.803279,36.990385,4.704321
4,relu,25,0.194997,6.8e-05,72,19,5,0.770492,43.557472,6.348423
5,selu,13,0.545874,8.1e-05,14,126,6,0.836066,47.503983,3.715102
6,swish,29,0.494867,0.001171,78,30,7,0.786885,54.385174,6.665219
7,selu,14,0.507827,2.9e-05,60,13,8,0.819672,61.391547,6.790512
8,gelu,71,0.539577,6.2e-05,39,30,9,0.803279,66.061311,4.452444
9,relu,24,0.222418,2.9e-05,43,44,10,0.803279,71.372907,5.091154


The search can be continued without any issue.

In [15]:
results = search.search(max_evals=5)

results

[00011] -- best objective: 0.8360655903816223 -- received objective: 0.7704917788505554
[00012] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889
[00013] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00014] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00015] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889


Unnamed: 0,activation,batch_size,dropout_rate,learning_rate,num_epochs,units,id,objective,elapsed_sec,duration
0,relu,32,0.5,0.001,50,32,1,0.836066,16.854036,9.364318
1,elu,10,0.066552,0.004223,87,100,2,0.803279,26.910806,9.720668
2,selu,76,0.363856,0.00367,51,59,3,0.819672,31.909261,4.78683
3,relu,51,0.008552,0.001566,45,15,4,0.803279,36.990385,4.704321
4,relu,25,0.194997,6.8e-05,72,19,5,0.770492,43.557472,6.348423
5,selu,13,0.545874,8.1e-05,14,126,6,0.836066,47.503983,3.715102
6,swish,29,0.494867,0.001171,78,30,7,0.786885,54.385174,6.665219
7,selu,14,0.507827,2.9e-05,60,13,8,0.819672,61.391547,6.790512
8,gelu,71,0.539577,6.2e-05,39,30,9,0.803279,66.061311,4.452444
9,relu,24,0.222418,2.9e-05,43,44,10,0.803279,71.372907,5.091154


Now that the search is over, let us print the best configuration found during this run.

In [16]:
i_max = results.objective.argmax()
best_config = results.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. " 
      f"The best configuration found by DeepHyper has an accuracy of {results['objective'].iloc[i_max]:.3f}, " 
      f"trained in {results['duration'].iloc[i_max]:.2f} seconds and "
      f"finished after {results['elapsed_sec'].iloc[i_max]:.2f} seconds of search.")



best_config

The default configuration has an accuracy of 0.803. The best configuration found by DeepHyper has an accuracy of 0.836, trained in 9.36 seconds and finished after 16.85 seconds of search.


{'activation': 'relu',
 'batch_size': 32,
 'dropout_rate': 0.5,
 'learning_rate': 0.001,
 'num_epochs': 50,
 'units': 32,
 'id': 1}
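
The positional slice `[:-3]` relies on `objective`, `elapsed_sec` and `duration` being the last three columns, and it keeps `id` in the dictionary. As a hedged alternative, the sketch below selects the best row and removes the bookkeeping columns by name. It uses only the standard library on three rows copied from the table above; the `BOOKKEEPING` set is an illustrative choice.

```python
import csv
import io

# Three rows copied from the results table above
CSV_TEXT = """activation,batch_size,dropout_rate,learning_rate,num_epochs,units,id,objective,elapsed_sec,duration
relu,32,0.5,0.001,50,32,1,0.836066,16.854036,9.364318
elu,10,0.066552,0.004223,87,100,2,0.803279,26.910806,9.720668
selu,13,0.545874,8.1e-05,14,126,6,0.836066,47.503983,3.715102
"""

# Columns added by the evaluator rather than sampled hyperparameters
BOOKKEEPING = {"id", "objective", "elapsed_sec", "duration"}

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# On ties, max() keeps the first row with the highest objective
best_row = max(rows, key=lambda row: float(row["objective"]))
best_config = {key: value for key, value in best_row.items() if key not in BOOKKEEPING}

print(best_row["id"])  # 1
print(best_config)
```

Note that `csv.DictReader` yields strings, so values would need to be converted before being passed back to a run-function.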

## Restart from a checkpoint

It can often be useful to continue the search from previous results, for example if the requested allocation was not sufficient or if an unexpected crash happened. The `AMBS` search provides the `fit_surrogate(dataframe_of_results)` method for this use case.

To simulate this, we create a second evaluator `evaluator_2` and start a fresh AMBS search with strong exploitation (`kappa=0.001`).

In [17]:
evaluator_2 = get_evaluator(run)

search_from_checkpoint = AMBS(problem, evaluator_2, kappa=0.001)

# Initialize the surrogate model of the Bayesian optimization (in AMBS)
# with the results of the previous search
search_from_checkpoint.fit_surrogate(results)

Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.LoggerCallback object at 0x7f7f2459d460>]}


In [18]:
results_from_checkpoint = search_from_checkpoint.search(max_evals=10)

[00001] -- best objective: 0.7868852615356445 -- received objective: 0.7868852615356445
[00002] -- best objective: 0.8524590134620667 -- received objective: 0.8524590134620667
[00003] -- best objective: 0.8524590134620667 -- received objective: 0.8032786846160889
[00004] -- best objective: 0.8524590134620667 -- received objective: 0.8196721076965332
[00005] -- best objective: 0.8524590134620667 -- received objective: 0.8032786846160889
[00006] -- best objective: 0.8524590134620667 -- received objective: 0.32786884903907776
[00007] -- best objective: 0.8524590134620667 -- received objective: 0.8360655903816223
[00008] -- best objective: 0.8524590134620667 -- received objective: 0.7868852615356445
[00009] -- best objective: 0.8524590134620667 -- received objective: 0.8196721076965332
[00010] -- best objective: 0.8524590134620667 -- received objective: 0.8524590134620667


In [19]:
results_from_checkpoint

Unnamed: 0,activation,batch_size,dropout_rate,learning_rate,num_epochs,units,id,objective,elapsed_sec,duration
0,selu,15,0.570155,0.008827,17,105,1,0.786885,5.952607,3.99087
1,selu,8,0.505059,0.000235,37,72,2,0.852459,12.374077,6.199955
2,selu,9,0.04027,0.00059,59,77,3,0.803279,20.342372,7.748084
3,hard_sigmoid,8,0.365839,0.001108,33,121,4,0.819672,26.322549,5.666041
4,softsign,10,0.466625,9e-05,35,55,5,0.803279,32.252039,5.701202
5,elu,166,0.434878,2.2e-05,11,118,6,0.327869,35.795487,3.316669
6,selu,13,0.471462,0.000259,29,103,7,0.836066,40.873278,4.858704
7,selu,8,0.057613,0.000342,56,83,8,0.786885,48.98549,7.893476
8,selu,9,0.595284,4.3e-05,28,76,9,0.819672,54.308821,5.084902
9,selu,8,0.468896,0.000903,30,13,10,0.852459,60.151635,5.621282


In [20]:
i_max = results_from_checkpoint.objective.argmax()
best_config = results_from_checkpoint.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. " 
      f"The best configuration found by DeepHyper has an accuracy of {results_from_checkpoint['objective'].iloc[i_max]:.3f}, " 
      f"trained in {results_from_checkpoint['duration'].iloc[i_max]:.2f} seconds and "
      f"finished after {results_from_checkpoint['elapsed_sec'].iloc[i_max]:.2f} seconds of search.")

best_config

The default configuration has an accuracy of 0.803. The best configuration found by DeepHyper has an accuracy of 0.852, trained in 6.20 seconds and finished after 12.37 seconds of search.


{'activation': 'selu',
 'batch_size': 8,
 'dropout_rate': 0.5050594136122702,
 'learning_rate': 0.0002345033647209,
 'num_epochs': 37,
 'units': 72,
 'id': 2}

## Add conditional hyperparameters

Now we want to add the possibility to search for a second fully-connected layer. We simply add two new lines:

```python
if config.get("dense_2", False):
    x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
```
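
The role of `config.get("dense_2", False)` can be checked without TensorFlow. The helper `hidden_layers` below is hypothetical: it only lists the `(units, activation)` pairs that would be built, and shows that the `dense_2:*` keys are read only when the `dense_2` flag is present and true.

```python
def hidden_layers(config):
    """Return the (units, activation) pairs of the hidden layers implied by config."""
    layers = [(config["units"], config["activation"])]
    # The second layer is optional: a missing "dense_2" key counts as False
    if config.get("dense_2", False):
        layers.append((config["dense_2:units"], config["dense_2:activation"]))
    return layers


# Made-up configurations for illustration
without_second = {"units": 32, "activation": "relu"}
with_second = {
    "units": 32,
    "activation": "relu",
    "dense_2": True,
    "dense_2:units": 16,
    "dense_2:activation": "tanh",
}

print(hidden_layers(without_second))  # [(32, 'relu')]
print(hidden_layers(with_second))     # [(32, 'relu'), (16, 'tanh')]
```

Using `.get` with a default keeps the run-function compatible with configurations that do not define the new hyperparameters, such as `default_config`.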

In [21]:
def run_with_condition(config: dict):
    tf.autograph.set_verbosity(0)
    
    train_dataframe, val_dataframe = load_data()

    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)

    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )
    if config.get("dense_2", False):
        x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    return history.history["val_accuracy"][-1]

To define conditional hyperparameters we use [ConfigSpace](https://automl.github.io/ConfigSpace/master/index.html). We declare `dense_2:units` and `dense_2:activation` as active only when `dense_2 == True`, using `cs.EqualsCondition`. Then we call

```python
problem_with_condition.add_condition(condition)
```

to register each new condition to the `HpProblem`.
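
To make the meaning of an equals condition concrete, here is a plain-Python sketch of the sampling behavior it describes. It is an illustration of the semantics only and does not use or reproduce ConfigSpace; the function `sample_conditional_config`, the shortened list `ACTIVATION_CHOICES`, and the toy space are assumptions made for this example.

```python
import random

ACTIVATION_CHOICES = ["relu", "tanh"]  # shortened, illustrative list


def sample_conditional_config(rng: random.Random) -> dict:
    """Draw one configuration from a toy space with an equals condition.

    Plain-Python illustration of the semantics only; ConfigSpace implements
    conditions differently and with many more features.
    """
    config = {"units": rng.randint(8, 128), "dense_2": rng.choice([True, False])}
    # Children of the condition are sampled only when the parent equals True.
    if config["dense_2"] is True:
        config["dense_2:units"] = rng.randint(8, 128)
        config["dense_2:activation"] = rng.choice(ACTIVATION_CHOICES)
    return config


rng = random.Random(0)
samples = [sample_conditional_config(rng) for _ in range(100)]
for config in samples:
    # The child hyperparameters are present exactly when dense_2 is True.
    assert ("dense_2:units" in config) == config["dense_2"]
    assert ("dense_2:activation" in config) == config["dense_2"]
```

The point of the condition is that the search never has to model values for `dense_2:units` or `dense_2:activation` when the second layer is disabled.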

In [22]:
import ConfigSpace as cs

# Define the same hyperparameters as before
problem_with_condition = HpProblem()
problem_with_condition.add_hyperparameter((8, 128), "units")
problem_with_condition.add_hyperparameter(ACTIVATIONS, "activation")
problem_with_condition.add_hyperparameter((0.0, 0.6), "dropout_rate")
problem_with_condition.add_hyperparameter((10, 100), "num_epochs")
problem_with_condition.add_hyperparameter((8, 256, "log-uniform"), "batch_size")
problem_with_condition.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate")

# Add a new hyperparameter "dense_2 (bool)" to decide if a second fully-connected layer should be created
hp_dense_2 = problem_with_condition.add_hyperparameter([True, False], "dense_2")
hp_dense_2_units = problem_with_condition.add_hyperparameter((8, 128), "dense_2:units")
hp_dense_2_activation = problem_with_condition.add_hyperparameter(ACTIVATIONS, "dense_2:activation")

problem_with_condition.add_condition(cs.EqualsCondition(hp_dense_2_units, hp_dense_2, True))
problem_with_condition.add_condition(cs.EqualsCondition(hp_dense_2_activation, hp_dense_2, True))


problem_with_condition

units, Type: UniformInteger, Range: [8, 128], Default: 68

activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu

dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3

num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55

batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale

learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale

Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale
    dense_2, Type: Categorical, Choices: {True, False}, Default: True
    dense_2:activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    dense_2:units, Type: UniformInteger, Range: [8, 128], Default: 68
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55
    units, Type: UniformInteger, Range: [8, 128], Default: 68
  Conditions:
    dense_2:activation | dense_2 == True
    dense_2:units | dense_2 == True

We create a new evaluator `evaluator_3` and start a fresh AMBS search with this new problem `problem_with_condition`.

In [23]:
evaluator_3 = get_evaluator(run_with_condition)

search_with_condition = AMBS(problem_with_condition, evaluator_3)

Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.LoggerCallback object at 0x7f7f34465700>]}


In [24]:
results_with_condition = search_with_condition.search(max_evals=20)

[00001] -- best objective: 0.39344263076782227 -- received objective: 0.39344263076782227
[00002] -- best objective: 0.8196721076965332 -- received objective: 0.8196721076965332
[00003] -- best objective: 0.8196721076965332 -- received objective: 0.7868852615356445
[00004] -- best objective: 0.8360655903816223 -- received objective: 0.8360655903816223
[00005] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00006] -- best objective: 0.8360655903816223 -- received objective: 0.8032786846160889
[00007] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00008] -- best objective: 0.8360655903816223 -- received objective: 0.7540983557701111
[00009] -- best objective: 0.8360655903816223 -- received objective: 0.8196721076965332
[00010] -- best objective: 0.8360655903816223 -- received objective: 0.7868852615356445
[00011] -- best objective: 0.8360655903816223 -- received objective: 0.8196721076965332
[00012] -- best objective: 0.8

In [25]:
results_with_condition

Unnamed: 0,activation,batch_size,dense_2,dropout_rate,learning_rate,num_epochs,units,dense_2:activation,dense_2:units,id,objective,elapsed_sec,duration
0,swish,157,False,0.2059,3.7e-05,24,101,,,1,0.393443,4.850309,3.563063
1,linear,91,True,0.198863,2.9e-05,56,108,relu,35.0,2,0.819672,10.106517,4.97671
2,softplus,29,True,0.402332,2.7e-05,56,80,relu,57.0,3,0.786885,16.074215,5.687785
3,relu,66,True,0.095036,0.000103,58,58,relu,112.0,4,0.836066,21.606031,5.254802
4,swish,142,True,0.10415,0.000695,91,52,tanh,112.0,5,0.786885,28.211668,6.32618
5,relu,10,False,0.315562,7.9e-05,49,46,,,6,0.803279,35.018749,6.528614
6,swish,92,True,0.053399,0.00895,14,25,relu,36.0,7,0.786885,38.675885,3.37719
7,relu,29,True,0.190145,5.6e-05,16,96,softplus,79.0,8,0.754098,42.676129,3.720201
8,swish,26,True,0.550226,1.2e-05,76,123,relu,104.0,9,0.819672,49.890837,6.929704
9,relu,177,True,0.382285,0.000108,60,75,softsign,119.0,10,0.786885,55.358827,5.183737


Finally, let us print the best configuration found in this conditional search space.

In [26]:
i_max = results_with_condition.objective.argmax()
# Drop the last three columns (objective, elapsed_sec, duration) to keep the hyperparameters and the id
best_config = results_with_condition.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. "
      f"The best configuration found by DeepHyper has an accuracy of {results_with_condition['objective'].iloc[i_max]:.3f}, "
      f"trained in {results_with_condition['duration'].iloc[i_max]:.2f} seconds and "
      f"finished after {results_with_condition['elapsed_sec'].iloc[i_max]:.2f} seconds of search.")

best_config

The default configuration has an accuracy of 0.803. The best configuration found by DeepHyper has an accuracy of 0.852, trained in 4.73 seconds and finished after 82.14 seconds of search.


{'activation': 'swish',
 'batch_size': 156,
 'dense_2': False,
 'dropout_rate': 0.1359758497942754,
 'learning_rate': 0.0005434980114817,
 'num_epochs': 51,
 'units': 126,
 'dense_2:activation': nan,
 'dense_2:units': nan,
 'id': 15}
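
The `nan` entries in this dictionary belong to `dense_2:activation` and `dense_2:units`, which were inactive because `dense_2` is `False`. Before reusing or serializing such a configuration, it can be convenient to drop them. The following is a minimal sketch, assuming a hypothetical helper `drop_inactive` that is not provided by DeepHyper; the example dictionary is an abridged copy of the output above.

```python
import math


def drop_inactive(config: dict) -> dict:
    """Return a copy of config without entries whose value is a float NaN."""
    return {
        key: value
        for key, value in config.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


best_config_example = {
    "activation": "swish",
    "units": 126,
    "dense_2": False,
    "dense_2:activation": float("nan"),
    "dense_2:units": float("nan"),
}

print(drop_inactive(best_config_example))
# {'activation': 'swish', 'units': 126, 'dense_2': False}
```

The `isinstance` check keeps string and boolean values out of `math.isnan`, which accepts only numbers.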