# Multi-Fidelity Hyperparameter Optimization with Keras

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deephyper/tutorials/blob/main/tutorials/colab/HPS_basic_classification_with_tabular_data/notebook.ipynb)

In this tutorial we show how to apply hyperparameter optimization to a basic example from the Keras documentation. We build on the previous tutorial, which uses the same example, and add multi-fidelity to it. The purpose of multi-fidelity is to dynamically manage the budget (also called fidelity) allocated to the evaluation of a hyperparameter configuration. For example, when training a deep neural network, training can be continued for more epochs or stopped early based on the performance observed so far and some policy.
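To make the idea of a stopping policy concrete, here is a minimal sketch of one simple policy, the median stopping rule: a run is stopped at a given epoch if its score is below the median score that earlier runs reached at the same epoch. This is an illustration only, not how DeepHyper's stoppers are implemented; the function name `run_with_median_rule` and the learning curves are hypothetical.

```python
from statistics import median


def run_with_median_rule(curve, previous_curves, min_steps=1):
    """Replay a learning curve epoch by epoch and return the epochs consumed.

    The run is stopped at step t when its score falls below the median score
    that previous runs reached at the same step.
    """
    for step, score in enumerate(curve, start=1):
        # Scores of earlier runs that lasted at least `step` epochs.
        peers = [c[step - 1] for c in previous_curves if len(c) >= step]
        if step >= min_steps and peers and score < median(peers):
            return step  # stopped early: only `step` epochs were spent
    return len(curve)  # the full budget was consumed


# Hypothetical validation-accuracy curves, one value per epoch.
finished = [
    [0.60, 0.70, 0.78, 0.80],
    [0.55, 0.65, 0.72, 0.75],
]
promising = [0.62, 0.71, 0.79, 0.82]
weak = [0.58, 0.60, 0.61, 0.62]

print(run_with_median_rule(promising, finished))  # 4: trained to the end
print(run_with_median_rule(weak, finished))       # 2: stopped at epoch 2
```

The promising run keeps its full budget, while the weak run is cut after two epochs, which is the kind of saving multi-fidelity aims for.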

In DeepHyper, the multi-fidelity agent is designed separately from the hyperparameter search agent. Of course, both can communicate but from an API perspective they are different objects. The multi-fidelity agents are called `Stopper` in DeepHyper and their documentation can be found at [deephyper.stopper](https://deephyper.readthedocs.io/en/latest/_autosummary/deephyper.stopper.html). 

In this notebook, we demonstrate how to use multi-fidelity inside sequential Bayesian optimization. When moving to a distributed setting, it is important to use a shared database accessible by all workers; otherwise the multi-fidelity scheme may not work properly. An example of database instantiation for parallel computing is given in: [Introduction to Distributed Bayesian Optimization (DBO) with MPI (Communication) and Redis (Storage)](https://deephyper.readthedocs.io/en/latest/tutorials/tutorials/scripts/02_Intro_to_DBO/README.html).

**Reference**:
 This tutorial is based on materials from the Keras Documentation: [Structured data classification from scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/)

Let us start with installing DeepHyper!
    
<div class="alert alert-warning">

<b>Warning</b>
    
This tutorial should be run with `tensorflow>=2.6`.
    
</div>

In [1]:
!pip install "deephyper[jax-cpu]"
import deephyper
print(deephyper.__version__)

0.5.0


<div class="alert alert-info">
    
<b>Note</b>
    
The following environment variables can be used to suppress **some** TensorFlow *DEBUG*, *INFO* and *WARNING* log messages.
    
</div>

In [2]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(4)
os.environ["AUTOGRAPH_VERBOSITY"] = str(0)

## Imports

In [3]:
import pandas as pd
import tensorflow as tf
tf.get_logger().setLevel("ERROR")

## The dataset (from Keras.io)

The [dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) is provided by the
Cleveland Clinic Foundation for Heart Disease.
It's a CSV file with 303 rows. Each row contains information about a patient (a
**sample**), and each column describes an attribute of the patient (a **feature**). We
use the features to predict whether a patient has a heart disease (**binary
classification**).

Here's the description of each feature:

Column| Description| Feature Type
------------|--------------------|----------------------
Age | Age in years | Numerical
Sex | (1 = male; 0 = female) | Categorical
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical
Chol | Serum cholesterol in mg/dl | Numerical
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical
Thalach | Maximum heart rate achieved | Numerical
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
Oldpeak | ST depression induced by exercise relative to rest | Numerical
Slope | Slope of the peak exercise ST segment | Numerical
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target

In [4]:
def load_data():
    file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
    dataframe = pd.read_csv(file_url)

    val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
    train_dataframe = dataframe.drop(val_dataframe.index)

    return train_dataframe, val_dataframe


def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds

## Preprocessing & encoding of features

The next cells use `tf.keras.layers.Normalization()` to apply standard scaling on the features.

Then, the `tf.keras.layers.StringLookup` and `tf.keras.layers.IntegerLookup` are used to encode categorical variables.
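As a rough picture of what these layers compute, the following pure-Python sketch reproduces the two transformations on tiny hypothetical inputs. It assumes population statistics for the scaling and reserves index 0 for unseen categories. The helper names `standard_scale` and `one_hot_with_oov` are made up for this illustration, and the alphabetical vocabulary order is an arbitrary choice that may differ from what the Keras layers produce.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot_with_oov(values):
    """Build an encoder mapping a category to a 0/1 vector.

    Slot 0 is reserved for values that were not seen when building the
    vocabulary (out-of-vocabulary).
    """
    vocabulary = sorted(set(values))
    index = {token: i + 1 for i, token in enumerate(vocabulary)}

    def encode(value):
        vector = [0] * (len(vocabulary) + 1)
        vector[index.get(value, 0)] = 1
        return vector

    return encode


ages = [40.0, 50.0, 60.0]  # hypothetical numerical feature
print(standard_scale(ages))

encode_thal = one_hot_with_oov(["normal", "fixed", "reversible", "normal"])
print(encode_thal("fixed"))    # [0, 1, 0, 0]
print(encode_thal("unknown"))  # [1, 0, 0, 0]
```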

In [5]:
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = tf.keras.layers.Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature


def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = (
        tf.keras.layers.StringLookup if is_string else tf.keras.layers.IntegerLookup
    )
    # Create a lookup layer which will turn strings into integer indices
    lookup = lookup_class(output_mode="binary")

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible string values and assign them a fixed integer index
    lookup.adapt(feature_ds)

    # Turn the string input into integer indices
    encoded_feature = lookup(feature)
    return encoded_feature

## Define the run-function with multi-fidelity

The run-function defines how the objective that we want to maximize is computed. It takes a `job` (see [deephyper.evaluator.RunningJob](https://deephyper.readthedocs.io/en/latest/_autosummary/deephyper.evaluator.RunningJob.html)) as input and outputs a scalar value or a dictionary (see [deephyper.evaluator](https://deephyper.readthedocs.io/en/latest/_autosummary/deephyper.evaluator.html)). The objective is always maximized in DeepHyper. The `job.parameters` attribute contains a suggested configuration of hyperparameters that we want to evaluate. In this example we will search for:

* `units` (default value: `32`)
* `activation` (default value: `"relu"`)
* `dropout_rate` (default value: `0.5`)
* `batch_size` (default value: `32`)
* `learning_rate` (default value: `1e-3`)

A hyperparameter value can be accessed in the dictionary through the corresponding key; for example, `job["units"]` and `job.parameters["units"]` are both valid. Unlike the previous tutorial, in this example we want to use multi-fidelity to dynamically choose the budget allocated to each evaluation. Therefore, we use the TensorFlow Keras integration of stoppers, `deephyper.stopper.integration.TFKerasStopperCallback`. The multi-fidelity agent monitors the validation accuracy (`val_accuracy`), which is to be maximized (`mode="max"`). This `stopper_callback` is then added to the callbacks used by the model during training. To collect more information about the execution of our job, we use the `@profile` decorator on the run-function, which collects execution timings (`timestamp_start` and `timestamp_end`). We also add `"metadata"` to the output of our function to record how many epochs were used to evaluate each model. To learn more about how the `@profile` decorator can be used, check our tutorial on [Understanding the pros and cons of Evaluator parallel backends](https://deephyper.readthedocs.io/en/latest/tutorials/tutorials/scripts/03_Evaluators/README.html).

```python
    stopper_callback = TFKerasStopperCallback(
        job, 
        monitor="val_accuracy", 
        mode="max"
    )
                                              
    history = model.fit(
        train_ds, 
        epochs=100, 
        validation_data=val_ds, 
        verbose=0,
        callbacks=[stopper_callback]
    )
    
    
    objective = history.history["val_accuracy"][-1]
    metadata = {"budget": stopper_callback.budget}
    return {"objective": objective, "metadata": metadata}
```
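The interplay between the training loop and a stopper can be pictured with the toy sketch below. It is not DeepHyper's implementation: `ToyStopper` is a hypothetical patience-based early-stopping policy and `fit` is a stand-in for `model.fit`. Each epoch is reported to the stopper, training ends as soon as the stopper asks for it, and the returned dictionary has the same `"objective"` and `"metadata"` shape as the excerpt above.

```python
class ToyStopper:
    """Hypothetical policy: stop when the best score is `patience` steps old."""

    def __init__(self, patience=2, max_steps=100):
        self.patience = patience
        self.max_steps = max_steps
        self.scores = []

    def observe(self, budget, objective):
        # Record the score obtained after `budget` epochs.
        self.scores.append(objective)

    def stop(self):
        if len(self.scores) >= self.max_steps:
            return True
        best_step = self.scores.index(max(self.scores))
        return len(self.scores) - 1 - best_step >= self.patience


def fit(val_accuracy_per_epoch, stopper):
    """Stand-in for model.fit: report each epoch and obey the stopper."""
    history = []
    for epoch, score in enumerate(val_accuracy_per_epoch, start=1):
        history.append(score)
        stopper.observe(budget=epoch, objective=score)
        if stopper.stop():
            break
    return {"objective": history[-1], "metadata": {"budget": len(history)}}


curve = [0.70, 0.78, 0.80, 0.79, 0.80, 0.77]  # hypothetical scores
print(fit(curve, ToyStopper(patience=2)))
# {'objective': 0.8, 'metadata': {'budget': 5}}
```

Because training may end early, the last entry of the history is the score at the epoch where the run was stopped, and the budget is the number of epochs actually run.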

In [6]:
from deephyper.evaluator import profile, RunningJob
from deephyper.stopper.integration import TFKerasStopperCallback


@profile
def run(job):
    
    config = job.parameters
    
    tf.autograph.set_verbosity(0)
    # Load data and split into validation set
    train_dataframe, val_dataframe = load_data()
    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)
    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )
    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])
    
    stopper_callback = TFKerasStopperCallback(
        job, 
        monitor="val_accuracy", 
        mode="max"
    )
                                              
    history = model.fit(
        train_ds, 
        epochs=100, 
        validation_data=val_ds, 
        verbose=0,
        callbacks=[stopper_callback]
    )
    
    
    objective = history.history["val_accuracy"][-1]
    metadata = {"budget": stopper_callback.budget}
    return {"objective": objective, "metadata": metadata}

<div class="alert alert-info"> 
<b>Note</b>  
<br>

The objective maximized by DeepHyper is the `"objective"` value returned by the `run`-function.
    
</div>

In this tutorial it corresponds to the validation accuracy of the last epoch of training which we retrieve in the `History` object returned by the `model.fit(...)` call.
    
```python
...
objective = history.history["val_accuracy"][-1]
...
``` 
    
Using an objective like `max(history.history['val_accuracy'])` can have undesired side effects.

For example, it is possible that the training curves will overshoot a local maximum, resulting in a model without the capacity to flexibly adapt to new data in the future.
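The difference between the two choices can be seen on a small hypothetical learning curve. The values below are made up for illustration. Unless a checkpoint is restored, the model ends training with the weights of the last epoch, so the maximum may describe weights that are no longer available.

```python
# Hypothetical validation accuracies recorded at the end of each epoch.
val_accuracy = [0.70, 0.78, 0.86, 0.80, 0.79]

last_epoch = val_accuracy[-1]   # score of the weights the model ends with
best_epoch = max(val_accuracy)  # score of a transient peak at epoch 3

print(f"last={last_epoch:.2f} max={best_epoch:.2f}")
# last=0.79 max=0.86
```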

## Define the Hyperparameter optimization problem

Hyperparameter ranges are defined using the following syntax:

* Discrete integer ranges are generated from a tuple `(lower: int, upper: int)`
* Continuous parameters are generated from a tuple `(lower: float, upper: float)`
* Categorical or nonordinal hyperparameter ranges can be given as a list of possible values `[val1, val2, ...]`


In [7]:
from deephyper.problem import HpProblem


# Creation of a hyperparameter problem
problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((8, 128), "units", default_value=32)


# Categorical hyperparameter (sampled with uniform prior)
ACTIVATIONS = [
    "elu", "gelu", "hard_sigmoid", "linear", "relu", "selu",
    "sigmoid", "softplus", "softsign", "swish", "tanh",
]
problem.add_hyperparameter(ACTIVATIONS, "activation", default_value="relu")


# Real hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((0.0, 0.6), "dropout_rate", default_value=0.5)


# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 256, "log-uniform"), "batch_size", default_value=32)
problem.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate", default_value=1e-3)

problem

Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: relu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 32, on log-scale
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.5
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.001, on log-scale
    units, Type: UniformInteger, Range: [8, 128], Default: 32
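The `log-uniform` prior used for `batch_size` and `learning_rate` means that the logarithm of the value is sampled uniformly, so each order of magnitude is equally likely. The sketch below illustrates this with the standard library only. The helper `sample_log_uniform` is hypothetical, and rounding to obtain integer batch sizes is an assumption made here, not necessarily what DeepHyper does internally.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
learning_rates = [sample_log_uniform(1e-5, 1e-2, rng) for _ in range(10_000)]
# Assumption: integer values are obtained by rounding the continuous draw.
batch_sizes = [round(sample_log_uniform(8, 256, rng)) for _ in range(10_000)]

# The range [1e-5, 1e-2] spans three decades; each decade should receive
# roughly one third of the draws.
share = sum(1e-5 <= lr < 1e-4 for lr in learning_rates) / len(learning_rates)
print(f"share of learning rates in [1e-5, 1e-4): {share:.2f}")
```

With a plain uniform prior on the same range, almost all draws would instead fall in the top decade `[1e-3, 1e-2]`.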

## Evaluate a default configuration

We evaluate the performance of the default set of hyperparameters provided in the Keras tutorial.

In [8]:
objective_default = run(RunningJob(parameters=problem.default_configuration))
    
print(f"Accuracy of the default configuration is {objective_default['objective']:.3f}\n with a budget of {objective_default['metadata']['budget']}")

Accuracy of the default configuration is 0.803
 with a budget of 100


## Execute Multi-Fidelity Bayesian Optimization

We create the CBO using the `problem` and `run`-function defined above. When directly passing the `run`-function to the search it is wrapped inside a [deephyper.evaluator.SerialEvaluator](https://deephyper.readthedocs.io/en/latest/_autosummary/deephyper.evaluator.SerialEvaluator.html). Then, we also import the [deephyper.stopper.LCModelStopper](https://deephyper.readthedocs.io/en/develop/_autosummary/deephyper.stopper.LCModelStopper.html#deephyper.stopper.LCModelStopper).

In [9]:
from deephyper.search.hps import CBO
from deephyper.stopper import LCModelStopper

In [10]:
# Instantiate the search with the problem and the run-function defined above

stopper = LCModelStopper(min_steps=1, max_steps=100)
search = CBO(problem, run, initial_points=[problem.default_configuration], stopper=stopper)



<div class="alert alert-info">
    
<b>Note</b>
    
All of DeepHyper's search algorithms have two stopping criteria:
    <ul> 
        <li> <code>max_evals (int)</code>: Defines the maximum number of evaluations that we want to perform. Defaults to <code>-1</code> for an infinite number.</li>
        <li> <code>timeout (int)</code>: Defines a time budget (in seconds) before stopping the search. Defaults to <code>None</code> for an infinite time budget.</li>
    </ul>
    
</div>

In [11]:
results = search.search(max_evals=30)

  0%|          | 0/30 [00:00<?, ?it/s]

The returned `results` is a pandas DataFrame in which columns starting with `"p:"` are hyperparameters and columns starting with `"m:"` are additional metadata (from the user or from the `Evaluator`). It also contains the `objective` value and the `job_id`:

* `job_id` is a unique identifier corresponding to the order of creation of tasks.
* `objective` is the value returned by the run-function.
* `m:timestamp_submit` is the time (in seconds), measured from the creation of the evaluator, at which the task was created by the evaluator.
* `m:timestamp_gather` is the time (in seconds), measured from the creation of the evaluator, at which the finished task was received by the evaluator.
* `m:timestamp_start` is the time (in seconds) when the task started to run.
* `m:timestamp_end` is the time (in seconds) when the task finished running.
* `m:budget` is the number of epochs consumed by each evaluation.

In [12]:
results

| | p:activation | p:batch_size | p:dropout_rate | p:learning_rate | p:units | objective | job_id | m:timestamp_submit | m:timestamp_gather | m:timestamp_start | m:timestamp_end | m:budget |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | relu | 32 | 0.5 | 0.001 | 32 | 0.803279 | 0 | 1.670782 | 5.526985 | 1677845000.0 | 1677845000.0 | 38 |
| 1 | swish | 14 | 0.489233 | 0.001336 | 41 | 0.803279 | 1 | 5.572275 | 8.168081 | 1677845000.0 | 1677845000.0 | 4 |
| 2 | relu | 8 | 0.587494 | 0.00513 | 20 | 0.803279 | 2 | 8.192211 | 11.887135 | 1677845000.0 | 1677845000.0 | 28 |
| 3 | tanh | 149 | 0.467829 | 0.000165 | 120 | 0.803279 | 3 | 11.910818 | 14.864182 | 1677845000.0 | 1677845000.0 | 58 |
| 4 | selu | 106 | 0.527451 | 2e-05 | 107 | 0.590164 | 4 | 14.888071 | 17.392803 | 1677845000.0 | 1677845000.0 | 4 |
| 5 | softsign | 80 | 0.270408 | 3.2e-05 | 66 | 0.360656 | 5 | 17.416549 | 19.849482 | 1677845000.0 | 1677845000.0 | 4 |
| 6 | hard_sigmoid | 58 | 0.56334 | 0.009599 | 77 | 0.819672 | 6 | 20.020613 | 22.708393 | 1677845000.0 | 1677845000.0 | 29 |
| 7 | hard_sigmoid | 19 | 0.202024 | 1.3e-05 | 119 | 0.229508 | 7 | 22.73234 | 25.483728 | 1677845000.0 | 1677845000.0 | 4 |
| 8 | gelu | 87 | 0.393025 | 4.7e-05 | 13 | 0.557377 | 8 | 25.507511 | 28.061162 | 1677845000.0 | 1677845000.0 | 4 |
| 9 | linear | 150 | 0.405327 | 0.000357 | 82 | 0.819672 | 9 | 28.085102 | 31.010434 | 1677845000.0 | 1677845000.0 | 49 |
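The `m:budget` column gives a direct view of how much training was saved. As a small worked example using only the ten rows listed above, the snippet below compares the epochs actually consumed with the cost of training every configuration for the full 100 epochs. With the full DataFrame, the same total would be `results["m:budget"].sum()`.

```python
# Epochs consumed by the ten evaluations listed above (column m:budget).
budgets = [38, 4, 28, 58, 4, 4, 29, 4, 4, 49]
max_epochs = 100  # epochs requested in model.fit

consumed = sum(budgets)
full_cost = max_epochs * len(budgets)

print(f"{consumed} of {full_cost} epochs used ({consumed / full_cost:.0%})")
# 222 of 1000 epochs used (22%)
```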


Now that the search is over, let us print the best configuration found during this run.

In [13]:
i_max = results.objective.argmax()
best_config = results.iloc[i_max].to_dict()


print(f"The default configuration has an accuracy of {objective_default['objective']:.3f}. \n" 
      f"The best configuration found by DeepHyper has an accuracy of {results['objective'].iloc[i_max]:.3f}, \n" 
      f"discovered after {results['m:timestamp_gather'].iloc[i_max]:.2f} seconds of search.\n")

best_config

The default configuration has an accuracy of 0.803. 
The best configuration found by DeepHyper has an accuracy of 0.852, 
discovered after 63.79 seconds of search.



{'p:activation': 'hard_sigmoid',
 'p:batch_size': 209,
 'p:dropout_rate': 0.5782715361012362,
 'p:learning_rate': 0.0098209943552909,
 'p:units': 69,
 'objective': 0.8524590134620667,
 'job_id': 20,
 'm:timestamp_submit': 61.116661071777344,
 'm:timestamp_gather': 63.789851903915405,
 'm:timestamp_start': 1677845351.991086,
 'm:timestamp_end': 1677845354.663899,
 'm:budget': 32}

We can observe an improvement of about 5 percentage points in accuracy (from 0.803 to 0.852). The best configuration above also reports the number of epochs used for this evaluation (`m:budget` = 32).
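To reuse the best configuration, for instance to evaluate it again, the `"p:"` prefix has to be removed, since the run-function reads keys such as `config["units"]`. A minimal sketch, using values copied from the dictionary shown above:

```python
# Entries copied from the best configuration shown above.
best_config = {
    "p:activation": "hard_sigmoid",
    "p:batch_size": 209,
    "p:dropout_rate": 0.5782715361012362,
    "p:learning_rate": 0.0098209943552909,
    "p:units": 69,
    "objective": 0.8524590134620667,
    "m:budget": 32,
}

# Keep only the hyperparameters and drop the "p:" prefix.
parameters = {
    key[len("p:"):]: value
    for key, value in best_config.items()
    if key.startswith("p:")
}
print(parameters)
```

The resulting dictionary has the same keys as `problem.default_configuration`, so it could be passed to the run-function in the same way as the default configuration, with `run(RunningJob(parameters=parameters))`.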