# Optuna: A hyperparameter optimization framework

Optuna is a software framework for automatic hyperparameter optimization, designed particularly for machine learning. Thanks to its API, code written with Optuna is highly modular, and the user can construct the search spaces for the hyperparameters dynamically.

## What are hyperparameters?

*Hyperparameters* are parameters that control the behavior of a learning algorithm; they affect the model's performance and are usually set manually. Unlike model parameters, which are learned during training, hyperparameters must be set before training begins. Common examples include the *learning rate*, the number of layers and units in a neural network, and the maximum depth of a decision tree.

## Hyperparameter search

Hyperparameter search is the process of selecting an optimal set of hyperparameters for a learning algorithm. Because model performance can depend significantly on the chosen hyperparameters, it is an important step in the modeling process.

Hyperparameter search is challenging for several reasons, including:

 1. Large search space: The hyperparameter search space can be very large, especially for complex models such as deep neural networks.

 2. Expensive evaluations: Evaluating a given hyperparameter configuration can be very costly in time and computational resources, especially if the model is complex or the dataset is very large.

 3. Complex interactions: Hyperparameters can interact with each other in complex ways, so the effect of one hyperparameter may depend on the value of another.

 4. Lack of gradients: Unlike model parameters, hyperparameters often have no gradients that can be used to guide the search.
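
To make the first two challenges concrete, the following sketch counts the configurations in a small grid and estimates the total cost of evaluating all of them. The grid values and the 10-minute cost per evaluation are hypothetical numbers chosen only for illustration.

```python
import math

# Hypothetical grid: candidate values for each hyperparameter.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "num_layers": [1, 2, 3, 4, 5],
    "units_per_layer": [32, 64, 128, 256],
    "dropout_rate": [0.0, 0.25, 0.5],
}

# An exhaustive search evaluates every combination of values.
num_configs = math.prod(len(values) for values in grid.values())

# Assumed cost of training and validating one configuration.
minutes_per_evaluation = 10
total_hours = num_configs * minutes_per_evaluation / 60

print(f"{num_configs} configurations, about {total_hours:.0f} hours in total")
```

Even with only four hyperparameters and a handful of values each, an exhaustive search quickly becomes expensive, which is why smarter sampling and early stopping matter.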

## Installing Optuna

In [1]:
#pip install optuna # Install Optuna
#pip install optuna-dashboard # Install the Optuna dashboard (similar to TensorBoard)
#pip install plotly # In case Plotly is not installed (for plotting)
#pip install nbformat # Required by Plotly
#pip install lightgbm # To use LightGBM (Light Gradient-Boosting Machine)
#pip install mlflow

### Example: Quadratic function

Optuna is generally used to optimize hyperparameters, but as an example we will optimize a simple quadratic function: $(x-2)^2$.

In [2]:
import optuna



In Optuna, the function to be optimized is conventionally named `objective`.

In [3]:
import optuna

def objective(trial):

    x = trial.suggest_float("x", -10, 10)

    return (x - 2)**2

This function returns the value of $(x-2)^2$. Our goal is to find the value of `x` that minimizes the output of the `objective` function. During optimization, Optuna repeatedly calls and evaluates the objective function with different values of `x`.

A `trial` object corresponds to a single execution of the objective function and is instantiated internally on each invocation of the function.

The `suggest` APIs (for example, `suggest_float()`) are called inside the objective function to obtain parameters for a trial. `suggest_float()` selects a value uniformly within the given range.

To start the optimization, we create a `study` object and pass the objective function to its `optimize()` method as follows.

In [4]:
study = optuna.create_study()

study.optimize(objective, n_trials=100)

[I 2023-09-21 11:36:06,666] A new study created in memory with name: no-name-eab50edd-f37c-403b-bfc5-48f0578f2dba
[I 2023-09-21 11:36:06,666] Trial 0 finished with value: 10.439138179196712 and parameters: {'x': -1.230965518107043}. Best is trial 0 with value: 10.439138179196712.
[I 2023-09-21 11:36:06,666] Trial 1 finished with value: 67.67353109134189 and parameters: {'x': -6.22639234970846}. Best is trial 0 with value: 10.439138179196712.
[I 2023-09-21 11:36:06,674] Trial 2 finished with value: 6.198398748162254 and parameters: {'x': -0.48965835972774663}. Best is trial 2 with value: 6.198398748162254.
[I 2023-09-21 11:36:06,674] Trial 3 finished with value: 103.8159255741395 and parameters: {'x': -8.189010038965488}. Best is trial 2 with value: 6.198398748162254.
[I 2023-09-21 11:36:06,682] Trial 4 finished with value: 99.56679843591786 and parameters: {'x': -7.978316412898414}. Best is trial 2 with value: 6.198398748162254.
[I 2023-09-21 11:36:06,682] Trial 5 finished with value: 

We can obtain the best parameter as follows.

In [5]:
best_params = study.best_params

found_x = best_params["x"]

print("Found x: {}, (x-2)^2: {}".format(found_x, (found_x - 2)**2))

Found x: 2.019074956659619, (x-2)^2: 0.0003638539715663391
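
Conceptually, each trial samples a value of `x`, evaluates the objective, and the study keeps track of the best result. The loop below is a minimal plain-Python sketch of that idea using uniform random sampling. It is not how Optuna's default sampler chooses values, and `objective_value` and `random_search` are hypothetical helper names introduced for this illustration.

```python
import random

def objective_value(x):
    # Same quadratic as above, written as a plain function of x.
    return (x - 2) ** 2

def random_search(n_trials, low=-10.0, high=10.0, seed=0):
    rng = random.Random(seed)
    best_x, best_value = None, float("inf")
    for _ in range(n_trials):
        x = rng.uniform(low, high)   # one "trial": sample a candidate
        value = objective_value(x)   # evaluate the objective
        if value < best_value:       # remember the best trial so far
            best_x, best_value = x, value
    return best_x, best_value

best_x, best_value = random_search(n_trials=1000)
print(f"Found x: {best_x}, (x-2)^2: {best_value}")
```

A sampler such as TPE differs from this sketch in that it uses the history of earlier trials to decide where to sample next, rather than drawing every value independently.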


## The `study` object

Let us clarify the terminology used in Optuna:

 - `Trial`: a single call of the objective function
 - `Study`: an optimization session, which consists of a set of trials
 - `Parameter`: a variable whose value is to be optimized, such as `x` in the previous example.

In Optuna, we use the `study` object to manage the optimization. The `create_study()` function returns a study object, which has useful properties for analyzing the optimization result.

To obtain the dictionary of parameter names and their values:

In [6]:
study.best_params

{'x': 2.019074956659619}

To obtain the best observed value of the objective function:

In [7]:
study.best_value

0.0003638539715663391

And to obtain the best trial:

In [8]:
study.best_trial

FrozenTrial(number=61, state=TrialState.COMPLETE, values=[0.0003638539715663391], datetime_start=datetime.datetime(2023, 9, 21, 11, 36, 7, 99673), datetime_complete=datetime.datetime(2023, 9, 21, 11, 36, 7, 113052), params={'x': 2.019074956659619}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=61, value=None)

To get all trials:

In [9]:
study.trials

for trial in study.trials[:2]: # Show the first 2 trials

    print(trial)

FrozenTrial(number=0, state=TrialState.COMPLETE, values=[10.439138179196712], datetime_start=datetime.datetime(2023, 9, 21, 11, 36, 6, 666218), datetime_complete=datetime.datetime(2023, 9, 21, 11, 36, 6, 666218), params={'x': -1.230965518107043}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=None)
FrozenTrial(number=1, state=TrialState.COMPLETE, values=[67.67353109134189], datetime_start=datetime.datetime(2023, 9, 21, 11, 36, 6, 666218), datetime_complete=datetime.datetime(2023, 9, 21, 11, 36, 6, 666218), params={'x': -6.22639234970846}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=1, value=None)


The number of trials is obtained with:

In [10]:
len(study.trials)

100

An important feature of Optuna is that calling `optimize()` again continues the optimization from where it left off:

In [11]:
study.optimize(objective, n_trials = 100)

[I 2023-09-21 11:36:07,623] Trial 100 finished with value: 1.6779901354999374 and parameters: {'x': 3.295372585590701}. Best is trial 61 with value: 0.0003638539715663391.
[I 2023-09-21 11:36:07,631] Trial 101 finished with value: 0.00013646101936064974 and parameters: {'x': 1.9883183468909298}. Best is trial 101 with value: 0.00013646101936064974.
[I 2023-09-21 11:36:07,640] Trial 102 finished with value: 0.010412290722878784 and parameters: {'x': 1.8979593672947939}. Best is trial 101 with value: 0.00013646101936064974.
[I 2023-09-21 11:36:07,647] Trial 103 finished with value: 0.35331961992189004 and parameters: {'x': 1.405593051923272}. Best is trial 101 with value: 0.00013646101936064974.
[I 2023-09-21 11:36:07,663] Trial 104 finished with value: 0.18104713576853557 and parameters: {'x': 2.4254963404878302}. Best is trial 101 with value: 0.00013646101936064974.
[I 2023-09-21 11:36:07,671] Trial 105 finished with value: 0.003129880198192888 and parameters: {'x': 1.9440546677711814}

We can check the result again:

In [12]:
best_params = study.best_params

found_x = best_params["x"]

print("Found x: {}, (x-2)^2: {}".format(found_x, (found_x - 2)**2))

Found x: 1.9927699196721322, (x-2)^2: 5.227406154742039e-05


## Search space

For sampling hyperparameters, Optuna offers the following functions:

 - `optuna.trial.Trial.suggest_categorical()` for categorical parameters
 - `optuna.trial.Trial.suggest_int()` for integer parameters
 - `optuna.trial.Trial.suggest_float()` for floating-point parameters

With the optional `step` and `log` arguments, integer and float parameters can be discretized or sampled in the log domain.

In [13]:
def objective(trial):

    # Categorical parameter

    optimizer = trial.suggest_categorical("optimizer", ["SGD", "Adam"])

    # Integer parameter

    num_layers = trial.suggest_int("num_layers", 1, 3)

    # Integer parameter (log scale)

    num_channels = trial.suggest_int("num_channels", 32, 512, log = True)

    # Integer parameter (discretized)

    num_units = trial.suggest_int("num_units", 10, 100, step = 5)

    # Floating-point parameter

    dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)

    # Floating-point parameter (log scale)

    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log = True)

    # Floating-point parameter (discretized)

    drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step = 0.1)
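
The effect of `log` and `step` can be illustrated without Optuna. The sketch below draws one value uniformly in log space and one value from a discretized grid. `sample_log_uniform` and `sample_stepped` are hypothetical helpers written for this illustration and are not Optuna's internal implementation.

```python
import math
import random

rng = random.Random(0)

def sample_log_uniform(low, high):
    # Sample uniformly between log(low) and log(high), then map back.
    # Each decade (e.g. 1e-5..1e-4 and 1e-3..1e-2) is equally likely.
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_stepped(low, high, step):
    # Pick one of the grid points low, low + step, ..., up to high.
    n_steps = int(round((high - low) / step))
    return low + rng.randint(0, n_steps) * step

learning_rate = sample_log_uniform(1e-5, 1e-2)
num_units = sample_stepped(10, 100, 5)
print(learning_rate, num_units)
```

Log-scale sampling is a natural fit for parameters such as the learning rate, where the order of magnitude matters more than the absolute difference between candidate values.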

### Defining parameter spaces

In Optuna, search spaces are defined with ordinary Python syntax, including conditionals and loops. Branches and loops can also depend on the values of parameters suggested earlier in the same trial.


#### Branches

In [14]:
import sklearn.ensemble
import sklearn.svm

def objective(trial):

    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])

    if classifier_name == "SVC":

        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log = True)

        classifier_obj = sklearn.svm.SVC(C = svc_c)

    else:

        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log = True)

        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth = rf_max_depth)

#### Loops

In [15]:
import tensorflow as tf
from tensorflow.keras import layers, models 

def create_model(trial, input_size):

    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers_list = []

    for i in range(n_layers):

        n_units = trial.suggest_int("n_units_l{}".format(i), 4, 128, log = True)

        layers_list.append(layers.Dense(n_units, input_shape = (input_size, ), activation = 'relu'))

        input_size = n_units

    layers_list.append(layers.Dense(10))

    model = models.Sequential(layers_list)

    return model



**On the number of parameters**: The difficulty of the optimization grows roughly exponentially with the number of parameters. That is, the number of trials required increases exponentially as parameters are added, so it is advisable not to add unimportant parameters.
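
A rough way to see this growth: with a fixed budget of trials spread evenly over a grid, the number of distinct values that can be tried per parameter shrinks quickly as parameters are added. The 100-trial budget below is an arbitrary illustrative figure.

```python
budget = 100  # Illustrative total number of trials.

levels_per_parameter = {}
for n_params in (1, 2, 4, 10):
    # A full grid with k values per parameter needs k ** n_params trials,
    # so a fixed budget affords about budget ** (1 / n_params) values each.
    levels_per_parameter[n_params] = budget ** (1.0 / n_params)

for n_params, levels in levels_per_parameter.items():
    print(f"{n_params:2d} parameters -> about {levels:.1f} values per parameter")
```

With ten parameters, the same budget allows fewer than two values per parameter, so each unimportant parameter dilutes the resolution available for the important ones.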


## Efficient optimization algorithms

Optuna enables efficient hyperparameter optimization by using state-of-the-art algorithms for sampling hyperparameters and by efficiently pruning unpromising trials.

### Sampling algorithms

Samplers continually narrow down the search space using the records of suggested parameter values and evaluated objective values, converging on a region of the search space whose parameters yield better objective values.

Optuna provides the following sampling algorithms:

 - Grid Search, implemented in `GridSampler`
 - Random Search, implemented in `RandomSampler`
 - Tree-structured Parzen Estimator algorithm, implemented in `TPESampler`
 - CMA-ES based algorithm, implemented in `CmaEsSampler`
 - Algorithm to enable partially fixed parameters, implemented in `PartialFixedSampler`
 - Non-dominated Sorting Genetic Algorithm II, implemented in `NSGAIISampler`
 - Quasi Monte Carlo sampling algorithm, implemented in `QMCSampler`

The default sampler is `TPESampler`.



In [16]:
study = optuna.create_study()

print(f"Sampler is: {study.sampler.__class__.__name__}")

[I 2023-09-21 11:36:16,336] A new study created in memory with name: no-name-33cdf731-bcde-4ea9-8343-dd293dfb2b3f


Sampler is: TPESampler


To use a different sampler, such as `RandomSampler` or `CmaEsSampler`:

In [17]:
study = optuna.create_study(sampler = optuna.samplers.RandomSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

study = optuna.create_study(sampler = optuna.samplers.CmaEsSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

[I 2023-09-21 11:36:16,377] A new study created in memory with name: no-name-9f849da8-8783-4f14-87d5-0a998d9ccd77
[I 2023-09-21 11:36:16,385] A new study created in memory with name: no-name-804fdb6f-bac6-45b3-859f-7eb547c6240d


Sampler is RandomSampler
Sampler is CmaEsSampler


### Pruning Algorithms

`Pruners` automatically stop unpromising trials at the early stages of training (also known as automated *early stopping*).

Optuna offers the following pruning algorithms:

 - Median pruning algorithm, implemented in `MedianPruner`
 - Non-pruning algorithm, implemented in `NopPruner`
 - Algorithm to operate a pruner with tolerance, implemented in `PatientPruner`
 - Algorithm to prune a specified percentile of trials, implemented in `PercentilePruner`
 - Asynchronous Successive Halving algorithm, implemented in `SuccessiveHalvingPruner`
 - Hyperband algorithm, implemented in `HyperbandPruner`
 - Threshold pruning algorithm, implemented in `ThresholdPruner`
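
As an illustration of the idea behind successive halving, the sketch below evaluates a set of candidate configurations with a small budget, keeps the best fraction, and repeats with a larger budget. It is a simplified synchronous version written for this text, not Optuna's `SuccessiveHalvingPruner`, which is asynchronous. `toy_loss` and `successive_halving` are hypothetical names, and the loss function is invented for the example.

```python
def toy_loss(config, budget):
    # Hypothetical loss: approaches config["quality"] as the budget grows.
    return config["quality"] + 1.0 / budget

def successive_halving(configs, min_budget=1, reduction_factor=2):
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Evaluate every surviving configuration at the current budget.
        ranked = sorted(survivors, key=lambda c: toy_loss(c, budget))
        # Keep the best 1 / reduction_factor of them (at least one).
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
        # Give the survivors more resources in the next round.
        budget *= reduction_factor
    return survivors[0]

configs = [
    {"name": f"config_{i}", "quality": q}
    for i, q in enumerate([0.9, 0.3, 0.5, 0.1, 0.7, 0.2, 0.8, 0.4])
]
best = successive_halving(configs)
print(best["name"])
```

Most of the budget is spent on the few configurations that survive the early rounds, which is the source of the efficiency gain.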


### Activating Pruners

To turn on pruning, we need to call `report()` and `should_prune()` after each step of the iterative training. `report()` periodically records the intermediate objective values. `should_prune()` decides whether the trial should be terminated because it does not meet a predefined condition.
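
The kind of decision made by `should_prune()` can be sketched with a simplified median rule: a trial is stopped at a given step if its reported value is worse than the median of the values that earlier trials reported at the same step. This is a plain-Python illustration assuming minimization. `should_prune_median` is a hypothetical helper, and Optuna's `MedianPruner` has additional options (such as startup trials and warm-up steps) that are omitted here.

```python
import statistics

def should_prune_median(history, step, value):
    """Return True if `value` is worse (higher) than the median at `step`.

    `history` maps a trial id to the list of intermediate values that trial
    reported, indexed by step.
    """
    previous = [values[step] for values in history.values() if len(values) > step]
    if not previous:
        return False  # Nothing to compare against yet.
    return value > statistics.median(previous)

# Intermediate validation errors reported by three earlier trials.
history = {
    0: [0.40, 0.30, 0.20],
    1: [0.50, 0.35, 0.25],
    2: [0.45, 0.20, 0.10],
}

# The median of the earlier values at step 1 is 0.30.
print(should_prune_median(history, step=1, value=0.60))
print(should_prune_median(history, step=1, value=0.25))
```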


In [18]:
import logging
import sys 

import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection

def objective(trial):

    iris = sklearn.datasets.load_iris()

    classes = list(set(iris.target))

    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size = 0.25, random_state = 42 
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log = True)

    clf = sklearn.linear_model.SGDClassifier(alpha = alpha)

    for step in range(100):

        clf.partial_fit(train_x, train_y, classes = classes)

        # Report the intermediate objective value

        intermediate_value = 1.0 - clf.score(valid_x, valid_y)

        trial.report(intermediate_value, step)

        # Prune based on the intermediate value

        if trial.should_prune():

            raise optuna.TrialPruned()
        
    return 1.0 - clf.score(valid_x, valid_y)

In [19]:
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))

study = optuna.create_study(pruner = optuna.pruners.MedianPruner())

study.optimize(objective, n_trials = 25)

[I 2023-09-21 11:36:16,572] A new study created in memory with name: no-name-04528ff5-be5f-45a3-a66f-7a699d429c2d


A new study created in memory with name: no-name-04528ff5-be5f-45a3-a66f-7a699d429c2d


[I 2023-09-21 11:36:16,836] Trial 0 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.001583681405390258}. Best is trial 0 with value: 0.052631578947368474.


Trial 0 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.001583681405390258}. Best is trial 0 with value: 0.052631578947368474.


[I 2023-09-21 11:36:17,096] Trial 1 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.007850586105704724}. Best is trial 1 with value: 0.02631578947368418.


Trial 1 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.007850586105704724}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:17,359] Trial 2 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.0011859014470015467}. Best is trial 1 with value: 0.02631578947368418.


Trial 2 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.0011859014470015467}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:17,603] Trial 3 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.03150859209518395}. Best is trial 1 with value: 0.02631578947368418.


Trial 3 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.03150859209518395}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:17,850] Trial 4 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.0036598653895681603}. Best is trial 1 with value: 0.02631578947368418.


Trial 4 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.0036598653895681603}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:17,850] Trial 5 pruned. 


Trial 5 pruned. 


[I 2023-09-21 11:36:17,881] Trial 6 pruned. 


Trial 6 pruned. 


[I 2023-09-21 11:36:18,131] Trial 7 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.008892554126735792}. Best is trial 1 with value: 0.02631578947368418.


Trial 7 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.008892554126735792}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:18,165] Trial 8 pruned. 


Trial 8 pruned. 


[I 2023-09-21 11:36:18,480] Trial 9 finished with value: 0.1842105263157895 and parameters: {'alpha': 0.043199806881960554}. Best is trial 1 with value: 0.02631578947368418.


Trial 9 finished with value: 0.1842105263157895 and parameters: {'alpha': 0.043199806881960554}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:18,495] Trial 10 pruned. 


Trial 10 pruned. 


[I 2023-09-21 11:36:18,758] Trial 11 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.015909739038452074}. Best is trial 1 with value: 0.02631578947368418.


Trial 11 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.015909739038452074}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:19,006] Trial 12 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.08559381206196237}. Best is trial 1 with value: 0.02631578947368418.


Trial 12 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.08559381206196237}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:19,022] Trial 13 pruned. 


Trial 13 pruned. 


[I 2023-09-21 11:36:19,278] Trial 14 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.005347413609214803}. Best is trial 1 with value: 0.02631578947368418.


Trial 14 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.005347413609214803}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:19,317] Trial 15 pruned. 


Trial 15 pruned. 


[I 2023-09-21 11:36:19,562] Trial 16 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.027410475900033985}. Best is trial 1 with value: 0.02631578947368418.


Trial 16 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.027410475900033985}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:19,818] Trial 17 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.004723407285569162}. Best is trial 1 with value: 0.02631578947368418.


Trial 17 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.004723407285569162}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:20,074] Trial 18 finished with value: 0.21052631578947367 and parameters: {'alpha': 0.09552054491541963}. Best is trial 1 with value: 0.02631578947368418.


Trial 18 finished with value: 0.21052631578947367 and parameters: {'alpha': 0.09552054491541963}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:20,331] Trial 19 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.030960934313489924}. Best is trial 1 with value: 0.02631578947368418.


Trial 19 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.030960934313489924}. Best is trial 1 with value: 0.02631578947368418.


[I 2023-09-21 11:36:20,619] Trial 20 finished with value: 0.0 and parameters: {'alpha': 0.010216154422302231}. Best is trial 20 with value: 0.0.


Trial 20 finished with value: 0.0 and parameters: {'alpha': 0.010216154422302231}. Best is trial 20 with value: 0.0.


[I 2023-09-21 11:36:20,656] Trial 21 pruned. 


Trial 21 pruned. 


[I 2023-09-21 11:36:20,665] Trial 22 pruned. 


Trial 22 pruned. 


[I 2023-09-21 11:36:21,037] Trial 23 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.03829470396119289}. Best is trial 20 with value: 0.0.


Trial 23 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.03829470396119289}. Best is trial 20 with value: 0.0.


[I 2023-09-21 11:36:21,321] Trial 24 finished with value: 0.26315789473684215 and parameters: {'alpha': 0.0026536373200147564}. Best is trial 20 with value: 0.0.


Trial 24 finished with value: 0.26315789473684215 and parameters: {'alpha': 0.0026536373200147564}. Best is trial 20 with value: 0.0.


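As the log shows, several trials were pruned before completing all iterations. Each message appears twice because the `StreamHandler` added above echoes Optuna's log to stdout in addition to the default handler.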

### Which sampler and pruner should be used?

For tasks **other than** deep learning, the following is recommended:

 - For `RandomSampler`, `MedianPruner` is the best.
 - For `TPESampler`, `HyperbandPruner` is the best.

For deep learning tasks:

<table>
    <tr>
        <th>Parallel compute resource</th>
        <th>Categorical/conditional hyperparameters</th>
        <th>Recommended algorithms</th>
    </tr>
    <tr>
        <td rowspan="2">Limited</td>
        <td>No</td>
        <td>TPE; GP-EI if the search space is low-dimensional and continuous</td>
    </tr>
    <tr>
        <td>Yes</td>
        <td>TPE; GP-EI if the search space is low-dimensional and continuous</td>
    </tr>
    <tr>
        <td rowspan="2">Sufficient</td>
        <td>No</td>
        <td>CMA-ES, Random Search</td>
    </tr>
    <tr>
        <td>Yes</td>
        <td>Random Search or Genetic Algorithm</td>
    </tr>
</table>

## Visualization for hyperparameter optimization analysis


Optuna offers several visualization functions in the `optuna.visualization` module for analyzing optimization results visually.

To demonstrate them, we will visualize the optimization history of a `lightgbm` model on the `breast cancer` dataset.

In [20]:
#import optuna
import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

# Matplotlib can be used instead of Plotly for visualization by replacing 'optuna.visualization'
# with 'optuna.visualization.matplotlib'

from optuna.visualization import plot_contour
from optuna.visualization import plot_edf
from optuna.visualization import plot_intermediate_values
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_parallel_coordinate
from optuna.visualization import plot_param_importances
from optuna.visualization import plot_rank
from optuna.visualization import plot_slice
from optuna.visualization import plot_timeline

In [21]:
SEED = 42
np.random.seed(SEED)

def objective(trial):

    data, target = sklearn.datasets.load_breast_cancer(return_X_y = True)

    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)

    dtrain = lgb.Dataset(train_x, label = train_y)
    dvalid = lgb.Dataset(valid_x, label = valid_y)

    param = {
        "objective": "binary",
        "metric": "auc",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    # Callback for pruning

    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "auc")

    gbm = lgb.train(param, dtrain, valid_sets = [dvalid], callbacks = [pruning_callback])

    preds = gbm.predict(valid_x)

    pred_labels = np.rint(preds)

    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)

    return accuracy

In [22]:
study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=SEED),
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=10),
)

study.optimize(objective, n_trials=100, timeout=600)

[I 2023-09-21 11:36:22,036] A new study created in memory with name: no-name-5e62f812-e50e-44c9-aca4-4e78b4154ca1


A new study created in memory with name: no-name-5e62f812-e50e-44c9-aca4-4e78b4154ca1


[I 2023-09-21 11:36:22,101] Trial 0 finished with value: 0.972027972027972 and parameters: {'bagging_fraction': 0.6247240713084175, 'bagging_freq': 7, 'min_child_samples': 75}. Best is trial 0 with value: 0.972027972027972.


Trial 0 finished with value: 0.972027972027972 and parameters: {'bagging_fraction': 0.6247240713084175, 'bagging_freq': 7, 'min_child_samples': 75}. Best is trial 0 with value: 0.972027972027972.


[I 2023-09-21 11:36:22,229] Trial 1 finished with value: 0.972027972027972 and parameters: {'bagging_fraction': 0.759195090518222, 'bagging_freq': 2, 'min_child_samples': 19}. Best is trial 0 with value: 0.972027972027972.


Trial 1 finished with value: 0.972027972027972 and parameters: {'bagging_fraction': 0.759195090518222, 'bagging_freq': 2, 'min_child_samples': 19}. Best is trial 0 with value: 0.972027972027972.


[I 2023-09-21 11:36:22,278] Trial 2 finished with value: 0.958041958041958 and parameters: {'bagging_fraction': 0.4348501673009197, 'bagging_freq': 7, 'min_child_samples': 62}. Best is trial 0 with value: 0.972027972027972.


Trial 2 finished with value: 0.958041958041958 and parameters: {'bagging_fraction': 0.4348501673009197, 'bagging_freq': 7, 'min_child_samples': 62}. Best is trial 0 with value: 0.972027972027972.


[I 2023-09-21 11:36:22,326] Trial 3 finished with value: 0.9790209790209791 and parameters: {'bagging_fraction': 0.8248435466776274, 'bagging_freq': 1, 'min_child_samples': 98}. Best is trial 3 with value: 0.9790209790209791.


Trial 3 finished with value: 0.9790209790209791 and parameters: {'bagging_fraction': 0.8248435466776274, 'bagging_freq': 1, 'min_child_samples': 98}. Best is trial 3 with value: 0.9790209790209791.


[I 2023-09-21 11:36:22,449] Trial 4 finished with value: 0.958041958041958 and parameters: {'bagging_fraction': 0.899465584480253, 'bagging_freq': 2, 'min_child_samples': 22}. Best is trial 3 with value: 0.9790209790209791.


Trial 4 finished with value: 0.958041958041958 and parameters: {'bagging_fraction': 0.899465584480253, 'bagging_freq': 2, 'min_child_samples': 22}. Best is trial 3 with value: 0.9790209790209791.


[I 2023-09-21 11:36:22,528] Trial 5 finished with value: 0.993006993006993 and parameters: {'bagging_fraction': 0.5100427059120604, 'bagging_freq': 3, 'min_child_samples': 55}. Best is trial 5 with value: 0.993006993006993.


Trial 5 finished with value: 0.993006993006993 and parameters: {'bagging_fraction': 0.5100427059120604, 'bagging_freq': 3, 'min_child_samples': 55}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:22,563] Trial 6 pruned. Trial was pruned at iteration 10.


Trial 6 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:22,591] Trial 7 pruned. Trial was pruned at iteration 10.


Trial 7 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:22,622] Trial 8 pruned. Trial was pruned at iteration 14.


Trial 8 pruned. Trial was pruned at iteration 14.


[I 2023-09-21 11:36:22,662] Trial 9 pruned. Trial was pruned at iteration 18.


Trial 9 pruned. Trial was pruned at iteration 18.


[I 2023-09-21 11:36:22,725] Trial 10 pruned. Trial was pruned at iteration 28.


Trial 10 pruned. Trial was pruned at iteration 28.


[I 2023-09-21 11:36:22,772] Trial 11 pruned. Trial was pruned at iteration 10.


Trial 11 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:22,881] Trial 12 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.9733907188988407, 'bagging_freq': 1, 'min_child_samples': 98}. Best is trial 5 with value: 0.993006993006993.


Trial 12 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.9733907188988407, 'bagging_freq': 1, 'min_child_samples': 98}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:22,923] Trial 13 pruned. Trial was pruned at iteration 10.


Trial 13 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:22,971] Trial 14 pruned. Trial was pruned at iteration 10.


Trial 14 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,032] Trial 15 pruned. Trial was pruned at iteration 10.


Trial 15 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,082] Trial 16 pruned. Trial was pruned at iteration 10.


Trial 16 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,142] Trial 17 pruned. Trial was pruned at iteration 10.


Trial 17 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,260] Trial 18 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.9905077333938093, 'bagging_freq': 2, 'min_child_samples': 54}. Best is trial 5 with value: 0.993006993006993.


Trial 18 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.9905077333938093, 'bagging_freq': 2, 'min_child_samples': 54}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:23,311] Trial 19 pruned. Trial was pruned at iteration 10.


Trial 19 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,362] Trial 20 pruned. Trial was pruned at iteration 10.


Trial 20 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,466] Trial 21 pruned. Trial was pruned at iteration 70.


Trial 21 pruned. Trial was pruned at iteration 70.


[I 2023-09-21 11:36:23,523] Trial 22 pruned. Trial was pruned at iteration 10.


Trial 22 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,576] Trial 23 pruned. Trial was pruned at iteration 10.


Trial 23 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,631] Trial 24 pruned. Trial was pruned at iteration 10.


Trial 24 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,823] Trial 25 finished with value: 0.9790209790209791 and parameters: {'bagging_fraction': 0.9539760625736883, 'bagging_freq': 1, 'min_child_samples': 5}. Best is trial 5 with value: 0.993006993006993.


Trial 25 finished with value: 0.9790209790209791 and parameters: {'bagging_fraction': 0.9539760625736883, 'bagging_freq': 1, 'min_child_samples': 5}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:23,875] Trial 26 pruned. Trial was pruned at iteration 10.


Trial 26 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,923] Trial 27 pruned. Trial was pruned at iteration 10.


Trial 27 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:23,978] Trial 28 pruned. Trial was pruned at iteration 11.


[I 2023-09-21 11:36:24,040] Trial 29 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,087] Trial 30 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,150] Trial 31 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,213] Trial 32 pruned. Trial was pruned at iteration 18.


[I 2023-09-21 11:36:24,261] Trial 33 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,324] Trial 34 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,386] Trial 35 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,449] Trial 36 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,512] Trial 37 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,574] Trial 38 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,638] Trial 39 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,701] Trial 40 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,756] Trial 41 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,824] Trial 42 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:24,893] Trial 43 pruned. Trial was pruned at iteration 18.


[I 2023-09-21 11:36:25,025] Trial 44 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.8823409487656906, 'bagging_freq': 1, 'min_child_samples': 48}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:25,080] Trial 45 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,136] Trial 46 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,207] Trial 47 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,262] Trial 48 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,325] Trial 49 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,380] Trial 50 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,443] Trial 51 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,506] Trial 52 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,569] Trial 53 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,633] Trial 54 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,764] Trial 55 finished with value: 0.986013986013986 and parameters: {'bagging_fraction': 0.973380947872035, 'bagging_freq': 2, 'min_child_samples': 39}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:25,827] Trial 56 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,890] Trial 57 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:25,946] Trial 58 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,017] Trial 59 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,064] Trial 60 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,143] Trial 61 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,190] Trial 62 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,261] Trial 63 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,325] Trial 64 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,380] Trial 65 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,449] Trial 66 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,511] Trial 67 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,575] Trial 68 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,625] Trial 69 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,685] Trial 70 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,754] Trial 71 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,817] Trial 72 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,870] Trial 73 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,933] Trial 74 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:26,996] Trial 75 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,059] Trial 76 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,125] Trial 77 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,191] Trial 78 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,254] Trial 79 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,317] Trial 80 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,388] Trial 81 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,570] Trial 82 finished with value: 0.9790209790209791 and parameters: {'bagging_fraction': 0.818345732780162, 'bagging_freq': 3, 'min_child_samples': 22}. Best is trial 5 with value: 0.993006993006993.


[I 2023-09-21 11:36:27,633] Trial 83 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,696] Trial 84 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,759] Trial 85 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,826] Trial 86 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,901] Trial 87 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:27,972] Trial 88 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,035] Trial 89 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,098] Trial 90 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,177] Trial 91 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,245] Trial 92 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,308] Trial 93 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,379] Trial 94 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,442] Trial 95 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,509] Trial 96 pruned. Trial was pruned at iteration 21.


[I 2023-09-21 11:36:28,588] Trial 97 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,641] Trial 98 pruned. Trial was pruned at iteration 10.


[I 2023-09-21 11:36:28,704] Trial 99 pruned. Trial was pruned at iteration 10.


### Plotting functions

To visualize the optimization history:

In [23]:
plot_optimization_history(study)

To view the learning curves of the trials:

In [24]:
plot_intermediate_values(study)

To visualize high-dimensional parameter relationships:

In [25]:
plot_parallel_coordinate(study)

Or, to visualize only a few selected parameters:

In [26]:
plot_parallel_coordinate(study, params = ["bagging_freq", "bagging_fraction"])

To visualize the relationships between hyperparameters as contour plots, first for all of them and then for a selected pair:

In [27]:
plot_contour(study)

In [28]:
plot_contour(study, params = ["bagging_freq", "bagging_fraction"])

To visualize individual hyperparameters as a slice plot:

In [29]:
plot_slice(study)

To visualize the importance of each hyperparameter:

In [30]:
plot_param_importances(study)

To find out which hyperparameters affect the duration of a trial, we can compute hyperparameter importance with the trial duration as the target:

In [31]:
optuna.visualization.plot_param_importances(study, 
                                            target = lambda t: t.duration.total_seconds(),
                                            target_name = "duration")
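As a small illustration of what the `target` callable above returns, the sketch below computes a duration in seconds from two `datetime` values, in the same way that `t.duration.total_seconds()` does for a trial whose `duration` is a `datetime.timedelta`. The timestamps are sample values only (they match the start and completion times reported for trial 6 of the MNIST example later in this document), and `trial_duration_seconds` is a hypothetical helper name, not part of Optuna.

```python
from datetime import datetime


def trial_duration_seconds(start: datetime, complete: datetime) -> float:
    """Return the elapsed wall-clock time between two timestamps, in seconds."""
    # Subtracting two datetimes yields a timedelta;
    # total_seconds() converts that timedelta to a float.
    return (complete - start).total_seconds()


# Sample timestamps, assumed here purely for illustration.
start = datetime(2023, 9, 21, 11, 42, 14, 592961)
complete = datetime(2023, 9, 21, 11, 43, 16, 769889)

duration = trial_duration_seconds(start, complete)
print(f"{duration:.6f} s")  # 62.176928 s
```

Passing a callable like this as `target` replaces the objective value with the duration, so the importances describe what makes a trial slow rather than what makes it accurate.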

To visualize the relationships between parameters with scatter plots colored by objective value:

In [32]:
plot_rank(study)


plot_rank is experimental (supported from v3.2.0). The interface can change in the future.



And finally, to visualize the timeline of the optimization across the trials that were run:

In [33]:
plot_timeline(study)


plot_timeline is experimental (supported from v3.2.0). The interface can change in the future.



## Example with TensorFlow: MNIST

Let's put what we have learned about Optuna into practice.

In [34]:
import optuna
import tensorflow as tf
import mlflow
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical

We will work through the classic example of classifying images of handwritten digits, using the MNIST dataset.

We load the data already split into training and test sets, scale the pixel values to the range [0, 1], and convert the labels to a categorical representation (*one-hot encoding*).

In [35]:
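# 1. Load the MNIST dataset (already split into training and test sets)

(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()

# 2. Scale pixel values from [0, 255] to [0, 1]

x_train, x_test = x_train/255., x_test/255.

# 3. Convert the integer labels to one-hot vectors

y_train, y_test = to_categorical(y_train), to_categorical(y_test)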
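To make the preprocessing concrete, here is a minimal pure-Python sketch of the two transformations applied above: scaling pixel values from [0, 255] to [0, 1] and turning an integer label into a one-hot vector. `scale_pixels` and `one_hot` are hypothetical helper names used only for illustration; the notebook itself relies on NumPy division and Keras `to_categorical`.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label must be in [0, {num_classes}), got {label}")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The one-hot form is what the `categorical_crossentropy` loss expects: one target value per output class.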

Next, we define the function `create_model()`, which receives the `trial` object and then builds and compiles a neural network from a set of hyperparameters.

The `trial` object suggests the values to use during the study.

In [36]:
# Define the create_model function

def create_model(trial):

    model = models.Sequential()

    model.add(layers.Flatten(input_shape = (28, 28)))

    # Optuna suggests: number of hidden layers

    n_layers = trial.suggest_int("n_layers", 1, 5)

    # Optuna suggests: number of units in each hidden layer

    for i in range(n_layers):

        n_units = trial.suggest_int(f"n_units_l{i}", 32, 128)

        model.add(layers.Dense(n_units, activation = 'relu'))

    model.add(layers.Dense(10, activation = 'softmax'))

    # Optuna suggests: learning rate (sampled on a log scale)

    lr = trial.suggest_float("lr", 1e-5, 1e-1, log = True)

    # Optuna suggests: optimizer

    optimizer_options = ["sgd", "adam", "rmsprop"]

    optimizer_selected = trial.suggest_categorical("optimizer", optimizer_options)

    if optimizer_selected == "adam":

        optimizer = tf.keras.optimizers.Adam(learning_rate = lr)

    elif optimizer_selected == "sgd":

        optimizer = tf.keras.optimizers.SGD(learning_rate = lr)

    elif optimizer_selected == "rmsprop":

        optimizer = tf.keras.optimizers.RMSprop(learning_rate = lr)

    # Compile the model

    model.compile(optimizer = optimizer,
                  loss = "categorical_crossentropy",
                  metrics = ["accuracy"])

    return model
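The learning rate above is requested with `log = True`, which means values are drawn from the log domain, so each order of magnitude between `1e-5` and `1e-1` is about equally likely. The sketch below illustrates that idea with the standard library only. `sample_log_uniform` is a hypothetical helper, and this is not how Optuna's samplers are implemented (its default sampler also adapts its proposals to the results of earlier trials).

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(10_000)]

# 1e-3 is the geometric midpoint of [1e-5, 1e-1], so about half of the
# samples should fall below it. A plain uniform draw on the same interval
# would put only about 1% of the samples there.
below_mid = sum(s < 1e-3 for s in samples) / len(samples)
print(f"fraction below 1e-3: {below_mid:.2f}")
```

Searching on a log scale is a common choice for learning rates, because changing the order of magnitude usually matters far more than changing the second significant digit.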

As before, we also define the `objective` function, which specifies the quantity to optimize: here, the loss of the trained model on the test set.

In [37]:
def objective(trial):

    modelo = create_model(trial)

    modelo.fit(x_train, y_train, epochs = 5, batch_size = 64, verbose = 0)

    loss, accuracy = modelo.evaluate(x_test, y_test, verbose = 0)
        
    return loss
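Note that the objective above scores each trial on `x_test`, so the hyperparameters are selected against the same data that would later be used to report test performance. A common alternative is to hold out part of the training data for validation and keep the test set for a final check. The sketch below shows one way to produce such a split with the standard library. `holdout_indices`, the 10% fraction, and the seed are assumptions for illustration and are not part of the original notebook.

```python
import random


def holdout_indices(n_samples, val_fraction=0.1, seed=0):
    """Split range(n_samples) into shuffled (train, validation) index lists."""
    indices = list(range(n_samples))
    # Shuffle with a dedicated, seeded generator so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# The MNIST training set has 60,000 images.
train_idx, val_idx = holdout_indices(60_000, val_fraction=0.1)
print(len(train_idx), len(val_idx))  # 54000 6000
```

With NumPy arrays the index lists can be used directly, for example `x_train[val_idx]` and `y_train[val_idx]`, and the objective can then return the validation loss instead of the test loss.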

Finally, we create a `study` object with `direction = "minimize"`, which tells Optuna that the goal is to minimize the value returned by `objective()`, and we run 15 trials with the suggested hyperparameters to search for the best configuration.

In [38]:

study = optuna.create_study(direction = "minimize")

study.optimize(objective, n_trials = 15)

[I 2023-09-21 11:36:31,862] A new study created in memory with name: no-name-e7ea1e13-f33f-4bae-b8e0-aae9fe57af4c


[I 2023-09-21 11:37:35,466] Trial 0 finished with value: 0.13771341741085052 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 48, 'lr': 0.007711398857368533, 'optimizer': 'adam'}. Best is trial 0 with value: 0.13771341741085052.


[I 2023-09-21 11:38:35,001] Trial 1 finished with value: 0.12340866029262543 and parameters: {'n_layers': 2, 'n_units_l0': 70, 'n_units_l1': 41, 'lr': 0.003012808172117605, 'optimizer': 'rmsprop'}. Best is trial 1 with value: 0.12340866029262543.


[I 2023-09-21 11:39:13,948] Trial 2 finished with value: 0.1751943975687027 and parameters: {'n_layers': 3, 'n_units_l0': 124, 'n_units_l1': 48, 'n_units_l2': 58, 'lr': 0.005442268439829251, 'optimizer': 'rmsprop'}. Best is trial 1 with value: 0.12340866029262543.


[I 2023-09-21 11:40:18,749] Trial 3 finished with value: 0.11885558068752289 and parameters: {'n_layers': 3, 'n_units_l0': 122, 'n_units_l1': 97, 'n_units_l2': 66, 'lr': 0.00013421360666602302, 'optimizer': 'adam'}. Best is trial 3 with value: 0.11885558068752289.


[I 2023-09-21 11:41:25,615] Trial 4 finished with value: 0.9541876316070557 and parameters: {'n_layers': 3, 'n_units_l0': 37, 'n_units_l1': 43, 'n_units_l2': 65, 'lr': 0.05696244653419064, 'optimizer': 'rmsprop'}. Best is trial 3 with value: 0.11885558068752289.


[I 2023-09-21 11:42:14,592] Trial 5 finished with value: 0.09808095544576645 and parameters: {'n_layers': 3, 'n_units_l0': 78, 'n_units_l1': 126, 'n_units_l2': 60, 'lr': 0.05752434764095089, 'optimizer': 'sgd'}. Best is trial 5 with value: 0.09808095544576645.


[I 2023-09-21 11:43:16,771] Trial 6 finished with value: 0.08969108015298843 and parameters: {'n_layers': 4, 'n_units_l0': 119, 'n_units_l1': 86, 'n_units_l2': 112, 'n_units_l3': 91, 'lr': 0.00024040850246942775, 'optimizer': 'adam'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:44:07,795] Trial 7 finished with value: 0.37900349497795105 and parameters: {'n_layers': 2, 'n_units_l0': 114, 'n_units_l1': 51, 'lr': 0.0023572885912189617, 'optimizer': 'sgd'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:45:08,302] Trial 8 finished with value: 0.1021626815199852 and parameters: {'n_layers': 5, 'n_units_l0': 116, 'n_units_l1': 52, 'n_units_l2': 75, 'n_units_l3': 33, 'n_units_l4': 99, 'lr': 0.0005769984149991314, 'optimizer': 'adam'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:45:46,243] Trial 9 finished with value: 0.6329582333564758 and parameters: {'n_layers': 1, 'n_units_l0': 38, 'lr': 0.08245600062993334, 'optimizer': 'adam'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:46:43,635] Trial 10 finished with value: 0.3321899473667145 and parameters: {'n_layers': 5, 'n_units_l0': 99, 'n_units_l1': 82, 'n_units_l2': 122, 'n_units_l3': 110, 'n_units_l4': 32, 'lr': 1.785199667793032e-05, 'optimizer': 'adam'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:47:35,436] Trial 11 finished with value: 1.9311206340789795 and parameters: {'n_layers': 4, 'n_units_l0': 72, 'n_units_l1': 127, 'n_units_l2': 113, 'n_units_l3': 81, 'lr': 0.00037332091663808175, 'optimizer': 'sgd'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:48:28,832] Trial 12 finished with value: 0.1834155023097992 and parameters: {'n_layers': 4, 'n_units_l0': 88, 'n_units_l1': 125, 'n_units_l2': 33, 'n_units_l3': 127, 'lr': 0.018851800842778852, 'optimizer': 'sgd'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:49:21,509] Trial 13 finished with value: 0.14339764416217804 and parameters: {'n_layers': 4, 'n_units_l0': 55, 'n_units_l1': 104, 'n_units_l2': 96, 'n_units_l3': 77, 'lr': 0.022091351249891724, 'optimizer': 'sgd'}. Best is trial 6 with value: 0.08969108015298843.


[I 2023-09-21 11:50:18,432] Trial 14 finished with value: 0.7806804180145264 and parameters: {'n_layers': 4, 'n_units_l0': 85, 'n_units_l1': 71, 'n_units_l2': 96, 'n_units_l3': 81, 'lr': 0.0009877185587115079, 'optimizer': 'sgd'}. Best is trial 6 with value: 0.08969108015298843.


In [39]:
print("Number of finished trials: ", len(study.trials))

trial = study.best_trial

print("Best trial: ", trial)

print("Value: ", trial.value)
print("Hyperparameters: ", trial.params)

Number of finished trials:  15
Best trial:  FrozenTrial(number=6, state=TrialState.COMPLETE, values=[0.08969108015298843], datetime_start=datetime.datetime(2023, 9, 21, 11, 42, 14, 592961), datetime_complete=datetime.datetime(2023, 9, 21, 11, 43, 16, 769889), params={'n_layers': 4, 'n_units_l0': 119, 'n_units_l1': 86, 'n_units_l2': 112, 'n_units_l3': 91, 'lr': 0.00024040850246942775, 'optimizer': 'adam'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_layers': IntDistribution(high=5, log=False, low=1, step=1), 'n_units_l0': IntDistribution(high=128, log=False, low=32, step=1), 'n_units_l1': IntDistribution(high=128, log=False, low=32, step=1), 'n_units_l2': IntDistribution(high=128, log=False, low=32, step=1), 'n_units_l3': IntDistribution(high=128, log=False, low=32, step=1), 'lr': FloatDistribution(high=0.1, log=True, low=1e-05, step=None), 'optimizer': CategoricalDistribution(choices=('sgd', 'adam', 'rmsprop'))}, trial_id=6, value=None)
Value:  0.08969108015298843
Hyperparameters:  {'n_layers': 4, 'n_units_l0': 119, 'n_units_l1': 86, 'n_units_l2': 112, 'n_units_l3': 91, 'lr': 0.00024040850246942775, 'optimizer': 'adam'}
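With `direction = "minimize"`, the best trial is the one with the smallest objective value, and the "Best is trial ..." part of each log line tracks the running minimum. The following sketch reproduces that bookkeeping in plain Python from the 15 values logged above; `running_best` is a hypothetical helper name, not an Optuna function.

```python
# Objective values (test loss) of trials 0..14, copied from the log above.
values = [
    0.13771341741085052, 0.12340866029262543, 0.1751943975687027,
    0.11885558068752289, 0.9541876316070557, 0.09808095544576645,
    0.08969108015298843, 0.37900349497795105, 0.1021626815199852,
    0.6329582333564758, 0.3321899473667145, 1.9311206340789795,
    0.1834155023097992, 0.14339764416217804, 0.7806804180145264,
]


def running_best(values):
    """For each trial, return the index of the best (lowest) trial so far."""
    best_index = 0
    history = []
    for i, value in enumerate(values):
        if value < values[best_index]:
            best_index = i
        history.append(best_index)
    return history


history = running_best(values)
print(history)  # [0, 1, 1, 3, 3, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6]
print("best trial:", history[-1], "value:", values[history[-1]])
```

The last entry of `history` is the trial that `study.best_trial` reports: no trial after trial 6 improved on its loss.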

In [40]:
plot_optimization_history(study)

In [41]:
plot_parallel_coordinate(study)

In [42]:
plot_slice(study)

In [43]:
plot_param_importances(study)

In [44]:
plot_rank(study)


plot_rank is experimental (supported from v3.2.0). The interface can change in the future.

