# **TP HPO : Hyperparameter Optimization in Deep Learning**

* Author : IDRIS-CNRS, Guerda.K, Hunout.L
* Version : 2024-03
* License : CC BY-NC-SA 4.0

## **Initial Setup and Instructions**
This section prepares the notebook environment for our exercises and introduces key tools we'll use throughout.


### Auto-Reloading

To ensure any changes in imported modules are automatically updated in the notebook, we enable auto-reloading. This is particularly useful during development, as it saves time that would otherwise be spent restarting the kernel.


In [None]:
%load_ext autoreload
%autoreload 2

### Understanding `ToDo`

Throughout this notebook, you'll encounter placeholders marked with `ToDo`. This indicates a section where you, the student, need to complete or modify the code. The `ToDo` class is designed to remind you of tasks that need to be addressed to proceed with the exercises. It's a prompt for active participation in the learning process.

Whenever you see a `ToDo` in the code, it's your cue to engage with the material by writing or modifying code. This interactive approach helps reinforce learning through practice.

In [None]:
class ToDo:
    def __init__(self, message=None):
        if message is None:
            self.message = "This part of the code needs to be completed by the student."
        else:
            self.message = message

    def __call__(self, *args, **kwargs):
        print("TODO: Complete this part of the code.")
        raise NotImplementedError(self.message)

    # Example of an additional method that could be useful in the future
    def hint(self):
        print(f"Hint: {self.message}")

# Usage
todo = ToDo()

## **Introduction to Hyperparameter Optimization**

Hyperparameter Optimization (HPO) is a critical process in developing deep learning models. It involves finding the most effective combination of hyperparameters, which are the configuration settings used to structure the learning process of a model. Unlike model parameters, which are learned during training, hyperparameters are set prior to the learning process and can significantly impact the performance of the model.
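
A toy illustration of this distinction, assuming a one-parameter quadratic loss in place of a real model (the function `train` and its arguments are hypothetical names used only for this sketch): the weight `w` is a parameter updated by gradient descent, while the learning rate and the number of steps are hyperparameters fixed before training starts.

```python
def train(learning_rate, n_steps=50, w_init=0.0):
    """Minimize the toy loss (w - 3)^2 with plain gradient descent.

    `w` is a model parameter (learned); `learning_rate` and `n_steps`
    are hyperparameters (chosen before training).
    """
    w = w_init
    for _ in range(n_steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # parameter update
    return w, (w - 3.0) ** 2        # learned parameter and final loss


# Same toy model, different hyperparameter values
for lr in (0.01, 0.1, 1.1):
    w, loss = train(learning_rate=lr)
    print(f"learning_rate={lr}: w={w:.4g}, loss={loss:.4g}")
```

With these values, the smallest learning rate converges slowly, the middle one reaches a value of `w` close to 3, and the largest one diverges. Choosing among such settings is the kind of decision HPO automates.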

### Goals of HPO
- **Maximize Performance**: Optimize model accuracy, precision, recall, or any other relevant metric.
- **Efficiency**: Reduce training time and computational resources without compromising performance.
- **Generalization**: Enhance the model's ability to perform well on unseen data.


## Overview of HPO Frameworks

In the realm of machine learning and deep learning, several frameworks are available for hyperparameter optimization (HPO). Each of these frameworks offers unique features and capabilities. Among the most notable are Optuna, Ray Tune, Hyperopt, and Scikit-Optimize. However, in this notebook, we will primarily focus on Optuna and Ray Tune due to their specific advantages and relevance in different scenarios.

### Optuna

- **Overview**: Optuna is a versatile and user-friendly open-source optimization framework specifically tailored for machine learning. Known for its efficiency and ease of use, Optuna is particularly well-suited for individuals and teams starting their journey with HPO. 
- **Why We Focus on Optuna**: We choose Optuna for its intuitive API, efficient optimization algorithms, and excellent visualization capabilities, making it ideal for educational purposes and straightforward HPO tasks.
- [Optuna Website](https://optuna.org/)

### Ray Tune

- **Overview**: Ray Tune is a powerful component of the Ray ecosystem, designed for distributed hyperparameter tuning. It is especially useful for handling large-scale, computationally intensive HPO tasks.
- **Why Ray Tune is an Interesting Choice**: We include Ray Tune due to its scalability, ability to leverage distributed computing resources effectively, and integration with various machine learning frameworks, making it suitable for more advanced, large-scale HPO scenarios.
- [Ray Tune Website](https://docs.ray.io/en/latest/tune/)

### Other Notable Frameworks

- **Hyperopt**: A popular tool for optimizing over awkward search spaces with real-valued, discrete, and conditional dimensions. [Hyperopt Website](http://hyperopt.github.io/hyperopt/)
- **Scikit-Optimize**: A library for sequential model-based optimization that is built on top of Scikit-Learn. It's particularly straightforward for those already familiar with the Scikit-Learn ecosystem. [Scikit-Optimize Website](https://scikit-optimize.github.io/stable/)

While there are other excellent frameworks available, Optuna and Ray Tune stand out for their distinct advantages in specific use cases. Optuna's user-friendly nature makes it an excellent teaching tool for understanding the basics and intricacies of HPO, while Ray Tune's scalability and advanced features make it a robust choice for tackling more complex, resource-intensive HPO tasks.



## Installing the Frameworks

Before diving into practical examples, you may need to install Optuna and Ray Tune. These libraries can be easily installed using pip. However, they might already be installed in your environment, especially if you are using a pre-configured setup like Google Colab or a managed Jupyter environment.

To check whether these libraries are already installed, try importing them in your notebook:

In [None]:
try:
    import optuna
    print("Optuna is already installed.")
except ImportError:
    print("Optuna is not installed.")

try:
    import ray.tune
    print("Ray Tune is already installed.")
except ImportError:
    print("Ray Tune is not installed.")

If Optuna or Ray Tune is not installed, install it with the commands in the following block, then restart the kernel of your Jupyter notebook so that the changes take effect:

```bash
# To install Optuna
!pip install optuna --user

# To install Ray Tune
!pip install "ray[tune]" --user
```
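
After installing, one way to confirm which versions the kernel actually sees is to query the package metadata from the standard library. This is a minimal sketch; the helper name `installed_version` is chosen for this example, and `optuna` and `ray` are assumed to be the distribution names, as in the pip commands.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "optuna" and "ray" are the distribution names used by the pip commands
for name in ("optuna", "ray"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```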

---
---
---
## **Synthetic Optimization with a Simple Function**

To grasp the fundamentals of hyperparameter optimization (HPO) without the complexity of training models, we'll start with a synthetic example. It uses a simple function that simulates a loss from two hyperparameters and a mode, either 'simple' or 'complex', so the HPO process can be explored quickly.

### Setting Up the Simulation

Our goal with the `synthetic_loss_function` is to explore how different combinations of hyperparameters affect the loss, depending on the mode of operation. This function demonstrates the impact of hyperparameters on a model's loss landscape in a simplified manner.

### Implementing the Synthetic Loss Function

The task involves implementing a single `synthetic_loss_function` capable of operating in two distinct modes. This design allows us to examine different types of loss landscapes: one straightforward with a single minimum and another more intricate with multiple minima.

> **Task**: Implement the `synthetic_loss_function` so that it behaves differently depending on the `mode` parameter. In the formulas, $x$ is `hyperparam1` and $y$ is `hyperparam2`:
- With `mode='simple'`, the loss is a variant of the well-known [Rosenbrock function](https://en.wikipedia.org/wiki/Rosenbrock_function) with $a = b = 1$:
  $$f(x, y) = (a - x)^2 + b(y - x^2)^2$$
  
- With `mode='complex'`, the loss is [Himmelblau's function](https://en.wikipedia.org/wiki/Himmelblau%27s_function), which has multiple minima and a more varied landscape:
  $$f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$$

In [None]:
import numpy as np

def synthetic_loss_function(hyperparam1, hyperparam2, mode='simple'):
    # Implementation based on mode
    if mode == 'simple':
        # Calculate the loss for the simple mode
        return todo()  # Implement the Rosenbrock function
    
    elif mode == 'complex':
        # Calculate the loss for the complex mode
        return todo()  # Implement Himmelblau's function
    
    else:
        raise ValueError("Invalid mode specified. Choose either 'simple' or 'complex'.")


<details>
<summary>Solution (click to reveal)</summary>
Here's the completed `synthetic_loss_function`:

```python
import numpy as np

def synthetic_loss_function(hyperparam1, hyperparam2, mode='simple'):
    # Implementation based on mode
    if mode == 'simple':
        # Calculate the loss for the simple mode
        return ((1 - hyperparam1)**2 + (hyperparam2 - hyperparam1**2)**2)
    elif mode == 'complex':
        # Calculate the loss for the complex mode
        return (hyperparam1**2+hyperparam2-11)**2+(hyperparam1+hyperparam2**2-7)**2
    else:
        raise ValueError("Invalid mode specified. Choose either 'simple' or 'complex'.")
```
</details>
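
One way to check an implementation, sketched here with standalone helper names (`rosenbrock`, `himmelblau`) that are not part of the notebook, is to evaluate it at the well-known minimizers. With $a = b = 1$ the Rosenbrock variant has its minimum at (1, 1), and Himmelblau's function has four minima, one of them exactly at (3, 2). The other three coordinates are rounded to six decimals, so the loss there is only approximately zero.

```python
def rosenbrock(x, y, a=1.0, b=1.0):
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def himmelblau(x, y):
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2


# Known minimizers of Himmelblau's function (last three rounded to 6 decimals)
himmelblau_minima = [
    (3.0, 2.0),
    (-2.805118, 3.131312),
    (-3.779310, -3.283186),
    (3.584428, -1.848126),
]

print("Rosenbrock at (1, 1):", rosenbrock(1.0, 1.0))
for x, y in himmelblau_minima:
    print(f"Himmelblau at ({x}, {y}): {himmelblau(x, y):.2e}")
```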


### Visualizing a Synthetic Loss Landscape

In practice, the loss landscape of deep learning models is intricate and high-dimensional, making direct visualization challenging. For this exercise, we simplify the concept by using a synthetic loss function. This visualization serves as a conceptual tool to illustrate the effects of hyperparameter adjustments on model performance, rather than a practical approach to navigating real-world loss landscapes.

This simplified exercise aims to provide insight into the optimization process in a visual and intuitive manner.

> **Task**: Generate a meshgrid for parameter values and use `synthetic_loss_function` to calculate the loss. Then, create a contour plot to visualize the loss landscape. Ensure the range for parameters is broad enough to effectively visualize the landscapes for both modes (-5 to 5 on both axes).


> **Task**: Apply the `synthetic_loss_function` to calculate the loss across the meshgrid for both 'simple' and 'complex' modes.



In [None]:
import numpy as np

# Define a range for both parameters with np.linspace that accommodates both 'simple' and 'complex' modes
hyperparam1_range = todo()
hyperparam2_range = todo()

# Create a meshgrid for the parameter values
hyperparam1, hyperparam2 =  np.meshgrid(hyperparam1_range,hyperparam2_range)

# Calculate the loss for each combination of param1 and param2 for both modes
# Hint: Use the synthetic_loss_function you defined earlier with 'simple' and 'complex' modes
loss_simple = todo()
loss_complex = todo()


<details>
<summary>Solution (click to reveal)</summary>

```python
import numpy as np

# Define a range for both parameters with np.linspace that accommodates both 'simple' and 'complex' modes
hyperparam1_range = np.linspace(-5,5,100)
hyperparam2_range = np.linspace(-5,5,100)


# Create a meshgrid for the parameter values
hyperparam1, hyperparam2 =  np.meshgrid(hyperparam1_range,hyperparam2_range)

# Calculate the loss for each combination of param1 and param2 for both modes
# Hint: Use the synthetic_loss_function you defined earlier with 'simple' and 'complex' modes
loss_simple = synthetic_loss_function(hyperparam1,hyperparam2)
loss_complex = synthetic_loss_function(hyperparam1,hyperparam2,mode='complex')
```
</details>
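
Once the loss has been evaluated on a meshgrid, the best grid point can be read off directly, which is a brute-force version of what the optimizers later in this notebook do more efficiently. The following is a minimal sketch, assuming the 'complex' mode is Himmelblau's function; it redefines that function locally (as `himmelblau`, a name chosen for this example) so that it runs on its own.

```python
import numpy as np


def himmelblau(x, y):
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2


axis = np.linspace(-5, 5, 100)
grid_x, grid_y = np.meshgrid(axis, axis)
loss = himmelblau(grid_x, grid_y)   # shape (100, 100), evaluated element-wise

# Index of the smallest loss in the flattened array, converted back to 2-D indices
row, col = np.unravel_index(np.argmin(loss), loss.shape)
best_x, best_y, best_loss = grid_x[row, col], grid_y[row, col], loss[row, col]
print(f"Best grid point: ({best_x:.3f}, {best_y:.3f}) with loss {best_loss:.4f}")
```

Because the grid has finite resolution, the reported point only approximates one of the four minima.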

In [None]:
import matplotlib.pyplot as plt
import matplotlib.colors as colors

plt.figure(figsize=(12, 6))

# Plotting the loss landscape for 'simple' mode
plt.subplot(2, 2, 1)
plt.contourf(hyperparam1, hyperparam2, loss_simple, levels=50, cmap='viridis') # add norm=colors.LogNorm() to see global minimum
plt.colorbar()
plt.title('Simple Mode Loss Landscape')
plt.xlabel('Hyperparameter 1')
plt.ylabel('Hyperparameter 2')

plt.subplot(2, 2, 3)
plt.contourf(hyperparam1, hyperparam2, loss_simple, levels=50, cmap='viridis', norm=colors.LogNorm())
plt.colorbar()

# Plotting the loss landscape for 'complex' mode
plt.subplot(2, 2, 2)
plt.contourf(hyperparam1, hyperparam2, loss_complex, levels=50, cmap='viridis')
plt.colorbar()
plt.title('Complex Mode Loss Landscape')
plt.xlabel('Hyperparameter 1')
plt.ylabel('Hyperparameter 2')

plt.subplot(2, 2, 4)
plt.contourf(hyperparam1, hyperparam2, loss_complex, levels=50, cmap='viridis', norm=colors.LogNorm())
plt.colorbar()


plt.tight_layout()
plt.show()

---
---
---
## **HPO with Optuna**

Hyperparameter Optimization (HPO) plays a crucial role in enhancing the performance of machine learning models by efficiently finding the best set of hyperparameters. It bridges the gap between theoretical understanding and practical application, moving beyond trial-and-error to systematic and automated search strategies. Optuna stands out in the HPO landscape, offering a user-friendly interface and efficient algorithms for exploring complex hyperparameter spaces. Its versatility makes it suitable for a wide range of applications, from tuning simple models to optimizing sophisticated deep learning architectures.

### Define the Objective Function

The cornerstone of using Optuna for HPO is the objective function. This function evaluates how well a set of hyperparameters performs against a predefined metric, typically the loss or accuracy of a model. Within Optuna, a `trial` object suggests hyperparameters, allowing the objective function to be dynamically adjusted based on the trial's performance. 

> **Task**: Implement an `objective` function for Optuna, designed to work with both 'simple' and 'complex' modes of a synthetic loss function. This entails defining hyperparameter ranges suitable for both modes and incorporating a mode selector. Utilize Optuna's `suggest_float` for continuous hyperparameters and `suggest_categorical` for selecting the operational mode. This exercise will introduce you to defining hyperparameter spaces and optimizing them within the Optuna framework. For detailed guidance, consult the documentation for [`suggest_float`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_float) and [`suggest_categorical`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_categorical).

In [None]:
import optuna

def objective(trial):
    # Use trial.suggest_float for hyperparam1. Think about the appropriate range.
    hyperparam1 = todo()
    
    # Use trial.suggest_float for hyperparam2. Consider what range might be best.
    hyperparam2 = todo()
    
    # Use trial.suggest_categorical to choose between 'simple' and 'complex' modes.
    mode = todo()
    
    return synthetic_loss_function(hyperparam1, hyperparam2, mode)


<details>
<summary>Solution (click to reveal)</summary>

```python
import optuna

def objective(trial):
    # Use trial.suggest_float for hyperparam1. Think about the appropriate range.
    hyperparam1 = trial.suggest_float('hyperparam1',-5,5)
    
    # Use trial.suggest_float for hyperparam2. Consider what range might be best.
    hyperparam2 = trial.suggest_float('hyperparam2',-5,5)
    
    # Use trial.suggest_categorical to choose between 'simple' and 'complex' modes.
    mode = trial.suggest_categorical("mode",["simple","complex"])
    
    return synthetic_loss_function(hyperparam1, hyperparam2, mode)
```
</details>

### Exploring Optimization Strategies with Optuna

Optuna supports a variety of hyperparameter optimization strategies, each offering distinct advantages. This section will guide you through the creation and optimization of studies using three key strategies: Grid Search, Bayesian Optimization (TPE), and Random Sampling.

#### Grid Search
> **Task**: Initialize an Optuna study with the [`GridSampler`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.GridSampler.html) to perform an exhaustive search over a predefined grid of hyperparameter values. This method is thorough, ensuring no potential combination is overlooked, though it may be computationally demanding.
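
The cost of an exhaustive search is the product of the number of candidate values per hyperparameter. The sketch below enumerates a small grid by hand with `itertools.product`, without Optuna, to make that count explicit. The grid values and the local `loss` helper are assumptions made for this example (Rosenbrock with a = b = 1 for 'simple', Himmelblau for 'complex').

```python
from itertools import product

grid = {
    "hyperparam1": [-3, -1, 0, 1, 3],
    "hyperparam2": [-3, -1, 0, 1, 3],
    "mode": ["simple", "complex"],
}


def loss(x, y, mode):
    if mode == "simple":
        return (1 - x) ** 2 + (y - x ** 2) ** 2
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2


# Every combination of the listed values: 5 * 5 * 2 = 50 evaluations
combinations = list(product(*grid.values()))
best = min(combinations, key=lambda combo: loss(*combo))

print("Number of grid points:", len(combinations))
print("Best grid point:", dict(zip(grid, best)), "loss:", loss(*best))
```

Adding a third numeric hyperparameter with five candidate values would multiply the count by five, which is why grid search becomes expensive quickly.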

#### Bayesian Optimization (TPE)
> **Task**: Employ the default [`TPESampler`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.TPESampler.html) for Bayesian Optimization. This approach uses a probabilistic model to intelligently propose hyperparameter sets, optimizing in complex, high-dimensional spaces efficiently.

#### Random Sampling
> **Task**: Utilize the [`RandomSampler`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.RandomSampler.html) for a stochastic exploration of the hyperparameter space. Random Sampling provides a baseline by selecting hyperparameters without prior assumptions, offering a chance to identify good parameters early in less complex spaces.

For each strategy:
- **Setting Up the Study**: Create an Optuna study specifying the chosen sampler. This setup defines the optimization strategy for your hyperparameter search.
  
- **Running the Optimization**: Optimize your study by invoking the `optimize` method with your objective function. The number of trials can be adjusted to balance thorough exploration with computational resource constraints.

- **Analyzing the Results**: Upon completion, use `study.best_params` to review the most effective hyperparameters identified through each strategy, providing insights into their performance and suitability for your specific problem.

This exploration provides a hands-on comparison of different optimization strategies within Optuna, demonstrating their utility and effectiveness across a range of optimization scenarios.

In [None]:
import logging

import optuna
from optuna.samplers import TPESampler, GridSampler, RandomSampler

# Adjust Optuna's logging level to WARN to reduce output verbosity
logging.getLogger("optuna").setLevel(logging.WARNING)

# Search space for Grid Search
grid_search_space = {
    'hyperparam1': [-3, -1, 0, 1, 3],
    'hyperparam2': [-3, -1, 0, 1, 3],
    'mode': ['simple', 'complex']
}

# Initialize studies with different samplers
grid_study = optuna.create_study(direction='minimize', sampler=GridSampler(grid_search_space), study_name='GridSearchStudy')
random_study = todo()  # Initialize Random Sampling study
tpe_study = todo()  # Initialize TPE study

# Specify the number of trials
n_trials = todo()

# Optimize the studies
grid_study.optimize(objective, n_trials=n_trials)
todo()  # Optimize Random Sampling study
todo()  # Optimize TPE study

# Print the best parameters found by each study
print(f"Grid Search Best parameters found: {grid_study.best_params}")
todo()  # Print best parameters for Random Sampling study
todo()  # Print best parameters for TPE study

<details>
<summary>Basic Solution (click to reveal)</summary>

This basic solution focuses on setting up and running the optimization without additional libraries for progress tracking.

```python
import logging

import optuna
from optuna.samplers import TPESampler, GridSampler, RandomSampler

# Adjust Optuna's logging level to WARNING to reduce output verbosity
logging.getLogger("optuna").setLevel(logging.WARNING)

# 'objective' is the function defined earlier in this notebook; it is repeated here for reference
def objective(trial):
    hyperparam1 = trial.suggest_float("hyperparam1", -5, 5)
    hyperparam2 = trial.suggest_float("hyperparam2", -5, 5)
    mode = trial.suggest_categorical("mode", ["simple", "complex"])
    return synthetic_loss_function(hyperparam1, hyperparam2, mode)

# Search space for Grid Search
grid_search_space = {
    'hyperparam1': [-3, -1, 0, 1, 3],
    'hyperparam2': [-3, -1, 0, 1, 3],
    'mode': ['simple', 'complex']
}

# Initialize studies with different samplers
grid_study = optuna.create_study(direction='minimize', sampler=GridSampler(grid_search_space), study_name='GridSearchStudy')
tpe_study = optuna.create_study(direction='minimize', sampler=TPESampler(), study_name='TPESamplerStudy')
random_study = optuna.create_study(direction='minimize', sampler=RandomSampler(), study_name='RandomSamplerStudy')

# Number of trials
n_trials = 100  # Specify the number of trials for demonstration

# Optimize the studies
grid_study.optimize(objective, n_trials=n_trials)
tpe_study.optimize(objective, n_trials=n_trials)
random_study.optimize(objective, n_trials=n_trials)

# Print the best parameters found by each study
print(f"Grid Search Best parameters found: {grid_study.best_params}")
print(f"TPE Best parameters found: {tpe_study.best_params}")
print(f"Random Sampling Best parameters found: {random_study.best_params}")
```

</details>

<details>
<summary>Advanced Solution with Progress Bar and less output (click to reveal)</summary>

This advanced solution incorporates `tqdm` for progress tracking, providing visual feedback during the optimization process.

```python
import logging
import optuna
from optuna.samplers import TPESampler, GridSampler, RandomSampler
from tqdm import tqdm

# Adjust Optuna's logging level to reduce output verbosity
logging.getLogger("optuna").setLevel(logging.WARNING)

# Search space for Grid Search
grid_search_space = {
    'hyperparam1': [-3, -1, 0, 1, 3],
    'hyperparam2': [-3, -1, 0, 1, 3],
    'mode': ['simple', 'complex']
}

# Initialize studies with different samplers
grid_study = optuna.create_study(direction='minimize', sampler=GridSampler(grid_search_space), study_name='GridSearchStudy')
tpe_study = optuna.create_study(direction='minimize', sampler=TPESampler(), study_name='TPESamplerStudy')
random_study = optuna.create_study(direction='minimize', sampler=RandomSampler(), study_name='RandomSamplerStudy')

# Define a callback for the progress bar update
def progress_bar_callback(study, trial):
    pbar.update(1)

# Number of trials
n_trials = 50

# Initialize tqdm progress bar for each study and optimize
for study in [grid_study, tpe_study, random_study]:
    with tqdm(total=n_trials, desc=f"Optimizing {study.study_name}", unit="trial") as pbar:
        study.optimize(objective, n_trials=n_trials, callbacks=[progress_bar_callback])
        pbar.close()  # Ensure the progress bar is closed after optimization

    # Print the best parameters found by each study
    print(f"{study.study_name} Best parameters found: {study.best_params}")
```

</details>

---
---
### Visualizing Comparisons of Optimization Strategies with Optuna

Optuna's visualization module provides a comprehensive toolkit for analyzing the outcomes of hyperparameter optimization experiments. These visual tools offer deep insights into the performance and dynamics of different optimization strategies, including Grid Search, TPE (Bayesian Optimization), and Random Sampling, highlighting the nuances of each approach in navigating the hyperparameter space.

- **[`plot_parallel_coordinate`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_parallel_coordinate.html)**: Illustrates the interplay between hyperparameters and objective values, useful for comparing the influence of parameters across optimization strategies.

- **[`plot_optimization_history`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_optimization_history.html)**: Chronicles the progression of objective value improvements across trials, offering insights into the efficiency and exploration depth of each strategy.

- **[`plot_param_importances`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_param_importances.html)**: Highlights which hyperparameters significantly impact the objective value, aiding in strategy refinement for future optimizations.

- **[`plot_contour`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_contour.html)**: Explores the synergy between pairs of hyperparameters and their collective effect on the objective, facilitating the identification of optimal parameter interactions.

- **[`plot_edf`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_edf.html)**: Displays the empirical distribution of objective values across all trials, providing a macroscopic view of the optimization landscape influenced by different sampling strategies.

- **[`plot_intermediate_values`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_intermediate_values.html)**: Reveals the trajectory of intermediate objective values, shedding light on the iterative progress within trials.

- **[`plot_slice`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_slice.html)**: Demystifies how variations in individual hyperparameters impact the objective, highlighting parameter sensitivity.

- **[`plot_timeline`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_timeline.html)**: Visualizes the timeline of trial execution, underscoring the operational efficiency and parallelization capabilities of each strategy.

- **[`plot_rank`](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_rank.html)**: Assesses the comparative performance of trials, accentuating the effectiveness of different optimization strategies.

> **Task**: Examine the visualizations to discern the distinct characteristics and efficacy of **Grid Search**, **Random Sampling** , and **TPE** strategies. Utilize insights from plots like hyperparameter importance and optimization history to devise nuanced future tuning approaches and enhance model optimization outcomes.


In [None]:
from IPython.display import display, HTML
import optuna.visualization as vis
from plotly.io import to_html
import matplotlib.pyplot as plt

def display_optuna_visualizations(study):
    """
    Display a "beautiful" series of Optuna visualization plots for a given study.

    Parameters:
    - study: The Optuna study to visualize.
    """
    # List of visualization functions to display, with a brief description as a comment
    visualization_functions = [
        vis.plot_parallel_coordinate,  # Hyperparameter Relationship Plot
        vis.plot_optimization_history, # Optimization History
        vis.plot_param_importances,    # Hyperparameter Importance
        vis.plot_contour,              # Contour Plot of Parameter Interactions
        vis.plot_edf,                  # EDF (Empirical Distribution Function)
        vis.plot_intermediate_values,  # Intermediate Values of All Trials
        vis.plot_slice,                # Slice Plot in a Study
        vis.plot_timeline,             # Timeline of a Study
        vis.plot_rank                  # Rank Plot
    ]
    grid_html = "<div style='display: grid; grid-template-columns: repeat( 3 , 1fr); gap: 10px;'>"
    for viz in visualization_functions:
        content = to_html(viz(study).update_layout(autosize=True), full_html=False, include_plotlyjs='cdn')
        grid_html += "<div>" + content + "</div>"
    grid_html += "</div>"
    display(HTML(grid_html))

<details>
<summary>WebGL Fix Firefox (click to reveal)</summary>

A WebGL error may appear on the training computers because they have no GPU.
To force WebGL to run on the CPU in Firefox:
* go to `about:config`
* search for `webgl.force-enabled` and make sure this preference is set to `true`

</details>

<details>
<summary>Alternative with matplotlib (click to reveal)</summary>

This alternative renders the same plots with Optuna's matplotlib backend (`optuna.visualization.matplotlib`) instead of Plotly and arranges them as images in a 3 × 3 grid.

```python
import sys, io
import matplotlib
import matplotlib.pyplot as plt
import optuna.visualization.matplotlib as vis #note .matplotlib
from PIL import Image
import warnings
warnings.filterwarnings('ignore')

def display_optuna_visualizations(study):
    """
    Display a series of Optuna visualization plots for a given study.

    Parameters:
    - study: The Optuna study to visualize.
    """
    # List of visualization functions to display, with a brief description as a comment
    visualization_functions = [
        vis.plot_parallel_coordinate,  # Hyperparameter Relationship Plot
        vis.plot_optimization_history, # Optimization History
        vis.plot_param_importances,    # Hyperparameter Importance
        vis.plot_contour,              # Contour Plot of Parameter Interactions
        vis.plot_edf,                  # EDF (Empirical Distribution Function)
        vis.plot_intermediate_values,  # Intermediate Values of All Trials
        vis.plot_slice,                # Slice Plot in a Study
        vis.plot_timeline,             # Timeline of a Study
        vis.plot_rank                  # Rank Plot
    ]
    
    # Create a figure and axes for the grid of plots
    fig, axs = plt.subplots(3, 3, figsize=(25, 20))
    axs = axs.flatten()

    # Redirect stdout to a buffer
    stdout_buffer = io.StringIO()
    cell_stdout = sys.stdout
    sys.stdout = stdout_buffer

    # Call your plotting functions
    for i, ax in enumerate(axs):
        visualization_functions[i](study)

        # Capture the plot generated by the function
        captured_fig = plt.gcf()
        captured_ax = plt.gca()
        # Close the current figure to clear the plot
        plt.close(captured_fig)

        # Convert the captured plot to an image array
        buffer_ = io.BytesIO()
        captured_fig.savefig(buffer_, format='png', bbox_inches='tight')
        buffer_.seek(0)
        image = Image.open(buffer_)
        image= image.crop((0, 0, image.width, min(image.width,image.height)))

        # Plot the captured plot->image onto the current subplot
        ax.imshow(image)
        ax.axis('off')
        
    sys.stdout = cell_stdout 
    plt.tight_layout()
    plt.show()  
```

</details>

#### Grid search results

In [None]:
display_optuna_visualizations(grid_study)

#### Random search results

In [None]:
display_optuna_visualizations(random_study)

#### TPE search results

In [None]:
display_optuna_visualizations(tpe_study)

---
---
### Leveraging Optuna Pruners for Efficient Optimization

Optuna pruners make optimization more efficient by terminating unpromising trials early, which saves computational resources.

#### Pruners in Action

- **[`SuccessiveHalvingPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.SuccessiveHalvingPruner.html)**: An implementation of **ASHA** (Asynchronous Successive Halving Algorithm). This pruner evaluates trials at fixed intervals (rungs) and stops the less promising ones at each rung, focusing resources on those with the most potential.
- **[`HyperbandPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.HyperbandPruner.html)**: An implementation of **Hyperband**. This pruner runs several successive-halving brackets with different early-stopping rates, so trials are evaluated at different step intervals and the less promising ones are stopped. Combined with `TPESampler`, it gives a setup roughly comparable to **BOHB**.
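
The idea behind successive halving can be sketched without Optuna. The code below is a simplified, synchronous version written for illustration only: it is not Optuna's asynchronous implementation, and the helper names (`successive_halving`, `evaluate`) and the toy scoring rule are assumptions. Each round gives the surviving configurations a larger budget and keeps only the best `1 / reduction_factor` of them.

```python
import random


def evaluate(config, budget, rng):
    """Toy loss: distance of `config` from 0.3, plus noise that shrinks as budget grows."""
    return abs(config - 0.3) + rng.gauss(0.0, 0.5) / budget


def successive_halving(configs, min_budget=1, reduction_factor=2, seed=0):
    rng = random.Random(seed)
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated at that budget)
    while len(survivors) > 1:
        # Rank the survivors by their (noisy) loss at the current budget
        scored = sorted(survivors, key=lambda c: evaluate(c, budget, rng))
        history.append((budget, len(survivors)))
        keep = max(1, len(survivors) // reduction_factor)
        survivors = scored[:keep]      # drop the worst-performing configurations
        budget *= reduction_factor     # survivors get a larger budget next round
    return survivors[0], history


candidates = [i / 16 for i in range(16)]  # 16 hypothetical configurations in [0, 1)
winner, history = successive_halving(candidates)
print("Rounds (budget, configs evaluated):", history)
print("Selected configuration:", winner)
```

Most of the total budget is spent on the few configurations that survive the early, cheap rounds, which is the resource saving that pruning aims for.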

#### Objective Function Adjustments for Pruning

To utilize pruning:
- **Intermediate Reporting**: The objective function must periodically report interim results using `trial.report()`.
- **Checkpoints for Pruning**: Call `trial.should_prune()` after each report and, if it returns `True`, stop the trial early by raising `optuna.TrialPruned()`.

#### Why Adapt Our Synthetic Function

Adjusting our synthetic function to report intermediate results allows pruners to make informed decisions on trial continuation. This step is crucial for pruning to effectively reduce computational load and improve the optimization process's overall efficiency.

Pruning with Optuna signifies a strategic layer to hyperparameter optimization, ensuring a smarter allocation of computational effort towards the most promising trials.

> **Note**: Effective use of pruners like `SuccessiveHalvingPruner` or `HyperbandPruner` necessitates modifications for intermediate evaluations within the objective function, enabling a dynamic and resource-efficient optimization workflow.

In [None]:
import time

def objective_with_pruning(trial):
    # Define hyperparameters
    hyperparam1 = trial.suggest_float("hyperparam1", -5, 5)
    hyperparam2 = trial.suggest_float("hyperparam2", -5, 5)
    mode = trial.suggest_categorical("mode", ["simple", "complex"])
    
    # Simulate a step-wise evaluation process
    steps = 10
    best_loss = float("inf")
    for step in range(1, steps + 1):
        # Same hyperparameter landscape, scaled to mimic a loss that decreases during training
        end_loss = synthetic_loss_function(hyperparam1, hyperparam2, mode)
        intermediate_loss = end_loss * (1.1 - np.tanh(step / 3 - 1))

        # Keep track of the best value seen so far
        best_loss = min(best_loss, intermediate_loss)
        
        # Report intermediate objective value
        trial.report(intermediate_loss, step)
        
        # Handle pruning based on the intermediate value
        if trial.should_prune():
            raise optuna.TrialPruned()
    
    return best_loss

> **Task**: Create and optimize an Optuna study with the [`SuccessiveHalvingPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.SuccessiveHalvingPruner.html). Initialize the study to minimize the objective, using `objective_with_pruning`. Observe the pruning effect on trial completions and optimization efficiency. Analyze the best parameters found. Refer to Optuna's documentation on [creating a study](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.create_study.html) and [optimizing it](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize) for guidance.

In [None]:
asha_pruner = todo()

asha_study = optuna.create_study(direction='minimize', 
                                 pruner=asha_pruner)

start_asha = time.time()
asha_study.optimize(objective_with_pruning, n_trials=400)
end_asha = time.time()

<details>
<summary>Solution (click to reveal)</summary>

This basic solution focuses on setting up and running the optimization without additional libraries for progress tracking.

```python
asha_pruner = optuna.pruners.SuccessiveHalvingPruner(min_resource='auto', 
                                                     reduction_factor=4,  
                                                     min_early_stopping_rate=0, 
                                                     bootstrap_count=0)

asha_study = optuna.create_study(direction='minimize', 
                                 pruner=asha_pruner)

start_asha = time.time()
asha_study.optimize(objective_with_pruning, n_trials=400)
end_asha = time.time()
```

</details>
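
To observe the pruning effect mentioned in the task, one option is to count trials by their final state. The sketch below assumes only that a study exposes `study.trials` and that each trial has a `state` with a `name` (as Optuna's trial states do). The helper name `count_trial_states` and the stand-in study are illustrative assumptions so that the block runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def count_trial_states(study):
    """Count trials per state name, e.g. 'COMPLETE' or 'PRUNED'."""
    return Counter(trial.state.name for trial in study.trials)


# Stand-in study for illustration only; with Optuna, pass `asha_study` instead
demo_states = ["COMPLETE", "PRUNED", "PRUNED", "COMPLETE", "PRUNED"]
demo_study = SimpleNamespace(
    trials=[SimpleNamespace(state=SimpleNamespace(name=name)) for name in demo_states]
)
demo_counts = count_trial_states(demo_study)
print(demo_counts)
```

In the notebook, calling `count_trial_states(asha_study)` after the optimization shows how many of the 400 trials were stopped early.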

In [None]:
print(f"ASHA,\n Best result : {asha_study.best_trial.value}\n Best config : {asha_study.best_params}, in {end_asha-start_asha}s")

> **Task**: Create and optimize an Optuna study with the [`HyperbandPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.HyperbandPruner.html).

In [None]:
bohb_pruner = todo()

bohb_study = optuna.create_study(direction="minimize",
                                 pruner=bohb_pruner)

start_bohb = time.time()
bohb_study.optimize(objective_with_pruning, n_trials=400)
end_bohb = time.time()

<details>
<summary>Solution (click to reveal)</summary>

This basic solution focuses on setting up and running the optimization without additional libraries for progress tracking.

```python
bohb_pruner = optuna.pruners.HyperbandPruner(min_resource=1, 
                                             max_resource="auto", 
                                             reduction_factor=3)

bohb_study = optuna.create_study(direction="minimize",
                                 pruner=bohb_pruner)

start_bohb = time.time()
bohb_study.optimize(objective_with_pruning, n_trials=400)
end_bohb = time.time()
```

</details>

In [None]:
print(f"BOHB,\n Best result : {bohb_study.best_trial.value}\n Best config : {bohb_study.best_params}, in {end_bohb-start_bohb}s")

In [None]:
display_optuna_visualizations(asha_study)

In [None]:
display_optuna_visualizations(bohb_study)

---
---
---

## **HPO with Ray Tune (multi-workers)**

[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) is a distributed hyperparameter tuning library that excels in optimizing machine learning models by efficiently managing and distributing trials. Its integration with the broader [Ray](https://ray.io/) ecosystem allows for scalable and parallel experimentation across clusters, making it ideal for tackling large-scale optimization tasks.

Key Features:
- **Scalability**: Handles large-scale experiments across multiple CPUs or GPUs with ease.
- **Framework Agnostic**: Compatible with many ML frameworks like TensorFlow, PyTorch, and scikit-learn.
- **Advanced Algorithms**: Supports advanced search algorithms and scheduling techniques, including Bayesian optimization and HyperBand.
- **Integration with Optuna**: Ray Tune can leverage Optuna for its optimization algorithms, combining Ray Tune's efficient execution with Optuna's sophisticated sampling methods.

Why Ray Tune?

Choose Ray Tune for **complex, resource-intensive models** where parallel execution and efficient resource management are crucial. Its ability to integrate with Optuna combines the best of both worlds: Ray Tune's distributed computing capabilities with Optuna's powerful sampling strategies.

The integration with Optuna can be achieved by using Optuna's samplers and search algorithms within Ray Tune's framework, offering a nuanced approach to hyperparameter tuning.

In [1]:
import os, sys, time
import ray
from ray import tune
from ray import train
from ray.tune.search.optuna import OptunaSearch

In [None]:
#init ray cluster
ray.init(num_cpus=10, ignore_reinit_error=True, log_to_driver=False)  # 20 logical CPUs

2024-03-04 16:50:12,726	INFO worker.py:1724 -- Started a local Ray instance.


Ray Tune is part of the Ray ecosystem and therefore relies on the head/worker mechanisms of [Ray Core](https://docs.ray.io/en/latest/ray-core/walkthrough.html). You can test Ray Core by switching the following cell to *code*:

### Define the Trainable Function

Like Optuna, Ray Tune relies on *an objective*, which it calls *a trainable*.
The trainable can be a class or a callable.

> **Task**: Implement a simple trainable that reuses our previous synthetic loss function. You can check the [`Tune trainable`](https://docs.ray.io/en/releases-2.9.2/tune/key-concepts.html#ray-tune-trainables) documentation. We will optimize the `loss` metric.

In [None]:
def trainable(config):
    loss = todo()
    time.sleep(5)
    return {"loss":loss}

<details>
<summary>Solution (click to reveal)</summary>

```python
def trainable(config):
    # Simulated model training and evaluation
    loss = synthetic_loss_function(config["hyperparam1"],config["hyperparam2"],config["mode"])
    # Simulated training time
    time.sleep(5)
    return {"loss":loss}
```

</details>

### Quick example with tune.run

In [None]:
analysis = tune.run(
    trainable,
    config={
        "hyperparam1": tune.uniform(-5, 5),
        "hyperparam2": tune.uniform(-5, 5),
        "mode": tune.choice(["simple","complex"])
    },
    num_samples=50,
    max_concurrent_trials=10,
    metric="loss",
    mode="min",
)

In [None]:
print("Best loss = " , analysis.best_result['loss'],"with config: ", analysis.get_best_config(metric="loss", mode="min"))

---

### Exploring tuning configurations with ray.tune

In [None]:
def trainable_with_pruning(config):
    steps = 10
    best_loss = float("inf")

    for step in range(1, steps + 1):
        # We keep the same hyperparameter landscape...
        end_loss = synthetic_loss_function(config["hyperparam1"], config["hyperparam2"], config["mode"])
        # ...but we scale it to simulate a gradient descent
        intermediate_loss = end_loss * (1.1 - np.tanh(step / 3 - 1))

        # Keep track of the best value seen so far
        best_loss = min(best_loss, intermediate_loss)
        
        # Report intermediate objective value
        train.report({"iterations": step, "loss" : intermediate_loss})
        
        # Simulate complex step training (full training = 0.4x10 = 4s)
        time.sleep(0.4)
        
    return {"loss" : best_loss}

In [None]:
search_space = {"hyperparam1": tune.uniform(-5, 5),
                "hyperparam2": tune.uniform(-5, 5),
                "mode": tune.choice(["simple","complex"])
               }

In [None]:
import ray
import optuna
from ray import train, tune
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import AsyncHyperBandScheduler
# Need pip install :
#from ray.tune.search.hyperopt import HyperOptSearch
#from ray.tune.search.bayesopt import BayesOptSearch

> **Task**: Set a Tune scheduler (=pruner) : [AsyncHyperband](https://docs.ray.io/en/releases-2.9.2/tune/api/doc/ray.tune.schedulers.AsyncHyperBandScheduler.html)

> **Task**: Set an Optuna sampler as the Tune search algorithm: [OptunaSearch](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.search.optuna.OptunaSearch.html#ray.tune.search.optuna.OptunaSearch)

We use a [TuneConfig](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.TuneConfig.html#ray.tune.TuneConfig) here to keep the code cleaner:

In [None]:
scheduler = todo()
algo = todo()

tune_config = tune.TuneConfig(
        metric="loss",
        mode="min",
        search_alg=algo,
        scheduler=scheduler,
        num_samples=100,
)

tuner = tune.Tuner(
    trainable = trainable_with_pruning,
    param_space = search_space,
    tune_config = tune_config,
)

<details>
<summary>Solution (click to reveal)</summary>

```python
scheduler = AsyncHyperBandScheduler()
algo = OptunaSearch(sampler= optuna.samplers.TPESampler())
algo = ConcurrencyLimiter(algo, max_concurrent=10)

tune_config = tune.TuneConfig(
        metric="loss",
        mode="min",
        search_alg=algo,
        num_samples=30,
        scheduler=scheduler)

tuner = tune.Tuner(
    trainable = trainable_with_pruning,
    param_space = search_space,
    tune_config = tune_config,
)
```

</details>

In [None]:
results = tuner.fit()

In [None]:
print(f"Best result : {results.get_best_result().metrics['loss']} \nBest config : {results.get_best_result().config}")

---

### Performance Comparisons

Until now, we have used Ray as a simple wrapper around Optuna, without taking advantage of its ability to run experiments in parallel.
We will now look at the differences in performance, as well as the gains that Ray Tune parallelism can bring.
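
As a rough point of comparison for the measurements in this section, the sketch below estimates the ideal wall-clock time when trials of equal duration run in waves. It ignores scheduling overhead, Ray start-up time, and early stopping. The helper name `estimate_wall_clock` is an assumption; the numbers (30 trials of about 4 s each) come from the cells of this section.

```python
import math


def estimate_wall_clock(n_trials, seconds_per_trial, max_concurrent):
    """Idealized HPO time when trials run in waves of `max_concurrent`."""
    waves = math.ceil(n_trials / max_concurrent)
    return waves * seconds_per_trial


# 30 trials of about 4 s each, as in the comparison cells
sequential_s = estimate_wall_clock(n_trials=30, seconds_per_trial=4.0, max_concurrent=1)
two_workers_s = estimate_wall_clock(n_trials=30, seconds_per_trial=4.0, max_concurrent=2)
print(f"1 worker : about {sequential_s:.0f} s")
print(f"2 workers: about {two_workers_s:.0f} s")
```

Measured times above these estimates indicate overhead from the framework itself rather than from the trials.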

> **Task**: Run the next cells & compare performance between Optuna & Ray (mono/multi-worker).


**mono-worker (Optuna)**

In [None]:
def objective_perf(trial):
    time.sleep(4)
    return objective_with_pruning(trial)


asha_pruner = optuna.pruners.SuccessiveHalvingPruner(min_resource='auto', 
                                                     reduction_factor=4,  
                                                     min_early_stopping_rate=0, 
                                                     bootstrap_count=0)

asha_study = optuna.create_study(direction='minimize', 
                                 #pruner=asha_pruner,
                                )

start_asha = time.time()
asha_study.optimize(objective_perf, n_trials=30,show_progress_bar=True)
end_asha = time.time()

In [None]:
print(f"Optuna mono-worker,\n Best result : {asha_study.best_trial.value}\n Best config : {asha_study.best_params}\n HPO time : {end_asha-start_asha}s")

**mono-worker (Tune)**

In [None]:
scheduler = AsyncHyperBandScheduler()
algo = OptunaSearch(sampler= optuna.samplers.TPESampler())

algo = ConcurrencyLimiter(algo, max_concurrent=1)
tune_config = tune.TuneConfig(
        metric="loss",
        mode="min",
        search_alg=algo,
        #scheduler=scheduler,
        num_samples=30,
        max_concurrent_trials=1
)

tuner = tune.Tuner(
    trainable = trainable_with_pruning,
    param_space = search_space,
    tune_config = tune_config,
)

start_mono = time.time()
results_mono = tuner.fit()
end_mono = time.time()

In [None]:
print(f"Ray mono-worker,\n Best result : {results_mono.get_best_result().metrics['loss']} \n Best config : {results_mono.get_best_result().config} \n HPO time : {end_mono-start_mono}s")

**multi-worker (Tune)**

In [None]:
scheduler = AsyncHyperBandScheduler()
algo = OptunaSearch(sampler= optuna.samplers.TPESampler())

trainable_with_rsc = tune.with_resources(trainable_with_pruning, {"cpu": 1})
algo = ConcurrencyLimiter(algo, max_concurrent=2)
tune_config = tune.TuneConfig(
        metric="loss",
        mode="min",
        num_samples=30,
        search_alg=algo,
        #scheduler=scheduler,
        #max_concurrent_trials=4
)

tuner = tune.Tuner(
    trainable = trainable_with_rsc,
    param_space = search_space,
    tune_config = tune_config,
)

start_multi = time.time()
results_multi = tuner.fit()
end_multi = time.time()

In [None]:
print(f"Ray multi-worker,\n Best result : {results_multi.get_best_result().metrics['loss']} \n Best config : {results_multi.get_best_result().config} \n HPO time : {end_multi-start_multi}s")

---
If you have time left:
> **Task**: Re-run the preceding cells after changing values that affect the HPO wall-clock time (for example `num_samples`, `max_concurrent`, or the `time.sleep()` in `trainable_with_pruning`).

> **Task**: Do the same while changing the sampler and the pruner used.


---
---
---

Optuna distributed through Dask (Bonus)
---

In [None]:
import random
import time

import optuna
import optuna_distributed
from dask.distributed import LocalCluster, Client


def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    # Some expensive model fit happens here...
    time.sleep(1)
    return x**2 + y


if __name__ == "__main__":
    cluster = LocalCluster(n_workers=1, threads_per_worker=1)
    client = Client(cluster)
    
    
    sampler = optuna.samplers.TPESampler()

    # Pass the sampler explicitly so that the study actually uses it
    study = optuna_distributed.from_study(optuna.create_study(sampler=sampler), client=client)
    
    start = time.time()
    study.optimize(objective, n_trials=160)
    print(f"done in {time.time()-start}")
    print(study.best_value)

Optuna distributed through a database (Bonus)
---

We create a new database with **SQLite**. It is easy to set up and usable for this demonstration, but it performs poorly.

In [None]:
!sqlite3 optuna_dist.db ".databases"

Now we have to create a new study to initialize our trials database. Here we use the command line, but the same can be done directly with the Python API.

In [None]:
!optuna create-study --study-name "distributed-example" --storage "sqlite:///optuna_dist.db"

For this demonstration, we use a toy (and otherwise useless) HPO problem.

In [None]:
%%writefile optuna_worker.py
import optuna
import time
import random
#optuna.logging.set_verbosity(optuna.logging.WARNING)

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    time.sleep(random.uniform(1, 3))
    return (x - 2) ** 2

if __name__ == "__main__":
    study = optuna.load_study(
        study_name="distributed-example", storage="sqlite:///optuna_dist.db"
    )
    start = time.time()
    study.optimize(objective, n_trials=60)
    print(f"process run for {time.time()-start}")

We now have everything needed to run several processes for our HPO. The following line spawns 4 processes that run the same HPO code. Each output color corresponds to a different process.

In [None]:
!seq 4 | xargs -I{} -P 4 bash -c 'python optuna_worker.py 2>&1 | while read -r line; do if [[ "$line" == *[* ]]; then echo -e "\033[3{}m${line}\033[0m"; else echo "$line"; fi; done'
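
The shell one-liner above can also be sketched in Python with the standard `subprocess` module, without the per-process coloring. The helper name `run_workers` is an assumption, and the command used here is a stand-in so that the block runs on its own; in the notebook it would be `[sys.executable, "optuna_worker.py"]`.

```python
import subprocess
import sys


def run_workers(command, n_workers):
    """Start `n_workers` copies of `command` in parallel and wait for all of them."""
    processes = [subprocess.Popen(command) for _ in range(n_workers)]
    # wait() returns the exit code of each process once it has finished
    return [process.wait() for process in processes]


# Stand-in command for illustration; replace with [sys.executable, "optuna_worker.py"]
exit_codes = run_workers([sys.executable, "-c", "print('worker done')"], n_workers=4)
print(exit_codes)
```

All processes are started before any of them is waited on, so they run concurrently and share the trials through the common storage.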

Optuna visualization time! (does not currently work on JupyterHub)
---

In [None]:
!~/.local/bin/optuna-dashboard --port="8080"  --server='wsgiref' sqlite:///optuna_dist.db

In [None]:
import os
#f'jupyterhub.idris.fr{os.environ["JUPYTERHUB_SERVICE_PREFIX"]}proxy/8080'
import optuna_dashboard
optuna_dashboard.run_server(storage="sqlite:///optuna_dist.db", host=f'jupyterhub.idris.fr{os.environ["JUPYTERHUB_SERVICE_PREFIX"]}proxy/8080')

Usage with `InMemoryStorage`:

In [None]:
import optuna
from optuna_dashboard import run_server

def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    return x**2 + y

storage = optuna.storages.InMemoryStorage()
study = optuna.create_study(storage=storage)
study.optimize(objective, n_trials=100)

run_server(storage)