I took a look at what it would take to integrate Dask with Optuna, a hyper-parameter optimization library from the good folks at Preferred Networks (the same people that make CuPy).
Here is a tiny example that works, but slowly. It's informative.
Example
First, create a study
optuna create-study --study-name "distributed-example" --storage "sqlite:///example.db"
Python code
import time

import optuna

def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    time.sleep(0.050)
    return (x - 2) ** 2

from dask.distributed import Client, wait

client = Client()

def f():
    study = optuna.load_study(study_name='distributed-example',
                              storage='sqlite:///example.db')
    study.optimize(objective, n_trials=100)

futures = [client.submit(f, pure=False) for _ in range(100)]
wait(futures)

# watch progress on the dashboard at client.dashboard_link
Problems
- I had to load the study inside each task (study objects don't seem to be easy to serialize)
- I had to create the study from the command line (though there is probably a way to do this from Python that I didn't find)
- The SQLite storage backend is slow (more below) and accounts for pretty much all of the runtime
- I had to manually create futures, set pure=False, and so on, which might not be immediately obvious to new users
Storage
One thing we could do here is create our own storage backend for Optuna. This would either place the information in a Worker (probably with an Actor) or on the scheduler (similar to how we handle Lock/Variable/Queue/...). The Storage API isn't trivial, but is intended to be subclassed. My preference is to put this on the scheduler. Code here: https://github.com/optuna/optuna/tree/master/optuna/storages
This isn't quite as good as a long-term database (presumably it's nice to look at old runs) but it would be fast and easy for users.
Ideal user experience
How would we replace the futures stuff? Perhaps something like this could be a target API:
# import optuna
import dask_optuna
study = dask_optuna.load_study(study_name='distributed-example')
study.optimize(objective, n_trials=10000)
With greater integration with Optuna we could imagine something else, perhaps like the following:
import optuna
study = optuna.load_study(study_name='distributed-example', use_dask=True)
study.optimize(objective, n_trials=10000)
This is the pattern we see in projects like TPOT. It makes the feature a bit more discoverable, but requires integration with an upstream library, which may not be ideal as a first step.
Who cares?
I came to this mostly from a technology perspective. These two libraries seem to both have some traction, and complement each other nicely. However, I don't know this space well enough to know if this is valuable, or if there are users who would find it interesting. I'd love to learn more about this.