
Saving checkpoints #12

Closed
DavidFricker opened this issue Feb 16, 2021 · 4 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@DavidFricker

Hi Simon,

Could you let me know if there is a way to checkpoint the search space and persist it to disk after each iteration? I imagine there is, since the memory_warm_start parameter could be used to resume from a saved checkpoint once it is loaded back into a pandas dataframe.
I can see that a call to .results() returns the dataframe in question, but that only seems to be possible once all n_iter iterations have completed.
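For concreteness, the flow I can piece together so far looks roughly like the sketch below (assuming the dataframe returned by .results() can be written to disk and passed back in via memory_warm_start), but it only produces a checkpoint after the whole run has finished:

import pandas as pd
from hyperactive import Hyperactive


def objective_function(para):
    # toy objective just for illustration
    return -(para["x"] - 5) ** 2


search_space = {
    "x": list(range(0, 10)),
}

# first run: persist the collected search data, but only after all n_iter are done
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=20)
hyper.run()
hyper.results(objective_function).to_csv("checkpoint.csv", index=False)

# later run: resume from the saved "checkpoint"
checkpoint = pd.read_csv("checkpoint.csv")
hyper = Hyperactive()
hyper.add_search(
    objective_function, search_space, n_iter=20, memory_warm_start=checkpoint
)
hyper.run()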

Thanks!

@SimonBlanke SimonBlanke added the enhancement (New feature or request) label Feb 16, 2021
@SimonBlanke
Owner

Hello David,

If I understand your question correctly, you want to save the selected parameters from the search space after each iteration and "append" them to a pandas dataframe.

It would look something like this:

from hyperactive import Hyperactive


# init empty pandas dataframe


def objective_function(para):
    # append parameter dictionary to pandas dataframe

    return 1


search_space = {
    "x": list(range(0, 10)),
}


hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=20)
hyper.run()

Does this describe your question adequately?

@SimonBlanke SimonBlanke added the question (Further information is requested) label Feb 16, 2021
@DavidFricker
Author

Thanks again for the quick response!

In the example you provided, how would we get the "optimizer state" out of the para variable so that we can save it to the pandas dataframe and later pass it to the memory_warm_start parameter of add_search()?

We are looking to save the progress of the search after each iteration so that we can recover the search following a crash or similar unplanned exit (similar to this functionality in TensorFlow).

The following is how far I got trying to achieve the above goal with the sample you provided:

from hyperactive import Hyperactive
import pandas as pd

# init empty pandas dataframe (used for warm starting via `memory_warm_start`)
optimisation_checkpoint = pd.DataFrame()

def objective_function(para):    
    # append parameter dictionary to pandas dataframe
    mem = hyper.results(objective_function)

    # append `mem` to `optimisation_checkpoint`, but we crash before getting here

    return 1


search_space = {
    "x": list(range(0, 10)),
}

# run search, checkpointing the search to disk in the event of a crash
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, optimizer=TreeStructuredParzenEstimators(), n_iter=20)
hyper.run()

# start from "checkpoint" after a crash or other unexpected exit
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, optimizer=TreeStructuredParzenEstimators(), memory_warm_start=mem, n_iter=20)
hyper.run()

@DavidFricker DavidFricker changed the title from "[Question] Saving checkpoints" to "Saving checkpoints" Feb 16, 2021
@SimonBlanke
Owner

I see what you want to do now.

I wrote another code snippet that should help you. It is a complete example with a simple machine learning model.

import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

from hyperactive import Hyperactive


data = load_boston()
X, y = data.data, data.target

# set a path to save the dataframe
path = "./search_data.csv"
search_space = {
    "n_neighbors": list(range(1, 50)),
}

# get para names from search space + the score
para_names = list(search_space.keys()) + ["score"]

# init empty pandas dataframe 
search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)


def objective_function(para):
    # do your model training with the parameter set like usual
    knr = KNeighborsRegressor(n_neighbors=para["n_neighbors"])
    scores = cross_val_score(knr, X, y, cv=10)

    # you can access the entire dictionary from "para"
    parameter_dict = para.para_dict
    score = scores.mean()

    # save the score in the copy of the dictionary
    parameter_dict["score"] = score

    # append parameter dictionary to pandas dataframe
    search_data = pd.read_csv(path)
    search_data_new = pd.DataFrame(parameter_dict, columns=para_names, index=[0])
    search_data = search_data.append(search_data_new)
    search_data.to_csv(path, index=False)

    return score


hyper0 = Hyperactive()
hyper0.add_search(objective_function, search_space, n_iter=50)
hyper0.run()


search_data_0 = pd.read_csv(path)
"""
the second run should be much faster than before, 
because Hyperactive already knows most parameters/scores
"""
hyper1 = Hyperactive()
hyper1.add_search(
    objective_function, search_space, n_iter=50, memory_warm_start=search_data_0
)
hyper1.run()

But be aware that the file "search_data.csv" gets overwritten each time you run this script. To avoid this, you can just delete the following lines after the first time you run the script:

search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)
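Or, as a small variation of the script above, you could guard the initialization with a file check so an existing search_data.csv is never overwritten:

import os
import pandas as pd

path = "./search_data.csv"
para_names = ["n_neighbors", "score"]  # same columns as in the script above

# only create a fresh csv if there is no previous search data on disk
if not os.path.isfile(path):
    pd.DataFrame(columns=para_names).to_csv(path, index=False)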

The functionality to save search-data during the run seems useful. I will open an issue for this new feature.

I hope I was able to help you. Let me know if the script works for your use case.

@DavidFricker
Author

Your example works perfectly and exactly addresses what we needed, thank you.

Thanks again for all your help!
