
Saving checkpoints #12

Closed
DavidFricker opened this issue Feb 16, 2021 · 4 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@DavidFricker

Hi Simon,

Could you let me know if there is a way to checkpoint the search space and persist it to disk after each iteration? I imagine there is, since the memory_warm_start parameter could be used to resume from a saved checkpoint once it is loaded back into a pandas dataframe.
I can see that a call to .results() returns the dataframe in question, but that only seems to be possible once all n_iter iterations have completed.
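For concreteness, the flow I can piece together so far looks roughly like the sketch below (assuming the dataframe returned by .results() can be written to disk and passed back in via memory_warm_start), but it only produces a checkpoint after the whole run has finished:

import pandas as pd
from hyperactive import Hyperactive


def objective_function(para):
    # toy objective just for illustration
    return -(para["x"] - 5) ** 2


search_space = {
    "x": list(range(0, 10)),
}

# first run: persist the collected search data, but only after all n_iter are done
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=20)
hyper.run()
hyper.results(objective_function).to_csv("checkpoint.csv", index=False)

# later run: resume from the saved "checkpoint"
checkpoint = pd.read_csv("checkpoint.csv")
hyper = Hyperactive()
hyper.add_search(
    objective_function, search_space, n_iter=20, memory_warm_start=checkpoint
)
hyper.run()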

Thanks!

@SimonBlanke SimonBlanke added the enhancement (New feature or request) label Feb 16, 2021
@SimonBlanke
Owner

Hello David,

If I understand your question correctly, you want to save the selected parameters from the search space after each iteration and "append" them to a pandas dataframe.

It would look something like this:

from hyperactive import Hyperactive


# init empty pandas dataframe


def objective_function(para):
    # append parameter dictionary to pandas dataframe

    return 1


search_space = {
    "x": list(range(0, 10)),
}


hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=20)
hyper.run()

Does this describe your question adequately?

@SimonBlanke SimonBlanke added the question (Further information is requested) label Feb 16, 2021
@DavidFricker
Author

Thanks again for the quick response!

In the example you provided, how would we get the "optimizer state" out of the para variable so that we can save it to the pandas dataframe and later pass it to the memory_warm_start parameter of add_search()?

We are looking to save the progress of the search after each iteration so that we can recover the search following a crash or similar unplanned exit (similar to this functionality in TensorFlow).

The following is how far I got trying to achieve the above goal with the sample you provided:

from hyperactive import Hyperactive
import pandas as pd

# init empty pandas dataframe (used for warm starting via `memory_warm_start`)
optimisation_checkpoint = pd.DataFrame()

def objective_function(para):    
    # append parameter dictionary to pandas dataframe
    mem = hyper.results(objective_function)

    # append `mem` to `optimisation_checkpoint`, but we crash before getting here

    return 1


search_space = {
    "x": list(range(0, 10)),
}

# run search, checkpointing the search to disk in the event of a crash
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, optimizer=TreeStructuredParzenEstimators(), n_iter=20)
hyper.run()

# start from "checkpoint" after a crash or other unexpected exit
hyper = Hyperactive()
hyper.add_search(objective_function, search_space, optimizer=TreeStructuredParzenEstimators(), memory_warm_start=mem, n_iter=20)
hyper.run()

@DavidFricker DavidFricker changed the title from "[Question] Saving checkpoints" to "Saving checkpoints" Feb 16, 2021
@SimonBlanke
Owner

I see what you want to do now.

I wrote another code snippet that should help you. It is a complete example with a simple machine learning model.

import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

from hyperactive import Hyperactive


data = load_boston()
X, y = data.data, data.target

# set a path to save the dataframe
path = "./search_data.csv"
search_space = {
    "n_neighbors": list(range(1, 50)),
}

# get para names from search space + the score
para_names = list(search_space.keys()) + ["score"]

# init empty pandas dataframe 
search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)


def objective_function(para):
    # do your model training with the parameter set like usual
    knr = KNeighborsRegressor(n_neighbors=para["n_neighbors"])
    scores = cross_val_score(knr, X, y, cv=10)

    # you can access the entire dictionary from "para"
    parameter_dict = para.para_dict
    score = scores.mean()

    # save the score in the copy of the dictionary
    parameter_dict["score"] = score

    # append parameter dictionary to pandas dataframe
    search_data = pd.read_csv(path)
    search_data_new = pd.DataFrame(parameter_dict, columns=para_names, index=[0])
    search_data = search_data.append(search_data_new)
    search_data.to_csv(path, index=False)

    return score


hyper0 = Hyperactive()
hyper0.add_search(objective_function, search_space, n_iter=50)
hyper0.run()


search_data_0 = pd.read_csv(path)
"""
the second run should be much faster than before, 
because Hyperactive already knows most parameters/scores
"""
hyper1 = Hyperactive()
hyper1.add_search(
    objective_function, search_space, n_iter=50, memory_warm_start=search_data_0
)
hyper1.run()

But be aware that the file "search_data.csv" gets overwritten each time you run this script. To avoid this, you can just delete the following lines after the first time you run the script:

search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)
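Or, as a small variation of the script above, you could guard the initialization with a file check so an existing search_data.csv is never overwritten:

import os
import pandas as pd

path = "./search_data.csv"
para_names = ["n_neighbors", "score"]  # same columns as in the script above

# only create a fresh csv if there is no previous search data on disk
if not os.path.isfile(path):
    pd.DataFrame(columns=para_names).to_csv(path, index=False)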

The functionality to save search-data during the run seems useful. I will open an issue for this new feature.

I hope I was able to help you. Let me know if the script works for your use case.

@DavidFricker
Author

Your example works perfectly and exactly addresses what we needed, thank you.

Thanks again for all your help!
