Saving checkpoints #12
Hello David, if I understand your question correctly, you want to save the selected parameters from the search space after each iteration and "append" them into a pandas dataframe. This would look like this:

```python
from hyperactive import Hyperactive

# init empty pandas dataframe

def objective_function(para):
    # append parameter dictionary to pandas dataframe
    return 1

search_space = {
    "x": list(range(0, 10)),
}

hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=20)
hyper.run()
```

Does this describe your question adequately?
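For completeness, a minimal sketch of what the commented-out "append" step could look like. The list name `collected_rows` and the toy score are illustrative, and the optimizer loop is simulated with a plain `for` loop so the sketch runs without Hyperactive installed:

```python
import pandas as pd

# rows collected during the search; one dict per iteration (illustrative)
collected_rows = []

def objective_function(para):
    score = -(para["x"] - 5) ** 2  # toy score, peaked at x = 5
    # record the evaluated parameters together with their score
    collected_rows.append({"x": para["x"], "score": score})
    return score

# simulate the optimizer calling the objective (Hyperactive would do this)
for x in range(10):
    objective_function({"x": x})

# build the dataframe once at the end instead of appending row by row
search_data = pd.DataFrame(collected_rows)
print(search_data.shape)  # (10, 2)
```

Collecting dicts in a list and building the dataframe once is generally faster than appending to a dataframe inside the loop.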
Thanks for the quick response again! In the example you provide, how would we get the "optimizer state"? We are looking to save the progress of the search after each iteration, to be able to recover the search following a crash or similar unplanned exit (similar to this functionality in TensorFlow). The following is how far I got trying to achieve the above goal with the sample you provided:

```python
from hyperactive import Hyperactive
import pandas as pd

# init empty pandas dataframe (used for warm starting via `memory_warm_start`)
optimisation_checkpoint = pd.DataFrame()

def objective_function(para):
    # append parameter dictionary to pandas dataframe
    mem = hyper.results(objective_function)
    # append `mem` to `optimisation_checkpoint`, but we crash before getting here
    return 1

search_space = {
    "x": list(range(0, 10)),
}

# run search, checkpointing the search to disk in the event of a crash
hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    optimizer=TreeStructuredParzenEstimators(),  # import omitted here
    n_iter=20,
)
hyper.run()

# start from "checkpoint" after a crash or other unexpected exit
hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    optimizer=TreeStructuredParzenEstimators(),
    memory_warm_start=mem,
    n_iter=20,
)
hyper.run()
```
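One related concern when checkpointing to disk: a crash in the middle of a write can leave a half-written, corrupt CSV. A minimal sketch of an atomic write, assuming a write-temp-then-rename pattern (the helper name `save_checkpoint_atomically` is my own, not part of Hyperactive):

```python
import os
import tempfile

import pandas as pd

def save_checkpoint_atomically(df: pd.DataFrame, path: str) -> None:
    """Write df to path so a crash mid-write never leaves a corrupt file.

    Writes to a temporary file in the same directory, then renames it over
    the target; os.replace is atomic on both POSIX and Windows.
    """
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            df.to_csv(f, index=False)
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        os.remove(tmp_path)  # never leave a stray temp file behind
        raise
```

A reader of the checkpoint then always sees either the previous complete version or the new complete version, never a partial one.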
I see what you want to do now. I wrote another code snippet that should help you. This code is a complete example with a simple machine learning model.

```python
import pandas as pd
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2; use another regression dataset there
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

from hyperactive import Hyperactive

data = load_boston()
X, y = data.data, data.target

# set a path to save the dataframe
path = "./search_data.csv"

search_space = {
    "n_neighbors": list(range(1, 50)),
}

# get para names from search space + the score
para_names = list(search_space.keys()) + ["score"]

# init empty pandas dataframe
search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)

def objective_function(para):
    # do your model training with the parameter set like usual
    knr = KNeighborsRegressor(n_neighbors=para["n_neighbors"])
    scores = cross_val_score(knr, X, y, cv=10)

    # you can access the entire dictionary from "para"
    parameter_dict = para.para_dict
    score = scores.mean()

    # save the score in the copy of the dictionary
    parameter_dict["score"] = score

    # append parameter dictionary to pandas dataframe
    search_data = pd.read_csv(path)
    search_data_new = pd.DataFrame(parameter_dict, columns=para_names, index=[0])
    # DataFrame.append was removed in pandas 2.0; concat does the same here
    search_data = pd.concat([search_data, search_data_new], ignore_index=True)
    search_data.to_csv(path, index=False)

    return score

hyper0 = Hyperactive()
hyper0.add_search(objective_function, search_space, n_iter=50)
hyper0.run()

search_data_0 = pd.read_csv(path)

"""
the second run should be much faster than before,
because Hyperactive already knows most parameters/scores
"""
hyper1 = Hyperactive()
hyper1.add_search(
    objective_function, search_space, n_iter=50, memory_warm_start=search_data_0
)
hyper1.run()
```

But be aware that the file "search_data.csv" gets overwritten each time you run this script. To avoid this you can just delete the following lines after the first time you ran the script:

```python
search_data = pd.DataFrame(columns=para_names)
search_data.to_csv(path, index=False)
```

The functionality to save search-data during the run seems useful. I will open an issue for this new feature. I hope I was able to help you. Let me know if the script works for your use case.
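As an alternative to deleting those two lines by hand, the initialization can be guarded with an existence check, and each row can be appended with `mode="a"` instead of re-reading and rewriting the whole file every iteration. A minimal sketch; the `log_result` helper is illustrative, not part of Hyperactive:

```python
import os

import pandas as pd

path = "./search_data.csv"  # same path as in the example above
para_names = ["n_neighbors", "score"]

# create the file with a header only on the very first run
if not os.path.exists(path):
    pd.DataFrame(columns=para_names).to_csv(path, index=False)

def log_result(parameter_dict):
    # append a single row without re-reading the whole file
    row = pd.DataFrame([parameter_dict], columns=para_names)
    row.to_csv(path, mode="a", header=False, index=False)

log_result({"n_neighbors": 3, "score": 0.42})
```

With this guard, re-running the script keeps accumulating results in the same CSV instead of wiping it, which is what a crash-recovery workflow needs.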
Your example works perfectly and exactly addresses what we needed, thank you. Thanks again for all your help!
Hi Simon,

Please could you let me know if there is a way to checkpoint the search space and persist it to disk after each iteration? I imagine there is a way, since there is a `memory_warm_start` parameter available that could be used to resume from a saved checkpoint once loaded into a pandas dataframe. I can see a call to `.results()` will return the dataframe in question, but that seems to only be possible at the completion of all `n_iter`?

Thanks!