getting final result from pSMAC on a distributed compute cluster #446

Hi,
I have managed to run pSMAC on a distributed DASK cluster. I run it as follows (see the sketch below), where i in this case is the worker number.
This creates a directory structure in the shared space with one run_* subdirectory per worker, so N workers in the cluster produce N such subdirectories.
However, if you look at the traj_aclib2.json in each run_* subdirectory, they are identical. I tried adding
verbose_level = "DEBUG"
but I don't get any output from this. Any idea what the problem is? How can I tell whether pSMAC is actually sharing the model (as opposed to just running the same thing N times)?
Cheers,
Noah
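A minimal sketch of the per-worker setup described above, assuming the SMAC3 0.x API used later in this thread; the toy search space and objective, the budget, and the literal <smac_dir> path are placeholders:

import logging
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from smac.facade.smac_hpo_facade import SMAC4HPO
from smac.scenario.scenario import Scenario

logging.basicConfig(level=logging.INFO)

# toy search space and objective, stand-ins for the real ones
config_space = ConfigurationSpace()
config_space.add_hyperparameter(UniformFloatHyperparameter('x', -5.0, 5.0))

def objective(config):
    return (config['x'] - 1.0) ** 2

i = 1  # worker number, supplied by the cluster in the original setup

scenario = Scenario({
    'run_obj': 'quality',                    # optimize solution quality
    'runcount-limit': 50,                    # placeholder budget per worker
    'cs': config_space,
    'output_dir': '<smac_dir>',              # shared directory visible to all workers
    'shared_model': True,                    # pSMAC: share the runhistory across workers
    'input_psmac_dirs': '<smac_dir>/run_*',  # where to read the other workers' logs
})

# run_id=i makes this worker write to <smac_dir>/run_<i>
smac = SMAC4HPO(scenario=scenario, run_id=i, tae_runner=objective)
incumbent = smac.optimize()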
Comments
As the first lines of your script, you should add:
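Presumably this refers to the standard Python logging setup, along the lines of:

import logging
logging.basicConfig(level=logging.DEBUG)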
Afterwards you should see lines such as:
Yes, the parallel SMAC runs do not coordinate which configuration is considered the incumbent (i.e., the currently best known configuration), and thus each run will return its own incumbent.
Hi, thanks. I managed to see the output. It looks like it's working correctly; how do I return the final incumbent values after a distributed run? I can see the values in the log... Thanks again,
You can use
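Assuming the suggestion refers to the incumbent kept on the SMAC object, a minimal sketch:

# optimize() returns the final incumbent directly...
incumbent = smac.optimize()
# ...and the underlying solver also keeps it afterwards
incumbent = smac.solver.incumbent
print(incumbent.get_dictionary())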
Hi, thanks! Awesome. In distributed mode, I get multiple smac objects, since one was created on each worker, so I am not sure how to implement your suggestion. Cheers,
I also don't know, since I don't know your DASK cluster. Either your workers or your master should have access to the SMAC object at some point, right?
They all have access to their own SMAC object; that's my point. Each one writes to a run_<run_id> directory, which is also accessed through the input_psmac_dirs variable.
Sorry, I don't understand your problem.
Okay, I understand now. I have to figure out how to get the intermediate result (the smac object) out of the computational graph. The issue is optimizing the distributed computation (the final result is a list of incumbents, not the smac objects). Of course, it's trivial to get the smac objects, but non-trivial (for me, anyway) to optimize the computation then. I'll post the solution here when I have it, so that it helps other people wanting to run SMAC in a distributed computation setting. Thanks.
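One possible shape for this on Dask, purely as a sketch: make_smac, n_workers, and the scheduler address are hypothetical, and each task returns only the incumbent as a plain dict so that no SMAC object travels through the graph:

from dask.distributed import Client

n_workers = 4  # hypothetical worker count

def run_worker(i):
    smac = make_smac(i)  # hypothetical factory building the per-worker SMAC
    incumbent = smac.optimize()
    return incumbent.get_dictionary()  # ship a plain dict, not the smac object

client = Client('<scheduler-address>')  # hypothetical scheduler address
futures = client.map(run_worker, range(n_workers))
incumbents = client.gather(futures)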
@Mestalbet You can also rebuild the entire runhistory afterwards and use that to predict the cost of every final incumbent of the parallel pools. I'll explain below with commented code.

import os
import glob
import logging
from smac.optimizer import pSMAC
from smac.scenario.scenario import Scenario
from smac.runhistory.runhistory import RunHistory
from smac.utils.io.traj_logging import TrajLogger
# get a list of all the directories with smac logs
pool_dirs = glob.glob('<smac_dir>/run_*')  # '<smac_dir>' is a placeholder for the shared output directory
# initiate runhistory object
runhistory = RunHistory(aggregate_func=None)
# get configuration space via scenario.
# You need just 1 scenario file for the config_space as it does not differ per pool.
# I just select the one in the first folder
scen_file = os.path.join(pool_dirs[0], 'scenario.txt')
config_space = Scenario(scen_file).cs
# construct history using parallel smac logs
# Here you load the entire run history into the object 'runhistory' that is included
pSMAC.read(runhistory, pool_dirs, config_space, logging.getLogger())
for pool_dir in pool_dirs:
# get the file name of the trajectory json file in this pool
traj_path = os.path.join(pool_dir, 'traj_aclib2.json')
# read the trajectory
trajectory = TrajLogger.read_traj_aclib_format(fn=traj_path, cs=config_space)
# get the last incumbent of the trajectory in this pool
last_incumbent = trajectory[-1]['incumbent']
# get the estimated cost of the incumbent using the runhistory of ALL pools
est_cost = runhistory.get_cost(last_incumbent)
# you can now use 'est_cost' and 'last_incumbent' to your liking.
# however, 'last_incumbent' is a ConfigSpace object. To convert to dictionary, see below
last_incumbent_dict = last_incumbent.get_dictionary()

@mlindauer
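Since, as noted earlier, each pool returns its own incumbent, the same loop can also collect the (cost, incumbent) pairs and pick the overall winner; a sketch under the same assumptions as the code above:

# same loop as above, but collecting (cost, incumbent) pairs
results = []
for pool_dir in pool_dirs:
    traj_path = os.path.join(pool_dir, 'traj_aclib2.json')
    trajectory = TrajLogger.read_traj_aclib_format(fn=traj_path, cs=config_space)
    incumbent = trajectory[-1]['incumbent']
    results.append((runhistory.get_cost(incumbent), incumbent))

# the incumbent with the lowest estimated cost across all pools
best_cost, best_incumbent = min(results, key=lambda pair: pair[0])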
SMAC (potentially) evaluates a configuration only on a subset of instances, and this set of instances grows over time. Let's say SMAC decides that configuration X performs better than configuration Y on instance set S, so the performance estimate of X will be smaller than that of Y given S. But in the next iteration, SMAC will evaluate X on an additional instance i, such that our subset of instances is now larger. (Please note that since Y was worse than X, SMAC won't evaluate Y on i.) If X performs worse on i compared to its average performance on S, the performance estimate will increase. Long story short:
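A toy numeric illustration of this effect (all numbers are made up):

# X's costs on S = {i1, i2} average 0.2, beating Y's average of 0.3
costs_x = [0.1, 0.3]
costs_y = [0.2, 0.4]

def mean(xs):
    return sum(xs) / len(xs)

assert mean(costs_x) < mean(costs_y)

# X is then run on a new instance i3 where it does badly; Y never is
costs_x.append(0.9)

# X's estimate (~0.43) now exceeds Y's (0.3), although Y was never tried on i3
assert mean(costs_x) > mean(costs_y)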
I do indeed use instances for SMAC, so I have probably been doing it wrong. Can I then conclude the following?
In general, you should only compare the final incumbents against each other. You can do it based on the performance estimate of the runhistory if each worker looked at a representative set of instances for the final incumbent (i.e., the instance subset is not too small). |