getting final result from pSMAC on a distributed compute cluster #446

Closed
Mestalbet opened this issue Jul 4, 2018 · 14 comments

@Mestalbet

Mestalbet commented Jul 4, 2018

Hi,

I have managed to run pSMAC on a distributed Dask cluster. I run it as follows:

cs = ConfigurationSpace()
scenario = Scenario({"run_obj": "quality",
                     "runcount-limit": 50,
                     "cs": cs,
                     "deterministic": True,
                     "shared_model": True,
                     "initial_incumbent": 'RANDOM',
                     "output_dir": "/shared_data/smac3-output/runs/",
                     "input_psmac_dirs": "/shared_data/smac3-output/runs/run_*"})

def errortomimize(cfg):
    return trainmodel(**cfg)

tae = ExecuteTAFuncDict(errortomimize, use_pynisher=False)
smac = SMAC(scenario=scenario, rng=np.random.RandomState(i), tae_runner=tae, run_id=i)

where i in this case is the worker number.

This creates a directory structure in the shared space like the following:

smac3-output/runs
    |_______run_1
    |_______run_2
        ...
    |_______run_N

where N is the number of workers in the cluster.

However, the traj_aclib2.json files in the run_* subdirectories are identical. I tried adding verbose_level = "DEBUG" to errortomimize, but I don't get any output from it.

Any idea what the problem is? How can I tell whether pSMAC is actually sharing the model (as opposed to just running the same thing N times)?

Cheers,
Noah

@mlindauer
Contributor

As the first lines of your script, you should add:

import logging
logging.basicConfig(level=logging.DEBUG)

Afterwards you should see lines such as:
DEBUG:smac.optimizer.smbo.SMBO:Shared model mode: Loaded 5 new runs from ...

@mlindauer
Contributor

Yes, the parallel SMAC runs do not coordinate which configuration is considered as the incumbent (i.e., the currently best known configuration) and thus, each run will return its own incumbent.

@Mestalbet
Author

Hi,

Thanks. I managed to see the output. It looks like it's working correctly - how do I return the final incumbent values after a distributed run? I can see the values in the log...

Thanks again,
Noah

@mlindauer
Contributor

You can use smac.get_runhistory() to get a handle on the runhistory (i.e., all evaluated target algorithm runs).
runhistory.get_cost(incumbent) returns the estimated empirical cost.
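For example, a minimal sketch on each worker (assuming the worker has set up smac as in your snippet above and keeps the incumbent returned by smac.optimize()):

# sketch: query this worker's own SMAC object once its optimization loop has finished
incumbent = smac.optimize()                 # best configuration found by this worker
runhistory = smac.get_runhistory()          # all target algorithm runs seen by this worker
est_cost = runhistory.get_cost(incumbent)   # estimated empirical cost of that incumbent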

@Mestalbet
Author

Hi,

Thanks! Awesome. In distributed mode I get multiple SMAC objects, since one was created on each worker, so I am not sure how to implement your suggestion.

Cheers,
Noah

@Mestalbet changed the title from "pSMAC not sharing model" to "getting final result from pSMAC on a distributed compute cluster" on Jul 5, 2018
@mlindauer
Contributor

I don't know either, since I'm not familiar with your Dask cluster. Either your workers or your master should have access to the SMAC object at some point, right?

@Mestalbet
Author

They all have access to their own SMAC object - that's my point. Each one writes to a run_<run_id> directory, which is also read via the input_psmac_dirs setting.

@mlindauer
Contributor

Sorry, I don't understand your problem.
Each SMAC object returns its own incumbent configuration.
You only have to decide which of these to use in the end.
One simple way is to ask each SMAC object for its estimate of the empirical cost of its incumbent configuration and simply choose the best one.
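For instance (a sketch, assuming each worker sends its (incumbent, estimated cost) pair back to the master, e.g. collected with Dask's client.gather; the names client, futures and results are hypothetical):

# sketch: gather one (incumbent, est_cost) pair per worker and keep the cheapest one
results = client.gather(futures)  # hypothetical Dask futures, one per SMAC worker
best_incumbent, best_cost = min(results, key=lambda pair: pair[1])
print("overall best incumbent:", best_incumbent, "with estimated cost:", best_cost)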

@Mestalbet
Author

Okay, I understand now. I have to figure out how to get the intermediate result (the SMAC object) out of the computational graph. The issue is optimizing the distributed computation (the final result is a list of incumbents, not the SMAC objects). Of course, it's trivial to get the SMAC objects, but non-trivial (for me anyway) to optimize the computation then.

I'll post the solution here when I have it so it helps other people wanting to run SMAC in a distributed computation setting.

Thanks.

@brenting
Contributor

brenting commented Oct 24, 2018

@Mestalbet
I was having the same problem. If you run pSMAC on a cluster via the command line (which I assumed you did), you are using the SMACCLI object. This object creates the SMAC object inside the smac package, which is only accessible if you modify the package.
EDIT: I just noticed that you did not use the SMACCLI, so it is indeed possible to extract the run history directly. However, the full run history will only be available after the last worker has finished.

You can also rebuild the entire runhistory afterward and use that to predict the cost of every final incumbent of the parallel pools. I'll explain below with commented code.

import os
import glob
import logging

from smac.optimizer import pSMAC
from smac.scenario.scenario import Scenario
from smac.runhistory.runhistory import RunHistory
from smac.utils.io.traj_logging import TrajLogger

# get a list of all the directories with smac logs
pool_dirs = glob.glob('<smac_dir>/run_*')

# initiate runhistory object
runhistory = RunHistory(aggregate_func=None)

# get configuration space via scenario. 
# You need just 1 scenario file for the config_space as it does not differ per pool.
# I just select the one in the first folder
scen_file = os.path.join(pool_dirs[0], 'scenario.txt')
config_space = Scenario(scen_file).cs

# construct history using parallel smac logs
# Here you load the entire run history into the object 'runhistory' that is included
pSMAC.read(runhistory, pool_dirs, config_space, logging.getLogger())

for pool_dir in pool_dirs:
    # get the file name of the trajectory json file in this pool
    traj_path = os.path.join(pool_dir, 'traj_aclib2.json')

    # read the trajectory
    trajectory = TrajLogger.read_traj_aclib_format(fn=traj_path, cs=config_space)

    # get the last incumbent of the trajectory in this pool
    last_incumbent = trajectory[-1]['incumbent']

    # get the estimated cost of the incumbent using the runhistory of ALL pools
    est_cost = runhistory.get_cost(last_incumbent)

    # you can now use 'est_cost' and 'last_incumbent' to your liking.
    # however, 'last_incumbent' is a ConfigSpace Configuration object; to convert it to a dictionary, see below
    last_incumbent_dict = last_incumbent.get_dictionary()

@mlindauer
What I do notice is that earlier incumbents sometimes have a lower estimated cost. Should I check all incumbents that are logged instead of only the last one, or is there an explanation for that?

@mlindauer
Contributor

SMAC (potentially) evaluates a configuration only on a subset of instances, and this set of instances grows over time. Let's say SMAC decides that configuration X performs better than configuration Y on instance set S; so, the performance estimate of X will be smaller than that of Y given S. But in the next iteration, SMAC will evaluate X on an additional instance i, so that the subset of instances is now larger. (Please note that since Y was worse than X, SMAC won't evaluate Y on i.) If X performs worse on i than on average over S, its performance estimate will increase.

Long story short:
You cannot easily use the performance estimates of the runhistory to compare configurations (if your scenario includes instances).

@brenting
Contributor

I do indeed use instances for SMAC, so I have probably been doing it wrong then.

Can I then conclude the following?
Within a single worker, the cost of the SMAC incumbent can be estimated by using its own run history only. This estimated cost can then be used to compare it against incumbents of other workers.

@mlindauer
Contributor

In general, you should only compare the final incumbents against each other. You can do it based on the performance estimate of the runhistory if each worker looked at a representative set of instances for the final incumbent (i.e., the instance subset is not too small).
You could also try to fit an EPM model on the runhistories from all workers and use the predictions from the EPM to compare incumbents. However, that approach requires that you have good instance features.

@stale

stale bot commented Jun 18, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 18, 2022
@stale stale bot closed this as completed Jun 25, 2022