Issue/#193 restore scenario #263

Merged
merged 21 commits into from Oct 4, 2017

Collaborator

@shukon shukon commented Jun 4, 2017

Implements state restoration for the SMAC command line, as mentioned in issue #193.
Simply use the --restore_state <FOLDER> option. Currently the scenario is assumed to be the same as in the original run; differing options can lead to unexpected behavior (except for the limits, e.g. runcount_limit, wallclock_limit, tuner-timeout).

@codecov-io

codecov-io commented Jun 4, 2017

Codecov Report

Merging #263 into development will increase coverage by 2.59%.
The diff coverage is 97.33%.

@@               Coverage Diff               @@
##           development     #263      +/-   ##
===============================================
+ Coverage        86.52%   89.11%   +2.59%     
===============================================
  Files               45       45              
  Lines             2745     2812      +67     
===============================================
+ Hits              2375     2506     +131     
+ Misses             370      306      -64
Impacted Files                                         Coverage Δ
smac/optimizer/smbo.py                                 95.23% <100%> (+0.43%) ⬆️
smac/utils/io/cmd_reader.py                            100% <100%> (+45.16%) ⬆️
smac/utils/io/output_writer.py                         96.82% <100%> (ø) ⬆️
smac/smac_cli.py                                       86.2% <100%> (+86.2%) ⬆️
smac/facade/smac_facade.py                             93.83% <100%> (+0.73%) ⬆️
smac/stats/stats.py                                    93.97% <90.9%> (-1.19%) ⬇️
smac/intensification/intensification.py                94.91% <0%> (+1.12%) ⬆️
smac/tae/execute_ta_run_old.py                         95.45% <0%> (+1.51%) ⬆️
...mac/initial_design/single_config_initial_design.py  81.08% <0%> (+8.1%) ⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba6dcce...c7b3022. Read the comment docs.

@mfeurer
Contributor

mfeurer commented Jun 6, 2017

Is this PR ready to be reviewed by anyone?

@shukon
Collaborator Author

shukon commented Jun 6, 2017

Almost. I just discovered compatibility issues with #264; I will fix those and then assign reviewers.

@shukon
Collaborator Author

shukon commented Jun 6, 2017

This is ready for review, although there is one line in the unit test that I will change so that the tests pass after merging #264.

smac/smac_cli.py Outdated
stats = None
incumbent = None
if args_.restore_state:
    # Check for folder and files
Contributor

Could you add some debug output in case restoring fails, e.g. "Trying to restore state from ", "Successfully restored %d runs", "Successfully restored incumbent with value %f"?

Collaborator Author

Done.
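
The debug messages requested in this thread could look roughly like the sketch below. The helper `log_restore`, the logger name, and the plain arguments are hypothetical stand-ins chosen for illustration; they are not the code that was added in this PR.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("restore_sketch")


def log_restore(folder, runs, incumbent_cost):
    """Emit debug messages along the lines suggested in the review.

    `runs` is any sized collection of restored runs and `incumbent_cost`
    is the incumbent's cost; both are assumptions standing in for SMAC's
    real run history and trajectory objects.
    """
    logger.debug("Trying to restore state from %s", folder)
    logger.debug("Successfully restored %d runs", len(runs))
    logger.debug("Successfully restored incumbent with value %f", incumbent_cost)
    return len(runs)
```

Using lazy `%`-style arguments keeps the formatting cost away from runs where debug logging is disabled.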

@mfeurer
Contributor

mfeurer commented Jun 9, 2017

Can we please merge the documentation before merging any other pull request? The documentation PR is already quite big and would have to be updated accordingly if we merge new features/change output files etc.

@shukon shukon requested a review from mlindauer June 29, 2017 12:39
@shukon
Collaborator Author

shukon commented Jun 30, 2017

Not sure what's up with the Travis CI build; when I follow the Details link, it tells me everything has passed...

Contributor

@mlindauer mlindauer left a comment

Sorry for the late review. I hope most of the requested changes are quite easy to implement.

except FirstRunCrashedException as err:
    if self.scenario.abort_on_first_run_crash:
        raise
# Initialization, depends on input
Contributor

Can we please move lines 104-122 into a function?
(Otherwise we would need to copy the code for new optimizers besides SMBO.)

@@ -83,12 +85,43 @@ def main_cli(self):
fn=traj_fn, cs=scen.cs)
initial_configs.append(trajectory[-1]["incumbent"])

# Restore state
Contributor

Please also move this code to its own function.

Contributor

It should also be easier to write a unit test if we move this code to its own function.
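
To illustrate the testability argument, here is a rough sketch of restore logic pulled into a standalone function. The function name `restore_state_from` and the assumed file layout (one JSON object in `stats.json`, one JSON object per line in `traj_aclib2.json`) are assumptions made for this example, not the code in this PR.

```python
import json
import os


def restore_state_from(folder):
    """Load stats and the last trajectory entry from `folder`.

    Hypothetical helper: the file layout described here is an
    assumption for this sketch, not SMAC's documented format.
    """
    stats_path = os.path.join(folder, "stats.json")
    traj_path = os.path.join(folder, "traj_aclib2.json")
    # Fail early with a clear message if a required file is missing.
    for path in (stats_path, traj_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("Cannot restore, missing file: " + path)

    with open(stats_path) as fh:
        stats = json.load(fh)
    with open(traj_path) as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    if not entries:
        raise ValueError("Trajectory file is empty: " + traj_path)
    # The last trajectory entry holds the most recent incumbent.
    return stats, entries[-1]["incumbent"]
```

Because the function only takes a folder path, a unit test can point it at a temporary directory without going through the command-line entry point.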

smac/smac_cli.py Outdated
# Copy traj if output_dir is different
if scen.output_dir != Scenario(scen_path).output_dir:
    new_traj_path = os.path.join(scen.output_dir, "traj_aclib2.json")
    shutil.copy(traj_path, new_traj_path)
Contributor

Should we make a copy of the old "state" to prevent overwriting it?

Collaborator Author

See comment above... I figured that if the user specifies a new output_dir, the restoring is done in another folder. That way, the old version is completely preserved.

smac/smac_cli.py Outdated
incumbent = trajectory[-1]["incumbent"]
root_logger.debug("Restored incumbent %s from %s", incumbent, traj_path)
# Copy traj if output_dir is different
if scen.output_dir != Scenario(scen_path).output_dir:
Contributor

Scenario(scen_path) looks weird to me.
Is args_.restore_state not already the output_dir of the previous scenario?
Furthermore, I wonder whether we have side effects by creating the Scenario object again --- overwriting of old existing files?

Collaborator Author

By leaving it open to the user to specify the scenario, the user can choose a different scenario file (with extended limits or another output path) to continue a restored scenario. When continuing in a new output folder, everything gets written anew except for the trajectory (which is written continuously), so it's copied. I changed Scenario(scen_path).output_dir to InputReader().read_scenario_file(scen_path)['output_dir'] to avoid possible side effects.
Do you think the procedure is correct in general?
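
The copy step described above can be sketched with plain directory paths, so that no Scenario object has to be constructed just for the comparison. The helper name `copy_trajectory_if_moved` and its parameters are hypothetical; only the file name `traj_aclib2.json` is taken from the diff.

```python
import os
import shutil


def copy_trajectory_if_moved(old_output_dir, new_output_dir,
                             traj_name="traj_aclib2.json"):
    """Copy the trajectory file when the run continues in a new folder.

    Hypothetical helper for illustration. Returns the path of the copy,
    or None when both directories are the same and the existing file is
    simply appended to.
    """
    old_dir = os.path.abspath(old_output_dir)
    new_dir = os.path.abspath(new_output_dir)
    if old_dir == new_dir:
        return None
    os.makedirs(new_dir, exist_ok=True)
    new_traj_path = os.path.join(new_dir, traj_name)
    # The old folder is only read from, so the previous state is preserved.
    shutil.copy(os.path.join(old_dir, traj_name), new_traj_path)
    return new_traj_path
```

Comparing absolute paths avoids treating `out` and `./out` as different folders.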

"""
Save all relevant attributes to json-dictionary.
"""
data = {'ta_runs': self.ta_runs,
Contributor

Can't we save the Scenario attributes in a similar generic way as in load()?
Such a hand-written dictionary is also hard to maintain.

Collaborator Author

I can do it the other way around: manually filtering out everything that's NOT to be saved. Implemented now; do you think it's better?
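
A minimal sketch of the "filter out what is not to be saved" approach, using a toy class. `StatsSketch`, its attributes, and the exclusion set are assumptions for illustration and do not mirror SMAC's actual Stats class.

```python
import json


class StatsSketch:
    """Toy stand-in for a stats object; not SMAC's Stats class."""

    # Hypothetical list of attributes that should never be written to disk.
    _DO_NOT_SAVE = {"_logger", "_start_time"}

    def __init__(self):
        self.ta_runs = 0
        self.wallclock_time_used = 0.0
        self._logger = None
        self._start_time = None

    def save(self, path):
        # Save every instance attribute except the excluded ones.
        data = {k: v for k, v in vars(self).items()
                if k not in self._DO_NOT_SAVE}
        with open(path, "w") as fh:
            json.dump(data, fh)

    def load(self, path):
        with open(path) as fh:
            data = json.load(fh)
        for key, value in data.items():
            # Reject keys this class does not know about.
            if key not in vars(self) or key in self._DO_NOT_SAVE:
                raise ValueError("Stats does not recognize {}".format(key))
            setattr(self, key, value)
```

One trade-off of the exclusion approach: any attribute added to the class later is saved automatically, so `load()` has to accept it as well, otherwise restoring fails on an unrecognized key.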

@shukon
Collaborator Author

shukon commented Aug 12, 2017

@mlindauer I'm not sure what's up with the build... it's something to do with pexpect:
ModuleNotFoundError: No module named 'pexpect'
Do you (or does anyone) know, or should I investigate?

@mfeurer
Contributor

mfeurer commented Aug 13, 2017

@shukon we're working on the 'pexpect' issue. It shouldn't be there and the build failure is not caused by your changes.

@mlindauer
Contributor

Since my concerns were addressed, can we merge this PR?

@shukon maybe an update of the docu mentioning this new feature would be nice.

@shukon
Collaborator Author

shukon commented Aug 21, 2017

@mlindauer Added mention to the docs. As far as I'm concerned, we can merge.

@mlindauer
Contributor

Hmmm... we have a major issue with this PR because it includes unintended commits to the development branch. Unfortunately, another student committed changes to the development branch without permission. This is also the reason for the weird pexpect issue.

We reverted the commits in the development branch.
Can we somehow also remove the commits from "Michael Rudolph" in this PR?

@shukon
Collaborator Author

shukon commented Aug 22, 2017

I will clean this branch up at some point today.

@shukon shukon force-pushed the issue/#193_restore_scenario branch from a6fc961 to 8846f39 Compare August 22, 2017 14:48
@shukon
Collaborator Author

shukon commented Aug 22, 2017

@mlindauer This branch should be clean now.

@mlindauer
Contributor

Before I merge this, I wanted to try it myself using examples/spear_qcp/
So I did the following:

$ python ../../scripts/smac --scenario scenario.txt
[...]
$ python ../../scripts/smac --scenario scenario.txt --restore_state smac3-output_2017-08-25_09\:42\:05_\(034034\)_run1/
INFO:smac.smac_cli.SMACCLI:SMAC call: ../../scripts/smac --scenario scenario.txt --restore_state smac3-output_2017-08-25_09:42:05_(034034)_run1/
INFO:	Reading scenario file: scenario.txt
INFO:	Output to smac3-output_2017-08-25_09:43:01_(331676)
Traceback (most recent call last):
  File "../../scripts/smac", line 20, in <module>
    smac.main_cli()
  File "/home/lindauer/git/SMAC3_play/smac/smac_cli.py", line 89, in main_cli
    rh, stats, incumbent = self.restore_state(args_, scen, root_logger)
  File "/home/lindauer/git/SMAC3_play/smac/smac_cli.py", line 129, in restore_state
    stats.load(stats_path)
  File "/home/lindauer/git/SMAC3_play/smac/stats/stats.py", line 87, in load
    raise ValueError("Stats does not recognize {}".format(key))
ValueError: Stats does not recognize _ema_n_configs_per_intensifiy
lindauer@aadpool1:~/git/SMAC3_play/examples/spear_qcp$ ll

@shukon Please try to run the example yourself and check whether you can produce this error (and fix if necessary).

Furthermore, reading the docu, I was not sure whether I would need to change the scenario file provided in the cmd (--scenario ...) or the one saved in the output directory. Could you please clarify that.

@shukon
Collaborator Author

shukon commented Sep 3, 2017

Done. I have tried quite a few things now, and I'm quite confident that the build failure has nothing to do with this PR. The failing test is "adding some rundata to RunHistory2EPM4Cost and impute censored data", which also fails on dev and master.

@mlindauer
Contributor

Sorry, I found another bug.
I tried again to run the spear example.
I increased the wallclock limit to 90 seconds.
After 81 seconds, I sent a KeyboardInterrupt.
Restoring this run was fine except for the wallclock budget, which was reset to 0 in the restored run.
Also, smac3-output*/stats.json states only "wallclock_time_used": 0.
@shukon could you please look into it and fix the wallclock_time_used in stats.json?

Furthermore, this branch now has a merge conflict with dev. @shukon could you please fix this merge conflict?

@shukon
Collaborator Author

shukon commented Sep 24, 2017

@mlindauer Don't be sorry for finding bugs, let me be sorry for not finding them. Should be fixed now (wallclock_time_used simply was never set, now set when saving Stats), also no merge-conflicts.
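
A minimal sketch of how used wallclock time can be carried across a restart, assuming the value is refreshed at save time as described. `WallclockSketch` and its method names are hypothetical and only illustrate the idea; they are not SMAC's Stats implementation.

```python
import time


class WallclockSketch:
    """Toy timer that carries used wallclock time across restarts.

    Hypothetical class for illustration only.
    """

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the timer can be tested
        self._start_time = None
        self.wallclock_time_used = 0.0  # restored from disk on restart

    def start_timing(self):
        # Shift the start back by the time already used in earlier runs.
        self._start_time = self._clock() - self.wallclock_time_used

    def get_used_wallclock_time(self):
        return self._clock() - self._start_time

    def to_dict(self):
        # Refresh the stored value at save time so it is never stale.
        self.wallclock_time_used = self.get_used_wallclock_time()
        return {"wallclock_time_used": self.wallclock_time_used}
```

With this layout, a run interrupted after 81 seconds and restored later reports 81 seconds plus whatever the continued run has consumed, instead of starting again from 0.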

@mlindauer
Contributor

Very nice now! I haven't found further bugs ;-)

Nevertheless, I have a further request. Could you please add a message (on INFO level) that the state was restored and that SMAC continues with the following state (incumbent + stats).

Right now, I as a user would wonder whether SMAC really restored the state or not since it does not say anything in this direction.

@shukon
Collaborator Author

shukon commented Sep 25, 2017

@mlindauer Now logging:

INFO:	State Restored! Starting optimization with incumbent: Configuration:
  sp-clause-activity-inc, Value: 1.0
  sp-clause-decay, Value: 1.4
  sp-clause-del-heur, Value: '2'
  (...)

INFO:	State restored with following budget:
INFO:	##########################################################
INFO:	Statistics:
INFO:	#Incumbent changed: 0
INFO:	#Target algorithm runs: 8 / inf
INFO:	Used wallclock time: 15.90 / 30.00 sec 
INFO:	Used target algorithm runtime: 5.56 / inf sec
INFO:	##########################################################

@mlindauer mlindauer merged commit 24fbddb into development Oct 4, 2017
@mlindauer mlindauer deleted the issue/#193_restore_scenario branch October 4, 2017 08:37