Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Integrate YAHPO Gym #301

Closed
wants to merge 7 commits into from
Closed

[WIP] Integrate YAHPO Gym #301

wants to merge 7 commits into from

Conversation

pfistfl
Copy link

@pfistfl pfistfl commented Aug 5, 2022

Description of changes:

First draft for including YAHPO Gym as a BlackBoxRecipe.
This is not entirely straightforward and I might need some input from @geoalgo on how to progress.

Currently, yahpo has a nested structure: /scenario/instance where scenario is a problem set and all instances within a scenario share the same search space. (A scenario is e.g. xgboost and the instances are different datasets)
In the current design, the user would call

bb = load_blackbox("YAHPO")[<scenario>]
bb.set_instance(<instance>)

If we unnest this, this would (in total) be around 850 instances.

@geoalgo Could you perhaps do a pass / help me think about how to integrate the different designs?
I guess we might want to have one Recipe per scenario as you do in the icml_2020 recipe?
Would this bloat the Recipes?

I will list a few open to-do's:

  • Check what needs to be serialized / How to distribute the .onnx neural networks.
  • Check where the YAHPO setup (pointer to data dir etc.) needs to happen.
  • Add an example script.

@geoalgo
Copy link
Contributor

geoalgo commented Aug 5, 2022

Thanks @pfistfl for this PR, its really exciting to have access to all the benchmarks of yahpo!

Regarding the nesting of instances, I think it would be best to have one blackbox per family rather than have all instances into one. This is because we can then benchmark all transfer-learning methods more easily inside one family (such as iaml_xgboost for instance).

To be clear, I am suggesting the following (plus/minus task naming conventions):

bb_dict = load_blackbox("yahpo-lcbench")  # download yahpo_data/lcbench, it would also be OK to download everything if more convenient as the whole dir seems to be 1.3GB 
bb = bb_dict["Fashion-MNIST"]  # access a given dataset for a scenario, not sure how its named in yahpo
print(bb(configuration={...}))

Or to take your naming, to have load_blackbox("yahpo-{scenario}") returns a dictionary where the keys are {instance}. I do not think it would bloat the recipes as we can programatically add them (since they all have the same structure).

Does this sounds reasonable to you? If so, let me know what part you want to work on. We could also merge your code and adapt it ourselves if you think this is reasonable but don't have the time for this, up to you :-)

PS1: A small thing, could you run ./dev_setup.sh before your future pushes (this will automatically format your code at each commit, see our contribution guideline for more info)

PS2: Could you add the LICENSE header (present in any file such as this one, this is something we did not get to automate yet

@pfistfl
Copy link
Author

pfistfl commented Aug 5, 2022

Hi @geoalgo

One blackbox per family -> makes sense, let us do that!
Should be trivial to programmatically add those, I have started.

It might be more sensible if you take over for now since you know the in's and out's of syne-tune better.
I hope the existing code provides you with everything required to make things run.

The current setup does not allow me to specify yahpo as a github repository, so I will work on making yahpo pip-installable in the mean time.

@geoalgo
Copy link
Contributor

geoalgo commented Aug 5, 2022

It might be more sensible if you take over for now since you know the in's and out's of syne-tune better.
I hope the existing code provides you with everything required to make things run.

Ok thanks it works for me, one thing though is that I may not be able to finish before summer break (next week), I will finish the integration once I get back (end of August) if I dont get it done by then.

The current setup does not allow me to specify yahpo as a github repository, so I will work on making yahpo pip-installable in the mean time.

As you prefer, it could be nice indeed to be able to install specific versions directly from requirements. That being said the current approach should work for now.

Again thanks a lot for this work, it is very exciting to be able to benchmark syne-tune schedulers with all this new data!

@pfistfl
Copy link
Author

pfistfl commented Aug 6, 2022

Just a small update: yahpo is pip-installable now

@mseeger
Copy link
Collaborator

mseeger commented Aug 15, 2022

Hello, is there anything I can do to unblock while David is PTO?

@pfistfl
Copy link
Author

pfistfl commented Aug 19, 2022

Hi,

We planned on David taking over the PR given that he is more familiar with the design intents and internal structure of syne-tune. I guess the current task is to marry minor differences in the design of yahpo-gym and syne-tune and add tests/ examples once everything works.

@mseeger
Copy link
Collaborator

mseeger commented Aug 19, 2022

Sounds good. David wrote the benchmark repository and will be the best person to help here.

@geoalgo
Copy link
Contributor

geoalgo commented Aug 31, 2022

Just a quick update on this.
I came back from holidays and will resume this work and add an example that allows to run ASHA and others with a Yahpo benchmark.

@geoalgo
Copy link
Contributor

geoalgo commented Sep 6, 2022

Hi Florian, I sent a PR based on your initial work (#337) to integrate Yahpo to syne-tune, could you take a look? (I opened a new one as there were a few conflicts, I hope you dont mind)

@geoalgo geoalgo closed this Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants