Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion about sim_specs/gen_specs/alloc_specs API #266

Closed
jmlarson1 opened this issue Oct 17, 2019 · 5 comments
Closed

Discussion about sim_specs/gen_specs/alloc_specs API #266

jmlarson1 opened this issue Oct 17, 2019 · 5 comments

Comments

@jmlarson1
Copy link
Member

jmlarson1 commented Oct 17, 2019

Issue

Currently sim_specs/gen_specs/alloc_specs contain both

  1. fields that are reserved for libE use
    • for example, sim_specs['out'] tells libE what fields name/type/size to be ready to receive
  2. fields that a user creates and fills
    • for example, a precomputed set of sampled points to given out by the allocation function

This can be confusing. I can think of three options

Option 1 (status quo)

Keep it as is

  • Users may want the libE fields as well as their own fields

Pros:

  • Docs, tests, examples, etc. don't need updating
  • Example intuition: all sim_f related parameters go in sim_specs

Cons:

  • For new users, confusion differentiating libE-reserved fields & user-defined fields without referencing docs.

Option 2 (nested)

Only allow a single user field (e.g., sim_specs['user']) which can contain any data structure that the user wants to receive whenever sim_f, gen_f, or alloc_f is called

Pros:

  • This clearly delineates what libE needs and what the user can adjust.
  • Rigidly defines all field names in sim_specs/gen_specs/alloc_specs

Cons:

  • Adds a dictionary within a dictionary (ugly?)
  • Users will need to adjust their functions to parse their fields properly
  • Architecture, docs, tests, examples, etc. need updating

Option 3 (separated)

Move libE information to libE_specs.

  • Requires sim/gen information to be in two different places.
    • sim_f, sim_out, sim_in declared in libE_specs and sim_specs contains only user fields/data

Pros:

  • Relative simplicity of libE-reserved fields only being in libE_specs
  • Users shouldn't need to adjust their functions

Cons:

  • Architecture, docs, tests, examples, etc. need updating
  • Autogenerating libE_specs for tests will become more complicated

Next steps

Edit this issue to add any alternative options/pros/cons.

Please vote below

@shuds13
Copy link
Member

shuds13 commented Oct 22, 2019

I'm leaning towards option3, though note that sim_specs['out'] is used to initiate output dtype in sim funcs, so users would have to change that to use libE_specs (we could at some point provide support functions to hide this).

Without that dtype inititation, then option 2 would also be easy as we would only have to send user defined specs, which could be received as sim_specs - and no changes required.

Note I have coded these three approaches to forces in the branch experimental/forces_specs_tests. I didn't change libe so option3 wont work - but its just to see the code.

@jlnav
Copy link
Member

jlnav commented Oct 22, 2019

Out of the two new options, I prefer 3 for the separation between libE-specific fields and user parameters. However, this would mean that calling scripts transition from containing two large dictionaries to three. parse_args() would need updating.

For use-cases where all user-parameters are hard-coded in their sim_f's or gen_f's, sim_specs and gen_specs could be empty. This could help produce simple tutorials or examples.

Option 2 from my understanding is just a suggested change in formatting sim_specs and gen_specs.

@shuds13
Copy link
Member

shuds13 commented Oct 28, 2019

As we need to factor multiple sim/gens into this. I think this draws me away from option 3 somewhat and towards opt2, while maybe allowing a nested structure of sim/gens to be passd to libE. E.g. In the most general case...

sim_specs1 = {'sim_f': run_forces,
             'in': ['x'],
             'out': [('energy', float)],
             'user_specs': ....
            }

sim_specs2 = {'sim_f': run_forces2,
             'in': ['x','m'],
             'out': [('energy', float, 'other', float)],
             'user_specs': ....
            }

sim_lookup = {'sim1': sim_specs1,
              'sim2': sim_specs2}

H, persis_info, flag = libE(sim_specs=sim_lookup, gen_specs, exit_criteria, persis_info=persis_info, libE_specs=libE_specs)

If these are kept constant (as they are currently) they could all be passed once to each worker and an H field can inform which sim_f to run on the worker.

Though there maybe more efficient ways to consolidate multiple sim_specs with some fields in common.

Note that this overlaps with discussion on alternative structures of H. Currently with H fixed allocation on manager, we would have to allocate to cover union of output fields and maximum size of each, so different outputs may not be worthy of different sim funcs. This might change with different H structure.

@jmlarson1
Copy link
Member Author

After a lengthy discussion, we have decided to implement option 2, in agreement with the comment from @shuds13 above.

@jmlarson1 jmlarson1 moved this from Aspirational to In progress in libE Kanban Oct 31, 2019
@jlnav jlnav moved this from In progress to In review/testing in libE Kanban Nov 11, 2019
@shuds13 shuds13 mentioned this issue Nov 12, 2019
13 tasks
@jlnav
Copy link
Member

jlnav commented Nov 15, 2019

Presumably closed by #271

@jlnav jlnav closed this as completed Nov 15, 2019
@shuds13 shuds13 moved this from In review/testing to Done in libE Kanban Dec 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

5 participants