Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write out YAML file after parsing #835

Merged
merged 3 commits into from Aug 6, 2021
Merged

Write out YAML file after parsing #835

merged 3 commits into from Aug 6, 2021

Conversation

mikemhenry
Copy link
Contributor

Description

Write out YAML file after parsing.

Motivation and context

In order to improve reproducibility and quickly see which options were used in the simulation, it would be nice to be able to inspect the YAML that has all the options set as parsed.

Resolves #817

How has this been tested?

Tested on GHA and locally manually.

Change log

Write out YAML file after all options are parsed and set. Saved as YAML original file name + date + time. 

@mikemhenry mikemhenry added this to the 0.9.2 BugfIx Release milestone Aug 4, 2021
@mikemhenry mikemhenry changed the title test to see if I need to move anything else around Write out YAML file after parsing Aug 4, 2021
@mikemhenry mikemhenry self-assigned this Aug 4, 2021
@codecov
Copy link

codecov bot commented Aug 4, 2021

Codecov Report

Merging #835 (75b9c9a) into master (5d50cbc) will decrease coverage by 0.02%.
The diff coverage is 9.09%.

@mikemhenry
Copy link
Contributor Author

mikemhenry commented Aug 4, 2021

@jchodera @ijpulidos thoughts on the format for the parsed yaml?

For example, test.yml is the starting yml and then this is the parsed yaml: parsed-2021-08-04-14:47:15-test.yml

time = datetime.datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
yaml_parse_name = f"parsed-{time}-{yaml_name}"

Also this will be a bit more work than I thought, I tried reading in a parsed yaml and it barfs:

Traceback (most recent call last):
  File "/home/mmh/miniconda3/envs/perses/bin/perses-relative", line 33, in <module>
    sys.exit(load_entry_point('perses', 'console_scripts', 'perses-relative')())
  File "/home/mmh/Projects/perses/perses/app/setup_relative_calculation.py", line 687, in run
    setup_options = getSetupOptions(yaml_filename)
  File "/home/mmh/Projects/perses/perses/app/setup_relative_calculation.py", line 110, in getSetupOptions
    setup_options['complex_box_dimensions'] = tuple([float(x) for x in setup_options['complex_box_dimensions']])
TypeError: 'NoneType' object is not iterable

since setup_options gets some objects that are unrealizable, so they get set to null. I'll have to figure out the best way to handle that. Might not just write those values out since that should work fine, below is an example original yaml then a parsed one.

orignal:

atom_selection: not water
checkpoint_interval: 5
fe_type: repex
forcefield_files:
- amber/ff14SB.xml
- amber/tip3p_standard.xml
- amber/tip3p_HFE_multivalent.xml
- amber/phosaa10.xml
n_cycles: 50
n_equilibration_iterations: 10
n_states: 3
n_steps_per_move_application: 50
new_ligand_index: 15
old_ligand_index: 14
phases:
- vacuum
pressure: 1.0
save_setup_pickle_as: fesetup_hbonds.pkl
small_molecule_forcefield: openff-1.0.0
solvent_padding: 9.0
temperature: 300.0
timestep: 4.0
trajectory_directory: cdk2_repex_hbonds
trajectory_prefix: cdk2
ligand_file: /home/mmh/Projects/perses/examples/cdk2-example/CDK2_ligands_shifted.sdf
protein_pdb: /home/mmh/Projects/perses/examples/cdk2-example/CDK2_protein.pdb

parsed:

anneal_1,4s: false
atom_expr: null
atom_selection: not water
bond_expr: null
checkpoint_interval: 5
complex_box_dimensions: null
fe_type: repex
forcefield_files:
- amber/ff14SB.xml
- amber/tip3p_standard.xml
- amber/tip3p_HFE_multivalent.xml
- amber/phosaa10.xml
ionic_strength: 0.15
ligand_file: /home/mmh/Projects/perses/examples/cdk2-example/CDK2_ligands_shifted.sdf
map_strength: null
n_cycles: 50
n_equilibration_iterations: 10
n_states: 3
n_steps_per_move_application: 50
neglect_angles: false
new_ligand_index: 15
nonbonded_method: PME
offline-freq: 10
old_ligand_index: 14
phases:
- vacuum
pressure: 1.0
protein_pdb: /home/mmh/Projects/perses/examples/cdk2-example/CDK2_protein.pdb
protocol-type: default
remove_constraints: false
render_atom_map: true
rmsd_restraint: false
run_type: null
save_setup_pickle_as: fesetup_hbonds.pkl
small_molecule_forcefield: openff-1.0.0
small_molecule_parameters_cache: null
softcore_v2: false
solvent_box_dimensions: null
solvent_padding: 9.0
spectators: null
temperature: 300.0
timestep: 4.0
trajectory_directory: cdk2_repex_hbonds
trajectory_prefix: cdk2

So even if I drop the null values, we don't lose any information that was in the original and then we can re-run a parsed yaml file.

@mikemhenry
Copy link
Contributor Author

It isn't actually because it is an object, the parsing code doesn't handle the case where complex_box_dimensions is set to None or (null) but checks to see if the key exits and if it does, assumes it can be cast as a float.

    if 'complex_box_dimensions' not in setup_options:
        setup_options['complex_box_dimensions'] = None
    else:
        setup_options['complex_box_dimensions'] = tuple([float(x) for x in setup_options['complex_box_dimensions']])

@mikemhenry
Copy link
Contributor Author

So I'll just need to make the parsing logic a bit more robust.

@ijpulidos
Copy link
Contributor

@mikemhenry I actually think having all the values in the parsed YAML is a good idea for replicability (even the ones inferred by the software). If we don't have all of these in the parsed YAML and if we end up changing the defaults in the future we then would get different results for the same YAML file, which I think is not something we want.

Another solution would be to just not try to put these None when the original input YAML file doesn't have them, but actually just trust the class defaults and output the parsed YAML after it has passed through the classes and such. So, for example for the setup_options['complex_box_dimensions'] case you would just have to overwrite it if it is in the input YAML file, else just use the class defaults, i.e. just:

    if 'complex_box_dimensions' in setup_options:
        setup_options['complex_box_dimensions'] = tuple([float(x) for x in setup_options['complex_box_dimensions']])

or something like that.

Of course this means we have to be sure the classes have actual sane and working defaults and this may be a hard task.

@ijpulidos
Copy link
Contributor

Hmm, actually that code snippet wouldn't work as well, maybe you need to check also that they are not None up there, but the idea of having the default behavior be determined by the defaults in the classes still holds in my opinion.

@jchodera
Copy link
Member

jchodera commented Aug 5, 2021

So even if I drop the null values, we don't lose any information that was in the original and then we can re-run a parsed yaml file.

As long as we can re-parse the YAML file, that works for now! We'll be replacing all of this functionality when we bring the new API online.

@mikemhenry
Copy link
Contributor Author

Yes, I want to do the least amount of work possible on this since it is due for an overhaul. Ready for review now @ijpulidos and @jchodera

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It works, even though I think this way of dealing with setup options from CLI can be done in a better way. Hopefully something for future releases.

I think we do require a test for this, though. Something that starts a sim from a parsed YAML.

@ijpulidos
Copy link
Contributor

Is it crazy to ask also for a test that verifies that the parsed YAML has all the possible parameters/options? Thinking again about reproducibility.

@jchodera
Copy link
Member

jchodera commented Aug 6, 2021

Is it crazy to ask also for a test that verifies that the parsed YAML has all the possible parameters/options? Thinking again about reproducibility.

I think all we need right now is "good enough" because all this functionality will be replaced in an upcoming milestone. What about creating a new issue to discuss what the new YAML functionality should look like with the new API, and making sure we spec out exactly what behavior we want for that? If we design it right, it will be simple to implement and achieve the desired behavior while not compromising flexibility. Tests could, for example, use introspection to regress over all available options.

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good enough for now!

@mikemhenry mikemhenry dismissed ijpulidos’s stale review August 6, 2021 03:12

Good enough for now!

@mikemhenry mikemhenry merged commit cfbdc7a into master Aug 6, 2021
@mikemhenry mikemhenry deleted the feat/dump_yaml branch August 6, 2021 03:12
@mikemhenry mikemhenry restored the feat/dump_yaml branch January 26, 2022 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write out YAML after parsing
3 participants