Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for resuming with CLI/YAML #830

Merged
merged 22 commits into from Aug 2, 2021
Merged

Conversation

mikemhenry
Copy link
Contributor

No description provided.

@mikemhenry mikemhenry added this to the 0.9.1 Bugfix release milestone Jul 30, 2021
@mikemhenry
Copy link
Contributor Author

Not sure why it can't find the file, this works for me locally:

# Checkout PR
gh pr checkout 830
# Run the test
pytest -v -k test_cli_resume_repex

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! It certainly helps when resuming from CLI/YAML files. However, we should keep in mind that maybe only changing and checking the iterations is not enough. What if the simulation crashes and/or other parameters are changed before trying to resume?

perses/app/setup_relative_calculation.py Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Show resolved Hide resolved
docs/changelog.rst Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 30, 2021

Codecov Report

Merging #830 (3a764c9) into master (222e1c7) will decrease coverage by 0.09%.
The diff coverage is 25.80%.

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test for whether repex can be resumed no longer actually tests resuming: It just runs perses once. We need to run it once, stop it early (or extend it after it finishes), and then run it again, and demonstrate that works gracefully. We should fix the test, make sure it works, then merge.

perses/app/setup_relative_calculation.py Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Show resolved Hide resolved
perses/tests/test_resume.py Outdated Show resolved Hide resolved
@mikemhenry
Copy link
Contributor Author

I'll use an f-string, but I agree that using pathlib stuff is best when dealing with paths since it can make sure the separators are correct and while paths are string like, its best to treat them more a separate thing. I just was doing this with the least amount of effort so I coped the line here

storage_name = str(trajectory_directory)+'/'+str(trajectory_prefix)+'-'+str(phase)+'.nc'

I'll get the test working as well, it passed for me locally but then failed on GHA so I commented it out so if we needed this today it could be merged in. I'm working on fixing the tests now :)

@jchodera
Copy link
Member

jchodera commented Aug 2, 2021

@mikemhenry : One issue could be that the OpenEye license files are not being found, which I think means that the perses-relative entry point is failing to return an error code or we are failing to check for it after calling subprocess.run():

2021-08-01 23:58:52,344:(0.00s):relative_setup:Setting non bonded method to PME
2021-08-01 23:58:52,344:(0.00s):relative_setup:Handling files for ligands and indices...
2021-08-01 23:58:52,344:(0.00s):relative_setup:Detected .sdf format.  Proceeding...
LICENSE: Could not open license file specified by OE_LICENSE environment variable "/home/runner/oe_license.txt"
LICENSE: Could not open license file "oe_license.txt" in local directory
LICENSE: N.B. OE_DIR environment variable is not set
LICENSE: No product keys!
LICENSE: No valid license found for oechem

I think this means this test is failing to generate any output the first time because of a failure to find the license file in the subprocess.run() environment.

@ijpulidos
Copy link
Contributor

@mikemhenry I think the tests are expecting this cdk2_repex_hbonds/cdk2-vacuum.nc file that I cannot see where it comes from. That may be something you have only on your local tree, hence why it is passing there? Just a guess.

@mikemhenry
Copy link
Contributor Author

@ijpulidos it is created when we run the yaml file
@jchodera good catch -- I thought that was a red herring since the other tests seem to run fine and I expected them to blow up if there really was an issue with the license file -- looking into that now

@mikemhenry
Copy link
Contributor Author

AHH I forgot how subprocess will have a different env, I bet that is it @jchodera, testing now!
This would explain why it works locally, I have the license file in the spot where it expects to be and not set via an envar.

@mikemhenry
Copy link
Contributor Author

I am also going to quickly add LOGLEVEL = os.environ.get("LOGLEVEL", "INFO").upper() in this PR as well since the log level is hard coded and I remember relay having an issue with that when we told them to more information. When using the API you can set the log level but there wasn't a way to do that with the CLI until now.

@mikemhenry
Copy link
Contributor Author

mikemhenry commented Aug 2, 2021

If this last fix doesn't work, I will copy the license into the local dir where these tests are running.
Also adding check=True helps to diagnose where the error is since it will throw an exception now if the command fails.
So @jchodera you were correct the command was never running, which is why the directory was created, then it can't find the license, then loading the checkpoint failed since the command never ran.

@mikemhenry
Copy link
Contributor Author

🎉 okay it is passing now! This is what I get for doing TDD locally instead of on GHA 🤷

@mikemhenry
Copy link
Contributor Author

@ijpulidos @jchodera would you mind taking a quick look again? Really the only changes were made to the testing, but I did add functionality for reading LOGLEVEL from the environment. Once this is merged we can cut that release!

@mikemhenry mikemhenry mentioned this pull request Aug 2, 2021
Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes, good to see the debugging levels capabilities.

docs/changelog.rst Outdated Show resolved Hide resolved
perses/app/setup_relative_calculation.py Show resolved Hide resolved
Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@mikemhenry mikemhenry enabled auto-merge (squash) August 2, 2021 18:41
@mikemhenry mikemhenry merged commit fd6bb4e into master Aug 2, 2021
@mikemhenry mikemhenry deleted the feat/resume_with_cli branch August 2, 2021 18:51
@mikemhenry mikemhenry restored the feat/resume_with_cli branch January 26, 2022 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants