Expand runtime control: global max_seconds and STOP file #3028
I would really like to avoid cases like this if at all possible in the future:
Agreed. I was expecting this request. Documentation is still missing in this PR. Putting out this PR for feedback.
Overall these look like good changes to me.
892b334 to 48a983c
Good advance here, but I would like some changes. I am against max_seconds="500" as an attribute of the project section. It adds parsing complication, as Jaron noted. Generally I am against the use of attributes when we could use parameters. Parameters are much easier for users (including this one) to remember, read, and get right. Use of XML for input is unfortunate, but it is what we have currently. We already have maxcpusecs, which works at the driver level, and we should recycle the same tag at a higher level. It would also make sense to support the same parameter as a command line argument. Then when the user or robot writes a job submission file, it can be written there without affecting the QMC input. .STOP will be very useful! I agree with the per-block check, since part of our current block definition is that this is when we do I/O.
@ye-luo Interested to know your current use case, or perhaps it was requested.
I changed "max_seconds" from an attribute to a parameter, but it remains under project. Regarding command line arguments, in general I would like to move more command line arguments under project to improve reproducibility from a single input file. For example, we may consider moving verbosity. I agree that adding max_seconds on the command line can be useful. I want to see the timers, so the execution must stop cleanly. If I'm looking at the timer distribution to see which timers are heavier, I just need to stop the code on the fly without changing the input.
@prckent let me know if you really want to stick to "maxcpusecs"; otherwise, I will mark it deprecated.
max_seconds is good imo. OK to mark the old tag as deprecated and abort - there can't be many people using it and it is an easy change to make. An edit might be needed in NEXUS.
A "deprecated" tag will continue to run. Once one release is out, we change it to "deleted", and users then get an abort.
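The deprecated-then-deleted lifecycle described in this comment can be sketched as below. This is a minimal illustration in Python, not the project's code: the `TAG_STATUS` table and the `resolve_tag` function are hypothetical names introduced here, and only the tag names `maxcpusecs` and `max_seconds` and the warn-then-abort behaviour come from the discussion.

```python
import warnings

# Hypothetical registry: parameter name -> (status, replacement name).
TAG_STATUS = {
    "max_seconds": ("current", None),
    "maxcpusecs": ("deprecated", "max_seconds"),
}


def resolve_tag(name, status_table=TAG_STATUS):
    """Return the parameter name to use, warning or aborting by status."""
    status, replacement = status_table.get(name, ("unknown", None))
    if status == "current":
        return name
    if status == "deprecated":
        # Deprecated tags keep working but tell the user what to use instead.
        warnings.warn(
            f"'{name}' is deprecated; use '{replacement}'", DeprecationWarning
        )
        return replacement
    if status == "deleted":
        # Deleted tags abort the run with a pointer to the replacement.
        raise ValueError(f"'{name}' has been deleted; use '{replacement}'")
    raise KeyError(f"unknown parameter '{name}'")
```

Moving a tag from "deprecated" to "deleted" after one release is then a one-line change in the table, for example `dict(TAG_STATUS, maxcpusecs=("deleted", "max_seconds"))`.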
maxcpusecs has been superseded by max_seconds in classical drivers. |
Verified by hand.
Added another mention in the documentation and increased the safety factor.
Proposed changes
This PR enhances runtime controls.
<project id="project-prefix" max_seconds="500">
to control the time limit in batched drivers. The default is 360000 seconds (100 h), the same as in legacy drivers, which however still respect max_seconds in the per-driver input section. The check frequency is per block.
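The per-block check described above, together with the STOP file from the PR title, can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation in this PR: the `RuntimeControl` class, `run_blocks`, and the `stop_file` argument are hypothetical names, and the exact STOP file name and location are left to the caller. Only the `max_seconds` limit, its 360000 s default, and the once-per-block check frequency come from the description.

```python
import os
import time


class RuntimeControl:
    """Hypothetical helper: decides after each block whether a run should stop."""

    def __init__(self, max_seconds=360000, stop_file=None, clock=time.monotonic):
        self.max_seconds = max_seconds  # global wall-clock limit in seconds
        self.stop_file = stop_file      # path whose existence requests a stop
        self.clock = clock              # injectable clock, useful for testing
        self.start = clock()

    def stop_reason(self):
        """Return a reason string if the run should stop, else None."""
        if self.stop_file is not None and os.path.exists(self.stop_file):
            return "stop file found"
        if self.clock() - self.start >= self.max_seconds:
            return "time limit reached"
        return None


def run_blocks(num_blocks, control, do_block):
    """Run up to num_blocks blocks, checking the control once per block."""
    completed = 0
    for _ in range(num_blocks):
        do_block()
        completed += 1
        reason = control.stop_reason()
        if reason is not None:
            # Stop at a block boundary so output and timers stay consistent.
            return completed, reason
    return completed, "all blocks done"
```

Checking only at block boundaries keeps the cost negligible and means a stop always happens at a point where I/O is already being done, which matches the reasoning in the review discussion. A production version would likely also estimate whether another block fits in the remaining time; that is omitted here.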
Explicitly tested the ensemble job case. Existing cpu_limit test results.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
epyc-server
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.