Checking LSF log if bjobs fails #5
Addressed the simplest changes for now; will continue later.
I think we now have two pending items. I will run some pipelines with the current version of the code. I wonder if we should merge into a branch while we make sure I did not introduce any bugs; you could also work on that branch to add rule-specific cluster configs.
I think we are good to merge! I will keep using this profile for the next pipelines to spot any issues.
Not quite. There is still a conflict. I think this is because this branch was created before a change to
I actually tried to solve the conflict locally, but was unable to:
Will retry...
So one way to be able to merge is to add the old lsf-status and lsf-submit files. We can merge and then delete them.
This PR checks LSF logs if bjobs fails 3 times (by default).
In the current code, the pipeline will fail with the error message that it got an unknown job status.
In this event, this PR will look at LSF logs to check if the pipeline failed or succeeded.
However, if the logs are stored on a slow filesystem, the submission rate can degrade. In my experience, one job was submitted every 10-20 seconds when the logs were read to check the job status, compared with several jobs per second when they were not.
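The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `job_status`, `status_from_log`, the injected `query_bjobs` callable, and the returned status strings are all hypothetical names, and the log markers are an assumption about what the LSF job report contains.

```python
from pathlib import Path
from typing import Callable, Optional

MAX_STATUS_CHECKS = 3  # the PR's default number of bjobs attempts


def status_from_log(log_path: Path) -> Optional[str]:
    """Infer the job outcome from the LSF log, or return None if unclear.

    Assumption: the job report in the log contains one of these markers.
    """
    try:
        text = log_path.read_text()
    except OSError:
        return None  # log not written yet or not readable
    if "Successfully completed." in text:
        return "success"
    if "Exited with exit code" in text:
        return "failed"
    return None


def job_status(
    query_bjobs: Callable[[], Optional[str]],
    log_path: Path,
    max_checks: int = MAX_STATUS_CHECKS,
) -> str:
    """Ask bjobs up to max_checks times, then fall back to the log file."""
    for _ in range(max_checks):
        status = query_bjobs()  # hypothetical: returns None when bjobs fails
        if status is not None:
            return status
    log_status = status_from_log(log_path)
    # Assumption: an inconclusive log means the job is still running.
    return log_status if log_status is not None else "running"
```

Reading the log only after every `bjobs` attempt has failed keeps filesystem access off the common path, although each fallback still costs one file read.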
Test-wise, the coverage is close to 100%. CookieCutter.py and OSLayer.py are not meant to be tested, however: they are just abstractions for getting values from CookieCutter and for interacting with the OS, so that both can be mocked in the testing code.
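As an illustration of why such a thin layer helps, here is a minimal sketch of mocking an OS wrapper in a test. The `OSLayer.run_process` method, the `query_status` function, and the exact `bjobs` command line are assumptions made for this example and may not match the real modules.

```python
import subprocess
from unittest.mock import patch


class OSLayer:
    """Hypothetical thin wrapper around OS calls, kept free of logic."""

    @staticmethod
    def run_process(cmd: str) -> str:
        result = subprocess.run(
            cmd, shell=True, check=True, capture_output=True, text=True
        )
        return result.stdout


def query_status(jobid: int) -> str:
    # Assumed command line; the real profile may call bjobs differently.
    out = OSLayer.run_process(f"bjobs -o stat -noheader {jobid}")
    return out.strip()


# In a test, the OS call is replaced, so no LSF cluster is needed.
with patch.object(OSLayer, "run_process", return_value="DONE\n") as mocked:
    status = query_status(123)
```

Because all OS interaction goes through one class, the logic around it can be tested without a cluster, and the wrapper itself stays small enough to be left out of coverage.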
Still searching for a way to check the logs without degrading the submission rate.