Checking LSF log if bjobs fails #5

leoisl · 2020-03-14T20:18:25Z

This PR checks LSF logs if bjobs fails 3 times (by default).

In the current code, when bjobs fails the pipeline aborts with an error message saying that it got an unknown job status.
With this PR, the LSF logs are inspected in that event to determine whether the job failed or succeeded.
However, if the logs are stored on a slow filesystem, the submission rate can degrade: in my experience, one job was submitted every 10-20 seconds, versus several jobs per second when the logs were not read from the filesystem to check the job status.
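
A minimal sketch of the fallback described above, not the code in this PR. The names `get_status`, `status_from_log`, and `query_bjobs` are hypothetical, the log markers are an assumption about what an LSF job report contains, and returning "running" when the log gives no verdict is also an assumption. The status strings are the ones Snakemake expects from a cluster status script.

```python
import re
from pathlib import Path
from typing import Callable, Optional

# Assumed markers: LSF job reports typically contain one of these lines,
# but the exact wording may vary between LSF versions.
SUCCESS_MARKER = "Successfully completed."
FAILURE_PATTERN = re.compile(r"Exited with exit code \d+")


def status_from_log(log_text: str) -> str:
    """Map the text of an LSF log to a Snakemake status string."""
    if SUCCESS_MARKER in log_text:
        return "success"
    if FAILURE_PATTERN.search(log_text):
        return "failed"
    return "running"  # the log holds no verdict yet


def get_status(
    query_bjobs: Callable[[], Optional[str]],
    log_path: Path,
    max_tries: int = 3,
) -> str:
    """Ask bjobs up to max_tries times, then fall back to the log file.

    query_bjobs is a hypothetical callable that returns a status string,
    or None when bjobs could not report one.
    """
    for _ in range(max_tries):
        status = query_bjobs()
        if status is not None:
            return status
    # bjobs gave no answer: read the log (this is the slow path on a
    # slow filesystem, which is where the submission rate suffers).
    if log_path.exists():
        return status_from_log(log_path.read_text())
    return "running"
```

Because the log is only read after every bjobs attempt has failed, the filesystem cost is paid only on the fallback path.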

Test-wise, the coverage is close to 100%. CookieCutter.py and OSLayer.py are not supposed to be tested, however: they are just abstractions for getting values from CookieCutter and for interacting with the OS, so that they can be mocked in the testing code.
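
To illustrate the idea, here is a hedged sketch of such an abstraction layer and of how a test can mock it. It is not the actual OSLayer.py: the method `run_process` and the caller `query_status` are hypothetical, and the `bjobs` options shown are only an assumed way of asking for a single job's status.

```python
import os
import subprocess
from typing import List, Tuple
from unittest.mock import patch


class OSLayer:
    """Thin wrapper around OS calls so that tests can replace them."""

    @staticmethod
    def remove_file(path: str) -> None:
        # Removing a file that does not exist is treated as a no-op.
        if os.path.exists(path):
            os.remove(path)

    @staticmethod
    def run_process(cmd: List[str]) -> Tuple[str, str]:
        # universal_newlines=True keeps this compatible with Python 3.6.
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return done.stdout, done.stderr


def query_status(jobid: str) -> str:
    """Hypothetical caller that asks bjobs for the status of one job."""
    stdout, _ = OSLayer.run_process(["bjobs", "-o", "stat", "-noheader", jobid])
    return stdout.strip()


# In a test no LSF cluster is needed: the OS call is patched out.
with patch.object(OSLayer, "run_process", return_value=("DONE\n", "")) as fake:
    assert query_status("123") == "DONE"
    fake.assert_called_once_with(["bjobs", "-o", "stat", "-noheader", "123"])
```

Keeping every OS interaction behind one class means the logic that interprets job statuses can be tested without touching the real system, which is why the layer itself is left out of the coverage target.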

Still searching for a way to do this without degrading the submission rate.

mbhall88

Amazing work, @leoisl. The test infrastructure is very welcome and needed! This will close #1.

cookiecutter.json

README.md

{{cookiecutter.profile_name}}/OSLayer.py

{{cookiecutter.profile_name}}/config.yaml

{{cookiecutter.profile_name}}/lsf_submit.py

leoisl · 2020-03-15T15:23:19Z

I have addressed the simplest requested changes for now and will continue later.

requirements.txt

leoisl · 2020-03-19T17:10:38Z

I think now we have two pending items. I will now run some pipelines with the current version of the code. I wonder if we should merge into a branch while we make sure I did not introduce any bugs, and you can also work on this branch to add rule-specific cluster configs.

{{cookiecutter.profile_name}}/lsf_submit.py

leoisl · 2020-03-20T12:15:13Z

I think we are good to merge! Will keep using this profile for the next pipelines to spot any issue.

mbhall88 · 2020-03-20T15:57:01Z

> I think we are good to merge! Will keep using this profile for the next pipelines to spot any issue.

Not quite. There is still a conflict. I think this is because this branch was created before a change was made to lsf-status.py, and that file is now deleted in this PR. We can't solve this on GitHub; it has to be done locally.

leoisl · 2020-03-20T16:00:50Z

I actually tried to solve the conflict locally, but was unable to. What I tried:

- Cherry-picking the missing commit: leoisl@12d0bde
- Manually adding lsf-submit.py back: leoisl@0fc5bee
- Removing lsf-submit.py: leoisl@f3b3cf1

Will retry...

leoisl · 2020-03-20T16:19:32Z

So one way to be able to merge is to add the old lsf-status and lsf-submit files. We can merge and then delete them.

leandro added 16 commits March 1, 2020 00:31

Parsing the cluster log file to get the job status instead of using bjobs stat

58d2a3b

WIP: status checking using bjobs and log files

7128e14

Adding tests for lsf_status

8d2b302

Fixing bugs and improving tests

5088f75

Finishing testing of LSF_Status_Checker

e6bd735

Adding tests for lsf_submit

6b5de55

making test_lsf_submit runnable on its own

12366ff

Updating some default values

ef213a4

Updating README

c141fd2

Fixing sys.path issues

b29058a

Fixing sys.path issues

cd58ef2

bugfixing on LSF_Submit._submit_cmd_and_get_external_job_id

4e42135

no need to store log files in cluster_checkpoints

b741495

Making it work for python v3.6

bd5f69c

Merge branch 'master' into trying_the_merge_again

0fc5bee

Deleting old lsf-submit to solve conflict

f3b3cf1

mbhall88 self-requested a review March 15, 2020 10:22

mbhall88 self-assigned this Mar 15, 2020

mbhall88 requested changes Mar 15, 2020

leandro added 6 commits March 15, 2020 13:53

Returning cookiecutter configs to default

23e8170

Reverting README.md

0687031

Fixing bug in OSLayer.remove_file()

5155b82

Adding type annotations

4eb7fe3

Removing unnecessary elses

8a23bf9

Adding more verbosity to errors

cb1bce2

leandro added 2 commits March 16, 2020 15:03

Updating tests

79fbd90

Adding a venv

ead1c00

mbhall88 reviewed Mar 16, 2020

requirements.txt

Refactoring run_process_and_get_output_and_error_stream signature

31202c4

leandro added 3 commits March 19, 2020 13:07

Refactoring global variables in tests

8a259bb

Refactoring _get_information_to_status_script

5340276

Improving error handling and reporting

44b0ab5

mbhall88 requested changes Mar 20, 2020

{{cookiecutter.profile_name}}/lsf_submit.py

mbhall88 changed the base branch from master to development March 20, 2020 10:50

Using uuid.uuid4() to get a random unique string instead of building own

aedfa2b

mbhall88 approved these changes Mar 20, 2020

leandro added 3 commits March 20, 2020 16:16

Adding old lsf-status/submit files

e32c6f5

Removing old lsf-status/submit files

4691d44

Adding old lsf-status/submit files

c091c52

mbhall88 approved these changes Mar 20, 2020

mbhall88 merged commit 9150a8f into Snakemake-Profiles:development Mar 20, 2020

mbhall88 mentioned this pull request Apr 2, 2020

Setup CI tests #1

Closed

mbhall88 mentioned this pull request Apr 9, 2020

Robust status handling and per-rule config #16

Merged
