slurm plugin: always raise for non-zero exit code #4332

Merged
merged 3 commits into from
Aug 31, 2020

Conversation

ltalirz
Member

@ltalirz ltalirz commented Aug 27, 2020

potentially fixes #4326

Prior to this change, cases where squeue would return a non-zero exit
code but an empty stderr would not lead to a SchedulerError.
This is fixed by adapting the slurm joblist command such that it is
always expected to produce exit code zero, and raising whenever a
non-zero exit code is encountered.
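
The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual plugin code: `SchedulerError` is redefined here as a local stand-in, and `check_joblist_result` is a hypothetical helper name.

```python
class SchedulerError(Exception):
    """Local stand-in for the scheduler error type named in the description."""


def check_joblist_result(retval: int, stdout: str, stderr: str) -> str:
    """Return stdout if the joblist command succeeded, raise otherwise.

    Hypothetical helper: any non-zero exit code is treated as a failure,
    regardless of whether stderr is empty.
    """
    if retval != 0:
        raise SchedulerError(
            f"squeue returned exit code {retval} "
            f"(stdout='{stdout.strip()}', stderr='{stderr.strip()}')"
        )
    return stdout
```

The point of the sketch is that the decision depends only on the exit code, so a failing `squeue` with empty stderr can no longer be mistaken for an empty job list.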

@ltalirz ltalirz force-pushed the issue_4326_slurm_squeue_check branch 2 times, most recently from 1fc7951 to ea2e5a1 Compare August 27, 2020 18:44

codecov bot commented Aug 27, 2020

Codecov Report

Merging #4332 into develop will increase coverage by 0.05%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop    #4332      +/-   ##
===========================================
+ Coverage    79.10%   79.14%   +0.05%     
===========================================
  Files          468      468              
  Lines        34614    34616       +2     
===========================================
+ Hits         27378    27395      +17     
+ Misses        7236     7221      -15     
Flag Coverage Δ
#django 72.77% <100.00%> (+0.05%) ⬆️
#sqlalchemy 71.95% <100.00%> (+0.05%) ⬆️


Impacted Files Coverage Δ
aiida/schedulers/plugins/slurm.py 58.94% <100.00%> (+6.45%) ⬆️
aiida/transports/plugins/local.py 81.29% <0.00%> (-0.25%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0345e61...dee4a57.

Member

@giovannipizzi giovannipizzi left a comment


Some textual changes suggested

aiida/schedulers/plugins/slurm.py: 4 resolved review threads (3 outdated)
@giovannipizzi
Member

Thanks!
For me it could be merged, but as pointed out by codecov, the new lines are not tested. Would you be willing to add one (or two?) new tests to https://github.com/aiidateam/aiida-core/blob/develop/tests/schedulers/test_slurm.py to cover the new code?

@ltalirz ltalirz force-pushed the issue_4326_slurm_squeue_check branch from 19b1f2d to 199100f Compare August 30, 2020 15:54
@ltalirz
Member Author

ltalirz commented Aug 30, 2020

@giovannipizzi fair point - added the test, thanks for pointing out where it should go (and indeed there was still a slight mistake)

Test both that non-zero exit codes and stderr are caught as well as the
behavior when being passed a single job id.
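
One possible way to make the joblist command "always expected to produce exit code zero" in the single-job case is sketched below. It rests on assumptions this thread does not confirm: that `squeue` exits non-zero when asked for a single job id that has already left the queue but not when given a list of ids, and that repeating the id is an acceptable workaround. `build_joblist_command` is a hypothetical name, not the plugin's API.

```python
from typing import List, Optional, Sequence, Union


def build_joblist_command(jobs: Optional[Union[str, Sequence[str]]] = None) -> List[str]:
    """Build a squeue invocation as an argument list (illustrative only)."""
    command = ["squeue", "--noheader"]
    if jobs:
        # Accept either a single job id string or a sequence of ids.
        joblist = [jobs] if isinstance(jobs, str) else list(jobs)
        if len(joblist) == 1:
            # Assumption: duplicating a lone id turns the query into a list
            # query, so a finished job is not reported via the exit code.
            joblist.append(joblist[0])
        command.append("--jobs=" + ",".join(joblist))
    return command
```

With such a command, a test can pass a single job id and check the resulting `--jobs=` argument, and separately check that any non-zero exit code raises.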
@ltalirz ltalirz force-pushed the issue_4326_slurm_squeue_check branch from 199100f to dee4a57 Compare August 30, 2020 16:13
Member

@giovannipizzi giovannipizzi left a comment


Thanks!

@sphuber sphuber merged commit 44fe2a7 into aiidateam:develop Aug 31, 2020
@sphuber sphuber deleted the issue_4326_slurm_squeue_check branch August 31, 2020 08:06
Development

Successfully merging this pull request may close these issues.

AiiDA scheduler plugin thinks job is finished while it is still running
3 participants