Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocpool: handle long STDOUT/ERR from commands #2876

Merged
merged 2 commits into from Nov 26, 2018

Conversation

matthewrmshin
Copy link
Contributor

Ensure that pipes with lots of output do not block the command.

Fully fix #2857.

@matthewrmshin matthewrmshin added this to the next release milestone Nov 22, 2018
@matthewrmshin matthewrmshin self-assigned this Nov 22, 2018
@cylc cylc deleted a comment Nov 22, 2018
@cylc cylc deleted a comment Nov 22, 2018
@kinow
Copy link
Member

kinow commented Nov 23, 2018

Code looks good, Travis happy too. +1 ! 🎉

digress mode: On

If os.read returned a context manager, or if there was a library for that, maybe that while block

while True:
    res = os.read(fileno, 65536)
    if not res:
        break
    data += res

could simply be

while os.read(fileno, 65536) as cm:
    data += cm.data

Which I think is simpler and goes more along with the Pythonic way... In Java I would write that in a single line, but that's not allowed in Python. Interestingly the only different way I found to work was with partial.

import os
from functools import partial

license = os.path.join(os.path.dirname(__file__), 'COPYING')
data = ''
with open(license) as l:
    fileno = l.fileno()
    for res in iter(partial(os.read, fileno, 1000), ''):
        data += res
print(data)

But looks like Python 3.8 might solve it with PEP-572, where we will have something like:

while res := os.read(fileno, 1000):
    data += res

digress mode: Off

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hjoliver
Copy link
Member

(So does this supersede #2870?)

@matthewrmshin
Copy link
Contributor Author

matthewrmshin commented Nov 23, 2018

This is the full fix. #2870 is still useful in a way - it breaks up the very long submit commands into smaller chunks - so more jobs can be submitted in parallel.

@matthewrmshin
Copy link
Contributor Author

Hi @kinow os.read is a low level function that wraps the POSIX read system call. I don't think it returns a Pythonic context manager. The system call is used here (i.e. os.read(handle.fileno, ...) instead of handle.read(...) to make sure that we don't get any funny buffering while we read the pipes. The code looks clunky but is actually safest.

(And ditto for Perl. I really miss the the while assignment statement when I switched from Perl to Python - it allows a person inspecting the code to quick know what is driving the exit condition of a while loop - code is much more readable and much safer.)

@matthewrmshin
Copy link
Contributor Author

(I'll push up a change with a few more lines of extra comments to explain this in the logic.)

@matthewrmshin
Copy link
Contributor Author

(Branch re-based. More comments added to explain the usage of os.read instead of file.read.)

@kinow
Copy link
Member

kinow commented Nov 23, 2018

Hi @kinow os.read is a low level function that wraps the POSIX read system call. I don't think it returns a Pythonic context manager.

Oh, I meant that it would be good if there was some sort of cm around that. I think open is a context manager for a low level API too (with the buffer detail that I didn't know! TIL! Thanks 😀 ).

@cylc cylc deleted a comment Nov 23, 2018
@matthewrmshin
Copy link
Contributor Author

(Should now be good to go!)

@kinow
Copy link
Member

kinow commented Nov 23, 2018

Still looking good to me, thanks for the extra docs! Will leave it up to you to merge or check with @hjoliver if you'd like. 👍

@oliver-sanders
Copy link
Member

oliver-sanders commented Nov 23, 2018

Quick thought on this one, the select.poll interface isn't quite universal so it might be good to maintain the current mostly-working approach for systems which don't support select.poll. I don't know whether it would be possible to just stick the .poll bit in a try/except block?

See #2591, #2689

@matthewrmshin
Copy link
Contributor Author

(Branch re-based again. Now with logic to bypass the select.poll code if it is not supported by the OS.)

@cylc cylc deleted a comment Nov 23, 2018
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving.

Maybe just comment in the new integration test, that it may fail on systems without select.poll()?

@hjoliver
Copy link
Member

(Patch coverage 81%; but the detail shows that is good enough I think 😁 )

Ensure that pipes with lots of output do not block the command.
If OS does not support `select.poll`.
@cylc cylc deleted a comment Nov 24, 2018
@matthewrmshin
Copy link
Contributor Author

(Rebased + hopefully the final tweak - now include logic to skip the new tests on OS without select.poll.)

@kinow
Copy link
Member

kinow commented Nov 26, 2018

Had had a look on the select changes over the weekend. Just had another look, everything looking good. 👍 keeping approval

@hjoliver
Copy link
Member

Ditto. 👍

@hjoliver hjoliver merged commit 818bde9 into cylc:master Nov 26, 2018
@matthewrmshin matthewrmshin deleted the handle-long-subprocpool-output branch November 26, 2018 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cylc jobs-submit gets stuck on submitting several hundred jobs
4 participants