Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[not for merge] tidying threads and file descriptors at shutdown #3397

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

benclifford
Copy link
Collaborator

this is a patch stack in my dev environment where i have been pushing on cleanup of threads and file descriptors at shutdown, with the eventual goal of having pytest fixtures assert that only the main thread is running and the same number of file descriptors are open at the start and the end. I've been merging work from this already in small pieces and will continue to merge small tidyups from it.

There is at least one large part which needs some thought and discussion, which is how to shut down the htex queue processing thread - the engine of almost everything in htex that doesn't happen immediately on user task submission.

@benclifford benclifford marked this pull request as draft April 25, 2024 07:28
@benclifford benclifford force-pushed the benc-stdstreams-bugs branch 3 times, most recently from 511fe68 to ef0f2bf Compare April 28, 2024 07:52
This functionality was introduced in #3260 and we should be encouraging
people to use it in order to clean up nicely, after PR #3165 took away
atexit handler cleanup.
Theres was defined meaning to the return bool either in the code
or in the human text:

- no code calls close to give an example of interpretation

- it's not success/fail because local channel returns False
to indicate that it didn't need to do anything, not that there
was a failure.

Other .close() style methods return None and raise an exception
if there is a problem. This PR pushes Channel.close() in this
direction.

A separate PR will actually invoke this .close() method.
.. and use that new parameter in stderr/out tests which don't make use of
the UDP radio.
… too (and be tested)

(see different patch)

it should be the responsibility of the DFK to close channels (at least in that other patch)
and so that's why this test, which avoids the DFK, needs to close channels itself

but looks like channel.close() doesn't close down threads enough?
I can see it now closes the transport - the transport thread is now reported as unconnected,
prior to calling close it was reporting itself as connected. but how to make the transport
thread go away entirely?

it looks like the thread goes away "eventually" - a 20s sleep for example, is long enough
- so how to do that with joins?
its architecturally a bit ambiguous who should be responsible for this... channels live inside providers inside executors and its arguable that providers shoudl do that, when closed by executors (which they are not).

however, the DFK contains a bunch of channel initialization, and so it is legitimate for it to also contain a bunch of channel cleanup

TESTING: nothing actually tests this - perhaps a mock channel that we check DFK setup and shutdown works on?

I have seent this test in CI on cleanup of:

parsl/tests/test_htex/test_connected_blocks.py

E               AssertionError: The provider model assumes a provider has channel(s)
1986

but that test works for me...
?
… too (and be tested)

(see different patch)

it should be the responsibility of the DFK to close channels (at least in that other patch)
and so that's why this test, which avoids the DFK, needs to close channels itself

but looks like channel.close() doesn't close down threads enough?
I can see it now closes the transport - the transport thread is now reported as unconnected,
prior to calling close it was reporting itself as connected. but how to make the transport
thread go away entirely?

it looks like the thread goes away "eventually" - a 20s sleep for example, is long enough
- so how to do that with joins?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant