Leaking thread in SpecCluster because LoopRunner is never closed #6069

Open
graingert opened this issue Apr 5, 2022 · 0 comments
Labels
bug Something is broken

Comments

@graingert
Member

The LocalCluster isn't calling self._loop_runner.stop() on close, so the "IO loop" thread started by its LoopRunner stays alive after the cluster is closed.
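
To illustrate the failure mode outside of distributed, here is a minimal sketch. TinyLoopRunner and TinyCluster are hypothetical stand-ins, not the real LoopRunner or LocalCluster; the only point is that the loop thread survives close() unless close() stops the runner.

```python
import asyncio
import threading


class TinyLoopRunner:
    """Hypothetical stand-in: runs an asyncio event loop in a daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = None

    def start(self):
        started = threading.Event()

        def run_loop():
            # Signal the caller once the loop is actually running.
            self.loop.call_soon(started.set)
            try:
                self.loop.run_forever()
            finally:
                self.loop.close()

        self._thread = threading.Thread(target=run_loop, name="IO loop", daemon=True)
        self._thread.start()
        started.wait()

    def stop(self, timeout=10):
        if self._thread is None:
            return
        # loop.stop() must be scheduled from outside the loop thread.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout)
        self._thread = None


class TinyCluster:
    """Hypothetical owner of the runner."""

    def __init__(self):
        self._loop_runner = TinyLoopRunner()
        self._loop_runner.start()

    def close(self, stop_runner=True):
        # Other teardown would go here. Skipping the stop() call below
        # is the leak described in this issue.
        if stop_runner:
            self._loop_runner.stop()


def io_loop_threads():
    """Live threads named "IO loop"."""
    return [t for t in threading.enumerate() if t.name == "IO loop"]
```

Calling close(stop_runner=False) leaves one extra "IO loop" thread in threading.enumerate(); the default close() does not. The real fix may need to respect whether the loop is shared with other objects, which this sketch ignores.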

To find out which tests leak, I added the current test name to the IO loop thread name:

--- a/distributed/utils.py
+++ b/distributed/utils.py
@@ -459,7 +459,10 @@ class LoopRunner:
             finally:
                 done_evt.set()
 
-        thread = threading.Thread(target=run_loop, name="IO loop")
+        thread = threading.Thread(
+            target=run_loop,
+            name=f"IO loop for {os.environ.get('PYTEST_CURRENT_TEST')}",
+        )
         thread.daemon = True
         thread.start()
 

which lets me see which tests the surviving IO loops were started from:

[<_MainThread(MainThread, started 140507058100032)>,
 <Thread(Dask-Offload_0, started 140505936979712)>,
 <Thread(TCP-Executor-44063-0, started daemon 140505708054272)>,
 <Thread(TCP-Executor-44063-1, started daemon 140505493010176)>,
 <Thread(Profile, started daemon 140505716446976)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::LocalTest::test_context_manager (call), started daemon 140502607329024)>,
 <Thread(Profile, started daemon 140504410875648)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::LocalTest::test_cores (call), started daemon 140505006462720)>,
 <Thread(Profile, started daemon 140505014855424)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::LocalTest::test_no_workers (call), started daemon 140505023248128)>,
 <Thread(Profile, started daemon 140504998070016)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::LocalTest::test_submit (call), started daemon 140504989677312)>,
 <Thread(Profile, started daemon 140504972891904)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::test_cleanup (call), started daemon 140504419268352)>,
 <Thread(Profile, started daemon 140503144199936)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::test_cleanup (call), started daemon 140504981284608)>,
 <Thread(Profile, started daemon 140504385697536)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::test_dont_select_closed_worker (call), started daemon 140504368912128)>,
 <Thread(Profile, started daemon 140504402482944)>,
 <Thread(IO loop for distributed/deploy/tests/test_local.py::test_dont_select_closed_worker (call), started daemon 140503848826624)>,
 <Thread(Profile, started daemon 140504394090240)>,
 <paramiko.Transport at 0xf8785ab0 (cipher aes128-ctr, 128 bits) (active; 0 open channel(s))>,
 <paramiko.Transport at 0xdc71c070 (cipher aes128-ctr, 128 bits) (active; 0 open channel(s))>,
 <paramiko.Transport at 0xf879f430 (cipher aes128-ctr, 128 bits) (active; 0 open channel(s))>,
 <Timer(pytest_timeout distributed/tests/test_spill.py::test_weakref_cache[60-SupportsWeakRef-True], started 140505734280960)>]
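
The list above looks like the output of threading.enumerate(). Assuming that, a small helper can reduce it to just the leaked IO loops and the tests that started them. This is a sketch: leaked_io_loops and start_tagged_thread are hypothetical names, and the "IO loop for " prefix matches the patch above rather than anything in distributed itself.

```python
import os
import threading

PREFIX = "IO loop for "


def leaked_io_loops(prefix=PREFIX):
    """Return the test labels of live threads whose name starts with prefix."""
    return sorted(
        t.name[len(prefix):]
        for t in threading.enumerate()
        if t.name.startswith(prefix)
    )


def start_tagged_thread(stop_event, label=None):
    """Start a daemon thread tagged the same way as in the patch above.

    pytest sets PYTEST_CURRENT_TEST to "<nodeid> (<stage>)" while a test
    runs; outside pytest the variable is unset and the label is "None".
    """
    if label is None:
        label = str(os.environ.get("PYTEST_CURRENT_TEST"))
    thread = threading.Thread(
        target=stop_event.wait,  # stands in for the real loop body
        name=f"{PREFIX}{label}",
        daemon=True,
    )
    thread.start()
    return thread
```

Calling leaked_io_loops() at the end of a test session (for example from a session-scoped fixture) would list only the offending tests, without the Profile, executor, and paramiko threads.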

Originally posted by @graingert in #6033 (comment)

@fjetter changed the title from "the LocalCluster isn't calling self._loop_runner.stop() on close" to "Leaking thread in SpecCluster because LoopRunner is never closed" on Apr 20, 2022
@fjetter added the bug (Something is broken) label on Apr 20, 2022