Don't block event loop in `Worker.close` waiting for executor to shut down #6239

gjoseph92 · 2022-04-28T20:03:01Z

distributed/worker.py, lines 1517 to 1521 in 1cbee7f

    if isinstance(executor, ThreadPoolExecutor):
        executor._work_queue.queue.clear()
        executor.shutdown(wait=executor_wait, timeout=timeout)
    else:
        executor.shutdown(wait=executor_wait)

`shutdown` is a blocking call that generally calls `join` on the thread/process. If the executor takes a long time to shut down (say there's a function running that's itself blocked on something), this can block the worker event loop for 30s, or whatever the timeout is.
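
To make the stall concrete, here is a minimal standard-library sketch. It does not use `distributed`: the helper `max_loop_stall` and the `time.sleep` stand-in task are hypothetical names invented for illustration. A heartbeat coroutine ticks every 10 ms, and the longest gap between ticks shows how long the loop was frozen while `shutdown(wait=True)` ran directly on it.

```python
import asyncio
import contextlib
import time
from concurrent.futures import ThreadPoolExecutor


async def max_loop_stall(task_seconds: float = 0.5) -> float:
    """Return the longest gap (seconds) between heartbeat ticks while a
    busy executor is shut down directly on the event loop.

    Hypothetical helper for illustration; not part of distributed.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(time.sleep, task_seconds)  # stand-in for a long-running task

    gaps = []

    async def heartbeat():
        last = time.perf_counter()
        while True:
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times

    # Blocking call: joins the worker thread, so nothing else on the
    # loop (including the heartbeat) runs until the task has finished.
    executor.shutdown(wait=True)

    await asyncio.sleep(0.05)  # let the heartbeat record the gap
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return max(gaps)


if __name__ == "__main__":
    print(f"longest heartbeat gap: {asyncio.run(max_loop_stall()):.2f}s")
```

With the default half-second task, the longest gap should be close to the remaining run time of the task rather than the 10 ms tick interval.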

This is particularly important for our test suite, because the worker event loop is actually the only event loop.

I discovered this while trying to run a test that looked something like this:

    event = distributed.Event()
    f = client.submit(event.wait, workers=[a.address])
    t = asyncio.create_task(a.close())
    await asyncio.sleep(1)
    await event.set()  # this hangs!
    # because the whole event loop is blocked waiting for the ThreadPoolExecutor to shut down,
    # which is waiting for the event to be set... which is waiting for the event loop to be free
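
The same circular wait can be sketched with the standard library alone. This is an analogue under stated assumptions, not the `distributed` code path: `close_while_task_waits` is a hypothetical helper, `threading.Event` stands in for `distributed.Event`, and the 1-second timeout on `event.wait` is a safety net so the blocking variant cannot hang forever. Moving the `shutdown` call off the loop with `asyncio.to_thread` (Python 3.9+) is one possible way to break the cycle; it is not claimed to be how #6091 implements the fix.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


async def close_while_task_waits(offload: bool) -> bool:
    """Return True if the running task saw the event being set,
    False if its wait timed out first.

    Hypothetical helper for illustration; not part of distributed.
    """
    event = threading.Event()
    executor = ThreadPoolExecutor(max_workers=1)
    # The timeout only exists so the blocking variant terminates.
    fut = executor.submit(event.wait, 1.0)

    async def close():
        if offload:
            # Join the worker thread from another thread; the loop stays free.
            await asyncio.to_thread(executor.shutdown, wait=True)
        else:
            # Join the worker thread on the loop; the loop is frozen.
            executor.shutdown(wait=True)

    closing = asyncio.create_task(close())
    await asyncio.sleep(0.1)  # blocking variant: the loop is stuck during this sleep
    event.set()               # blocking variant: only reached after the timeout
    await closing
    return fut.result()


if __name__ == "__main__":
    print("blocking shutdown, task saw event:", asyncio.run(close_while_task_waits(False)))
    print("offloaded shutdown, task saw event:", asyncio.run(close_while_task_waits(True)))
```

In the blocking variant the task's wait times out because `event.set()` cannot run until `shutdown` returns. In the offloaded variant the loop remains responsive, the event is set after 0.1 s, and the task finishes normally.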

This probably won't affect real-life clusters that much, but would be good to clean up.

cc @graingert

mrocklin · 2022-04-28T20:10:06Z

See also #6091 maybe

gjoseph92 · 2022-04-28T20:13:30Z

@mrocklin thanks, #6091 would fix this.

…6091) This reinstates #5883, which was reverted in #5961 / #5932.

I could confirm the flakiness of `test_missing_data_errant_worker` after this change and am reasonably certain it is caused by #5910, which causes a closing worker to be restarted such that, even after `Worker.close` is done, the worker still appears to be partially up. The only reason I can see why this change promotes this behaviour is that, if we no longer block the event loop while the threadpool is closing, there is a much larger window for incoming requests to arrive and be processed while close is running.

Closes #6239

gjoseph92 added the bug (Something is broken) and tests (Unit tests and/or continuous integration) labels Apr 28, 2022

gjoseph92 mentioned this issue Apr 28, 2022

Test retire workers deadlock #6240

Merged

2 tasks

fjetter mentioned this issue Apr 29, 2022

Unblock event loop while waiting for ThreadpoolExecutor to shut down #6091

Merged

mrocklin closed this as completed in #6091 Apr 29, 2022
