Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot call worker_client() from an asynchronous function #5513

Closed
orf opened this issue Nov 10, 2021 · 3 comments · Fixed by #7844 · May be fixed by #6059
Closed

Cannot call worker_client() from an asynchronous function #5513

orf opened this issue Nov 10, 2021 · 3 comments · Fixed by #7844 · May be fixed by #6059

Comments

@orf
Copy link
Contributor

orf commented Nov 10, 2021

What happened:

Calling worker_client() from an async function throws an exception

What you expected to happen:

It should not throw an exception

Minimal Complete Verifiable Example:

from dask.distributed import worker_client

async def entrypoint():
    with worker_client() as client:
        return None


if __name__ == '__main__':
    from distributed import Client
    with Client() as client:
        client.submit(entrypoint).result()

This errors with:

distributed.worker - WARNING - Compute Failed
Function:  entrypoint
args:      ()
kwargs:    {}
Exception: 'AttributeError("\'_thread._local\' object has no attribute \'start_time\'")'

Traceback (most recent call last):
  File "foo/deadlock.py", line 48, in <module>
    client.submit(entrypoint).result()
  File "barpython3.9/site-packages/distributed/client.py", line 233, in result
    raise exc.with_traceback(tb)
  File "foo/deadlock.py", line 41, in entrypoint
    with worker_client() as client:
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "bar/python3.9/site-packages/distributed/worker_client.py", line 55, in worker_client
    duration = time() - thread_state.start_time
AttributeError: '_thread._local' object has no attribute 'start_time'

Anything else we need to know?:

Most likely related to #5485

Environment:

  • Dask version: Latest, tested on previous versions
  • Python version: 3.9
  • Operating System: MacOS
  • Install method (conda, pip, source): Pip
@jrbourbeau
Copy link
Member

Thanks for reporting @orf, I'm able to reproduce. cc @gjoseph92 for visibility

@orf
Copy link
Contributor Author

orf commented Nov 10, 2021

I'm actually quite confused about how asynchronous functions are scheduled. We needed to use worker_client to launch tasks from other tasks, but I cannot seem to deadlock Dask when running async tasks even without worker_client when the same synchronous version of the task will quickly be deadlocked.

Does this mean that workers run more than 1 async task concurrently, even if it's configured with "threads=1"? If so is there an upper bound of async tasks that can be run?

If so I guess worker_client isn't needed as seceding isn't needed.

@jcrist
Copy link
Member

jcrist commented Nov 30, 2021

Does this mean that workers run more than 1 async task concurrently, even if it's configured with "threads=1"? If so is there an upper bound of async tasks that can be run?

Yes, there is currently no bound to the number of concurrent async tasks (async tasks are assumed to be lightweight). We could add a configurable limit here if there's a good use case for one. But yeah, launching tasks from an async task doesn't run the same risk of a deadlock that launching from a synchronous task does (since the async task doesn't occupy a thread).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants