Allow nesting sync_to_async via asyncio.wait_for #367

jthorniley · 2023-01-11T15:31:10Z

Change the order of fallbacks used by SyncToAsync to find the appropriate executor for sync code, so it prefers to use AsyncToSync.loop_thread_executors rather than thread_sensitive_context. Add test case to demonstrate problem.

Resolves #348

andrewgodwin

These kind of changes are incredibly hard to fully review, but it seems reasonable and also passes all the current tests (which are pretty good at catching bad behaviour!).

carltongibson · 2023-01-12T10:36:44Z

Hi @jthorniley

Resolves #348

I'm still looking at this. There are two issues it looks like: even with the patch here, @brownan's example from #348 still hits a deadlock.

carltongibson · 2023-01-12T11:06:40Z

The test-case doesn't quite match the behaviour of the example running under Daphne.

In the test the executors selected in SyncToAsync.call are:

<concurrent.futures.thread.ThreadPoolExecutor object at 0x1064a4730> # Wrapping middleware — this is _max_workers = 1
<asgiref.current_thread_executor.CurrentThreadExecutor object at 0x1064a0850> # in async view

Under Daphne we get the same concurrent.futures.thread.ThreadPoolExecutor selected twice (which causes the timeout/deadlock, depending on which option is selected.)

jthorniley · 2023-03-20T17:58:04Z

Hi @carltongibson thanks for your comments, sorry I haven't had time to look at this for a while.

I've tried an alternative approach which I think makes more sense. Instead of changing how we look up the executor, set the reference to the current thread executor using the contextvars lib - since Python 3.7 is the minimum supported version this seems like its probably an option when it wasn't before.

The asgiref local has to be maintained across sync_to_async/async_to_sync conversions using the launch_map lookups, but these lookup are based on asyncio Task ID, which changes when you use asyncio.gather and friends, whereas contextvars is generally "just works" in asyncio code. Also, we already make the effort to transfer context to the new thread in AsyncToSync so there isn't really much extra logic to take care of.

I've tried this implementation with the example ASGI code in issue #348 - and it seems to work (GET request completes and the page renders "Hello world"). I've tried daphne and uvicorn and they both seem ok with this.

One problem is that one of the deadlock unit tests now fails, I think because there isn't a deadlock anymore! (It just actually works). I think its probably safe to just remove the test?

asgiref/sync.py

jthorniley · 2023-03-20T18:25:20Z

I'd comment that also, while this "fixes" the current thread executor issue, it also "breaks" asgiref.Local, which will not be maintained across a stack of sync_to_async/async_to_sync if theres a asyncio.wait_for etc somewhere in the middle. I'm not sure if thats really fixable except by either reimplementing Local with contextvars, or just deprecating it in favour of contextvars (Django seems to use it in a few fairly disparate places)

andrewgodwin · 2023-03-20T21:51:37Z

The intention was always to implement Local via contextvars once we hit the minimum version to support it, so I wouldn't be against doing that.

jthorniley · 2023-03-21T09:33:36Z

@andrewgodwin

The intention was always to implement Local via contextvars once we hit the minimum version to support it, so I wouldn't be against doing that.

That makes sense, contextvars is python 3.7 which seems to be now the min supported version, I did try to actually do that first and I have an implementation that seems to work, but I wasnt too happy with the way it handles deleting keys, see here:

https://github.com/django/asgiref/compare/main...jthorniley:asgiref:local-contextvar-draft?expand=1

(Comments in the diff, I won't repeat but tldr I'm worried I've made a memory leak)

carltongibson · 2023-03-21T11:10:35Z

Hi @jthorniley — thanks for swinging back to this.

For life reasons, it's not likely I'll have capacity to look at this closely soon, so let me just comment quickly not to block:

One problem is that one of the deadlock unit tests now fails, I think because there isn't a deadlock anymore! (It just actually works). I think it's probably safe to just remove the test?

Sounds right 😅

The intention was always to implement Local via contextvars once we hit the minimum version to support it, so I wouldn't be against doing that.

That makes sense, contextvars is python 3.7 which seems to be now the min supported version...

This would be great IMO! I briefly glanced at this back last year, but didn't get to progress it. If you could push it through, super! 🎁

There's even a comment about this:

asgiref/asgiref/local.py

Lines 29 to 30 in 0357158

    
               This doesn't use contextvars as it needs to support 3.6. Once it can support 
        
               3.7 only, we can then reimplement the storage more nicely.

jthorniley · 2023-03-22T17:18:23Z

Thanks both for the comments. I've updated the PR to use contextvars as an implementation for Local, and reverted the logic that keeps track of the CurrentThreadExecutor to use that new Local implementation rather than a plain contextvar, as I found in testing that plain contextvars are not thread safe when you are passing them between threads as is the case here.

Anyway, I think this implementation is fairly solid from what testing I've managed to do.

andrewgodwin

You have no idea how delighted I am that this is net-deletion of code! From what I can see it looks good, and honestly I mostly place my faith in the tests for this sort of thing anyway.

If you think this is ready to merge, I will happily land it and we can get around to a new minor release of asgiref.

andrewgodwin · 2023-03-25T17:23:58Z

Talking of tests... those and the lint failures will need fixing first!

jthorniley · 2023-03-26T07:22:33Z

The test failure was a good catch - the state was not cleaned up properly so it only occurred on the full test run rather than running that test individually.

Lint issue also fixed.

Whether this is ready to merge... as you say its hard to assess this sort of thing beyond the tests. I'm tempted to wait until I've rolled out this branch at work on Monday just to give some extra confidence from a "real-world" test (I can't see what would go wrong, and its passed all our CI). I will update then.

andrewgodwin · 2023-03-26T18:03:06Z

Alright, sounds like a good plan!

andrewgodwin · 2023-03-31T19:39:25Z

@jthorniley Did you manage to get any testing of this in this week?

jthorniley · 2023-04-01T12:14:14Z

@andrewgodwin yes and no I'm afraid... We rolled it out but had to roll back due to (unrelated to this change) infrastructure issues and haven't managed to redo this. So yes, this still looks fine still as far as I can see, but didn't get a proper test really.

andrewgodwin · 2023-04-01T16:40:18Z

Ah, no problem - let's wait a bit longer to see if you can give it some more burn-in time. Hopefully the other issues are easy to fix!

jthorniley · 2023-04-05T16:10:57Z

Ok, we've tried rolling out again and actually I think there are some issues with this branch so unfortunately it wouldn't be good to merge now.

Using this change (with a Django/ASGI app) we end up with too many database connections getting created, I think because they are held in a local by Django and the behaviour has changed in some unfortunate way (hard to track down exactly).

Additionally (trying to track down something more concrete) I found that running the current Django tests with this asgiref branch results in a failure:

./runtests.py asgi
Testing against Django installed in '/Users/james/repos/django/.venv/lib/python3.10/site-packages/django' with up to 8 processes
Found 16 test(s).
System check identified no issues (0 silenced).
..F.............
======================================================================
FAIL: test_concurrent_async_uses_multiple_thread_pools (asgi.tests.ASGITest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/james/repos/asgiref/asgiref/sync.py", line 235, in __call__
    return call_result.result()
  File "/Users/james/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/james/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/james/repos/asgiref/asgiref/sync.py", line 297, in main_wrap
    result = await self.awaitable(*args, **kwargs)
  File "/Users/james/repos/django/tests/asgi/tests.py", line 339, in test_concurrent_async_uses_multiple_thread_pools
    self.assertEqual(response_start["status"], 200)
AssertionError: 500 != 200

----------------------------------------------------------------------
Ran 16 tests in 6.927s

FAILED (failures=1)

I still think this approach (changing to contextvars for the implementation of Local) makes sense, but these issues need to be resolved.

andrewgodwin · 2023-04-05T23:24:30Z

Hmm, well that's annoying, but not entirely unexpected. I can only presume that the behaviour around synchronous contexts is different enough to make it break - contextvars inherit in a different way to what Local currently does.

jthorniley · 2023-09-13T17:14:20Z

I think this is now a viable solution.

I'll just note that there's a separate issue #399 which is also directly affected by this (i.e. this essentially does that).

So this went stale as I got a bit stuck but I've had time to work on it, so I've rebased and addressed a few issues:

Local(thread_critical=True)

The behaviour of Local(thread_critical=True) has a critical subtlety which I've added a test for in test_local test_thread_critical_local_not_context_dependent_in_sync_thread()

Essentially, when Local(thread_critical=True) behaves more or less identically to threading.local() i.e. "global within the current thread", rather than simulating the contextvar behaviour of being local to a specific call stack. This is important because django uses a Local(thread_critical=True) to store the database connection reference at the point at which it is created.

In an Django ASGIHandler server, there will be one sync thread per request, and it will get a DB connection initialised local to that thread (typically) somewhere inside the async view code, wrapped in a sync_to_async. However, the connection is only closed using a callback attached to the request_finished signal. As a result, its important that the callbacks from the request_finished signal see the same Local() data as the inner sync_to_async code (provided its running in the same thread), otherwise DB connections never get closed.

Its easy to confirm that test_thread_critical_local_not_context_dependent_in_sync_thread() is just testing the existing behaviour of Local - i.e. it passes if copied onto current main branch.

ApplicationCommunicator

The change in behaviour of Local means that a Local is now preserved even when using asyncio.create_task- in the case of ApplicationCommunicator it was actually a benefit that it wasn't - this was the cause of the django test failure that I had above (now resolved).

When the django test code run an async def test_x function, it wraps it in async_to_sync and as a result any sync_to_async wrapped code inside that function will run in the original sync test thread.

However that means when using ApplicationCommunicator any sync_to_async wrapped code inside the ASGI handler will end up running in the single top-level test thread, which is really not what we want - ApplicationCommunicator should simulate the behaviour if you'd started the ASGI server straight from a "normal" async thread - i.e. the first sync_to_async inside an ASGI request should create its own Thread to run in.

The solution was to create a new contextvars.Context() inside ApplicationCommunicator so that the ASGI server doesn't try to run code in the top-level test thread.

andrewgodwin · 2023-09-14T22:28:36Z

I really appreciate your work on this!

Looks like the tests are failing in CI, at least - different environment issue to you locally, or something else, you think?

jthorniley · 2023-09-15T08:51:03Z

@andrewgodwin thanks for running it - yes just taking a quick look:

I think CI is running mypy which I didn't run locally, but I'm sure I can tidy that up.

Additionally used py3.11 locally - and the tests do work there in CI, i'm guessing some API changes between 3.10 - 3.11 that I didn't account for, I'll investigate that today.

andrewgodwin · 2023-09-18T18:53:14Z

Thanks - looks great now. I'll merge it in, and hopefully when we make a release with this in we won't get any immediate screaming! (The test suite looks good and pretty decently covers this, but there's always the chance of uncaught side effects)

jthorniley · 2023-09-19T08:20:01Z

@andrewgodwin thanks!

jthorniley mentioned this pull request Jan 11, 2023

sync_to_async calls hang / deadlock when called in a task #348

Closed

andrewgodwin approved these changes Jan 11, 2023

View reviewed changes

jthorniley force-pushed the current-thread-executor-test branch from 1650d42 to a4a2216 Compare March 20, 2023 17:42

jthorniley commented Mar 20, 2023

View reviewed changes

asgiref/sync.py Outdated Show resolved Hide resolved

carltongibson requested a review from andrewgodwin March 22, 2023 19:09

andrewgodwin approved these changes Mar 25, 2023

View reviewed changes

jthorniley added 4 commits September 13, 2023 16:50

Rewrite asgiref.local to use contextvars

73d8d91

Update docstring

c9122a5

Revert exception message

f9ede33

add a comment

b71b833

jthorniley force-pushed the current-thread-executor-test branch from ff83f49 to b71b833 Compare September 13, 2023 16:51

jthorniley added 2 commits September 15, 2023 09:54

mypy

1187cab

Use contextvars API compatible down to python 3.7

383dff3

andrewgodwin approved these changes Sep 18, 2023

View reviewed changes

andrewgodwin merged commit d920c3c into django:main Sep 18, 2023
6 checks passed

andersk mentioned this pull request Mar 21, 2024

Upgrade Python requirements zulip/zulip#29384

Merged

legau mentioned this pull request Mar 21, 2024

Revert removing of SyncToAsync.get_current_task #454

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow nesting sync_to_async via asyncio.wait_for #367

Allow nesting sync_to_async via asyncio.wait_for #367

jthorniley commented Jan 11, 2023

andrewgodwin left a comment

carltongibson commented Jan 12, 2023 •

edited

Loading

carltongibson commented Jan 12, 2023 •

edited

Loading

jthorniley commented Mar 20, 2023

jthorniley commented Mar 20, 2023

andrewgodwin commented Mar 20, 2023

jthorniley commented Mar 21, 2023

carltongibson commented Mar 21, 2023

jthorniley commented Mar 22, 2023

andrewgodwin left a comment

andrewgodwin commented Mar 25, 2023

jthorniley commented Mar 26, 2023

andrewgodwin commented Mar 26, 2023

andrewgodwin commented Mar 31, 2023

jthorniley commented Apr 1, 2023

andrewgodwin commented Apr 1, 2023

jthorniley commented Apr 5, 2023

andrewgodwin commented Apr 5, 2023

jthorniley commented Sep 13, 2023

andrewgodwin commented Sep 14, 2023

jthorniley commented Sep 15, 2023

andrewgodwin commented Sep 18, 2023

jthorniley commented Sep 19, 2023

Allow nesting sync_to_async via asyncio.wait_for #367

Allow nesting sync_to_async via asyncio.wait_for #367

Conversation

jthorniley commented Jan 11, 2023

andrewgodwin left a comment

Choose a reason for hiding this comment

carltongibson commented Jan 12, 2023 • edited Loading

carltongibson commented Jan 12, 2023 • edited Loading

jthorniley commented Mar 20, 2023

jthorniley commented Mar 20, 2023

andrewgodwin commented Mar 20, 2023

jthorniley commented Mar 21, 2023

carltongibson commented Mar 21, 2023

jthorniley commented Mar 22, 2023

andrewgodwin left a comment

Choose a reason for hiding this comment

andrewgodwin commented Mar 25, 2023

jthorniley commented Mar 26, 2023

andrewgodwin commented Mar 26, 2023

andrewgodwin commented Mar 31, 2023

jthorniley commented Apr 1, 2023

andrewgodwin commented Apr 1, 2023

jthorniley commented Apr 5, 2023

andrewgodwin commented Apr 5, 2023

jthorniley commented Sep 13, 2023

Local(thread_critical=True)

ApplicationCommunicator

andrewgodwin commented Sep 14, 2023

jthorniley commented Sep 15, 2023

andrewgodwin commented Sep 18, 2023

jthorniley commented Sep 19, 2023

carltongibson commented Jan 12, 2023 •

edited

Loading

carltongibson commented Jan 12, 2023 •

edited

Loading