Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using asyncio.shield hangs/deadlocks when used with sync middleware. #434

Closed
ttys0dev opened this issue Jan 14, 2024 · 10 comments
Closed

Using asyncio.shield hangs/deadlocks when used with sync middleware. #434

ttys0dev opened this issue Jan 14, 2024 · 10 comments

Comments

@ttys0dev
Copy link
Contributor

There is a deadlock bug which is demonstrated by this recently added test(added in #432) which is currently disabled. There was another recently fixed deadlock issue in #348 which I'm thinking may possibly be related/similar although I'm not entirely sure.

@pytest.mark.asyncio
@pytest.mark.skip(reason="deadlocks")
async def test_inner_shield_sync_middleware():
    """
    Tests that asyncio.shield is capable of preventing http.disconnect from
    cancelling a django request task when using sync middleware.

    Currently this tests is skipped as it causes a deadlock.
    """

    # Hypothetical Django scenario - middleware function is sync
    def sync_middleware():
        async_to_sync(async_view)()

    task_complete = False
    task_cancel_caught = False

    # Future that completes when subtask cancellation attempt is caught
    task_blocker = asyncio.Future()

    async def async_view():
        """Async view with a task that is shielded from cancellation."""
        nonlocal task_complete, task_cancel_caught, task_blocker
        task = asyncio.create_task(async_task())
        try:
            await asyncio.shield(task)
        except asyncio.CancelledError:
            task_cancel_caught = True
            task_blocker.set_result(True)
            await task
            task_complete = True

    task_executed = False

    # Future that completes after subtask is created
    task_started_future = asyncio.Future()

    async def async_task():
        """Async subtask that should not be canceled when parent is canceled."""
        nonlocal task_started_future, task_executed, task_blocker
        task_started_future.set_result(True)
        await task_blocker
        task_executed = True

    task_cancel_propagated = False

    async with ThreadSensitiveContext():
        task = asyncio.create_task(sync_to_async(sync_middleware)())
        await task_started_future
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            task_cancel_propagated = True
        assert not task_cancel_propagated
        assert task_cancel_caught
        assert task_complete

    assert task_executed

When I run this broken test in a debugger the asyncio.CancelledError is incorrectly caught by this exception block and the test then hangs/deadlocks immediately after.

It's expected that the asyncio.CancelledError should instead be caught in this exception block which is what happens when using async middleware as demonstrated in this test which works as expected.

I'm guessing this issue has something to do with how asyncio.CancelledError exceptions propagate between sync_to_async and async_to_sync or something along those lines.

@andrewgodwin
Copy link
Member

I agree this is an issue, but I do not currently have the mental space to wander into that code and fix it; the exception propagation code is incredibly carefully put together and sensitive, and there's a chance it may not be possible to implement this at all without ruining other existing behaviour. If you want to have a go, though, be my guest.

@ttys0dev
Copy link
Contributor Author

If you want to have a go, though, be my guest.

Yeah, I've been experimenting a bit, I added an exception handler which was able to catch the initial cancellation exception but it's unclear how to properly identify subtasks through a sync_to_async and then through an async_to_sync transition.

I do see the subtasks that the cancellation needs to propagate to when I call asyncio.all_tasks() but I'm not sure how to go about say filtering those for only subtasks created by the function in the sync_to_async call. Are subtasks somehow tracked using context something like that?

@andrewgodwin
Copy link
Member

The only identifiers are kept as global dictionaries on the SyncToAsync and AsyncToSync classes, with a little bit of contextvars these days as well.

@ttys0dev
Copy link
Contributor Author

The only identifiers are kept as global dictionaries on the SyncToAsync and AsyncToSync classes

Is there already an identifier there that can be used to identify spawned tasks or would that need to be added?

@andrewgodwin
Copy link
Member

For SyncToAsync, it's not making asyncio.Task objects at all (if that's what you mean by tasks), it is making subthreads and/or reusing a parent thread that is already yielding via AsyncToSync.

They're identified using contextvars, but only if thread_sensitive mode is enabled; otherwise, it always makes a new subthread, and no mapping is kept because it's intrinsic in the await loop.run_in_executor call.

@ttys0dev
Copy link
Contributor Author

For SyncToAsync, it's not making asyncio.Task objects at all (if that's what you mean by tasks), it is making subthreads and/or reusing a parent thread that is already yielding via AsyncToSync.

So for reference if I run the failing test in debugger and set a breakpoint here and print asyncio.all_tasks() from the debug console I see the following tasks:

{<Task pending name='Task-1' coro=<test_inner_shield_sync_middleware() running at /Users/ttys0dev/asgiref/tests/test_sync.py:907> cb=[_run_until_complete_cb() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py:180]>,
 <Task pending name='Task-3' coro=<AsyncToSync.main_wrap() running at /Users/ttys0dev/asgiref/asgiref/sync.py:321> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:922, Task.task_wakeup()]>>,
 <Task pending name='Task-4' coro=<test_inner_shield_sync_middleware.<locals>.async_task() running at /Users/ttys0dev/asgiref/tests/test_sync.py:895> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[shield.<locals>._inner_done_callback() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:905]>}

From that same breakpoint if I get the current task using asyncio.current_task() I can get the following task:

<Task pending name='Task-1' coro=<test_inner_shield_sync_middleware() running at /Users/ttys0dev/asgiref/tests/test_sync.py:907> cb=[_run_until_complete_cb() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py:180]>

They're identified using contextvars, but only if thread_sensitive mode is enabled; otherwise, it always makes a new subthread, and no mapping is kept because it's intrinsic in the await loop.run_in_executor call.

So thread_sensitive is enabled for the failing test right?

I think the subtask that I need to propagate the cancellation to is showing up as Task-4 in asyncio.all_tasks(). Is there a way to identify that Task-4 is a subtask of Task-1 via contextvars in this case?

@andrewgodwin
Copy link
Member

I'm afraid I haven't looked at that code in over a year so it'll take me a couple of hours one evening to sit down and fully remember how it all works in order to correctly answer that question and review the pull request!

@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jan 16, 2024

I think the subtask that I need to propagate the cancellation to is showing up as Task-4 in asyncio.all_tasks(). Is there a way to identify that Task-4 is a subtask of Task-1 via contextvars in this case?

By the way I think my PR effectively works by propagating the cancellation to Task-3 which then in turn cancels Task-4 which is a subtask of Task-3.

The idea with my PR is that it uses a list to track the tasks in the call stack across sync/async boundaries and then propagates the cancellation to the last task in the list which AFAIU will be the async task currently being executed and thus the one that needs to be cancelled first. The cancellation exception or completions(if say inhibited via shield) then propagates normally back through the call stack to the top level task.

The deadlock from my understanding happens due to a failure to propagate the task cancellation to subtasks across sync_to_async/async_to_sync boundaries.

@ttys0dev
Copy link
Contributor Author

The idea with my PR is that it uses a list to track the tasks in the call stack across sync/async boundaries and then propagates the cancellation to the last task in the list which AFAIU will be the async task currently being executed and thus the one that needs to be cancelled first. The cancellation exception or completions(if say inhibited via shield) then propagates normally back through the call stack to the top level task.

Actually this approach I used in my original PR didn't work correctly as it would propagate task cancellation through an asyncio.shield. I updated my PR to chain cancellation calls through the sync/async boundaries in a way that allows for asyncio.shield to work correctly. I also added a test that should catch this bug.

@ttys0dev
Copy link
Contributor Author

Fixed in #435.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants