Potential race condition in Lock #387

Closed
Tinche opened this issue Nov 12, 2021 · 9 comments · Fixed by #388
Labels: bug (Something isn't working)

Comments

Tinche commented Nov 12, 2021

Hello!

I'm trying to debug an issue we sometimes hit in production through httpx. We're not using anyio ourselves directly (it's regular asyncio), but httpx is based on it.

I'm not 100% sure the scenario I'm documenting here is the exact issue we're seeing, but there's a good chance it is. The symptom is that a lock instance gets stuck in the locked state with a finished task still set as its owner.

Here's the rough sketch, involving two tasks:

Task 1:

  • acquires the lock

Task 2:

  • tries acquiring the lock, fails, creates an event and starts waiting on it

Task 1:

  • cancels Task 2, then releases the lock (which sets the lock's _owner_task to Task 2), retrieves Task 2's event and sets it

Task 2:

  • wakes up with CancelledError on event.wait(), propagates it

The lock's _owner_task is still set to Task 2, leaving the lock stuck.

In our production code it's aiohttp cancelling Task 2 rather than Task 1, but that's not relevant to the example.

Here's a Python snippet demonstrating the problem:

from asyncio import CancelledError, create_task, run, sleep
from contextlib import suppress

from anyio import Lock


async def main():
    lock = Lock()

    async def task_b():
        await sleep(0.1)  # Sleep to allow task_a to set up
        async with lock:
            pass

    t = create_task(task_b())

    async def task_a():
        async with lock:
            await sleep(0.1)  # Sleep to allow task_b to try locking
            t.cancel()
        await sleep(0.2)
        print(t.done())  # True
        print(lock._owner_task)  # TaskInfo(id=4337401360, name='Task-2')

    await task_a()
    with suppress(CancelledError):
        await t


run(main())

The lock should end up in a free state, but it gets stuck being owned by task_b.

agronholm (Owner) commented:

What seems to happen here is:

  1. Task A acquires the lock and goes to sleep
  2. Task B awakens from sleep, tries to acquire the lock, and starts waiting on an event because it could not acquire the lock
  3. Task A schedules a cancellation for task B
  4. Task A releases the lock, transferring ownership to task B
  5. Task B is rescheduled with CancelledError raised in await event.wait()
  6. Task B tries to clean up, but because it is no longer waiting on the lock (it owns the lock now), the cleanup block does nothing

Adding a new check to the cleanup block that calls release() when the lock is owned by the current task seems to solve the issue. I'll make a PR for this.
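
For illustration, here is a minimal, self-contained sketch of that kind of cleanup check, written against plain asyncio rather than anyio's actual internals. Apart from _owner_task, which appears in the reproduction above, the names (SketchLock, _waiters, the (task, event) tokens) are assumptions made for this example.

```python
# Illustrative sketch only; this is not anyio's implementation.
from asyncio import CancelledError, Event, current_task
from collections import deque


class SketchLock:
    def __init__(self):
        self._owner_task = None
        self._waiters = deque()  # queue of (task, event) pairs (assumed structure)

    async def acquire(self):
        if self._owner_task is None:
            self._owner_task = current_task()
            return

        event = Event()
        token = (current_task(), event)
        self._waiters.append(token)
        try:
            await event.wait()
        except CancelledError:
            if token in self._waiters:
                # Still queued: simply withdraw from the wait queue.
                self._waiters.remove(token)
            elif self._owner_task is current_task():
                # Ownership was already handed over before the cancellation
                # was delivered, so give the lock back (the new check).
                self.release()
            raise

    def release(self):
        if self._waiters:
            # Hand ownership directly to the next waiter and wake it up.
            self._owner_task, event = self._waiters.popleft()
            event.set()
        else:
            self._owner_task = None
```

The important branch is the elif: by the time the cancellation is delivered, release() has already handed ownership to the cancelled waiter, so the only safe recovery is for that waiter to release the lock itself before propagating the CancelledError.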

agronholm added the bug label on Nov 13, 2021
agronholm (Owner) commented:

As a side note, the Semaphore class may have a similar problem.
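
Assuming Semaphore hands its slot to the next waiter on release the same way Lock hands over ownership, the reproduction from this issue adapts to it directly. The expected output noted in the comments is an assumption for illustration, not something verified in this thread:

```python
from asyncio import CancelledError, create_task, run, sleep
from contextlib import suppress

from anyio import Semaphore


async def main():
    sem = Semaphore(1)

    async def task_b():
        await sleep(0.1)  # Sleep to allow task_a to set up
        async with sem:
            pass

    t = create_task(task_b())

    async def task_a():
        async with sem:
            await sleep(0.1)  # Sleep to allow task_b to start waiting
            t.cancel()
        await sleep(0.2)
        print(t.done())   # True
        print(sem.value)  # Expected 1; an affected version is assumed to report 0

    await task_a()
    with suppress(CancelledError):
        await t


run(main())
```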

Tinche commented Nov 13, 2021

I think that's exactly it. Except in my case it's not task A doing the cancelling but something else, but that's immaterial here.

agronholm (Owner) commented:

> I think that's exactly it. Except in my case it's not task A doing the cancelling but something else, but that's immaterial here.

Task A contains this code: t.cancel()
Therefore it is task A that is doing the cancellation.

Tinche commented Nov 13, 2021

> > I think that's exactly it. Except in my case it's not task A doing the cancelling but something else, but that's immaterial here.
>
> Task A contains this code: t.cancel() Therefore it is task A that is doing the cancellation.

In the example I posted, yes. In the incidents on our production environment, no.

agronholm (Owner) commented:

I see. Anyway, there is a PR with the appropriate regression tests fixing the problem for both Lock and Semaphore, so as soon as I can get a review, I will merge it.
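
The PR itself isn't shown in this thread, but a regression test for the Lock case could look roughly like the sketch below, which simply wraps the reproduction from this issue in a test function. The test name is made up, and pytest with pytest-asyncio is assumed here because the reproduction drives plain asyncio tasks:

```python
from asyncio import CancelledError, create_task, sleep
from contextlib import suppress

import pytest
from anyio import Lock


@pytest.mark.asyncio
async def test_lock_released_when_cancelled_waiter_becomes_owner():
    lock = Lock()

    async def waiter():
        async with lock:
            pass

    async with lock:
        task = create_task(waiter())
        await sleep(0.1)  # let the waiter block on the lock
        task.cancel()     # the cancellation now races with the ownership hand-off
    # Leaving the block released the lock, handing ownership to the cancelled waiter.
    await sleep(0.2)

    with suppress(CancelledError):
        await task

    # With the fix, the cancelled waiter gives the lock back on its way out.
    assert not lock.locked()
```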

Tinche commented Nov 13, 2021

Thank you!

Tinche commented Nov 16, 2021

Thanks for the quick response on this!

agronholm (Owner) commented:

I will cut a release once the contextvar propagation issue is solved (needs a PR that is almost ready).

agronholm added a commit that referenced this issue Nov 17, 2021
agronholm added a commit that referenced this issue Nov 17, 2021
agronholm added a commit that referenced this issue Nov 21, 2021
> It was decided to add workarounds for trio rather than wait for the relevant PR to be merged and a new version released.
>
> Fixes #387.