
deadlock with TCPConnector limit after timeout #9670

@davidmanzanares

Description

Describe the bug

When using a limit in TCPConnector, timeouts can lead to a state where new requests are never actually sent, so every subsequent request times out as well ("sticky" timeouts).

After debugging, I believe the problem occurs on this line: https://github.com/aio-libs/aiohttp/blob/v3.10.10/aiohttp/connector.py#L541

When a ValueError is raised there, it is the result of:

  • Awaiting the future failed, typically because of a CancelledError caused by a timeout.
  • The _release_waiter method was called and tried to wake up a coroutine waiting for this connection. But since that coroutine also hit the timeout, it won't proceed, and because it doesn't proceed, it never wakes up other waiting coroutines.

If the number of waiters never reaches zero, the cycle never ends and no coroutine proceeds: every coroutine wakes up only to raise a CancelledError.
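The lost wake-up can be illustrated outside aiohttp with a minimal asyncio sketch. This is a hypothetical simplification, not aiohttp's actual code: waiters stands in for the connector's waiter list and release_one for _release_waiter. A freed slot is handed to a waiter that has already been cancelled by its timeout, so the wake-up is consumed and the next waiter stays stuck forever:

```python
import asyncio

async def demo() -> bool:
    loop = asyncio.get_running_loop()
    waiters: list[asyncio.Future] = []

    def release_one() -> None:
        # Wake exactly one pending waiter (analogous to _release_waiter).
        for fut in waiters:
            if not fut.done():
                fut.set_result(None)
                return

    async def acquire() -> None:
        fut = loop.create_future()
        waiters.append(fut)
        try:
            await fut  # a total-timeout cancels the task right here
        finally:
            waiters.remove(fut)

    t1 = asyncio.create_task(acquire())
    t2 = asyncio.create_task(acquire())
    await asyncio.sleep(0)  # let both tasks start waiting

    release_one()  # the freed slot is handed to t1 ...
    t1.cancel()    # ... but t1 is cancelled (timeout) before it can run
    await asyncio.gather(t1, return_exceptions=True)
    await asyncio.sleep(0)

    released = t2.done()  # the wake-up was lost; t2 is still stuck
    t2.cancel()
    await asyncio.gather(t2, return_exceptions=True)
    return released

result = asyncio.run(demo())
print("second waiter released:", result)  # False: t2 never wakes up
```

Because t1's cancellation arrives after its future already has a result, the CancelledError wins and the slot it was handed vanishes, which is exactly the hand-off that the pass at connector.py#L541 silently drops.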

Proposed fix:
Replace the pass on the mentioned line with self._release_waiter(). This seems to resolve the issue by waking another coroutine, which was the original intent of the earlier self._release_waiter() call.

Another way to fix it might be to use an asyncio.Semaphore with a context manager.

To Reproduce

  1. Start a small server script that responds slowly only some of the time. For example, run this with fastapi dev aio_bug_server.py:
from fastapi import FastAPI
import asyncio
from time import time

app = FastAPI()

sleep_time = 0.3
timestamp_last_change = time()
last_recv = time()

@app.get("/getme")
async def getme():
    global sleep_time, last_recv
    if time()-last_recv>8:
        # Go "up", forcing 0 sleep time if we don't know anymore about clients
        sleep_time = 0
    elif time()-timestamp_last_change>2:
        if sleep_time:
            sleep_time = 0
        else:
            sleep_time = 0.3
    print(f"recv {sleep_time=:.3f}")
    last_recv = time()

    await asyncio.sleep(sleep_time)
    print(f"fin {sleep_time=:.3f}")
    return {
        "asdf": "asdf",
    }
  2. Start a script that uses aiohttp to make concurrent requests to the server, and check the number of timeout/ok results. For example:
import uvloop
uvloop.install()

import asyncio
import aiohttp
from time import time

timeouts = 0
errors = 0
oks = 0

async def do_request(session: aiohttp.ClientSession):
    global errors, timeouts, oks
    try:
        async with session.get("http://127.0.0.1:8000/getme") as resp:
            await resp.json()
        oks += 1
    except TimeoutError:
        timeouts += 1
    except BaseException as exc:
        print(f"UNEXPECTED EXCEPTION {exc=}")
        errors += 1



async def main():
    global timeouts, oks
    async with aiohttp.ClientSession(
                    connector=aiohttp.TCPConnector(limit=10),
                    timeout=aiohttp.ClientTimeout(total=0.2),
                ) as session:
        tasks = set()
        done_reqs = 0
        i = 0
        t = time()
        last_ok = t
        while True:
            i += 1
            if i % 100 == 0:
                dt = time() - t
                print(f"{done_reqs/dt:.1f}req/s {oks=} {timeouts=}")
                if oks:
                    last_ok = time()
                if time() - last_ok > 10:
                    # We likely hit the issue:
                    # at this point the server should not have received anything in a while
                    print("ISSUE DETECTED")
                    t = time()
                    try:
                        async with session.get("http://127.0.0.1:8000/getme?q=a") as resp:
                            await resp.json()
                    except Exception as exc:
                        print(f"LAST REQUEST ERROR: {exc=} {time()-t:.3f}")
                    for task in tasks:
                        await task
                    print("NEW REQUEST AFTER AWAITING ALL TASKS")
                    t = time()
                    try:
                        async with session.get("http://127.0.0.1:8000/getme?q=b") as resp:
                            await resp.json()
                    finally:
                        print("FINAL DT:", time() - t)
                    return

                timeouts = 0
                oks = 0

            if len(tasks) >= 200:
                done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
                done_reqs += len(done)
            tasks.add(asyncio.create_task(do_request(session)))

asyncio.run(main())

Expected behavior

All requests should be sent, previous timeouts shouldn't change that.

Logs/tracebacks

no tracebacks

Python Version

$ python --version
Python 3.12.7

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.10.10
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /home/dv/code/search/services/local/venv/lib/python3.12/site-packages
Requires: aiohappyeyeballs, aiosignal, attrs, frozenlist, multidict, yarl
Required-by:

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/dv/code/search/services/local/venv/lib/python3.12/site-packages
Requires: 
Required-by: aiohttp, yarl

propcache Version

$ python -m pip show propcache
Name: propcache
Version: 0.2.0
Summary: Accelerated property cache
Home-page: https://github.com/aio-libs/propcache
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: /home/dv/code/search/services/local/venv/lib/python3.12/site-packages
Requires: 
Required-by: yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.17.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: /home/dv/code/search/services/local/venv/lib/python3.12/site-packages
Requires: idna, multidict, propcache
Required-by: aiohttp

OS

Linux

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
