
ApiClient creates a new httpx.AsyncClient per JWKS / OIDC fetch with no timeout and no single-flight, causing "Unknown auth error" storms under load #85

@jackton1

Checklist

  • I have looked into the Readme and Examples, and have not found a suitable solution or answer.
  • I have looked into the API documentation and have not found a suitable solution or answer.
  • I have searched the issues and have not found a suitable solution or answer.
  • I have searched the Auth0 Community forums and have not found a suitable solution or answer.
  • I agree to the terms within the Auth0 Code of Conduct.

Description

ApiClient._fetch_jwks and _fetch_oidc_metadata (via
utils.fetch_jwks / utils.fetch_oidc_metadata) construct a brand-new
httpx.AsyncClient() on every cache miss with no explicit timeout
configured, and there is no single-flight protection around the
refetch path. Combined, these turn routine cache expiry into a
self-inflicted outage under any non-trivial concurrency.

Reproduction

  1. Stand up a FastAPI service that calls ApiClient.verify_request
    on every authenticated route.
  2. Drive ≥ ~30 RPS of authenticated requests against it (we hit it
    on a 4 vCPU ECS Fargate task during a k6 load test).
  3. Wait for the in-memory JWKS cache to expire — by default Auth0
    returns Cache-Control: max-age=600, so this happens every ~10
    minutes under steady load.
  4. Observe a sudden burst of httpx.ConnectTimeout errors raised from
    ApiClient._fetch_jwks, surfacing to callers as opaque "Unknown
    auth error" 5xx responses on every authenticated route until the
    herd subsides. The Sentry trace shows
    auth0_api_python.errors.UnknownAuth0Exception caused by
    httpx.ConnectTimeout.
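
The herd in step 4 can be imitated without a network or the library. The sketch below is a stand-alone simulation, not auth0-api-python code: NaiveJwksCache, fake_fetch, and the 30-caller figure are hypothetical stand-ins for an expired cache with no single-flight sitting in front of a slow JWKS endpoint.

```python
import asyncio


class NaiveJwksCache:
    """Hypothetical stand-in for a cache whose miss path has no single-flight."""

    def __init__(self, fetch):
        self._fetch = fetch   # coroutine function that performs the upstream call
        self._value = None    # None models an expired or empty cache entry

    async def get(self):
        if self._value is None:
            # Every caller that observes the miss starts its own upstream fetch.
            self._value = await self._fetch()
        return self._value


async def simulate(callers: int = 30) -> int:
    """Return how many upstream fetches `callers` concurrent requests trigger."""
    upstream_calls = 0

    async def fake_fetch():
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.05)  # simulated network latency
        return {"keys": []}

    cache = NaiveJwksCache(fake_fetch)
    await asyncio.gather(*(cache.get() for _ in range(callers)))
    return upstream_calls


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

Because every caller checks the cache before any fetch has completed, the number of upstream calls equals the number of concurrent callers.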

Additional context

In src/auth0_api_python/utils.py:

async def fetch_jwks(jwks_uri, custom_fetch=None):
    ...
    async with httpx.AsyncClient() as client:   # 1
        resp = await client.get(jwks_uri)       # 2
        ...
  1. No connection pooling across calls. A fresh client is created
    and torn down per fetch, so every cache miss costs a new TCP + TLS
    handshake to https://<tenant>.auth0.com/. Under load this can
    exhaust the host's ephemeral source-port budget and slow
    everything else on the box.
  2. No explicit timeout. httpx.AsyncClient() with no timeout=
    uses httpx's default 5-second connect/read/write/pool budget. On a
    stressed event loop or a slow Auth0 region that 5s budget is
    routinely blown, raising httpx.ConnectTimeout /
    httpx.ReadTimeout. Those bubble up through _fetch_jwks and the
    caller sees ConnectTimeout chained to UnknownAuth0Exception:
    not a 401, not a 503, just an opaque "unknown auth error".
  3. No single-flight on refetch. When the in-memory cache expires
    (InMemoryCache.get returns None), every concurrent request
    that reaches _fetch_jwks simultaneously fires its own outbound
    JWKS fetch. N requests in flight at the moment of expiry = N
    concurrent JWKS calls to Auth0. Auth0 throttles some of them, the
    others time out per (1) and (2), and any request that lost the
    race fails auth.

The same three problems apply verbatim to fetch_oidc_metadata and
_fetch_oidc_metadata.
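
For point 3, a minimal single-flight sketch using only asyncio. It is keyed by URL so the same wrapper could sit in front of both the JWKS and the OIDC metadata fetch. SingleFlightCache, demo, fake_fetch, and the example URL are hypothetical names, not part of auth0-api-python; the 600-second TTL mirrors the max-age mentioned in the reproduction steps.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


class SingleFlightCache:
    """Hypothetical TTL cache that allows one in-flight fetch per key."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]], ttl: float = 600.0):
        self._fetch = fetch
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self._locks: Dict[str, asyncio.Lock] = {}

    def _fresh(self, key: str) -> Optional[Tuple[float, Any]]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    async def get(self, key: str) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: another caller may have refilled the cache while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[1]
            value = await self._fetch(key)
            self._entries[key] = (time.monotonic() + self._ttl, value)
            return value


async def demo(callers: int = 30) -> int:
    """Return how many upstream fetches `callers` concurrent requests trigger."""
    calls = 0

    async def fake_fetch(url: str) -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)  # simulated network latency
        return {"keys": []}

    cache = SingleFlightCache(fake_fetch)
    url = "https://tenant.example/.well-known/jwks.json"  # placeholder URL
    await asyncio.gather(*(cache.get(url) for _ in range(callers)))
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape, callers that arrive during a refetch wait on the lock and then read the refreshed entry instead of issuing their own request. One caveat of this particular sketch: if the fetch raises, the failure is not shared, so waiting callers retry one at a time.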

auth0-api-python version

1.0.0b8

Python version

3.11
