max_parallel_requests has no effect on actual HTTP connection pool size #459

@przemekboruta

Description

Priority Level

Medium (Annoying but has workaround)

Describe the bug

Setting max_parallel_requests in ModelConfig (or ChatCompletionInferenceParams) has no effect on the underlying HTTP connection pool. The pool is silently capped at 100 concurrent connections regardless of the configured value, because httpx.Client ignores its limits parameter when a custom transport is provided, and RetryTransport creates its internal HTTPTransport with httpx's default limits.

This means a user who sets max_parallel_requests=300 expecting ~300 concurrent LLM requests will observe at most ~100 in practice.

Steps/Code to reproduce bug

import data_designer as dd

# Configure model with high parallelism
model_config = dd.ModelConfig(
    alias="my_model",
    model="your-model-name",
    inference_parameters=dd.ChatCompletionInferenceParams(
        max_parallel_requests=300,
    ),
)
...

# Verify what connection pool actually gets created
from data_designer.engine.models.clients.retry import create_retry_transport
rt = create_retry_transport(config=None, strip_rate_limit_codes=False)
print(rt._sync_transport._pool._max_connections)  # prints: 100, NOT 600 (= 2 * 300)

Expected behavior

600 # max(32, 2 * max_parallel_requests) = 2 * 300

Additional context

The bug spans three files and involves a silent parameter drop between layers.

1. http_model_client.py — limits are calculated correctly but passed to the wrong place

# data_designer/engine/models/clients/adapters/http_model_client.py

pool_max = max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)
pool_keepalive = max(_MIN_KEEPALIVE_CONNECTIONS, max_parallel_requests)
self._limits = lazy.httpx.Limits(          # calculated correctly
    max_connections=pool_max,
    max_keepalive_connections=pool_keepalive,
)

# ...later, on first request:
self._transport = create_retry_transport(self._retry_config, strip_rate_limit_codes=False)
self._client = lazy.httpx.Client(
    transport=self._transport,   # ← custom transport provided
    limits=self._limits,         # ← IGNORED by httpx when transport != None
    timeout=lazy.httpx.Timeout(self._timeout_s),
)

2. httpx.Client._init_transport silently ignores limits when a custom transport is provided

This is httpx's behavior (as of v0.28.1): when transport is not None, the method returns it directly and the limits argument is never applied:

# httpx source (v0.28.1)
def _init_transport(self, ..., limits=DEFAULT_LIMITS, transport=None) -> BaseTransport:
    if transport is not None:
        return transport       # limits never used
    return HTTPTransport(..., limits=limits)

3. RetryTransport creates its internal HTTPTransport with default limits

# httpx_retries/transport.py (v0.4.6)
class RetryTransport:
    def __init__(self, transport=None, retry=None):
        if transport is not None:
            self._sync_transport = transport  # (sync vs. async dispatch elided)
        else:
            self._sync_transport = httpx.HTTPTransport()        # ← no limits argument
            self._async_transport = httpx.AsyncHTTPTransport()  # ← no limits argument

httpx.HTTPTransport() with no arguments creates an httpcore.ConnectionPool with max_connections=100 (httpx 0.28.1 default), regardless of what was configured in ModelConfig.

Verified empirically

from httpx_retries import RetryTransport, Retry
import httpx

rt = RetryTransport(retry=Retry(total=3))
print(rt._sync_transport._pool._max_connections)   # 100

# httpx.Client ignores limits= when transport= is provided:
client = httpx.Client(
    transport=rt,
    limits=httpx.Limits(max_connections=600, max_keepalive_connections=300),
)
print(client._transport._sync_transport._pool._max_connections)  # still 100

Expected Behavior

Setting max_parallel_requests=N in ModelConfig should result in a connection pool that allows at least N (ideally 2*N as per the existing _POOL_MAX_MULTIPLIER constant) concurrent connections.
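The intended sizing is easy to sketch directly. The constant values below are taken from the snippets above (_MIN_MAX_CONNECTIONS = 32, _POOL_MAX_MULTIPLIER = 2, per the max(32, 2 * max_parallel_requests) formula):

```python
# Sketch of the intended pool sizing from http_model_client.py.
_MIN_MAX_CONNECTIONS = 32
_POOL_MAX_MULTIPLIER = 2

def expected_pool_max(max_parallel_requests: int) -> int:
    # 2x the requested parallelism, with a floor of 32 connections.
    return max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)

print(expected_pool_max(300))  # 600
print(expected_pool_max(10))   # 32 (the floor kicks in for small values)
```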


Actual Behavior

The connection pool is always limited to 100 concurrent connections (httpx's internal default), making max_parallel_requests values above ~100 have no effect on actual throughput.


Suggested Fix

Pass a pre-configured httpx.HTTPTransport (and AsyncHTTPTransport) into RetryTransport instead of letting it create its own with default limits:

# http_model_client.py — _get_sync_client()
def _get_sync_client(self) -> httpx.Client:
    with self._init_lock:
        if self._client is None:
            if self._transport is None:
                inner = lazy.httpx.HTTPTransport(limits=self._limits)  # ← pass limits here
                self._transport = create_retry_transport(
                    self._retry_config,
                    strip_rate_limit_codes=False,
                    transport=inner,                                    # ← pass to RetryTransport
                )
            self._client = lazy.httpx.Client(
                transport=self._transport,
                timeout=lazy.httpx.Timeout(self._timeout_s),
            )
        return self._client

This requires create_retry_transport to accept and forward an optional transport argument to RetryTransport(transport=..., retry=...), which httpx_retries already supports.

The same fix should be applied to _get_async_client using httpx.AsyncHTTPTransport.


Workaround

Until fixed, monkey-patch RetryTransport.__init__ before any model client is created:

import httpx
from httpx_retries import RetryTransport

_orig = RetryTransport.__init__

def _fixed(self, transport=None, retry=None):
    _orig(self, transport=transport, retry=retry)
    if transport is None:
        unlimited = httpx.Limits(max_connections=None, max_keepalive_connections=None)
        self._sync_transport = httpx.HTTPTransport(limits=unlimited)
        self._async_transport = httpx.AsyncHTTPTransport(limits=unlimited)

RetryTransport.__init__ = _fixed

Environment

Component             Version
data-designer         0.5.4
data-designer-engine  0.5.4
httpx                 0.28.1
httpx-retries         0.4.6
httpcore              1.0.9
Python                3.12.9
Platform              macOS (darwin)
