Description
Priority Level
Medium (Annoying but has workaround)
Describe the bug
Setting max_parallel_requests in ModelConfig (or ChatCompletionInferenceParams) has no effect on the underlying HTTP connection pool. The pool is silently capped at 100 concurrent connections regardless of the configured value, because httpx.Client ignores its limits parameter when a custom transport is provided, and RetryTransport creates its internal HTTPTransport with httpx's default limits.
This means a user who sets max_parallel_requests=300 expecting ~300 concurrent LLM requests will observe at most ~100 in practice.
Steps/Code to reproduce bug
```python
import data_designer as dd

# Configure model with high parallelism
model_config = dd.ModelConfig(
    alias="my_model",
    model="your-model-name",
    inference_parameters=dd.ChatCompletionInferenceParams(
        max_parallel_requests=300,
    ),
)

...

# Verify what connection pool actually gets created
from data_designer.engine.models.clients.retry import create_retry_transport

rt = create_retry_transport(config=None, strip_rate_limit_codes=False)
print(rt._sync_transport._pool._max_connections)  # prints: 100, NOT 600 (= 2 * 300)
```

Expected behavior

```
600  # max(32, 2 * max_parallel_requests) = 2 * 300
```
Additional context
The bug spans three files and involves a silent parameter drop between layers.
1. http_model_client.py — limits are calculated correctly but passed to the wrong place
```python
# data_designer/engine/models/clients/adapters/http_model_client.py
pool_max = max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)
pool_keepalive = max(_MIN_KEEPALIVE_CONNECTIONS, max_parallel_requests)
self._limits = lazy.httpx.Limits(  # calculated correctly
    max_connections=pool_max,
    max_keepalive_connections=pool_keepalive,
)

# ...later, on first request:
self._transport = create_retry_transport(self._retry_config, strip_rate_limit_codes=False)
self._client = lazy.httpx.Client(
    transport=self._transport,  # ← custom transport provided
    limits=self._limits,        # ← IGNORED by httpx when transport is not None
    timeout=lazy.httpx.Timeout(self._timeout_s),
)
```

2. httpx.Client._init_transport silently ignores limits when a custom transport is provided
This is documented httpx behaviour: when transport is not None, the method returns it directly without applying limits:
```python
# httpx source (v0.28.1)
def _init_transport(self, ..., limits=DEFAULT_LIMITS, transport=None) -> BaseTransport:
    if transport is not None:
        return transport  # limits never used
    return HTTPTransport(..., limits=limits)
```

3. RetryTransport creates its internal HTTPTransport with default limits
```python
# httpx_retries/transport.py (v0.4.6)
class RetryTransport:
    def __init__(self, transport=None, retry=None):
        if transport is not None:
            self._sync_transport = transport
            ...
        else:
            self._sync_transport = httpx.HTTPTransport()        # ← no limits argument
            self._async_transport = httpx.AsyncHTTPTransport()  # ← no limits argument
```

httpx.HTTPTransport() with no arguments creates an httpcore.ConnectionPool with max_connections=100 (the httpx 0.28.1 default), regardless of what was configured in ModelConfig.
Verified empirically
```python
from httpx_retries import RetryTransport, Retry
import httpx

rt = RetryTransport(retry=Retry(total=3))
print(rt._sync_transport._pool._max_connections)  # 100

# httpx.Client ignores limits= when transport= is provided:
client = httpx.Client(
    transport=rt,
    limits=httpx.Limits(max_connections=600, max_keepalive_connections=300),
)
print(client._transport._sync_transport._pool._max_connections)  # still 100
```

Expected Behavior
Setting max_parallel_requests=N in ModelConfig should result in a connection pool that allows at least N (ideally 2*N as per the existing _POOL_MAX_MULTIPLIER constant) concurrent connections.
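For reference, the sizing rule the client already computes (but never applies) reduces to the following, using the constant values quoted earlier (`_MIN_MAX_CONNECTIONS = 32`, `_POOL_MAX_MULTIPLIER = 2`):

```python
# Expected pool size as a function of max_parallel_requests,
# per the max(32, 2 * N) rule shown in the reproduction above.
def expected_pool_max(max_parallel_requests: int) -> int:
    return max(32, 2 * max_parallel_requests)

print(expected_pool_max(300))  # 600
print(expected_pool_max(10))   # 32 — the floor for small configurations
```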
Actual Behavior
The connection pool is always limited to 100 concurrent connections (httpx's internal default), making max_parallel_requests values above ~100 have no effect on actual throughput.
Suggested Fix
Pass a pre-configured httpx.HTTPTransport (and AsyncHTTPTransport) into RetryTransport instead of letting it create its own with default limits:
```python
# http_model_client.py — _get_sync_client()
def _get_sync_client(self) -> httpx.Client:
    with self._init_lock:
        if self._client is None:
            if self._transport is None:
                inner = lazy.httpx.HTTPTransport(limits=self._limits)  # ← pass limits here
                self._transport = create_retry_transport(
                    self._retry_config,
                    strip_rate_limit_codes=False,
                    transport=inner,  # ← pass to RetryTransport
                )
            self._client = lazy.httpx.Client(
                transport=self._transport,
                timeout=lazy.httpx.Timeout(self._timeout_s),
            )
    return self._client
```

This requires create_retry_transport to accept and forward an optional transport argument to RetryTransport(transport=..., retry=...), which httpx_retries already supports.
The same fix should be applied to _get_async_client using httpx.AsyncHTTPTransport.
Workaround
Until fixed, monkey-patch RetryTransport.__init__ before any model client is created:
```python
import httpx
from httpx_retries import RetryTransport

_orig = RetryTransport.__init__

def _fixed(self, transport=None, retry=None):
    _orig(self, transport=transport, retry=retry)
    if transport is None:
        unlimited = httpx.Limits(max_connections=None, max_keepalive_connections=None)
        self._sync_transport = httpx.HTTPTransport(limits=unlimited)
        self._async_transport = httpx.AsyncHTTPTransport(limits=unlimited)

RetryTransport.__init__ = _fixed
```

Environment
| Component | Version |
|---|---|
| data-designer | 0.5.4 |
| data-designer-engine | 0.5.4 |
| httpx | 0.28.1 |
| httpx-retries | 0.4.6 |
| httpcore | 1.0.9 |
| Python | 3.12.9 |
| Platform | macOS (darwin) |