[python] Make HTTP timeout/retry/keep-alive configurable via CatalogOptions#7732
Conversation
|
Why can't it be consistent with Java implementation? |
Thanks for the review @JingsongLi. I do want to keep both the bug fix and the new options in this PR. The reason isn't just convenience — without these as catalog options the client's actual behaviour is not under our control. requests falls back to the OS / kernel for
Exposing the five keys (http.connect-timeout / http.read-timeout / http.max-connect-retries / http.max-read-retries / http.keep-alive) is what lets us pin those down deterministically per-catalog. The defaults already match Java's On the Java consistency point: I take that seriously. My plan is, once this PR lands, to follow up with a dedicated PR that introduces the same option keys on the Java side (extending RESTCatalogOptions and wiring them through If that two-step plan works for you, I'll keep this PR as-is. Otherwise happy to discuss alternatives. |
|
@TheR1sing3un If Java is fine, can we just align Java's default behavior? There's no need to expose so many configurations. |
At first, I thought the same way. But the core issue is that if these configurations cannot be adjusted, then some network configurations of the kernel will be used. However, in different machines and different clusters, these network configurations are not uniform. For example, in our internal cluster, the connect retry is only once. This will cause the client to fail directly once it encounters a little network fluctuation, and then lead to the failure of the entire ray job. |
By the way, when these parameters are not configured, the behavior should be aligned. |
What I want to confirm is, is there a problem with the Java SDK? |
Checked the Java side too — same problem actually.
And Happy to mirror the same option set on the Java side as a follow-up PR |
|
@TheR1sing3un You may have misunderstood my meaning. Even if Java cannot configure these parameters, its default parameters run well, so there is no problem. |
Sorry, I've always misunderstood your meaning. I agree with you. Let me directly align the default behavior on the python side and temporarily not introduce configuration-based adjustment capabilities. What do you think? |
2434d26 to
93da407
Compare
Done. Aligned with java version. |
``HttpClient`` set ``self.session.timeout = (180, 180)`` and never applied it: the requests library does not consult ``Session.timeout`` on outgoing calls, only ``Session.request(timeout=...)`` does. The client could therefore hang indefinitely on a slow upstream. Pass the timeout through ``session.request(timeout=...)`` on every call so it actually fires. Also bump the hardcoded retry budget from 3 to 5 and keep connect failures non-retriable -- defaults that match the existing reference implementation and work well across the cluster shapes we see in practice. No new ``CatalogOptions`` are introduced. Callers who need different shapes can override the class-level constants on ``HttpClient``. Tests: * ``HttpClientTimeoutTest`` patches ``Session.request`` and asserts it is called with ``timeout=client._timeout``. This pins the bug fix. * ``test_exponential_retry_strategy`` uses the single-counter API and asserts connect failures bail fast (``connect=0``).
93da407 to
3e616f8
Compare
|
@TheR1sing3un Cool! Thanks~ |
|
+1 |
Purpose
The REST
HttpClientcurrently hardcodes its retry count and ignores its own session timeout. Two practical issues fall out of this:requests.Session.timeoutis not consulted by the library — onlysession.request(timeout=...)is. The previousself.session.timeout = (180, 180)had no effect, so requests could hang indefinitely on a slow server.max_retries=3mixed connect / read retries; users could not disable connect retries (which often shouldn't retry) or boost read retries against flaky upstreams.This PR introduces five
CatalogOptionsso REST behaviour can be tuned via standard catalog options:http.connect-timeouthttp.read-timeouthttp.max-connect-retrieshttp.max-read-retrieshttp.keep-aliveConnection: closeHttpClientnow accepts an optionalOptionsargument, applies the timeout viasession.request(timeout=...), separatesconnect/readretry counters inExponentialRetry(withtotal=Noneso each type governs independently), and setsConnection: closewhen keep-alive is disabled.RESTApiforwards its options through.token_loader.pyis updated to the newExponentialRetry(connect_retries, read_retries)signature.Linked issue
N/A — discovered while tuning REST clients against high-latency catalog servers and verifying that
session.timeoutis dead code.Tests
pypaimon/tests/rest/test_exponential_retry_strategy.py— refreshed for the new signature; coverstotal=None, separated connect/read counters, zero-retries, and an end-to-end retry-on-connect-error case.pypaimon/tests/rest/client_test.py— newHttpClientHttpOptionsTest:options=Nonehttp.connect-timeout/http.read-timeoutreachclient._timeouthttp.keep-alive=falsesetsConnection: closehttp.max-connect-retries/http.max-read-retriesreach the mounted adapter'sRetryLocal:
pytest pypaimon/tests/rest/client_test.py pypaimon/tests/rest/test_exponential_retry_strategy.py→ 10 passed;flake8 --config=dev/cfg.iniclean.API and format
HttpClient.__init__adds an optionaloptionskwarg (backward compatible — existingHttpClient(uri)callers still work and use the same default behaviour as before, except the timeout is now actually honoured).ExponentialRetry.__init__now takesconnect_retries/read_retriesinstead of a singlemax_retries. The only internal caller (token_loader.py) is updated; no public API forExponentialRetry.No file format change.
Documentation
Option keys are self-described via
with_description(...)inCatalogOptions. No additional doc change required.Generative AI disclosure
Drafted with assistance from an AI coding tool; all logic reviewed by the author and validated by the tests above.