feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy #342
Closes a short-circuit in retry_with_backoff{,_async} where a single slow
first attempt can exhaust max_elapsed_time before any retry fires. With
the partitioner config (max_elapsed_time=5min, CLIENT_TIMEOUT_MS=30min),
any chunk whose first attempt exceeded 5 minutes blew the retry budget
on attempt 1, leaving zero retries for subsequent transient errors.
Documented in two customer-visible regressions in the platform
partition path.
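
The arithmetic behind the short-circuit can be sketched with simulated attempt durations. This is a simplified model rather than the SDK's loop: `retries_fired_old_rule` is a hypothetical helper, backoff sleeps are ignored, and every attempt is assumed to fail with a transient error.

```python
MINUTE_MS = 60_000


def retries_fired_old_rule(attempt_durations_ms, max_elapsed_time_ms):
    """Count retries under the pre-change rule (hypothetical model).

    Every attempt is assumed to fail transiently. After each failure the
    loop gives up if the elapsed time already exceeds the budget.
    """
    elapsed_ms = 0
    attempts = 0
    for duration_ms in attempt_durations_ms:
        attempts += 1
        elapsed_ms += duration_ms  # the attempt runs to completion, then fails
        if elapsed_ms > max_elapsed_time_ms:
            break  # budget exhausted: no further attempt is started
    return attempts - 1  # retries exclude the initial attempt


# Partitioner config from the commit message: 5 min budget.
budget_ms = 5 * MINUTE_MS

# A 6 min first attempt exhausts the budget before any retry can fire.
slow_first = retries_fired_old_rule([6 * MINUTE_MS, 1_000, 1_000], budget_ms)

# Three 1 min attempts stay inside the budget, so both retries fire.
fast = retries_fired_old_rule([MINUTE_MS, MINUTE_MS, MINUTE_MS], budget_ms)
```

Under this model `slow_first` is 0 and `fast` is 2, which is the asymmetry the new fields are meant to remove.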
New fields on BackoffStrategy:
* min_attempts (default 0) -- minimum number of retry attempts that must
fire before max_elapsed_time is honored. Counts retries (not the
initial attempt). Default 0 preserves existing behavior.
* absolute_max_elapsed_time_ms (default None) -- cap on when a new retry
can START. Does NOT interrupt an in-flight func() call. Worst-case
wall-clock under this cap is absolute_max_elapsed_time_ms +
per_attempt_timeout. Default None preserves existing behavior.
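
As a rough illustration of how the two fields bound a run, the sketch below works through the attempt count and the worst-case wall clock. The helper names and the 15 min hard cap are hypothetical; only the 30 min client timeout comes from the partitioner config described above.

```python
MINUTE_MS = 60_000


def attempts_before_soft_cap(min_attempts):
    """Initial attempt plus the retries that must fire before the soft cap
    (max_elapsed_time) is honored. The hard cap can still end the loop sooner."""
    return 1 + min_attempts


def worst_case_wall_clock_ms(absolute_max_elapsed_time_ms, per_attempt_timeout_ms):
    """A retry may start right at the hard cap and then run for a full
    per-attempt timeout, because in-flight calls are not interrupted."""
    return absolute_max_elapsed_time_ms + per_attempt_timeout_ms


# min_attempts=2 means 1 initial attempt + 2 retries before the soft cap applies.
total_attempts = attempts_before_soft_cap(min_attempts=2)

# Hypothetical 15 min hard cap combined with the 30 min client timeout.
bound_ms = worst_case_wall_clock_ms(15 * MINUTE_MS, 30 * MINUTE_MS)
bound_minutes = bound_ms // MINUTE_MS
```

With these values `total_attempts` is 3 and `bound_minutes` is 45, so the hard cap only keeps the worst case small when the per-attempt timeout is small too.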
Loop changes in both retry_with_backoff and retry_with_backoff_async:
* Post-attempt cap check honors min_attempts as a floor on the soft cap
and treats the hard cap as unconditional.
* Pre-sleep hard-cap check refuses to sleep into a doomed retry whose
projected start would exceed the hard cap.
* Post-sleep verification (belt-and-suspenders against late wakeups and
rounding drift in the projection).
* Helper extraction (_cap_hit_after_attempt, _raise_or_return_after_cap)
dedupes logic between sync and async paths.
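
A minimal sketch of the cap checks described above, under stated assumptions: the function names, signatures, and strict `>` comparisons are illustrative guesses and are not taken from the PR's `_cap_hit_after_attempt` helper.

```python
from typing import Optional


def cap_hit_after_attempt(
    elapsed_ms: int,
    retries: int,
    max_elapsed_time_ms: int,
    min_attempts: int = 0,
    absolute_max_elapsed_time_ms: Optional[int] = None,
) -> bool:
    """Post-attempt check (hypothetical): should the loop stop retrying?"""
    # Hard cap is unconditional: it applies even below the min_attempts floor.
    if (
        absolute_max_elapsed_time_ms is not None
        and elapsed_ms > absolute_max_elapsed_time_ms
    ):
        return True
    # Soft cap is honored only once enough retries have fired.
    return retries >= min_attempts and elapsed_ms > max_elapsed_time_ms


def retry_start_allowed(
    elapsed_ms: int,
    sleep_ms: int,
    absolute_max_elapsed_time_ms: Optional[int] = None,
) -> bool:
    """Pre-sleep check (hypothetical): would the retry start within the hard cap?"""
    if absolute_max_elapsed_time_ms is None:
        return True  # no hard cap configured
    return elapsed_ms + sleep_ms <= absolute_max_elapsed_time_ms


MINUTE_MS = 60_000

# Slow first attempt (6 min) with a 5 min soft cap and a floor of 2 retries:
below_floor = cap_hit_after_attempt(6 * MINUTE_MS, 0, 5 * MINUTE_MS, min_attempts=2)
# Same elapsed time once the floor has been met:
floor_met = cap_hit_after_attempt(6 * MINUTE_MS, 2, 5 * MINUTE_MS, min_attempts=2)
```

Here `below_floor` is `False` and `floor_met` is `True`. With the default `min_attempts=0` the soft-cap branch reduces to the pre-change comparison.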
Validation rejects min_attempts < 0, absolute_max_elapsed_time_ms <= 0,
and absolute_max_elapsed_time_ms below max_elapsed_time.
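
The three guards could look roughly like the following. This is a standalone sketch: `validate_backoff_fields` is a hypothetical function, and raising `ValueError` is an assumption, since the commit message does not name the exception type.

```python
from typing import Optional


def validate_backoff_fields(
    max_elapsed_time: int,
    min_attempts: int = 0,
    absolute_max_elapsed_time_ms: Optional[int] = None,
) -> None:
    """Reject the invalid combinations listed in the commit message."""
    if min_attempts < 0:
        raise ValueError("min_attempts must be >= 0")
    if absolute_max_elapsed_time_ms is None:
        return  # hard cap disabled: nothing else to check
    if absolute_max_elapsed_time_ms <= 0:
        raise ValueError("absolute_max_elapsed_time_ms must be > 0")
    if absolute_max_elapsed_time_ms < max_elapsed_time:
        # A hard cap below the soft cap would make the soft cap unreachable.
        raise ValueError(
            "absolute_max_elapsed_time_ms must not be below max_elapsed_time"
        )


# Defaults and a consistent pair of caps pass silently.
validate_backoff_fields(max_elapsed_time=300_000)
validate_backoff_fields(300_000, min_attempts=2, absolute_max_elapsed_time_ms=900_000)
```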
.genignore: ignore src/unstructured_client/utils/retries.py to preserve
these fields across Speakeasy regens. Documented merge procedure for
future template updates. Pushing these fields upstream to Speakeasy
templates is filed as a follow-up.
Tests:
* T1-T14 in unit/test_retries.py with a fake-clock harness that
monkeypatches time.time / time.sleep / asyncio.sleep / random.uniform.
Covers the v1 reproducer (slow first attempt + min_attempts floor),
floor-is-not-a-ceiling semantics, hard cap overrides floor, sleep
truncation, TemporaryError early-return through both caps, and
PermanentError short-circuit immunity.
* Validation tests for the new __init__ guards.
* New integration test
test_split_pdf_cache_tmp_data_chunk_request_stream_is_replay_safe
pins the body-replay invariant: chunks built from open file objects
(cache_tmp_data=True) must produce replay-safe httpx requests so SDK
retries actually deliver the original multipart payload. Iterates
request.stream twice directly to bypass request.read() caching.
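
For readers unfamiliar with the fake-clock approach, a minimal standard-library sketch follows. `FakeClock` and `run_with_fake_clock` are hypothetical names, not the harness in `unit/test_retries.py`, and the sketch patches only `time.time` and `time.sleep`; the harness described above also patches `asyncio.sleep` and `random.uniform`.

```python
import time
from unittest import mock


class FakeClock:
    """Deterministic clock: sleeping advances time instantly."""

    def __init__(self, start=0.0):
        self.now = start
        self.sleeps = []  # record of requested sleep durations

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.sleeps.append(seconds)
        self.now += seconds

    def advance(self, seconds):
        """Simulate an attempt that takes `seconds` of wall-clock time."""
        self.now += seconds


def run_with_fake_clock(fn):
    """Run fn(clock) with time.time and time.sleep replaced by the fake clock."""
    clock = FakeClock()
    with mock.patch.object(time, "time", clock.time), \
            mock.patch.object(time, "sleep", clock.sleep):
        result = fn(clock)
    return result, clock


def scenario(clock):
    clock.advance(360.0)  # a 6 minute "slow first attempt"
    time.sleep(1.5)       # backoff sleep, returns immediately
    return time.time()


elapsed, clock = run_with_fake_clock(scenario)
```

After the run, `elapsed` is `361.5` and `clock.sleeps` is `[1.5]`, so a test can assert on both total elapsed time and the exact backoff schedule without waiting.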
759f400 to 12ba6d6

@claude review
Claude finished @aballman's task

PR Review Complete ✓

Reviewing PR #342: feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy

## Review Summary

This is a well-implemented PR that addresses a legitimate issue where slow first attempts could exhaust retry budgets before any retries occur. The code quality is high with comprehensive test coverage.

## Code Review Feedback

### ✅ Strengths

1. Clean API Design
2. Excellent Test Coverage
3. Robust Implementation
4. Clear Documentation
## Summary

Cuts a 0.44.1 release so the `BackoffStrategy` retry-budget fields that landed on main in #342 actually ship to PyPI. v0.44.0 was tagged before #342 merged, so PyPI v0.44.0 does not include those fields.

## Changes

- `src/unstructured_client/_version.py`: bump `__version__` and `__user_agent__` to `0.44.1`.
- `CHANGELOG.md`: split the combined 0.44.0 entry; `min_attempts` / `absolute_max_elapsed_time_ms` move under a new `## 0.44.1` section so the changelog matches what's actually in each PyPI artifact.
- `RELEASES.md`: append a 0.44.1 entry following the existing Speakeasy-publish format.

## Test plan

- [x] No code changes; only metadata files
- [ ] CI green
- [ ] PyPI publish workflow picks up the bump on merge
## Summary

`retry_with_backoff_async` currently checks `now - start > max_elapsed_time` before sleeping. When `RetryConfig.max_elapsed_time` is set to e.g. 5 min and the per-attempt httpx client timeout is 30 min, any attempt that runs longer than the budget blows it on attempt 1, so zero retries fire on subsequent transient errors. This PR closes that short-circuit without re-introducing the unbounded retry loops that recent budget tightening was meant to prevent.

## What changes
New optional fields on `BackoffStrategy` (default values preserve existing behavior):

- `min_attempts: int = 0`: minimum retry attempts that must fire before `max_elapsed_time` is honored. Counts retries, not the initial attempt. `min_attempts=2` permits 1 initial + at least 2 retries (3 total attempts) before the soft cap can cut the loop.
- `absolute_max_elapsed_time_ms: int | None = None`: cap on when a new retry can START. Does NOT interrupt an in-flight `func()` call. Worst-case wall-clock under this cap is `absolute_max_elapsed_time_ms + per_attempt_timeout`.

Loop changes in both sync (`retry_with_backoff`) and async (`retry_with_backoff_async`) paths:

- Post-attempt cap check: soft cap honored only once `retries >= min_attempts`; hard cap unconditional.
- Pre-sleep hard-cap check refuses to sleep into a retry whose projected start would exceed the hard cap.
- Post-sleep verification guards against late wakeups and rounding drift in the projection.
- Helper extraction (`_cap_hit_after_attempt`, `_raise_or_return_after_cap`) dedupes the soft/hard cap logic between sync and async.

Validation in `BackoffStrategy.__init__` rejects `min_attempts < 0`, `absolute_max_elapsed_time_ms <= 0`, and hard cap below soft cap.

`.genignore` updated to preserve these fields across future Speakeasy regens, matching the `general.py` and `users.py` precedent.

## Design choices
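- The hard cap gates retry starts only: an in-flight `func()` cannot be interrupted from the retry loop. Consumers should pair the cap with a sensible per-attempt timeout to keep the worst case bounded.
- The new behavior is opt-in: it applies only when a consumer sets `min_attempts > 0` and/or `absolute_max_elapsed_time_ms`.

## Tests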
49 tests pass (46 unit + 3 split-PDF retry integration). New coverage:

- `_test_unstructured_client/unit/test_retries.py`: fake-clock harness monkeypatching `time.time` / `time.sleep` / `asyncio.sleep` / `random.uniform`. Covers the slow-first-attempt + `min_attempts` floor scenario, floor-is-not-a-ceiling semantics, hard cap overrides floor, sleep truncation, `TemporaryError` early-return through both caps, `PermanentError` short-circuit immunity, and `BackoffStrategy.__init__` validation.
- `test_split_pdf_cache_tmp_data_chunk_request_stream_is_replay_safe` in `integration/test_decorators.py`: pins the body-replay invariant for chunk requests built from open file objects (the `split_pdf_cache_tmp_data=True` path). Iterates `request.stream` twice directly, bypassing `request.read()` caching, so a future Speakeasy template change that produced a single-consumption stream would fail this test.

## Test plan
- `uv run pytest _test_unstructured_client/unit/test_retries.py`: 46 pass
- `uv run pytest _test_unstructured_client/integration/test_decorators.py::test_split_pdf_*retry* _test_unstructured_client/integration/test_decorators.py::test_split_pdf_cache_tmp_data_*`: 3 pass