feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy by aballman · Pull Request #342 · Unstructured-IO/unstructured-python-client

aballman · 2026-05-21T15:48:44Z

Summary

retry_with_backoff_async currently checks now - start > max_elapsed_time before sleeping. When RetryConfig.max_elapsed_time is set to e.g. 5 min and the per-attempt httpx client timeout is 30 min, any attempt that runs longer than the budget blows it on attempt 1 — zero retries fire on subsequent transient errors. Closes that short-circuit without re-introducing the unbounded retry loops that recent budget tightening was meant to prevent.

What changes

New optional fields on BackoffStrategy (default values preserve existing behavior):

min_attempts: int = 0 — minimum retry attempts that must fire before max_elapsed_time is honored. Counts retries, not the initial attempt. min_attempts=2 permits 1 initial + at least 2 retries (3 total attempts) before the soft cap can cut the loop.
absolute_max_elapsed_time_ms: int | None = None — cap on when a new retry can START. Does NOT interrupt an in-flight func() call. Worst-case wall-clock under this cap is absolute_max_elapsed_time_ms + per_attempt_timeout.

Loop changes in both sync (retry_with_backoff) and async (retry_with_backoff_async) paths:

Post-attempt cap check. Soft cap honored only when retries >= min_attempts; hard cap unconditional.
Pre-sleep hard-cap check. Refuses to sleep into a retry whose projected start would cross the hard cap.
Post-sleep verification. Belt-and-suspenders against late wakeups and projection rounding.
Helper extraction (_cap_hit_after_attempt, _raise_or_return_after_cap) dedupes the soft/hard cap logic between sync and async.

Validation in BackoffStrategy.__init__ rejects min_attempts < 0, absolute_max_elapsed_time_ms <= 0, and hard cap below soft cap.

.genignore updated to preserve these fields across future Speakeasy regens, matching the general.py and users.py precedent.

Design choices

Hard cap is a retry-start cap, not a wall-clock bound. In-flight func() cannot be interrupted from the retry loop. Consumers should pair the cap with a sensible per-attempt timeout to keep the worst case bounded.
Defaults preserve existing behavior so this change is non-breaking for every existing consumer. Consumers opt in by setting min_attempts > 0 and/or absolute_max_elapsed_time_ms.

Tests

49 tests pass (46 unit + 3 split-PDF retry integration). New coverage:

T1–T14 in _test_unstructured_client/unit/test_retries.py: fake-clock harness monkeypatching time.time / time.sleep / asyncio.sleep / random.uniform. Covers the slow-first-attempt + min_attempts floor scenario, floor-is-not-a-ceiling semantics, hard cap overrides floor, sleep truncation, TemporaryError early-return through both caps, PermanentError short-circuit immunity, and BackoffStrategy.__init__ validation.
test_split_pdf_cache_tmp_data_chunk_request_stream_is_replay_safe in integration/test_decorators.py: pins the body-replay invariant for chunk requests built from open file objects (the split_pdf_cache_tmp_data=True path). Iterates request.stream twice directly — bypasses request.read() caching — so a future Speakeasy template change that produced a single-consumption stream would fail this test.

Test plan

uv run pytest _test_unstructured_client/unit/test_retries.py — 46 pass
uv run pytest _test_unstructured_client/integration/test_decorators.py::test_split_pdf_*retry* _test_unstructured_client/integration/test_decorators.py::test_split_pdf_cache_tmp_data_* — 3 pass
CI green

…ackoffStrategy Closes a short-circuit in retry_with_backoff{,_async} where a single slow first attempt can exhaust max_elapsed_time before any retry fires. With the partitioner config (max_elapsed_time=5min, CLIENT_TIMEOUT_MS=30min), any chunk whose first attempt exceeds 5 minutes blew the retry budget on attempt 1 -- zero retries on subsequent transient errors. Documented in two customer-visible regressions in the platform partition path. New fields on BackoffStrategy: * min_attempts (default 0) -- minimum number of retry attempts that must fire before max_elapsed_time is honored. Counts retries (not the initial attempt). Default 0 preserves existing behavior. * absolute_max_elapsed_time_ms (default None) -- cap on when a new retry can START. Does NOT interrupt an in-flight func() call. Worst-case wall-clock under this cap is absolute_max_elapsed_time_ms + per_attempt_timeout. Default None preserves existing behavior. Loop changes in both retry_with_backoff and retry_with_backoff_async: * Post-attempt cap check honors min_attempts as a floor on the soft cap and treats the hard cap as unconditional. * Pre-sleep hard-cap check refuses to sleep into a doomed retry whose projected start would exceed the hard cap. * Post-sleep verification (belt-and-suspenders against late wakeups and rounding drift in the projection). * Helper extraction (_cap_hit_after_attempt, _raise_or_return_after_cap) dedupes logic between sync and async paths. Validation rejects min_attempts < 0, absolute_max_elapsed_time_ms <= 0, and absolute_max_elapsed_time_ms below max_elapsed_time. .genignore: ignore src/unstructured_client/utils/retries.py to preserve these fields across Speakeasy regens. Documented merge procedure for future template updates. Pushing these fields upstream to Speakeasy templates is filed as a follow-up. Tests: * T1-T14 in unit/test_retries.py with a fake-clock harness that monkeypatches time.time / time.sleep / asyncio.sleep / random.uniform. Covers the v1 reproducer (slow first attempt + min_attempts floor), floor-is-not-a-ceiling semantics, hard cap overrides floor, sleep truncation, TemporaryError early-return through both caps, and PermanentError short-circuit immunity. * Validation tests for the new __init__ guards. * New integration test test_split_pdf_cache_tmp_data_chunk_request_stream_is_replay_safe pins the body-replay invariant: chunks built from open file objects (cache_tmp_data=True) must produce replay-safe httpx requests so SDK retries actually deliver the original multipart payload. Iterates request.stream twice directly to bypass request.read() caching.

aballman · 2026-05-27T00:10:59Z

@claude review

claude · 2026-05-27T00:11:13Z

Claude finished @aballman's task —— View job

PR Review Complete ✓

Reviewing PR #342: feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy

Todo List:

Read repository guidelines (CLAUDE.md) - None found
Review main implementation in retries.py
Examine test coverage and quality
Check other modified files (.genignore, CHANGELOG.md)
Provide comprehensive review feedback

Review Summary

This is a well-implemented PR that addresses a legitimate issue where slow first attempts could exhaust retry budgets before any retries occur. The code quality is high with comprehensive test coverage.

Code Review Feedback

✅ Strengths

1. Clean API Design

src/unstructured_client/utils/retries.py:25-26: New optional parameters with sensible defaults preserve backward compatibility
src/unstructured_client/utils/retries.py:31-45: Comprehensive validation logic prevents invalid configurations

2. Excellent Test Coverage

_test_unstructured_client/unit/test_retries.py:177-228: Thorough validation testing covers edge cases
_test_unstructured_client/unit/test_retries.py:20-48: Sophisticated fake clock mechanism enables deterministic testing
Tests cover both sync and async paths consistently

3. Robust Implementation

src/unstructured_client/utils/retries.py:182-205: Helper functions _cap_hit_after_attempt and _raise_or_return_after_cap eliminate code duplication between sync/async paths
src/unstructured_client/utils/retries.py:246-256: Sleep truncation prevents doomed retries that would exceed hard cap
src/unstructured_client/utils/retries.py:259-269: Belt-and-suspenders post-sleep verification

4. Clear Documentation

PR description thoroughly explains the problem and solution
src/unstructured_client/utils/retries.py:28-30: Inline comments clarify parameter semantics
Test names like test_t3_slow_first_attempt_then_success_sync are descriptive

⚠️ Minor Concerns

1. Magic Numbers in Tests

_test_unstructured_client/unit/test_retries.py:300,319,338,405,424,454,523: Various hardcoded timeouts (6000ms, 9950ms, 11000ms) could benefit from named constants explaining their significance

2. Error Message Clarity

src/unstructured_client/utils/retries.py:201-204: Error messages include technical details like "retries + 1 attempts over 6.0s" which may confuse end users. Consider more user-friendly messages.

📝 Implementation Notes

1. Design Decisions Well-Justified

Hard cap only prevents retry start, doesn't interrupt in-flight requests - sensible choice
min_attempts counts retries not initial attempts - clear from tests and docs

2. Thread Safety

Implementation is stateless per retry loop, avoiding concurrency issues
Clock mocking approach in tests is clean and doesn't introduce race conditions

3. Integration with Existing Code

src/unstructured_client/utils/retries.py:129-131,175-177: Updates to retry and retry_async functions cleanly pass new parameters
.genignore:23-25: Proper Speakeasy configuration prevents regeneration conflicts

Test Quality Assessment

The test suite is particularly impressive:

T1-T14 test numbering provides clear test organization
FakeClock mechanism enables precise timing control without flaky tests
Both sync and async coverage ensures feature parity
Edge case coverage includes validation, early returns, and cap interactions

Minor Suggestions

Consider extracting timeout constants in tests for better readability
Error messages could be more user-friendly while retaining debugging info
The test_split_pdf_cache_tmp_data_chunk_request_stream_is_replay_safe test in integration looks unrelated to this PR but passes - good defensive programming

Verdict: APPROVE ✅

This PR successfully solves the stated problem with a clean, well-tested implementation. The backward compatibility is preserved, the new functionality is opt-in, and the code quality meets high standards. Ready to merge.

## Summary Cuts a 0.44.1 release so the `BackoffStrategy` retry-budget fields that landed on main in #342 actually ship to PyPI. v0.44.0 was tagged before #342 merged, so PyPI v0.44.0 does not include those fields. ## Changes - `src/unstructured_client/_version.py`: bump `__version__` and `__user_agent__` to `0.44.1`. - `CHANGELOG.md`: split the combined 0.44.0 entry — `min_attempts` / `absolute_max_elapsed_time_ms` move under a new `## 0.44.1` section so the changelog matches what's actually in each PyPI artifact. - `RELEASES.md`: append a 0.44.1 entry following the existing Speakeasy-publish format. ## Test plan - [x] No code changes; only metadata files - [ ] CI green - [ ] PyPI publish workflow picks up the bump on merge

aballman force-pushed the aballman/fs-1988-min-attempts branch from 759f400 to 12ba6d6 Compare May 22, 2026 21:19

chore(FS-1988): trim verbose source comments

d85deb1

aballman changed the title ~~feat(FS-1988): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy~~ feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy May 26, 2026

chore: drop ticket ref from .genignore comment

97e28be

jordan-homan approved these changes May 27, 2026

View reviewed changes

aballman merged commit 6c28d32 into main May 27, 2026
17 checks passed

aballman deleted the aballman/fs-1988-min-attempts branch May 27, 2026 15:56

aballman mentioned this pull request May 27, 2026

chore: bump to 0.44.1 to release retry budget fields #343

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy#342

feat(retries): add min_attempts and absolute_max_elapsed_time_ms to BackoffStrategy#342
aballman merged 3 commits into
mainfrom
aballman/fs-1988-min-attempts

aballman commented May 21, 2026 •

edited

Loading

Uh oh!

aballman commented May 27, 2026

Uh oh!

claude Bot commented May 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aballman commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changes

Design choices

Tests

Test plan

Uh oh!

aballman commented May 27, 2026

Uh oh!

claude Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✓

Todo List:

Review Summary

Code Review Feedback

✅ Strengths

⚠️ Minor Concerns

📝 Implementation Notes

Test Quality Assessment

Minor Suggestions

Verdict: APPROVE ✅

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aballman commented May 21, 2026 •

edited

Loading

claude Bot commented May 27, 2026 •

edited

Loading