fix/ensure that split pdf requests are retried #160

awalker4 · 2024-08-22T05:24:59Z

We discovered that when a PDF is split into smaller chunks, the requests for those chunks are not retried. Now that we have `allow_failed=False`, the whole document fails as soon as any child request hits a transient error. The fix is to reuse the `utils.Retry` logic that the main code path uses. Copying the retry config into the hook logic is not ideal, and we can work with Speakeasy to make the internal logic more modular so we can reuse more of it. For now, this addresses the current failures while we work on a better implementation.
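For readers unfamiliar with the pattern, here is a minimal sketch of retrying a single chunk request with exponential backoff. It is not the SDK's `utils.Retry` implementation: `call_with_retries`, `RETRYABLE_STATUS`, and the `send` callable are hypothetical names, and the set of retryable status codes is an assumption.

```python
import random
import time
from typing import Callable

# Status codes treated as transient in this sketch (an assumption, not the SDK's list).
RETRYABLE_STATUS = {502, 503, 504}


def call_with_retries(
    send: Callable[[], int],
    max_attempts: int = 4,
    initial_delay: float = 0.01,
    backoff: float = 2.0,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable that performs one
    chunk request and returns its HTTP status code.
    """
    delay = initial_delay
    status = send()
    for _ in range(max_attempts - 1):
        if status not in RETRYABLE_STATUS:
            break
        # Sleep with a little jitter, then grow the delay for the next attempt.
        time.sleep(delay + random.uniform(0, delay))
        delay *= backoff
        status = send()
    return status


if __name__ == "__main__":
    # A fake chunk request that fails twice with 502 and then succeeds.
    responses = iter([502, 502, 200])
    print(call_with_retries(lambda: next(responses)))
```

With `allow_failed=False`, every chunk has to end in a successful status, so wrapping each child request this way keeps one transient 502 from failing the whole document.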

Testing:
See the added unit test. The existing retry logic already covers the final split page; every other page needs the new logic. To test this, I mocked the server to return 502 for a low `starting_page_number`, which we know has to be handled by the hooks.
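The mocking idea can be sketched roughly as follows. This is not the test added in the PR: `make_mock_server`, `statuses_for`, and the `page_threshold` parameter are hypothetical, and the fake server is a plain function rather than a real HTTP mock.

```python
from typing import Callable, Dict, List


def make_mock_server(
    max_failures: int, page_threshold: int = 3
) -> Callable[[Dict[str, str]], int]:
    """Build a fake server that returns 502 for early pages a limited number of times."""
    failures_seen = 0

    def handle(form: Dict[str, str]) -> int:
        nonlocal failures_seen
        page = int(form.get("starting_page_number", "1"))
        # Only low page numbers fail, and only until the failure budget is spent.
        if page < page_threshold and failures_seen < max_failures:
            failures_seen += 1
            return 502
        return 200

    return handle


def statuses_for(
    handle: Callable[[Dict[str, str]], int],
    form: Dict[str, str],
    max_attempts: int = 5,
) -> List[int]:
    """Resend the same form until a non-502 status arrives; return all statuses seen."""
    statuses: List[int] = []
    for _ in range(max_attempts):
        statuses.append(handle(form))
        if statuses[-1] != 502:
            break
    return statuses
```

Keeping the failure count in a closure lets the test assert that the early page really was retried instead of merely succeeding on the first attempt.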

Other changes:
Remove the "Not splitting" log. When the final split page is retried, it triggers all the hooks again. We need to force `split_pdf_page=False` in this request, and we don't need additional logging when this code path is hit again.
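As a rough illustration of forcing the flag off, assume the request form is a plain dict of string fields. `without_splitting` and `should_split` are hypothetical helpers, not functions from the SDK's hooks.

```python
from typing import Dict

SPLIT_KEY = "split_pdf_page"


def without_splitting(form: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the form with splitting disabled, leaving the input untouched."""
    updated = dict(form)
    updated[SPLIT_KEY] = "false"
    return updated


def should_split(form: Dict[str, str]) -> bool:
    """Decide whether a hook should split this request (missing flag means no)."""
    return form.get(SPLIT_KEY, "false").lower() == "true"
```

A retried final-page request built with `without_splitting` passes through the hooks unchanged, so no extra log line is needed on that path.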


pawel-kmiecik

Found a nit but LGTM!

_test_unstructured_client/integration/test_decorators.py

src/unstructured_client/_hooks/custom/request_utils.py

awalker4 requested review from amadeusz-ds and pawel-kmiecik August 22, 2024 05:25

Remove debug log

583408b

awalker4 requested a review from MKhalusova August 22, 2024 05:27

pawel-kmiecik approved these changes Aug 22, 2024


awalker4 added 2 commits August 22, 2024 09:26

Use nonlocal 502 counter

e28cded

Add better comments for new retry config

f18f2b4

awalker4 force-pushed the fix/retry-split-requests branch from c385af5 to f18f2b4 Compare August 22, 2024 13:35

awalker4 enabled auto-merge (squash) August 22, 2024 13:37

awalker4 disabled auto-merge August 22, 2024 13:42

Add last page retries to mocked test

5350dba

awalker4 enabled auto-merge (squash) August 22, 2024 14:18

awalker4 merged commit 0410784 into main Aug 22, 2024

awalker4 deleted the fix/retry-split-requests branch August 22, 2024 14:47
