fix(deploy): make verify step tolerate transient curl errors#23
Merged
JohnRDOrazio merged 1 commit intomainfrom Apr 29, 2026
Merged
fix(deploy): make verify step tolerate transient curl errors#23JohnRDOrazio merged 1 commit intomainfrom
JohnRDOrazio merged 1 commit intomainfrom
Conversation
The verify step exited with code 56 (CURLE_RECV_ERROR) during the
first big-batch poll, killing the entire workflow run on a single
transient blip. Three robustness fixes:
- Append `|| echo '{}'` to each curl call so `set -e` doesn't kill
the script when curl fails for any reason — the empty JSON makes
jq return empty strings, which keeps the page in PENDING for the
next attempt (correct semantics: "not yet translated").
- Add `--max-time 15 --retry 3 --retry-delay 5 --retry-all-errors`
so curl handles short network blips at the transport layer.
- Append `2>/dev/null || echo ""` to jq calls for the same reason.
Also bump MAX_ATTEMPTS from 20 to 60 (10 min → 30 min). The current
deploy enqueues up to 50 translations at a time (10 docs × 5 langs
on workflow_dispatch / first deploy), and observed throughput is
~1 translation every 5–8 seconds — 50 jobs needs ~5 minutes minimum,
plus retry budget for OpenAI capacity issues.
Progress messages now show "X/N done" so the run status is readable
at a glance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The verify step from #22 exited with code 56 (CURLE_RECV_ERROR) on its
second polling attempt during a workflow_dispatch run today, killing
the workflow on a single transient network blip. Three robustness
fixes:
set -esafety net — append|| echo '{}'to eachcurlcallso the script keeps running through transient failures. The empty
JSON makes
jqreturn empty strings, which keeps the page inPENDINGfor the next attempt (correct: "not yet translated").--max-time 15 --retry 3 --retry-delay 5 --retry-all-errorsfor short network blips at the transport layer.MAX_ATTEMPTS=60(30 min, was 10). Onworkflow_dispatch(DEPLOY_ALL=true) we enqueue up to 50translations and observed throughput is ~1 every 5-8s; the old
10-minute cap couldn't fit a full bulk recovery.
Also: progress messages now show
"X/N done"for at-a-glance status.Why now
Today's
workflow_dispatchrecovery run (run 25120538762) enqueued50 translations, and the verify step failed at attempt 2 with curl
exit 56 — leaving the run reported as failed even though 30+
translations did succeed in the background. This patch lets the
verify step ride out brief network blips and gives enough time for
a 50-job bulk recovery to actually finish.
Test plan
large-content translations clear; confirm the verify step
polls cleanly through the full window
run log
🤖 Generated with Claude Code