Skip to content

fix: add retry, timeout, and rate-limit handling to GitHub API scripts#37084

Open
WYSIATI wants to merge 3 commits intoanthropics:mainfrom
WYSIATI:fix/scripts-resilient-github-requests
Open

fix: add retry, timeout, and rate-limit handling to GitHub API scripts#37084
WYSIATI wants to merge 3 commits intoanthropics:mainfrom
WYSIATI:fix/scripts-resilient-github-requests

Conversation

@WYSIATI
Copy link

@WYSIATI WYSIATI commented Mar 21, 2026

Summary

  • Extract a shared githubRequest utility (scripts/github-request.ts) that wraps fetch() with retry, timeout, and rate-limit handling
  • Migrate all 4 GitHub API scripts to use the shared utility, eliminating 4 copies of inline githubRequest functions
  • Add 52 unit and integration tests (scripts/github-request.test.ts)
  • Previously, a single ECONNRESET or GitHub 502 would crash these CI scripts with no recovery

What's included

The shared utility adds:

  • Exponential backoff retry (default 3 attempts) for transient network errors (ECONNRESET, ECONNREFUSED, EPIPE, ETIMEDOUT, EAI_AGAIN, etc.)
  • Retry on retryable HTTP status codes (408, 429, 500, 502, 503, 504)
  • Respect for Retry-After headers from GitHub's rate limiter
  • Per-request timeout (30s default) via AbortController
  • 204 No Content handling — returns {} instead of crashing on .json() parse
  • Jitter (±25%) to avoid thundering herd on concurrent retries

Test architecture

Tests use Bun's native test runner (bun:test) and are structured as:

Unit tests (internal helpers exported for testability):

  • isTransientError — parameterized via test.each() across all 10 transient error codes + Bun socket message + AbortError/TimeoutError + negative cases
  • calculateDelay — exponential backoff curve, jitter range, cap at 32s
  • parseRetryAfter — numeric seconds, HTTP-date, edge cases (0, negative, >300s cap, missing header)

Integration tests (end-to-end githubRequest):

  • Basic requests: GET with auth, POST with JSON body, 204 No Content, custom/default User-Agent
  • Non-retryable errors: test.each([401, 403, 422]) — immediate throw, 1 attempt
  • Transient network error retry: test.each(["ECONNRESET", "EPIPE", "ETIMEDOUT", "ECONNREFUSED"]) — recovers after N failures
  • Retryable HTTP codes: test.each([429, 500, 502, 503, 504]) — retries and recovers
  • Retry exhaustion — correct attempt count, throws final error
  • maxRetries=0 — disables retry for both network and HTTP errors
  • Mixed scenarios — ECONNRESET → 502 → success in one request
  • Timeout — AbortController fires, AbortError retried
$ bun test scripts/github-request.test.ts
52 pass, 0 fail, 84 expect() calls

Scripts updated

Script What it does Why retry matters
auto-close-duplicates.ts Closes duplicate issues after 3-day grace period Makes hundreds of sequential API calls; a single 502 would abort the entire batch
backfill-duplicate-comments.ts Triggers dedupe workflows on issues missing duplicate detection Paginates through up to 200 pages of issues; transient failures are inevitable
lifecycle-comment.ts Posts lifecycle label comments on issues Single POST — retry prevents silent failure in CI
sweep.ts Marks stale issues and closes expired ones Most API-intensive script; pagination + per-issue events/comments queries

Test plan

  • bun test scripts/github-request.test.ts — 52/52 passing
  • Run bun run scripts/sweep.ts --dry-run — verify no behavioral change in normal path
  • Verify retry logging: mock a 429 or 502 response to confirm [RETRY] messages appear with backoff delays
  • Verify non-retryable errors still throw immediately (e.g., 401, 403, 422)

WYSIATI added 3 commits March 21, 2026 20:32
Extract a shared `githubRequest` utility in `scripts/github-request.ts`
that wraps `fetch()` with:
- Exponential backoff retry (3 attempts) for transient errors
  (ECONNRESET, ECONNREFUSED, EPIPE, ETIMEDOUT, etc.)
- Retry on retryable HTTP status codes (408, 429, 500, 502, 503, 504)
- Respect for `Retry-After` headers from GitHub's rate limiter
- Per-request timeout (30s default) via AbortController
- 204 No Content handling
- Jitter to avoid thundering herd on concurrent retries

Migrate all 4 scripts to use the shared utility:
- auto-close-duplicates.ts
- backfill-duplicate-comments.ts
- lifecycle-comment.ts
- sweep.ts

Previously, a single ECONNRESET or GitHub 502 would crash these CI
scripts with no recovery. This is especially problematic for the
sweep and auto-close scripts which make hundreds of sequential API
calls during batch processing.
24 tests covering:
- Basic GET/POST functionality and 204 No Content handling
- Non-retryable HTTP errors (401, 403, 422) — no retry, immediate throw
- Transient network error retry (ECONNRESET, EPIPE, ETIMEDOUT,
  ECONNREFUSED, Bun socket close)
- Retryable HTTP status codes (429, 500, 502, 503)
- Retry exhaustion — throws after max attempts
- maxRetries=0 disables all retry
- Mixed error scenarios (ECONNRESET → 502 → success)
- Request timeout via AbortController
- Custom User-Agent header
Test architecture improvements:
- Export isTransientError, calculateDelay, parseRetryAfter for direct
  unit testing — follows common practice of testing internal logic
  independently from the integration layer
- Split tests into unit tests (internal helpers) and integration tests
  (githubRequest end-to-end)
- Nest describe blocks by concern: basic requests, non-retryable errors,
  transient network errors, retryable HTTP codes, maxRetries=0, mixed
  scenarios, timeout
- Use test.each() for parameterized tests — reduces boilerplate for
  error code and status code coverage
- Move beforeEach/afterEach inside the githubRequest describe block
- Extract mockFetch() and failThenSucceed() helpers to eliminate
  repeated globalThis.fetch reassignment

Test count: 24 → 52 (added unit tests for internal helpers)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant