fix(identity): stop amplifying Hiro egress throttling into 25s lookup-failed (#939) by biwasxyz · Pull Request #951 · aibtcdev/landing-page

biwasxyz · 2026-06-01T13:16:08Z

Problem (#939)

When Hiro rate-limits Cloudflare's shared egress IPs (429), the synchronous identity/BNS lookups on POST /api/identity/{addr}/refresh and the enrichment branch of GET /api/agents/{addr} turned a transient upstream blip into a ~25s hang returning lookup-failed — even though direct Hiro calls answer in <500ms. Downstream (aibtc.news identity-gate, 3s budget) then 503s and loops.

Verified live while writing this: when Hiro is healthy, the path is fast and correct (/refresh 2.4s, idOutcome/bnsOutcome positive; direct Hiro holdings 0.66s). So this is defensive hardening for throttle windows, not a correctness bug — Hiro throttling our egress is the trigger; our code amplifying it into a 25s outage is what this fixes.

Two compounding causes

1. Amplification: detectAgentIdentity treated any non-ok holdings response (including 429/5xx) as "holdings unavailable" and fell back to the O(N) legacy scan, firing 5+ more call-read requests at the same throttled upstream. One rate-limited call became a multi-second storm that still ended in failure.
2. Long per-attempt timeout: each call used an 8s timeout × retries, so a single hung call alone burned ~16s, far past the consumer's ~3s budget.
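
The ~16s figure is plain arithmetic. A minimal sketch follows, assuming one retry (two attempts in total) and ignoring any backoff between attempts; the helper name and its parameters are illustrative and do not come from the codebase.

```typescript
// Hypothetical helper: upper bound on wall-clock time for ONE upstream call
// when every attempt hangs until its timeout fires. Backoff between
// attempts is ignored, so real numbers would be slightly higher.
function worstCaseMs(perAttemptTimeoutMs: number, retries: number): number {
  const attempts = 1 + retries; // the initial attempt plus each retry
  return perAttemptTimeoutMs * attempts;
}

// Before the fix: 8s per attempt, assuming one retry (an assumption that
// matches the ~16s quoted above).
const hungCallBefore = worstCaseMs(8_000, 1);

// The legacy scan then added 5+ further calls, each with its own budget,
// which is how a single throttled lookup grew toward the ~25s symptom.
console.log(`one hung call before the fix: ${hungCallBefore} ms`);
```

This bound covers hung calls only; a response that arrives promptly never waits for the timeout.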

Fix

- Legacy scan now triggers only on a genuine 404. 429/5xx fail fast as lookup-failed with the existing short-TTL negative cache, so the next request retries cleanly instead of storming.
- Add configurable perAttemptTimeoutMs to stacksApiFetch (default 8s preserved) and thread SYNC_PER_ATTEMPT_TIMEOUT_MS (3.5s) + reduced retries through the synchronous holdings / get-token-uri / BNS get-primary calls. Worst case on a throttled window drops from ~25s to a sub-second fast-fail.
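
The status routing described in the first item can be sketched as below. This is an illustration under stated assumptions, not the actual detectAgentIdentity code: the type, the function name, and the handling of non-ok statuses other than 404/429/5xx are all hypothetical.

```typescript
// Hypothetical decision type; the real code may model outcomes differently.
type HoldingsDecision =
  | { kind: "use-holdings" } // 2xx: parse the holdings response
  | { kind: "legacy-scan" } // 404: endpoint cannot serve the lookup
  | { kind: "lookup-failed"; status: number }; // throttled or upstream error

function classifyHoldingsStatus(status: number): HoldingsDecision {
  if (status >= 200 && status < 300) return { kind: "use-holdings" };
  // Only a genuine 404 justifies the O(N) legacy scan.
  if (status === 404) return { kind: "legacy-scan" };
  // 429 and 5xx fail fast instead of sending more requests to a throttled
  // upstream. Treating every other non-ok status the same way is an
  // assumption of this sketch, not something the PR states.
  return { kind: "lookup-failed", status };
}

console.log(classifyHoldingsStatus(429)); // kind is "lookup-failed"
console.log(classifyHoldingsStatus(404)); // kind is "legacy-scan"
```

Keeping the decision in a pure function makes the 404-versus-429 distinction easy to cover in unit tests without mocking the network.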

Tests

New lib/identity/__tests__/detection.test.ts: 429 and 5xx fail fast without the legacy scan; 404 still falls back to it; a holdings hit resolves positive. Full affected suite green.

Not in scope

The heavier architectural option (B in the issue — move enrichment fully async/background) is deferred; this is the surgical fast-fail that resolves the measured 25s symptom.

…-failed (#939)

When Hiro rate-limits Cloudflare's shared egress IPs (429), the synchronous identity/BNS lookups on POST /api/identity/{addr}/refresh and the enrichment branch of GET /api/agents/{addr} were turning a transient upstream blip into a ~25s hang that returned lookup-failed, even though direct Hiro calls answer in <500ms.

Two compounding causes:

1. Amplification: detectAgentIdentity treated ANY non-ok holdings response (including 429/5xx) as "holdings unavailable" and fell back to the O(N) legacy scan, firing 5+ more call-read requests at the same throttled upstream, each with its own retry budget. One rate-limited call became a multi-second storm that still ended in failure.
2. Long per-attempt timeout: each call used an 8s timeout × retries, so a single hung call alone burned ~16s, far past a consumer's ~3s budget (aibtc.news identity-gate 503s and loops when we exceed it).

Hardening (we can't stop Hiro throttling our egress, but we stop amplifying it):

- Legacy scan now only triggers on a genuine 404 (endpoint can't serve the lookup). 429/5xx fail fast as lookup-failed with the existing short-TTL negative cache, so the next request retries cleanly instead of storming.
- Add a configurable perAttemptTimeoutMs to stacksApiFetch (default 8s) and thread SYNC_PER_ATTEMPT_TIMEOUT_MS (3.5s) + reduced retries (1) through the synchronous holdings, get-token-uri, and BNS get-primary calls. Worst case on a throttled window drops from ~25s to a sub-second fast-fail.

This is defensive hardening, not a correctness fix: when Hiro is healthy the path is fast and correct (verified live: /refresh 2.4s, idOutcome/bnsOutcome positive). It only changes behavior during Hiro throttle windows.

Tests: new detection suite asserts 429/5xx fail fast WITHOUT the legacy scan, 404 still falls back to it, and a holdings hit resolves positive.
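
A per-attempt timeout of the kind described above is commonly built on AbortController. The sketch below shows that general pattern only; it is not the real stacksApiFetch. The wrapper name, the options shape, and the default retry count are assumptions; only perAttemptTimeoutMs and its 8s default are named in this PR.

```typescript
// Hypothetical options shape for this sketch.
interface AttemptOptions {
  perAttemptTimeoutMs?: number; // 8s default, as the PR preserves
  retries?: number; // extra attempts after the first (default is assumed)
}

// Runs `attempt` with a fresh AbortSignal per try and aborts any try that
// exceeds the per-attempt timeout. Retries only on thrown errors (including
// aborts); interpreting HTTP status codes is left to the caller. The attempt
// must honor the signal, as fetch(url, { signal }) does.
async function withPerAttemptTimeout<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  { perAttemptTimeoutMs = 8_000, retries = 1 }: AttemptOptions = {},
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), perAttemptTimeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure; loop again if tries remain
    } finally {
      clearTimeout(timer); // never leave a timer pending after an attempt
    }
  }
  throw lastError;
}
```

A synchronous caller could then pass the shorter budget, for example `withPerAttemptTimeout((signal) => fetch(url, { signal }), { perAttemptTimeoutMs: 3_500, retries: 1 })`, where `url` is a placeholder for whichever endpoint is being called.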

cloudflare-workers-and-pages · 2026-06-01T13:18:12Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	landing-page	`ad9704a`	Jun 01 2026, 01:18 PM

biwasxyz merged commit dc06dac into main Jun 1, 2026
8 checks passed

biwasxyz deleted the fix/identity-lookup-throttle-hardening-939 branch June 1, 2026 13:34

github-actions Bot mentioned this pull request Jun 1, 2026

chore(main): release 1.46.0 #941

Open

biwasxyz mentioned this pull request Jun 1, 2026

/api/identity/{address}/refresh: 25s sync timeouts return lookup-failed while Hiro responds in <500ms direct (upstream for agent-news#826) #939

Closed

biwasxyz merged 1 commit into main from fix/identity-lookup-throttle-hardening-939
