
revert: block app.listen() until tenant registry is ready (#4062) #4063

Merged: bokelley merged 1 commit into main from bokelley/revert-block-listen-on-tenant-init on May 4, 2026

Conversation

@bokelley bokelley commented May 4, 2026

Summary

Reverts #4062. Awaiting tenant warmup before app.listen() caused Fly's deploy-time health-check timeout (300s) to fire before the new machines became healthy, so the deploy itself failed instead of just the smoke test. That is a worse outcome than the original problem (smoke test flakes, but production stays healthy).

What was tried

  1. #4060 (fix(training-agent): eager tenant registry init at server boot). Eager init at module load. Init started during construction but did not gate listen, so the smoke test still raced it. Net effect: 5 more failed-but-non-blocking deploys.
  2. #4062 (fix(training-agent): block app.listen() until tenant registry is ready). Captured warmup() and awaited it before app.listen(). Fly's health check then timed out before the listener bound, and the deploy failed with "Unrecoverable error: timeout reached waiting for health checks to pass."
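
The difference between the two attempts can be sketched as follows. This is a minimal illustration, not the code from either PR: makeRegistry, bootEager, bootBlocking, and the event names are hypothetical stand-ins, and the listen callback stands in for app.listen().

```typescript
// Hypothetical stand-in for the tenant registry. warmup() resolves only once
// finish() is called, so the example controls how long init "takes".
function makeRegistry(events: string[]) {
  let finish: () => void = () => {};
  const ready = new Promise<void>((resolve) => {
    finish = () => {
      events.push("registry-ready");
      resolve();
    };
  });
  return { warmup: (): Promise<void> => ready, finish };
}

// #4060 shape (assumed): kick off warmup eagerly, but bind the listener right away.
function bootEager(warmup: () => Promise<void>, listen: () => void): void {
  void warmup(); // started, not awaited
  listen();
}

// #4062 shape (assumed): the listener is not bound until warmup has resolved.
async function bootBlocking(
  warmup: () => Promise<void>,
  listen: () => void,
): Promise<void> {
  await warmup();
  listen();
}

const eagerEvents: string[] = [];
const eagerRegistry = makeRegistry(eagerEvents);
bootEager(eagerRegistry.warmup, () => {
  eagerEvents.push("listening");
});
// The listener is bound while the registry is still warming,
// so a smoke request can race init.
const eagerBoundBeforeReady = eagerEvents.indexOf("listening") !== -1; // true

const blockingEvents: string[] = [];
const blockingRegistry = makeRegistry(blockingEvents);
const blockingBoot = bootBlocking(blockingRegistry.warmup, () => {
  blockingEvents.push("listening");
});
// Nothing is bound yet, so a health check probing during warmup cannot pass.
const blockingBoundBeforeReady = blockingEvents.indexOf("listening") !== -1; // false

// Once warmup finishes, the blocking variant binds: registry-ready, then listening.
blockingRegistry.finish();
blockingBoot.then(() => console.log(blockingEvents));
```

In the eager shape the port opens immediately and requests may hit an unready registry; in the blocking shape the port stays closed for the full warmup, which is what ran into the 300s health-check window.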

Tenant registry init is taking >300s on a fresh Fly machine for reasons that aren't visible from the workflow logs (need flyctl logs --app adcp-docs for the boot-time stdout). Until we know why, blocking listen is the wrong tool.
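
One way to make the slow phase visible in the boot-time stdout (and therefore in flyctl logs) would be to wrap each init phase in a timer. The sketch below is an assumption about how that could look: timed, the "[boot]" log prefix, and the phase label are hypothetical and do not come from the codebase.

```typescript
// Hypothetical helper: time one async boot phase and log its duration.
// The default logger writes to stdout via console.log.
async function timed<T>(
  label: string,
  phase: () => Promise<T>,
  log: (line: string) => void = console.log,
): Promise<T> {
  const startedAt = Date.now();
  log(`[boot] ${label} started`);
  try {
    return await phase();
  } finally {
    // Runs on success and on failure, so a phase that throws is still timed.
    log(`[boot] ${label} finished after ${Date.now() - startedAt}ms`);
  }
}

// Usage with a placeholder phase; a real call would pass the registry's own init.
const bootLog: string[] = [];
const bootResult = timed(
  "tenant-registry-init",
  async () => "ready",
  (line) => {
    bootLog.push(line);
  },
);
```

Wrapping sub-phases individually, rather than the whole init, would narrow down which step accounts for the >300s.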

Where to go from here

Options:

For now, this restores main to the #4060 state: at least the deploy completes, even if the smoke test still flakes.

Test plan

🤖 Generated with Claude Code

@bokelley bokelley force-pushed the bokelley/revert-block-listen-on-tenant-init branch from b9ea3ab to 437b2b6 Compare May 4, 2026 10:30
@bokelley bokelley merged commit dd5fc08 into main May 4, 2026
19 checks passed
@bokelley bokelley deleted the bokelley/revert-block-listen-on-tenant-init branch May 4, 2026 10:34