feat(server): speed up recce server startup with background loading #1129
Merged
gcko merged 8 commits on Mar 2, 2026
Move artifact loading to a background task so the HTTP server starts accepting connections immediately. This reduces /api/health response time from 20+ seconds to <1 second in cloud deployments.

Changes:
- Lifespan runs _do_lifespan_setup in background via asyncio.to_thread
- New readiness gate middleware: /api/health passes through immediately, all other endpoints wait until loading completes
- Artifact files (manifests + catalogs) load in parallel via ThreadPoolExecutor
- StartupPerfTracker is now thread-safe with threading.Lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
Pull request overview
This PR optimizes recce server startup time by moving heavy artifact loading to a background thread, allowing the HTTP server to start accepting connections immediately. The changes implement a three-part strategy: background loading with asyncio.to_thread(), a readiness gate middleware that blocks non-health endpoints until loading completes, and parallel artifact loading using ThreadPoolExecutor.
Changes:
- Background loading: Server lifespan now yields immediately after spawning a background task, reducing time-to-first-response from 20+ seconds to near-instant for health checks
- Readiness gate middleware: /api/health passes through immediately for liveness probes, while data endpoints wait for startup or return 503 on failure
- Parallel artifact loading: Four dbt artifact files now load concurrently instead of sequentially
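The background-loading change listed above can be sketched with plain asyncio, without FastAPI. This is a minimal illustration under stated assumptions, not the PR's code: `AppState`, `load_artifacts_blocking`, and `main` are hypothetical names, and the 0.2 s sleep stands in for the real artifact loading.

```python
import asyncio
import time
from typing import Optional, Tuple


def load_artifacts_blocking() -> dict:
    """Hypothetical stand-in for the heavy, synchronous artifact loading."""
    time.sleep(0.2)
    return {"manifest": "loaded"}


class AppState:
    """Hypothetical container for the ready_event/startup_error state."""

    def __init__(self) -> None:
        self.ready_event = asyncio.Event()
        self.startup_error: Optional[Exception] = None
        self.context: Optional[dict] = None


async def background_load(state: AppState) -> None:
    try:
        # Run the blocking work in a worker thread so the event loop stays free.
        state.context = await asyncio.to_thread(load_artifacts_blocking)
    except Exception as exc:
        # Record the failure so waiting requests can be answered with an error.
        state.startup_error = exc
    finally:
        # Wake up waiters on both the success and the failure path.
        state.ready_event.set()


async def main() -> Tuple[bool, bool]:
    state = AppState()
    task = asyncio.create_task(background_load(state))
    # Let the task start. A real lifespan would yield here, so the server
    # could already accept connections while loading continues.
    await asyncio.sleep(0)
    ready_at_startup = state.ready_event.is_set()
    await state.ready_event.wait()
    await task
    return ready_at_startup, state.context is not None
```

Calling `asyncio.run(main())` returns a pair: whether the event was already set right after startup, and whether the context was loaded once the event fired. Setting the event in `finally` is a design choice of this sketch so that waiters are released even when loading fails.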
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| recce/server.py | Refactored lifespan to use background_load async function with asyncio.to_thread(), added ready_event/startup_error state tracking, and implemented readiness_gate middleware |
| recce/adapter/dbt_adapter/__init__.py | Modified load_artifacts() to load 4 dbt artifacts in parallel using ThreadPoolExecutor instead of sequentially |
| recce/util/startup_perf.py | Added threading.Lock to StartupPerfTracker for thread-safe recording of timings and artifact sizes during parallel loading |
| tests/test_server_lifespan.py | Added await ready_event.wait() to ensure background loading completes before assertions |
| tests/test_server.py | Added TestReadinessGate test class covering health endpoint passthrough and 503 error handling on startup failure |
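The table mentions a threading.Lock added to StartupPerfTracker. The sketch below shows why a lock helps when several loader threads record into one tracker. `PerfTracker`, its fields, and `record_many` are hypothetical simplifications; the real class is not shown in this conversation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerfTracker:
    """Hypothetical, simplified stand-in for StartupPerfTracker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.timings: dict = {}
        self.total_bytes = 0

    def record(self, name: str, seconds: float, size: int) -> None:
        # Hold the lock so the dict write and the counter update
        # cannot interleave with another thread's update.
        with self._lock:
            self.timings[name] = seconds
            self.total_bytes += size


def record_many(tracker: PerfTracker, count: int) -> None:
    """Record `count` entries from up to four worker threads."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(count):
            pool.submit(tracker.record, f"artifact_{i}", 0.01, 10)
    # Leaving the `with` block waits for all submitted calls to finish.
```

The read-modify-write on `total_bytes` is the part most exposed to lost updates without the lock.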
When tests set app.state to a MagicMock, getattr returns a MagicMock for ready_event instead of None. Use isinstance(ready_event, asyncio.Event) to avoid awaiting a MagicMock in test environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
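The MagicMock pitfall described in this commit can be reproduced in a few lines. The helper name `get_ready_event` is an assumption made for illustration; only the `getattr` plus `isinstance` pattern comes from the commit message.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import MagicMock


def get_ready_event(app_state: object):
    """Return the readiness event only if it is a real asyncio.Event."""
    ready_event = getattr(app_state, "ready_event", None)
    # A MagicMock fabricates any attribute on access, so the getattr
    # default is never used and a plain `is None` check would not help.
    return ready_event if isinstance(ready_event, asyncio.Event) else None


async def check_real_state() -> bool:
    """Confirm that a genuine event is returned unchanged."""
    state = SimpleNamespace(ready_event=asyncio.Event())
    return get_ready_event(state) is state.ready_event
```

With this guard, a mocked state and a state without the attribute both yield `None`, so the caller can skip the wait instead of awaiting a mock.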
- Move future.result() calls inside ThreadPoolExecutor with block
- Use consistent path variables for artifacts_files list
- Add test for data endpoint succeeding after ready_event is set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
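A minimal sketch of parallel artifact loading with the future.result() calls inside the with block, as the first bullet describes. The function names and the four file names in `demo` are hypothetical; the real load_artifacts() is not reproduced here.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict


def load_json(path: Path) -> dict:
    """Read and parse one artifact file."""
    with path.open() as fh:
        return json.load(fh)


def load_artifacts_parallel(paths: Dict[str, Path]) -> Dict[str, dict]:
    """Load every artifact concurrently and return them keyed by name."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(load_json, p) for name, p in paths.items()}
        # Gather results while the pool is still open; result() re-raises
        # any exception that occurred in the worker thread.
        return {name: fut.result() for name, fut in futures.items()}


def demo() -> Dict[str, dict]:
    """Write four small JSON files to a temp dir and load them in parallel."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name in ("manifest", "catalog", "base_manifest", "base_catalog"):
            path = Path(tmp) / f"{name}.json"
            path.write_text(json.dumps({"artifact": name}))
            paths[name] = path
        return load_artifacts_parallel(paths)
```

Threads suit this workload when file reads dominate; how much the JSON parsing itself benefits depends on the interpreter and is not something this sketch measures.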
Move expensive operations out of the CLI startup path so the HTTP port opens within ~1-2s instead of ~18-25s:
- Remove state_loader.load() from create_state_loader(); it is deferred to RecceContext.load() during background lifespan setup
- Remove verify_required_artifacts() from server command; it is redundant with setup_server() in background, and errors surface as 503
- Move update_onboarding_state() to _do_lifespan_setup() (non-fatal)
- Move read-only context loading to setup_ready_only() in background
- Enhance /api/health with ready/error fields for K8s readiness checks
- Add explicit state_loader.load() to non-server callers that need eager loading (summary, purge, upload, share)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
…from-recce_statejson-directly
Cover the behavioral changes from deferring heavy CLI operations:
- _do_lifespan_setup onboarding state update (success, skip, failure)
- setup_ready_only context loading and set_default_context
- state_loader.verify() failure exits before uvicorn
- create_state_loader no longer calls load()
- health endpoint backward compat (ready=true without ready_event)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
- setup_ready_only: use load_context() for consistency with setup_server/setup_preview instead of manual RecceContext.load() + set_default_context()
- TestReadinessGate: extract fixture and helper to eliminate repetitive ready_event/startup_error setup/cleanup
- TestDoLifespanSetup: extract server_app_state fixture to reduce duplicated AppState construction
- Remove unnecessary prepare_api_token patch from test_create_state_loader_does_not_call_load

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
- Move clear_startup_tracker() to finally block so it runs on both success and failure paths
- Narrow readiness gate middleware to /api/* paths only, letting SPA and static assets pass through immediately
- Return only exception type name in /api/health error field to avoid leaking internal details (file paths, config values)
- Add startup timeout (default 300s, configurable via RECCE_STARTUP_TIMEOUT) to prevent indefinite request hangs if background loading stalls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
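The gate rules in this commit (health and non-API paths pass through, other /api/* paths wait with a timeout) can be sketched as a pure function that returns the status code the gate would produce. `GateState`, `gate_status`, and the `/api/runs` path used in the usage note are hypothetical; only the 300 s default and the RECCE_STARTUP_TIMEOUT name come from the commit message.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Optional

# Default and variable name are taken from the commit message above.
STARTUP_TIMEOUT = float(os.environ.get("RECCE_STARTUP_TIMEOUT", "300"))


@dataclass
class GateState:
    """Hypothetical holder for the readiness state the gate inspects."""

    ready_event: asyncio.Event
    startup_error: Optional[Exception] = None


async def gate_status(path: str, state: GateState,
                      timeout: float = STARTUP_TIMEOUT) -> int:
    """Return 200 where the request would pass through, 503 otherwise."""
    # Health checks and non-API routes (SPA, static assets) are never blocked.
    if path == "/api/health" or not path.startswith("/api/"):
        return 200
    try:
        # Bound the wait so a stalled background load cannot hang requests.
        await asyncio.wait_for(state.ready_event.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        return 503
    # Loading finished: succeed only if it finished without an error.
    return 503 if state.startup_error is not None else 200
```

For example, with an unset event and `timeout=0.05`, `await gate_status("/api/runs", state, timeout=0.05)` gives 503 while `/api/health` still gives 200. In a real middleware the 200 branch would call the next handler rather than return a status itself.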
even-wei added a commit that referenced this pull request on May 21, 2026
Captured via standalone Playwright pattern (T5's approach for PR #1129) against duckdb fixture pr-1376-qd.

Renderings:
- SS-2 / SS-2b: Cancelled chip persists through the in-flight waitRun poll window (6s hold). Pre-fix this reverted to Running.
- SS-3: post-reload, the Query page returns to default editor state; runId still in localStorage (canceledRuns) for future references.
- SS-4: second tab opens to default Query page (no cross-tab carryover of editor state, but localStorage canceledRuns is shared).
- SS-5: RunResultPane shows Cancelled in both header and body.
- SS-6: POST /cancel_run round-trip 2ms (well under 2.5s budget).
- SS-7: PostHog cancel_run event deferred; OSS local has no dev telemetry, and production capture is tracked separately.

SS-1 (pre-existing) shows the Running state captured prior to the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
PR checklist
What type of PR is this?
Performance improvement
What this PR does / why we need it:
In cloud deployments, recce server takes 20+ seconds before /api/health returns 200 because the FastAPI lifespan synchronously loads all dbt artifacts before the HTTP server starts listening. This PR makes the server start accepting connections immediately.

Three complementary changes:
- Background loading: moves _do_lifespan_setup() to a background task via asyncio.to_thread. The lifespan yields immediately so uvicorn starts listening right away.
- Readiness gate middleware: /api/health passes through immediately (returns 200). All other /api/* endpoints await ready_event.wait() until loading completes. Non-API routes (SPA, static assets) pass through immediately. Returns 503 if startup failed or times out.
- Parallel artifact loading: the dbt artifact files (manifests + catalogs) load via ThreadPoolExecutor(max_workers=4) instead of sequentially.

Which issue(s) this PR fixes:
Fixes DRC-2832
Special notes for your reviewer:
- _do_lifespan_setup no longer calls schedule_lifetime_termination or schedule_idle_timeout_check directly, because those asyncio APIs can't run in a thread. They're now called in the async background_load function after the thread returns.
- StartupPerfTracker gains a threading.Lock for thread safety during parallel artifact loading.
- Readiness state is read with getattr for backward compatibility, so tests that don't trigger the lifespan still work.
- The startup timeout (default 300s) is configurable via the RECCE_STARTUP_TIMEOUT env var.

Does this PR introduce a user-facing change?:
/api/health response now includes two additional fields (additive, backward-compatible):
- ready (bool): true when server is fully loaded, false during startup or on error
- error (string|null): exception type name if startup failed, null otherwise

Existing consumers that only check status: "ok" are unaffected.
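As a rough illustration of the additive fields, the sketch below builds such a payload. `health_payload` is a hypothetical helper, and keeping status at "ok" in every case is an assumption of this sketch that the description does not spell out.

```python
from typing import Optional


def health_payload(ready: bool, startup_error: Optional[BaseException]) -> dict:
    """Build a health response body with the additive ready/error fields."""
    return {
        # Assumption: status stays "ok" so existing consumers keep working.
        "status": "ok",
        # ready is false during startup and also when startup failed.
        "ready": ready and startup_error is None,
        # Expose only the exception type name, never its message.
        "error": type(startup_error).__name__ if startup_error is not None else None,
    }
```

Reporting only the type name keeps file paths or config values that may appear in an exception message out of the response, in line with the earlier commit on the error field.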