feat(server): speed up recce server startup with background loading by kentwelcome · Pull Request #1129 · DataRecce/recce

kentwelcome · 2026-02-26T06:42:37Z

PR checklist

Ensure you have added or ran the appropriate tests for your PR.
DCO signed

What type of PR is this?

Performance improvement

What this PR does / why we need it:

In cloud deployments, recce server takes 20+ seconds before /api/health returns 200 because the FastAPI lifespan synchronously loads all dbt artifacts before the HTTP server starts listening. This PR makes the server start accepting connections immediately.

Three complementary changes:

Background loading: Move _do_lifespan_setup() to a background task via asyncio.to_thread. The lifespan yields immediately so uvicorn starts listening right away.
Readiness gate middleware: /api/health passes through immediately (returns 200). All other /api/* endpoints await ready_event.wait() until loading completes. Non-API routes (SPA, static assets) pass through immediately. Returns 503 if startup failed or times out.
Parallel artifact loading: Load the 4 dbt artifact files (base/curr manifest + base/curr catalog) concurrently via ThreadPoolExecutor(max_workers=4) instead of sequentially.

Which issue(s) this PR fixes:

Fixes DRC-2832

Special notes for your reviewer:

_do_lifespan_setup no longer calls schedule_lifetime_termination or schedule_idle_timeout_check directly — those asyncio APIs can't run in a thread. They're now called in the async background_load function after the thread returns.
StartupPerfTracker gains a threading.Lock for thread safety during parallel artifact loading.
The readiness middleware uses getattr for backward compatibility — tests that don't trigger the lifespan still work.
Startup timeout for the readiness gate defaults to 300s, configurable via RECCE_STARTUP_TIMEOUT env var.

Does this PR introduce a user-facing change?:

/api/health response now includes two additional fields (additive, backward-compatible):

ready (bool): true when server is fully loaded, false during startup or on error
error (string|null): exception type name if startup failed, null otherwise

Existing consumers that only check status: "ok" are unaffected.

Move artifact loading to a background task so the HTTP server starts accepting connections immediately. This reduces /api/health response time from 20+ seconds to <1 second in cloud deployments. Changes: - Lifespan runs _do_lifespan_setup in background via asyncio.to_thread - New readiness gate middleware: /api/health passes through immediately, all other endpoints wait until loading completes - Artifact files (manifests + catalogs) load in parallel via ThreadPoolExecutor - StartupPerfTracker is now thread-safe with threading.Lock Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

Copilot

Pull request overview

This PR optimizes recce server startup time by moving heavy artifact loading to a background thread, allowing the HTTP server to start accepting connections immediately. The changes implement a three-part strategy: background loading with asyncio.to_thread(), a readiness gate middleware that blocks non-health endpoints until loading completes, and parallel artifact loading using ThreadPoolExecutor.

Changes:

Background loading: Server lifespan now yields immediately after spawning a background task, reducing time-to-first-response from 20+ seconds to near-instant for health checks
Readiness gate middleware: /api/health passes through immediately for liveness probes, while data endpoints wait for startup or return 503 on failure
Parallel artifact loading: Four dbt artifact files now load concurrently instead of sequentially

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
recce/server.py	Refactored lifespan to use background_load async function with asyncio.to_thread(), added ready_event/startup_error state tracking, and implemented readiness_gate middleware
recce/adapter/dbt_adapter/init.py	Modified load_artifacts() to load 4 dbt artifacts in parallel using ThreadPoolExecutor instead of sequentially
recce/util/startup_perf.py	Added threading.Lock to StartupPerfTracker for thread-safe recording of timings and artifact sizes during parallel loading
tests/test_server_lifespan.py	Added await ready_event.wait() to ensure background loading completes before assertions
tests/test_server.py	Added TestReadinessGate test class covering health endpoint passthrough and 503 error handling on startup failure

When tests set app.state to a MagicMock, getattr returns a MagicMock for ready_event instead of None. Use isinstance(ready_event, asyncio.Event) to avoid awaiting a MagicMock in test environments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

- Move future.result() calls inside ThreadPoolExecutor with block - Use consistent path variables for artifacts_files list - Add test for data endpoint succeeding after ready_event is set Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

codecov · 2026-02-26T07:00:31Z

Codecov Report

❌ Patch coverage is 88.01498% with 32 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/adapter/dbt_adapter/__init__.py	0.00%	18 Missing ⚠️
recce/cli.py	28.57%	5 Missing ⚠️
recce/server.py	93.24%	5 Missing ⚠️
recce/util/startup_perf.py	66.66%	2 Missing ⚠️
tests/test_server.py	97.18%	2 Missing ⚠️

Files with missing lines	Coverage Δ
tests/test_cli.py	`100.00% <100.00%> (ø)`
tests/test_server_lifespan.py	`100.00% <100.00%> (ø)`
recce/util/startup_perf.py	`94.44% <66.66%> (ø)`
tests/test_server.py	`99.00% <97.18%> (-0.45%)`	⬇️
recce/cli.py	`49.76% <28.57%> (+0.34%)`	⬆️
recce/server.py	`63.21% <93.24%> (+1.07%)`	⬆️
recce/adapter/dbt_adapter/__init__.py	`74.07% <0.00%> (ø)`

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Move expensive operations out of the CLI startup path so the HTTP port opens within ~1-2s instead of ~18-25s: - Remove state_loader.load() from create_state_loader() — deferred to RecceContext.load() during background lifespan setup - Remove verify_required_artifacts() from server command — redundant with setup_server() in background; errors surface as 503 - Move update_onboarding_state() to _do_lifespan_setup() (non-fatal) - Move read-only context loading to setup_ready_only() in background - Enhance /api/health with ready/error fields for K8s readiness checks - Add explicit state_loader.load() to non-server callers that need eager loading (summary, purge, upload, share) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

…from-recce_statejson-directly

Cover the behavioral changes from deferring heavy CLI operations: - _do_lifespan_setup onboarding state update (success, skip, failure) - setup_ready_only context loading and set_default_context - state_loader.verify() failure exits before uvicorn - create_state_loader no longer calls load() - health endpoint backward compat (ready=true without ready_event) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

- setup_ready_only: use load_context() for consistency with setup_server/setup_preview instead of manual RecceContext.load() + set_default_context() - TestReadinessGate: extract fixture and helper to eliminate repetitive ready_event/startup_error setup/cleanup - TestDoLifespanSetup: extract server_app_state fixture to reduce duplicated AppState construction - Remove unnecessary prepare_api_token patch from test_create_state_loader_does_not_call_load Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

- Move clear_startup_tracker() to finally block so it runs on both success and failure paths - Narrow readiness gate middleware to /api/* paths only, letting SPA and static assets pass through immediately - Return only exception type name in /api/health error field to avoid leaking internal details (file paths, config values) - Add startup timeout (default 300s, configurable via RECCE_STARTUP_TIMEOUT) to prevent indefinite request hangs if background loading stalls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kent Huang <kent@infuseai.io>

gcko

LGTM

Captured via standalone Playwright pattern (T5's approach for PR #1129) against duckdb fixture pr-1376-qd. Renderings: - SS-2 / SS-2b: Cancelled chip persists through the in-flight waitRun poll window (6s hold). Pre-fix this reverted to Running. - SS-3: post-reload — Query page returns to default editor state; runId still in localStorage (canceledRuns) for future references. - SS-4: second tab opens to default Query page (no cross-tab carryover of editor state, but localStorage canceledRuns is shared). - SS-5: RunResultPane shows Cancelled in both header and body. - SS-6: POST /cancel_run round-trip 2ms (well under 2.5s budget). - SS-7: PostHog cancel_run event deferred — OSS local has no dev telemetry; production capture tracked separately. SS-1 (pre-existing) shows the Running state captured prior to the fix. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>

Copilot AI review requested due to automatic review settings February 26, 2026 06:42

Copilot started reviewing on behalf of kentwelcome February 26, 2026 06:43 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

Comment thread recce/adapter/dbt_adapter/__init__.py Outdated

Comment thread recce/adapter/dbt_adapter/__init__.py Outdated

Comment thread tests/test_server.py

kentwelcome and others added 2 commits February 26, 2026 14:50

kentwelcome and others added 4 commits February 26, 2026 20:29

Merge branch 'main' into feature/drc-2832-recce-load-the-linear-data-…

846b839

…from-recce_statejson-directly

kentwelcome requested a review from Copilot March 2, 2026 03:30

Copilot started reviewing on behalf of kentwelcome March 2, 2026 03:31 View session

Copilot AI reviewed Mar 2, 2026

View reviewed changes

Comment thread recce/server.py

Comment thread recce/server.py Outdated

Comment thread recce/server.py Outdated

Comment thread recce/server.py Outdated

Comment thread recce/server.py Outdated

kentwelcome self-assigned this Mar 2, 2026

kentwelcome requested a review from gcko March 2, 2026 04:04

gcko approved these changes Mar 2, 2026

View reviewed changes

gcko merged commit fa277da into main Mar 2, 2026
19 checks passed

gcko deleted the feature/drc-2832-recce-load-the-linear-data-from-recce_statejson-directly branch March 2, 2026 04:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(server): speed up recce server startup with background loading#1129

feat(server): speed up recce server startup with background loading#1129
gcko merged 8 commits into
mainfrom
feature/drc-2832-recce-load-the-linear-data-from-recce_statejson-directly

kentwelcome commented Feb 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gcko left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kentwelcome commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gcko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kentwelcome commented Feb 26, 2026 •

edited

Loading

codecov Bot commented Feb 26, 2026 •

edited

Loading