Skip to content

feat(server): speed up recce server startup with background loading#1129

Merged
gcko merged 8 commits into
mainfrom
feature/drc-2832-recce-load-the-linear-data-from-recce_statejson-directly
Mar 2, 2026
Merged

feat(server): speed up recce server startup with background loading#1129
gcko merged 8 commits into
mainfrom
feature/drc-2832-recce-load-the-linear-data-from-recce_statejson-directly

Conversation

@kentwelcome
Copy link
Copy Markdown
Member

@kentwelcome kentwelcome commented Feb 26, 2026

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

Performance improvement

What this PR does / why we need it:

In cloud deployments, recce server takes 20+ seconds before /api/health returns 200 because the FastAPI lifespan synchronously loads all dbt artifacts before the HTTP server starts listening. This PR makes the server start accepting connections immediately.

Three complementary changes:

  • Background loading: Move _do_lifespan_setup() to a background task via asyncio.to_thread. The lifespan yields immediately so uvicorn starts listening right away.
  • Readiness gate middleware: /api/health passes through immediately (returns 200). All other /api/* endpoints await ready_event.wait() until loading completes. Non-API routes (SPA, static assets) pass through immediately. Returns 503 if startup failed or times out.
  • Parallel artifact loading: Load the 4 dbt artifact files (base/curr manifest + base/curr catalog) concurrently via ThreadPoolExecutor(max_workers=4) instead of sequentially.

Which issue(s) this PR fixes:

Fixes DRC-2832

Special notes for your reviewer:

  • _do_lifespan_setup no longer calls schedule_lifetime_termination or schedule_idle_timeout_check directly — those asyncio APIs can't run in a thread. They're now called in the async background_load function after the thread returns.
  • StartupPerfTracker gains a threading.Lock for thread safety during parallel artifact loading.
  • The readiness middleware uses getattr for backward compatibility — tests that don't trigger the lifespan still work.
  • Startup timeout for the readiness gate defaults to 300s, configurable via RECCE_STARTUP_TIMEOUT env var.

Does this PR introduce a user-facing change?:

/api/health response now includes two additional fields (additive, backward-compatible):

  • ready (bool): true when server is fully loaded, false during startup or on error
  • error (string|null): exception type name if startup failed, null otherwise

Existing consumers that only check status: "ok" are unaffected.

Move artifact loading to a background task so the HTTP server starts
accepting connections immediately. This reduces /api/health response
time from 20+ seconds to <1 second in cloud deployments.

Changes:
- Lifespan runs _do_lifespan_setup in background via asyncio.to_thread
- New readiness gate middleware: /api/health passes through immediately,
  all other endpoints wait until loading completes
- Artifact files (manifests + catalogs) load in parallel via ThreadPoolExecutor
- StartupPerfTracker is now thread-safe with threading.Lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
Copilot AI review requested due to automatic review settings February 26, 2026 06:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes recce server startup time by moving heavy artifact loading to a background thread, allowing the HTTP server to start accepting connections immediately. The changes implement a three-part strategy: background loading with asyncio.to_thread(), a readiness gate middleware that blocks non-health endpoints until loading completes, and parallel artifact loading using ThreadPoolExecutor.

Changes:

  • Background loading: Server lifespan now yields immediately after spawning a background task, reducing time-to-first-response from 20+ seconds to near-instant for health checks
  • Readiness gate middleware: /api/health passes through immediately for liveness probes, while data endpoints wait for startup or return 503 on failure
  • Parallel artifact loading: Four dbt artifact files now load concurrently instead of sequentially

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
recce/server.py Refactored lifespan to use background_load async function with asyncio.to_thread(), added ready_event/startup_error state tracking, and implemented readiness_gate middleware
recce/adapter/dbt_adapter/init.py Modified load_artifacts() to load 4 dbt artifacts in parallel using ThreadPoolExecutor instead of sequentially
recce/util/startup_perf.py Added threading.Lock to StartupPerfTracker for thread-safe recording of timings and artifact sizes during parallel loading
tests/test_server_lifespan.py Added await ready_event.wait() to ensure background loading completes before assertions
tests/test_server.py Added TestReadinessGate test class covering health endpoint passthrough and 503 error handling on startup failure

Comment thread recce/adapter/dbt_adapter/__init__.py Outdated
Comment thread recce/adapter/dbt_adapter/__init__.py Outdated
Comment thread tests/test_server.py
kentwelcome and others added 2 commits February 26, 2026 14:50
When tests set app.state to a MagicMock, getattr returns a MagicMock
for ready_event instead of None. Use isinstance(ready_event, asyncio.Event)
to avoid awaiting a MagicMock in test environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
- Move future.result() calls inside ThreadPoolExecutor with block
- Use consistent path variables for artifacts_files list
- Add test for data endpoint succeeding after ready_event is set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 88.01498% with 32 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/adapter/dbt_adapter/__init__.py 0.00% 18 Missing ⚠️
recce/cli.py 28.57% 5 Missing ⚠️
recce/server.py 93.24% 5 Missing ⚠️
recce/util/startup_perf.py 66.66% 2 Missing ⚠️
tests/test_server.py 97.18% 2 Missing ⚠️
Files with missing lines Coverage Δ
tests/test_cli.py 100.00% <100.00%> (ø)
tests/test_server_lifespan.py 100.00% <100.00%> (ø)
recce/util/startup_perf.py 94.44% <66.66%> (ø)
tests/test_server.py 99.00% <97.18%> (-0.45%) ⬇️
recce/cli.py 49.76% <28.57%> (+0.34%) ⬆️
recce/server.py 63.21% <93.24%> (+1.07%) ⬆️
recce/adapter/dbt_adapter/__init__.py 74.07% <0.00%> (ø)

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

kentwelcome and others added 4 commits February 26, 2026 20:29
Move expensive operations out of the CLI startup path so the HTTP port
opens within ~1-2s instead of ~18-25s:

- Remove state_loader.load() from create_state_loader() — deferred to
  RecceContext.load() during background lifespan setup
- Remove verify_required_artifacts() from server command — redundant
  with setup_server() in background; errors surface as 503
- Move update_onboarding_state() to _do_lifespan_setup() (non-fatal)
- Move read-only context loading to setup_ready_only() in background
- Enhance /api/health with ready/error fields for K8s readiness checks
- Add explicit state_loader.load() to non-server callers that need
  eager loading (summary, purge, upload, share)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
Cover the behavioral changes from deferring heavy CLI operations:
- _do_lifespan_setup onboarding state update (success, skip, failure)
- setup_ready_only context loading and set_default_context
- state_loader.verify() failure exits before uvicorn
- create_state_loader no longer calls load()
- health endpoint backward compat (ready=true without ready_event)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
- setup_ready_only: use load_context() for consistency with
  setup_server/setup_preview instead of manual RecceContext.load() +
  set_default_context()
- TestReadinessGate: extract fixture and helper to eliminate repetitive
  ready_event/startup_error setup/cleanup
- TestDoLifespanSetup: extract server_app_state fixture to reduce
  duplicated AppState construction
- Remove unnecessary prepare_api_token patch from
  test_create_state_loader_does_not_call_load

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread recce/server.py
Comment thread recce/server.py Outdated
Comment thread recce/server.py Outdated
Comment thread recce/server.py Outdated
Comment thread recce/server.py Outdated
- Move clear_startup_tracker() to finally block so it runs on both
  success and failure paths
- Narrow readiness gate middleware to /api/* paths only, letting SPA
  and static assets pass through immediately
- Return only exception type name in /api/health error field to avoid
  leaking internal details (file paths, config values)
- Add startup timeout (default 300s, configurable via
  RECCE_STARTUP_TIMEOUT) to prevent indefinite request hangs if
  background loading stalls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
@kentwelcome kentwelcome self-assigned this Mar 2, 2026
@kentwelcome kentwelcome requested a review from gcko March 2, 2026 04:04
Copy link
Copy Markdown
Contributor

@gcko gcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gcko gcko merged commit fa277da into main Mar 2, 2026
19 checks passed
@gcko gcko deleted the feature/drc-2832-recce-load-the-linear-data-from-recce_statejson-directly branch March 2, 2026 04:09
even-wei added a commit that referenced this pull request May 21, 2026
Captured via standalone Playwright pattern (T5's approach for PR #1129)
against duckdb fixture pr-1376-qd. Renderings:

- SS-2 / SS-2b: Cancelled chip persists through the in-flight waitRun
  poll window (6s hold). Pre-fix this reverted to Running.
- SS-3: post-reload — Query page returns to default editor state;
  runId still in localStorage (canceledRuns) for future references.
- SS-4: second tab opens to default Query page (no cross-tab carryover
  of editor state, but localStorage canceledRuns is shared).
- SS-5: RunResultPane shows Cancelled in both header and body.
- SS-6: POST /cancel_run round-trip 2ms (well under 2.5s budget).
- SS-7: PostHog cancel_run event deferred — OSS local has no dev
  telemetry; production capture tracked separately.

SS-1 (pre-existing) shows the Running state captured prior to the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants