feat(api-gateway): CORE-418 readiness probe data model check #10773
igorlukanin wants to merge 2 commits into master
…fail readiness probe on broken data model
Claude finished @igorlukanin's task in 2m 27s.
Code Review: CORE-418 Readiness Probe Data Model Check
Overall Assessment
This is a well-scoped, well-tested feature that solves a real production problem. The opt-in design, early-exit on failure, and clear separation from
Code Quality
The implementation is clean and follows the existing patterns in the codebase. A few observations:
Security
No concerns. The feature:
Performance
The design is sound:
Testing
The 5 new tests are well-structured and cover the key scenarios:
Possible additions (nice-to-have, not blocking):
Documentation
Both docs changes are clear and accurate:
Minor doc note: The deployment.mdx text says "compile the data model for every tenant on each probe". This is technically true, but readers might worry about performance. Worth noting (even parenthetically) that compilations are cached and the probe response itself has a 1s cache TTL, so steady-state cost is minimal.
Summary
This is a solid, well-tested feature that addresses a real production gap. The code follows existing patterns, the tests cover the important scenarios, and the documentation is clear.
Blocking: None.
Non-blocking suggestions:
…catch and warn on empty contexts
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #10773 +/- ##
==========================================
+ Coverage 57.96% 57.99% +0.02%
==========================================
Files 215 215
Lines 16776 16795 +19
Branches 3389 3394 +5
==========================================
+ Hits 9725 9740 +15
- Misses 6559 6561 +2
- Partials 492 494 +2
Summary
Linear: CORE-418
A user reported that merging a data model with two definitions of the same view caused Cube to successfully promote the build to production: the API process stayed alive and existing health checks passed, but every request that needed the schema (querying, Playground, SQL Runner, meta) broke with a compile error like:
The existing /readyz probe only verifies orchestrator connectivity in standalone mode; in multi-tenant deployments the schema compiles lazily on first request per tenant, so a broken model passes deploy-time checks and silently breaks the deployment.
This PR adds an opt-in readiness check that compiles the data model for every tenant returned by scheduledRefreshContexts (or once with an empty security context for single-tenant deployments). When enabled, /readyz returns 500 if any tenant fails to compile, so the deployment platform refuses to promote the bad build and the previous (working) version keeps serving traffic.
- CUBEJS_READINESS_CHECK_DATA_MODEL (default false).
- /livez is intentionally not changed: failing it would crash-loop the pod on a bad model.
- CompilerApi.getCompilers already caches by compilerVersion, so steady-state probe cost is dominated by cache hits. The existing cachedHandler (1s TTL) coalesces concurrent probes.
Files changed
- packages/cubejs-backend-shared/src/env.ts: register readinessCheckDataModel.
- packages/cubejs-api-gateway/src/gateway.ts: extend readiness handler with checkDataModelCompiles().
- packages/cubejs-api-gateway/test/index.test.ts: 5 new jest tests covering env disabled, single-tenant healthy, single-tenant broken, multi-tenant healthy, multi-tenant one-broken.
- docs-mintlify/reference/configuration/environment-variables.mdx: new env var section.
- docs-mintlify/cube-core/deployment.mdx: paragraph in the existing health-checks section pointing to the new env var.
Test plan
- yarn build: TS compiles (no new errors; pre-existing sql-server.ts errors unrelated).
- yarn lint: 0 errors.
- cubejs-backend-shared unit tests: 308 passing.
- cubejs-api-gateway unit tests: 185 passing, including 5 new healthcheck tests.
- (cubejs/cube:dev built from this branch) end-to-end scenarios:
  - /readyz 200 (existing behavior preserved)
  - false, broken model → /readyz 200
  - true, healthy single-tenant → /readyz 200
  - true, broken single-tenant → /readyz 500, expected error reproduced
  - true, multi-tenant all healthy → /readyz 200, three compilations executed
  - true, multi-tenant one broken → /readyz 500
  - /readyz 500 → 200 within cachedHandler TTL, no restart
  - /livez returns 200 in every scenario regardless of compile state
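For readers skimming the description, the readiness check outlined in the Summary could be sketched roughly as below. This is a minimal illustration, not the code in gateway.ts: the ReadinessDeps interface, the compile callback, and the ReadinessResult shape are all hypothetical stand-ins, and the handling of a missing scheduledRefreshContexts function is an assumption based on the "single-tenant" wording above.

```typescript
// Hypothetical types: the gateway's real interfaces are not shown on this page.
type SecurityContext = Record<string, unknown>;
type RequestContext = { securityContext: SecurityContext };

interface ReadinessDeps {
  // Assumed to mirror CUBEJS_READINESS_CHECK_DATA_MODEL (default false).
  checkDataModel: boolean;
  // Assumed shape: resolves to one request context per tenant.
  scheduledRefreshContexts?: () => Promise<RequestContext[]>;
  // Stand-in for compiling one tenant's data model; rejects on a compile error.
  compile: (context: RequestContext) => Promise<void>;
}

type ReadinessResult = { status: 200 | 500; error?: string };

async function checkDataModelCompiles(deps: ReadinessDeps): Promise<ReadinessResult> {
  // Opt-in: when the flag is off, the probe keeps its existing behavior.
  if (!deps.checkDataModel) {
    return { status: 200 };
  }

  // Assumption: without scheduledRefreshContexts, compile once with an
  // empty security context (the single-tenant case).
  const contexts: RequestContext[] = deps.scheduledRefreshContexts
    ? await deps.scheduledRefreshContexts()
    : [{ securityContext: {} }];

  for (const context of contexts) {
    try {
      await deps.compile(context);
    } catch (e) {
      // Early exit: the first failing tenant makes the probe report 500.
      return { status: 500, error: e instanceof Error ? e.message : String(e) };
    }
  }
  return { status: 200 };
}
```

Compiling tenants sequentially and stopping at the first failure keeps the probe cheap on a bad build. Caching of compilations and of the probe response (the 1s TTL mentioned above) is left out of the sketch.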