DEV-1392: add --ingest-on-startup to slayer serve / slayer mcp#116
Conversation
Opt-in boot-time idempotent auto-ingestion across every configured
datasource, exposed three ways:
* --ingest-on-startup flag on `slayer serve` and `slayer mcp`
* SLAYER_INGEST_ON_STARTUP=1 (truthy: 1/true/yes, flag wins over env)
* ingest_on_startup=True kwarg on create_app / create_mcp_server
Sequential, sync-before-listen (uvicorn / mcp.run don't start until
ingest finishes). Continue-on-failure per datasource — only
`storage.list_datasources()` raising prevents server startup. Drift
entries are printed but never auto-applied (apply_drift_deletes stays
gated behind `slayer validate-models --force-clean`). All output goes
to stderr so `slayer mcp` stdio JSON-RPC stays protocol-safe.
Refactors:
* _friendly_db_error moved from slayer/mcp/server.py to
slayer/engine/ingestion.py (no re-export shim; mcp/server.py
imports from the new location).
* _print_ingest_addition / _print_ingest_drift_and_errors moved
from slayer/cli.py to slayer/engine/ingestion.py, gained a
file= kwarg defaulting to sys.stdout.
Docs: new "Ingesting at Startup" section in docs/concepts/ingestion.md,
flag/env/kwarg threaded through all getting-started + interface +
reference docs + README + CLAUDE.md + slayer-overview skill.
45 tests in tests/test_startup_ingest.py cover orchestrator behaviour,
CLI flag plumbing, env-var precedence, demo-then-ingest ordering,
programmatic kwarg paths, and propagation of list_datasources errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ding Two findings from codex MCP review of the prior commit: 1. HIGH — `storage.get_datasource(name)` was outside the per-datasource `try` in `ingest_all_datasources_idempotent`. A YAML parse error (or any non-None raise from a single datasource's config load) would abort the whole startup pass, violating the documented "only `list_datasources()` raising prevents boot" contract. Wrapped it in the same `Exception` handler used for `ingest_datasource_idempotent`, and added `test_get_datasource_raises_does_not_abort_iteration`. 2. LOW — `_print_ingest_addition` / `_print_ingest_drift_and_errors` had `file: TextIO = sys.stdout` defaults that bind at import time. `contextlib.redirect_stdout` callers would be bypassed. Switched to `file: Optional[TextIO] = None` and resolve to `sys.stdout` inside the function. 46 tests pass (was 45). Full unit suite + ruff still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds opt-in, idempotent model ingestion at REST and MCP server boot via ChangesStartup Ingestion Orchestration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
.claude/skills/slayer-overview.md (1)
15-15: ⚡ Quick winTighten sentence grammar for readability.
This line reads as a fragment (“Can be triggered…”). Consider making it a full sentence so the architecture bullet is easier to scan.
Proposed wording
-- **Ingestion** — auto-generates models from DB schema with rollup-style FK joins (denormalized LEFT JOINs). Can be triggered manually (`slayer ingest`, `ingest_datasource_models`, `POST /ingest`) or **on every server boot** via `slayer serve --ingest-on-startup` / `slayer mcp --ingest-on-startup` (also `SLAYER_INGEST_ON_STARTUP=1`, or `create_app/create_mcp_server(ingest_on_startup=True)` programmatically). Idempotent; continue-on-failure per datasource. +- **Ingestion** — auto-generates models from DB schema with rollup-style FK joins (denormalized LEFT JOINs). **It can be triggered** manually (`slayer ingest`, `ingest_datasource_models`, `POST /ingest`) or **on every server boot** via `slayer serve --ingest-on-startup` / `slayer mcp --ingest-on-startup` (also `SLAYER_INGEST_ON_STARTUP=1`, or `create_app/create_mcp_server(ingest_on_startup=True)` programmatically). It is idempotent and continues on per-datasource failures.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/slayer-overview.md at line 15, The bullet has a sentence fragment; rewrite it into a full, clear sentence (or two) so the ingestion behavior is easy to scan: state that ingestion auto-generates models from DB schema with rollup-style FK joins (denormalized LEFT JOINs) and that it can be triggered manually via the commands `slayer ingest`, `ingest_datasource_models`, or the `POST /ingest` endpoint, and add a second sentence listing startup triggers (`slayer serve --ingest-on-startup`, `slayer mcp --ingest-on-startup`, env var `SLAYER_INGEST_ON_STARTUP=1`, or programmatic `create_app/create_mcp_server(ingest_on_startup=True)`), noting idempotence and continue-on-failure per datasource.slayer/api/server.py (1)
121-125: ⚡ Quick winKeep startup-ingest imports at the top of the module.
Lines 122–125 add local imports inside
create_app; these should be moved to module scope to follow repo conventions.As per coding guidelines, "Imports at the top of files."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/api/server.py` around lines 121 - 125, The local imports inside create_app (the conditional block using ingest_on_startup) should be moved to module scope: import sys, from slayer.async_utils import run_sync, and from slayer.engine.ingestion import ingest_all_datasources_idempotent should be declared at the top of slayer/api/server.py instead of inside create_app; update create_app to reference these already-imported symbols (and remove the in-function import statements) so startup ingestion still calls run_sync(ingest_all_datasources_idempotent(...)) when ingest_on_startup is true.slayer/mcp/server.py (1)
907-912: ⚡ Quick winMove function-local imports to module scope.
Lines 908–912 introduce local imports in a hot factory path. Please keep these at file top for consistency with repo standards.
Suggested patch
import json import logging +import sys from typing import Any, Dict, List, NamedTuple, Optional, Tuple import sqlalchemy as sa +from slayer.async_utils import run_sync from slayer.core.enums import DataType @@ from slayer.core.query import ModelExtension, SlayerQuery from slayer.engine.ingestion import _friendly_db_error +from slayer.engine.ingestion import ingest_all_datasources_idempotent @@ def create_mcp_server( @@ if ingest_on_startup: - import sys - - from slayer.async_utils import run_sync - from slayer.engine.ingestion import ingest_all_datasources_idempotent - run_sync( ingest_all_datasources_idempotent(storage=storage, stream=sys.stderr) )As per coding guidelines, "Imports at the top of files."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/mcp/server.py` around lines 907 - 912, The local imports inside the ingest_on_startup conditional should be moved to module scope: add import sys, from slayer.async_utils import run_sync, and from slayer.engine.ingestion import ingest_all_datasources_idempotent at the top of server.py, then remove the in-block imports in the ingest_on_startup branch so the code still calls run_sync(ingest_all_datasources_idempotent(...)) but without performing imports on the hot path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/concepts/ingestion.md`:
- Around line 205-207: The two fenced code blocks showing the startup summary
(the lines containing "Ingest-on-startup: N/M datasources ingested" and
"Ingest-on-startup: N/M datasources ingested (K failed: name1, name2)") are
missing a language tag; add "text" (or "bash") after the opening triple
backticks for both code fences so the blocks read ```text to satisfy
markdownlint MD040 and keep the docs lint-clean.
---
Nitpick comments:
In @.claude/skills/slayer-overview.md:
- Line 15: The bullet has a sentence fragment; rewrite it into a full, clear
sentence (or two) so the ingestion behavior is easy to scan: state that
ingestion auto-generates models from DB schema with rollup-style FK joins
(denormalized LEFT JOINs) and that it can be triggered manually via the commands
`slayer ingest`, `ingest_datasource_models`, or the `POST /ingest` endpoint, and
add a second sentence listing startup triggers (`slayer serve
--ingest-on-startup`, `slayer mcp --ingest-on-startup`, env var
`SLAYER_INGEST_ON_STARTUP=1`, or programmatic
`create_app/create_mcp_server(ingest_on_startup=True)`), noting idempotence and
continue-on-failure per datasource.
In `@slayer/api/server.py`:
- Around line 121-125: The local imports inside create_app (the conditional
block using ingest_on_startup) should be moved to module scope: import sys, from
slayer.async_utils import run_sync, and from slayer.engine.ingestion import
ingest_all_datasources_idempotent should be declared at the top of
slayer/api/server.py instead of inside create_app; update create_app to
reference these already-imported symbols (and remove the in-function import
statements) so startup ingestion still calls
run_sync(ingest_all_datasources_idempotent(...)) when ingest_on_startup is true.
In `@slayer/mcp/server.py`:
- Around line 907-912: The local imports inside the ingest_on_startup
conditional should be moved to module scope: add import sys, from
slayer.async_utils import run_sync, and from slayer.engine.ingestion import
ingest_all_datasources_idempotent at the top of server.py, then remove the
in-block imports in the ingest_on_startup branch so the code still calls
run_sync(ingest_all_datasources_idempotent(...)) but without performing imports
on the hot path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 53a1d91d-d08f-4a4c-aa5d-1258e588fd7c
📒 Files selected for processing (22)
.claude/skills/slayer-overview.mdCLAUDE.mdREADME.mddocs/concepts/ingestion.mddocs/configuration/datasources.mddocs/examples/03_auto_ingest/auto_ingest.mddocs/getting-started/cli.mddocs/getting-started/mcp.mddocs/getting-started/python.mddocs/getting-started/rest-api.mddocs/index.mddocs/interfaces/cli.mddocs/interfaces/mcp.mddocs/interfaces/rest-api.mddocs/reference/cli.mddocs/reference/mcp.mddocs/reference/rest-api.mdslayer/api/server.pyslayer/cli.pyslayer/engine/ingestion.pyslayer/mcp/server.pytests/test_startup_ingest.py
…est-on-startup-flag-to-slayer-serve-and-slayer-mcp # Conflicts: # slayer/cli.py # slayer/engine/ingestion.py # slayer/mcp/server.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
docs/concepts/ingestion.md (1)
205-213:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language tags to fenced code blocks.
Both fenced blocks showing the startup summary output are missing language specifiers, triggering markdownlint MD040. Add
textafter the opening triple backticks for both blocks.📝 Suggested fix
-``` +```text Ingest-on-startup: N/M datasources ingestedor, when at least one failed:
-
+text
Ingest-on-startup: N/M datasources ingested (K failed: name1, name2)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/concepts/ingestion.md` around lines 205 - 213, The two fenced code blocks in docs/concepts/ingestion.md that show the startup summary ("Ingest-on-startup: N/M datasources ingested" and "Ingest-on-startup: N/M datasources ingested (K failed: name1, name2)") are missing language tags; update both opening triple-backtick lines to use ```text so the blocks become ```text ... ``` which will satisfy markdownlint MD040 and mark them as plain text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@docs/concepts/ingestion.md`:
- Around line 205-213: The two fenced code blocks in docs/concepts/ingestion.md
that show the startup summary ("Ingest-on-startup: N/M datasources ingested" and
"Ingest-on-startup: N/M datasources ingested (K failed: name1, name2)") are
missing language tags; update both opening triple-backtick lines to use ```text
so the blocks become ```text ... ``` which will satisfy markdownlint MD040 and
mark them as plain text.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b4e375cb-e0ef-43c2-b1d3-d9a65a9e93ef
📒 Files selected for processing (9)
.claude/skills/slayer-overview.mdCLAUDE.mddocs/concepts/ingestion.mddocs/getting-started/cli.mddocs/reference/cli.mdslayer/api/server.pyslayer/cli.pyslayer/engine/ingestion.pyslayer/mcp/server.py
✅ Files skipped from review due to trivial changes (3)
- docs/getting-started/cli.md
- docs/reference/cli.md
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
- slayer/cli.py
Group A — `captured[0]` IndexError risk (Sonar S6466 ×2):
* tests/test_startup_ingest.py:486, 493 — argparse-stub tests now
`assert len(captured) == 1` before indexing.
Group B — dict literal style (Sonar S7498 ×2):
* tests/test_startup_ingest.py — _serve_args / _mcp_args now use
{...} literals instead of dict(...) constructors.
Group C — docs polish:
* docs/concepts/ingestion.md — add `text` language tag to the two
startup-summary fenced blocks (markdownlint MD040; CodeRabbit
thread r3226214839).
* .claude/skills/slayer-overview.md — reword sentence fragment
"Can be triggered..." to a full sentence (CodeRabbit nitpick).
Group D — NOSONAR suppressions for genuine false positives + two
findings whose proper fix is a major refactor out of scope here:
* slayer/api/server.py::create_app (S3776 — cog complexity 96).
The function is FastAPI's route-handler factory; complexity is
the sum of every @app.<route> closure body. Splitting would need
DI of engine/storage into a separate module — not a same-PR fix.
* slayer/mcp/server.py::create_mcp_server (S3776 — cog complexity
649). Same shape, every @mcp.tool() handler inlined.
* tests/test_startup_ingest.py — 7× S7503 on async stubs that MUST
be `async def` because they're AsyncMock.side_effect or replace
an async API consumed by run_sync. Plus 1× S125 on a section
separator misread as commented-out code.
Verification: 2466 unit tests pass (no regressions); ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`--ingest-on-startup` triggers per-datasource embedding refresh through the existing `_refresh_datasource_embeddings` hook in `ingest_datasource_idempotent`. The hook hash-skips entities whose content hasn't changed, but only runs when the `embedding_search` extra is installed AND a provider key is in the environment. Update getting-started/mcp.md so the recommended `uvx --from 'motley-slayer[...]'` snippets include `embedding_search`, document the provider-key contract (OPENAI_API_KEY by default; override the model with SLAYER_EMBEDDING_MODEL=voyage/... + VOYAGE_API_KEY, etc.), and show the JSON-config `env` block as the place to set the key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
Opt-in boot-time idempotent auto-ingestion across every configured datasource. Three surfaces:
slayer serve --ingest-on-startupandslayer mcp --ingest-on-startupSLAYER_INGEST_ON_STARTUP=1env var (truthy:1/true/yes, case-insensitive; flag wins over env when set)create_app(storage=..., ingest_on_startup=True)andcreate_mcp_server(storage=..., ingest_on_startup=True)programmatic kwargsSync-before-listen — uvicorn /
mcp.rundo not start until ingest finishes. Continue-on-failure per datasource (a single bad config or raise fromget_datasource/ingest_datasource_idempotentdoesn't abort startup). Onlystorage.list_datasources()raising propagates and prevents boot. Drift entries are printed but never auto-applied —apply_drift_deletesstays gated behindslayer validate-models --force-clean. All output goes to stderr soslayer mcpstdio JSON-RPC stays protocol-safe.Composes with
--demo: demo runs first (creates the Jaffle Shop datasource), then the startup-ingest pass walks every datasource including the freshly-created demo (cheap idempotent no-op).Refactors
_friendly_db_errormoved fromslayer/mcp/server.pytoslayer/engine/ingestion.py(no re-export shim; allmcp/server.pycall sites updated)._print_ingest_addition/_print_ingest_drift_and_errorsmoved fromslayer/cli.pytoslayer/engine/ingestion.py, gained a call-time-resolvedfile=kwarg socontextlib.redirect_stdoutcallers aren't bypassed.Codex-review fix included
storage.get_datasource(name)is wrapped in the per-datasource try, so a YAML parse error / invalid config on one datasource no longer aborts the whole boot pass.Test plan
poetry run pytest tests/test_startup_ingest.py -m "not integration"— 46 cases (orchestrator behaviour, CLI flag plumbing, env precedence, demo-then-ingest ordering, programmatic kwarg,list_datasourcespropagation,get_datasourceraise).poetry run pytest -m "not integration"— full unit suite: 2268 passed.poetry run ruff check slayer/ tests/— clean.slayer serve --ingest-on-startupagainst a directory with a healthy datasource YAML; verify stderr summary line + models become queryable on first request.SLAYER_INGEST_ON_STARTUP=1 slayer mcpand verify stdout has zero bytes before first JSON-RPC frame.--demo --ingest-on-startuptogether —_prepare_demoruns first, then the startup-ingest pass logs the demo datasource as a no-op.Linear
DEV-1392
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests