v0.3 alpha: local file ingest + run chains #4

Merged
arpan-mondal merged 10 commits into phase-2/v0.2-handoff from phase-3/v0.3-file-ingest-and-chains on Jun 4, 2026

@arpan-mondal

Stacked on phase-2/v0.2-handoff so this PR shows only the v0.3 delta (10 commits, 22 files). The v0.2 work it builds on is gated separately (see PHASE-2-HANDOFF.md).

What this adds

Closes the credential-free loop: ingest local files as signed Knowledge Objects (KOs) and compose AI runs over them, all runnable offline with the deterministic adapter.

W1 — Local file ingest (no credentials)

  • stacy brain create --file <path> — a text/markdown/json file → signed document KO. Media-type inference, binary/UTF-8 detection, size guard, leak-safe relative source labels (absolute paths never persisted).
  • stacy brain create --dir/--glob/--ext/--yes — one KO per file via native fs.globSync (no new dep). Skips dotfiles/node_modules/.git/symlinks; binary/oversized skipped mid-batch with a warning; confirmation prompt.
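
A minimal sketch of the leak-safe label rule described above (cwd-relative path, basename when the file sits outside cwd, explicit override wins). The function name `leakSafeLabel` and its signature are hypothetical and not taken from the PR's code:

```typescript
import * as path from "node:path";

// Hypothetical helper, for illustration only: derive a source label that
// never contains an absolute path.
export function leakSafeLabel(filePath: string, cwd: string, override?: string): string {
  // An explicit label (the --source-label case) always wins.
  if (override !== undefined) return override;

  const root = path.resolve(cwd);
  const rel = path.relative(root, path.resolve(root, filePath));

  // "Outside cwd" means the relative path climbs upward or is still absolute
  // (for example a different drive on Windows).
  const outside =
    rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);

  // Inside cwd: keep the relative path with forward slashes.
  // Outside cwd: fall back to the bare file name.
  return outside ? path.basename(filePath) : rel.split(path.sep).join("/");
}
```

Under these assumptions, a file under the working directory keeps its relative path, while a file elsewhere on disk is reduced to its file name, so no directory structure outside the project ends up in a shareable KO.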

W3 — Run chains

  • stacy run --chain <spec.json> — ordered steps[] where a later step consumes an earlier step's agent_output via @<stepId>. Validated before egress, gated once, one-hop provenance, abort-on-step-failure with durable prior outputs.

Correctness fixes (found by review + live smoke)

  • #7 agent_output content no longer embeds a wall-clock generatedAt → identical runs hash identically.
  • #8 extracted a reusable runOnce executor (single-run verb + chain share it; single-run behavior unchanged).
  • Cache keying — the run cache now keys inputs on content, not the timestamp-stamped KO hash. Without this, a chain's downstream step missed the cache every run (the KO contentHash folds in createdAt). Caught by the live smoke; now an identical chain re-run makes zero adapter calls.
  • --json hygiene: createDb now silences Postgres NOTICEs that were polluting --json stdout on an initialized DB (pre-existing, affected all DB-touching federation commands).

Verification

  • Full federation suite 319 passed / 11 skipped; both packages typecheck clean.
  • Live smoke against real Postgres: brain create --dir (binary skipped, leak-safe labels) → run --chain 2-step round-trip (step 2 reads step 1's real output KO) → re-run fully cached (both steps cached: true) → provenance one-hop + create/sign/run receipts. --json before/after confirmed parseable.

Process

Eng-reviewed (/plan-eng-review + Codex outside voice). The outside voice reshaped the approach away from a premature connector-framework refactor toward the smaller brain create --file path. See docs/stacy/PHASE-3-FEDERATION.md.

Not in scope (deferred, tracked in TODOS.md)

  • Linear connector (W2, needs PAT-vs-OAuth decision) + real-API robustness (W4) → v0.3.1
  • Generic ingest refactor / fs as a real connector; brain lineage graph-walk command

Gate

Per the plan, do not tag a public v0.3 until the v0.2 external-credential validation clears (PHASE-2-HANDOFF.md).

🤖 Generated with Claude Code

arpan-mondal and others added 10 commits June 3, 2026 09:06
v0.3 = brain create --file (zero-credential file ingest) + run chains.
Reviewed via /plan-eng-review + Codex outside voice; the outside voice
reshaped the approach away from a premature connector-framework refactor
toward the smaller brain create --file path. Captures two real bugs to fix
in the existing run path (generatedAt breaks content-hash caching; extract
a reusable runOnce executor). Connector refactor, Linear, robustness, and a
brain lineage command deferred to TODOS.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
W1 of Phase 3. Turn a local text/markdown/json file into a signed
Knowledge Object with no connector and no credentials, closing the
credential-free ingest->run loop.

- New pure module src/brain/file-document.ts: wraps file bytes in a
  uniform { kind: "document", source, mediaType, text|data } envelope.
  Media-type inference, binary detection (NUL byte), strict UTF-8 decode,
  max-bytes guard, and JSON parsing for .json files. Fully unit-tested
  without touching disk.
- Source label is a cwd-relative path (basename if the file sits outside
  cwd) so absolute paths never leak into shareable KOs (review #10).
  --source-label overrides.
- brain create gains --file / --source-label / --max-bytes; exactly one of
  --content-json / --prompt / --file is allowed.

Tests: 16 passing (file-document unit + verb integration incl. leak-safety).
Typecheck clean (federation + cli).
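
The envelope checks listed in this commit could be sketched roughly as follows. This is an illustration only: the function name `toFileDocument`, the extension table, and the default size limit are assumptions, and the real src/brain/file-document.ts may differ.

```typescript
// Hypothetical envelope type modelled on the { kind, source, mediaType, text|data } shape.
type FileDocument =
  | { kind: "document"; source: string; mediaType: string; text: string }
  | { kind: "document"; source: string; mediaType: string; data: unknown };

// Assumed extension-to-media-type table; the real inference rules are not shown in the PR.
const MEDIA_TYPES: Record<string, string> = {
  ".txt": "text/plain",
  ".md": "text/markdown",
  ".json": "application/json",
};

// Pure function: works on bytes already in memory, so it can be tested without disk access.
export function toFileDocument(
  source: string,
  bytes: Uint8Array,
  maxBytes = 1_000_000, // assumed default, not the CLI's actual --max-bytes value
): FileDocument {
  if (bytes.byteLength > maxBytes) throw new Error(`file exceeds ${maxBytes} bytes`);
  // A NUL byte is treated as the marker of a binary file.
  if (bytes.includes(0)) throw new Error("binary file (NUL byte found)");
  // fatal: true makes the decoder throw on invalid UTF-8 instead of substituting U+FFFD.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);

  const dot = source.lastIndexOf(".");
  const ext = dot >= 0 ? source.slice(dot).toLowerCase() : "";
  const mediaType = MEDIA_TYPES[ext] ?? "text/plain";

  if (mediaType === "application/json") {
    // JSON files are parsed so the envelope carries structured data, not raw text.
    return { kind: "document", source, mediaType, data: JSON.parse(text) };
  }
  return { kind: "document", source, mediaType, text };
}
```

Keeping the function free of filesystem calls is what allows the "fully unit-tested without touching disk" property: the caller reads the file and passes the bytes in.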

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
W1 of Phase 3, directory mode. Create one signed Knowledge Object per file
under a directory, still credential-free.

- New src/brain/file-walk.ts using Node 24 native fs.globSync (no new dep):
  recursively enumerates files, skips dotfiles/dotdirs, node_modules, .git,
  and symlinks; optional --glob pattern and --ext allowlist. Pure
  excludeReason() predicate is unit-tested; the fs walk is integration-tested.
- brain create gains --dir/--glob/--ext/--yes. Shows a confirmation summary
  (file count + preview) before writing; --yes/--json skip it. Binary and
  oversized files are skipped with a warning mid-batch (never abort the run).
- Leak-safe labels reused from W1; absolute paths never persisted.

Tests: full federation suite 303 passed / 11 skipped (+24, no regressions).
Typecheck clean (federation + cli).
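
A pure exclusion predicate along the lines of the one mentioned here might look like the sketch below. The signature and reason strings are assumptions; symlink detection needs a filesystem call (for example `lstat`) and is deliberately left out of this pure function.

```typescript
// Hypothetical shape: returns the reason a relative path is skipped, or null to keep it.
export function excludeReason(
  relPath: string,
  extAllowlist?: readonly string[],
): string | null {
  // Split on either separator and ignore empty or "." segments (e.g. "./a.md").
  const segments = relPath.split(/[\\/]/).filter((s) => s.length > 0 && s !== ".");

  for (const seg of segments) {
    if (seg === "node_modules") return "node_modules";
    if (seg === ".git") return ".git";
    // Any other dot-prefixed segment is a dotfile or dot-directory.
    if (seg.startsWith(".")) return "dotfile";
  }

  if (extAllowlist && extAllowlist.length > 0) {
    const name = segments[segments.length - 1] ?? "";
    const dot = name.lastIndexOf(".");
    const ext = dot > 0 ? name.slice(dot).toLowerCase() : "";
    const allowed = extAllowlist.map((e) => e.toLowerCase());
    if (!allowed.includes(ext)) return "extension not in allowlist";
  }

  return null;
}
```

Returning a reason string rather than a boolean lets the caller print a useful warning for each skipped file while the batch continues.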

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g #7)

agent_output content embedded a live generatedAt timestamp, so two
identical runs produced different content hashes. In a run chain this meant
a downstream step's input hash changed every run, defeating the run-result
cache ("re-bill only changed steps" was false).

Remove generatedAt from the hashed content envelope. The generation time is
already captured on the KO record (createdAt/storedAt) and the run receipt,
so nothing is lost. This matches the existing dashboard-content /
prompt-output convention of keeping hashed content timestamp-free.

Adds a determinism test: two identical buildAgentOutputContent calls produce
byte-identical canonical content. Full suite: 304 passed / 11 skipped.
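
To illustrate why a wall-clock field defeats content hashing, here is a small sketch. The canonicalization scheme, the SHA-256 choice, and the `buildOutputEnvelope` helper with its field names are all assumptions made for the example, not the project's actual code.

```typescript
import { createHash } from "node:crypto";

// Canonical JSON: object keys sorted recursively so equal values serialize identically.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  // JSON.stringify(undefined) yields undefined at runtime; fall back to "null".
  return JSON.stringify(value) ?? "null";
}

export function hashContent(content: unknown): string {
  return createHash("sha256").update(canonicalize(content)).digest("hex");
}

// Hypothetical envelope builder: note the absence of any timestamp field.
export function buildOutputEnvelope(prompt: string, output: string) {
  return { kind: "agent_output", prompt, output };
}

// Two identical runs hash identically...
const first = hashContent(buildOutputEnvelope("summarize", "result"));
const second = hashContent(buildOutputEnvelope("summarize", "result"));
console.log(first === second);

// ...whereas embedding a generation time changes the hash on every run.
const stamped = hashContent({
  ...buildOutputEnvelope("summarize", "result"),
  generatedAt: new Date().toISOString(),
});
console.log(stamped === first);
```

The generation time still belongs somewhere, which is why the commit keeps it on the KO record and the run receipt rather than inside the hashed envelope.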

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agentRunCommand inlined the whole run (load+verify inputs, cache lookup,
adapter invoke, sign+store) and returned void. Run chains need to invoke a
step and get back the stored KO id to feed the next step.

Extract runOnce(params, context) -> { koId, contentHash, fromCache, ... }.
agentRunCommand now resolves runtime/db/cache, calls runOnce, and formats
the summary; single-run behavior is byte-for-byte unchanged (6 existing
tests still pass). Adds a direct runOnce test proving it returns the KO id
and that identical runs reuse the cache and produce the same id/hash (#7+#8
together = the chain-caching contract).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Compose runs: a later step consumes an earlier step's agent_output KO via
an @<stepId> reference.

- src/runs/chain.ts (pure): parse + fully validate a JSON chain spec and
  resolve @refs. Rejects forward/unknown/self/duplicate refs up front, so a
  malformed chain fails BEFORE the egress gate or any KO read.
- verbs/run.ts: runChainCommand orchestrates steps through the shared runOnce
  executor. Egress is gated ONCE up front (factored into
  resolveAdapterWithEgressGate, shared with the single-run path). One cache
  instance is reused across steps, so with the #7 determinism fix an identical
  chain re-run reuses every step. A step failure aborts and names the failed
  step; already-produced step KOs stay durable. Provenance is one-hop.
- CLI: stacy run --chain <spec.json>; exported from the federation verbs barrel.

Tests: chain spec unit (9) + chain orchestration incl. 2-step @ref round-trip,
egress-once, step-failure abort, forward-ref rejection. Full suite 318 passed.
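
The up-front reference validation could be sketched as below. The spec shape (`id` plus an `inputs` array holding KO ids or `@<stepId>` references) is an assumption for illustration, and "duplicate" is interpreted here as a repeated step id; the real schema in src/runs/chain.ts is not shown in this PR.

```typescript
// Hypothetical spec shape; only steps[] and the @<stepId> syntax come from the PR text.
interface ChainStep {
  id: string;
  inputs: string[]; // KO ids, or "@<stepId>" references to earlier steps
}
interface ChainSpec {
  steps: ChainStep[];
}

// Pure validation: returns every problem found, or an empty array for a valid chain.
export function validateChain(spec: ChainSpec): string[] {
  const errors: string[] = [];
  const allIds = new Set(spec.steps.map((s) => s.id));
  const earlier = new Set<string>(); // ids of steps that appear before the current one

  for (const step of spec.steps) {
    if (earlier.has(step.id)) errors.push(`duplicate step id "${step.id}"`);

    for (const input of step.inputs) {
      if (!input.startsWith("@")) continue; // plain KO id, nothing to resolve
      const ref = input.slice(1);
      if (ref === step.id) {
        errors.push(`step "${step.id}" references itself`);
      } else if (earlier.has(ref)) {
        continue; // valid backward reference
      } else if (allIds.has(ref)) {
        errors.push(`step "${step.id}" has forward reference @${ref}`);
      } else {
        errors.push(`step "${step.id}" references unknown step @${ref}`);
      }
    }
    earlier.add(step.id);
  }
  return errors;
}
```

Because the function touches neither the database nor the adapter, a caller can run it before the egress gate and refuse a malformed chain without any side effects.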

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…art, CHANGELOG)

- cli-reference.md: brain create --file/--dir/--glob/--ext/--source-label/
  --max-bytes/--yes; run --chain section with spec + guarantees.
- concepts/ai-runs.md: "Run chains (v0.3)" section.
- docs/v0.3-files-and-chains-quickstart.md: end-to-end credential-free loop.
- CHANGELOG [Unreleased]: v0.3 Added/Fixed/Changed entries.
- PHASE-3 plan: exit criteria checked off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Live smoke caught it: a KO's contentHash (and id) folds in createdAt
(knowledge-object.ts), so every run produces a new KO id even for identical
content. The #7 fix made the agent_output envelope deterministic, but the run
cache still keyed inputs on the KO contentHash, so a chain's downstream step
(input = the prior step's freshly-created output KO) missed the cache every
run — only leaf steps cached.

Key the run cache on a content-addressed hash (contentType + content),
independent of createdAt. Two KOs with identical content now share a cached
adapter result, so an identical chain re-run makes zero adapter calls (proven
live: per_doc + synthesis both cached=true on re-run). KO contentHash is still
used for provenance.

Regression test: two KOs with identical content but different createdAt (=>
different ids/hashes) hit the cache. Full suite 319 passed.
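
The difference between the two hashes can be sketched as follows. The `KnowledgeObject` shape, the hash inputs, and the `RunCache` class are hypothetical simplifications; only the idea (key the cache on contentType + content, keep the createdAt-bearing hash for provenance) comes from the commit message.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified KO; content is assumed to be already canonically serialized.
interface KnowledgeObject {
  contentType: string;
  content: string;
  createdAt: string;
}

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Provenance hash: folds in createdAt, so it differs for every freshly created KO.
export function koContentHash(ko: KnowledgeObject): string {
  return sha256(JSON.stringify([ko.contentType, ko.content, ko.createdAt]));
}

// Cache key: content-addressed, independent of createdAt.
export function cacheInputKey(ko: KnowledgeObject): string {
  return sha256(JSON.stringify([ko.contentType, ko.content]));
}

// Toy run cache that counts adapter invocations.
export class RunCache {
  private results = new Map<string, string>();
  adapterCalls = 0;

  run(
    inputs: KnowledgeObject[],
    adapter: (texts: string[]) => string,
  ): { output: string; cached: boolean } {
    const key = inputs.map(cacheInputKey).join("|");
    const hit = this.results.get(key);
    if (hit !== undefined) return { output: hit, cached: true };

    this.adapterCalls += 1;
    const output = adapter(inputs.map((i) => i.content));
    this.results.set(key, output);
    return { output, cached: false };
  }
}
```

With this keying, a downstream step whose input KO was recreated with a new createdAt still hits the cache, which is the behavior the regression test in this commit checks.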

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rvation

Found during the v0.3 live smoke. The NOTICE-on-stdout bug is pre-existing and
affects all DB-touching federation commands; tracked for a follow-up fix in the
DB connection factory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Federation commands self-bootstrap their tables/indexes with CREATE TABLE/INDEX
IF NOT EXISTS. Against an already-initialized DB, Postgres emits NOTICE
("relation ... already exists, skipping") and the `postgres` driver prints it to
stdout, ahead of the command's JSON. This corrupted `--json` output for every
DB-touching federation command (a strict parser chokes on the NOTICE objects).

createDb now passes `onnotice: () => {}`, matching the existing createUtilitySql
factory in the same file. Verified live before/after against an initialized DB:
`brain create --file --json` previously emitted three NOTICE objects before the
JSON (unparseable); now emits only the JSON object (python3 -m json.tool parses).

Pre-existing issue, found during the v0.3 live smoke. Affects all DB-touching
federation commands, not just run --chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@arpan-mondal arpan-mondal merged commit 9449bd9 into phase-2/v0.2-handoff Jun 4, 2026