Skip to content

feat(ingest): gate Phase-1 CGO parse on backend — ASTWalker serves file-level extracts (S4)#445

Merged
jamestexas merged 2 commits into
mainfrom
feat/s4b-gate-cgo-parse
Jun 25, 2026
Merged

feat(ingest): gate Phase-1 CGO parse on backend — ASTWalker serves file-level extracts (S4)#445
jamestexas merged 2 commits into
mainfrom
feat/s4b-gate-cgo-parse

Conversation

@jamestexas

Copy link
Copy Markdown
Contributor

What

S4 (mache-33de70) — the commitment point of the CGO-removal campaign. Until now the tree-sitter parse ran unconditionally in Phase-1, before walker selection, so mounting a pre-parsed _ast db saved zero CGO: the parse + sitter ExtractContext/ExtractGoImports/ExtractFileLevelRefs always ran.

This gates all of it on e.astWalker == nil:

On the ASTWalker path no CGO runs in ingest — which also removes the mache-2y9w tree-sitter SIGSEGV surface for that path.

Proof

The existing projection parity gate, now made genuine: sitterToASTDB emits an _imports table (derived from the same parse, as leyline would) so the AST path's ExtractGoImports has the import set qualified-call resolution needs. TestASTQueryParity now ingests the fixture through the ASTWalker with the CGO parse skipped and asserts byte-for-byte identical projection (nodes, children, content/context, callers-via-imports, Properties) vs the sitter path. Full internal/ingest suite + -race green.

Scope / honest limit

This is the engine mechanism. Production source ingestion does not call SetASTWalker yet (only the parity tests do), so CGO still runs in the default mache <dir> path until a follow-up wires SelectWalker / an _ast-db-detection step into the source-ingest entry. This PR makes that activation a near one-line change with a parity-proven engine beneath it.

🤖 Generated with Claude Code

…le-level extracts (S4)

S4 (mache-33de70): until now the tree-sitter parse ran unconditionally in
Phase-1 (engine_treesitter.go), BEFORE walker selection, so mounting a
pre-parsed _ast db saved zero CGO — the parse + sitter ExtractContext /
ExtractGoImports / ExtractFileLevelRefs always ran. This gates all of it on
`e.astWalker == nil`:

- parallel-worker Phase-1: skip parser.SetLanguage + ParseCtx + the three
  sitter extracts when an ASTWalker backend is present
- processTreeSitterResult AST arm: serve context / imports / file-level-refs
  from SQL (ExtractContext, ExtractGoImports, ExtractFileLevelRefs — the
  last landed in #443) instead
- single-file path (the mache-2y9w SIGSEGV source): same gate

So on the ASTWalker path NO CGO runs in ingest — which also removes the
mache-2y9w tree-sitter SIGSEGV surface for that path.

Proven by the existing projection parity gate, now made genuine: emit an
_imports table into sitterToASTDB (derived from the same parse, as leyline
would) so the AST path's ExtractGoImports has the import set qualified-call
resolution needs. TestASTQueryParity now ingests the fixture through the
ASTWalker with the CGO parse SKIPPED and asserts byte-for-byte identical
projection (nodes, children, content/context, callers via imports,
Properties) vs the sitter path. Full ingest suite + -race green.

SCOPE / honest limit: this is the engine MECHANISM. Production source
ingestion does not call SetASTWalker yet (only the parity tests do), so CGO
still runs in the default `mache <dir>` path until a follow-up wires
SelectWalker / an _ast-db-detection step into the source-ingest entry. This
PR makes that activation a one-line change with a parity-proven engine
beneath it.

Claude-Session: https://claude.ai/code/session_01FhfjttLFYPzgs3aBG45eBT
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

find_smells (advisory)

Scoped to files changed in this PR. Rules below run on the standalone (no-LLO) cross-ref tables; _ast rules (cyclomatic_complexity, long_function, long_file, magic_int_in_comparison) are not exercised here.

fan_out_skew — 3 finding(s) in changed files
Source Node Metric
internal/ingest/engine_treesitter.go ingest/methods/Engine.ingestTreeSitterParallel/source 58
internal/ingest/engine_treesitter.go ingest/methods/Engine.processTreeSitterResult/source 43
internal/ingest/engine_treesitter.go ingest/methods/Engine.ingestTreeSitter/source 29

Rules: see docs/ARCHITECTURE.md for the full registry. Advisory only — these are heuristics, not gates.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR advances the ingest pipeline’s CGO-removal campaign by making Phase-1 tree-sitter parsing and related “file-level extracts” conditional on the selected backend. When an ASTWalker backend is present, Phase-1 skips the tree-sitter parse work and Phase-2 serves context, Go imports, and file-level refs from the mounted SQL _ast database instead; parity is reinforced by extending the AST parity test DB to include _imports.

Changes:

  • Gate Phase-1 tree-sitter parse + sitter-based context/imports/file-level-ref extraction on e.astWalker == nil (parallel + single-file paths).
  • On the ASTWalker path in processTreeSitterResult, populate context / Go imports / file-level refs via SQL-backed extractors.
  • Extend the parity-test helper DB schema to emit/populate _imports from the same sitter parse (Go), matching ASTWalker’s import-resolution expectations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
internal/ingest/engine_treesitter.go Gates tree-sitter parse/extracts on backend and serves equivalent extracts from AST SQL when using ASTWalker.
internal/ingest/ast_query_parity_test.go Adds _imports table to the synthetic AST DB and populates it for Go using sitter-derived imports to preserve parity under the new gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/ingest/engine_treesitter.go
Comment thread internal/ingest/engine_treesitter.go
jamestexas added a commit that referenced this pull request Jun 25, 2026
… activation plan

The CGO-removal step-4 commitment point's ENGINE work has landed and is
byte-parity-proven, but the ADR still read "gated on mache-33dc5f". Update it
to reflect reality and to record where activation goes (answering "where
should SetASTWalker be used?"):

- Interface widening (S3, #431/#432/#433) + Phase-1 parse gating (S4, #445):
  the Engine skips the CGO parse and serves context/imports/file-level-refs
  from SQL when an ASTWalker backend is present; TestASTQueryParity proves
  byte-identical projection with the parse skipped.
- What remains is activation + deletion, NOT the engine. Activation: insert a
  leyline-parse-first step in the read-only source-ingest fast path
  (cmd/mount.go, the SchemaUsesTreeSitter branch), then SetASTWalker /
  SelectWalker — because mache is its own parser at ingest time today, there
  is no _ast db to read until leyline produces one. Gated on leyline
  availability; absent -> today's tree-sitter path.

ARCHITECTURE.md: same status correction in the standalone-path note + ADR
table. The standalone "five rules" list is unchanged — duplicate_code (added
this cycle) needs _ast, so it's a paired-path rule, not standalone.

Claude-Session: https://claude.ai/code/session_01FhfjttLFYPzgs3aBG45eBT
…ree guard (Copilot #445)

Two valid review catches:

1. The Phase-1 worker still called runtime.LockOSThread() + sitter.NewParser()
   (itself a CGO call) before the e.astWalker==nil gate, so the "no CGO runs
   in ingest" claim wasn't literally true on the AST path. Both are now gated
   on e.astWalker==nil — with an ASTWalker backend the worker allocates no
   parser and takes no thread pin, so the AST path is genuinely CGO-free.

2. The sequential single-file path lacked the parallel worker's nil-tree
   guard: ParseCtx can return (nil, nil) for empty input, which
   processTreeSitterResult would deref as result.tree.RootNode() and panic.
   Mirrored the guard so it routes to the BROKEN_ fallback instead.

Build, full internal/ingest suite, and -race (parity + engine + reingest)
all green.

Claude-Session: https://claude.ai/code/session_01FhfjttLFYPzgs3aBG45eBT
jamestexas added a commit that referenced this pull request Jun 25, 2026
… activation plan (#446)

The CGO-removal step-4 commitment point's ENGINE work has landed and is
byte-parity-proven, but the ADR still read "gated on mache-33dc5f". Update it
to reflect reality and to record where activation goes (answering "where
should SetASTWalker be used?"):

- Interface widening (S3, #431/#432/#433) + Phase-1 parse gating (S4, #445):
  the Engine skips the CGO parse and serves context/imports/file-level-refs
  from SQL when an ASTWalker backend is present; TestASTQueryParity proves
  byte-identical projection with the parse skipped.
- What remains is activation + deletion, NOT the engine. Activation: insert a
  leyline-parse-first step in the read-only source-ingest fast path
  (cmd/mount.go, the SchemaUsesTreeSitter branch), then SetASTWalker /
  SelectWalker — because mache is its own parser at ingest time today, there
  is no _ast db to read until leyline produces one. Gated on leyline
  availability; absent -> today's tree-sitter path.

ARCHITECTURE.md: same status correction in the standalone-path note + ADR
table. The standalone "five rules" list is unchanged — duplicate_code (added
this cycle) needs _ast, so it's a paired-path rule, not standalone.

Claude-Session: https://claude.ai/code/session_01FhfjttLFYPzgs3aBG45eBT
@jamestexas jamestexas merged commit d390664 into main Jun 25, 2026
15 checks passed
@jamestexas jamestexas deleted the feat/s4b-gate-cgo-parse branch June 25, 2026 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants