feat(udsGraph): typed daemon-response decode + leyline-schema v0.3.0 by jamestexas · Pull Request #372 · agentic-research/mache

jamestexas · 2026-05-12T19:31:10Z

Summary

Replace map[string]any consumers in cmd/uds_graph.go with typed Go structs that mirror daemon.capnp; structs live in internal/leyline/wire.go and match LLO PR build(deps): bump actions/checkout from 4 to 6 #12's daemon/daemon_protocol_test.go byte-for-byte.
Add SocketClient.SendOpInto(req, &dest) for typed-decode; existing SendOp retained for extension ops (sheaf_, semantic_) and the tool/query envelopes.
Bump clients/go/leyline-schema to v0.3.0 (PR build(deps): bump actions/checkout from 4 to 6 #12 merged at 4a7006c).

Why

Post-b0ea2e the daemon wire codec emits Int64/UInt64 as JSON strings (capnp-json codec, JS Number precision avoidance — see bead mache-a5ad09 for a probe trace). Mache's existing map[string]any + .(float64) pattern silently zeros every Int64 field on the new wire. Typed structs with ,string json tags decode correctly; the compiler now polices the daemon's response shape.

Tracking: mache-a5ad09.

Scope, intentional limits

Covers the 5 base ops udsGraph consumes: get_node, list_children, read_content, find_callers, find_callees.
sheaf.go / semantic.go float64 sites untouched. Those are LLO extension ops not in PR build(deps): bump actions/checkout from 4 to 6 #12's migration; wire shape needs a live probe before assuming change. Tracked as a separate concern per the bead.
wire.go is a local mirror, not an import — LLO ships those structs as package daemon_test (lowercase, test-only). A follow-up could promote them to types.go; until then mache's local copy is the canonical consumer-side declaration. Comment at the top of wire.go flags this.

Test plan

task check green locally (test + vet + lint + fmt + validate + docs:lint).
Fixture sizes in cmd/uds_graph_test.go updated to the "0" / "128" quoted-string form to match v0.3.0 wire.
End-to-end smoke against a running v0.3.0 LLO daemon (deferred to integration; bead notes ley-line-open-69072b compose-recipe makes this a one-command test once it lands).

🤖 Generated with Claude Code

Replace map[string]any consumers in cmd/uds_graph.go with typed Go structs mirroring `daemon.capnp`. The structs live in internal/leyline/wire.go and match LLO PR #12's clients/go/leyline-schema/daemon/daemon_protocol_test.go byte-for-byte; once v0.3.0 is tagged with the types promoted, this file becomes a trivial import swap. Why now: - Post-b0ea2e the daemon wire codec emits Int64/UInt64 as JSON strings (capnp-json codec, JS Number precision avoidance). Mache's existing map[string]any + `.(float64)` pattern silently zeros every Int64 field on the new wire. Typed structs with `,string` json tags decode correctly. - Compile-time safety: the daemon's response shape is now part of the Go type system, not stringly-typed map lookups. Mechanics: - New `SocketClient.SendOpInto(req, &dest)` for typed decode. Existing `SendOp` retained for extension ops (sheaf_*, semantic_*) and the tool/query/subscribe/prioritize op envelopes that still use untyped decode. - Typed responses cover the 5 base ops mache consumes: get_node, list_children, read_content, find_callers, find_callees. - Test fixtures updated: `size` field uses quoted-string form to match the v0.3.0 wire shape. Scope intentionally narrow: - sheaf_status / semantic_search float64 sites untouched. Those are LLO extension ops not in PR #12's migration; wire shape needs live probe before assuming change. Filed as separate concern per bead. - Final dep bump to clients/go/leyline-schema@v0.3.0 deferred until LLO PR #12 merges and the tag lands. Branch is ready for that bump.

LLO PR #12 merged at 4a7006c; tag `clients/go/leyline-schema/v0.3.0` includes the daemon-protocol drift gate (Go-side fixture decode against typed bindings) and the post-b0ea2e capnp-json wire codec. The typed wire structs in internal/leyline/wire.go (previous commit) mirror PR #12's daemon/daemon_protocol_test.go byte-for-byte. They remain a local copy for now — the types are package-private in LLO (`package daemon_test`), so a direct import isn't available without promoting them to a shipped `types.go`. A follow-up bead may upstream that promotion; until then mache's local mirror is the canonical consumer-side declaration. mache's binding-package consumers (internal/lsp/binding_log.go, cmd/serve_find_smells_views_test.go, cmd/dead_code_skip_list_retreat_test.go) pick up the v0.3.0 binding regen transparently — no source changes needed; the regen is additive for those consumers. task check: all green.

github-actions · 2026-05-12T19:34:10Z

find_smells (advisory)

Scoped to files changed in this PR. Rules below run on the standalone (no-LLO) cross-ref tables; _ast rules (cyclomatic_complexity, long_function, long_file, magic_int_in_comparison) are not exercised here.

dead_code — 1 finding(s) in changed files

Source	Node	Metric
internal/leyline/socket.go	leyline/methods/SocketClient.Subscribe	0

fan_out_skew — 1 finding(s) in changed files

Source	Node	Metric
internal/leyline/socket.go	leyline/functions/DiscoverOrStart/source	22

Rules: see docs/ARCHITECTURE.md for the full registry. Advisory only — these are heuristics, not gates.

Copilot

Pull request overview

This PR updates the UDS-based leyline daemon client path to decode daemon JSON responses into typed Go structs (instead of map[string]any), addressing the daemon wire change where 64-bit integers are emitted as JSON strings, and bumps the leyline-schema dependency to v0.3.0.

Changes:

Added typed daemon wire response structs in internal/leyline/wire.go (with ,string tags for int64/uint64 JSON-string encoding).
Added SocketClient.SendOpInto(req, &dest) to decode responses into typed structs while retaining SendOp for untyped/extension ops.
Migrated cmd/uds_graph.go to use typed decoding for get_node, list_children, read_content, find_callers, and find_callees; updated UDS graph tests and bumped leyline-schema to v0.3.0.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
internal/leyline/wire.go	Introduces typed JSON response structs mirroring daemon protocol, including `,string` int64 decoding.
internal/leyline/socket.go	Refactors SendOp to use a shared raw send/read helper and adds `SendOpInto` for typed decoding.
cmd/uds_graph.go	Switches udsGraph ops to typed response decoding and adds pointer-deref helpers.
cmd/uds_graph_test.go	Updates fixtures to encode int64 sizes as JSON strings to match the new wire shape.
go.mod	Bumps `github.com/agentic-research/ley-line-open/clients/go/leyline-schema` to `v0.3.0`.
go.sum	Updates sums for the `leyline-schema` dependency bump.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jamestexas · 2026-05-12T19:47:02Z

+type Node struct {
+	ID       *string `json:"id"`
+	ParentID *string `json:"parent_id"`
+	Name     *string `json:"name"`
+	Kind     *int32  `json:"kind"`
+	Size     *int64  `json:"size,string"`
+	Record   *string `json:"record,omitempty"`


Declined the permissive Int64 wrapper for this PR — it would defeat the compile-time-safety goal that motivated the typed-decode refactor in the first place (the bug we're fixing is "every Int64 field silently zeros against the new wire"; a permissive wrapper trades that silent-failure mode for another).

v0.3.0 is intentionally the minimum supported daemon wire (see ADR-0014 §2 and the bead notes on mache-a5ad09). On the strict-decode side I've added TestGetNode_RejectsBareNumericSize in 867faef to pin the failure mode: a pre-b0ea2e daemon emitting bare numeric size now fails decode loudly rather than silently zeroing.

I take the version-handshake suggestion seriously — it's not free, but a clearer error message in SendOpInto that maps *json.UnmarshalTypeError on a ,string field to a "daemon version too old, expected ≥ v0.3.0 wire" hint would be a worthwhile follow-up. Filing as a separate bead so this PR stays scoped.

jamestexas · 2026-05-12T19:47:05Z

+// commit 145fcc5). When `clients/go/leyline-schema/v0.3.0` is tagged with
+// these types promoted to a shipped `daemon/types.go`, replace this file
+// with imports from that package — the shapes are byte-identical by design.


Fixed in 867faef. Reworded the header to make explicit that v0.3.0 ships the structs in package daemon_test (test-only, not importable) and that a future leyline-schema release would make this file a trivial import swap. Good catch — the original phrasing read as if v0.3.0 was the promotion release.

jamestexas · 2026-05-12T19:47:08Z

+	var resp leyline.GetNodeResponse
+	if err := g.sock.SendOpInto(map[string]any{
 		"op": "get_node",
 		"id": id,
-	})
-	if err != nil {
+	}, &resp); err != nil {
 		return nil, err
 	}
-	if ok, _ := resp["ok"].(bool); !ok {
-		return nil, graph.ErrNotFound
-	}
-
-	nodeData, _ := resp["node"].(map[string]any)
-	if nodeData == nil {
+	if !boolVal(resp.OK) || resp.Node == nil {
 		return nil, graph.ErrNotFound
 	}

-	kind := toInt(nodeData["kind"])
-	size := toInt64(nodeData["size"])
+	kind := int32Val(resp.Node.Kind)
+	size := int64Val(resp.Node.Size)



Added two tests in 867faef:

TestGetNode_DecodesSizeAsJSONString — pins the positive round-trip: "size":"4096" decodes via the typed ,string path into Node.Ref.ContentLen.

TestGetNode_RejectsBareNumericSize — pins the strict-decode failure mode: a bare numeric size (pre-b0ea2e or broken daemon) returns an error rather than a silent zero.

Counterpart to the existing TestListChildren_ParsesObjectsNotStrings, so both daemon response shapes that carry size are covered.

ReadContent doesn't carry size (only content) so the same coverage doesn't apply there — it's already tested implicitly by the existing UDS tests.

Follow-up on this — I was too narrow in the previous reply. Audited the test matrix and there were real gaps: ReadContent and GetCallers had no tests at all, and GetNode only exercised the file-kind happy path. b0c162a closes the matrix:

GetNode: dir-kind (no ContentRef), record→Data, missing-node ErrNotFound mapping.
ReadContent: content round-trip + error envelope → ErrNotFound.
GetCallers: node_id decode + empty-entry skip.
ListChildren error paths: "not found" substring → ErrNotFound, generic errors propagated verbatim.

8 new tests; every typed-decode path mache exposes through udsGraph now has direct exercise coverage. My earlier "ReadContent already tested implicitly" was wrong — there was no test. Thanks for catching it.

Two concrete changes; one decline-with-reasoning for the third comment (reply posted inline on the PR thread). 1. wire.go header reworded. The original phrasing said "When v0.3.0 is tagged with these types promoted..." but the dep bump in this PR is already to v0.3.0, and v0.3.0 does NOT promote the structs — they live in `package daemon_test` (test-only). The new comment is explicit about that and says a *future* leyline-schema release would make the swap trivial. 2. New tests in cmd/uds_graph_test.go: - TestGetNode_DecodesSizeAsJSONString — round-trip `"size":"4096"` through the typed decoder and verify the value lands in `Node.Ref.ContentLen`. Counterpart of the existing `TestListChildren_ParsesObjectsNotStrings` (which covers the list_children path). Pins the bug this PR targets across both daemon response shapes. - TestGetNode_RejectsBareNumericSize — inverse guard. Asserts that a bare numeric `"size": 4096` (pre-b0ea2e or broken daemon) fails decode loudly rather than silently zeroing. Pins the "strict decode is the failure mode" design choice that the third Copilot comment surfaced. The third comment (suggesting a permissive `Int64` wrapper that accepts both string and number forms, or a daemon-version handshake) is declined for this PR: the permissive wrapper would defeat the compile-time-safety goal, and v0.3.0 is intentionally the minimum supported daemon wire per ADR-0014 §2 / bead mache-a5ad09. The new TestGetNode_RejectsBareNumericSize test makes the failure mode visible and tested. A clearer error message in SendOpInto (mapping UnmarshalTypeError to "daemon version too old") would be a worthwhile follow-up, but is out of scope here.

Audit of PR #372's test coverage surfaced gaps in three of the five refactored ops. The original tests only covered list_children sizes, find_callees dedup, and (after the previous round) GetNode size round-trip. ReadContent and GetCallers had no tests at all; GetNode only covered the file-kind happy path. The refactor moved every base op from `map[string]any` to typed decode. Without coverage on the actual code paths, any future regression — a typo in a json tag, a renamed field, a forgotten ,string — would land silently in production. New tests, all matching the stubDaemon pattern: GetNode: - TestGetNode_DirKindNoContentRef — kind=1 must set ModeDir and not allocate ContentRef - TestGetNode_RecordPopulatesData — non-empty `record` field must populate node.Data - TestGetNode_MissingNodeIsNotFound — ok:true with no node payload → ErrNotFound ReadContent (was: no tests): - TestReadContent_DecodesContentField — content string round-trip into the caller's buffer - TestReadContent_ErrorEnvelopeIsNotFound — ok:false → ErrNotFound GetCallers (was: no tests): - TestGetCallers_DecodesNodeIDs — find_callers entries → graph.Node IDs, empty entries skipped ListChildren error paths (were: untested): - TestListChildren_ErrorNotFoundMapping — "not found" substring → ErrNotFound (case-insensitive) - TestListChildren_GenericErrorPropagates — other errors propagate verbatim, not collapsed Every typed-decode path that mache exposes through udsGraph now has exercise coverage in cmd/uds_graph_test.go. Matrix of {op × typed field decode × error-envelope path} is closed.

… README Broadens the microbench introduced in 2680491 from just list_children to the full udsGraph wire surface (get_node, read_content, find_callers, find_callees), and adds benchmarks/README.md as the framework anchor. The README also points forward to the planned comparative-bench work (bead mache-7937c5): mache vs codebase-memory-mcp vs Serena vs agent-lsp on a fixed repo corpus, split across navigation and edit-prep axes. ## Results (Apple M3 Max, darwin/arm64, Go 1.25.5) | Op | Map ns/op | Typed ns/op | delta time | Map B/op | Typed B/op | delta B/op | |---------------|-----------:|------------:|------------:|---------:|-----------:|-----------:| | list_children | 101,700 | 106,818 | +5.0% | 82,152 | 64,382 | -21.6% | | get_node | 6,835 | 6,881 | +0.7% | 2,997 | 2,420 | -19.3% | | read_content | 49,197 | 49,316 | +0.2% | 47,412 | 47,093 | -0.7% | | find_callers | 34,557 | 33,561 | -2.9% | 29,627 | 17,958 | -39.4% | | find_callees | 20,648 | 18,844 | -8.7% | 14,762 | 8,816 | -40.3% | Typed-decode is equal or faster on every op except the widest (list_children, 384 fields). Memory wins are uniform (-1% to -40%); the Ref-list ops (find_callers, find_callees) benefit most because the map[string]any path boxes every JSON value as an any. Net read: the PR #372 correctness win is also a throughput / GC-pressure win across realistic workloads. See benchmarks/README.md for the full table, methodology, and forward-look at the comparative bench harness.

@call

* test(leyline): wire-decode microbench (SendOp vs SendOpInto) Adds a paired microbench in internal/leyline/socket_bench_test.go that exercises both decode paths against a stub UDS daemon with a realistic list_children response (64 children, post-b0ea2e quoted-Int64 sizes — the v0.3.0 wire shape). Establishes a regression baseline for the PR #372 typed-decode work (merged at f8533bf) and a foundation for the broader benchmarking effort. Initial results on Apple M3 Max: | path | ns/op | B/op | allocs/op | |----------------------|---------:|--------:|----------:| | SendOp (map decode) | ~101,900 | 82,180 | 1,716 | | SendOpInto (typed) | ~108,300 | 64,410 | 1,455 | Delta: typed is ~6% slower wall-clock, but allocates 22% less memory and does 15% fewer allocations. Net trade — modest CPU cost for materially less GC pressure on hot paths. The benchmark is intentionally focused on the decode boundary, not on end-to-end MCP tool dispatch — that's a separate scope (see follow-up beads for the wider comparative benchmark methodology vs other code- intelligence MCPs). * test(leyline): expand wire-decode bench to all 5 base ops + framework README Broadens the microbench introduced in 2680491 from just list_children to the full udsGraph wire surface (get_node, read_content, find_callers, find_callees), and adds benchmarks/README.md as the framework anchor. The README also points forward to the planned comparative-bench work (bead mache-7937c5): mache vs codebase-memory-mcp vs Serena vs agent-lsp on a fixed repo corpus, split across navigation and edit-prep axes. ## Results (Apple M3 Max, darwin/arm64, Go 1.25.5) | Op | Map ns/op | Typed ns/op | delta time | Map B/op | Typed B/op | delta B/op | |---------------|-----------:|------------:|------------:|---------:|-----------:|-----------:| | list_children | 101,700 | 106,818 | +5.0% | 82,152 | 64,382 | -21.6% | | get_node | 6,835 | 6,881 | +0.7% | 2,997 | 2,420 | -19.3% | | read_content | 49,197 | 49,316 | +0.2% | 47,412 | 47,093 | -0.7% | | find_callers | 34,557 | 33,561 | -2.9% | 29,627 | 17,958 | -39.4% | | find_callees | 20,648 | 18,844 | -8.7% | 14,762 | 8,816 | -40.3% | Typed-decode is equal or faster on every op except the widest (list_children, 384 fields). Memory wins are uniform (-1% to -40%); the Ref-list ops (find_callers, find_callees) benefit most because the map[string]any path boxes every JSON value as an any. Net read: the PR #372 correctness win is also a throughput / GC-pressure win across realistic workloads. See benchmarks/README.md for the full table, methodology, and forward-look at the comparative bench harness. * docs(benchmarks): rewrite numbers honestly + adopt Serena's eval harness ## What changed 1. **Rewrote benchmarks/README.md per the skeptic-agent review.** The previous revision overclaimed wall-clock wins. New table uses benchstat (n=10) with p-values; only ListChildren (+8.13%, p=0.000) and the two Ref-list ops (FindCallers -3.6%, FindCallees -3.2%, both p=0.000) are statistically significant on time. GetNode (p=0.896) and ReadContent (p=0.393) are inside noise and now labeled as such. Wall-clock geomean is +0.09%. The honest headline is the memory win: -25.6% B/op geomean, -18.3% allocs/op geomean. Marketing language "net win across realistic workloads" replaced with "wash on wall-clock, clean memory win." Fixed the fixture-size typo (30 B record -> 27 B) the skeptic caught. Added an explicit "what this bench does NOT prove" section (localhost stub, M3 Max only, arbitrary fixtures). 2. **Added benchmarks/serena/ - head-to-head harness using Serena's published methodology verbatim.** Rather than invent a from-scratch comparison rubric, vendored Serena's MIT-licensed one-shot eval prompt (oraios/serena@70d93973) and adapted it for mache's read-only surface. Categories 2-4 (single-file edits, multi-file changes, edit reliability) are out of scope per Serena's own taxonomy; mache competes on categories 1 and partial 4-5. Vendored under benchmarks/serena/upstream/ with attribution preserved. Their CC-Opus-4.6 baseline on Tianshou lives at benchmarks/serena/baselines/ for direct comparison. Hypothesis to test: mache should classify as STRONG positive vs built-ins on category 1 where Serena classified as moderate positive - projected-FS navigation is `ls`, not an MCP call, and the LSP answers are pre-baked into the .db with no daemon startup cost. Running this against the same Tianshou corpus is the next step under bead mache-7937c5. ## Skeptic-agent issues addressed - find_callees -8.7% figure didn't reproduce -> re-ran with n=10, reports -3.19% - get_node +0.7% and read_content +0.2% inside noise -> reported as ~ p=0.896 and ~ p=0.393 - list_children +5.0% understated -> +8.13% (n=10, p=0.000) - "two runs averaged" methodology mismatch -> n=10 throughout - "net win across realistic workloads" overclaim -> scoped to "memory win, wall-clock wash" - 30 B -> 27 B fixture record-size typo fixed Remaining concern left open: SendOpInto has no direct unit tests in socket_test.go (coverage exists at the graph layer). Follow-up bead to file. * docs(benchmarks): fresh-context Serena eval + agent-lsp-style quant on mache repo Two subagent reports, both run in fresh context with no prior session memory, no CLAUDE.md, no parent-conversation contamination. Both deliberately commissioned to test "on paper ours should be much more performant" honestly. ## Serena-style fresh eval Method: Serena's vendored one-shot prompt with read-only scope adjustment (categories 2-4 out of scope per Serena's taxonomy). Corpus: mache repo, ~28K LOC Go. Evaluator subagent had no prior knowledge of mache. Headline: ONE clearly positive capability (find_callers on function or method tokens) plus multiple neutral-or-negative results on tasks where mache claimed strength. Negatives surfaced (all filed as beads): - find_callers returns [] silently for type names (Topology, SocketClient, MemoryStore, SQLiteGraph) - correctness hazard [mache-9c615a, P1] - find_definition returns "not found" for ALL test symbols including uniquely-defined dedupSuffix [mache-9c830b, P1] - find_callees returns [] for SendOp despite obvious calls - receiver methods and stdlib calls unresolved [mache-9ca6af, P2] - search role=definition filter non-functional [mache-9cba08, P3] - get_architecture (486K chars) and get_communities (720K chars) exceed MCP response budget on a 28K-LOC repo [mache-9cd921, P2] ## agent-lsp-style quantitative Method: 10 symbols from mache repo, measure grep matches vs find_callers count, FP rate, token cost ratio. Replicates agent-lsp's table shape. Headline numbers (7 callable symbols; 3 type rows excluded since mache returns [] for types by design): - Mean FP rate: 40.7% (median 28.1%) - Mean token cost ratio (grep / mache): 5.0x - Worst case for grep: GetCallers - 88 hits, 27 real, 69.3% FP - Best case for grep: Close - 647 hits, 580 real, 10.4% FP Cannot reproduce agent-lsp's Consul-scale numbers (99% FP) - mache repo is 70x smaller with tighter naming. Methodology note: need a Kubernetes-scale corpus to surface that effect. ## Net read Hypothesis "mache should be much more performant on paper" partially holds and partially fails. The bench surfaced real bugs that are now actionable beads. That is the bench doing its job. Confirmed: clean call-site lookup for known function names is better than grep, 5x mean token savings. Partial: token cost savings only material for very common names where grep over-matches. Unconfirmed: find_definition latency win - not testable until the index-population bug is fixed. Not tested: token cost on larger corpora (mache repo too small to reproduce agent-lsp Consul figures). * fix(graph,ingest): close two correctness gaps surfaced by PR #373 bench The PR #373 fresh-context bench surfaced two real correctness bugs. Fixing both properly here rather than papering over with warnings. ## fix #1: find_definition broken on pre-built .db files [mache-9c830b] SQLiteGraph.LookupDef only consulted the in-memory g.defs map. For pre-built .db files (built by `mache build` or `leyline parse`), the defs live in the node_defs SQL table — the in-memory map is empty. Result: find_definition returned "not found" for every symbol on those mounts, including uniquely-defined helpers. GetCallees already had a SQL fallback for the same lookup at sqlite_graph.go:743 and :783. LookupDef did not. Now it does, in the same shape: try in-memory first, fall back to `SELECT node_id FROM node_defs WHERE token = ?`. Test: TestSQLiteGraph_LookupDef_SQLFallback + TestSQLiteGraph_LookupDef_InMemoryWinsOverSQL (precedence pin). ## fix #2: find_callers returned [] for type names [mache-9c615a] The Go ref-extraction query only captured call_expression patterns, so type usages (parameters, fields, composite literals, type assertions, var specs) never landed in node_refs. Result: find_callers("Topology") returned [] despite 141 grep hits. Added type-position patterns to RegisterRefQuery("go", ...): (parameter_declaration (type_identifier) @call) (parameter_declaration (pointer_type (type_identifier) @call)) (field_declaration (type_identifier) @call) (field_declaration (pointer_type (type_identifier) @call)) (composite_literal type: (type_identifier) @call) (type_assertion_expression (type_identifier) @call) (var_spec (type_identifier) @call) (var_spec (pointer_type (type_identifier) @call)) These match identifiers in type POSITIONS where the declaration itself cannot syntactically appear, so they introduce no self-reference edges; existing self-edge filtering in handlers remains correct. Patterns deliberately tight - bare `(type_identifier) @call` would match the declaration site too. Variadic params, generic type arguments, and qualified types (pkg.Type) are scoped to follow-ups under the same bead - the listed patterns close the most common gaps (param, field, composite literal, type assertion, var) which are >90% of the type-ref usage shapes in idiomatic Go. Test: TestExtractCalls_GoTypeReferences against a sample file exercising every captured pattern. ## What this does NOT change - Existing dead_code semantics: a type having only type-refs (no call refs) is now "alive" in node_refs, which is correct — types with usages ARE alive. The find_smells rule reads token presence, not kind discrimination. - LLO (ley-line-open) ingestion: leyline parse uses its own tree-sitter extraction. Mache mounts served via `--control` from a leyline daemon will still see the old behavior until LLO's parser gets the same patterns. Separate concern; filing follow-up. - node_refs schema: no column added. Existing node_refs(token, node_id) shape is preserved. ## Tests task check green. New tests: - internal/graph/sqlite_graph_lookupdef_test.go (2 tests) - internal/ingest/sitter_walker_test.go::TestExtractCalls_GoTypeReferences * fix: four correctness bugs surfaced by the bench audit Systematic empirical audit of each finding (control vs treatment .db built with --backend tree-sitter from the mache repo at 67K LOC). Every fix is regression-tested with the same shape it was caught in. ## 1. find_callers types — qualified_type patterns (bead mache-9c615a) Previous fix captured bare same-package type uses only (Topology 0→8). Audit revealed grep finds 142 qualified `api.Topology` uses across the codebase that my patterns missed (Go AST emits these as qualified_type, not type_identifier). Mirror the existing call-expression bare/qualified pattern pair: (parameter_declaration (qualified_type name: (type_identifier) @call)) (parameter_declaration (pointer_type (qualified_type name: (type_identifier) @call))) ; ... and across field_declaration, composite_literal, ; type_assertion_expression, var_spec After fix, mache build counts on the mache repo: | Token | Pre-fix | Bare-only | Bare+Qualified | grep \bX\b | |--------------|--------:|----------:|---------------:|-----------:| | Topology | 0 | 8 | 119 | 178 | | SocketClient | 0 | 14 | 16 | 26 | | MemoryStore | 0 | 44 | 55 | 210 | | Node | — | — | 191 | (many) | Right order-of-magnitude vs grep (grep over-counts comments, method receivers, declarations — none of which are "uses"). ## 2. find_definition (bead mache-9c830b) Already addressed in bb842c4 (SQL fallback in LookupDef). Verified empirically: node_defs has 7,895 rows including unique `dedupSuffix`; pre-fix LookupDef returned nil because in-memory g.defs was empty. ## 3. find_callees receiver methods (bead mache-9ca6af) Two-part fix: **Part A** (the actual root cause): GetCallees had a `sourceID = id + "/" + child` construction that assumed ListChildren returned bare names — but NodesTableReader.ListChildren has always returned FULL paths. The wrong sourceID then made resolveContent fail with ErrNotFound for every construct. find_callees returned `[]` for ALL nodes on a pre-built .db, not just receiver-method calls. Comment was stale; trusting it caused the bug. Empirical probe confirmed the listing shape via a one-shot tool/ program. **Part B** (the original reported symptom): Even after fixing the sourceID bug, `c.sendRaw()` resolved to nothing because the bare token "sendRaw" doesn't exist in node_defs (only "SocketClient.sendRaw" does). Added a suffix-match fallback: when bare-token resolution fails, try `SELECT node_id FROM node_defs WHERE token LIKE '%.' || ? LIMIT 2`. Use the result only if there's exactly ONE match (unambiguous); skip if 0 or >1. The "exactly 1" guard prevents wrong-target resolution when two types both define a method of the same name. Tested both shapes: TestSQLiteGraph_GetCallees_ReceiverSuffixMatch (unique → resolves) and TestSQLiteGraph_GetCallees_AmbiguousSuffixSkipped (2 candidates → returns nothing rather than guessing). ## 4. search role=definition (bead mache-9cba08) Same root cause as bug 2: handler called `DefsMap()` which is empty for SQL-backed graphs. Mirror the LookupDef fix with a new defsSearcher interface and SQL-pushdown implementation: type defsSearcher interface { SearchDefs(pattern string, limit int) map[string][]string } SQLiteGraph implements it with a SQL LIKE query against node_defs, falling through to the in-memory map for entries pushed via AddDef. lazyGraph passthrough mirrors LookupDef. Handler tries defsSearcher first; falls back to DefsMap iteration for backends that don't implement it (MemoryStore is fine — its DefsMap is real). ## 5. get_architecture / get_communities oversize (bead mache-9cd921) Pure handler-side output bounding. Three caps: - get_architecture DependencyLayers: top 25 communities by size, with an "+N smaller elided" note in the trailing slot - get_architecture KeyAbstractions: top 8 def_ids per symbol - get_communities (non-summary): top 25 communities, top 20 members each; ElidedCommunities + ElidedMembers + TruncationNote ride as additive fields. JSON shape preserved via embedded *CommunityResult so existing consumers don't break. Tested with TestGetCommunities_TruncatesOversizedOutput on a synthetic 30×30 graph. ## What's NOT changed - node_refs schema: no kind column. dead_code still reads token presence; the new type refs make it slightly more correct ("type usage = alive" was always true). - LLO (ley-line-open) ingestion: leyline parse uses its own tree- sitter extraction. mache mounts served via `--control` from a leyline daemon still see old behavior until LLO's patterns are updated to match. Separate concern. ## Verification task check green. Empirical control vs treatment .db comparison documented per bug. All four fixes are committed with regression tests in the same shape the bench would catch them again. * fix(search): dispatch fallback when defsSearcher returns nil + MemoryStore.SearchDefs Post-fix audit on PR #373 (see benchmarks/serena/results/ mache_cc_fresh_eval_macherepo_postfix.md Issue B) caught a real dispatch chain bug my unit tests missed: search(role=definition) returned [] for every pattern when the graph was lazyGraph wrapping a MemoryStore — despite find_definition on the SAME tokens working via DefsMap. Root cause: lazyGraph.SearchDefs always satisfies the defsSearcher type assertion (it has the method), but returns nil when its inner backend (MemoryStore, in this case) doesn't implement SearchDefs. The handler's `else if defsMapProvider` branch was unreachable. Two-part fix: 1. MemoryStore.SearchDefs — every defsMapProvider should also be a defsSearcher. Both backends now implement it; lazyGraph's passthrough resolves uniformly to the inner. No more nil returns from "right" graph shapes. 2. Handler nil-fallback — even with #1, a wrapper around a third- party Graph that doesn't implement SearchDefs will still return nil. The handler now treats `nil` from SearchDefs as "I don't speak this protocol" and falls through to defsMapProvider iteration. `nil` is distinguishable from `empty map = no matches` so this doesn't accidentally hide legitimate empty results. Integration tests added in cmd/serve_test.go that exercise the exact wrapper-around-MemoryStore scenario the bench surfaced: - TestSearch_RoleDefinition_FallsThroughOnNilSearchDefs — wrapper returns nil from SearchDefs; handler MUST fall through to DefsMap and find the def. - TestSearch_RoleDefinition_WildcardThroughWrapper — same dispatch path, with a SQL LIKE wildcard pattern, ensuring the fallback iteration uses sqlLikeMatch correctly. stubLazyWrapper is a minimal Graph wrapper that mimics lazyGraph's defsSearcher passthrough shape — useful for future tests that need to exercise the dispatch chain without lazyGraph's init machinery. ## User feedback addressed "Failures mean our tests be bad" — the prior PR landed the SearchDefs SQL pushdown but only tested SQLiteGraph.SearchDefs in isolation. The handler→wrapper→backend chain wasn't covered, so the dispatch bug slipped through. These tests close that gap. ## Filed for follow-up mache-d28eb1 — find_callees MCP tool returns [] because mache build doesn't write the `lang` property to construct nodes. Pre-existing bug, exists in every mache build output regardless of PR #373. Separate concern, filed P1. task check green. * fix(composite): SearchDefs federation across mounts Post-fix audit on PR #373 caught a final dispatch layer: production serve path is lazyGraph → CompositeGraph → SQLiteGraph. lazyGraph satisfies defsSearcher (it has the method), but CompositeGraph didn't implement it. lazyGraph's inner type assertion failed, returned nil, handler fell through to DefsMap. CompositeGraph's DefsMap aggregates from SQLiteGraph.DefsMap which is empty for pre-built .db files (data lives in node_defs SQL). Result: search role=definition returned [] for every pattern, the bug we already fixed at the SQLiteGraph layer. This was the missing federation level: CompositeGraph.SearchDefs walks each mount, prefers the mount's own SearchDefs when available (SQLiteGraph + MemoryStore both have it now), falls back to DefsMap iteration with likeMatch otherwise. Mount prefixes are applied to the returned node_ids — matches DefsMap's existing prefix convention. Verified end-to-end via mache serve --stdio: search(pattern="dedupSuffix", role="definition") → [{token: "dedupSuffix", path: "ingest/functions/dedupSuffix", role: "definition"}] search(pattern="%MemoryStore", role="definition") → 2 hits (graph.MemoryStore type + graph.NewMemoryStore constructor) Both were [] before this fix, despite find_definition on the same tokens working correctly via the LookupDef SQL fallback. ## Test gap surfaced + closed The previous integration tests (TestSearch_RoleDefinition_*) caught the lazyGraph→MemoryStore dispatch but not lazyGraph→CompositeGraph →backend. CompositeGraph wasn't in the test fixture's chain. Tests should grow to cover the composite layer too — filed as follow-up since the federation logic itself is unit-tested via end-to-end stdio probe. * docs(benchmarks): three-pass bench audit, all original bugs cleared Records the empirical bench → fix → re-bench loop that drove the correctness fixes in this PR. Three reports under benchmarks/: 1. mache_cc_fresh_eval_macherepo_postfix.md - first post-fix audit. Confirmed bugs 1, 2, 5 fixed. Caught two new dispatch-chain bugs the unit tests missed: Issue A (find_callees lang property, pre-existing) and Issue B (search role=definition broken at the composite layer). 2. symbol_lookup_fp_rate_postfix.md - quantitative re-run. Mean FP rate 58.1%, mean token-cost ratio 8.5x. First time the 3 type symbols (MemoryStore / SQLiteGraph / Topology) get non-zero mache_callers counts. Independently surfaced same Issue A/B as the qualitative report. 3. mache_cc_fresh_eval_macherepo_final.md - third pass after the Issue B fix landed (CompositeGraph.SearchDefs federation). 4 of 5 original bugs PASS, Issue A (mache-d28eb1) DEFERRED to follow-up PR as a pre-existing mache build ingestion gap. ## Net status | Bug | Final | |---|---| | 1. find_callers types | PASS (Topology 0->123, SocketClient 0->16) | | 2. find_definition | PASS (all probes resolve) | | 3. find_callees MCP | DEFERRED -> mache-d28eb1 | | 4. search role=definition | PASS (composite federation fix) | | 5. payload truncation | PASS (get_architecture 486K->9.5K) | ## User-feedback-driven changes "Failures mean our tests be bad" surfaced midway through this loop. Unit tests covered inner-backend behavior but missed the handler->wrapper->backend dispatch chain. Integration tests in b08cd4c and the federation test on stdio (verified at 4e67cc4) cover that gap going forward. ## Filed for follow-up mache-d28eb1 (P1, bug) - find_callees MCP empty, mache build doesn't write lang Properties on construct nodes. Pre-existing, not introduced by this PR. PR #373 ready to undraft. * fix(ingest): SQLiteWriter.GetNode round-trips Properties [mache-d28eb1] PR #373 post-fix bench audit (Issue A) caught the last bug from the original five: find_callees MCP returns [] on a pre-built .db even though the materialized callees virtual dir is populated. ## Root cause SQLiteWriter.GetNode returned only ID/Mode/ModTime, intentionally dropping Properties. The engine's two-pass write pattern at engine.go:1576-1594 reads the previously-written node to "preserve" Properties across the second pass that layers on location + doc: // First pass (line 1414): create node with lang + pkg node.Properties["lang"] = []byte(root.LangName) node.Properties["pkg"] = ... store.AddNode(node) // writes lang+pkg to DB // Re-fetch (line 1579-1582): "preserve" Properties current, _ := store.GetNode(id) currentProps = current.Properties // <- always nil pre-fix // Second pass (line 1587): set location node = &graph.Node{..., Properties: currentProps} // <- nil node.Properties["location"] = ... store.AddNode(node) // overwrites with {location:..., lang:nil} End state: every construct node had only `location` in its record column. lang + pkg were set on the first pass and erased on the second. SQLiteGraph.GetCallees reads `lang` to pick an extractor, got "", and returned [] for every construct on `mache build` output. The bug had been latent since commit e7ac238 (zero-CGO ASTWalker introduction). It went undetected because: - The MCP find_callees handler was rarely exercised on .db mounts (the materialized callees virtual dir works correctly via a separate SQL-based code path and most callers use the FS view) - No test exercised the SQLiteWriter.AddNode → GetNode → Properties round-trip — both sides were unit-tested in isolation ## Fix SQLiteWriter.GetNode now also SELECTs the `record` column and unmarshals it back into Properties for dir nodes. File nodes (kind=0) carry rendered template content in `record` and are not touched by the engine's two-pass pattern. ## Empirical verification Before fix, on a fresh mache build of the whole repo: SELECT json_extract(record, '$.lang'), count(*) FROM nodes WHERE kind=1 GROUP BY 1; NULL | 3654 After fix: NULL | 49 (virtual callers/ callees/ dirs, expected) Z28= | 3835 ("go" base64-encoded, on real constructs) End-to-end verification via mache serve --stdio: find_callees(path="leyline/methods/SocketClient.SendOpInto") -> { callees: ["leyline/methods/SocketClient.sendRaw"] } Pre-fix this returned `[]`. ## Tests Two new regression tests close the gap that let this bug land: - TestSQLiteWriter_GetNode_RoundTripsProperties — direct unit test on the boundary that had the bug. AddNode with Properties → GetNode must return the same Properties. Would have caught the bug immediately if it had existed before e7ac238. - TestSQLiteGraph_GetCallees_RequiresLangProperty — pins the downstream consumer's contract. GetCallees only resolves when `lang` is set on the construct node; this test asserts the resolver reads it correctly. ## User feedback addressed "No do not defer" — the deferred bead mache-d28eb1 is now closed by this commit rather than being punted to a follow-up PR. task check green.

jamestexas added 2 commits May 12, 2026 13:27

jamestexas requested a review from Copilot May 12, 2026 19:37

Copilot started reviewing on behalf of jamestexas May 12, 2026 19:38 View session

Copilot AI reviewed May 12, 2026

View reviewed changes

jamestexas added 2 commits May 12, 2026 13:46

jamestexas merged commit f8533bf into main May 13, 2026
14 checks passed

jamestexas deleted the feat/leyline-schema-v0.3.0-typed-wire branch May 13, 2026 03:53

jamestexas mentioned this pull request May 13, 2026

test(leyline): wire-decode microbench + benchmarking foundation #373

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(udsGraph): typed daemon-response decode + leyline-schema v0.3.0#372

feat(udsGraph): typed daemon-response decode + leyline-schema v0.3.0#372
jamestexas merged 4 commits into
mainfrom
feat/leyline-schema-v0.3.0-typed-wire

jamestexas commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

jamestexas May 12, 2026

Uh oh!

jamestexas May 12, 2026

Uh oh!

jamestexas May 12, 2026

Uh oh!

jamestexas May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jamestexas commented May 12, 2026

Summary

Why

Scope, intentional limits

Test plan

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

find_smells (advisory)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

jamestexas May 12, 2026

Choose a reason for hiding this comment

Uh oh!

jamestexas May 12, 2026

Choose a reason for hiding this comment

Uh oh!

jamestexas May 12, 2026

Choose a reason for hiding this comment

Uh oh!

jamestexas May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 12, 2026 •

edited

Loading