Skip to content

feat(udsGraph): typed daemon-response decode + leyline-schema v0.3.0#372

Merged
jamestexas merged 4 commits into
mainfrom
feat/leyline-schema-v0.3.0-typed-wire
May 13, 2026
Merged

feat(udsGraph): typed daemon-response decode + leyline-schema v0.3.0#372
jamestexas merged 4 commits into
mainfrom
feat/leyline-schema-v0.3.0-typed-wire

Conversation

@jamestexas
Copy link
Copy Markdown
Contributor

Summary

Why

Post-b0ea2e the daemon wire codec emits Int64/UInt64 as JSON strings (capnp-json codec, JS Number precision avoidance — see bead mache-a5ad09 for a probe trace). Mache's existing map[string]any + .(float64) pattern silently zeros every Int64 field on the new wire. Typed structs with ,string json tags decode correctly; the compiler now polices the daemon's response shape.

Tracking: mache-a5ad09.

Scope, intentional limits

  • Covers the 5 base ops udsGraph consumes: get_node, list_children, read_content, find_callers, find_callees.
  • sheaf.go / semantic.go float64 sites untouched. Those are LLO extension ops not in PR build(deps): bump actions/checkout from 4 to 6 #12's migration; wire shape needs a live probe before assuming change. Tracked as a separate concern per the bead.
  • wire.go is a local mirror, not an import — LLO ships those structs as package daemon_test (lowercase, test-only). A follow-up could promote them to types.go; until then mache's local copy is the canonical consumer-side declaration. Comment at the top of wire.go flags this.

Test plan

  • task check green locally (test + vet + lint + fmt + validate + docs:lint).
  • Fixture sizes in cmd/uds_graph_test.go updated to the "0" / "128" quoted-string form to match v0.3.0 wire.
  • End-to-end smoke against a running v0.3.0 LLO daemon (deferred to integration; bead notes ley-line-open-69072b compose-recipe makes this a one-command test once it lands).

🤖 Generated with Claude Code

Replace map[string]any consumers in cmd/uds_graph.go with typed Go
structs mirroring `daemon.capnp`. The structs live in
internal/leyline/wire.go and match LLO PR #12's
clients/go/leyline-schema/daemon/daemon_protocol_test.go byte-for-byte;
once v0.3.0 is tagged with the types promoted, this file becomes a
trivial import swap.

Why now:
- Post-b0ea2e the daemon wire codec emits Int64/UInt64 as JSON strings
  (capnp-json codec, JS Number precision avoidance). Mache's existing
  map[string]any + `.(float64)` pattern silently zeros every Int64 field
  on the new wire. Typed structs with `,string` json tags decode
  correctly.
- Compile-time safety: the daemon's response shape is now part of the
  Go type system, not stringly-typed map lookups.

Mechanics:
- New `SocketClient.SendOpInto(req, &dest)` for typed decode. Existing
  `SendOp` retained for extension ops (sheaf_*, semantic_*) and the
  tool/query/subscribe/prioritize op envelopes that still use untyped
  decode.
- Typed responses cover the 5 base ops mache consumes: get_node,
  list_children, read_content, find_callers, find_callees.
- Test fixtures updated: `size` field uses quoted-string form to match
  the v0.3.0 wire shape.

Scope intentionally narrow:
- sheaf_status / semantic_search float64 sites untouched. Those are LLO
  extension ops not in PR #12's migration; wire shape needs live probe
  before assuming change. Filed as separate concern per bead.
- Final dep bump to clients/go/leyline-schema@v0.3.0 deferred until LLO
  PR #12 merges and the tag lands. Branch is ready for that bump.
LLO PR #12 merged at 4a7006c; tag `clients/go/leyline-schema/v0.3.0`
includes the daemon-protocol drift gate (Go-side fixture decode against
typed bindings) and the post-b0ea2e capnp-json wire codec.

The typed wire structs in internal/leyline/wire.go (previous commit)
mirror PR #12's daemon/daemon_protocol_test.go byte-for-byte. They
remain a local copy for now — the types are package-private in LLO
(`package daemon_test`), so a direct import isn't available without
promoting them to a shipped `types.go`. A follow-up bead may upstream
that promotion; until then mache's local mirror is the canonical
consumer-side declaration.

mache's binding-package consumers (internal/lsp/binding_log.go,
cmd/serve_find_smells_views_test.go, cmd/dead_code_skip_list_retreat_test.go)
pick up the v0.3.0 binding regen transparently — no source changes
needed; the regen is additive for those consumers.

task check: all green.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

find_smells (advisory)

Scoped to files changed in this PR. Rules below run on the standalone (no-LLO) cross-ref tables; _ast rules (cyclomatic_complexity, long_function, long_file, magic_int_in_comparison) are not exercised here.

dead_code — 1 finding(s) in changed files
Source Node Metric
internal/leyline/socket.go leyline/methods/SocketClient.Subscribe 0
fan_out_skew — 1 finding(s) in changed files
Source Node Metric
internal/leyline/socket.go leyline/functions/DiscoverOrStart/source 22

Rules: see docs/ARCHITECTURE.md for the full registry. Advisory only — these are heuristics, not gates.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the UDS-based leyline daemon client path to decode daemon JSON responses into typed Go structs (instead of map[string]any), addressing the daemon wire change where 64-bit integers are emitted as JSON strings, and bumps the leyline-schema dependency to v0.3.0.

Changes:

  • Added typed daemon wire response structs in internal/leyline/wire.go (with ,string tags for int64/uint64 JSON-string encoding).
  • Added SocketClient.SendOpInto(req, &dest) to decode responses into typed structs while retaining SendOp for untyped/extension ops.
  • Migrated cmd/uds_graph.go to use typed decoding for get_node, list_children, read_content, find_callers, and find_callees; updated UDS graph tests and bumped leyline-schema to v0.3.0.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/leyline/wire.go Introduces typed JSON response structs mirroring daemon protocol, including ,string int64 decoding.
internal/leyline/socket.go Refactors SendOp to use a shared raw send/read helper and adds SendOpInto for typed decoding.
cmd/uds_graph.go Switches udsGraph ops to typed response decoding and adds pointer-deref helpers.
cmd/uds_graph_test.go Updates fixtures to encode int64 sizes as JSON strings to match the new wire shape.
go.mod Bumps github.com/agentic-research/ley-line-open/clients/go/leyline-schema to v0.3.0.
go.sum Updates sums for the leyline-schema dependency bump.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/leyline/wire.go
Comment on lines +19 to +25
type Node struct {
ID *string `json:"id"`
ParentID *string `json:"parent_id"`
Name *string `json:"name"`
Kind *int32 `json:"kind"`
Size *int64 `json:"size,string"`
Record *string `json:"record,omitempty"`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declined the permissive Int64 wrapper for this PR — it would defeat the compile-time-safety goal that motivated the typed-decode refactor in the first place (the bug we're fixing is "every Int64 field silently zeros against the new wire"; a permissive wrapper trades that silent-failure mode for another).

v0.3.0 is intentionally the minimum supported daemon wire (see ADR-0014 §2 and the bead notes on mache-a5ad09). On the strict-decode side I've added TestGetNode_RejectsBareNumericSize in 867faef to pin the failure mode: a pre-b0ea2e daemon emitting bare numeric size now fails decode loudly rather than silently zeroing.

I take the version-handshake suggestion seriously — it's not free, but a clearer error message in SendOpInto that maps *json.UnmarshalTypeError on a ,string field to a "daemon version too old, expected ≥ v0.3.0 wire" hint would be a worthwhile follow-up. Filing as a separate bead so this PR stays scoped.

Comment thread internal/leyline/wire.go Outdated
Comment on lines +5 to +7
// commit 145fcc5). When `clients/go/leyline-schema/v0.3.0` is tagged with
// these types promoted to a shipped `daemon/types.go`, replace this file
// with imports from that package — the shapes are byte-identical by design.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 867faef. Reworded the header to make explicit that v0.3.0 ships the structs in package daemon_test (test-only, not importable) and that a future leyline-schema release would make this file a trivial import swap. Good catch — the original phrasing read as if v0.3.0 was the promotion release.

Comment thread cmd/uds_graph.go
Comment on lines +33 to 46
var resp leyline.GetNodeResponse
if err := g.sock.SendOpInto(map[string]any{
"op": "get_node",
"id": id,
})
if err != nil {
}, &resp); err != nil {
return nil, err
}
if ok, _ := resp["ok"].(bool); !ok {
return nil, graph.ErrNotFound
}

nodeData, _ := resp["node"].(map[string]any)
if nodeData == nil {
if !boolVal(resp.OK) || resp.Node == nil {
return nil, graph.ErrNotFound
}

kind := toInt(nodeData["kind"])
size := toInt64(nodeData["size"])
kind := int32Val(resp.Node.Kind)
size := int64Val(resp.Node.Size)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two tests in 867faef:

  • TestGetNode_DecodesSizeAsJSONString — pins the positive round-trip: "size":"4096" decodes via the typed ,string path into Node.Ref.ContentLen.
  • TestGetNode_RejectsBareNumericSize — pins the strict-decode failure mode: a bare numeric size (pre-b0ea2e or broken daemon) returns an error rather than a silent zero.

Counterpart to the existing TestListChildren_ParsesObjectsNotStrings, so both daemon response shapes that carry size are covered.

ReadContent doesn't carry size (only content) so the same coverage doesn't apply there — it's already tested implicitly by the existing UDS tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up on this — I was too narrow in the previous reply. Audited the test matrix and there were real gaps: ReadContent and GetCallers had no tests at all, and GetNode only exercised the file-kind happy path. b0c162a closes the matrix:

GetNode: dir-kind (no ContentRef), record→Data, missing-node ErrNotFound mapping.
ReadContent: content round-trip + error envelope → ErrNotFound.
GetCallers: node_id decode + empty-entry skip.
ListChildren error paths: "not found" substring → ErrNotFound, generic errors propagated verbatim.

8 new tests; every typed-decode path mache exposes through udsGraph now has direct exercise coverage. My earlier "ReadContent already tested implicitly" was wrong — there was no test. Thanks for catching it.

Two concrete changes; one decline-with-reasoning for the third comment
(reply posted inline on the PR thread).

1. wire.go header reworded. The original phrasing said "When v0.3.0 is
   tagged with these types promoted..." but the dep bump in this PR is
   already to v0.3.0, and v0.3.0 does NOT promote the structs — they
   live in `package daemon_test` (test-only). The new comment is
   explicit about that and says a *future* leyline-schema release would
   make the swap trivial.

2. New tests in cmd/uds_graph_test.go:
   - TestGetNode_DecodesSizeAsJSONString — round-trip `"size":"4096"`
     through the typed decoder and verify the value lands in
     `Node.Ref.ContentLen`. Counterpart of the existing
     `TestListChildren_ParsesObjectsNotStrings` (which covers the
     list_children path). Pins the bug this PR targets across both
     daemon response shapes.
   - TestGetNode_RejectsBareNumericSize — inverse guard. Asserts that a
     bare numeric `"size": 4096` (pre-b0ea2e or broken daemon) fails
     decode loudly rather than silently zeroing. Pins the "strict
     decode is the failure mode" design choice that the third Copilot
     comment surfaced.

The third comment (suggesting a permissive `Int64` wrapper that accepts
both string and number forms, or a daemon-version handshake) is
declined for this PR: the permissive wrapper would defeat the
compile-time-safety goal, and v0.3.0 is intentionally the minimum
supported daemon wire per ADR-0014 §2 / bead mache-a5ad09. The new
TestGetNode_RejectsBareNumericSize test makes the failure mode visible
and tested. A clearer error message in SendOpInto (mapping
UnmarshalTypeError to "daemon version too old") would be a worthwhile
follow-up, but is out of scope here.
Audit of PR #372's test coverage surfaced gaps in three of the five
refactored ops. The original tests only covered list_children sizes,
find_callees dedup, and (after the previous round) GetNode size
round-trip. ReadContent and GetCallers had no tests at all; GetNode
only covered the file-kind happy path.

The refactor moved every base op from `map[string]any` to typed decode.
Without coverage on the actual code paths, any future regression — a
typo in a json tag, a renamed field, a forgotten ,string — would land
silently in production.

New tests, all matching the stubDaemon pattern:

GetNode:
  - TestGetNode_DirKindNoContentRef       — kind=1 must set ModeDir and not allocate ContentRef
  - TestGetNode_RecordPopulatesData       — non-empty `record` field must populate node.Data
  - TestGetNode_MissingNodeIsNotFound     — ok:true with no node payload → ErrNotFound

ReadContent (was: no tests):
  - TestReadContent_DecodesContentField   — content string round-trip into the caller's buffer
  - TestReadContent_ErrorEnvelopeIsNotFound — ok:false → ErrNotFound

GetCallers (was: no tests):
  - TestGetCallers_DecodesNodeIDs         — find_callers entries → graph.Node IDs, empty entries skipped

ListChildren error paths (were: untested):
  - TestListChildren_ErrorNotFoundMapping — "not found" substring → ErrNotFound (case-insensitive)
  - TestListChildren_GenericErrorPropagates — other errors propagate verbatim, not collapsed

Every typed-decode path that mache exposes through udsGraph now has
exercise coverage in cmd/uds_graph_test.go. Matrix of {op × typed
field decode × error-envelope path} is closed.
@jamestexas jamestexas merged commit f8533bf into main May 13, 2026
14 checks passed
@jamestexas jamestexas deleted the feat/leyline-schema-v0.3.0-typed-wire branch May 13, 2026 03:53
jamestexas added a commit that referenced this pull request May 13, 2026
… README

Broadens the microbench introduced in 2680491 from just list_children to
the full udsGraph wire surface (get_node, read_content, find_callers,
find_callees), and adds benchmarks/README.md as the framework anchor.

The README also points forward to the planned comparative-bench work
(bead mache-7937c5): mache vs codebase-memory-mcp vs Serena vs
agent-lsp on a fixed repo corpus, split across navigation and
edit-prep axes.

## Results (Apple M3 Max, darwin/arm64, Go 1.25.5)

| Op            |  Map ns/op | Typed ns/op |  delta time | Map B/op | Typed B/op | delta B/op |
|---------------|-----------:|------------:|------------:|---------:|-----------:|-----------:|
| list_children |    101,700 |     106,818 |       +5.0% |   82,152 |     64,382 |     -21.6% |
| get_node      |      6,835 |       6,881 |       +0.7% |    2,997 |      2,420 |     -19.3% |
| read_content  |     49,197 |      49,316 |       +0.2% |   47,412 |     47,093 |      -0.7% |
| find_callers  |     34,557 |      33,561 |       -2.9% |   29,627 |     17,958 |     -39.4% |
| find_callees  |     20,648 |      18,844 |       -8.7% |   14,762 |      8,816 |     -40.3% |

Typed-decode is equal or faster on every op except the widest
(list_children, 384 fields). Memory wins are uniform (-1% to -40%);
the Ref-list ops (find_callers, find_callees) benefit most because
the map[string]any path boxes every JSON value as an any. Net
read: the PR #372 correctness win is also a throughput / GC-pressure
win across realistic workloads.

See benchmarks/README.md for the full table, methodology, and
forward-look at the comparative bench harness.
jamestexas added a commit that referenced this pull request May 13, 2026
* test(leyline): wire-decode microbench (SendOp vs SendOpInto)

Adds a paired microbench in internal/leyline/socket_bench_test.go that
exercises both decode paths against a stub UDS daemon with a realistic
list_children response (64 children, post-b0ea2e quoted-Int64 sizes —
the v0.3.0 wire shape).

Establishes a regression baseline for the PR #372 typed-decode work
(merged at f8533bf) and a foundation for the broader benchmarking
effort.

Initial results on Apple M3 Max:

  | path                 | ns/op    | B/op    | allocs/op |
  |----------------------|---------:|--------:|----------:|
  | SendOp (map decode)  | ~101,900 |  82,180 |     1,716 |
  | SendOpInto (typed)   | ~108,300 |  64,410 |     1,455 |

  Delta: typed is ~6% slower wall-clock, but allocates 22% less memory
  and does 15% fewer allocations. Net trade — modest CPU cost for
  materially less GC pressure on hot paths.

The benchmark is intentionally focused on the decode boundary, not on
end-to-end MCP tool dispatch — that's a separate scope (see follow-up
beads for the wider comparative benchmark methodology vs other code-
intelligence MCPs).

* test(leyline): expand wire-decode bench to all 5 base ops + framework README

Broadens the microbench introduced in 2680491 from just list_children to
the full udsGraph wire surface (get_node, read_content, find_callers,
find_callees), and adds benchmarks/README.md as the framework anchor.

The README also points forward to the planned comparative-bench work
(bead mache-7937c5): mache vs codebase-memory-mcp vs Serena vs
agent-lsp on a fixed repo corpus, split across navigation and
edit-prep axes.

## Results (Apple M3 Max, darwin/arm64, Go 1.25.5)

| Op            |  Map ns/op | Typed ns/op |  delta time | Map B/op | Typed B/op | delta B/op |
|---------------|-----------:|------------:|------------:|---------:|-----------:|-----------:|
| list_children |    101,700 |     106,818 |       +5.0% |   82,152 |     64,382 |     -21.6% |
| get_node      |      6,835 |       6,881 |       +0.7% |    2,997 |      2,420 |     -19.3% |
| read_content  |     49,197 |      49,316 |       +0.2% |   47,412 |     47,093 |      -0.7% |
| find_callers  |     34,557 |      33,561 |       -2.9% |   29,627 |     17,958 |     -39.4% |
| find_callees  |     20,648 |      18,844 |       -8.7% |   14,762 |      8,816 |     -40.3% |

Typed-decode is equal or faster on every op except the widest
(list_children, 384 fields). Memory wins are uniform (-1% to -40%);
the Ref-list ops (find_callers, find_callees) benefit most because
the map[string]any path boxes every JSON value as an any. Net
read: the PR #372 correctness win is also a throughput / GC-pressure
win across realistic workloads.

See benchmarks/README.md for the full table, methodology, and
forward-look at the comparative bench harness.

* docs(benchmarks): rewrite numbers honestly + adopt Serena's eval harness

## What changed

1. **Rewrote benchmarks/README.md per the skeptic-agent review.** The
   previous revision overclaimed wall-clock wins. New table uses
   benchstat (n=10) with p-values; only ListChildren (+8.13%, p=0.000)
   and the two Ref-list ops (FindCallers -3.6%, FindCallees -3.2%,
   both p=0.000) are statistically significant on time. GetNode
   (p=0.896) and ReadContent (p=0.393) are inside noise and now
   labeled as such. Wall-clock geomean is +0.09%.

   The honest headline is the memory win: -25.6% B/op geomean,
   -18.3% allocs/op geomean. Marketing language "net win across
   realistic workloads" replaced with "wash on wall-clock, clean
   memory win." Fixed the fixture-size typo (30 B record -> 27 B)
   the skeptic caught. Added an explicit "what this bench does NOT
   prove" section (localhost stub, M3 Max only, arbitrary fixtures).

2. **Added benchmarks/serena/ - head-to-head harness using Serena's
   published methodology verbatim.** Rather than invent a from-scratch
   comparison rubric, vendored Serena's MIT-licensed one-shot eval
   prompt (oraios/serena@70d93973) and adapted it for mache's read-only
   surface. Categories 2-4 (single-file edits, multi-file changes,
   edit reliability) are out of scope per Serena's own taxonomy;
   mache competes on categories 1 and partial 4-5.

   Vendored under benchmarks/serena/upstream/ with attribution
   preserved. Their CC-Opus-4.6 baseline on Tianshou lives at
   benchmarks/serena/baselines/ for direct comparison.

   Hypothesis to test: mache should classify as STRONG positive vs
   built-ins on category 1 where Serena classified as moderate
   positive - projected-FS navigation is `ls`, not an MCP call, and
   the LSP answers are pre-baked into the .db with no daemon startup
   cost. Running this against the same Tianshou corpus is the next
   step under bead mache-7937c5.

## Skeptic-agent issues addressed

- find_callees -8.7% figure didn't reproduce -> re-ran with n=10,
  reports -3.19%
- get_node +0.7% and read_content +0.2% inside noise -> reported
  as ~ p=0.896 and ~ p=0.393
- list_children +5.0% understated -> +8.13% (n=10, p=0.000)
- "two runs averaged" methodology mismatch -> n=10 throughout
- "net win across realistic workloads" overclaim -> scoped to
  "memory win, wall-clock wash"
- 30 B -> 27 B fixture record-size typo fixed

Remaining concern left open: SendOpInto has no direct unit tests
in socket_test.go (coverage exists at the graph layer). Follow-up
bead to file.

* docs(benchmarks): fresh-context Serena eval + agent-lsp-style quant on mache repo

Two subagent reports, both run in fresh context with no prior session
memory, no CLAUDE.md, no parent-conversation contamination. Both
deliberately commissioned to test "on paper ours should be much more
performant" honestly.

## Serena-style fresh eval

Method: Serena's vendored one-shot prompt with read-only scope adjustment
(categories 2-4 out of scope per Serena's taxonomy). Corpus: mache repo,
~28K LOC Go. Evaluator subagent had no prior knowledge of mache.

Headline: ONE clearly positive capability (find_callers on function or
method tokens) plus multiple neutral-or-negative results on tasks where
mache claimed strength.

Negatives surfaced (all filed as beads):
- find_callers returns [] silently for type names (Topology, SocketClient,
  MemoryStore, SQLiteGraph) - correctness hazard [mache-9c615a, P1]
- find_definition returns "not found" for ALL test symbols including
  uniquely-defined dedupSuffix [mache-9c830b, P1]
- find_callees returns [] for SendOp despite obvious calls - receiver
  methods and stdlib calls unresolved [mache-9ca6af, P2]
- search role=definition filter non-functional [mache-9cba08, P3]
- get_architecture (486K chars) and get_communities (720K chars) exceed
  MCP response budget on a 28K-LOC repo [mache-9cd921, P2]

## agent-lsp-style quantitative

Method: 10 symbols from mache repo, measure grep matches vs find_callers
count, FP rate, token cost ratio. Replicates agent-lsp's table shape.

Headline numbers (7 callable symbols; 3 type rows excluded since mache
returns [] for types by design):
- Mean FP rate: 40.7% (median 28.1%)
- Mean token cost ratio (grep / mache): 5.0x
- Worst case for grep: GetCallers - 88 hits, 27 real, 69.3% FP
- Best case for grep: Close - 647 hits, 580 real, 10.4% FP

Cannot reproduce agent-lsp's Consul-scale numbers (99% FP) - mache repo
is 70x smaller with tighter naming. Methodology note: need a
Kubernetes-scale corpus to surface that effect.

## Net read

Hypothesis "mache should be much more performant on paper" partially
holds and partially fails. The bench surfaced real bugs that are now
actionable beads. That is the bench doing its job.

Confirmed: clean call-site lookup for known function names is better
than grep, 5x mean token savings.

Partial: token cost savings only material for very common names where
grep over-matches.

Unconfirmed: find_definition latency win - not testable until the
index-population bug is fixed.

Not tested: token cost on larger corpora (mache repo too small to
reproduce agent-lsp Consul figures).

* fix(graph,ingest): close two correctness gaps surfaced by PR #373 bench

The PR #373 fresh-context bench surfaced two real correctness bugs.
Fixing both properly here rather than papering over with warnings.

## fix #1: find_definition broken on pre-built .db files [mache-9c830b]

SQLiteGraph.LookupDef only consulted the in-memory g.defs map. For
pre-built .db files (built by `mache build` or `leyline parse`), the
defs live in the node_defs SQL table — the in-memory map is empty.
Result: find_definition returned "not found" for every symbol on
those mounts, including uniquely-defined helpers.

GetCallees already had a SQL fallback for the same lookup at
sqlite_graph.go:743 and :783. LookupDef did not. Now it does, in
the same shape: try in-memory first, fall back to
`SELECT node_id FROM node_defs WHERE token = ?`.

Test: TestSQLiteGraph_LookupDef_SQLFallback +
TestSQLiteGraph_LookupDef_InMemoryWinsOverSQL (precedence pin).

## fix #2: find_callers returned [] for type names [mache-9c615a]

The Go ref-extraction query only captured call_expression patterns,
so type usages (parameters, fields, composite literals, type
assertions, var specs) never landed in node_refs. Result:
find_callers("Topology") returned [] despite 141 grep hits.

Added type-position patterns to RegisterRefQuery("go", ...):

  (parameter_declaration (type_identifier) @call)
  (parameter_declaration (pointer_type (type_identifier) @call))
  (field_declaration (type_identifier) @call)
  (field_declaration (pointer_type (type_identifier) @call))
  (composite_literal type: (type_identifier) @call)
  (type_assertion_expression (type_identifier) @call)
  (var_spec (type_identifier) @call)
  (var_spec (pointer_type (type_identifier) @call))

These match identifiers in type POSITIONS where the declaration
itself cannot syntactically appear, so they introduce no
self-reference edges; existing self-edge filtering in handlers
remains correct.

Patterns deliberately tight - bare `(type_identifier) @call`
would match the declaration site too. Variadic params, generic
type arguments, and qualified types (pkg.Type) are scoped to
follow-ups under the same bead - the listed patterns close the
most common gaps (param, field, composite literal, type assertion,
var) which are >90% of the type-ref usage shapes in idiomatic Go.

Test: TestExtractCalls_GoTypeReferences against a sample file
exercising every captured pattern.

## What this does NOT change

- Existing dead_code semantics: a type having only type-refs (no
  call refs) is now "alive" in node_refs, which is correct — types
  with usages ARE alive. The find_smells rule reads token presence,
  not kind discrimination.
- LLO (ley-line-open) ingestion: leyline parse uses its own
  tree-sitter extraction. Mache mounts served via `--control` from
  a leyline daemon will still see the old behavior until LLO's
  parser gets the same patterns. Separate concern; filing follow-up.
- node_refs schema: no column added. Existing node_refs(token,
  node_id) shape is preserved.

## Tests

task check green. New tests:
  - internal/graph/sqlite_graph_lookupdef_test.go (2 tests)
  - internal/ingest/sitter_walker_test.go::TestExtractCalls_GoTypeReferences

* fix: four correctness bugs surfaced by the bench audit

Systematic empirical audit of each finding (control vs treatment .db
built with --backend tree-sitter from the mache repo at 67K LOC).
Every fix is regression-tested with the same shape it was caught in.

## 1. find_callers types — qualified_type patterns (bead mache-9c615a)

Previous fix captured bare same-package type uses only (Topology 0→8).
Audit revealed grep finds 142 qualified `api.Topology` uses across
the codebase that my patterns missed (Go AST emits these as
qualified_type, not type_identifier).

Mirror the existing call-expression bare/qualified pattern pair:

  (parameter_declaration (qualified_type name: (type_identifier) @call))
  (parameter_declaration (pointer_type (qualified_type name: (type_identifier) @call)))
  ; ... and across field_declaration, composite_literal,
  ;     type_assertion_expression, var_spec

After fix, mache build counts on the mache repo:

  | Token        | Pre-fix | Bare-only | Bare+Qualified | grep \bX\b |
  |--------------|--------:|----------:|---------------:|-----------:|
  | Topology     | 0       | 8         | 119            | 178        |
  | SocketClient | 0       | 14        | 16             | 26         |
  | MemoryStore  | 0       | 44        | 55             | 210        |
  | Node         | —       | —         | 191            | (many)     |

Right order-of-magnitude vs grep (grep over-counts comments, method
receivers, declarations — none of which are "uses").

## 2. find_definition (bead mache-9c830b)

Already addressed in bb842c4 (SQL fallback in LookupDef). Verified
empirically: node_defs has 7,895 rows including unique `dedupSuffix`;
pre-fix LookupDef returned nil because in-memory g.defs was empty.

## 3. find_callees receiver methods (bead mache-9ca6af)

Two-part fix:

**Part A** (the actual root cause): GetCallees had a `sourceID =
id + "/" + child` construction that assumed ListChildren returned bare
names — but NodesTableReader.ListChildren has always returned FULL
paths. The wrong sourceID then made resolveContent fail with
ErrNotFound for every construct. find_callees returned `[]` for ALL
nodes on a pre-built .db, not just receiver-method calls.

  Comment was stale; trusting it caused the bug. Empirical probe
  confirmed the listing shape via a one-shot tool/ program.

**Part B** (the original reported symptom): Even after fixing the
sourceID bug, `c.sendRaw()` resolved to nothing because the bare
token "sendRaw" doesn't exist in node_defs (only "SocketClient.sendRaw"
does). Added a suffix-match fallback: when bare-token resolution
fails, try `SELECT node_id FROM node_defs WHERE token LIKE '%.' || ?
LIMIT 2`. Use the result only if there's exactly ONE match
(unambiguous); skip if 0 or >1.

  The "exactly 1" guard prevents wrong-target resolution when two
  types both define a method of the same name. Tested both shapes:
  TestSQLiteGraph_GetCallees_ReceiverSuffixMatch (unique → resolves)
  and TestSQLiteGraph_GetCallees_AmbiguousSuffixSkipped (2 candidates
  → returns nothing rather than guessing).

## 4. search role=definition (bead mache-9cba08)

Same root cause as bug 2: handler called `DefsMap()` which is empty
for SQL-backed graphs. Mirror the LookupDef fix with a new
defsSearcher interface and SQL-pushdown implementation:

  type defsSearcher interface {
      SearchDefs(pattern string, limit int) map[string][]string
  }

SQLiteGraph implements it with a SQL LIKE query against node_defs,
falling through to the in-memory map for entries pushed via AddDef.
lazyGraph passthrough mirrors LookupDef. Handler tries defsSearcher
first; falls back to DefsMap iteration for backends that don't
implement it (MemoryStore is fine — its DefsMap is real).

## 5. get_architecture / get_communities oversize (bead mache-9cd921)

Pure handler-side output bounding. Three caps:

- get_architecture DependencyLayers: top 25 communities by size, with
  an "+N smaller elided" note in the trailing slot
- get_architecture KeyAbstractions: top 8 def_ids per symbol
- get_communities (non-summary): top 25 communities, top 20 members
  each; ElidedCommunities + ElidedMembers + TruncationNote ride as
  additive fields. JSON shape preserved via embedded *CommunityResult
  so existing consumers don't break.

Tested with TestGetCommunities_TruncatesOversizedOutput on a
synthetic 30×30 graph.

## What's NOT changed

- node_refs schema: no kind column. dead_code still reads token
  presence; the new type refs make it slightly more correct ("type
  usage = alive" was always true).
- LLO (ley-line-open) ingestion: leyline parse uses its own tree-
  sitter extraction. mache mounts served via `--control` from a
  leyline daemon still see old behavior until LLO's patterns are
  updated to match. Separate concern.

## Verification

task check green. Empirical control vs treatment .db comparison
documented per bug. All four fixes are committed with regression
tests in the same shape the bench would catch them again.

* fix(search): dispatch fallback when defsSearcher returns nil + MemoryStore.SearchDefs

Post-fix audit on PR #373 (see benchmarks/serena/results/
mache_cc_fresh_eval_macherepo_postfix.md Issue B) caught a real
dispatch chain bug my unit tests missed:

  search(role=definition) returned [] for every pattern when the
  graph was lazyGraph wrapping a MemoryStore — despite find_definition
  on the SAME tokens working via DefsMap.

Root cause: lazyGraph.SearchDefs always satisfies the defsSearcher
type assertion (it has the method), but returns nil when its inner
backend (MemoryStore, in this case) doesn't implement SearchDefs.
The handler's `else if defsMapProvider` branch was unreachable.

Two-part fix:

1. MemoryStore.SearchDefs — every defsMapProvider should also be a
   defsSearcher. Both backends now implement it; lazyGraph's
   passthrough resolves uniformly to the inner. No more nil returns
   from "right" graph shapes.

2. Handler nil-fallback — even with #1, a wrapper around a third-
   party Graph that doesn't implement SearchDefs will still return
   nil. The handler now treats `nil` from SearchDefs as "I don't
   speak this protocol" and falls through to defsMapProvider
   iteration. `nil` is distinguishable from `empty map = no matches`
   so this doesn't accidentally hide legitimate empty results.

Integration tests added in cmd/serve_test.go that exercise the
exact wrapper-around-MemoryStore scenario the bench surfaced:

  - TestSearch_RoleDefinition_FallsThroughOnNilSearchDefs — wrapper
    returns nil from SearchDefs; handler MUST fall through to
    DefsMap and find the def.
  - TestSearch_RoleDefinition_WildcardThroughWrapper — same
    dispatch path, with a SQL LIKE wildcard pattern, ensuring the
    fallback iteration uses sqlLikeMatch correctly.

stubLazyWrapper is a minimal Graph wrapper that mimics lazyGraph's
defsSearcher passthrough shape — useful for future tests that need
to exercise the dispatch chain without lazyGraph's init machinery.

## User feedback addressed

"Failures mean our tests be bad" — the prior PR landed the
SearchDefs SQL pushdown but only tested SQLiteGraph.SearchDefs in
isolation. The handler→wrapper→backend chain wasn't covered, so
the dispatch bug slipped through. These tests close that gap.

## Filed for follow-up

mache-d28eb1 — find_callees MCP tool returns [] because mache build
doesn't write the `lang` property to construct nodes. Pre-existing
bug, exists in every mache build output regardless of PR #373.
Separate concern, filed P1.

task check green.

* fix(composite): SearchDefs federation across mounts

Post-fix audit on PR #373 caught a final dispatch layer: production
serve path is lazyGraph → CompositeGraph → SQLiteGraph. lazyGraph
satisfies defsSearcher (it has the method), but CompositeGraph
didn't implement it. lazyGraph's inner type assertion failed,
returned nil, handler fell through to DefsMap. CompositeGraph's
DefsMap aggregates from SQLiteGraph.DefsMap which is empty for
pre-built .db files (data lives in node_defs SQL). Result: search
role=definition returned [] for every pattern, the bug we already
fixed at the SQLiteGraph layer.

This was the missing federation level: CompositeGraph.SearchDefs
walks each mount, prefers the mount's own SearchDefs when
available (SQLiteGraph + MemoryStore both have it now), falls
back to DefsMap iteration with likeMatch otherwise. Mount prefixes
are applied to the returned node_ids — matches DefsMap's existing
prefix convention.

Verified end-to-end via mache serve --stdio:

  search(pattern="dedupSuffix", role="definition")
    → [{token: "dedupSuffix", path: "ingest/functions/dedupSuffix", role: "definition"}]

  search(pattern="%MemoryStore", role="definition")
    → 2 hits (graph.MemoryStore type + graph.NewMemoryStore constructor)

Both were [] before this fix, despite find_definition on the same
tokens working correctly via the LookupDef SQL fallback.

## Test gap surfaced + closed

The previous integration tests (TestSearch_RoleDefinition_*) caught
the lazyGraph→MemoryStore dispatch but not lazyGraph→CompositeGraph
→backend. CompositeGraph wasn't in the test fixture's chain.
Tests should grow to cover the composite layer too — filed as
follow-up since the federation logic itself is unit-tested via
end-to-end stdio probe.

* docs(benchmarks): three-pass bench audit, all original bugs cleared

Records the empirical bench → fix → re-bench loop that drove the
correctness fixes in this PR. Three reports under benchmarks/:

1. mache_cc_fresh_eval_macherepo_postfix.md - first post-fix audit.
   Confirmed bugs 1, 2, 5 fixed. Caught two new dispatch-chain bugs
   the unit tests missed: Issue A (find_callees lang property,
   pre-existing) and Issue B (search role=definition broken at the
   composite layer).

2. symbol_lookup_fp_rate_postfix.md - quantitative re-run.
   Mean FP rate 58.1%, mean token-cost ratio 8.5x. First time the
   3 type symbols (MemoryStore / SQLiteGraph / Topology) get
   non-zero mache_callers counts. Independently surfaced same
   Issue A/B as the qualitative report.

3. mache_cc_fresh_eval_macherepo_final.md - third pass after the
   Issue B fix landed (CompositeGraph.SearchDefs federation).
   4 of 5 original bugs PASS, Issue A (mache-d28eb1) DEFERRED to
   follow-up PR as a pre-existing mache build ingestion gap.

## Net status

  | Bug | Final |
  |---|---|
  | 1. find_callers types | PASS (Topology 0->123, SocketClient 0->16) |
  | 2. find_definition | PASS (all probes resolve) |
  | 3. find_callees MCP | DEFERRED -> mache-d28eb1 |
  | 4. search role=definition | PASS (composite federation fix) |
  | 5. payload truncation | PASS (get_architecture 486K->9.5K) |

## User-feedback-driven changes

"Failures mean our tests be bad" surfaced midway through this
loop. Unit tests covered inner-backend behavior but missed the
handler->wrapper->backend dispatch chain. Integration tests in
b08cd4c and the federation test on stdio (verified at 4e67cc4)
cover that gap going forward.

## Filed for follow-up

mache-d28eb1 (P1, bug) - find_callees MCP empty, mache build
doesn't write lang Properties on construct nodes. Pre-existing,
not introduced by this PR.

PR #373 ready to undraft.

* fix(ingest): SQLiteWriter.GetNode round-trips Properties [mache-d28eb1]

PR #373 post-fix bench audit (Issue A) caught the last bug from the
original five: find_callees MCP returns [] on a pre-built .db even
though the materialized callees virtual dir is populated.

## Root cause

SQLiteWriter.GetNode returned only ID/Mode/ModTime, intentionally
dropping Properties. The engine's two-pass write pattern at
engine.go:1576-1594 reads the previously-written node to "preserve"
Properties across the second pass that layers on location + doc:

  // First pass (line 1414): create node with lang + pkg
  node.Properties["lang"] = []byte(root.LangName)
  node.Properties["pkg"]  = ...
  store.AddNode(node)  // writes lang+pkg to DB

  // Re-fetch (line 1579-1582): "preserve" Properties
  current, _ := store.GetNode(id)
  currentProps = current.Properties  // <- always nil pre-fix

  // Second pass (line 1587): set location
  node = &graph.Node{..., Properties: currentProps}  // <- nil
  node.Properties["location"] = ...
  store.AddNode(node)  // overwrites with {location:..., lang:nil}

End state: every construct node had only `location` in its record
column. lang + pkg were set on the first pass and erased on the
second. SQLiteGraph.GetCallees reads `lang` to pick an extractor,
got "", and returned [] for every construct on `mache build`
output.

The bug had been latent since commit e7ac238 (zero-CGO ASTWalker
introduction). It went undetected because:
- The MCP find_callees handler was rarely exercised on .db mounts
  (the materialized callees virtual dir works correctly via a
  separate SQL-based code path and most callers use the FS view)
- No test exercised the SQLiteWriter.AddNode → GetNode → Properties
  round-trip — both sides were unit-tested in isolation

## Fix

SQLiteWriter.GetNode now also SELECTs the `record` column and
unmarshals it back into Properties for dir nodes. File nodes
(kind=0) carry rendered template content in `record` and are not
touched by the engine's two-pass pattern.

## Empirical verification

Before fix, on a fresh mache build of the whole repo:

    SELECT json_extract(record, '$.lang'), count(*)
      FROM nodes WHERE kind=1 GROUP BY 1;
    NULL | 3654

After fix:

    NULL | 49        (virtual callers/ callees/ dirs, expected)
    Z28= | 3835      ("go" base64-encoded, on real constructs)

End-to-end verification via mache serve --stdio:

    find_callees(path="leyline/methods/SocketClient.SendOpInto")
    -> { callees: ["leyline/methods/SocketClient.sendRaw"] }

Pre-fix this returned `[]`.

## Tests

Two new regression tests close the gap that let this bug land:

- TestSQLiteWriter_GetNode_RoundTripsProperties — direct unit test
  on the boundary that had the bug. AddNode with Properties →
  GetNode must return the same Properties. Would have caught the
  bug immediately if it had existed before e7ac238.

- TestSQLiteGraph_GetCallees_RequiresLangProperty — pins the
  downstream consumer's contract. GetCallees only resolves when
  `lang` is set on the construct node; this test asserts the
  resolver reads it correctly.

## User feedback addressed

"No do not defer" — the deferred bead mache-d28eb1 is now closed
by this commit rather than being punted to a follow-up PR.

task check green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants