feat: kill ExtractActiveDB on --control, consume LLO 0.2.2 find_callees#371
Conversation
…p ExtractActiveDB on that path mountControl's read-only branch now connects to the ley-line daemon via the existing udsGraph (the same path buildControlGraph uses). Removes: - ExtractActiveDB call on initial mount (used to copy the active arena buffer — ~352MB with _source BLOBs — to a temp .db file) - ExtractActiveDB call on every hot-swap (same cost per root bump) - OpenSQLiteGraph + the in-process SQLite connection it implied - The HotSwapGraph wrapper for this path (each udsGraph query reflects the daemon's current snapshot — hot-swap is implicit; no graph-level swap needed on the mache side) - The watcher goroutine that polled current_root / subscribed to daemon snapshot events and rebuilt the local graph - trySubscribe helper (was only used by that watcher) Writable mount keeps ExtractActiveDB + OpenWritableGraph — the WritableGraph splices edits into a real SQL connection before flushing back to the arena, and the daemon doesn't expose write ops over UDS. That path is unchanged. Also (mache-9051f0): - Auto-download URL constant moved from agentic-research/ley-line (private repo, retired) to agentic-research/ley-line-open. Documents that the bundle path is the canonical deployment and this download flow only runs for non-bundle invocations. - Lift URL into a package constant (leylineReleaseURLTemplate) so it can be regression-tested. - New TestLeylineReleaseURL_PointsAtPublicRepo guards against future drift to the private repo. Docs: - README: split deployment into "Bundle / image (canonical production)" and "Local / dev path" sections, document MACHE_NO_LEYLINE=1 for CI, note that mache itself doesn't implement perimeter auth (orchestrator handles it for the bundle deployment). - ARCHITECTURE.md: add a key-file-reference row for the UDS query proxy explaining the daemon-owns-SQLite zero-copy story.
find_smells (advisory)Scoped to files changed in this PR. Rules below run on the standalone (no-LLO) cross-ref tables; dead_code — 1 finding(s) in changed files
fan_out_skew — 3 finding(s) in changed files
god_file — 1 finding(s) in changed files
Rules: see |
kiln was retired when apko + melange landed in-repo (mache-172a50 / PR #370). The `notify-kiln` job in ci.yml and the kiln dispatch step in release.yml fire OIDC exchanges against an archived repo that no longer accepts dispatches, surfacing as a perpetual SKIPPED check on PRs and a noisy no-op step on releases. - ci.yml: drop the entire `notify-kiln` job - release.yml: drop the `octo-sts` token exchange + `Dispatch to kiln` step; update the file-header comment to reflect ley-line-open as the upstream linkage instead of "notify kiln"
There was a problem hiding this comment.
Pull request overview
This PR refactors the read-only --control mount to query the ley-line daemon over the existing UDS-backed graph (avoiding temp .db extraction and hot-swap plumbing), and updates the leyline auto-download URL to point at the public ley-line-open repo with a regression test.
Changes:
- Route read-only
--controlmounts throughudsGraphover UDS; keep writable mounts on the temp-file SQLite path. - Update leyline auto-download URL to
agentic-research/ley-line-openand add a regression test. - Update README + architecture docs to reflect the bundle vs local/dev deployment shapes and the daemon-owns-SQLite story.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Documents deployment modes and leyline auto-discovery/download behavior. |
| internal/leyline/socket.go | Introduces a URL template for public releases and special-cases 404 handling in download flow. |
| internal/leyline/socket_test.go | Adds a regression test ensuring the download URL points at ley-line-open. |
| docs/ARCHITECTURE.md | Updates the key file reference to include the UDS query proxy path and removes legacy hot-swap description. |
| cmd/mount.go | Replaces read-only control hot-swap (temp .db + watcher) with a UDS-backed graph connection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- internal/leyline/socket.go: wire MACHE_NO_LEYLINE=1 into the
auto-download path so the README claim ("disables leyline
auto-download in CI") matches actual behavior. Without leyline on
PATH and with MACHE_NO_LEYLINE set, return a clear "install or
unset" error instead of attempting a download.
- internal/leyline/socket.go: reword the 404 case message — no
implicit "falling back" claim (DiscoverOrStart's caller decides
what to do; the function itself does not fall back). New wording
surfaces the "no release published yet" condition explicitly.
- cmd/uds_graph.go: document the GetCallees stub limitation —
daemon has no find_callees op yet; (nil,nil) matches existing
Graph backends' "no callees" semantics so consumers degrade
gracefully. Tracked under mache-75d847 (paired with LLO Bead B).
- cmd/uds_graph.go: document the N+1 in ListChildStats and the
planned daemon-side list_child_stats fix (mache-75d847 / LLO
Bead C).
No behavior change for the perf items — fixes are tracked upstream
and will land when both sides implement the daemon ops.
… switch pending GetCallees
Two corrections to the prior commit on this branch:
## Real bug fixed (mache-a55625): udsGraph parses list_children responses
The LLO daemon's list_children op returns
`[{id, name, kind, size}, ...]` per child (full stats in one
round-trip — rs/ll-open/cli-lib/src/daemon/ops.rs::op_list_children).
udsGraph's ListChildren and ListChildStats were doing `c.(string)` on
each entry — but each entry is `map[string]any`, not a string. The
assertion failed silently, so every non-empty directory listing came
back as an empty slice. New listChildrenResponse helper parses the
objects properly; both methods now consume the same payload. As a
consequence ListChildStats is now single-shot — no more per-child
GetNode fan-out — which incidentally closes the previously-suspected
N+1 (it was never the right diagnosis: the daemon already returned
stats).
New tests in cmd/uds_graph_test.go pin the parse contract:
- TestListChildren_ParsesObjectsNotStrings — regression guard
- TestListChildStats_SingleShotFromListChildrenResponse — asserts no
get_node ops fire during a ListChildStats call
## Revert: mount.go control read-only stays on SQLiteGraph
The previous commit switched mountControl's read-only branch to
udsGraph. That eliminates the 352MB ExtractActiveDB temp-copy on
that path, but silently regresses callees-dependent features
(/callees vfs, find_impact, find_smells call-graph rules) because
the daemon has no find_callees op (base_op_names lists find_callers
+ find_defs only). Reverted; mount-control read-only continues to
use ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + the watcher
goroutine + trySubscribe helper.
The eventual switch is tracked under mache-98b9bf, gated on
mache-a5e4ea (mache-side GetCallees impl), which is gated on LLO
Bead B ley-line-open-a47d7d (daemon-side find_callees op).
- uds_graph.go: listChildrenResponse now checks the daemon's
{ok, error} envelope. Returns graph.ErrNotFound when the error
message contains "not found" so readdir on a stale ID surfaces
cleanly, matching SQLiteGraph.ListChildren semantics. Wraps other
daemon-side errors instead of silently returning an empty slice.
- uds_graph_test.go: sawGetNode is now atomic.Bool — the previous
bool was written by the stub-daemon goroutine and read by the
test goroutine without synchronization, tripping `go test -race`.
- README.md: grammar fix — "the unit that a cluster orchestrator
deploys" (missing 'that').
- release.yml: header comment was wrong in the kiln-removal commit.
The release workflow still links against ley-line FFI artifacts
from the private agentic-research/ley-line repo (the public
ley-line-open repo carries schemas + daemon, not the linker
artifacts). Updated the comment to reflect actual behavior +
note the eventual migration path.
… re-apply mount --control UDS switch
LLO 0.2.2 (clients/go/leyline-schema/v0.2.2, daemon op_find_callees in
ley-line-open#7) added the daemon-side op the prior revert was gated
on. With find_callees in place, the read-only `mount --control` path
can finally drop ExtractActiveDB without regressing callees-dependent
features.
## udsGraph.GetCallees (mache-a5e4ea)
Sends `find_callees(id)` and decodes the daemon's
`{callees: [{node_id, source_id}, ...]}` response. Daemon-side SQL is
SELECT DISTINCT d.node_id, d.source_id
FROM node_refs r JOIN node_defs d ON r.token = d.token
WHERE r.node_id = ?
Returns `(nil, nil)` for not-found / empty-result, matching
SQLiteGraph + MemoryStore semantics so consumers (`/callees` vfs,
find_impact, find_smells) degrade silently when a construct has no
outgoing edges. Self-edges + duplicates filtered client-side
(defense in depth — daemon also de-dups via SELECT DISTINCT).
Two new tests pin the wire and dedup behavior:
- TestGetCallees_ResolvesViaDaemonFindCalleesOp: asserts the
find_callees op fires exactly once and the response is correctly
parsed, deduped, and self-edge-filtered.
- TestGetCallees_EmptyResultIsNotAnError: daemon `{ok: false}` does
not surface as an error.
## mount.go switch re-applied (mache-98b9bf)
Read-only `mount --control` now routes through `newUDSGraph` instead
of `ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + watcher`. The
352MB temp-copy is eliminated on this path. Hot-swap is implicit —
each udsGraph query reflects the daemon's current snapshot, so root
bumps don't need a graph-level swap. Removed: HotSwapGraph wrapping,
the watcher goroutine, trySubscribe helper (was only used by the
watcher). Writable mount still extracts because WritableGraph needs
an in-process SQL connection for splice-on-close.
## go.mod
Bumped `leyline-schema` v0.2.1 → v0.2.2 (carries the new daemon
schema types FindCalleesRequest/Response, TokenMapEntry,
GetRefsMapRequest/Response, etc. — this PR only consumes
find_callees, the others are available for future use).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
internal/leyline/socket.go:126
- The new
MACHE_NO_LEYLINEbranch changes DiscoverOrStart behavior (it now hard-fails early instead of attempting auto-download), but there’s no unit test covering this path. Adding a regression test in socket_test.go (e.g., set PATH to empty + ensure ~/.mache/bin missing + set MACHE_NO_LEYLINE=1) would prevent accidental reintroduction of network downloads in CI.
localBin := filepath.Join(home, ".mache", "bin", "leyline")
if _, statErr := os.Stat(localBin); statErr == nil {
leylineBin = localBin
} else if os.Getenv("MACHE_NO_LEYLINE") != "" {
// CI / bundle deployments set this to opt out of the
// network-fetch path entirely. Surface a clear error
// rather than attempting a download that's likely to
// 404 against an unpublished ley-line-open release.
return "", fmt.Errorf("leyline not on PATH and MACHE_NO_LEYLINE is set; install leyline or unset MACHE_NO_LEYLINE")
} else {
// Auto-download from GitHub releases
downloaded, dlErr := downloadLeyline(localBin)
if dlErr != nil {
return "", fmt.Errorf("leyline not on PATH and auto-download failed: %w", dlErr)
}
leylineBin = downloaded
- cmd/mount.go: replace stat-based socket readiness with dial-loop. The previous loop os.Stat'd the .sock file and then dialed once, which fails immediately on (a) a stale leftover .sock from a dead daemon — file exists but no listener; (b) the bind-then-listen race where the socket file appears before the daemon accepts connections. Loop on newUDSGraph(sockPath) directly until it succeeds or the 30s deadline expires; retries every 100ms. - internal/leyline/socket_test.go: add regression test for MACHE_NO_LEYLINE=1 short-circuiting DiscoverOrStart's auto-download path. Sets HOME → tempdir so the ~/.mache/bin fallback misses; asserts the error names MACHE_NO_LEYLINE and does NOT mention a download attempt (which would mean the env var didn't gate). Suppressed-low-confidence note from the 3rd Copilot review pass.
…ons [mache-52a23a] PR #371 smoke testing surfaced six orphaned `leyline daemon` processes, all spawned by `DiscoverOrStart` and all racing for the well-known socket at `~/.mache/default.sock`. Two failure modes were live simultaneously: 1) Post-SIGKILL stale socket. When mache was SIGKILL'd, `StopManaged` never fired and the daemon orphaned. The socket inode lingered on disk. On next mache startup, `findExistingSocket` did `os.Stat` only — the path existed, so the caller got back a path no daemon was listening on; the very next `DialSocket` failed with "connect: connection refused" long after `DiscoverOrStart` had reported success. 2) In-process double-spawn. `DiscoverOrStart` released `managed.mu` before reaching `cmd.Start()`. Two goroutines that both saw `managed.sock == ""` could each enter the spawn block; the second binder lost the listen-path race and exited silently while mache kept a handle to a daemon it never validated. The fix: * New `isSocketAlive` performs a 500ms Unix-domain `Dial` against any candidate path. The probe distinguishes "live listener" from "stale socket file" — the discrimination the stat-only check lacked. 500ms is generous enough to absorb daemon scheduling jitter and short enough that a stale socket (instant ECONNREFUSED) adds no measurable startup latency. * `DiscoverOrStart` now holds `managed.mu` for the entire discover-then-spawn cycle. Concurrent in-process callers serialize through the lock; only the first one through reaches the spawn block, the rest piggyback on the resulting socket. * Stale socket files at the well-known path are removed before spawn so the new daemon can bind cleanly. An explicit `LEYLINE_SOCKET` pointing at a dead listener is reported as an error rather than silently overwritten — it isn't mache's to manage. * The post-spawn wait loop also uses `isSocketAlive` rather than `os.Stat`, so we don't return a path whose inode appeared before the daemon's accept loop was ready. Tests: * `TestDiscoverOrStart_OrphanedSocketRemovedThenSpawnsFresh` plants a touch-file at the well-known path with no listener, verifies the spawn path is reached (gated to fail via `MACHE_NO_LEYLINE=1`), and asserts the stale inode is gone after the call. * `TestDiscoverOrStart_LiveDaemonNotRespawned` stands up a real UDS listener, sets `MACHE_NO_LEYLINE=1` as a tripwire (any fall-through to spawn would surface), and verifies the live socket is returned without entering the spawn block. * `TestDiscoverOrStart_ConcurrentCallersDoNotDoubleSpawn` fans 16 goroutines through `DiscoverOrStart` against a single live listener and asserts every caller returns the same path with no error. The prior code's race window — pre-lock fast path + late lock — would surface either as spawn-attempt errors or path divergence. * The pre-existing `TestDiscoverOrStart_UsesExistingSocket` now needs a real listener (touch-file isn't enough) and was updated alongside to share an `acceptAndClose` helper used by the new tests. Verified clean under `go test -race -count=10 -run TestDiscover ./internal/leyline/` and `task check`.
…che-52a23a] Fixes two real bugs in DiscoverOrStart: 1. Orphaned daemons on crash/SIGKILL — six observed during PR #371 smoke test 2. In-process double-spawn — no synchronization between findExistingSocket and cmd.Start Approach (hybrid): - isSocketAlive() — net.DialTimeout-based liveness check (500ms budget), replaces naive os.Stat - managed.mu lock spans the FULL discover-then-spawn cycle so concurrent in-process callers serialize - Stale socket file removal happens inside the lock - LEYLINE_SOCKET (user-pointed external daemon) is respected: stale → return clear error 9 new tests in socket_test.go (3 hermetic + 6 live-leyline E2E with skip-if-no-binary). Coverage 25.4% → 85.7%; remaining 14.3% is 5 fault-injection-only branches. One TOCTOU defensive guard annotated // coverage:ignore with explicit justification. Coverage-gate (mache-66d8df): exit 0 against trunk. Fresh-eyes audit: APPROVE. Closes mache-52a23a. Lands on feat/evolve-coverage-trunk (PR #385).
…che-52a23a] Fixes two real bugs in DiscoverOrStart: 1. Orphaned daemons on crash/SIGKILL — six observed during PR #371 smoke test 2. In-process double-spawn — no synchronization between findExistingSocket and cmd.Start Approach (hybrid): - isSocketAlive() — net.DialTimeout-based liveness check (500ms budget), replaces naive os.Stat - managed.mu lock spans the FULL discover-then-spawn cycle so concurrent in-process callers serialize - Stale socket file removal happens inside the lock - LEYLINE_SOCKET (user-pointed external daemon) is respected: stale → return clear error 9 new tests in socket_test.go (3 hermetic + 6 live-leyline E2E with skip-if-no-binary). Coverage 25.4% → 85.7%; remaining 14.3% is 5 fault-injection-only branches. One TOCTOU defensive guard annotated // coverage:ignore with explicit justification. Coverage-gate (mache-66d8df): exit 0 against trunk. Fresh-eyes audit: APPROVE. Closes mache-52a23a. Lands on feat/evolve-coverage-trunk (PR #385).
Summary
End-to-end completion of the substrate-thesis story across
mache-172a50→mache-3bf414→mache-98aa97→ this PR: in--controlmode, mache no longer opens SQLite at all. The daemon owns it (zero-copy viasqlite3_deserializeon the arena); mache is a pure UDS query client. No more 352MB temp-copy on any--controlpath.What ships
mache-a5e4ea+mache-98b9bf(commitdbf4bd5) — bumpsleyline-schemato v0.2.2 (carries LLO build(deps): bump golangci/golangci-lint-action from 6 to 9 #7'sfind_calleesdaemon op). ImplementsudsGraph.GetCalleesagainst it. Re-applies the read-onlymount --controlswitch toudsGraphthat was reverted earlier in this PR pending the daemon op.ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + watcher goroutine + trySubscribe helperall gone on the read-only path. Hot-swap is implicit (each query reflects the daemon's current snapshot). Two new tests pin the wire shape + dedup + self-edge filtering for GetCallees.mache-a55625(commitd4dadd9) —udsGraph.ListChildren/ListChildStatswere doingc.(string)on each child, but the daemon returns[{id, name, kind, size}, ...]— silent type-assertion failure made every non-empty dir come back empty. NewlistChildrenResponsehelper parses the objects properly; both methods consume the same payload.ListChildStatsis now single-shot — no per-childGetNodefan-out. Plusok/errorenvelope check so daemon-side errors surface asgraph.ErrNotFound(matchingSQLiteGraphsemantics) rather than empty slices.mache-9051f0(commitb76e36a) — auto-download URL fixed (privateley-line→ publicley-line-open); URL lifted to a package constant; regression test added;MACHE_NO_LEYLINE=1wired intoDiscoverOrStartso the README claim is true.Kiln cross-repo dispatch removed (commit
459abb0) — kiln is archived (mache-172a50shipped apko+melange in-repo).notify-kilnjob removed fromci.yml; kiln dispatch step + OIDC exchange removed fromrelease.yml.Docs reorientation — README splits deployment into Bundle / image (canonical production) vs Local / dev path, documents
MACHE_NO_LEYLINE=1for CI, notes that mache itself doesn't implement perimeter auth (orchestrator does it per cloister ADR-0005 amendment). ARCHITECTURE.md row for "UDS query proxy" updated to reflect end state: bothserve --controlandmount --control(read-only) route through it; only writable mount still extracts.Test plan
task lint—0 issuestask test— full suite green (exit=0)TestListChildren_ParsesObjectsNotStrings— regression guard for the parse bugTestListChildStats_SingleShotFromListChildrenResponse— asserts noget_nodeops fireTestGetCallees_ResolvesViaDaemonFindCalleesOp— pins find_callees wire + dedup + self-edge skipTestGetCallees_EmptyResultIsNotAnError— daemon{ok: false}doesn't surface as errorTestLeylineReleaseURL_PointsAtPublicRepo— guards URL constantfeat/mache-98b9bf-kill-extract-active-dbCloses
mache-98b9bf— kill ExtractActiveDB on --control read-only mountmache-a5e4ea— udsGraph.GetCallees against LLO find_calleesmache-a55625— udsGraph list_children parse bugmache-9051f0— auto-download URL fixCross-repo coordination (paired with LLO 0.2.2)
ley-line-open#7("4 missing ops mache needs") shipped + taggedclients/go/leyline-schema/v0.2.2on 2026-05-11. This PR is themache-side consumer.