Skip to content

feat: kill ExtractActiveDB on --control, consume LLO 0.2.2 find_callees#371

Merged
jamestexas merged 7 commits into
mainfrom
feat/mache-98b9bf-kill-extract-active-db
May 11, 2026
Merged

feat: kill ExtractActiveDB on --control, consume LLO 0.2.2 find_callees#371
jamestexas merged 7 commits into
mainfrom
feat/mache-98b9bf-kill-extract-active-db

Conversation

@jamestexas
Copy link
Copy Markdown
Contributor

@jamestexas jamestexas commented May 11, 2026

Summary

End-to-end completion of the substrate-thesis story across mache-172a50mache-3bf414mache-98aa97 → this PR: in --control mode, mache no longer opens SQLite at all. The daemon owns it (zero-copy via sqlite3_deserialize on the arena); mache is a pure UDS query client. No more 352MB temp-copy on any --control path.

What ships

  • mache-a5e4ea + mache-98b9bf (commit dbf4bd5) — bumps leyline-schema to v0.2.2 (carries LLO build(deps): bump golangci/golangci-lint-action from 6 to 9 #7's find_callees daemon op). Implements udsGraph.GetCallees against it. Re-applies the read-only mount --control switch to udsGraph that was reverted earlier in this PR pending the daemon op. ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + watcher goroutine + trySubscribe helper all gone on the read-only path. Hot-swap is implicit (each query reflects the daemon's current snapshot). Two new tests pin the wire shape + dedup + self-edge filtering for GetCallees.

  • mache-a55625 (commit d4dadd9) — udsGraph.ListChildren / ListChildStats were doing c.(string) on each child, but the daemon returns [{id, name, kind, size}, ...] — silent type-assertion failure made every non-empty dir come back empty. New listChildrenResponse helper parses the objects properly; both methods consume the same payload. ListChildStats is now single-shot — no per-child GetNode fan-out. Plus ok/error envelope check so daemon-side errors surface as graph.ErrNotFound (matching SQLiteGraph semantics) rather than empty slices.

  • mache-9051f0 (commit b76e36a) — auto-download URL fixed (private ley-line → public ley-line-open); URL lifted to a package constant; regression test added; MACHE_NO_LEYLINE=1 wired into DiscoverOrStart so the README claim is true.

  • Kiln cross-repo dispatch removed (commit 459abb0) — kiln is archived (mache-172a50 shipped apko+melange in-repo). notify-kiln job removed from ci.yml; kiln dispatch step + OIDC exchange removed from release.yml.

  • Docs reorientation — README splits deployment into Bundle / image (canonical production) vs Local / dev path, documents MACHE_NO_LEYLINE=1 for CI, notes that mache itself doesn't implement perimeter auth (orchestrator does it per cloister ADR-0005 amendment). ARCHITECTURE.md row for "UDS query proxy" updated to reflect end state: both serve --control and mount --control (read-only) route through it; only writable mount still extracts.

Test plan

  • task lint0 issues
  • task test — full suite green (exit=0)
  • TestListChildren_ParsesObjectsNotStrings — regression guard for the parse bug
  • TestListChildStats_SingleShotFromListChildrenResponse — asserts no get_node ops fire
  • TestGetCallees_ResolvesViaDaemonFindCalleesOp — pins find_callees wire + dedup + self-edge skip
  • TestGetCallees_EmptyResultIsNotAnError — daemon {ok: false} doesn't surface as error
  • TestLeylineReleaseURL_PointsAtPublicRepo — guards URL constant
  • CI must be green on feat/mache-98b9bf-kill-extract-active-db

Closes

  • mache-98b9bf — kill ExtractActiveDB on --control read-only mount
  • mache-a5e4ea — udsGraph.GetCallees against LLO find_callees
  • mache-a55625 — udsGraph list_children parse bug
  • mache-9051f0 — auto-download URL fix

Cross-repo coordination (paired with LLO 0.2.2)

ley-line-open#7 ("4 missing ops mache needs") shipped + tagged
clients/go/leyline-schema/v0.2.2 on 2026-05-11. This PR is the
mache-side consumer.

…p ExtractActiveDB on that path

mountControl's read-only branch now connects to the ley-line daemon via
the existing udsGraph (the same path buildControlGraph uses). Removes:

- ExtractActiveDB call on initial mount (used to copy the active
  arena buffer — ~352MB with _source BLOBs — to a temp .db file)
- ExtractActiveDB call on every hot-swap (same cost per root bump)
- OpenSQLiteGraph + the in-process SQLite connection it implied
- The HotSwapGraph wrapper for this path (each udsGraph query
  reflects the daemon's current snapshot — hot-swap is implicit;
  no graph-level swap needed on the mache side)
- The watcher goroutine that polled current_root / subscribed to
  daemon snapshot events and rebuilt the local graph
- trySubscribe helper (was only used by that watcher)

Writable mount keeps ExtractActiveDB + OpenWritableGraph — the
WritableGraph splices edits into a real SQL connection before
flushing back to the arena, and the daemon doesn't expose write
ops over UDS. That path is unchanged.

Also (mache-9051f0):
- Auto-download URL constant moved from agentic-research/ley-line
  (private repo, retired) to agentic-research/ley-line-open. Documents
  that the bundle path is the canonical deployment and this download
  flow only runs for non-bundle invocations.
- Lift URL into a package constant (leylineReleaseURLTemplate) so it
  can be regression-tested.
- New TestLeylineReleaseURL_PointsAtPublicRepo guards against future
  drift to the private repo.

Docs:
- README: split deployment into "Bundle / image (canonical production)"
  and "Local / dev path" sections, document MACHE_NO_LEYLINE=1 for CI,
  note that mache itself doesn't implement perimeter auth (orchestrator
  handles it for the bundle deployment).
- ARCHITECTURE.md: add a key-file-reference row for the UDS query
  proxy explaining the daemon-owns-SQLite zero-copy story.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 11, 2026

find_smells (advisory)

Scoped to files changed in this PR. Rules below run on the standalone (no-LLO) cross-ref tables; _ast rules (cyclomatic_complexity, long_function, long_file, magic_int_in_comparison) are not exercised here.

dead_code — 1 finding(s) in changed files
Source Node Metric
internal/leyline/socket.go leyline/methods/SocketClient.Subscribe 0
fan_out_skew — 3 finding(s) in changed files
Source Node Metric
cmd/mount.go cmd/variables/rootCmd/source 77
cmd/mount.go cmd/functions/mountNFS/source 38
internal/leyline/socket.go leyline/functions/DiscoverOrStart/source 22
god_file — 1 finding(s) in changed files
Source Node Metric
cmd/mount.go 100

Rules: see docs/ARCHITECTURE.md for the full registry. Advisory only — these are heuristics, not gates.

kiln was retired when apko + melange landed in-repo (mache-172a50 /
PR #370). The `notify-kiln` job in ci.yml and the kiln dispatch step
in release.yml fire OIDC exchanges against an archived repo that no
longer accepts dispatches, surfacing as a perpetual SKIPPED check on
PRs and a noisy no-op step on releases.

- ci.yml: drop the entire `notify-kiln` job
- release.yml: drop the `octo-sts` token exchange + `Dispatch to kiln`
  step; update the file-header comment to reflect ley-line-open as the
  upstream linkage instead of "notify kiln"
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the read-only --control mount to query the ley-line daemon over the existing UDS-backed graph (avoiding temp .db extraction and hot-swap plumbing), and updates the leyline auto-download URL to point at the public ley-line-open repo with a regression test.

Changes:

  • Route read-only --control mounts through udsGraph over UDS; keep writable mounts on the temp-file SQLite path.
  • Update leyline auto-download URL to agentic-research/ley-line-open and add a regression test.
  • Update README + architecture docs to reflect the bundle vs local/dev deployment shapes and the daemon-owns-SQLite story.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
README.md Documents deployment modes and leyline auto-discovery/download behavior.
internal/leyline/socket.go Introduces a URL template for public releases and special-cases 404 handling in download flow.
internal/leyline/socket_test.go Adds a regression test ensuring the download URL points at ley-line-open.
docs/ARCHITECTURE.md Updates the key file reference to include the UDS query proxy path and removes legacy hot-swap description.
cmd/mount.go Replaces read-only control hot-swap (temp .db + watcher) with a UDS-backed graph connection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
Comment thread internal/leyline/socket.go Outdated
Comment thread cmd/mount.go Outdated
Comment thread cmd/mount.go Outdated
- internal/leyline/socket.go: wire MACHE_NO_LEYLINE=1 into the
  auto-download path so the README claim ("disables leyline
  auto-download in CI") matches actual behavior. Without leyline on
  PATH and with MACHE_NO_LEYLINE set, return a clear "install or
  unset" error instead of attempting a download.
- internal/leyline/socket.go: reword the 404 case message — no
  implicit "falling back" claim (DiscoverOrStart's caller decides
  what to do; the function itself does not fall back). New wording
  surfaces the "no release published yet" condition explicitly.
- cmd/uds_graph.go: document the GetCallees stub limitation —
  daemon has no find_callees op yet; (nil,nil) matches existing
  Graph backends' "no callees" semantics so consumers degrade
  gracefully. Tracked under mache-75d847 (paired with LLO Bead B).
- cmd/uds_graph.go: document the N+1 in ListChildStats and the
  planned daemon-side list_child_stats fix (mache-75d847 / LLO
  Bead C).

No behavior change for the perf items — fixes are tracked upstream
and will land when both sides implement the daemon ops.
… switch pending GetCallees

Two corrections to the prior commit on this branch:

## Real bug fixed (mache-a55625): udsGraph parses list_children responses

The LLO daemon's list_children op returns
`[{id, name, kind, size}, ...]` per child (full stats in one
round-trip — rs/ll-open/cli-lib/src/daemon/ops.rs::op_list_children).
udsGraph's ListChildren and ListChildStats were doing `c.(string)` on
each entry — but each entry is `map[string]any`, not a string. The
assertion failed silently, so every non-empty directory listing came
back as an empty slice. New listChildrenResponse helper parses the
objects properly; both methods now consume the same payload. As a
consequence ListChildStats is now single-shot — no more per-child
GetNode fan-out — which incidentally closes the previously-suspected
N+1 (it was never the right diagnosis: the daemon already returned
stats).

New tests in cmd/uds_graph_test.go pin the parse contract:
- TestListChildren_ParsesObjectsNotStrings — regression guard
- TestListChildStats_SingleShotFromListChildrenResponse — asserts no
  get_node ops fire during a ListChildStats call

## Revert: mount.go control read-only stays on SQLiteGraph

The previous commit switched mountControl's read-only branch to
udsGraph. That eliminates the 352MB ExtractActiveDB temp-copy on
that path, but silently regresses callees-dependent features
(/callees vfs, find_impact, find_smells call-graph rules) because
the daemon has no find_callees op (base_op_names lists find_callers
+ find_defs only). Reverted; mount-control read-only continues to
use ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + the watcher
goroutine + trySubscribe helper.

The eventual switch is tracked under mache-98b9bf, gated on
mache-a5e4ea (mache-side GetCallees impl), which is gated on LLO
Bead B ley-line-open-a47d7d (daemon-side find_callees op).
@jamestexas jamestexas changed the title feat: route read-only --control mount through UDS + fix download URL fix(uds): parse list_children objects + URL fix + bundle docs May 11, 2026
@jamestexas jamestexas requested a review from Copilot May 11, 2026 20:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread cmd/uds_graph.go
Comment thread cmd/uds_graph_test.go Outdated
Comment thread README.md Outdated
Comment thread .github/workflows/release.yml Outdated
- uds_graph.go: listChildrenResponse now checks the daemon's
  {ok, error} envelope. Returns graph.ErrNotFound when the error
  message contains "not found" so readdir on a stale ID surfaces
  cleanly, matching SQLiteGraph.ListChildren semantics. Wraps other
  daemon-side errors instead of silently returning an empty slice.
- uds_graph_test.go: sawGetNode is now atomic.Bool — the previous
  bool was written by the stub-daemon goroutine and read by the
  test goroutine without synchronization, tripping `go test -race`.
- README.md: grammar fix — "the unit that a cluster orchestrator
  deploys" (missing 'that').
- release.yml: header comment was wrong in the kiln-removal commit.
  The release workflow still links against ley-line FFI artifacts
  from the private agentic-research/ley-line repo (the public
  ley-line-open repo carries schemas + daemon, not the linker
  artifacts). Updated the comment to reflect actual behavior +
  note the eventual migration path.
… re-apply mount --control UDS switch

LLO 0.2.2 (clients/go/leyline-schema/v0.2.2, daemon op_find_callees in
ley-line-open#7) added the daemon-side op the prior revert was gated
on. With find_callees in place, the read-only `mount --control` path
can finally drop ExtractActiveDB without regressing callees-dependent
features.

## udsGraph.GetCallees (mache-a5e4ea)

Sends `find_callees(id)` and decodes the daemon's
`{callees: [{node_id, source_id}, ...]}` response. Daemon-side SQL is

  SELECT DISTINCT d.node_id, d.source_id
  FROM node_refs r JOIN node_defs d ON r.token = d.token
  WHERE r.node_id = ?

Returns `(nil, nil)` for not-found / empty-result, matching
SQLiteGraph + MemoryStore semantics so consumers (`/callees` vfs,
find_impact, find_smells) degrade silently when a construct has no
outgoing edges. Self-edges + duplicates filtered client-side
(defense in depth — daemon also de-dups via SELECT DISTINCT).

Two new tests pin the wire and dedup behavior:
- TestGetCallees_ResolvesViaDaemonFindCalleesOp: asserts the
  find_callees op fires exactly once and the response is correctly
  parsed, deduped, and self-edge-filtered.
- TestGetCallees_EmptyResultIsNotAnError: daemon `{ok: false}` does
  not surface as an error.

## mount.go switch re-applied (mache-98b9bf)

Read-only `mount --control` now routes through `newUDSGraph` instead
of `ExtractActiveDB + OpenSQLiteGraph + HotSwapGraph + watcher`. The
352MB temp-copy is eliminated on this path. Hot-swap is implicit —
each udsGraph query reflects the daemon's current snapshot, so root
bumps don't need a graph-level swap. Removed: HotSwapGraph wrapping,
the watcher goroutine, trySubscribe helper (was only used by the
watcher). Writable mount still extracts because WritableGraph needs
an in-process SQL connection for splice-on-close.

## go.mod

Bumped `leyline-schema` v0.2.1 → v0.2.2 (carries the new daemon
schema types FindCalleesRequest/Response, TokenMapEntry,
GetRefsMapRequest/Response, etc. — this PR only consumes
find_callees, the others are available for future use).
@jamestexas jamestexas changed the title fix(uds): parse list_children objects + URL fix + bundle docs feat: kill ExtractActiveDB on --control, consume LLO 0.2.2 find_callees May 11, 2026
@jamestexas jamestexas requested a review from Copilot May 11, 2026 22:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

internal/leyline/socket.go:126

  • The new MACHE_NO_LEYLINE branch changes DiscoverOrStart behavior (it now hard-fails early instead of attempting auto-download), but there’s no unit test covering this path. Adding a regression test in socket_test.go (e.g., set PATH to empty + ensure ~/.mache/bin missing + set MACHE_NO_LEYLINE=1) would prevent accidental reintroduction of network downloads in CI.
		localBin := filepath.Join(home, ".mache", "bin", "leyline")
		if _, statErr := os.Stat(localBin); statErr == nil {
			leylineBin = localBin
		} else if os.Getenv("MACHE_NO_LEYLINE") != "" {
			// CI / bundle deployments set this to opt out of the
			// network-fetch path entirely. Surface a clear error
			// rather than attempting a download that's likely to
			// 404 against an unpublished ley-line-open release.
			return "", fmt.Errorf("leyline not on PATH and MACHE_NO_LEYLINE is set; install leyline or unset MACHE_NO_LEYLINE")
		} else {
			// Auto-download from GitHub releases
			downloaded, dlErr := downloadLeyline(localBin)
			if dlErr != nil {
				return "", fmt.Errorf("leyline not on PATH and auto-download failed: %w", dlErr)
			}
			leylineBin = downloaded

Comment thread cmd/mount.go Outdated
- cmd/mount.go: replace stat-based socket readiness with dial-loop.
  The previous loop os.Stat'd the .sock file and then dialed once,
  which fails immediately on (a) a stale leftover .sock from a dead
  daemon — file exists but no listener; (b) the bind-then-listen
  race where the socket file appears before the daemon accepts
  connections. Loop on newUDSGraph(sockPath) directly until it
  succeeds or the 30s deadline expires; retries every 100ms.
- internal/leyline/socket_test.go: add regression test for
  MACHE_NO_LEYLINE=1 short-circuiting DiscoverOrStart's
  auto-download path. Sets HOME → tempdir so the ~/.mache/bin
  fallback misses; asserts the error names MACHE_NO_LEYLINE and
  does NOT mention a download attempt (which would mean the env
  var didn't gate). Suppressed-low-confidence note from the 3rd
  Copilot review pass.
@jamestexas jamestexas merged commit b05e58c into main May 11, 2026
14 checks passed
@jamestexas jamestexas deleted the feat/mache-98b9bf-kill-extract-active-db branch May 11, 2026 22:49
jamestexas added a commit that referenced this pull request May 19, 2026
…ons [mache-52a23a]

PR #371 smoke testing surfaced six orphaned `leyline daemon` processes,
all spawned by `DiscoverOrStart` and all racing for the well-known socket
at `~/.mache/default.sock`. Two failure modes were live simultaneously:

1) Post-SIGKILL stale socket. When mache was SIGKILL'd, `StopManaged`
   never fired and the daemon orphaned. The socket inode lingered on
   disk. On next mache startup, `findExistingSocket` did `os.Stat`
   only — the path existed, so the caller got back a path no daemon
   was listening on; the very next `DialSocket` failed with
   "connect: connection refused" long after `DiscoverOrStart` had
   reported success.

2) In-process double-spawn. `DiscoverOrStart` released `managed.mu`
   before reaching `cmd.Start()`. Two goroutines that both saw
   `managed.sock == ""` could each enter the spawn block; the second
   binder lost the listen-path race and exited silently while
   mache kept a handle to a daemon it never validated.

The fix:

* New `isSocketAlive` performs a 500ms Unix-domain `Dial` against any
  candidate path. The probe distinguishes "live listener" from "stale
  socket file" — the discrimination the stat-only check lacked. 500ms
  is generous enough to absorb daemon scheduling jitter and short
  enough that a stale socket (instant ECONNREFUSED) adds no
  measurable startup latency.

* `DiscoverOrStart` now holds `managed.mu` for the entire
  discover-then-spawn cycle. Concurrent in-process callers serialize
  through the lock; only the first one through reaches the spawn
  block, the rest piggyback on the resulting socket.

* Stale socket files at the well-known path are removed before spawn
  so the new daemon can bind cleanly. An explicit `LEYLINE_SOCKET`
  pointing at a dead listener is reported as an error rather than
  silently overwritten — it isn't mache's to manage.

* The post-spawn wait loop also uses `isSocketAlive` rather than
  `os.Stat`, so we don't return a path whose inode appeared before
  the daemon's accept loop was ready.

Tests:

* `TestDiscoverOrStart_OrphanedSocketRemovedThenSpawnsFresh` plants
  a touch-file at the well-known path with no listener, verifies the
  spawn path is reached (gated to fail via `MACHE_NO_LEYLINE=1`),
  and asserts the stale inode is gone after the call.

* `TestDiscoverOrStart_LiveDaemonNotRespawned` stands up a real UDS
  listener, sets `MACHE_NO_LEYLINE=1` as a tripwire (any fall-through
  to spawn would surface), and verifies the live socket is returned
  without entering the spawn block.

* `TestDiscoverOrStart_ConcurrentCallersDoNotDoubleSpawn` fans 16
  goroutines through `DiscoverOrStart` against a single live listener
  and asserts every caller returns the same path with no error. The
  prior code's race window — pre-lock fast path + late lock — would
  surface either as spawn-attempt errors or path divergence.

* The pre-existing `TestDiscoverOrStart_UsesExistingSocket` now needs
  a real listener (touch-file isn't enough) and was updated alongside
  to share an `acceptAndClose` helper used by the new tests.

Verified clean under `go test -race -count=10 -run TestDiscover
./internal/leyline/` and `task check`.
jamestexas added a commit that referenced this pull request May 19, 2026
…che-52a23a]

Fixes two real bugs in DiscoverOrStart:

1. Orphaned daemons on crash/SIGKILL — six observed during PR #371 smoke test
2. In-process double-spawn — no synchronization between findExistingSocket and cmd.Start

Approach (hybrid):
- isSocketAlive() — net.DialTimeout-based liveness check (500ms budget), replaces naive os.Stat
- managed.mu lock spans the FULL discover-then-spawn cycle so concurrent in-process callers serialize
- Stale socket file removal happens inside the lock
- LEYLINE_SOCKET (user-pointed external daemon) is respected: stale → return clear error

9 new tests in socket_test.go (3 hermetic + 6 live-leyline E2E with skip-if-no-binary). Coverage 25.4% → 85.7%; remaining 14.3% is 5 fault-injection-only branches. One TOCTOU defensive guard annotated // coverage:ignore with explicit justification.

Coverage-gate (mache-66d8df): exit 0 against trunk. Fresh-eyes audit: APPROVE.

Closes mache-52a23a. Lands on feat/evolve-coverage-trunk (PR #385).
jamestexas added a commit that referenced this pull request May 19, 2026
…che-52a23a]

Fixes two real bugs in DiscoverOrStart:

1. Orphaned daemons on crash/SIGKILL — six observed during PR #371 smoke test
2. In-process double-spawn — no synchronization between findExistingSocket and cmd.Start

Approach (hybrid):
- isSocketAlive() — net.DialTimeout-based liveness check (500ms budget), replaces naive os.Stat
- managed.mu lock spans the FULL discover-then-spawn cycle so concurrent in-process callers serialize
- Stale socket file removal happens inside the lock
- LEYLINE_SOCKET (user-pointed external daemon) is respected: stale → return clear error

9 new tests in socket_test.go (3 hermetic + 6 live-leyline E2E with skip-if-no-binary). Coverage 25.4% → 85.7%; remaining 14.3% is 5 fault-injection-only branches. One TOCTOU defensive guard annotated // coverage:ignore with explicit justification.

Coverage-gate (mache-66d8df): exit 0 against trunk. Fresh-eyes audit: APPROVE.

Closes mache-52a23a. Lands on feat/evolve-coverage-trunk (PR #385).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants