feat: cross-reference indexing with roaring bitmaps #6
Merged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rface alignment
- Fix dual PRIMARY KEY DDL crash in file_ids table (BLOCKER)
- Restore source DB to read-only; write refs to sidecar <db>.refs.db
- Replace per-call AddRef SQL with in-memory bitmap accumulation + FlushRefs
- AddRef now returns error on IngestionTarget; propagated in processNode
- Cache compiled tree-sitter call query per language on SitterWalker
- Use unsafe.String for zero-alloc dedup check in ExtractCalls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
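The zero-allocation dedup check mentioned in the last bullet can be sketched roughly as follows. This is only an illustration under assumptions: `dedupTokens` is a hypothetical helper, not the PR's `ExtractCalls`, and it assumes tokens arrive as byte slices cut from a source buffer. `unsafe.String` requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// dedupTokens returns the distinct tokens in first-seen order.
// Hypothetical helper for illustration; it is not the PR's ExtractCalls.
func dedupTokens(tokens [][]byte) []string {
	seen := make(map[string]struct{}, len(tokens))
	out := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if len(tok) == 0 {
			continue // nothing to index
		}
		// Borrow the bytes as a string without copying (Go 1.20+).
		// The borrowed key is used only for this lookup and never stored,
		// and tok is not mutated while the key is alive.
		key := unsafe.String(&tok[0], len(tok))
		if _, dup := seen[key]; dup {
			continue
		}
		// First occurrence: make one owned copy for the map and the result.
		owned := string(tok)
		seen[owned] = struct{}{}
		out = append(out, owned)
	}
	return out
}

func main() {
	src := []byte("Foo Bar Foo")
	fmt.Println(dedupTokens([][]byte{src[0:3], src[4:7], src[8:11]})) // prints [Foo Bar]
}
```

The borrowed string must never outlive or be stored alongside a buffer that may change, which is why the sketch copies the token before inserting it into the map.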
jamestexas added a commit that referenced this pull request on Mar 11, 2026
…es (#72)

TDD implementation for 5 issues from the agent field report:

1. .gitignore respect (Critical #1): Files matching .gitignore patterns are now excluded during ingestion. Supports nested .gitignore files, directory patterns, globs, and negation. On by default; opt out with engine.RespectGitignore = false.
2. file:line location metadata (High #5): Projected construct directories now have a Properties["location"] field with "path:startline:endline" format, closing the read→edit gap for agents.
3. Write-back trailing newline normalization (#6): Splice now strips trailing \n from agent-written content when the original source region didn't end with one, preventing blank-line artifacts from echo/heredoc.
4. Callees in --out mode (#7): materializeCallees writes callees/ directories into the output SQLite DB, matching the callers/ pattern for consistent mount vs --out behavior.
5. Improved agent prompt (PROMPT.txt): Restructured for discovery-first workflow with concrete ls/cat examples and package-level navigation.

Also adds tests for unmount metadata handling (#8) and context deduplication (#3, already passing — confirmed working).
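Items 2 and 3 above are small enough to sketch. The function names `formatLocation` and `normalizeTrailingNewline` are hypothetical, and the sketch assumes a single trailing newline is stripped; the real Splice may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLocation builds the "path:startline:endline" string described for
// Properties["location"]. Hypothetical helper name.
func formatLocation(path string, startLine, endLine int) string {
	return fmt.Sprintf("%s:%d:%d", path, startLine, endLine)
}

// normalizeTrailingNewline drops one trailing "\n" from the replacement when
// the original region did not end with a newline. Stripping exactly one
// newline is an assumption of this sketch.
func normalizeTrailingNewline(original, replacement string) string {
	if strings.HasSuffix(original, "\n") {
		return replacement // original ended with a newline: keep as written
	}
	return strings.TrimSuffix(replacement, "\n")
}

func main() {
	fmt.Println(formatLocation("pkg/a.go", 10, 20))
	// echo-style content carries a trailing newline the original lacked.
	fmt.Printf("%q\n", normalizeTrailingNewline("return x", "return y\n"))
}
```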
5 tasks
jamestexas added a commit that referenced this pull request on Mar 11, 2026
* feat: address agent field report — gitignore, location, splice, callees (#72)

TDD implementation for 5 issues from the agent field report:

1. .gitignore respect (Critical #1): Files matching .gitignore patterns are now excluded during ingestion. Supports nested .gitignore files, directory patterns, globs, and negation. On by default; opt out with engine.RespectGitignore = false.
2. file:line location metadata (High #5): Projected construct directories now have a Properties["location"] field with "path:startline:endline" format, closing the read→edit gap for agents.
3. Write-back trailing newline normalization (#6): Splice now strips trailing \n from agent-written content when the original source region didn't end with one, preventing blank-line artifacts from echo/heredoc.
4. Callees in --out mode (#7): materializeCallees writes callees/ directories into the output SQLite DB, matching the callers/ pattern for consistent mount vs --out behavior.
5. Improved agent prompt (PROMPT.txt): Restructured for discovery-first workflow with concrete ls/cat examples and package-level navigation.

Also adds tests for unmount metadata handling (#8) and context deduplication (#3, already passing — confirmed working).

* feat: live graph — mtime-based staleness detection and on-demand re-ingestion

MemoryStore now tracks source file mtimes at index time. On ReadContent, if the source file's mtime has changed, the refresher callback re-ingests just that file before returning content. Zero cost when idle, sub-second on stale access.

- SetRefresher/RecordFileMtime/FileMtime/IsFileStale on MemoryStore
- Staleness check in ReadContent with double-check locking
- Engine.ReIngestFile for single-file re-ingestion preserving RootPath
- Wired in cmd/mount.go after ingestion completes
- 7 new tests (5 graph, 2 engine)

* fix: address PR #73 review feedback — races, correctness, performance

Review fixes:
1. Write-back mtime race: RecordFileMtime after splice prevents redundant re-ingest
2. N+1 query in materializeCallees: pre-load defs map, no nested cursors
3. Swallowed refresher error: now logged via log.Printf
4. Engine.sourceFile thread safety: removed mutable field, passed as parameter
5. Per-file refresh mutex: sync.Map of per-file mutexes instead of global lock
6. Deleted files: IsFileStale returns true when os.Stat fails
7. Gitignore negation: evalPatterns returns (ignored, matched) for proper propagation
8. ** glob patterns: matchDoublestar handles **/foo, foo/**, a/**/b
9. Deterministic nested gitignore: sorted by depth, deepest wins
10. Duplicate AddNode: location property set before first store.AddNode
11. extractCallerDir coupling documented

3 new tests for gitignore negation, doublestar, and nested ordering.
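The staleness check with double-check locking and per-file mutexes described in these commits could look roughly like the sketch below. `mtimeTracker` and its methods are hypothetical stand-ins, not the MemoryStore API, and treating an untracked path as stale is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"time"
)

// mtimeTracker is a hypothetical stand-in for the mtime bookkeeping the
// commits describe on MemoryStore.
type mtimeTracker struct {
	mu     sync.RWMutex
	mtimes map[string]time.Time
	locks  sync.Map // path -> *sync.Mutex, one refresh lock per file
}

func newMtimeTracker() *mtimeTracker {
	return &mtimeTracker{mtimes: make(map[string]time.Time)}
}

// record stores the file's current mtime.
func (t *mtimeTracker) record(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	t.mu.Lock()
	t.mtimes[path] = info.ModTime()
	t.mu.Unlock()
	return nil
}

// isStale reports whether the file changed since record. A failed stat
// (for example a deleted file) counts as stale; so does an untracked path,
// which is an assumption of this sketch.
func (t *mtimeTracker) isStale(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return true
	}
	t.mu.RLock()
	recorded, ok := t.mtimes[path]
	t.mu.RUnlock()
	return !ok || !info.ModTime().Equal(recorded)
}

// refreshIfStale runs refresh at most once per change, even when many
// readers notice the staleness at the same time.
func (t *mtimeTracker) refreshIfStale(path string, refresh func(string) error) error {
	if !t.isStale(path) {
		return nil // fast path: no refresh lock taken when idle
	}
	m, _ := t.locks.LoadOrStore(path, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	if !t.isStale(path) {
		return nil // another goroutine refreshed while we waited
	}
	if err := refresh(path); err != nil {
		return err
	}
	return t.record(path)
}

// demo touches a temp file and lets 8 readers race to refresh it.
func demo() (staleAfterTouch bool, refreshes int, err error) {
	f, err := os.CreateTemp("", "mtime-*.go")
	if err != nil {
		return false, 0, err
	}
	path := f.Name()
	f.Close()
	defer os.Remove(path)

	t := newMtimeTracker()
	if err := t.record(path); err != nil {
		return false, 0, err
	}
	later := time.Now().Add(2 * time.Second)
	if err := os.Chtimes(path, later, later); err != nil {
		return false, 0, err
	}
	staleAfterTouch = t.isStale(path)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The per-file mutex serializes this callback, so the plain
			// counter increment is safe.
			_ = t.refreshIfStale(path, func(string) error { refreshes++; return nil })
		}()
	}
	wg.Wait()
	return staleAfterTouch, refreshes, nil
}

func main() {
	fmt.Println(demo())
}
```

Keying the mutexes by path means a slow re-ingest of one file does not block readers of unrelated files, which is the point of review fix 5.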
jamestexas added a commit that referenced this pull request on Apr 30, 2026
…203)

bd doctor flags .beads/interactions.jsonl as a tracked runtime file — every bd command writes to it, forcing a stash dance on every git operation. Same for .beads-credential-key (per-clone secret). Untracks both via git rm --cached + adds them to .beads/.gitignore and the project .gitignore.

File contents stay on disk for local inspection (interactions.jsonl is bd's audit log; reviewers can still tail it). Just stops fighting git on every bd call.

Adds the rest of bd doctor's recommended runtime patterns to .beads/.gitignore: daemon.*, dolt-server.activity, *.lock, *.corrupt.backup/, .env, export-state.json. Catches future runtime artifacts without per-file additions.

Closes part of bd doctor Git Integration warnings (#5, #6, #7).
jamestexas added a commit that referenced this pull request on May 18, 2026
Two critical bugs Copilot caught — both invalidated the production codepath the original PR was supposed to wire. Plus one correctness fix (atomic state install) the prior split sync/async pattern silently allowed.

#1 (sock lifetime) — get_communities goroutine deferred sock.Close() unconditionally. The SheafClient stored on the invalidator wrapped that same sock; from the goroutine's return onward the watcher's cascade calls were talking to a closed connection and silently falling back to single-node. The cascade NEVER FIRED in practice under the original wiring; it only worked in tests because they held the SheafClient in scope for the test's lifetime. Fix: handoff pattern. Track handedOff bool; defer only closes sock when ownership wasn't transferred to the invalidator (e.g. on PushTopology failure paths). New leyline.SheafClient.Close() lets callers uniformly close prior backends via io.Closer.

#2 (concurrent SendOp races) — SocketClient.SendOp performed an unsynchronized write-then-read on a shared connection. The file watcher fires onChange callbacks from independent debounce timers, so a save burst routes multiple InvalidateWithCascade calls through the same SheafClient concurrently — interleaving requests and reads on the line-delimited JSON protocol with no per-message correlation. Caller A would read caller B's response and the daemon couldn't detect the swap. Fix: sendMu mutex serializes sendRaw (used by SendOp + SendOpInto). TestSendOp_ConcurrentCallsDoNotInterleave races 50 parallel SendOps with id-tagged responses; pre-fix it caught the race under -race + caused id mismatches; post-fix runs clean. Subscribe stays documented-incompatible with concurrent SendOp on the same conn (it owns the read side once subscribed) — that boundary is c14c43 territory.

#6 (atomic state swap) — get_communities did SetCommunityResult synchronously then SetSheaf async-after-push. Between the two writes the watcher could observe new membership paired with the OLD sheaf (or, worse, paired with a sheaf pointing at the daemon's PRE-push topology), producing cascades against region IDs the daemon didn't know about yet. Fix: SheafInvalidator.SetState(result, sheaf) — single Lock, both fields swap together. Returns the prior backend so the caller can close it (closes the leaked-prior-socket follow-on from #1). The handler now installs nothing synchronously and atomically swaps state ONLY after PushTopology succeeds. Watcher fires in the dial+push window observe the prior pair atomically (correct degradation) rather than a mismatched mix (incorrect cascade).

Test additions:
- TestSendOp_ConcurrentCallsDoNotInterleave (internal/leyline) — race detector regression guard for #2
- TestSheafInvalidator_SetState_AtomicSwap — pins atomic swap semantics + prior-return contract for #6
- TestSheafInvalidator_SetSheaf_ReturnsPrior — standalone path that #6's SetState piggybacks on
- TestGetCommunities_PopulatesInvalidator_WithDaemon — replaces the prior happy-path test; drives a mock UDS daemon so PushTopology actually succeeds, verifies atomic install
- TestGetCommunities_NoDaemon_LeavesInvalidatorEmpty — pins the correctness side: no daemon → don't install state (graceful degradation)

Full repo `go test -race -short` clean.
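The handoff pattern from #1 and the atomic swap from #6 can be sketched together. Everything here is a simplified stand-in under assumptions: `invalidator`, `communityResult`, `fakeConn`, and `install` are hypothetical names, and the backend is reduced to an `io.Closer`; the real SheafInvalidator and SheafClient carry more state.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// communityResult and fakeConn are hypothetical stand-ins for this sketch.
type communityResult struct{ membership map[string]int }

type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error { c.closed = true; return nil }

type invalidator struct {
	mu     sync.RWMutex
	result *communityResult
	sheaf  io.Closer
}

// setState swaps both fields under one lock and returns the prior backend
// so the caller can close it.
func (inv *invalidator) setState(r *communityResult, s io.Closer) io.Closer {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	prior := inv.sheaf
	inv.result, inv.sheaf = r, s
	return prior
}

// snapshot returns a consistent (result, sheaf) pair for readers.
func (inv *invalidator) snapshot() (*communityResult, io.Closer) {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.result, inv.sheaf
}

// install pushes topology and, only on success, hands conn to the
// invalidator. On failure the deferred close still owns conn.
func install(inv *invalidator, conn io.Closer, r *communityResult, push func() error) error {
	handedOff := false
	defer func() {
		if !handedOff {
			conn.Close() // ownership was never transferred
		}
	}()
	if err := push(); err != nil {
		return err
	}
	if prior := inv.setState(r, conn); prior != nil {
		prior.Close() // avoid leaking the previously installed backend
	}
	handedOff = true
	return nil
}

// demo exercises one failed push followed by two successful installs.
func demo() (failedClosed, emptyAfterFailure, firstClosed, secondOpen bool) {
	inv := &invalidator{}
	bad := &fakeConn{}
	_ = install(inv, bad, &communityResult{}, func() error { return errors.New("push failed") })
	r, s := inv.snapshot()
	emptyAfterFailure = r == nil && s == nil

	first, second := &fakeConn{}, &fakeConn{}
	_ = install(inv, first, &communityResult{}, func() error { return nil })
	_ = install(inv, second, &communityResult{}, func() error { return nil })
	return bad.closed, emptyAfterFailure, first.closed, !second.closed
}

func main() {
	fmt.Println(demo()) // prints true true true true
}
```

Returning the prior backend from the swap keeps the close outside the lock, so a slow socket teardown does not hold up readers taking a snapshot.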
jamestexas added a commit that referenced this pull request on May 18, 2026
…11848 + mache-4a0c05] (#383)

* feat(graph): make SheafInvalidator hot-swappable + add NodesForPath [mache-c11848]

Groundwork for wiring the file watcher into the sheaf cascade. Pure internal/graph changes — no caller wiring yet; that lands in the next commit which actually plugs serve.go into these primitives.

SheafInvalidator additions (sheaf_invalidate.go):
- sync.RWMutex guards sheaf + result so the watcher goroutine can call InvalidateWithCascade while an MCP handler concurrently mutates the invalidator via SetCommunityResult / SetSheaf. Without this protection, the same shape of race PR #380's snapshot fix caught (concurrent read/write on shared graph state) reappears the moment we wire both producers up — better to lock the contract here than discover it under CI's resource-constrained scheduler.
- SetSheaf hot-swap: invalidator can be constructed pre-daemon-dial (at serve startup) with sheaf=nil and have the backend installed later. Until swap, falls back to single-node Graph.Invalidate.
- HasResult accessor + nil-receiver safety: lets the watcher decide whether a cascade attempt is worth constructing.
- InvalidateWithCascade now reads a (sheaf, result) snapshot under the read lock and releases before network I/O — daemon round-trips don't starve writers.

MemoryStore.NodesForPath (graph.go):
- Exposes what fileToNodes already tracks privately. O(k) via the roaring bitmap; falls back to linear scan only for paths not yet indexed (same fallback path DeleteFileNodes uses).
- NodesForPathProvider interface declared alongside so non-MemoryStore backends can plug in without leaking *MemoryStore type assertions into watcher code. Kept separate from the Graph interface so consumers that don't need this query aren't forced to implement it.

Test additions (sheaf_invalidate_test.go):
- mockGraph + mockSheafBackend now mutex-protected with Calls() / Invalidated() / resetInvalidated() helpers that return snapshots. Required for the new race-detector test; benign for the existing single-goroutine tests.
- TestSheafInvalidator_HasResult — nil-safe accessor contract.
- TestSheafInvalidator_SetSheaf_Hotswap — pre-swap fallback, post-swap cascade, nil-swap reverts to fallback (no panic).
- TestSheafInvalidator_ConcurrentReadWrite — 8 readers × 4 writers × 200 iterations under -race; pins that the mutex actually protects (this test failed under -race against my own pre-mutex draft and passed after).

Locally: `go test -race ./internal/graph/...` clean (2.9s).

* feat(serve): wire file watcher into SheafInvalidator [mache-c11848]

The first link in the audit's 7-step chain. Before this commit the watcher's onChange/onDelete callbacks did purely local reingest — the daemon never learned that a file changed, so the sheaf cascade (real and gated since LLO v0.4.1) never ran in response to an edit. After this commit the cascade is one type-enforced step away from "engaged": get_communities just needs to call SetCommunityResult + SetSheaf on the invalidator returned alongside the graph.

Signature change: buildServeGraph (+ buildMaybeMultiGraph + openDBGraph + buildControlGraph) now return a *graph.SheafInvalidator between the graph and the cleanup func.
The invalidator is:
- non-nil for directory sources (the only construction path that builds a watcher to fire it)
- nil for .db sources (frozen at build time, no watcher)
- nil for control mode (daemon owns the arena, daemon's own reparse drives freshness)
- nil for composite mounts (per-mount invalidators each have their own CommunityResult; no semantically correct unified invalidator exposes to the MCP layer — documented in buildMaybeMultiGraph)
- nil for single-file sources (no watcher)

Why a return-value change over the alternative (stash on MemoryStore + type-assert in handlers): the type-assert variant silently breaks the day someone swaps the backing store, which is exactly the regression PR #380's snapshot fix taught us to lock with an explicit contract. A non-nil invalidator next to the graph is part of the construction contract; nil-from-this-mode is documented at each call site.

buildServeGraph's directory branch now also wires the cascade into both watcher callbacks:
- onChange: post-reingest, snapshots affected node IDs via store.NodesForPath (added in the previous commit) and calls InvalidateWithCascade for each.
- onDelete: snapshots NodesForPath BEFORE DeleteFileNodes wipes the bitmap (or the cascade has nothing to invalidate), then fires InvalidateWithCascade.

Pre-engagement (no SheafBackend installed, no CommunityResult), both calls degrade to single-node Graph.Invalidate — correct but not the moat. Engagement happens when get_communities runs (next PR).

lazyGraph stores the *SheafInvalidator + exposes via SheafInvalidator() accessor that the get_communities handler will use to install the post-detection state.

Test coverage (cmd/sheaf_wire_test.go):
- ReturnsInvalidatorForDir — happy path returns non-nil
- NilInvalidatorForNonDir — file source returns nil
- InvalidatorWiredIntoStore — fallback path through the invalidator actually calls Graph.Invalidate (single-node)
- WatcherFiresInvalidator — writes a file, observes watcher loop
- IngestErrorReturnsNilInvalidator — error paths return consistent nil/nil/noop/err shape (regression guard against the partial-construction smell)
- OnDeleteAlsoFiresInvalidator — symmetric onDelete path also routes through the invalidator

Existing mount tests updated for new signature; assert the composite mount path returns nil invalidator (documented forfeit) and the single-source pass-through returns non-nil.

Locally: `go test -race -short ./...` clean.

* feat(serve): get_communities installs CommunityResult + SheafClient on invalidator [mache-c11848]

Closes the consumer-side loop the mache-49bf9a audit flagged. Until this commit the watcher had an invalidator wired in (previous commit) but no way to *engage* the cascade — get_communities was the only path that produced both the CommunityResult and the dialed SheafClient, and it kept both local to its handler. The watcher's InvalidateWithCascade calls degraded to single-node forever.

After this commit:
1. Synchronous on the handler path: as soon as DetectCommunities returns, type-assert sheafInvalidatorProvider (lazyGraph fulfills it; control mode + composite mounts don't, and degrade silently as documented). Call SetCommunityResult on the invalidator with the SAME snapshot the goroutine uses for PushTopology — keeps the two views in lockstep.
2. Asynchronous on the daemon-dial path: once PushTopology succeeds (i.e. the daemon has the topology baseline), call SetSheaf to swap the live SheafClient into the invalidator. From here on, watcher fires route through the cross-region cascade — moat engaged.

The order matters: SetCommunityResult sync, SetSheaf async-after-push.
Between them the invalidator has membership data but no sheaf backend, so InvalidateWithCascade falls through to single-node. This is the documented graceful degradation — the cascade self-engages when the daemon comes online and self-disengages when the daemon's UDS connection errors (existing fallback path in SheafInvalidator logs + falls back per call).

Backends that don't implement sheafInvalidatorProvider (control-mode lazyGraph, composite mounts) skip both steps cleanly. The handler return value is unchanged in all cases — only the wiring side-effect differs.

Test coverage (cmd/sheaf_wire_test.go):
- PopulatesInvalidator — pins the core contract: after handler runs, invalidator.HasResult() is true. Uses a testGraphWithSI wrapper that satisfies both refsMapProvider (via embedded *MemoryStore) and sheafInvalidatorProvider.
- NoInvalidatorWhenGraphDoesntProvide — passes a bare MemoryStore (no provider). Handler succeeds, doesn't panic, no state mutation — pins the silent-degrade contract for control-mode and composite-mount paths.

Both tests gate the daemon discovery via MACHE_NO_LEYLINE+PATH+HOME+LEYLINE_SOCKET (same pattern as PR #380's get_communities tests) and wait on the pushDone channel for deterministic completion.

Full cmd race-short suite clean (111s).

* feat(mcp): get_sheaf_status tool surfaces daemon cache state [mache-4a0c05]

The agent-facing visibility layer for the sheaf moat. Without this, the daemon's monotonic generation counter (advances on every cascade run) is invisible to agents — they have no signal that an edit they made actually propagated through the cache. With this tool, agents can poll for {generation, valid, total, defect} and compare against the value they cached alongside their previous query.

Design contract — graceful degradation: The handler MUST NOT surface daemon unavailability as an MCP error. Agents polling this tool on a periodic freshness check would otherwise see transport failures whenever the daemon is down, hasn't been dialed yet, or the user is running mache without ley-line. Instead, return a structured {available: false, reason: "..."} response. The reason field is the only place the caller learns *why* state is unavailable.

DiscoverSocket (not DiscoverOrStart) is the right primitive: a status check should never trigger a daemon auto-spawn — which can take seconds and may even download the binary on first run.

Registered directly on the MCP server (not via r.wrapHandler) because the tool is session-independent — it doesn't touch the per-session graph, only the daemon over UDS.

Test coverage (cmd/sheaf_wire_test.go):
- ReturnsDaemonState — mock UDS server returns sheaf_status with a QUOTED-STRING generation ("42", per capnp-json Int64 codec — the live wire shape PR #382 added parseUint64 for). The tool must route through SheafClient.Status (which knows that codec) and surface 42 as an integer. Regression guard against a future refactor parsing the daemon response directly and silently dropping Int64 values.
- NoDaemonReturnsUnavailable — pins the graceful-degradation contract: no LEYLINE_SOCKET + tempdir HOME → no socket found → returns {available: false, reason: ...} not an MCP error.
- RegisteredInToolSet — pins the tool is wired into registerMCPTools. A future refactor that drops the registration would otherwise silently disappear the visibility layer.

Bonus: startMockSheafServer + listRegisteredTools test helpers extracted as local-to-cmd utilities (mirroring the unexported internal/leyline pattern) so the package boundary stays clean.

Full repo race-short suite clean.

* fix(serve,leyline): socket-layer bugs caught by Copilot on PR #383

Two critical bugs Copilot caught — both invalidated the production codepath the original PR was supposed to wire.
Plus one correctness fix (atomic state install) the prior split sync/async pattern silently allowed.

#1 (sock lifetime) — get_communities goroutine deferred sock.Close() unconditionally. The SheafClient stored on the invalidator wrapped that same sock; from the goroutine's return onward the watcher's cascade calls were talking to a closed connection and silently falling back to single-node. The cascade NEVER FIRED in practice under the original wiring; it only worked in tests because they held the SheafClient in scope for the test's lifetime. Fix: handoff pattern. Track handedOff bool; defer only closes sock when ownership wasn't transferred to the invalidator (e.g. on PushTopology failure paths). New leyline.SheafClient.Close() lets callers uniformly close prior backends via io.Closer.

#2 (concurrent SendOp races) — SocketClient.SendOp performed an unsynchronized write-then-read on a shared connection. The file watcher fires onChange callbacks from independent debounce timers, so a save burst routes multiple InvalidateWithCascade calls through the same SheafClient concurrently — interleaving requests and reads on the line-delimited JSON protocol with no per-message correlation. Caller A would read caller B's response and the daemon couldn't detect the swap. Fix: sendMu mutex serializes sendRaw (used by SendOp + SendOpInto). TestSendOp_ConcurrentCallsDoNotInterleave races 50 parallel SendOps with id-tagged responses; pre-fix it caught the race under -race + caused id mismatches; post-fix runs clean. Subscribe stays documented-incompatible with concurrent SendOp on the same conn (it owns the read side once subscribed) — that boundary is c14c43 territory.

#6 (atomic state swap) — get_communities did SetCommunityResult synchronously then SetSheaf async-after-push. Between the two writes the watcher could observe new membership paired with the OLD sheaf (or, worse, paired with a sheaf pointing at the daemon's PRE-push topology), producing cascades against region IDs the daemon didn't know about yet. Fix: SheafInvalidator.SetState(result, sheaf) — single Lock, both fields swap together. Returns the prior backend so the caller can close it (closes the leaked-prior-socket follow-on from #1). The handler now installs nothing synchronously and atomically swaps state ONLY after PushTopology succeeds. Watcher fires in the dial+push window observe the prior pair atomically (correct degradation) rather than a mismatched mix (incorrect cascade).

Test additions:
- TestSendOp_ConcurrentCallsDoNotInterleave (internal/leyline) — race detector regression guard for #2
- TestSheafInvalidator_SetState_AtomicSwap — pins atomic swap semantics + prior-return contract for #6
- TestSheafInvalidator_SetSheaf_ReturnsPrior — standalone path that #6's SetState piggybacks on
- TestGetCommunities_PopulatesInvalidator_WithDaemon — replaces the prior happy-path test; drives a mock UDS daemon so PushTopology actually succeeds, verifies atomic install
- TestGetCommunities_NoDaemon_LeavesInvalidatorEmpty — pins the correctness side: no daemon → don't install state (graceful degradation)

Full repo `go test -race -short` clean.

* fix(serve,graph): cascade-correctness fixes from Copilot review on #383

Addresses the remaining four Copilot findings (one architectural gap documented for follow-up, the rest fixed in-PR).

#3 (snapshot pre-edit IDs in onChange) — the watcher's onChange used NodesForPath only AFTER DeleteFileNodes + ReIngestFile. The post-reingest IDs reflect renames/moves/removes from the edit, so nodes that used to live in the file (e.g. FunctionFoo renamed to FunctionBar) never appear in the cascade — their old region IDs stay live in the daemon's topology with no invalidation signal.
Fix: snapshot NodesForPath BEFORE DeleteFileNodes, snapshot again AFTER ReIngestFile, union both sets, cascade the union. New helper `unionStringSlices` keeps the merge logic isolated.

#7 (per-region dedupe) — the prior onChange called InvalidateWithCascade once per affected node ID. For a large file with 50 functions all in the same community, that's 50 redundant daemon round-trips, 50 generation-counter bumps, 50 re-cascades of the same region set on the daemon side. Fix: new `SheafInvalidator.InvalidateNodesWithCascade([]string)` method dedupes inputs to the set of unique region IDs (one daemon call per unique region), unions the affected-region sets from each cascade, and invalidates every member node exactly once. Pinned by TestSheafInvalidator_InvalidateNodesWithCascade_DedupsByRegion (5 nodes / 2 regions → exactly 2 daemon calls) and the _FallbackPaths subtests for the degraded modes (no sheaf, no result, ids not in membership, empty input). onChange + onDelete both use this new method.

#5 (WatcherFiresInvalidator was a no-op test) — the prior test slept for 2s and asserted on `si.HasResult()` being false, which would have passed even if the watcher → invalidator wiring were deleted entirely. Replaced with the actual contract: install a counting `countingSheafBackend` + a synthetic CommunityResult mapping the fixture's node IDs to a known region, edit the file, observe the backend recorded at least one Invalidate for that region within 2s. The watcher break loop now exits as soon as the call is recorded (~170ms instead of always 2s).

#4 (auto-leyline path bypasses cascade — DOCUMENTED, not fixed) — the default `mache serve` flow goes through autoInvokeLeylineParse → openDBGraph, which returns a nil invalidator. Only the MACHE_NO_LEYLINE=1 in-process MemoryStore path exercises the cascade. Closing this requires either flipping auto-leyline from one-shot to managed-daemon mode (which depends on mache-c14c43's event subscribe), or unconditionally using in-process for serve. Both are non-trivial design calls; tracked as mache-6c9e1d. Added an inline doc comment in buildServeGraph + a startup log on the auto-leyline branch warning that the cascade is not engaged. The log is one-shot per build, scoped to where the gap actually matters — agents using auto-leyline see the warning and know to flip the env var if live invalidation matters to them.

Full repo `go test -race -short ./...` clean (114s).
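The union from #3 and the per-region dedupe from #7 can be illustrated with a small sketch. Only the name `unionStringSlices` comes from the commit message; its body here is a guess, and `invalidateNodes`, the `int` region IDs, and the `cascade` callback are hypothetical stand-ins for the daemon call.

```go
package main

import (
	"fmt"
	"sort"
)

// unionStringSlices merges a and b, keeping first-seen order and dropping
// duplicates. The body is an assumption; only the name is from the commit.
func unionStringSlices(a, b []string) []string {
	seen := make(map[string]struct{}, len(a)+len(b))
	out := make([]string, 0, len(a)+len(b))
	for _, s := range [][]string{a, b} {
		for _, id := range s {
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			out = append(out, id)
		}
	}
	return out
}

// invalidateNodes makes one cascade call per unique region touched by
// nodeIDs, unions the affected regions, and returns the call count plus
// every member node to invalidate (sorted, each exactly once).
func invalidateNodes(nodeIDs []string, membership map[string]int, cascade func(region int) []int) (int, []string) {
	called := make(map[int]struct{})   // regions already sent to the daemon
	affected := make(map[int]struct{}) // union of all cascade results
	hit := make(map[string]struct{})   // nodes to invalidate
	calls := 0
	for _, id := range nodeIDs {
		region, ok := membership[id]
		if !ok {
			hit[id] = struct{}{} // not in any community: single-node fallback
			continue
		}
		if _, done := called[region]; done {
			continue // this region was already cascaded
		}
		called[region] = struct{}{}
		calls++
		affected[region] = struct{}{}
		for _, r := range cascade(region) {
			affected[r] = struct{}{}
		}
	}
	for id, region := range membership {
		if _, ok := affected[region]; ok {
			hit[id] = struct{}{}
		}
	}
	out := make([]string, 0, len(hit))
	for id := range hit {
		out = append(out, id)
	}
	sort.Strings(out)
	return calls, out
}

func main() {
	// Pre-edit and post-edit snapshots of a file whose function was renamed.
	fmt.Println(unionStringSlices([]string{"Foo", "Baz"}, []string{"Bar", "Baz"})) // prints [Foo Baz Bar]

	membership := map[string]int{"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
	cascade := func(region int) []int {
		if region == 1 {
			return []int{3} // region 1 spills into region 3
		}
		return nil
	}
	calls, nodes := invalidateNodes([]string{"a", "b", "c", "d", "e"}, membership, cascade)
	fmt.Println(calls, nodes) // prints 2 [a b c d e f]
}
```

With five input nodes spread over two regions the sketch makes two cascade calls, mirroring the 5 nodes / 2 regions case the commit's test pins.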
Summary
- GetCallers(token) queries <db>.refs.db to keep Venturi source DBs immutable/read-only
- AddRef accumulates in-memory, FlushRefs() writes all bitmaps in a single transaction (eliminates N*M SQL round-trips)
- unsafe.String
- IngestionTarget.AddRef now returns error with full propagation

Test plan
- task test green across fs/graph/ingest
- TestEngine_IngestTreeSitter_CrossReference validates end-to-end: two Go files, one calls the other, GetCallers returns correct caller
- FlushRefs() in cmd/mount.go post-ingestion (follow-up)
- .db sources (follow-up)

🤖 Generated with Claude Code
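The accumulate-then-flush shape described in the summary can be sketched without any dependencies. This is an illustration under loud assumptions: plain Go sets stand in for roaring bitmaps, a `write` callback stands in for the single SQL transaction, and `refIndex` is a hypothetical type, not the PR's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// refIndex accumulates token -> file ID references in memory.
// Plain sets stand in for the roaring bitmaps the PR uses.
type refIndex struct {
	refs map[string]map[uint32]struct{}
}

func newRefIndex() *refIndex {
	return &refIndex{refs: make(map[string]map[uint32]struct{})}
}

// AddRef records that fileID references token. It touches no storage.
func (r *refIndex) AddRef(token string, fileID uint32) error {
	if token == "" {
		return errors.New("empty token")
	}
	set, ok := r.refs[token]
	if !ok {
		set = make(map[uint32]struct{})
		r.refs[token] = set
	}
	set[fileID] = struct{}{}
	return nil
}

// FlushRefs hands every accumulated set to write exactly once, in sorted
// token order, then clears the index. The write callback is a stand-in for
// one statement inside a single transaction. On error the accumulated
// state is kept so the flush can be retried.
func (r *refIndex) FlushRefs(write func(token string, fileIDs []uint32) error) error {
	tokens := make([]string, 0, len(r.refs))
	for tok := range r.refs {
		tokens = append(tokens, tok)
	}
	sort.Strings(tokens)
	for _, tok := range tokens {
		ids := make([]uint32, 0, len(r.refs[tok]))
		for id := range r.refs[tok] {
			ids = append(ids, id)
		}
		sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
		if err := write(tok, ids); err != nil {
			return err
		}
	}
	r.refs = make(map[string]map[uint32]struct{})
	return nil
}

func main() {
	idx := newRefIndex()
	_ = idx.AddRef("Foo", 1)
	_ = idx.AddRef("Foo", 2)
	_ = idx.AddRef("Foo", 1) // duplicate reference collapses into the set
	_ = idx.AddRef("Bar", 3)
	_ = idx.FlushRefs(func(token string, ids []uint32) error {
		fmt.Println(token, ids)
		return nil
	})
}
```

With N tokens referenced from M files, the per-call approach issues a write for every reference, while this shape issues one write per token at flush time.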