Skip to content

feat: [ENG-2158] dual-write runtime signals to sidecar (3/6)#449

Merged
danhdoan merged 1 commit intoproj/runtime-signalfrom
feat/ENG-2158
Apr 18, 2026
Merged

feat: [ENG-2158] dual-write runtime signals to sidecar (3/6)#449
danhdoan merged 1 commit intoproj/runtime-signalfrom
feat/ENG-2158

Conversation

@danhdoan
Copy link
Copy Markdown
Collaborator

Summary

  • Problem: Commit 2 landed a sidecar store for per-machine ranking signals, but no code wrote to it. Signals still lived in markdown frontmatter, dirtying version control on every query.
  • Why it matters: This commit populates the sidecar at every mutation site. Commit 4 switches reads over; commit 5 stops writing signals to markdown. This PR is the pivot — from commit 3 onward, both stores have the same data, so later commits can flip without losing state.
  • What changed: Dual-write at 7 mutation sites (flushAccessHits, curate ADD/UPDATE/MERGE/DELETE, archive/restore). ToolServices gains runtimeSignalStore; the daemon builds its own store via a local FileKeyStorage. Initializer reordered so the store is available to ToolProvider at construction time.
  • What did NOT change (scope boundary): No read path touches the sidecar. Ranking still reads from markdown. YAML emitter still writes all scoring fields. Cloud sync untouched. Those land in commits 4-6.

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: N/A
  • Why this was not caught earlier: N/A

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
    • test/unit/agent/tools/curate-tool-sidecar-dual-write.test.ts — 8 tests
    • test/unit/agent/tools/search-knowledge-flush-sidecar.test.ts — 4 tests
    • test/unit/infra/context-tree/archive-service-sidecar-dual-write.test.ts — 4 tests
    • test/integration/runtime-signals/dual-write-pipeline.test.ts — 1 end-to-end test
  • Key scenario(s) covered:
    • ADD seeds default signals in sidecar
    • UPDATE bumps importance/recency/updateCount and retiered maturity; updatedAt not mirrored
    • DELETE drops sidecar entry
    • MERGE combines source+target signals via mergeScoring, drops source entry, retiered maturity
    • flushAccessHits mirrors importance/accessCount/maturity via batchUpdate
    • archive drops sidecar entry; restore seeds defaults
    • Hysteresis invariant: repeated UPDATE bumps promote draftvalidated at importance ≥ 65
    • Failure isolation: throwing sidecar does not abort markdown writes at any site
    • Markdown/sidecar consistency: frontmatter scoring fields match sidecar after ADD and UPDATE
    • Full pipeline: ADD → UPDATE → flushAccessHits → MERGE → archive — sidecar state matches computed end state, no orphans remain

User-visible changes

None. Markdown is still canonical; the sidecar shadows it. Users will not observe any behavior change until commit 4 redirects reads.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording

New dual-write tests:

Full repo suite: 6476 passing, 0 failing (commit 2 ended at 6459; +17 new tests across this series of PRs).

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint) — 0 errors, 202 pre-existing warnings (none new)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable) — N/A, internal change
  • No breaking changes
  • Branch is up to date with main

Risks and mitigations

  • Risk: A caller that bumps importance forgets to recompute maturity via determineTier. The store layer does not enforce the hysteresis invariant — it accepts any schema-valid combination. A bad updater would produce {importance: 90, maturity: 'draft'} which violates the tier rules.
    • Mitigation: Every site in this PR recomputes maturity inside the atomic updater. Hysteresis-promotion test confirms the invariant holds for both UPDATE and flushAccessHits. Merge delegates to mergeScoring + determineTier together. Reviewer must check that any future sidecar updater follows the same pattern.
  • Risk: updatedAt leaks into the sidecar. updatedAt reflects content change, not a runtime signal — must stay in markdown only.
    • Mitigation: RuntimeSignals type has no updatedAt field. By construction the type system blocks accidental mirroring. Verified at every updater by inspection.
  • Risk: Sidecar failures silently swallowed without logging. Each try/catch in the curate helpers and archive service swallows operational errors (disk, permissions) without a warn log. RuntimeSignalStore.validateOrDefault logs corrupt-data cases but not write failures.
    • Mitigation: Acceptable for phase 3 (markdown canonical — failures self-heal on the next bump). Documented in backlog with a trigger (fix before commit 5 ships, when sidecar becomes canonical).
  • Risk: Swarm's internal SearchKnowledgeService and dream.ts CLI path don't receive the store. Two peripheral code paths still write to markdown only.
    • Mitigation: Both documented in the backlog. Primary user-facing paths (daemon search, curate tools, archive via daemon dream flow) are covered. These peripheral paths become visible gaps only when commit 5 stops writing markdown — will be resolved before that commit merges.
  • Risk: Daemon-side FileKeyStorage in agent-process.ts creates a second store instance targeting the same on-disk path as the agent-side instance. Both processes can race on the same signal record.
    • Mitigation: Explicitly documented in the interface — cross-process contention has a narrow lost-update window, acceptable for ranking signals. The daemon and agent-side instances share the same file-backed IKeyStorage layout, so writes are consistent on disk; only per-read-modify-write atomicity is in-process.

Related

  • Previous commits in this series:
    • [Runtime Signals 1/6] RuntimeSignals schema and types
    • [Runtime Signals 2/6] RuntimeSignalStore implementation and wiring
  • Next commit: [Runtime Signals 4/6] Switch read paths to sidecar
  • Backlog: features/runtime-signals/backlog.md — swarm coverage, dream CLI path, sidecar failure logging, metrics

Third commit in the series. Every mutation site that previously touched
scoring fields in context-tree markdown frontmatter now also mirrors the
change to the RuntimeSignalStore sidecar introduced in commit 2.
Markdown remains canonical; this commit populates the sidecar so
commit 4 has real data to read from.

Mutation sites dual-writing:
- SearchKnowledgeService.flushAccessHits — mirrors
  importance/accessCount
  bumps and retiered maturity via batchUpdate
- CurateTool executeAdd — seeds sidecar with default signals
- CurateTool executeUpdate — mirrors importance +5, recency=1,
  updateCount +1, retiered maturity (updatedAt stays in markdown only)
- CurateTool executeMerge — delegates to mergeScoring to keep the
  sidecar
  merge policy identical to markdown; drops source entry
- CurateTool executeDelete — drops sidecar entry
- FileContextTreeArchiveService.archiveEntry — drops sidecar entry
- FileContextTreeArchiveService.restoreEntry — seeds defaults

Wiring: runtimeSignalStore added to ToolServices and threaded through
the tool registry (CODE_EXEC sandbox, CURATE, EXPAND_KNOWLEDGE). The
daemon (agent-process.ts) constructs its own FileKeyStorage +
RuntimeSignalStore so `brv search` and the dream-flow archive service
both participate in dual-write.

Invariants enforced at every site that bumps importance:
- maturity recomputed via determineTier inside the atomic updater
- updatedAt never written to sidecar (not a runtime signal)
- sidecar failures wrapped in try/catch — markdown writes always
  complete
- merge logic delegated to mergeScoring (no policy duplication)

Service initializer reordered so keyStorage/runtimeSignalStore are
constructed before ToolProvider — eliminates the prior updateServices
late-bind workaround.

Tests: 11 new tests covering dual-write invariants, hysteresis
promotion, failure isolation, markdown/sidecar consistency, and a
full pipeline integration test (ADD → UPDATE → flushAccessHits →
MERGE → archive). 6476 total passing, 0 failing.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 18, 2026

Claude finished @danhdoan's task in 2m 39s —— View job


PR Review: feat: [ENG-2158] dual-write runtime signals to sidecar (3/6)

Tasks:

  • Gather context and understand the request
  • Read changed source files
  • Read changed test files
  • Analyze code for correctness, security, and quality
  • Post final review with inline comments and overall score

Overall Assessment — Score: 4 / 5

This is a well-structured, thoughtfully scoped PR. The dual-write pattern is applied consistently across all 7 mutation sites, failure isolation is uniformly enforced, and the test suite (unit + integration + failure-path) is thorough. The scope boundary (no read-path changes) is strictly observed. Issues below are mostly minor; one medium-severity gap around folder-delete orphans is worth resolving before commit 5 ships.


Findings

suggestion — non-null assertions violate project coding standard

File: src/agent/infra/tools/implementations/curate-tool.ts lines 68–72, and search-knowledge-service.ts lines 1174–1177

// curate-tool.ts mirrorCurateUpdate
const bumpedImportance = bumped.importance!
return {
  ...current,
  importance: bumpedImportance,
  recency: bumped.recency!,
  updateCount: bumped.updateCount!,
}

// search-knowledge-service.ts mirrorHitsToSignalStore
const bumpedImportance = bumped.importance!
return {
  ...current,
  accessCount: bumped.accessCount!,
  importance: bumpedImportance,
}

CLAUDE.md is explicit: "Avoid ! non-null assertions — narrow with type guards or throw explicitly. Lazy-initialized singletons are the only acceptable exception." The inline comments correctly explain the invariant (recordCurateUpdate / recordAccessHits always returns concrete values when given a fully-valued RuntimeSignals), but comments aren't enforcement.

Preferred pattern — use nullish coalescing and fall back to the pre-bump value:

const bumpedImportance = bumped.importance ?? current.importance
return {
  ...current,
  importance: bumpedImportance,
  maturity: determineTier(bumpedImportance, current.maturity),
  recency: bumped.recency ?? current.recency,
  updateCount: bumped.updateCount ?? current.updateCount,
}

This preserves the semantic intent (if a field is somehow missing, we keep the current value rather than crash) and removes the style-guide violation. The ?? form is also more descriptive to a reader who hasn't memorised the recordCurateUpdate contract.

Fix this →


issue — folder DELETE leaves orphaned sidecar entries

File: src/agent/infra/tools/implementations/curate-tool.ts around the folder-delete branch of executeDelete

dropSidecar is called only for the specific filePath in the if (title) (single-file) branch. The folder-deletion branch (no title) recursively deletes all .md files on disk but never iterates them to remove their sidecar entries. Every file in the deleted folder becomes an orphan in the sidecar store.

This is an acceptable debt for commit 3 since markdown is still canonical, but it must be resolved before commit 5 (when the sidecar becomes canonical) — orphans from folder deletes would silently resurface as stale signals. The PR backlog already mentions pruneOrphans, but this specific trigger (curate folder DELETE) isn't called out there.

Suggested fix for commit 3: after collecting files in the folder-delete branch, iterate and call dropSidecar for each:

// existing folder-delete path
const files = await DirectoryManager.listFiles(folderPath)
for (const f of files) {
  await dropSidecar(runtimeSignalStore, relPathFromContextPath(f, basePath))
}
await DirectoryManager.deleteFolder(folderPath)

Fix this →


nitpick — daemonLogger collapses all severity levels to the same sink

File: src/server/infra/daemon/agent-process.ts lines ~378–383

const daemonLogger = {
  debug: (msg: string): void => agentLog(msg),
  error: (msg: string): void => agentLog(msg),
  info:  (msg: string): void => agentLog(msg),
  warn:  (msg: string): void => agentLog(msg),
}

RuntimeSignalStore.validateOrDefault logs corrupt-data warnings via logger.warn. With this adapter those warnings are indistinguishable from debug traces. If corrupt sidecar data ever needs to be diagnosed in production logs this makes it hard. Not blocking, but worth noting for when sidecar becomes canonical.


nitpick — import brace spacing inconsistency in tool-registry.ts

File: src/agent/infra/tools/tool-registry.ts line 1

import type { IRuntimeSignalStore } from '../../../server/core/interfaces/storage/i-runtime-signal-store.js'

All other import type statements in this file use {NoBraceSpaces}. This new import has spaces. ESLint may or may not enforce this depending on the project's import style rule; worth keeping consistent either way.


praise — failure isolation is uniformly applied

Every try/catch around a sidecar write explicitly comments "Fail-open — markdown is canonical during phase 3" or equivalent. The isolation tests (sidecar failure isolation suites in both unit files) verify this at the test level. This is the right pattern for a shadow store: the primary path must never regress because of the secondary.


praise — mirrorCurateUpdate delegates to recordCurateUpdate rather than inline logic

Reusing the same scoring function for both the markdown write and the sidecar mirror means the two paths cannot drift when weighting constants change. The MERGE path similarly delegates to mergeScoring + determineTier together. This is the right design.


praise — relPathFromContextPath handles Windows paths

The split('\\').join('/') normalisation in relPathFromContextPath is a nice defensive touch for cross-platform correctness.


praise — comprehensive test coverage

The pipeline test (ADD → UPDATE → flushAccessHits → MERGE → archive) is particularly valuable: it exercises the full lifecycle in one shot and checks for orphan absence at the end, which is the correctness property that matters most for commit 5.


@danhdoan danhdoan merged commit 7257f24 into proj/runtime-signal Apr 18, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants