feature(semantic-cache): add discovery marker protocol (0.2.0)#127
Conversation
Three findings from Cursor Bugbot on PR #127: - register() now writes the initial heartbeat key before scheduling the setInterval, so Monitor sees the cache as alive immediately after initialize() returns instead of waiting up to heartbeatIntervalMs for the first scheduled tick - tickHeartbeat() SET failures now call onWriteFailed(), so the semantic_cache_discovery_write_failed_total counter reflects heartbeat ACL issues as documented - SemanticCache._doInitialize() now sets _initialized = true only after registerDiscovery() succeeds, so a name-collision error correctly leaves the cache unusable (previous order let check()/store() still work after initialize() rejected) Regression tests added for the initial heartbeat write and the heartbeat-failure counter path.
Code reviewFound 1 issue:
monitor/packages/semantic-cache/src/SemanticCache.ts Lines 360 to 370 in 1033b74 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
…dispose Second review pass on PR #127: the _initGeneration guard in _doInitialize runs before the new await this.registerDiscovery(), but registerDiscovery has its own internal awaits (HGET + HSET + SET + tickHeartbeat) during which flush() or dispose() can slip in. With the old code, those concurrent calls saw this.discovery still null and no-opped, then registerDiscovery finished, the manager's interval was already scheduled, and _initialized = true silently overrode the concurrent flush's _initialized = false — leaving the cache with a leaked heartbeat and no index. Fix: - Return the DiscoveryManager from registerDiscovery() instead of assigning to this.discovery internally - In _doInitialize, after the await, re-check both _initGeneration (flush guard) and a new _disposed flag. If either indicates the cache was torn down mid-register, stop the manager we just created (deleting the heartbeat key) and return without setting _initialized = true or this.discovery = manager - dispose() sets _disposed = true before stopping the existing manager so an in-flight registration sees the signal - _doInitialize resets _disposed = false at entry so a cache can be re-initialized after dispose()
CacheProposalService with type-specific validation, duplicate-pending rejection, and per-connection 30/hour rate limit. Resolves caches via HGETALL __betterdb:caches (discovery markers from PRs #127/#128). Typed errors (validation, invalid cache type, not found, duplicate, rate limited) are mapped to HTTP status codes at the MCP controller. Three MCP tools registered: - cache_propose_threshold_adjust (semantic_cache only, 0..2 range) - cache_propose_tool_ttl_adjust (agent_cache only, 10..86400 range) - cache_propose_invalidate (filter_kind discriminates by type; warns when estimated_affected > 10000) All proposals start as 'pending' with a 24h expiry, awaiting human approval (Day 5 spec). The module imports StorageModule for proposal CRUD and ConnectionsModule for the Valkey client used to read markers. Stacks on feature/cache-proposal-data-model (PR #134).
CacheProposalService with type-specific validation, duplicate-pending rejection, and per-connection 30/hour rate limit. Resolves caches via HGETALL __betterdb:caches (discovery markers from PRs #127/#128). Typed errors (validation, invalid cache type, not found, duplicate, rate limited) are mapped to HTTP status codes at the MCP controller. Three MCP tools registered: - cache_propose_threshold_adjust (semantic_cache only, 0..2 range) - cache_propose_tool_ttl_adjust (agent_cache only, 10..86400 range) - cache_propose_invalidate (filter_kind discriminates by type; warns when estimated_affected > 10000) All proposals start as 'pending' with a 24h expiry, awaiting human approval (Day 5 spec). The module imports StorageModule for proposal CRUD and ConnectionsModule for the Valkey client used to read markers. Stacks on feature/cache-proposal-data-model (PR #134).
|
A few things
|
Implements the semantic-cache side of the shared __betterdb:* discovery marker protocol. Consumers (BetterDB Monitor, the upcoming cache intelligence MCP tools) can now enumerate semantic caches on any monitored Valkey instance without customer configuration. - initialize() registers the instance in the __betterdb:caches hash and starts a 30s heartbeat with a 60s TTL - flush() and new dispose() method stop the heartbeat and remove the heartbeat key, preserving the registry entry so Monitor retains history - Name collisions across cache types (semantic vs agent) throw; all other Valkey write failures are best-effort and increment the new semantic_cache_discovery_write_failed_total counter - Opt out via SemanticCacheOptions.discovery.enabled = false - threshold_adjust intentionally omitted from capabilities until SemanticCache.check() re-reads the config hash at runtime Bumps @betterdb/semantic-cache to 0.2.0. Spec: docs/plans/specs/spec-semantic-cache-discovery-markers.md
Three findings from Cursor Bugbot on PR #127: - register() now writes the initial heartbeat key before scheduling the setInterval, so Monitor sees the cache as alive immediately after initialize() returns instead of waiting up to heartbeatIntervalMs for the first scheduled tick - tickHeartbeat() SET failures now call onWriteFailed(), so the semantic_cache_discovery_write_failed_total counter reflects heartbeat ACL issues as documented - SemanticCache._doInitialize() now sets _initialized = true only after registerDiscovery() succeeds, so a name-collision error correctly leaves the cache unusable (previous order let check()/store() still work after initialize() rejected) Regression tests added for the initial heartbeat write and the heartbeat-failure counter path.
- CacheType in this package is always 'semantic_cache' — dropped the sibling-union, exported the literal as CACHE_TYPE, and switched the collision check to compare against that constant directly - Read PACKAGE_VERSION from the library's own package.json at runtime instead of duplicating the version string in source - Stripped explanatory comments from production source per project rule
…dispose Second review pass on PR #127: the _initGeneration guard in _doInitialize runs before the new await this.registerDiscovery(), but registerDiscovery has its own internal awaits (HGET + HSET + SET + tickHeartbeat) during which flush() or dispose() can slip in. With the old code, those concurrent calls saw this.discovery still null and no-opped, then registerDiscovery finished, the manager's interval was already scheduled, and _initialized = true silently overrode the concurrent flush's _initialized = false — leaving the cache with a leaked heartbeat and no index. Fix: - Return the DiscoveryManager from registerDiscovery() instead of assigning to this.discovery internally - In _doInitialize, after the await, re-check both _initGeneration (flush guard) and a new _disposed flag. If either indicates the cache was torn down mid-register, stop the manager we just created (deleting the heartbeat key) and return without setting _initialized = true or this.discovery = manager - dispose() sets _disposed = true before stopping the existing manager so an in-flight registration sees the signal - _doInitialize resets _disposed = false at entry so a cache can be re-initialized after dispose()
…ignalling abort
Previous fix used a _disposed flag that made _doInitialize early-return
on concurrent dispose(). The downside was the stuck-state bug bugbot
flagged: _initPromise retained a resolved promise after the early
return, so a subsequent initialize() short-circuited via
!this._initPromise and never re-ran _doInitialize — leaving the cache
permanently un-initialized with no recovery path other than flush().
Switch to the same pattern used by @betterdb/agent-cache's shutdown():
dispose() awaits the in-flight _initPromise before tearing down the
discovery manager. This lets _doInitialize complete normally — assigning
this.discovery = manager and _initialized = true — then dispose's
existing code stops the manager and deletes the heartbeat key. Removes
the _disposed field and the second guard check; _initGeneration still
covers the flush()-during-init race.
_initPromise.catch(() => {}) guards against propagating a reject from
the in-flight init (e.g. collision error) into dispose()'s caller.
2d136fc to
6c7b1f1
Compare
6c7b1f1 to
ee0f92d
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ee0f92d. Configure here.
Rebased
Bumped in package.json and CHANGELOG.md
The PR adds three DB keys, all under a
The JSON in the registry hash has: For semantic-cache specifically, the marker also carries Nothing else changes — the existing |
…re await bugbot LOW: flush() updated _initialized / _initPromise / _initGeneration synchronously but called 'await this.discovery.stop()' before nulling this.discovery. A concurrent _doInitialize() that passed the generation guard could complete during that await and assign a fresh DiscoveryManager to this.discovery; flush() would then overwrite it with null and leak the new manager's heartbeat interval. Capture the reference and null this.discovery before the await, mirroring how the other state fields are handled at the top of flush().
2d136fc to
af4eefd
Compare
## Summary The 6 read-only MCP tools from [`spec-cache-mcp-readonly-tools.md`](../../tree/feature/cache-mcp-readonly-tools/docs/plans/specs/spec-cache-mcp-readonly-tools.md). **Stacked on #134** (cache proposal data model + service); base branch is \`feature/cache-proposal-data-model\` so it'll auto-rebase to \`master\` when #134 merges. - \`cache_list\` — enumerate \`__betterdb:caches\`, return hit rate + total ops + live/stale status per cache. - \`cache_health\` — discriminated by cache_type. Semantic returns category breakdown + uncertain_hit_rate; agent returns per-tool breakdown. No nullable-field fudging. - \`cache_threshold_recommendation\` — semantic only. Replicates \`SemanticCache.thresholdEffectiveness()\` by reading \`{prefix}:__similarity_window\` directly. - \`cache_tool_effectiveness\` — agent only. Reads \`{prefix}:__stats\` and applies the same TTL-recommendation rules as \`AgentCache.toolEffectiveness()\`. - \`cache_similarity_distribution\` — exactly 20 buckets of width 0.1 across the 0..2 cosine-distance range, with optional category + window-hours filters. - \`cache_recent_changes\` — recent proposals (any status) for a single cache, newest first, so agents can avoid re-proposing already-pending changes. Per-type dispatch surfaces \`INVALID_CACHE_TYPE\` (HTTP 400) when a semantic-only or agent-only tool is called against the wrong cache type. \`CACHE_NOT_FOUND\` (HTTP 404) when the cache isn't in the discovery markers. The cache-library methods (\`thresholdEffectiveness\`, \`toolEffectiveness\`) require \`initialize()\` which side-effect-creates the FT search index. Calling them from an HTTP request would mutate the cache as a side effect of an inspection. The service reads the underlying Valkey keys directly instead. ## Test plan - [x] \`pnpm --filter api test -- --testPathPatterns=cache-readonly\` — 14 tests pass (list both types, health discriminated union, INVALID_CACHE_TYPE for both wrong-type calls, distribution 20 buckets w=0.1 with non-negative integer counts, recent_changes ordering+limit, threshold_recommendation insufficient_data + tighten_threshold) - [x] \`pnpm --filter api exec tsc --noEmit\` - [x] \`pnpm --filter @betterdb/mcp build\` - [ ] After #127 + #128 + #134 merge: integration test driving the full read flow against real Valkey + real Postgres <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Adds new endpoints and background expiry plus code that can delete cache keys and modify runtime cache config/TTL policies; mistakes could cause unintended invalidations or incorrect proposal state transitions. > > **Overview** > Adds a **cache proposal review + execution pipeline**: new `CacheProposalController` endpoints to list pending/history, fetch details with audit, and `approve`/`reject`/`edit-and-approve`; `CacheProposalService` now enforces pending/expiry rules, records audit events with `actor_source`, supports proposal expiry, and (optionally) triggers apply via `CacheApplyService`. > > Implements proposal application to Valkey via `CacheApplyDispatcher` (semantic threshold overrides, agent tool TTL policy updates, and both cache types’ invalidate flows), including safer error propagation (`ApplyFailedError` w/ proposalId) and metrics like `actualAffected`/duration. > > Introduces `CacheReadonlyService` plus new MCP HTTP endpoints and `@betterdb/mcp` tools for cache listing/health, threshold recommendations, tool effectiveness, similarity distributions, and recent changes, and adds a web “Cache Proposals” page with unread badge/polling, pending cards (approve/reject/edit), and history/detail views. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1f6442e. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->

Summary
Implements the semantic-cache side of a shared
__betterdb:*discovery marker protocol. BetterDB Monitor (and the upcoming Cache Intelligence MCP tools) can now enumerate semantic-cache instances on any Valkey they already monitor — no customer configuration beyond constructing the library as before.This is the first of a pair: the agent-cache counterpart ships in a separate PR.
Protocol
__betterdb:caches— hash. Field =options.name, value = JSON metadata (type, prefix, version, protocol_version, capabilities, index/stats/config keys, default threshold, etc).__betterdb:heartbeat:<name>— string, TTL 60s, refreshed every 30s.__betterdb:protocol— string,SET NX 1. Version marker.No sensitive data is written — only cache metadata.
What lands
initialize()registers the instance in__betterdb:cachesand starts a 30s heartbeat with a 60s TTLflush()and a newdispose()method stop the heartbeat and delete the heartbeat key, preserving the registry entry so Monitor retains history after the cache is goneSemanticCacheUsageErroroninitialize(). All other Valkey write failures are best-effort and increment a newsemantic_cache_discovery_write_failed_totalcounter so operators can alert on ACL issuesSemanticCacheOptions.discovery = { enabled: false }.heartbeatIntervalMsandincludeCategoriesare also exposedthreshold_adjustis intentionally omitted from publishedcapabilitiesuntilSemanticCache.check()re-reads the config hash at runtime — that's a separate follow-upBumps
@betterdb/semantic-cache0.1.0 → 0.2.0.Test plan
pnpm --filter @betterdb/semantic-cache test— 43 tests pass, including 13 new unit tests covering schema correctness, cross-type collision (throws), version-mismatch warn+overwrite, ACL-denied resilience, heartbeat TTL, opt-out, and registry-not-touched-on-stoppnpm --filter @betterdb/semantic-cache typecheckcleanpnpm --filter @betterdb/semantic-cache buildcleanHGETALL __betterdb:cachesis populated afterinitialize(), heartbeat TTL in range,dispose()clears heartbeat key,discovery: { enabled: false }writes nothingNote
Medium Risk
Adds new best-effort Valkey writes and a background heartbeat tied to
initialize()/flush()/dispose(), which could affect deployments with strict ACLs or unexpected keyspace policies. Core cache behavior is largely unchanged, but lifecycle/initialization paths now have more moving parts and timing concerns.Overview
Adds a BetterDB discovery-marker protocol to
@betterdb/semantic-cache(0.1.0→0.2.0):initialize()now registers cache metadata in__betterdb:caches, sets__betterdb:protocol, and maintains a 60s-TTL__betterdb:heartbeat:<name>refreshed on an interval (opt-out viaSemanticCacheOptions.discovery).Introduces
cache.dispose()for graceful shutdown and updatesflush()/initialization concurrency handling to stop/delete the heartbeat without deleting the registry entry; all discovery writes are best-effort with a new Prometheus countersemantic_cache_discovery_write_failed_totalfor ACL/permission failures, plus tests covering collisions, ACL-denied behavior, and heartbeat semantics.Reviewed by Cursor Bugbot for commit 2d136fc. Bugbot is set up for automated code reviews on this repo. Configure here.