Skip to content

feat: desire paths retrieval + prospective indexing#253

Merged
NicholaiVogel merged 7 commits intomainfrom
nicholai/dp6-traversal-primary
Mar 22, 2026
Merged

feat: desire paths retrieval + prospective indexing#253
NicholaiVogel merged 7 commits intomainfrom
nicholai/dp6-traversal-primary

Conversation

@NicholaiVogel
Copy link
Contributor

@NicholaiVogel NicholaiVogel commented Mar 20, 2026

Summary

This PR is the operational version of the Desire Paths work.

It does two things:

  1. It changes recall from search-first retrieval to graph-primary hybrid recall.
  2. It wires the benchmark-winning behavior into the real product paths, so hooks, SDK recall, MCP recall, structured writes, and connector session-end flows all benefit from the same retrieval model instead of leaving the gains trapped in benchmark-only code.

Why this exists

The old recall stack was decent at literal keyword match, but it was weaker on relationship-heavy questions, paraphrased queries, and entity-linked recall. The Desire Paths experiments showed that the best results came from treating the knowledge graph as the primary recall surface, then using flat search to fill gaps.

This PR makes that architecture real. It also adds the write-path and runtime plumbing needed so production requests can actually use the same improvements we measured in the benchmark work.

What this PR changes

1. Traversal-primary hybrid recall

The core retrieval change is an inversion of the old stack.

  • hybridRecall() now starts from entity resolution plus bounded graph traversal.
  • Traversal candidates form the base pool.
  • Flat search, FTS + vector, is still present, but now acts as a gap-filler with a guaranteed floor so we do not regress on non-entity queries.
  • If entity resolution fails, recall degrades cleanly back to flat search.
  • Recall paths that were still bypassing this logic now route through hybridRecall, including hook-triggered recall.

Why:
This is the main retrieval improvement from the Desire Paths experiments. It improves entity-anchored recall without giving up the reliability of classic search.

2. Prospective indexing at write time

This PR adds prospective indexing so memories can be retrieved by the kinds of questions users are likely to ask later, not just by the literal wording that happened to be stored.

  • New memory_hints and memory_hints_fts tables/indexes store hypothetical future queries for each memory.
  • Search merges hint matches into the recall ranking.
  • Hints are generated by the prospective indexing worker and can be backfilled for older data.

Why:
Users do not search with the exact phrasing that was originally written. Prospective hints bridge that cue mismatch and materially improve paraphrased recall.

3. Better graph quality so traversal is worth doing

Graph-primary retrieval only works if the graph itself is high signal.

This branch improves that layer with:

  • a significance gate to reduce low-value extraction noise
  • weighted dependency confidence
  • bounded traversal with score decay
  • community detection
  • constructed context from graph structure
  • shared stop-word handling and scope-aware dedup fixes

Why:
Traversal over a noisy graph just amplifies bad structure. These changes improve graph quality and keep the system predictable under load and during long-lived use.

4. Write-path support so the benchmark gains are reachable in production

A large part of this branch is about making sure the better retrieval behavior is not benchmark-only.

  • The Remember API can now persist pre-structured entity/aspect/hint payloads synchronously.
  • Structured writes bypass the async extractor when the caller already has the structure.
  • Structured decision-like content is promoted to constraints instead of always being written as attributes.
  • Lossless session transcripts are stored in session_transcripts and can be joined into expanded recall results.
  • Claude Code forwards transcriptPath on session end.
  • OpenCode fetches session messages, builds a normalized transcript, and sends it on session end.
  • SDK, MCP, and daemon recall paths expose expand, and recall responses can include source text via sources.

Why:
The benchmark improvements depended on richer write-time structure and better recall surfaces. If those capabilities only existed in experiment code, the product would still behave like the old system. This closes that gap.

5. New graph-aware tools and API surface

This branch also adds the surfaces needed to inspect and use the graph more directly:

  • knowledge_expand
  • knowledge_expand_session
  • /api/graph/impact
  • expanded recall/search responses when callers explicitly request source expansion

Why:
Once recall becomes graph-primary, callers need a way to drill into the structure behind the answer instead of only receiving a flat list of snippets.

Schema and migration changes

This PR adds the schema needed for scope-aware dedup, better entity lookup, weighted dependencies, community detection, prospective hints, dependency cleanup, and lossless session transcript storage.

Notable migrations:

  • 034 scope-aware dedup
  • 035 entity FTS
  • 036 dependency confidence
  • 037 entity communities
  • 038 memory hints
  • 039 dedup entity dependencies
  • 040 session transcripts

Docs and research included in this branch

This branch also carries the benchmark write-up, architecture and pipeline docs, and the supporting research/spec work that informed the retrieval changes, including the Desire Paths epic updates and the prospective indexing / SSM research docs.

How to review this PR

If you want the shortest path through the diff, start here:

  1. packages/daemon/src/memory-search.ts
  2. packages/daemon/src/pipeline/worker.ts
  3. packages/daemon/src/pipeline/graph-traversal.ts
  4. packages/daemon/src/pipeline/prospective-index.ts
  5. packages/daemon/src/daemon.ts
  6. packages/daemon/src/hooks.ts
  7. packages/opencode-plugin/src/index.ts
  8. packages/daemon/src/pipeline/graph-transactions.ts

Notes for reviewers

  • Some review feedback pushed toward simpler retrieval behavior. We intentionally kept the benchmark-backed retrieval shape where changing it would have regressed measured results.
  • This branch also includes the rebase/merge-conflict cleanup, benchmark doc corrections, and the runtime-path parity work needed to make the retrieval gains show up in actual product behavior.

Closes #214

@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - I'm @NicholaiVogel's PR-reviewing agent powered by pr-reviewer. I'm taking a look at the feature work in feat: DP-6 traversal-primary retrieval (commit cdcbd94d) now and I'll follow up shortly with feedback.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four issues found. Two correctness bugs in the new constructed-memory synthesis path (invalid type value and out-of-range importance), one data-isolation regression in getGraphBoostIds where the new FTS5 path drops the pre-existing per-agent filter, and one API-contract violation where constructed memories are appended after the limit slice without any cap. The architectural inversion itself (traversal-primary, scope-filtered traversal, shared stop-word list, scope-aware dedup) is sound. Fix the constructed-memory bugs before merging.

Confidence: 7.7/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 7
  • Injection risk detection confidence: 8
  • Attack-surface risk assessment confidence: 7
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 9
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 8
  • Documentation coverage confidence: 8
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 7,
  "injection_risk_detection": 8,
  "attack_surface_risk_assessment": 7,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 9,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 7,
  "pattern_correctness": 8,
  "documentation_coverage": 8
}

NicholaiVogel added a commit that referenced this pull request Mar 20, 2026
- constructed memory type "knowledge" -> "semantic" (valid MemoryType)
- constructed memory importance 8 -> 0.85 (normalized 0-1 range)
- add agent_id filter to FTS5 entity resolution in getGraphBoostIds
- cap constructed memories at max(3, ceil(limit * 0.3))
@NicholaiVogel
Copy link
Contributor Author

Review Fixes + Full DP Wave Push

Review feedback addressed (6544af9):

  • type: "knowledge" -> "semantic" (valid MemoryType)
  • importance: 8 -> 0.85 (normalized 0-1 range)
  • agent_id filter added to FTS5 entity resolution in getGraphBoostIds
  • Constructed memories capped at max(3, ceil(limit * 0.3))

Heads up on the size of this push — this branch now carries the full Desire Paths Wave 6 epic (DP-1 through DP-8), developed in parallel worktree agents and merged sequentially. Here's what landed:

Story Commit What
DP-6 cdcbd94d Traversal-primary retrieval — architectural inversion so graph walk is the primary candidate pool, flat search fills gaps
DP-2+3 56752da9 Edge confidence column (migration 036) + bounded traversal params (maxBranching, maxTraversalPaths, minConfidence)
DP-4 f5781893 knowledge_expand MCP tool + /api/graph/impact endpoint for session expansion
DP-5 e4d531df Leiden community detection (migration 037) + clusterEntities()
DP-7 07805a3f Constructed memories with template-based synthesis + ConstructedProvenance metadata
DP-8 44c2a663 Three predictor critical bugs fixed (feature vector dimensions, cold start logic, stale traversal cache)
DP-1 a160527a Significance gate for session extraction
FTS fix 2e39ce5c Killed stopword stripping in recall — uses implicit AND with quoted tokens instead of OR-joining stripped terms

Build and tests pass. The FTS fix alone was a major recall quality improvement — the old approach was OR-joining individual words after stripping connectors, which caused meta-memory pollution in search results.

Happy to walk through any piece of this in detail.

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 6544af99); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior review false positives: all four previously flagged issues were already correct in the code at cdcbd94type: "semantic" is a valid MemoryType, importance: 0.85 is in-range, the FTS5 path in getGraphBoostIds already included AND e.agent_id = ?, and the constructed-memory append loop already has an if (added >= cap) break guard. No regressions from prior review. Two new issues in the incremental additions: (1) community-detection runs Louvain twice independently, so the persisted partition and the reported modularity score may not correspond to the same assignment; (2) the new POST /api/repair/cluster-entities write endpoint has no requirePermission middleware, inconsistent with the other new endpoints added in this same PR.

Confidence: 7.6/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 7
  • Injection risk detection confidence: 9
  • Attack-surface risk assessment confidence: 7
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 8
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 7
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 7,
  "injection_risk_detection": 9,
  "attack_surface_risk_assessment": 7,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 8,
  "pattern_correctness": 7,
  "documentation_coverage": 7
}

@NicholaiVogel
Copy link
Contributor Author

Round 2 Review Fixes (029ad10)

1. Double Louvain run (community-detection.ts:248) -- Fixed. detectCommunities() now calls louvain.detailed() once and returns both { mapping, modularity }. The orchestrator destructures it directly. No second stochastic run, modularity always matches the persisted partition.

2. Missing auth on /api/repair/cluster-entities (daemon.ts:7205) -- This is already covered by the wildcard middleware at line 1519:

app.use("/api/repair/*", async (c, next) => {
    return requirePermission("admin", authConfig)(c, next);
});

All /api/repair/ endpoints inherit the admin permission guard. No change needed.

3-4. type: "knowledge" and importance: 8 -- These were addressed in the previous push (6544af9).

5. FTS5 agent_id filter in getGraphBoostIds -- Also addressed in 6544af9.

6. Constructed memory cap -- Also addressed in 6544af9 (max(3, ceil(limit * 0.3))).

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 029ad10f); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff fixes the double-Louvain issue from the prior review: detectCommunities now calls louvain.detailed() once and returns both mapping and modularity atomically, so the persisted partition and the reported modularity score are guaranteed to match. One issue from the prior review remains unresolved: the POST /api/repair/cluster-entities endpoint still has no requirePermission middleware, while every other write endpoint added in this PR (including the two new /api/knowledge/expand/session and /api/graph/impact endpoints) is gated with requirePermission. A caller with network access but without the 'recall' permission can trigger a full Louvain run, clearing and rewriting the entity_communities table and nulling community_id on all entities.

Confidence: 8.2/10

  • Style consistency & maintainability: 9
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 9
  • Attack-surface risk assessment confidence: 8
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 9
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 9
  • Pattern correctness confidence: 9
  • Documentation coverage confidence: 8
{
  "style_maintainability": 9,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 9,
  "attack_surface_risk_assessment": 8,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 9,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 9,
  "pattern_correctness": 9,
  "documentation_coverage": 8
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 2e5c7e2a); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff only touches sanitizeFtsQuery in memory-search.ts. The changes are coherent (stop-word filtering, .toLowerCase(), ? stripped from tokens). One logic concern with the AND→OR split is worth flagging. The outstanding issue from the prior review — missing requirePermission on POST /api/repair/cluster-entities — is not addressed in this diff and remains unresolved.

Confidence: 7.8/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 9
  • Attack-surface risk assessment confidence: 8
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 8
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 7
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 9,
  "attack_surface_risk_assessment": 8,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 8,
  "pattern_correctness": 7,
  "documentation_coverage": 7
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 1418c9e1); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harness output could not be parsed as structured JSON.

{
  "summary": "Incremental diff is documentation-only: three new SSM research documents added. No code changes in this increment. The outstanding security issue from prior reviews — missing `requirePermission` on `POST /api/repair/cluster-entities` — remains unresolved and is not addressed here. That endpoint still allows any caller with network access to trigger a full Louvain run, clearing and rewriting the `entity_communities` table without the 'recall' permission check applied to every other write endpoint in this PR.",
  "verdict": "request_changes",
  "confidence": {
    "style_maintainability": 9,
    "repo_convention_adherence": 9,
    "merge_conflict_detection": 5,
    "security_vulnerability_detection": 8,
    "injection_risk_detection": 9,
    "attack_surface_risk_assessment": 8,
    "future_hardening_guidance": 9,
    "scope_alignment": 8,
    "duplication_awareness": 8,
    "tooling_pattern_leverage": 7,
    "functional_completeness": 5,
    "pattern_correctness": 9,
    "documentation_coverage": 9
  },
  "comments": [
    {
      "file": "packages/daemon/src/memory-server.ts",
      "line": 0,
      "body": "**[Unresolved from prior reviews]** `POST /api/repair/cluster-entities` still lacks `requirePermission` middleware. Every other write endpoint added in this PR (including `/api/knowledge/expand/session` and `/api/graph/impact`) is gated with `requirePermission`. This endpoint clears and rewrites the entire `entity_communities` table and nulls `community_id` on all entities. An unauthenticated caller with network access can trigger this destructive write. Add `requirePermission` consistent with the surrounding endpoints before merging."
    }
  ]
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 6b16e1df); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harness output could not be parsed as structured JSON.

{
  "summary": "Incremental diff changes sanitizeFtsQuery to always use OR for multi-token queries (removing the ≤3-token implicit-AND path). The logic is sound: OR with BM25 ranking avoids zero-result failures on vocabulary mismatches, and single-token behavior is unchanged. No new issues introduced by this diff. The one outstanding issue from prior reviews — missing requirePermission on POST /api/repair/cluster-entities — remains unresolved across five consecutive review rounds.",
  "verdict": "request_changes",
  "confidence": {
    "style_maintainability": 9,
    "repo_convention_adherence": 8,
    "merge_conflict_detection": 6,
    "security_vulnerability_detection": 9,
    "injection_risk_detection": 9,
    "attack_surface_risk_assessment": 9,
    "future_hardening_guidance": 7,
    "scope_alignment": 9,
    "duplication_awareness": 9,
    "tooling_pattern_leverage": 8,
    "functional_completeness": 8,
    "pattern_correctness": 9,
    "documentation_coverage": 8
  },
  "comments": [
    {
      "file": "packages/daemon/src/memory-search.ts",
      "line": 166,
      "body": "Fixed from prior review: AND→OR change is intentional and correct. Single-token path (`tokens[0]`) is functionally identical to before. OR with BM25 IDF is a valid tradeoff: recall increases, precision decreases, but BM25 ranking compensates. No issue here."
    },
    {
      "file": "packages/daemon/src/server.ts",
      "line": -1,
      "body": "Still unresolved (flagged at sha=029ad10f, sha=2e5c7e2a, sha=1418c9e1, and now here — five rounds): `POST /api/repair/cluster-entities` has no `requirePermission` middleware. Every other write endpoint added in this PR is gated. Any caller with network access can trigger a full Louvain run that clears and rewrites `entity_communities` and nulls `community_id` on all entities. Add `requirePermission('recall')` (or equivalent) before the handler, matching the pattern used by all adjacent endpoints."
    }
  ]
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 3ba706f0); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increment reverts sanitizeFtsQuery back to implicit AND for short (≤3 token) queries. The previous commit (6b16e1d) explicitly removed AND for multi-token queries precisely because it caused zero-result failures on uncommon vocabulary ('partake', 'destress'), and that prior review confirmed the all-OR approach was sound. This change reintroduces the zero-result regression for 2–3 token queries. The one outstanding issue from five consecutive prior reviews — missing requirePermission on POST /api/repair/cluster-entities — remains unresolved.

Confidence: 7.5/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 9
  • Attack-surface risk assessment confidence: 7
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 8
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 8
  • Documentation coverage confidence: 7
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 9,
  "attack_surface_risk_assessment": 7,
  "future_hardening_guidance": 7,
  "scope_alignment": 8,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 7,
  "pattern_correctness": 8,
  "documentation_coverage": 7
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit b3840fc7); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff adds inline-entity-linker.ts (new synchronous entity/aspect/attribute extraction at write time), a type-widening in daemon.ts for the new aspects/attributes fields, and a documentation update to INDEX.md with benchmark results. The new linker has a cross-agent entity contamination bug in resolveEntity where one agent can increment another agent's entity mention counter and receive that entity's ID. The outstanding security issue from the five prior review rounds — missing requirePermission on POST /api/repair/cluster-entities — is still not addressed. Both issues warrant changes before merge.

Confidence: 7.8/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 9
  • Attack-surface risk assessment confidence: 8
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 8
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 9,
  "attack_surface_risk_assessment": 8,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 7,
  "pattern_correctness": 7,
  "documentation_coverage": 8
}

Unmapped findings (not on changed lines):

  • packages/daemon/src/daemon.ts:1 - [Outstanding — 6 consecutive reviews unresolved] POST /api/repair/cluster-entities still has no requirePermission middleware. Every other write endpoint in this PR is gated. Any caller with network access can trigger a full Louvain run that clears and rewrites entity_communities and nulls community_id on all entities. This needs requirePermission (or equivalent) before merge.

NicholaiVogel added a commit that referenced this pull request Mar 21, 2026
- constructed memory type "knowledge" -> "semantic" (valid MemoryType)
- constructed memory importance 8 -> 0.85 (normalized 0-1 range)
- add agent_id filter to FTS5 entity resolution in getGraphBoostIds
- cap constructed memories at max(3, ceil(limit * 0.3))
NicholaiVogel added a commit that referenced this pull request Mar 21, 2026
- Fix cross-agent entity contamination in resolveEntity: scope both
  canonical_name and name lookups by agent_id
- Fix fallback SELECT after UNIQUE conflict to also filter by agent_id
- Fix linked counter using INSERT OR IGNORE changes instead of dead catch
- Remove unused isSentenceStart, created variable, preCount
- Remove stale docstring that contradicted stop-word implementation
- Add relink-entities repair endpoint for entity graph rebuild
- Update desire-paths-epic with benchmark findings and DP-6a status
- Remove unused Path import and annotate empty except in predictor bench
@NicholaiVogel NicholaiVogel force-pushed the nicholai/dp6-traversal-primary branch from b3840fc to 81946eb Compare March 21, 2026 04:21
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 81946ebc); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harness output could not be parsed as structured JSON.

{
  "summary": "Incremental diff is mostly non-code: CI workflow fixes, CHANGELOG entries, lockfile bumps, and documentation updates. No new high-severity issues introduced. Two unresolved issues from prior rounds persist: (1) missing `requirePermission` on `POST /api/repair/cluster-entities` (6 consecutive rounds, still not addressed); (2) cross-agent entity contamination in `resolveEntity` (flagged last round). One new noteworthy item: `@signet/core` moved from `dependencies` to `devDependencies` in the openclaw adapter — could silently break external consumers relying on transitive runtime resolution.",
  "verdict": "request_changes",
  "confidence": {
    "style_maintainability": 8,
    "repo_convention_adherence": 8,
    "merge_conflict_detection": 6,
    "security_vulnerability_detection": 8,
    "injection_risk_detection": 8,
    "attack_surface_risk_assessment": 7,
    "future_hardening_guidance": 7,
    "scope_alignment": 7,
    "duplication_awareness": 8,
    "tooling_pattern_leverage": 8,
    "functional_completeness": 7,
    "pattern_correctness": 7,
    "documentation_coverage": 9
  },
  "comments": [
    {
      "file": "packages/adapters/openclaw/package.json (via bun.lock)",
      "line": 14,
      "body": "**Potential breaking change**: `@signet/core` was moved from `dependencies` to `devDependencies` in the openclaw adapter. Any external consumer of `@signetai/signet-memory-openclaw` that relied on `@signet/core` being available as a transitive runtime dep will silently break at install time (no bundled types or runtime exports). Verify that the published adapter package bundles all necessary `@signet/core` types/utilities, or document that consumers must install `@signet/core` separately."
    },
    {
      "file": ".github/workflows/release.yml",
      "line": 23,
      "body": "**PAT scope hardening**: Switching from `GITHUB_TOKEN` to `RELEASE_PAT` for branch-protection bypass is the right pattern, but ensure `RELEASE_PAT` is scoped to the minimum required (`contents: write` only, not `admin`, not org-level). A PAT with broader scope used in a GitHub Actions context is a high-value target if the secret is ever leaked via a log or workflow injection. Document the required scopes in the workflow file as a comment."
    },
    {
      "file": "packages/daemon/src/api/repair.ts (unresolved from prior rounds)",
      "line": 0,
      "body": "**UNRESOLVED — 6 consecutive rounds**: `POST /api/repair/cluster-entities` still lacks `requirePermission('recall')` (or equivalent write guard) that every other mutating endpoint in this file carries. Any caller with network access to the daemon can trigger a full Louvain clustering run, clearing and rewriting the `entity_communities` table. This must be fixed before merge."
    },
    {
      "file": "packages/daemon/src/pipeline/inline-entity-linker.ts (unresolved from prior round)",
      "line": 0,
      "body": "**UNRESOLVED — flagged last round**: `resolveEntity` can increment another agent's entity mention counter and return that agent's entity ID when a name collision exists across agent scopes. One agent's interaction data leaks into another agent's entity graph. Needs a scope/agentId guard on the `UPDATE entity_mentions SET count = count + 1` path."
    }
  ]
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: DP-6 traversal-primary retrieval (commit 7877ded3); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff modifies only the channel merge logic in hybridRecall() — replacing a pure gap-fill approach with a proportional interleave that guarantees flat search at least 40% of slots. The math is correct across all edge cases I traced. No new issues introduced. Two prior-round issues remain unresolved: (1) missing requirePermission on POST /api/repair/cluster-entities — 7 consecutive rounds, still not addressed; (2) cross-agent entity contamination in resolveEntity — flagged last round, not addressed here.

Confidence: 7.6/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 8
  • Attack-surface risk assessment confidence: 7
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 9
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 8
  • Documentation coverage confidence: 7
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 8,
  "attack_surface_risk_assessment": 7,
  "future_hardening_guidance": 7,
  "scope_alignment": 9,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 7,
  "pattern_correctness": 8,
  "documentation_coverage": 7
}

Unmapped findings (not on changed lines):

  • packages/daemon/src/pipeline/inline-entity-linker.ts:1 - [UNRESOLVED — Round 6] Cross-agent entity contamination in resolveEntity: one agent's session can increment a different agent's entity mention counter and receive that entity's ID. The entity lookup needs to filter by agentId before the upsert, otherwise shared entity names across agents bleed into each other's memory graphs.

NicholaiVogel added a commit that referenced this pull request Mar 21, 2026
- constructed memory type "knowledge" -> "semantic" (valid MemoryType)
- constructed memory importance 8 -> 0.85 (normalized 0-1 range)
- add agent_id filter to FTS5 entity resolution in getGraphBoostIds
- cap constructed memories at max(3, ceil(limit * 0.3))
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 741fc3a8); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harness output could not be parsed as structured JSON.

{
  "summary": "Incremental diff since SHA 3e60655e resolves the previously-flagged TBD benchmark rows — Signet scores are now concrete (36% local / 34% cloud, GPT-4o judge). However, two issues flagged in the prior review remain unresolved, and the incremental diff surfaces a new factual error: the package added for community detection is `graphology-communities-louvain` (Louvain algorithm), but the PR description, ROADMAP.md, and docs/PIPELINE.md consistently call it "Leiden" — these are distinct algorithms with different quality guarantees, and the naming mismatch will mislead users and future contributors.",
  "verdict": "request_changes",
  "confidence": {
    "style_maintainability": 7,
    "repo_convention_adherence": 8,
    "merge_conflict_detection": 6,
    "security_vulnerability_detection": 3,
    "injection_risk_detection": 3,
    "attack_surface_risk_assessment": 3,
    "future_hardening_guidance": 6,
    "scope_alignment": 8,
    "duplication_awareness": 7,
    "tooling_pattern_leverage": 8,
    "functional_completeness": 6,
    "pattern_correctness": 7,
    "documentation_coverage": 7
  },
  "comments": [
    {
      "file": "ROADMAP.md",
      "line": 45,
      "body": "**[UNRESOLVED from prior review]** `- [next] Gemini CLI connector LOL` — this literal \"LOL\" suffix survived into the current HEAD and is visible on the public roadmap. Remove it before merge."
    },
    {
      "file": "README.md",
      "line": 249,
      "body": "**[UNRESOLVED from prior review]** Kumiho score is listed as `97.5% adv, 0.565 F1` in the benchmark table (rank 1) but as `LoCoMo SOTA 86.3%` in the Research reference table at the bottom of the file. These are different numbers for the same system. One is wrong. Reconcile before merge — users comparing against this leaderboard will be misled by whichever figure is incorrect."
    },
    {
      "file": "bun.lock",
      "line": 193,
      "body": "**Algorithm naming mismatch.** The package added here is `graphology-communities-louvain@2.0.2`, which implements the **Louvain** algorithm. The PR description (\"DP-5 Leiden community detection\"), ROADMAP.md (\"Leiden community detection\"), and docs/PIPELINE.md all call this \"Leiden\". Louvain and Leiden are distinct: Leiden fixes Louvain's connectivity guarantees and produces better-separated communities. Calling Louvain \"Leiden\" in user-facing docs is factually incorrect and will mislead anyone relying on those quality properties. Either (a) switch to a Leiden implementation (e.g. `leidenalg` via Python sidecar or a JS port) or (b) update all documentation and PR description to say \"Louvain\". ARCHITECTURE.md already correctly says \"Louvain algorithm\" — bring the rest of the docs in line with that."
    }
  ]
}

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 8a481c43); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incremental diff (research doc + spec index updates) is clean. However, three issues flagged in the previous review (SHA 741fc3a) remain unresolved. (1) ROADMAP.md still contains the literal string 'Gemini CLI connector LOL' in the public planned-items list. (2) Kumiho's score is still inconsistent: 97.5% adv / 0.565 F1 in the README benchmark table vs 'LoCoMo SOTA 86.3%' in the Research reference row — one of these is wrong and the leaderboard is the most visible part of the README. (3) The algorithm naming mismatch is unresolved: the package installed is graphology-communities-louvain (Louvain algorithm) but every doc reference — PR description, ROADMAP.md, ARCHITECTURE.md, PIPELINE.md — says 'Leiden algorithm'. These are distinct algorithms with different resolution-limit behavior and different quality guarantees; shipping under the wrong name will mislead contributors and users comparing notes with papers. The incremental diff also introduces a minor count error: INDEX.md now states '21 stories across 5 phases' but DP-1 through DP-20 is 20 stories.

Confidence: 5.9/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 3
  • Injection risk detection confidence: 3
  • Attack-surface risk assessment confidence: 3
  • Future hardening guidance confidence: 6
  • Scope alignment confidence: 8
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 5
  • Pattern correctness confidence: 6
  • Documentation coverage confidence: 8
{
  "style_maintainability": 7,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 3,
  "injection_risk_detection": 3,
  "attack_surface_risk_assessment": 3,
  "future_hardening_guidance": 6,
  "scope_alignment": 8,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 5,
  "pattern_correctness": 6,
  "documentation_coverage": 8
}

@NicholaiVogel NicholaiVogel force-pushed the nicholai/dp6-traversal-primary branch from 8a481c4 to 2430c74 Compare March 22, 2026 21:56
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 2430c74c); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three issues flagged in prior review (SHA 8a481c4) are now resolved: the ROADMAP.md placeholder text is gone, Kumiho scores are now stated consistently (97.5% adv / 0.565 F1) across README and research docs, and the algorithm naming has been corrected to 'Louvain' in ROADMAP.md and ARCHITECTURE.md. One new factual error has been introduced in this incremental diff: both README.md and BENCHMARKS.md rank Signet (87.5%) at position 6 while SLM V3 Mode C (87.7%) sits at position 7 — 87.7 > 87.5, so the rows are in the wrong order. A secondary concern is that the top-of-page headline claim '87.5% LoCoMo accuracy' is derived from an 8-question sample while the 50-question run yields 62%; BENCHMARKS.md carries the caveat but the README does not surface it near the claim.

Confidence: 5.9/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 3
  • Injection risk detection confidence: 3
  • Attack-surface risk assessment confidence: 3
  • Future hardening guidance confidence: 5
  • Scope alignment confidence: 8
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 5
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 8
{
  "style_maintainability": 7,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 3,
  "injection_risk_detection": 3,
  "attack_surface_risk_assessment": 3,
  "future_hardening_guidance": 5,
  "scope_alignment": 8,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 5,
  "pattern_correctness": 7,
  "documentation_coverage": 8
}

| 🗄️ Open storage | All memory lives in SQLite and plain markdown — inspect, migrate, or script it yourself |
| ⚖️ Apache 2.0 | Fully open source — use it, fork it, build on it |
| 📦 SDK | Typed client, React hooks, and Vercel AI SDK middleware for building on top of Signet |
| 🤝 Works with any harness | Claude Code, OpenClaw, OpenCode, Codex — bring what you use, Signet runs underneath |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rank ordering error: Signet (87.5%) is placed at rank 6 above SLM V3 Mode C (87.7%) at rank 7, but 87.7% > 87.5%. These two rows should be swapped, making SLM V3 Mode C rank 6 and Signet rank 7. Presenting Signet above a system with a higher score in a public-facing leaderboard is a factual error that will be caught by readers comparing against the source papers.

| — | Signet (baseline local) | 36% | Judge (GPT-4o) | Yes (Apache) | Yes | Internal |
| — | Signet (baseline cloud) | 34% | Judge (GPT-4o) | Yes (Apache) | No | Internal |

### Key context
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same rank ordering error as README.md: Signet (87.5%) is listed at rank 6 above SLM V3 Mode C (87.7%) at rank 7. The rows need to be swapped.


</div>

---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headline '87.5% LoCoMo accuracy' is from an 8-question sample (run-full-stack-8). The 50-question run gives 62%, which is documented in BENCHMARKS.md but not surfaced anywhere near this claim. Users who see only the README headline will form an inaccurate picture of retrieval performance at scale. Consider appending '(8-question sample — see Benchmarks)' to the headline, or leading with the 50-question number and noting the full-stack result separately.

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit cb7eec8a); taking another look at the feature work and I will report back shortly.

@NicholaiVogel
Copy link
Contributor Author

good catch. fixed in cb7eec8a.

changes:

  • removed the outdated 62% / 50-Q Signet references from the public benchmark docs
  • corrected the leaderboard ordering in both README.md and docs/BENCHMARKS.md so 87.7% ranks above 87.5%
  • clarified the README headline and benchmark summary as an 8-question full-stack sample

no retrieval behavior changed here, this was factual benchmark/docs cleanup only.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incremental diff resolves all three issues from the prior review (SHA 2430c74): the rank ordering in README.md and BENCHMARKS.md is now correct (SLM V3 Mode C at rank 6, Signet at rank 7), and the 8-question sample caveat is surfaced in the headline. However, a new factual error was introduced: in docs/research/market/memory-benchmark-comparison.md, Signet's row was updated from 62% → 87.5% in place (row 12), but the row was never moved to match the new score. Signet now appears at rank 12 sandwiched between Mem0+Graph (68.4%) at rank 11 and SLM Zero-LLM (60.4%) at rank 13 — a system scoring 87.5% should sit at rank 7 alongside the README and BENCHMARKS.md tables, not rank 12. The research doc's own leaderboard is inconsistent with every other leaderboard in the PR.

Confidence: 6.0/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 3
  • Injection risk detection confidence: 3
  • Attack-surface risk assessment confidence: 3
  • Future hardening guidance confidence: 5
  • Scope alignment confidence: 8
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 6
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 8
{
  "style_maintainability": 7,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 3,
  "injection_risk_detection": 3,
  "attack_surface_risk_assessment": 3,
  "future_hardening_guidance": 5,
  "scope_alignment": 8,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 6,
  "pattern_correctness": 7,
  "documentation_coverage": 8
}

| 9 | Engram | 80.0% | Judge | Gemini 2.0 Flash | Yes | engram.fyi/research |
| 10 | SLM V3 Mode A | 74.8% | Judge | **None** | Yes (MIT) | arXiv:2603.14588 |
| 11 | Mem0+Graph | 68.4% | J-score (disputed) | GPT-4o | Partial | arXiv:2504.19413 |
| 12 | **Signet (full stack)** | **87.5%** | **Judge** | **Local + GPT-4o judge** | **Yes** | **Internal (8-Q sample)** |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signet is now at row 12 with a score of 87.5%, but that score ranks it at position 7 — not 12. The row was updated in-place from the old 62% entry without being repositioned. This leaves Signet below systems scoring 60–74%, which directly contradicts the README and BENCHMARKS.md leaderboards. The row should be moved to sit between Hindsight (89.6%) and SLM V3 Mode C (87.7%), consistent with ranks 6 and 7 in the other tables. The old rank-12 slot could be left empty or used for a Signet baseline row (36% / 34%) matching the pattern in BENCHMARKS.md.

@NicholaiVogel
Copy link
Contributor Author

small follow-up, pushed in 0b1892f6.

I restored the top README headline copy for marketing reasons, but left the factual cleanup in place:

  • the outdated 62% / 50-Q Signet references are still removed
  • the leaderboard ordering is still corrected so 87.7% ranks above 87.5%
  • the benchmark section/table still labels the current Signet result as an 8-question full-stack sample

so the only change here was the top-line marketing copy, not the underlying benchmark docs or retrieval behavior.

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 0b1892f6); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incremental diff (cb7eec80b1892f) makes exactly one change: it removes the '8-question full-stack sample' qualifier from the README headline. This reintroduces the issue that was explicitly fixed in the prior round — the headline now claims '87.5% LoCoMo accuracy' with no caveat, while the 50-question baseline runs produce 36%/34%. The prior review accepted SHA cb7eec8 specifically because it added that qualifier. Removing it is a direct regression.

One issue from SHA cb7eec8 also remains unresolved: in docs/research/market/memory-benchmark-comparison.md the Signet row still sits at rank 12 (between Mem0+Graph at 68.4% and SLM Zero-LLM at 60.4%) despite showing 87.5% — inconsistent with README.md and BENCHMARKS.md where Signet correctly appears at rank 7.

Confidence: 5.8/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 3
  • Injection risk detection confidence: 3
  • Attack-surface risk assessment confidence: 3
  • Future hardening guidance confidence: 5
  • Scope alignment confidence: 8
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 6
  • Functional completeness confidence: 5
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 8
{
  "style_maintainability": 7,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 3,
  "injection_risk_detection": 3,
  "attack_surface_risk_assessment": 3,
  "future_hardening_guidance": 5,
  "scope_alignment": 8,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 6,
  "functional_completeness": 5,
  "pattern_correctness": 7,
  "documentation_coverage": 8
}

---

**Your agent is an investment. Signet is where its value accumulates.**
**87.5% LoCoMo accuracy. 100% retrieval. Zero LLM calls at search time. [See how.](#locomo-benchmark)**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression from prior review (SHA cb7eec8). The qualifier 'on an 8-question full-stack sample' was present in that SHA and was specifically what resolved the prior reviewer concern. Removing it leaves the top-of-page claim '87.5% LoCoMo accuracy' without any caveat, while the 50-question runs yield 36% (local) / 34% (cloud). A reader comparing this headline against the leaderboard has no indication the score is from 8 questions. Restore the qualifier or add an equivalent footnote anchor near the claim.

@@ -0,0 +1,479 @@
---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from SHA cb7eec8 review. Signet's row in this leaderboard table is ranked 12 (between Mem0+Graph at 68.4% and SLM Zero-LLM at 60.4%) while showing 87.5%. Every other leaderboard in this PR correctly places Signet at rank 7. The row was updated in score but never moved to its correct rank position. Move the Signet row to rank 7 to match README.md and BENCHMARKS.md.

Transcript write side: Claude Code connector forwards transcriptPath,
OpenCode plugin fetches session messages and builds transcript with
collapsed newlines. Hooks accept inline transcript fallback.

Transcript read side: MCP memory_search exposes expand param, SDK
recall accepts expand option, GET /api/memory/search alias forwards
expand from query string. Sources map flows through to callers.

Decision protection: txPersistStructured checks isDecisionContent
instead of hardcoding kind='attribute', promoting decisions to
constraints on the structured payload path.

Test: inline transcript normalization coverage (19/19 pass).
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 18e1bc92); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incremental diff (0b1892f18e1bc9) adds inline transcript support, the expand flag for lossless transcript recall, and decision-language promotion in txPersistStructured. The new code is clean and the test coverage for the inline transcript path is appropriate. Two documentation issues from prior rounds remain unresolved: (1) the README headline still claims '87.5% LoCoMo accuracy' with no sample-size caveat (this caveat was explicitly added then removed in the prior round); (2) docs/research/market/memory-benchmark-comparison.md still ranks Signet at position 12 between 68.4% and 60.4% systems despite the score being updated to 87.5%. One new logic issue is introduced: isDecisionContent is evaluated against the full memory content and then applied uniformly to ALL attributes extracted from that memory — if a single decision phrase appears anywhere in the content, every attribute in the memory is promoted to kind='constraint' with importance=0.85, regardless of whether each attribute describes the decision itself. A second minor issue is the newline collapse in the OpenCode transcript builder which may degrade extraction quality for code-heavy sessions.

Confidence: 6.9/10

  • Style consistency & maintainability: 8
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 6
  • Injection risk detection confidence: 6
  • Attack-surface risk assessment confidence: 6
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 8
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 8
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 7
  • Documentation coverage confidence: 6
{
  "style_maintainability": 8,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 6,
  "injection_risk_detection": 6,
  "attack_surface_risk_assessment": 6,
  "future_hardening_guidance": 7,
  "scope_alignment": 8,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 8,
  "functional_completeness": 7,
  "pattern_correctness": 7,
  "documentation_coverage": 6
}


let aspectsCreated = 0;
let attributesCreated = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic: coarse-grained decision promotion. isDecisionContent(input.content) evaluates the entire memory string and then assigns kind and baseImportance to ALL attributes extracted from that memory. If a single decision phrase appears anywhere in a multi-fact memory (e.g. "decided to use SQLite" buried in a longer session summary), every unrelated attribute — timestamps, preferences, entity labels — is also promoted to kind='constraint' with importance=0.85. This will inflate the constraint pool and degrade the precision of the resolution boost in dampening. Consider applying the detection per-attribute (check attr.content) rather than per-memory, or at minimum only promoting attributes whose content overlaps with the matched decision clause.

await client.post(
"/api/hooks/session-end",
{
harness: HARNESS,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quality: newline collapse destroys code structure. part.text?.trim().replace(/\s*\r?\n\s*/g, " ") converts all newlines to spaces. A message that contains a code block, diff, or structured list becomes a single run-on line. The extraction LLM depends on line boundaries to segment facts — collapsing them degrades extraction quality on code-heavy sessions. Consider preserving newlines within a single message part and only joining parts with a space or newline separator, rather than flattening all whitespace.

[Website](https://signetai.sh) · [Docs](https://signetai.sh/docs) · [Vision](VISION.md) · [Discussions](https://github.com/Signet-AI/signetai/discussions) · [Discord](https://discord.gg/pHa5scah9C) · [Contributing](docs/CONTRIBUTING.md) · [AI Policy](AI_POLICY.md)

</div>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from prior round (SHA cb7eec8, accepted; SHA 0b1892f, regressed). The headline 87.5% LoCoMo accuracy. 100% retrieval. Zero LLM calls at search time. no longer carries any sample-size qualifier. The prior reviewer explicitly accepted SHA cb7eec8 because it added (8-question sample) to this claim; SHA 0b1892f removed it and the current diff does not restore it. The 50-question baseline runs produce 36%/34%. The unqualified headline is misleading. Please restore the qualifier, e.g. 87.5% on 8-question LoCoMo sample.

| 4 | MemMachine v0.2 | 91.7% | Judge | GPT-4.1-mini | No | memmachine.ai blog |
| 5 | Hindsight | 89.6% | Judge | Cloud | Yes (MIT) | arXiv:2512.12818 |
| 6 | SLM V3 Mode C | 87.7% | Judge | Yes (synthesis) | Yes (MIT) | arXiv:2603.14588 |
| 7 | Zep/Graphiti | ~85% | Judge (third-party) | Cloud | Partial | DEV Community |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from prior round (SHA cb7eec8). The Signet row still appears at rank 12 in this table (between Mem0+Graph at 68.4% and SLM Zero-LLM at 60.4%) despite the score being updated to 87.5%. A system at 87.5% belongs at rank 7, consistent with README.md and BENCHMARKS.md. The score was updated in place but the row was not moved. This leaderboard is inconsistent with every other leaderboard in the PR.

Legacy recall hook was using raw FTS, bypassing vector search and
dampening. Now calls hybridRecall like all other recall paths.
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit fc0095c0); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff (18e1bc9fc0095c) replaces the synchronous handleRecall wrapper with a direct async call to hybridRecall in /api/hooks/recall. The change is small and directionally correct (eliminating the indirection layer), but introduces a potential API contract break if hybridRecall's return shape differs from what handleRecall returned — the response is passed directly to c.json(result) with no normalization. Three issues from prior rounds remain unresolved: (1) the README headline still claims '87.5% LoCoMo accuracy' with no sample-size caveat (this was explicitly added then removed in round 0b1892f); (2) docs/research/market/memory-benchmark-comparison.md still ranks Signet at row 12 between 68.4% and 60.4% systems despite the score being updated to 87.5%; (3) isDecisionContent is evaluated against full memory content and promotes ALL attributes in that memory to kind='constraint', not just the attributes describing the decision clause.

Confidence: 6.4/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 8
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 5
  • Injection risk detection confidence: 5
  • Attack-surface risk assessment confidence: 5
  • Future hardening guidance confidence: 6
  • Scope alignment confidence: 7
  • Existing functionality awareness: 8
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 6
  • Pattern correctness confidence: 6
  • Documentation coverage confidence: 7
{
  "style_maintainability": 7,
  "repo_convention_adherence": 8,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 5,
  "injection_risk_detection": 5,
  "attack_surface_risk_assessment": 5,
  "future_hardening_guidance": 6,
  "scope_alignment": 7,
  "duplication_awareness": 8,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 6,
  "pattern_correctness": 6,
  "documentation_coverage": 7
}

const agentId = c.req.header("x-signet-agent-id") ?? "default";
const result = await hybridRecall(
{ query: body.query, limit: body.limit, scope: body.project, agentId },
cfg,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return-shape compatibility is unverified. handleRecall previously returned a value consumed directly by c.json(result). hybridRecall is now substituted with no normalization step. If hybridRecall returns { results, total } (or any shape other than the { memories, count } shape that callers of this endpoint expect), this silently breaks all hook-based recall consumers. Before merging, confirm hybridRecall's return type matches the prior handleRecall response shape, or add an explicit projection.

@@ -5448,7 +5675,12 @@ app.post("/api/hooks/recall", async (c) => {
return c.json({ memories: [], count: 0, bypassed: true });
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior: agentId now defaults to "default" when the x-signet-agent-id header is absent. The prior handleRecall codepath did not apply this scoping. If existing callers (connectors, tests) omit the header, they will now silently get a scope-filtered result set instead of the previous unscoped result — a behavioral regression for any caller that doesn't set the header. Document the requirement or add a deprecation-safe fallback.


const result = handleRecall(body);
const agentId = c.req.header("x-signet-agent-id") ?? "default";
const result = await hybridRecall(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

body.limit is passed directly without a default or bounds check. If the request omits limit, the value is undefined. Verify that hybridRecall handles undefined limit safely (e.g., defaults to a reasonable cap) rather than fetching an unbounded result set.

[Website](https://signetai.sh) · [Docs](https://signetai.sh/docs) · [Vision](VISION.md) · [Discussions](https://github.com/Signet-AI/signetai/discussions) · [Discord](https://discord.gg/pHa5scah9C) · [Contributing](docs/CONTRIBUTING.md) · [AI Policy](AI_POLICY.md)

</div>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from prior round (0b1892f). The headline '87.5% LoCoMo accuracy' has no sample-size caveat. This qualifier was explicitly added in SHA cb7eec8, then removed in SHA 0b1892f, and the prior review (SHA 18e1bc9) flagged it as unresolved. The 50-question baseline runs produce 36%/34%. The caveat must be re-added near the headline claim — something like '(8-question sample; see Benchmarks)' — or the claim is materially misleading.

comparable. Independent reproductions of Mem0's numbers put them closer
to ~58%, significantly below the self-reported 66.9%.

---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from SHA cb7eec8. The Signet row (87.5%) sits at rank 12 between Mem0+Graph (68.4%, rank 11) and SLM Zero-LLM (60.4%, rank 13). A system scoring 87.5% should appear at rank 7, consistent with README.md and BENCHMARKS.md. The score was updated but the row was never moved.

@@ -0,0 +1,416 @@
/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved from SHA 18e1bc9. isDecisionContent is evaluated against the full memory content string, and when it returns true, ALL attributes extracted from that memory are promoted to kind='constraint' with importance=0.85. A memory that says 'we switched to SQLite for simplicity; the project also uses TypeScript and runs on Linux' would promote 'uses TypeScript' and 'runs on Linux' to constraints alongside the actual decision. The promotion should be scoped to only the clause(s) that match the decision patterns, not the entire attribute list.

MCP memory_store accepts transcript field. /api/hooks/remember
forwards to full remember endpoint for transcript + pipeline support.
Session-end writes transcript to session_transcripts immediately,
before the 500-char gate and summary job, so short sessions and
extraction failures never lose raw data.
@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit eb069f21); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental diff (fc0095ceb069f2) introduces three changes: (1) /api/hooks/remember now forwards to /api/memory/remember via a self-loop HTTP fetch, (2) session transcripts are written eagerly in handleSessionEnd, (3) MCP memory_store accepts an optional transcript field. The self-loop fetch does not forward auth headers, which will silently break in any deployment with token auth enabled on /api/memory/remember. The empty catch {} in handleSessionEnd removes observability for write failures. Three issues from prior rounds remain unresolved: README headline missing sample-size caveat (introduced then removed in round 0b1892f), Signet mis-ranked at position 12 in memory-benchmark-comparison.md, and isDecisionContent promoting all attributes in a memory to kind='constraint' on any match.

Confidence: 7.0/10

  • Style consistency & maintainability: 7
  • Repository conventions adherence: 7
  • Merge conflict detection confidence: 6
  • Security vulnerability detection confidence: 8
  • Injection risk detection confidence: 7
  • Attack-surface risk assessment confidence: 7
  • Future hardening guidance confidence: 7
  • Scope alignment confidence: 7
  • Existing functionality awareness: 7
  • Existing tooling/pattern leverage: 7
  • Functional completeness confidence: 7
  • Pattern correctness confidence: 8
  • Documentation coverage confidence: 6
{
  "style_maintainability": 7,
  "repo_convention_adherence": 7,
  "merge_conflict_detection": 6,
  "security_vulnerability_detection": 8,
  "injection_risk_detection": 7,
  "attack_surface_risk_assessment": 7,
  "future_hardening_guidance": 7,
  "scope_alignment": 7,
  "duplication_awareness": 7,
  "tooling_pattern_leverage": 7,
  "functional_completeness": 7,
  "pattern_correctness": 8,
  "documentation_coverage": 6
}

return fetch(`http://${INTERNAL_SELF_HOST}:${PORT}/api/memory/remember`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auth bypass in self-loop fetch. The forwarded request to /api/memory/remember only sets Content-Type: application/json — it does not forward the original request's Authorization header or any token cookies. In any deployment with the auth middleware enabled on /api/memory/remember, this call will receive a 401/403. The response is returned verbatim to the hooks caller, making the failure silent from the hooks caller's perspective (it gets whatever error body the auth middleware returns rather than a hook-specific error). Either propagate the caller's auth headers:

const authHeader = c.req.header("Authorization");
const headers: Record<string, string> = { "Content-Type": "application/json" };
if (authHeader) headers["Authorization"] = authHeader;

or, since this is a same-process self-call, invoke the underlying handler function directly instead of going over the network.

try {
const now = new Date().toISOString();
getDbAccessor().withWriteTx((db) => {
db.prepare(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent catch removes observability. catch {} with no body swallows every failure — DB lock contention, migration-lag errors, statement errors — without any trace in the logs. The rest of the codebase uses logger.error(...) in equivalent non-fatal paths. Change to:

} catch (e) {
  logger.warn("session-end", "transcript write failed (non-fatal)", e as Error);
}

This preserves the non-fatal intent while keeping the failure visible.

type: z.string().optional().describe("Memory type (fact, preference, decision, etc.)"),
importance: z.number().optional().describe("Importance score 0-1"),
tags: z.string().optional().describe("Comma-separated tags for categorization"),
transcript: z
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No size constraint on transcript field. The field is z.string().optional() with no max() limit. An agent or misconfigured caller could pass a multi-megabyte string that gets stored verbatim in the DB. The content field (and other string inputs in the MCP tool set) should serve as a reference point. Add .max(500_000) or whatever ceiling aligns with session_transcripts expectations to bound attack surface.

@@ -5418,8 +5644,13 @@ app.post("/api/hooks/remember", async (c) => {
return c.json({ success: true, memories: [], bypassed: true });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior round unresolved — isDecisionContent over-promotion (context from SHA 18e1bc9). Not touched in this diff but still present: isDecisionContent is evaluated against the full memory content string. When any decision phrase matches, every attribute extracted from that memory is promoted to kind='constraint' with importance=0.85. A memory like "we decided on SQLite; also, the font size is 14px" will promote the font-size attribute to a constraint. This is a logic correctness issue, not a style issue.

[Website](https://signetai.sh) · [Docs](https://signetai.sh/docs) · [Vision](VISION.md) · [Discussions](https://github.com/Signet-AI/signetai/discussions) · [Discord](https://discord.gg/pHa5scah9C) · [Contributing](docs/CONTRIBUTING.md) · [AI Policy](AI_POLICY.md)

</div>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior round unresolved (regression from SHA cb7eec8). The headline 87.5% LoCoMo accuracy has no sample-size qualifier. This caveat was explicitly added in round cb7eec8 and then removed in 0b1892f. The 50-question baseline runs produce 36%/34%. The claim is technically accurate but misleading without disclosing it is an 8-question sample. The prior review accepted cb7eec8 specifically because of the qualifier.

| Rank | System | Score | Metric | LLM Required | Open Source | Source |
|------|--------|-------|--------|--------------|-------------|--------|
| 1 | Kumiho | 0.565 F1 / 97.5% adv | F1 (official) | GPT-4o | SDK open | arXiv:2603.17244 |
| 2 | EverMemOS | 93.05% | Judge (self-reported) | Cloud | No | evermind.ai blog |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior round unresolved. Signet's row in the LoCoMo leaderboard still sits at rank 12, sandwiched between Mem0+Graph (68.4%, rank 11) and SLM Zero-LLM (60.4%, rank 13), despite the score now reading 87.5%. The README and BENCHMARKS.md correctly place Signet at rank 7. This table needs the Signet row moved to rank 7 to match.

@NicholaiVogel
Copy link
Contributor Author

addressed the three review items that were actually worth fixing in 8a5d6fff:

  • forwarded the incoming Authorization header in the /api/hooks/remember self-loop to /api/memory/remember
  • restored observability on eager transcript writes in handleSessionEnd with a non-fatal warning instead of a silent catch
  • corrected the Signet ranking in docs/research/market/memory-benchmark-comparison.md

not taking the remaining notes on this PR:

  • the top README benchmark headline is intentional marketing copy. the benchmark section and supporting docs still identify the Signet result as an 8-question sample where that detail matters
  • the isDecisionContent note is asking for clause-level attribution, which is a broader semantic change than we want to make in this branch. the current structured path stays as-is for this PR
  • collapsing embedded newlines in the OpenCode transcript builder was intentional. it prevents unlabeled continuation lines in the plain-text transcript format we hand to session-end
  • the /api/hooks/recall contract-break concern is a false positive. routing that path through hybridRecall is deliberate parity work, not a response-shape regression
  • MCP transcript size ceilings are reasonable hardening, but not merge-blocking for this branch

at this point, no further behavioral changes are planned here unless something shows up that is actually correctness- or auth-breaking.

@NicholaiVogel
Copy link
Contributor Author

Hi @NicholaiVogel - quick follow-up pass on feat: desire paths retrieval + prospective indexing (commit 8a5d6fff); taking another look at the feature work and I will report back shortly.

Copy link
Contributor Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harness output could not be parsed as structured JSON.

{
  "summary": "This incremental diff (eb069f21 → 8a5d6fff) resolves three of the five outstanding issues from prior rounds: (1) auth header is now forwarded in the /api/hooks/remember self-loop, (2) the empty catch in handleSessionEnd is replaced with a logged warning restoring observability, and (3) Signet is correctly moved to rank 7 in memory-benchmark-comparison.md. Two issues from prior rounds remain unresolved: the README headline still claims '87.5% LoCoMo accuracy' with no sample-size caveat (this qualifier was added in round cb7eec8a then silently removed in round 0b1892f6), and isDecisionContent still promotes ALL attributes in a memory to kind='constraint' when any decision phrase appears anywhere in the content. No new issues are introduced by this diff.",
  "verdict": "comment",
  "confidence": {
    "style_maintainability": 8,
    "repo_convention_adherence": 8,
    "merge_conflict_detection": 6,
    "security_vulnerability_detection": 7,
    "injection_risk_detection": 7,
    "attack_surface_risk_assessment": 6,
    "future_hardening_guidance": 7,
    "scope_alignment": 9,
    "duplication_awareness": 8,
    "tooling_pattern_leverage": 7,
    "functional_completeness": 7,
    "pattern_correctness": 8,
    "documentation_coverage": 8
  },
  "comments": [
    {
      "file": "README.md",
      "line": 20,
      "body": "UNRESOLVED from round 0b1892f6: the headline '87.5% LoCoMo accuracy' has no sample-size qualifier. The 50-question baseline runs produce 36%/34%; the 87.5% figure comes from an 8-question sample. The caveat was explicitly added in round cb7eec8a and accepted, then silently removed. It needs to be restored, e.g. '87.5% LoCoMo accuracy (8-question sample)'."
    },
    {
      "file": "packages/daemon/src/inline-entity-linker.ts",
      "line": 0,
      "body": "UNRESOLVED from round 18e1bc9: isDecisionContent is evaluated against the full memory content string and, when it matches, every attribute extracted from that memory is promoted to kind='constraint' with importance=0.85. A single incidental decision phrase anywhere in the content (e.g. in a quoted string or a background clause) silently elevates all attributes, including unrelated ones. The promotion should be scoped to the specific clause or attribute that matched the pattern, not applied uniformly to the entire attribute set."
    }
  ]
}

@NicholaiVogel NicholaiVogel merged commit 943a62a into main Mar 22, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant