feat(search): Layer 4 — provenance trust-tiering at recall rank time#126

Merged
cipher813 merged 1 commit into main from feat/layer4-provenance-trust-tiering
May 18, 2026
@cipher813
Owner

Context

Layer 4 of the 5-layer stored-injection defense plan (private/mnemon-injection-defense-layers-260518.md). Follows PR #124 (Layer 2, merged) and PR #125 (Layer 0, merged). Driver: memory #2362.

Problem

Auto-captured transcript memories (source_client ∈ HOOK_SOURCE_CLIENTS) could outrank deliberate user assertions in unprompted recall — the shape of the 2026-05-18 Desktop incident. The existing HOOK_SOURCE_CONFIDENCE_CEILING save-time cap only moves the 0.25-weighted confidence term (≤0.025 composite delta) — too weak alone.
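To see why the cap alone is weak: the confidence term contributes 0.25 × confidence to the composite, so any drop in confidence moves the composite by only a quarter of that drop. The sketch below assumes a hypothetical reduction from 1.0 to 0.9, which is the size of drop that would produce the 0.025 bound quoted above; the actual ceiling value is not stated in this PR.

```python
CONFIDENCE_WEIGHT = 0.25  # weight of the confidence term, per the description


def composite_delta(confidence_before: float, confidence_after: float) -> float:
    """Change in composite score caused only by a change in confidence."""
    return CONFIDENCE_WEIGHT * (confidence_after - confidence_before)


# Hypothetical values: a capture that would have saved at 1.0 is capped to 0.9.
delta = composite_delta(1.0, 0.9)  # a loss of about 0.025
```

A shift that small is easily outweighed by a modest relevance or recency advantage.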

Change

A flat PROVENANCE_DEMOTION_FACTOR (0.85) multiplier on the composite score for hook-sourced results — a hook capture needs ~18% more relevance+recency to tie an equal-relevance user memory. Rank-only: explicit memory_get(id) bypasses composite scoring, so direct lookups are unaffected (the plan's stated boundary). Stacks on top of the save-time confidence cap.
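The ~18% figure follows from the factor itself: a demoted score of s × 0.85 ties an undemoted score t only when s = t / 0.85 ≈ 1.176 t. A minimal sketch of the multiplier, assuming an already-computed base composite score and a placeholder set of hook clients (the real composite_score weights and the contents of HOOK_SOURCE_CLIENTS are not shown in this PR):

```python
from typing import Optional

PROVENANCE_DEMOTION_FACTOR = 0.85  # value stated in this PR

# Hypothetical contents; the real set lives in config and is not shown here.
HOOK_SOURCE_CLIENTS = frozenset({"example-hook"})


def demote_if_hook(base_score: float, source_client: Optional[str]) -> float:
    """Scale the composite score down for hook-sourced results only."""
    if source_client in HOOK_SOURCE_CLIENTS:
        return base_score * PROVENANCE_DEMOTION_FACTOR
    return base_score


user_score = demote_if_hook(0.60, "user")          # unchanged
hook_score = demote_if_hook(0.60, "example-hook")  # scaled by 0.85

# Base score a hook capture would need in order to tie the user memory.
break_even = user_score / PROVENANCE_DEMOTION_FACTOR
uplift = break_even / 0.60 - 1.0  # roughly 0.176, i.e. the "~18%" above
```

Because the factor multiplies the whole composite rather than one weighted term, its effect scales with the score instead of being bounded by a single term's weight.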

Provenance wasn't previously carried into search results, so this threads source_client through the pipeline:

  • store.SearchResult gains source_client; search_bm25 / search_vector SELECT + populate d.source_client
  • search.rrf_fuse preserves it across fusion; mmr_rerank carries it automatically (spreads all dataclass fields)
  • search.composite_score applies the demotion + carries source_client onto ScoredResult for observability
  • New config.PROVENANCE_DEMOTION_FACTOR, documented
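As an illustration of what "preserves it across fusion" means, here is a minimal reciprocal rank fusion sketch that keeps source_client on each fused result. The SearchResult below is a simplified stand-in, and both the k = 60 constant and the first-seen merge rule are assumptions, not details taken from the project's rrf_fuse.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    # Simplified stand-in; the real store.SearchResult has more fields.
    doc_id: int
    score: float
    source_client: Optional[str] = None


def rrf_fuse(rankings: list[list[SearchResult]], k: int = 60) -> list[SearchResult]:
    """Reciprocal rank fusion that keeps each document's source_client."""
    fused: dict[int, float] = {}
    first_seen: dict[int, SearchResult] = {}
    for ranking in rankings:
        for rank, result in enumerate(ranking, start=1):
            fused[result.doc_id] = fused.get(result.doc_id, 0.0) + 1.0 / (k + rank)
            first_seen.setdefault(result.doc_id, result)
    # replace() copies every field, so source_client survives fusion.
    merged = [replace(first_seen[d], score=s) for d, s in fused.items()]
    return sorted(merged, key=lambda r: r.score, reverse=True)


bm25 = [SearchResult(1, 9.1, "example-hook"), SearchResult(2, 7.4, "user")]
vector = [SearchResult(2, 0.93, "user"), SearchResult(1, 0.88, "example-hook")]
fused = rrf_fuse([bm25, vector])
```

Copying the whole dataclass with replace() rather than rebuilding it field by field is what lets a newly added field ride along without further changes, which matches the behavior the description attributes to mmr_rerank.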

No S3/MCP schema change: server.py builds its output dict explicitly and does not expose source_client (additive-only contract preserved).
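A sketch of why an explicitly built output dict keeps the new field private. The ScoredResult stand-in and the response key names are hypothetical; the real server.py payload is not shown in this PR.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ScoredResult:
    # Simplified stand-in for the real ScoredResult.
    doc_id: int
    content: str
    score: float
    source_client: Optional[str] = None


def to_response(result: ScoredResult) -> dict:
    """List the exposed keys by hand so new internal fields stay internal."""
    return {"id": result.doc_id, "content": result.content, "score": result.score}


result = ScoredResult(7, "note", 0.51, "example-hook")
payload = to_response(result)  # no source_client key
leaky = asdict(result)         # by contrast, this would include source_client
```

Serializing the whole dataclass would have changed the response shape as soon as the field was added; the hand-built dict does not.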

Tests

+6: exact-factor demotion; non-hook sources untouched; source_client on ScoredResult; ranking below equal user memory; survives RRF fusion; store-level save→search provenance threading. Full suite: 782 passed.

Note: one pre-existing unrelated ruff error in upgrade.py:711 left untouched (not in scope; CI runs pytest, not ruff). CHANGELOG/version bump deferred to the next batched chore: bump ritual PR.

Remaining plan layers (separate PRs)

  • Layer 1 — spotlighting/data envelope (Claude Code preamble first, MCP best-effort)
  • Layer 3 — dual-representation storage (conditional; only if Layer 0 proves insufficient)

🤖 Generated with Claude Code

Auto-captured transcript memories (source_client in HOOK_SOURCE_CLIENTS)
could outrank deliberate user assertions in unprompted recall — the
shape of the 2026-05-18 Desktop incident, where a hook-captured
fragment dominated. The existing HOOK_SOURCE_CONFIDENCE_CEILING save
cap only moves the 0.25-weighted confidence term (≤0.025 composite
delta) — too weak on its own.

Layer 4 of the 5-layer defense plan
(private/mnemon-injection-defense-layers-260518.md): a flat
PROVENANCE_DEMOTION_FACTOR (0.85) multiplier on the composite score
for hook-sourced results, so a hook capture needs ~18% more
relevance+recency to tie an equal-relevance user memory. Rank-only:
explicit memory_get(id) bypasses composite scoring, so direct lookups
are unaffected (the plan's stated boundary).

Provenance was not previously carried into search results, so this
threads source_client through:
- store.SearchResult gains source_client; search_bm25 / search_vector
  SELECT and populate d.source_client.
- search.rrf_fuse preserves it across fusion; mmr_rerank already
  spreads all dataclass fields so it carries automatically.
- search.composite_score applies the demotion and carries
  source_client onto ScoredResult (observability).
- New config.PROVENANCE_DEMOTION_FACTOR, documented as stacking on
  the save-time confidence cap.

No S3/MCP schema change — server.py builds its output dict explicitly
and does not expose source_client (additive-only contract preserved).

+6 tests (exact-factor demotion; non-hook sources untouched;
source_client carried into ScoredResult; ranking; survives RRF;
store-level provenance threading). Full suite 782 passed. Pre-existing
unrelated ruff error in upgrade.py:711 left untouched (not mine; CI
runs pytest). CHANGELOG/version bump deferred to next batched
chore: bump ritual PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit bc676f9 into main May 18, 2026
9 checks passed
@cipher813 cipher813 deleted the feat/layer4-provenance-trust-tiering branch May 18, 2026 19:33
cipher813 added a commit that referenced this pull request May 18, 2026
…njection defense (#128)

Batched release bump for the four post-rc17 security PRs (#124 bare
<system> defang, #125 Layer 0 capture-time rejection, #126 Layer 4
provenance trust-tiering, #127 Layer 1 spotlighting envelope), none
of which bumped individually per the deferred-to-batched-ritual
convention.

- pyproject.toml + src/mnemon/__init__.py: 0.6.0rc17 → 0.6.0rc18
- CHANGELOG.md: new [0.6.0rc18] Security section summarizing the
  five-layer plan's shipped layers + the deferred items

README PyPI badge is dynamic (shields.io/pypi/v) — no change. Suite
786 passed. Tag v0.6.0rc18 + GitHub Release + Fly redeploy are the
post-merge deploy steps (per ROADMAP pre-deploy ritual), not part of
this PR.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>