feat: chunk lifecycle management — supersede, archive, search filtering by EtanHey · Pull Request #102 · EtanHey/brainlayer

EtanHey · 2026-03-26T06:42:43Z

Summary

Schema migration: Add superseded_by, aggregated_into, archived_at columns to chunks table (backwards-compatible ALTER TABLE)
Search filtering: Default search excludes lifecycle-managed chunks; include_archived=True shows full history. Applied to all 3 search paths (vector, LIKE, FTS5)
New MCP tools: brain_supersede(old_chunk_id, new_chunk_id) with safety gate for personal data, brain_archive(chunk_id) with optional reason
brain_store enhancement: Optional supersedes param for atomic store-and-supersede
32 new tests covering all lifecycle operations, safety checks, and personal content detection
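The backwards-compatible migration in the first item can be illustrated with a minimal sketch. The column names come from the summary above; the `TEXT` column types, the `migrate_lifecycle_columns` helper, and the two-column `chunks` table are assumptions made for illustration, not the PR's actual code.

```python
import sqlite3

# Column names come from the PR summary; the TEXT types are an assumption.
LIFECYCLE_COLUMNS = {
    "superseded_by": "TEXT",
    "aggregated_into": "TEXT",
    "archived_at": "TEXT",
}


def migrate_lifecycle_columns(conn: sqlite3.Connection) -> list[str]:
    """Add any missing lifecycle columns to `chunks`; return the ones added."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    added = []
    for name, sql_type in LIFECYCLE_COLUMNS.items():
        if name not in existing:
            # Nullable with no default, so existing rows need no backfill.
            conn.execute(f"ALTER TABLE chunks ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Hypothetical pre-migration table with one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO chunks VALUES ('c1', 'pre-migration row')")

first_run = migrate_lifecycle_columns(conn)
second_run = migrate_lifecycle_columns(conn)  # idempotent: nothing left to add
row = conn.execute(
    "SELECT superseded_by, aggregated_into, archived_at FROM chunks"
).fetchone()
```

Checking `PRAGMA table_info` before each `ALTER TABLE` keeps the migration safe to run on every init, and leaving the new columns nullable means old rows read back as `NULL` in all three lifecycle fields.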

Architecture decisions

Additive schema change — no breaking changes to existing search or data
Safety param on supersede: safety_check="auto" for technical facts, "confirm" for personal data (journals, notes, health/finance content)
Superseded/archived chunks drop from vector index (consistent with existing archive_chunk behavior)
Reuses existing retry pattern from _brain_update for DB lock resilience
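The retry pattern mentioned in the last item is not shown in this PR page. As a rough sketch of what a retry-on-DB-lock wrapper can look like, assuming SQLite surfaces contention as `sqlite3.OperationalError` with "locked" or "busy" in the message: the `with_lock_retry` name, the retry count, and the backoff values are all hypothetical.

```python
import sqlite3
import time


def with_lock_retry(operation, retries=3, base_delay=0.05):
    """Run `operation`, retrying when SQLite reports a locked or busy database."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            transient = "locked" in message or "busy" in message
            if not transient or attempt == retries:
                raise  # non-lock errors and exhausted retries propagate
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff


# Simulated write that fails twice with a lock error, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


result = with_lock_retry(flaky_write, base_delay=0.001)
```

Only lock-style errors are retried; anything else is raised immediately so that genuine schema or SQL mistakes are not masked by the retry loop.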

Test plan

32 unit tests pass (pytest tests/test_chunk_lifecycle.py)
Existing test suites unaffected (test_brainstore, test_search_quality, test_deferred_embedding, test_brain_tags — all green)
Lint clean (ruff check passes)
Verify schema migration on production DB (lifecycle columns nullable, no data migration needed)

🤖 Generated with Claude Code

Note

Add chunk lifecycle management with supersede, archive, and search filtering to brain MCP tools

Adds supersede_chunk and archive_chunk methods to VectorStore, storing lifecycle state in new superseded_by, aggregated_into, and archived_at columns; existing DBs are migrated on init.
Adds brain_supersede and brain_archive MCP tools in store_handler.py and wires them in __init__.py; brain_store gains an optional supersedes field to supersede a prior chunk in the same operation.
Updates SearchMixin.search to exclude superseded, aggregated, and archived chunks by default; pass include_archived=True to include them.
brain_supersede requires explicit confirmation when the chunk contains personal content, returning a confirm_required response before proceeding.
Behavioral Change: default search results no longer include superseded or archived chunks.
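The behavioral change can be sketched against a plain SQLite table. This covers only a LIKE-style path; the vector and FTS5 paths named in the summary are not shown. The `like_search` function, the `LIFECYCLE_FILTER` constant, and the sample rows are hypothetical stand-ins, not the code in `search_repo.py`.

```python
import sqlite3

# Predicate that keeps only chunks with no lifecycle state set.
LIFECYCLE_FILTER = (
    "superseded_by IS NULL AND aggregated_into IS NULL AND archived_at IS NULL"
)


def like_search(conn, term, include_archived=False):
    """Return matching chunk ids, hiding lifecycle-managed chunks by default."""
    sql = "SELECT id FROM chunks WHERE content LIKE ?"
    if not include_archived:
        sql += f" AND {LIFECYCLE_FILTER}"
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT, "
    "superseded_by TEXT, aggregated_into TEXT, archived_at TEXT)"
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "deploy notes v1", "b", None, None),   # superseded by b
        ("b", "deploy notes v2", None, None, None),  # live
        ("c", "deploy notes draft", None, None, "2026-03-26T06:42:43Z"),  # archived
    ],
)

default_hits = like_search(conn, "deploy")                         # ['b']
full_history = like_search(conn, "deploy", include_archived=True)  # ['a', 'b', 'c']
```

Callers that relied on seeing every stored chunk would need to pass `include_archived=True` after this change.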

📊 Macroscope summarized 81fb647. 5 files reviewed, 4 issues evaluated, 1 issue filtered, 1 comment posted

🗂️ Filtered Issues

src/brainlayer/mcp/__init__.py — 0 comments posted, 1 evaluated, 1 filtered

line 85: The instructions string claims "8 tools" at line 85, but the instructions text now lists 9 tools after adding brain_supersede and brain_archive. The count should be updated from "8 tools" to "9 tools" (or higher if brain_get_person and brain_tags should also be included, as they appear in list_tools() but are not mentioned in the instructions). [ Out of scope ]

Summary by CodeRabbit

New Features
- Added ability to supersede (replace) memory chunks with new ones.
- Added ability to archive (soft-delete) memory chunks with optional reason documentation.
- Enhanced search functionality with option to include archived chunks in results.
Tests
- Added comprehensive test suite for chunk lifecycle management operations.

Add lifecycle columns (superseded_by, aggregated_into, archived_at) to chunks table with backwards-compatible migration. Default search now excludes superseded/aggregated/archived chunks; include_archived=True shows history.

New MCP tools: brain_supersede (with safety gate for personal data) and brain_archive (soft-delete with timestamp). brain_store gains optional supersedes param for atomic store-and-supersede.

32 new tests covering schema, VectorStore ops, search filtering, MCP handlers, safety checks, and personal content detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-26T06:42:55Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between e38313d and 81fb647.

📒 Files selected for processing (5)

src/brainlayer/mcp/__init__.py
src/brainlayer/mcp/store_handler.py
src/brainlayer/search_repo.py
src/brainlayer/vector_store.py
tests/test_chunk_lifecycle.py

Walkthrough

This change introduces comprehensive chunk lifecycle management to brainlayer, adding database schema columns (superseded_by, aggregated_into, archived_at) and corresponding MCP tools (brain_supersede, brain_archive). Vector store methods track chunk state transitions, search filtering excludes archived/superseded chunks by default, and safety gates protect against unsafe superseding operations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| MCP Tool Registration & Handlers `src/brainlayer/mcp/__init__.py`, `src/brainlayer/mcp/store_handler.py` | Added two new write-only MCP tools (`brain_supersede`, `brain_archive`) with input schemas and handlers; extended `brain_store` with optional `supersedes` parameter. Introduced `_brain_supersede()` and `_brain_archive()` functions with retry-on-DB-lock, safety checks (personal content detection), and confirmation gating. |
| Vector Store Lifecycle Schema & Methods `src/brainlayer/vector_store.py` | Extended `chunks` table with lifecycle columns (`superseded_by`, `aggregated_into`, `archived_at`). Added `supersede_chunk(old_chunk_id, new_chunk_id)` method; updated `archive_chunk()` to set `archived_at` timestamp; updated `get_chunk()` return shape to include new lifecycle fields. |
| Search Filtering Logic `src/brainlayer/search_repo.py` | Added `include_archived` parameter to `search()` and `hybrid_search()` methods; when `False` (default), appends SQL/FTS5 predicates excluding `superseded_by IS NOT NULL`, `aggregated_into IS NOT NULL`, and `archived_at IS NOT NULL` from results. |
| Comprehensive Test Suite `tests/test_chunk_lifecycle.py` | Added 393-line test file validating chunk lifecycle across schema initialization, vector store operations, search behavior, and MCP handlers; includes tests for `_brain_supersede()` safety checks, `_brain_archive()` with optional reason, `_store_new()` with supersedes parameter, and personal content detection. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Handler as store_handler
    participant VectorStore as vector_store
    participant DB as Database
    
    User->>MCP: brain_supersede(old_chunk_id, new_chunk_id, safety_check, confirm)
    MCP->>Handler: _brain_supersede(old_id, new_id, safety_check, confirm)
    Handler->>Handler: _is_personal_content(check old chunk)
    alt safety_check=auto & personal
        Handler->>MCP: {confirm_required: true}
        MCP-->>User: Requires confirmation
    else safety_check=confirm & confirm=true OR non-personal
        Handler->>VectorStore: supersede_chunk(old_id, new_id)
        VectorStore->>DB: UPDATE chunks SET superseded_by=new_id
        VectorStore->>DB: DELETE FROM chunk_vectors (old chunk vectors)
        VectorStore->>DB: PRAGMA optimize
        VectorStore-->>Handler: true/false
        Handler->>MCP: {superseded: new_chunk_id}
        MCP-->>User: Success
    else confirm=false
        Handler->>MCP: {confirm_required: true}
        MCP-->>User: Confirmation required
    end
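The confirmation branches in the diagram above can be read as a small decision function. The sketch below is one interpretation under stated assumptions: the keyword list and the `is_personal_content` and `supersede_gate` names are hypothetical, and the real `_is_personal_content` heuristics are not visible on this page. The response shapes (`confirm_required`, `superseded`) follow the diagram.

```python
# Hypothetical keyword list; the real detection heuristics are not shown in the PR.
PERSONAL_KEYWORDS = ("journal", "health", "finance")


def is_personal_content(text: str) -> bool:
    """Crude stand-in for personal content detection."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in PERSONAL_KEYWORDS)


def supersede_gate(old_content, new_chunk_id, safety_check="auto", confirm=False):
    """Decide whether a supersede may proceed or needs explicit confirmation."""
    personal = is_personal_content(old_content)
    # Assumption: "confirm" mode always asks; "auto" asks only for personal content.
    needs_confirmation = personal or safety_check == "confirm"
    if needs_confirmation and not confirm:
        return {"confirm_required": True}
    # At this point the real handler would call supersede_chunk(old_id, new_id).
    return {"superseded": new_chunk_id}


technical = supersede_gate("Use Python 3.12 for builds", "n1")
personal_blocked = supersede_gate("Journal entry: felt tired today", "n1")
personal_confirmed = supersede_gate("Journal entry: felt tired today", "n1", confirm=True)
forced_confirm = supersede_gate("Use Python 3.12 for builds", "n1", safety_check="confirm")
```

Under these assumptions a technical fact is superseded immediately in `auto` mode, while personal content returns `confirm_required` until the caller repeats the request with `confirm=True`.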

sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Handler as store_handler
    participant VectorStore as vector_store
    participant DB as Database
    
    User->>MCP: brain_archive(chunk_id, reason?)
    MCP->>Handler: _brain_archive(chunk_id, reason)
    Handler->>VectorStore: archive_chunk(chunk_id)
    alt chunk exists
        VectorStore->>DB: UPDATE chunks SET value_type='ARCHIVED', archived_at=UTC_NOW()
        VectorStore->>DB: PRAGMA optimize
        VectorStore-->>Handler: true
        Handler->>MCP: {archived: chunk_id}
        MCP-->>User: Success
    else chunk not found
        Handler->>MCP: {error: ...}
        MCP-->>User: Error
    end
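The archive flow above can be sketched with plain SQLite. The `value_type='ARCHIVED'` and `archived_at` updates follow the diagram; the `chunk_vectors` table layout, the echoed `reason` field, and the error message text are assumptions, and `PRAGMA optimize` is left out for brevity.

```python
import sqlite3
from datetime import datetime, timezone


def archive_chunk(conn, chunk_id):
    """Soft-delete a chunk: mark it archived and drop its vector row."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE chunks SET value_type = 'ARCHIVED', archived_at = ? WHERE id = ?",
        (now, chunk_id),
    )
    if cur.rowcount == 0:
        return False  # chunk not found, nothing changed
    # Assumed layout: one row per chunk in a `chunk_vectors` table.
    conn.execute("DELETE FROM chunk_vectors WHERE chunk_id = ?", (chunk_id,))
    conn.commit()
    return True


def brain_archive(conn, chunk_id, reason=None):
    """Handler-level wrapper returning the response shapes from the diagram."""
    if not archive_chunk(conn, chunk_id):
        return {"error": f"chunk not found: {chunk_id}"}
    response = {"archived": chunk_id}
    if reason:
        response["reason"] = reason  # echoing the reason is an assumption
    return response


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT, "
    "value_type TEXT, archived_at TEXT)"
)
conn.execute("CREATE TABLE chunk_vectors (chunk_id TEXT, embedding BLOB)")
conn.execute("INSERT INTO chunks VALUES ('c1', 'old note', 'NOTE', NULL)")
conn.execute("INSERT INTO chunk_vectors VALUES ('c1', x'00')")

archived = brain_archive(conn, "c1", reason="outdated")
missing = brain_archive(conn, "does-not-exist")
```

The chunk row itself is kept, which is what lets `include_archived=True` searches still surface it later.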

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: diarization pipeline + diarized transcript re-indexing #34: Introduces MCP tool for archiving and modifies VectorStore.archive_chunk(), providing complementary chunk archival functionality that this PR builds upon.
refactor: split god modules into focused files with mixin pattern #54: Contains chunk lifecycle refactoring and modifications to the same files (vector_store.py, store_handler.py, MCP handlers), representing foundational work for lifecycle management features.
Harden BrainLayer search validation and backfill coverage #79: Modifies search_repo.hybrid_search() caching logic and _hybrid_cache_key(), which this PR extends with include_archived parameter and cache invalidation.

Poem

🐰 A rabbit hops through chunks so spry,
Old ones supersede, archive, and die,
The lifecycle blooms with archived_at,
Safety checks guard what's dear and that,
Search respects the deleted past,
Fresh memories forever last! 🌿✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/chunk-lifecycle-management

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

macroscopeapp · 2026-03-26T06:45:41Z

🟡 Medium

brainlayer/src/brainlayer/mcp/store_handler.py

Line 526 in 81fb647

_queue_store(

When the database is busy/locked, _store queues the memory via _queue_store but omits supersedes from the queued dict (lines 526–543). Once flushed by _flush_pending_stores, the supersede relationship is silently lost because that function calls store_memory without any supersede handling. The user is told the memory was queued, but the intended chunk replacement never happens.
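One way to address this finding is to carry `supersedes` in the queued payload and apply it at flush time. The sketch below uses an in-memory `FakeStore` and simplified `queue_store` and `flush_pending_stores` functions, all hypothetical stand-ins; the real `_queue_store` and `_flush_pending_stores` signatures are not shown on this page.

```python
from itertools import count

_ids = count(1)


class FakeStore:
    """In-memory stand-in for the real store; not BrainLayer's API."""

    def __init__(self):
        self.chunks = {}         # chunk_id -> content
        self.superseded_by = {}  # old_chunk_id -> new_chunk_id

    def store_memory(self, content):
        chunk_id = f"chunk-{next(_ids)}"
        self.chunks[chunk_id] = content
        return chunk_id

    def supersede_chunk(self, old_id, new_id):
        if old_id not in self.chunks:
            return False
        self.superseded_by[old_id] = new_id
        return True


pending = []  # stand-in for the pending-store queue


def queue_store(content, supersedes=None):
    # Keep the supersede target with the payload so it survives the queue.
    pending.append({"content": content, "supersedes": supersedes})


def flush_pending_stores(store):
    """Store each queued item, then apply its supersede relationship if any."""
    flushed = []
    while pending:
        item = pending.pop(0)
        new_id = store.store_memory(item["content"])
        if item.get("supersedes"):
            store.supersede_chunk(item["supersedes"], new_id)
        flushed.append(new_id)
    return flushed


store = FakeStore()
old_id = store.store_memory("API key rotates monthly")
queue_store("API key rotates weekly", supersedes=old_id)  # DB was busy
new_ids = flush_pending_stores(store)                     # DB available again
```

With the field preserved in the queue, the replacement the user asked for still happens after the deferred write.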

Evidence trail: src/brainlayer/mcp/store_handler.py lines 422-438 (shows `_store` function signature with `supersedes` parameter), lines 526-543 (shows `_queue_store` call without `supersedes` in the dictionary), lines 368-409 (shows `_flush_pending_stores` calling `store_memory` without `supersedes` parameter). Repository: https://github.com/EtanHey/brainlayer at REVIEWED_COMMIT.

* feat: session-scoped dedup coordination between BrainLayer hooks

  Add /tmp/brainlayer_session_{id}.json coordination file that SessionStart and UserPromptSubmit hooks read/write to avoid injecting the same chunk twice. SessionStart writes which chunk_ids it injected; UserPromptSubmit checks before re-injecting.

  Key changes:
  - hooks/dedup_coordination.py: shared module for atomic file I/O, chunk registration, handoff detection, and injected-ID tracking
  - hooks/brainlayer-session-start.py: now in repo (was only in ~/.claude/hooks), captures chunk IDs from FTS queries and writes coordination file
  - hooks/brainlayer-prompt-search.py: reads coordination file, skips already-injected chunks, early-exits on handoff prompts
  - 20 tests covering file I/O, dedup, handoff detection, graceful degradation

  Addresses R46 research: eliminates ~2,500 token duplicate on handoff prompts and prevents cross-hook chunk duplication across session lifetime.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix ruff format on store_handler.py and test_chunk_lifecycle.py

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update MCP tool count assertion from 9 to 11

  brain_supersede and brain_archive were added in PR #102.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review feedback on dedup coordination

  - Add fcntl.flock file locking to prevent read-modify-write races
  - Fix UnboundLocalError when mkstemp() fails (init tmp_path=None)
  - Validate JSON payload is dict before calling .get()
  - Tighten handoff keywords: bare agent names no longer suppress search
  - Proportional token estimates (only count new chunks, not dupes)
  - Add test for non-dict JSON and bare agent name non-trigger

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
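The coordination-file idea in the commit message above can be sketched as follows. This is a simplified illustration, not the contents of `hooks/dedup_coordination.py`: the `injected_chunk_ids` key and the three function names are hypothetical, and the `fcntl.flock` locking mentioned in the follow-up fix is omitted here.

```python
import json
import os
import tempfile


def read_injected(path):
    """Return the set of already-injected chunk ids, or an empty set on any problem."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return set()
    if not isinstance(data, dict):  # validate the payload before calling .get()
        return set()
    ids = data.get("injected_chunk_ids", [])
    if not isinstance(ids, list):
        return set()
    return {cid for cid in ids if isinstance(cid, str)}


def register_injected(path, chunk_ids):
    """Merge new ids into the file using a temp file plus atomic rename."""
    merged = sorted(read_injected(path) | set(chunk_ids))
    tmp_path = None  # initialised first so cleanup is safe if mkstemp() fails
    try:
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"injected_chunk_ids": merged}, fh)
        os.replace(tmp_path, path)
        tmp_path = None  # renamed successfully, nothing to clean up
    finally:
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)


def filter_new(path, candidate_ids):
    """Keep only the candidates that no earlier hook has injected."""
    seen = read_injected(path)
    return [cid for cid in candidate_ids if cid not in seen]


with tempfile.TemporaryDirectory() as tmp:
    coord = os.path.join(tmp, "brainlayer_session_demo.json")
    register_injected(coord, ["c1", "c2"])   # what a SessionStart hook might record
    fresh = filter_new(coord, ["c2", "c3"])  # what a UserPromptSubmit hook might check
```

Reading failures degrade to "nothing injected yet", so a missing or corrupt coordination file leads to a possible duplicate rather than a crashed hook.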

greptile-apps Bot reviewed Mar 26, 2026

EtanHey merged commit f1a7bdd into main Mar 26, 2026
1 of 6 checks passed

EtanHey deleted the feat/chunk-lifecycle-management branch March 26, 2026 06:43

macroscopeapp Bot reviewed Mar 26, 2026

EtanHey added a commit that referenced this pull request Mar 26, 2026

fix: update MCP tool count assertion from 9 to 11

4d7e8c1

brain_supersede and brain_archive were added in PR #102. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This was referenced Mar 26, 2026

feat: enrichment backfill script + compact instructions #97

Closed

docs: update README and CLAUDE.md for current state (March 26) #108

Merged
