
feat: chunk lifecycle management — supersede, archive, search filtering #102

Merged
EtanHey merged 1 commit into main from
feat/chunk-lifecycle-management
Mar 26, 2026

Conversation

Owner

@EtanHey EtanHey commented Mar 26, 2026

Summary

  • Schema migration: Add superseded_by, aggregated_into, archived_at columns to chunks table (backwards-compatible ALTER TABLE)
  • Search filtering: Default search excludes lifecycle-managed chunks; include_archived=True shows full history. Applied to all 3 search paths (vector, LIKE, FTS5)
  • New MCP tools: brain_supersede(old_chunk_id, new_chunk_id) with safety gate for personal data, brain_archive(chunk_id) with optional reason
  • brain_store enhancement: Optional supersedes param for atomic store-and-supersede
  • 32 new tests covering all lifecycle operations, safety checks, and personal content detection
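The backwards-compatible migration in the first bullet could look roughly like the sketch below. It is not the PR's code: the `TEXT` column types, the `migrate_lifecycle_columns` helper name, and the two-column demo table are assumptions for illustration. Only the three column names and the use of `ALTER TABLE` come from the summary.

```python
import sqlite3

# Column names come from the PR summary; the TEXT types are an assumption.
LIFECYCLE_COLUMNS = {
    "superseded_by": "TEXT",
    "aggregated_into": "TEXT",
    "archived_at": "TEXT",
}


def migrate_lifecycle_columns(conn: sqlite3.Connection) -> list:
    """Add any missing lifecycle columns and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    added = []
    for name, sql_type in LIFECYCLE_COLUMNS.items():
        if name not in existing:
            # New columns are nullable, so existing rows need no data migration.
            conn.execute(f"ALTER TABLE chunks ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Demo on an in-memory database standing in for an existing store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO chunks VALUES ('c1', 'pre-existing row')")

first_run = migrate_lifecycle_columns(conn)
second_run = migrate_lifecycle_columns(conn)  # idempotent: nothing left to add
old_row = conn.execute(
    "SELECT superseded_by, aggregated_into, archived_at FROM chunks WHERE id = 'c1'"
).fetchone()
```

Checking `PRAGMA table_info` before each `ALTER TABLE` keeps the migration safe to run on every init, which matches the note that existing DBs are migrated on init.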

Architecture decisions

  • Additive schema change — no breaking changes to existing search or data
  • Safety param on supersede: safety_check="auto" for technical facts, "confirm" for personal data (journals, notes, health/finance content)
  • Superseded/archived chunks drop from vector index (consistent with existing archive_chunk behavior)
  • Reuses existing retry pattern from _brain_update for DB lock resilience
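The PR does not show the retry pattern it reuses from _brain_update, so the following is only a generic sketch of retrying on a locked SQLite database. The `with_db_retry` name, the attempt count, and the exponential backoff are all assumptions.

```python
import sqlite3
import time


def with_db_retry(operation, attempts=3, base_delay=0.05):
    """Run operation(), retrying with backoff while SQLite reports a lock."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            is_lock_error = "locked" in message or "busy" in message
            # Re-raise anything that is not a lock error, or the final failure.
            if not is_lock_error or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Demo: a hypothetical write that fails twice with a lock error, then succeeds.
calls = {"count": 0}


def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


result = with_db_retry(flaky_update, attempts=5, base_delay=0.001)
```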

Test plan

  • 32 unit tests pass (pytest tests/test_chunk_lifecycle.py)
  • Existing test suites unaffected (test_brainstore, test_search_quality, test_deferred_embedding, test_brain_tags — all green)
  • Lint clean (ruff check passes)
  • Verify schema migration on production DB (lifecycle columns nullable, no data migration needed)

🤖 Generated with Claude Code

Note

Add chunk lifecycle management with supersede, archive, and search filtering to brain MCP tools

  • Adds supersede_chunk and archive_chunk methods to VectorStore, storing lifecycle state in new superseded_by, aggregated_into, and archived_at columns; existing DBs are migrated on init.
  • Adds brain_supersede and brain_archive MCP tools in store_handler.py and wires them in __init__.py; brain_store gains an optional supersedes field to supersede a prior chunk in the same operation.
  • Updates SearchMixin.search to exclude superseded, aggregated, and archived chunks by default; pass include_archived=True to include them.
  • brain_supersede requires explicit confirmation when the chunk contains personal content, returning a confirm_required response before proceeding.
  • Behavioral Change: default search results no longer include superseded or archived chunks.
📊 Macroscope summarized 81fb647. 5 files reviewed, 4 issues evaluated, 1 issue filtered, 1 comment posted

🗂️ Filtered Issues

src/brainlayer/mcp/__init__.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 85: The instructions string claims "8 tools" at line 85, but the instructions text now lists 9 tools after adding brain_supersede and brain_archive. The count should be updated from "8 tools" to "9 tools" (or higher if brain_get_person and brain_tags should also be included, as they appear in list_tools() but are not mentioned in the instructions). [ Out of scope ]

Summary by CodeRabbit

  • New Features

    • Added ability to supersede (replace) memory chunks with new ones.
    • Added ability to archive (soft-delete) memory chunks with optional reason documentation.
    • Enhanced search functionality with option to include archived chunks in results.
  • Tests

    • Added comprehensive test suite for chunk lifecycle management operations.

Add lifecycle columns (superseded_by, aggregated_into, archived_at) to chunks
table with backwards-compatible migration. Default search now excludes
superseded/aggregated/archived chunks; include_archived=True shows history.

New MCP tools: brain_supersede (with safety gate for personal data) and
brain_archive (soft-delete with timestamp). brain_store gains optional
supersedes param for atomic store-and-supersede.

32 new tests covering schema, VectorStore ops, search filtering, MCP
handlers, safety checks, and personal content detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Mar 26, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e5931442-c623-446b-bbf7-004a1698b855

📥 Commits

Reviewing files that changed from the base of the PR and between e38313d and 81fb647.

📒 Files selected for processing (5)
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/mcp/store_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/vector_store.py
  • tests/test_chunk_lifecycle.py

📝 Walkthrough

This change introduces comprehensive chunk lifecycle management to brainlayer, adding database schema columns (superseded_by, aggregated_into, archived_at) and corresponding MCP tools (brain_supersede, brain_archive). Vector store methods track chunk state transitions, search filtering excludes archived/superseded chunks by default, and safety gates protect against unsafe superseding operations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| MCP Tool Registration & Handlers: src/brainlayer/mcp/__init__.py, src/brainlayer/mcp/store_handler.py | Added two new write-only MCP tools (brain_supersede, brain_archive) with input schemas and handlers; extended brain_store with optional supersedes parameter. Introduced _brain_supersede() and _brain_archive() functions with retry-on-DB-lock, safety checks (personal content detection), and confirmation gating. |
| Vector Store Lifecycle Schema & Methods: src/brainlayer/vector_store.py | Extended chunks table with lifecycle columns (superseded_by, aggregated_into, archived_at). Added supersede_chunk(old_chunk_id, new_chunk_id) method; updated archive_chunk() to set archived_at timestamp; updated get_chunk() return shape to include new lifecycle fields. |
| Search Filtering Logic: src/brainlayer/search_repo.py | Added include_archived parameter to search() and hybrid_search() methods; when False (default), appends SQL/FTS5 predicates that exclude rows where superseded_by, aggregated_into, or archived_at is not NULL. |
| Comprehensive Test Suite: tests/test_chunk_lifecycle.py | Added 393-line test file validating chunk lifecycle across schema initialization, vector store operations, search behavior, and MCP handlers; includes tests for _brain_supersede() safety checks, _brain_archive() with optional reason, _store_new() with supersedes parameter, and personal content detection. |
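The search filtering described for search_repo.py can be illustrated on the LIKE path with a small sketch. The `search_like` function and the demo rows are hypothetical; only the three column names and the `include_archived` flag come from the PR.

```python
import sqlite3

# A chunk is "live" only when all three lifecycle columns are NULL.
LIFECYCLE_FILTER = (
    "superseded_by IS NULL AND aggregated_into IS NULL AND archived_at IS NULL"
)


def search_like(conn, term, include_archived=False):
    """Return matching chunk ids, hiding lifecycle-managed chunks by default."""
    sql = "SELECT id FROM chunks WHERE content LIKE ?"
    if not include_archived:
        sql += " AND " + LIFECYCLE_FILTER
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT, "
    "superseded_by TEXT, aggregated_into TEXT, archived_at TEXT)"
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "sqlite tips", None, None, None),                       # live
        ("b", "sqlite tips, first draft", "a", None, None),           # superseded by a
        ("c", "sqlite notes", None, None, "2026-03-26T00:00:00+00:00"),  # archived
    ],
)

default_hits = search_like(conn, "sqlite")
full_history = search_like(conn, "sqlite", include_archived=True)
```

The same predicate would have to be appended on the vector and FTS5 paths as well, since the PR states the filter applies to all three.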

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Handler as store_handler
    participant VectorStore as vector_store
    participant DB as Database
    
    User->>MCP: brain_supersede(old_chunk_id, new_chunk_id, safety_check, confirm)
    MCP->>Handler: _brain_supersede(old_id, new_id, safety_check, confirm)
    Handler->>Handler: _is_personal_content(check old chunk)
    alt safety_check=auto & personal
        Handler->>MCP: {confirm_required: true}
        MCP-->>User: Requires confirmation
    else safety_check=confirm & confirm=true OR non-personal
        Handler->>VectorStore: supersede_chunk(old_id, new_id)
        VectorStore->>DB: UPDATE chunks SET superseded_by=new_id
        VectorStore->>DB: DELETE FROM chunk_vectors (old chunk vectors)
        VectorStore->>DB: PRAGMA optimize
        VectorStore-->>Handler: true/false
        Handler->>MCP: {superseded: new_chunk_id}
        MCP-->>User: Success
    else confirm=false
        Handler->>MCP: {confirm_required: true}
        MCP-->>User: Confirmation required
    end
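The branches in the supersede diagram are somewhat ambiguous, so the sketch below encodes just one reading of the gate: confirmation is needed when the caller asks for it (safety_check="confirm") or when the old chunk looks personal, and confirm=true lets the operation proceed. The keyword list and both function names are hypothetical stand-ins; the PR's actual personal content detection is not shown.

```python
# Hypothetical keyword list; the real detector in store_handler.py is not shown.
PERSONAL_KEYWORDS = ("journal", "diagnosis", "salary", "therapy")


def is_personal_content(text):
    """Crude stand-in for personal content detection."""
    lowered = text.lower()
    return any(word in lowered for word in PERSONAL_KEYWORDS)


def supersede_gate(old_content, new_chunk_id, safety_check="auto", confirm=False):
    """Decide whether a supersede may proceed or needs explicit confirmation."""
    needs_confirmation = safety_check == "confirm" or is_personal_content(old_content)
    if needs_confirmation and not confirm:
        # Nothing is written; the caller must retry with confirm=True.
        return {"confirm_required": True}
    # At this point the real handler would call supersede_chunk(old_id, new_id).
    return {"superseded": new_chunk_id}


technical = supersede_gate("Python 3.12 removed distutils", "n1")
personal = supersede_gate("Notes from my therapy journal", "n1")
personal_confirmed = supersede_gate("Notes from my therapy journal", "n1", confirm=True)
forced = supersede_gate("Plain technical fact", "n1", safety_check="confirm")
```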
sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Handler as store_handler
    participant VectorStore as vector_store
    participant DB as Database
    
    User->>MCP: brain_archive(chunk_id, reason?)
    MCP->>Handler: _brain_archive(chunk_id, reason)
    Handler->>VectorStore: archive_chunk(chunk_id)
    alt chunk exists
        VectorStore->>DB: UPDATE chunks SET value_type='ARCHIVED', archived_at=UTC_NOW()
        VectorStore->>DB: PRAGMA optimize
        VectorStore-->>Handler: true
        Handler->>MCP: {archived: chunk_id}
        MCP-->>User: Success
    else chunk not found
        Handler->>MCP: {error: ...}
        MCP-->>User: Error
    end
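The archive flow in the second diagram can be sketched as a soft delete that stamps the row instead of removing it. This is a simplified illustration against a plain SQLite table: the function bodies, the error message, and echoing `reason` back in the response are assumptions, since the PR does not say where the reason is recorded.

```python
import sqlite3
from datetime import datetime, timezone


def archive_chunk(conn, chunk_id):
    """Soft-delete a chunk by stamping it; return False if no such chunk exists."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE chunks SET value_type = 'ARCHIVED', archived_at = ? WHERE id = ?",
        (now, chunk_id),
    )
    conn.commit()
    return cur.rowcount > 0


def brain_archive(conn, chunk_id, reason=None):
    """Handler-level wrapper mirroring the success and error branches."""
    if not archive_chunk(conn, chunk_id):
        return {"error": f"chunk not found: {chunk_id}"}
    response = {"archived": chunk_id}
    if reason:
        # Assumption: the reason is simply echoed back to the caller here.
        response["reason"] = reason
    return response


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT, "
    "value_type TEXT, archived_at TEXT)"
)
conn.execute("INSERT INTO chunks VALUES ('c1', 'stale note', 'NOTE', NULL)")

ok = brain_archive(conn, "c1", reason="outdated")
missing = brain_archive(conn, "does-not-exist")
stored = conn.execute(
    "SELECT value_type, archived_at FROM chunks WHERE id = 'c1'"
).fetchone()
```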

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through chunks so spry,
Old ones supersede, archive, and die,
The lifecycle blooms with archived_at,
Safety checks guard what's dear and that,
Search respects the deleted past,
Fresh memories forever last! 🌿✨

@EtanHey EtanHey merged commit f1a7bdd into main Mar 26, 2026
1 of 6 checks passed
@EtanHey EtanHey deleted the feat/chunk-lifecycle-management branch March 26, 2026 06:43

🟡 Medium

When the database is busy/locked, _store queues the memory via _queue_store but omits supersedes from the queued dict (lines 526–543). Once flushed by _flush_pending_stores, the supersede relationship is silently lost because that function calls store_memory without any supersede handling. The user is told the memory was queued, but the intended chunk replacement never happens.

Evidence trail:
src/brainlayer/mcp/store_handler.py lines 422-438 (shows `_store` function signature with `supersedes` parameter), lines 526-543 (shows `_queue_store` call without `supersedes` in the dictionary), lines 368-409 (shows `_flush_pending_stores` calling `store_memory` without `supersedes` parameter). Repository: https://github.com/EtanHey/brainlayer at REVIEWED_COMMIT.
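One way to address the issue above is to carry supersedes through the queue and apply it at flush time. The sketch below uses simplified stand-ins: the real _queue_store and _flush_pending_stores signatures are not shown in this PR, so the queue layout, function names, and fake store functions are all hypothetical.

```python
# In-memory stand-in for the pending-store queue.
pending = []


def queue_store(item):
    """Queue a store request, keeping the optional 'supersedes' key intact."""
    pending.append(dict(item))


def flush_pending(store_memory, supersede_chunk):
    """Store each queued item, then apply any recorded supersede relationship."""
    flushed = []
    while pending:
        item = pending.pop(0)
        # Pull the lifecycle key out so store_memory only sees its own arguments.
        supersedes = item.pop("supersedes", None)
        new_id = store_memory(**item)
        if supersedes:
            supersede_chunk(supersedes, new_id)
        flushed.append(new_id)
    return flushed


# Demo with fake storage functions.
stored = {}
superseded = []


def fake_store_memory(content, tags=None):
    new_id = f"new-{len(stored) + 1}"
    stored[new_id] = content
    return new_id


def fake_supersede_chunk(old_id, new_id):
    superseded.append((old_id, new_id))
    return True


queue_store({"content": "API moved to v2", "tags": ["api"], "supersedes": "old-1"})
queue_store({"content": "unrelated note"})
flushed = flush_pending(fake_store_memory, fake_supersede_chunk)
```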

EtanHey added a commit that referenced this pull request Mar 26, 2026
brain_supersede and brain_archive were added in PR #102.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EtanHey added a commit that referenced this pull request Mar 26, 2026
* feat: session-scoped dedup coordination between BrainLayer hooks

Add /tmp/brainlayer_session_{id}.json coordination file that SessionStart
and UserPromptSubmit hooks read/write to avoid injecting the same chunk
twice. SessionStart writes which chunk_ids it injected; UserPromptSubmit
checks before re-injecting.

Key changes:
- hooks/dedup_coordination.py: shared module for atomic file I/O, chunk
  registration, handoff detection, and injected-ID tracking
- hooks/brainlayer-session-start.py: now in repo (was only in ~/.claude/hooks),
  captures chunk IDs from FTS queries and writes coordination file
- hooks/brainlayer-prompt-search.py: reads coordination file, skips
  already-injected chunks, early-exits on handoff prompts
- 20 tests covering file I/O, dedup, handoff detection, graceful degradation

Addresses R46 research: eliminates ~2,500 token duplicate on handoff prompts
and prevents cross-hook chunk duplication across session lifetime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix ruff format on store_handler.py and test_chunk_lifecycle.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update MCP tool count assertion from 9 to 11

brain_supersede and brain_archive were added in PR #102.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review feedback on dedup coordination

- Add fcntl.flock file locking to prevent read-modify-write races
- Fix UnboundLocalError when mkstemp() fails (init tmp_path=None)
- Validate JSON payload is dict before calling .get()
- Tighten handoff keywords: bare agent names no longer suppress search
- Proportional token estimates (only count new chunks, not dupes)
- Add test for non-dict JSON and bare agent name non-trigger

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
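The coordination-file handling described in the commit message above (locking around read-modify-write, atomic replace, tmp_path initialised to None, dict validation) might be sketched as follows. This is not the code in hooks/dedup_coordination.py: the `register_injected` name, the `"injected"` key, and the separate `.lock` file are assumptions, and `fcntl` limits the sketch to Unix-like systems.

```python
import fcntl
import json
import os
import tempfile


def register_injected(path, chunk_ids):
    """Record chunk_ids in the coordination file; return only the new ones."""
    # Lock a sidecar file, because os.replace swaps the data file's inode.
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            try:
                with open(path) as fh:
                    payload = json.load(fh)
            except (FileNotFoundError, json.JSONDecodeError):
                payload = {}
            if not isinstance(payload, dict):  # e.g. the file holds a JSON list
                payload = {}
            seen = payload.get("injected", [])
            if not isinstance(seen, list):
                seen = []
            new_ids = [cid for cid in chunk_ids if cid not in seen]
            payload["injected"] = seen + new_ids

            tmp_path = None  # stays None if mkstemp itself fails
            try:
                fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
                with os.fdopen(fd, "w") as tmp:
                    json.dump(payload, tmp)
                os.replace(tmp_path, path)  # atomic swap on the same filesystem
                tmp_path = None
            finally:
                if tmp_path and os.path.exists(tmp_path):
                    os.unlink(tmp_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return new_ids


with tempfile.TemporaryDirectory() as tmp_dir:
    session_file = os.path.join(tmp_dir, "brainlayer_session_demo.json")
    first = register_injected(session_file, ["a", "b"])   # SessionStart
    second = register_injected(session_file, ["b", "c"])  # UserPromptSubmit

    bad_file = os.path.join(tmp_dir, "brainlayer_session_bad.json")
    with open(bad_file, "w") as fh:
        fh.write("[1, 2]")  # valid JSON, but not a dict
    recovered = register_injected(bad_file, ["x"])
```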