feat: chunk lifecycle management — supersede, archive, search filtering#102
Add lifecycle columns (superseded_by, aggregated_into, archived_at) to chunks table with backwards-compatible migration. Default search now excludes superseded/aggregated/archived chunks; include_archived=True shows history.
New MCP tools: brain_supersede (with safety gate for personal data) and brain_archive (soft-delete with timestamp). brain_store gains optional supersedes param for atomic store-and-supersede.
32 new tests covering schema, VectorStore ops, search filtering, MCP handlers, safety checks, and personal content detection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
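A minimal sketch of what a backwards-compatible migration for the three lifecycle columns could look like, assuming a SQLite `chunks` table. The helper name `migrate_lifecycle_columns` and the `TEXT` column types are assumptions for illustration, not taken from the PR.

```python
import sqlite3

# Column names come from the PR description; the TEXT types are an assumption.
LIFECYCLE_COLUMNS = {
    "superseded_by": "TEXT",
    "aggregated_into": "TEXT",
    "archived_at": "TEXT",
}


def migrate_lifecycle_columns(conn: sqlite3.Connection) -> list:
    """Add any missing lifecycle columns and return the names that were added."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    added = []
    for name, sql_type in LIFECYCLE_COLUMNS.items():
        if name not in existing:
            # Identifiers cannot be bound as parameters; names here are constants.
            conn.execute(f"ALTER TABLE chunks ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Checking `PRAGMA table_info` first makes the migration safe to run on every startup: an already-migrated database is left untouched.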
📝 Walkthrough
This change introduces comprehensive chunk lifecycle management to brainlayer, adding database schema columns ( […]
Sequence Diagram(s)
sequenceDiagram
actor User
participant MCP as MCP Client
participant Handler as store_handler
participant VectorStore as vector_store
participant DB as Database
User->>MCP: brain_supersede(old_chunk_id, new_chunk_id, safety_check, confirm)
MCP->>Handler: _brain_supersede(old_id, new_id, safety_check, confirm)
Handler->>Handler: _is_personal_content(check old chunk)
alt safety_check=auto & personal
Handler->>MCP: {confirm_required: true}
MCP-->>User: Requires confirmation
else safety_check=confirm & confirm=true OR non-personal
Handler->>VectorStore: supersede_chunk(old_id, new_id)
VectorStore->>DB: UPDATE chunks SET superseded_by=new_id
VectorStore->>DB: DELETE FROM chunk_vectors (old chunk vectors)
VectorStore->>DB: PRAGMA optimize
VectorStore-->>Handler: true/false
Handler->>MCP: {superseded: new_chunk_id}
MCP-->>User: Success
else confirm=false
Handler->>MCP: {confirm_required: true}
MCP-->>User: Confirmation required
end
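The confirmation gate in the diagram above can be sketched roughly as follows. This is one reading of the branches, not the handler's actual code: the keyword list and the names `PERSONAL_MARKERS`, `is_personal_content`, and `supersede_gate` are hypothetical, and the real `_is_personal_content` may use different signals.

```python
# Hypothetical markers; the PR only says journals, notes, and health/finance
# content count as personal data.
PERSONAL_MARKERS = ("journal", "health", "finance")


def is_personal_content(text: str) -> bool:
    """Crude keyword check standing in for the real personal-content detector."""
    lowered = text.lower()
    return any(marker in lowered for marker in PERSONAL_MARKERS)


def supersede_gate(old_content, new_chunk_id, safety_check="auto", confirm=False):
    """Return the response the handler would send, without touching storage."""
    # "confirm" always asks; "auto" asks only when the old chunk looks personal.
    needs_confirmation = safety_check == "confirm" or is_personal_content(old_content)
    if needs_confirmation and not confirm:
        return {"confirm_required": True}
    return {"superseded": new_chunk_id}
```

Under this reading, a caller that receives `confirm_required` repeats the call with `confirm=True` to proceed.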
sequenceDiagram
actor User
participant MCP as MCP Client
participant Handler as store_handler
participant VectorStore as vector_store
participant DB as Database
User->>MCP: brain_archive(chunk_id, reason?)
MCP->>Handler: _brain_archive(chunk_id, reason)
Handler->>VectorStore: archive_chunk(chunk_id)
alt chunk exists
VectorStore->>DB: UPDATE chunks SET value_type='ARCHIVED', archived_at=UTC_NOW()
VectorStore->>DB: PRAGMA optimize
VectorStore-->>Handler: true
Handler->>MCP: {archived: chunk_id}
MCP-->>User: Success
else chunk not found
Handler->>MCP: {error: ...}
MCP-->>User: Error
end
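The storage-side steps from both diagrams might look like the sketch below, assuming plain SQLite tables. The `id` and `chunk_id` column names and the function signatures are assumptions; the `PRAGMA optimize` step shown in the diagrams is omitted for brevity.

```python
import sqlite3
from datetime import datetime, timezone


def supersede_chunk(conn: sqlite3.Connection, old_id: str, new_id: str) -> bool:
    """Mark old_id as superseded by new_id and drop its vectors."""
    cur = conn.execute(
        "UPDATE chunks SET superseded_by = ? WHERE id = ?", (new_id, old_id)
    )
    if cur.rowcount == 0:
        # No such chunk: leave vectors alone and report failure.
        conn.rollback()
        return False
    conn.execute("DELETE FROM chunk_vectors WHERE chunk_id = ?", (old_id,))
    conn.commit()
    return True


def archive_chunk(conn: sqlite3.Connection, chunk_id: str) -> bool:
    """Soft-delete: flag the chunk as archived with a UTC timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE chunks SET value_type = 'ARCHIVED', archived_at = ? WHERE id = ?",
        (now, chunk_id),
    )
    conn.commit()
    return cur.rowcount > 0
```

Both functions return a boolean so the handler can map a missing chunk to an error response, as in the "chunk not found" branch above.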
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🟡 Medium
When the database is busy/locked, _store queues the memory via _queue_store but omits supersedes from the queued dict (lines 526–543). Once flushed by _flush_pending_stores, the supersede relationship is silently lost because that function calls store_memory without any supersede handling. The user is told the memory was queued, but the intended chunk replacement never happens.
Evidence trail:
src/brainlayer/mcp/store_handler.py lines 422-438 (shows `_store` function signature with `supersedes` parameter), lines 526-543 (shows `_queue_store` call without `supersedes` in the dictionary), lines 368-409 (shows `_flush_pending_stores` calling `store_memory` without `supersedes` parameter). Repository: https://github.com/EtanHey/brainlayer at REVIEWED_COMMIT.
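One way to address the issue described above is to carry `supersedes` through the queue and apply it at flush time. The sketch below uses stand-in names (`queue_store`, `flush_pending_stores`, and injected `store_memory` / `supersede_chunk` callables); the real functions in `store_handler.py` have different signatures, so treat this only as an illustration of the shape of the fix.

```python
def queue_store(pending, content, supersedes=None):
    """Queue a store request, keeping the supersede target if one was given."""
    item = {"content": content}
    if supersedes is not None:
        item["supersedes"] = supersedes
    pending.append(item)


def flush_pending_stores(pending, store_memory, supersede_chunk):
    """Replay queued stores; apply the supersede link after each store succeeds."""
    flushed = 0
    while pending:
        item = pending.pop(0)
        new_id = store_memory(item["content"])
        old_id = item.get("supersedes")
        if old_id:
            # Without this step the replacement relationship is silently lost.
            supersede_chunk(old_id, new_id)
        flushed += 1
    return flushed
```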
* feat: session-scoped dedup coordination between BrainLayer hooks
Add /tmp/brainlayer_session_{id}.json coordination file that SessionStart
and UserPromptSubmit hooks read/write to avoid injecting the same chunk
twice. SessionStart writes which chunk_ids it injected; UserPromptSubmit
checks before re-injecting.
Key changes:
- hooks/dedup_coordination.py: shared module for atomic file I/O, chunk
registration, handoff detection, and injected-ID tracking
- hooks/brainlayer-session-start.py: now in repo (was only in ~/.claude/hooks),
captures chunk IDs from FTS queries and writes coordination file
- hooks/brainlayer-prompt-search.py: reads coordination file, skips
already-injected chunks, early-exits on handoff prompts
- 20 tests covering file I/O, dedup, handoff detection, graceful degradation
Addresses R46 research: eliminates ~2,500 token duplicate on handoff prompts
and prevents cross-hook chunk duplication across session lifetime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: fix ruff format on store_handler.py and test_chunk_lifecycle.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update MCP tool count assertion from 9 to 11
brain_supersede and brain_archive were added in PR #102.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address CodeRabbit review feedback on dedup coordination
- Add fcntl.flock file locking to prevent read-modify-write races
- Fix UnboundLocalError when mkstemp() fails (init tmp_path=None)
- Validate JSON payload is dict before calling .get()
- Tighten handoff keywords: bare agent names no longer suppress search
- Proportional token estimates (only count new chunks, not dupes)
- Add test for non-dict JSON and bare agent name non-trigger
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
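The coordination file described in the commit message could be sketched as below, combining the lock, the atomic replace, and the dict validation from the review fixes. The JSON key `injected_chunk_ids`, the sidecar `.lock` file, and the function names are assumptions; `fcntl` limits this to Unix-like systems.

```python
import fcntl
import json
import os
import tempfile


def coordination_path(session_id, base_dir="/tmp"):
    return os.path.join(base_dir, f"brainlayer_session_{session_id}.json")


def read_injected_ids(path) -> set:
    """Return previously injected chunk IDs, or an empty set on any bad input."""
    try:
        with open(path) as f:
            payload = json.load(f)
    except (OSError, ValueError):
        return set()
    if not isinstance(payload, dict):  # e.g. a bare list or string
        return set()
    ids = payload.get("injected_chunk_ids", [])
    return set(ids) if isinstance(ids, list) else set()


def register_chunks(path, chunk_ids) -> list:
    """Record chunk_ids and return only those not injected before."""
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize read-modify-write
        try:
            seen = read_injected_ids(path)
            new_ids = [c for c in chunk_ids if c not in seen]
            tmp_path = None  # defined up front so cleanup is safe if mkstemp fails
            try:
                fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
                with os.fdopen(fd, "w") as tmp:
                    json.dump({"injected_chunk_ids": sorted(seen | set(new_ids))}, tmp)
                os.replace(tmp_path, path)  # atomic swap on the same filesystem
                tmp_path = None
            finally:
                if tmp_path and os.path.exists(tmp_path):
                    os.unlink(tmp_path)
            return new_ids
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

In this sketch, SessionStart would call `register_chunks` with the IDs it injected, and UserPromptSubmit would call it with its candidates and inject only the returned IDs.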
Summary
- Add `superseded_by`, `aggregated_into`, `archived_at` columns to chunks table (backwards-compatible ALTER TABLE)
- Default search excludes superseded/aggregated/archived chunks; `include_archived=True` shows full history. Applied to all 3 search paths (vector, LIKE, FTS5)
- New MCP tools: `brain_supersede(old_chunk_id, new_chunk_id)` with safety gate for personal data, `brain_archive(chunk_id)` with optional reason
- `brain_store` gains optional `supersedes` param for atomic store-and-supersede

Architecture decisions
- `safety_check="auto"` for technical facts, `"confirm"` for personal data (journals, notes, health/finance content)
- […] `archive_chunk` behavior)
- […] `_brain_update` for DB lock resilience

Test plan
- […] `pytest tests/test_chunk_lifecycle.py`)
- […] `ruff check` passes)

🤖 Generated with Claude Code
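As a rough illustration of the default search filter, here is a sketch for the LIKE path only, assuming a SQLite `chunks` table with `id` and `content` columns. The function name `search_like` and the shared `LIFECYCLE_FILTER` fragment are hypothetical; the vector and FTS5 paths would need the same predicate applied in their own queries.

```python
import sqlite3

# A chunk is "live" only if none of the three lifecycle columns is set.
LIFECYCLE_FILTER = (
    "superseded_by IS NULL AND aggregated_into IS NULL AND archived_at IS NULL"
)


def search_like(conn: sqlite3.Connection, term: str, include_archived: bool = False):
    """Substring search that hides retired chunks unless history is requested."""
    sql = "SELECT id FROM chunks WHERE content LIKE ?"
    if not include_archived:
        sql += f" AND {LIFECYCLE_FILTER}"
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]
```

Keeping the predicate in one constant makes it harder for the three search paths to drift apart.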
Note
Add chunk lifecycle management with supersede, archive, and search filtering to brain MCP tools
- Adds `supersede_chunk` and `archive_chunk` methods to `VectorStore`, storing lifecycle state in new `superseded_by`, `aggregated_into`, and `archived_at` columns; existing DBs are migrated on init.
- Adds `brain_supersede` and `brain_archive` MCP tools in `store_handler.py` and wires them in `__init__.py`; `brain_store` gains an optional `supersedes` field to supersede a prior chunk in the same operation.
- Updates `SearchMixin.search` to exclude superseded, aggregated, and archived chunks by default; pass `include_archived=True` to include them.
- `brain_supersede` requires explicit confirmation when the chunk contains personal content, returning a `confirm_required` response before proceeding.
📊 Macroscope summarized 81fb647. 5 files reviewed, 4 issues evaluated, 1 issue filtered, 1 comment posted
🗂️ Filtered Issues
src/brainlayer/mcp/__init__.py — 0 comments posted, 1 evaluated, 1 filtered
[…] `brain_supersede` and `brain_archive`. The count should be updated from "8 tools" to "9 tools" (or higher if `brain_get_person` and `brain_tags` should also be included, as they appear in `list_tools()` but are not mentioned in the instructions). [Out of scope]
Summary by CodeRabbit
New Features
Tests