v0.88.0 — Hierarchical Document Reasoning
New Feature: 4th Retrieval Channel
Hierarchical Document Reasoning: index long structured documents (contracts, manuals, legal texts) as hierarchical trees and let the LLM navigate to the relevant sections by reasoning top-down through the tree. This channel needs no vector embeddings.
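To make the top-down idea concrete, here is a minimal sketch of tree navigation where, at each level, a model is shown the children's titles and summaries and picks which ones to open. Everything here is hypothetical and for illustration only: Node, navigate, parse_selection, the prompt wording and the {"selected": [...]} reply shape are assumptions, not this release's API, and fake_llm is a stub standing in for a real model call. The reply parser tries JSON first and falls back to a regex, in the spirit of the Node Selector described below.

```python
import json
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One section of a document tree (hypothetical model, for illustration)."""
    title: str
    summary: str
    children: List["Node"] = field(default_factory=list)


def parse_selection(reply: str, n_options: int) -> List[int]:
    """Read child indices from an LLM reply: strict JSON first, regex fallback."""
    try:
        data = json.loads(reply)  # assumed reply shape: {"selected": [0, 2]}
    except json.JSONDecodeError:
        data = [int(d) for d in re.findall(r"\d+", reply)]
    if isinstance(data, dict):
        data = data.get("selected", [])
    if not isinstance(data, list):
        data = [data]
    picks: List[int] = []
    for p in data:
        # keep in-range integers only (bools excluded), de-duplicated
        if type(p) is int and 0 <= p < n_options and p not in picks:
            picks.append(p)
    return picks


def navigate(root: Node, query: str, ask_llm: Callable[[str], str],
             max_depth: int = 6) -> List[Node]:
    """Walk the tree top-down; at each level the LLM picks which children to open."""
    hits: List[Node] = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if not node.children or depth >= max_depth:
            hits.append(node)  # a leaf (or the depth limit) ends this branch
            continue
        menu = "\n".join(f"{i}: {c.title} - {c.summary}"
                         for i, c in enumerate(node.children))
        prompt = (f"Query: {query}\nWhich sections are relevant? "
                  f'Reply as JSON {{"selected": [indices]}}.\n{menu}')
        for i in parse_selection(ask_llm(prompt), len(node.children)):
            queue.append((node.children[i], depth + 1))
    return hits


# Usage example with a tiny contract tree and a stubbed model.
doc = Node("Contract", "Service agreement", [
    Node("Definitions", "Defined terms"),
    Node("Termination", "How either party can end the agreement", [
        Node("Notice period", "Thirty days written notice"),
        Node("Penalties", "Fees for early exit"),
    ]),
])


def fake_llm(prompt: str) -> str:
    """Stub: answers in JSON at the top level and in free text one level
    down, which exercises the regex fallback."""
    if "Notice period" in prompt:
        return "Section 0 looks relevant."
    return '{"selected": [1]}'
```

Here navigate(doc, "How much notice must I give?", fake_llm) descends Contract, then Termination, and ends on the "Notice period" leaf. Because each step only compares a handful of short summaries, no embedding index is involved.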
Components
- 5 Document Parsers: Markdown (ATX+Setext), PDF (TOC+font-size), DOCX (heading styles), HTML (h1-h6+filtering), Plaintext (German legal heuristics)
- Tree Builder: Flat sections -> hierarchy -> content splitting -> bottom-up LLM summaries
- SQLite Tree Store: Same DB as existing indexer, CASCADE delete, transactional writes
- LLM Node Selector: Top-down traversal; the LLM's choices are parsed as JSON, with a regex fallback when the reply is not valid JSON
- 4-Channel Score Fusion: Auto-normalizing weights (vector + BM25 + graph + hierarchical)
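The parser and Tree Builder bullets describe turning headings into flat sections and then nesting them into a hierarchy. Below is a simplified sketch of those two steps for Markdown (ATX # headings and Setext ===/--- underlines) using a stack for nesting. Section, parse_markdown and build_tree are hypothetical names; the parser deliberately ignores CommonMark details such as fenced code blocks and multi-line Setext headings, and the content-splitting and LLM-summary steps are left out.

```python
import re
from dataclasses import dataclass, field
from typing import List

# ATX: 1-6 '#', whitespace, title, optional closing '#'s preceded by whitespace.
ATX = re.compile(r"^(#{1,6})[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$")
# Setext underline: a run of '=' (level 1) or '-' (level 2).
SETEXT = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


@dataclass
class Section:
    level: int
    title: str
    body: str = ""
    children: List["Section"] = field(default_factory=list)


def parse_markdown(text: str) -> List[Section]:
    """Split Markdown into flat sections; text before the first heading is dropped."""
    lines = text.splitlines()
    sections: List[Section] = []
    body: List[str] = []

    def flush() -> None:
        # attach the collected body lines to the most recent section
        if sections:
            sections[-1].body = "\n".join(body).strip()
        body.clear()

    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        atx = ATX.match(line)
        if atx:
            flush()
            sections.append(Section(len(atx.group(1)), atx.group(2)))
        elif line.strip() and SETEXT.match(nxt):
            flush()
            level = 1 if nxt.strip().startswith("=") else 2
            sections.append(Section(level, line.strip()))
            i += 1  # skip the underline
        else:
            body.append(line)
        i += 1
    flush()
    return sections


def build_tree(sections: List[Section]) -> Section:
    """Nest flat sections by level. A heading jump (e.g. # then ####) simply
    nests the deeper section under the nearest shallower one."""
    root = Section(level=0, title="<document>")
    stack = [root]
    for sec in sections:
        while stack[-1].level >= sec.level:
            stack.pop()  # close sections at the same or deeper level
        stack[-1].children.append(sec)
        stack.append(sec)
    return root
```

Using a stack keyed on heading level means a document that jumps from level 1 straight to level 4 still yields a valid tree instead of failing, which is one of the "heading jumps" edge cases mentioned under Quality.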
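The SQLite Tree Store bullet mentions CASCADE delete and transactional writes. A minimal sketch with Python's built-in sqlite3 module follows, assuming a hypothetical two-table schema (hier_documents, hier_nodes); the real table and column names are not given in these notes. One SQLite detail matters here: foreign keys, and therefore ON DELETE CASCADE, are only enforced when PRAGMA foreign_keys is turned on for each connection.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

# Hypothetical schema: deleting a document row removes all of its nodes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hier_documents (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS hier_nodes (
    id        INTEGER PRIMARY KEY,
    doc_id    INTEGER NOT NULL REFERENCES hier_documents(id) ON DELETE CASCADE,
    parent_id INTEGER REFERENCES hier_nodes(id) ON DELETE CASCADE,
    title     TEXT NOT NULL,
    summary   TEXT
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite ignores foreign keys (and CASCADE) unless enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def store_tree(conn: sqlite3.Connection, path: str,
               nodes: Sequence[Tuple[str, str, Optional[int]]]) -> int:
    """Insert a document and its nodes in one transaction.

    nodes are (title, summary, parent_position) in parent-before-child order;
    parent_position indexes into the same list (None for top-level nodes).
    """
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute("INSERT INTO hier_documents(path) VALUES (?)", (path,))
        doc_id = cur.lastrowid
        row_ids: List[int] = []
        for title, summary, parent in nodes:
            parent_id = row_ids[parent] if parent is not None else None
            cur = conn.execute(
                "INSERT INTO hier_nodes(doc_id, parent_id, title, summary) "
                "VALUES (?, ?, ?, ?)",
                (doc_id, parent_id, title, summary),
            )
            row_ids.append(cur.lastrowid)
    return doc_id


def remove_document(conn: sqlite3.Connection, path: str) -> None:
    with conn:
        # nodes go with the document via ON DELETE CASCADE
        conn.execute("DELETE FROM hier_documents WHERE path = ?", (path,))
```

Wrapping each write in the connection's context manager means a failure halfway through indexing leaves no partial tree behind.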
Integration
- 4 new MemoryManager methods: index/remove/list/reindex hierarchical documents
- HierarchicalConfig in config.yaml (enabled by default, score_weight=0.25)
- Existing 3-channel search is unchanged when no hierarchical documents are indexed
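One way to read "auto-normalizing weights" together with the unchanged 3-channel behavior: each channel's weight is divided by the sum of the weights of the channels that actually returned results, so with no hierarchical documents the other three channels keep their previous relative weighting. The sketch below assumes that reading. Only score_weight=0.25 comes from these notes; the other three weights and the fuse function are hypothetical, and per-channel scores are assumed to be already scaled to [0, 1].

```python
from typing import Dict

# Only "hierarchical": 0.25 is from the release notes; the rest are placeholders.
WEIGHTS: Dict[str, float] = {"vector": 0.5, "bm25": 0.3, "graph": 0.2, "hierarchical": 0.25}


def fuse(channel_scores: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of per-channel scores, renormalized over active channels.

    channel_scores maps channel name -> {item_id: score in [0, 1]}. A channel
    with no results is inactive, so its weight is redistributed proportionally.
    """
    active = {ch: w for ch, w in weights.items() if channel_scores.get(ch)}
    total = sum(active.values())
    if total <= 0:
        return {}
    fused: Dict[str, float] = {}
    for ch, w in active.items():
        for item, score in channel_scores[ch].items():
            fused[item] = fused.get(item, 0.0) + (w / total) * score
    return fused
```

With all four channels active under these example weights, the hierarchical channel's share is 0.25 / 1.25 = 0.2 of the total; with it inactive, the result is identical to fusing the three original channels alone.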
Quality
- 136 new tests (models, 5 parsers, store, builder, selector, retrieval, integration, 14 edge cases)
- All existing memory tests pass unchanged
- 14 edge cases covered (no headings, heading jumps, corrupt files, concurrent indexing, encoding issues, etc.)
Full Changelog: v0.87.2...v0.88.0