v0.88.0 — Hierarchical Document Reasoning

Alex8791-cyber released this 11 Apr 18:28
· 741 commits to main since this release

New Feature: 4th Retrieval Channel

Hierarchical Document Reasoning: index long structured documents (contracts, manuals, legal texts) as hierarchical trees and let the LLM navigate to the relevant sections via top-down semantic reasoning. This channel needs no vector embeddings.
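
To illustrate the idea of top-down navigation, here is a minimal sketch. It is not the project's implementation: the `Node` class, the `navigate` function, and the `keyword_chooser` stand-in (which replaces the LLM call with simple word overlap) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One section of a document tree (hypothetical structure)."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


# A chooser receives the query and the child nodes, and returns the
# indices of the children worth descending into.
Chooser = Callable[[str, list[Node]], list[int]]


def keyword_chooser(query: str, candidates: list[Node]) -> list[int]:
    """Stand-in for the LLM: keep children sharing a word with the query."""
    words = set(query.lower().split())
    return [
        i for i, node in enumerate(candidates)
        if words & set(f"{node.title} {node.summary}".lower().split())
    ]


def navigate(root: Node, query: str, choose: Chooser = keyword_chooser) -> list[Node]:
    """Walk the tree from the root and return the leaf sections reached."""
    hits: list[Node] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if not node.children:
            hits.append(node)  # a leaf: this section is a result
            continue
        for i in choose(query, node.children):
            frontier.append(node.children[i])
    return hits


contract = Node("Contract", "service agreement", [
    Node("Payment terms", "payment schedule and late fees", [
        Node("Late fees", "interest charged on overdue amounts"),
        Node("Invoicing", "invoice format and delivery"),
    ]),
    Node("Termination", "notice periods and termination rights"),
])

hits = navigate(contract, "late payment fees")
print([n.title for n in hits])
```

At each level only the children's titles and summaries are inspected, so branches that look irrelevant are pruned without reading their content.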

Components

  • 5 Document Parsers: Markdown (ATX+Setext), PDF (TOC+font-size), DOCX (heading styles), HTML (h1-h6+filtering), Plaintext (German legal heuristics)
  • Tree Builder: Flat sections -> hierarchy -> content splitting -> bottom-up LLM summaries
  • SQLite Tree Store: Same DB as existing indexer, CASCADE delete, transactional writes
  • LLM Node Selector: Top-down traversal with JSON parsing + regex fallback
  • 4-Channel Score Fusion: Auto-normalizing weights (vector + BM25 + graph + hierarchical)
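
The "JSON parsing + regex fallback" behaviour of the node selector could look roughly like the sketch below. The reply format (a JSON object with a `selected` key) and the function name `parse_selection` are assumptions made for illustration; the release notes do not specify either.

```python
import json
import re


def parse_selection(reply: str, n_children: int) -> list[int]:
    """Extract child indices from an LLM reply: JSON first, regex fallback."""
    indices = None

    # Step 1: look for a JSON object anywhere in the reply and parse it.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            # "selected" is an assumed key name, not a documented one.
            indices = [int(i) for i in data.get("selected", [])]
        except (ValueError, TypeError, AttributeError):
            indices = None  # malformed JSON or unexpected shape

    # Step 2: fall back to collecting every integer in the raw text.
    if indices is None:
        indices = [int(m) for m in re.findall(r"\d+", reply)]

    # Keep only valid child indices, dropping duplicates but preserving order.
    seen: set[int] = set()
    result = []
    for i in indices:
        if 0 <= i < n_children and i not in seen:
            seen.add(i)
            result.append(i)
    return result


print(parse_selection('Answer: {"selected": [0, 2]}', 3))
print(parse_selection("I would pick sections 2 and 0.", 3))
```

Validating indices against the number of children keeps a hallucinated or out-of-range answer from breaking the traversal.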

Integration

  • 4 new MemoryManager methods: index/remove/list/reindex hierarchical documents
  • HierarchicalConfig in config.yaml (enabled by default, score_weight=0.25)
  • Existing 3-channel search unchanged when no hierarchical docs exist
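
One plausible way for auto-normalizing weights to leave 3-channel search unchanged is to renormalize over only the channels that actually returned a score. The sketch below shows that idea. The function name `fuse_scores` and all weights except the hierarchical `score_weight=0.25` are assumptions, and the real fusion logic may differ.

```python
from typing import Optional

# Hypothetical weights; only the hierarchical 0.25 comes from the release notes.
WEIGHTS = {"vector": 0.5, "bm25": 0.3, "graph": 0.2, "hierarchical": 0.25}


def fuse_scores(scores: dict[str, Optional[float]],
                weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average over the channels that produced a score."""
    active = {
        name: weight for name, weight in weights.items()
        if scores.get(name) is not None
    }
    total = sum(active.values())
    if total == 0:
        return 0.0  # no channel contributed
    # Dividing by the active total renormalizes the weights to sum to 1.
    return sum(scores[name] * weight / total for name, weight in active.items())


three_channel = {"vector": 0.8, "bm25": 0.6, "graph": 0.4}
print(fuse_scores(three_channel))                           # no hierarchical docs
print(fuse_scores({**three_channel, "hierarchical": 1.0}))  # 4th channel active
```

Because a missing channel drops out of both the numerator and the normalizer, a query with no hierarchical hit gets the same fused score it would get with the three original weights alone.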

Quality

  • 136 new tests (models, 5 parsers, store, builder, selector, retrieval, integration, 14 edge cases)
  • All existing memory tests pass unchanged
  • The 14 edge-case tests cover documents with no headings, heading-level jumps, corrupt files, concurrent indexing, encoding issues, and similar failure modes

Full Changelog: v0.87.2...v0.88.0