feat(rag): port augment module from lancedb to leann-rs #897

Merged
starpit merged 1 commit into IBM:main from starpit:feat/leann-rs-rag
Feb 21, 2026

@starpit commented Feb 21, 2026

Summary

  • Replace lancedb/arrow vector storage with leann-core (crates.io 0.1.1) for HNSW-based indexing, sentence-based chunking, and passage management
  • Remove storage.rs (lancedb VecDB wrapper) and windowing.rs (line-based chunking), replaced by leann-rs equivalents
  • Add SpnlEmbeddingProvider adapter bridging spnl's async embed() to leann-core's sync EmbeddingProvider trait
  • Rewrite RAPTOR indexer with two-phase batch-rebuild approach (documented in RAPTOR_LEANN_PLAN.md)
  • Net -2655 lines (2243 added, 4898 removed)
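
The async-to-sync bridge mentioned above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `SyncEmbeddingProvider`, `AsyncEmbedder`, `BlockingAdapter`, and `block_on` are hypothetical stand-ins, and the real leann-core `EmbeddingProvider` trait and spnl `embed()` signatures may differ. A real adapter would more likely block on the existing async runtime rather than a hand-rolled executor.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for a synchronous embedding trait (not leann-core's actual trait).
trait SyncEmbeddingProvider {
    fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String>;
}

/// Hypothetical stand-in for an async embedder (not spnl's actual API).
/// `dim` must be greater than zero.
struct AsyncEmbedder {
    dim: usize,
}

impl AsyncEmbedder {
    /// Toy embedding: adds each byte value into a bucket chosen by byte position.
    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        Ok(texts
            .iter()
            .map(|t| {
                let mut v = vec![0.0f32; self.dim];
                for (i, b) in t.bytes().enumerate() {
                    v[i % self.dim] += b as f32;
                }
                v
            })
            .collect())
    }
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, and park the thread while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Adapter exposing the async embedder through the sync trait.
struct BlockingAdapter {
    inner: AsyncEmbedder,
}

impl SyncEmbeddingProvider for BlockingAdapter {
    fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        block_on(self.inner.embed(texts))
    }
}
```

The design point is that the index builder only ever sees a blocking call, so the async machinery stays on the caller's side of the adapter.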

Test plan

  • cargo build --features rag compiles without lancedb
  • cargo clippy clean on both spnl and spnl-cli with full CI feature sets
  • All 87 tests pass (cargo test -p spnl --features ... per core.yml)
  • retrieve test exercises full index→search pipeline with local embedding model
  • HNSW index files created on disk (.index, .ids.txt, .leann.meta.json, .leann.passages.jsonl, .leann.passages.idx)
  • No lancedb/arrow code imports remain in augment module
  • Manual test: index a multi-page document and verify retrieval quality

🤖 Generated with Claude Code

Replace lancedb vector storage with leann-rs HNSW-based indexing for
the RAG pipeline. This removes heavyweight lancedb/arrow dependencies
in favor of leann-core which provides HNSW vector indexing,
sentence-based chunking, and passage management.

Key changes:
- Remove storage.rs (lancedb VecDB wrapper) and windowing.rs
  (line-based chunking), replaced by leann-rs equivalents
- Add SpnlEmbeddingProvider adapter bridging spnl's async embed()
  to leann-core's sync EmbeddingProvider trait
- Rewrite layer1.rs to use LeannBuilder with chunk_text() chunking
- Rewrite retrieve.rs to use HNSW search with stored vectors
- Rewrite raptor.rs with two-phase batch-rebuild approach
  (documented in RAPTOR_LEANN_PLAN.md)
- Replace vecdb_uri/vecdb_table options with index_dir,
  chunk_size, chunk_overlap
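
For illustration only, sentence-based chunking with `chunk_size` and `chunk_overlap` might behave like the sketch below. It is not leann-core's `chunk_text()` implementation: the helper names are hypothetical, sizes are assumed to be measured in characters (the real unit may be tokens), and sentence splitting is reduced to breaking on `.`, `!`, and `?`.

```rust
/// Naive sentence splitter: breaks after '.', '!' or '?' (an assumption for this sketch).
fn split_sentences(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for ch in text.chars() {
        cur.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let s = cur.trim();
            if !s.is_empty() {
                out.push(s.to_string());
            }
            cur.clear();
        }
    }
    let s = cur.trim();
    if !s.is_empty() {
        out.push(s.to_string());
    }
    out
}

/// Length in characters of the sentences joined with single spaces.
fn joined_len(parts: &[String]) -> usize {
    if parts.is_empty() {
        return 0;
    }
    parts.iter().map(|p| p.chars().count()).sum::<usize>() + parts.len() - 1
}

/// Pack whole sentences into chunks of at most `chunk_size` characters,
/// repeating up to `chunk_overlap` characters of trailing sentences in the next chunk.
/// A single sentence longer than `chunk_size` becomes its own oversized chunk.
fn chunk_sentences(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for sentence in split_sentences(text) {
        let s_len = sentence.chars().count();
        if !current.is_empty() && joined_len(&current) + 1 + s_len > chunk_size {
            chunks.push(current.join(" "));
            // Carry trailing sentences that fit the overlap budget; the first
            // sentence is never carried, so every new chunk makes progress.
            let mut carried: Vec<String> = Vec::new();
            for prev in current.iter().skip(1).rev() {
                carried.insert(0, prev.clone());
                if joined_len(&carried) > chunk_overlap {
                    carried.remove(0);
                    break;
                }
            }
            // Drop carried sentences from the front until the new sentence fits.
            while !carried.is_empty() && joined_len(&carried) + 1 + s_len > chunk_size {
                carried.remove(0);
            }
            current = carried;
        }
        current.push(sentence);
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

Compared with line-based windowing, keeping sentences whole means a retrieved passage does not start or end mid-sentence, at the cost of chunks that vary in length.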

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>
@starpit merged commit f2c05c7 into IBM:main on Feb 21, 2026
36 checks passed
@starpit deleted the feat/leann-rs-rag branch on February 21, 2026 16:41