Release v1.1.0 · allen2c/dvs

✨ Feat: Introduce Automatic Document Chunking

This release adds a major enhancement: automatic document chunking! 🧩 Documents are now split into smaller chunks before embedding, so each vector covers a narrower span of text and vector search results can be more precise and relevant.

🚀 What's New?

Automatic Document Splitting: When using dvs.add(), documents are now automatically chunked based on line count or token count. This is handled by the new chunkle and tiktoken dependencies.
New Chunking Controls: The dvs.add() method gets new parameters to customize how documents are split:
- lines_per_chunk
- tokens_per_chunk
Smarter Document & Point Management:
- The Document type now includes fields like source_id, chunk_index, and is_chunk to track the relationship between chunks and their original source. 🔗
- Embeddings and database points are now created in batches from the new chunks, which makes processing more efficient.
Database Indexing: The documents table is now indexed by source_id to allow for quickly finding all chunks related to a single parent document.
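
To make the chunk bookkeeping above concrete, here is a minimal sketch of line-based splitting. It is not the dvs or chunkle implementation: the `Chunk` dataclass and the `chunk_by_lines` helper are hypothetical names made up for illustration, and only the field names (source_id, chunk_index, is_chunk) and the lines_per_chunk parameter come from these notes. Token-based splitting is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical container; field names follow the release notes,
    # but the real Document type may look different.
    source_id: str    # id of the original (parent) document
    chunk_index: int  # position of this chunk within its parent
    is_chunk: bool    # True for pieces produced by splitting
    content: str


def chunk_by_lines(source_id: str, text: str, lines_per_chunk: int) -> List[Chunk]:
    """Split text into consecutive groups of at most lines_per_chunk lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be >= 1")
    lines = text.splitlines()
    chunks: List[Chunk] = []
    for start in range(0, len(lines), lines_per_chunk):
        chunks.append(
            Chunk(
                source_id=source_id,
                chunk_index=len(chunks),
                is_chunk=True,
                content="\n".join(lines[start:start + lines_per_chunk]),
            )
        )
    return chunks


# Five lines split two at a time give three chunks; the last holds one line.
doc_text = "\n".join(f"line {i}" for i in range(1, 6))
chunks = chunk_by_lines("doc-1", doc_text, lines_per_chunk=2)
for c in chunks:
    print(c.source_id, c.chunk_index, repr(c.content))
```

Because every chunk carries the same source_id and its own chunk_index, the original document can be reassembled by sorting on chunk_index.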

🔧 Key Changes

dvs.add() Refactor: The core logic is updated to first chunk documents and then process these chunks for embedding and storage.
Document Type: Enhanced with new fields to support chunking and token counting.
Dependencies: Added chunkle and tiktoken to pyproject.toml and requirements files.
Version Bump: Updated project version from 1.0.0 to 1.1.0. 📦
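
The chunk-then-embed flow and the source_id index described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `batched`, `fake_embed`, and `add_chunks` are hypothetical helpers, the embedding function is a deterministic placeholder rather than a real model, and Python's built-in sqlite3 stands in for whatever database dvs actually uses.

```python
import sqlite3
from typing import Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(texts: Sequence[str]) -> List[List[float]]:
    # Placeholder for a real embedding call (assumption): one small vector per text.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def add_chunks(conn: sqlite3.Connection, source_id: str,
               chunk_texts: Sequence[str], batch_size: int = 2) -> int:
    """Embed and store chunks batch by batch; return the number of vectors made."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(source_id TEXT, chunk_index INTEGER, content TEXT)"
    )
    # Index on source_id so all chunks of one parent can be found quickly.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_documents_source_id "
        "ON documents (source_id)"
    )
    indexed = list(enumerate(chunk_texts))
    n_vectors = 0
    for batch in batched(indexed, batch_size):
        vectors = fake_embed([text for _, text in batch])
        # A real pipeline would write one point per vector; here they are only counted.
        n_vectors += len(vectors)
        conn.executemany(
            "INSERT INTO documents VALUES (?, ?, ?)",
            [(source_id, i, text) for i, text in batch],
        )
    return n_vectors


conn = sqlite3.connect(":memory:")
embedded = add_chunks(conn, "doc-1", ["alpha beta", "gamma", "delta epsilon"])
add_chunks(conn, "doc-2", ["unrelated"])

# Fetch every chunk that belongs to one parent document, in order.
rows = conn.execute(
    "SELECT chunk_index, content FROM documents "
    "WHERE source_id = ? ORDER BY chunk_index",
    ("doc-1",),
).fetchall()
print(rows)
```

Batching keeps the number of embedding calls and insert statements proportional to the number of batches rather than the number of chunks, which is presumably where the efficiency gain comes from.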
