Releases: allen2c/dvs
v1.1.0
✨ Feat: Introduce Automatic Document Chunking
This update rolls out a major enhancement: automatic document chunking! 🧩 Now, documents are intelligently split into smaller, more manageable pieces before embedding, leading to more precise and relevant vector search results.
🚀 What's New?
- Automatic Document Splitting: When using `dvs.add()`, documents are now automatically chunked based on line count or token count. This is handled by the new `chunkle` and `tiktoken` dependencies.
- New Chunking Controls: The `dvs.add()` method gets new parameters to customize how documents are split:
  - `lines_per_chunk`
  - `tokens_per_chunk`
- Smarter Document & Point Management:
  - The `Document` type now includes fields like `source_id`, `chunk_index`, and `is_chunk` to track the relationship between chunks and their original source. 🔗
  - Processing is now more efficient, creating embeddings and database points in batches based on the new chunks.
- Database Indexing: The documents table is now indexed by `source_id` to allow for quickly finding all chunks related to a single parent document.
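To make the chunk-tracking fields concrete, here is a minimal sketch of line-based splitting in plain Python. It is not the DVS or `chunkle` implementation: `ChunkRecord` and `chunk_by_lines` are hypothetical names, and only the field names `source_id`, `chunk_index`, and `is_chunk` and the `lines_per_chunk` parameter come from the notes above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkRecord:
    """Hypothetical stand-in for a chunk-aware document record (not the DVS Document type)."""

    content: str
    source_id: str    # id of the parent document this chunk came from
    chunk_index: int  # zero-based position of the chunk within the parent
    is_chunk: bool    # True for pieces produced by splitting


def chunk_by_lines(text: str, source_id: str, lines_per_chunk: int = 40) -> List[ChunkRecord]:
    """Split text into consecutive groups of at most lines_per_chunk lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    lines = text.splitlines()
    records = []
    for start in range(0, len(lines), lines_per_chunk):
        piece = "\n".join(lines[start:start + lines_per_chunk])
        records.append(
            ChunkRecord(
                content=piece,
                source_id=source_id,
                chunk_index=start // lines_per_chunk,
                is_chunk=True,
            )
        )
    return records


# Five lines split two at a time yield three chunks that share one source_id.
text = "\n".join(f"line {i}" for i in range(1, 6))
chunks = chunk_by_lines(text, source_id="doc-1", lines_per_chunk=2)
for chunk in chunks:
    print(chunk.source_id, chunk.chunk_index, repr(chunk.content))
```

Because every chunk carries the same `source_id`, looking up all pieces of a parent document is a single filter on that field, which is the access pattern the new index is meant to serve.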
🔧 Key Changes
- `dvs.add()` Refactor: The core logic is updated to first chunk documents and then process these chunks for embedding and storage.
- `Document` Type: Enhanced with new fields to support chunking and token counting.
- Dependencies: Added `chunkle` and `tiktoken` to `pyproject.toml` and requirements files.
- Version Bump: Updated project version from `1.0.0` to `1.1.0`. 📦
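The chunk-then-embed flow described in these changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `dvs.add()` code: `batched`, `fake_embed`, and `embed_in_batches` are hypothetical helpers, and `fake_embed` is a placeholder that stands in for a real embedding model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(texts: Sequence[str]) -> List[Vector]:
    """Placeholder embedding (assumption): [character count, word count] per text."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def embed_in_batches(
    chunks: Sequence[str],
    embed: Callable[[Sequence[str]], List[Vector]] = fake_embed,
    batch_size: int = 2,
) -> List[Tuple[str, Vector]]:
    """Embed already-chunked texts one batch at a time and pair each chunk with its vector."""
    points: List[Tuple[str, Vector]] = []
    for batch in batched(chunks, batch_size):
        vectors = embed(batch)  # one embedding call per batch, not per chunk
        points.extend(zip(batch, vectors))
    return points


points = embed_in_batches(["a b", "c", "d e f"], batch_size=2)
for text, vector in points:
    print(repr(text), vector)
```

Batching keeps the number of embedding calls proportional to the number of batches rather than the number of chunks, which matters once a single document expands into many chunks.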
v1.0.0
🚀 DVS v1.0.0 Release
This release marks a major milestone for DVS, transitioning it from a FastAPI-based web service to a lightweight, serverless Python library. This change significantly improves performance, simplifies the architecture, and enhances the overall user experience.
🌟 Key Changes
- Goodbye, FastAPI! 👋: The most significant change is the removal of the FastAPI server, making DVS a pure Python library. This reduces overhead and simplifies the deployment process.
- Refined API 🧑‍💻: The API has been streamlined for better usability. Key improvements include:
  - A new `DVS` class for easier interaction.
  - Enhanced document and point management with methods like `add`, `remove`, and `search`.
  - Improved handling of embeddings with the `openai-embeddings-model`.
- Enhanced Caching 🗄️: The caching mechanism has been updated for better performance, with a new default path at `./cache/dvs`.
- Improved Configuration ⚙️: The configuration system now uses a more robust `Settings` class, providing better control over database paths, table names, and other parameters.
- Streamlined Dependencies 📦: The project's dependencies have been updated and simplified, ensuring a more stable and maintainable codebase.
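As a rough illustration of what a settings object of this kind can look like, the sketch below groups a database path, table names, and a cache directory in one place. It is not the DVS `Settings` class: `ExampleSettings`, its field names, and the `EXAMPLE_DB_PATH` environment variable are all hypothetical. Only the `./cache/dvs` default is taken from the notes above.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ExampleSettings:
    """Hypothetical settings container; every field name here is an assumption."""

    # Database location, overridable through a made-up environment variable.
    db_path: Path = field(
        default_factory=lambda: Path(os.environ.get("EXAMPLE_DB_PATH", "./data/example.db"))
    )
    # Table names kept configurable so several datasets can share one database file.
    documents_table: str = "documents"
    points_table: str = "points"
    # Cache directory; this default mirrors the path mentioned in the release notes.
    cache_dir: Path = Path("./cache/dvs")


settings = ExampleSettings()
print(settings.db_path, settings.documents_table, settings.cache_dir)
```

Keeping these values on one immutable object means the rest of the code reads configuration from a single source instead of scattering environment lookups.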
✨ Other Improvements
- New Datasets Script 📥: A new `download_datasets.py` script makes it easier to download and set up example datasets.
- Updated Documentation & README 📖: The documentation and README have been completely revamped to reflect the new library-based approach, with updated examples and usage instructions.
- Simplified Makefile 🛠️: The `Makefile` has been cleaned up, removing outdated commands and streamlining the development process.
💥 Breaking Changes
- The removal of the FastAPI server means that all API endpoints are no longer available. Users should now interact with DVS as a Python library.
- The `DVS` class and its methods have been updated, requiring changes in how the library is used.
- The configuration system has been updated, so users will need to adjust their environment variables and settings accordingly.
v0.4.0
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Full Changelog: v0.2.0...v0.3.0
v0.2.0
Full Changelog: v0.1.0...v0.2.0
v0.1.0
Full Changelog: https://github.com/allen2c/dvs/commits/v0.1.0
First version.