Releases: allen2c/dvs

v1.1.0

06 Jul 07:37
8bff5f5

✨ Feat: Introduce Automatic Document Chunking

This update adds automatic document chunking! 🧩 Documents are now split into smaller pieces, by line count or token count, before embedding, so each vector covers a focused passage and search results can match more precisely.


🚀 What's New?

  • Automatic Document Splitting: When using dvs.add(), documents are now automatically chunked based on line count or token count. This is handled by the new chunkle and tiktoken dependencies.
  • New Chunking Controls: The dvs.add() method gets new parameters to customize how documents are split:
    • lines_per_chunk
    • tokens_per_chunk
  • Smarter Document & Point Management:
    • The Document type now includes fields like source_id, chunk_index, and is_chunk to track the relationship between chunks and their original source. 🔗
    • Processing is now more efficient, creating embeddings and database points in batches based on the new chunks.
  • Database Indexing: The documents table is now indexed by source_id to allow for quickly finding all chunks related to a single parent document.
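To make the chunk-to-source relationship concrete, here is a minimal sketch of line-based splitting. It is not the dvs implementation (which relies on chunkle and tiktoken): ChunkRecord, chunk_by_lines, and the document_id naming scheme are hypothetical names invented for illustration. Only the field names source_id, chunk_index, and is_chunk and the lines_per_chunk parameter come from the notes above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkRecord:
    """Hypothetical stand-in for a chunk-aware document record."""

    document_id: str
    content: str
    source_id: Optional[str] = None   # id of the parent document
    chunk_index: Optional[int] = None  # position of this chunk in the parent
    is_chunk: bool = False


def chunk_by_lines(source_id: str, text: str, lines_per_chunk: int = 40) -> List[ChunkRecord]:
    """Split text into consecutive groups of at most lines_per_chunk lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    lines = text.splitlines()
    chunks: List[ChunkRecord] = []
    for index, start in enumerate(range(0, len(lines), lines_per_chunk)):
        body = "\n".join(lines[start:start + lines_per_chunk])
        chunks.append(
            ChunkRecord(
                document_id=f"{source_id}-{index}",  # assumed naming scheme
                content=body,
                source_id=source_id,
                chunk_index=index,
                is_chunk=True,
            )
        )
    return chunks


if __name__ == "__main__":
    for chunk in chunk_by_lines("doc-1", "a\nb\nc\nd\ne", lines_per_chunk=2):
        print(chunk.chunk_index, repr(chunk.content))
```

Because every chunk carries the same source_id, filtering on that one field is enough to collect all pieces of a parent document, which is the lookup the new index is meant to speed up.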

🔧 Key Changes

  • dvs.add() Refactor: The core logic is updated to first chunk documents and then process these chunks for embedding and storage.
  • Document Type: Enhanced with new fields to support chunking and token counting.
  • Dependencies: Added chunkle and tiktoken to pyproject.toml and requirements files.
  • Version Bump: Updated project version from 1.0.0 to 1.1.0. 📦
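The chunk-then-embed flow described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual dvs.add() code: batched, embed_in_batches, and toy_embed are hypothetical helpers, and toy_embed merely stands in for a real embedding model call.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items, each holding at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    batch_size: int = 2,
) -> List[Tuple[str, Vector]]:
    """Embed chunks one batch at a time and pair each chunk with its vector."""
    points: List[Tuple[str, Vector]] = []
    for batch in batched(chunks, batch_size):
        vectors = embed_fn(list(batch))
        if len(vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input text")
        points.extend(zip(batch, vectors))
    return points


def toy_embed(texts: List[str]) -> List[Vector]:
    """Placeholder embedder (assumption): a one-dimensional vector of the text length."""
    return [[float(len(text))] for text in texts]


if __name__ == "__main__":
    for text, vector in embed_in_batches(["a", "bb", "ccc"], toy_embed, batch_size=2):
        print(text, vector)
```

Batching keeps the number of embedding calls proportional to the number of batches rather than the number of chunks, which matters once a single document expands into many chunks.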

v1.0.0

06 Jul 02:58
bbb885e

🚀 DVS v1.0.0 Release

This release marks a major milestone for DVS, transitioning it from a FastAPI-based web service to a lightweight, serverless Python library. This change significantly improves performance, simplifies the architecture, and enhances the overall user experience.

🌟 Key Changes

  • Goodbye, FastAPI! 👋: The most significant change is the removal of the FastAPI server, making DVS a pure Python library. This reduces overhead and simplifies the deployment process.
  • Refined API 🧑‍💻: The API has been streamlined for better usability. Key improvements include:
    • A new DVS class for easier interaction.
    • Enhanced document and point management with methods like add, remove, and search.
    • Improved handling of embeddings with the openai-embeddings-model.
  • Enhanced Caching 🗄️: The caching mechanism has been updated for better performance, with a new default path at ./cache/dvs.
  • Improved Configuration ⚙️: The configuration system now uses a more robust Settings class, providing better control over database paths, table names, and other parameters.
  • Streamlined Dependencies 📦: The project's dependencies have been updated and simplified, ensuring a more stable and maintainable codebase.
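For readers unfamiliar with settings classes, the idea of one object holding database paths, table names, and the cache location can be sketched as below. This is not the real dvs Settings class: SketchSettings, load_settings, every EXAMPLE_* environment variable name, and all defaults except the ./cache/dvs cache path are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical settings container; field names are illustrative only."""

    db_path: Path
    documents_table: str
    points_table: str
    cache_path: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Build settings from environment-style key/value pairs, falling back to defaults."""
    env = os.environ if env is None else env
    return SketchSettings(
        # EXAMPLE_* variable names and the db/table defaults are assumptions.
        db_path=Path(env.get("EXAMPLE_DB_PATH", "./data/example.db")),
        documents_table=env.get("EXAMPLE_DOCUMENTS_TABLE", "documents"),
        points_table=env.get("EXAMPLE_POINTS_TABLE", "points"),
        # The ./cache/dvs default is the one stated in the release notes.
        cache_path=Path(env.get("EXAMPLE_CACHE_PATH", "./cache/dvs")),
    )


if __name__ == "__main__":
    print(load_settings({"EXAMPLE_DOCUMENTS_TABLE": "docs_v2"}))
```

Passing an explicit mapping instead of reading os.environ directly keeps the sketch easy to test; check the project README for the actual variable names before relying on any of these.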

✨ Other Improvements

  • New Datasets Script 📥: A new download_datasets.py script makes it easier to download and set up example datasets.
  • Updated Documentation & README 📖: The documentation and README have been completely revamped to reflect the new library-based approach, with updated examples and usage instructions.
  • Simplified Makefile 🛠️: The Makefile has been cleaned up, removing outdated commands and streamlining the development process.

💥 Breaking Changes

  • The FastAPI server has been removed, so none of the previous HTTP API endpoints are available. Users should now interact with DVS as a Python library.
  • The DVS class and its methods have been updated, requiring changes in how the library is used.
  • The configuration system has been updated, so users will need to adjust their environment variables and settings accordingly.

v0.4.0

06 Dec 07:54

Full Changelog: v0.3.0...v0.4.0

v0.3.0

04 Dec 04:26

Full Changelog: v0.2.0...v0.3.0

v0.2.0

23 Nov 08:00

Full Changelog: v0.1.0...v0.2.0

v0.1.0

23 Nov 06:30

Full Changelog: https://github.com/allen2c/dvs/commits/v0.1.0

First version.