[collection] consider SQLite for keeping analysis results #20

Open
fabiocarrara opened this issue Aug 25, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@fabiocarrara
Member

fabiocarrara commented Aug 25, 2023

PROs:

  • it's easier and faster to check for and skip already existing results than to walk the filesystem
  • no additional dependencies (sqlite3 is built into Python)
  • it's still a self-contained, file-based storage format that is readable from multiple languages
  • we can implement several kinds of logic in SQL (object filtering and counting, conditional indexing, etc.)
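To make the check-and-skip and counting points concrete, here is a minimal sketch using only the built-in sqlite3 module. The table layout and the names `results`, `item_id`, `analysis`, `payload`, `open_results_db`, `mark_done`, `pending_items` and `count_by_analysis` are all hypothetical, chosen for illustration; they are not the project's actual schema or API.

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a results database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " item_id  TEXT NOT NULL,"
        " analysis TEXT NOT NULL,"
        " payload  TEXT,"
        " PRIMARY KEY (item_id, analysis))"
    )
    return conn


def mark_done(conn, item_id, analysis, payload):
    """Store (or overwrite) the result of one analysis for one item."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results (item_id, analysis, payload)"
            " VALUES (?, ?, ?)",
            (item_id, analysis, payload),
        )


def pending_items(conn, item_ids, analysis):
    """Return the items that still lack a result, preserving input order."""
    rows = conn.execute(
        "SELECT item_id FROM results WHERE analysis = ?", (analysis,)
    )
    done = {row[0] for row in rows}
    return [item_id for item_id in item_ids if item_id not in done]


def count_by_analysis(conn):
    """Count stored results per analysis type with a single SQL query."""
    rows = conn.execute(
        "SELECT analysis, COUNT(*) FROM results GROUP BY analysis"
    )
    return dict(rows)


conn = open_results_db()
mark_done(conn, "a", "objects", "{}")
todo = pending_items(conn, ["a", "b", "c"], "objects")
counts = count_by_analysis(conn)
```

One query replaces a directory walk here, and the composite primary key doubles as the index that keeps the lookup cheap.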

CONs:

  • SQL-style schema management: we'll have to cope with migrations
  • it will probably use more disk space
  • embeddings have to be stored as BLOBs. Saving and loading are pretty fast though, on an SSD:

    Bulk insertion of 100k 1024-d vectors took 5.6198 seconds.
    Reading out 100k 1024-d vectors took 2.3810 seconds.
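For reference, storing embeddings as BLOBs could look roughly like the sketch below. It is not the code behind the timings above; it assumes float32 vectors serialized with the standard-library `array` module (NumPy's `tobytes`/`frombuffer` would play the same role), and the `embeddings` table and the helpers `to_blob` and `from_blob` are hypothetical names.

```python
import sqlite3
from array import array


def to_blob(vector):
    """Pack a sequence of floats into raw float32 bytes."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack raw float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings ("
    " item_id INTEGER PRIMARY KEY,"
    " vector  BLOB NOT NULL)"
)

# Tiny toy vectors; the values are exactly representable in float32.
vectors = {i: [float(i)] * 4 for i in range(3)}

# Bulk insertion: one transaction plus executemany keeps per-row overhead low.
with conn:
    conn.executemany(
        "INSERT INTO embeddings (item_id, vector) VALUES (?, ?)",
        ((item_id, to_blob(vec)) for item_id, vec in vectors.items()),
    )

# Reading everything back and decoding each BLOB.
loaded = {
    item_id: from_blob(blob)
    for item_id, blob in conn.execute("SELECT item_id, vector FROM embeddings")
}
```

Note that the raw bytes use the machine's native byte order and carry no dimensionality, so a real schema would likely need to record the dtype and vector size somewhere.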

@fabiocarrara fabiocarrara added the enhancement New feature or request label Aug 25, 2023