WIP: Support full vector indexes (no PLAID compression) #137

Closed

@bclavie bclavie commented Feb 14, 2024

A lot of RAGatouille users are using the library to encode small collections (from a few hundred to a few thousand documents), which don't fully benefit from the highly optimised PLAID indexes.

PLAID is very powerful but introduces complexity, notably custom CUDA code and faiss, which can be finicky, as evidenced by #119, #85, #55, #72, #14, #120 (and more).

This PR will introduce a second, naive form of indexing: the index will be created with all the usual metadata, but the actual "index" will be a .pt.gz file containing the gzipped full-sized vectors (with compression work to follow later). Query time will be higher with this approach, but it is quick to implement and low on dependencies, and should give satisfactory results, with query times still in the hundreds of milliseconds for a collection of 2k documents.
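To make the idea concrete, here is a minimal sketch of a "full vectors" index, not the code in this PR. It assumes ColBERT-style MaxSim scoring done by brute force over every document. To stay dependency-free it stores plain Python lists with `pickle` inside a gzip file, standing in for the torch tensors and `.pt.gz` file described above. All function names are hypothetical.

```python
import gzip
import pickle


def save_full_vectors(doc_embeddings, path):
    """Write uncompressed per-document token embeddings to a gzipped file.

    doc_embeddings: list of documents, each a list of token vectors.
    pickle stands in here for torch.save (assumption for this sketch).
    """
    with gzip.open(path, "wb") as f:
        pickle.dump(doc_embeddings, f)


def load_full_vectors(path):
    """Read the embeddings back; only do this with files you created yourself."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the similarity of
    its best-matching document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def search(query_tokens, doc_embeddings, k=10):
    """Brute-force search: score every document and return the top k
    as (doc_id, score) pairs, best first."""
    scored = [
        (maxsim(query_tokens, doc), doc_id)
        for doc_id, doc in enumerate(doc_embeddings)
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

Because every document is scored on every query, cost grows linearly with the number of stored embeddings, which is why this approach only makes sense for small collections.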

Additionally, this indexing method will become the default when calling index() with fewer than 512,000 total embeddings (2,000 documents at a 256-token length). Forcing PLAID or FULL_VECTORS (naming?) will be possible by passing it as an explicit argument.
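The default selection could look roughly like the sketch below. The `IndexType` enum, the `choose_index_type` helper and its `forced` parameter are hypothetical names for illustration, not the actual `index()` signature. Only the 512,000 threshold and the "fewer than" rule come from the description above.

```python
from enum import Enum


class IndexType(Enum):
    PLAID = "PLAID"
    FULL_VECTORS = "FULL_VECTORS"  # name still under discussion in this PR


# 2,000 documents * 256 tokens per document
FULL_VECTORS_MAX_EMBEDDINGS = 2000 * 256


def choose_index_type(total_embeddings, forced=None):
    """Pick an index type from the collection size.

    An explicit `forced` value always wins; otherwise collections with
    fewer than 512,000 embeddings default to full vectors.
    """
    if forced is not None:
        return forced
    if total_embeddings < FULL_VECTORS_MAX_EMBEDDINGS:
        return IndexType.FULL_VECTORS
    return IndexType.PLAID
```

Under this reading, a collection of exactly 512,000 embeddings still gets a PLAID index, since the rule is "fewer than".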

This will also unlock really easy CRUD: not for PLAID indexes, but for full-vector indexes there will be very little standing in the way of adding/removing documents (subsequent PR).

This indexing approach will evolve but remain backward compatible with this first naive implementation.

This is also the first step towards #110 (no proper HNSW implementation or vector compression here, but the overall indexing idea is the same) and #74 (a lower barrier to entry for specifying mps).

WIP

WIP: functional

fix: force full precision for now

tmp: comment out precision arg

lint

bclavie commented Mar 15, 2024

Closing as superseded by the much saner #158 rework.

@bclavie bclavie closed this Mar 15, 2024