WIP: Support full vector indexes (no PLAID compression) #137
A lot of RAGatouille users are using the library to encode small collections (from a few hundred to the low thousands of documents), which don't fully benefit from the highly optimised PLAID indexes.
PLAID is very powerful but introduces complexity, notably custom CUDA code and `faiss`, which can be finicky, as evidenced by #119 #85 #55 #72 #14 #120 (and more).
This PR will introduce another, naive form of indexing, where the index will be created with all the usual metadata, etc., but the actual "index" will be a `.pt.gz` file containing the gzipped full-sized vectors (with compression work to follow in the future). While querying time will be higher with this approach, it'll also allow for a quick-to-implement, low-dependency approach, and should provide satisfactory results, with querying time still in the hundreds of milliseconds for a collection of 2k documents.
Additionally, this indexing method will become the default when calling `index()` with fewer than 512,000 total embeddings (2000 documents with a 256 token length). Forcing `PLAID` or `FULL_VECTORS` (naming?) will be possible by passing it as an explicit argument.
This'll also unlock really easy CRUD: not for PLAID indexes, but for full-vector indexes there will be very little in the way of adding/removing documents (subsequent PR).
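
As a rough illustration of the default-selection rule described above, here is a minimal sketch. The names `IndexType`, `choose_index_type` and `FULL_VECTORS_MAX_EMBEDDINGS` are hypothetical (the naming is still open, as noted), and counting embeddings as the sum of per-document token counts is an assumption rather than what the PR necessarily does.

```python
from enum import Enum
from typing import Optional, Sequence


class IndexType(str, Enum):
    # Hypothetical names; the PR description leaves the final naming open.
    PLAID = "PLAID"
    FULL_VECTORS = "FULL_VECTORS"


# Threshold from the description: 2000 documents x 256 tokens each.
FULL_VECTORS_MAX_EMBEDDINGS = 2000 * 256  # 512_000


def choose_index_type(
    doc_token_counts: Sequence[int],
    index_type: Optional[IndexType] = None,
) -> IndexType:
    """Pick an index type, honouring an explicit choice if one is given."""
    if index_type is not None:
        # An explicit argument always wins over the size-based default.
        return IndexType(index_type)
    # Assumption: one embedding per token, so the total is the token sum.
    total_embeddings = sum(doc_token_counts)
    if total_embeddings < FULL_VECTORS_MAX_EMBEDDINGS:
        return IndexType.FULL_VECTORS
    return IndexType.PLAID
```

With this sketch, 1999 documents of 256 tokens would default to full vectors, exactly 2000 such documents would fall back to PLAID (the threshold is strict), and passing `index_type` overrides either outcome.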
This indexing approach will evolve but retain backward compatibility with this first naive implementation.
This is also the first step towards #110 (no proper HNSW implementation or vector compression here, but the overall indexing idea is the same) and #74 (lower barrier to entry for specifying `mps`).
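
To make the naive full-vector idea concrete, here is a minimal sketch of what storing and querying such an index could look like. It assumes the index is simply a list of per-document token-embedding tensors serialized with `torch.save` and then gzipped, and that scoring is ColBERT-style MaxSim computed by brute force. The function names and the file layout are hypothetical, not the format this PR will necessarily use.

```python
import gzip
import io

import torch


def save_full_vectors(doc_embeddings, path):
    """Serialize a list of (n_tokens, dim) tensors with torch.save, then gzip."""
    buffer = io.BytesIO()
    torch.save(doc_embeddings, buffer)
    with gzip.open(path, "wb") as f:
        f.write(buffer.getvalue())


def load_full_vectors(path):
    """Decompress the file fully in memory, then deserialize the tensors."""
    with gzip.open(path, "rb") as f:
        buffer = io.BytesIO(f.read())
    return torch.load(buffer)


def maxsim_search(query_embeddings, doc_embeddings, k=3):
    """Brute-force late-interaction search over every stored document.

    For each query token, take its best similarity against any document
    token, then sum those maxima to obtain the document score.
    """
    scores = []
    for doc in doc_embeddings:
        sim = query_embeddings @ doc.T  # shape: (query_tokens, doc_tokens)
        scores.append(sim.max(dim=1).values.sum())
    scores = torch.stack(scores)
    top = torch.topk(scores, k=min(k, len(doc_embeddings)))
    # Return (document index, score) pairs, best first.
    return list(zip(top.indices.tolist(), top.values.tolist()))
```

Every query scans every document, which is why querying time grows with collection size. In exchange there is no `faiss` or custom CUDA dependency, and adding or removing a document amounts to editing the list before saving it again.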