
Provide GPU-accelerated vector indexes with RAFT #413

Draft · wants to merge 30 commits into base: main
Conversation


@wphicks wphicks commented Aug 4, 2023

This PR is a replacement for #377, which fell out of date with changes on main. It provides access to the GPU-accelerated index types offered by the RAPIDS RAFT library, introducing two new index types: IVF and Tiered IVF, which expose both IVF-Flat and IVF-PQ indexes depending on how they are configured.

The intention of this initial integration is to provide GPU-based performance improvements for index builds and batched searches. A later PR will introduce another index type that will offer performance improvements (relative to CPU HNSW) for single-query searches as well.

Main objects modified by this PR
This PR does not substantively modify existing objects. A clear method is added to the brute force index to allow it to be reset with a single call.

This PR is still a work in progress. Remaining tasks:

Tasks to be tackled in later PRs:

  • Integrating CAGRA indexes
  • Multivalue support
  • Vector update support

Questions for this PR:

  • Should the IVF indexes be named in such a way as to indicate that they are GPU indexes in order to allow for possible CPU IVF support in the future?
  • Based on initial feedback for 377, I have merged IVF Flat and IVF PQ into a single index type, with the choice of algorithm determined by the configuration parameters. Does this design meet the needs mentioned, or do we prefer the earlier separation?

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes


CLAassistant commented Aug 4, 2023

CLA assistant check
All committers have signed the CLA.


Spartee commented Aug 4, 2023

@wphicks you'll probably need to sign the CLA as well.

@DvirDukhan DvirDukhan (Collaborator) left a comment

Thanks for the work so far. Please see comments.

CMakeLists.txt Outdated
# Only do these if this is the main project, and not if it is included through add_subdirectory
set_property(GLOBAL PROPERTY USE_FOLDERS ON)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC -pthread ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}")
Collaborator:

why do we need pthread here?

Author:

I actually meant to ask you about that. I was having trouble compiling without it (even before making any of my other changes), but I'll have to recreate the failing build log. Will either post it here or as a separate issue.

Collaborator:

please do, thanks!

double preferredShmemCarveout; // Fraction of GPU's unified memory / L1
// cache to be used as shared memory

} IVFParams;
Collaborator:

Can we rename this to RAFTIVFParams?


} IVFParams;

typedef struct {
Collaborator:

I don't believe we need this struct definition as is. You can send the IVFParams as primaryIndexParams when creating TieredIndexParams. If you need something specific for your tiered index management layer, that is the case where you should create a dedicated struct; see

    union {
        TieredHNSWParams tieredHnswParams;
    } specificParams;

at TieredIndexParams

@@ -35,6 +35,11 @@ class BruteForceIndex : public VecSimIndexAbstract<DistType> {
public:
BruteForceIndex(const BFParams *params, const AbstractIndexInitParams &abstractInitParams);

void clear() {
Collaborator:

when do we need to use this?


The clear() function is used in the Tiered Raft IVF index: The vectors from the flat buffers are all transferred to the backend index, and then the flat buffer is cleared.

Collaborator:

rename to raft_ivf please

// Copy label data to previously allocated device buffer
raft::copy(label_gpu.data_handle(), label, batch_size, res_.get_stream());

if (std::holds_alternative<raft::neighbors::ivf_flat::index_params>(build_params_)) {
Collaborator:

Is there a way to avoid this if?

Author:

Not unless we want to split IVF flat and IVF PQ back out into separate indexes again.

One possible alternate spelling would be to use std::visit with a constexpr if to distinguish the types. This has the disadvantage that compilers sometimes have trouble creating an efficient std::visit over small variants, but it has the advantage of providing obviously idiomatic access to the variant.

If performance is what we're worried about, I would avoid std::visit but add a compile-time assumption (via __builtin_unreachable, __assume or similar based on the compiler) that the variant contains the expected type prior to the std::get call. That usually allows the compiler to avoid the check that it must ordinarily make to see if it needs to throw a std::bad_variant_access.

If readability is what we are concerned about, the bodies of those if clauses could be separated into their own functions.

Collaborator:

Can you please share a suggestion for the last proposal, so we can iterate? (could be here or on slack). I'm thinking about modularity and code reusability here.

index_{std::nullopt} {}
auto addVector(const void *vector_data, labelType label,
bool overwrite_allowed = true) override {
return addVectorBatch(vector_data, &label, 1, overwrite_allowed);
Collaborator:

Mandatory due to our requirements? I guess batch size 1 is not efficient here?

Author:

That's correct. As a matter of fact, I would recommend that this index type only be used in a tiered index to allow for efficient insertions.

res_.get_stream());

// Perform correct search based on index type
if (std::holds_alternative<raft::neighbors::ivf_flat::index>(index_)) {
Collaborator:

same question about this if here

Author:

See above

auto indexSize() {
auto frontend_lock = std::scoped_lock(this->flatIndexGuard);
auto backend_lock = std::scoped_lock(this->mainIndexGuard);
return (getBackendIndex().indexSize() + this->frontendIndex.indexSize());
Collaborator:

Assuming transfer is "atomic" and does not cause duplications?


auto backend_lock = std::scoped_lock(this->mainIndexGuard);
this->flatBuffer->clear();
frontend_lock.unlock();
Collaborator:

This needs to be called after addVectorBatch; otherwise both indexes can be empty at the same time.

Author:

Doesn't the fact that we have a lock on the backend index mean that queries against that index will not execute until we have completed the transfer? Since we do not clear the frontend index before acquiring the backend lock, and we unlock the frontend after the index has been cleared, my understanding was that this should be safe. Are queries allowed to be made against either the frontend or backend index without acquiring the corresponding lock?

Collaborator:

Yes, you are right; I missed the backend lock.
An additional question regarding this: when calling

    void clear() {
        idToLabelMapping.clear();
        vectorBlocks.clear();
        count = idType{};
    }

you are not clearing the label set of the frontend index. Can this cause issues when transferToBackend is called multiple times with an increasing set of IDs?

@alonre24 alonre24 (Collaborator) left a comment

Some small questions/comments after a high-level review, before we move on to the POC.

this->idToLabelMapping.shrink_to_fit();
this->vectorBlocks.clear();
this->vectorBlocks.shrink_to_fit();
this->count = idType{};
Collaborator:

this->count = 0 ?

this->idToLabelMapping.shrink_to_fit();
this->vectorBlocks.clear();
this->vectorBlocks.shrink_to_fit();
this->count = idType{};
Collaborator:

this->count = 0 ?

Comment on lines +14 to +15
//.multi = raftIvfParams->multi,
//.logCtx = params->logCtx
Collaborator:

why commented out?


double getDistanceFrom_Unsafe(labelType label, const void *blob) const override {
auto flat_dist = this->frontendIndex->getDistanceFrom_Unsafe(label, blob);
auto raft_dist = this->backendIndex->getDistanceFrom_Unsafe(label, blob);
Collaborator:

This is not implemented in the backend index...


// delete in place. TODO: Add async job for this
this->mainIndexGuard.lock();
num_deleted_vectors += this->backendIndex->deleteVector(label);
Collaborator:

How do we reclaim memory after deletion? As far as I understand, in the backend index we only mark vectors as deleted.

lowener and others added 2 commits April 2, 2024 12:54
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>

7 participants