
Provide GPU-accelerated vector indexes with RAFT #413

Draft · wants to merge 30 commits into base: main
Conversation


@wphicks wphicks commented Aug 4, 2023

This PR is a replacement for #377, which fell out of date with changes on main. It provides access to the GPU-accelerated index types offered by the RAPIDS RAFT library, introducing two new index types: IVF and Tiered IVF, which expose both IVF-Flat and IVF-PQ indexes depending on how they are configured.

The intention of this initial integration is to provide GPU-based performance improvements for index builds and batched searches. A later PR will introduce another index type that will offer performance improvements (relative to CPU HNSW) for single-query searches as well.

Main objects modified by this PR
This PR does not substantively modify existing objects. A clear method is added to the brute force index to allow it to be reset with a single call.

This PR is still a work in progress. Remaining tasks:

Tasks to be tackled in later PRs:

  • Integrating CAGRA indexes
  • Multivalue support
  • Vector update support

Questions for this PR:

  • Should the IVF indexes be named in such a way as to indicate that they are GPU indexes in order to allow for possible CPU IVF support in the future?
  • Based on initial feedback for 377, I have merged IVF Flat and IVF PQ into a single index type, with the choice of algorithm determined by the configuration parameters. Does this design meet the needs mentioned, or do we prefer the earlier separation?

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes


CLAassistant commented Aug 4, 2023

CLA assistant check
All committers have signed the CLA.


Spartee commented Aug 4, 2023

@wphicks you'll probably need to sign the CLA as well.

@DvirDukhan DvirDukhan (Collaborator) left a comment

Thanks for the work so far. Please see comments.

CMakeLists.txt Outdated
# Only do these if this is the main project, and not if it is included through add_subdirectory
set_property(GLOBAL PROPERTY USE_FOLDERS ON)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC -pthread ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}")
Collaborator:

why do we need pthread here?

Author:

I actually meant to ask you about that. I was having trouble compiling without it (even before making any of my other changes), but I'll have to recreate the failing build log. Will either post it here or as a separate issue.

Collaborator:

please do, thanks!

double preferredShmemCarveout; // Fraction of GPU's unified memory / L1
// cache to be used as shared memory

} IVFParams;
Collaborator:

Can we rename this to RAFTIVFParams?


} IVFParams;

typedef struct {
Collaborator:

I don't believe we need this struct definition as is. You can send the IVFParams as primaryIndexParams when creating TieredIndexParams. If you need something specific for your tiered index management layer, that is the case where you should create a dedicated struct; see

    union {
        TieredHNSWParams tieredHnswParams;
    } specificParams;

at TieredIndexParams

@@ -35,6 +35,11 @@ class BruteForceIndex : public VecSimIndexAbstract<DistType> {
public:
BruteForceIndex(const BFParams *params, const AbstractIndexInitParams &abstractInitParams);

void clear() {
Collaborator:

when do we need to use this?


The clear() function is used in the Tiered Raft IVF index: The vectors from the flat buffers are all transferred to the backend index, and then the flat buffer is cleared.

Collaborator:

rename to raft_ivf please

// Copy label data to previously allocated device buffer
raft::copy(label_gpu.data_handle(), label, batch_size, res_.get_stream());

if (std::holds_alternative<raft::neighbors::ivf_flat::index_params>(build_params_)) {
Collaborator:

Is there a way to avoid this if?

Author:

Not unless we want to split IVF flat and IVF PQ back out into separate indexes again.

One possible alternate spelling would be to use std::visit with a constexpr if to distinguish the types. This has the disadvantage that compilers sometimes have trouble creating an efficient std::visit over small variants, but it has the advantage of providing obviously idiomatic access to the variant.

If performance is what we're worried about, I would avoid std::visit but add a compile-time assumption (via __builtin_unreachable, __assume or similar based on the compiler) that the variant contains the expected type prior to the std::get call. That usually allows the compiler to avoid the check that it must ordinarily make to see if it needs to throw a std::bad_variant_access.

If readability is what we are concerned about, the bodies of those if clauses could be separated into their own functions.

Collaborator:

Can you please share a suggestion for the last proposal, so we can iterate? (could be here or on slack). I'm thinking about modularity and code reusability here.

index_{std::nullopt} {}
auto addVector(const void *vector_data, labelType label,
bool overwrite_allowed = true) override {
return addVectorBatch(vector_data, &label, 1, overwrite_allowed);
Collaborator:

Mandatory due to our requirements? I guess batch size 1 is not efficient here?

Author:

That's correct. As a matter of fact, I would recommend that this index type only be used in a tiered index to allow for efficient insertions.

res_.get_stream());

// Perform correct search based on index type
if (std::holds_alternative<raft::neighbors::ivf_flat::index>(index_)) {
Collaborator:

same question about this if here

Author:

See above

auto indexSize() {
auto frontend_lock = std::scoped_lock(this->flatIndexGuard);
auto backend_lock = std::scoped_lock(this->mainIndexGuard);
return (getBackendIndex().indexSize() + this->frontendIndex.indexSize());
Collaborator:

Assuming transfer is "atomic" and does not cause duplications?


auto backend_lock = std::scoped_lock(this->mainIndexGuard);
this->flatBuffer->clear();
frontend_lock.unlock();
Collaborator:

This needs to be called after addVectorBatch; otherwise both indexes can be empty at the same time.

Author:

Doesn't the fact that we have a lock on the backend index mean that queries against that index will not execute until we have completed the transfer? Since we do not clear the frontend index before acquiring the backend lock, and we unlock the frontend after the index has been cleared, my understanding was that this should be safe. Are queries allowed to be made against either the frontend or backend index without acquiring the corresponding lock?

Collaborator:

Yes, you are right; I missed the backend lock.
An additional question regarding this: when calling

    void clear() {
        idToLabelMapping.clear();
        vectorBlocks.clear();
        count = idType{};
    }

you are not clearing the label set of the frontend index. Can this cause issues when transferToBackend is called multiple times with an increasing set of IDs?

@alonre24 alonre24 (Collaborator) left a comment

Some small questions/comments after a high-level review, before we move on to the POC.

this->idToLabelMapping.shrink_to_fit();
this->vectorBlocks.clear();
this->vectorBlocks.shrink_to_fit();
this->count = idType{};
Collaborator:

this->count = 0 ?

this->idToLabelMapping.shrink_to_fit();
this->vectorBlocks.clear();
this->vectorBlocks.shrink_to_fit();
this->count = idType{};
Collaborator:

this->count = 0 ?

Comment on lines +14 to +15
//.multi = raftIvfParams->multi,
//.logCtx = params->logCtx
Collaborator:

why commented out?


double getDistanceFrom_Unsafe(labelType label, const void *blob) const override {
auto flat_dist = this->frontendIndex->getDistanceFrom_Unsafe(label, blob);
auto raft_dist = this->backendIndex->getDistanceFrom_Unsafe(label, blob);
Collaborator:

This is not implemented in the backend index...


// delete in place. TODO: Add async job for this
this->mainIndexGuard.lock();
num_deleted_vectors += this->backendIndex->deleteVector(label);
Collaborator:

How do we reclaim memory after deletion? As far as I understand, in the backend index we only mark vectors as deleted.

lowener and others added 2 commits April 2, 2024 12:54
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>

7 participants