
Exploring GPU based kNN vector search #13003

Open

chatman opened this issue Jan 9, 2024 · 2 comments

@chatman
Contributor

chatman commented Jan 9, 2024

Description

In this issue, I wish to explore integrating NVIDIA's GPU-based kNN indexing and search library, RAFT (https://github.com/rapidsai/raft). In our initial benchmarks and prototypes, we found it substantially faster than our HNSW-based search.
We will add more details of our experiments to this issue along the way, and will open a PR as soon as something takes shape (right now, things are at an extremely early proof-of-concept stage).

If the exploration proves worthwhile, one potential outcome of this work is a Lucene module based on a JNI wrapper around the RAFT library.
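To make the JNI-wrapper idea concrete, here is a minimal sketch of the kind of bridge class such a module might contain. The class name, the native method names, and the library name are all hypothetical (no such module exists yet); the only runnable part shown is the row-major flattening of vectors into a single contiguous array, which is the memory layout a C/CUDA kernel on the other side of JNI would typically expect.

```java
import java.util.Arrays;

// Hypothetical sketch of a JNI bridge to a GPU kNN library such as RAFT/cuVS.
// Class and method names are illustrative assumptions, not an existing API.
public class CuvsBridgeSketch {

    // Native entry points a JNI wrapper might expose (not implemented here,
    // so they are left commented out rather than declared and unloadable):
    // private static native long buildIndex(float[] flatVectors, int numVectors, int dim);
    // private static native int[] search(long indexHandle, float[] query, int topK);

    /**
     * Flatten a row-major [numVectors][dim] matrix into one contiguous array,
     * the layout typically passed across JNI to native/CUDA code.
     */
    static float[] flatten(float[][] vectors) {
        int dim = vectors[0].length;
        float[] flat = new float[vectors.length * dim];
        for (int i = 0; i < vectors.length; i++) {
            System.arraycopy(vectors[i], 0, flat, i * dim, dim);
        }
        return flat;
    }

    public static void main(String[] args) {
        float[][] vectors = { {1f, 2f}, {3f, 4f} };
        // Prints [1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(flatten(vectors)));
    }
}
```

The actual module would also need to manage native index handles (allocation and freeing) and map Lucene's KnnVectorsFormat abstractions onto the native calls; those details depend on the RAFT/cuVS C API and are out of scope for this sketch.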

@chatman
Contributor Author

chatman commented Mar 19, 2024

As an initial proof-of-concept integration to evaluate performance, we put together a repository: https://github.com/SearchScale/lucene-cuvs

The benchmarks are against single-threaded HNSW, without Panama flags enabled. We will add more benchmarks with multithreaded search on CPU.

Comments, thoughts and questions welcome.

@yupeng9

yupeng9 commented May 13, 2024

This is very interesting work. We saw that Milvus published an article on how GPUs accelerate vector search, which looks like a game changer:

For a batch size of 1, the T4 is 6.4x to 6.7x faster than the CPU, and the A10G is 8.3x to 9x faster.
When the batch size increases to 10, the performance improvement is more significant: T4 is 16.8x to 18.7x faster, and A100 is 25.8x to 29.9x faster.
With a batch size of 100, the performance gain continues to grow: T4 is 21.9x to 23.3x faster, and A100 is 48.9x to 49.2x faster.

Is there any update on this plan to add GPU support to Lucene?
