Description
In the cold case (KNN vectors don't all fit in RAM), it should help to add prefetching to our HNSW graph crawl?
When we pop the best node off the work queue, we could ask for all of its neighbor nodes' vectors to be prefetched before we loop through them, evaluating their distances and inserting them into the work queue. We could maybe go further and do so for the top N nodes in the queue. Modern SSDs are typically highly concurrent (maybe 1000s of concurrent requests before they saturate?), so we should take advantage of that.
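Roughly, the inner loop might look something like this minimal sketch (not actual Lucene searcher code): `NeighborScorer`, `Collector`, and the fixed-width `vectorByteLength` vector layout are placeholders I'm assuming here, and `IndexInput.prefetch(offset, length)` is the prefetch hook from recent Lucene versions:

```java
import java.io.IOException;
import org.apache.lucene.store.IndexInput;

final class PrefetchingCrawlSketch {

  /** Placeholder: reads a neighbor's vector and computes its similarity to the query. */
  interface NeighborScorer {
    float score(int neighborOrd) throws IOException;
  }

  /** Placeholder: feeds a scored neighbor back into the search's work / result queues. */
  interface Collector {
    void collect(int neighborOrd, float score);
  }

  /** Called once per node popped from the work queue. */
  static void scoreNeighbors(
      int[] neighborOrds,
      IndexInput vectorsInput,
      long vectorByteLength,
      NeighborScorer scorer,
      Collector collector)
      throws IOException {
    // Pass 1: hint every neighbor's vector up front so the SSD / page cache can
    // service the reads concurrently instead of stalling on them one at a time.
    for (int ord : neighborOrds) {
      vectorsInput.prefetch((long) ord * vectorByteLength, vectorByteLength);
    }
    // Pass 2: score each neighbor; ideally its pages are resident (or at least
    // in flight) by the time the synchronous read happens inside score().
    for (int ord : neighborOrds) {
      collector.collect(ord, scorer.score(ord));
    }
  }
}
```

Going further, the same two-pass idea could issue prefetch hints for the neighbors of the top N queue entries, not just the single popped node, to keep more IO in flight.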
Lucene's prefetch API is very similar to async IO, though maybe not quite as efficient: the CPU core will sometimes sit more idle with prefetch, since Lucene isn't executing work immediately as each WILL_NEED IO request finishes. But it should still be a big win for HNSW versus not prefetching?
Maybe with virtual threads we could eventually build a "real" async IO solution, but that's a bigger thing.
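For what it's worth, a very rough sketch of what that virtual-thread flavor could look like (again just placeholders, not a concrete proposal): score each popped node's neighbors on their own virtual threads (JDK 21+), so multiple blocking vector reads can be in flight at once instead of serializing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class VirtualThreadCrawlSketch {

  /** Placeholder: reads a neighbor's vector and computes its similarity to the query. */
  interface NeighborScorer {
    float score(int neighborOrd) throws IOException;
  }

  /** Placeholder: feeds a scored neighbor back into the search's work / result queues. */
  interface Collector {
    void collect(int neighborOrd, float score);
  }

  static void scoreNeighbors(int[] neighborOrds, NeighborScorer scorer, Collector collector) {
    // One virtual thread per neighbor: the per-neighbor reads can then proceed
    // concurrently rather than one at a time.
    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
      for (int ord : neighborOrds) {
        exec.submit(
            () -> {
              final float score;
              try {
                score = scorer.score(ord);
              } catch (IOException e) {
                throw new UncheckedIOException(e);
              }
              synchronized (collector) { // naive coordination, for illustration only
                collector.collect(ord, score);
              }
            });
      }
    } // ExecutorService.close() waits for all submitted tasks to complete
  }
}
```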
Or maybe we are already prefetching today in our latest KnnVectorsFormat Codec default and I missed it?