LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls #641
Conversation
A couple of the data structures used in HNSW search are pretty large and expensive to allocate. This commit creates a shared candidates queue and visited set that are reused across calls to HnswGraph#searchLevel. Now the same data structures are used for building the entire graph, which can cut down on allocations during indexing.
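The reuse pattern described above can be sketched roughly as follows. This is a toy illustration using JDK classes (`BitSet` and `PriorityQueue` stand in for Lucene's `FixedBitSet` and `NeighborQueue`; the class and method names here are hypothetical): the state is allocated once and cleared at the start of each search instead of being reallocated per call.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

// Toy sketch (not Lucene code) of reusing search state across
// searchLevel-style calls: the candidates queue and the visited set
// are allocated once and cleared before each search.
class ReusableSearchState {
  final PriorityQueue<Integer> candidates = new PriorityQueue<>(); // stand-in for NeighborQueue
  final BitSet visited;                                            // stand-in for FixedBitSet

  ReusableSearchState(int graphSize) {
    visited = new BitSet(graphSize);
  }

  // Called at the start of every search instead of allocating fresh structures.
  void clearForReuse() {
    candidates.clear();
    visited.clear();
  }

  // Toy "search" that walks some nodes, skipping already-visited ones.
  int visitNodes(int[] nodes) {
    clearForReuse();
    int count = 0;
    for (int node : nodes) {
      if (!visited.get(node)) {
        visited.set(node);
        candidates.add(node);
        count++;
      }
    }
    return count;
  }
}
```

The second call sees none of the first call's state, but pays no new allocation for it.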
This PR contains two changes: reusing the candidates queue, and reusing the visited set. Both seem to have a small positive impact on indexing; as an example, the baseline took 788966 msec to write vectors. The PR also shows a small search improvement over the baseline when reusing both the visited set and the candidates queue.
```java
 * to allocate, so whenever possible they're cleared and reused across calls.
 */
static class HnswSearchState {
  final NeighborQueue candidates;
```
I didn't include the results queue because it's typically a lot smaller, and I thought it made the logic less clear (results are returned but also passed in as scratch state?).
This looks good to me as-is, but I wonder if this could look better by extracting an HnswGraphSearcher that would be responsible for keeping the state it needs, as opposed to creating this state object that needs to be passed to a static method.
I'm expecting FixedBitSet.clear to be quite expensive, so maybe we'll need to reconsider whether we should use a bitset or a hash set to keep track of the visited nodes.
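The concern can be made concrete with a toy cost model (illustrative only, not Lucene code): clearing a dense bitset must touch every word regardless of how few nodes were actually visited, while clearing a hash-set-style structure scales with the number of entries.

```java
// Toy cost model (not Lucene code) contrasting clear() cost for the two
// visited-set representations discussed above.
class VisitedClearCost {
  // A dense FixedBitSet-style structure packs one bit per node into
  // 64-bit words, so clear() must zero ceil(numBits / 64) words no
  // matter how many bits were actually set.
  static int denseClearWork(int numBits) {
    return (numBits + 63) / 64;
  }

  // A hash-set-style structure only stores visited node ids, so
  // clearing it is proportional to the number of visited nodes.
  static int sparseClearWork(int numVisited) {
    return numVisited;
  }
}
```

For a graph of a million nodes where a search visits only a few hundred, the dense clear does orders of magnitude more work than the sparse one.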
Thanks for working on this, it shows a nice result. I had been discussing with some folks at work whether it would be possible to maintain the state even longer on the search side by pooling such states in the codec. Maybe we can try that in a follow-up issue.
I've become a bit cautious with codec-level caching. E.g. we have threadlocals for stored fields which end up storing …
Extracting HnswGraphSearcher is a lot nicer, I pushed that refactor.

On the topic of hash sets, I tried switching to IntIntHashMap on top of this PR and it gives a nice indexing speed-up. I can open a follow-up PR.

About reusing data structures across searches -- I've heard that some other HNSW implementations found this to be beneficial, specifically maintaining a shared pool of "visited" sets for searches to use. It could indeed be complex though, I'd be curious to understand the magnitude of the improvement.
@jtibshirani Thanks Julie, very nice enhancement.
```java
new HnswGraphSearcher(
    similarityFunction,
    new NeighborQueue(beamWidth, similarityFunction.reversed == false),
    new FixedBitSet(vectorValues.size()));
```
@jtibshirani I am wondering what the reasoning is for using FixedBitSet during graph construction? Are we expecting to visit most of the graph nodes (I guess that's true on smaller graphs)?
I used a FixedBitSet here because the insertion and access operations are faster. Also, using SparseFixedBitSet here doesn't actually save memory allocations, since during graph construction we're expected to visit each node at least once.

I tried an experiment where we shared the visited set but kept SparseFixedBitSet, and it didn't help performance. I also tried switching to FixedBitSet during search, and it hurt performance, probably because we'd be allocating a very large array once per search.
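A back-of-the-envelope model makes the tradeoff concrete (illustrative numbers, not Lucene measurements): a dense visited set over N nodes costs about N/8 bytes up front. That allocation is amortized over inserting all N nodes at build time, but paid in full by every individual search.

```java
// Back-of-the-envelope model (not Lucene code) for the dense visited
// set's up-front allocation cost.
class VisitedAllocModel {
  // Bytes allocated for a dense bitset over numNodes nodes:
  // one 64-bit word per 64 nodes, 8 bytes per word.
  static long denseBytes(long numNodes) {
    return ((numNodes + 63) / 64) * 8;
  }
}
```

For a graph of one million vectors that's 125000 bytes: allocating it once while building the whole graph is cheap, but allocating it fresh on every query adds up.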
LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641). A couple of the data structures used in HNSW search are pretty large and expensive to allocate. This commit creates a shared candidates queue and visited set that are reused across calls to HnswGraph#searchLevel. Now the same data structures are used for building the entire graph, which can cut down on allocations during indexing. For graph building it also switches the visited set to FixedBitSet for better performance.
I filed https://issues.apache.org/jira/browse/LUCENE-10404 so we don't forget about the hash set idea.
I think we cannot use IntIntHashMap in core since it's an external dependency? Did it work for you though?
re: sharing across searches -- I agree it gets tricky, but RAM can be bounded if we use some kind of LRU cache as a pool, and in this case the only "compatibility" check is the size of the bitset. But yeah, it's definitely more complex, and we should only do it if there is a good speedup.
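A bounded pool along those lines could look roughly like this (a hypothetical sketch using java.util.BitSet, not Lucene code; all names here are invented): searches borrow a cleared visited set sized for the graph, and returns beyond the cap are dropped so RAM stays bounded.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

// Hypothetical sketch (not Lucene code) of a bounded pool of visited
// sets shared across searches: allocate only on pool miss, clear on
// reuse, and cap the pool size to bound RAM.
class VisitedPool {
  private final ArrayDeque<BitSet> pool = new ArrayDeque<>();
  private final int numBits;   // the only "compatibility" check: bitset size
  private final int maxPooled; // RAM bound

  VisitedPool(int numBits, int maxPooled) {
    this.numBits = numBits;
    this.maxPooled = maxPooled;
  }

  // Borrow a cleared visited set, allocating only when the pool is empty.
  synchronized BitSet acquire() {
    BitSet b = pool.pollFirst();
    if (b == null) {
      b = new BitSet(numBits);
    } else {
      b.clear();
    }
    return b;
  }

  // Return a set to the pool; drop it if the pool is full.
  synchronized void release(BitSet b) {
    if (pool.size() < maxPooled) {
      pool.addFirst(b);
    }
  }

  synchronized int size() {
    return pool.size();
  }
}
```

A real version would also need eviction when readers close, which is part of what makes the idea more complex than per-searcher reuse.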