Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse neighborqueue during hnsw index build (attempt 2) #12372

Merged
merged 5 commits into from
Jun 20, 2023

Conversation

jbellis
Copy link
Contributor

@jbellis jbellis commented Jun 14, 2023

This changes HnswGraphBuilder to re-use the same candidates queues for adding nodes by allocating them in the Builder instance.

This saves about 2.5% of build time and takes memory allocations of NQ long[] from 25% of total to 0%. JFR runs are attached.

The difference from the first attempt (which actually made things slower for some graphs) is that it preserves the original code's behavior of using a 1-sized queue for the search in the levels above where the node actually gets added.

main.jfr.gz
nq2.jfr.gz

@jbellis
Copy link
Contributor Author

jbellis commented Jun 14, 2023

Additionally, the original change only re-used the candidates queues within a single addNode call, so this is improved in that respect as well.

@jbellis
Copy link
Contributor Author

jbellis commented Jun 16, 2023

cc @msokolov @benwtrent @zhaih

@benwtrent
Copy link
Member

Hey @jbellis the change looks nice to me. But, I ran https://github.com/mikemccand/luceneutil knnPerTest and saw no change at all in indexing time.

Am I missing something? Could you provide the benchmark you ran to track the 2.5% improvement?

@jbellis
Copy link
Contributor Author

jbellis commented Jun 16, 2023

I'm using the million-row sift dataset via this harness https://github.com/jbellis/hnswdemo/tree/benchmarking

I believe what is happening is that allocation is basically free and there is enough slack across the JVM in these synthetic benchmarks to soak up the extra GC required -- I do not see any changes when run normally, either. So I forced it to a single core with

$ taskset -c 0-1 ./gradlew runTexmex -PsiftName=sift

(it is a hyperthreaded core so i give it logical cores 0-1)

Also if you take a look at the jfr files it is very clear that a significant amount of allocation is gone.

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change seems sane to me. Thanks for the optimizations!

@jbellis
Copy link
Contributor Author

jbellis commented Jun 16, 2023

CI says this test is failing

   >     java.lang.AssertionError: Missing backcompat test files:
   >       9.6.0-cfs
   >       9.7.0-cfs
   >         at __randomizedtesting.SeedInfo.seed([6833F9FB78703710:78EC16BBCCE7213C]:0)
   >         at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
   >         at org.apache.lucene.backward_index.TestBackwardsCompatibility.testAllVersionsTested(TestBackwardsCompatibility.java:818)

Not sure how that's related to this code or how to fix it

tests pass locally

@zhaih
Copy link
Contributor

zhaih commented Jun 16, 2023

Not sure how that's related to this code or how to fix it

I think this is due to newly cut 9.7 branch so we probably just need to wait a bit more. I see all the auto testing is complaining too.

Copy link
Contributor

@zhaih zhaih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pursuing this! LGTM

@benwtrent
Copy link
Member

@jbellis if you are ready, I can merge and backport this change.

@jbellis
Copy link
Contributor Author

jbellis commented Jun 20, 2023

Ready. Thanks!

@benwtrent benwtrent merged commit fe0278e into apache:main Jun 20, 2023
4 checks passed
benwtrent pushed a commit that referenced this pull request Jun 20, 2023
This changes HnswGraphBuilder to re-use the same candidates queues for adding nodes by allocating them in the Builder instance.

This saves about 2.5% of build time and takes memory allocations of NQ long[] from 25% of total to 0%. JFR runs are attached.

The difference from the first attempt (which actually made things slower for some graphs) is that it preserves the original code's behavior of using a 1-sized queue for the search in the levels above where the node actually gets added.

* Re-use NeighborQueue during build's search

* improve javadoc for OnHeapHnswGraphSearcher

* assert that results parameter is minheap as expected

* update CHANGES
@zhaih zhaih added this to the 9.8.0 milestone Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants