LUCENE-10297: Speed up medium cardinality fields with readLongs and SIMD #530

Closed
wants to merge 14 commits into from

@gf2121 gf2121 commented Dec 8, 2021

We introduced a bitset optimization for extremely low cardinality fields in https://issues.apache.org/jira/browse/LUCENE-10233, but medium cardinality fields (like 32/128) can rarely trigger this optimization. I'm trying to find a way to speed them up.

In apache/lucene-solr#1538, we made some effort to use readLELongs to speed up BKD id blocks, but did not see an obvious gain from this approach. I think the reason is probably that we were trying to optimize the unsorted case (which typically happens for high cardinality fields), and the bottleneck for queries on high cardinality fields is visitDocValues rather than readDocIds.

However, medium cardinality fields may be a good fit for this optimization because they need to read lots of ids for each term. The basic idea is that we can compute the deltas of the sorted ids and encode/decode them the way we do in StoredFieldsInts. I benchmarked the optimization by mocking up some random longPoint values and querying them with PointInSetQuery. As expected, the medium cardinality fields got sped up and the high cardinality fields showed roughly even results.
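
To make the idea concrete, here is a minimal sketch of delta-encoding sorted ids and packing four 16-bit deltas into each long. It is only an illustration under stated assumptions, not the code in this PR or Lucene's on-disk format: the class `DeltaIdsSketch` and its methods are hypothetical, the id count is assumed to be a multiple of 4, and every delta is assumed to fit in 16 bits.

```java
/** Hypothetical sketch of the delta + packed-long idea; not the code in this PR. */
final class DeltaIdsSketch {

  /**
   * Delta-encodes sorted ids and packs four 16-bit deltas into each long.
   * Assumes ids.length is a multiple of 4 and every delta is in [0, 65535].
   */
  static long[] encode(int[] sortedIds) {
    int n = sortedIds.length;
    if (n % 4 != 0) {
      throw new IllegalArgumentException("id count must be a multiple of 4");
    }
    int[] deltas = new int[n];
    int previous = 0;
    for (int i = 0; i < n; i++) {
      int delta = sortedIds[i] - previous;
      if (delta < 0 || delta > 0xFFFF) {
        throw new IllegalArgumentException("delta does not fit in 16 bits: " + delta);
      }
      deltas[i] = delta;
      previous = sortedIds[i];
    }
    int quarter = n / 4;
    long[] packed = new long[quarter];
    for (int i = 0; i < quarter; i++) {
      // Each long carries one delta from each quarter of the block.
      packed[i] = ((long) deltas[i] << 48)
          | ((long) deltas[quarter + i] << 32)
          | ((long) deltas[2 * quarter + i] << 16)
          | deltas[3 * quarter + i];
    }
    return packed;
  }

  /** Unpacks the deltas and restores the original ids with a prefix sum. */
  static int[] decode(long[] packed) {
    int quarter = packed.length;
    int[] ids = new int[quarter * 4];
    // Branch-free loop over whole longs: the kind of shape a JIT may auto-vectorize.
    for (int i = 0; i < quarter; i++) {
      long l = packed[i];
      ids[i] = (int) ((l >>> 48) & 0xFFFF);
      ids[quarter + i] = (int) ((l >>> 32) & 0xFFFF);
      ids[2 * quarter + i] = (int) ((l >>> 16) & 0xFFFF);
      ids[3 * quarter + i] = (int) (l & 0xFFFF);
    }
    // Prefix sum turns the deltas back into absolute ids.
    for (int i = 1; i < ids.length; i++) {
      ids[i] += ids[i - 1];
    }
    return ids;
  }
}
```

With this sketch, `DeltaIdsSketch.decode(DeltaIdsSketch.encode(ids))` returns the original ids for any sorted input that meets the assumptions above. A real implementation would also need fallbacks for wider deltas and for blocks whose size is not a multiple of 4.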

Benchmark Result

| doc count | field cardinality | query points | baseline (ms) | candidate (ms) | latency diff | baseline (QPS) | candidate (QPS) | QPS diff |
|---|---|---|---|---|---|---|---|---|
| 100000000 | 32 | 1 | 19 | 16 | -15.79% | 52.63 | 62.5 | 18.75% |
| 100000000 | 32 | 2 | 34 | 14 | -58.82% | 29.41 | 71.43 | 142.86% |
| 100000000 | 32 | 4 | 76 | 22 | -71.05% | 13.16 | 45.45 | 245.45% |
| 100000000 | 32 | 8 | 139 | 42 | -69.78% | 7.19 | 23.81 | 230.95% |
| 100000000 | 32 | 16 | 279 | 82 | -70.61% | 3.58 | 12.2 | 240.24% |
| 100000000 | 128 | 1 | 17 | 11 | -35.29% | 58.82 | 90.91 | 54.55% |
| 100000000 | 128 | 8 | 75 | 23 | -69.33% | 13.33 | 43.48 | 226.09% |
| 100000000 | 128 | 16 | 126 | 25 | -80.16% | 7.94 | 40 | 404.00% |
| 100000000 | 128 | 32 | 245 | 50 | -79.59% | 4.08 | 20 | 390.00% |
| 100000000 | 128 | 64 | 528 | 97 | -81.63% | 1.89 | 10.31 | 444.33% |
| 100000000 | 1024 | 1 | 3 | 2 | -33.33% | 333.33 | 500 | 50.00% |
| 100000000 | 1024 | 8 | 13 | 8 | -38.46% | 76.92 | 125 | 62.50% |
| 100000000 | 1024 | 32 | 31 | 19 | -38.71% | 32.26 | 52.63 | 63.16% |
| 100000000 | 1024 | 128 | 120 | 67 | -44.17% | 8.33 | 14.93 | 79.10% |
| 100000000 | 1024 | 512 | 480 | 133 | -72.29% | 2.08 | 7.52 | 260.90% |
| 100000000 | 8192 | 1 | 3 | 3 | 0.00% | 333.33 | 333.33 | 0.00% |
| 100000000 | 8192 | 16 | 18 | 15 | -16.67% | 55.56 | 66.67 | 20.00% |
| 100000000 | 8192 | 64 | 19 | 14 | -26.32% | 52.63 | 71.43 | 35.71% |
| 100000000 | 8192 | 512 | 69 | 43 | -37.68% | 14.49 | 23.26 | 60.47% |
| 100000000 | 8192 | 2048 | 236 | 134 | -43.22% | 4.24 | 7.46 | 76.12% |
| 100000000 | 1048576 | 1 | 3 | 2 | -33.33% | 333.33 | 500 | 50.00% |
| 100000000 | 1048576 | 16 | 18 | 19 | 5.56% | 55.56 | 52.63 | -5.26% |
| 100000000 | 1048576 | 64 | 17 | 17 | 0.00% | 58.82 | 58.82 | 0.00% |
| 100000000 | 1048576 | 512 | 34 | 32 | -5.88% | 29.41 | 31.25 | 6.25% |
| 100000000 | 1048576 | 2048 | 89 | 93 | 4.49% | 11.24 | 10.75 | -4.30% |

@gf2121 gf2121 changed the title LUCENE-10297: Speed up medium cardinality fields with readLELongs and SIMD LUCENE-10297: Speed up medium cardinality fields with readLongs and SIMD Dec 9, 2021
import org.apache.lucene.store.FSDirectory;

/** java doc */
public class Run {
This is the benchmark script. I'm posting it here in case someone would like to play with it. I'll delete this file later.
