Add a specialized bulk scorer for regular conjunctions. #12719

Merged
merged 4 commits into from
Oct 30, 2023



@jpountz jpountz commented Oct 24, 2023

PR #12382 added a bulk scorer for top-k hits on conjunctions that yielded a significant speedup (annotation [FP](http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html)). This PR proposes a similar bulk scorer for exhaustive collection of conjunctive queries, e.g. for counting, faceting, etc.
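
For readers unfamiliar with the idea, here is a minimal sketch of exhaustively counting the matches of a conjunction with a single tight loop over sorted doc ID lists. It is not the code from this PR and it does not use Lucene's classes: `ConjunctionCountSketch`, `SortedDocs`, and `countConjunction` are hypothetical names, and the array-backed iterator is only a stand-in for a postings iterator.

```java
public class ConjunctionCountSketch {

  // Sentinel meaning "iterator exhausted" (hypothetical, mirrors a common convention).
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Stand-in for a postings iterator over a sorted array of doc IDs. */
  static final class SortedDocs {
    private final int[] docs;
    private int idx = -1;

    SortedDocs(int... docs) {
      this.docs = docs;
    }

    /** Current doc ID, -1 before iteration starts, NO_MORE_DOCS once exhausted. */
    int docID() {
      if (idx < 0) {
        return -1;
      }
      return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    int nextDoc() {
      idx++;
      return docID();
    }

    /** Moves forward to the first doc ID >= target; target must be > docID(). */
    int advance(int target) {
      do {
        idx++;
      } while (idx < docs.length && docs[idx] < target);
      return docID();
    }
  }

  /** Counts doc IDs present in the lead iterator and in every other iterator. */
  static int countConjunction(SortedDocs lead, SortedDocs... others) {
    int count = 0;
    int doc = lead.nextDoc();
    outer:
    while (doc != NO_MORE_DOCS) {
      for (SortedDocs other : others) {
        int otherDoc = other.docID();
        if (otherDoc < doc) {
          otherDoc = other.advance(doc);
        }
        if (otherDoc > doc) {
          // Mismatch: skip the lead directly to the next possible candidate.
          doc = lead.advance(otherDoc);
          continue outer;
        }
      }
      count++; // every clause is positioned on doc
      doc = lead.nextDoc();
    }
    return count;
  }

  public static void main(String[] args) {
    int count =
        countConjunction(
            new SortedDocs(1, 3, 5, 7, 9),
            new SortedDocs(3, 4, 5, 9, 10),
            new SortedDocs(0, 3, 9, 11));
    System.out.println(count); // prints 2 (docs 3 and 9 match all clauses)
  }
}
```

The point of the sketch is only that the whole intersection runs in one loop owned by the scorer, rather than being driven one document at a time from the outside.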

jpountz commented Oct 24, 2023

Wikibigall:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3       88.53      (5.6%)       87.90      (4.5%)   -0.7% ( -10% -    9%) 0.659
                       CountTerm    14431.99      (2.2%)    14391.72      (3.1%)   -0.3% (  -5% -    5%) 0.740
                          IntNRQ      129.57     (19.6%)      129.45     (18.2%)   -0.1% ( -31% -   46%) 0.988
                          Fuzzy1       97.85      (1.3%)       97.83      (0.9%)   -0.0% (  -2% -    2%) 0.961
                          Fuzzy2       79.13      (1.2%)       79.15      (0.8%)    0.0% (  -1% -    2%) 0.921
                        PKLookup      269.31      (1.8%)      269.44      (1.8%)    0.0% (  -3% -    3%) 0.932
           HighTermDayOfYearSort      424.31      (1.6%)      424.69      (1.8%)    0.1% (  -3% -    3%) 0.868
                  CountOrHighMed       85.48     (13.2%)       85.61     (15.9%)    0.1% ( -25% -   33%) 0.975
                      AndHighLow      940.92      (2.0%)      942.38      (2.8%)    0.2% (  -4% -    5%) 0.842
                        Wildcard       67.91      (3.7%)       68.04      (2.8%)    0.2% (  -6% -    6%) 0.855
                 CountOrHighHigh       55.01     (13.4%)       55.14     (16.4%)    0.2% ( -26% -   34%) 0.960
               HighTermMonthSort     4705.02      (2.1%)     4718.66      (1.9%)    0.3% (  -3% -    4%) 0.643
                       OrHighLow      698.53      (1.9%)      700.66      (2.7%)    0.3% (  -4% -    5%) 0.679
                         Respell       60.51      (1.5%)       60.72      (1.4%)    0.3% (  -2% -    3%) 0.458
                      AndHighMed      234.69      (2.1%)      235.53      (2.6%)    0.4% (  -4% -    5%) 0.635
                      HighPhrase        5.54      (7.3%)        5.58      (4.6%)    0.7% ( -10% -   13%) 0.712
                     CountPhrase        3.24     (11.3%)        3.26     (10.1%)    0.7% ( -18% -   24%) 0.830
                     AndHighHigh       51.61      (2.6%)       52.14      (3.4%)    1.0% (  -4% -    7%) 0.279
                       OrHighMed      142.44      (3.4%)      144.55      (4.1%)    1.5% (  -5% -    9%) 0.218
                      OrHighHigh       47.30      (4.1%)       48.17      (4.8%)    1.8% (  -6% -   11%) 0.193
                       MedPhrase       26.15      (5.2%)       26.89      (4.7%)    2.8% (  -6% -   13%) 0.070
                       LowPhrase       66.86      (4.3%)       68.92      (3.7%)    3.1% (  -4% -   11%) 0.016
                         LowTerm      878.77      (5.8%)      919.06      (5.7%)    4.6% (  -6% -   17%) 0.011
                        HighTerm      345.83      (9.5%)      362.55      (8.9%)    4.8% ( -12% -   25%) 0.098
                         MedTerm      532.38      (7.5%)      559.05      (7.0%)    5.0% (  -8% -   21%) 0.029
                 CountAndHighMed      119.60      (3.9%)      126.26      (2.0%)    5.6% (   0% -   11%) 0.000
                CountAndHighHigh       39.81      (4.2%)       43.06      (1.7%)    8.2% (   2% -   14%) 0.000

@jpountz jpountz merged commit 58b9352 into apache:main Oct 30, 2023
4 checks passed
@jpountz jpountz deleted the specialized_bulk_scorer_conjunctions branch October 30, 2023 15:11
jpountz added a commit that referenced this pull request Oct 30, 2023

jpountz commented Nov 3, 2023

This yielded a good speedup on nightly benchmarks. I pushed an annotation.


jpountz commented Nov 3, 2023

Interestingly, it seems to also help with facets: http://people.apache.org/~mikemccand/lucenebench/AndHighHighDayTaxoFacets.html.
