Add BS1 optimization to MaxScoreBulkScorer. #12444

Merged (9 commits) on Jul 19, 2023

@jpountz jpountz commented Jul 17, 2023

Lucene's scorers that can dynamically prune on score provide great speedups when they manage to skip many hits. Unfortunately, there are also cases when they cannot skip hits efficiently, one example case being when there are many clauses in the query. In this case, exhaustively evaluating the set of matches with BooleanScorer (BS1) may perform several times faster.

This commit adds to MaxScoreBulkScorer the BS1 optimization that consists of collecting hits into a bitset to save the overhead of reordering priority queues. This helps make performance degrade much more gracefully when dynamic pruning cannot help much.

Closes #12439
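
To make the idea concrete, here is a minimal sketch of collecting a disjunction into a bitset one window at a time, instead of merging clauses through a priority queue. It is an illustration only, not the code from this PR: the class name `BitsetWindowSketch`, the sorted `int[]` postings, and the constant per-clause scores are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of BS1-style window collection; not Lucene's actual code. */
final class BitsetWindowSketch {
  static final int WINDOW_SIZE = 2048;

  /**
   * postings[c] holds the sorted doc IDs of clause c (all smaller than maxDoc) and
   * clauseScores[c] is a constant score for that clause (a simplifying assumption).
   * Returns doc ID -> summed score, in increasing doc ID order.
   */
  static Map<Integer, Float> scoreDisjunction(int[][] postings, float[] clauseScores, int maxDoc) {
    Map<Integer, Float> hits = new LinkedHashMap<>();
    long[] matching = new long[WINDOW_SIZE / 64]; // one bit per doc in the window
    float[] windowScores = new float[WINDOW_SIZE]; // score accumulator per doc in the window
    int[] cursors = new int[postings.length];

    for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
      int max = Math.min(base + WINDOW_SIZE, maxDoc);

      // Phase 1: drain each clause independently into the bitset.
      // No priority queue is needed because clauses are not merged in doc ID order.
      for (int c = 0; c < postings.length; c++) {
        int[] docs = postings[c];
        int i = cursors[c];
        while (i < docs.length && docs[i] < max) {
          int slot = docs[i] - base;
          matching[slot >>> 6] |= 1L << slot; // long shifts use the low 6 bits of the count
          windowScores[slot] += clauseScores[c];
          i++;
        }
        cursors[c] = i;
      }

      // Phase 2: replay the set bits in increasing order and reset the window state.
      for (int w = 0; w < matching.length; w++) {
        long bits = matching[w];
        while (bits != 0L) {
          int slot = (w << 6) | Long.numberOfTrailingZeros(bits);
          bits &= bits - 1; // clear the lowest set bit
          hits.put(base + slot, windowScores[slot]);
          windowScores[slot] = 0f;
        }
        matching[w] = 0L;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int[][] postings = {{1, 5, 3000}, {5, 2047, 2048}};
    System.out.println(scoreDisjunction(postings, new float[] {1f, 2f}, 4096));
  }
}
```

The per-window cost here is linear in the number of postings plus a scan of the 32 bitset words, which is why the approach pays off when windows are dense and adds overhead when they are nearly empty.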

jpountz commented Jul 17, 2023

I played with the following tasks file to evaluate the impact of this change:

OrHigh2: several following 
OrHigh3: several following publisher
OrHigh4: several following publisher end
OrHigh6: several following publisher end http known
OrHigh8: several following publisher end http known title him
OrHigh12: several following publisher end http known title him 2 became over than
OrHigh16: several following publisher end http known title him 2 became over than 1 music most part
OrHigh24: several following publisher end http known title him 2 became over than 1 music most part c 2002 2006 april york also 3 0

And here are QPS numbers for various scorers on wikimedium10m. 🔶 denotes the implementation that is used today, 🔷 denotes the implementation that would get used with this change.

| Task | BooleanScorer | WANDScorer within DefaultBulkScorer | MaxScoreBulkScorer (main) | MaxScoreBulkScorer (patch) |
| --- | --- | --- | --- | --- |
| OrHigh2 | 59.3 | 58.3 | 77.6 🔶 | 90.8 🔷 |
| OrHigh3 | 25.9 | 46.0 🔶 | 51.6 | 60.7 🔷 |
| OrHigh4 | 21.1 | 23.3 🔶 | 30.1 | 38.9 🔷 |
| OrHigh6 | 9.2 | 12.9 🔶 | 19.3 | 27.1 🔷 |
| OrHigh8 | 6.8 | 7.8 🔶 | 12.7 | 19.1 🔷 |
| OrHigh12 | 5.1 | 3.8 🔶 | 6.1 | 10.0 🔷 |
| OrHigh16 | 4.1 | 2.3 🔶 | 3.7 | 6.5 🔷 |
| OrHigh24 | 3.0 | 1.2 🔶 | 1.9 | 3.8 🔷 |

jpountz commented Jul 17, 2023

Here is the usual set of queries, still on wikimedium10m. Sparser disjunctive queries like Fuzzy1 and Fuzzy2 can get a slowdown when the majority of clauses have very few matches per window of 2048 doc IDs, so the bitset adds more overhead than it removes.

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                          Fuzzy2       82.91      (2.1%)       69.87      (1.6%)  -15.7% ( -19% -  -12%) 0.000
           BrowseMonthSSDVFacets       21.10      (9.9%)       20.37      (2.4%)   -3.4% ( -14% -    9%) 0.129
                          Fuzzy1      102.36      (1.7%)       99.64      (1.4%)   -2.7% (  -5% -    0%) 0.000
           BrowseMonthTaxoFacets       29.58      (7.3%)       28.91      (7.6%)   -2.3% ( -15% -   13%) 0.338
            BrowseDateTaxoFacets       43.75      (3.8%)       43.47      (3.9%)   -0.6% (  -8% -    7%) 0.603
                      HighPhrase      191.09      (3.9%)      190.00      (3.5%)   -0.6% (  -7% -    7%) 0.627
                       MedPhrase       23.90      (3.4%)       23.76      (3.0%)   -0.6% (  -6% -    6%) 0.580
         AndHighMedDayTaxoFacets       36.73      (2.8%)       36.57      (2.7%)   -0.4% (  -5% -    5%) 0.617
            MedTermDayTaxoFacets       35.03      (4.7%)       34.88      (7.0%)   -0.4% ( -11% -   11%) 0.818
                         Prefix3      283.01      (3.1%)      281.87      (2.6%)   -0.4% (  -5% -    5%) 0.652
                          IntNRQ      122.02      (2.6%)      121.60      (2.7%)   -0.3% (  -5% -    5%) 0.678
          OrHighMedDayTaxoFacets       14.45      (4.2%)       14.40      (5.0%)   -0.3% (  -9% -    9%) 0.829
                HighSloppyPhrase       23.79      (4.4%)       23.71      (2.8%)   -0.3% (  -7% -    7%) 0.798
               HighTermMonthSort     3276.91      (6.6%)     3267.18      (6.2%)   -0.3% ( -12% -   13%) 0.884
                    OrNotHighMed      318.99      (3.9%)      318.12      (3.7%)   -0.3% (  -7% -    7%) 0.818
        AndHighHighDayTaxoFacets       34.20      (2.6%)       34.14      (2.5%)   -0.2% (  -5% -    5%) 0.834
                    OrNotHighLow      721.97      (3.8%)      721.24      (3.6%)   -0.1% (  -7% -    7%) 0.931
                        PKLookup      242.99      (3.7%)      242.77      (3.9%)   -0.1% (  -7% -    7%) 0.940
                       LowPhrase       41.00      (3.6%)       40.98      (2.7%)   -0.1% (  -6% -    6%) 0.953
     BrowseRandomLabelSSDVFacets       14.98      (8.3%)       14.97      (8.5%)   -0.0% ( -15% -   18%) 0.985
       BrowseDayOfYearTaxoFacets       44.32      (3.5%)       44.31      (3.6%)   -0.0% (  -6% -    7%) 0.979
             MedIntervalsOrdered       20.86      (4.8%)       20.85      (7.2%)   -0.0% ( -11% -   12%) 0.991
            HighIntervalsOrdered       12.55      (4.7%)       12.55      (6.4%)   -0.0% ( -10% -   11%) 0.995
                      AndHighLow     1034.95      (4.2%)     1034.92      (3.7%)   -0.0% (  -7% -    8%) 0.999
                        Wildcard      210.59      (3.0%)      210.61      (2.3%)    0.0% (  -5% -    5%) 0.988
                 LowSloppyPhrase      171.71      (3.7%)      171.83      (2.5%)    0.1% (  -5% -    6%) 0.944
                   OrNotHighHigh      508.88      (5.6%)      509.60      (5.1%)    0.1% (  -9% -   11%) 0.933
                    HighSpanNear       47.49      (5.7%)       47.56      (6.5%)    0.2% ( -11% -   13%) 0.936
                         Respell       93.55      (1.8%)       93.72      (1.7%)    0.2% (  -3% -    3%) 0.744
            BrowseDateSSDVFacets        5.62      (6.9%)        5.63      (7.1%)    0.2% ( -12% -   15%) 0.933
                     LowSpanNear      216.35      (3.8%)      216.84      (3.9%)    0.2% (  -7% -    8%) 0.850
                     MedSpanNear       62.60      (4.2%)       62.75      (5.7%)    0.2% (  -9% -   10%) 0.882
                 MedSloppyPhrase       31.39      (4.2%)       31.47      (2.8%)    0.3% (  -6% -    7%) 0.815
            HighTermTitleBDVSort       22.52      (3.0%)       22.59      (1.8%)    0.3% (  -4% -    5%) 0.713
                        HighTerm      614.93      (7.0%)      616.86      (6.8%)    0.3% ( -12% -   15%) 0.886
                   OrHighNotHigh      398.90      (6.3%)      400.66      (5.9%)    0.4% ( -11% -   13%) 0.819
                      TermDTSort      169.18      (3.7%)      170.02      (3.4%)    0.5% (  -6% -    7%) 0.656
             LowIntervalsOrdered       63.21      (3.5%)       63.56      (4.2%)    0.6% (  -6% -    8%) 0.651
               HighTermTitleSort      171.12      (6.1%)      172.53      (7.4%)    0.8% ( -11% -   15%) 0.699
                    OrHighNotLow      385.23      (7.6%)      388.51      (5.8%)    0.9% ( -11% -   15%) 0.692
                         MedTerm      640.12      (6.8%)      645.71      (6.7%)    0.9% ( -11% -   15%) 0.682
     BrowseRandomLabelTaxoFacets       35.18      (5.4%)       35.50      (8.0%)    0.9% ( -11% -   15%) 0.668
                      AndHighMed      300.64      (3.6%)      304.11      (4.2%)    1.2% (  -6% -    9%) 0.350
                    OrHighNotMed      288.97      (7.6%)      292.33      (6.7%)    1.2% ( -12% -   16%) 0.610
           HighTermDayOfYearSort      422.82      (3.6%)      428.51      (3.0%)    1.3% (  -5% -    8%) 0.199
                     AndHighHigh       72.52      (4.7%)       73.94      (5.4%)    1.9% (  -7% -   12%) 0.223
       BrowseDayOfYearSSDVFacets       19.89      (8.6%)       20.31     (12.2%)    2.1% ( -17% -   25%) 0.533
                         LowTerm      986.32      (6.6%)     1008.27      (8.4%)    2.2% ( -12% -   18%) 0.353
                       OrHighMed      204.41      (3.3%)      213.59      (5.4%)    4.5% (  -3% -   13%) 0.001
                      OrHighHigh       44.87      (6.6%)       49.44     (11.2%)   10.2% (  -7% -   29%) 0.000
                       OrHighLow      301.82      (5.8%)      349.93     (11.7%)   15.9% (  -1% -   35%) 0.000

jpountz commented Jul 17, 2023

Here is a table similar to the one above, but with low-cardinality clauses instead of high-cardinality clauses, in order to show how the overhead of the bitset manifests:

OrLow2: rivers sequence
OrLow3: rivers sequence opposite
OrLow4: rivers sequence opposite aug
OrLow6: rivers sequence opposite aug ross bronze
OrLow8: rivers sequence opposite aug ross bronze extension factor
OrLow12: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited
OrLow16: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited korean argentina developing billion

| Task | BooleanScorer | WANDScorer within DefaultBulkScorer | MaxScoreBulkScorer (main) | MaxScoreBulkScorer (patch) |
| --- | --- | --- | --- | --- |
| OrLow2 | 283.3 | 353.0 | 427.2 🔶 | 425.7 🔷 |
| OrLow3 | 210.3 | 278.6 🔶 | 270.0 | 236.4 🔷 |
| OrLow4 | 171.7 | 198.3 🔶 | 190.0 | 175.7 🔷 |
| OrLow6 | 124.5 | 114.7 🔶 | 112.3 | 114.3 🔷 |
| OrLow8 | 97.3 | 77.5 🔶 | 77.1 | 86.4 🔷 |
| OrLow12 | 68.2 | 44.7 🔶 | 50.1 | 59.8 🔷 |
| OrLow16 | 52.3 | 31.1 🔶 | 36.0 | 45.4 🔷 |

With high-frequency clauses, MaxScoreBulkScorer was consistently better in this PR than in the main branch. With low-frequency clauses, this is now only true for queries with 6 clauses or more. Also, WAND performs faster than MAXSCORE here with fewer than 8 clauses.

I'd like to avoid trying to go too far wrt picking the optimal implementation based on the query, which could get quite messy. Maybe we could introduce simple heuristics in a follow-up, such as only using the bulk scorer if the cost is high enough that we'd expect more than X matches per 2048-bit window on average.
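
A heuristic along those lines might look like the sketch below. This is only a guess at the shape of such a follow-up, not something this PR implements: the class and method names are hypothetical, `cost` is assumed to be an upper bound on the number of matches (for example the sum of the clause costs), matches are assumed to be spread uniformly over the doc ID space, and the threshold X is left as a parameter because no value has been chosen.

```java
/** Hypothetical heuristic sketch; names and threshold are assumptions, not part of this PR. */
final class WindowDensityHeuristic {
  static final int WINDOW_SIZE = 2048;

  /** Average number of matches expected per window if matches were uniformly distributed. */
  static double expectedMatchesPerWindow(long cost, int maxDoc) {
    if (maxDoc <= 0) {
      return 0d; // empty segment: nothing to collect
    }
    return (double) cost * WINDOW_SIZE / maxDoc;
  }

  /** Returns true when windows look dense enough for the bitset to be worth its overhead. */
  static boolean useBitsetBulkScorer(long cost, int maxDoc, double minMatchesPerWindow) {
    return expectedMatchesPerWindow(cost, maxDoc) > minMatchesPerWindow;
  }

  public static void main(String[] args) {
    // 100k estimated matches over 10M docs is roughly 20 matches per window.
    System.out.println(expectedMatchesPerWindow(100_000L, 10_000_000));
    System.out.println(useBitsetBulkScorer(100_000L, 10_000_000, 32d));
  }
}
```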

In general, this new MaxScoreBulkScorer feels like the best option to me, as it performs better on the slower queries that have high-frequency clauses, and its performance degrades more gracefully when the number of clauses increases.

@jimczi jimczi left a comment

That's really cool that we handle more cases to apply max score in the bulk scorer!

> I'd like to avoid trying to go too far wrt picking the optimal implementation based on the query, which could get quite messy. Maybe we could introduce simple heuristics in a follow-up, such as only using the bulk scorer if the cost is high enough that we'd expect more than X matches per 2048-bit window on average.

The numbers you shared are a good compromise, but +1 to staying open to adding more heuristics in follow-ups.

jpountz commented Jul 19, 2023

I pushed a couple of changes that helped improve performance on sparse clauses a bit, and updated the above performance numbers:

  • Inner windows are no longer aligned with multiples of 2048.
  • If there is a single essential clause that has matches in the first half of an inner window, then we collect this first half of the window in an optimized way, and recompute a new inner window based on the next match of essential clauses.
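
The effect of the first change can be illustrated with a small sketch that counts how many 2048-doc windows are needed to cover a sorted list of matches, with and without alignment. It assumes that an unaligned window starts exactly at the next match, which is my reading of the description rather than something taken from the code, and it does not model the single-essential-clause optimization from the second bullet. The class and method names are hypothetical.

```java
/** Hypothetical illustration of aligned vs. unaligned inner windows; not Lucene's actual code. */
final class InnerWindowSketch {
  static final int WINDOW_SIZE = 2048;

  /** Aligned windows start at the match rounded down to a multiple of 2048. */
  static int alignedWindowMin(int doc) {
    return doc & ~(WINDOW_SIZE - 1);
  }

  /**
   * Counts the windows visited to cover all matches in sortedDocs.
   * Doc IDs are assumed to be small enough that min + WINDOW_SIZE does not overflow.
   */
  static int countWindows(int[] sortedDocs, boolean aligned) {
    int windows = 0;
    int i = 0;
    while (i < sortedDocs.length) {
      // Assumption: an unaligned window begins exactly at the next match.
      int min = aligned ? alignedWindowMin(sortedDocs[i]) : sortedDocs[i];
      int max = min + WINDOW_SIZE; // exclusive upper bound
      while (i < sortedDocs.length && sortedDocs[i] < max) {
        i++; // this match is collected in the current window
      }
      windows++;
    }
    return windows;
  }

  public static void main(String[] args) {
    int[] docs = {2000, 2100, 3000, 4000};
    System.out.println("aligned:   " + countWindows(docs, true));
    System.out.println("unaligned: " + countWindows(docs, false));
  }
}
```

With matches at 2000, 2100, 3000 and 4000, aligned windows split the run at doc 2048 and need two windows, while a single unaligned window starting at 2000 covers all four. Fewer, fuller windows mean the fixed per-window cost of the bitset is amortized over more hits.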

@jpountz jpountz added this to the 9.8.0 milestone Jul 19, 2023
@jpountz jpountz merged commit 17c13a7 into apache:main Jul 19, 2023
4 checks passed
@jpountz jpountz deleted the bs1_maxscore branch July 19, 2023 11:51
jpountz added a commit that referenced this pull request Jul 19, 2023
Successfully merging this pull request may close these issues.

Switch from MAXSCORE to BS1 with high numbers of clauses