Add BS1 optimization to MaxScoreBulkScorer. #12444

jpountz · 2023-07-17T07:03:11Z

Lucene's scorers that can dynamically prune on score provide great speedups when they manage to skip many hits. Unfortunately, there are also cases when they cannot skip hits efficiently, one example case being when there are many clauses in the query. In this case, exhaustively evaluating the set of matches with BooleanScorer (BS1) may perform several times faster.

This commit adds to MaxScoreBulkScorer the BS1 optimization that consists of collecting hits into a bitset to save the overhead of reordering priority queues. This helps make performance degrade much more gracefully when dynamic pruning cannot help much.
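To make the bitset idea concrete, here is a minimal sketch of window-based disjunction scoring. It is a toy model, not the code from this PR: `Bs1WindowSketch`, `Clause` and `scoreAll` are hypothetical names, clauses are plain sorted `int[]` arrays with a constant score per match, and no dynamic pruning is modeled. It only illustrates how each clause can be drained sequentially into a bitset plus a score array, and hits replayed in doc ID order, without any priority queue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of BS1-style windowed scoring; these are not Lucene's actual classes. */
final class Bs1WindowSketch {
  static final int WINDOW_SIZE = 2048; // window size mentioned in this discussion

  /** Hypothetical clause: sorted doc IDs and a constant score per match. */
  static final class Clause {
    final int[] docs;
    final float score;

    Clause(int[] docs, float score) {
      this.docs = docs;
      this.score = score;
    }
  }

  /** Returns doc ID -> summed score for docs in [0, maxDoc), in doc ID order. */
  static Map<Integer, Float> scoreAll(List<Clause> clauses, int maxDoc) {
    Map<Integer, Float> hits = new LinkedHashMap<>();
    int[] cursors = new int[clauses.size()];      // next unread position per clause
    long[] matching = new long[WINDOW_SIZE / 64]; // one bit per doc of the window
    float[] scores = new float[WINDOW_SIZE];      // accumulated score per doc of the window

    for (int windowMin = 0; windowMin < maxDoc; windowMin += WINDOW_SIZE) {
      int windowMax = Math.min(windowMin + WINDOW_SIZE, maxDoc);

      // Phase 1: drain every clause sequentially into the bitset and score array.
      for (int c = 0; c < clauses.size(); c++) {
        Clause clause = clauses.get(c);
        int pos = cursors[c];
        while (pos < clause.docs.length && clause.docs[pos] < windowMax) {
          int slot = clause.docs[pos] - windowMin;
          matching[slot >>> 6] |= 1L << slot; // long shifts use the low 6 bits of slot
          scores[slot] += clause.score;
          pos++;
        }
        cursors[c] = pos;
      }

      // Phase 2: replay set bits in doc ID order and reset state for the next window.
      for (int word = 0; word < matching.length; word++) {
        long bits = matching[word];
        while (bits != 0) {
          int slot = (word << 6) + Long.numberOfTrailingZeros(bits);
          hits.put(windowMin + slot, scores[slot]);
          scores[slot] = 0f;
          bits &= bits - 1; // clear the lowest set bit
        }
        matching[word] = 0L;
      }
    }
    return hits;
  }
}
```

In this model the cost per match is a couple of array writes, regardless of the number of clauses, which is the property that helps when pruning cannot skip many hits.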

Closes #12439

jpountz · 2023-07-17T07:26:32Z

I played with the following tasks file to evaluate the impact of this change:

OrHigh2: several following 
OrHigh3: several following publisher
OrHigh4: several following publisher end
OrHigh6: several following publisher end http known
OrHigh8: several following publisher end http known title him
OrHigh12: several following publisher end http known title him 2 became over than
OrHigh16: several following publisher end http known title him 2 became over than 1 music most part
OrHigh24: several following publisher end http known title him 2 became over than 1 music most part c 2002 2006 april york also 3 0

And here are QPS numbers for various scorers on wikimedium10m. 🔶 denotes the implementation that is used today, 🔷 denotes the implementation that would get used with this change.

| Task | BooleanScorer | WANDScorer within DefaultBulkScorer | MaxScoreBulkScorer (main) | MaxScoreBulkScorer (patch) |
| --- | --- | --- | --- | --- |
| OrHigh2 | 59.3 | 58.3 | 77.6 🔶 | 90.8 🔷 |
| OrHigh3 | 25.9 | 46.0 🔶 | 51.6 | 60.7 🔷 |
| OrHigh4 | 21.1 | 23.3 🔶 | 30.1 | 38.9 🔷 |
| OrHigh6 | 9.2 | 12.9 🔶 | 19.3 | 27.1 🔷 |
| OrHigh8 | 6.8 | 7.8 🔶 | 12.7 | 19.1 🔷 |
| OrHigh12 | 5.1 | 3.8 🔶 | 6.1 | 10.0 🔷 |
| OrHigh16 | 4.1 | 2.3 🔶 | 3.7 | 6.5 🔷 |
| OrHigh24 | 3.0 | 1.2 🔶 | 1.9 | 3.8 🔷 |

jpountz · 2023-07-17T10:15:23Z

Here is the usual set of queries, still on wikimedium10m. Sparser disjunctive queries like Fuzzy1 and Fuzzy2 can get a slowdown when the majority of clauses have very few matches per window of 2048 doc IDs, so the bitset adds more overhead than it removes.

                            Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                          Fuzzy2       82.91      (2.1%)       69.87      (1.6%)  -15.7% ( -19% -  -12%) 0.000
           BrowseMonthSSDVFacets       21.10      (9.9%)       20.37      (2.4%)   -3.4% ( -14% -    9%) 0.129
                          Fuzzy1      102.36      (1.7%)       99.64      (1.4%)   -2.7% (  -5% -    0%) 0.000
           BrowseMonthTaxoFacets       29.58      (7.3%)       28.91      (7.6%)   -2.3% ( -15% -   13%) 0.338
            BrowseDateTaxoFacets       43.75      (3.8%)       43.47      (3.9%)   -0.6% (  -8% -    7%) 0.603
                      HighPhrase      191.09      (3.9%)      190.00      (3.5%)   -0.6% (  -7% -    7%) 0.627
                       MedPhrase       23.90      (3.4%)       23.76      (3.0%)   -0.6% (  -6% -    6%) 0.580
         AndHighMedDayTaxoFacets       36.73      (2.8%)       36.57      (2.7%)   -0.4% (  -5% -    5%) 0.617
            MedTermDayTaxoFacets       35.03      (4.7%)       34.88      (7.0%)   -0.4% ( -11% -   11%) 0.818
                         Prefix3      283.01      (3.1%)      281.87      (2.6%)   -0.4% (  -5% -    5%) 0.652
                          IntNRQ      122.02      (2.6%)      121.60      (2.7%)   -0.3% (  -5% -    5%) 0.678
          OrHighMedDayTaxoFacets       14.45      (4.2%)       14.40      (5.0%)   -0.3% (  -9% -    9%) 0.829
                HighSloppyPhrase       23.79      (4.4%)       23.71      (2.8%)   -0.3% (  -7% -    7%) 0.798
               HighTermMonthSort     3276.91      (6.6%)     3267.18      (6.2%)   -0.3% ( -12% -   13%) 0.884
                    OrNotHighMed      318.99      (3.9%)      318.12      (3.7%)   -0.3% (  -7% -    7%) 0.818
        AndHighHighDayTaxoFacets       34.20      (2.6%)       34.14      (2.5%)   -0.2% (  -5% -    5%) 0.834
                    OrNotHighLow      721.97      (3.8%)      721.24      (3.6%)   -0.1% (  -7% -    7%) 0.931
                        PKLookup      242.99      (3.7%)      242.77      (3.9%)   -0.1% (  -7% -    7%) 0.940
                       LowPhrase       41.00      (3.6%)       40.98      (2.7%)   -0.1% (  -6% -    6%) 0.953
     BrowseRandomLabelSSDVFacets       14.98      (8.3%)       14.97      (8.5%)   -0.0% ( -15% -   18%) 0.985
       BrowseDayOfYearTaxoFacets       44.32      (3.5%)       44.31      (3.6%)   -0.0% (  -6% -    7%) 0.979
             MedIntervalsOrdered       20.86      (4.8%)       20.85      (7.2%)   -0.0% ( -11% -   12%) 0.991
            HighIntervalsOrdered       12.55      (4.7%)       12.55      (6.4%)   -0.0% ( -10% -   11%) 0.995
                      AndHighLow     1034.95      (4.2%)     1034.92      (3.7%)   -0.0% (  -7% -    8%) 0.999
                        Wildcard      210.59      (3.0%)      210.61      (2.3%)    0.0% (  -5% -    5%) 0.988
                 LowSloppyPhrase      171.71      (3.7%)      171.83      (2.5%)    0.1% (  -5% -    6%) 0.944
                   OrNotHighHigh      508.88      (5.6%)      509.60      (5.1%)    0.1% (  -9% -   11%) 0.933
                    HighSpanNear       47.49      (5.7%)       47.56      (6.5%)    0.2% ( -11% -   13%) 0.936
                         Respell       93.55      (1.8%)       93.72      (1.7%)    0.2% (  -3% -    3%) 0.744
            BrowseDateSSDVFacets        5.62      (6.9%)        5.63      (7.1%)    0.2% ( -12% -   15%) 0.933
                     LowSpanNear      216.35      (3.8%)      216.84      (3.9%)    0.2% (  -7% -    8%) 0.850
                     MedSpanNear       62.60      (4.2%)       62.75      (5.7%)    0.2% (  -9% -   10%) 0.882
                 MedSloppyPhrase       31.39      (4.2%)       31.47      (2.8%)    0.3% (  -6% -    7%) 0.815
            HighTermTitleBDVSort       22.52      (3.0%)       22.59      (1.8%)    0.3% (  -4% -    5%) 0.713
                        HighTerm      614.93      (7.0%)      616.86      (6.8%)    0.3% ( -12% -   15%) 0.886
                   OrHighNotHigh      398.90      (6.3%)      400.66      (5.9%)    0.4% ( -11% -   13%) 0.819
                      TermDTSort      169.18      (3.7%)      170.02      (3.4%)    0.5% (  -6% -    7%) 0.656
             LowIntervalsOrdered       63.21      (3.5%)       63.56      (4.2%)    0.6% (  -6% -    8%) 0.651
               HighTermTitleSort      171.12      (6.1%)      172.53      (7.4%)    0.8% ( -11% -   15%) 0.699
                    OrHighNotLow      385.23      (7.6%)      388.51      (5.8%)    0.9% ( -11% -   15%) 0.692
                         MedTerm      640.12      (6.8%)      645.71      (6.7%)    0.9% ( -11% -   15%) 0.682
     BrowseRandomLabelTaxoFacets       35.18      (5.4%)       35.50      (8.0%)    0.9% ( -11% -   15%) 0.668
                      AndHighMed      300.64      (3.6%)      304.11      (4.2%)    1.2% (  -6% -    9%) 0.350
                    OrHighNotMed      288.97      (7.6%)      292.33      (6.7%)    1.2% ( -12% -   16%) 0.610
           HighTermDayOfYearSort      422.82      (3.6%)      428.51      (3.0%)    1.3% (  -5% -    8%) 0.199
                     AndHighHigh       72.52      (4.7%)       73.94      (5.4%)    1.9% (  -7% -   12%) 0.223
       BrowseDayOfYearSSDVFacets       19.89      (8.6%)       20.31     (12.2%)    2.1% ( -17% -   25%) 0.533
                         LowTerm      986.32      (6.6%)     1008.27      (8.4%)    2.2% ( -12% -   18%) 0.353
                       OrHighMed      204.41      (3.3%)      213.59      (5.4%)    4.5% (  -3% -   13%) 0.001
                      OrHighHigh       44.87      (6.6%)       49.44     (11.2%)   10.2% (  -7% -   29%) 0.000
                       OrHighLow      301.82      (5.8%)      349.93     (11.7%)   15.9% (  -1% -   35%) 0.000

jpountz · 2023-07-17T10:41:01Z

Here is a table similar to the one above, but with low-cardinality clauses instead of high-cardinality clauses, in order to show how the overhead of the bitset manifests:

OrLow2: rivers sequence
OrLow3: rivers sequence opposite
OrLow4: rivers sequence opposite aug
OrLow6: rivers sequence opposite aug ross bronze
OrLow8: rivers sequence opposite aug ross bronze extension factor
OrLow12: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited
OrLow16: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited korean argentina developing billion

| Task | BooleanScorer | WANDScorer within DefaultBulkScorer | MaxScoreBulkScorer (main) | MaxScoreBulkScorer (patch) |
| --- | --- | --- | --- | --- |
| OrLow2 | 283.3 | 353.0 | 427.2 🔶 | 425.7 🔷 |
| OrLow3 | 210.3 | 278.6 🔶 | 270.0 | 236.4 🔷 |
| OrLow4 | 171.7 | 198.3 🔶 | 190.0 | 175.7 🔷 |
| OrLow6 | 124.5 | 114.7 🔶 | 112.3 | 114.3 🔷 |
| OrLow8 | 97.3 | 77.5 🔶 | 77.1 | 86.4 🔷 |
| OrLow12 | 68.2 | 44.7 🔶 | 50.1 | 59.8 🔷 |
| OrLow16 | 52.3 | 31.1 🔶 | 36.0 | 45.4 🔷 |

With high-frequency clauses, MaxScoreBulkScorer was consistently better in this PR than in the main branch. With low-frequency clauses, this is now only true for queries with 6 clauses or more. Also, WAND performs faster than MAXSCORE here with fewer than 8 clauses.

I'd like to avoid going too far wrt picking the optimal implementation based on the query, which could get quite messy. Maybe we could introduce simple heuristics in a follow-up, such as only using the bulk scorer if the cost is high enough that we'd expect more than X matches per 2048-bit window on average.
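As a rough illustration of what such a heuristic could look like, here is a sketch that estimates matches per window from a cost and the number of documents. Everything in it is an assumption made for illustration: the class and method names are hypothetical, matches are assumed to be spread uniformly across doc IDs, and the threshold X is left as a parameter since no value has been chosen here.

```java
/** Hypothetical density heuristic; not part of this PR. */
final class WindowDensityHeuristic {
  static final int WINDOW_SIZE = 2048;

  /** Expected matches per window, assuming matches are uniformly spread over [0, maxDoc). */
  static double expectedMatchesPerWindow(long cost, int maxDoc) {
    if (maxDoc <= 0) {
      throw new IllegalArgumentException("maxDoc must be positive, got " + maxDoc);
    }
    return (double) cost * WINDOW_SIZE / maxDoc;
  }

  /** True if the estimated density exceeds the (yet to be chosen) threshold X. */
  static boolean useBitsetWindows(long cost, int maxDoc, double minMatchesPerWindow) {
    return expectedMatchesPerWindow(cost, maxDoc) > minMatchesPerWindow;
  }
}
```

For example, a cost of 1,000,000 over 10,240,000 documents gives an estimate of 200 matches per window, while a cost of 1,000 over the same index gives about 0.2.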

In general, this new MaxScoreBulkScorer feels like the best option to me, as it performs better on the slower queries that have high-frequency clauses, and its performance degrades more gracefully when the number of clauses increases.

jimczi

That's really cool that we handle more cases to apply max score in the bulk scorer!

> I'd like to avoid trying to go too far wrt picking the optimal implementation based on the query, which could get quite messy. Maybe we could introduce simple heuristics in a follow-up, such as only using the bulk scorer if the cost is high enough that we'd expect more than X matches per 2048-bits window on average.

The numbers you shared are a good compromise but ++ to remain open and add more heuristics in followups.

jpountz · 2023-07-19T08:12:49Z

I pushed a couple changes that helped improve performance on sparse clauses a bit, and updated the above performance numbers:

- Inner windows are no longer aligned with multiples of 2048.
- If there is a single essential clause that has matches in the first half of an inner window, then we collect this first half of the window in an optimized way, and recompute a new inner window based on the next match of essential clauses.
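The effect of dropping the alignment can be sketched as follows. This is only one reading of the first change, under the assumption that an inner window now starts at the next match of the essential clauses; `InnerWindowSketch` and its methods are hypothetical names, and the single-essential-clause fast path from the second change is not modeled.

```java
/** Hypothetical comparison of aligned and unaligned inner windows; not the PR's code. */
final class InnerWindowSketch {
  static final int WINDOW_SIZE = 2048;

  /** Aligned variant: the window start is rounded down to a multiple of 2048. */
  static int[] alignedWindow(int nextEssentialDoc, int outerMax) {
    int min = nextEssentialDoc & ~(WINDOW_SIZE - 1);
    int max = (int) Math.min((long) min + WINDOW_SIZE, outerMax);
    return new int[] {min, max};
  }

  /** Unaligned variant: the window starts exactly at the next essential match. */
  static int[] unalignedWindow(int nextEssentialDoc, int outerMax) {
    int min = nextEssentialDoc;
    int max = (int) Math.min((long) min + WINDOW_SIZE, outerMax);
    return new int[] {min, max};
  }
}
```

With a next match at doc 5000, the aligned window is [4096, 6144), so 904 slots before the first match are wasted, whereas the unaligned window is [5000, 7048) and covers more of the docs that may actually match.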

Fix small bug.

e6cde8a

fix test

8526065

jpountz mentioned this pull request Jul 17, 2023

Switch from MAXSCORE to BS1 with high numbers of clauses #12439

Closed

jimczi approved these changes Jul 17, 2023

Improve sparse clauses a bit by more often skipping the bitset.

2767ab6

jpountz added 5 commits July 19, 2023 10:13

No longer align windows to multiples of 2048.

e1763d1

Remove unused member.

bd2a4c4

Remove leftover.

199f0ba

Add CHANGES.

2ec459f

Simplify.

17df528

jpountz added this to the 9.8.0 milestone Jul 19, 2023

jpountz merged commit 17c13a7 into apache:main Jul 19, 2023

jpountz deleted the bs1_maxscore branch July 19, 2023 11:51

jpountz mentioned this pull request Jul 24, 2023

Investigate slow fuzzy queries #12456

Closed

jpountz mentioned this pull request Dec 5, 2023

Upgrade to Lucene 9.9 castorini/anserini#2288

Closed
