Change the MAXSCORE scorer to a bulk scorer. #12361

jpountz · 2023-06-09T16:42:17Z

We currently use block-max maxscore for top-level disjunctions, implemented as
a scorer. Since we only use it for top-level disjunctions, we could actually
implement it as a bulk scorer, which helps save some overhead. luceneutil
reports the following numbers on wikimedium10m:

                            Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                    HighSpanNear        9.15      (3.7%)        9.03      (3.5%)   -1.4% (  -8% -    6%) 0.224
                         Prefix3      434.41      (2.2%)      429.16      (2.1%)   -1.2% (  -5% -    3%) 0.080
            MedTermDayTaxoFacets       37.33      (6.3%)       36.88      (6.6%)   -1.2% ( -13% -   12%) 0.558
                      AndHighLow     1315.31      (3.1%)     1299.90      (3.9%)   -1.2% (  -7% -    6%) 0.294
                     MedSpanNear       42.42      (2.5%)       41.96      (2.3%)   -1.1% (  -5% -    3%) 0.153
          OrHighMedDayTaxoFacets        5.66      (4.9%)        5.60      (4.9%)   -1.1% ( -10% -    9%) 0.488
                HighSloppyPhrase       16.72      (3.5%)       16.57      (5.4%)   -0.9% (  -9% -    8%) 0.539
                        Wildcard      129.90      (3.9%)      128.92      (3.8%)   -0.8% (  -8% -    7%) 0.537
                      HighPhrase       68.61      (5.5%)       68.10      (4.4%)   -0.7% ( -10% -    9%) 0.637
                       MedPhrase       27.46      (3.9%)       27.26      (3.5%)   -0.7% (  -7% -    6%) 0.538
     BrowseRandomLabelSSDVFacets       14.70      (7.4%)       14.61      (7.7%)   -0.7% ( -14% -   15%) 0.779
                         LowTerm      816.63      (5.6%)      811.34      (5.0%)   -0.6% ( -10% -   10%) 0.699
           BrowseMonthSSDVFacets       20.41      (1.1%)       20.28      (2.0%)   -0.6% (  -3% -    2%) 0.207
                       LowPhrase       43.61      (3.4%)       43.35      (3.0%)   -0.6% (  -6% -    6%) 0.561
                          Fuzzy1      135.81      (1.2%)      135.42      (1.5%)   -0.3% (  -3% -    2%) 0.504
                     LowSpanNear      114.09      (1.7%)      113.78      (1.8%)   -0.3% (  -3% -    3%) 0.626
                          Fuzzy2       71.78      (1.1%)       71.60      (1.0%)   -0.2% (  -2% -    1%) 0.454
        AndHighHighDayTaxoFacets       31.06      (2.2%)       30.98      (2.3%)   -0.2% (  -4% -    4%) 0.730
                          IntNRQ       88.61      (5.8%)       88.46      (5.7%)   -0.2% ( -11% -   12%) 0.926
               HighTermMonthSort     3779.27      (3.8%)     3775.75      (3.3%)   -0.1% (  -6% -    7%) 0.934
         AndHighMedDayTaxoFacets       58.44      (1.8%)       58.42      (1.9%)   -0.0% (  -3% -    3%) 0.948
                         Respell       80.73      (1.6%)       80.82      (1.4%)    0.1% (  -2% -    3%) 0.815
                         MedTerm      731.12      (6.8%)      732.37      (7.1%)    0.2% ( -12% -   15%) 0.938
                        PKLookup      236.79      (4.4%)      237.48      (4.6%)    0.3% (  -8% -    9%) 0.838
                      TermDTSort      181.53      (2.7%)      182.14      (2.1%)    0.3% (  -4% -    5%) 0.661
           HighTermDayOfYearSort      422.38      (3.2%)      423.81      (3.6%)    0.3% (  -6% -    7%) 0.752
                 LowSloppyPhrase       46.81      (2.7%)       46.98      (3.0%)    0.3% (  -5% -    6%) 0.696
                      AndHighMed      342.09      (4.6%)      343.63      (3.8%)    0.4% (  -7% -    9%) 0.737
                     AndHighHigh       46.06      (6.6%)       46.28      (5.8%)    0.5% ( -11% -   13%) 0.809
            HighTermTitleBDVSort       23.23      (3.6%)       23.34      (3.2%)    0.5% (  -6% -    7%) 0.650
                        HighTerm      685.44      (7.3%)      689.42      (7.5%)    0.6% ( -13% -   16%) 0.804
               HighTermTitleSort      156.76      (5.8%)      157.96      (5.6%)    0.8% ( -10% -   12%) 0.671
            HighIntervalsOrdered       25.11      (5.2%)       25.32      (5.2%)    0.8% (  -9% -   11%) 0.607
                    OrNotHighLow     1803.79      (3.7%)     1819.26      (3.5%)    0.9% (  -6% -    8%) 0.452
             LowIntervalsOrdered       62.41      (3.9%)       63.01      (3.7%)    1.0% (  -6% -    8%) 0.423
                    OrNotHighMed      456.34      (3.5%)      460.92      (4.3%)    1.0% (  -6% -    9%) 0.419
                    OrHighNotLow      365.78      (8.4%)      369.70      (9.0%)    1.1% ( -15% -   20%) 0.698
                   OrNotHighHigh      272.99      (6.9%)      276.13      (7.7%)    1.2% ( -12% -   16%) 0.618
                   OrHighNotHigh      438.11      (6.5%)      443.93      (7.3%)    1.3% ( -11% -   16%) 0.543
                    OrHighNotMed      371.40      (7.4%)      376.34      (8.4%)    1.3% ( -13% -   18%) 0.595
                 MedSloppyPhrase        6.47      (4.8%)        6.56      (4.7%)    1.4% (  -7% -   11%) 0.357
            BrowseDateSSDVFacets        5.50      (9.0%)        5.61     (10.0%)    2.0% ( -15% -   23%) 0.509
             MedIntervalsOrdered       29.98      (5.3%)       30.58      (5.5%)    2.0% (  -8% -   13%) 0.242
       BrowseDayOfYearSSDVFacets       19.88      (8.8%)       20.62     (14.4%)    3.7% ( -17% -   29%) 0.322
           BrowseMonthTaxoFacets       26.63     (20.6%)       27.75     (15.2%)    4.2% ( -26% -   50%) 0.462
                       OrHighLow      555.47      (3.1%)      579.50      (2.4%)    4.3% (  -1% -   10%) 0.000
     BrowseRandomLabelTaxoFacets       31.81     (23.1%)       33.32     (19.2%)    4.8% ( -30% -   61%) 0.477
            BrowseDateTaxoFacets       38.87     (24.8%)       40.81     (18.7%)    5.0% ( -30% -   64%) 0.471
       BrowseDayOfYearTaxoFacets       39.19     (24.8%)       41.52     (18.8%)    5.9% ( -30% -   65%) 0.394
                       OrHighMed      138.70      (3.4%)      149.08      (3.4%)    7.5% (   0% -   14%) 0.000
                      OrHighHigh       44.38      (3.3%)       51.20      (3.8%)   15.4% (   8% -   23%) 0.000

OrHighHigh, OrHighMed and OrHighLow all get a speedup with this change.
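
To illustrate where the saved overhead might come from, here is a minimal sketch of the difference between a pull-based scorer and a push-based bulk scorer for a disjunction. It is a toy model, not the code in this PR and not Lucene's API: `BulkScorerSketch`, `PullDisjunction`, `scoreWindow` and the `int[][]` postings layout are all hypothetical names chosen for the example. The point is only that the push variant owns the loop over a whole doc ID window, while the pull variant pays per-document call overhead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Toy model (hypothetical names, not Lucene classes) of pull vs. push disjunction scoring. */
final class BulkScorerSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Pull model: the caller drives iteration and asks for one matching doc at a time. */
  static final class PullDisjunction {
    private final int[][] postings; // one sorted doc ID list per clause
    private final int[] pos;        // current position in each list

    PullDisjunction(int[][] postings) {
      this.postings = postings;
      this.pos = new int[postings.length];
    }

    /** Returns the next doc matching at least one clause, or NO_MORE_DOCS. */
    int nextDoc() {
      int min = NO_MORE_DOCS;
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length) {
          min = Math.min(min, postings[i][pos[i]]);
        }
      }
      if (min == NO_MORE_DOCS) {
        return NO_MORE_DOCS;
      }
      // Advance every clause that is positioned on the returned doc.
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length && postings[i][pos[i]] == min) {
          pos[i]++;
        }
      }
      return min;
    }
  }

  /** Push model: handles the window [min, max) in one call and pushes hits to the collector. */
  static void scoreWindow(int[][] postings, int min, int max, IntConsumer collector) {
    boolean[] matches = new boolean[max - min];
    for (int[] list : postings) {
      for (int doc : list) {
        if (doc >= min && doc < max) {
          matches[doc - min] = true;
        }
      }
    }
    for (int i = 0; i < matches.length; i++) {
      if (matches[i]) {
        collector.accept(min + i);
      }
    }
  }

  public static void main(String[] args) {
    int[][] postings = {{1, 5, 9}, {2, 5, 7}};
    List<Integer> pushed = new ArrayList<>();
    scoreWindow(postings, 0, 10, pushed::add);
    System.out.println("bulk-scored docs: " + pushed);
  }
}
```

Both variants visit the same documents. The bulk form is only possible when nothing above the disjunction needs to pull documents from it one at a time, which is why it fits top-level disjunctions.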

This reverts commit a9ddf69.

This reverts commit 813fccc.

zacharymorn

Thanks @jpountz for the PR! I had thought about comparing your approach again after mine was merged, but somehow lost track of it. The changes look good to me, although I wonder whether some of the utility methods, such as list re-partitioning, could be shared between the two scorer implementations. We can treat those as future improvements.
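
For readers unfamiliar with the re-partitioning step mentioned above, the following is a rough sketch of the textbook MAXSCORE partition, not the utility method from either implementation. It assumes clauses are already sorted by ascending maximum score; `MaxScorePartitionSketch` and `firstEssential` are hypothetical names. Clauses whose cumulative maximum score stays below the minimum competitive score are non-essential: a document matching only those clauses cannot be competitive.

```java
/** Textbook MAXSCORE partitioning sketch (hypothetical helper, not Lucene code). */
final class MaxScorePartitionSketch {

  /**
   * Returns the index of the first essential clause.
   *
   * @param sortedMaxScores per-clause maximum scores, sorted in ascending order
   * @param minCompetitiveScore score a hit must reach to enter the top hits
   */
  static int firstEssential(float[] sortedMaxScores, float minCompetitiveScore) {
    double sum = 0;
    int i = 0;
    while (i < sortedMaxScores.length) {
      sum += sortedMaxScores[i];
      if (sum >= minCompetitiveScore) {
        // Clauses 0..i together could produce a competitive hit, so clause i is essential.
        break;
      }
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    float[] maxScores = {0.5f, 1.0f, 2.0f, 4.0f};
    System.out.println("first essential clause: " + firstEssential(maxScores, 2.0f));
  }
}
```

The partition has to be recomputed whenever the minimum competitive score rises or the per-clause maximum scores change, for example at a block boundary, which is why both implementations need some form of this step.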

jpountz · 2023-06-13T11:58:42Z

Your comment reminded me that I had planned to remove the scorer (as opposed to bulk scorer) implementation of block-max maxscore. I just pushed a commit that does this. Does it make sense to you?

jpountz added 9 commits June 23, 2022 15:23

Add a Block-Max Maxscore bulk scorer.

bd4e2c7

simplify

9fca390

Merge branch 'main' into maxscore

1628356

iter

813fccc

iter

a9ddf69

Revert "iter"

28db63e

This reverts commit a9ddf69.

Revert "iter"

245a7db

This reverts commit 813fccc.

Merge branch 'main' into maxscore

6096484

tests

8406f6f

jpountz requested a review from zacharymorn June 9, 2023 16:42

zacharymorn approved these changes Jun 10, 2023

Remove unused file.

488418c

jpountz added 4 commits June 20, 2023 18:17

Merge branch 'main' into maxscore

44f1377

Add CHANGES.

60a4edb

remove unused code

6b3c436

Add comment.

f7e3dd8

jpountz merged commit 8703e44 into apache:main Jun 20, 2023
4 checks passed

jpountz deleted the maxscore branch June 20, 2023 16:55

jpountz added this to the 9.8.0 milestone Jun 20, 2023

jpountz added a commit that referenced this pull request Jun 20, 2023

Change the MAXSCORE scorer to a bulk scorer. (#12361)

0eb7480
