Restore WANDScorer for TOP_SCORES + minShouldMatch > 1 #16176

Merged: mccullocht merged 5 commits into apache:main from txwei:use-wandscorer-topscores-msm on Jun 4, 2026

@txwei txwei commented Jun 2, 2026

Description

Lucene #13408 (released in 10.0) dropped a guard in BooleanScorerSupplier.optionalBulkScorer that previously kept TOP_SCORES + minShouldMatch > 1 queries on WANDScorer. As a side effect, when matches are dense, dispatch now lands on BooleanScorer, which has no top-K impact pruning — causing 10–100× latency regressions on pure-should compound queries with minimumShouldMatch > 1.
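
The dispatch described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `DispatchSketch`, `Mode`, `Scorer`, `choose`, and the `guardPresent` flag are hypothetical names invented for illustration, and none of this is Lucene's actual code in `BooleanScorerSupplier.optionalBulkScorer`. It covers only the case the description talks about, a pure-should query with dense matches.

```java
// Hypothetical model of the dispatch described in this PR; not Lucene's real code.
final class DispatchSketch {
    /** Stand-in for Lucene's ScoreMode, reduced to the two values this sketch needs. */
    enum Mode { COMPLETE, TOP_SCORES }

    /** Labels for the two scorers named in the PR description. */
    enum Scorer { WAND_SCORER, BOOLEAN_SCORER }

    /**
     * Picks a scorer for a pure-should query whose matches are dense.
     * guardPresent = true models the behavior before #13408 and after this PR;
     * guardPresent = false models the regression in between.
     */
    static Scorer choose(Mode mode, int minShouldMatch, boolean guardPresent) {
        // The guard: keep TOP_SCORES + minShouldMatch > 1 on the pruning scorer.
        if (guardPresent && mode == Mode.TOP_SCORES && minShouldMatch > 1) {
            return Scorer.WAND_SCORER;
        }
        // Simplification: every other path is modeled as the non-pruning scorer,
        // which is where the description says dense matches end up.
        return Scorer.BOOLEAN_SCORER;
    }

    public static void main(String[] args) {
        System.out.println("with guard:    " + choose(Mode.TOP_SCORES, 2, true));
        System.out.println("without guard: " + choose(Mode.TOP_SCORES, 2, false));
    }
}
```

With the guard in place the modeled query stays on the scorer that can prune by top-K impacts; without it, the same query falls through to the scorer that cannot, which is the regression this PR addresses.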

@github-actions github-actions Bot added this to the 11.0.0 milestone Jun 2, 2026

@romseygeek romseygeek left a comment

Thanks for opening @txwei - all looks great. Do you want to remove the Draft flag?

Comment thread lucene/CHANGES.txt Outdated

Bug Fixes
---------------------
* GITHUB#16176: Restore WANDScroer for TOP_SCORES + minShouldMatch > 1. (Tianxiao Wei)

Let's move this to the 10.5 section?

Also, small typo: s/WANDScroer/WANDScorer/

Agreed, let's please move this to 10.5. I'm happy to handle the backport.

thanks for pointing out the typo! fixed and moved to 10.5

@mccullocht mccullocht modified the milestones: 11.0.0, 10.5.0 Jun 3, 2026
@txwei txwei marked this pull request as ready for review June 3, 2026 18:35

txwei commented Jun 3, 2026

ConstMSM2 got a huge boost from this change (added in mikemccand/luceneutil#583)

                        Task  QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff  p-value
 BrowseRandomLabelTaxoFacets       41.72     (37.7%)       30.90     (36.4%)  -25.9% ( -72% -   77%) 0.027
   BrowseDayOfYearTaxoFacets       38.39     (25.0%)       32.08     (25.3%)  -16.4% ( -53% -   45%) 0.039
        BrowseDateTaxoFacets       37.97     (25.4%)       31.73     (25.7%)  -16.4% ( -53% -   46%) 0.042
                     Prefix3     1161.58     (16.9%)     1057.12     (16.9%)   -9.0% ( -36% -   29%) 0.092
       BrowseMonthTaxoFacets       37.59     (28.1%)       35.18     (28.3%)   -6.4% ( -49% -   69%) 0.473
        BrowseDateSSDVFacets        4.87     (15.8%)        4.59     (14.9%)   -5.9% ( -31% -   29%) 0.228
                    PKLookup      545.48      (3.3%)      534.94      (4.1%)   -1.9% (  -8% -    5%) 0.097
   BrowseDayOfYearSSDVFacets       27.29      (6.5%)       26.82      (8.9%)   -1.7% ( -16% -   14%) 0.479
             LowSloppyPhrase      476.06      (6.8%)      470.08      (4.9%)   -1.3% ( -12% -   11%) 0.505
             MedSloppyPhrase      350.36      (5.5%)      346.47      (6.3%)   -1.1% ( -12% -   11%) 0.553
         MedIntervalsOrdered      273.48      (7.8%)      270.54      (6.7%)   -1.1% ( -14% -   14%) 0.641
                      IntNRQ      367.35      (8.0%)      364.00      (8.0%)   -0.9% ( -15% -   16%) 0.720
                   OrHighMed      902.15     (10.5%)      894.41      (8.2%)   -0.9% ( -17% -   19%) 0.773
                OrHighNotMed     1426.98      (9.2%)     1415.23      (8.7%)   -0.8% ( -17% -   18%) 0.771
        HighTermTitleBDVSort       39.72      (5.8%)       39.44      (5.0%)   -0.7% ( -10% -   10%) 0.682
                       range     6620.39      (7.7%)     6577.13      (8.1%)   -0.7% ( -15% -   16%) 0.794
                    HighTerm     1626.82     (18.5%)     1618.18     (19.2%)   -0.5% ( -32% -   45%) 0.929
                     MedTerm     2563.69     (10.2%)     2557.28     (10.3%)   -0.3% ( -18% -   22%) 0.938
                   LowPhrase      256.40      (8.5%)      255.78      (9.0%)   -0.2% ( -16% -   18%) 0.930
     AndHighMedDayTaxoFacets      207.86      (5.8%)      207.46      (5.3%)   -0.2% ( -10% -   11%) 0.913
               OrNotHighHigh      312.77      (9.1%)      312.23      (5.6%)   -0.2% ( -13% -   15%) 0.943
                  TermDTSort      473.69      (7.9%)      473.07      (8.5%)   -0.1% ( -15% -   17%) 0.960
        MedTermDayTaxoFacets       80.46      (3.8%)       80.36      (3.7%)   -0.1% (  -7% -    7%) 0.916
                 LowSpanNear      336.96      (6.7%)      336.55      (6.1%)   -0.1% ( -12% -   13%) 0.952
                    BM25MSM2       35.71      (7.8%)       35.68      (4.4%)   -0.1% ( -11% -   13%) 0.972
                      Fuzzy1      157.30      (4.1%)      157.49      (4.1%)    0.1% (  -7% -    8%) 0.928
                   MedPhrase      259.04      (5.0%)      259.42      (6.8%)    0.1% ( -11% -   12%) 0.938
                HighSpanNear       43.17      (6.9%)       43.24      (5.1%)    0.2% ( -11% -   13%) 0.934
                 MedSpanNear      206.48      (6.8%)      207.29      (4.0%)    0.4% (  -9% -   11%) 0.824
 BrowseRandomLabelSSDVFacets       19.44      (5.2%)       19.55      (4.9%)    0.6% (  -9% -   11%) 0.728
        HighIntervalsOrdered       65.14      (7.9%)       65.53      (7.5%)    0.6% ( -13% -   17%) 0.802
                   OrHighLow     1723.90      (8.1%)     1734.60      (6.8%)    0.6% ( -13% -   16%) 0.794
                     Respell       94.60      (3.6%)       95.25      (2.5%)    0.7% (  -5% -    6%) 0.480
                      Fuzzy2      123.17      (3.9%)      124.03      (4.1%)    0.7% (  -7% -    9%) 0.581
      OrHighMedDayTaxoFacets       50.34      (4.3%)       50.72      (4.0%)    0.8% (  -7% -    9%) 0.564
                  HighPhrase      256.40      (7.3%)      258.71      (3.1%)    0.9% (  -8% -   12%) 0.611
                OrHighNotLow     1337.50     (14.2%)     1355.25     (14.3%)    1.3% ( -23% -   34%) 0.768
              AndMissingHigh     4703.22      (8.9%)     4770.60      (8.2%)    1.4% ( -14% -   20%) 0.596
                    Wildcard      259.76      (5.7%)      263.64      (5.4%)    1.5% (  -9% -   13%) 0.395
                OrNotHighMed      866.83     (11.6%)      879.82      (7.1%)    1.5% ( -15% -   22%) 0.621
    AndHighHighDayTaxoFacets      109.35      (5.5%)      111.09      (3.8%)    1.6% (  -7% -   11%) 0.293
                     LowTerm     3125.36      (9.2%)     3175.31     (12.2%)    1.6% ( -18% -   25%) 0.640
            HighSloppyPhrase      180.61      (6.5%)      183.87      (3.5%)    1.8% (  -7% -   12%) 0.276
       HighTermDayOfYearSort      436.82     (10.4%)      445.08      (7.2%)    1.9% ( -14% -   21%) 0.504
                  OrHighHigh      490.41     (13.4%)      499.84     (12.1%)    1.9% ( -20% -   31%) 0.634
       BrowseMonthSSDVFacets       26.66      (5.5%)       27.21      (9.2%)    2.0% ( -12% -   17%) 0.397
                      IntSet     1406.92     (11.3%)     1436.38     (10.7%)    2.1% ( -17% -   27%) 0.548
                OrNotHighLow     2467.02      (9.7%)     2524.55      (9.9%)    2.3% ( -15% -   24%) 0.452
           HighTermMonthSort     1857.64      (8.2%)     1903.14      (5.9%)    2.4% ( -10% -   17%) 0.277
                 AndHighHigh      677.69     (14.4%)      695.55     (11.9%)    2.6% ( -20% -   33%) 0.528
               OrHighNotHigh      452.07     (10.6%)      465.63      (8.3%)    3.0% ( -14% -   24%) 0.321
                  AndHighMed      627.13     (14.2%)      646.17     (10.1%)    3.0% ( -18% -   31%) 0.436
         LowIntervalsOrdered      330.30     (10.6%)      340.78      (5.6%)    3.2% ( -11% -   21%) 0.237
                  AndHighLow     2718.11      (9.9%)     2819.57      (7.9%)    3.7% ( -12% -   23%) 0.188
           HighTermTitleSort      176.63      (9.5%)      183.39      (7.6%)    3.8% ( -12% -   23%) 0.160
                   ConstMSM2       29.52      (5.7%)     2061.67    (423.8%) 6884.8% (6107% - 7755%) 0.000
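
As a sanity check on the ConstMSM2 row, the Pct diff column can be recomputed from the two QPS columns. The sketch below assumes Pct diff is the relative change in mean QPS; that formula is an assumption about the benchmark output rather than something stated in this thread, and `QpsDiffSketch` and `pctDiff` are hypothetical names. Because the table shows rounded QPS values, the recomputed figure lands within about one percentage point of the reported 6884.8% rather than matching it exactly.

```java
// Hypothetical helper for re-deriving the "Pct diff" column from rounded QPS values.
final class QpsDiffSketch {
    /** Relative change of candidate QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 29.52;    // ConstMSM2, QPS baseline
        double candidate = 2061.67; // ConstMSM2, QPS my_modified_version
        System.out.printf("ConstMSM2 pct diff: %.1f%%%n", pctDiff(baseline, candidate));
        System.out.printf("ConstMSM2 speedup:  %.1fx%n", candidate / baseline);
    }
}
```

The same function applied to BM25MSM2 (35.71 to 35.68 QPS) gives a change of roughly -0.1%, in line with that row of the table.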

@mccullocht mccullocht merged commit 135b348 into apache:main Jun 4, 2026
13 checks passed
mccullocht pushed a commit that referenced this pull request Jun 4, 2026