LUCENE-10480: Use BMM scorer for 2 clauses disjunction #972

zacharymorn
Contributor

Description (or a Jira issue link if you have one)

Use the Block-Max-Maxscore (BMM) algorithm for 2-clause disjunctions. Adapted from PR #101.
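
For readers unfamiliar with the idea, here is a minimal Python sketch of the clause partitioning that MaxScore-style algorithms rely on. It is not the code in this PR: `partition_clauses` and its parameters are hypothetical names, and it uses one global max score per clause, whereas the block-max variant would refresh those bounds per block of documents.

```python
from typing import List, Sequence, Tuple


def partition_clauses(
    max_scores: Sequence[float], min_competitive_score: float
) -> Tuple[List[int], List[int]]:
    """Split clause indices into (essential, non_essential).

    Clauses are visited by increasing max score. A clause is non-essential
    while the summed max scores of all clauses picked so far stay below the
    minimum competitive score: a document matching only those clauses can
    never enter the top hits, so they do not need to drive iteration.
    """
    order = sorted(range(len(max_scores)), key=lambda i: max_scores[i])
    non_essential: List[int] = []
    cumulative = 0.0
    for idx in order:
        if cumulative + max_scores[idx] < min_competitive_score:
            cumulative += max_scores[idx]
            non_essential.append(idx)
        else:
            break
    essential = order[len(non_essential):]
    return essential, non_essential
```

With two clauses whose max scores are 3.0 and 1.0, both are essential while the minimum competitive score is 0; once it rises to 2.0, the weaker clause becomes non-essential and only the stronger one needs to be iterated.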

@zacharymorn zacharymorn requested a review from jpountz June 22, 2022 03:51
@zacharymorn
Contributor Author

Hi @jpountz, I have adapted the original BMM PR #101 to the latest codebase and run further experiments on using it for 2-clause disjunctions. The results look both encouraging and strange :D

When I run python3 src/python/localrun.py -source wikimedium10m with only the OrHighLow, OrHighHigh and OrHighMed tasks from tasks/wikimedium.10M.nostopwords.tasks (by removing the other tasks), I get a pretty impressive speedup on average:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      173.31     (24.6%)      181.79     (26.8%)    4.9% ( -37% -   74%) 0.547
                       OrHighLow      166.70     (62.8%)      385.94    (101.5%)  131.5% ( -20% -  794%) 0.000
                      OrHighHigh        9.27     (48.9%)       23.44     (85.9%)  152.9% (  12% -  562%) 0.000
                       OrHighMed       18.45     (61.3%)       55.92    (137.3%)  203.0% (   2% - 1037%) 0.000
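
As a reading aid for these tables: the "Pct diff" column appears to be the relative change in QPS from the baseline to the modified version. A minimal sketch, assuming that interpretation (`pct_diff` is a hypothetical helper, not part of luceneutil):

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative QPS change of the candidate over the baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# OrHighLow row above: baseline 166.70 QPS, modified 385.94 QPS
print(f"{pct_diff(166.70, 385.94):.1f}%")
```

Because the table prints rounded QPS values, recomputing a row this way can differ from the printed percentage in the last digit.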

However, when I run all the tasks, OrHighLow, OrHighHigh and OrHighMed show only a moderate speedup on average and are sometimes even slightly negatively impacted:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                      OrHighHigh       35.23      (7.2%)       23.86      (7.0%)  -32.3% ( -43% -  -19%) 0.000
                       OrHighLow      898.97      (4.4%)      788.65      (4.2%)  -12.3% ( -20% -   -3%) 0.000
            BrowseDateSSDVFacets        2.62     (27.0%)        2.43     (18.8%)   -7.4% ( -41% -   52%) 0.312
                    HighSpanNear       21.86      (6.4%)       21.00      (6.1%)   -4.0% ( -15% -    9%) 0.045
                          Fuzzy2       94.11     (12.4%)       90.59      (9.8%)   -3.7% ( -23% -   21%) 0.290
                 LowSloppyPhrase       65.63      (8.2%)       63.99      (8.6%)   -2.5% ( -17% -   15%) 0.347
                HighSloppyPhrase       17.25      (5.3%)       16.84      (5.3%)   -2.4% ( -12% -    8%) 0.154
                      TermDTSort      160.18      (8.2%)      156.49      (9.9%)   -2.3% ( -18% -   17%) 0.423
           HighTermDayOfYearSort      164.86      (6.8%)      161.77     (10.1%)   -1.9% ( -17% -   16%) 0.490
          OrHighMedDayTaxoFacets       11.05      (7.1%)       10.86      (7.3%)   -1.7% ( -15% -   13%) 0.465
                      AndHighLow     1482.47      (4.0%)     1459.63     (10.6%)   -1.5% ( -15% -   13%) 0.544
                     MedSpanNear       27.77      (7.2%)       27.49      (6.1%)   -1.0% ( -13% -   13%) 0.628
            HighTermTitleBDVSort      197.53      (7.4%)      195.53      (6.3%)   -1.0% ( -13% -   13%) 0.640
         AndHighMedDayTaxoFacets       43.61      (8.7%)       43.19     (10.1%)   -1.0% ( -18% -   19%) 0.745
            HighIntervalsOrdered       17.38      (8.7%)       17.26      (7.5%)   -0.7% ( -15% -   16%) 0.782
                      HighPhrase      454.15      (5.0%)      451.67      (8.7%)   -0.5% ( -13% -   13%) 0.807
     BrowseRandomLabelSSDVFacets       15.40      (8.1%)       15.32      (7.3%)   -0.5% ( -14% -   16%) 0.837
        AndHighHighDayTaxoFacets       16.94      (7.0%)       16.87      (6.6%)   -0.5% ( -13% -   14%) 0.834
                     LowSpanNear        9.08      (4.8%)        9.05      (4.3%)   -0.3% (  -9% -    9%) 0.838
                        Wildcard       55.15     (11.3%)       55.01     (12.0%)   -0.2% ( -21% -   26%) 0.947
                       MedPhrase      976.56      (2.8%)      977.29      (3.3%)    0.1% (  -5% -    6%) 0.939
            MedTermDayTaxoFacets       77.21      (8.6%)       77.46      (8.7%)    0.3% ( -15% -   19%) 0.908
                    OrNotHighLow     1187.34      (5.1%)     1191.80      (5.3%)    0.4% (  -9% -   11%) 0.819
                   OrHighNotHigh     1556.42      (4.4%)     1566.26      (4.5%)    0.6% (  -7% -    9%) 0.654
             LowIntervalsOrdered      158.96      (6.4%)      160.03      (8.9%)    0.7% ( -13% -   17%) 0.785
                   OrNotHighHigh     1427.22      (3.8%)     1436.97      (5.0%)    0.7% (  -7% -    9%) 0.628
                          Fuzzy1      116.55     (11.4%)      117.41      (9.4%)    0.7% ( -18% -   24%) 0.823
                         LowTerm     3470.46      (5.9%)     3500.25      (5.9%)    0.9% ( -10% -   13%) 0.644
               HighTermMonthSort      169.22     (10.4%)      170.68     (14.9%)    0.9% ( -22% -   29%) 0.832
                          IntNRQ      115.77     (22.6%)      116.95     (21.3%)    1.0% ( -34% -   57%) 0.883
                         MedTerm     3042.06      (4.5%)     3080.17      (5.4%)    1.3% (  -8% -   11%) 0.427
                        HighTerm     2407.19      (5.5%)     2440.56      (4.1%)    1.4% (  -7% -   11%) 0.369
                         Prefix3      396.92     (10.2%)      403.19      (8.6%)    1.6% ( -15% -   22%) 0.595
                    OrNotHighMed     1695.31      (3.6%)     1722.43      (5.5%)    1.6% (  -7% -   11%) 0.274
                 MedSloppyPhrase       13.19      (4.5%)       13.40      (5.0%)    1.6% (  -7% -   11%) 0.283
                    OrHighNotLow     1473.94      (6.7%)     1500.95      (6.6%)    1.8% ( -10% -   16%) 0.383
                      AndHighMed      201.69      (4.5%)      205.65      (9.1%)    2.0% ( -11% -   16%) 0.387
                        PKLookup      247.69     (11.3%)      253.24      (9.6%)    2.2% ( -16% -   26%) 0.499
             MedIntervalsOrdered       30.40      (8.1%)       31.13      (7.7%)    2.4% ( -12% -   19%) 0.338
                    OrHighNotMed     1534.55      (4.5%)     1571.83      (3.9%)    2.4% (  -5% -   11%) 0.068
                         Respell       90.55      (7.9%)       92.75      (8.8%)    2.4% ( -13% -   20%) 0.359
                     AndHighHigh       65.14      (7.1%)       67.16      (8.3%)    3.1% ( -11% -   19%) 0.206
       BrowseDayOfYearSSDVFacets       20.96      (9.7%)       21.65     (11.1%)    3.3% ( -15% -   26%) 0.320
                       LowPhrase       63.71      (6.9%)       65.86      (9.2%)    3.4% ( -11% -   20%) 0.191
           BrowseMonthSSDVFacets       22.49     (13.6%)       23.62     (14.8%)    5.0% ( -20% -   38%) 0.263
           BrowseMonthTaxoFacets       26.25     (43.5%)       34.10     (40.2%)   29.9% ( -37% -  200%) 0.024
            BrowseDateTaxoFacets       22.04     (40.4%)       29.87     (63.1%)   35.5% ( -48% -  233%) 0.034
       BrowseDayOfYearTaxoFacets       22.07     (39.3%)       31.04     (64.1%)   40.6% ( -45% -  236%) 0.016
                       OrHighMed       59.30      (9.3%)       84.18     (20.1%)   41.9% (  11% -   78%) 0.000
     BrowseRandomLabelTaxoFacets       20.38     (52.4%)       30.77     (88.6%)   50.9% ( -59% -  403%) 0.027

This seems to suggest that tasks may interfere with each other rather than run independently? Do you have any suggestions on where I can look next to confirm the performance impact of this change?

@jpountz
Contributor

jpountz commented Jun 22, 2022

My best guess would be that you are seeing different results mostly because luceneutil picks random queries, and the run that only had disjunctions picked queries that happened to like your change better than the run that included all tasks? These are very impressive speedups indeed!

@zacharymorn
Contributor Author

Thanks @jpountz for looking into this! I ran further experiments, and the results seem to suggest the discrepancy may be caused by a bug or caching in the util or in Lucene itself.

What I did was first keep only 1 query per pure disjunction task and remove the rest of the tasks, like below.

OrHighHigh: several following # freq=436129 freq=416515
OrHighMed: international chris # freq=418261 freq=85523
OrHighLow: 2005 valois # freq=835460 freq=2277

and got this result:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      189.27     (23.2%)      220.18     (16.1%)   16.3% ( -18% -   72%) 0.010
                      OrHighHigh       10.37     (41.8%)       20.92     (94.9%)  101.8% ( -24% -  410%) 0.000
                       OrHighMed       21.43     (54.3%)       56.18    (138.5%)  162.2% ( -19% -  777%) 0.000
                       OrHighLow      138.14     (26.2%)      368.50     (91.9%)  166.7% (  38% -  385%) 0.000

However, when I added back the rest of the tasks but still kept 1 query for each of the three disjunction tasks, I got vastly different results:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                      OrHighHigh       48.31      (7.8%)       38.21      (5.6%)  -20.9% ( -31% -   -8%) 0.000
                       OrHighMed      152.23      (8.5%)      140.16      (8.5%)   -7.9% ( -23% -    9%) 0.003
       BrowseDayOfYearSSDVFacets       21.67     (12.4%)       20.84      (7.1%)   -3.8% ( -20% -   17%) 0.231
                       MedPhrase      103.08      (5.3%)      102.43      (9.1%)   -0.6% ( -14% -   14%) 0.790
                      TermDTSort      162.97     (10.6%)      162.10      (6.4%)   -0.5% ( -15% -   18%) 0.847
                HighSloppyPhrase       62.12      (7.3%)       61.99      (5.7%)   -0.2% ( -12% -   13%) 0.921
                    OrHighNotMed     1216.05      (4.1%)     1216.03      (3.2%)   -0.0% (  -7% -    7%) 0.999
                        HighTerm     2088.45      (4.2%)     2091.84      (3.5%)    0.2% (  -7% -    8%) 0.895
           BrowseMonthSSDVFacets       23.34     (10.0%)       23.49     (11.0%)    0.6% ( -18% -   24%) 0.845
                         MedTerm     3189.76      (3.5%)     3215.19      (3.9%)    0.8% (  -6% -    8%) 0.497
                     AndHighHigh       59.64      (6.9%)       60.14      (6.3%)    0.8% ( -11% -   15%) 0.688
                     MedSpanNear       46.86      (5.9%)       47.26      (7.8%)    0.9% ( -12% -   15%) 0.692
            HighTermTitleBDVSort      125.65      (7.5%)      126.83     (10.1%)    0.9% ( -15% -   20%) 0.738
                       LowPhrase       21.25      (4.2%)       21.62      (4.4%)    1.8% (  -6% -   10%) 0.194
     BrowseRandomLabelSSDVFacets       15.38      (6.6%)       15.66     (10.1%)    1.8% ( -13% -   19%) 0.509
                    HighSpanNear       20.48      (5.6%)       20.86      (5.1%)    1.9% (  -8% -   13%) 0.270
           HighTermDayOfYearSort      187.51      (7.6%)      191.07      (9.4%)    1.9% ( -14% -   20%) 0.482
                    OrHighNotLow     1505.18     (10.5%)     1535.43      (4.4%)    2.0% ( -11% -   18%) 0.431
                 LowSloppyPhrase      233.02      (6.9%)      237.95      (8.4%)    2.1% ( -12% -   18%) 0.383
             MedIntervalsOrdered       18.37      (5.0%)       18.77      (5.3%)    2.2% (  -7% -   13%) 0.177
                    OrNotHighMed     1310.81      (4.0%)     1342.33      (5.1%)    2.4% (  -6% -   11%) 0.096
                 MedSloppyPhrase       40.20      (6.2%)       41.18      (5.7%)    2.4% (  -8% -   15%) 0.190
        AndHighHighDayTaxoFacets       13.90      (5.6%)       14.25      (3.7%)    2.6% (  -6% -   12%) 0.090
                      AndHighMed      566.84      (5.8%)      582.07      (5.7%)    2.7% (  -8% -   15%) 0.138
                   OrNotHighHigh     1976.63      (5.0%)     2030.44      (3.5%)    2.7% (  -5% -   11%) 0.044
                          Fuzzy2       50.72     (10.3%)       52.17      (7.9%)    2.9% ( -13% -   23%) 0.325
            MedTermDayTaxoFacets       79.53      (6.7%)       81.86      (7.5%)    2.9% ( -10% -   18%) 0.192
                   OrHighNotHigh     1169.61      (6.6%)     1204.25      (3.8%)    3.0% (  -6% -   14%) 0.080
                      HighPhrase      400.95      (5.5%)      413.11      (2.2%)    3.0% (  -4% -   11%) 0.022
          OrHighMedDayTaxoFacets       13.10      (5.0%)       13.50      (5.6%)    3.1% (  -7% -   14%) 0.066
         AndHighMedDayTaxoFacets       52.75      (6.4%)       54.42      (6.3%)    3.2% (  -8% -   16%) 0.115
                    OrNotHighLow     2842.51      (3.5%)     2935.45      (5.5%)    3.3% (  -5% -   12%) 0.025
                         LowTerm     3032.44      (3.7%)     3140.83      (3.0%)    3.6% (  -3% -   10%) 0.001
               HighTermMonthSort      210.58     (11.4%)      218.63     (11.3%)    3.8% ( -16% -   29%) 0.286
                          Fuzzy1      135.95      (8.1%)      141.35      (9.6%)    4.0% ( -12% -   23%) 0.158
                          IntNRQ      365.48     (17.1%)      380.05     (12.1%)    4.0% ( -21% -   40%) 0.395
                         Prefix3       72.71     (11.4%)       75.69      (9.5%)    4.1% ( -15% -   28%) 0.217
                        Wildcard      333.86      (9.4%)      347.59     (10.5%)    4.1% ( -14% -   26%) 0.192
                     LowSpanNear       40.56      (4.1%)       42.49      (6.4%)    4.8% (  -5% -   15%) 0.005
                         Respell       56.68      (7.6%)       59.68      (5.5%)    5.3% (  -7% -   19%) 0.012
                      AndHighLow     1062.10      (6.2%)     1118.34      (5.9%)    5.3% (  -6% -   18%) 0.005
            HighIntervalsOrdered        9.10      (4.5%)        9.64      (6.5%)    5.9% (  -4% -   17%) 0.001
             LowIntervalsOrdered       15.02      (5.1%)       15.96      (5.2%)    6.2% (  -3% -   17%) 0.000
            BrowseDateSSDVFacets        2.29      (6.5%)        2.44     (19.1%)    6.4% ( -18% -   34%) 0.158
                       OrHighLow      931.34      (4.9%)      995.65      (3.7%)    6.9% (  -1% -   16%) 0.000
                        PKLookup      244.93     (10.0%)      262.86     (11.4%)    7.3% ( -12% -   31%) 0.031
           BrowseMonthTaxoFacets       23.30     (42.2%)       28.97     (50.4%)   24.3% ( -47% -  202%) 0.098
            BrowseDateTaxoFacets       23.39     (44.9%)       29.54     (52.5%)   26.3% ( -49% -  224%) 0.089
       BrowseDayOfYearTaxoFacets       23.38     (44.2%)       30.09     (54.9%)   28.7% ( -48% -  228%) 0.069
     BrowseRandomLabelTaxoFacets       22.47     (55.6%)       29.49     (68.4%)   31.2% ( -59% -  349%) 0.113

I've attached the tasks file here for reference as well:

wikimedium.10M.nostopwords.tasks.txt

@mikemccand, do you have any suggestions on where this discrepancy might be coming from? I'll continue to run experiments as well to see if I can pinpoint the issue.

@jpountz
Contributor

jpountz commented Jun 23, 2022

The fact that queries perform slower in general in your first benchmark run makes me wonder if this could be due to insufficient warmup time. The default task repeat count of 20 might be too low for these queries that are very good at skipping irrelevant documents. Maybe try passing taskRepeatCount=100 in the ctor of your Competition object? Does it make queries run closer in terms of performance to your second benchmark run?
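
To illustrate why the repeat count could matter here, below is a small sketch with synthetic numbers (not measurements, and not how luceneutil computes its statistics). It assumes, purely for illustration, that the first few iterations of a query are slow until the JVM has warmed up; `synthetic_run` and `mean_qps` are hypothetical helpers.

```python
from typing import List


def synthetic_run(repeat_count: int, cold_iters: int = 10,
                  cold_ms: float = 50.0, warm_ms: float = 10.0) -> List[float]:
    """Made-up per-iteration latencies: slow until warmed up, then fast."""
    cold = min(cold_iters, repeat_count)
    return [cold_ms] * cold + [warm_ms] * (repeat_count - cold)


def mean_qps(latencies_ms: List[float]) -> float:
    """Average queries per second over all iterations of one query."""
    return 1000.0 * len(latencies_ms) / sum(latencies_ms)


# With few repeats the cold iterations dominate the average; with more
# repeats the average moves toward the warmed-up rate (100 QPS here).
for repeats in (20, 100):
    print(repeats, round(mean_qps(synthetic_run(repeats)), 1))
```

Under these assumptions a fast query looks several times slower at 20 repeats than it does once warmed up, which is the kind of effect a higher taskRepeatCount would reduce.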

@jpountz
Contributor

jpountz commented Jun 23, 2022

@zacharymorn FYI I played with a slightly different approach that implements BMM as a bulk scorer instead of a scorer, which I was hoping would help with making bookkeeping more lightweight: https://github.com/jpountz/lucene/tree/maxscore. It could be interesting to compare with your implementation.

One optimization it has that your scorer doesn't, and that seemed to help, is to check, for every non-essential scorer, whether the score obtained so far plus the sum of the max scores of the non-essential scorers that haven't been checked yet is still competitive.
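
A rough Python sketch of that check, for illustration only. It is not the code from either branch: `score_candidate` and its parameters are hypothetical, each non-essential clause is modelled as a callable that returns its score for the current document (0.0 if it does not match), and "competitive" is taken to mean a score at or above the minimum competitive score.

```python
from typing import Callable, Optional, Sequence


def score_candidate(
    essential_score: float,
    non_essential_max_scores: Sequence[float],
    non_essential_score_fns: Sequence[Callable[[], float]],
    min_competitive_score: float,
) -> Optional[float]:
    """Return the candidate's full score, or None if it cannot be competitive."""
    n = len(non_essential_max_scores)
    # remaining_max[i] = sum of max scores of the clauses not yet checked (i..n-1)
    remaining_max = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining_max[i] = remaining_max[i + 1] + non_essential_max_scores[i]

    score = essential_score
    for i in range(n):
        # Even if every remaining clause contributed its max score, the
        # document would still fall short: stop without evaluating them.
        if score + remaining_max[i] < min_competitive_score:
            return None
        score += non_essential_score_fns[i]()
    return score if score >= min_competitive_score else None
```

The benefit is that a miss on an early non-essential clause can rule a document out before the later clauses are advanced at all.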

I got the following results on one run on wikimedium10m:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                    OrHighNotLow     1493.13      (6.5%)     1445.29      (5.1%)   -3.2% ( -13% -    8%) 0.083
                    OrNotHighMed     1410.19      (3.8%)     1373.37      (3.1%)   -2.6% (  -9% -    4%) 0.017
                   OrNotHighHigh     1057.88      (5.1%)     1031.19      (4.4%)   -2.5% ( -11% -    7%) 0.096
                    OrHighNotMed     1525.10      (5.2%)     1486.80      (4.4%)   -2.5% ( -11% -    7%) 0.098
                   OrHighNotHigh     1250.31      (4.3%)     1221.99      (3.4%)   -2.3% (  -9% -    5%) 0.062
                          IntNRQ      531.54      (2.9%)      522.49      (2.7%)   -1.7% (  -7% -    3%) 0.053
                          Fuzzy1      111.13      (2.1%)      109.80      (2.6%)   -1.2% (  -5% -    3%) 0.107
                      AndHighMed      386.29      (4.1%)      381.84      (3.3%)   -1.2% (  -8% -    6%) 0.329
                     AndHighHigh       78.96      (5.6%)       78.18      (4.7%)   -1.0% ( -10% -    9%) 0.548
            BrowseDateSSDVFacets        4.51     (12.6%)        4.47     (12.4%)   -0.8% ( -22% -   27%) 0.836
                    OrNotHighLow     1316.24      (3.8%)     1305.93      (3.1%)   -0.8% (  -7% -    6%) 0.476
          OrHighMedDayTaxoFacets       20.87      (5.1%)       20.71      (4.2%)   -0.8% (  -9% -    9%) 0.609
           BrowseMonthSSDVFacets       23.54      (6.4%)       23.42      (7.4%)   -0.5% ( -13% -   14%) 0.817
     BrowseRandomLabelTaxoFacets       37.54      (1.7%)       37.37      (1.9%)   -0.5% (  -4% -    3%) 0.432
                     MedSpanNear       68.68      (1.7%)       68.37      (2.2%)   -0.4% (  -4% -    3%) 0.474
        AndHighHighDayTaxoFacets       10.78      (5.9%)       10.73      (4.7%)   -0.4% ( -10% -   10%) 0.794
           BrowseMonthTaxoFacets       28.39     (10.0%)       28.29      (9.1%)   -0.3% ( -17% -   20%) 0.910
           HighTermDayOfYearSort      171.78     (13.7%)      171.22     (13.2%)   -0.3% ( -23% -   30%) 0.939
                        PKLookup      245.27      (2.2%)      244.52      (1.9%)   -0.3% (  -4% -    3%) 0.635
                HighSloppyPhrase       39.08      (2.9%)       38.96      (4.3%)   -0.3% (  -7% -    7%) 0.795
               HighTermMonthSort      167.47     (15.1%)      167.06     (14.7%)   -0.2% ( -26% -   34%) 0.959
                      HighPhrase      250.14      (2.8%)      249.53      (2.3%)   -0.2% (  -5% -    5%) 0.767
                      TermDTSort      138.22     (14.0%)      137.97     (13.4%)   -0.2% ( -24% -   31%) 0.967
                          Fuzzy2       55.22      (1.6%)       55.17      (1.5%)   -0.1% (  -3% -    3%) 0.837
                         MedTerm     1844.25      (6.4%)     1843.10      (4.9%)   -0.1% ( -10% -   11%) 0.972
                 MedSloppyPhrase       15.34      (2.2%)       15.33      (3.9%)   -0.1% (  -5% -    6%) 0.954
                         Prefix3      110.03      (2.6%)      110.07      (1.8%)    0.0% (  -4% -    4%) 0.962
                    HighSpanNear        7.95      (1.7%)        7.97      (1.7%)    0.2% (  -3% -    3%) 0.772
       BrowseDayOfYearTaxoFacets       46.78      (1.9%)       46.86      (2.1%)    0.2% (  -3% -    4%) 0.788
                      AndHighLow     1291.99      (2.6%)     1294.28      (3.4%)    0.2% (  -5% -    6%) 0.854
                     LowSpanNear       47.55      (1.5%)       47.64      (1.4%)    0.2% (  -2% -    3%) 0.697
                        Wildcard      157.83      (1.5%)      158.14      (1.3%)    0.2% (  -2% -    3%) 0.661
                       LowPhrase       83.20      (2.3%)       83.37      (2.1%)    0.2% (  -4% -    4%) 0.773
                         Respell       95.18      (1.4%)       95.47      (1.3%)    0.3% (  -2% -    3%) 0.492
         AndHighMedDayTaxoFacets       51.97      (1.8%)       52.16      (2.1%)    0.4% (  -3% -    4%) 0.553
            BrowseDateTaxoFacets       45.77      (2.0%)       45.98      (1.9%)    0.5% (  -3% -    4%) 0.452
            MedTermDayTaxoFacets       60.66      (5.9%)       61.03      (5.0%)    0.6% (  -9% -   12%) 0.718
                       MedPhrase       57.67      (3.1%)       58.06      (2.6%)    0.7% (  -4% -    6%) 0.452
       BrowseDayOfYearSSDVFacets       20.40      (6.0%)       20.57      (4.2%)    0.8% (  -8% -   11%) 0.609
                 LowSloppyPhrase       37.59      (4.0%)       38.00      (3.6%)    1.1% (  -6% -    9%) 0.376
     BrowseRandomLabelSSDVFacets       15.25      (5.2%)       15.41      (6.9%)    1.1% ( -10% -   13%) 0.571
                        HighTerm     2001.23      (6.4%)     2025.82      (4.9%)    1.2% (  -9% -   13%) 0.493
                         LowTerm     2092.97      (4.3%)     2119.02      (5.5%)    1.2% (  -8% -   11%) 0.423
             MedIntervalsOrdered       56.91      (3.9%)       57.92      (3.0%)    1.8% (  -4% -    9%) 0.107
            HighIntervalsOrdered       16.67      (6.2%)       16.97      (4.6%)    1.8% (  -8% -   13%) 0.297
             LowIntervalsOrdered       20.18      (4.3%)       20.57      (3.3%)    1.9% (  -5% -   10%) 0.113
            HighTermTitleBDVSort      182.32     (14.2%)      186.92     (22.0%)    2.5% ( -29% -   45%) 0.667
                       OrHighLow     1235.23      (1.8%)     1484.12      (4.8%)   20.1% (  13% -   27%) 0.000
                       OrHighMed      156.75      (4.7%)      200.46      (4.7%)   27.9% (  17% -   39%) 0.000
                      OrHighHigh       25.07      (5.2%)       48.30      (9.1%)   92.6% (  74% -  112%) 0.000

@zacharymorn
Contributor Author

Thanks @jpountz for the suggestion and also for providing the bulk scorer implementation! The results look pretty impressive as well!

I just tried taskRepeatCount=200 with my implementation, and although it did make the results more stable across runs, the speedup with the full tasks file was still nowhere near that from running just the three disjunction tasks. I also pulled your version and ran it through the same set of tests above, and the speedups were both good and stable (which rules out an issue on the util side, I guess). I will study your implementation further and see if mine can be improved accordingly.

@zacharymorn
Contributor Author

Alright. As it turns out, the reason I was getting vastly different performance results when changing the tasks file in #972 (comment) is that I had previously configured SEARCH_NUM_THREADS = 10 in my localconstants.py, and my changes somehow work better when more threads are executing disjunction queries at the same time. When the tasks file is unmodified and has more than 10 query tasks, the disjunction tasks are effectively executed serially; but when the tasks file has only 3 disjunction tasks, each task runs multi-threaded.

Here are the results with only 3 disjunction tasks with 3 queries each, but with different numbers of search threads:

SEARCH_NUM_THREADS = 10

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      197.09     (18.1%)      202.07     (20.2%)    2.5% ( -30% -   49%) 0.678
                       OrHighLow      217.92     (28.3%)      433.05     (46.3%)   98.7% (  18% -  241%) 0.000
                      OrHighHigh       12.53     (48.0%)       27.27     (64.9%)  117.7% (   3% -  443%) 0.000
                       OrHighMed       14.09     (50.2%)       40.60     (74.8%)  188.1% (  42% -  628%) 0.000

SEARCH_NUM_THREADS = 2

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                       OrHighLow      912.29     (12.1%)      903.58      (8.2%)   -1.0% ( -18% -   22%) 0.771
                        PKLookup      252.26     (15.0%)      260.21     (15.5%)    3.2% ( -23% -   39%) 0.514
                       OrHighMed       70.16     (11.8%)       84.99      (8.9%)   21.1% (   0% -   47%) 0.000
                      OrHighHigh       38.24     (11.9%)       49.09     (10.7%)   28.4% (   5% -   57%) 0.000

Just curious @jpountz, were you using the default SEARCH_NUM_THREADS = 2 during your benchmark?

@zacharymorn
Contributor Author

Hi @jpountz, I've taken some ideas from your bulk scorer implementation and was able to simplify my code as well as boost performance under the default SEARCH_NUM_THREADS here. The benchmark results look similar now, albeit with some variation between runs:

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
            BrowseDateSSDVFacets        4.38     (35.8%)        4.01     (30.3%)   -8.3% ( -54% -   90%) 0.431
                         Prefix3      811.56      (5.6%)      782.08      (7.8%)   -3.6% ( -16% -   10%) 0.091
          OrHighMedDayTaxoFacets       11.42      (5.6%)       11.11      (8.0%)   -2.7% ( -15% -   11%) 0.223
                          IntNRQ      297.19      (1.5%)      291.62      (5.0%)   -1.9% (  -8% -    4%) 0.107
                        Wildcard      269.43      (5.0%)      264.57      (6.4%)   -1.8% ( -12% -   10%) 0.319
     BrowseRandomLabelSSDVFacets       20.22      (8.8%)       19.86      (8.4%)   -1.8% ( -17% -   16%) 0.518
            HighTermTitleBDVSort      236.73      (8.6%)      232.93      (8.6%)   -1.6% ( -17% -   17%) 0.555
        AndHighHighDayTaxoFacets       12.67      (2.9%)       12.48      (4.4%)   -1.5% (  -8% -    5%) 0.186
           BrowseMonthTaxoFacets       32.18     (36.3%)       31.72     (38.8%)   -1.4% ( -56% -  115%) 0.904
                       LowPhrase     1725.41      (3.3%)     1702.14      (5.3%)   -1.3% (  -9% -    7%) 0.334
                 MedSloppyPhrase      111.58      (3.2%)      110.16      (3.8%)   -1.3% (  -8% -    5%) 0.250
                      HighPhrase      930.18      (2.5%)      919.75      (3.4%)   -1.1% (  -6% -    4%) 0.234
            MedTermDayTaxoFacets       46.10      (3.9%)       45.68      (4.8%)   -0.9% (  -9% -    8%) 0.514
                      TermDTSort      341.03      (7.2%)      338.23      (8.5%)   -0.8% ( -15% -   15%) 0.740
         AndHighMedDayTaxoFacets       39.88      (1.9%)       39.57      (3.1%)   -0.8% (  -5% -    4%) 0.349
           HighTermDayOfYearSort      148.85      (7.6%)      147.86      (8.3%)   -0.7% ( -15% -   16%) 0.792
               HighTermMonthSort      218.46      (8.6%)      217.06      (9.2%)   -0.6% ( -16% -   18%) 0.819
                    OrNotHighLow     2696.50      (5.4%)     2681.95      (5.0%)   -0.5% ( -10% -   10%) 0.743
                 LowSloppyPhrase       22.79      (2.0%)       22.69      (2.9%)   -0.4% (  -5% -    4%) 0.585
                          Fuzzy2      125.08      (2.7%)      124.54      (4.3%)   -0.4% (  -7% -    6%) 0.708
                HighSloppyPhrase       21.02      (2.3%)       20.94      (3.0%)   -0.4% (  -5% -    5%) 0.629
                    OrHighNotMed     1805.04      (4.7%)     1797.98      (5.8%)   -0.4% ( -10% -   10%) 0.816
           BrowseMonthSSDVFacets       29.37     (14.0%)       29.26     (13.4%)   -0.4% ( -24% -   31%) 0.933
                       MedPhrase      205.52      (1.7%)      204.78      (3.0%)   -0.4% (  -4% -    4%) 0.643
                          Fuzzy1      128.47      (2.8%)      128.05      (4.2%)   -0.3% (  -7% -    6%) 0.772
                      AndHighLow     2126.24      (5.3%)     2124.42      (5.6%)   -0.1% ( -10% -   11%) 0.960
                         Respell       83.33      (3.2%)       83.33      (4.2%)    0.0% (  -7% -    7%) 0.998
                   OrHighNotHigh     1415.44      (4.4%)     1419.78      (4.5%)    0.3% (  -8% -    9%) 0.827
                    OrHighNotLow     1655.08      (4.4%)     1663.51      (4.7%)    0.5% (  -8% -   10%) 0.725
                   OrNotHighHigh     1035.89      (3.1%)     1042.85      (4.6%)    0.7% (  -6% -    8%) 0.587
                        PKLookup      283.77      (5.1%)      285.92      (4.5%)    0.8% (  -8% -   10%) 0.616
                         LowTerm     3616.62      (4.1%)     3655.48      (5.3%)    1.1% (  -8% -   10%) 0.476
                    HighSpanNear       15.54      (2.2%)       15.71      (3.5%)    1.1% (  -4% -    7%) 0.241
                         MedTerm     2615.07      (4.0%)     2645.27      (4.0%)    1.2% (  -6% -    9%) 0.364
                    OrNotHighMed     1759.45      (4.2%)     1779.94      (4.6%)    1.2% (  -7% -   10%) 0.406
                     LowSpanNear       66.06      (2.9%)       66.83      (4.3%)    1.2% (  -5% -    8%) 0.316
       BrowseDayOfYearSSDVFacets       26.94     (10.7%)       27.30      (9.7%)    1.3% ( -17% -   24%) 0.684
             MedIntervalsOrdered       86.40      (5.1%)       87.58      (4.8%)    1.4% (  -8% -   11%) 0.387
                      AndHighMed      113.83      (3.4%)      115.48      (5.8%)    1.5% (  -7% -   11%) 0.334
                     AndHighHigh       42.71      (2.9%)       43.35      (5.8%)    1.5% (  -6% -   10%) 0.300
                        HighTerm     2639.27      (4.2%)     2678.82      (4.4%)    1.5% (  -6% -   10%) 0.273
            HighIntervalsOrdered       11.98      (7.1%)       12.17      (6.7%)    1.5% ( -11% -   16%) 0.480
                     MedSpanNear       97.91      (2.7%)       99.79      (4.0%)    1.9% (  -4% -    8%) 0.077
             LowIntervalsOrdered      222.06     (12.2%)      229.67     (10.9%)    3.4% ( -17% -   30%) 0.348
            BrowseDateTaxoFacets       28.41     (35.2%)       31.76     (47.7%)   11.8% ( -52% -  146%) 0.374
       BrowseDayOfYearTaxoFacets       28.45     (34.8%)       31.91     (47.4%)   12.2% ( -51% -  144%) 0.355
     BrowseRandomLabelTaxoFacets       27.25     (44.3%)       31.80     (60.7%)   16.7% ( -61% -  218%) 0.321
                       OrHighLow     1204.21      (3.6%)     1475.66      (6.9%)   22.5% (  11% -   34%) 0.000
                       OrHighMed      207.04      (6.4%)      285.96      (5.7%)   38.1% (  24% -   53%) 0.000
                      OrHighHigh       50.80      (6.2%)       88.52      (8.1%)   74.3% (  56% -   94%) 0.000

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelSSDVFacets       20.99      (8.4%)       20.06      (7.6%)   -4.4% ( -18% -   12%) 0.081
                         Prefix3      264.64      (5.6%)      258.91      (7.0%)   -2.2% ( -13% -   11%) 0.278
                    HighSpanNear       55.97      (2.8%)       55.33      (3.4%)   -1.1% (  -7% -    5%) 0.240
       BrowseDayOfYearSSDVFacets       27.04      (8.0%)       26.74      (4.1%)   -1.1% ( -12% -   12%) 0.591
                          IntNRQ      123.74     (25.9%)      122.52     (22.6%)   -1.0% ( -39% -   64%) 0.898
                        Wildcard      144.91      (5.7%)      143.53      (5.5%)   -0.9% ( -11% -   10%) 0.593
                     MedSpanNear       58.12      (2.2%)       57.62      (2.6%)   -0.9% (  -5% -    4%) 0.266
                 LowSloppyPhrase       56.23      (2.7%)       55.75      (2.5%)   -0.8% (  -5% -    4%) 0.309
                     LowSpanNear       83.48      (1.9%)       82.83      (2.2%)   -0.8% (  -4% -    3%) 0.237
         AndHighMedDayTaxoFacets       66.27      (2.1%)       65.85      (1.8%)   -0.6% (  -4% -    3%) 0.307
            MedTermDayTaxoFacets       61.16      (4.3%)       60.78      (4.9%)   -0.6% (  -9% -    8%) 0.676
               HighTermMonthSort      197.74     (11.7%)      196.88      (9.9%)   -0.4% ( -19% -   23%) 0.898
                         MedTerm     3912.22      (3.9%)     3899.21      (4.0%)   -0.3% (  -7% -    7%) 0.789
        AndHighHighDayTaxoFacets       52.69      (1.5%)       52.53      (1.5%)   -0.3% (  -3% -    2%) 0.515
                        PKLookup      285.12      (4.8%)      284.29      (4.2%)   -0.3% (  -8% -    9%) 0.837
                       MedPhrase      155.83      (2.1%)      155.88      (1.7%)    0.0% (  -3% -    3%) 0.962
                    OrNotHighLow     2052.07      (4.9%)     2053.48      (5.0%)    0.1% (  -9% -   10%) 0.965
                         Respell       86.89      (3.3%)       86.99      (3.1%)    0.1% (  -6% -    6%) 0.912
                          Fuzzy1      136.62      (1.9%)      136.98      (2.6%)    0.3% (  -4% -    4%) 0.709
                      HighPhrase      652.65      (2.1%)      654.44      (1.8%)    0.3% (  -3% -    4%) 0.657
                          Fuzzy2      104.64      (2.7%)      105.09      (2.3%)    0.4% (  -4% -    5%) 0.591
                   OrNotHighHigh     1558.41      (2.5%)     1565.23      (2.4%)    0.4% (  -4% -    5%) 0.572
                      AndHighLow     1378.89      (3.6%)     1385.39      (3.4%)    0.5% (  -6% -    7%) 0.667
                HighSloppyPhrase       43.66      (3.8%)       43.88      (3.3%)    0.5% (  -6% -    7%) 0.644
            BrowseDateSSDVFacets        4.31     (36.0%)        4.34     (32.6%)    0.6% ( -49% -  107%) 0.955
             LowIntervalsOrdered       24.44      (3.3%)       24.60      (3.5%)    0.7% (  -5% -    7%) 0.536
                 MedSloppyPhrase      121.45      (3.4%)      122.31      (3.2%)    0.7% (  -5% -    7%) 0.499
                    OrNotHighMed     1518.27      (2.9%)     1530.49      (3.5%)    0.8% (  -5% -    7%) 0.432
                       LowPhrase     1590.38      (3.3%)     1604.25      (3.9%)    0.9% (  -6% -    8%) 0.447
           HighTermDayOfYearSort      227.17      (7.8%)      229.44      (7.7%)    1.0% ( -13% -   17%) 0.684
                    OrHighNotMed     1426.20      (4.6%)     1440.64      (3.9%)    1.0% (  -7% -    9%) 0.453
            HighIntervalsOrdered        6.34      (4.4%)        6.42      (4.8%)    1.2% (  -7% -   10%) 0.413
                         LowTerm     3170.27      (5.2%)     3208.75      (6.0%)    1.2% (  -9% -   13%) 0.495
                    OrHighNotLow     1904.11      (4.6%)     1929.56      (3.8%)    1.3% (  -6% -   10%) 0.316
                      AndHighMed      301.92      (4.1%)      306.40      (3.9%)    1.5% (  -6% -    9%) 0.243
             MedIntervalsOrdered       44.71      (4.1%)       45.38      (4.5%)    1.5% (  -6% -   10%) 0.274
                   OrHighNotHigh     2139.92      (2.9%)     2172.45      (4.3%)    1.5% (  -5% -    8%) 0.191
            HighTermTitleBDVSort      300.97      (8.9%)      306.56      (8.5%)    1.9% ( -14% -   21%) 0.499
                        HighTerm     2745.94      (3.5%)     2799.23      (3.6%)    1.9% (  -4% -    9%) 0.083
                      TermDTSort      338.13      (6.3%)      345.25      (6.8%)    2.1% ( -10% -   16%) 0.309
                     AndHighHigh      105.27      (3.0%)      107.76      (3.6%)    2.4% (  -4% -    9%) 0.024
           BrowseMonthTaxoFacets       25.29     (31.4%)       25.99     (35.5%)    2.7% ( -48% -  101%) 0.795
          OrHighMedDayTaxoFacets        7.05      (7.7%)        7.26      (5.0%)    3.0% (  -9% -   17%) 0.146
       BrowseDayOfYearTaxoFacets       27.07     (29.3%)       28.39     (42.0%)    4.8% ( -51% -  107%) 0.672
            BrowseDateTaxoFacets       27.00     (29.5%)       28.33     (42.5%)    4.9% ( -51% -  109%) 0.671
           BrowseMonthSSDVFacets       29.01     (10.5%)       30.45     (10.7%)    5.0% ( -14% -   29%) 0.139
     BrowseRandomLabelTaxoFacets       25.85     (37.4%)       28.27     (58.1%)    9.4% ( -62% -  167%) 0.544
                       OrHighLow     1322.31      (2.8%)     1506.33      (3.5%)   13.9% (   7% -   20%) 0.000
                      OrHighHigh       37.60      (5.5%)       59.95      (9.9%)   59.4% (  41% -   79%) 0.000
                       OrHighMed      146.48      (4.4%)      320.00     (10.1%)  118.5% (  99% -  139%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
            HighTermTitleBDVSort      232.99      (8.7%)      227.90      (6.6%)   -2.2% ( -16% -   14%) 0.372
       BrowseDayOfYearSSDVFacets       27.67      (9.1%)       27.14     (10.6%)   -1.9% ( -19% -   19%) 0.533
                      AndHighMed      195.79      (4.8%)      192.91      (4.3%)   -1.5% ( -10% -    8%) 0.309
                         Prefix3      752.40      (9.1%)      742.56      (7.8%)   -1.3% ( -16% -   17%) 0.627
                        HighTerm     2655.43      (5.3%)     2622.06      (3.1%)   -1.3% (  -9% -    7%) 0.360
                     AndHighHigh      171.69      (4.6%)      169.66      (4.0%)   -1.2% (  -9% -    7%) 0.390
                   OrNotHighHigh     1275.59      (3.6%)     1261.66      (2.5%)   -1.1% (  -6% -    5%) 0.259
                   OrHighNotHigh     1670.68      (4.4%)     1655.08      (3.0%)   -0.9% (  -7% -    6%) 0.433
                 MedSloppyPhrase       12.51      (2.8%)       12.40      (2.5%)   -0.9% (  -5% -    4%) 0.277
                    OrHighNotMed     1677.16      (4.8%)     1663.03      (3.0%)   -0.8% (  -8% -    7%) 0.508
                         MedTerm     4174.46      (3.9%)     4144.81      (4.6%)   -0.7% (  -8% -    8%) 0.598
                         LowTerm     3205.62      (4.9%)     3183.90      (4.5%)   -0.7% (  -9% -    9%) 0.647
                          IntNRQ       86.34      (8.2%)       85.91      (8.2%)   -0.5% ( -15% -   17%) 0.846
                        PKLookup      290.40      (4.1%)      289.10      (4.4%)   -0.4% (  -8% -    8%) 0.740
                     LowSpanNear       18.90      (2.9%)       18.81      (3.1%)   -0.4% (  -6% -    5%) 0.640
                    OrNotHighMed     1413.70      (2.7%)     1408.55      (3.0%)   -0.4% (  -5% -    5%) 0.683
                      AndHighLow     1965.22      (4.7%)     1958.33      (3.7%)   -0.4% (  -8% -    8%) 0.794
                    OrHighNotLow     1628.65      (4.1%)     1624.17      (3.9%)   -0.3% (  -7% -    8%) 0.828
                 LowSloppyPhrase      244.50      (3.0%)      243.99      (2.2%)   -0.2% (  -5% -    5%) 0.802
                     MedSpanNear       69.69      (3.4%)       69.56      (2.8%)   -0.2% (  -6% -    6%) 0.844
                       LowPhrase      176.86      (3.9%)      176.70      (4.0%)   -0.1% (  -7% -    8%) 0.941
                    HighSpanNear       25.67      (4.0%)       25.65      (3.6%)   -0.1% (  -7% -    7%) 0.964
            HighIntervalsOrdered        8.08     (10.1%)        8.08      (9.8%)    0.0% ( -18% -   22%) 0.994
             LowIntervalsOrdered       90.46      (6.5%)       90.49      (6.4%)    0.0% ( -12% -   13%) 0.988
                        Wildcard      297.27      (5.4%)      297.43      (4.4%)    0.1% (  -9% -   10%) 0.972
           BrowseMonthSSDVFacets       28.60      (8.6%)       28.66     (11.6%)    0.2% ( -18% -   22%) 0.947
             MedIntervalsOrdered       69.85      (6.4%)       70.00      (6.4%)    0.2% ( -11% -   13%) 0.913
        AndHighHighDayTaxoFacets       32.05      (1.8%)       32.13      (2.4%)    0.3% (  -3% -    4%) 0.705
                HighSloppyPhrase       15.93      (3.3%)       16.00      (3.3%)    0.4% (  -6% -    7%) 0.682
                    OrNotHighLow     1784.41      (4.1%)     1792.59      (4.5%)    0.5% (  -7% -    9%) 0.734
                          Fuzzy1      146.35      (2.9%)      147.05      (3.8%)    0.5% (  -6% -    7%) 0.656
                         Respell      114.58      (2.5%)      115.18      (3.1%)    0.5% (  -5% -    6%) 0.559
         AndHighMedDayTaxoFacets      134.21      (1.7%)      135.03      (2.6%)    0.6% (  -3% -    4%) 0.373
                      TermDTSort      176.85      (5.1%)      177.94      (6.9%)    0.6% ( -10% -   13%) 0.747
                      HighPhrase      134.11      (2.2%)      135.02      (2.8%)    0.7% (  -4% -    5%) 0.394
                          Fuzzy2      128.74      (2.5%)      129.63      (3.2%)    0.7% (  -4% -    6%) 0.450
                       MedPhrase       85.90      (2.2%)       86.68      (1.9%)    0.9% (  -3% -    5%) 0.158
            MedTermDayTaxoFacets       69.57      (4.2%)       70.39      (4.0%)    1.2% (  -6% -    9%) 0.368
           HighTermDayOfYearSort      196.16      (2.7%)      198.50      (6.6%)    1.2% (  -7% -   10%) 0.455
     BrowseRandomLabelSSDVFacets       20.42      (9.4%)       20.68      (9.7%)    1.3% ( -16% -   22%) 0.671
            BrowseDateSSDVFacets        4.58     (28.7%)        4.66     (32.0%)    1.8% ( -45% -   87%) 0.855
          OrHighMedDayTaxoFacets       19.97      (7.5%)       20.62      (6.2%)    3.2% (  -9% -   18%) 0.138
               HighTermMonthSort      212.99      (5.8%)      219.92      (8.9%)    3.3% ( -10% -   19%) 0.172
           BrowseMonthTaxoFacets       27.65     (35.6%)       31.27     (47.3%)   13.1% ( -51% -  149%) 0.323
            BrowseDateTaxoFacets       24.81     (27.6%)       28.54     (43.5%)   15.1% ( -43% -  118%) 0.191
       BrowseDayOfYearTaxoFacets       24.85     (26.6%)       28.68     (43.3%)   15.4% ( -43% -  116%) 0.175
     BrowseRandomLabelTaxoFacets       23.80     (38.7%)       28.05     (56.9%)   17.8% ( -56% -  185%) 0.247
                       OrHighLow      886.55      (3.3%)     1105.54      (5.0%)   24.7% (  15% -   34%) 0.000
                       OrHighMed      209.42      (6.1%)      304.52      (4.7%)   45.4% (  32% -   59%) 0.000
                      OrHighHigh       46.51      (6.4%)       68.85      (6.9%)   48.0% (  32% -   65%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
           HighTermDayOfYearSort      268.07      (7.6%)      260.40      (3.8%)   -2.9% ( -13% -    9%) 0.132
                        PKLookup      284.74      (5.7%)      278.54      (5.8%)   -2.2% ( -12% -    9%) 0.234
     BrowseRandomLabelSSDVFacets       20.82     (10.2%)       20.45      (6.7%)   -1.8% ( -16% -   16%) 0.512
                          Fuzzy2      234.77      (3.0%)      230.58      (3.3%)   -1.8% (  -7% -    4%) 0.073
            MedTermDayTaxoFacets       42.28      (4.7%)       41.56      (5.6%)   -1.7% ( -11% -    9%) 0.298
               HighTermMonthSort      218.63      (9.6%)      215.12      (7.8%)   -1.6% ( -17% -   17%) 0.563
                      TermDTSort      262.87      (7.8%)      258.91      (7.2%)   -1.5% ( -15% -   14%) 0.527
                         Prefix3      349.11      (7.1%)      344.04      (8.0%)   -1.5% ( -15% -   14%) 0.544
        AndHighHighDayTaxoFacets       20.76      (2.6%)       20.47      (3.6%)   -1.4% (  -7% -    4%) 0.154
          OrHighMedDayTaxoFacets        9.09      (5.7%)        8.96      (6.1%)   -1.4% ( -12% -   11%) 0.460
                         Respell      109.27      (3.4%)      107.82      (2.5%)   -1.3% (  -6% -    4%) 0.158
                          Fuzzy1      177.60      (3.9%)      175.30      (2.9%)   -1.3% (  -7% -    5%) 0.229
                        Wildcard      391.34      (8.0%)      386.37      (9.0%)   -1.3% ( -16% -   17%) 0.636
                       MedPhrase     1174.01      (3.6%)     1160.58      (4.8%)   -1.1% (  -9% -    7%) 0.396
         AndHighMedDayTaxoFacets       34.66      (2.7%)       34.27      (3.4%)   -1.1% (  -7% -    5%) 0.243
                      AndHighMed      334.12      (6.3%)      330.86      (4.8%)   -1.0% ( -11% -   10%) 0.584
                      AndHighLow     1720.82      (5.3%)     1704.25      (4.3%)   -1.0% ( -10% -    9%) 0.527
                     MedSpanNear       18.51      (3.6%)       18.36      (3.8%)   -0.8% (  -7% -    6%) 0.491
                       LowPhrase      159.66      (6.1%)      158.43      (6.3%)   -0.8% ( -12% -   12%) 0.693
                     LowSpanNear       14.66      (3.4%)       14.54      (3.4%)   -0.8% (  -7% -    6%) 0.475
                    OrHighNotLow     1848.94      (4.3%)     1837.32      (4.4%)   -0.6% (  -8% -    8%) 0.646
                    HighSpanNear       36.23      (3.5%)       36.04      (2.7%)   -0.5% (  -6% -    5%) 0.590
                   OrNotHighHigh     1355.23      (4.1%)     1349.28      (4.7%)   -0.4% (  -8% -    8%) 0.753
                      HighPhrase       53.43      (2.7%)       53.22      (3.9%)   -0.4% (  -6% -    6%) 0.706
                HighSloppyPhrase       54.76      (3.5%)       54.68      (4.3%)   -0.2% (  -7% -    7%) 0.899
            HighTermTitleBDVSort      199.46      (9.0%)      199.23     (12.6%)   -0.1% ( -19% -   23%) 0.973
                   OrHighNotHigh     1289.90      (3.5%)     1292.38      (3.8%)    0.2% (  -6% -    7%) 0.869
                 MedSloppyPhrase       23.10      (2.2%)       23.15      (2.7%)    0.2% (  -4% -    5%) 0.780
                    OrHighNotMed     1880.20      (3.9%)     1886.66      (5.3%)    0.3% (  -8% -    9%) 0.814
                    OrNotHighLow     1524.76      (3.7%)     1530.74      (4.2%)    0.4% (  -7% -    8%) 0.755
                          IntNRQ     1154.82      (2.3%)     1159.36      (4.6%)    0.4% (  -6% -    7%) 0.731
       BrowseDayOfYearSSDVFacets       26.46      (9.5%)       26.56      (8.8%)    0.4% ( -16% -   20%) 0.890
                    OrNotHighMed     1280.97      (3.4%)     1286.37      (3.4%)    0.4% (  -6% -    7%) 0.695
                 LowSloppyPhrase       18.44      (3.1%)       18.53      (3.4%)    0.4% (  -5% -    7%) 0.671
                     AndHighHigh       64.90      (3.7%)       65.31      (3.3%)    0.6% (  -6% -    7%) 0.565
                         LowTerm     4926.25      (5.3%)     4962.05      (6.0%)    0.7% ( -10% -   12%) 0.684
            HighIntervalsOrdered       12.83      (4.8%)       12.93      (4.5%)    0.8% (  -8% -   10%) 0.608
             LowIntervalsOrdered      199.64      (5.5%)      201.26      (5.2%)    0.8% (  -9% -   12%) 0.630
             MedIntervalsOrdered       18.57      (5.1%)       18.72      (4.9%)    0.8% (  -8% -   11%) 0.597
                        HighTerm     3069.83      (4.1%)     3109.22      (5.1%)    1.3% (  -7% -   10%) 0.378
                         MedTerm     3000.10      (4.0%)     3039.40      (5.1%)    1.3% (  -7% -   10%) 0.365
           BrowseMonthSSDVFacets       29.00     (12.5%)       30.24     (13.9%)    4.3% ( -19% -   35%) 0.302
            BrowseDateSSDVFacets        4.53     (31.8%)        4.88     (31.7%)    7.7% ( -42% -  104%) 0.445
            BrowseDateTaxoFacets       27.12     (37.5%)       29.80     (45.3%)    9.9% ( -53% -  148%) 0.453
       BrowseDayOfYearTaxoFacets       27.25     (37.2%)       30.01     (44.3%)   10.1% ( -52% -  145%) 0.433
     BrowseRandomLabelTaxoFacets       25.82     (44.7%)       29.41     (60.5%)   13.9% ( -63% -  215%) 0.409
           BrowseMonthTaxoFacets       25.92     (35.0%)       29.53     (45.3%)   13.9% ( -49% -  145%) 0.277
                       OrHighLow      885.95      (4.3%)     1142.96      (3.7%)   29.0% (  20% -   38%) 0.000
                      OrHighHigh       46.72      (6.1%)       65.39      (5.8%)   40.0% (  26% -   55%) 0.000
                       OrHighMed      144.59      (6.7%)      525.10     (11.3%)  263.2% ( 229% -  301%) 0.000

Regarding this point:

"One optimization it has that seemed to help that your scorer doesn't have is to check for every non-essential scorer whether the score obtained so far plus the sum of max scores of non-essential scorers that haven't been checked yet is still competitive."

I implemented similar logic that moves the next least-contributing essential scorer into the non-essential scorer list when minCompetitiveScore increases, so I feel the effect would be similar?
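
As a rough illustration of the partitioning described above, here is a minimal sketch that doesn't use any Lucene types. `MaxScorePartition`, `firstEssential`, and the max scores are all made up for the example; the assumption is that clauses are sorted by ascending max score and that the longest prefix whose max scores sum to less than minCompetitiveScore is treated as non-essential, since a doc matching only those clauses can't be competitive.

```java
import java.util.Arrays;

/** Hypothetical illustration of MaxScore-style partitioning; not the PR's code. */
final class MaxScorePartition {

  /**
   * Returns the index of the first essential clause, given per-clause max scores
   * sorted in ascending order. Clauses before that index are non-essential: even
   * if all of them match, their combined score stays below minCompetitiveScore.
   */
  static int firstEssential(float[] sortedMaxScores, float minCompetitiveScore) {
    double prefixSum = 0;
    int i = 0;
    while (i < sortedMaxScores.length
        && prefixSum + sortedMaxScores[i] < minCompetitiveScore) {
      prefixSum += sortedMaxScores[i];
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    float[] maxScores = {3.0f, 1.0f, 2.0f};
    Arrays.sort(maxScores); // ascending: 1.0, 2.0, 3.0

    // As minCompetitiveScore rises, more clauses move to the non-essential side.
    for (float min : new float[] {0.5f, 1.5f, 3.5f, 7.0f}) {
      System.out.println(
          "minCompetitiveScore=" + min + " -> firstEssential=" + firstEssential(maxScores, min));
    }
  }
}
```

With these values the boundary moves from 0 to 1, 2, and finally 3 as the threshold grows; at 3 every clause is non-essential, meaning no remaining doc can be competitive.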

In terms of next steps, I'm wondering if there's a preference between bulk scorer and scorer implementations when performance improvement is similar (maybe one type of scorer can be used in more places)? I've learnt a lot from both PRs already, and either one looks like a good improvement for disjunction queries!

@jpountz
Contributor

jpountz commented Jun 27, 2022

I feel the effect would be similar?

Indeed, sorry I had misread your code!

In terms of next steps, I'm wondering if there's a preference between bulk scorer and scorer implementations when performance improvement is similar

No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer.

I agree that it looks like a great speedup, we should get this in! The benchmark only tests performance of top-level disjunctions of term queries that have two clauses. I'd be curious to get performance numbers for queries like the below ones to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change, we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.

OrAndHigMedAndHighMed: (+including +looking) (+date +finished) # disjunction of conjunctions, which don't have as good score upper bounds as term queries
OrHighPhraseHighPhrase: "united states" "new york" # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
AndHighOrMedMed: +be +(mostly interview) # disjunction within conjunction that leads iteration
AndMedOrHighHigh: +interview +(at united) # disjunction within conjunction that doesn't lead iteration

@zacharymorn
Contributor Author

I feel the effect would be similar?

Indeed, sorry I had misread your code!

No worries, thanks anyway for the suggestion!

No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer.

Ok I see, makes sense.

I agree that it looks like a great speedup, we should get this in! The benchmark only tests performance of top-level disjunctions of term queries that have two clauses. I'd be curious to get performance numbers for queries like the below ones to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change, we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.

OrAndHigMedAndHighMed: (+including +looking) (+date +finished) # disjunction of conjunctions, which don't have as good score upper bounds as term queries
OrHighPhraseHighPhrase: "united states" "new york" # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
AndHighOrMedMed: +be +(mostly interview) # disjunction within conjunction that leads iteration
AndMedOrHighHigh: +interview +(at united) # disjunction within conjunction that doesn't lead iteration

Sounds good! I have run these queries through the benchmark, and the results look somewhat consistent across the three runs:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
          OrHighPhraseHighPhrase       28.89      (8.7%)       24.19      (4.7%)  -16.3% ( -27% -   -3%) 0.000
                 AndHighOrMedMed      101.24      (6.6%)      101.09      (3.0%)   -0.1% (  -9% -   10%) 0.927
                AndMedOrHighHigh       81.44      (6.3%)       81.62      (3.7%)    0.2% (  -9% -   10%) 0.895
           OrAndHigMedAndHighMed      128.26      (7.0%)      136.94      (3.7%)    6.8% (  -3% -   18%) 0.000
                        PKLookup      221.47     (11.7%)      236.93      (9.1%)    7.0% ( -12% -   31%) 0.035

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
          OrHighPhraseHighPhrase       27.73      (9.1%)       23.73      (4.6%)  -14.4% ( -25% -    0%) 0.000
                 AndHighOrMedMed       97.09     (13.1%)       99.30      (4.3%)    2.3% ( -13% -   22%) 0.462
                AndMedOrHighHigh       75.87     (15.2%)       80.04      (5.7%)    5.5% ( -13% -   31%) 0.128
                        PKLookup      219.70     (15.7%)      238.75     (12.4%)    8.7% ( -16% -   43%) 0.053
           OrAndHigMedAndHighMed      121.83     (13.7%)      134.79      (4.4%)   10.6% (  -6% -   33%) 0.001

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
          OrHighPhraseHighPhrase       27.42     (16.2%)       23.99      (4.0%)  -12.5% ( -28% -    9%) 0.001
                 AndHighOrMedMed       96.61     (15.8%)      100.09      (3.6%)    3.6% ( -13% -   27%) 0.321
                AndMedOrHighHigh       75.72     (16.8%)       79.53      (4.9%)    5.0% ( -14% -   32%) 0.200
           OrAndHigMedAndHighMed      122.33     (16.9%)      136.60      (4.5%)   11.7% (  -8% -   39%) 0.003
                        PKLookup      207.94     (21.6%)      233.10     (16.5%)   12.1% ( -21% -   63%) 0.046

Looks like we may need to restrict the scorer to only term queries, or improve it for phrase queries?

@zacharymorn
Contributor Author

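For OrHighPhraseHighPhrase, the JFR CPU sampling results look similar overall, but the modified version spends more time in advanceShallow (5.24%, 843 samples in BlockImpactsPostingsEnum#advanceShallow() vs. 2.46%, 340 samples in the baseline), suggesting the BMM implementation might be adjusting block boundaries more often?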

Modified:

PERCENT       CPU SAMPLES   STACK
8.63%         1389          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
5.24%         843           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advanceShallow()
3.18%         511           java.nio.DirectByteBuffer#get()
2.79%         449           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
2.72%         438           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#refillPositions()
2.48%         399           jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
2.19%         353           org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
2.11%         339           org.apache.lucene.search.PhraseScorer$1#matches()
2.06%         331           org.apache.lucene.codecs.lucene90.Lucene90ScoreSkipReader#skipTo()
1.83%         294           org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
1.63%         263           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#nextPosition()
1.49%         240           org.apache.lucene.store.ByteBufferGuard#getByte()
1.24%         200           org.apache.lucene.search.ExactPhraseMatcher#advancePosition()
1.24%         200           org.apache.lucene.search.ConjunctionDISI#doNext()
1.21%         194           java.util.zip.Inflater#inflateBytesBytes()
1.18%         190           org.apache.lucene.search.ExactPhraseMatcher#nextMatch()
1.13%         182           org.apache.lucene.store.DataInput#readVLong()
1.12%         181           org.apache.lucene.search.ExactPhraseMatcher$1#advanceShallow()
1.11%         178           org.apache.lucene.search.ImpactsDISI#advanceShallow()
1.07%         172           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#skipPositions()
0.89%         143           java.lang.Class#isArray()
0.81%         131           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.75%         121           org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator#doNext()
0.74%         119           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.71%         115           org.apache.lucene.search.ConjunctionDISI#docID()
0.71%         115           org.apache.lucene.codecs.lucene90.ForUtil#shiftLongs()
0.70%         113           org.apache.lucene.search.PhraseScorer#docID()
0.70%         112           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68%         110           org.apache.lucene.search.ExactPhraseMatcher#maxFreq()
0.68%         109           org.apache.lucene.search.ImpactsDISI#docID()

Baseline:

PERCENT       CPU SAMPLES   STACK
8.66%         1196          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
3.88%         536           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
2.96%         409           java.nio.DirectByteBuffer#get()
2.78%         384           org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
2.50%         345           org.apache.lucene.codecs.lucene90.Lucene90ScoreSkipReader#skipTo()
2.46%         340           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advanceShallow()
1.73%         239           org.apache.lucene.search.PhraseScorer$1#matches()
1.72%         237           org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
1.48%         204           java.util.zip.Inflater#inflateBytesBytes()
1.48%         204           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.23%         170           jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.21%         167           org.apache.lucene.search.ConjunctionDISI#doNext()
1.20%         166           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
1.19%         165           org.apache.lucene.store.ByteBufferGuard#getByte()
1.12%         155           org.apache.lucene.codecs.lucene90.PForUtil#prefixSum32()
1.07%         148           java.lang.Class#isArray()
1.06%         147           org.apache.lucene.codecs.lucene90.PForUtil#expand32()
0.98%         135           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.96%         133           org.apache.lucene.search.ConjunctionDISI#docID()
0.91%         125           org.apache.lucene.search.ExactPhraseMatcher#reset()
0.83%         114           perf.PKLookupTask#go()
0.83%         114           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#refillPositions()
0.80%         111           org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator#doNext()
0.80%         110           org.apache.lucene.search.ImpactsDISI#advanceShallow()
0.78%         108           org.apache.lucene.codecs.lucene90.ForUtil#shiftLongs()
0.78%         108           java.nio.Buffer#position()
0.78%         108           java.nio.Buffer#nextGetIndex()
0.76%         105           org.apache.lucene.store.ByteBufferGuard#ensureValid()
0.71%         98            org.apache.lucene.search.ConjunctionDISI#advance()
0.70%         97            org.apache.lucene.util.PriorityQueue#downHeap()


/** Scorer implementing Block-Max Maxscore algorithm */
public class BlockMaxMaxscoreScorer extends Scorer {
  private final ScoreMode scoreMode;
Contributor

We don't seem to actually need the scoreMode, let's remove it and make sure Boolean2ScorerSupplier only uses this scorer for TOP_SCORES?

Contributor Author

Done. Boolean2ScorerSupplier already checks for scoreMode == ScoreMode.TOP_SCORES before using this scorer.

} else if (top.doc > upTo) {
  target = upTo + 1;
} else {
  float matchedMaxScoreSum = nonEssentialMaxScoreSum;
Contributor

Other boolean queries sum up scores into a double to reduce the impact of rounding on accuracy, though this only helps when there are 3 clauses or more, otherwise summing up into a double doesn't yield better accuracy. Can you either add a comment that this would need to be changed if we were to support more than two clauses with this scorer, or replace it with a double?
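
A small self-contained illustration of the rounding point above. `ScoreSumPrecision` is a hypothetical name and the values are picked purely to expose the effect (real scores would be far smaller): with two addends, accumulating in a float or in a double and then casting gives the same result, while with three addends the float accumulator can lose a contribution.

```java
/** Illustrates float vs. double accumulation; values are chosen to expose rounding. */
final class ScoreSumPrecision {

  static float sumAsFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s; // rounds to float after every addition
    }
    return sum;
  }

  static float sumAsDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s; // keeps extra precision until the final cast
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    float[] two = {16777216f, 1f};       // 2^24 + 1
    float[] three = {16777216f, 1f, 1f}; // 2^24 + 1 + 1

    System.out.println(sumAsFloat(two) == sumAsDouble(two));     // true
    System.out.println(sumAsFloat(three) == sumAsDouble(three)); // false
    System.out.println(sumAsFloat(three) + " vs " + sumAsDouble(three));
  }
}
```

In the three-value case the float accumulator stays at 2^24 because each `+ 1` is rounded away, whereas the double accumulator reaches 16777218 exactly before the cast.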

Contributor Author

Ah I see. I have replaced it with a double.

  matchedMaxScoreSum += w.scorer.score();
}

if (matchedMaxScoreSum < minCompetitiveScore) {
Contributor

This doesn't look correct to me. The overall score could still be above the minCompetitiveScore after summing up scores of the non-essential clauses? Should it be something like below:

if (matchedMaxScoreSum + nonEssentialMaxScoreSum < minCompetitiveScore) {

Contributor Author

Hmm, matchedMaxScoreSum was actually initialized with nonEssentialMaxScoreSum here, so nonEssentialMaxScoreSum was already included in the calculation? Maybe the name of the variable is confusing?

Contributor

Thanks for clarifying, I was confused indeed by the name, which suggested to me that this score only included clauses that we "matched" so far.

Contributor Author

Yup sorry about the confusion! I've renamed the variable to docScoreUpperBound.
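
To make the naming discussion in this thread concrete, here is a minimal sketch of the check, assuming the bound starts from the non-essential max-score sum and then adds the actual scores of the matching essential clauses. `UpperBoundCheck`, `mayBeCompetitive`, and the numbers are hypothetical stand-ins, not the PR's code.

```java
/** Hypothetical illustration of the docScoreUpperBound check; not the PR's code. */
final class UpperBoundCheck {

  /**
   * Returns true if the candidate doc may still be competitive.
   *
   * @param essentialScores actual scores of the essential clauses that match the doc
   * @param nonEssentialMaxScoreSum sum of max scores over all non-essential clauses
   */
  static boolean mayBeCompetitive(
      float[] essentialScores, double nonEssentialMaxScoreSum, float minCompetitiveScore) {
    // Start from the best case for the non-essential clauses, so the bound already
    // accounts for them before any of them is advanced or scored.
    double docScoreUpperBound = nonEssentialMaxScoreSum;
    for (float score : essentialScores) {
      docScoreUpperBound += score;
    }
    return docScoreUpperBound >= minCompetitiveScore;
  }

  public static void main(String[] args) {
    // 1.2 (essential) + 0.5 (non-essential bound) is below 2.0: the doc can be skipped.
    System.out.println(mayBeCompetitive(new float[] {1.2f}, 0.5, 2.0f)); // false
    // 1.8 + 0.5 reaches 2.0: the non-essential clauses are worth checking.
    System.out.println(mayBeCompetitive(new float[] {1.8f}, 0.5, 2.0f)); // true
  }
}
```

Because the non-essential contribution is folded in at initialization, comparing the bound directly against minCompetitiveScore is enough; adding nonEssentialMaxScoreSum again would count it twice.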

upTo = -1;
for (DisiWrapper w : allScorers) {
  upTo = Math.max(w.scorer.advanceShallow(Math.max(w.doc, target)), upTo);
}
Contributor

I would add a comment that taking the max is probably a good approach for 2 clauses but might need more thinking if we want this scorer to perform well with more than 2 clauses.
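
A toy model of that trade-off, assuming each clause exposes the last doc ID of the block containing the target (roughly what advanceShallow returns). `BlockWindow`, `blockEnd`, and the block boundaries are hypothetical and don't reflect Lucene's actual block layout.

```java
/** Hypothetical model of block boundaries; not Lucene's API and not the PR's code. */
final class BlockWindow {

  /**
   * Stand-in for advanceShallow: returns the last doc ID of the block containing
   * target, given each block's last doc ID in ascending order.
   */
  static int blockEnd(int[] blockEnds, int target) {
    for (int end : blockEnds) {
      if (end >= target) {
        return end;
      }
    }
    return Integer.MAX_VALUE; // no more blocks
  }

  /** Window end as the max of the clauses' block ends, as in the snippet above. */
  static int upToMax(int[][] clauses, int target) {
    int upTo = -1;
    for (int[] blockEnds : clauses) {
      upTo = Math.max(upTo, blockEnd(blockEnds, target));
    }
    return upTo;
  }

  /** Alternative: window end as the min, so no clause's block boundary is crossed. */
  static int upToMin(int[][] clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (int[] blockEnds : clauses) {
      upTo = Math.min(upTo, blockEnd(blockEnds, target));
    }
    return upTo;
  }

  public static void main(String[] args) {
    int[][] clauses = {
      {127, 255, 383, 511}, // clause with short blocks
      {511, 1023}           // clause with long blocks
    };
    System.out.println(upToMax(clauses, 0)); // 511
    System.out.println(upToMin(clauses, 0)); // 127
  }
}
```

With the max, the window here spans four blocks of the first clause, so that clause's score bound has to cover all of them and may be looser. With only two clauses this is likely an acceptable trade for fewer boundary updates, but with many clauses a single long block could stretch the window for every clause.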

Contributor Author

Done.

return new TwoPhaseIterator(approximation) {
  @Override
  public boolean matches() throws IOException {
    return score() >= minCompetitiveScore;
Contributor

Implementing matches() like that has the good side effect that we only pass good matches to the collector, but the bad side-effect that scores are computed twice for competitive hits. I wonder if we should try to cache the score so that it's not computed another time by the collector.

Contributor Author

Ah good catch. I've updated it to cache the calculation.
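
One way such caching could look, as a minimal sketch rather than the change actually made in the PR. `CachedScore` and its score function are hypothetical; the idea is simply to remember the doc ID the score was last computed for, so that matches() and a later score() call share one computation.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of per-doc score caching; not the PR's code. */
final class CachedScore {
  private final IntToDoubleFunction scoreFunction; // stand-in for summing clause scores
  private int cachedDoc = -1;
  private float cachedScore;
  int computations = 0; // counts real score computations, for demonstration only

  CachedScore(IntToDoubleFunction scoreFunction) {
    this.scoreFunction = scoreFunction;
  }

  /** Computes the score at most once per doc; later calls reuse the cached value. */
  float score(int doc) {
    if (doc != cachedDoc) {
      cachedScore = (float) scoreFunction.applyAsDouble(doc);
      cachedDoc = doc;
      computations++;
    }
    return cachedScore;
  }

  /** Two-phase style check: only docs with a competitive score are confirmed. */
  boolean matches(int doc, float minCompetitiveScore) {
    return score(doc) >= minCompetitiveScore;
  }

  public static void main(String[] args) {
    CachedScore scorer = new CachedScore(doc -> doc * 0.5);
    boolean competitive = scorer.matches(4, 1.5f); // computes the score once
    float collected = scorer.score(4);             // reuses the cached value
    System.out.println(competitive + " " + collected + " " + scorer.computations);
  }
}
```

The cache is keyed on the doc ID, so it is invalidated automatically when the iterator moves to another doc.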

@Override
public float matchCost() {
  // maximum number of scorers that matches() might advance
  return allScorers.length - essentialsScorers.size();
Contributor

matchCost is expected to be fixed, we shouldn't try to adjust it based on the number of essential scorers, see e.g. how ConjunctionDISI uses it to sort two-phase iterators up-front.

Contributor Author

I see. I've updated it to use the length of all scorers to over-estimate the cost, since either the essential or the non-essential list may change in length.
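
To illustrate why a fixed matchCost matters, here is a small sketch of a consumer that sorts its two-phase checks once, cheapest first, in the spirit of what the comment says about ConjunctionDISI. `Phase` and `verificationOrder` are hypothetical names, not Lucene API. If a cost changed after this sort, the order would silently become stale.

```java
import java.util.Arrays;
import java.util.Comparator;

final class MatchCostSketch {
  /** Hypothetical stand-in for a two-phase iterator: a name plus a fixed matchCost. */
  static final class Phase {
    final String name;
    final float matchCost;

    Phase(String name, float matchCost) {
      this.name = name;
      this.matchCost = matchCost;
    }
  }

  /** Sorts once, cheapest check first; the order is never revisited afterwards. */
  static String[] verificationOrder(Phase... phases) {
    Phase[] sorted = phases.clone();
    Arrays.sort(sorted, Comparator.comparingDouble((Phase p) -> p.matchCost));
    String[] names = new String[sorted.length];
    for (int i = 0; i < sorted.length; i++) {
      names[i] = sorted[i].name;
    }
    return names;
  }
}
```

Over-estimating with the total number of scorers keeps the value constant for the lifetime of the scorer, at the price of possibly being ordered later than strictly necessary.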

@zacharymorn
Contributor Author

Here are the latest benchmark results after the update:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      196.69      (6.3%)      189.02      (8.6%)   -3.9% ( -17% -   11%) 0.102
            HighTermTitleBDVSort       33.24     (11.6%)       32.31     (10.3%)   -2.8% ( -22% -   21%) 0.421
           HighTermDayOfYearSort      137.58     (10.1%)      135.58      (9.7%)   -1.4% ( -19% -   20%) 0.644
          OrHighMedDayTaxoFacets       25.11      (6.6%)       24.81      (8.8%)   -1.2% ( -15% -   15%) 0.620
                        Wildcard      348.15      (6.8%)      343.94      (6.3%)   -1.2% ( -13% -   12%) 0.559
                      TermDTSort      188.94     (10.2%)      187.75      (9.7%)   -0.6% ( -18% -   21%) 0.841
               HighTermMonthSort      192.52      (9.7%)      191.37      (9.3%)   -0.6% ( -17% -   20%) 0.842
                         MedTerm     2947.19      (3.0%)     2936.13      (3.7%)   -0.4% (  -6% -    6%) 0.726
                        HighTerm     3104.91      (3.5%)     3100.88      (5.0%)   -0.1% (  -8% -    8%) 0.925
                 LowSloppyPhrase       54.77      (0.8%)       54.75      (2.0%)   -0.0% (  -2% -    2%) 0.935
            MedTermDayTaxoFacets       93.60      (4.1%)       93.60      (4.6%)    0.0% (  -8% -    9%) 0.998
                    OrNotHighMed     1661.26      (2.1%)     1661.95      (3.1%)    0.0% (  -5% -    5%) 0.960
                     AndHighHigh       63.20      (4.6%)       63.25      (5.1%)    0.1% (  -9% -   10%) 0.956
        AndHighHighDayTaxoFacets       58.91      (1.1%)       58.96      (1.4%)    0.1% (  -2% -    2%) 0.831
                      HighPhrase     1040.52      (2.0%)     1041.64      (2.2%)    0.1% (  -3% -    4%) 0.869
                       MedPhrase       61.40      (2.5%)       61.48      (3.3%)    0.1% (  -5% -    6%) 0.887
                 MedSloppyPhrase      156.44      (1.6%)      156.80      (3.9%)    0.2% (  -5% -    5%) 0.808
                       LowPhrase      348.79      (1.1%)      349.68      (2.1%)    0.3% (  -2% -    3%) 0.633
                          Fuzzy1      141.81      (2.3%)      142.24      (1.6%)    0.3% (  -3% -    4%) 0.633
                    OrHighNotMed     1471.87      (2.6%)     1476.69      (3.5%)    0.3% (  -5% -    6%) 0.737
                   OrNotHighHigh     1115.16      (2.6%)     1119.36      (3.6%)    0.4% (  -5% -    6%) 0.704
                   OrHighNotHigh     1434.59      (3.0%)     1440.05      (2.8%)    0.4% (  -5% -    6%) 0.679
         AndHighMedDayTaxoFacets      117.61      (2.8%)      118.17      (3.3%)    0.5% (  -5% -    6%) 0.623
                      AndHighMed      122.71      (4.5%)      123.41      (5.3%)    0.6% (  -8% -   10%) 0.716
                HighSloppyPhrase       10.46      (2.6%)       10.52      (3.7%)    0.6% (  -5% -    7%) 0.552
                         Respell       77.60      (2.9%)       78.10      (2.4%)    0.6% (  -4% -    6%) 0.453
             LowIntervalsOrdered       59.06      (2.0%)       59.52      (2.4%)    0.8% (  -3% -    5%) 0.263
            BrowseDateSSDVFacets        4.59     (32.0%)        4.63     (32.4%)    0.8% ( -48% -   96%) 0.938
                     LowSpanNear      168.24      (2.0%)      169.63      (2.3%)    0.8% (  -3% -    5%) 0.225
                     MedSpanNear       29.72      (2.7%)       29.98      (3.1%)    0.9% (  -4% -    6%) 0.354
                      AndHighLow     1433.34      (5.4%)     1446.44      (5.3%)    0.9% (  -9% -   12%) 0.589
                    OrHighNotLow     1989.50      (3.0%)     2010.48      (4.2%)    1.1% (  -5% -    8%) 0.359
                        PKLookup      284.02      (4.4%)      287.16      (5.4%)    1.1% (  -8% -   11%) 0.476
                    HighSpanNear       11.27      (3.3%)       11.39      (4.3%)    1.1% (  -6% -    9%) 0.346
                          Fuzzy2      161.50      (2.4%)      163.37      (1.2%)    1.2% (  -2% -    4%) 0.051
                    OrNotHighLow     1490.10      (3.6%)     1511.49      (3.5%)    1.4% (  -5% -    8%) 0.198
     BrowseRandomLabelSSDVFacets       20.26     (16.3%)       20.65      (5.0%)    1.9% ( -16% -   27%) 0.616
                         LowTerm     4286.35      (3.9%)     4381.66      (5.6%)    2.2% (  -7% -   12%) 0.145
             MedIntervalsOrdered       23.28      (4.9%)       23.83      (6.4%)    2.4% (  -8% -   14%) 0.185
       BrowseDayOfYearSSDVFacets       26.94      (3.8%)       27.59      (9.0%)    2.4% ( -10% -   15%) 0.268
           BrowseMonthSSDVFacets       29.56      (9.1%)       30.49     (13.4%)    3.2% ( -17% -   28%) 0.384
            HighIntervalsOrdered       11.74      (6.8%)       12.22      (7.5%)    4.1% (  -9% -   19%) 0.071
           BrowseMonthTaxoFacets       29.08     (39.3%)       31.68     (44.3%)    8.9% ( -53% -  152%) 0.500
                      OrHighHigh       49.41      (5.7%)       54.49      (3.8%)   10.3% (   0% -   20%) 0.000
                          IntNRQ      127.81     (22.1%)      141.76     (18.7%)   10.9% ( -24% -   66%) 0.092
            BrowseDateTaxoFacets       29.23     (43.8%)       32.65     (44.2%)   11.7% ( -53% -  177%) 0.401
       BrowseDayOfYearTaxoFacets       29.30     (43.4%)       32.77     (43.9%)   11.8% ( -52% -  175%) 0.391
     BrowseRandomLabelTaxoFacets       29.54     (55.5%)       33.08     (56.1%)   12.0% ( -64% -  277%) 0.498
                       OrHighLow      764.71      (3.7%)     1192.43      (4.0%)   55.9% (  46% -   66%) 0.000
                       OrHighMed      210.87      (4.8%)      332.88      (4.9%)   57.9% (  45% -   70%) 0.000
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ      115.39     (24.7%)      105.19     (26.0%)   -8.8% ( -47% -   55%) 0.270
                      TermDTSort      344.09      (7.6%)      331.37      (2.6%)   -3.7% ( -12% -    7%) 0.039
           HighTermDayOfYearSort      144.41      (8.0%)      139.75      (4.6%)   -3.2% ( -14% -   10%) 0.117
           BrowseMonthSSDVFacets       29.55     (13.8%)       28.65     (13.5%)   -3.0% ( -26% -   28%) 0.481
               HighTermMonthSort      135.90      (7.3%)      133.66      (6.6%)   -1.6% ( -14% -   13%) 0.454
                     AndHighHigh       76.59      (4.6%)       75.54      (3.4%)   -1.4% (  -8% -    6%) 0.284
                 LowSloppyPhrase       28.97      (3.5%)       28.78      (3.0%)   -0.7% (  -6% -    6%) 0.529
                HighSloppyPhrase       39.97      (2.5%)       39.83      (2.8%)   -0.4% (  -5% -    5%) 0.667
            MedTermDayTaxoFacets       46.96      (2.7%)       46.83      (4.1%)   -0.3% (  -6% -    6%) 0.795
                      AndHighMed      407.06      (5.4%)      406.05      (4.4%)   -0.2% (  -9% -   10%) 0.872
                       LowPhrase       24.42      (2.4%)       24.40      (2.4%)   -0.1% (  -4% -    4%) 0.917
                       MedPhrase      407.77      (2.4%)      408.15      (2.9%)    0.1% (  -5% -    5%) 0.912
        AndHighHighDayTaxoFacets       28.76      (2.0%)       28.81      (1.7%)    0.2% (  -3% -    3%) 0.783
            HighIntervalsOrdered       38.00      (7.6%)       38.10      (9.0%)    0.3% ( -15% -   18%) 0.924
            HighTermTitleBDVSort       66.71      (8.8%)       67.03      (8.8%)    0.5% ( -15% -   19%) 0.861
                     MedSpanNear       47.80      (3.1%)       48.05      (3.2%)    0.5% (  -5% -    7%) 0.595
             LowIntervalsOrdered       53.59      (3.3%)       53.89      (3.5%)    0.6% (  -6% -    7%) 0.606
         AndHighMedDayTaxoFacets      107.62      (2.0%)      108.31      (1.3%)    0.6% (  -2% -    4%) 0.221
                    OrNotHighMed     1551.89      (3.0%)     1562.23      (3.0%)    0.7% (  -5% -    6%) 0.485
                 MedSloppyPhrase      177.55      (3.5%)      178.91      (2.5%)    0.8% (  -5% -    6%) 0.427
          OrHighMedDayTaxoFacets        9.19      (7.6%)        9.27      (8.0%)    0.9% ( -13% -   17%) 0.723
                          Fuzzy1      156.05      (3.0%)      157.78      (2.7%)    1.1% (  -4% -    6%) 0.216
             MedIntervalsOrdered       81.81      (4.4%)       82.78      (4.5%)    1.2% (  -7% -   10%) 0.400
                    OrHighNotMed     2064.85      (4.7%)     2091.01      (4.7%)    1.3% (  -7% -   11%) 0.397
                    HighSpanNear       25.08      (3.7%)       25.42      (4.5%)    1.3% (  -6% -    9%) 0.301
       BrowseDayOfYearSSDVFacets       28.09     (12.3%)       28.51     (12.6%)    1.5% ( -20% -   30%) 0.705
                     LowSpanNear      274.40      (2.3%)      278.54      (2.3%)    1.5% (  -3% -    6%) 0.038
                      HighPhrase      539.93      (2.8%)      548.73      (2.3%)    1.6% (  -3% -    6%) 0.045
                          Fuzzy2      119.67      (2.7%)      121.67      (2.6%)    1.7% (  -3% -    7%) 0.044
     BrowseRandomLabelSSDVFacets       19.26     (10.1%)       19.65      (9.4%)    2.0% ( -15% -   23%) 0.515
                         Respell       93.58      (3.8%)       95.47      (2.8%)    2.0% (  -4% -    8%) 0.055
                         MedTerm     3172.56      (4.2%)     3238.60      (4.0%)    2.1% (  -5% -   10%) 0.107
                   OrHighNotHigh     1272.14      (5.4%)     1298.96      (4.4%)    2.1% (  -7% -   12%) 0.174
                      AndHighLow     1336.59      (4.4%)     1365.22      (3.9%)    2.1% (  -5% -   10%) 0.103
                         LowTerm     3506.73      (4.9%)     3583.07      (4.0%)    2.2% (  -6% -   11%) 0.126
                        HighTerm     2467.18      (5.1%)     2527.84      (5.0%)    2.5% (  -7% -   13%) 0.123
                    OrNotHighLow     1842.33      (4.9%)     1888.94      (3.3%)    2.5% (  -5% -   11%) 0.055
                   OrNotHighHigh     1517.53      (3.4%)     1557.59      (3.6%)    2.6% (  -4% -   10%) 0.018
                    OrHighNotLow     1759.51      (4.7%)     1808.00      (3.9%)    2.8% (  -5% -   11%) 0.044
                         Prefix3     1004.83      (6.2%)     1035.73      (5.6%)    3.1% (  -8% -   15%) 0.101
            BrowseDateSSDVFacets        4.42     (34.0%)        4.59     (34.6%)    3.9% ( -48% -  109%) 0.721
                        Wildcard      157.42      (7.6%)      163.60      (6.7%)    3.9% (  -9% -   19%) 0.085
                        PKLookup      277.42      (6.1%)      289.34      (3.2%)    4.3% (  -4% -   14%) 0.005
                       OrHighLow     1056.71      (4.0%)     1238.32      (4.2%)   17.2% (   8% -   26%) 0.000
           BrowseMonthTaxoFacets       25.90     (34.9%)       31.19     (47.6%)   20.4% ( -46% -  158%) 0.122
            BrowseDateTaxoFacets       26.79     (38.0%)       32.31     (53.2%)   20.6% ( -51% -  180%) 0.158
       BrowseDayOfYearTaxoFacets       26.88     (37.8%)       32.46     (53.1%)   20.7% ( -50% -  179%) 0.155
     BrowseRandomLabelTaxoFacets       26.25     (48.7%)       33.41     (69.6%)   27.3% ( -61% -  283%) 0.151
                      OrHighHigh       44.09      (6.6%)       57.48      (5.2%)   30.4% (  17% -   45%) 0.000
                       OrHighMed       74.02      (6.5%)      115.40      (6.5%)   55.9% (  40% -   73%) 0.000

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3     1486.03      (5.7%)     1455.45      (6.7%)   -2.1% ( -13% -   11%) 0.299
                    OrHighNotMed     1529.61      (3.6%)     1501.55      (3.6%)   -1.8% (  -8% -    5%) 0.104
                        HighTerm     4492.83      (3.9%)     4420.29      (4.2%)   -1.6% (  -9% -    6%) 0.209
     BrowseRandomLabelSSDVFacets       20.49      (4.6%)       20.27      (6.3%)   -1.1% ( -11% -   10%) 0.542
                       MedPhrase      167.60      (1.8%)      165.87      (1.8%)   -1.0% (  -4% -    2%) 0.067
                   OrHighNotHigh     1474.90      (3.8%)     1460.19      (3.5%)   -1.0% (  -7% -    6%) 0.387
                          Fuzzy2      111.64      (1.5%)      110.58      (2.2%)   -0.9% (  -4% -    2%) 0.113
                    OrHighNotLow     1609.18      (3.2%)     1594.45      (4.4%)   -0.9% (  -8% -    6%) 0.451
            HighIntervalsOrdered       23.50      (7.4%)       23.29      (6.9%)   -0.9% ( -14% -   14%) 0.699
             MedIntervalsOrdered      154.52      (7.7%)      153.18      (7.3%)   -0.9% ( -14% -   15%) 0.713
            MedTermDayTaxoFacets       30.57      (5.6%)       30.32      (6.2%)   -0.8% ( -12% -   11%) 0.668
                       LowPhrase      198.30      (2.3%)      196.85      (2.3%)   -0.7% (  -5% -    4%) 0.324
                      HighPhrase      775.57      (2.4%)      770.08      (2.2%)   -0.7% (  -5% -    4%) 0.337
                   OrNotHighHigh     1399.97      (3.9%)     1390.71      (3.5%)   -0.7% (  -7% -    7%) 0.574
            HighTermTitleBDVSort      186.02      (7.2%)      184.86      (7.4%)   -0.6% ( -14% -   15%) 0.786
                        Wildcard      179.63      (3.2%)      178.77      (3.7%)   -0.5% (  -7% -    6%) 0.664
                        PKLookup      290.46      (3.1%)      289.10      (3.6%)   -0.5% (  -6% -    6%) 0.659
                    OrNotHighMed     1461.62      (3.8%)     1454.97      (3.5%)   -0.5% (  -7% -    7%) 0.692
                      TermDTSort      263.72      (7.4%)      262.56      (7.6%)   -0.4% ( -14% -   15%) 0.853
             LowIntervalsOrdered      125.12      (4.5%)      124.60      (4.4%)   -0.4% (  -8% -    8%) 0.764
                 LowSloppyPhrase      120.77      (1.9%)      120.30      (1.9%)   -0.4% (  -4% -    3%) 0.517
                      AndHighMed      272.51      (5.0%)      271.52      (5.2%)   -0.4% ( -10% -   10%) 0.822
                 MedSloppyPhrase       34.39      (1.8%)       34.27      (1.7%)   -0.3% (  -3% -    3%) 0.539
                          Fuzzy1      145.80      (1.2%)      145.31      (2.3%)   -0.3% (  -3% -    3%) 0.559
         AndHighMedDayTaxoFacets       36.92      (3.1%)       36.83      (3.2%)   -0.3% (  -6% -    6%) 0.801
        AndHighHighDayTaxoFacets       41.21      (2.7%)       41.11      (2.4%)   -0.2% (  -5% -    4%) 0.772
                HighSloppyPhrase       41.09      (2.7%)       41.00      (3.8%)   -0.2% (  -6% -    6%) 0.830
                         MedTerm     3362.24      (3.7%)     3356.21      (3.7%)   -0.2% (  -7% -    7%) 0.877
                         LowTerm     5102.13      (4.2%)     5097.31      (4.6%)   -0.1% (  -8% -    9%) 0.946
                    OrNotHighLow     1842.51      (4.2%)     1841.20      (4.3%)   -0.1% (  -8% -    8%) 0.957
                    HighSpanNear        9.51      (4.6%)        9.52      (5.1%)    0.1% (  -9% -   10%) 0.974
                     MedSpanNear       97.41      (2.7%)       97.48      (3.2%)    0.1% (  -5% -    6%) 0.937
          OrHighMedDayTaxoFacets        8.33      (4.5%)        8.34      (5.3%)    0.1% (  -9% -   10%) 0.960
           BrowseMonthSSDVFacets       29.36      (8.1%)       29.39      (8.4%)    0.1% ( -15% -   17%) 0.972
                         Respell      104.50      (0.9%)      104.71      (2.3%)    0.2% (  -2% -    3%) 0.720
                      AndHighLow     1632.66      (5.2%)     1636.07      (5.0%)    0.2% (  -9% -   10%) 0.897
                     AndHighHigh       84.25      (4.5%)       84.47      (3.9%)    0.3% (  -7% -    9%) 0.845
       BrowseDayOfYearSSDVFacets       26.66      (9.6%)       26.85      (9.0%)    0.7% ( -16% -   21%) 0.805
                     LowSpanNear      272.01      (3.9%)      274.66      (5.0%)    1.0% (  -7% -   10%) 0.490
               HighTermMonthSort      159.98      (7.4%)      161.76     (11.6%)    1.1% ( -16% -   21%) 0.717
           HighTermDayOfYearSort      195.88      (7.0%)      200.34      (8.2%)    2.3% ( -12% -   18%) 0.344
     BrowseRandomLabelTaxoFacets       32.84     (52.5%)       33.70     (50.8%)    2.6% ( -65% -  222%) 0.872
            BrowseDateTaxoFacets       32.24     (41.3%)       33.31     (41.1%)    3.3% ( -55% -  146%) 0.798
       BrowseDayOfYearTaxoFacets       32.26     (41.0%)       33.41     (41.0%)    3.5% ( -55% -  144%) 0.784
            BrowseDateSSDVFacets        4.17     (35.7%)        4.41     (37.0%)    5.9% ( -49% -  122%) 0.610
           BrowseMonthTaxoFacets       31.23     (39.5%)       33.13     (41.8%)    6.1% ( -53% -  144%) 0.636
                          IntNRQ      131.85     (17.7%)      140.08     (10.5%)    6.2% ( -18% -   41%) 0.174
                       OrHighMed      158.82      (7.1%)      198.40      (6.0%)   24.9% (  11% -   40%) 0.000
                      OrHighHigh       38.33      (7.5%)       63.25      (8.8%)   65.0% (  45% -   87%) 0.000
                       OrHighLow      571.04      (4.1%)     1341.58     (10.0%)  134.9% ( 116% -  155%) 0.000
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelTaxoFacets       33.86     (55.5%)       33.01     (49.5%)   -2.5% ( -69% -  230%) 0.879
                      AndHighLow     1590.63      (3.8%)     1557.92      (5.3%)   -2.1% ( -10% -    7%) 0.160
       BrowseDayOfYearSSDVFacets       26.91      (4.4%)       26.46      (6.2%)   -1.7% ( -11% -    9%) 0.320
           BrowseMonthSSDVFacets       29.07      (7.5%)       28.61      (9.0%)   -1.6% ( -16% -   16%) 0.545
                       OrHighLow     1123.52      (4.4%)     1110.16      (3.2%)   -1.2% (  -8% -    6%) 0.330
                      HighPhrase      424.08      (1.7%)      419.21      (2.9%)   -1.1% (  -5% -    3%) 0.126
                       LowPhrase      150.69      (2.1%)      149.27      (2.2%)   -0.9% (  -5% -    3%) 0.156
           BrowseMonthTaxoFacets       31.95     (38.9%)       31.66     (40.3%)   -0.9% ( -57% -  128%) 0.943
                   OrNotHighHigh     1478.78      (3.1%)     1469.38      (3.0%)   -0.6% (  -6% -    5%) 0.514
                    HighSpanNear        9.03      (3.6%)        8.98      (3.1%)   -0.5% (  -6% -    6%) 0.648
                      AndHighMed      236.20      (3.1%)      235.23      (3.8%)   -0.4% (  -7% -    6%) 0.707
                   OrHighNotHigh     1404.82      (3.0%)     1399.84      (2.9%)   -0.4% (  -6% -    5%) 0.704
                       MedPhrase     1408.93      (2.3%)     1404.10      (2.6%)   -0.3% (  -5% -    4%) 0.660
                 MedSloppyPhrase       54.44      (2.6%)       54.27      (1.8%)   -0.3% (  -4% -    4%) 0.668
                         Prefix3       88.47      (8.9%)       88.53      (7.0%)    0.1% ( -14% -   17%) 0.977
                 LowSloppyPhrase      133.40      (3.9%)      133.50      (3.5%)    0.1% (  -7% -    7%) 0.945
                     LowSpanNear       58.78      (2.2%)       58.84      (1.4%)    0.1% (  -3% -    3%) 0.859
                     MedSpanNear       27.68      (2.6%)       27.71      (2.1%)    0.1% (  -4% -    4%) 0.889
            HighIntervalsOrdered       19.60      (7.8%)       19.64      (9.5%)    0.2% ( -15% -   18%) 0.937
            MedTermDayTaxoFacets       67.02      (3.7%)       67.26      (3.3%)    0.4% (  -6% -    7%) 0.750
             MedIntervalsOrdered       36.93      (3.4%)       37.09      (3.7%)    0.4% (  -6% -    7%) 0.709
             LowIntervalsOrdered       18.28      (3.4%)       18.37      (3.3%)    0.5% (  -5% -    7%) 0.652
                        HighTerm     2544.46      (4.5%)     2559.83      (4.1%)    0.6% (  -7% -    9%) 0.655
                        Wildcard       67.00      (4.6%)       67.45      (4.5%)    0.7% (  -8% -   10%) 0.642
                        PKLookup      285.72      (4.4%)      287.90      (3.9%)    0.8% (  -7% -    9%) 0.566
                    OrNotHighMed     1769.53      (2.6%)     1783.12      (2.9%)    0.8% (  -4% -    6%) 0.384
            BrowseDateTaxoFacets       32.54     (43.7%)       32.79     (40.8%)    0.8% ( -58% -  151%) 0.954
        AndHighHighDayTaxoFacets       16.31      (2.7%)       16.44      (2.3%)    0.8% (  -4% -    6%) 0.331
                         Respell       90.52      (3.5%)       91.24      (2.7%)    0.8% (  -5% -    7%) 0.417
         AndHighMedDayTaxoFacets       91.12      (1.6%)       91.87      (1.2%)    0.8% (  -1% -    3%) 0.068
       BrowseDayOfYearTaxoFacets       32.59     (43.4%)       32.87     (40.7%)    0.9% ( -58% -  150%) 0.949
                    OrHighNotMed     1303.02      (3.7%)     1314.61      (3.8%)    0.9% (  -6% -    8%) 0.457
                    OrNotHighLow     1526.42      (2.6%)     1540.29      (3.3%)    0.9% (  -4% -    7%) 0.338
                    OrHighNotLow     1872.15      (3.2%)     1889.45      (4.0%)    0.9% (  -6% -    8%) 0.422
                HighSloppyPhrase       43.82      (2.0%)       44.25      (2.9%)    1.0% (  -3% -    6%) 0.203
                         MedTerm     2719.77      (3.3%)     2747.44      (5.0%)    1.0% (  -7% -    9%) 0.446
                          Fuzzy1      141.59      (2.7%)      143.05      (2.7%)    1.0% (  -4% -    6%) 0.224
                     AndHighHigh      132.53      (2.8%)      133.91      (3.0%)    1.0% (  -4% -    7%) 0.261
                         LowTerm     3170.39      (4.0%)     3215.56      (3.7%)    1.4% (  -6% -    9%) 0.243
                          Fuzzy2      102.17      (2.5%)      103.81      (2.7%)    1.6% (  -3% -    6%) 0.051
                          IntNRQ      229.02      (7.8%)      232.85      (4.9%)    1.7% ( -10% -   15%) 0.415
     BrowseRandomLabelSSDVFacets       20.62      (8.0%)       21.00      (9.4%)    1.8% ( -14% -   20%) 0.505
           HighTermDayOfYearSort      105.81      (7.9%)      108.08      (9.1%)    2.1% ( -13% -   20%) 0.428
          OrHighMedDayTaxoFacets       15.57      (7.4%)       16.05      (8.8%)    3.1% ( -12% -   20%) 0.226
            HighTermTitleBDVSort      282.98      (3.1%)      291.81      (8.1%)    3.1% (  -7% -   14%) 0.107
                      TermDTSort      208.95      (3.3%)      215.60      (8.1%)    3.2% (  -7% -   15%) 0.104
               HighTermMonthSort      278.35      (3.2%)      293.92      (9.6%)    5.6% (  -7% -   19%) 0.014
            BrowseDateSSDVFacets        4.10     (31.4%)        4.40     (34.8%)    7.1% ( -44% -  106%) 0.497
                      OrHighHigh       44.28      (6.0%)       49.24      (5.9%)   11.2% (   0% -   24%) 0.000
                       OrHighMed       96.24      (6.4%)      257.56     (11.3%)  167.6% ( 140% -  197%) 0.000
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
          OrHighPhraseHighPhrase       26.43     (11.3%)       22.85      (6.3%)  -13.6% ( -27% -    4%) 0.000
                 AndHighOrMedMed       93.60     (10.5%)       94.62      (7.9%)    1.1% ( -15% -   21%) 0.711
                        PKLookup      213.78     (10.3%)      219.68     (12.4%)    2.8% ( -18% -   28%) 0.444
                AndMedOrHighHigh       72.96     (12.7%)       76.81      (7.9%)    5.3% ( -13% -   29%) 0.115
           OrAndHigMedAndHighMed      117.08     (11.1%)      128.52      (9.3%)    9.8% (  -9% -   33%) 0.003
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
          OrHighPhraseHighPhrase       27.43     (13.1%)       24.07      (4.3%)  -12.3% ( -26% -    5%) 0.000
                 AndHighOrMedMed       96.00     (12.8%)       99.20      (4.0%)    3.3% ( -11% -   23%) 0.265
                        PKLookup      212.67     (18.0%)      224.37     (14.3%)    5.5% ( -22% -   46%) 0.284
                AndMedOrHighHigh       75.19     (16.1%)       80.60      (4.2%)    7.2% ( -11% -   32%) 0.052
           OrAndHigMedAndHighMed      120.29     (14.5%)      135.06      (4.9%)   12.3% (  -6% -   36%) 0.000

@zacharymorn zacharymorn requested a review from jpountz June 29, 2022 04:41
Contributor

@jpountz jpountz left a comment

Thanks, this looks great. I left some comments that mostly relate to making this scorer work with more than 2 clauses.

With this change, I suspect that some scorers created in TestWANDScorer would now use your new BlockMaxMaxScoreScorer, which is going to decrease the coverage of WANDScorer. Can we somehow make sure that TestWANDScorer always gets a WANDScorer? E.g. I spotted this query under TestWANDScorer#testBasics which likely uses your new scorer:

    //  test a filtered disjunction
    query =
        new BooleanQuery.Builder()
            .add(
                new BooleanQuery.Builder()
                    .add(
                        new BoostQuery(
                            new ConstantScoreQuery(new TermQuery(new Term("foo", "A"))), 2),
                        Occur.SHOULD)
                    .add(new ConstantScoreQuery(new TermQuery(new Term("foo", "B"))), Occur.SHOULD)
                    .build(),
                Occur.MUST)
            .add(new TermQuery(new Term("foo", "C")), Occur.FILTER)
            .build();

assert target <= upTo;

for (DisiWrapper w : allScorers) {
if (w.doc <= upTo) {
Contributor

I don't think w.doc can ever be greater than upTo given how upTo is computed?

Contributor Author

It is true that w.doc is always less than or equal to upTo, since we use Math.max to compute upTo. I wonder whether we should still keep this branch (with an additional comment), though, since the way upTo is computed may change in the future to support more clauses, and this branch might be easy to miss. On the other hand, I'm also fine with removing it for now and updating the comment above to mention handling this case as well.

Contributor

I have a slight preference for the second approach (remove plus a comment) so that our test coverage build doesn't flag this branch as one that is never tested.

Contributor Author

Makes sense. I have removed that branch and also changed the if condition to be an assertion to help catch future changes.

// the "Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes" paper.
nonEssentialMaxScoreSum = 0;
for (DisiWrapper w : allScorers) {
if (nonEssentialMaxScoreSum + w.maxScoreFloat < minCompetitiveScore) {
Contributor
Suggested change
- if (nonEssentialMaxScoreSum + w.maxScoreFloat < minCompetitiveScore) {
+ if (maxScoreSumPropagator.scoreSumUpperBound(nonEssentialMaxScoreSum + w.maxScoreFloat) < minCompetitiveScore) {

Contributor Author

Updated.
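
A minimal sketch of the partitioning step with a conservative bound on the sum, for context on the suggestion above. It assumes the clauses are sorted by ascending max score and that the loop stops at the first clause that could be competitive. `sumUpperBound` is a hypothetical helper that simply rounds the double sum up to a float; it is not Lucene's `scoreSumUpperBound` formula.

```java
final class PartitionSketch {
  /**
   * Hypothetical conservative bound: converts the double sum to a float that is
   * never smaller than the sum, so rounding cannot make a clause look non-essential.
   */
  static float sumUpperBound(double sum) {
    float f = (float) sum;
    return f >= sum ? f : Math.nextUp(f);
  }

  /**
   * Returns how many leading clauses are non-essential, i.e. whose max scores
   * together cannot reach minCompetitiveScore. Assumes ascending max scores.
   */
  static int countNonEssential(float[] sortedMaxScores, float minCompetitiveScore) {
    double nonEssentialMaxScoreSum = 0;
    int count = 0;
    for (float maxScore : sortedMaxScores) {
      if (sumUpperBound(nonEssentialMaxScoreSum + maxScore) < minCompetitiveScore) {
        nonEssentialMaxScoreSum += maxScore;
        count++;
      } else {
        break; // this clause and all later ones are essential
      }
    }
    return count;
  }
}
```

With max scores 0.5, 1.0 and 3.0 and a minimum competitive score of 2.0, the first two clauses are non-essential (0.5 + 1.0 < 2.0) and the third is essential. Comparing against an upper bound rather than the raw sum errs on the side of keeping a clause essential, so no competitive hit is skipped because of floating-point rounding.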


for (DisiWrapper w = essentialsScorers.topList(); w != null; w = w.next) {
docScoreUpperBound += w.scorer.score();
}
Contributor

We could probably cache the score that is computed here so that we don't recompute it later in score(). I'm happy to leave it for a follow-up; it's up to you.

Contributor Author

I just gave it a try, but caching the partial score from the essential list here seems to require more changes or an additional data structure inside score(). This scorer no longer maintains the non-essential list explicitly and only remembers the sum of max scores from the non-essential list, so score() would need to tell non-essential scorers apart in order to use the cached essential-list score. I would prefer to handle this in a follow-up issue.

Contributor

👍

Contributor Author

I've created a follow-up issue for this https://issues.apache.org/jira/browse/LUCENE-10636.

@@ -106,7 +106,7 @@ static long scaleMaxScore(float maxScore, int scalingFactor) {
* Scale min competitive scores the same way as max scores but this time by rounding down in order
* to make sure that we do not miss any matches.
*/
- private static long scaleMinScore(float minScore, int scalingFactor) {
+ static long scaleMinScore(float minScore, int scalingFactor) {
Contributor

we don't seem to need to make this method pkg-private anymore

Contributor Author

Updated.

@zacharymorn
Contributor Author

With this change, I suspect that some scorers created in TestWANDScorer would now use your new BlockMaxMaxScoreScorer, which is going to decrease the coverage of WANDScorer. Can we somehow make sure that TestWANDScorer always gets a WANDScorer? E.g. I spotted this query under TestWANDScorer#testBasics which likely uses your new scorer:

    //  test a filtered disjunction
    query =
        new BooleanQuery.Builder()
            .add(
                new BooleanQuery.Builder()
                    .add(
                        new BoostQuery(
                            new ConstantScoreQuery(new TermQuery(new Term("foo", "A"))), 2),
                        Occur.SHOULD)
                    .add(new ConstantScoreQuery(new TermQuery(new Term("foo", "B"))), Occur.SHOULD)
                    .build(),
                Occur.MUST)
            .add(new TermQuery(new Term("foo", "C")), Occur.FILTER)
            .build();

Yeah, this is a good question. In my newly added tests I have used something like this to confirm that the right scorer is being tested, but I'm not totally happy with this approach myself:

if (scorer instanceof AssertingScorer) {
    assertTrue(((AssertingScorer) scorer).getIn() instanceof BlockMaxMaxscoreScorer);
} else {
    assertTrue(scorer instanceof BlockMaxMaxscoreScorer);
}

One alternative could be to instantiate WANDScorer directly inside the test for the lower-level tests, and to move the higher-level tests into another test class that doesn't care about the specific scorer implementation for disjunctions. This may require duplicating some code from BooleanWeight, AssertingWeight, etc., but it should be doable.

On the other hand, if we don't plan on instantiating WANDScorer directly in the test, varying the query clauses and asserting as above might be the best we can do. As you suggested, this risks decreasing test coverage, so it may not be ideal either.

@zacharymorn zacharymorn requested a review from jpountz June 30, 2022 02:40
Contributor

@jpountz left a comment

I like the idea of creating WANDScorer more explicitly in tests. It doesn't look easy though and this change is already great so I wonder if we should keep it for a follow-up.

I reviewed the change and left some very minor comments but it looks great to me overall. Let's get it in.

import java.util.List;

/** Scorer implementing Block-Max Maxscore algorithm */
public class BlockMaxMaxscoreScorer extends Scorer {

Contributor

Could it be pkg-private?

Contributor Author

Updated.

// sum of max scores of scorers in nonEssentialScorers list
private double nonEssentialMaxScoreSum;

private long cost;

Contributor

nit: let's make it final

Contributor Author

Updated.


for (DisiWrapper w = essentialsScorers.topList(); w != null; w = w.next) {
docScoreUpperBound += w.scorer.score();
}

Contributor

👍
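
For readers less familiar with Maxscore-style pruning, the loop above relies on the scorers being split into essential and non-essential sets. A minimal sketch of that split, assuming per-scorer score upper bounds and a minimum competitive score are already known. The class and method names are hypothetical, and the logic is a simplification rather than the code in this PR.

```java
import java.util.Arrays;

class MaxscorePartitionDemo {
  // Returns how many scorers, taken in ascending max-score order, are non-essential:
  // their max scores summed together stay below minCompetitiveScore, so a document
  // that matches only those scorers cannot be competitive.
  static int countNonEssential(float[] maxScores, float minCompetitiveScore) {
    float[] sorted = maxScores.clone();
    Arrays.sort(sorted);
    double sum = 0;
    int count = 0;
    for (float maxScore : sorted) {
      if (sum + maxScore >= minCompetitiveScore) {
        break; // this scorer and all remaining ones are essential
      }
      sum += maxScore;
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    float[] maxScores = {3.0f, 1.0f};
    System.out.println(countNonEssential(maxScores, 2.0f));
  }
}
```

Only the essential scorers need to drive iteration; the non-essential ones are consulted afterwards to complete the score of a candidate document.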

assert target <= upTo;

for (DisiWrapper w : allScorers) {
if (w.doc <= upTo) {

Contributor

I have a slight preference for the second approach (remove plus a comment) so that our test coverage build doesn't report this branch as never tested.

@@ -118,6 +118,21 @@ private Scorer getInternal(long leadCost) throws IOException {
leadCost);
}

// pure two terms disjunction
if (scoreMode == ScoreMode.TOP_SCORES
&& minShouldMatch == 0

Contributor

minShouldMatch = 1 also qualifies

Suggested change
- && minShouldMatch == 0
+ && minShouldMatch <= 1

Contributor Author

Updated.
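
The reasoning behind this suggestion: a disjunction made only of SHOULD clauses already matches a document only when at least one clause matches, so `minShouldMatch == 0` and `minShouldMatch == 1` select the same documents. A small self-contained illustration follows; the `matches` helper and the boolean-array model of clause matches are assumptions made for the sketch, not Lucene code.

```java
class MinShouldMatchDemo {
  // A document matches a pure disjunction when at least one clause matches
  // and the number of matching clauses reaches minShouldMatch.
  static boolean matches(boolean[] clauseMatches, int minShouldMatch) {
    int matching = 0;
    for (boolean m : clauseMatches) {
      if (m) {
        matching++;
      }
    }
    return matching >= Math.max(1, minShouldMatch);
  }

  public static void main(String[] args) {
    // All four match combinations for a two-clause disjunction.
    boolean[][] docs = {{false, false}, {true, false}, {false, true}, {true, true}};
    for (boolean[] doc : docs) {
      boolean same = matches(doc, 0) == matches(doc, 1);
      System.out.println(doc[0] + "," + doc[1] + " -> same result for 0 and 1: " + same);
    }
  }
}
```

A value of 2 would behave differently, since it turns the two-clause disjunction into a conjunction.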

@@ -39,6 +39,9 @@ public class DisiWrapper {
// For WANDScorer
long maxScore;

// For BlockMaxMaxscoreScorer
float maxScoreFloat;

Contributor

Maybe we should call this one maxScore and rename the other one scaledMaxScore or something like that

Contributor Author

Ah I like this idea. Updated.
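
To illustrate why the two fields warrant different names: the `long` holds a scaled integer form of the score upper bound, while the `float` holds the bound itself. A rough sketch of what such scaling can look like, assuming a power-of-two factor and rounding up so the integer never underestimates the bound. The method name and the factor used here are illustrative; this is not WANDScorer's actual code.

```java
class ScaledMaxScoreDemo {
  // Scale by 2^scalingFactor and round up so the result remains an upper bound.
  static long scaleMaxScore(float maxScore, int scalingFactor) {
    if (Float.isNaN(maxScore) || maxScore < 0) {
      throw new IllegalArgumentException("maxScore must be a non-negative number");
    }
    double scaled = Math.scalb((double) maxScore, scalingFactor);
    return (long) Math.ceil(scaled);
  }

  public static void main(String[] args) {
    float maxScore = 2.3f;
    int scalingFactor = 10; // illustrative value
    long scaledMaxScore = scaleMaxScore(maxScore, scalingFactor);
    // Scaling back shows that the integer form never drops below the original bound.
    double roundTrip = Math.scalb((double) scaledMaxScore, -scalingFactor);
    System.out.println(scaledMaxScore + " scales back to " + roundTrip);
  }
}
```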

@zacharymorn
Contributor Author

I like the idea of creating WANDScorer more explicitly in tests. It doesn't look easy though and this change is already great so I wonder if we should keep it for a follow-up.

Sounds good. I've created this follow-up issue: https://issues.apache.org/jira/browse/LUCENE-10635

I reviewed the change and left some very minor comments but it looks great to me overall. Let's get it in.

Awesome, thanks for all the review and feedback @jpountz, I really appreciate it! Iterating on the solution and seeing it improve each time has been a lot of fun.

@zacharymorn zacharymorn requested a review from jpountz July 1, 2022 18:43
@zacharymorn zacharymorn merged commit 503ec55 into apache:main Jul 2, 2022
zacharymorn added a commit to zacharymorn/lucene that referenced this pull request Jul 2, 2022
@zacharymorn
Contributor Author

Thanks again @jpountz! I've created the above PR to backport these changes to branch_9x.
