LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery #418

Open · wants to merge 17 commits into base: main
Conversation

@zacharymorn zacharymorn commented Oct 29, 2021

Description

This PR is a work-in-progress implementation of basic dynamic pruning support for CombinedFieldsQuery.

#11099

Tests

Added a randomized test that compares the top 100 results between top-score mode (with pruning) and complete scoring, and verifies they match.
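The invariant behind that test can be illustrated with a small self-contained sketch (the class and method names below are hypothetical, not the actual Lucene test code): dynamic pruning must not change the top-k results, so the same query can be scored twice, once with pruning and once exhaustively, and the ranked doc lists compared.

```java
import java.util.List;

// Illustrative check (not the actual Lucene test): pruning must preserve the
// top-k ranking, so we compare the first k docs of both result lists in order.
public class TopKEquivalenceSketch {
  static boolean sameTopK(List<Integer> prunedDocs, List<Integer> exhaustiveDocs, int k) {
    // compare only the first k entries, in rank order
    return prunedDocs.subList(0, Math.min(k, prunedDocs.size()))
        .equals(exhaustiveDocs.subList(0, Math.min(k, exhaustiveDocs.size())));
  }

  public static void main(String[] args) {
    System.out.println(sameTopK(List.of(7, 3, 9), List.of(7, 3, 9, 12), 3)); // true
  }
}
```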


// HighMed: from office # freq=3224339 freq=225338
// top scores time usage	425 milliseconds
// complete time usage		401 milliseconds

// HighHigh: but publisher # freq=1456553 freq=1289029
// top scores time usage	469 milliseconds
// complete time usage		322 milliseconds

// HighLow: with fung # freq=3709421 freq=1344
// top scores time usage	241 milliseconds
// complete time usage		428 milliseconds

// HighLow: date insult # freq=2020626 freq=4424
// top scores time usage	171 milliseconds
// complete time usage		225 milliseconds
@zacharymorn (Contributor Author)

Perf tests result for commit 2ba435e
Run 1:
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                     CFQHighHigh        3.53      (3.3%)        2.92      (5.3%)  -17.2% ( -25% -   -8%) 0.000
                        PKLookup      108.13      (7.7%)      119.85      (8.2%)   10.8% (  -4% -   28%) 0.000
                      CFQHighLow       14.88      (3.9%)       16.69     (12.5%)   12.2% (  -3% -   29%) 0.000
                      CFQHighMed       21.11      (4.1%)       25.87     (11.8%)   22.6% (   6% -   40%) 0.000

Run 2:
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                     CFQHighHigh        6.64      (3.1%)        5.63     (10.2%)  -15.2% ( -27% -   -1%) 0.000
                      CFQHighLow        8.35      (2.8%)        8.05     (15.0%)   -3.6% ( -20% -   14%) 0.297
                      CFQHighMed       24.51      (5.3%)       24.90     (19.9%)    1.6% ( -22% -   28%) 0.733
                        PKLookup      110.06     (10.0%)      128.54      (7.9%)   16.8% (  -1% -   38%) 0.000

Run 3:
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      CFQHighMed       13.01      (2.9%)        9.82      (7.8%)  -24.5% ( -34% -  -14%) 0.000
                        PKLookup      107.85      (8.1%)      111.79     (11.2%)    3.7% ( -14% -   24%) 0.236
                     CFQHighHigh        4.83      (2.6%)        5.06      (8.6%)    4.7% (  -6% -   16%) 0.018
                      CFQHighLow       14.95      (3.0%)       18.31     (19.0%)   22.5% (   0% -   45%) 0.000

Run 4:
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      CFQHighMed       11.11      (2.9%)        6.69      (4.1%)  -39.7% ( -45% -  -33%) 0.000
                      CFQHighLow       27.55      (3.8%)       25.46     (11.0%)   -7.6% ( -21% -    7%) 0.003
                     CFQHighHigh        5.25      (3.2%)        4.96      (6.1%)   -5.7% ( -14% -    3%) 0.000
                        PKLookup      107.61      (6.7%)      121.19      (4.6%)   12.6% (   1% -   25%) 0.000

@zacharymorn (Contributor Author)

Perf test result from 1a71469

Run 1
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 CFQHighHighHigh        4.64      (6.5%)        3.30      (4.7%)  -29.0% ( -37% -  -19%) 0.000
                     CFQHighHigh       11.09      (6.0%)        9.61      (6.0%)  -13.3% ( -23% -   -1%) 0.000
                        PKLookup      103.38      (4.4%)      108.04      (4.3%)    4.5% (  -4% -   13%) 0.001
                   CFQHighMedLow       10.58      (6.1%)       12.30      (8.7%)   16.2% (   1% -   33%) 0.000
                      CFQHighMed       10.70      (7.4%)       15.51     (11.2%)   44.9% (  24% -   68%) 0.000
                   CFQHighLowLow        8.18      (8.2%)       12.87     (11.6%)   57.3% (  34% -   84%) 0.000
                      CFQHighLow       14.57      (7.5%)       30.81     (15.1%)  111.4% (  82% -  144%) 0.000

Run 2
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 CFQHighHighHigh        5.33      (5.7%)        4.02      (7.7%)  -24.4% ( -35% -  -11%) 0.000
                   CFQHighLowLow       17.14      (6.2%)       13.06      (5.4%)  -23.8% ( -33% -  -13%) 0.000
                      CFQHighMed       17.37      (5.8%)       14.38      (7.7%)  -17.2% ( -29% -   -3%) 0.000
                        PKLookup      103.57      (5.5%)      108.84      (5.9%)    5.1% (  -6% -   17%) 0.005
                   CFQHighMedLow       11.25      (7.2%)       12.70      (9.0%)   12.9% (  -3% -   31%) 0.000
                     CFQHighHigh        5.00      (6.2%)        7.54     (12.1%)   51.0% (  30% -   73%) 0.000
                      CFQHighLow       21.60      (5.2%)       34.57     (14.1%)   60.0% (  38% -   83%) 0.000

Run 3
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 CFQHighHighHigh        5.40      (6.9%)        4.06      (5.1%)  -24.8% ( -34% -  -13%) 0.000
                   CFQHighMedLow        7.64      (7.4%)        5.79      (6.3%)  -24.2% ( -35% -  -11%) 0.000
                     CFQHighHigh       11.11      (7.0%)        9.60      (5.9%)  -13.6% ( -24% -    0%) 0.000
                   CFQHighLowLow       21.21      (7.6%)       21.22      (6.6%)    0.0% ( -13% -   15%) 0.993
                        PKLookup      103.15      (5.9%)      107.60      (6.9%)    4.3% (  -8% -   18%) 0.034
                      CFQHighLow       21.85      (8.1%)       34.18     (13.5%)   56.4% (  32% -   84%) 0.000
                      CFQHighMed       12.07      (8.4%)       19.98     (16.7%)   65.5% (  37% -   98%) 0.000

Run 4
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                     CFQHighHigh        8.50      (5.8%)        6.85      (5.2%)  -19.5% ( -28% -   -8%) 0.000
                   CFQHighMedLow       10.89      (5.7%)        8.96      (5.4%)  -17.8% ( -27% -   -7%) 0.000
                      CFQHighMed        8.41      (5.8%)        7.74      (5.6%)   -7.9% ( -18% -    3%) 0.000
                 CFQHighHighHigh        3.45      (6.7%)        3.38      (5.3%)   -2.0% ( -13% -   10%) 0.287
                   CFQHighLowLow        7.82      (6.4%)        8.20      (7.5%)    4.8% (  -8% -   20%) 0.030
                        PKLookup      103.50      (5.0%)      110.69      (5.4%)    6.9% (  -3% -   18%) 0.000
                      CFQHighLow       11.46      (6.0%)       13.16      (6.7%)   14.8% (   1% -   29%) 0.000

@zacharymorn zacharymorn changed the title LUCENE-10061: [WIP] Implements basic dynamic pruning support for CombinedFieldsQuery LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery Nov 6, 2021
Run 1
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      CFQHighMed       16.99      (4.9%)       16.84     (14.9%)   -0.9% ( -19% -   19%) 0.790
                        PKLookup       98.49      (8.7%)      121.99      (9.1%)   23.9% (   5% -   45%) 0.000
                     CFQHighHigh        5.82      (4.7%)        7.55     (10.0%)   29.8% (  14% -   46%) 0.000
                      CFQHighLow       17.59      (5.2%)       25.01     (17.3%)   42.2% (  18% -   68%) 0.000

Run 2
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      CFQHighMed       18.28      (6.1%)       15.63     (15.6%)  -14.5% ( -34% -    7%) 0.000
                        PKLookup      101.26     (10.0%)      105.39     (12.0%)    4.1% ( -16% -   28%) 0.243
                     CFQHighHigh        9.96      (5.6%)       10.47     (13.0%)    5.2% ( -12% -   25%) 0.101
                      CFQHighLow        9.71      (5.8%)       15.71     (34.0%)   61.8% (  20% -  107%) 0.000
@zacharymorn (Contributor Author)

I re-ran the perf tests after mikemccand/luceneutil@0550148 was merged:

Results from python3 src/python/localrun.py -source combinedFieldsBig:

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.69      (1.8%)        2.49      (6.2%)  -32.5% ( -39% -  -24%) 0.000
      CFQHighMed        4.95      (2.1%)        4.19      (5.9%)  -15.5% ( -22% -   -7%) 0.000
        PKLookup      125.72      (4.5%)      126.86     (10.3%)    0.9% ( -13% -   16%) 0.719
      CFQHighLow       19.92      (2.2%)       20.80      (9.5%)    4.4% (  -7% -   16%) 0.043

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.61      (2.8%)        2.48      (2.9%)  -31.4% ( -36% -  -26%) 0.000
        PKLookup      116.67      (7.1%)      123.97      (5.5%)    6.3% (  -5% -   20%) 0.002
      CFQHighMed        4.97      (3.6%)        5.29      (5.5%)    6.6% (  -2% -   16%) 0.000
      CFQHighLow       11.96      (4.5%)       13.99      (6.5%)   17.0% (   5% -   29%) 0.000

Run 3:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.51      (4.2%)        2.44      (6.5%)  -30.5% ( -39% -  -20%) 0.000
        PKLookup      105.72     (11.9%)      108.81     (11.2%)    2.9% ( -18% -   29%) 0.424
      CFQHighMed       10.85      (4.2%)       11.60     (11.4%)    6.9% (  -8% -   23%) 0.011
      CFQHighLow       15.11      (5.6%)       16.16      (9.8%)    7.0% (  -7% -   23%) 0.006

Results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       93.42     (13.7%)       88.23     (11.7%)   -5.6% ( -27% -   23%) 0.168
      CFQHighMed        4.69     (10.7%)        5.00     (18.0%)    6.6% ( -20% -   39%) 0.160
     CFQHighHigh        4.51     (10.6%)        5.17     (17.7%)   14.6% ( -12% -   48%) 0.002
      CFQHighLow       14.13      (8.5%)       23.11     (32.3%)   63.5% (  20% -  114%) 0.000

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighMed        4.77      (4.5%)        4.10      (8.3%)  -14.2% ( -25% -   -1%) 0.000
        PKLookup       98.99     (12.3%)      101.47     (12.5%)    2.5% ( -19% -   31%) 0.522
     CFQHighHigh        4.88      (5.3%)        5.98     (11.5%)   22.6% (   5% -   41%) 0.000
      CFQHighLow       11.57      (5.6%)       18.86     (18.8%)   62.9% (  36% -   92%) 0.000

Run 3:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.55      (5.1%)        2.38      (9.0%)  -32.9% ( -44% -  -19%) 0.000
        PKLookup      101.29      (7.0%)       94.22     (15.4%)   -7.0% ( -27% -   16%) 0.065
      CFQHighLow       15.43      (5.8%)       16.60     (11.2%)    7.6% (  -8% -   26%) 0.007
      CFQHighMed        3.12      (5.1%)        3.83     (15.0%)   22.7% (   2% -   45%) 0.000

For one of the most negatively impacted queries (-42.0%), CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528, the JFR CPU profiling result looks like the following:

PERCENT       CPU SAMPLES   STACK
15.82%        13099         org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#doMergeImpactsPerField()
11.46%        9487          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
4.69%         3883          org.apache.lucene.search.DisiPriorityQueue#downHeap()
3.66%         3027          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
1.93%         1598          org.apache.lucene.search.DisjunctionDISIApproximation#advance()
1.92%         1590          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
1.92%         1588          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
1.77%         1467          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
1.68%         1392          org.apache.lucene.search.DisiPriorityQueue#top()
1.60%         1326          org.apache.lucene.search.DisiPriorityQueue#topList()
1.54%         1276          org.apache.lucene.util.PriorityQueue#downHeap()
1.50%         1243          java.lang.Math#round()
1.45%         1201          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.38%         1145          java.util.HashMap#resize()
1.35%         1115          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.33%         1100          org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#mergeImpactsPerField()
1.21%         1001          java.util.HashMap$HashIterator#nextNode()
1.17%         972           java.util.LinkedList#linkLast()
1.13%         934           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.07%         883           org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.06%         878           org.apache.lucene.store.ByteBufferGuard#getByte()
1.02%         841           org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()

suggesting that a significant share of CPU time is spent merging impacts within the same field. I suspect this happens because the max score is computed too frequently, since a frequent term's skip list would be "dense" and is also used to determine upTo for the max score:

public int advanceShallow(int target) throws IOException {
  impactsSource.advanceShallow(target);
  Impacts impacts = impactsSource.getImpacts();
  return impacts.getDocIdUpTo(0);
}
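The cost implied by this hypothesis can be sketched with a rough, self-contained back-of-the-envelope model (not Lucene code; the block sizes below are illustrative assumptions): each level-0 impact block crossed triggers one max-score boundary recomputation, so a dense term with small level-0 blocks forces many recomputations.

```java
// Rough model of the hypothesis above (not Lucene code): if getDocIdUpTo(0)
// advances by roughly one level-0 block at a time, scoring maxDoc docs triggers
// about ceil(maxDoc / blockSize) boundary recomputations.
public class BoundaryRecomputeSketch {
  static int recomputations(int maxDoc, int blockSize) {
    // one boundary computation per level-0 block crossed
    return (maxDoc + blockSize - 1) / blockSize;
  }

  public static void main(String[] args) {
    System.out.println(recomputations(1_000_000, 128));    // dense level 0: 7813 recomputes
    System.out.println(recomputations(1_000_000, 65_536)); // a higher level: 16 recomputes
  }
}
```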

@zacharymorn (Contributor Author)

Hi @jpountz @jimczi, while taking a deeper dive into how this PR could be improved further, I noticed a difference in the field-weight parameter passed to createMultiNormsLeafSimScorer in CombinedFieldQuery:

CombinedFieldQuery#scorer:

MultiNormsLeafSimScorer scoringSimScorer =
    new MultiNormsLeafSimScorer(simWeight, context.reader(), fields, true);

CombinedFieldQuery#explain:

MultiNormsLeafSimScorer docScorer =
    new MultiNormsLeafSimScorer(
        simWeight, context.reader(), fieldAndWeights.values(), true);

For the CombinedFieldQuery#scorer one, fields may contain duplicated fields:

List<FieldAndWeight> fields = new ArrayList<>();
for (int i = 0; i < fieldTerms.length; i++) {
  TermState state = termStates[i].get(context);
  if (state != null) {
    TermsEnum termsEnum = context.reader().terms(fieldTerms[i].field()).iterator();
    termsEnum.seekExact(fieldTerms[i].bytes(), state);
    PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.FREQS);
    iterators.add(postingsEnum);
    fields.add(fieldAndWeights.get(fieldTerms[i].field()));
  }
}

whereas fieldAndWeights.values() should not contain duplicate fields.

Should CombinedFieldQuery#scorer be updated to use fieldAndWeights.values() as well? What do you think?
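The duplication issue can be illustrated with a minimal sketch (the class and method names below are hypothetical, not Lucene's): if two query terms target the same field, the per-term loop adds that field's entry twice, and de-duplicating by first-seen order would match what fieldAndWeights.values() provides.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration (not the actual Lucene classes): de-duplicate the
// per-term field list while preserving first-seen order, mirroring the
// distinct-fields view that fieldAndWeights.values() gives to #explain.
public class FieldDedupSketch {
  public static List<String> dedupFields(List<String> perTermFields) {
    // LinkedHashSet keeps insertion order while dropping repeats
    return new ArrayList<>(new LinkedHashSet<>(perTermFields));
  }

  public static void main(String[] args) {
    // terms for "title" and two terms for "body" yield "body" twice in the naive list
    System.out.println(dedupFields(List.of("title", "body", "body"))); // [title, body]
  }
}
```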

@zacharymorn (Contributor Author)

For now I have created a spin-off issue https://issues.apache.org/jira/browse/LUCENE-10236 and a separate PR #444 for this. Please let me know if they look good to you.

Contributor


I'm suspecting this may occur when the max score is being computed too frequently, as frequent term's skip list would be "dense" and is also used to determine upTo for max score

That would make sense to me. Maybe update this class so that numLevels / getDocIdUpTo only use the impacts of the term that has the highest weight?

Contributor Author


Thanks for the suggestion! I assume by "highest weight" here you meant the term with the lower doc frequency, as opposed to the field weight?

I also did a quick test with the following updated numLevels / getDocIdUpTo implementations, to approximate using the lower-doc-frequency term's impacts:

@Override
public int numLevels() {
  // this is changed from Integer.MIN_VALUE
  int result = Integer.MAX_VALUE;
 
  // this is changed from Math.max
  for (Impacts impacts : leadingImpactsPerField.values()) {
    result = Math.min(result, impacts.numLevels());
  }

  return result;
}

@Override
public int getDocIdUpTo(int level) {
  // this is changed from Integer.MAX_VALUE
  int result = Integer.MIN_VALUE;

  for (Impacts impacts : leadingImpactsPerField.values()) {
    if (impacts.numLevels() > level) {
      // this is changed from Math.min
      result = Math.max(result, impacts.getDocIdUpTo(level));
    }
  }

  return result;
}

For the slow query CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528 above, this did improve the regression from -42% to between -30% and -35%, with the following JFR CPU result:

PERCENT       CPU SAMPLES   STACK
19.41%        11866         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.35%         4491          org.apache.lucene.search.DisiPriorityQueue#downHeap()
6.07%         3713          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
3.59%         2193          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.50%         2143          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.24%         1983          org.apache.lucene.search.DisjunctionDISIApproximation#advance()
3.09%         1889          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.87%         1752          org.apache.lucene.search.DisiPriorityQueue#top()
2.75%         1681          java.lang.Math#round()
2.71%         1657          org.apache.lucene.search.DisiPriorityQueue#topList()
2.54%         1555          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
2.36%         1441          org.apache.lucene.store.ByteBufferGuard#ensureValid()
2.11%         1292          org.apache.lucene.util.SmallFloat#longToInt4()
1.87%         1142          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.73%         1058          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.71%         1045          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
1.68%         1030          org.apache.lucene.store.ByteBufferGuard#getByte()
1.59%         970           jdk.internal.misc.Unsafe#getByte()
1.57%         959           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.55%         948           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.18%         720           org.apache.lucene.search.ImpactsDISI#docID()
0.84%         515           org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#doMergeImpactsPerField()
0.79%         485           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76%         462           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.73%         444           java.lang.Math#toIntExact()
0.60%         367           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.53%         324           org.apache.lucene.search.ImpactsDISI#advanceTarget()
0.51%         314           org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.50%         306           org.apache.lucene.util.SmallFloat#intToByte4()
0.49%         299           org.apache.lucene.search.ImpactsDISI#nextDoc()

This CPU profiling result looks very similar to that of the baseline. I'll do more testing to understand why.

Contributor Author


Hi @jpountz, I just noticed a place in the implementation that was called multiple times, so I tried caching the result to avoid the repeated computation (6d5e780). The improvement turned out to be quite substantial:

Results from python3 src/python/localrun.py -source combinedFieldsBig

Run 1

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.17      (3.5%)        2.70      (4.6%)  -14.8% ( -22% -   -6%) 0.000
        PKLookup      117.15      (6.7%)      116.31     (10.3%)   -0.7% ( -16% -   17%) 0.792
      CFQHighMed       12.98      (5.2%)       18.26     (13.9%)   40.7% (  20% -   63%) 0.000
      CFQHighLow       17.30      (3.6%)       36.19     (20.1%)  109.2% (  82% -  137%) 0.000

Run 2

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup      104.98      (6.9%)      115.76      (8.5%)   10.3% (  -4% -   27%) 0.000
      CFQHighMed        5.73      (3.9%)        6.53      (8.4%)   14.0% (   1% -   27%) 0.000
      CFQHighLow       17.87      (3.4%)       20.37      (8.1%)   14.0% (   2% -   26%) 0.000
     CFQHighHigh        5.41      (4.2%)        6.53      (8.9%)   20.7% (   7% -   35%) 0.000

Run 3

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        3.18      (3.6%)        2.66      (5.3%)  -16.4% ( -24% -   -7%) 0.000
      CFQHighMed        5.75      (4.0%)        5.07      (5.5%)  -11.7% ( -20% -   -2%) 0.000
        PKLookup      136.87      (5.9%)      133.97      (9.4%)   -2.1% ( -16% -   14%) 0.393
      CFQHighLow       23.19      (2.9%)       38.74     (16.0%)   67.0% (  46% -   88%) 0.000

Results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig

Run 1

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup      102.30      (8.5%)      101.05     (15.1%)   -1.2% ( -22% -   24%) 0.753
      CFQHighMed        5.73      (6.7%)        6.61     (12.5%)   15.5% (  -3% -   37%) 0.000
     CFQHighHigh        5.37      (6.2%)        7.28     (12.7%)   35.6% (  15% -   58%) 0.000
      CFQHighLow       13.61      (6.8%)       27.02     (34.4%)   98.5% (  53% -  150%) 0.000

Run 2

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighMed        5.63      (4.0%)        4.87      (6.6%)  -13.5% ( -23% -   -2%) 0.000
        PKLookup       95.27      (9.3%)      102.84      (9.0%)    7.9% (  -9% -   28%) 0.006
     CFQHighHigh        5.45      (3.7%)        7.43     (10.6%)   36.3% (  21% -   52%) 0.000
      CFQHighLow       13.91      (4.6%)       29.42     (28.2%)  111.5% (  75% -  151%) 0.000

Run 3

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       74.67     (39.5%)       97.84     (23.3%)   31.0% ( -22% -  155%) 0.002
     CFQHighHigh        5.61      (4.5%)        7.62     (10.0%)   35.8% (  20% -   52%) 0.000
      CFQHighMed       10.44      (3.5%)       15.65     (23.8%)   50.0% (  21% -   80%) 0.000
      CFQHighLow       14.00      (4.2%)       25.99     (33.0%)   85.6% (  46% -  128%) 0.000

However, this optimization doesn't help much for the query CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528, as the dense skip list for at still causes frequent max score computation (although giving the field titleTokenized more weight does reduce the performance hit from -42.0% to -34.3%). I tried playing with different numLevels and getDocIdUpTo implementations, but so far I find it hard to balance the following two effects:

  1. lower docIdUpTo -> lower max score -> more frequent max score computation, but more effective pruning due to the lower max score
  2. higher docIdUpTo -> higher max score -> less frequent max score computation, but less effective pruning due to the higher max score
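To make the tradeoff concrete, here is a small self-contained toy (not Lucene code — `Block`, `maxScoreUpTo`, and the numbers are made up for illustration) showing how a larger docIdUpTo can only widen the max-score bound, while a smaller one gives a tighter bound that has to be recomputed sooner:

```java
import java.util.List;

public class DocIdUpToTradeoff {
    // A toy impact block: all docs up to maxDoc have score at most maxScore.
    record Block(int maxDoc, float maxScore) {}

    // Upper bound on the score of any doc <= docIdUpTo: the max over all
    // blocks that overlap the range [0, docIdUpTo].
    static float maxScoreUpTo(List<Block> blocks, int docIdUpTo) {
        float bound = 0f;
        for (Block b : blocks) {
            bound = Math.max(bound, b.maxScore());
            if (b.maxDoc() >= docIdUpTo) {
                break; // later blocks only cover docs past docIdUpTo
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        List<Block> blocks =
            List.of(new Block(99, 1.5f), new Block(199, 4.0f), new Block(299, 2.0f));
        // Smaller docIdUpTo: tighter bound (1.5), but valid for fewer docs,
        // so it must be recomputed more often.
        System.out.println(maxScoreUpTo(blocks, 50)); // 1.5
        // Larger docIdUpTo: valid for longer, but looser bound (4.0),
        // so fewer docs score below it and pruning is less effective.
        System.out.println(maxScoreUpTo(blocks, 250)); // 4.0
    }
}
```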

Contributor


Thanks for running these experiments, the performance does look much better now indeed! I would feel good about merging something that has these performance characteristics.

I assume by "highest weight" here, you meant the term that has the lower doc frequency, as opposed to field weight?

I meant field weight. Since we're using impacts from the field that has the highest weight and approximating impacts from other fields, my assumption was that it would help get better max score approximations.

That said, I expect that it would often have the same result as picking the field that has the lowest doc freq, since fields that have higher weight tend to have fewer terms per doc, so doc freqs are generally lower?

result = Math.max(result, impacts.getDocIdUpTo(level));

When it was Math.min, it hurt us because we would have to recompute score boundaries often. But with Math.max we are forcing all fields but one to return impacts on the next level. My gut feeling is that returning the getDocIdUpTo of the field that has the highest weight (or lowest doc freq as you suggested) would perform better.
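The Math.min/Math.max difference above can be sketched with a self-contained toy (the values are made up for illustration; this is not the actual CombinedFieldQuery code): each field reports the last doc id its current impact block covers, and the combined bound is valid up to either the nearest or the farthest of those edges.

```java
public class CombineDocIdUpTo {
    // Combine per-field getDocIdUpTo(level) values with min: the merged
    // bound expires at the NEAREST field's block edge, so score boundaries
    // get recomputed frequently.
    static int combineMin(int[] perFieldDocIdUpTo) {
        int r = Integer.MAX_VALUE;
        for (int d : perFieldDocIdUpTo) r = Math.min(r, d);
        return r;
    }

    // Combine with max: the merged bound stays valid up to the FARTHEST
    // edge, so recomputation is rare, but every other field must return
    // impacts on a coarser level to cover that far a doc id.
    static int combineMax(int[] perFieldDocIdUpTo) {
        int r = Integer.MIN_VALUE;
        for (int d : perFieldDocIdUpTo) r = Math.max(r, d);
        return r;
    }

    public static void main(String[] args) {
        int[] perField = {120, 4000, 950};
        System.out.println(combineMin(perField)); // 120
        System.out.println(combineMax(perField)); // 4000
    }
}
```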

Contributor Author


Thanks @jpountz for the confirmation and clarification! I gave these ideas a try in this commit db2446f, with the following two similar approaches:

  1. Use the impacts of a) highest weight field + b) lowest doc freq term within that field (commented out)
  2. Use the impacts of a) highest weight field + b) lowest getDocIdUpTo term within that field

Here are the results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighMed        5.46      (7.7%)        4.58      (7.1%)  -16.1% ( -28% -   -1%) 0.000
      CFQHighLow       21.83      (7.7%)       18.86      (7.2%)  -13.6% ( -26% -    1%) 0.000
     CFQHighHigh        3.44      (7.4%)        3.06      (7.4%)  -11.1% ( -24% -    4%) 0.000
        PKLookup      109.63     (11.8%)      109.67     (13.5%)    0.0% ( -22% -   28%) 0.993

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighLow       17.22     (10.2%)       11.73     (12.0%)  -31.9% ( -49% -  -10%) 0.000
      CFQHighMed       10.20     (10.5%)        8.36      (9.3%)  -18.1% ( -34% -    1%) 0.000
     CFQHighHigh        3.65      (6.6%)        3.11      (5.3%)  -14.8% ( -25% -   -3%) 0.000
        PKLookup       91.80     (26.0%)      100.13     (16.3%)    9.1% ( -26% -   69%) 0.187

Run 3:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighMed        5.62      (8.1%)        4.86      (8.0%)  -13.5% ( -27% -    2%) 0.000
     CFQHighHigh        4.13      (6.9%)        3.59      (5.8%)  -13.0% ( -24% -    0%) 0.000
      CFQHighLow       17.00      (8.2%)       15.06      (9.7%)  -11.4% ( -27% -    7%) 0.000
        PKLookup       99.61      (6.4%)      102.96      (6.7%)    3.4% (  -9% -   17%) 0.104

and their CPU profiling looks very similar to that of baseline, suggesting that pruning was not very effective there.

Interestingly, when I flipped the weights assigned to body and titleTokenized fields in the task (assigned 20.0 to body field, and 2.0 to titleTokenized field), I got the following result:

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       99.64      (9.1%)       94.80     (11.6%)   -4.9% ( -23% -   17%) 0.142
      CFQHighLow       17.31      (7.2%)       19.16     (15.0%)   10.7% ( -10% -   35%) 0.004
      CFQHighMed        3.41      (6.4%)        4.57     (13.6%)   34.2% (  13% -   57%) 0.000
     CFQHighHigh        3.28      (6.3%)        5.32     (16.8%)   62.2% (  36% -   91%) 0.000

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
     CFQHighHigh        4.10      (6.3%)        2.72      (5.8%)  -33.6% ( -42% -  -23%) 0.000
      CFQHighMed        5.69      (5.6%)        4.48      (8.2%)  -21.4% ( -33% -   -8%) 0.000
        PKLookup      104.53     (21.1%)      104.06     (16.8%)   -0.5% ( -31% -   47%) 0.940
      CFQHighLow       22.52      (7.4%)       35.31     (21.3%)   56.8% (  26% -   92%) 0.000

Run 3:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       80.39     (31.2%)       93.55     (19.4%)   16.4% ( -26% -   97%) 0.046
     CFQHighHigh        5.71      (6.6%)        6.96     (13.4%)   21.7% (   1% -   44%) 0.000
      CFQHighMed       10.21      (5.6%)       12.62     (22.0%)   23.6% (  -3% -   54%) 0.000
      CFQHighLow       16.89      (4.2%)       27.39     (22.9%)   62.1% (  33% -   93%) 0.000

I think what might have happened here is: since the highest weighted field / lowest doc freq term tends to have fewer terms per doc, by definition its matches are also sparser in the corpus than those of a lower-weight field (i.e. there are more docs where a specific term shows up in the body field than in the titleTokenized field). Hence, when returning getDocIdUpTo of the field that has the highest weight, a larger bound is returned (probably close to Math.max?), causing the max score to be much higher and less effective for pruning.

What do you think of the commit and results? Is there anything I may have missed?

Contributor Author


I re-ran the perf tests combinedFieldsUnevenlyWeightedBig after bug fix 3d0a215, and the results look similar:

With commit "test leadImpact from highest weighted field" (db2446f)

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
     CFQHighHigh        2.81     (10.9%)        2.40     (10.3%)  -14.6% ( -32% -    7%) 0.000
      CFQHighMed        3.27     (11.8%)        2.91     (11.0%)  -11.2% ( -30% -   13%) 0.002
      CFQHighLow       16.09     (14.6%)       14.58     (11.9%)   -9.4% ( -31% -   20%) 0.025
        PKLookup       92.08     (11.6%)       96.23     (13.3%)    4.5% ( -18% -   33%) 0.255

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
      CFQHighLow       15.76     (12.0%)       10.19     (12.9%)  -35.3% ( -53% -  -11%) 0.000
      CFQHighMed       12.01     (13.3%)        8.79     (13.9%)  -26.9% ( -47% -    0%) 0.000
     CFQHighHigh        5.59     (10.8%)        4.40      (8.4%)  -21.2% ( -36% -   -2%) 0.000
        PKLookup       93.36     (21.9%)       94.29     (22.6%)    1.0% ( -35% -   58%) 0.888

Without commit "test leadImpact from highest weighted field" (db2446f)

Run 1:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
      CFQHighLow       16.69      (7.9%)       15.02     (14.1%)  -10.0% ( -29% -   13%) 0.006
        PKLookup       94.38      (9.7%)      105.59     (13.3%)   11.9% ( -10% -   38%) 0.001
     CFQHighHigh        5.51      (7.0%)        7.59     (13.5%)   37.6% (  15% -   62%) 0.000
      CFQHighMed       10.24      (7.2%)       15.23     (20.0%)   48.8% (  20% -   81%) 0.000

Run 2:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       94.85     (11.6%)       92.29     (11.0%)   -2.7% ( -22% -   22%) 0.448
     CFQHighHigh        5.26     (13.7%)        6.25     (23.4%)   18.9% ( -16% -   64%) 0.002
      CFQHighMed        3.39     (16.1%)        4.55     (29.7%)   34.3% (  -9% -   95%) 0.000
      CFQHighLow       16.04     (14.9%)       29.22     (39.6%)   82.2% (  24% -  160%) 0.000

Run 3:

            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value 
        PKLookup       68.96     (40.4%)       67.26     (40.1%)   -2.5% ( -59% -  130%) 0.847
     CFQHighHigh        4.85     (16.7%)        5.62     (33.8%)   15.7% ( -29% -   79%) 0.062
      CFQHighMed       10.67     (20.1%)       12.47     (42.7%)   16.9% ( -38% -   99%) 0.110
      CFQHighLow       14.07     (22.6%)       23.18     (68.7%)   64.8% ( -21% -  201%) 0.000

I guess the scoring function has taken care of / bounded the erroneous norm used in the approximation earlier, as it was effectively Long.MIN_VALUE.

Contributor

@jpountz jpountz left a comment


Thanks for making progress on this and sorry for the slow review cycles. I just had a quick look, and it seems that some computations can be moved up-front, such as deciding which field we use as a lead, since field weights do not change during query execution?

I wonder if there are things we can do to make the code simpler, such as sharing the logic that combines multiple terms within the same field with SynonymQuery.

@@ -402,14 +423,30 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio
public Scorer scorer(LeafReaderContext context) throws IOException {
List<PostingsEnum> iterators = new ArrayList<>();
List<FieldAndWeight> fields = new ArrayList<>();
Map<String, List<ImpactsEnum>> fieldImpactsEnum = new HashMap<>(fieldAndWeights.size());
Map<String, List<Integer>> fieldTermDocFreq = new HashMap<>(fieldAndWeights.size());
Contributor


Do we actually need this list of doc freqs? They would be equal to ImpactsEnum#cost all the time?

Contributor Author


Oh, I was actually not aware that ImpactsEnum#cost returns the doc freq directly. Will remove it later.

maxWeight = weight;
maxWeightField = field;
}
}
Contributor


Since field weights do not change over time, could we compute the field that has the highest weight up-front instead of doing it every time getImpacts is called?
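A minimal sketch of that suggestion, with stand-in names and made-up weights rather than the actual CombinedFieldQuery code: resolve the highest-weight field once (e.g. when the Scorer is built) and reuse it for every subsequent impacts lookup.

```java
import java.util.Map;

public class UpFrontMaxWeightField {
    // Pick the field with the highest weight. Since weights are fixed for
    // the lifetime of the query, this only needs to run once up-front.
    static String maxWeightField(Map<String, Float> fieldWeights) {
        String best = null;
        float bestWeight = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> e : fieldWeights.entrySet()) {
            if (e.getValue() > bestWeight) {
                bestWeight = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Computed once; getImpacts-style calls would just read the result.
        String lead = maxWeightField(Map.of("titleTokenized", 4.0f, "body", 2.0f));
        System.out.println(lead); // titleTokenized
    }
}
```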

@zacharymorn
Contributor Author

Thanks for making progress on this and sorry for the slow review cycles.

No worries, and thanks for the review and feedback!

I just had a quick look and it seems that some computations can be move up-front such as deciding what field we use as a lead since field weights do not change during query execution?

Yup, I can move it out to avoid repeated computation there. Just a quick check though: from the perf results so far (#418 (comment) & #418 (comment)), it seems using the most weighted field to decide on the leading impacts may not yield the best result, as matches in the most weighted field may also give a high docId upTo bound. If we were to drop that approach and revert the commit db2446f that introduced it, then it won't compute the most weighted field at all. Are you trying to see if perf tests can be further improved if the most weighted field only gets computed once?

I wonder if there's things we can do to make the code simpler, such as sharing the logic to combine multiple terms together within the same field with SynonymQuery.

Yeah, I have the same thought there. I wasn't sure earlier since SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting common code out may require a new public specialized API / class (in the core module?). Let me give that a try and see how it looks!

@jpountz
Contributor

jpountz commented Dec 1, 2021

from the perf results so far, it seems using the most weighted field to decide on the leading impacts may not yield the best result
Are you trying to see if perf tests can be further improved if most weighted field only gets computed once?

Yes. I liked this approach because it felt like it should work relatively well since the field with the highest weight should drive scores anyway, and deciding about which clause leads impacts up-front felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double-check with a profiler that the reason why this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?

wasn't sure earlier as SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting common code out may potentially require a new public specialized API / class (in the core module ?). Let me give that a try and see how it looks!

Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.
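A possible shape for that shared logic, sketched here with plain stand-in types rather than Lucene's actual Impacts/ImpactsSource classes. The merge rule — summing freqs and taking the minimum norm as a pessimistic per-level bound — is an assumption for illustration, not necessarily what SynonymQuery implements:

```java
import java.util.List;

public class SynonymImpactsMergeSketch {
    // Stand-in for a per-level impact: a (freq, norm) score upper bound.
    record Impact(int freq, long norm) {}

    // Pessimistically merge one level's per-term impacts within a single
    // field: a doc matching every synonym could accumulate the sum of their
    // freqs, and the smallest norm yields the highest possible score, so
    // (sum of freqs, min norm) is a valid upper bound for the combined term.
    static Impact mergeLevel(List<Impact> perTermMax) {
        int freq = 0;
        long norm = Long.MAX_VALUE;
        for (Impact i : perTermMax) {
            freq += i.freq();
            norm = Math.min(norm, i.norm());
        }
        return new Impact(freq, norm);
    }

    public static void main(String[] args) {
        Impact merged = mergeLevel(List.of(new Impact(3, 10L), new Impact(5, 7L)));
        System.out.println(merged); // Impact[freq=8, norm=7]
    }
}
```

A shared @lucene.internal class along these lines could then back both SynonymQuery and CombinedFieldsQuery for the within-field merge step.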

… replace doc freq collection with ImpactsEnum#cost
            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
      CFQHighLow       21.45     (13.8%)       15.65     (18.0%)  -27.1% ( -51% -    5%) 0.000
      CFQHighMed        5.92      (4.2%)        4.52     (12.2%)  -23.7% ( -38% -   -7%) 0.000
     CFQHighHigh        5.52      (5.0%)        4.45     (12.1%)  -19.4% ( -34% -   -2%) 0.000
        PKLookup      110.45     (11.1%)      108.08     (15.7%)   -2.1% ( -26% -   27%) 0.617

= Candidate CPU JFR
PROFILE SUMMARY from 86268 events (total: 86268)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
15.48%        13355         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.37%         6359          org.apache.lucene.search.DisjunctionDISIApproximation#advance()
7.35%         6345          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
6.38%         5508          org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.18%         3605          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.10%         3538          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.01%         3456          org.apache.lucene.search.DisiPriorityQueue#topList()
3.67%         3169          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.36%         2899          org.apache.lucene.search.DisiPriorityQueue#top()
3.08%         2660          java.lang.Math#round()
2.40%         2067          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.38%         2057          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.94%         1675          org.apache.lucene.util.SmallFloat#longToInt4()
1.87%         1617          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.86%         1605          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.80%         1557          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.50%         1293          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.39%         1200          org.apache.lucene.search.ImpactsDISI#docID()
1.33%         1145          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.24%         1067          jdk.internal.misc.Unsafe#getByte()
1.18%         1019          org.apache.lucene.store.ByteBufferGuard#getByte()
0.92%         792           org.apache.lucene.util.SmallFloat#intToByte4()
0.76%         659           java.lang.Math#toIntExact()
0.68%         584           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.67%         579           org.apache.lucene.search.ImpactsDISI#nextDoc()
0.61%         530           org.apache.lucene.search.ImpactsDISI#advanceTarget()
0.58%         500           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.57%         492           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.40%         345           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.39%         333           org.apache.lucene.codecs.lucene90.ForUtil#expand8()

= Baseline CPU JFR
PROFILE SUMMARY from 76683 events (total: 76683)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
16.26%        12468         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.46%         6490          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.24%         6315          org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.13%         4698          org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.79%         3676          org.apache.lucene.search.DisiPriorityQueue#topList()
4.45%         3412          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.32%         3314          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.25%         2492          java.lang.Math#round()
3.01%         2309          org.apache.lucene.search.DisiPriorityQueue#top()
2.75%         2107          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.62%         2010          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.29%         1753          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.12%         1623          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.05%         1569          org.apache.lucene.util.SmallFloat#longToInt4()
1.93%         1481          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77%         1358          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.45%         1110          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.25%         958           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.19%         914           org.apache.lucene.store.ByteBufferGuard#getByte()
1.17%         898           java.lang.Math#toIntExact()
1.06%         815           org.apache.lucene.util.SmallFloat#intToByte4()
1.02%         782           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
0.86%         656           jdk.internal.misc.Unsafe#getByte()
0.68%         518           org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.64%         492           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.49%         375           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.42%         322           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.31%         239           java.util.zip.Inflater#inflateBytesBytes()
0.30%         228           org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#seekExact()
0.29%         224           org.apache.lucene.codecs.lucene90.PForUtil#expand32()
            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
     CFQHighHigh        3.68      (6.5%)        2.66      (7.3%)  -27.8% ( -39% -  -14%) 0.000
      CFQHighLow       14.19      (7.9%)       11.31      (8.0%)  -20.3% ( -33% -   -4%) 0.000
      CFQHighMed       12.92      (9.5%)       10.41      (9.9%)  -19.4% ( -35% -    0%) 0.000
        PKLookup      104.92      (6.7%)      104.49      (9.0%)   -0.4% ( -15% -   16%) 0.869

= Candidate CPU JFR
PROFILE SUMMARY from 87201 events (total: 87201)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
15.13%        13191         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42%         6466          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29%         6354          org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79%         5921          org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34%         3785          org.apache.lucene.search.DisiPriorityQueue#topList()
3.97%         3465          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85%         3360          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18%         2777          java.lang.Math#round()
3.01%         2624          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01%         2624          org.apache.lucene.search.DisiPriorityQueue#top()
2.33%         2036          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20%         1917          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07%         1809          org.apache.lucene.util.SmallFloat#longToInt4()
1.81%         1580          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78%         1555          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77%         1545          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51%         1316          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39%         1210          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29%         1124          org.apache.lucene.search.ImpactsDISI#docID()
1.22%         1063          jdk.internal.misc.Unsafe#getByte()
1.19%         1037          org.apache.lucene.store.ByteBufferGuard#getByte()
1.08%         938           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97%         849           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86%         750           org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82%         717           org.apache.lucene.util.SmallFloat#intToByte4()
0.78%         683           java.lang.Math#toIntExact()
0.62%         540           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61%         529           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51%         441           org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50%         435           org.apache.lucene.search.ImpactsDISI#advanceTarget()

= Baseline CPU JFR
PROFILE SUMMARY from 81578 events (total: 81578)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
16.38%        13363         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67%         7073          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05%         6569          org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50%         5305          org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17%         4220          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40%         3587          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30%         3505          org.apache.lucene.search.DisiPriorityQueue#topList()
3.37%         2751          java.lang.Math#round()
2.77%         2263          org.apache.lucene.search.DisiPriorityQueue#top()
2.64%         2156          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62%         2135          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12%         1729          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11%         1723          org.apache.lucene.util.SmallFloat#longToInt4()
2.06%         1678          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01%         1640          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77%         1442          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62%         1325          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26%         1028          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10%         899           java.lang.Math#toIntExact()
1.10%         898           org.apache.lucene.util.SmallFloat#intToByte4()
1.09%         888           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06%         864           org.apache.lucene.store.ByteBufferGuard#getByte()
0.80%         652           jdk.internal.misc.Unsafe#getByte()
0.74%         605           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68%         552           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60%         492           org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55%         451           java.io.RandomAccessFile#readBytes()
0.38%         314           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38%         307           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32%         260           org.apache.lucene.codecs.lucene90.PForUtil#expand32()
@zacharymorn
Contributor Author

Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.

Ah yes, using @lucene.internal makes sense! I have made a new class that extracts out the within-same-field impacts merging logic in 0a9bdcc, and used it in CombinedFieldsQuery in 808fec2. I've also moved up the repeated max weight field computation in 8a7ea99.

@zacharymorn
Contributor Author

Yes. I liked this approach because it felt like it should work relatively well since the field with the highest weight should drive scores anyway, and deciding about which clause leads impacts up-front felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double-check with a profiler that the reason why this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?

I see. The latest two commits 808fec2 & 75c5b04 used the max weight field to drive scoring, and overall they had around a -20% impact on tasks from combinedFieldsUnevenlyWeightedBig so far. A sample of CPU JFR results between candidate and baseline looks like the following:

Candidate CPU JFR:

PERCENT       CPU SAMPLES   STACK
15.13%        13191         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42%         6466          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29%         6354          org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79%         5921          org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34%         3785          org.apache.lucene.search.DisiPriorityQueue#topList()
3.97%         3465          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85%         3360          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18%         2777          java.lang.Math#round()
3.01%         2624          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01%         2624          org.apache.lucene.search.DisiPriorityQueue#top()
2.33%         2036          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20%         1917          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07%         1809          org.apache.lucene.util.SmallFloat#longToInt4()
1.81%         1580          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78%         1555          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77%         1545          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51%         1316          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39%         1210          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29%         1124          org.apache.lucene.search.ImpactsDISI#docID()
1.22%         1063          jdk.internal.misc.Unsafe#getByte()
1.19%         1037          org.apache.lucene.store.ByteBufferGuard#getByte()
1.08%         938           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97%         849           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86%         750           org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82%         717           org.apache.lucene.util.SmallFloat#intToByte4()
0.78%         683           java.lang.Math#toIntExact()
0.62%         540           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61%         529           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51%         441           org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50%         435           org.apache.lucene.search.ImpactsDISI#advanceTarget()

Baseline CPU JFR

PERCENT       CPU SAMPLES   STACK
16.38%        13363         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67%         7073          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05%         6569          org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50%         5305          org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17%         4220          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40%         3587          org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30%         3505          org.apache.lucene.search.DisiPriorityQueue#topList()
3.37%         2751          java.lang.Math#round()
2.77%         2263          org.apache.lucene.search.DisiPriorityQueue#top()
2.64%         2156          org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62%         2135          org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12%         1729          org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11%         1723          org.apache.lucene.util.SmallFloat#longToInt4()
2.06%         1678          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01%         1640          org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77%         1442          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62%         1325          org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26%         1028          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10%         899           java.lang.Math#toIntExact()
1.10%         898           org.apache.lucene.util.SmallFloat#intToByte4()
1.09%         888           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06%         864           org.apache.lucene.store.ByteBufferGuard#getByte()
0.80%         652           jdk.internal.misc.Unsafe#getByte()
0.74%         605           org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68%         552           org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60%         492           org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55%         451           java.io.RandomAccessFile#readBytes()
0.38%         314           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38%         307           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32%         260           org.apache.lucene.codecs.lucene90.PForUtil#expand32()

As the two profiles look very similar, and based on my previous debugging, this may suggest that the computed max score is high and most docs are not being skipped, so the candidate implementation ends up examining the same number of docs as the baseline.

/**
 * Merge impacts from multiple impactsEnum (term matches) within the same field. The high-level
 * logic is to combine freqs that have the same norm from impacts.
 */
public static List<Impact> mergeImpactsPerField(

MixedMutabilityReturnType: This method returns both mutable and immutable collections or maps from different paths. This may be confusing for users of the method. (details)
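As a rough illustration of the merge that the mergeImpactsPerField javadoc above describes (this is a hedged sketch, not the PR's code: Impact is modeled here as a bare freq/norm pair, and Lucene's actual merging also handles saturation and upper-bound semantics), freqs that share a norm can be summed:

```java
import java.util.*;

// Minimal stand-in for org.apache.lucene.index.Impact: a (freq, norm) pair.
record Impact(int freq, long norm) {}

public class MergeImpactsSketch {
  // Impacts coming from several terms of the same field are combined by
  // summing the freqs that share a norm, since those terms score against the
  // same field length.
  static List<Impact> mergeImpactsPerField(List<List<Impact>> perTermImpacts) {
    TreeMap<Long, Integer> freqByNorm = new TreeMap<>(); // keep norms sorted
    for (List<Impact> termImpacts : perTermImpacts) {
      for (Impact impact : termImpacts) {
        freqByNorm.merge(impact.norm(), impact.freq(), Integer::sum);
      }
    }
    List<Impact> merged = new ArrayList<>();
    freqByNorm.forEach((norm, freq) -> merged.add(new Impact(freq, norm)));
    return merged;
  }

  public static void main(String[] args) {
    List<Impact> termA = List.of(new Impact(3, 8L), new Impact(7, 16L));
    List<Impact> termB = List.of(new Impact(2, 8L), new Impact(5, 32L));
    // norm=8 appears in both terms, so its freqs are summed (3 + 2 = 5)
    System.out.println(mergeImpactsPerField(List.of(termA, termB)));
  }
}
```

Returning the merged list in ascending norm order mirrors the sorted ordering that impacts consumers expect.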

@zacharymorn

I just added two logging statements in ImpactsDISI to print out the upTo and max-score comparison and examine the doc-pruning results:

diff --git a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
index 843bb4ad57b..3930ae2f50d 100644
--- a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
+++ b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
@@ -110,8 +110,10 @@ public final class ImpactsDISI extends DocIdSetIterator {
       assert upTo >= target;
 
       if (maxScore >= minCompetitiveScore) {
+        System.out.println("Keeping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
         return target;
       }
+      System.out.println("Skipping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
 
       if (upTo == NO_MORE_DOCS) {
         return NO_MORE_DOCS;

and the following query was used:

CombinedFieldQuery query =
            new CombinedFieldQuery.Builder()
                    .addField("titleTokenized", (float) 20.0)
                    .addField("body", (float) 1.0)
                    .addTerm(new BytesRef("three"))
                    .addTerm(new BytesRef("geronimo"))
                    .build();

For the implementation that uses the most heavily weighted field and its lowest Impacts#getDocIdUpTo to drive scoring, upTo and maxScore were high, hence there was no pruning.

  1> Keeping - target: 152 upTo: 232464 minCompScore: 0.17629458 maxScore: 2.1560352
  1> Keeping - target: 159 upTo: 232464 minCompScore: 0.18669213 maxScore: 2.1560352
  1> Keeping - target: 160 upTo: 232464 minCompScore: 0.20146967 maxScore: 2.1560352
  1> Keeping - target: 161 upTo: 232464 minCompScore: 0.21165837 maxScore: 2.1560352
  1> Keeping - target: 167 upTo: 232464 minCompScore: 0.21696068 maxScore: 2.1560352
  1> Keeping - target: 170 upTo: 232464 minCompScore: 0.23223151 maxScore: 2.1560352
  1> Keeping - target: 171 upTo: 232464 minCompScore: 0.26476994 maxScore: 2.1560352
  1> Keeping - target: 172 upTo: 232464 minCompScore: 0.27027848 maxScore: 2.1560352
  1> Keeping - target: 174 upTo: 232464 minCompScore: 0.27216592 maxScore: 2.1560352
  1> Keeping - target: 175 upTo: 232464 minCompScore: 0.27311972 maxScore: 2.1560352
  1> Keeping - target: 176 upTo: 232464 minCompScore: 0.28871515 maxScore: 2.1560352
  1> Keeping - target: 179 upTo: 232464 minCompScore: 0.2955102 maxScore: 2.1560352
  1> Keeping - target: 185 upTo: 232464 minCompScore: 0.31796065 maxScore: 2.1560352
  1> Keeping - target: 187 upTo: 232464 minCompScore: 0.32868698 maxScore: 2.1560352
  1> Keeping - target: 189 upTo: 232464 minCompScore: 0.33091953 maxScore: 2.1560352
  1> Keeping - target: 190 upTo: 232464 minCompScore: 0.353465 maxScore: 2.1560352
  1> Keeping - target: 191 upTo: 232464 minCompScore: 0.36112097 maxScore: 2.1560352
  1> Keeping - target: 194 upTo: 232464 minCompScore: 0.3743663 maxScore: 2.1560352
  1> Keeping - target: 195 upTo: 232464 minCompScore: 0.38150516 maxScore: 2.1560352
  1> Keeping - target: 197 upTo: 232464 minCompScore: 0.39127985 maxScore: 2.1560352
  1> Keeping - target: 198 upTo: 232464 minCompScore: 0.39738885 maxScore: 2.1560352
  1> Keeping - target: 208 upTo: 232464 minCompScore: 0.39789525 maxScore: 2.1560352
  1> Keeping - target: 216 upTo: 232464 minCompScore: 0.41046718 maxScore: 2.1560352
  1> Keeping - target: 219 upTo: 232464 minCompScore: 0.4148362 maxScore: 2.1560352
  1> Keeping - target: 224 upTo: 232464 minCompScore: 0.4204301 maxScore: 2.1560352
  1> Keeping - target: 227 upTo: 232464 minCompScore: 0.43892494 maxScore: 2.1560352
  1> Keeping - target: 234 upTo: 232464 minCompScore: 0.4551008 maxScore: 2.1560352
  1> Keeping - target: 235 upTo: 232464 minCompScore: 0.47162262 maxScore: 2.1560352
  1> Keeping - target: 238 upTo: 232464 minCompScore: 0.47382453 maxScore: 2.1560352
  1> Keeping - target: 239 upTo: 232464 minCompScore: 0.48034182 maxScore: 2.1560352
  1> Keeping - target: 243 upTo: 232464 minCompScore: 0.48332027 maxScore: 2.1560352
  1> Keeping - target: 245 upTo: 232464 minCompScore: 0.4903498 maxScore: 2.1560352
  1> Keeping - target: 246 upTo: 232464 minCompScore: 0.5026699 maxScore: 2.1560352
...

For the implementation that uses the most heavily weighted field and the lowest cost / doc freq to drive scoring, upTo turned out to be NO_MORE_DOCS (2147483647) for that query, and there was no pruning either.

  1> Keeping - target: 151 upTo: 2147483647 minCompScore: 0.15862586 maxScore: 2.1560352
  1> Keeping - target: 152 upTo: 2147483647 minCompScore: 0.17629458 maxScore: 2.1560352
  1> Keeping - target: 159 upTo: 2147483647 minCompScore: 0.18669213 maxScore: 2.1560352
  1> Keeping - target: 160 upTo: 2147483647 minCompScore: 0.20146967 maxScore: 2.1560352
  1> Keeping - target: 161 upTo: 2147483647 minCompScore: 0.21165837 maxScore: 2.1560352
  1> Keeping - target: 167 upTo: 2147483647 minCompScore: 0.21696068 maxScore: 2.1560352
  1> Keeping - target: 170 upTo: 2147483647 minCompScore: 0.23223151 maxScore: 2.1560352
  1> Keeping - target: 171 upTo: 2147483647 minCompScore: 0.26476994 maxScore: 2.1560352
  1> Keeping - target: 172 upTo: 2147483647 minCompScore: 0.27027848 maxScore: 2.1560352
  1> Keeping - target: 174 upTo: 2147483647 minCompScore: 0.27216592 maxScore: 2.1560352
  1> Keeping - target: 175 upTo: 2147483647 minCompScore: 0.27311972 maxScore: 2.1560352
  1> Keeping - target: 176 upTo: 2147483647 minCompScore: 0.28871515 maxScore: 2.1560352
  1> Keeping - target: 179 upTo: 2147483647 minCompScore: 0.2955102 maxScore: 2.1560352
  1> Keeping - target: 185 upTo: 2147483647 minCompScore: 0.31796065 maxScore: 2.1560352
  1> Keeping - target: 187 upTo: 2147483647 minCompScore: 0.32868698 maxScore: 2.1560352
  1> Keeping - target: 189 upTo: 2147483647 minCompScore: 0.33091953 maxScore: 2.1560352
  1> Keeping - target: 190 upTo: 2147483647 minCompScore: 0.353465 maxScore: 2.1560352
  1> Keeping - target: 191 upTo: 2147483647 minCompScore: 0.36112097 maxScore: 2.1560352
  1> Keeping - target: 194 upTo: 2147483647 minCompScore: 0.3743663 maxScore: 2.1560352
  1> Keeping - target: 195 upTo: 2147483647 minCompScore: 0.38150516 maxScore: 2.1560352
  1> Keeping - target: 197 upTo: 2147483647 minCompScore: 0.39127985 maxScore: 2.1560352
  1> Keeping - target: 198 upTo: 2147483647 minCompScore: 0.39738885 maxScore: 2.1560352
  1> Keeping - target: 208 upTo: 2147483647 minCompScore: 0.39789525 maxScore: 2.1560352
  1> Keeping - target: 216 upTo: 2147483647 minCompScore: 0.41046718 maxScore: 2.1560352
  1> Keeping - target: 219 upTo: 2147483647 minCompScore: 0.4148362 maxScore: 2.1560352
  1> Keeping - target: 224 upTo: 2147483647 minCompScore: 0.4204301 maxScore: 2.1560352
  1> Keeping - target: 227 upTo: 2147483647 minCompScore: 0.43892494 maxScore: 2.1560352
  1> Keeping - target: 234 upTo: 2147483647 minCompScore: 0.4551008 maxScore: 2.1560352
  1> Keeping - target: 235 upTo: 2147483647 minCompScore: 0.47162262 maxScore: 2.1560352
  1> Keeping - target: 238 upTo: 2147483647 minCompScore: 0.47382453 maxScore: 2.1560352
  1> Keeping - target: 239 upTo: 2147483647 minCompScore: 0.48034182 maxScore: 2.1560352
  1> Keeping - target: 243 upTo: 2147483647 minCompScore: 0.48332027 maxScore: 2.1560352
  1> Keeping - target: 245 upTo: 2147483647 minCompScore: 0.4903498 maxScore: 2.1560352
  1> Keeping - target: 246 upTo: 2147483647 minCompScore: 0.5026699 maxScore: 2.1560352
  1> Keeping - target: 247 upTo: 2147483647 minCompScore: 0.50923806 maxScore: 2.1560352
  1> Keeping - target: 249 upTo: 2147483647 minCompScore: 0.50958425 maxScore: 2.1560352
  1> Keeping - target: 259 upTo: 2147483647 minCompScore: 0.51977855 maxScore: 2.1560352
  1> Keeping - target: 265 upTo: 2147483647 minCompScore: 0.5250303 maxScore: 2.1560352
  1> Keeping - target: 266 upTo: 2147483647 minCompScore: 0.53061455 maxScore: 2.1560352
  1> Keeping - target: 267 upTo: 2147483647 minCompScore: 0.53076476 maxScore: 2.1560352
  1> Keeping - target: 269 upTo: 2147483647 minCompScore: 0.53151745 maxScore: 2.1560352
  1> Keeping - target: 275 upTo: 2147483647 minCompScore: 0.56218606 maxScore: 2.1560352
  1> Keeping - target: 279 upTo: 2147483647 minCompScore: 0.57041436 maxScore: 2.1560352
  1> Keeping - target: 280 upTo: 2147483647 minCompScore: 0.58023125 maxScore: 2.1560352
  1> Keeping - target: 296 upTo: 2147483647 minCompScore: 0.592781 maxScore: 2.1560352
  1> Keeping - target: 298 upTo: 2147483647 minCompScore: 0.5975618 maxScore: 2.1560352
  1> Keeping - target: 314 upTo: 2147483647 minCompScore: 0.5996211 maxScore: 2.1560352
  1> Keeping - target: 318 upTo: 2147483647 minCompScore: 0.6027052 maxScore: 2.1560352
  1> Keeping - target: 319 upTo: 2147483647 minCompScore: 0.6080974 maxScore: 2.1560352
  1> Keeping - target: 329 upTo: 2147483647 minCompScore: 0.6116286 maxScore: 2.1560352
  1> Keeping - target: 335 upTo: 2147483647 minCompScore: 0.6127903 maxScore: 2.1560352
  1> Keeping - target: 336 upTo: 2147483647 minCompScore: 0.6186264 maxScore: 2.1560352
  1> Keeping - target: 339 upTo: 2147483647 minCompScore: 0.62189597 maxScore: 2.1560352
  1> Keeping - target: 342 upTo: 2147483647 minCompScore: 0.63768846 maxScore: 2.1560352
  1> Keeping - target: 346 upTo: 2147483647 minCompScore: 0.63877517 maxScore: 2.1560352
  1> Keeping - target: 352 upTo: 2147483647 minCompScore: 0.66235477 maxScore: 2.1560352
  1> Keeping - target: 354 upTo: 2147483647 minCompScore: 0.6642242 maxScore: 2.1560352
  1> Keeping - target: 355 upTo: 2147483647 minCompScore: 0.6710866 maxScore: 2.1560352
  1> Keeping - target: 358 upTo: 2147483647 minCompScore: 0.6718084 maxScore: 2.1560352
  1> Keeping - target: 362 upTo: 2147483647 minCompScore: 0.67271274 maxScore: 2.1560352
  1> Keeping - target: 365 upTo: 2147483647 minCompScore: 0.6831766 maxScore: 2.1560352
  1> Keeping - target: 367 upTo: 2147483647 minCompScore: 0.683592 maxScore: 2.1560352
  1> Keeping - target: 371 upTo: 2147483647 minCompScore: 0.6896402 maxScore: 2.1560352
...

For the implementation that uses the lowest Impacts#getDocIdUpTo across all fields, we get a lower upTo and max score in general, and hence some docs do get pruned.

  1> Keeping - target: 152 upTo: 195 minCompScore: 0.17629458 maxScore: 1.3767262
  1> Keeping - target: 159 upTo: 195 minCompScore: 0.18669213 maxScore: 1.3767262
...
  1> Keeping - target: 219 upTo: 436 minCompScore: 0.4148362 maxScore: 1.7731527
  1> Keeping - target: 224 upTo: 436 minCompScore: 0.4204301 maxScore: 1.7731527
...
  1> Keeping - target: 764 upTo: 902 minCompScore: 0.9138265 maxScore: 1.65851
  1> Keeping - target: 767 upTo: 902 minCompScore: 0.91438395 maxScore: 1.65851
  1> Keeping - target: 779 upTo: 902 minCompScore: 0.91606015 maxScore: 1.65851
...
  1> Keeping - target: 1278 upTo: 1384 minCompScore: 1.0459963 maxScore: 1.5856887
  1> Keeping - target: 1280 upTo: 1384 minCompScore: 1.0520688 maxScore: 1.5856887
...
  1> Keeping - target: 7144 upTo: 7227 minCompScore: 1.428069 maxScore: 1.7997875
  1> Keeping - target: 7228 upTo: 7453 minCompScore: 1.4284092 maxScore: 1.4454873
  1> Skipping - target: 7454 upTo: 7702 minCompScore: 1.4284092 maxScore: 1.3662372
  1> Keeping - target: 7703 upTo: 7921 minCompScore: 1.4284092 maxScore: 1.6103871
  1> Keeping - target: 7922 upTo: 8131 minCompScore: 1.4284092 maxScore: 1.5153265
...
  1> Keeping - target: 17039 upTo: 17327 minCompScore: 1.5600302 maxScore: 1.6949782
  1> Keeping - target: 17328 upTo: 17873 minCompScore: 1.5600302 maxScore: 1.8440268
  1> Keeping - target: 17874 upTo: 18438 minCompScore: 1.5600302 maxScore: 1.8541759
  1> Skipping - target: 18439 upTo: 18774 minCompScore: 1.5600302 maxScore: 1.5445734
  1> Keeping - target: 18775 upTo: 19012 minCompScore: 1.5600302 maxScore: 2.1527436
  1> Keeping - target: 19001 upTo: 19012 minCompScore: 1.5605642 maxScore: 2.1527436
...
  1> Keeping - target: 19408 upTo: 19541 minCompScore: 1.5636915 maxScore: 1.927903
  1> Skipping - target: 19542 upTo: 19834 minCompScore: 1.5636915 maxScore: 1.5145609
  1> Keeping - target: 19835 upTo: 20077 minCompScore: 1.5636915 maxScore: 1.699799
  1> Keeping - target: 20066 upTo: 20077 minCompScore: 1.5694537 maxScore: 1.699799
...
  1> Keeping - target: 27159 upTo: 27396 minCompScore: 1.6439995 maxScore: 2.1519704
  1> Keeping - target: 27182 upTo: 27396 minCompScore: 1.6449014 maxScore: 2.1519704
  1> Keeping - target: 27397 upTo: 27627 minCompScore: 1.6449014 maxScore: 2.1505308
  1> Keeping - target: 27500 upTo: 27627 minCompScore: 1.6467081 maxScore: 2.1505308
  1> Skipping - target: 27628 upTo: 27859 minCompScore: 1.6467081 maxScore: 1.5844319
  1> Keeping - target: 27860 upTo: 28118 minCompScore: 1.6467081 maxScore: 1.7705853
  1> Keeping - target: 27976 upTo: 28118 minCompScore: 1.6487088 maxScore: 1.7705853
  1> Skipping - target: 28119 upTo: 28370 minCompScore: 1.6487088 maxScore: 1.576628
  1> Keeping - target: 28371 upTo: 28619 minCompScore: 1.6487088 maxScore: 2.0417855
  1> Keeping - target: 28620 upTo: 28954 minCompScore: 1.6487088 maxScore: 2.1356478
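The tighter upTo values in this last run fall out of choosing the block boundary as the minimum getDocIdUpTo across fields rather than following a single field. A minimal sketch of that choice (the boundary numbers below are illustrative, borrowed loosely from the logs above):

```java
public class SharedBoundarySketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // A combined-field block boundary is only safe if it is valid for every
  // field, so take the minimum getDocIdUpTo across the fields at a level.
  // A field with fewer skip levels contributes its deepest boundary.
  static int sharedDocIdUpTo(int level, int[][] docIdUpToPerField) {
    int upTo = NO_MORE_DOCS;
    for (int[] fieldLevels : docIdUpToPerField) {
      upTo = Math.min(upTo, fieldLevels[Math.min(level, fieldLevels.length - 1)]);
    }
    return upTo;
  }

  public static void main(String[] args) {
    // Illustrative boundaries: a field with small blocks (e.g. titleTokenized)
    // and a field with one large block (e.g. body).
    int[] title = {195, 7227};
    int[] body = {232464};
    System.out.println(sharedDocIdUpTo(0, new int[][] {title, body})); // prints 195
  }
}
```

A smaller shared boundary means maxScore is computed over a narrower doc range, which is what lets it dip below minCompetitiveScore and trigger the "Skipping" lines above.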

@jpountz

jpountz commented Dec 3, 2021

These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)

Regarding the new utility class, I was hoping that it could be a higher-level abstraction, eg. a SynonymImpactsSource, rather than low-level utility methods?

@zacharymorn

These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)

No problem! It's actually quite enjoyable for me to explore different approaches and compare their performance characteristics. I may also miss certain aspects, so this discussion helps verify my understanding as well! I've re-implemented the other approach in cbbd28b.

Regarding the new utility class, I was hoping that it could be a higher-level abstraction, eg. a SynonymImpactsSource, rather than low-level utility methods?

Yeah I did give that a try earlier, but I ended up with the utility methods approach as I saw some potential issue. I implemented it again in 502d44e.

If my understanding is correct, the intention of using a higher-level abstraction there was to reduce code duplication by looping over Map<Field, SynonymImpactsSource> (or even Map<Field, SynonymImpacts>) in the numLevels, getDocIdUpTo and (in particular) getImpacts implementations in CombinedFieldsQuery. However, I noticed that getImpacts doesn't seem to be easily decomposable / delegable to the getImpacts of SynonymImpactsSource or SynonymImpacts. That method takes in a level and internally looks up a field-specific docIdUpTo to get a list of impacts within that boundary, whereas for CombinedFieldsQuery we actually need a docIdUpTo that is valid across all fields, so it can't be abstracted away in the loop (502d44e#diff-62e151fca32e7560ef90ee8eab1adaffac3073f23701aaf0c17a02e250727ed3R547-R550). Given this restriction, I feel the utility methods may work better at removing code duplication.

@zacharymorn

Hi @jpountz , just want to circle back to this discussion and see if you have any further suggestions on the above analysis?

@zacharymorn

Hi @jpountz , just want to check again to see if I have addressed your concern with the utility approach?

To better illustrate the potential issue I'm considering with the higher-level abstraction approach, I've copied over some key code snippets here:

If we were to extract out SynonymImpactsSource (from SynonymQuery), it may look something like this:

public class SynonymImpactsSource implements ImpactsSource {

  private final ImpactsEnum[] impactsEnums;
  private final Impacts[] impacts;
  private final float[] boosts;
  private Impacts lead;

  public SynonymImpactsSource(ImpactsEnum[] impactsEnums, float[] boosts) {
    ...
  }

  @Override
  public Impacts getImpacts() throws IOException {
    // Use the impacts that have the lower next boundary as a lead.
    // It will decide on the number of levels and the block boundaries.
    if (lead == null) {
      Impacts tmpLead = null;
      for (int i = 0; i < impactsEnums.length; ++i) {
        impacts[i] = impactsEnums[i].getImpacts();
        if (tmpLead == null || impacts[i].getDocIdUpTo(0) < tmpLead.getDocIdUpTo(0)) {
          tmpLead = impacts[i];
        }
      }
      lead = tmpLead;
    }
    return new Impacts() {

      @Override
      public int numLevels() {
        // Delegate to the lead
        return lead.numLevels();
      }

      @Override
      public int getDocIdUpTo(int level) {
        // Delegate to the lead
        return lead.getDocIdUpTo(level);
      }

      @Override
      public List<Impact> getImpacts(int level) {
        final int docIdUpTo = getDocIdUpTo(level); 
        return ImpactsMergingUtils.mergeImpactsPerField(
            impactsEnums, impacts, boosts, docIdUpTo, false);
      }
    };
  }

  @Override
  public void advanceShallow(int target) throws IOException {
    for (ImpactsEnum impactsEnum : impactsEnums) {
      if (impactsEnum.docID() < target) {
        impactsEnum.advanceShallow(target);
      }
    }
  }

  public Impacts[] impacts() {
    return impacts;
  }
}

However, the above SynonymImpactsSource#getImpacts(level) may not be directly usable for CombinedFieldsQuery, as illustrated below

Impacts CombinedFieldsQuery$ImpactsSource#getImpacts() {
  for ((field, synonymImpactsSource) : Map<Field, SynonymImpactsSource>) {
    // Potentially problematic: getImpacts only takes in level info, and it
    // internally looks up a field-specific docIdUpTo to get valid impacts
    // (see definition above). However, for CombinedFieldQuery, docIdUpTo may
    // differ among fields at the same level.
    combine synonymImpactsSource.getImpacts(level);
  }
  return combined Impacts;
}

Hence we may actually need to change some APIs to make SynonymImpactsSource work for the CombinedFieldQuery use case?
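To make the mismatch concrete, here is a hedged sketch of the utility-method direction (FieldImpacts, impactsUpTo, and the boundary numbers are hypothetical simplified stand-ins, not the PR's actual types): the per-field merge accepts an explicit docIdUpTo, which a per-field SynonymImpactsSource#getImpacts(level) cannot be handed.

```java
import java.util.*;

record Impact(int freq, long norm) {}

// Hypothetical per-field impacts source: skip-level boundaries plus the
// already-merged impacts for each level. Real code would wrap ImpactsEnum.
record FieldImpacts(int[] docIdUpToPerLevel, List<List<Impact>> impactsPerLevel) {
  // Per-field lookup keyed by an explicit docIdUpTo rather than a level index:
  // pick the shallowest level whose boundary covers the shared boundary.
  List<Impact> impactsUpTo(int docIdUpTo) {
    for (int level = 0; level < docIdUpToPerLevel.length; level++) {
      if (docIdUpToPerLevel[level] >= docIdUpTo) {
        return impactsPerLevel.get(level);
      }
    }
    // No level covers the boundary: fall back to a worst-case impact.
    return List.of(new Impact(Integer.MAX_VALUE, 1L));
  }
}

public class CombinedImpactsSketch {
  // The combined view resolves the level to a concrete docIdUpTo once (from
  // the lead field) and hands that same boundary to every field, instead of
  // letting each field reinterpret the level index on its own.
  static List<Impact> getCombinedImpacts(int level, FieldImpacts lead, List<FieldImpacts> fields) {
    int docIdUpTo = lead.docIdUpToPerLevel()[level];
    List<Impact> combined = new ArrayList<>();
    for (FieldImpacts field : fields) {
      combined.addAll(field.impactsUpTo(docIdUpTo));
    }
    return combined;
  }

  public static void main(String[] args) {
    FieldImpacts title = new FieldImpacts(new int[] {195, 436},
        List.of(List.of(new Impact(2, 8L)), List.of(new Impact(4, 8L))));
    FieldImpacts body = new FieldImpacts(new int[] {300, 900},
        List.of(List.of(new Impact(5, 64L)), List.of(new Impact(9, 64L))));
    // The lead is the field with the lower first boundary (title, upTo=195);
    // body answers with its level-0 impacts since its first block (upTo=300)
    // already covers docIdUpTo=195.
    System.out.println(getCombinedImpacts(0, title, List.of(title, body)));
  }
}
```

The key design point is that impactsUpTo takes the shared boundary directly, which is exactly the parameter that the level-based getImpacts API hides inside each field.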


github-actions bot commented Jan 9, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
