Reduce frequencies buffer size when they are not needed #12954

easyice · 2023-12-18T09:19:12Z

Like #12832, maybe we can consider not using a fixed 128-size buffer for frequencies.

jpountz · 2023-12-19T10:15:35Z

I wonder if there is a performance impact, since this is moving a condition from something that runs once per block of 128 docs to something that map run on every doc.

easyice · 2023-12-19T14:26:58Z

Ohhh.. you said makes sense, i will check it. Thank you Adrien!

easyice · 2023-12-20T16:07:41Z

I took several hours to confirm the results, the benchmark shows it became faster, this exceeded my expectation, we think the speedup is due to remove the loop that initializes the freqBuffer to 1 in reset() like below:

      if (indexHasFreq == false || needsFreq == false) {
        for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
          freqBuffer[i] = 1;
        }
      }

Since if we always allocate the 128-size freqBuffer for this PR, the benchmark shows it still has a speedup. therefore, performance improvement has no relevance to reducing memory allocation. so maybe we can consider an other approach: try to avoid the for-loop in reset() if the instance can be reused. thanks for the suggestions from @gf2121 when i investigating the cause of the performance speedup.

Benchmark output for the PR(using wikimediumall):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                HighSloppyPhrase        0.38      (5.3%)        0.37      (4.6%)   -1.6% ( -10% -    8%) 0.323
                         MedTerm      226.03      (4.8%)      223.08      (5.2%)   -1.3% ( -10% -    9%) 0.409
            HighIntervalsOrdered        2.19      (6.4%)        2.17      (6.5%)   -0.9% ( -12% -   12%) 0.676
                 MedSloppyPhrase       18.52      (2.7%)       18.39      (2.5%)   -0.7% (  -5% -    4%) 0.402
                          Fuzzy2       36.10      (1.8%)       35.86      (1.6%)   -0.7% (  -3% -    2%) 0.219
                          Fuzzy1       43.40      (1.6%)       43.16      (1.7%)   -0.6% (  -3% -    2%) 0.276
                         Respell       21.69      (1.8%)       21.58      (1.8%)   -0.5% (  -4% -    3%) 0.375
                         LowTerm      232.03      (3.0%)      231.08      (2.9%)   -0.4% (  -6% -    5%) 0.659
                 LowSloppyPhrase       18.26      (2.0%)       18.20      (2.1%)   -0.3% (  -4% -    3%) 0.660
                        HighTerm      267.11      (5.2%)      266.50      (5.6%)   -0.2% ( -10% -   11%) 0.893
                    HighSpanNear        1.85      (5.7%)        1.84      (6.7%)   -0.2% ( -11% -   12%) 0.935
                    OrHighNotLow      167.52      (5.7%)      167.26      (5.6%)   -0.2% ( -10% -   11%) 0.931
            HighTermTitleBDVSort        1.90      (3.7%)        1.90      (4.5%)   -0.1% (  -7% -    8%) 0.915
             MedIntervalsOrdered        7.07      (3.4%)        7.06      (3.8%)   -0.1% (  -7% -    7%) 0.910
                     MedSpanNear       24.97      (2.1%)       24.94      (2.7%)   -0.1% (  -4% -    4%) 0.874
                      HighPhrase       10.67      (6.0%)       10.66      (5.7%)   -0.1% ( -11% -   12%) 0.950
                       LowPhrase        4.70      (4.0%)        4.70      (3.8%)   -0.0% (  -7% -    8%) 0.979
                    OrHighNotMed      130.98      (6.2%)      131.01      (6.1%)    0.0% ( -11% -   13%) 0.989
                   OrNotHighHigh      171.61      (5.4%)      171.67      (5.3%)    0.0% ( -10% -   11%) 0.984
             LowIntervalsOrdered       28.65      (4.3%)       28.68      (4.3%)    0.1% (  -8% -    9%) 0.947
                      OrHighHigh       18.94      (2.9%)       19.00      (3.6%)    0.3% (  -5% -    6%) 0.766
                   OrHighNotHigh      125.97      (5.5%)      126.41      (6.0%)    0.3% ( -10% -   12%) 0.848
                    OrNotHighMed      181.48      (4.0%)      182.38      (3.6%)    0.5% (  -6% -    8%) 0.679
                     LowSpanNear        6.89      (2.5%)        6.93      (3.2%)    0.6% (  -5% -    6%) 0.516
                       MedPhrase      110.79      (2.8%)      111.45      (2.9%)    0.6% (  -5% -    6%) 0.515
                       OrHighMed       38.51      (2.4%)       38.79      (2.1%)    0.7% (  -3% -    5%) 0.311
                      AndHighMed       40.73      (2.4%)       41.06      (2.5%)    0.8% (  -4% -    5%) 0.304
                      TermDTSort       74.72      (4.1%)       75.32      (2.6%)    0.8% (  -5% -    7%) 0.460
                     AndHighHigh       10.24      (5.6%)       10.33      (3.9%)    0.8% (  -8% -   11%) 0.600
               HighTermMonthSort     1071.18      (2.9%)     1079.84      (4.5%)    0.8% (  -6% -    8%) 0.499
                      AndHighLow      167.91      (5.2%)      170.10      (5.6%)    1.3% (  -9% -   12%) 0.446
                          IntNRQ       13.84      (4.0%)       14.05      (3.3%)    1.5% (  -5% -    9%) 0.208
                    OrNotHighLow      241.06      (4.5%)      244.91      (4.8%)    1.6% (  -7% -   11%) 0.276
                       OrHighLow      175.82      (4.0%)      178.70      (3.5%)    1.6% (  -5% -    9%) 0.174
           HighTermDayOfYearSort      188.33      (3.3%)      191.56      (3.7%)    1.7% (  -5% -    8%) 0.121
                         Prefix3      391.46      (4.0%)      404.16      (3.5%)    3.2% (  -4% -   11%) 0.006
               HighTermTitleSort       87.08      (3.9%)       91.84      (4.1%)    5.5% (  -2% -   14%) 0.000
                        Wildcard       33.16      (3.3%)       37.56      (3.3%)   13.3% (   6% -   20%) 0.000
                        PKLookup       92.19      (1.8%)      104.90      (2.8%)   13.8% (   9% -   18%) 0.000

jpountz · 2023-12-20T16:18:14Z

so maybe we can consider an other approach: try to avoid the for-loop in reset() if the instance can be reused

+1 this sounds like a good idea!

easyice · 2023-12-21T15:07:30Z

Here is the benchmark for new approach (avoid for-loop in reset()), the PKLookup task still has a speedup, but the speedup for Wildcard task is disappeared, i checked the memory allocate flamegraph for Wildcard task, the BlockDocsEnum.<init> call account for about 8%, so the 128-size long array allocate has performance impact on this task. this is different from PKLookup. i'm sorry i only checked the memory allocate for PKLookup yesterday.

so:

PKLookup is speedup by avoid for-loop in reset()
Wildcard is speedup by reduce frequencies buffer allocation

The Benchmark delta about new approach:

baseline: main
my_modified_version: new approach (avoid for-loop in reset())

Benchmark result

                           TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      OrHighHigh       15.94      (4.3%)       15.56      (5.1%)   -2.4% ( -11% -    7%) 0.112
                       OrHighMed       49.58      (5.0%)       48.77      (4.7%)   -1.6% ( -10% -    8%) 0.293
                   OrNotHighHigh      146.95      (3.5%)      144.64      (4.8%)   -1.6% (  -9% -    6%) 0.237
                 LowSloppyPhrase       15.35      (4.1%)       15.13      (3.6%)   -1.4% (  -8% -    6%) 0.234
                HighSloppyPhrase        5.61      (2.8%)        5.53      (3.6%)   -1.4% (  -7% -    5%) 0.176
                      AndHighMed       21.66      (3.9%)       21.43      (4.1%)   -1.1% (  -8% -    7%) 0.390
                       MedPhrase        9.56      (5.0%)        9.47      (5.0%)   -1.0% ( -10% -    9%) 0.512
                     AndHighHigh       15.95      (3.6%)       15.80      (3.7%)   -1.0% (  -8% -    6%) 0.409
                 MedSloppyPhrase        2.03      (2.7%)        2.02      (3.4%)   -0.9% (  -6% -    5%) 0.344
                       OrHighLow      179.58      (3.1%)      177.95      (3.7%)   -0.9% (  -7% -    6%) 0.399
                         LowTerm      177.24      (6.0%)      176.00      (5.2%)   -0.7% ( -11% -   11%) 0.695
           HighTermDayOfYearSort      128.70      (3.0%)      127.85      (3.0%)   -0.7% (  -6% -    5%) 0.482
                    OrHighNotLow      204.26      (3.4%)      203.10      (3.5%)   -0.6% (  -7% -    6%) 0.601
                         MedTerm      235.64      (3.5%)      234.40      (4.0%)   -0.5% (  -7% -    7%) 0.662
                    OrNotHighMed      123.29      (4.0%)      122.67      (3.9%)   -0.5% (  -8% -    7%) 0.689
                        HighTerm      286.35      (4.1%)      284.95      (4.3%)   -0.5% (  -8% -    8%) 0.713
                      TermDTSort       60.32     (10.4%)       60.06     (10.8%)   -0.4% ( -19% -   23%) 0.895
                    OrHighNotMed      135.57      (4.2%)      135.07      (5.3%)   -0.4% (  -9% -    9%) 0.806
                   OrHighNotHigh      118.48      (3.6%)      118.06      (5.5%)   -0.4% (  -9% -    9%) 0.808
             LowIntervalsOrdered       37.27      (4.7%)       37.15      (4.9%)   -0.3% (  -9% -    9%) 0.834
             MedIntervalsOrdered        2.24      (4.5%)        2.24      (5.0%)   -0.1% (  -9% -    9%) 0.928
                       LowPhrase       20.47      (3.3%)       20.44      (3.3%)   -0.1% (  -6% -    6%) 0.916
                          Fuzzy2       30.51      (2.2%)       30.50      (1.3%)   -0.0% (  -3% -    3%) 0.948
            HighIntervalsOrdered        2.55      (5.1%)        2.55      (5.4%)    0.1% (  -9% -   11%) 0.975
                          IntNRQ       24.26      (3.5%)       24.27      (3.7%)    0.1% (  -6% -    7%) 0.962
                      AndHighLow      270.76      (3.7%)      271.29      (5.0%)    0.2% (  -8% -    9%) 0.889
                      HighPhrase       32.64      (4.5%)       32.71      (4.1%)    0.2% (  -8% -    9%) 0.878
               HighTermTitleSort       72.27      (5.1%)       72.44      (4.1%)    0.2% (  -8% -    9%) 0.871
                     MedSpanNear        6.40      (4.3%)        6.42      (3.7%)    0.4% (  -7% -    8%) 0.782
            HighTermTitleBDVSort        2.31      (3.4%)        2.32      (3.5%)    0.4% (  -6% -    7%) 0.716
                    OrNotHighLow      235.32      (3.0%)      236.80      (4.7%)    0.6% (  -6% -    8%) 0.613
                         Respell       25.22      (2.7%)       25.40      (2.2%)    0.7% (  -4% -    5%) 0.353
                          Fuzzy1       12.56      (2.0%)       12.65      (2.0%)    0.7% (  -3% -    4%) 0.251
                    HighSpanNear        4.94      (4.5%)        4.98      (4.1%)    0.8% (  -7% -    9%) 0.566
                     LowSpanNear        1.10      (6.4%)        1.11      (5.0%)    0.8% (  -9% -   13%) 0.646
                        Wildcard       50.34      (2.5%)       51.06      (2.5%)    1.4% (  -3% -    6%) 0.071
               HighTermMonthSort      924.52      (3.1%)      938.49      (4.1%)    1.5% (  -5% -    8%) 0.188
                         Prefix3       44.92      (2.9%)       48.21      (3.0%)    7.3% (   1% -   13%) 0.000
                        PKLookup       86.57      (2.6%)       98.05      (2.3%)   13.3% (   8% -   18%) 0.000

I also ran a diff between two approaches:

baseline: new approach (avoid for-loop in reset())
my_modified_version: PR

Benchmark result

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             MedIntervalsOrdered        2.03      (6.6%)        2.00      (5.9%)   -1.4% ( -13% -   11%) 0.473
                     LowSpanNear        3.44      (3.6%)        3.39      (4.1%)   -1.4% (  -8% -    6%) 0.259
            HighIntervalsOrdered        2.40      (6.2%)        2.37      (4.3%)   -1.3% ( -11% -    9%) 0.432
                        PKLookup      103.25      (3.9%)      101.88      (2.5%)   -1.3% (  -7% -    5%) 0.205
                    OrNotHighMed      164.60      (3.3%)      163.19      (3.5%)   -0.9% (  -7% -    6%) 0.421
                     AndHighHigh       19.66      (5.3%)       19.50      (5.4%)   -0.8% ( -10% -   10%) 0.635
                     MedSpanNear       21.13      (3.6%)       20.98      (3.7%)   -0.7% (  -7% -    6%) 0.541
                      HighPhrase        2.61      (4.8%)        2.59      (7.7%)   -0.7% ( -12% -   12%) 0.728
                       OrHighMed       43.75      (2.4%)       43.55      (2.4%)   -0.5% (  -5% -    4%) 0.537
                HighSloppyPhrase        3.66      (3.7%)        3.65      (5.6%)   -0.4% (  -9% -    9%) 0.770
                          Fuzzy1       33.55      (2.5%)       33.40      (2.5%)   -0.4% (  -5% -    4%) 0.583
                       LowPhrase       14.64      (4.7%)       14.58      (6.7%)   -0.4% ( -11% -   11%) 0.823
                          Fuzzy2       21.99      (2.8%)       21.91      (2.8%)   -0.4% (  -5% -    5%) 0.658
                    HighSpanNear        4.27      (4.6%)        4.25      (4.1%)   -0.3% (  -8% -    8%) 0.812
                    OrHighNotLow      139.60      (4.5%)      139.17      (5.3%)   -0.3% (  -9% -    9%) 0.845
                        HighTerm      175.84      (6.3%)      175.37      (5.4%)   -0.3% ( -11% -   12%) 0.886
                         MedTerm      239.54      (4.3%)      239.38      (3.5%)   -0.1% (  -7% -    8%) 0.959
                       MedPhrase       14.05      (4.8%)       14.04      (5.2%)   -0.0% (  -9% -   10%) 0.989
                         LowTerm      311.40      (3.8%)      312.41      (3.3%)    0.3% (  -6% -    7%) 0.771
             LowIntervalsOrdered       14.48      (3.4%)       14.55      (3.0%)    0.4% (  -5% -    7%) 0.674
                    OrNotHighLow      242.84      (2.5%)      244.11      (2.6%)    0.5% (  -4% -    5%) 0.517
                         Respell       37.79      (2.1%)       38.01      (2.3%)    0.6% (  -3% -    5%) 0.410
                   OrHighNotHigh      150.48      (4.2%)      151.44      (4.8%)    0.6% (  -8% -   10%) 0.656
                 LowSloppyPhrase        2.58      (3.8%)        2.60      (6.2%)    0.6% (  -9% -   11%) 0.690
                      OrHighHigh       18.16      (6.5%)       18.29      (4.1%)    0.7% (  -9% -   12%) 0.688
                    OrHighNotMed      180.47      (5.1%)      181.73      (5.2%)    0.7% (  -9% -   11%) 0.665
                       OrHighLow      248.00      (2.4%)      249.97      (2.7%)    0.8% (  -4% -    6%) 0.327
                      AndHighMed       53.28      (5.2%)       53.78      (3.2%)    0.9% (  -7% -    9%) 0.496
                   OrNotHighHigh      153.56      (4.7%)      155.04      (5.2%)    1.0% (  -8% -   11%) 0.540
                      AndHighLow      835.08      (2.9%)      843.49      (2.7%)    1.0% (  -4% -    6%) 0.258
                      TermDTSort       61.75      (6.7%)       62.50      (6.6%)    1.2% ( -11% -   15%) 0.563
                 MedSloppyPhrase        7.40      (4.1%)        7.50      (4.3%)    1.4% (  -6% -   10%) 0.311
                         Prefix3       67.01      (8.3%)       67.98      (8.4%)    1.4% ( -14% -   19%) 0.584
            HighTermTitleBDVSort        2.72      (5.2%)        2.77      (5.3%)    1.7% (  -8% -   12%) 0.303
                          IntNRQ       13.99      (7.0%)       14.32      (6.4%)    2.4% ( -10% -   16%) 0.258
           HighTermDayOfYearSort      119.72      (6.7%)      123.20      (7.2%)    2.9% ( -10% -   17%) 0.185
                        Wildcard        8.39      (4.7%)        8.67      (6.4%)    3.3% (  -7% -   15%) 0.061
               HighTermMonthSort     1104.51      (3.5%)     1153.19      (2.5%)    4.4% (  -1% -   10%) 0.000
               HighTermTitleSort       64.12      (5.8%)       69.63      (5.4%)    8.6% (  -2% -   21%) 0.000

code for new approach is here

the memory allocate flamegraph for wildcard, prefix3 tasks in baseline:
mem_flamegraph.zip

github-actions · 2024-01-08T12:22:06Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

easyice · 2024-01-14T08:33:45Z

Close this in favor of #12997

init

a3f23f5

easyice mentioned this pull request Jan 7, 2024

Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false #12997

Merged

github-actions bot added the Stale label Jan 8, 2024

easyice closed this Jan 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce frequencies buffer size when they are not needed #12954

Reduce frequencies buffer size when they are not needed #12954

easyice commented Dec 18, 2023

jpountz commented Dec 19, 2023

easyice commented Dec 19, 2023 •

edited

easyice commented Dec 20, 2023 •

edited

jpountz commented Dec 20, 2023

easyice commented Dec 21, 2023

github-actions bot commented Jan 8, 2024

easyice commented Jan 14, 2024

Reduce frequencies buffer size when they are not needed #12954

Reduce frequencies buffer size when they are not needed #12954

Conversation

easyice commented Dec 18, 2023

jpountz commented Dec 19, 2023

easyice commented Dec 19, 2023 • edited

easyice commented Dec 20, 2023 • edited

jpountz commented Dec 20, 2023

easyice commented Dec 21, 2023

github-actions bot commented Jan 8, 2024

easyice commented Jan 14, 2024

easyice commented Dec 19, 2023 •

edited

easyice commented Dec 20, 2023 •

edited