LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll #605

Merged
gsmiller merged 1 commit into apache:main from gsmiller:LUCENE-10379-count-direct-to-values
Jan 13, 2022

@gsmiller
Contributor

This change just pulls out part of the optimization done (and subsequently reverted) in LUCENE-10350. The idea is to push only part of the optimization to see if we can isolate the performance regression observed in LUCENE-10350. See LUCENE-10374 for more.
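
For readers who have not followed the linked issues, here is a minimal sketch of the idea named in the title: when counts are known to live in a dense array, the count-all loop can increment that array directly instead of calling a general-purpose increment method that checks the storage mode for every ordinal. The class `DenseCountSketch`, its methods, and the dense/sparse split shown here are hypothetical illustrations and are not the code in this PR or Lucene's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a facet counter; NOT Lucene's actual class. */
final class DenseCountSketch {
  private final int[] values;                 // dense counts, indexed by ordinal
  private final Map<Integer, Integer> sparse; // sparse counts, used only when values == null

  DenseCountSketch(int ordinalCount, boolean dense) {
    values = dense ? new int[ordinalCount] : null;
    sparse = dense ? null : new HashMap<>();
  }

  /** General-purpose increment: checks the storage mode on every call. */
  void increment(int ord) {
    if (values != null) {
      values[ord]++;
    } else {
      sparse.merge(ord, 1, Integer::sum);
    }
  }

  /** Count-all that goes through the general-purpose increment. */
  void countAllViaIncrement(int[] docOrdinals) {
    for (int ord : docOrdinals) {
      increment(ord);
    }
  }

  /** Count-all that writes straight into the dense array (dense mode only). */
  void countAllDirect(int[] docOrdinals) {
    if (values == null) {
      throw new IllegalStateException("direct counting requires dense storage");
    }
    final int[] counts = values; // local reference; no per-ordinal mode check
    for (int ord : docOrdinals) {
      counts[ord]++;
    }
  }

  /** Returns a copy of the dense counts (dense mode only). */
  int[] denseCounts() {
    return values.clone();
  }
}
```

Both paths produce the same counts in dense mode; the direct path only removes the per-ordinal check and method call from the hot loop.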

…omyFacetCounts#countAll

Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
@gsmiller
Contributor Author

I'm seeing the nice performance improvement locally that @gf2121 and I both observed with LUCENE-10350, so I'm going to go ahead and merge this, which is just a portion of LUCENE-10350 (and hopefully not the portion responsible for the nightly bench regressions).

                            Task  QPS baseline      StdDev  QPS candidate      StdDev                Pct diff  p-value
       BrowseDayOfYearSSDVFacets       12.73     (15.5%)       12.30     (14.7%)   -3.4% ( -29% -   31%) 0.483
                        Wildcard      103.85      (9.1%)      102.09      (9.3%)   -1.7% ( -18% -   18%) 0.559
     BrowseRandomLabelSSDVFacets        9.73      (7.0%)        9.64      (7.2%)   -0.9% ( -14% -   14%) 0.690
                      OrHighHigh       45.50      (3.8%)       45.11      (3.7%)   -0.9% (  -8% -    6%) 0.466
                       OrHighMed      129.29      (4.3%)      128.54      (4.4%)   -0.6% (  -8% -    8%) 0.672
                    HighSpanNear        1.79      (5.4%)        1.79      (6.1%)   -0.4% ( -11% -   11%) 0.814
             MedIntervalsOrdered       26.95      (5.7%)       26.85      (6.0%)   -0.4% ( -11% -   12%) 0.844
                         LowTerm     2140.59      (3.8%)     2134.06      (3.3%)   -0.3% (  -7% -    7%) 0.786
                 LowSloppyPhrase       12.56      (2.9%)       12.53      (2.4%)   -0.2% (  -5% -    5%) 0.779
            HighIntervalsOrdered        7.50      (5.9%)        7.49      (6.3%)   -0.2% ( -11% -   12%) 0.923
                         Prefix3      239.09      (9.5%)      238.69      (8.9%)   -0.2% ( -16% -   20%) 0.954
                HighSloppyPhrase       27.88      (4.8%)       27.85      (4.1%)   -0.1% (  -8% -    9%) 0.952
             LowIntervalsOrdered       84.79      (4.1%)       84.75      (4.5%)   -0.0% (  -8% -    8%) 0.976
                      TermDTSort       99.64     (16.5%)       99.63     (17.0%)   -0.0% ( -28% -   40%) 0.999
                 MedSloppyPhrase       54.26      (4.8%)       54.30      (3.8%)    0.1% (  -8% -    9%) 0.964
                     AndHighHigh       55.59      (3.5%)       55.66      (4.4%)    0.1% (  -7% -    8%) 0.920
                      AndHighMed      116.21      (3.6%)      116.37      (4.5%)    0.1% (  -7% -    8%) 0.918
                     LowSpanNear       17.01      (3.2%)       17.03      (3.7%)    0.1% (  -6% -    7%) 0.893
                          Fuzzy2       48.58      (1.5%)       48.65      (1.5%)    0.2% (  -2% -    3%) 0.735
                       OrHighLow      263.27      (3.7%)      263.87      (2.9%)    0.2% (  -6% -    7%) 0.829
                          Fuzzy1      109.75      (1.4%)      110.05      (1.6%)    0.3% (  -2% -    3%) 0.566
                       LowPhrase      693.11      (2.6%)      695.08      (2.2%)    0.3% (  -4% -    5%) 0.712
                     MedSpanNear       78.47      (3.4%)       78.78      (4.0%)    0.4% (  -6% -    8%) 0.737
                         Respell       52.36      (1.0%)       52.57      (1.2%)    0.4% (  -1% -    2%) 0.251
                      HighPhrase       44.86      (3.9%)       45.05      (2.8%)    0.4% (  -6% -    7%) 0.693
            MedTermDayTaxoFacets       42.75      (5.5%)       42.93      (4.6%)    0.4% (  -9% -   11%) 0.786
                       MedPhrase      206.29      (2.6%)      207.19      (2.1%)    0.4% (  -4% -    5%) 0.559
          OrHighMedDayTaxoFacets       11.62      (3.6%)       11.67      (4.2%)    0.5% (  -7% -    8%) 0.709
                        PKLookup      169.21      (3.1%)      170.19      (3.1%)    0.6% (  -5% -    7%) 0.558
                          IntNRQ      854.72      (1.8%)      860.51      (1.8%)    0.7% (  -2% -    4%) 0.230
                      AndHighLow     1108.90      (2.8%)     1117.66      (2.9%)    0.8% (  -4% -    6%) 0.387
                         MedTerm     2707.67      (3.8%)     2730.24      (3.5%)    0.8% (  -6% -    8%) 0.472
                   OrNotHighHigh      850.92      (3.5%)      858.09      (3.7%)    0.8% (  -6% -    8%) 0.461
                    OrHighNotLow     1155.56      (4.2%)     1166.24      (3.7%)    0.9% (  -6% -    9%) 0.458
                        HighTerm     1308.41      (4.0%)     1321.26      (3.5%)    1.0% (  -6% -    8%) 0.412
                   OrHighNotHigh     1396.86      (3.2%)     1411.02      (2.0%)    1.0% (  -4% -    6%) 0.228
        AndHighHighDayTaxoFacets        4.67      (4.6%)        4.71      (4.5%)    1.0% (  -7% -   10%) 0.471
                    OrHighNotMed      822.86      (4.1%)      831.64      (3.9%)    1.1% (  -6% -    9%) 0.403
         AndHighMedDayTaxoFacets       86.23      (1.8%)       87.26      (1.8%)    1.2% (  -2% -    4%) 0.038
            HighTermTitleBDVSort       73.64     (18.2%)       74.80     (17.0%)    1.6% ( -28% -   44%) 0.777
                    OrNotHighMed      895.15      (3.5%)      909.34      (3.4%)    1.6% (  -5% -    8%) 0.145
                    OrNotHighLow     1178.85      (1.7%)     1200.20      (2.6%)    1.8% (  -2% -    6%) 0.010
           HighTermDayOfYearSort       82.17     (17.9%)       83.72     (22.0%)    1.9% ( -32% -   50%) 0.766
               HighTermMonthSort       78.32     (18.6%)       79.84     (17.7%)    1.9% ( -28% -   46%) 0.735
           BrowseMonthSSDVFacets       13.03      (8.5%)       13.85     (22.3%)    6.3% ( -22% -   40%) 0.235
       BrowseDayOfYearTaxoFacets       15.09      (6.1%)       20.45     (32.3%)   35.5% (  -2% -   78%) 0.000
     BrowseRandomLabelTaxoFacets       12.82      (5.5%)       17.41     (28.5%)   35.8% (   1% -   73%) 0.000
            BrowseDateTaxoFacets       15.04      (6.1%)       20.50     (32.5%)   36.3% (  -2% -   79%) 0.000
           BrowseMonthTaxoFacets       15.74      (8.1%)       25.59     (48.0%)   62.6% (   6% -  129%) 0.000

@gsmiller gsmiller merged commit 2f5e3c3 into apache:main Jan 13, 2022
@gsmiller gsmiller deleted the LUCENE-10379-count-direct-to-values branch January 13, 2022 17:17
@gf2121
Contributor

gf2121 commented Jan 13, 2022

This is great! Thanks, @gsmiller!

@gf2121
Contributor

gf2121 commented Jan 14, 2022

I'm seeing BrowseMonthTaxoFacets improve by 30% with no regressions in last night's benchmark. Thanks, @gsmiller, for such a great idea!

@gsmiller
Contributor Author

@gf2121 glad it worked out. Now I just wish we could understand what exactly was causing the regression with the other part of the change.
