Specialize ordinal encoding for SortedSetDocValues #1010

Merged
merged 3 commits into from
Jul 13, 2022

gsmiller (Contributor) commented Jul 7, 2022

Description (or a Jira issue link if you have one)

This follows up on the work done in LUCENE-10067 by adding further specialization for SORTED_SET doc values.

gsmiller (Contributor, Author) commented Jul 7, 2022

Benchmarks look good on SSDV faceting (and no regressions elsewhere). I think some new bench tasks have recently been added as well that might be relevant here, so I'll update my luceneutil and run again soon. For now, here are results on wikimediumall:

                           Task QPS baseline     StdDev QPS candidate      StdDev                Pct diff p-value
                         Prefix3       58.57      (8.6%)       57.65      (9.9%)   -1.6% ( -18% -   18%) 0.591
               HighTermMonthSort       47.72     (28.0%)       47.26     (15.8%)   -1.0% ( -34% -   59%) 0.893
     BrowseRandomLabelSSDVFacets        2.60      (6.8%)        2.58      (5.6%)   -0.7% ( -12% -   12%) 0.729
                    HighSpanNear       17.35      (2.6%)       17.23      (4.2%)   -0.7% (  -7% -    6%) 0.544
                   OrHighNotHigh      762.36      (4.2%)      757.77      (4.8%)   -0.6% (  -9% -    8%) 0.673
                        Wildcard       27.05      (5.4%)       26.94      (5.9%)   -0.4% ( -11% -   11%) 0.820
                       LowPhrase       35.88      (2.7%)       35.80      (2.7%)   -0.2% (  -5% -    5%) 0.788
                   OrNotHighHigh      645.25      (3.1%)      644.30      (3.3%)   -0.1% (  -6% -    6%) 0.884
                         LowTerm     1793.47      (3.6%)     1792.45      (3.8%)   -0.1% (  -7% -    7%) 0.961
                    OrNotHighMed      653.99      (3.1%)      653.73      (3.2%)   -0.0% (  -6% -    6%) 0.968
                      AndHighMed       68.77      (5.3%)       68.75      (6.5%)   -0.0% ( -11% -   12%) 0.986
             LowIntervalsOrdered       51.08      (4.6%)       51.08      (4.5%)    0.0% (  -8% -    9%) 1.000
                       MedPhrase       70.46      (3.1%)       70.46      (3.3%)    0.0% (  -6% -    6%) 0.995
                    OrHighNotLow     1055.73      (3.3%)     1055.91      (4.4%)    0.0% (  -7% -    7%) 0.989
            HighIntervalsOrdered        8.03      (4.5%)        8.03      (4.6%)    0.0% (  -8% -    9%) 0.984
                     MedSpanNear       11.88      (2.4%)       11.89      (3.1%)    0.1% (  -5% -    5%) 0.926
            MedTermDayTaxoFacets       18.17      (3.6%)       18.20      (3.8%)    0.2% (  -7% -    7%) 0.891
                    OrHighNotMed      780.58      (3.3%)      781.92      (3.8%)    0.2% (  -6% -    7%) 0.877
          OrHighMedDayTaxoFacets        4.78      (4.4%)        4.79      (5.0%)    0.2% (  -8% -    9%) 0.906
        AndHighHighDayTaxoFacets        6.91      (2.3%)        6.92      (2.9%)    0.2% (  -4% -    5%) 0.828
             MedIntervalsOrdered        4.36      (3.5%)        4.37      (3.7%)    0.2% (  -6% -    7%) 0.851
                      OrHighHigh       14.24      (2.8%)       14.27      (6.4%)    0.3% (  -8% -    9%) 0.872
                          IntNRQ       33.94      (1.1%)       34.05      (1.3%)    0.3% (  -2% -    2%) 0.381
                          Fuzzy2       71.29      (1.7%)       71.55      (1.8%)    0.4% (  -3% -    3%) 0.509
                     LowSpanNear        8.79      (2.7%)        8.83      (3.2%)    0.4% (  -5% -    6%) 0.673
                          Fuzzy1       76.55      (1.7%)       76.90      (1.6%)    0.5% (  -2% -    3%) 0.377
                      AndHighLow     1077.25      (4.1%)     1082.31      (3.8%)    0.5% (  -7% -    8%) 0.706
       BrowseDayOfYearSSDVFacets        3.45      (5.8%)        3.47      (4.9%)    0.6% (  -9% -   11%) 0.722
                 LowSloppyPhrase       16.00      (1.9%)       16.10      (2.9%)    0.6% (  -4% -    5%) 0.437
                       OrHighMed       47.78      (2.0%)       48.08      (3.9%)    0.6% (  -5% -    6%) 0.527
                        HighTerm     1147.74      (4.7%)     1155.05      (4.2%)    0.6% (  -7% -    9%) 0.650
                        PKLookup      147.38      (3.6%)      148.34      (3.1%)    0.7% (  -5% -    7%) 0.537
                     AndHighHigh       19.16      (5.1%)       19.29      (6.8%)    0.7% ( -10% -   13%) 0.730
                         Respell       51.81      (1.5%)       52.15      (1.4%)    0.7% (  -2% -    3%) 0.147
                         MedTerm     1406.90      (4.4%)     1417.58      (4.2%)    0.8% (  -7% -    9%) 0.578
                 MedSloppyPhrase       28.98      (2.0%)       29.20      (2.8%)    0.8% (  -3% -    5%) 0.306
         AndHighMedDayTaxoFacets       23.12      (2.0%)       23.31      (2.3%)    0.8% (  -3% -    5%) 0.232
                      TermDTSort       78.14     (20.6%)       78.83     (20.1%)    0.9% ( -32% -   52%) 0.891
                      HighPhrase      180.25      (2.8%)      182.18      (2.6%)    1.1% (  -4% -    6%) 0.215
                       OrHighLow      317.72      (2.0%)      321.31      (3.4%)    1.1% (  -4% -    6%) 0.199
                HighSloppyPhrase        7.56      (3.9%)        7.65      (5.6%)    1.2% (  -7% -   11%) 0.429
                    OrNotHighLow      703.88      (3.0%)      713.94      (3.2%)    1.4% (  -4% -    7%) 0.142
           BrowseMonthSSDVFacets        3.92      (2.6%)        4.01      (8.1%)    2.2% (  -8% -   13%) 0.246
           HighTermDayOfYearSort       27.70     (14.3%)       28.46     (18.1%)    2.8% ( -25% -   41%) 0.591
            HighTermTitleBDVSort       59.66     (15.4%)       61.56     (15.9%)    3.2% ( -24% -   40%) 0.520
     BrowseRandomLabelTaxoFacets        5.83     (18.6%)        6.18     (13.8%)    6.1% ( -22% -   47%) 0.241
       BrowseDayOfYearTaxoFacets        6.35     (20.9%)        6.75     (17.0%)    6.3% ( -26% -   55%) 0.293
           BrowseMonthTaxoFacets        8.06     (27.7%)        8.58     (21.4%)    6.4% ( -33% -   76%) 0.413
            BrowseDateTaxoFacets        6.33     (20.8%)        6.73     (16.8%)    6.4% ( -25% -   55%) 0.280
            BrowseDateSSDVFacets        0.85     (12.7%)        0.95     (11.0%)   11.5% ( -10% -   40%) 0.002
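
For reading the table, the "Pct diff" column can be approximated from the two QPS columns as the relative change of candidate over baseline. The snippet below is only a sketch of that arithmetic, not luceneutil's own code: the class and method names are made up for illustration, and it works from the rounded QPS values printed in the rows, so its results can differ slightly from the reported figures (which presumably come from unrounded measurements; that is an assumption).

```java
import java.util.Locale;

public class PctDiff {
    /** Relative change of candidate QPS over baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Rounded QPS values copied from the BrowseDateSSDVFacets row.
        double baseline = 0.85;
        double candidate = 0.95;
        System.out.println(
            String.format(Locale.ROOT, "BrowseDateSSDVFacets: %.1f%%", pctDiff(baseline, candidate)));
    }
}
```

With the rounded inputs this gives about 11.8% for BrowseDateSSDVFacets, close to the 11.5% reported in the table.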

gsmiller merged commit 7c35311 into apache:main Jul 13, 2022
gsmiller deleted the explore/specialize-ssdv-ordinals branch July 13, 2022 01:55

jpountz (Contributor) commented Jul 19, 2022

This yielded a big speedup on the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench/BrowseDateSSDVFacets.html

jpountz (Contributor) commented Jul 19, 2022

I pushed an annotation; it should be live after the next nightly run.

gsmiller (Contributor, Author) commented Jul 19, 2022 via email
