
LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops #606

Closed
gsmiller wants to merge 5 commits into apache:main from gsmiller:LUCENE-10380-move-livedoc-check

Conversation

@gsmiller
Contributor

This change attempts to bring in the other piece of the LUCENE-10350 change without the regressions. See LUCENE-10374 for more details.
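The core idea named in the PR title, checking once whether `liveDocs` is null and then running a loop specialized for each case instead of re-testing it per document, can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual `FastTaxonomyFacetCounts` code: per-document ordinals are an `int[]` (-1 meaning no value), and a `java.util.function.IntPredicate` stands in for Lucene's `Bits` (null meaning the segment has no deletions). The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/**
 * Hypothetical, simplified model of a countAll-style loop (not Lucene code).
 * ordPerDoc[doc] is the doc's ordinal or -1 if it has no value; liveDocs is
 * null when the segment has no deletions, standing in for Lucene's Bits.
 */
class HoistedLiveDocsCheck {

  /** Before: the liveDocs null check is re-evaluated for every document. */
  static int[] countAllInnerCheck(int[] ordPerDoc, IntPredicate liveDocs, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc = 0; doc < ordPerDoc.length; doc++) {
      int ord = ordPerDoc[doc];
      if (ord >= 0 && (liveDocs == null || liveDocs.test(doc))) {
        counts[ord]++;
      }
    }
    return counts;
  }

  /** After: branch on liveDocs once, then run a loop specialized for each case. */
  static int[] countAllHoisted(int[] ordPerDoc, IntPredicate liveDocs, int numOrds) {
    int[] counts = new int[numOrds];
    if (liveDocs == null) {
      // No deletions: count every doc that has a value.
      for (int doc = 0; doc < ordPerDoc.length; doc++) {
        int ord = ordPerDoc[doc];
        if (ord >= 0) {
          counts[ord]++;
        }
      }
    } else {
      // Deletions present: skip docs that liveDocs marks as deleted.
      for (int doc = 0; doc < ordPerDoc.length; doc++) {
        int ord = ordPerDoc[doc];
        if (ord >= 0 && liveDocs.test(doc)) {
          counts[ord]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] ords = {0, 2, -1, 2, 1, 0};
    IntPredicate live = doc -> doc != 3; // pretend doc 3 was deleted
    System.out.println(Arrays.toString(countAllHoisted(ords, null, 3))); // [2, 1, 2]
    System.out.println(Arrays.toString(countAllHoisted(ords, live, 3))); // [2, 1, 1]
  }
}
```

Both variants return the same counts; the hoisted one only avoids re-evaluating `liveDocs == null` on each iteration, at the cost of duplicating the loop. The PR author measured no local impact from this change.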

values[(int) singleValued.longValue()]++;
}
}
} else {
Member

@rmuir, Jan 14, 2022


I'm also suspicious of making count() and countAll() bigger and bigger with all these specializations.

I would recommend trying to factor out these little "accumulator" loops into separate methods. They could then be shared across count() and countAll(). At least when I looked at this stuff for Solr's DocValuesFacets, that was needed to get good performance across the various specializations there (admittedly this was a while ago; maybe the compiler is smarter now):

You can see what I mean if you start here in this file and scroll down:

https://github.com/apache/solr/blob/0f3893b8e08c7aaa81addda926303f7a0c6ee18c/solr/core/src/java/org/apache/solr/request/DocValuesFacets.java#L262
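A minimal sketch of the suggested structure, assuming small `static` accumulator methods (the form rmuir recommends later in the thread) that both `count()` and `countAll()` delegate to. Everything here is hypothetical and simplified: doc ids and ordinals are plain arrays rather than Lucene iterators and doc values, and deleted docs are ignored. The PR author reported large local QPS regressions when trying this refactoring, so this shows the shape of the idea, not a tuned implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: shared static "accumulator" loops used by both
 * count() and countAll(). Not Lucene code; arrays stand in for iterators
 * and doc values, and deleted docs are ignored for brevity.
 */
class StaticAccumulators {

  /** Single-valued case: ordPerDoc[doc] is the doc's ordinal, or -1 if it has none. */
  static void accumSingle(int[] counts, int[] docs, int[] ordPerDoc) {
    for (int doc : docs) {
      int ord = ordPerDoc[doc];
      if (ord >= 0) {
        counts[ord]++;
      }
    }
  }

  /** Multi-valued case: every ordinal of every matching doc is counted. */
  static void accumMulti(int[] counts, int[] docs, int[][] ordsPerDoc) {
    for (int doc : docs) {
      for (int ord : ordsPerDoc[doc]) {
        counts[ord]++;
      }
    }
  }

  /** count(): only the collected hits. Pass exactly one of singleOrds / multiOrds. */
  static int[] count(int[] hits, int[] singleOrds, int[][] multiOrds, int numOrds) {
    int[] counts = new int[numOrds];
    if (singleOrds != null) {
      accumSingle(counts, hits, singleOrds);
    } else {
      accumMulti(counts, hits, multiOrds);
    }
    return counts;
  }

  /** countAll(): every doc in [0, maxDoc), reusing the same accumulators. */
  static int[] countAll(int maxDoc, int[] singleOrds, int[][] multiOrds, int numOrds) {
    // Materializing all doc ids keeps the sketch short; real code would iterate directly.
    int[] allDocs = new int[maxDoc];
    for (int doc = 0; doc < maxDoc; doc++) {
      allDocs[doc] = doc;
    }
    return count(allDocs, singleOrds, multiOrds, numOrds);
  }

  public static void main(String[] args) {
    int[] single = {0, 1, 1, -1};
    System.out.println(Arrays.toString(count(new int[] {1, 2}, single, null, 2))); // [0, 2]
    System.out.println(Arrays.toString(countAll(4, single, null, 2)));             // [1, 2]
  }
}
```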

Contributor Author


That's interesting. It's tricky without being able to reproduce that nightly benchmark regression locally, but I'll give it a shot. This change as I have it appears to have no performance impact at all locally, and since it just adds code complexity, it would be silly to move forward with it except as an academic exercise to try to figure out why the nightly benchmarks are regressing. That's interesting and may be worthwhile, but I'll experiment with your idea more before moving forward. Thanks!

Contributor Author


Hmm, breaking out separate methods sent QPS tanking in my local benchmarks. Any thoughts, @rmuir? Maybe I missed the mark on what you were suggesting (entirely possible). Here's the change: d084f85

                            Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets       27.87     (23.7%)       11.73      (1.3%)  -57.9% ( -67% -  -43%) 0.000
            BrowseDateTaxoFacets       21.90     (20.9%)       11.77      (7.9%)  -46.2% ( -62% -  -21%) 0.000
       BrowseDayOfYearTaxoFacets       21.88     (21.1%)       11.83      (8.1%)  -45.9% ( -62% -  -21%) 0.000
     BrowseRandomLabelTaxoFacets       18.22     (17.8%)        9.96      (6.8%)  -45.3% ( -59% -  -25%) 0.000
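For reading these tables: the "Pct diff" column appears to be the relative change in mean QPS from baseline to candidate. That reading is an assumption, but it reproduces the rows above to within the rounding of the displayed QPS values (for example, 21.90 to 11.77 recomputes to -46.3% versus the reported -46.2%). The helper below is hypothetical and only illustrates the arithmetic.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes "Pct diff" as the relative change in mean QPS. */
class PctDiff {
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // BrowseMonthTaxoFacets row: baseline 27.87 QPS, candidate 11.73 QPS
    System.out.printf(Locale.ROOT, "%.1f%%%n", pctDiff(27.87, 11.73)); // prints -57.9%
  }
}
```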

Member


I would make these simple static methods.

Member


See the Solr example again and make them just like the methods there. Instance methods are probably no good in facets: there are many abstractions, and they probably just drive the compiler more crazy.

Member


Do we still have the issue of inconsistent loop types (while loops vs. for loops)? Maybe now that the accumulators are shared, it becomes more of a problem?

Member


Also, is there really a reason anymore to have count vs. countAll? They look the same to me. The only difference is the liveDocs check, which has been shown to do nothing? So if we remove the liveDocs specialization and the count-vs-countAll specialization, it should start to be a bit more manageable?

Contributor Author


> also, is there really a reason anymore to have count vs countAll? They look the same to me. The only difference is livedocs check which is shown to do nothing? So if we remove livedocs specialization, and remove count-vs-countAll specialization, it should start to be a bit more manageable?

The only option I can think of here is to put the liveDocs check behind a DISI (DocIdSetIterator) abstraction. The implementation could then be consolidated to operate on a single DISI, backed either by the collected hits or by a doc values field with liveDocs validation. The nuance is that the "standard" count functionality doesn't need to check for deleted docs, since it assumes everything in the FacetsCollector is live, whereas countAll does need to check for deleted docs. So this check has to happen somewhere, unless liveDocs is null (indicating there are no deleted docs in the segment).
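The consolidation described above, one counting loop driven by an iterator with the liveDocs filtering pushed into the iterator for the countAll() case, might look roughly like this. It is a hypothetical sketch: `DocIterator` is a simplified stand-in for Lucene's `DocIdSetIterator`, an `IntPredicate` stands in for `Bits`, and ordinals are an `int[]`. The author reported large QPS regressions with this approach, so it illustrates the structure only.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch: a single counting loop over an iterator that hides
 * whether docs come from collected hits or from all docs filtered by
 * liveDocs. Simplified stand-ins, not the Lucene API.
 */
class DisiConsolidation {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Simplified stand-in for DocIdSetIterator: returns doc ids in increasing order. */
  interface DocIterator {
    int nextDoc(); // next doc id, or NO_MORE_DOCS once exhausted
  }

  /** count() case: iterate the sorted hits collected for the query; no deletion check. */
  static DocIterator fromHits(int[] hits) {
    return new DocIterator() {
      private int i = 0;

      @Override
      public int nextDoc() {
        return i < hits.length ? hits[i++] : NO_MORE_DOCS;
      }
    };
  }

  /** countAll() case: iterate [0, maxDoc), skipping deleted docs unless liveDocs is null. */
  static DocIterator allLive(int maxDoc, IntPredicate liveDocs) {
    return new DocIterator() {
      private int doc = -1;

      @Override
      public int nextDoc() {
        while (doc + 1 < maxDoc) {
          doc++;
          if (liveDocs == null || liveDocs.test(doc)) {
            return doc;
          }
        }
        return NO_MORE_DOCS;
      }
    };
  }

  /** The one shared counting loop; ordPerDoc[doc] == -1 means the doc has no value. */
  static int[] countOrds(DocIterator it, int[] ordPerDoc, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
      int ord = ordPerDoc[doc];
      if (ord >= 0) {
        counts[ord]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] ords = {0, 2, -1, 2, 1, 0};
    System.out.println(Arrays.toString(countOrds(fromHits(new int[] {1, 3, 4}), ords, 3))); // [0, 1, 2]
    System.out.println(Arrays.toString(countOrds(allLive(ords.length, d -> d != 3), ords, 3))); // [2, 1, 1]
  }
}
```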

I went ahead and tried this out, but I'm still seeing pretty horrific QPS regressions.

                            Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets       29.20     (20.0%)       13.73     (15.6%)  -53.0% ( -73% -  -21%) 0.000
     BrowseRandomLabelTaxoFacets       18.33     (14.4%)       10.98     (10.4%)  -40.1% ( -56% -  -17%) 0.000
            BrowseDateTaxoFacets       21.36     (16.4%)       12.98     (10.6%)  -39.2% ( -56% -  -14%) 0.000
       BrowseDayOfYearTaxoFacets       21.33     (16.4%)       13.04     (10.5%)  -38.8% ( -56% -  -14%) 0.000

Contributor Author


I'm not sure it's really worth pursuing this further at the moment. No matter how I try to break functionality out into small, static methods, QPS regresses. The first revision of this PR, which kept everything in one method but pulled the liveDocs null check out, appeared to be flat. Beyond chasing the oddity of the different results in the nightly run, I don't think there's much value in this change. (That said, I'm still really curious what was going on with that nightly benchmark regression, but I'm not sure chasing it this way will be very productive.)

Contributor Author


Actually, looking at the nightly bench runs over the weekend, at least one task that we were focused on looks like it might just be noisy? https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html

So maybe this is just noise after all?

@gsmiller
Contributor Author

Benchmarking this change locally shows no impact at all, so I don't think it's worth pushing unless we just want to isolate where the nightly benchmark runs differ (i.e., see whether this change regresses in the nightly run). If I were to merge this, it would only be to see the nightly benchmark results, and I would likely revert it afterward, since it adds complexity with no apparent value. So I won't merge it right now.

                            Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff p-value
           BrowseMonthSSDVFacets       15.90     (23.7%)       15.46     (24.2%)   -2.7% ( -40% -   59%) 0.717
           BrowseMonthTaxoFacets       27.47     (26.8%)       27.05     (26.5%)   -1.5% ( -43% -   70%) 0.858
     BrowseRandomLabelSSDVFacets        9.58      (7.0%)        9.44      (4.8%)   -1.4% ( -12% -   11%) 0.455
                    OrHighNotLow     1265.12      (3.1%)     1251.37      (2.6%)   -1.1% (  -6% -    4%) 0.234
                   OrNotHighHigh      765.18      (3.2%)      757.45      (3.1%)   -1.0% (  -7% -    5%) 0.311
                    OrNotHighMed     1020.03      (3.0%)     1010.06      (3.3%)   -1.0% (  -7% -    5%) 0.327
                          IntNRQ      221.28      (1.3%)      219.28      (1.2%)   -0.9% (  -3% -    1%) 0.019
                   OrHighNotHigh      888.59      (3.8%)      880.64      (3.7%)   -0.9% (  -8% -    6%) 0.452
                    OrNotHighLow      828.68      (2.5%)      821.68      (1.7%)   -0.8% (  -4% -    3%) 0.216
            MedTermDayTaxoFacets       31.65      (4.2%)       31.41      (4.2%)   -0.7% (  -8% -    7%) 0.574
                HighSloppyPhrase        3.04      (4.7%)        3.02      (4.6%)   -0.7% (  -9% -    9%) 0.656
                    OrHighNotMed     1008.99      (3.8%)     1002.52      (3.8%)   -0.6% (  -8% -    7%) 0.597
     BrowseRandomLabelTaxoFacets       17.52     (19.6%)       17.41     (18.9%)   -0.6% ( -32% -   47%) 0.916
                      OrHighHigh       17.58      (3.7%)       17.47      (3.8%)   -0.6% (  -7% -    7%) 0.600
                       OrHighMed      137.10      (3.8%)      136.28      (4.6%)   -0.6% (  -8% -    8%) 0.655
                 LowSloppyPhrase       11.93      (3.8%)       11.88      (3.8%)   -0.4% (  -7% -    7%) 0.750
                         LowTerm     1631.30      (2.6%)     1625.23      (2.3%)   -0.4% (  -5% -    4%) 0.630
                 MedSloppyPhrase       67.24      (2.5%)       66.99      (2.6%)   -0.4% (  -5% -    4%) 0.649
                          Fuzzy1       80.85      (1.6%)       80.68      (1.8%)   -0.2% (  -3% -    3%) 0.699
                         Respell       51.74      (1.5%)       51.65      (1.7%)   -0.2% (  -3% -    3%) 0.729
                       OrHighLow      861.47      (2.9%)      860.91      (3.0%)   -0.1% (  -5% -    6%) 0.944
         AndHighMedDayTaxoFacets      110.66      (1.4%)      110.63      (1.7%)   -0.0% (  -3% -    3%) 0.958
                     MedSpanNear       50.07      (3.7%)       50.08      (3.0%)    0.0% (  -6% -    6%) 0.996
            HighTermTitleBDVSort       67.00     (23.3%)       67.01     (17.4%)    0.0% ( -32% -   53%) 0.998
                    HighSpanNear       10.62      (3.7%)       10.62      (2.9%)    0.0% (  -6% -    6%) 0.985
                          Fuzzy2       71.03      (1.5%)       71.06      (1.7%)    0.0% (  -3% -    3%) 0.953
            BrowseDateTaxoFacets       20.83     (22.2%)       20.84     (22.8%)    0.1% ( -36% -   57%) 0.992
                       LowPhrase      604.18      (2.9%)      604.76      (2.7%)    0.1% (  -5% -    5%) 0.914
       BrowseDayOfYearTaxoFacets       20.82     (22.4%)       20.85     (23.0%)    0.1% ( -36% -   58%) 0.984
             MedIntervalsOrdered       79.55      (5.1%)       79.68      (4.5%)    0.2% (  -8% -   10%) 0.915
                      HighPhrase      163.13      (2.8%)      163.42      (2.6%)    0.2% (  -5% -    5%) 0.837
          OrHighMedDayTaxoFacets        7.74      (5.3%)        7.76      (4.4%)    0.2% (  -8% -   10%) 0.888
       BrowseDayOfYearSSDVFacets       12.12     (14.1%)       12.17     (13.8%)    0.4% ( -24% -   32%) 0.930
             LowIntervalsOrdered      187.35      (8.7%)      188.22      (7.6%)    0.5% ( -14% -   18%) 0.857
                         MedTerm     2446.71      (4.1%)     2458.80      (4.8%)    0.5% (  -8% -    9%) 0.728
                      AndHighLow     1427.63      (2.7%)     1435.06      (2.5%)    0.5% (  -4% -    5%) 0.527
        AndHighHighDayTaxoFacets        9.02      (1.7%)        9.06      (2.4%)    0.5% (  -3% -    4%) 0.415
                       MedPhrase       34.65      (2.8%)       34.84      (2.6%)    0.6% (  -4% -    6%) 0.509
                     LowSpanNear       33.70      (5.5%)       33.91      (4.9%)    0.6% (  -9% -   11%) 0.705
            HighIntervalsOrdered       10.85      (8.8%)       10.92      (7.9%)    0.7% ( -14% -   19%) 0.803
                        HighTerm     1543.65      (4.2%)     1556.45      (4.4%)    0.8% (  -7% -    9%) 0.543
                      AndHighMed       83.36      (4.0%)       84.36      (4.2%)    1.2% (  -6% -    9%) 0.356
                        PKLookup      169.56      (3.5%)      171.59      (3.4%)    1.2% (  -5% -    8%) 0.275
                     AndHighHigh       71.11      (4.2%)       72.08      (4.5%)    1.4% (  -7% -   10%) 0.324
               HighTermMonthSort       96.47     (12.4%)       98.07     (18.7%)    1.7% ( -26% -   37%) 0.741
                        Wildcard      112.10      (4.5%)      114.33      (4.8%)    2.0% (  -7% -   11%) 0.180
                      TermDTSort       98.09     (12.0%)      100.68     (18.9%)    2.6% ( -25% -   38%) 0.598
                         Prefix3      206.94     (12.2%)      213.82     (10.4%)    3.3% ( -17% -   29%) 0.351
           HighTermDayOfYearSort       83.83     (17.8%)       86.78     (24.5%)    3.5% ( -32% -   55%) 0.603


2 participants