LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops#606
Conversation
| values[(int) singleValued.longValue()]++; | ||
| } | ||
| } | ||
| } else { |
There was a problem hiding this comment.
i'm also suspicious of making count() and countAll() bigger and bigger with all these specializations.
I would recommend trying to factor out these little "accumulator" loops into separate methods. They could then be shared across count() and countAll(). At least when I looked at this stuff for solr DocValuesFacets, it was needed to get performance across the various specializations there (admittedly this was a while ago, maybe compiler is smarter now):
You can see what I mean if you start here in this file and scroll down:
There was a problem hiding this comment.
That's interesting. It's tricky without being able to reproduce that nightly benchmark regression locally, but I'll give it a shot. This change as I have it appears to have no performance impact at all locally, and since it just adds code complexity, it would be silly to move forward with it except as an academic exercise to try to figure out why the nightly benchmarks are regressing. That's interesting and may be worthwhile, but I'll experiment with your idea more before moving forward. Thanks!
There was a problem hiding this comment.
Hmm, breaking out separate methods sent qps tanking in my local benchmarks. Any thoughts @rmuir? Maybe I missed the mark on what you were suggesting (entirely possible)? Here's the change: d084f85
TaskQPS baseline StdDevQPS candidate StdDev Pct diff p-value
BrowseMonthTaxoFacets 27.87 (23.7%) 11.73 (1.3%) -57.9% ( -67% - -43%) 0.000
BrowseDateTaxoFacets 21.90 (20.9%) 11.77 (7.9%) -46.2% ( -62% - -21%) 0.000
BrowseDayOfYearTaxoFacets 21.88 (21.1%) 11.83 (8.1%) -45.9% ( -62% - -21%) 0.000
BrowseRandomLabelTaxoFacets 18.22 (17.8%) 9.96 (6.8%) -45.3% ( -59% - -25%) 0.000
There was a problem hiding this comment.
I would make these simple static methods.
There was a problem hiding this comment.
See the solr example again, just like those methods there. Instance methods are probably no good in facets, there are many abstractions, probably just drives compiler more crazy.
There was a problem hiding this comment.
we still have the issue of inconsistent loop types between while and for loops? Maybe now that the accumulators are shared, it becomes more of a problem?
There was a problem hiding this comment.
also, is there really a reason anymore to have count vs countAll? They look the same to me. The only difference is livedocs check which is shown to do nothing? So if we remove livedocs specialization, and remove count-vs-countAll specialization, it should start to be a bit more manageable?
There was a problem hiding this comment.
also, is there really a reason anymore to have count vs countAll? They look the same to me. The only difference is livedocs check which is shown to do nothing? So if we remove livedocs specialization, and remove count-vs-countAll specialization, it should start to be a bit more manageable?
The only option I can think of for this is to put the liveDoc checking behind a DISI abstraction. Then the implementation could be consolidated to just operate on a DISI (which would either be backed by collected hits or by a doc value field with liveDocs validation). The nuance here is that the "standard" count functionality doesn't need to check for deleted docs as its assumes everything in the FacetsCollector is "live," whereas countAll needs to check for deleted docs. So this check needs to happen somewhere, unless liveDocs is null (indicating there are no deleted docs in the index).
I went ahead and tried this out, but am still seeing pretty horrific qps regressions.
TaskQPS baseline StdDevQPS candidate StdDev Pct diff p-value
BrowseMonthTaxoFacets 29.20 (20.0%) 13.73 (15.6%) -53.0% ( -73% - -21%) 0.000
BrowseRandomLabelTaxoFacets 18.33 (14.4%) 10.98 (10.4%) -40.1% ( -56% - -17%) 0.000
BrowseDateTaxoFacets 21.36 (16.4%) 12.98 (10.6%) -39.2% ( -56% - -14%) 0.000
BrowseDayOfYearTaxoFacets 21.33 (16.4%) 13.04 (10.5%) -38.8% ( -56% - -14%) 0.000
There was a problem hiding this comment.
I'm not sure it's really worth pursuing this further at the moment. No matter how I try to break out functionality into small, static methods, qps is regressing. The first revision of this PR that kept everything in one method but pulled the liveDocs null check out appeared to be flat. Besides trying to chase the oddity of the different results in the nightly run, I don't think there's much value in this change. (That said, I'm still really curious what was going on with that nightly benchmark regression... but I'm not sure chasing it this way is going to be very productive).
There was a problem hiding this comment.
Actually, looking at the nightly bench runs over the weekend, at least one task that we were focused on looks like it might just be noisy? https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html
So maybe this is just noise after all?
|
Benchmarking this change locally shows no impact at all. So I don't think it's actually worth pushing this change unless we just want to isolate where the nightly benchmark runs are different (i.e., see if this change regresses in the nightly run). So if I were to merge this, it would just be to see the nightly benchmark results and then likely revert it back out since it just adds complexity with no apparent value. So I won't merge it righ tnow. |
…ving the liveDocs null check outside the loops
…e-loops where possible)
47a7be4 to
52bf236
Compare
This change attempts to bring in the other piece of the LUCENE-10350 change without the regressions. See LUCENE-10374 for more details.