-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10444: Support alternate aggregation functions in association facets #718
LUCENE-10444: Support alternate aggregation functions in association facets #718
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to me to add this capability. I wonder if the extra abstraction hurts us though in these tight loops summing up values in an array? If it does, we might want to provide a specialization for such loops as well?
return null; | ||
} | ||
|
||
if (dimConfig.multiValued) { | ||
if (dimConfig.requireDimCount) { | ||
sumValues = values[dimOrd]; | ||
aggregatedValue = values[dimOrd]; | ||
} else { | ||
// Our sum'd count is not correct, in general: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our "aggregated" count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessarily a "count" though here right? It's an aggregated weight associated with the value.
childCount++; | ||
if (count > bottomValue) { | ||
if (value > bottomValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to ensure that aggregation functions are nondecreasing? I mean min
wouldn't work very well here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. There are a number of things actually preventing us from cleanly adding something like min
. I had it originally but as I started looking at all the changes it would require, I backed off for the time being (especially since I don't have a concrete use-case in mind). One interesting challenge is that these facets implementations all assume the weights are positive values. There are a lot of > 0
checks floating around the various implementations to check whether-or-not a value had any "weight" associated with it. This makes sense when using counts, but it's weird when generally associated weights with the values. So min
started to feel a little weird and I just left it out for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, OK, so then disregard my comment above -- we should not try to tackle min
here!
Oh I missed the benchmarking you did - I guess it was on the backport PR. Looked like no significant change there, good. |
d3c44b7
to
6614900
Compare
Even though I ran benchmarks on the backport version of this change (#719), I figured it would be good to run benchmarks here as well. Below compares this patch against
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few small-ish comments. I love this approach! It generalizes our simplistic aggregation capabilities.
I agree we should not try to tackle the further generalizations needed for min
now, but maybe we can add a // TODO
capturing your comment (and the implicit "always 0" initial value) for the future?
Thanks @gsmiller.
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java
Show resolved
Hide resolved
childCount++; | ||
if (count > bottomValue) { | ||
if (value > bottomValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, OK, so then disregard my comment above -- we should not try to tackle min
here!
|
||
@Override | ||
public boolean advanceExact(int doc) throws IOException { | ||
index++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, instead of index++
shouldn't we do index = doc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow. Great catch! I lifted this from the current implementation but this is an uncaught bug. I just reproduced it with a test. I'm going to open a separate issue and fix this bug first, then merge the fix into this PR. Thanks Mike!
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetAssociations.java
Show resolved
Hide resolved
@mikemccand or @msokolov, did either of you have additional feedback? It didn't really look like it beyond the pre-existing bug (which I've since addressed), but I wanted to check before merging to make sure. Thanks again for the reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gsmiller looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry I checked out. It was looking good to me at the start, and I think you addressed my concerns, plus did clean up, rebasing, etc. I thikn it's ready
Description
This change adds support for "max" aggregations (in addition to "sum") to association faceting. It does so in a way that is (somewhat) extensible for future aggregation functionality.
Solution
Replaced the existing association faceting classes that were hardcoded to "sum" with new classes that allow the user to specify an aggregation function. Note that I will open a separate PR for a backport of this that remains backwards-compatible on 9x.
Tests
Added new testing for new aggregation functionality.
I also tested the performance of this approach using my 9x backport. Results on
luceneutil
look pretty flat to me. I've posted that out put in the backport PR: #719Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.