Support alternate aggregation functions in association facets [LUCENE-10444] #11480

Closed
asfimport opened this issue Feb 25, 2022 · 4 comments

Comments

@asfimport

We currently only support sum aggregations in the various association facet implementations. I'd be really interested in extending the association facet implementations to support other aggregations, starting with max and min (in addition to sum).

I've been sketching up a prototype of this and I think I have a reasonable way to introduce this idea. Will get a PR out for feedback soon.


Migrated from LUCENE-10444 by Greg Miller (@gsmiller), resolved Apr 07 2022
Pull requests: #718, #719

@asfimport

asfimport commented Mar 2, 2022

Greg Miller (@gsmiller) (migrated from JIRA)

I've got a couple PRs coming shortly for this. I ended up only adding "max" aggregation (to the existing "sum" functionality). I had intended to also implement average and min, but there are a couple issues with doing so:

  1. Average of course requires tracking the number of data points along with a running average weight. I think we should add this incrementally when/if we tackle LUCENE-10246 (#11282, Support getting counts from "association" facets). That issue captures the idea of exposing both aggregated weights and counts, so it would provide the foundation needed to support average aggregations.
  2. I ran into some issues with "min" due to lots of assumptions being made in taxonomy/aggregation faceting that weights are all positive. This got me thinking that "min" aggregation might not be particularly useful.
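
To make the first point concrete, here is a minimal Java sketch of why an average needs a per-ordinal count next to the running total, whereas sum and max fold into a single value per ordinal. This is not Lucene code: the class `AverageNeedsCountSketch` and all of its methods are hypothetical names made up for illustration.

```java
// Hypothetical illustration only; none of these names come from Lucene.
final class AverageNeedsCountSketch {
    // One slot per facet ordinal: running total of weights and number of weights seen.
    private final float[] sums;
    private final int[] counts;

    AverageNeedsCountSketch(int ordinalCount) {
        sums = new float[ordinalCount];
        counts = new int[ordinalCount];
    }

    // Record one document's association weight for the given ordinal.
    void accumulate(int ordinal, float weight) {
        sums[ordinal] += weight;
        counts[ordinal]++;
    }

    // Number of weights recorded for the ordinal.
    int count(int ordinal) {
        return counts[ordinal];
    }

    // Average weight for the ordinal, or 0 if no document contributed.
    float average(int ordinal) {
        return counts[ordinal] == 0 ? 0f : sums[ordinal] / counts[ordinal];
    }
}
```

The second array is the extra state the comment refers to: without a count per ordinal there is nothing to divide by, which is why exposing counts first is a natural prerequisite.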

I think "max" on its own is pretty useful. "Sum" aggregations can be heavily influenced by "long tail" effects where lots of matching documents with low weights end up dominating. "Max" has the nice property of removing this "long tail" effect in some situations (i.e., a facet value is only as good as its highest-weight document).
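
The "long tail" effect can be shown with a small, self-contained Java sketch. It is not the API added by the PRs: `SumVsMaxSketch`, its nested `Aggregation` enum, and the sample weights are all hypothetical, and the accumulator starts at zero on the assumption that weights are non-negative.

```java
// Hypothetical illustration only; this is not the API added by the PRs.
final class SumVsMaxSketch {
    // The two ways of folding a new weight into the value accumulated so far.
    enum Aggregation {
        SUM { int apply(int acc, int weight) { return acc + weight; } },
        MAX { int apply(int acc, int weight) { return Math.max(acc, weight); } };

        abstract int apply(int acc, int weight);
    }

    // Fold the per-document weights of one facet value into a single score.
    // Assumes non-negative weights, so 0 is a safe starting value for both functions.
    static int aggregate(Aggregation fn, int[] weights) {
        int acc = 0;
        for (int weight : weights) {
            acc = fn.apply(acc, weight);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Facet value A: fifty weak matches with weight 1 each (the long tail).
        int[] longTail = new int[50];
        for (int i = 0; i < longTail.length; i++) {
            longTail[i] = 1;
        }
        // Facet value B: a single strong match.
        int[] oneStrong = {40};

        System.out.println("sum: " + aggregate(Aggregation.SUM, longTail)
            + " vs " + aggregate(Aggregation.SUM, oneStrong));
        System.out.println("max: " + aggregate(Aggregation.MAX, longTail)
            + " vs " + aggregate(Aggregation.MAX, oneStrong));
    }
}
```

With these made-up weights, sum ranks the long-tail value first (50 vs 40), while max ranks the value with the single strong match first (1 vs 40).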

@asfimport

ASF subversion and git services (migrated from JIRA)

Commit f870edf in lucene's branch refs/heads/main from Greg Miller
https://gitbox.apache.org/repos/asf?p=lucene.git;h=f870edf2fe2

LUCENE-10444: Support alternate aggregation functions in association facets (#718)

@asfimport

ASF subversion and git services (migrated from JIRA)

Commit 9e10ba0 in lucene's branch refs/heads/branch_9x from Greg Miller
https://gitbox.apache.org/repos/asf?p=lucene.git;h=9e10ba02ec3

LUCENE-10444: Support alternate aggregation functions in association facets (#719)

@asfimport

Alan Woodward (@romseygeek) (migrated from JIRA)

Bulk close for 9.2.0 release
