Support getting counts from "association" facets [LUCENE-10246] #11282

asfimport · 2021-11-19T15:33:04Z

We have these nice "association" facet implementations today that aggregate "weights" from the docs that facet over, but they don't keep track of counts. So the user can get "top-n" values for a dim by aggregated weight (great!), but can't know how many docs matched each value. It would be nice to support this so users could show the top-n values but also show counts associated with each.

Migrated from LUCENE-10246 by Greg Miller (@gsmiller), updated Jul 01 2022

asfimport · 2021-11-19T15:34:24Z

Greg Miller (@gsmiller) (migrated from JIRA)

FYI @msokolov, opened this issue based on your comments in PR #264

asfimport · 2021-11-19T15:45:12Z

Michael Sokolov (@msokolov) (migrated from JIRA)

Thanks, Greg! I think we might also want an option to report the aggregated weights, rather than the counts. Consider merging multiple such facet responses across a sharded index; you'd need to preserve the aggregated weights in order to properly rank the facets across the entire index.

asfimport · 2021-11-19T21:59:44Z

Greg Miller (@gsmiller) (migrated from JIRA)

@msokolov yeah, that's a good point. This is something users can do today (get back aggregated weights) and it's definitely something we need to preserve. Sounds like there could be good use-cases though for wanting both counts and weights.

asfimport · 2022-06-30T23:15:53Z

Rushabh Shah (@shahrs87) (migrated from JIRA)

@gsmiller @msokolov
I am pretty new to LUCENE project and want to contribute to small jiras to improve my lucene knowledge. Can you please help me scope the work required for this patch and point me to some relevant classes in the source code. Thank you.

asfimport · 2022-07-01T00:26:15Z

Greg Miller (@gsmiller) (migrated from JIRA)

@shahrs87 I'd start by becoming familiar with the existing "association facet" implementations (TaxonomyFacetIntAssociations and TaxonomyFacetFloatAssociations as well as looking at some demo code like AssociationsFacetsExample). The API contract they implement represent results with FacetResult, which contains a list of LabelAndValue instances. LabelAndValue only models a single label along with a single numeric value. The value "usually" represents a total faceting count for a label in "non-association" facets, but with association faceting, value takes on an aggregated weight "associated" with the label.

The idea with this Jira is to be able to convey both an aggregated weight and the count associated with a label. The best way to do that without creating a weird API for non-association cases is something that will probably take a little thought. Should we just put another "count" field in LabelAndValue and have both value and count be populated with a count for non-association cases? That sounds weird.

So beyond understanding what's currently there, I think the next step is to think about the right way to evolve the API that doesn't create a weird interaction for non-association faceting, especially since those are more commonly used.

Please reach out here as you have questions and I'll do my best to answer in a timely fashion. Thanks for having a look at this!

This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of #12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes #12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.

This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in apache#12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of apache#12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses apache#11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes apache#12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in apache#12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.

#12966 (#13358) Reduce duplication in taxonomy facets; always do counts (#12966) This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of #12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes #12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.

asfimport mentioned this issue Aug 24, 2022

Support alternate aggregation functions in association facets [LUCENE-10444] #11480

Closed

stefanvodita mentioned this issue Dec 22, 2023

Reduce duplication in taxonomy facets; always do counts #12966

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support getting counts from "association" facets [LUCENE-10246] #11282

Support getting counts from "association" facets [LUCENE-10246] #11282

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Jun 30, 2022

asfimport commented Jul 1, 2022

Support getting counts from "association" facets [LUCENE-10246] #11282

Support getting counts from "association" facets [LUCENE-10246] #11282

Comments

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Nov 19, 2021

asfimport commented Jun 30, 2022

asfimport commented Jul 1, 2022