New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support getting counts from "association" facets [LUCENE-10246] #11282
Comments
Greg Miller (@gsmiller) (migrated from JIRA) FYI @msokolov, opened this issue based on your comments in PR #264 |
Michael Sokolov (@msokolov) (migrated from JIRA) Thanks, Greg! I think we might also want an option to report the aggregated weights, rather than the counts. Consider merging multiple such facet responses across a sharded index; you'd need to preserve the aggregated weights in order to properly rank the facets across the entire index. |
Greg Miller (@gsmiller) (migrated from JIRA) @msokolov yeah, that's a good point. This is something users can do today (get back aggregated weights) and it's definitely something we need to preserve. Sounds like there could be good use-cases though for wanting both counts and weights. |
Rushabh Shah (@shahrs87) (migrated from JIRA) @gsmiller @msokolov |
Greg Miller (@gsmiller) (migrated from JIRA) @shahrs87 I'd start by becoming familiar with the existing "association facet" implementations ( The idea with this Jira is to be able to convey both an aggregated weight and the count associated with a label. The best way to do that without creating a weird API for non-association cases is something that will probably take a little thought. Should we just put another "count" field in So beyond understanding what's currently there, I think the next step is to think about the right way to evolve the API that doesn't create a weird interaction for non-association faceting, especially since those are more commonly used. Please reach out here as you have questions and I'll do my best to answer in a timely fashion. Thanks for having a look at this! |
This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of #12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes #12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.
This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in apache#12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of apache#12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses apache#11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes apache#12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in apache#12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.
#12966 (#13358) Reduce duplication in taxonomy facets; always do counts (#12966) This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. 1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553. 2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of #12576. 3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes #12585. 5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546. 6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. Add tests which were missing for sparse/dense values and non-positive aggregations.
We have these nice "association" facet implementations today that aggregate "weights" from the docs that facet over, but they don't keep track of counts. So the user can get "top-n" values for a dim by aggregated weight (great!), but can't know how many docs matched each value. It would be nice to support this so users could show the top-n values but also show counts associated with each.
Migrated from LUCENE-10246 by Greg Miller (@gsmiller), updated Jul 01 2022
The text was updated successfully, but these errors were encountered: