Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute multiple aggregations in one iteration of the match-set #12546

Open
stefanvodita opened this issue Sep 9, 2023 · 1 comment
Open

Comments

@stefanvodita
Copy link
Contributor

Description

When a user knows that they want multiple different aggregations, they have to iterate the match-set once for each aggregation, which is inefficient. We can remedy this by enabling users to request more than one aggregation in a faceting call and then by handling all the computations in one go.

@stefanvodita
Copy link
Contributor Author

I was thinking about this issue in connection with #12553. If accumulators were generic, that would give us multi-aggregations by default.
For example, if we're setting up an AVG aggregation, we would have an accumulator which maintains a sum and one which maintains a count, so we're actually doing two aggregations. An accumulator could encapsulate an arbitrary number of aggregations computed in one go.

stefanvodita added a commit that referenced this issue Apr 5, 2024
This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets.

1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553.
2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of #12576.
3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282.
4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes #12585.
5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546.
6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded.
7. Add tests which were missing for sparse/dense values and non-positive aggregations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant