Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 #13358

Merged
stefanvodita merged 5 commits into apache:branch_9x on May 14, 2024

stefanvodita
Contributor

Annoying things I had to do to backport #12966:

  1. TaxonomyFacetCounts stores the counts twice because it extends IntTaxonomyFacets. The correct fix would be to make it extend TaxonomyFacets directly, but I couldn't make that change here the way I could for FastTaxonomyFacets, which was marked @lucene.experimental.
  2. Increased the visibility of useHashTable and rollup in TaxonomyFacets from default (package-private) to protected, compared to main.
  3. Left TopOrdAndIntQueue and TopOrdAndFloatQueue as they were, since they are public API, and created new queues for our purposes, TopOrdAndIntNumberQueue and TopOrdAndFloatNumberQueue, which are marked deprecated because they will go away in 10.x.
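
To illustrate why item 1 leads to double storage, here is a minimal sketch assuming, for illustration only, a base class that always keeps a count array and an int-aggregation subclass that keeps its own value array. The class names `BaseFacetsSketch`, `IntFacetsSketch` and `CountFacetsSketch` are hypothetical stand-ins, not the Lucene classes, and the real fields and methods may differ.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a base class that, after the refactor, always tracks counts. */
abstract class BaseFacetsSketch {
  protected final int[] counts;

  BaseFacetsSketch(int numOrdinals) {
    this.counts = new int[numOrdinals];
  }

  /** Every hit on an ordinal bumps its count, whatever the aggregation is. */
  protected void recordHit(int ord) {
    counts[ord]++;
  }
}

/** Hypothetical stand-in for an int-aggregation subclass with its own value array. */
abstract class IntFacetsSketch extends BaseFacetsSketch {
  protected final int[] values;

  IntFacetsSketch(int numOrdinals) {
    super(numOrdinals);
    this.values = new int[numOrdinals];
  }

  protected void aggregate(int ord, int value) {
    recordHit(ord);       // count kept by the base class
    values[ord] += value; // aggregation kept by this class
  }
}

/** Counting reuses the int aggregation with value 1, so both arrays end up holding the same numbers. */
class CountFacetsSketch extends IntFacetsSketch {
  CountFacetsSketch(int numOrdinals) {
    super(numOrdinals);
  }

  void collect(int ord) {
    aggregate(ord, 1);
  }

  public static void main(String[] args) {
    CountFacetsSketch facets = new CountFacetsSketch(4);
    facets.collect(1);
    facets.collect(3);
    facets.collect(3);
    System.out.println(Arrays.toString(facets.counts)); // [0, 1, 0, 2]
    System.out.println(Arrays.toString(facets.values)); // [0, 1, 0, 2]
  }
}
```

In this sketch, having `CountFacetsSketch` extend `BaseFacetsSketch` directly would drop the duplicate `values` array, which corresponds to the change the description says could not be made for TaxonomyFacetCounts in 9.x.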

stefanvodita and others added 4 commits May 10, 2024 16:01
This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. It also sets us up for specific API changes later, e.g. retrieving counts from aggregation facets.

1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses the genericity issue mentioned in apache#12553.
2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously always used dense values. This is part of apache#12576.
3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses apache#11282.
4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), whereas previously we relied on the aggregation value being positive. Closes apache#12585.
5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in apache#12546.
6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded.
7. Add missing tests for sparse/dense values and non-positive aggregations.
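
Items 2 to 5 can be illustrated with a small self-contained sketch of a float aggregation that keeps a count alongside each value and stores both either densely (arrays indexed by ordinal) or sparsely (hash maps). The class name `FloatAggregationSketch`, the `useHashTable` constructor flag and the `labelsSeen` helper are hypothetical simplifications; in the actual code the dense/sparse choice is made in TaxonomyFacets via useHashTable, whose logic is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a float aggregation that always keeps a count per ordinal. */
class FloatAggregationSketch {
  private final boolean sparse;
  // Dense storage: one slot per ordinal.
  private final float[] denseValues;
  private final int[] denseCounts;
  // Sparse storage: only ordinals that were actually hit.
  private final Map<Integer, Float> sparseValues;
  private final Map<Integer, Integer> sparseCounts;

  FloatAggregationSketch(int numOrdinals, boolean useHashTable) {
    this.sparse = useHashTable;
    this.denseValues = useHashTable ? null : new float[numOrdinals];
    this.denseCounts = useHashTable ? null : new int[numOrdinals];
    this.sparseValues = useHashTable ? new HashMap<>() : null;
    this.sparseCounts = useHashTable ? new HashMap<>() : null;
  }

  /** Two aggregations in one go: the float value and a count. */
  void aggregate(int ord, float value) {
    if (sparse) {
      sparseValues.merge(ord, value, Float::sum);
      sparseCounts.merge(ord, 1, Integer::sum);
    } else {
      denseValues[ord] += value;
      denseCounts[ord]++;
    }
  }

  int count(int ord) {
    return sparse ? sparseCounts.getOrDefault(ord, 0) : denseCounts[ord];
  }

  float value(int ord) {
    return sparse ? sparseValues.getOrDefault(ord, 0f) : denseValues[ord];
  }

  /** A label counts as encountered if its count is positive, whatever the sign of its value. */
  List<Integer> labelsSeen(int numOrdinals) {
    List<Integer> seen = new ArrayList<>();
    for (int ord = 0; ord < numOrdinals; ord++) {
      if (count(ord) > 0) {
        seen.add(ord);
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    for (boolean useHashTable : new boolean[] {false, true}) {
      FloatAggregationSketch agg = new FloatAggregationSketch(4, useHashTable);
      agg.aggregate(1, 2.5f);
      agg.aggregate(2, -1.0f); // non-positive value, still seen because its count is 1
      agg.aggregate(3, 0.0f);
      System.out.println(agg.labelsSeen(4)); // [1, 2, 3] in both modes
    }
  }
}
```

Under the old rule (a label is reported only if its aggregated value is positive), only ordinal 1 would be reported in this example; the count-based check also reports ordinals 2 and 3, mirroring the behaviour change described in item 6.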
@stefanvodita
Contributor Author

Despite the "annoying" bits in the description, I don't expect this backport to be controversial, but reviews are welcome! I plan to wait over the weekend and then merge.

@stefanvodita stefanvodita changed the title Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966x Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 May 10, 2024
stefanvodita referenced this pull request May 13, 2024
…#13284)

Our per-field vector and doc-values readers use `TreeMap`s but don't rely on
the iteration order, so these `TreeMap`s can be replaced with more
CPU/RAM-efficient `HashMap`s.

The per-field postings reader stays on a `TreeMap` since it relies on the
iteration order.
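
The trade-off in this referenced commit can be sketched with plain `java.util` maps: a `HashMap` is enough when fields are only looked up by name, while a `TreeMap` is needed when callers iterate fields in sorted order. The class `PerFieldMapsSketch` and its methods are hypothetical and do not mirror Lucene's per-field readers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-field registry illustrating when sorted iteration matters. */
class PerFieldMapsSketch {
  // Lookup-only use: iteration order is never observed, so a HashMap suffices.
  final Map<String, String> readersByField = new HashMap<>();
  // Ordered use: callers walk fields in sorted order, so a TreeMap is required.
  final TreeMap<String, String> orderedFields = new TreeMap<>();

  void add(String field, String reader) {
    readersByField.put(field, reader);
    orderedFields.put(field, reader);
  }

  String readerFor(String field) {
    return readersByField.get(field);
  }

  List<String> fieldsInOrder() {
    return new ArrayList<>(orderedFields.keySet());
  }

  public static void main(String[] args) {
    PerFieldMapsSketch maps = new PerFieldMapsSketch();
    maps.add("title", "vectors-reader");
    maps.add("body", "doc-values-reader");
    maps.add("author", "postings-reader");
    System.out.println(maps.readerFor("body")); // doc-values-reader
    System.out.println(maps.fieldsInOrder());   // [author, body, title]
  }
}
```

A `HashMap` gives expected constant-time lookups with less per-entry overhead than a red-black tree, which is the CPU/RAM saving the commit message refers to; the sorted view is only worth paying for where iteration order is actually relied on.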
Member

@mikemccand mikemccand left a comment

Thanks @stefanvodita!

Make sure to also update lucene/CHANGES.txt on main to move the entry down to the 9.11.0 section.

@stefanvodita stefanvodita merged commit 42884fa into apache:branch_9x May 14, 2024
2 checks passed
@stefanvodita stefanvodita added this to the 9.11.0 milestone May 14, 2024
@stefanvodita
Contributor Author

Thank you for the review, Mike! I'd already tentatively put the CHANGES entries under 9.11; now they're correct 😄
