Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 #13358

stefanvodita · 2024-05-10T22:05:09Z

Annoying things I had to do to backport #12966:

- TaxonomyFacetCounts stores the counts twice because it extends IntTaxonomyFacets. The correct way would be to extend TaxonomyFacets, but I couldn't make that change, like I could for FastTaxonomyFacets, which was marked @lucene.experimental.
- Increase visibility of useHashTable and rollup in TaxonomyFacets compared to main, from default to protected.
- Leave TopOrdAndInt/FloatQueue as they were, since they are public API, and create new queues for our purposes, TopOrdAndInt/FloatNumberQueue, marked deprecated because they will go away in 10.x.
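
To make the first point concrete, here is a minimal sketch of how counts can end up stored twice when a counting class sits below an int-valued class in the hierarchy. The class names (`BaseFacets`, `IntValueFacets`, `CountingFacets`) are hypothetical stand-ins, not Lucene's actual code, and the real classes may organise their storage differently.

```java
import java.util.Arrays;

/** Hypothetical stand-ins for the class hierarchy; this is not Lucene's real code. */
class DuplicateCountsSketch {

  /** Plays the role of a base class that always tracks counts. */
  static class BaseFacets {
    final int[] counts;

    BaseFacets(int size) {
      counts = new int[size];
    }

    /** Record one hit for the given ordinal. */
    void collect(int ordinal) {
      counts[ordinal]++;
    }
  }

  /** Plays the role of an int-valued aggregation class with its own value storage. */
  static class IntValueFacets extends BaseFacets {
    final int[] values;

    IntValueFacets(int size) {
      super(size);
      values = new int[size];
    }
  }

  /** A counting class that must keep extending the int-valued class for compatibility. */
  static class CountingFacets extends IntValueFacets {
    CountingFacets(int size) {
      super(size);
    }

    @Override
    void collect(int ordinal) {
      super.collect(ordinal); // updates counts
      values[ordinal]++;      // the aggregated value is also the count
    }
  }

  public static void main(String[] args) {
    CountingFacets facets = new CountingFacets(4);
    facets.collect(2);
    facets.collect(2);
    // Both arrays hold the same numbers, so the memory is spent twice.
    System.out.println(Arrays.equals(facets.counts, facets.values));
  }
}
```

In this sketch, a counting class that extended `BaseFacets` directly would need only the `counts` array, which is the shape the description calls the correct way.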

This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets.

1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in apache#12553.
2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of apache#12576.
3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses apache#11282.
4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes apache#12585.
5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in apache#12546.
6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded.
7. Add tests which were missing for sparse/dense values and non-positive aggregations.
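
Points 4 and 5 can be illustrated with a small sketch: each ordinal accumulates a count alongside its aggregated value in one pass, and "was this label seen?" is answered from the count instead of from the sign of the value. Everything here (`CountAndSumSketch`, `Agg`, `accumulate`, and the map-based sparse storage) is a hypothetical illustration under those assumptions, not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of count-plus-sum aggregation; not Lucene's actual classes. */
class CountAndSumSketch {

  /** Two aggregations kept side by side for one ordinal. */
  static final class Agg {
    int count;
    float sum;
  }

  // Sparse storage: only ordinals that were actually encountered get an entry.
  private final Map<Integer, Agg> byOrdinal = new HashMap<>();

  /** Record one (ordinal, value) association from a matching document. */
  void accumulate(int ordinal, float value) {
    Agg agg = byOrdinal.computeIfAbsent(ordinal, k -> new Agg());
    agg.count++;      // the count is always maintained
    agg.sum += value; // the aggregation that was already being done
  }

  /** New-style check: a label was seen if its count is positive. */
  boolean seenByCount(int ordinal) {
    Agg agg = byOrdinal.get(ordinal);
    return agg != null && agg.count > 0;
  }

  /** Old-style check: relies on the aggregated value being positive. */
  boolean seenByValue(int ordinal) {
    Agg agg = byOrdinal.get(ordinal);
    return agg != null && agg.sum > 0;
  }

  /** Aggregated sum for an ordinal, or 0 if it was never encountered. */
  float sum(int ordinal) {
    Agg agg = byOrdinal.get(ordinal);
    return agg == null ? 0f : agg.sum;
  }

  public static void main(String[] args) {
    CountAndSumSketch facets = new CountAndSumSketch();
    facets.accumulate(7, -2.5f);
    facets.accumulate(7, 1.0f);
    System.out.println(facets.seenByCount(7)); // true: the label was encountered twice
    System.out.println(facets.seenByValue(7)); // false: the sum is -1.5, so the label would be dropped
  }
}
```

The ordinal with a negative sum is the case the description calls out: a value-based check discards it, while a count-based check keeps it.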

stefanvodita · 2024-05-10T22:21:12Z

Despite the "annoying" bits in the description, I don't expect this backport to be controversial, but reviews are welcome! I plan to wait over the weekend and then merge.

…#13284) Our per-field vector and doc-values readers use `TreeMap`s but don't rely on the iteration order, so these `TreeMap`s can be replaced with more CPU/RAM-efficient `HashMap`s. The per-field postings reader stays on a `TreeMap` since it relies on the iteration order.

mikemccand

Thanks @stefanvodita!

Make sure to also update the main lucene/CHANGES.txt to move the entry down to the 9.11.0 section.

stefanvodita · 2024-05-14T10:07:20Z

Thank you for the review, Mike! I'd already put the CHANGES entries in 9.11 tentatively, now they're correct 😄

stefanvodita and others added 4 commits May 10, 2024 16:01

tidy

15ec344

fix

5f1c05d

Ensure backwards compatibility

c66e3f0

stefanvodita mentioned this pull request May 10, 2024

Reduce duplication in taxonomy facets; always do counts #12966

Merged

stefanvodita changed the title ~~Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966x~~ Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 May 10, 2024

mikemccand approved these changes May 13, 2024


Merge branch 'branch_9x' into backport-taxo-refactor

a078974

stefanvodita merged commit 42884fa into apache:branch_9x May 14, 2024
2 checks passed

stefanvodita added this to the 9.11.0 milestone May 14, 2024
