
FacetCutter: unify rollup and range remapping#16122

Draft
epotyom wants to merge 1 commit into apache:main from epotyom:facets-ord_remap_api

epotyom commented May 25, 2026

Two structurally identical problems were solved by two separate mechanisms:

  • Taxonomy rollup: getOrdinalsToRollup() + getChildrenOrds(int) drove a recursive tree-walk in CountFacetRecorder and LongAggregationsFacetRecorder, descending from each dim root through the full taxonomy subtree regardless of which nodes had hits.
  • Range remapping: the pos[] lookup (elementary interval → user range position) was baked into each NonOverlappingLongRangeFacetCutter leaf cutter's nextOrd(), running once per matching document during collection.

What this PR does

Adds two default methods to FacetCutter:

default boolean needsRemapping() throws IOException { return false; }
default OrdinalIterator remapOrd(int mergedOrd) throws IOException { ... }

When needsRemapping() returns true, recorders iterate over the recorded ordinals at reduce time and call remapOrd() on each one to obtain the final ordinal(s) that receive its count or aggregated value.
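To make the recorder-side contract concrete, here is a minimal, self-contained sketch. `RemapLoopSketch`, `Cutter`, `HALVING`, `single()` and `reduce()` are hypothetical names, not Lucene API. Counts live in a plain array where zero means "not recorded", checked `IOException`s are omitted, and because the PR elides the default body of `remapOrd()` the sketch leaves it abstract.

```java
import java.util.Arrays;

// Hypothetical, simplified mirror of the contract described above. The real
// FacetCutter/OrdinalIterator live in Lucene's sandbox facet module and declare
// IOException; checked exceptions are omitted here for brevity.
public class RemapLoopSketch {

  /** Yields ordinals until it returns NO_MORE_ORDS. */
  public interface OrdinalIterator {
    int NO_MORE_ORDS = -1;

    int nextOrd();
  }

  /** Only the two new hooks; remapOrd is abstract because the PR elides its default. */
  public interface Cutter {
    default boolean needsRemapping() {
      return false;
    }

    OrdinalIterator remapOrd(int mergedOrd);
  }

  /** Iterator that yields one ordinal, then NO_MORE_ORDS. */
  public static OrdinalIterator single(int ord) {
    return new OrdinalIterator() {
      private boolean done;

      @Override
      public int nextOrd() {
        if (done) {
          return NO_MORE_ORDS;
        }
        done = true;
        return ord;
      }
    };
  }

  /** Example cutter: every two raw ordinals fold into one final ordinal. */
  public static final Cutter HALVING =
      new Cutter() {
        @Override
        public boolean needsRemapping() {
          return true;
        }

        @Override
        public OrdinalIterator remapOrd(int mergedOrd) {
          return single(mergedOrd / 2);
        }
      };

  /** Reduce step of a count recorder; a zero count stands for "not recorded". */
  public static int[] reduce(int[] rawCounts, Cutter cutter, int numFinalOrds) {
    if (!cutter.needsRemapping()) {
      return rawCounts; // recorded ordinals are already final: skip the loop
    }
    int[] finalCounts = new int[numFinalOrds];
    for (int ord = 0; ord < rawCounts.length; ord++) {
      if (rawCounts[ord] == 0) {
        continue; // only recorded ordinals are remapped
      }
      OrdinalIterator it = cutter.remapOrd(ord);
      for (int t = it.nextOrd(); t != OrdinalIterator.NO_MORE_ORDS; t = it.nextOrd()) {
        finalCounts[t] += rawCounts[ord];
      }
    }
    return finalCounts;
  }

  public static void main(String[] args) {
    int[] result = reduce(new int[] {3, 1, 0, 5}, HALVING, 2);
    System.out.println(Arrays.toString(result)); // [4, 5]
  }
}
```

Returning `rawCounts` unchanged when `needsRemapping()` is false mirrors the intent that cutters whose recorded ordinals are already final pay nothing extra at reduce time.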

TaxonomyFacetsCutter: switches from a children/siblings walk to a parents-array walk. remapOrd(ord) walks from ord up to the dim root, emitting every ancestor so that counts accumulate at each level.
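The parents-array walk can be sketched as follows. This assumes the usual taxonomy convention that ordinal 0 is the root and dims are its direct children, that every dim in the example is single-valued (so rollup is needed), and that the recorded ordinal itself is emitted along with its ancestors. `ParentsWalkSketch`, `ancestorsToDim()` and `rollup()` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParentsWalkSketch {
  // Assumed convention: ordinal 0 is the taxonomy root, dims are its direct children.
  static final int ROOT_ORD = 0;

  /** Ordinals that receive the count recorded for ord: ord itself, then ancestors up to the dim. */
  public static List<Integer> ancestorsToDim(int[] parents, int ord) {
    List<Integer> out = new ArrayList<>();
    for (int cur = ord; cur != ROOT_ORD; cur = parents[cur]) {
      out.add(cur);
    }
    return out;
  }

  /** Builds rolled-up counts from recorded counts (zero = not recorded). */
  public static int[] rollup(int[] parents, int[] recorded) {
    int[] rolledUp = new int[parents.length];
    for (int ord = 0; ord < recorded.length; ord++) {
      if (recorded[ord] == 0) {
        continue; // unhit nodes are never used as starting points
      }
      for (int target : ancestorsToDim(parents, ord)) {
        rolledUp[target] += recorded[ord];
      }
    }
    return rolledUp;
  }

  public static void main(String[] args) {
    // 0=root, 1=Author (dim), 2=Author/Bob, 3=Author/Bob/Books, 4=Author/Lisa
    int[] parents = {-1, 0, 1, 2, 1};
    int[] recorded = {0, 0, 0, 2, 1}; // 2 hits on Author/Bob/Books, 1 on Author/Lisa
    System.out.println(Arrays.toString(rollup(parents, recorded))); // [0, 3, 2, 2, 1]
  }
}
```

Only the two hit leaves start a walk, and each walk follows a single parent chain, so siblings with no hits are never touched.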

NonOverlappingLongRangeFacetCutter: leaf cutters now yield raw elementary-interval ordinals. remapOrd() applies the pos[] lookup at reduce time.
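A sketch of the split between collection and reduce for two non-overlapping user ranges, [0, 10] and [20, 30]. The hard-coded interval boundaries, the `-1` gap marker in `POS`, and all names are assumptions for illustration; the real cutter derives its elementary intervals from the user's ranges.

```java
import java.util.Arrays;

public class RangeRemapSketch {
  // Elementary intervals for user ranges [0, 10] and [20, 30], by inclusive upper bound:
  // (-inf, -1], [0, 10], [11, 19], [20, 30], [31, +inf)
  static final long[] UPPER_BOUNDS = {-1L, 10L, 19L, 30L, Long.MAX_VALUE};
  // pos[i]: index of the user range covering interval i, or -1 for a gap.
  static final int[] POS = {-1, 0, -1, 1, -1};

  /** Collection time: value to raw elementary-interval ordinal; no pos[] lookup. */
  public static int rawOrd(long value) {
    int idx = Arrays.binarySearch(UPPER_BOUNDS, value);
    return idx >= 0 ? idx : -idx - 1; // first interval whose upper bound is >= value
  }

  /** Reduce time: raw interval ordinal to user range position, or -1 for a gap. */
  public static int remap(int intervalOrd) {
    return POS[intervalOrd];
  }

  public static int[] countRanges(long[] docValues) {
    int[] rawCounts = new int[UPPER_BOUNDS.length];
    for (long v : docValues) {
      rawCounts[rawOrd(v)]++; // per-document work: one binary search
    }
    int[] rangeCounts = new int[2];
    for (int ord = 0; ord < rawCounts.length; ord++) {
      if (rawCounts[ord] == 0) {
        continue;
      }
      int target = remap(ord); // once per interval with hits, not per document
      if (target >= 0) { // gap filtering
        rangeCounts[target] += rawCounts[ord];
      }
    }
    return rangeCounts;
  }

  public static void main(String[] args) {
    long[] docValues = {5, 7, 15, 25, 25, 40};
    System.out.println(Arrays.toString(countRanges(docValues))); // [2, 2]
  }
}
```

The values 15 and 40 land in gap intervals during collection and are dropped only at reduce time, when `remap()` returns -1.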

Performance

  • Taxonomy: cost is now O(recorded ordinals × hierarchy depth) instead of O(full taxonomy subtree), so sparse result sets are expected to benefit most.
  • Ranges: pos[] lookup cost drops from O(matching documents) to O(distinct elementary intervals with hits).
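The taxonomy complexity claim can be illustrated with a simple visit-count model (not a benchmark): one dim with a complete tree of fixed fanout and depth below it, where the old recursive walk touches every node and the new one walks from each recorded leaf up to the dim. The class and method names are hypothetical.

```java
// Hypothetical cost model, not a benchmark: counts node visits for the two rollup
// strategies on a synthetic taxonomy (one dim with a complete tree below it).
public class RollupCostSketch {
  /** Old approach: the recursive walk visits every node from the dim root down. */
  public static long fullSubtreeVisits(int fanout, int depth) {
    long total = 0;
    long levelSize = 1;
    for (int level = 0; level <= depth; level++) { // level 0 is the dim itself
      total += levelSize;
      levelSize *= fanout;
    }
    return total;
  }

  /** New approach: each recorded leaf visits depth + 1 nodes (leaf up to the dim). */
  public static long parentWalkVisits(long recordedLeaves, int depth) {
    return recordedLeaves * (depth + 1);
  }

  public static void main(String[] args) {
    int fanout = 10;
    int depth = 4;
    System.out.println("full subtree: " + fullSubtreeVisits(fanout, depth)); // 11111
    System.out.println("parent walk, 50 hits: " + parentWalkVisits(50, depth)); // 250
  }
}
```

The model also shows the flip side: if all 10,000 leaves of this tree had hits, the parent walk would make 50,000 visits against 11,111 for the full walk, because shared ancestors are revisited. That is consistent with scoping the expected gain to sparse result sets.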

Removed from FacetCutter

  • getOrdinalsToRollup()
  • getChildrenOrds(int)

New helpers

  • OrdinalIterator.fromSingleOrd(int): one-shot iterator over a single ordinal, used by remapOrd implementations that map 1-to-1.
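A minimal version of the one-shot iterator, assuming `OrdinalIterator` signals exhaustion with `NO_MORE_ORDS = -1` and omitting the `IOException` the real interface may declare. `SingleOrdSketch` is a hypothetical wrapper class.

```java
// Hypothetical mirror of the helper; the real OrdinalIterator may differ in detail.
public class SingleOrdSketch {
  public interface OrdinalIterator {
    int NO_MORE_ORDS = -1;

    int nextOrd();

    /** One-shot iterator that yields ord once, then NO_MORE_ORDS forever. */
    static OrdinalIterator fromSingleOrd(int ord) {
      return new OrdinalIterator() {
        private boolean consumed;

        @Override
        public int nextOrd() {
          if (consumed) {
            return NO_MORE_ORDS;
          }
          consumed = true;
          return ord;
        }
      };
    }
  }

  public static void main(String[] args) {
    OrdinalIterator it = OrdinalIterator.fromSingleOrd(42);
    System.out.println(it.nextOrd()); // 42
    System.out.println(it.nextOrd()); // -1
  }
}
```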

Benchmarks

TBD

Adds needsRemapping() and remapOrd() default methods to FacetCutter,
allowing cutters that record raw ordinals during collection to remap
them to final ordinals at reduce time.

Changes:
- Add FacetCutter#needsRemapping() (default false) and remapOrd()
- CountFacetRecorder and LongAggregationsFacetRecorder: replace the
  recursive rollup tree-walk with a flat remapOrd loop over recorded
  ordinals, guarded by needsRemapping()
- TaxonomyFacetsCutter: needsRemapping() returns false when disableRollup
  is set or when no single-valued dims exist, skipping the remap loop
  and the parent walk entirely
- NonOverlappingLongRangeFacetCutter: needsRemapping() returns true;
  leaf cutters yield raw elementary-interval ordinals; remapOrd applies
  the pos[] lookup (gap filtering + range position mapping) at reduce time
- DoubleRangeFacetCutter converted to a static factory returning the
  inner LongRangeFacetCutter directly, so needsRemapping() on the
  returned cutter is false for overlapping ranges
- Remove getOrdinalsToRollup() and getChildrenOrds() from FacetCutter
- Add OrdinalIterator.fromSingleOrd(int) factory
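The `DoubleRangeFacetCutter` change can be illustrated with a hypothetical static factory: double bounds are mapped to sortable longs (the same bit trick as Lucene's `NumericUtils.doubleToSortableLong`) and the long-range cutter is returned directly, so callers see its own `needsRemapping()` rather than a wrapper's. The adjacent-overlap check used to pick a cutter kind, and every class and method name here, are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

public class RangeFactorySketch {
  public interface Cutter {
    boolean needsRemapping();
  }

  static final class LongRange {
    final long min; // inclusive
    final long max; // inclusive

    LongRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  static final class NonOverlappingCutter implements Cutter {
    @Override
    public boolean needsRemapping() {
      return true; // pos[] lookup deferred to reduce time
    }
  }

  static final class OverlappingCutter implements Cutter {
    @Override
    public boolean needsRemapping() {
      return false;
    }
  }

  /** Order-preserving double-to-long mapping (sign-aware bit flip). */
  public static long sortableLong(double value) {
    long bits = Double.doubleToLongBits(value);
    return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
  }

  static Cutter createLongCutter(LongRange[] ranges) {
    LongRange[] sorted = ranges.clone();
    Arrays.sort(sorted, Comparator.comparingLong(r -> r.min));
    for (int i = 1; i < sorted.length; i++) {
      // After sorting by min, any overlap shows up between adjacent ranges.
      if (sorted[i].min <= sorted[i - 1].max) {
        return new OverlappingCutter();
      }
    }
    return new NonOverlappingCutter();
  }

  /** Static factory: convert bounds, then hand back the long cutter itself. */
  public static Cutter createDoubleCutter(double[][] ranges) {
    LongRange[] converted = new LongRange[ranges.length];
    for (int i = 0; i < ranges.length; i++) {
      converted[i] = new LongRange(sortableLong(ranges[i][0]), sortableLong(ranges[i][1]));
    }
    return createLongCutter(converted);
  }

  public static void main(String[] args) {
    Cutter disjoint = createDoubleCutter(new double[][] {{-1.5, 0.0}, {0.5, 2.0}});
    Cutter overlapping = createDoubleCutter(new double[][] {{0.0, 2.0}, {1.0, 3.0}});
    System.out.println(disjoint.needsRemapping()); // true
    System.out.println(overlapping.needsRemapping()); // false
  }
}
```

One plausible benefit of returning the inner cutter is that no delegating wrapper has to forward both `needsRemapping()` and `remapOrd()` correctly.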