FacetCutter: unify rollup and range remapping#16122
Draft
epotyom wants to merge 1 commit into
Draft
Conversation
Adds needsRemapping() and remapOrd() default methods to FacetCutter, allowing cutters that record raw ordinals during collection to remap them to final ordinals at reduce time. Changes: - Add FacetCutter#needsRemapping() (default false) and remapOrd() - CountFacetRecorder and LongAggregationsFacetRecorder: replace the recursive rollup tree-walk with a flat remapOrd loop over recorded ordinals, guarded by needsRemapping() - TaxonomyFacetsCutter: needsRemapping() returns false when disableRollup is set or when no single-valued dims exist, skipping the remap loop and the parent walk entirely - NonOverlappingLongRangeFacetCutter: needsRemapping() returns true; leaf cutters yield raw elementary-interval ordinals; remapOrd applies the pos[] lookup (gap filtering + range position mapping) at reduce time - DoubleRangeFacetCutter converted to a static factory returning the inner LongRangeFacetCutter directly, so needsRemapping() on the returned cutter is false for overlapping ranges - Remove getOrdinalsToRollup() and getChildrenOrds() from FacetCutter - Add OrdinalIterator.fromSingleOrd(int) factory
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two structurally identical problems were solved by two separate mechanisms:
getOrdinalsToRollup()+getChildrenOrds(int)drove a recursive tree-walk inCountFacetRecorderandLongAggregationsFacetRecorder, descending from each dim root through the full taxonomy subtree regardless of which nodes had hits.pos[]lookup (elementary interval → user range position) was baked into eachNonOverlappingLongRangeFacetCutterleaf cutter'snextOrd(), running once per matching document during collection.What this PR does
Adds two default methods to
FacetCutter:When
needsRemapping()is true, recorders iterate over recorded ordinals and callremapOrd()to obtain final ordinal(s).TaxonomyFacetsCutter: switches fromchildren/siblingsto aparentsarray walk.remapOrd(ord)walks fromordup to the dim root, emitting every ancestor so counts accumulate at each level.NonOverlappingLongRangeFacetCutter: leaf cutters now yield raw elementary-interval ordinals.remapOrd()applies thepos[]lookup at reduce time.Performance
Removed from
FacetCuttergetOrdinalsToRollup()getChildrenOrds(int)New helpers
OrdinalIterator.fromSingleOrd(int)-- one-shot iterator over a single ordinal; used byremapOrdimplementations that map 1-to-1.Benchmarks
TBD