Skip to content

Use an HllSketchHolder object to enable optimized merge#13737

Merged
cheddar merged 6 commits intoapache:masterfrom
imply-cheddar:hllsketchholder-for-profit
Feb 7, 2023
Merged

Use an HllSketchHolder object to enable optimized merge#13737
cheddar merged 6 commits intoapache:masterfrom
imply-cheddar:hllsketchholder-for-profit

Conversation

@imply-cheddar
Copy link
Contributor

HllSketchAggregatorFactory.combine had been implemented using a pure pair-wise, "make a union -> add 2 things to union -> get sketch" algorithm. This algorithm does 2 things that was CPU

  1. The Union object always builds an HLL_8 sketch regardless of the
    target type. This means that when the target type is not HLL_8, we
    spent CPU cycles converting to HLL_8 and back over and over again
  2. By throwing away the Union object and converting back to the
    HllSketch only to build another Union object, we do lots and lots
    of copy+conversions of the HllSketch

This change introduces an HllSketchHolder object which can hold onto a Union object and delay conversion back into an HllSketch until it is actually needed. This follows the same pattern as the SketchHolder object for theta sketches.

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

{
public static HllSketchHolder fromObj(Object obj)
{
if (obj instanceof HllSketchHolder) {

Check notice

Code scanning / CodeQL

Chain of 'instanceof' tests

This if block performs a chain of 6 type tests - consider alternatives, e.g. polymorphism or the visitor pattern.
@imply-cheddar imply-cheddar force-pushed the hllsketchholder-for-profit branch from d86347f to c7b35e9 Compare February 3, 2023 07:05
HllSketchAggregatorFactory.combine had been implemented using a
pure pair-wise, "make a union -> add 2 things to union -> get sketch"
algorithm.  This algorithm does 2 things that was CPU

1) The Union object always builds an HLL_8 sketch regardless of the
  target type.  This means that when the target type is not HLL_8, we
  spent CPU cycles converting to HLL_8 and back over and over again
2) By throwing away the Union object and converting back to the
  HllSketch only to build another Union object, we do lots and lots
  of copy+conversions of the HllSketch

This change introduces an HllSketchHolder object which can hold onto
a Union object and delay conversion back into an HllSketch until
it is actually needed.  This follows the same pattern as the
SketchHolder object for theta sketches.
@imply-cheddar imply-cheddar force-pushed the hllsketchholder-for-profit branch from ed1f5d3 to 7bc8903 Compare February 6, 2023 23:25
@cheddar cheddar merged commit f684df4 into apache:master Feb 7, 2023
@imply-cheddar imply-cheddar deleted the hllsketchholder-for-profit branch February 7, 2023 21:57
@clintropolis clintropolis added this to the 26.0 milestone Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants