Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitwise aggregators, better null handling options for expression agg #11280

Merged
merged 8 commits into from
Jun 25, 2021

Conversation

clintropolis
Copy link
Member

Description

Builds on top of #11104 and #10605 to add bitwise aggregator functions:

Function Notes Default
BIT_AND(expr) Performs a bitwise AND operation on all input values. null if druid.generic.useDefaultValueForNull=false, otherwise 0
BIT_OR(expr) Performs a bitwise OR operation on all input values. null if druid.generic.useDefaultValueForNull=false, otherwise 0
BIT_XOR(expr) Performs a bitwise XOR operation on all input values. null if druid.generic.useDefaultValueForNull=false, otherwise 0

In the process of adding this, I've also modified ExpressionLambdaAggregatorFactory to have an additional JSON property,
initiallyNull, which determines if the aggregator will produce a null value or initialValue/initialCombineValue. For example, an SQL compatible count aggregator would have initiallyNull set to false and have initialValue set to 0, so that it would always return 0 even if no rows were aggregated, while a sum would have it set to true so that it would return null in the same case. For the buffer aggregator, this is tracked by setting a bit in the expression type byte which prefixes all of the serialized expressions, which is then cleared whenever the aggregate function is called. This change simplifies ARRAY_AGG since it was previously using a finalize expression to coerce empty results back to null, but now it can just naturally be initialized to null.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@@ -121,6 +124,7 @@ public ExpressionLambdaAggregatorFactory(

this.initialValueExpressionString = initialValue;
this.initialCombineValueExpressionString = initialCombineValue == null ? initialValue : initialCombineValue;
this.initiallyNull = initiallyNull == null ? NullHandling.sqlCompatible() : initiallyNull;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be named as useNullInitially or isInitiallyNull to make it more clear in code. As above, I would also be fine with some other better name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to isNullUnlessAggregated to more clearly indicate that it is a boolean and hopefully indicate its main role in determining aggregator behavior. initiallyNull seemed confusing alongside initialValue.

@@ -48,7 +49,7 @@
public static ExprEval deserialize(ByteBuffer buffer, int position)
{
// | expression type (byte) | expression bytes |
ExprType type = ExprType.fromByte(buffer.get(position));
ExprType type = ExprType.fromByte((byte) (buffer.get(position) & TYPE_MASK));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to add comment suggesting that only BufferLambdaExpressionAggregator calls this hence we are clearing the sign bit due to implementation in aggregator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked this to do the masking in the buffer aggregator instead of here

Copy link
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@@ -195,6 +199,12 @@ public String getInitialCombineValueExpressionString()
return initialCombineValueExpressionString;
}

@JsonProperty("initiallyNull")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also be changed to isNullUnlessAggregated


import javax.annotation.Nullable;
import java.nio.ByteBuffer;

public class ExpressionLambdaBufferAggregator implements BufferAggregator
{
private static final short NOT_AGGREGATED_BIT = 1 << 7;
private static final short IS_AGGREGATED_MASK = 0x3F;
private static final byte TYPE_MASK = 0x0F;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to drop either TYPE_MASK or IS_AGGREGATED_MASK and use a common mask whose value is 0x0F ?

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@clintropolis
Copy link
Member Author

thanks for review @jihoonson and @rohangarg 🤘

@clintropolis clintropolis merged commit df9b57a into apache:master Jun 25, 2021
@clintropolis clintropolis deleted the bitwise-aggs branch June 25, 2021 23:51
jihoonson pushed a commit to jihoonson/druid that referenced this pull request Jul 12, 2021
…pache#11280)

* bitwise aggregators, better nulls for expression agg

* correct behavior

* rework deserialize, better names

* fix json, share mask
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants