-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STRING_AGG SQL aggregator function #11241
Conversation
OperandTypes.sequence( | ||
StringUtils.format("'%s'(expr, separator)", NAME), | ||
OperandTypes.ANY, | ||
OperandTypes.LITERAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this also take care of the cases if the literal is wrapped in a scalar function (like concat('a', 'b')
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain if this can (or should) handle an expression for the separator
parameter, I'll do some investigation and potentially update this signature accordingly and add a test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I'm going to hold off on updating this PR until #11280 goes in, since it has a slight refactor/improvement of the underlying expression aggregator that will require some minor changes in this PR.
) | ||
); | ||
} else { | ||
return Aggregation.create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the if-else can be collapsed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it could if i pull out the two expressions into a separate conditional block; I did not make this change yet because it seems like minor shifting of the condition but will modify here (as well as ARRAY_AGG since I did the same thing there) if you feel more strongly about it.
} | ||
maxSizeBytes = ((Number) RexLiteral.value(maxBytes)).intValue(); | ||
} | ||
final DruidExpression arg = arguments.get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be aggregationColumnArg
@@ -65,7 +65,7 @@ | |||
// minimum permitted agg size is 10 bytes so it is at least large enough to hold primitive numerics (long, double) | |||
// | expression type byte | is_null byte | primitive value (8 bytes) | | |||
private static final int MIN_SIZE_BYTES = 10; | |||
private static final HumanReadableBytes DEFAULT_MAX_SIZE_BYTES = new HumanReadableBytes(1L << 10); | |||
public static final HumanReadableBytes DEFAULT_MAX_SIZE_BYTES = new HumanReadableBytes(1L << 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like an extra change - can't find its usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wasn't meant to be, its used in tests now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @clintropolis 👍
thanks for review @rohangarg and @jihoonson 👍 |
Description
This PR adds
STRING_AGG
, which is sort of likeARRAY_AGG
added in #11157 but it ignores null values, with anARRAY_TO_STRING
function on the output array to turn it into a string value.example:
output:
STRING_AGG(expr, separator, [size])
expr
into a single STRING, ignoring null values. Each value is joined by theseparator
which must be a literal STRING. An optionalsize
in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, the query will fail. Use ofORDER BY
within theSTRING_AGG
expression is not currently supported, and the ordering of results within the output string may vary depending on processing order.null
ifdruid.generic.useDefaultValueForNull=false
, otherwise''
STRING_AGG(DISTINCT expr, separator, [size])
expr
into a single STRING, ignoring null values. Each value is joined by theseparator
which must be a literal STRING. An optionalsize
in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, the query will fail. Use ofORDER BY
within theSTRING_AGG
expression is not currently supported, and the ordering of results within the output string may vary depending on processing order.null
ifdruid.generic.useDefaultValueForNull=false
, otherwise''
Like
ARRAY_AGG
this is an expression aggregator, so is not well optimized, but will fill in the gaps for now.This PR has: