[Feature] Support Coalesce for Column Names #9327
Codecov Report
@@ Coverage Diff @@
## master #9327 +/- ##
============================================
+ Coverage 63.40% 66.97% +3.56%
- Complexity 4762 4894 +132
============================================
Files 1832 1403 -429
Lines 98146 73108 -25038
Branches 15020 11722 -3298
============================================
- Hits 62231 48964 -13267
+ Misses 31321 20582 -10739
+ Partials 4594 3562 -1032
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
}

@Override
public String[] transformToStringValuesSV(ProjectionBlock projectionBlock) {
Implicitly treating everything as STRING will give confusing semantics, imo. If we don't want to support this function on a few data types, then throwing an exception is better.
Reference - https://docs.snowflake.com/en/sql-reference/functions/coalesce.html
+1. For the first version, we can check whether all the inputs are numbers or strings.
I added a check for the argument types, but the return value has to be a string. Otherwise, how would we represent null?
Good point. Currently we don't support null values in transform functions yet. You may read TransformBlockValSet.getNullBitmap(), which counts the result as null when any argument is null. This won't work for coalesce.
For now, we may use NullValueUtils.getDefaultNullValue() for each data type to represent the null. Then we should figure out a way to support the real null in transform functions.
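To make the suggestion above concrete, here is a minimal sketch of a coalesce over INT columns that falls back to a placeholder when every argument is null. It is an illustration under assumptions, not the PR's code: plain arrays and `java.util.BitSet` stand in for the projection block and the per-column null bitmaps, and `CoalesceSentinelSketch`, `coalesce`, and `NULL_PLACEHOLDER` are hypothetical names.

```java
import java.util.BitSet;

// Illustrative sketch only: arrays and BitSet stand in for Pinot's projection
// block and null bitmaps. None of the names below are Pinot APIs.
final class CoalesceSentinelSketch {
  // Assumed placeholder returned when no argument has a non-null value.
  static final int NULL_PLACEHOLDER = Integer.MIN_VALUE;

  /**
   * For each row, returns the value of the first column whose null bit is not
   * set, or NULL_PLACEHOLDER when every column is null for that row.
   */
  static int[] coalesce(int[][] columns, BitSet[] nullBits, int numDocs) {
    int[] result = new int[numDocs];
    for (int doc = 0; doc < numDocs; doc++) {
      result[doc] = NULL_PLACEHOLDER;
      for (int col = 0; col < columns.length; col++) {
        // A set bit marks the row as null in this column; skip it.
        if (!nullBits[col].get(doc)) {
          result[doc] = columns[col][doc];
          break;
        }
      }
    }
    return result;
  }
}
```

Note that each column's null bits are checked individually here, rather than treating a row as null as soon as any argument is null.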
(edited) Representing null with the default null value is probably not a good idea, since all numeric data types use zero as the default null.
@walterddr Not really. We use min value as the default for numeric types except for big decimal, which doesn't have a min value
Updated to use NullValueUtils.getDefaultNullValue(), and it now only supports numeric and string types. It also requires all arguments to have the same type.
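The two constraints described here (numeric or string only, and one common type across arguments) could be checked roughly as follows. This is a sketch under assumptions: the `StoredType` enum is a hypothetical stand-in for the real data type enum, `BYTES` is included only as an example of an unsupported type, and `resolveType` is an invented helper name.

```java
import java.util.List;

// Illustrative sketch only; the enum and method are hypothetical stand-ins.
final class CoalesceTypeCheckSketch {
  enum StoredType {
    INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, BYTES;

    // In this sketch, everything except BYTES counts as numeric or string.
    boolean isSupported() {
      return this != BYTES;
    }
  }

  /** Returns the common argument type, or throws if the arguments are invalid. */
  static StoredType resolveType(List<StoredType> argumentTypes) {
    if (argumentTypes.isEmpty()) {
      throw new IllegalArgumentException("COALESCE needs at least one argument");
    }
    StoredType first = argumentTypes.get(0);
    for (StoredType type : argumentTypes) {
      if (!type.isSupported()) {
        throw new IllegalArgumentException("Unsupported argument type: " + type);
      }
      if (type != first) {
        throw new IllegalArgumentException("Argument types differ: " + first + " vs " + type);
      }
    }
    return first;
  }
}
```

Throwing on the first violation keeps the failure close to the offending argument instead of silently converting types.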
adbc410 to 21d21ae
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
Preconditions.checkArgument(func instanceof IdentifierTransformFunction,
"Only column names are supported in COALESCE.");
FieldSpec.DataType dataType = func.getResultMetadata().getDataType().getStoredType();
Preconditions.checkArgument(dataType.isNumeric() || dataType == FieldSpec.DataType.STRING,
I don't think we need this check. We may use getResultMetadata() to rule out the unsupported types.
Removed the second check, but don't we need the argument to be an identifier to get the null bitmap?
LGTM with minor comments
TransformFunction func = arguments.get(i);
Preconditions.checkArgument(func instanceof IdentifierTransformFunction,
"Only column names are supported in COALESCE.");
FieldSpec.DataType dataType = func.getResultMetadata().getDataType().getStoredType();
(minor) Let's change the name to storedType to be more explicit.
@Override
public TransformResultMetadata getResultMetadata() {
switch (_dataType) {
case STRING:
(minor) We usually follow the sequence of INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING (same order as the enum) for easier tracking.
(Also, using IntelliJ autocomplete to generate the branches will automatically put them in that order :-) )
}

@Override
public TransformResultMetadata getResultMetadata() {
(minor) Let's calculate the result metadata in init() and store it in a member variable. It has 2 benefits:
- Fail fast when the type cannot be supported
- Prevent calculating result metadata multiple times
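A rough sketch of this suggestion, assuming a simplified shape for the class: the metadata is built once in `init()`, which throws for unsupported types, and the getter only returns the cached field. `CachedMetadataSketch`, `StoredType`, and `ResultMetadata` are hypothetical stand-ins, not the actual Pinot types.

```java
// Illustrative sketch only; all types here are hypothetical stand-ins.
final class CachedMetadataSketch {
  enum StoredType { INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, BYTES }

  // Minimal stand-in for a result-metadata object.
  static final class ResultMetadata {
    final StoredType type;

    ResultMetadata(StoredType type) {
      this.type = type;
    }
  }

  // Computed once in init() and reused by every getResultMetadata() call.
  private ResultMetadata resultMetadata;

  void init(StoredType storedType) {
    switch (storedType) {
      case INT:
      case LONG:
      case FLOAT:
      case DOUBLE:
      case BIG_DECIMAL:
      case STRING:
        resultMetadata = new ResultMetadata(storedType);
        break;
      default:
        // Fail fast: an unsupported type is rejected before any block is processed.
        throw new UnsupportedOperationException("Unsupported type for COALESCE: " + storedType);
    }
  }

  ResultMetadata getResultMetadata() {
    return resultMetadata;
  }
}
```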
👍
}

@Override
public String[] transformToStringValuesSV(ProjectionBlock projectionBlock) {
(minor) Follow the same sequence (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING) as the interface for easier tracking.
* Get transform float results based on store type.
* @param projectionBlock
*/
private float[] getFloatTransformResults(ProjectionBlock projectionBlock) {
(minor) Put this method before getDoubleTransformResults() for easier tracking.
Coalesce takes a list of arguments and returns the first non-null value. If all arguments are null, it returns null.
This implementation transforms all arguments into strings and returns the first non-null string value. Null is represented as the string value "null".
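As a minimal illustration of the semantics described here, the sketch below returns the first non-null string and falls back to the literal "null" when every argument is null. It operates on a single row of plain Java strings; `CoalesceStringSketch` and `coalesce` are hypothetical names, and the real function works on projection blocks rather than varargs.

```java
// Illustrative sketch only; not the PR's implementation.
final class CoalesceStringSketch {
  /** Returns the first non-null value, or the string "null" if there is none. */
  static String coalesce(String... values) {
    for (String value : values) {
      if (value != null) {
        return value;
      }
    }
    // The description represents SQL NULL as the string value "null".
    return "null";
  }
}
```

One consequence of this representation is that a column containing the actual text "null" cannot be told apart from a missing value, which is part of why the review thread questions the string-only approach.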