Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support Coalesce for Column Names #9327

Merged
merged 7 commits into from Sep 14, 2022
Merged

Conversation

61yao
Copy link
Contributor

@61yao 61yao commented Sep 2, 2022

Coalesce takes a list of arguments and returns the first not null value. If all arguments are null, return a null.

This implementations transform all arguments into string and return the first non-null string value. Null is represented as string value "null"

@codecov-commenter
Copy link

codecov-commenter commented Sep 2, 2022

Codecov Report

Merging #9327 (43ac34c) into master (ff2a333) will increase coverage by 3.56%.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##             master    #9327      +/-   ##
============================================
+ Coverage     63.40%   66.97%   +3.56%     
- Complexity     4762     4894     +132     
============================================
  Files          1832     1403     -429     
  Lines         98146    73108   -25038     
  Branches      15020    11722    -3298     
============================================
- Hits          62231    48964   -13267     
+ Misses        31321    20582   -10739     
+ Partials       4594     3562    -1032     
Flag Coverage Δ
unittests1 66.97% <87.50%> (+0.05%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../transform/function/CoalesceTransformFunction.java 87.35% <87.35%> (ø)
...e/pinot/common/function/TransformFunctionType.java 100.00% <100.00%> (ø)
...r/transform/function/TransformFunctionFactory.java 86.89% <100.00%> (+0.09%) ⬆️
.../org/apache/pinot/query/routing/WorkerManager.java 83.87% <0.00%> (-6.87%) ⬇️
...he/pinot/segment/local/segment/store/IndexKey.java 65.00% <0.00%> (-5.00%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 88.10% <0.00%> (ø)
...ller/recommender/data/generator/TimeGenerator.java
...ontroller/api/resources/PinotControllerLogger.java
...inion/event/DefaultMinionEventObserverFactory.java
...ker/routing/instanceselector/InstanceSelector.java
... and 430 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

}

@Override
public String[] transformToStringValuesSV(ProjectionBlock projectionBlock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think implicitly treating everything as STRING will give confusing semantics imo. If we don't want to support this function on few data types, then throwing exception is better.

Reference - https://docs.snowflake.com/en/sql-reference/functions/coalesce.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. For the first version, we can check if all the input are numbers or strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for argument type. but the return value has to be a string, otherwise we are not able to represent null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Currently we don't support null values for transform function yet. You may read TransformBlockValSet.getNullBitmap(), which count the result as null when any argument is null. This won't work for coalesce.

For now, we may use NullValueUtils.getDefaultNullValue() for each data type to present the null. Then we should figure out a way to support the real null in transform function.

Copy link
Contributor

@walterddr walterddr Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited) represented using default nullvalue probably is not a good idea since all numeric data type uses zero as default null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@walterddr Not really. We use min value as the default for numeric types except for big decimal, which doesn't have a min value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use NullValueUtils.getDefaultNullValue() and only supports numeric and string types. Also it requires the type to be same for all arguments.

@61yao 61yao force-pushed the coalesce branch 3 times, most recently from adbc410 to 21d21ae Compare September 10, 2022 01:19
Preconditions.checkArgument(func instanceof IdentifierTransformFunction,
"Only column names are supported in COALESCE.");
FieldSpec.DataType dataType = func.getResultMetadata().getDataType().getStoredType();
Preconditions.checkArgument(dataType.isNumeric() || dataType == FieldSpec.DataType.STRING,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this check. We may use getResultMetadata() to rule out the unsupported types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the second check but we need the type to be identifier to get the null bit map?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

TransformFunction func = arguments.get(i);
Preconditions.checkArgument(func instanceof IdentifierTransformFunction,
"Only column names are supported in COALESCE.");
FieldSpec.DataType dataType = func.getResultMetadata().getDataType().getStoredType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Let's change the name to storedType to be more explicit

@Override
public TransformResultMetadata getResultMetadata() {
switch (_dataType) {
case STRING:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We usually follow the sequence of INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING (same order as the enum for easier tracking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also use intellij autocomplete to generate the branches will automatically be in that order :-) )

}

@Override
public TransformResultMetadata getResultMetadata() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Let's calculate the result metadata in the init() and store it in a member variable. It has 2 benefits:

  1. Fail fast when the type cannot be supported
  2. Prevent calculating result metadata multiple times

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

@Override
public String[] transformToStringValuesSV(ProjectionBlock projectionBlock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Follow the same sequence (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING) as the interface for easier tracking

* Get transform float results based on store type.
* @param projectionBlock
*/
private float[] getFloatTransformResults(ProjectionBlock projectionBlock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Put this method before getDoublelTransformResults() for easier tracking

@walterddr walterddr merged commit a949f95 into apache:master Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants