Implement ARRAYLENGTH UDF for multi-valued columns #5301

Merged: 4 commits, Apr 27, 2020

@bozhang2820 (Contributor) commented Apr 25, 2020

This is to support pushdown of Presto SQL's cardinality() function.

Sample queries:
SELECT COUNT(*) FROM table WHERE arrayLength(mvColumn) > 2
SELECT COUNT(*) FROM table GROUP BY arrayLength(mvColumn)
SELECT MAX(arrayLength(mvColumn)) FROM table
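
For readers unfamiliar with multi-valued columns, here is a minimal sketch of what the three sample queries compute. It is not Pinot code: the class `ArrayLengthSemanticsSketch`, its method names, and the `int[][]` stand-in for a multi-valued column are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class ArrayLengthSemanticsSketch {

    // Hypothetical stand-in for a multi-valued column: one int[] per row.
    // arrayLength maps each row to the number of values it holds.
    public static int[] arrayLength(int[][] mvColumn) {
        int[] lengths = new int[mvColumn.length];
        for (int i = 0; i < mvColumn.length; i++) {
            lengths[i] = mvColumn[i].length;
        }
        return lengths;
    }

    // Mirrors: SELECT COUNT(*) FROM table WHERE arrayLength(mvColumn) > threshold
    public static long countWhereLengthGreaterThan(int[][] mvColumn, int threshold) {
        long count = 0;
        for (int len : arrayLength(mvColumn)) {
            if (len > threshold) {
                count++;
            }
        }
        return count;
    }

    // Mirrors: SELECT COUNT(*) FROM table GROUP BY arrayLength(mvColumn)
    public static Map<Integer, Long> countGroupByLength(int[][] mvColumn) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int len : arrayLength(mvColumn)) {
            counts.merge(len, 1L, Long::sum);
        }
        return counts;
    }

    // Mirrors: SELECT MAX(arrayLength(mvColumn)) FROM table
    // In this sketch an empty input yields 0; real engine behavior may differ.
    public static int maxLength(int[][] mvColumn) {
        int max = 0;
        for (int len : arrayLength(mvColumn)) {
            max = Math.max(max, len);
        }
        return max;
    }

    public static void main(String[] args) {
        int[][] mvColumn = { {1, 2, 3}, {4}, {5, 6, 7, 8}, {9, 10} };
        System.out.println(countWhereLengthGreaterThan(mvColumn, 2)); // prints 2
        System.out.println(countGroupByLength(mvColumn)); // prints {1=1, 2=1, 3=1, 4=1}
        System.out.println(maxLength(mvColumn)); // prints 4
    }
}
```

The point of the sketch is that the function yields one integer per row, so it can be used anywhere a single-valued expression is accepted (filter, group-by key, or aggregation input).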

@kishoreg (Member):

Add description and sample queries please.

@kishoreg (Member) left a comment:

Please change the name of the transform function; pick either array_length or cardinality.

import org.apache.pinot.core.plan.DocIdSetPlanNode;


public class LengthTransformFunction extends BaseTransformFunction {
Member:

Add Javadoc with a sample query.

Contributor Author:

Done.



public class LengthTransformFunction extends BaseTransformFunction {
public static final String FUNCTION_NAME = "length";
Member:

Since this applies to multi-valued columns, it might be better to name this array_length or cardinality.
https://prestodb.io/docs/current/functions/array.html

Contributor Author:

Thanks for the suggestion. Changed this to arrayLength so people do not confuse it with the length function for strings.
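
To make the outcome of this thread concrete, here is a rough, self-contained sketch of a transform function that carries the agreed name and a Javadoc with a sample query. It does not use Pinot's `BaseTransformFunction` API: the `IntTransform` interface, the `transform` method, and the `int[][]` block representation are hypothetical simplifications, and only the name `arrayLength` comes from the discussion above.

```java
public class ArrayLengthTransformSketch {

    /** Hypothetical, simplified transform interface; NOT Pinot's actual API. */
    public interface IntTransform {
        String getName();

        // Takes a block of multi-valued entries and returns one int per entry.
        int[] transform(int[][] multiValuedBlock);
    }

    /**
     * Returns the number of values in each entry of a multi-valued column.
     *
     * <p>Sample query:
     * {@code SELECT COUNT(*) FROM table WHERE arrayLength(mvColumn) > 2}
     */
    public static final class ArrayLength implements IntTransform {
        // Named arrayLength (not length) so it is not mistaken for a
        // string-length function, as agreed in the review thread.
        public static final String FUNCTION_NAME = "arrayLength";

        @Override
        public String getName() {
            return FUNCTION_NAME;
        }

        @Override
        public int[] transform(int[][] multiValuedBlock) {
            int[] result = new int[multiValuedBlock.length];
            for (int i = 0; i < multiValuedBlock.length; i++) {
                result[i] = multiValuedBlock[i].length;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        IntTransform fn = new ArrayLength();
        int[] lengths = fn.transform(new int[][] { {1, 2}, {}, {3} });
        System.out.println(fn.getName() + " -> " + java.util.Arrays.toString(lengths));
        // prints: arrayLength -> [2, 0, 1]
    }
}
```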

@xiangfu0 (Contributor) left a comment:

LGTM

@bozhang2820 changed the title from "Implement LENGTH UDF for multi-valued columns" to "Implement ARRAYLENGTH UDF for multi-valued columns" on Apr 27, 2020
@xiangfu0 merged commit 1fe22b5 into apache:master on Apr 27, 2020
@bozhang2820 deleted the length-udf branch on April 27, 2020 09:16
3 participants