-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-31098][table] Add ARRAY_SIZE function. #21947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@snuyanzin could you help review it? |
Thank you for your contribution. |
DataTypes.ROW( | ||
DataTypes.BOOLEAN(), DataTypes.DATE())) | ||
.nullable()) | ||
// ARRAY<INT> with null elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add tests for cases with not null elements and removal of null
.
E.g.
select array_remove(array[1, 2], cast(null as int));
i would expect same array as output, or how do other systems behave?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have added ARRAY_REMOVE(f0, cast(NULL AS INT))
docs/data/sql_functions.yml
Outdated
- sql: ARRAY_SIZE(haystack) | ||
table: haystack.arraySize() | ||
description: Returns the size of an array. If the array itself is null, the function will return null. | ||
- sql: ARRAY_REMOVE(haystack) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't get ...
shouldn't ARRAY_REMOVE
have 2 args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have done
Expression.array_contains | ||
Expression.array_distinct | ||
Expression.array_size | ||
Expression.array_remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend keeping abc order (at least for array_*
expressions) here since it simplifies search/navigation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have done
sequence( | ||
Arrays.asList("haystack", "needle"), | ||
Arrays.asList( | ||
logical(LogicalTypeRoot.ARRAY), ARRAY_ELEMENT_ARG))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARRAY_ELEMENT_ARG
not sure about this since array could consists of non null elements and a query could try to remove null
e.g.
SELECT array_remove(array['abc', 'klm', 'xyz'], cast(null as varchar));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you test remove null from array with nullable types
I'm asking to test remove of null from array with NON nullable types
that's the difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have added test ARRAY_REMOVE(f2, cast(NULL AS VARCHAR))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't go into details.
However there is a simple test to check if it's working or not: build flink
start flink (standalone is enough)
run sql-client
submit a query
select array_remove(array[1, 2, 3], cast(null as int));
Expected result is array[1, 2, 3]
Actual result: it fails with exception
same query e.g. in postgres is working fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i test SELECT array_contains(array[1, 2, 3], cast(null as int)) it also fails with exception.
the reason is ARRAY_ELEMENT_ARG make the second arg must be same with array element type
Caused by: org.apache.flink.table.api.ValidationException: Invalid function call:
array_remove(ARRAY NOT NULL, INT).
so select array_remove(array[1, 2, 3, null], cast(null as int));
return array[1, 2, 3].
do we should also fix the array_contains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, looks like should be fixed as well, however could be done in a separate pr
DataTypes.BOOLEAN(), DataTypes.DATE())) | ||
.nullable()) | ||
// ARRAY<INT> with null elements | ||
.testResult( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add more complex cases like
removal from array of arrays or array of maps or array of rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have added array<array> and array<map<int ,string>> unit test.
aee8f4e
to
1342c6b
Compare
hi @snuyanzin please review it again, i fix all your review infos |
1342c6b
to
f7910e6
Compare
hi @snuyanzin have changed, i reorganized the commits and make it two commit |
- sql: ARRAY_SIZE(haystack) | ||
table: haystack.arraySize() | ||
description: Returns the size of an array. If the array itself is null, the function will return null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why ARRAY_SIZE
is required if there is already CARDINALITY
which does same thing.
There is a discussion in jira about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To align the other engines like spark,snowflake, so that users can migrate easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's very arguable since different engines support different syntax
BigQuery has ARRAY_LENGTH
Hive has SIZE
Vertica has ARRAY_LENGTH
Postgresql ARRAY_LENGTH
why to support only snowflake and spark?
Support of everything seems to be an overkill...
I guess somewhere it should be stopped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your details response, i will drop the array_size commit.
hi @snuyanzin please review it again |
hi @snuyanzin do you have time to review it again? |
Closing the issue since #21947 is closed |
What is the purpose of the change
This is an implementation of ARRAY_SIZE/ARRAY_REMOVE
Brief change log
ARRAY_SIZE for Table API and SQL
Verifying this change
This change added tests in CollectionFunctionsITCase
Does this pull request potentially affect one of the following parts:
Dependencies (does it add or upgrade a dependency): ( no)
The public API, i.e., is any changed class annotated with @public(Evolving): (yes )
The serializers: (no)
The runtime per-record code paths (performance sensitive): ( no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: ( no)
The S3 file system connector: ( no)
Documentation
Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (docs)