[FLINK-31118][table] Add ARRAY_UNION function. #21958
Hi @snuyanzin, do you have time to review this? The input argument types should be handled in a way similar to Calcite's ARRAY_CONCAT. Do you have a good idea for how to do that? Whatever we choose should be reusable for functions like array_except and array_intersect.
@@ -617,6 +617,9 @@ collection:
- sql: ARRAY_DISTINCT(haystack)
table: haystack.arrayDistinct()
description: Returns an array with unique elements. If the array itself is null, the function will return null. Keeps ordering of elements.
- sql: ARRAY_UNION(array1, array2)
table: haystack.arrayUnion(array)
description: Returns an array of the elements in the union of array1 and array2, without duplicates. If both of the array are null, the function will return null.
From the description it is not clear what happens if only one of the arrays is null.
Thanks for your contribution. To be honest, I didn't get the logic.
SELECT array_union(array[1], array[2]);
-- result array[1, 2]
-- this is OK

SELECT array_union(array[1], array[2, 3]);
-- result array[1, 2, 3]
-- this is OK

SELECT array_union(array[1], array[2, 3, null]);
-- result array[1, 2, 3, 0]
-- this is NOT OK

SELECT array_union(array[1], array[map['this is a key', 'this is a value']]);
-- result [1, 68]
-- this is NOT OK

SELECT array_union(array[1], array['this is a string']);
-- result [1, 16]
-- this is NOT OK

SELECT array_union(array[1], array[array[1, 2, 3]]);
-- result [1, 24]
-- this is NOT OK
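For comparison, here is a minimal plain-Java sketch of the semantics one might expect from the cases above: first-seen order is kept, duplicates are dropped, and a null element stays null instead of turning into 0. The class `ArrayUnionSketch` and the method `arrayUnion` are hypothetical names used only for illustration and are not part of this PR or of Flink. Returning null when either input is null is an assumption, since the documented behavior for a single null array is still open in this review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArrayUnionSketch {

    /**
     * Hypothetical helper: union of two lists without duplicates.
     * Assumption: a null input list yields a null result.
     */
    public static <T> List<T> arrayUnion(List<T> left, List<T> right) {
        if (left == null || right == null) {
            return null;
        }
        // LinkedHashSet keeps insertion order and accepts a null element.
        Set<T> seen = new LinkedHashSet<>(left);
        seen.addAll(right);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // The null element is preserved rather than converted to 0.
        System.out.println(arrayUnion(Arrays.asList(1), Arrays.asList(2, 3, null)));
    }
}
```

This only models the element-level behavior; it says nothing about how Flink's internal array data structures should be read or written.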
Returns an array of the elements in the union of array1 and array2, without duplicates.
If both of the array are null, the function will return null.
Same here.
@@ -1359,6 +1360,16 @@ public OutType arrayDistinct() {
return toApiSpecificExpression(unresolvedCall(ARRAY_DISTINCT, toExpr()));
}

/**
* Returns an array of the elements in the union of array1 and array2, without duplicates.
What about ordering?
The result is not OK for queries like SELECT array_union(array[1], array[array[1, 2, 3]]); and SELECT array_union(array[1], array['this is a string']);. As I said before, calls with these two array types should throw an exception. We should handle it the way ARRAY_CONCAT does in Calcite, but I have not found a good way to express that in Flink. So I am asking whether you have a good implementation in mind, because it is important for other array operations like
Unfortunately, right now I do not have anything ready that could simply be picked up. You could probably do some research in this direction.
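As a starting point for that research, the idea of rejecting mismatched element types can be sketched in plain Java. This is only an illustration of the check itself, not Flink's type inference and not Calcite's ARRAY_CONCAT logic; `ElementTypeCheckSketch` and `commonElementClass` are hypothetical names, and comparing runtime classes is an assumption standing in for a real comparison of logical types.

```java
import java.util.Arrays;
import java.util.List;

public class ElementTypeCheckSketch {

    /** Folds one list into the running element class, failing on the first mismatch. */
    private static Class<?> merge(Class<?> current, List<?> values) {
        for (Object value : values) {
            if (value == null) {
                continue; // null elements are compatible with any element type
            }
            if (current == null) {
                current = value.getClass();
            } else if (current != value.getClass()) {
                throw new IllegalArgumentException(
                        "Incompatible element types: " + current.getSimpleName()
                                + " and " + value.getClass().getSimpleName());
            }
        }
        return current;
    }

    /** Hypothetical check: the single element class shared by both lists, or an exception. */
    public static Class<?> commonElementClass(List<?> left, List<?> right) {
        return merge(merge(null, left), right);
    }

    public static void main(String[] args) {
        System.out.println(commonElementClass(Arrays.asList(1), Arrays.asList(2, 3, null)));
        try {
            commonElementClass(Arrays.asList(1), Arrays.asList("this is a string"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real implementation would perform this validation at planning time on the declared argument types, and would also need a rule for implicit casts between compatible element types.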
Hi @snuyanzin, the PR I submitted before did not take implicit casts into account the way Spark does, such as ARRAY ARRAY. I have submitted a new PR here: #22483
What is the purpose of the change
This is an implementation of ARRAY_UNION
Brief change log
ARRAY_UNION for Table API and SQL
Verifying this change
This change added tests in CollectionFunctionsITCase
Does this pull request potentially affect one of the following parts:
Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)
Documentation
Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (docs)