[FLINK-31118][table] Add ARRAY_UNION function. #21958

Closed
wants to merge 1 commit into from

Conversation

liuyongvs
Contributor

  • What is the purpose of the change
    This is an implementation of ARRAY_UNION

  • Brief change log
    ARRAY_UNION for Table API and SQL

  • Verifying this change
    This change added tests in CollectionFunctionsITCase

  • Does this pull request potentially affect one of the following parts:
    Dependencies (does it add or upgrade a dependency): (no)
    The public API, i.e., is any changed class annotated with @Public(Evolving): (yes)
    The serializers: (no)
    The runtime per-record code paths (performance sensitive): (no)
    Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
    The S3 file system connector: (no)

  • Documentation
    Does this pull request introduce a new feature? (yes)
    If yes, how is the feature documented? (docs)

@liuyongvs
Contributor Author

Hi @snuyanzin, do you have time to review this? The input argument types should be validated in a way similar to Calcite's ARRAY_CONCAT (shown below). Do you have a good idea for how to do that? The solution should be reusable for other functions such as array_except and array_intersect.

public static final SqlFunction ARRAY_CONCAT = SqlBasicFunction.create(SqlKind.ARRAY_CONCAT, ReturnTypes.LEAST_RESTRICTIVE, OperandTypes.AT_LEAST_ONE_SAME_VARIADIC);

@flinkbot
Collaborator

flinkbot commented Feb 17, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@@ -617,6 +617,9 @@ collection:
- sql: ARRAY_DISTINCT(haystack)
table: haystack.arrayDistinct()
description: Returns an array with unique elements. If the array itself is null, the function will return null. Keeps ordering of elements.
- sql: ARRAY_UNION(array1, array2)
table: haystack.arrayUnion(array)
description: Returns an array of the elements in the union of array1 and array2, without duplicates. If both of the array are null, the function will return null.
Contributor

From the description it is not clear what happens if only one of the arrays is null.

@snuyanzin
Contributor

snuyanzin commented Feb 17, 2023

Thanks for your contribution.

To be honest, I didn't get the logic.
What I did:

  1. Built it from the PR's branch
  2. Started standalone Flink
  3. Via sqlClient submitted several queries
SELECT array_union(array[1], array[2]);
-- result array[1, 2]
-- this is OK
SELECT array_union(array[1], array[2, 3]);
-- result array[1, 2, 3]
--- this is OK
SELECT array_union(array[1], array[2, 3, null]);
-- result array[1, 2, 3, 0]
--- this is NOT OK
SELECT array_union(array[1], array[map['this is a key', 'this is a value']]);
-- result [1, 68]
-- this is NOT OK
SELECT array_union(array[1], array['this is a string']);
-- result [1, 16]
-- this is NOT OK
SELECT array_union(array[1], array[array[1, 2, 3]]);
-- result [1, 24]
-- this is NOT OK
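
For reference, the behaviour these queries appear to expect can be sketched in plain Java. This is only an illustration, not the PR's implementation: the class and method names are hypothetical, and it assumes that a null element is kept as null (rather than becoming 0), that first-occurrence order is preserved, and that a null input array yields a null result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArrayUnionSketch {

    /**
     * Union of two lists without duplicates, keeping first-occurrence order.
     * Assumption (not confirmed by the PR): if either input is null the result
     * is null, and a null element is preserved as an ordinary element.
     */
    public static <T> List<T> arrayUnion(List<T> left, List<T> right) {
        if (left == null || right == null) {
            return null;
        }
        // LinkedHashSet removes duplicates, keeps insertion order, and permits null.
        Set<T> seen = new LinkedHashSet<>(left);
        seen.addAll(right);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Mirrors: SELECT array_union(array[1], array[2, 3, null]);
        System.out.println(arrayUnion(Arrays.asList(1), Arrays.asList(2, 3, null)));
        // prints [1, 2, 3, null]
    }
}
```

Under these assumptions the third query would return [1, 2, 3, null] instead of [1, 2, 3, 0]. The queries that mix element types are a separate problem, since they should arguably be rejected before evaluation.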

Comment on lines +1492 to +1493
Returns an array of the elements in the union of array1 and array2, without duplicates.
If both of the array are null, the function will return null.
Contributor

same here

@@ -1359,6 +1360,16 @@ public OutType arrayDistinct() {
return toApiSpecificExpression(unresolvedCall(ARRAY_DISTINCT, toExpr()));
}

/**
* Returns an array of the elements in the union of array1 and array2, without duplicates.
Contributor

What about ordering?

@liuyongvs liuyongvs closed this Feb 17, 2023
@liuyongvs liuyongvs reopened this Feb 17, 2023
@liuyongvs
Contributor Author

Thanks for your contribution.

To be honest, I didn't get the logic. What I did:

  1. Built it from the PR's branch
  2. Started standalone Flink
  3. Via sqlClient submitted several queries
SELECT array_union(array[1], array[2]);
-- result array[1, 2]
-- this is OK
SELECT array_union(array[1], array[2, 3]);
-- result array[1, 2, 3]
--- this is OK
SELECT array_union(array[1], array[2, 3, null]);
-- result array[1, 2, 3, 0]
--- this is NOT OK
SELECT array_union(array[1], array[map['this is a key', 'this is a value']]);
-- result [1, 68]
-- this is NOT OK
SELECT array_union(array[1], array['this is a string']);
-- result [1, 16]
-- this is NOT OK
SELECT array_union(array[1], array[array[1, 2, 3]]);
-- result [1, 24]
-- this is NOT OK

The results are indeed not OK for queries such as SELECT array_union(array[1], array[array[1, 2, 3]]); and SELECT array_union(array[1], array['this is a string']);. As I said before, these two array types should raise an exception. We should handle this the way ARRAY_CONCAT does in Calcite, but I have not found a good way to express that in Flink. That is why I asked whether you have a good implementation in mind. It also matters for other array operations such as array_except and array_intersect.
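
The kind of operand check discussed here can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Flink's or Calcite's API: the `ElementType` enum, the widening order, and the method names are all hypothetical, and nested types (MAP, ARRAY) are compared only by kind for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ArrayOperandCheckSketch {

    /** Hypothetical, simplified element types; not a real type system. */
    public enum ElementType { INT, BIGINT, DOUBLE, STRING, MAP, ARRAY }

    // Assumed numeric widening order, narrowest first.
    private static final List<ElementType> NUMERIC_WIDENING =
            Arrays.asList(ElementType.INT, ElementType.BIGINT, ElementType.DOUBLE);

    /** Returns the least restrictive common element type, if one exists. */
    public static Optional<ElementType> commonElementType(ElementType a, ElementType b) {
        if (a == b) {
            return Optional.of(a);
        }
        int ia = NUMERIC_WIDENING.indexOf(a);
        int ib = NUMERIC_WIDENING.indexOf(b);
        if (ia >= 0 && ib >= 0) {
            // Both numeric: pick the wider of the two.
            return Optional.of(NUMERIC_WIDENING.get(Math.max(ia, ib)));
        }
        return Optional.empty();
    }

    /** Rejects the call at validation time instead of producing a wrong result. */
    public static ElementType validate(String function, ElementType left, ElementType right) {
        return commonElementType(left, right).orElseThrow(() ->
                new IllegalArgumentException(
                        function + ": incompatible element types " + left + " and " + right));
    }

    public static void main(String[] args) {
        System.out.println(validate("ARRAY_UNION", ElementType.INT, ElementType.DOUBLE));
        try {
            validate("ARRAY_UNION", ElementType.INT, ElementType.STRING);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check depends only on the two element types, the same helper could in principle be shared by array_except and array_intersect, which is the reuse asked about above.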

@liuyongvs liuyongvs requested review from snuyanzin and removed request for snuyanzin February 17, 2023 16:51
@snuyanzin
Contributor

snuyanzin commented Feb 21, 2023

so i ask you do you have a good implements

Unfortunately, right now I do not have anything that you could just pick up. You could probably do some research in this direction.

@liuyongvs
Contributor Author

Hi @snuyanzin, the PR I submitted earlier did not take implicit casts into account the way Spark does, such as ARRAY ARRAY.

I have submitted a new PR, #22483, and I am closing this one.

3 participants