[FLINK-31663][table] Add ARRAY_EXCEPT function #22588

Closed
wants to merge 1 commit into from

bvarghese1 (Contributor) commented May 16, 2023

What is the purpose of the change

This PR implements the ARRAY_EXCEPT function.

Brief change log

ARRAY_EXCEPT for Table API and SQL

Flink SQL> SELECT array_except(array[1,2,2], array[2,3,4]);
[1]

Flink SQL> SELECT array_except(array[1,2,2], array[1]);
[2]

Flink SQL> SELECT array_except(array[1,2,2], array[42]);
[1, 2]

Flink SQL> SELECT array_except(array[1,2,2], cast(null as array<int>));
[1, 2]

Flink SQL> SELECT array_except(array[1,2,2], array[null,2]);
[1]

Flink SQL> SELECT array_except(cast(null as array<int>), array[1,2,3]);
<NULL>

Flink SQL> SELECT array_except(array[null,null,1], array[42]);
[NULL, 1]

Flink SQL> SELECT array_except(array[null,null,1], array[null, 42]);
[1]

Flink SQL> SELECT array_except(array[(TRUE, DATE '2022-04-20'), (TRUE, DATE '1990-10-14'), null], array[(TRUE, DATE '1990-10-14')]);
[(TRUE, 2022-04-20), NULL]

Flink SQL> SELECT array_except(array[(TRUE, DATE '2022-04-20'), (TRUE, DATE '1990-10-14'), null], cast(null as array<row<col1 boolean, col2 date>>));
[(TRUE, 2022-04-20), (TRUE, 1990-10-14), NULL]

Flink SQL> SELECT array_except(array[array[1,null,3], array[0], array[1]], array[array[0]]);
[[1, NULL, 3], [1]]

Flink SQL> SELECT array_except(array[map[1, 'a', 2, 'b'], map[3, 'c', 4, 'd']], array[map[3, 'c', 4, 'd']]);
[{1=a, 2=b}]

// Error message with the CommonArrayInputStrategy. Without the CommonArrayInputStrategy the output is [1]
Flink SQL> SELECT array_except(array[1], array['this is a string']);
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Could not find a common type for arguments: [ARRAY<INT NOT NULL> NOT NULL, ARRAY<CHAR(16) NOT NULL> NOT NULL]

See also https://spark.apache.org/docs/latest/api/sql/index.html#array_except
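
The semantics shown by the examples above (distinct elements of the first array that are absent from the second, a NULL first array yielding NULL, a NULL second array removing nothing) can be sketched in plain Java. This is only an illustration of the expected results, not the code in this PR: the class name `ArrayExceptSketch` and the method `arrayExcept` are hypothetical, and the sketch deliberately relies on Java `equals`/`hashCode`, which is not how Flink compares internal data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArrayExceptSketch {

    /**
     * Hypothetical helper: returns the distinct elements of arrayOne that do not
     * occur in arrayTwo, in first-occurrence order. Uses Java equality, which is
     * an assumption made for illustration only.
     */
    static <T> List<T> arrayExcept(List<T> arrayOne, List<T> arrayTwo) {
        if (arrayOne == null) {
            return null; // a NULL first argument yields NULL
        }
        // LinkedHashSet removes duplicates, keeps insertion order, and permits null.
        Set<T> result = new LinkedHashSet<>(arrayOne);
        if (arrayTwo != null) {
            result.removeAll(arrayTwo); // a NULL second argument removes nothing
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(arrayExcept(Arrays.asList(1, 2, 2), Arrays.asList(2, 3, 4)));      // [1]
        System.out.println(arrayExcept(Arrays.asList(1, 2, 2), null));                         // [1, 2]
        System.out.println(arrayExcept(Arrays.asList(null, null, 1), Arrays.asList(42)));      // [null, 1]
        System.out.println(arrayExcept(Arrays.asList(null, null, 1), Arrays.asList(null, 42))); // [1]
    }
}
```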

Verifying this change

  • This change added tests in CollectionFunctionsITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

flinkbot (Collaborator) commented May 16, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@bvarghese1 bvarghese1 force-pushed the array_except branch 2 times, most recently from cbbf0c1 to 5e9e678 Compare May 16, 2023 19:53
}

List<Object> list = new ArrayList();
Set<Object> arrayOneSet = new HashSet<>();
Contributor

I think we shouldn't depend on Java object equality, but rather use a generated comparator:

        equalityEvaluator =
                context.createEvaluator(
                        $("element1").isEqual($("element2")),
                        DataTypes.BOOLEAN(),
                        DataTypes.FIELD("element1", dataType.notNull().toInternal()),
                        DataTypes.FIELD("element2", dataType.notNull().toInternal()));

Contributor Author

Updated to use the arrayContains comparator.

if ((arrayTwo == null && !seen.contains(element))
|| (element == null && !isNullPresent)
|| (element != null
&& !seen.contains(element)
Contributor

We should not depend on Java equals and hashCode.
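
One way to picture this objection is a variant that takes the equality check as a parameter instead of using a hash set. The following is a rough sketch under stated assumptions, not Flink code: `EvaluatorExceptSketch`, `containsBy`, and the `BiPredicate` parameter are hypothetical stand-ins for a generated equality evaluator, and the pairwise scan trades hash lookups for quadratic comparisons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class EvaluatorExceptSketch {

    /** True if the list holds an element equal to the given one; two nulls match each other. */
    static <T> boolean containsBy(List<T> list, T element, BiPredicate<T, T> equalsFn) {
        for (T candidate : list) {
            if (candidate == null || element == null) {
                if (candidate == null && element == null) {
                    return true;
                }
            } else if (equalsFn.test(candidate, element)) {
                return true;
            }
        }
        return false;
    }

    /** Distinct elements of arrayOne not found in arrayTwo, compared only through equalsFn. */
    static <T> List<T> arrayExcept(List<T> arrayOne, List<T> arrayTwo, BiPredicate<T, T> equalsFn) {
        if (arrayOne == null) {
            return null;
        }
        List<T> result = new ArrayList<>();
        for (T element : arrayOne) {
            boolean excluded = arrayTwo != null && containsBy(arrayTwo, element, equalsFn);
            if (!excluded && !containsBy(result, element, equalsFn)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // int[] uses identity-based equals/hashCode, so a hash set would not match
        // equal contents; a content-based predicate does.
        List<int[]> one = Arrays.asList(new int[] {0}, new int[] {1}, new int[] {1});
        List<int[]> two = Arrays.asList(new int[] {0}, new int[] {7});
        BiPredicate<int[], int[]> contentEquals = Arrays::equals;
        List<int[]> result = arrayExcept(one, two, contentEquals);
        System.out.println(result.size() + " " + Arrays.toString(result.get(0)));
    }
}
```

The primitive-array case is used here because it is a familiar example of Java equality disagreeing with value equality; the same concern applies to any internal representation whose `equals` and `hashCode` do not reflect SQL value semantics.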
