Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-31102][table] Add ARRAY_REMOVE function. #22144

Merged
merged 2 commits into from Mar 26, 2023

Conversation

liuyongvs
Copy link
Contributor

@liuyongvs liuyongvs commented Mar 10, 2023

  • What is the purpose of the change
    This is an implementation of ARRAY_REMOVE

  • Brief change log
    ARRAY_REMOVE for Table API and SQL

    Syntax:
    array_remove(array, needle)
    
    Arguments:
    array: An ARRAY to be handled.
    
    Returns:
    An ARRAY. If array is NULL, the result is NULL. 
    Examples:
    
    SELECT array_remove(array[1, 2, 3, null, 3], 3); 
    -- [1,2,null]
    

    See also
    spark https://spark.apache.org/docs/latest/api/sql/index.html#array_remove

    presto https://prestodb.io/docs/current/functions/array.html

    postgresql https://www.postgresql.org/docs/12/functions-array.html#ARRAY-FUNCTIONS-TABLE

  • Verifying this change
    This change added tests in CollectionFunctionsITCase

  • Does this pull request potentially affect one of the following parts:
    Dependencies (does it add or upgrade a dependency): ( no)
    The public API, i.e., is any changed class annotated with @public(Evolving): (yes )
    The serializers: (no)
    The runtime per-record code paths (performance sensitive): ( no)
    Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: ( no)
    The S3 file system connector: ( no)

  • Documentation
    Does this pull request introduce a new feature? (yes)
    If yes, how is the feature documented? (docs)

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 10, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@liuyongvs
Copy link
Contributor Author

liuyongvs commented Mar 10, 2023

hi @snuyanzin @twalthr after talking in #21947
Let me make a quick summary

  1. do not need support array_size, which can be replaced by CARDINALITY. so drop the commit
  2. the select array_remove(array[1, 2], cast(null as int)); throw exception, which is caused by array_contains. and it is fixed and merge in 10dce7c
  3. the java equals may cause bug in array_distinct, should use ExpressionEvaluator, suggested by @twalthr which is fixed and merge in 6797d6f
  4. the last problem found in [FLINK-31377][table] Fix array_contains ArrayData.ElementGetter shoul… #22143 (comment), will also need to be solved

so it is the time to implment other array functions. so i submit a new PR, could you help to review

@snuyanzin
Copy link
Contributor

thanks for the contribution
can you please fix description?
e.g.

  1. I didn't get how ARRAY_SIZEis related to this PR

  2. Syntax:
    array_remove(array)

    is it possible to invoke array_remove for one arg? What is the expected output?

  3. SELECT array_remove(array(1, 2, 3, null, 3), 3);

    it will not work in Flink since it is not Flink syntax

@liuyongvs
Copy link
Contributor Author

liuyongvs commented Mar 10, 2023

@snuyanzin
1.it is here after talking in #21947, which you reviewed before.
2. is it possible to invoke array_remove for one arg -> no possible. sorry. it is my misspelling and i have fixed it 。
3. it is also misspelling, array[1, 2, 3, null, 3], i have fixed it

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin could you have time to help review it

Comment on lines 77 to 81
if ((element == null && needle != null)
|| (element != null && needle == null)
|| (element != null
&& needle != null
&& !(boolean) equalityHandle.invoke(element, needle))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified e.g. check for the negative case and continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done @snuyanzin

Comment on lines 256 to 259
.testResult(
$("f3").arrayRemove(row(true, LocalDate.of(1990, 10, 14))),
"ARRAY_REMOVE(f3, (TRUE, DATE '1990-10-14'))",
new Row[] {Row.of(true, LocalDate.of(2022, 4, 20)), null},
DataTypes.ARRAY(
DataTypes.ROW(
DataTypes.BOOLEAN(), DataTypes.DATE()))
.nullable())
.testResult(
$("f3").arrayRemove(row(false, LocalDate.of(1990, 10, 14))),
"ARRAY_REMOVE(f3, (FALSE, DATE '1990-10-14'))",
new Row[] {
Row.of(true, LocalDate.of(2022, 4, 20)),
Row.of(true, LocalDate.of(1990, 10, 14)),
null
},
DataTypes.ARRAY(
DataTypes.ROW(
DataTypes.BOOLEAN(), DataTypes.DATE()))
.nullable())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the hood there are a lot of Flink jobs executed.
From one side testing is important but tests should be well chosen.
It seems e.g. one of this could be removed since they are very similar.
I guess up to 4-5 test cases should be enough.
WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with you @snuyanzin
These examples cover a variety of cases, and the rest of the functions don't have to cover as many cases as possible, but some of the early functions still need to cover as many cases as possible, so that users can see how to use them and find bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done @snuyanzin

@liuyongvs
Copy link
Contributor Author

@flinkbot run azure

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin could you help review it again?

@liuyongvs
Copy link
Contributor Author

@snuyanzin rebase to fix the conflicts and reviews from you.

@liuyongvs liuyongvs requested review from snuyanzin and removed request for snuyanzin March 17, 2023 02:13
@liuyongvs liuyongvs force-pushed the array_remove branch 2 times, most recently from 7f7c7ba to d996a53 Compare March 17, 2023 08:14
@liuyongvs
Copy link
Contributor Author

hi @snuyanzin could you help review it again?

$("f0").arrayRemove(
lit(null, DataTypes.SMALLINT())
.cast(DataTypes.INT())),
"ARRAY_REMOVE(f0, cast(NULL AS INT))",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ARRAY_REMOVE(f0, cast(NULL AS INT))",
"ARRAY_REMOVE(f0, CAST(NULL AS INT))",

All SQL keywords should be in uppercase

$("f2").arrayRemove(
lit(null, DataTypes.STRING())
.cast(DataTypes.STRING())),
"ARRAY_REMOVE(f2, cast(NULL AS VARCHAR))",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ARRAY_REMOVE(f2, cast(NULL AS VARCHAR))",
"ARRAY_REMOVE(f2, CAST(NULL AS VARCHAR))",

All SQL keywords should be in uppercase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed, thanks

.nullable())
.testResult(
$("f3").arrayRemove(null),
"ARRAY_REMOVE(f3, null)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ARRAY_REMOVE(f3, null)",
"ARRAY_REMOVE(f3, NULL)",

// ARRAY<ARRAY<INT>>
.testResult(
$("f5").arrayRemove(new Integer[] {0}),
"ARRAY_REMOVE(f5, array[0])",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ARRAY_REMOVE(f5, array[0])",
"ARRAY_REMOVE(f5, ARRAY[0])",

DataTypes.ROW(
DataTypes.BOOLEAN(), DataTypes.DATE()))
.nullable())
// ARRAY<INT> with null elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ARRAY<INT> with null elements
// ARRAY<INT> with NULL elements

"ARRAY_REMOVE(f5, array[0])",
new Integer[][] {new Integer[] {1, null, 3}, new Integer[] {1}},
DataTypes.ARRAY(DataTypes.ARRAY(DataTypes.INT()).nullable()))
// ARRAY<Map<INT, STRING>> with null elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ARRAY<Map<INT, STRING>> with null elements
// ARRAY<MAP<INT, STRING>> with null elements

Comment on lines 243 to 252
.testResult(
$("f2").arrayRemove("Hello"),
"ARRAY_REMOVE(f2, 'Hello')",
new String[] {"World"},
DataTypes.ARRAY(DataTypes.STRING()).notNull())
.testResult(
$("f2").arrayRemove(
lit(null, DataTypes.STRING())
.cast(DataTypes.STRING())),
"ARRAY_REMOVE(f2, cast(NULL AS VARCHAR))",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not see a big difference between these tests and tests for int types... ARRAY_REMOVE invokes same code for both.... Probably just one (int or varchar is enough)

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin fix your reviews

Copy link
Contributor

@snuyanzin snuyanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing comments
LGTM

will merge soon

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin , will it be merged now?

@snuyanzin snuyanzin merged commit f3c653e into apache:master Mar 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants