Skip to content

Conversation

liuyongvs
Copy link
Contributor

@liuyongvs liuyongvs commented Feb 16, 2023

  • What is the purpose of the change
    This is an implementation of ARRAY_SIZE/ARRAY_REMOVE

  • Brief change log
    ARRAY_SIZE for Table API and SQL

  • Verifying this change
    This change added tests in CollectionFunctionsITCase

  • Does this pull request potentially affect one of the following parts:
    Dependencies (does it add or upgrade a dependency): ( no)
    The public API, i.e., is any changed class annotated with @public(Evolving): (yes )
    The serializers: (no)
    The runtime per-record code paths (performance sensitive): ( no)
    Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: ( no)
    The S3 file system connector: ( no)

  • Documentation
    Does this pull request introduce a new feature? (yes)
    If yes, how is the feature documented? (docs)

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 16, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@liuyongvs
Copy link
Contributor Author

@snuyanzin could you help review it?

@snuyanzin
Copy link
Contributor

snuyanzin commented Feb 16, 2023

Thank you for your contribution.
To be honest I'm a bit skeptic about ARRAY_SIZE, since this functionality already exists in Flink like CARDINALITY.
Also please see at my comment in corresponding jira issue.

DataTypes.ROW(
DataTypes.BOOLEAN(), DataTypes.DATE()))
.nullable())
// ARRAY<INT> with null elements
Copy link
Contributor

@snuyanzin snuyanzin Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add tests for cases with not null elements and removal of null.
E.g.

select array_remove(array[1, 2], cast(null as int));

i would expect same array as output, or how do other systems behave?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added ARRAY_REMOVE(f0, cast(NULL AS INT))

- sql: ARRAY_SIZE(haystack)
table: haystack.arraySize()
description: Returns the size of an array. If the array itself is null, the function will return null.
- sql: ARRAY_REMOVE(haystack)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't get ...
shouldn't ARRAY_REMOVE have 2 args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done

Expression.array_contains
Expression.array_distinct
Expression.array_size
Expression.array_remove
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend keeping abc order (at least for array_* expressions) here since it simplifies search/navigation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done

sequence(
Arrays.asList("haystack", "needle"),
Arrays.asList(
logical(LogicalTypeRoot.ARRAY), ARRAY_ELEMENT_ARG)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARRAY_ELEMENT_ARG not sure about this since array could consists of non null elements and a query could try to remove null e.g.

SELECT array_remove(array['abc', 'klm', 'xyz'], cast(null as varchar));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i test remove null, why should do cast(null as varchar)
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you test remove null from array with nullable types
I'm asking to test remove of null from array with NON nullable types
that's the difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added test ARRAY_REMOVE(f2, cast(NULL AS VARCHAR))

Copy link
Contributor

@snuyanzin snuyanzin Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't go into details.
However there is a simple test to check if it's working or not: build flink
start flink (standalone is enough)
run sql-client
submit a query

select array_remove(array[1, 2, 3], cast(null as int));

Expected result is array[1, 2, 3]
Actual result: it fails with exception

same query e.g. in postgres is working fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i test SELECT array_contains(array[1, 2, 3], cast(null as int)) it also fails with exception.
the reason is ARRAY_ELEMENT_ARG make the second arg must be same with array element type
Caused by: org.apache.flink.table.api.ValidationException: Invalid function call:
array_remove(ARRAY NOT NULL, INT).

so select array_remove(array[1, 2, 3, null], cast(null as int));
return array[1, 2, 3].

do we should also fix the array_contains.

Copy link
Contributor

@snuyanzin snuyanzin Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, looks like should be fixed as well, however could be done in a separate pr

DataTypes.BOOLEAN(), DataTypes.DATE()))
.nullable())
// ARRAY<INT> with null elements
.testResult(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add more complex cases like
removal from array of arrays or array of maps or array of rows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have added array<array> and array<map<int ,string>> unit test.

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin please review it again, i fix all your review infos

@liuyongvs liuyongvs requested a review from snuyanzin February 17, 2023 04:54
@liuyongvs
Copy link
Contributor Author

hi @snuyanzin have changed, i reorganized the commits and make it two commit

Comment on lines +623 to +625
- sql: ARRAY_SIZE(haystack)
table: haystack.arraySize()
description: Returns the size of an array. If the array itself is null, the function will return null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why ARRAY_SIZE is required if there is already CARDINALITY which does same thing.
There is a discussion in jira about that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To align the other engines like spark,snowflake, so that users can migrate easily

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's very arguable since different engines support different syntax
BigQuery has ARRAY_LENGTH
Hive has SIZE
Vertica has ARRAY_LENGTH
Postgresql ARRAY_LENGTH

why to support only snowflake and spark?
Support of everything seems to be an overkill...
I guess somewhere it should be stopped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your details response, i will drop the array_size commit.

@liuyongvs liuyongvs requested a review from snuyanzin February 17, 2023 16:34
@liuyongvs
Copy link
Contributor Author

hi @snuyanzin please review it again

@liuyongvs
Copy link
Contributor Author

hi @snuyanzin do you have time to review it again?

@snuyanzin
Copy link
Contributor

Closing the issue since #21947 is closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants