
[FLINK-32257][table] Add built-in ARRAY_MAX function. #22909


@hanyuzheng7 (Contributor) commented Jun 29, 2023

What is the purpose of the change

This PR implements the built-in ARRAY_MAX function.

Brief change log

Adds the built-in array_max() function, which returns the maximum element of the input array.

Arguments

array: Any ARRAY with elements for which order is supported.

Syntax
array_max(array)

Returns
The result has the same type as the array elements. NULL elements are skipped. If the array is empty or contains only NULL elements, NULL is returned.
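A minimal Java sketch of the semantics above, assuming the array is represented as a java.util.List whose element type implements Comparable (standing in for "order is supported"). ArrayMaxSketch and arrayMax are hypothetical names, not Flink's actual implementation; returning NULL for a NULL array input is also an assumption that the description does not state.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the ARRAY_MAX semantics; not Flink's code. */
public final class ArrayMaxSketch {

    private ArrayMaxSketch() {}

    /**
     * Returns the largest non-NULL element, or null if the array is empty
     * or contains only NULL elements. (Assumption: a NULL array also yields null.)
     */
    public static <T extends Comparable<? super T>> T arrayMax(List<T> array) {
        if (array == null) {
            return null; // assumed: NULL input array -> NULL result
        }
        T max = null;
        for (T element : array) {
            if (element == null) {
                continue; // NULL elements are skipped
            }
            if (max == null || element.compareTo(max) > 0) {
                max = element; // keep the largest element seen so far
            }
        }
        return max; // still null if no non-NULL element was seen
    }

    public static void main(String[] args) {
        System.out.println(arrayMax(Arrays.asList(1, 20, null, 3))); // 20
        System.out.println(arrayMax(Arrays.<Integer>asList()));      // null
        System.out.println(arrayMax(Arrays.asList((Integer) null))); // null
    }
}
```

Keeping the running maximum as null until the first non-NULL element is seen handles the empty and all-NULL cases with the same code path.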

Examples

Spark SQL:
> SELECT array_max(array(1, 20, NULL, 3));
20

Flink SQL:
> SELECT array_max(ARRAY[1, 20, NULL, 3]);
20

See also
Spark: https://spark.apache.org/docs/latest/api/sql/index.html#array_max

Presto: https://prestodb.io/docs/current/functions/array.html

Verifying this change

This change adds tests to CollectionFunctionsITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@hanyuzheng7 changed the title from "Flink 32257] table add array max function" to "[Flink 32257] table add array max function" on Jun 29, 2023
@flinkbot (Collaborator) commented Jun 29, 2023

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@hanyuzheng7 force-pushed the FLINK-32257]-table-Add-ARRAY_MAX-function branch from 650fac3 to 35ec09a on June 29, 2023 18:46
@hanyuzheng7 (Contributor, Author) commented Jun 29, 2023

[Screenshots (2023-06-29) of test runs: ArrayElementOutputTypeStrategyTest, ArrayComparableElementTypeStrategyTest, CollectionFunctionsITCase]

@hanyuzheng7 (Contributor, Author) commented

@dawidwys This is the new PR for array_max. I fixed ArrayElementOutputTypeStrategyTest, and it is ready to merge.

@hanyuzheng7 changed the title from "[Flink 32257] table add array max function" to "[FLINK-32257][table] Add built-in ARRAY_MAX function." on Jun 30, 2023
@dawidwys closed this in 82776bf on Jun 30, 2023
@liuyongvs (Contributor) commented

Hi @dawidwys @hanyuzheng7, sorry for the late review. The PR looks good, but it still has a bug.
I recently added support for these Spark collection functions, so I am familiar with them.
Because nobody was available to review them, I contributed them to Calcite: https://github.com/apache/calcite/commits?author=liuyongvs

The return type is not right: it should always be nullable.

Whether the result is NULL cannot be known at compile time, only at runtime.
For example, given this DDL:

CREATE TABLE data_source (
  a array<int not null> not null
) WITH (
  'connector'='xxx'
);

-- If a row holds an empty array (array[]), the result for that row is NULL.
-- If the return type were INT NOT NULL, there would be no way to represent that NULL value.
select array_max(a) from data_source;

For comparison, Spark's ArrayMax declares its result as always nullable:

case class ArrayMax(child: Expression)
  extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {

  override def nullable: Boolean = true
  @transient override lazy val dataType: DataType = child.dataType match {
    case ArrayType(dt, _) => dt
    case _ => throw new IllegalStateException(s"$prettyName accepts only arrays.")
  }

  // ... remaining members omitted
}
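To make the nullability argument concrete, here is a hedged Java sketch of a return-type rule that mirrors the Spark snippet: the result type is the array's element type, forced to be nullable. ArrayMaxTypeSketch, SqlType, and inferArrayMaxReturnType are hypothetical stand-ins invented for illustration; they are not Flink's DataType or TypeStrategy API.

```java
/** Hypothetical, simplified type model; NOT Flink's DataType/LogicalType API. */
public final class ArrayMaxTypeSketch {

    /** A SQL type reduced to a name, a nullability flag, and an optional element type. */
    public record SqlType(String name, boolean nullable, SqlType elementType) {
        public static SqlType scalar(String name, boolean nullable) {
            return new SqlType(name, nullable, null);
        }

        public static SqlType array(SqlType element, boolean nullable) {
            return new SqlType("ARRAY", nullable, element);
        }

        public SqlType withNullable(boolean newNullable) {
            return new SqlType(name, newNullable, elementType);
        }

        @Override
        public String toString() {
            String base = elementType == null ? name : "ARRAY<" + elementType + ">";
            return nullable ? base : base + " NOT NULL";
        }
    }

    /**
     * Like Spark's ArrayMax: result type = element type, always nullable,
     * because an empty or all-NULL array yields NULL, known only at runtime.
     */
    public static SqlType inferArrayMaxReturnType(SqlType arrayType) {
        if (!"ARRAY".equals(arrayType.name())) {
            throw new IllegalArgumentException("ARRAY_MAX accepts only arrays.");
        }
        // Ignore the input's nullability entirely and force a nullable result.
        return arrayType.elementType().withNullable(true);
    }

    public static void main(String[] args) {
        // Column a ARRAY<INT NOT NULL> NOT NULL, as in the DDL example
        SqlType column = SqlType.array(SqlType.scalar("INT", false), false);
        System.out.println(column);                          // ARRAY<INT NOT NULL> NOT NULL
        System.out.println(inferArrayMaxReturnType(column)); // INT
    }
}
```

Under this rule, even the fully NOT NULL column from the DDL example produces a nullable INT, which is what the empty-array case requires.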
