Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: array_concat with arrays with different dimensions, add _list* aliases for _array* functions #7008

Merged
merged 5 commits into from
Jul 20, 2023
Merged

fix: array_concat with arrays with different dimensions, add _list* aliases for _array* functions #7008

merged 5 commits into from
Jul 20, 2023

Conversation

izveigor
Copy link
Contributor

@izveigor izveigor commented Jul 18, 2023

Which issue does this PR close?

Closes #6992
Closes #7028
Follow on to #6879

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@izveigor izveigor marked this pull request as draft July 18, 2023 09:20
@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 18, 2023
@izveigor izveigor marked this pull request as ready for review July 18, 2023 12:59
@@ -149,7 +149,7 @@ select column1, column2, column3 from arrays_values_without_nulls;
### Array function tests


## make_array
## make_array (aliases: `make_list`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have the alias make_list, may it be confused with creating ListArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 I have identified several reasons:

  1. The idea was taken from popular database: DuckDB. So, in my opinion, users of this database can more easily adapt to our array functions. (Documentation: https://duckdb.org/docs/sql/functions/nested).
  2. array and list prefix or postfix can separate mutable or immutable objects. (See discussion: The concept of practical implementation of the array #6855 (About aliases)).
  3. Some people prefer to use one prefix/postfix over another.

@@ -430,7 +430,22 @@ impl BuiltinScalarFunction {
)
}

/// Returns the output [`DataType` of this function
/// Returns the dimension [`DataType`] of [`DataType::List`]
fn return_dimension(self, input_expr_type: DataType) -> u64 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main codebase, other changes are documentation improvements and new aliases.

for input_expr_type in input_expr_types {
match input_expr_type {
List(field) => {
if !field.data_type().equals_datatype(&Null) {
expr_type = field.data_type().clone();
break;
let dims = self.return_dimension(input_expr_type.clone());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main codebase, other changes are documentation improvements and new aliases.

@izveigor
Copy link
Contributor Author

@alamb @jayzhan211 I wonder if you have free time to review the PR.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done the review

@@ -464,12 +479,16 @@ impl BuiltinScalarFunction {
},
BuiltinScalarFunction::ArrayConcat => {
let mut expr_type = Null;
let mut max_dims = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the init value is 1 and all the lists are 1D, we will get null for expr_type, right? But why doesn't the 1D concat test fails?

init value for expr_type = null, and max_dims = 0, with updating expr_type with input_expr_type instead of field since much more straightforward to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented your idea 👌.

@@ -464,12 +479,16 @@ impl BuiltinScalarFunction {
},
BuiltinScalarFunction::ArrayConcat => {
let mut expr_type = Null;
let mut max_dims = 1;
for input_expr_type in input_expr_types {
match input_expr_type {
List(field) => {
if !field.data_type().equals_datatype(&Null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to get a nested list array with a null type? i.e. List(List(Null))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! There is a bug #7028. Thanks, @jayzhan211!

@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 19, 2023
@alamb
Copy link
Contributor

alamb commented Jul 19, 2023

I will plan to review this PR tomorrow -- thank you @izveigor

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb alamb changed the title fix: array_concat with arrays with different dimensions fix: array_concat with arrays with different dimensions, add _list* aliases for _array* functions Jul 20, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me @izveigor and @jayzhan211 -- thank you very much. I have some suggestions for comments but I can make a follow on PR if you prefer.

Thanks again

@@ -430,7 +430,22 @@ impl BuiltinScalarFunction {
)
}

/// Returns the output [`DataType` of this function
/// Returns the dimension [`DataType`] of [`DataType::List`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns the dimension [`DataType`] of [`DataType::List`]
/// Returns the dimension [`DataType`] of [`DataType::List`].
///
/// Dimension is defined as the deepest level of nesting.
/// * `Int64` has dimension 1
/// * `List(Int64)` has dimension 2
/// * `List(List(Int64))` has dimension 3
/// * etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed PR to add this documentation: #7045

Ok(compute_array_ndims_with_datatype(arr)?.0)
}

/// Returns the dimension and lower datatype of the array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns the dimension and lower datatype of the array
/// Returns the dimension and the datatype of elements of the array

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
3 participants