fix: `array_concat` with arrays with different dimensions, add `_list*` aliases for `_array*` functions #7008
@@ -149,7 +149,7 @@ select column1, column2, column3 from arrays_values_without_nulls;
### Array function tests

## make_array
## make_array (aliases: `make_list`)
Is there a reason to have the alias `make_list`? Could it be confused with creating a `ListArray`?
@jayzhan211 I have identified several reasons:
- The idea was taken from a popular database, DuckDB, so in my opinion users of that database can adapt to our array functions more easily. (Documentation: https://duckdb.org/docs/sql/functions/nested).
- The `array` and `list` prefix or postfix can separate mutable and immutable objects. (See discussion: The concept of practical implementation of the array #6855 (About aliases)).
- Some people prefer to use one prefix/postfix over another.
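As a rough illustration of what an alias amounts to, the sketch below resolves both spellings to the same function. The `Func` enum and `resolve` helper are hypothetical and are not DataFusion's API; the `list_concat` spelling is assumed by analogy with the PR title.

```rust
/// Hypothetical subset of built-in functions, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Func {
    MakeArray,
    ArrayConcat,
}

/// Resolves a SQL function name, treating the `list` spelling as an alias
/// of the `array` spelling.
fn resolve(name: &str) -> Option<Func> {
    match name {
        "make_array" | "make_list" => Some(Func::MakeArray),
        // `list_concat` is an assumed alias name, not confirmed by this thread.
        "array_concat" | "list_concat" => Some(Func::ArrayConcat),
        _ => None,
    }
}
```

With this shape, `resolve("make_list")` and `resolve("make_array")` return the same variant, so an alias adds a name but no new behavior.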
@@ -430,7 +430,22 @@ impl BuiltinScalarFunction {
)
}

/// Returns the output [`DataType` of this function
/// Returns the dimension [`DataType`] of [`DataType::List`]
fn return_dimension(self, input_expr_type: DataType) -> u64 {
This is the main code change; the other changes are documentation improvements and new aliases.
for input_expr_type in input_expr_types {
match input_expr_type {
List(field) => {
if !field.data_type().equals_datatype(&Null) {
expr_type = field.data_type().clone();
break;
let dims = self.return_dimension(input_expr_type.clone());
This is the main code change; the other changes are documentation improvements and new aliases.
@alamb @jayzhan211 I wonder if you have free time to review the PR.
I have finished the review.
@@ -464,12 +479,16 @@ impl BuiltinScalarFunction {
},
BuiltinScalarFunction::ArrayConcat => {
let mut expr_type = Null;
let mut max_dims = 1;
If the initial value is 1 and all the lists are 1D, we will get `Null` for `expr_type`, right? But then why doesn't the 1D concat test fail?
I suggest initializing `expr_type` to `Null` and `max_dims` to 0, and updating `expr_type` with `input_expr_type` instead of `field`, since that seems much more straightforward to me.
Implemented your idea 👌.
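For readers following the suggestion, here is a minimal sketch of that selection loop. It assumes a simplified stand-in enum `Ty` instead of Arrow's `DataType`, a hypothetical `list_depth` helper that counts `List` layers (so a 1D list has depth 1, matching the wording of the review comment), and a strictly-greater comparison; it is not the code merged in the PR.

```rust
/// Hypothetical stand-in for Arrow's `DataType`, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int64,
    List(Box<Ty>),
}

/// Convenience constructor for a list type (illustrative helper).
fn list(inner: Ty) -> Ty {
    Ty::List(Box::new(inner))
}

/// Number of `List` layers: `Int64` has depth 0, `List(Int64)` has depth 1.
fn list_depth(ty: &Ty) -> u64 {
    let mut depth = 0;
    let mut current = ty;
    while let Ty::List(inner) = current {
        depth += 1;
        current = &**inner;
    }
    depth
}

/// Picks the input type with the most list layers, starting from `Null`
/// and a maximum of 0 so that 1D inputs still replace the initial type.
fn concat_return_type(input_types: &[Ty]) -> Ty {
    let mut expr_type = Ty::Null;
    let mut max_dims = 0;
    for input_type in input_types {
        let dims = list_depth(input_type);
        if dims > max_dims {
            max_dims = dims;
            // Keep the whole input type rather than the inner field type.
            expr_type = input_type.clone();
        }
    }
    expr_type
}
```

Under these assumptions, starting `max_dims` at 1 would leave `expr_type` as `Null` when every input is a 1D list, which is the concern raised in the comment above.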
@@ -464,12 +479,16 @@ impl BuiltinScalarFunction {
},
BuiltinScalarFunction::ArrayConcat => {
let mut expr_type = Null;
let mut max_dims = 1;
for input_expr_type in input_expr_types {
match input_expr_type {
List(field) => {
if !field.data_type().equals_datatype(&Null) {
Is it possible to get a nested list array with a null type, e.g. `List(List(Null))`?
Good catch! There is a bug #7028. Thanks, @jayzhan211!
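To make the nested case concrete, the sketch below contrasts a one-level null check with a check on the innermost element type. The `Ty` enum and both helpers are hypothetical stand-ins, and the sketch only illustrates how a shallow check can miss `List(List(Null))`; it does not claim to reproduce the exact cause of #7028.

```rust
/// Hypothetical stand-in for Arrow's `DataType`, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int64,
    List(Box<Ty>),
}

/// Convenience constructor for a list type (illustrative helper).
fn list(inner: Ty) -> Ty {
    Ty::List(Box::new(inner))
}

/// One-level check: true only when the direct element type is `Null`.
fn element_is_null(ty: &Ty) -> bool {
    matches!(ty, Ty::List(inner) if **inner == Ty::Null)
}

/// Follows `List` layers down to the innermost element type.
fn base_type(ty: &Ty) -> &Ty {
    let mut current = ty;
    while let Ty::List(inner) = current {
        current = &**inner;
    }
    current
}
```

For `List(List(Null))` the one-level check returns `false` because the direct element type is `List(Null)`, while `base_type` still reports `Null`.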
I plan to review this PR tomorrow -- thank you @izveigor
LGTM!
Title changed from "`array_concat` with arrays with different dimensions" to "`array_concat` with arrays with different dimensions, add `_list*` aliases for `_array*` functions"
This looks great to me @izveigor and @jayzhan211 -- thank you very much. I have some suggestions for comments but I can make a follow on PR if you prefer.
Thanks again
@@ -430,7 +430,22 @@ impl BuiltinScalarFunction {
)
}

/// Returns the output [`DataType` of this function
/// Returns the dimension [`DataType`] of [`DataType::List`]
/// Returns the dimension [`DataType`] of [`DataType::List`]
/// Returns the dimension [`DataType`] of [`DataType::List`].
///
/// Dimension is defined as the deepest level of nesting.
/// * `Int64` has dimension 1
/// * `List(Int64)` has dimension 2
/// * `List(List(Int64))` has dimension 3
/// * etc.
Proposed PR to add this documentation: #7045
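The convention in the suggested doc comment can be sketched as follows. The `Ty` enum is a hypothetical stand-in for Arrow's `DataType`, and `dimension` is an illustrative free function, not the `return_dimension` method from the diff.

```rust
/// Hypothetical stand-in for Arrow's `DataType`, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    List(Box<Ty>),
}

/// Counts nesting using the convention from the suggested doc comment:
/// a non-list type has dimension 1 and each `List` layer adds one.
fn dimension(ty: &Ty) -> u64 {
    let mut dims = 1;
    let mut current = ty;
    while let Ty::List(inner) = current {
        dims += 1;
        current = &**inner;
    }
    dims
}
```

Note that under this convention a flat `List(Int64)` has dimension 2, so it is one higher than a count of list layers alone.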
Ok(compute_array_ndims_with_datatype(arr)?.0)
}

/// Returns the dimension and lower datatype of the array
/// Returns the dimension and lower datatype of the array
/// Returns the dimension and the datatype of elements of the array
Which issue does this PR close?
Closes #6992
Closes #7028
Follow on to #6879
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
Yes