Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check list size before concat in ScalarValue #10329

Merged
merged 2 commits into from
May 2, 2024

Conversation

timsaucer
Copy link
Contributor

This PR resolves an issue in which we get a panic attempting to concatenate empty arrays. This prevents lead and lag functions from working on these data types.

Which issue does this PR close?

Closes #10328.

Rationale for this change

When we call WindowAggState::new with a data type of a list, we will call ScalarValue::try_from(out_type)?.to_array_of_size(0) which in turn calls list_to_array_of_size. Within this function there is no check on the return size. It could be zero due to either an empty array input or a request for 0 elements. In either of these conditions, you will get a panic from concat.

What changes are included in this PR?

This is a small change to check the return size. When zero it will just take an empty slice of the original array.

Are these changes tested?

I have included an expansion to an existing unit test to cover this. In my bug report I have also shown minimal example to reproduce the problem and the updated results from it's inclusion.

Are there any user-facing changes?

There are no user facing changes.

…d. If not, do not attempt to call concat on empty arrays.
@alamb
Copy link
Contributor

alamb commented May 1, 2024

Thanks @timsaucer -- I started the CI checks on this PR and hope to check it out later today if no one beats me to it

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented May 2, 2024

Thanks @timsaucer and @jayzhan211

@alamb alamb merged commit 2c0afce into apache:main May 2, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented May 2, 2024

BTW I think this would benefit from some end to end testing as well, so I filed #10345

richox pushed a commit to blaze-init/arrow-datafusion that referenced this pull request May 8, 2024
* Add check in list_to_array_of_size to see if any data will be returned. If not, do not attempt to call concat on empty arrays.

* Formatting
richox pushed a commit to blaze-init/arrow-datafusion that referenced this pull request May 8, 2024
* Add check in list_to_array_of_size to see if any data will be returned. If not, do not attempt to call concat on empty arrays.

* Formatting
richox pushed a commit to kwai/blaze that referenced this pull request May 9, 2024
 revert "split huge batch into smaller ones during projection (#448)", This reverts commit 7d33df1.
 fix issue "Nondeterministic expression xxx should be initialized before eval".
 fix issue "IO error: Bad file descriptor" while accessing spill file.
 fix jvm coredump by removing chrono dependency for printing logs.
@richox richox mentioned this pull request May 9, 2024
lihao712 pushed a commit to kwai/blaze that referenced this pull request May 9, 2024
revert "split huge batch into smaller ones during projection (#448)", This reverts commit 7d33df1.
 fix issue "Nondeterministic expression xxx should be initialized before eval".
 fix issue "IO error: Bad file descriptor" while accessing spill file.
 fix jvm coredump by removing chrono dependency for printing logs.

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
@timsaucer timsaucer deleted the bug/datatype_to_list_size_0 branch May 14, 2024 12:08
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request May 24, 2024
* Add check in list_to_array_of_size to see if any data will be returned. If not, do not attempt to call concat on empty arrays.

* Formatting
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request May 24, 2024
* Add check in list_to_array_of_size to see if any data will be returned. If not, do not attempt to call concat on empty arrays.

* Formatting

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to perform lead/lag built in functions on List and Struct data types
3 participants