Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StructArray::from panics when passed empty vector #1657

Closed
bjchambers opened this issue May 5, 2022 · 6 comments
Closed

StructArray::from panics when passed empty vector #1657

bjchambers opened this issue May 5, 2022 · 6 comments
Labels

Comments

@bjchambers
Copy link
Contributor

Describe the bug
StructArray::from(vec![]) causes a panic because it tries to index into the list of arrays to check their lengths here https://github.com/apache/arrow-rs/blob/master/arrow/src/array/array_struct.rs#L212.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

@bjchambers bjchambers added the bug label May 5, 2022
@bjchambers
Copy link
Contributor Author

I guess perhaps this is reasonable, since without their being at least one array it would be impossible to know the length of the struct array to create.

@tustvold
Copy link
Contributor

tustvold commented May 6, 2022

I think having it just create a 0 length array would be sensible and avoid surprise panics. There are niche use-cases where a non-zero row count may be desirable, see #1536, but I think we can deal with that as needed?

@bjchambers
Copy link
Contributor Author

I'm also seeing this with new_null_array(&DataType::Struct(vec![]), N). I wonder if the best path would be the following:

  1. Introduce a method on StructArray fn new(data_type: &DataType, length: usize, fields: Vec<(Field, ArrayRef)>) -> Self which just checks that all the field arrays have the given length. This could also have a variant that takes the null Buffer.
  2. The existing StructArray::from(...) methods use that based off the length of the first field (throwing a reasonable error when called with 0-fields.
  3. The new_null_array can then use that method rather to ensure that it works even on zero length arrays.

Then users who have to handle the non-zero length array of 0 fields would be able to call StructArray::new(...) explicitly.

@tustvold
Copy link
Contributor

tustvold commented May 6, 2022

I think you could get away with a simpler signature, such as

StructArray::new_with_len(fields: Vec<(Field, ArrayRef)>, len: usize) -> Self

But yes that makes sense to me, I hadn't thought of the null use-case

@bjchambers
Copy link
Contributor Author

I think the issue with just creating it of length 0 is that if you then go to put that into a record batch or enclosing struct array of a given length you'll have problems because the lengths don't line up. So, it seems like general idea of having a constructor from an explicit length makes sense, and then the various from methods can use that.

@tustvold
Copy link
Contributor

tustvold commented Jun 1, 2023

I think this is closed by #4064

@tustvold tustvold closed this as completed Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants