Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StructBuilder doesn't handle nulls correctly for empty structs #4842

Closed
emcake opened this issue Sep 20, 2023 · 3 comments · Fixed by #4845
Closed

StructBuilder doesn't handle nulls correctly for empty structs #4842

emcake opened this issue Sep 20, 2023 · 3 comments · Fixed by #4845
Labels
arrow Changes to the arrow crate bug

Comments

@emcake
Copy link
Contributor

emcake commented Sep 20, 2023

Describe the bug
StructBuilder allows the creation of empty structs, but doesn't handle nulls correctly when calling finish() to get the array.
To Reproduce
reproducing test:

    #[test]
    fn empty_struct_nulls() {
        let mut builder = StructBuilder::new(Fields::default(), vec![]);

        builder.append_null();
        builder.append_null();
        builder.append_null();

        let array = builder.finish();

        assert_eq!(array.len(), 3);
        assert_eq!(array.null_count(), 3);
    }

Expected behavior
The test passes.

Additional context
I think this is because of this line in StructArray:

https://github.com/apache/arrow-rs/blob/master/arrow-array/src/array/struct_array.rs#L116

which uses unwrap_or_default() to get the length. A potential fix is the unwrap_or with the null array length but I don't know the internals well enough to know if that's a correct fix.

@emcake emcake added the bug label Sep 20, 2023
@tustvold
Copy link
Contributor

The challenge is the null buffer is technically optional. What is your use-case for a StructArray containing no data?

@emcake
Copy link
Contributor Author

emcake commented Sep 21, 2023

In this case I was replicating a protobuf structure, where proto allows empty message definitions:

message Foo {

}

message Bar {
    Foo foo = 1;
}

and I was doing this in a reflection-like context where I was just examining descriptors and creating structs for them. In the end I caught this in my code and replaced it with a differently-typed column.

The challenge is the null buffer is technically optional. What is your use-case for a StructArray containing no data?

Understood. I think it'd also be sensible to either use this as a fallback with another unwrap_or_default layer, or just bin if you have a) no field columns and b) no null buffer. Alternately if empty structs are unsupported then not allowing you to construct a builder with no fields seems sensible.

@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #4845

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants