Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Can't create a non empty StructArray with no field using StructArray.from_array #15109

Closed
0x26res opened this issue Dec 29, 2022 · 8 comments · Fixed by #33764
Closed

Comments

@0x26res
Copy link
Contributor

0x26res commented Dec 29, 2022

Describe the bug, including details regarding any error messages, version, and platform.

I want to create a StructArray with several rows but no fields/columns:

    array = pa.StructArray.from_arrays(arrays=[], names=[], mask=pa.array([True, True]))
    assert array.type == pa.struct([])
    assert len(array) == 0 # This is wrong

As commented in the code len(array) is wrong and should be 2 (according to the provided mask).

I found a work around but it's not ideal:

    array = pa.StructArray.from_arrays(
        arrays=[pa.nulls(2, pa.null())], names=["DELETE"], mask=pa.array([True, True])
    )
    assert len(array) == 2
    empty_array = array.cast(pa.struct([]))
    assert len(empty_array) == 2
    assert empty_array.type == pa.struct([])

Component(s)

Python

@jorisvandenbossche
Copy link
Member

We currently have a special case for this in StructArray.from_arrays that causes this:

arrow/python/pyarrow/array.pxi

Lines 2785 to 2788 in 359f28b

if (c_arrays.size() == 0 and c_names.size() == 0 and
c_fields.size() == 0):
# The C++ side doesn't allow this
return array([], struct([]))

@jorisvandenbossche jorisvandenbossche changed the title Can't create a non empty StructArray with no field using StructArray.from_array [Python] Can't create a non empty StructArray with no field using StructArray.from_array Jan 18, 2023
@0x26res
Copy link
Contributor Author

0x26res commented Jan 18, 2023

@jorisvandenbossche should I try to handle the mask argument in that branch of the code?

@jorisvandenbossche
Copy link
Member

It might be worth first checking if the C++ code can handle this nowadays, otherwise handling it there sounds good (maybe a pa.array([[]]*len(mask), pa.struct([])) might be sufficient)

@jorisvandenbossche
Copy link
Member

Nope, the C++ method handle this yet:

Result<std::shared_ptr<StructArray>> StructArray::Make(
const std::vector<std::shared_ptr<Array>>& children,
const std::vector<std::shared_ptr<Field>>& fields,
std::shared_ptr<Buffer> null_bitmap, int64_t null_count, int64_t offset) {
if (children.size() != fields.size()) {
return Status::Invalid("Mismatching number of fields and child arrays");
}
int64_t length = 0;
if (children.size() == 0) {
return Status::Invalid("Can't infer struct array length with 0 child arrays");
}

@0x26res
Copy link
Contributor Author

0x26res commented Jan 18, 2023

Should we change it to:

   if (children.size() == 0 && null_bitmap == nullptr) { 
     return Status::Invalid("Can't infer struct array length with 0 child arrays"); 
   } 

And probably change the code below to infer the size from the null_bitmap in that specific case?

@jorisvandenbossche
Copy link
Member

That sounds as an option as well

@0x26res
Copy link
Contributor Author

0x26res commented Jan 18, 2023

Hmm, actually in C++ the null_bitmap is in the shape of a buffer: std::shared_ptr<Buffer> null_bitmap. Is it possible to infer the size of of the mask from a Buffer?

@jorisvandenbossche
Copy link
Member

Ah, no, a Buffer has a size, but this is the size in bytes, which in case of a null bitmap doesn't give you the length of the array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants