[Python] Support mask in FixedSizeListArray.from_arrays #34316

Closed
kylebarron opened this issue Feb 23, 2023 · 5 comments · Fixed by #39396

Comments

@kylebarron
Contributor

Describe the enhancement requested

In ListArray.from_arrays it's possible to pass in a mask array defining which list elements are null. FixedSizeListArray.from_arrays does not currently have a mask parameter, so it appears to be impossible to create a fixed size list array in which some list elements are null.

pa.array([1, 2, 3, 4], mask=[False, True, False, False])
# <pyarrow.lib.Int64Array object at 0x126eb29e0>
# [
#   1,
#   null,
#   3,
#   4
# ]
pa.FixedSizeListArray.from_arrays([1, 2, 3, 4], 2)
# <pyarrow.lib.FixedSizeListArray object at 0x127255900>
# [
#   [
#     1,
#     2
#   ],
#   [
#     3,
#     4
#   ]
# ]
pa.FixedSizeListArray.from_arrays([1, 2, 3, 4], 2, mask=[False, True])
# TypeError                                 Traceback (most recent call last)
# /Users/kyle/fused/application/job/job/raster/naip/async_zonal_stats.py in line 1
# ----> 449 pa.FixedSizeListArray.from_arrays([1, 2, 3, 4], 2, mask=[False, True])
#
# File ~/fused/application/job/.venv/lib/python3.11/site-packages/pyarrow/array.pxi:2141, in pyarrow.lib.FixedSizeListArray.from_arrays()
#
# TypeError: from_arrays() got an unexpected keyword argument 'mask'

Component(s)

Python

@rok
Member

rok commented Feb 23, 2023

This works:

import pyarrow as pa
array = pa.array([1, 2, 3, 4, 5],
                 mask=pa.array([True, False, True, False, True]))
pa.FixedSizeListArray.from_arrays(array, 1)

I assume you're looking to store multidimensional arrays? These discussions might be relevant: #33925 #8510

@kylebarron
Contributor Author

That doesn't appear to work:

In [8]: primitive_array = pa.array([1, 2, 3, 4, 5], mask=pa.array([True, False, True, False, True]))

In [9]: list_arr = pa.FixedSizeListArray.from_arrays(primitive_array, 1)

In [10]: list_arr.is_valid()
Out[10]:
<pyarrow.lib.BooleanArray object at 0x1028a4dc0>
[
  true,
  true,
  true,
  true,
  true
]

In your example the validity is assigned to the underlying primitive array, not the fixed size list array itself. According to https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout the FixedSizeListArray should store its own validity array. I want to ensure FFI compatibility between Arrow implementations, which means it's important to be able to set the fixed size list array's mask correctly.

I'm looking to store geoarrow points, not multidimensional arrays, so those discussions don't appear to be relevant to my use case.

@rok
Member

rok commented Feb 24, 2023

Indeed, from_arrays doesn't pass the mask to FixedSizeListArray. Did you try using from_buffers and passing the validity bitmap? Some examples here.

@jorisvandenbossche
Member

@kylebarron thanks for raising the issue! Just as we recently added this keyword to ListArray.from_arrays (#13894), we can also add it to FixedSizeListArray.from_arrays.

Short term, as @rok notes, from_buffers might be a reasonable alternative (and actually also ensures the validity buffer doesn't need to be inverted and can be used zero-copy):

>>> arr = pa.array([1, 2, 3, 4])
>>> pa.Array.from_buffers(pa.list_(pa.int64(), 2), 2, [None], children=[arr])
<pyarrow.lib.FixedSizeListArray object at 0x7f6ac7eb7ac0>
[
  [
    1,
    2
  ],
  [
    3,
    4
  ]
]

>>> validity = pa.array([True, False])
>>> pa.Array.from_buffers(pa.list_(pa.int64(), 2), 2, [validity.buffers()[1]], children=[arr])
<pyarrow.lib.FixedSizeListArray object at 0x7f6ac408e380>
[
  [
    1,
    2
  ],
  null
]

@LucasG0
Contributor

LucasG0 commented Dec 28, 2023

Hi, I am working on this issue.

wjones127 pushed a commit that referenced this issue Jan 3, 2024
…eter (#39396)

### What changes are included in this PR?

Add `mask` / `null_bitmap` parameters in corresponding Cython / C++ `FixedSizeListArray` methods, and propagate this bitmap instead of using the current dummy `validity_buf`.

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes, `mask` parameter has been added to `FixedSizeListArray.from_arrays`
* Closes: #34316

Authored-by: LucasG0 <guillermou.lucas@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 added this to the 15.0.0 milestone Jan 3, 2024