ARROW-12677: [Python] Add a mask argument to pyarrow.StructArray.from_arrays #10272
If we add such a That means some conversion of the data is always needed (to invert the mask), and you cannot create a StructArray 100% cheaply from existing arrays with
I've added the call to invert. I went ahead and added a I believe I have addressed all PR comments and, pending CI, this is ready for review.
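To make the inversion discussed above concrete, here is a minimal pure-Python sketch of turning a null mask (True = null) into an Arrow-style validity bitmap (bit set = valid, least-significant bit first). The helper name `mask_to_validity_bitmap` is hypothetical and is not the code used in this PR; it only illustrates why the inversion needs a fresh allocation.

```python
def mask_to_validity_bitmap(mask):
    """Pack a null mask (True = null) into an LSB-first validity bitmap (1 = valid).

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # A new buffer is always allocated, which is why inverting is never free.
    bitmap = bytearray((len(mask) + 7) // 8)
    for i, is_null in enumerate(mask):
        if not is_null:
            # Set bit i (LSB numbering) to mark slot i as valid.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


mask = [False, True, False, False]  # slot 1 is null
bitmap = mask_to_validity_bitmap(mask)
print(format(bitmap[0], "08b"))
```

Because the output buffer is new memory, a caller-supplied memory pool is a natural companion to a `mask` argument.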
…ry_pool since we will invert the mask (requires an allocation) if specified. Added tests for list arrays
LGTM, thanks! Left a few nits about typos/exception messages.
python/pyarrow/array.pxi
Outdated
if mask is None:
    c_mask = shared_ptr[CBuffer]()
elif isinstance(mask, Array):
    if mask.type != bool_():
nit: maybe pa.types.is_boolean?
I couldn't figure out how to access `is_boolean` from the `pxi` file, but I changed it to `mask.type.id != Type_BOOL`, which is similar to the way other type comparisons are done in this file.
Co-authored-by: David Li <li.davidm96@gmail.com>
…mparisons in the file
It looks like the Python linter is unhappy, but otherwise LGTM.
@@ -2153,7 +2186,8 @@ cdef class StructArray(Array):
         return [pyarrow_wrap_array(arr) for arr in arrays]
 
     @staticmethod
-    def from_arrays(arrays, names=None, fields=None):
+    def from_arrays(arrays, names=None, fields=None, mask=None,
+                    memory_pool=None):
It seems we are also inconsistent in the naming of this keyword: `ListArray.from_arrays` above uses a `pool` keyword (but `memory_pool` is used more often, so this change is fine; I will open a JIRA to make this consistent).
…_arrays

This allows the user to supply an optional `mask` when creating a struct array.

* The mask requirements are pretty strict (must be a boolean Arrow array without nulls) compared with some of the other functions (e.g. `array.mask` accepts a wide variety of inputs). I think this should be ok since this use case is probably rarer and there are plenty of other existing ways to convert other datatypes to an Arrow array.
* ~~Unfortunately, StructArray::Make interprets the "null buffer" as more of a validity buffer (1 = valid, 0 = null). This is the opposite of everywhere else a `mask` is used. I was torn between inverting the input buffer to mimic the Python API and passing through directly to the C interface for simplicity. I chose the simpler option but could be convinced otherwise.~~ Per request, I now invert the mask to align with the Python API.

Closes apache#10272 from westonpace/feature/ARROW-12677--python-add-a-mask-argument-to-pyarrow-structarra

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>