Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12677: [Python] Add a mask argument to pyarrow.StructArray.from_arrays #10272

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented May 7, 2021

This allows the user to supply an optional mask when creating a struct array.

  • The mask requirements are pretty strict (must be a boolean arrow array without nulls) compared with some of the other functions (e.g. array.mask accepts a wide variety of inputs). I think this should be ok since this use case is probably rarer and there are other plenty of existing ways to convert other datatypes to an arrow array.
  • Unfortunately, StructArray::Make interprets the "null buffer" as more of a validity buffer (1 = valid, 0 = null). This is the opposite of everywhere else a mask is used. I was torn between inverting the input buffer to mimic the python API and passing through directly to the C interface for simplicity. I chose the simpler option but could be convinced otherwise. Per request, I now invert the mask to align with the python API.

@github-actions
Copy link

github-actions bot commented May 7, 2021

@jorisvandenbossche
Copy link
Member

If we add such a mask keyword, I would say that it should match the mask keyword of pa.array(..) (and thus being an inverted mask, and not the validity mask as used internally).

That means always some conversion of the data is needed (to invert the mask), and you cannot create a StructArray 100% cheaply from existing arrays with from_arrays, but for such a case you can still use from_buffers if you want.

@westonpace westonpace force-pushed the feature/ARROW-12677--python-add-a-mask-argument-to-pyarrow-structarra branch from b60b98f to 381821b Compare May 13, 2021 23:48
@westonpace
Copy link
Member Author

I've added the call to invert. I went ahead and added a memory_pool parameter per @&res suggestion on the JIRA. I also verified that we can create null elements in a ListArray and added a test for ListArray.from_arrays since there was none.

I believe I have addressed all PR comments and, pending CI, this is ready for review.

@westonpace westonpace force-pushed the feature/ARROW-12677--python-add-a-mask-argument-to-pyarrow-structarra branch from 381821b to a376647 Compare May 14, 2021 00:14
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Left a few nits about typos/exception messages.

if mask is None:
c_mask = shared_ptr[CBuffer]()
elif isinstance(mask, Array):
if mask.type != bool_():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe pa.types.is_boolean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out how to access is_boolean from the pxi file but I changed it to mask.type.id != Type_BOOL which is similar to the way other type comparisons are done in this file.

python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/tests/test_array.py Show resolved Hide resolved
westonpace and others added 2 commits May 14, 2021 08:29
Co-authored-by: David Li <li.davidm96@gmail.com>
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the Python linter is unhappy, but otherwise LGTM.

@lidavidm lidavidm closed this in f47703e May 14, 2021
@@ -2153,7 +2186,8 @@ cdef class StructArray(Array):
return [pyarrow_wrap_array(arr) for arr in arrays]

@staticmethod
def from_arrays(arrays, names=None, fields=None):
def from_arrays(arrays, names=None, fields=None, mask=None,
memory_pool=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we are also inconsistent in the naming of this keyword, the ListArray.from_arrays above uses a pool keyword (but memory_pool is used more often, so this change is fine, will open a JIRA to make this consistent)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
…_arrays

This allows the user to supply an optional `mask` when creating a struct array.

 * The mask requirements are pretty strict (must be a boolean arrow array without nulls) compared with some of the other functions (e.g. `array.mask` accepts a wide variety of inputs).  I think this should be ok since this use case is probably rarer and there are other plenty of existing ways to convert other datatypes to an arrow array.
 * ~~Unfortunately, StructArray::Make interprets the "null buffer" as more of a validity buffer (1 = valid, 0 = null).  This is the opposite of everywhere else a `mask` is used.  I was torn between inverting the input buffer to mimic the python API and passing through directly to the C interface for simplicity.  I chose the simpler option but could be convinced otherwise.~~ Per request, I now invert the mask to align with the python API.

Closes apache#10272 from westonpace/feature/ARROW-12677--python-add-a-mask-argument-to-pyarrow-structarra

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@westonpace westonpace deleted the feature/ARROW-12677--python-add-a-mask-argument-to-pyarrow-structarra branch January 6, 2022 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants