feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray #409

Merged: 19 commits merged on Apr 9, 2024

@paleolimbot paleolimbot commented Mar 22, 2024

When device support was first added, the CArrayView was device-aware but the CArray was not. This worked well until it was clear that __arrow_c_array__ needed to error if it did not represent a CPU array (and the CArray had no way to check). Now, the CArray has a device_type and device_id. A nice side-effect of this is that we get back the view() method (whose removal @jorisvandenbossche had lamented!).

This also implements the device array protocol to help test apache/arrow#40717. This protocol isn't finalized yet, and I could remove that part until it is (although it doesn't seem likely to change).

The non-cpu case is still hard to test without real-world CUDA support...this PR is just trying to get the right information in the right place as early as possible.

import nanoarrow as na

array = na.c_array([1, 2, 3], na.int32())
array.device_type, array.device_id
#> (1, 0)
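
The check described above can be sketched in pure Python. This is only an illustration under stated assumptions: `SketchCArray` is a hypothetical stand-in (not nanoarrow's Cython `CArray`), the returned tuples are placeholders rather than real PyCapsules, and the CPU `device_id` of -1 follows the discussion later in this thread.

```python
# Hypothetical stand-in for illustration only; not nanoarrow's CArray.
ARROW_DEVICE_CPU = 1  # value of ARROW_DEVICE_CPU in the Arrow C device interface


class SketchCArray:
    def __init__(self, values, device_type=ARROW_DEVICE_CPU, device_id=-1):
        self._values = list(values)
        self.device_type = device_type
        self.device_id = device_id

    def __arrow_c_array__(self, requested_schema=None):
        # The CPU-only protocol must refuse to export data that lives elsewhere.
        if self.device_type != ARROW_DEVICE_CPU:
            raise ValueError("Can't export a non-CPU array via __arrow_c_array__")
        # A real implementation returns (schema_capsule, array_capsule);
        # this sketch returns placeholders.
        return ("schema", self._values)

    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
        # The device protocol can describe an array on any device.
        return ("schema", (self.device_type, self.device_id, self._values))
```

Because the array itself carries `device_type`, the CPU-only export can fail early instead of handing a consumer a pointer it cannot dereference.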

@paleolimbot paleolimbot marked this pull request as ready for review March 25, 2024 19:35

@danepitkin danepitkin left a comment

Great work!

@@ -427,7 +427,7 @@ def c_array_view(obj, schema=None) -> CArrayView:
     if isinstance(obj, CArrayView) and schema is None:
         return obj

-    return CArrayView.from_array(c_array(obj, schema))
+    return c_array(obj, schema).view()
Member

Very cool!

Comment on lines 38 to 39
assert darray.device_type == 1
assert darray.device_id == 0
Member

Out of curiosity, are there enums that can be used here instead of the values 1 and 0? Or do we need to wait for DLPack support?

assert darray.device_type == 1
assert darray.device_id == 0
assert darray.array.length == 3
assert "device_type: 1" in repr(darray)
Member

If there are enums, it would be nice to print the name (e.g. device_type: 1 (CPU))

Member Author

This was a bit of a rabbit hole but a very good rabbit hole. There's no enum, but there is an ABI-stable set of defines that I turned into one and the result is much better!
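
As a rough sketch of that idea (not the code merged in this PR), the defines can be mirrored in a Python `enum.Enum` and used when printing. The member values below follow the `ARROW_DEVICE_*` defines of the Arrow C device interface, listed only in part; the helper name `describe_device_type` is hypothetical.

```python
import enum


class DeviceType(enum.Enum):
    # Values mirror the ARROW_DEVICE_* defines in the Arrow C device
    # interface (only a subset is listed in this sketch).
    CPU = 1
    CUDA = 2
    CUDA_HOST = 3
    OPENCL = 4
    VULKAN = 7
    METAL = 8


def describe_device_type(value):
    """Format a raw integer as, e.g., 'device_type: 1 (CPU)'."""
    try:
        return f"device_type: {value} ({DeviceType(value).name})"
    except ValueError:
        # Values outside the listed subset still print, just without a name.
        return f"device_type: {value}"
```

With this, a test can compare against `DeviceType.CPU` rather than the bare integer 1.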

Comment on lines +1070 to +1072
cdef _set_device(self, ArrowDeviceType device_type, int64_t device_id):
    self._device_type = device_type
    self._device_id = device_id
Member

This function is called frequently after initialization. Is it worth allowing __cinit__ to set device type/id?

Member Author

I would like to move away from any arguments in __cinit__ in most cases because a user could theoretically call nanoarrow.CArray(...) and get very strange errors. They should really all be ClassName._construct() or something (but maybe in a future PR).
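
The pattern being suggested might look roughly like the following in plain Python. `Wrapper` and its fields are hypothetical, and the real classes would use Cython's `__cinit__` rather than `__init__`; this is only a sketch of the no-argument-constructor-plus-factory idea.

```python
class Wrapper:
    """Hypothetical class with a no-argument constructor and a factory."""

    def __init__(self):
        # No arguments: a bare Wrapper() always starts in a well-defined
        # empty state, so a user cannot pass something unexpected.
        self._device_type = None
        self._device_id = None

    @classmethod
    def _construct(cls, device_type, device_id):
        # Internal code paths go through this factory instead of passing
        # arguments to the constructor.
        obj = cls()
        obj._device_type = device_type
        obj._device_id = device_id
        return obj
```

Calling `Wrapper(1, -1)` directly then fails with an ordinary `TypeError` about unexpected arguments, while internal callers use `Wrapper._construct(1, -1)`.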

@paleolimbot
Member Author

Apologies for the two additional changes, but:

  • After adding the DeviceType, it became pretty clear that the CDevice was never going to get a Device wrapper outside of Cython. Interacting with the device mostly happens in the classes that live in Cython, and the only thing anybody otherwise needs from it is to print it or to know whether it's the CPU. For the other classes there's some payoff to wrapping them in Python (better IDE documentation and completion, iteration time), but the Device doesn't benefit from any of those.
  • The discussion in arrow#40801 ([Format][C++] Recommended/required value for ArrowDeviceArray.device_id int in case of CPU data) suggested that the device_id for the CPU device should be -1 instead of 0.
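
To make the two points above concrete, here is a small sketch of what "print it or know if it's the CPU" could amount to. `SketchDevice` is a hypothetical pure-Python stand-in (the actual CDevice lives in Cython), and the CPU `device_id` of -1 reflects the suggestion in the linked discussion rather than a confirmed final specification.

```python
from dataclasses import dataclass

ARROW_DEVICE_CPU = 1  # value of ARROW_DEVICE_CPU in the Arrow C device interface


@dataclass(frozen=True)
class SketchDevice:
    """Hypothetical stand-in: just enough to print and to test for CPU."""

    device_type: int
    device_id: int

    @property
    def is_cpu(self):
        return self.device_type == ARROW_DEVICE_CPU

    def __str__(self):
        return f"<device_type: {self.device_type}, device_id: {self.device_id}>"


# Assumed convention from the linked discussion: the CPU device uses -1.
CPU_DEVICE = SketchDevice(ARROW_DEVICE_CPU, -1)
```

Everything else (copying buffers, synchronization) would stay behind the Cython classes in this picture.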

@paleolimbot paleolimbot merged commit bab66ac into apache:main Apr 9, 2024
9 checks passed
@paleolimbot paleolimbot deleted the python-device-io branch April 9, 2024 19:19
@paleolimbot paleolimbot added this to the nanoarrow 0.5.0 milestone May 22, 2024
2 participants