
Zero-copy Python inputs using arrow #224

Closed
phil-opp opened this issue Mar 16, 2023 · 10 comments · Fixed by #228

Comments

@phil-opp (Collaborator)

Motivation

While we already have support for zero-copy inputs in Rust nodes, we still require copying the data for Python nodes. This can lead to decreased performance when using messages with large data. To avoid these slowdowns, we want to support zero-copy inputs for Python nodes too.

Challenges

The fundamental challenge is that Python normally operates on owned data. So we have to use a special type such as the memoryview object to make the data accessible to Python without copying. What makes things more complex is that we need a custom freeing logic because we need to unmap the shared memory and report a drop event back to the sender node.

Using a non-standard data type makes it more complex to interact with the data. For example, special functions might be needed for reading, cloning, and slicing the data. Also, it is common to convert the data to other types (e.g. numpy arrays), which requires special conversion functions.

Apache Arrow

To keep the Python API simple and easy to use, it's a good idea to use some mature existing data format rather than creating our own custom data format. The Apache Arrow project provides such a data format that supports zero-copy data transfer across language boundaries. It's already quite popular and used in many projects and tools, so it looks like a good candidate for dora.

Arrow vs PyO3

  • Arrow provides two useful features for us:
    • a stable ABI for sharing data across programming languages without copying
    • a library with high-level functions to work with the data in Python without copying it (e.g. convert it to a numpy array view)
  • We could also create our own stable ABI types using PyO3
    • This approach is more flexible, e.g. it allows implementing custom methods.
    • However, we would need to implement all desired Python functionality and conversions manually.

The Arrow Array datatype

  • Most suitable data type for our use case since dora inputs are always byte arrays right now
    • (We might want to support additional arrow types in the future.)
  • Provides direct access to the data
    • Invalidating the data from the outside is not possible. This would be useful for ensuring that old inputs are properly freed.
    • We need to rely on the user code to not keep the inputs alive for too long, otherwise the shared memory remains blocked. (We could print some warnings if an input is still not dropped after some timeout.)
  • Memory management is done using a release callback.

Implementation

  • Accessing the data from Python is possible using pyarrow:
    // `py` is a pyo3 `Python` token; `array_ptr` and `schema_ptr` point to
    // the FFI `ArrowArray`/`ArrowSchema` structs exported from Rust.
    let pa = py.import("pyarrow")?;
    
    let array = pa.getattr("Array")?.call_method1(
        "_import_from_c",
        (array_ptr as Py_uintptr_t, schema_ptr as Py_uintptr_t),
    )?;
  • The question is how we can create the ArrowArray from Rust with custom drop semantics.

The arrow2 crate

  • Provides a set of different array types that can be shared via arrow
  • Unfortunately, most of these types are unsuitable for our use-case, since they assume that the data is stored on the heap
    • our data is backed by shared memory instead, which needs special actions on drop
  • If leaking the data were acceptable, we could use a PrimitiveArray and the arrow2::ffi::mmap::slice function.
    • Unfortunately there is no way to figure out whether the data is no longer used.
    • Freeing the data too early would result in a segfault or in corrupted data, which is not acceptable.
  • Idea: Implement the arrow2::array::Array trait for a custom type
    • Then we could use the arrow2::ffi::export_array_to_c function to create the Arrow array
      • The arrow2 library automatically fills in a release implementation that calls the drop handler of our custom type
      • In this drop handler, we could unmap the shared memory and report the drop tokens to the dora daemon
    • Unfortunately, this approach is not possible because the export_array_to_c function only works with the predefined array types (via downcasting)
  • Conclusion: There appears to be no way to create an Arrow array with custom drop logic using the arrow2 crate

Manual Creation

  • The ABI layout of Arrow arrays is specified, so we could do the creation manually
  • Drawbacks: very unsafe, difficult to get right, a lot of work
  • Advantage: We can implement exactly the drop behavior that we like

The official arrow crate

@haixuanTao (Collaborator) commented Mar 17, 2023

Thanks for this detailed analysis!

Curious about when you say:

Unfortunately there is no way to figure out whether the data is no longer used.

This is because we can't listen on the Python arrow array itself, right? This is why you mentioned that we could eventually listen for a drop of the event object itself (with a oneshot Tokio channel, for example).

In the case we listen for a drop of the event object, wouldn't we be able to know when the data is no longer used?

@haixuanTao (Collaborator) commented Mar 17, 2023

Unfortunately, most of these types are unsuitable for our use-case, since they assume that the data is stored on the heap

I have to check again, but from what I remember, the unsafe slice gives you an Array<u8> that you can transmute for free to other types like Array<f64> and the like.

@phil-opp (Collaborator, Author)

In the case we listen for a drop of the event object, wouldn't we be able to know when the data is no longer used?

Dropping the event is not enough since the user can take the data out of input events and store them somewhere else (e.g. in some list). So we really have to wait until the data is dropped too. With Arrow, this is signaled through the release callback, which is part of the data format.

The issue with the arrow2 crate is that it does not support plugging your own logic into the release callback.

Unfortunately, most of these types are unsuitable for our use-case, since they assume that the data is stored on the heap

I have to check again, but from what I remember, the unsafe slice gives you an Array<u8> that you can transmute for free to other types like Array<f64> and the like.

The issue with the ffi::mmap::slice method is that it does not take any ownership of the data. So it will set the release callback to a no-op, which effectively requires us to keep the data allocated forever (since there is no way to find out whether it's safe to be freed).

@haixuanTao (Collaborator)

I see your point. Thanks for the clarification! I'll try to run some tests as well next week.

@phil-opp (Collaborator, Author) commented Mar 17, 2023

To give some more details:

@phil-opp (Collaborator, Author)

Could you clarify why you chose the arrow2 crate instead of arrow in the first place? Is there any functionality missing from the arrow crate?

Also, it looks like there are some proposals to merge arrow2 and arrow: jorgecarleitao/arrow2#1429 and apache/arrow-rs#1176.

@haixuanTao (Collaborator) commented Mar 17, 2023

Yeah, I thought that arrow's Buffer required ownership of the data, but it seems that, as you mentioned, you could do it with [from_custom_allocation](https://docs.rs/arrow/35.0.0/arrow/buffer/struct.Buffer.html#method.from_custom_allocation).

In many ways, the arrow2 crate seemed easier to work with for Vec and slice types. But if this deallocation can only be done with arrow, there is no issue with using it.

@haixuanTao (Collaborator)

In any case, it should be simple to change from arrow2 to arrow, as they both can read C pointers as input to make an array. I can add some comments on going from one to the other if you need.

Also, won't the deallocation method of an array that we have built be called at the end of its lifetime, which in our case is when we export it to a Python arrow array, and not when it's no longer used within Python?

@phil-opp (Collaborator, Author)

In any case, it should be simple to change from arrow2 to arrow, as they both can read C pointers as input to make an array. I can add some comments on going from one to the other if you need.

Yeah, I think it shouldn't be too difficult to switch from arrow2 to arrow. The challenge will probably be to set up the Deallocation field correctly. I'll try to look into it today.

Also, won't the deallocation method of an array that we have built be called at the end of its lifetime, which in our case is when we export it to a Python arrow array, and not when it's no longer used within Python?

This depends on whether the created arrow array only borrows the data or whether it takes ownership. The arrow2::ffi::mmap::slice function only borrows the data so the original array is dropped at the end of the scope as usual (which caused a segfault in the Python code). The arrow::buffer::Buffer::from_custom_allocation function will take ownership of the array instead, so the drop will happen once the last copy of the data is dropped (reference-counted using Arc).

When exporting the array to C through the FFI_ArrowArray::new function, the reference count will be increased by one. The idea is that the Python code will invoke the release callback once it's done with the data, which decreases the reference count and drops the data once the reference count reaches 0. (We might need an additional mem::forget when doing the conversion as it looks like the FFI_ArrowArray has a Drop implementation that calls release itself.)

@phil-opp (Collaborator, Author)

Implemented in #228.
