[Python] Efficiently serialize functions containing NumPy arrays 

It is my understanding that pyarrow falls back to cloudpickle when serializing functions (and other complex Python objects), which means that everything those functions contain is also serialized through the fallback path rather than the efficient method described in <https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html>. It would be good to get the benefit of fast zero-copy (de)serialization for objects like NumPy arrays contained inside functions.
```python
In [1]: import numpy as np, pyarrow as pa

In [2]: pa.__version__
Out[2]: '0.9.0'

In [3]: arr = np.random.rand(10000)

In [4]: %timeit pa.deserialize(pa.serialize(arr).to_buffer())
The slowest run took 38.29 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.7 µs per loop

In [5]: def arr_f(): return arr

In [6]: %timeit pa.deserialize(pa.serialize(arr_f).to_buffer())
The slowest run took 5.89 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 539 µs per loop
```
For comparison:
```python
In [7]: %timeit cloudpickle.loads(cloudpickle.dumps(arr))
1000 loops, best of 3: 193 µs per loop

In [8]: %timeit cloudpickle.loads(cloudpickle.dumps(arr_f))
The slowest run took 4.02 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 429 µs per loop
```
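To make the request concrete, here is a minimal sketch of the kind of behavior I have in mind: the callable itself is pickled as a small header, while the large payload it carries travels out-of-band without being copied. This is not how pyarrow 0.9.0 works. It uses the standard library's pickle protocol 5 (Python 3.8+) purely as an illustration. `BufferReturner`, `dumps_out_of_band`, and `loads_out_of_band` are hypothetical names, and a `bytearray` stands in for the NumPy array so that the sketch needs only the standard library.

```python
import pickle


class BufferReturner:
    """Hypothetical callable that carries a large payload, like arr_f carries arr."""

    def __init__(self, data):
        self.data = data  # any buffer-protocol object, e.g. a bytearray

    def __call__(self):
        return self.data

    @classmethod
    def _rebuild(cls, buf):
        # memoryview(...).obj recovers the object that owns the memory,
        # so no copy of the payload is made here.
        with memoryview(buf) as view:
            return cls(view.obj)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Wrapping the payload in a PickleBuffer lets the pickler route it
            # out-of-band when a buffer_callback is supplied.
            return type(self)._rebuild, (pickle.PickleBuffer(self.data),)
        # Older protocols: fall back to an in-band copy.
        return type(self)._rebuild, (bytes(self.data),)


def dumps_out_of_band(obj):
    """Return (small pickle header, list of out-of-band buffers)."""
    buffers = []
    header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return header, buffers


def loads_out_of_band(header, buffers):
    """Rebuild the object from the header plus the separately held buffers."""
    return pickle.loads(header, buffers=buffers)


payload = bytearray(b"\x01" * 80_000)  # stand-in for a 10000-element float64 array
func = BufferReturner(payload)

header, buffers = dumps_out_of_band(func)
restored = loads_out_of_band(header, buffers)
```

In this same-process round trip, `header` stays small regardless of the payload size, and `restored()` refers to the same memory as `payload`. Across processes the buffers would still need to be transported somehow (shared memory, for example), which is the part that Arrow's serialization format already handles well for plain arrays.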
cc @pcmoritz

**Reporter**: [Richard Shin](https://issues.apache.org/jira/browse/ARROW-2449) / @rshin

<sub>**Note**: *This issue was originally created as [ARROW-2449](https://issues.apache.org/jira/browse/ARROW-2449). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
