ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory #3971
I'm not sure this is a great idea. The specification states that arrays are supposed to be immutable and relocatable. An Array derived from a buffer with mutable memory breaks the specification unless a copy is made, and then you're undermining the zero-copy ideal. (EDIT: Maybe not, because that's usually in reference to deserialization. Still, it violates the immutable array spec).
@pgovind and I played with NumPy and pyarrow integration a bit last week. We found that if you say:
    import numpy as np
    import pyarrow as pa
    data = np.arange(10, dtype='int64')
    pyarr = pa.array(data)
    pyarr
It prints as expected:
However, you can then change the
    data = 99
    pyarr
and it mutates the underlying Arrow array:
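The same sharing effect can be reproduced with only the Python standard library, which may help when NumPy and pyarrow are not at hand. This is an analogy rather than the pyarrow code path: `array.array` stands in for the NumPy array that owns the memory, `memoryview` stands in for the Arrow array that wraps it without copying, and the index `0` is chosen purely for illustration.

```python
from array import array

# Owner of the memory: a contiguous block of signed 64-bit integers
# (typecode "q"), standing in for the NumPy array.
data = array("q", range(10))

# A zero-copy view over the same memory, standing in for the Arrow array.
# memoryview does not copy; it exposes the owner's buffer directly.
view = memoryview(data)

before = view.tolist()

# Mutate through the owner. Index 0 is a hypothetical choice for this sketch.
data[0] = 99

after = view.tolist()

print(before)
print(after)
```

Because the view never owned a private copy, the write through `data` is visible through `view` as well, which is the behavior described above for the wrapped Arrow array.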
So I don't think it is a terrible idea that someone can "new up" an ArrowBuffer with memory that they own themselves. The
That being said, I think
== what Eric said. In addition:
Also, just out of curiosity, if ArrowBuffers always own their memory and are immutable, wouldn't that mean anyone who wants to mutate the buffer necessarily has to make a copy? That seems to go against being a standard format. If I need to make a copy, I might as well use a format native to, or more suitable for, the application I'm using, right? Unless I'm missing something here.
The intention of the Arrow columnar specification is for the arrays to be semantically immutable. As soon as you introduce mutability as a design requirement, you might make different decisions. For example, mutating binary arrays in Arrow is expensive because the data structure must be rewritten.
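To make the cost concrete, here is a rough pure-Python sketch of a variable-size binary layout: one contiguous data buffer plus an offsets list with one more entry than there are values. The helper names (`build_binary_array`, `get_value`, `replace_value`) are hypothetical and are not part of any Arrow library; the point is only that changing the length of one value forces the trailing bytes and every later offset to be rewritten.

```python
def build_binary_array(values):
    """Pack byte strings into (offsets, data): one contiguous data buffer
    plus an offsets list that has len(values) + 1 entries."""
    offsets = [0]
    data = bytearray()
    for value in values:
        data += value
        offsets.append(len(data))
    return offsets, data


def get_value(offsets, data, i):
    """Slot i lives in data[offsets[i]:offsets[i + 1]]."""
    return bytes(data[offsets[i]:offsets[i + 1]])


def replace_value(offsets, data, i, new_value):
    """Return a new (offsets, data) pair with slot i replaced.

    If the new value has a different length, all bytes after slot i move
    and every later offset shifts by the length difference.
    """
    start, end = offsets[i], offsets[i + 1]
    delta = len(new_value) - (end - start)
    new_data = data[:start] + new_value + data[end:]
    new_offsets = offsets[:i + 1] + [o + delta for o in offsets[i + 1:]]
    return new_offsets, new_data


offsets, data = build_binary_array([b"a", b"bb", b"ccc"])
new_offsets, new_data = replace_value(offsets, data, 1, b"XXXX")
print(offsets, bytes(data))
print(new_offsets, bytes(new_data))
```

A fixed-width integer column, by contrast, could be updated by overwriting a single slot in place, which is why mutability as a requirement might push a design toward a different layout.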
So there is no particular reason to make memory immutable. If an irresponsible developer mutates memory that another part of the application is expecting to be immutable, IMHO that is the developer's problem. We give the developer a great deal of freedom in C++, and Java has similar freedoms. I suggest you do the same in C# and document the expected behavior.
In C++ my thinking has been that applications can use the reference count of buffers to determine whether or not they are safe to mutate in applications where that is desirable. So effectively this makes things copy on write:
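A minimal Python sketch of that copy-on-write idea follows. It is not the Arrow C++ API: `_Storage` and `BufferHandle` are hypothetical names, and the reference count is tracked by hand so that the decision "unique owner, mutate in place; shared, copy first" is visible.

```python
class _Storage:
    """Hypothetical backing store with an explicit reference count."""

    def __init__(self, payload):
        self.bytes = bytearray(payload)  # always a private copy of payload
        self.refcount = 1


class BufferHandle:
    """Hypothetical handle onto a _Storage that may be shared."""

    def __init__(self, payload=b"", _storage=None):
        self._storage = _storage if _storage is not None else _Storage(payload)

    def share(self):
        """Hand out another handle onto the same memory (no copy)."""
        self._storage.refcount += 1
        return BufferHandle(_storage=self._storage)

    def read(self, index):
        return self._storage.bytes[index]

    def write(self, index, value):
        """Mutate in place only when this handle is the sole owner."""
        if self._storage.refcount > 1:
            # Another handle can still see this memory: detach onto a copy.
            self._storage.refcount -= 1
            self._storage = _Storage(self._storage.bytes)
        self._storage.bytes[index] = value


a = BufferHandle(b"\x01\x02\x03")
b = a.share()            # a and b now see the same bytes
b.write(0, 9)            # shared, so b copies before writing
storage_before = a._storage
a.write(1, 7)            # a is the sole owner again, so this is in place
print(a.read(0), a.read(1), b.read(0), b.read(1))
```

Readers holding the other handle never observe the write, while a sole owner pays no copy at all; that is the trade-off the reference-count check is meant to buy.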
@pgovind - do you think we can move forward with making the constructor public here? I have some code which I'd like to build a
Allow ArrowBuffer to wrap external memory
@eerhardt @chutchinson
Author: Prashanth Govindarajan <firstname.lastname@example.org>
Closes apache#3971 from pgovind/TestArrowWrapMemory and squashes the following commits:
50c3288 <Prashanth Govindarajan> Expose the existing constructor publicly
7804a0a <Prashanth Govindarajan> nit
98b56a2 <Prashanth Govindarajan> ARROW-4956 - Allow ArrowBuffers to wrap external Memory in C#