Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory #3971

Closed
wants to merge 3 commits into from

Conversation

@pgovind
Copy link
Contributor

pgovind commented Mar 18, 2019

Allow ArrowBuffer to wrap external memory

@eerhardt @chutchinson

@chutchinson

This comment has been minimized.

Copy link
Contributor

chutchinson commented Mar 19, 2019

I'm not sure this is a great idea. The specification states that arrays are supposed to be immutable and relocatable. An Array derived from a buffer with mutable memory breaks the specification unless a copy is made and then you're misconstruing the zero-copy idealism. (EDIT: Maybe not, because that's usually in reference to deserialization. Still, it violates the immutable array spec).

@eerhardt Thoughts?

@eerhardt

This comment has been minimized.

Copy link
Contributor

eerhardt commented Mar 19, 2019

@pgovind and I played with with numpy and pyarrow integration a bit last week. We found that if you say:

import numpy as np
import pyarrow as pa

data = np.arange(10, dtype='int64')
pyarr = pa.array(data)
pyarr

It prints as expected:

<pyarrow.lib.Int64Array object at 0x11df6cf48>
[
  0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9
]

However, you can then change the ndarray:

data[4] = 99
pyarr

and it mutates the underlying Arrow array:

<pyarrow.lib.Int64Array object at 0x11df7f138>
[
  0,
  1,
  2,
  3,
  99,
  5,
  6,
  7,
  8,
  9
]

So I don't think it is a terrible idea that someone can "new up" a ArrowBuffer with memory that they own themselves. The ArrowBuffer will wrap that memory, but the owner of the memory can do with it as they please.

That being said, I think ArrowBuffer's ctor that takes ReadOnlyMemory<byte> should just be made public (and of course stay a ReadOnlyMemory<byte>). Memory<byte> is implicitly convertible to ReadOnlyMemory<byte>, so anyone who has a Memory<byte> can pass it in for "free".

@@ -21,6 +21,10 @@ namespace Apache.Arrow
{
public readonly partial struct ArrowBuffer: IEquatable<ArrowBuffer>
{
public static ArrowBuffer WrapExternalMemoryAsAnArrowBuffer(Memory<byte> externalMemory)

This comment has been minimized.

Copy link
@eerhardt

eerhardt Mar 19, 2019

Contributor

I think we shouldn't be doing this, and instead just expose the existing constructor publicly. The constructor should continue taking ReadOnlyMemory<byte>.

This comment has been minimized.

Copy link
@pgovind

pgovind Mar 19, 2019

Author Contributor

I'm good with that. I put in a new method here to make it super obvious that the memory is not owned by the buffer. The main concern was that making the constructor public would mean clients could construct an ArrowBuffer without going through the Builder. Given a choice, I'd go with forcing people to use the builder methods. They'd get the benefits of 64 byte aligned memory and everything else the builder methods do.

This comment has been minimized.

Copy link
@eerhardt

eerhardt Mar 25, 2019

Contributor

@pgovind - what is the status of this comment? I think we should be exposing the constructor here, and not a static method.

The main concern was that making the constructor public would mean clients could construct an ArrowBuffer without going through the Builder.

With your proposal, they can still do it, just using a different mechanism.

@pgovind

This comment has been minimized.

Copy link
Contributor Author

pgovind commented Mar 19, 2019

== what Eric said. In addition:
The C++ implementation at arrow/cpp/buffer.h defines a "Buffer" and a "MutableBuffer". "Buffer.GetMutableValue" returns a non const pointer that can be used to mutate the buffer => the C++ implementation is not immutable at the moment. This felt like the cleanest way in C# to do the same thing. One can call MemoryMarshal.AsMemory(ReadOnlyMemory) right now to get a "Memory" and change the bytes at the moment, but that feels like a hack compared to this.

Also, just out of curiosity, if ArrowBuffers always own their memory and are immutable, wouldn't that mean anyone who wants to mutate the buffer necessarily has to make a copy? That seems to go against being a standard format. If I need to make a copy, I might as well use a format native/more suitable to the application I'm using right? Unless I'm missing something here.

@wesm

This comment has been minimized.

Copy link
Member

wesm commented Mar 19, 2019

The intention of the Arrow columnar specification is for the arrays to be semantically immutable. As soon as you introduce mutability as a design requirement, you might make different decisions. For example, mutating binary arrays in Arrow is expensive because the data structure must be rewritten.

So there is no particular reason to make memory immutable. If an irresponsible developer mutates memory that another part of the application is expecting to be immutable, IMHO that is the developer's problem. We give the developer a great deal of freedom in C++, and Java has similar freedoms. I suggest you do the same in C# and document the expected behavior.

In C++ my thinking has been that applications can use the reference count of buffers to determine whether or not they are safe to mutate in applications where that is desirable. So effectively this makes things copy on write:

  • Reference count 1: safe to mutate in-place
  • Reference count > 1 : not safe to mutate, must copy
@pitrou pitrou changed the title ARROW-4956 - Allow ArrowBuffers to wrap external Memory in C# ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory in C# Mar 19, 2019
@pitrou pitrou changed the title ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory in C# ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory Mar 19, 2019
@eerhardt

This comment has been minimized.

Copy link
Contributor

eerhardt commented Apr 12, 2019

@pgovind - do you think we can move forward with making the constructor public here? I have some code which I'd like to build a DoubleArray without going through the Builder classes (because the Builder classes will allocate an array, and then re-allocate and copy the data on Build()). Currently it is impossible to create a DoubleArray using an existing .NET Memory object.

@pgovind

This comment has been minimized.

Copy link
Contributor Author

pgovind commented Apr 12, 2019

Yup. I meant to come back to this over the weekend. I'll update the patch shortly with a public constructor

Edit: Updated @eerhardt

@pgovind pgovind force-pushed the pgovind:TestArrowWrapMemory branch from 3f3bcde to 50c3288 Apr 12, 2019
@wesm

This comment has been minimized.

Copy link
Member

wesm commented May 6, 2019

Merging

@wesm wesm changed the title ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory ARROW-4956: [C#] Allow ArrowBuffers to wrap external Memory May 6, 2019
@wesm wesm closed this in eef5857 May 6, 2019
mattias01 added a commit to mattias01/arrow that referenced this pull request May 21, 2019
Allow ArrowBuffer to wrap external memory

@eerhardt @chutchinson

Author: Prashanth Govindarajan <prgovi@microsoft.com>

Closes apache#3971 from pgovind/TestArrowWrapMemory and squashes the following commits:

50c3288 <Prashanth Govindarajan> Expose the existing constructor publicly
7804a0a <Prashanth Govindarajan> nit
98b56a2 <Prashanth Govindarajan>  ARROW-4956 - Allow ArrowBuffers to wrap external Memory in C#
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.