[Python] Add convenience function to convert pandas.DataFrame to pyarrow.Buffer containing a file or stream representation #16228

Closed
asfimport opened this issue Mar 3, 2017 · 9 comments

Comments

@asfimport

Reporter: Wes McKinney / @wesm
Assignee: Phillip Cloud / @cpcloud

Note: This issue was originally created as ARROW-596. Please see the migration documentation for further details.

@asfimport

Matthew Rocklin / @mrocklin:
For network applications, a nice interface is something that we can pass to socket.send. This might be a bytes, bytearray, or memoryview object, or a sequence of those.
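
As a rough illustration of the point above, the sketch below sends a sequence of bytes-like objects over a local socket pair. It uses only the standard library; `send_chunks` is a hypothetical helper name, and the payload is a stand-in for serialized data, not anything produced by pyarrow.

```python
import socket


def send_chunks(sock, chunks):
    """Send each bytes-like chunk in order.

    sendall accepts any object that supports the buffer protocol,
    so bytes, bytearray and memoryview can be mixed freely.
    """
    for chunk in chunks:
        sock.sendall(chunk)


# A connected pair of sockets, used here in place of a real network peer.
left, right = socket.socketpair()

payload = bytearray(b"arrow-stream")  # stand-in for a serialized table
chunks = [b"hdr:", payload, memoryview(payload)[:5]]

send_chunks(left, chunks)
left.close()  # signals end-of-stream to the reader

received = b""
while True:
    data = right.recv(4096)
    if not data:
        break
    received += data
right.close()
```

Slicing the memoryview selects part of the payload without copying it before the send.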

@asfimport

Wes McKinney / @wesm:
pyarrow.Buffer has a to_pybytes method, but it is a last resort: converting to bytes is bad because it makes a memory copy.

Is there a way to do a zero-copy memory handoff to a memoryview? I can dig into how NumPy provides a zero-copy buffer interface, but you may know the right tool to use offhand.
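
To make the copy versus zero-copy distinction concrete, here is a small sketch using only built-in types. The `bytearray` is an assumed stand-in for the memory owned by a pyarrow.Buffer; the example does not call pyarrow itself.

```python
# Stand-in for memory owned by some buffer object.
source = bytearray(b"column data")

# bytes(...) allocates new memory and copies the contents into it.
copied = bytes(source)

# memoryview(...) exposes the same memory without copying.
view = memoryview(source)

# Mutating the source (same length, so no resize is needed) shows
# which of the two objects shares memory with it.
source[0:6] = b"COLUMN"

copy_is_stale = copied == b"column data"      # the copy kept the old contents
view_sees_change = view.tobytes() == b"COLUMN data"  # the view tracks the source
```

The view stays valid only while the underlying object is alive, which is why the exporting type has to take part in the buffer protocol rather than hand out a raw pointer.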

@asfimport

Matthew Rocklin / @mrocklin:
I've never had to construct one myself. I just grab my_numpy_array.data and pass that around. I'll ask Antoine Pitrou to chime in here. I suspect that he would have a better understanding.

@asfimport

Antoine Pitrou / @pitrou:
Cython allows you to implement the buffer protocol; see https://cython.readthedocs.io/en/latest/src/userguide/buffer.html. I've never used it, but it looks similar to what you would do in C.

Note that pyarrow.Buffer needs to be a fixed-size buffer for that operation to make sense. If not, then getbuffer should lock the buffer size until releasebuffer is called.
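
The locking behaviour described above can be observed with the built-in `bytearray`, which is resizable and therefore refuses to change size while a buffer export is outstanding. This is only an analogy for how a resizable exporter might behave, not a description of pyarrow's implementation.

```python
buf = bytearray(b"fixed")

# Creating a memoryview calls the exporter's getbuffer hook.
view = memoryview(buf)

try:
    # Resizing could move the underlying memory, so bytearray rejects it
    # while the export is live.
    buf.extend(b"!")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Releasing the view calls releasebuffer, after which resizing works again.
view.release()
buf.extend(b"!")
```

A fixed-size buffer avoids this bookkeeping entirely, since there is no resize to guard against.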

@asfimport

Wes McKinney / @wesm:
Thanks @pitrou, it is a fixed-size buffer, so I think that will work out fine. This is being worked on right now in #369.

@asfimport

Wes McKinney / @wesm:
Related to ARROW-376.

@asfimport

Wes McKinney / @wesm:
This is part of ARROW-881 #612

@asfimport

Wes McKinney / @wesm:
I moved this to 0.4, since we need to come up with a standard spec for the index metadata that other serializers (e.g. fastparquet) can also conform to.
