ARROW-3504: [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization. #2752
Codecov Report
@@            Coverage Diff            @@
##           master    #2752     +/-   ##
=========================================
+ Coverage   87.56%   87.57%   +<.01%
=========================================
  Files         403      403
  Lines       61483    61515     +32
=========================================
+ Hits        53838    53870     +32
  Misses       7571     7571
  Partials       74       74
python/pyarrow/_plasma.pyx
Outdated
Parameters
----------
value : Python bytes
    A Python bytes object to store.
We should support any object that implements the buffer protocol
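As a rough illustration of what accepting "any object that implements the buffer protocol" could look like, here is a minimal standard-library sketch. The helper name `as_raw_view` is hypothetical and is not part of pyarrow or Plasma; it only shows that `memoryview` accepts `bytes`, `bytearray`, `array.array` and similar objects, and rejects everything else with a `TypeError`.

```python
from array import array


def as_raw_view(value):
    """Return a flat, read-only byte view over a buffer-protocol object.

    Hypothetical helper for illustration only; not a pyarrow/Plasma API.
    """
    try:
        view = memoryview(value)
    except TypeError:
        raise TypeError(
            "value must implement the buffer protocol, got "
            + type(value).__name__
        ) from None
    # cast("B") reinterprets the memory as unsigned bytes without copying.
    # It requires a C-contiguous buffer, which holds for the types below.
    return view.cast("B").toreadonly()


# bytes, bytearray and array.array all expose the same three raw bytes.
for candidate in (b"abc", bytearray(b"abc"), array("B", [97, 98, 99])):
    assert bytes(as_raw_view(candidate)) == b"abc"
```

An `isinstance(value, bytes)` check would reject the second and third inputs even though their memory layout is identical.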
python/pyarrow/_plasma.pyx
Outdated
cdef ObjectID target_id = (object_id if object_id
                           else ObjectID.from_random())
if not isinstance(value, bytes):
    raise ValueError("Input value of put_raw_bytes should be bytes")
You should coerce to a pyarrow.Buffer with pyarrow.py_buffer and then this object can be passed into write
python/pyarrow/_plasma.pyx
Outdated
if object_buffers[i].data.get() != nullptr:
    size = object_buffers[i].data.get().size()
    results.append(bytes(
        object_buffers[i].data.get().data()[:size]))
Do you definitely want to coerce to bytes? That adds a memory copy that not all applications need.
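The cost being pointed at here can be sketched with the standard library alone. In the snippet below a `bytearray` stands in for the store's memory (an assumption for illustration; real Plasma objects live in shared memory and are immutable once sealed). Converting to `bytes` makes an independent copy, while a `memoryview` slice is a window onto the same memory.

```python
# A bytearray stands in for the object store's memory (illustrative only).
store = bytearray(b"plasma object payload")

copied = bytes(store[:6])      # allocates and copies the first 6 bytes
view = memoryview(store)[:6]   # zero-copy window onto the same memory

# Mutate the backing memory only to make the sharing visible.
store[0] = ord("P")

assert copied == b"plasma"     # the copy is unaffected
assert bytes(view) == b"Plasma"  # the view reflects the write
assert view.obj is store       # the view still references the original buffer
```

For large objects the copy doubles peak memory, which is why returning a buffer and letting the caller decide whether to copy can be preferable.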
python/pyarrow/_plasma.pyx
Outdated
        results.append(None)
    return results
else:
    return self.get_raw_bytes([object_ids], timeout_ms)[0]
Might be better to avoid polymorphic output type unless it's definitely needed.
I followed the same style as the get function. It is better to have the same behavior for get and get_raw_bytes.
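The trade-off discussed in this thread can be sketched as follows. Both function names and the dict-backed store are hypothetical stand-ins, not the Plasma client API: one variant always returns a list, the other mirrors the reviewed code by returning a single value for a single id.

```python
def get_raw_many(store, object_ids):
    """Always return a list with one entry per id (None if missing)."""
    return [store.get(object_id) for object_id in object_ids]


def get_raw_polymorphic(store, object_ids):
    """Return a list for list input, or a single value for a single id."""
    if isinstance(object_ids, list):
        return get_raw_many(store, object_ids)
    return get_raw_many(store, [object_ids])[0]


# A plain dict stands in for the object store (illustrative only).
fake_store = {"a": b"1"}

assert get_raw_many(fake_store, ["a", "b"]) == [b"1", None]
assert get_raw_polymorphic(fake_store, ["a"]) == [b"1"]
assert get_raw_polymorphic(fake_store, "a") == b"1"
```

The polymorphic form is convenient at the call site, but callers must know which input shape they passed in order to know the return type. The fixed-shape form avoids that at the cost of an extra `[0]` for single lookups.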
Signed-off-by: Yuhong Guo <yuhong.gyh@antfin.com>
79f6685 to b28748b
+1. Thanks @guoyuhong!
This feature enables the Java client to read data that the Python client puts (cross-language read/write).