Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3504: [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization. #2752

Closed
wants to merge 5 commits into from

Conversation

guoyuhong
Copy link
Contributor

This is a feature enables Java Client to read data that python client puts (cross-language read/write).

@codecov-io
Copy link

codecov-io commented Oct 13, 2018

Codecov Report

Merging #2752 into master will increase coverage by <.01%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2752      +/-   ##
==========================================
+ Coverage   87.56%   87.57%   +<.01%     
==========================================
  Files         403      403              
  Lines       61483    61515      +32     
==========================================
+ Hits        53838    53870      +32     
  Misses       7571     7571              
  Partials       74       74
Impacted Files Coverage Δ
python/pyarrow/tests/test_plasma.py 95.97% <100%> (+0.08%) ⬆️
python/pyarrow/_plasma.pyx 64.73% <90.47%> (+3.15%) ⬆️
cpp/src/plasma/thirdparty/dlmalloc.c 47.53% <0%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54634dd...b28748b. Read the comment docs.

Parameters
----------
value : Python bytes
A Python bytes object to store.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support any object that implements the buffer protocol

cdef ObjectID target_id = (object_id if object_id
else ObjectID.from_random())
if not isinstance(value, bytes):
raise ValueError("Input value of put_raw_bytes should be bytes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should coerce to a pyarrow.Buffer with pyarrow.py_buffer and then this object can be passed into write

if object_buffers[i].data.get() != nullptr:
size = object_buffers[i].data.get().size()
results.append(bytes(
object_buffers[i].data.get().data()[:size]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you definitely want to coerce to bytes? Extra memory copying that may not be needed for all applications

results.append(None)
return results
else:
return self.get_raw_bytes([object_ids], timeout_ms)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to avoid polymorphic output type unless it's definitely needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the same style as get function. It is better to have the same behavior for get and get_raw_bytes.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks @guoyuhong!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants