[BEAM-1251] Upgrade from buffer to memoryview for Python 3 #4820
@holdenk Your review, please?
LGTM, but I think I remember seeing a similar PR. Was that also yours? (Or am I just imagining things?)
@@ -309,8 +309,8 @@ def _decompress_bytes(data, codec):

# Compressed data includes a 4-byte CRC32 checksum which we verify.
# We take care to avoid extra copies of data while slicing large objects
# by use of a buffer.
result = snappy.decompress(buffer(data)[:-4])
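The hunk above strips a trailing 4-byte CRC32 from each block before decompressing it. A minimal Python 3 sketch of that step using memoryview, assuming (as in the Avro snappy codec) that the trailer is the big-endian CRC32 of the *uncompressed* bytes. The helper name `decompress_with_crc` and its `decompress` parameter are hypothetical, and `zlib.decompress` stands in for `snappy.decompress` so the sketch runs with the standard library only.

```python
import struct
import zlib
from typing import Callable


def decompress_with_crc(data: bytes, decompress: Callable[[object], bytes]) -> bytes:
    """Hypothetical helper: decompress a block followed by a 4-byte CRC32.

    Assumes the trailer is the big-endian CRC32 of the uncompressed bytes.
    """
    view = memoryview(data)
    # Slicing a memoryview does not copy; both slices share data's memory.
    payload, trailer = view[:-4], view[-4:]
    result = decompress(payload)
    (expected,) = struct.unpack(">I", trailer)
    # zlib.crc32 returns an unsigned 32-bit value in Python 3.
    if zlib.crc32(result) != expected:
        raise ValueError("CRC32 checksum mismatch")
    return result


# Build a block the same way: compressed bytes + CRC32 of the original.
original = b"hello beam" * 10
block = zlib.compress(original) + struct.pack(">I", zlib.crc32(original))
assert decompress_with_crc(block, zlib.decompress) == original
```

Whether the real decompressor accepts a memoryview argument is exactly the open question in this thread; `zlib.decompress` does accept one in Python 3.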
Have you tested this change? When I ran it, it failed with: TypeError: argument 1 must be string or read-only buffer, not memoryview.
This is because slicing a buffer returns the raw data, whereas slicing a memoryview returns another memoryview object for that subsection.
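A small Python 3 sketch of the slicing behavior described above (Python 2's buffer no longer exists, so only the memoryview side can be shown): slicing a memoryview yields another memoryview over the same memory, and a copy is made only when it is converted to bytes. A bytearray is used here purely so that the shared memory is observable.

```python
# Slicing a memoryview returns a new memoryview, not raw bytes.
buf = bytearray(b"payload+crc!")
view = memoryview(buf)[:-4]
print(type(view).__name__)   # memoryview
print(len(view))             # 8

# The slice shares memory with buf: mutating buf is visible through view.
buf[0] = ord("P")
print(view.tobytes())        # b'Payload+'

# Converting to bytes is where the copy happens; later changes don't affect it.
copy = bytes(view)
buf[1] = ord("A")
print(copy)                  # b'Payload+'
print(view.tobytes())        # b'PAyload+'
```

A library that type-checks for str or a read-only buffer therefore rejects the sliced view outright, which matches the TypeError reported above.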
Thanks for catching this. I did not have an effective way to test it. Reading through:
memoryview exists in all versions of Python that Beam supports, so once we find a memoryview-based solution that works, we should be able to drop buffer altogether.
Branch updated from 61d4aff to 4e5e1bf.
@aaltay Can you please retry with this update?
No, the changed version also does not work. Besides, binary_type is just str, so even if it worked as expected here, it would create a copy of the data, which defeats the purpose. The real solution would be to upgrade snappy to accept memoryview as an argument. If we cannot do that, we can remove the optimization and settle for copying the data. CC'ing a few people who might know the impact of copying data here:
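The fallback discussed above (pass a memoryview when the decompressor accepts one, otherwise settle for a copy) could be sketched as follows. This is an illustration under assumptions, not Beam's code: `decompress_payload` is a hypothetical name, `zlib.decompress` stands in for the real decompressor, and `strict_decompress` only mimics the TypeError reported earlier for a decompressor that rejects memoryview.

```python
import zlib


def decompress_payload(data, decompress):
    """Hypothetical helper: decompress all but the trailing 4 bytes of data.

    Tries a zero-copy memoryview first and falls back to a bytes copy if the
    decompressor rejects memoryview with a TypeError. The 4-byte trailer is
    dropped here, not verified.
    """
    view = memoryview(data)[:-4]
    try:
        return decompress(view)
    except TypeError:
        # Copies len(data) - 4 bytes: the cost of dropping the optimization.
        return decompress(view.tobytes())


def strict_decompress(payload):
    # Stand-in for a decompressor that only accepts bytes, not memoryview.
    if not isinstance(payload, bytes):
        raise TypeError(
            "argument 1 must be string or read-only buffer, not "
            + type(payload).__name__)
    return zlib.decompress(payload)


block = zlib.compress(b"x" * 1000) + b"\x00\x00\x00\x00"
assert decompress_payload(block, zlib.decompress) == b"x" * 1000
assert decompress_payload(block, strict_decompress) == b"x" * 1000
```

Catching TypeError this broadly could also swallow unrelated type errors raised inside the decompressor; checking the installed library version instead would be stricter.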
Are we using the current python-snappy 0.5.2? Perhaps @martindurant has some ideas for us.
Yes, we depend on python-snappy from PyPI. Dataflow has 0.5.1 installed, not the latest 0.5.2, but I do not think any change between them is related to this. I tested this PR with the latest available version.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
A fix has been checked into intake/python-snappy#72.
The buffer builtin was removed in Python 3 in favor of memoryview.