[BEAM-14134] Optimize memory allocations for various core coders #17134
Run Java PreCommit |
This is great. Taking a look now. |
Maybe use a local byte[] for longs and read calls for everything else? That seems consistent with what DataInputStream did as well. |
I would suggest sticking with read/writeLongViaLocalBuffer, since read/write calls can pass through many layers of I/O before hitting the lowest layer. A single buffered call lets us push the number of bytes we want to read/write down as close as possible to the layer doing the actual I/O work. Benchmarking using ByteArrayInput/OutputStream will give very skewed results.
It is difficult for me to say whether 4 reads will be cheaper than creating a byte array. I wish fixed-length value types could go on the stack; then this would be a no-brainer. But it does look like a win over allocating the 100's of bytes for each |
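To make the point about I/O layers concrete, here is a minimal sketch, not code from this PR. `CountingInputStream` is a hypothetical wrapper standing in for an intermediate layer, and it simply counts how many calls reach it. One bulk request for 8 bytes crosses the layer once, while byte-at-a-time reads cross it 8 times.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCallCount {
  /** Hypothetical intermediate layer: counts the read calls that pass through it. */
  static final class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      calls++;
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      calls++;
      return in.read(b, off, len);
    }
  }

  /** Requests all 8 bytes in one call; returns how many calls the layer saw. */
  static int callsForBulkRead(byte[] data) {
    try {
      CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
      byte[] buf = new byte[8];
      in.read(buf, 0, buf.length);
      return in.calls;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Requests the 8 bytes one at a time; returns how many calls the layer saw. */
  static int callsForByteReads(byte[] data) {
    try {
      CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
      for (int i = 0; i < 8; i++) {
        in.read();
      }
      return in.calls;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[8];
    System.out.println("bulk read calls: " + callsForBulkRead(data));
    System.out.println("byte-wise read calls: " + callsForByteReads(data));
  }
}
```

With a real stack of streams each of those calls may do non-trivial work per layer, which is why an in-memory benchmark can understate the difference.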
Time for the C# runner? 🤣 |
oh also, any thoughts on using the guava Longs, Ints, Shorts.fromBytes methods here? I wasn't sure what the stance on using the shaded guava generally in the core libraries was. |
79b32e6 to b976114
It is totally fine to use shaded Guava internally. Just don't expose its types on the public API surface. |
Cool, updated the Long one to use it at least. |
b976114 to 6127e7f
Run Java_Examples_Dataflow PreCommit |
Run Java PreCommit |
Not sure what's going on with the precommit here; the failure seems to be in an unrelated metrics test. |
Run Java PreCommit |
6127e7f to a8aed47
Run Java PreCommit |
LGTM, can replace readFully with Guava's implementation
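For context on what a `readFully` helper has to do, here is a minimal stdlib sketch, not the code under review. A single `read(buf, off, len)` may return fewer bytes than requested, so the helper loops until the buffer is full and fails on early end of stream. Guava's `ByteStreams.readFully(InputStream, byte[])` offers the same contract; the `trickle` stream and `readViaTrickle` helper below are hypothetical and exist only to exercise short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadFullySketch {
  /** Reads exactly len bytes into buf, looping over short reads. */
  static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, off + total, len - total);
      if (n < 0) {
        throw new EOFException("expected " + len + " bytes, got " + total);
      }
      total += n;
    }
  }

  /** Hypothetical stream that returns at most one byte per bulk read call. */
  static InputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  /** Test helper: reads len bytes from a trickling stream over data. */
  static byte[] readViaTrickle(byte[] data, int len) {
    try {
      byte[] buf = new byte[len];
      readFully(trickle(data), buf, 0, len);
      return buf;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] out = readViaTrickle(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}, 8);
    System.out.println(java.util.Arrays.toString(out));
  }
}
```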
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/BitConverters.java
01ec734 to ed3cffe
Run Java PreCommit |
alright I've given up trying to get this precommit working today. I'll give it another poke tomorrow. |
I filed https://issues.apache.org/jira/browse/BEAM-14148 and started a rollback of the extremely flaky test in #17154 |
Run Java PreCommit |
ed3cffe to fe7170c
Run Java PreCommit |
Many coders have significant overhead due to their use of DataInputStream. DataInputStream allocates a significant amount of internal buffers when instantiated, which adds unnecessary overhead for very simple operations like decoding a big-endian long.
This changes most coders that use DataInputStream internally to use a more optimized big-endian decoder. I benchmarked three different options here; the solution I arrived at was the best mix of performance and allocations.
readLongViaLocalBuffer allocates an 8 byte buffer per call and reads it using a single read() call.
readLongViaTLBuffer does the same, but uses a thread-local buffer rather than allocating a new one each call.
readLongViaReadCalls simply calls read 8 times, storing the results in temporary variables.
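The three options can be sketched roughly as follows. This is an illustration under assumptions, not the benchmarked code: the method names come from the description above, while the `fill`, `fromBytes`, and `decodeAll` helpers are hypothetical. `fromBytes` is a plain stand-in for a big-endian assembly such as Guava's `Longs.fromBytes`, `fill` loops over short reads rather than relying on a single call, and the read-calls variant accumulates into one variable instead of separate temporaries.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadLongSketch {
  private static final ThreadLocal<byte[]> TL_BUFFER = ThreadLocal.withInitial(() -> new byte[8]);

  /** Option 1: allocate a fresh 8-byte buffer per call. */
  static long readLongViaLocalBuffer(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    fill(in, buf);
    return fromBytes(buf);
  }

  /** Option 2: reuse a per-thread 8-byte buffer. */
  static long readLongViaTLBuffer(InputStream in) throws IOException {
    byte[] buf = TL_BUFFER.get();
    fill(in, buf);
    return fromBytes(buf);
  }

  /** Option 3: no buffer; eight single-byte read() calls. */
  static long readLongViaReadCalls(InputStream in) throws IOException {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException();
      }
      result = (result << 8) | b;
    }
    return result;
  }

  /** Hypothetical helper: fills buf completely, tolerating short reads. */
  private static void fill(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n < 0) {
        throw new EOFException();
      }
      total += n;
    }
  }

  /** Hypothetical helper: big-endian assembly of 8 bytes into a long. */
  private static long fromBytes(byte[] buf) {
    long value = 0;
    for (byte b : buf) {
      value = (value << 8) | (b & 0xFF);
    }
    return value;
  }

  /** Test helper: decodes the same 8 bytes with each of the three options. */
  static long[] decodeAll(byte[] bytes) {
    try {
      return new long[] {
        readLongViaLocalBuffer(new ByteArrayInputStream(bytes)),
        readLongViaTLBuffer(new ByteArrayInputStream(bytes)),
        readLongViaReadCalls(new ByteArrayInputStream(bytes))
      };
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] decoded = decodeAll(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
    System.out.println(Long.toHexString(decoded[0]));
  }
}
```

All three decode the same bytes to the same value; they differ only in allocation behavior and in how many calls reach the underlying stream.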
R: @lukecwik maybe? Not really sure who's the best to look at this.