
[BEAM-14134] Optimize memory allocations for various core coders #17134

Merged · 1 commit · Mar 23, 2022

Conversation

steveniemitz (Contributor)

Many coders have significant overhead due to their use of DataInputStream. DataInputStream allocates a significant number of internal buffers when instantiated, which adds unnecessary overhead for very simple operations like decoding a big-endian long.

This change switches most coders that use DataInputStream internally to a more optimized big-endian decoder. I benchmarked three different options; the solution I arrived at was the best mix of performance and allocations.

Benchmark                Mode  Cnt          Score         Error  Units
readLongViaLocalBuffer  thrpt   10  204364633.343 ± 7412002.528  ops/s
readLongViaTLBuffer     thrpt   10  108663164.381 ±  229471.991  ops/s
readLongViaReadCalls    thrpt   10  160694853.195 ± 5272248.704  ops/s

  • readLongViaLocalBuffer allocates an 8-byte buffer per call and fills it with a single bulk read() call.
  • readLongViaTLBuffer does the same, but uses a thread-local buffer rather than allocating a new one on each call.
  • readLongViaReadCalls simply calls read() 8 times, storing the results in temporary variables.
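
For illustration, a minimal sketch of what the three benchmarked variants could look like (this is not the PR's actual benchmark code; the class name and error handling are assumptions):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch of the three big-endian long decoding strategies. */
final class BigEndianLongReaders {

  // 1) Local buffer: allocate an 8-byte array per call and fill it with a bulk read()
  //    (looping only if the stream returns a short read).
  static long readLongViaLocalBuffer(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    fill(in, buf);
    return toLongBigEndian(buf);
  }

  // 2) Thread-local buffer: same as above, but reuse a per-thread 8-byte array.
  private static final ThreadLocal<byte[]> TL_BUF = ThreadLocal.withInitial(() -> new byte[8]);

  static long readLongViaTLBuffer(InputStream in) throws IOException {
    byte[] buf = TL_BUF.get();
    fill(in, buf);
    return toLongBigEndian(buf);
  }

  // 3) Read calls: issue eight single-byte read() calls and combine the results.
  static long readLongViaReadCalls(InputStream in) throws IOException {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException();
      }
      result = (result << 8) | (b & 0xFF);
    }
    return result;
  }

  private static void fill(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
  }

  private static long toLongBigEndian(byte[] buf) {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      result = (result << 8) | (buf[i] & 0xFF);
    }
    return result;
  }
}
```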

R: @lukecwik maybe? Not really sure who's the best to look at this.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch):

  • Build python source distribution and wheels
  • Python tests
  • Java tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the java label Mar 19, 2022
@steveniemitz (Contributor Author)

Run Java PreCommit

@lukecwik (Member)

This is great. Taking a look now.

@steveniemitz (Contributor Author)

> I would suggest sticking with read/writeLongViaLocalBuffer since read/write calls can depend on many layers of I/O before…

Maybe use a local byte[] for longs and read calls for everything else? That seems consistent with what DataInputStream did as well.

@lukecwik (Member) left a review comment

I would suggest sticking with read/writeLongViaLocalBuffer, since read/write calls can depend on many layers of I/O before hitting the lowest layer; using a buffer lets us push down the number of bytes we want to read/write as close as possible to the layer doing the actual I/O work. Benchmarking with ByteArrayInput/OutputStream will give very skewed results.

@lukecwik (Member) commented Mar 21, 2022

> I would suggest sticking with read/writeLongViaLocalBuffer since read/write calls can depend on many layers of I/O before…

> Maybe use a local byte[] for longs and read calls for everything else? That seems consistent with what DataInputStream did as well.

It is difficult for me to say whether 4 reads will be cheaper than creating a byte array. I wish fixed-length value types could go on the stack; then this would be a no-brainer. Either way, it looks like a win over allocating the hundreds of bytes for each Data*Stream object, so I'll go with your judgement call as to whether you want arrays or multiple reads.
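
Putting the two suggestions together, a hypothetical sketch of the compromise (a local byte[] for longs, plain read() calls for the shorter types, mirroring what DataInputStream itself does; this is not the PR's actual code):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the compromise discussed above. */
final class BigEndianDecoding {

  // Longs: fill a small local buffer with bulk read() calls, much like
  // DataInputStream.readLong does with its internal buffer.
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    int off = 0;
    while (off < 8) {
      int n = in.read(buf, off, 8 - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
    long result = 0;
    for (int i = 0; i < 8; i++) {
      result = (result << 8) | (buf[i] & 0xFF);
    }
    return result;
  }

  // Ints (and shorter types): individual read() calls, as DataInputStream.readInt does.
  static int readBigEndianInt(InputStream in) throws IOException {
    int b1 = in.read();
    int b2 = in.read();
    int b3 = in.read();
    int b4 = in.read();
    if ((b1 | b2 | b3 | b4) < 0) {
      throw new EOFException();
    }
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
  }
}
```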

@steveniemitz (Contributor Author)

> I wish fixed-length value types could go on the stack; then this would be a no-brainer

Time for the C# runner? 🤣

@steveniemitz (Contributor Author)

Oh also, any thoughts on using the Guava Longs/Ints/Shorts.fromBytes methods here? I wasn't sure what the general stance is on using the shaded Guava in the core libraries.

@lukecwik (Member)

> Oh also, any thoughts on using the Guava Longs/Ints/Shorts.fromBytes methods here? I wasn't sure what the general stance is on using the shaded Guava in the core libraries.

It is totally fine to use shaded Guava internally. Just don't expose its types on the API surface of anything public.

@steveniemitz (Contributor Author)

> It is totally fine to use shaded Guava internally. Just don't expose its types on the API surface of anything public.

Cool, updated the Long one to use it at least.
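
As a rough illustration of that change (a sketch only; Beam core would reference its vendored Guava package rather than com.google.common directly, and the surrounding class is hypothetical):

```java
import com.google.common.primitives.Longs;

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class GuavaLongDecodeExample {
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    int off = 0;
    while (off < 8) {
      int n = in.read(buf, off, 8 - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
    // Longs.fromBytes combines eight bytes, most significant first (big-endian).
    return Longs.fromBytes(buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7]);
  }
}
```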

@steveniemitz (Contributor Author)

Run Java_Examples_Dataflow PreCommit

@steveniemitz (Contributor Author)

Run Java PreCommit

@steveniemitz (Contributor Author)

Not sure what's going on with the precommit here; the failure seems unrelated (it's in a metrics test).

@steveniemitz (Contributor Author)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member) left a review comment

LGTM, can replace readFully with Guava's implementation
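
Roughly, that suggestion would look something like the following (a sketch, assuming Guava's ByteStreams.readFully and Longs.fromByteArray; the class and method names are hypothetical, not the PR's actual diff):

```java
import com.google.common.io.ByteStreams;
import com.google.common.primitives.Longs;

import java.io.IOException;
import java.io.InputStream;

final class ReadFullyExample {
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    // ByteStreams.readFully loops until the buffer is filled (or throws EOFException),
    // replacing a hand-rolled readFully helper.
    ByteStreams.readFully(in, buf);
    // Longs.fromByteArray decodes the first 8 bytes as a big-endian long.
    return Longs.fromByteArray(buf);
  }
}
```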

@steveniemitz (Contributor Author)

Run Java PreCommit

@steveniemitz (Contributor Author)

Alright, I've given up trying to get this precommit working today. I'll give it another poke tomorrow.

@lukecwik (Member)

> Alright, I've given up trying to get this precommit working today. I'll give it another poke tomorrow.

I filed https://issues.apache.org/jira/browse/BEAM-14148 and started a rollback of the extremely flaky test in #17154

@lukecwik (Member)

Run Java PreCommit

4 similar comments
@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik lukecwik merged commit 8cda8a2 into apache:master Mar 23, 2022