
[BEAM-14134] Optimize memory allocations for various core coders #17134

Merged · 1 commit · Mar 23, 2022

Conversation

steveniemitz (Contributor)

Many coders have significant overhead due to their use of DataInputStream. DataInputStream allocates a significant number of internal buffers when instantiated, which adds unnecessary overhead for very simple operations like decoding a big-endian long.

This change switches most coders that use DataInputStream internally to a more optimized big-endian decoder. I benchmarked three different options; the solution I arrived at was the best mix of performance and allocations.

Benchmark                Mode  Cnt          Score         Error  Units
readLongViaLocalBuffer  thrpt   10  204364633.343 ± 7412002.528  ops/s
readLongViaTLBuffer     thrpt   10  108663164.381 ±  229471.991  ops/s
readLongViaReadCalls    thrpt   10  160694853.195 ± 5272248.704  ops/s

  • readLongViaLocalBuffer allocates an 8-byte buffer per call and fills it with a single bulk read() call.
  • readLongViaTLBuffer does the same, but uses a thread-local buffer rather than allocating a new one on each call.
  • readLongViaReadCalls simply calls read() 8 times, storing the results in temporary variables.
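
For illustration, a minimal sketch of what the three benchmarked variants could look like (this is not the PR's actual benchmark code; the class name and error handling are assumptions):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch of the three big-endian long decoding strategies. */
final class BigEndianLongReaders {

  // 1) Local buffer: allocate an 8-byte array per call and fill it with a bulk read()
  //    (looping only if the stream returns a short read).
  static long readLongViaLocalBuffer(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    fill(in, buf);
    return toLongBigEndian(buf);
  }

  // 2) Thread-local buffer: same as above, but reuse a per-thread 8-byte array.
  private static final ThreadLocal<byte[]> TL_BUF = ThreadLocal.withInitial(() -> new byte[8]);

  static long readLongViaTLBuffer(InputStream in) throws IOException {
    byte[] buf = TL_BUF.get();
    fill(in, buf);
    return toLongBigEndian(buf);
  }

  // 3) Read calls: issue eight single-byte read() calls and combine the results.
  static long readLongViaReadCalls(InputStream in) throws IOException {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException();
      }
      result = (result << 8) | (b & 0xFF);
    }
    return result;
  }

  private static void fill(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
  }

  private static long toLongBigEndian(byte[] buf) {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      result = (result << 8) | (buf[i] & 0xFF);
    }
    return result;
  }
}
```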

R: @lukecwik maybe? Not really sure who's the best to look at this.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch):

  • Build python source distribution and wheels
  • Python tests
  • Java tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the java label Mar 19, 2022
@steveniemitz (Contributor Author)

Run Java PreCommit

@lukecwik (Member)

This is great. Taking a look now.

@steveniemitz (Contributor Author)

> I would suggest sticking with read/writeLongViaLocalBuffer since read/write calls can depend on many layers of I/O before…

Maybe use a local byte[] for longs and read calls for everything else? That seems consistent with what DataInputStream did as well.

@lukecwik (Member) left a review comment

I would suggest sticking with read/writeLongViaLocalBuffer, since read/write calls can depend on many layers of I/O before hitting the lowest layer; using a buffer lets us push down the number of bytes we want to read/write as close as possible to the layer doing the actual I/O work. Benchmarking with ByteArrayInput/OutputStream will give very skewed results.

@lukecwik (Member) commented Mar 21, 2022

> I would suggest sticking with read/writeLongViaLocalBuffer since read/write calls can depend on many layers of I/O before…

> Maybe use a local byte[] for longs and read calls for everything else? That seems consistent with what DataInputStream did as well.

It is difficult for me to say whether 4 reads will be cheaper than creating a byte array. I wish fixed-length value types could go on the stack; then this would be a no-brainer. Either way, it looks like a win over allocating the hundreds of bytes for each Data*Stream object, so I'll go with your judgement call as to whether you want arrays or multiple reads.
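
Putting the two suggestions together, a hypothetical sketch of the compromise (a local byte[] for longs, plain read() calls for the shorter types, mirroring what DataInputStream itself does; this is not the PR's actual code):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the compromise discussed above. */
final class BigEndianDecoding {

  // Longs: fill a small local buffer with bulk read() calls, much like
  // DataInputStream.readLong does with its internal buffer.
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    int off = 0;
    while (off < 8) {
      int n = in.read(buf, off, 8 - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
    long result = 0;
    for (int i = 0; i < 8; i++) {
      result = (result << 8) | (buf[i] & 0xFF);
    }
    return result;
  }

  // Ints (and shorter types): individual read() calls, as DataInputStream.readInt does.
  static int readBigEndianInt(InputStream in) throws IOException {
    int b1 = in.read();
    int b2 = in.read();
    int b3 = in.read();
    int b4 = in.read();
    if ((b1 | b2 | b3 | b4) < 0) {
      throw new EOFException();
    }
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
  }
}
```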

@steveniemitz (Contributor Author)

> I wish fixed-length value types could go on the stack; then this would be a no-brainer

Time for the C# runner? 🤣

@steveniemitz (Contributor Author)

Oh also, any thoughts on using the Guava Longs/Ints/Shorts.fromBytes methods here? I wasn't sure what the general stance is on using the shaded Guava in the core libraries.

@lukecwik (Member)

> Oh also, any thoughts on using the Guava Longs/Ints/Shorts.fromBytes methods here? I wasn't sure what the general stance is on using the shaded Guava in the core libraries.

It is totally fine to use shaded Guava internally. Just don't expose its types on the API surface of anything public.

@steveniemitz (Contributor Author)

> It is totally fine to use shaded Guava internally. Just don't expose its types on the API surface of anything public.

Cool, updated the Long one to use it at least.
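
As a rough illustration of that change (a sketch only; Beam core would reference its vendored Guava package rather than com.google.common directly, and the surrounding class is hypothetical):

```java
import com.google.common.primitives.Longs;

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class GuavaLongDecodeExample {
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    int off = 0;
    while (off < 8) {
      int n = in.read(buf, off, 8 - off);
      if (n < 0) {
        throw new EOFException();
      }
      off += n;
    }
    // Longs.fromBytes combines eight bytes, most significant first (big-endian).
    return Longs.fromBytes(buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7]);
  }
}
```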

@steveniemitz (Contributor Author)

Run Java_Examples_Dataflow PreCommit

@steveniemitz (Contributor Author)

Run Java PreCommit

@steveniemitz (Contributor Author)

Not sure what's going on with the precommit here; the failure seems unrelated (it's in a metrics test).

@steveniemitz (Contributor Author)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member) left a review comment

LGTM, can replace readFully with Guava's implementation
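
Roughly, that suggestion would look something like the following (a sketch, assuming Guava's ByteStreams.readFully and Longs.fromByteArray; the class and method names are hypothetical, not the PR's actual diff):

```java
import com.google.common.io.ByteStreams;
import com.google.common.primitives.Longs;

import java.io.IOException;
import java.io.InputStream;

final class ReadFullyExample {
  static long readBigEndianLong(InputStream in) throws IOException {
    byte[] buf = new byte[8];
    // ByteStreams.readFully loops until the buffer is filled (or throws EOFException),
    // replacing a hand-rolled readFully helper.
    ByteStreams.readFully(in, buf);
    // Longs.fromByteArray decodes the first 8 bytes as a big-endian long.
    return Longs.fromByteArray(buf);
  }
}
```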

@steveniemitz (Contributor Author)

Run Java PreCommit

@steveniemitz (Contributor Author)

Alright, I've given up trying to get this precommit working today. I'll give it another poke tomorrow.

@lukecwik (Member)

> Alright, I've given up trying to get this precommit working today. I'll give it another poke tomorrow.

I filed https://issues.apache.org/jira/browse/BEAM-14148 and started a rollback of the extremely flaky test in #17154

@lukecwik (Member)

Run Java PreCommit

4 similar comments
@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik (Member)

Run Java PreCommit

@lukecwik lukecwik merged commit 8cda8a2 into apache:master Mar 23, 2022