Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream #953

Closed
wants to merge 16 commits into from

theosib-amazon
Contributor

This PR is entirely a performance optimization. In benchmarking with Trino, we find that query performance improves by 5% to 15%, depending on the query, and that measurement includes all the I/O time from S3.

The main modification is to merge all of LittleEndianDataInputStream functionality into ByteBufferInputStream, which yields the following benefits:

  • Eliminates extra layers of abstraction and method call overhead
  • Enables the use of intrinsics for readInt, readLong, etc.
  • Makes faster access methods like readFully and skipFully available without the need for helper functions
  • Reduces some object creation in the performance-critical path
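To illustrate the intrinsics point, here is a minimal sketch using only JDK classes and assuming the data is already in a heap ByteBuffer. A layered stream assembles each little-endian int from four single-byte read() calls, whereas a buffer whose byte order is set to LITTLE_ENDIAN returns the value from one ByteBuffer.getInt call, which HotSpot generally compiles to a handful of machine instructions. The class and method names are hypothetical and are not the code in this PR.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not code from the PR.
final class LittleEndianReads {
  private LittleEndianReads() {}

  // Layered style: one virtual read() call per byte, then shift-and-or assembly,
  // similar in spirit to what a DataInputStream-style wrapper does.
  static int readIntLayered(InputStream in) throws IOException {
    int b0 = in.read();
    int b1 = in.read();
    int b2 = in.read();
    int b3 = in.read();
    if ((b0 | b1 | b2 | b3) < 0) {
      throw new EOFException(); // at least one read() hit end of stream
    }
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
  }

  // Merged style: the buffer decodes the little-endian int itself.
  static int readIntDirect(ByteBuffer buf) {
    return buf.getInt(); // relative read; advances position by 4
  }

  // Wraps bytes in a buffer configured for little-endian reads.
  static ByteBuffer littleEndian(byte[] bytes) {
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
  }
}
```

Both helpers return 0x12345678 for the bytes 78 56 34 12; the difference is how many calls and branches it takes to get there.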

This also includes and enables performance optimizations to:

  • ByteBitPackingValuesReader
  • PlainValuesReader
  • RunLengthBitPackingHybridDecoder

Context:
I've been working on improving Parquet reading performance in Trino, mostly by profiling while running performance benchmarks and TPCDS queries. This PR is a subset of the changes I made that have more than doubled the performance of a lot of TPCDS queries (wall clock time, including the S3 access time). If you are kind enough to accept these changes, I have more I would like to contribute.

…nputStream

Deprecated LittleEndianDataInputStream
Optimized performance of:
- ByteBitPackingValuesReader
- PlainValuesReader
- RunLengthBitPackingHybridDecoder
- readInt, readLong, and related methods
@theosib-amazon
Contributor Author

I forgot to add this as a comment in the code:
The reason PlainValuesReader still includes an unused LittleEndianDataInputStream member is that removing it makes the build fail with an incompatible API change error.

@Override
public void skipFully(long n) throws IOException {
  try {
    buffer.position(buffer.position() + (int) n);
Contributor

Looks like this is just trying to avoid the checks that are being done in skip. I don't think that's a good idea. This should delegate to skip instead.
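The reviewer's suggestion can be sketched as follows, assuming a stream that wraps a single ByteBuffer: skip() follows the java.io.InputStream.skip contract (skip nothing for n <= 0, never move past the end, return the count actually skipped), and skipFully() reuses it and throws EOFException on a short skip. DelegatingSkipStream is a hypothetical name, not Parquet's ByteBufferInputStream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical minimal stream over one ByteBuffer; not Parquet's ByteBufferInputStream.
class DelegatingSkipStream extends InputStream {
  private final ByteBuffer buffer;

  DelegatingSkipStream(ByteBuffer buffer) {
    this.buffer = buffer;
  }

  @Override
  public int read() {
    // Return the next byte as 0..255, or -1 at end of stream.
    return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
  }

  @Override
  public long skip(long n) {
    if (n <= 0) {
      return 0; // nothing to skip for zero or negative n
    }
    // Clamp to the bytes that remain so the position never passes the limit.
    int toSkip = (int) Math.min(n, buffer.remaining());
    buffer.position(buffer.position() + toSkip);
    return toSkip;
  }

  // Delegates all argument checking to skip(); fails if fewer than n bytes were skipped.
  // On a short skip the position has already advanced to the end of the buffer.
  public void skipFully(long n) throws IOException {
    long skipped = skip(n);
    if (skipped < n) {
      throw new EOFException("Tried to skip " + n + " bytes but only " + skipped + " were available");
    }
  }
}
```

The cost of this design is one extra call plus the Math.min and comparisons on every skip, which is exactly the overhead the author is trying to avoid on the hot path.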

Contributor Author

I did this specifically because it's performance-critical. I did a bunch of profiling, and skips are among the operations that have to have minimal overhead. Delegating to skip() would introduce a bunch of checks that the JIT isn't going to be smart enough to remove.
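One middle ground, sketched here under the assumption that the stream is backed by a single ByteBuffer, keeps the direct position update but guards it with two cheap comparisons. Without a guard, ByteBuffer.position(int) already rejects an out-of-range position, but it throws IllegalArgumentException rather than EOFException, and the (int) cast silently truncates any n above Integer.MAX_VALUE. FastSkip is a hypothetical helper, and treating a negative length as a caller bug is an assumption, not necessarily what this PR settled on.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

// Hypothetical helper illustrating a guarded direct skip; not the PR's code.
final class FastSkip {
  private FastSkip() {}

  static void skipFully(ByteBuffer buffer, long n) throws EOFException {
    if (n < 0) {
      // Assumption: a negative length indicates a caller bug.
      throw new IllegalArgumentException("Negative skip length: " + n);
    }
    int remaining = buffer.remaining();
    if (n > remaining) {
      throw new EOFException("Cannot skip " + n + " bytes; only " + remaining + " remain");
    }
    // Safe cast: 0 <= n <= remaining <= Integer.MAX_VALUE.
    buffer.position(buffer.position() + (int) n);
  }
}
```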

@@ -174,4 +248,63 @@ public boolean markSupported() {
public int available() {
  return buffer.remaining();
}

@Override
public byte readByte() throws IOException {
Contributor

The changes from here on out look like what you're really trying to do because we want to read directly from the stream. Can you remove all the other changes that aren't needed?

Contributor Author

I'm not sure what you're referring to. All of the methods beyond this point are absolutely necessary. We need to be able to read ints, longs, and similar values from the ByteBuffer, and this is the only way to get them.
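For reference, DataInput-style reads can be layered directly on a ByteBuffer with JDK methods only, as in this minimal sketch. It assumes little-endian data (Parquet's PLAIN encoding stores INT32 and INT64 values little-endian) and translates BufferUnderflowException into EOFException, keeping the original exception as the cause. BufferReads is a hypothetical name, not the PR's API.

```java
import java.io.EOFException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the PR's ByteBufferInputStream.
final class BufferReads {
  private final ByteBuffer buffer;

  BufferReads(ByteBuffer source) {
    // slice() shares the bytes but gives this reader its own position and byte order,
    // so the caller's buffer is left untouched.
    this.buffer = source.slice().order(ByteOrder.LITTLE_ENDIAN);
  }

  byte readByte() throws EOFException {
    try {
      return buffer.get();
    } catch (BufferUnderflowException e) {
      throw eof(e);
    }
  }

  int readInt() throws EOFException {
    try {
      return buffer.getInt(); // one 4-byte little-endian read, no per-byte assembly
    } catch (BufferUnderflowException e) {
      throw eof(e);
    }
  }

  long readLong() throws EOFException {
    try {
      return buffer.getLong();
    } catch (BufferUnderflowException e) {
      throw eof(e);
    }
  }

  void readFully(byte[] dst) throws EOFException {
    try {
      buffer.get(dst); // transfers nothing if fewer than dst.length bytes remain
    } catch (BufferUnderflowException e) {
      throw eof(e);
    }
  }

  // Keeps the underflow's message (often null) and records it as the cause.
  private static EOFException eof(BufferUnderflowException cause) {
    EOFException e = new EOFException(cause.getMessage());
    e.initCause(cause);
    return e;
  }
}
```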

Contributor

@rdblue rdblue left a comment

Overall I think most of these changes are good, but there are a few things that should be done before committing this:

  • Revert any unnecessary changes, like new constructors and style changes that are non-functional (e.g. using x++ instead of x += 1)
  • Separate the ByteBufferInputStream additions into a dedicated PR with tests
  • Make real changes to PlainValuesReader rather than keeping both input streams and changing the reference to in2
  • Update for project style

Made ByteBuffer exceptions more specific
Reverted whitespace change
Added blank line after control flow blocks (except in a few places where it would add a non-functional change to code I didn't edit).
@theosib-amazon
Contributor Author

theosib-amazon commented Apr 25, 2022

Thanks for reviewing my PR. I made all the cosmetic changes you asked for.

I'm not sure why you're asking to separate the ByteBufferInputStream additions into their own PR, since this PR is all about improving performance by moving functionality from LittleEndianDataInputStream into ByteBufferInputStream. The changes to PlainValuesReader rely on all of those changes.

The only reason I kept the reference to LittleEndianDataInputStream in PlainValuesReader is that otherwise the build fails with a compatibility break against 1.12.0. I'm going to go ahead and remove it anyway, in the hope that doing so doesn't cause a check failure.

Got rid of reference to LittleEndianByteBufferInputStream
@theosib-amazon
Contributor Author

Also, you mentioned tests. Since I'm not making any functional changes, I'm not sure what to test for. The new code should behave exactly as the old version, just a bit faster.
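Even without functional changes, an equivalence test can guard a refactor like this one: decode the same random bytes through the old path and the new path and require identical results. The sketch below uses JDK stand-ins (DataInputStream plus Integer.reverseBytes for the old layered path, a little-endian ByteBuffer for the new one) rather than the actual Parquet classes, whose exact signatures are not shown in this thread; EquivalenceCheck is a hypothetical name. Boundary cases, such as truncated input and the negative or oversized skip lengths discussed in the review, are natural additions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Random;

// Hypothetical equivalence check between an "old" and a "new" little-endian decode path.
final class EquivalenceCheck {
  private EquivalenceCheck() {}

  // Old-path stand-in: DataInputStream reads big-endian, so swap the bytes of each int.
  static int[] decodeOld(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int[] out = new int[data.length / 4];
    for (int i = 0; i < out.length; i++) {
      out[i] = Integer.reverseBytes(in.readInt());
    }
    return out;
  }

  // New-path stand-in: the buffer decodes little-endian ints directly.
  static int[] decodeNew(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    int[] out = new int[data.length / 4];
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getInt();
    }
    return out;
  }

  // Generates reproducible random input from a seed and compares both decodings.
  static boolean sameResults(long seed, int numInts) throws IOException {
    byte[] data = new byte[numInts * 4];
    new Random(seed).nextBytes(data);
    return Arrays.equals(decodeOld(data), decodeNew(data));
  }
}
```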

Modified skip() and skipFully() to handle negative and out-of-range arguments. Made EOF exceptions preserve any error message.