
Conversation

@charlesconnell (Contributor) commented Feb 19, 2025

Each time a block is decoded in HFileBlockDefaultDecodingContext, a new DecompressorStream is allocated and used. This is a lot of allocation, and the use of the streaming pattern requires copying every byte to be decompressed more times than necessary. Each byte is copied from a ByteBuff into a byte[], then decompressed, then copied back to a ByteBuff. For decompressors like org.apache.hadoop.hbase.io.compress.zstd.ZstdDecompressor that only operate on direct memory, two additional copies are introduced to move from a byte[] to a direct NIO ByteBuffer, then back to a byte[].

Aside from the copies inherent in the decompression algorithm, and the necessity of copying from a compressed buffer to an uncompressed buffer, all of these other copies can be avoided without sacrificing functionality. Along the way, we'll also avoid allocating objects.

In this PR:

  • Introduce the interface ByteBuffDecompressor, which does exactly what it sounds like (a rough sketch of its shape follows this list)
  • Provide a ZstdByteBuffDecompressor that uses zstd-jni underneath
    • This works when the input and output arguments are both direct SingleByteBuffs or both heap SingleByteBuffs.
    • I have a plan to improve zstd-jni so we can handle other combinations in HBase in the future.
  • The CodecPool now pools ByteBuffDecompressors the same way that it pools Decompressors.
  • When decoding an HFile block, if the decompressor supports decompression directly on the ByteBuffs, then take the new fast path.
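
For context, a rough sketch of the shape such an interface could take; the method names and signatures here are illustrative, not necessarily the exact ones in the patch:

import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.hbase.nio.ByteBuff;

// Illustrative sketch only; not necessarily the exact interface added in this PR.
public interface ByteBuffDecompressor extends Closeable {

  /**
   * Decompresses up to inputLen bytes from input into output, returning the number of
   * decompressed bytes written. The caller positions both buffers before the call.
   */
  int decompress(ByteBuff output, ByteBuff input, int inputLen) throws IOException;

  /**
   * Reports whether this implementation can decompress the given buffer combination,
   * e.g. both direct or both on-heap SingleByteBuffs.
   */
  boolean canDecompress(ByteBuff output, ByteBuff input);
}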

In a subsequent PR I plan to add glue so that any codec offering a org.apache.hadoop.io.compress.DirectDecompressor, which several in hadoop-common already do, can be used as a ByteBuffDecompressor.

I've already been using this code successfully in production at my company.

@charlesconnell charlesconnell changed the title ZStandard decompression can operate directly on ByteBuffs HBASE-29135: ZStandard decompression can operate directly on ByteBuffs Feb 19, 2025

@Apache9 (Contributor) commented Feb 19, 2025

Is the test failure related?

@charlesconnell (Contributor, Author) commented Feb 19, 2025

I don't think so. It works on my machine. I think it's just not able to get enough compute resources to complete within its timeout.

@charlesconnell (Contributor, Author):

I spoke too soon. The test works on my machine on a 2.6-based version of this branch that I've been testing with. It doesn't work on this branch.


@charlesconnell (Contributor, Author):

ZstdByteBuffDecompressor now supports the CanReinit interface, and the test passes.


import org.apache.yetus.audience.InterfaceAudience;
import org.apache.yetus.audience.InterfaceStability;

@InterfaceAudience.Public

Contributor:

What's the motivation for making these new classes public? I wonder whether private or limited private (with config exposure) is more appropriate.

@charlesconnell (Contributor, Author) commented Feb 24, 2025:

I thought that people might want to be able to write their own ByteBuffDecompressionCodecs outside the HBase source tree. But I'm not married to that. I'll change it to Private so it'll be easier to change in the future.
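
As a rough illustration of where this thread lands (the method shown is hypothetical; only the interface name comes from the discussion), the new codec interface ends up annotated Private rather than Public:

import org.apache.yetus.audience.InterfaceAudience;

// Hypothetical sketch of the agreed change: annotate the new interface as Private
// so its API can still evolve. The factory method is illustrative only.
@InterfaceAudience.Private
public interface ByteBuffDecompressionCodec {
  /** Creates a decompressor that can operate directly on ByteBuffs. */
  ByteBuffDecompressor createByteBuffDecompressor();
}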

@Override
public void close() {
  ctx.close();
  dict.close();

Contributor:

I think it's possible for this to produce an NPE.

Contributor Author:

Good point
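
A minimal null-safe sketch, assuming either ctx or dict may be null when close() is called:

@Override
public void close() {
  // Guard each resource: either may be null if it was never initialized.
  if (ctx != null) {
    ctx.close();
  }
  if (dict != null) {
    dict.close();
  }
}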

Comment on lines 47 to 54
while (decompressedBytesInBlock < decompressedBlockSize) {
  int compressedChunkSize = rawReadInt(input);
  compressedBytesConsumed += 4;
  int n = rawDecompressor.decompress(output, input, compressedChunkSize);
  compressedBytesConsumed += compressedChunkSize;
  decompressedBytesInBlock += n;
  totalDecompressedBytes += n;
}

Contributor:

Should we have some sort of check to bail out of the loop if RawDecompressor#decompress returns zero for some reason? Otherwise I think this logic would be stuck.

Contributor Author:

Good idea
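
One possible shape of that guard, sketched against the loop quoted above (the exception type and message are illustrative):

while (decompressedBytesInBlock < decompressedBlockSize) {
  int compressedChunkSize = rawReadInt(input);
  compressedBytesConsumed += 4;
  int n = rawDecompressor.decompress(output, input, compressedChunkSize);
  // Bail out instead of spinning forever if the decompressor makes no progress.
  if (n <= 0) {
    throw new IOException("Decompression made no progress; block may be corrupt");
  }
  compressedBytesConsumed += compressedChunkSize;
  decompressedBytesInBlock += n;
  totalDecompressedBytes += n;
}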

* Specification of a block-based decompressor, which can be more efficient than the stream-based
* Decompressor.
*/
@InterfaceAudience.Public

Contributor:

Any thoughts on also making this private?

Contributor Author:

done


@ndimiduk (Member) left a comment:

I have a couple of questions. Also, it would be better to have a basic unit test that verifies the happy path and the obvious unsupported paths.

  return totalDecompressedBytes;
}

private static int rawReadInt(ByteBuff input) {

Member:

I think that you don't need to implement this method. Instead, call ByteBuff#getInt(). It uses the Unsafe to read the full 4 bytes at once.

Contributor Author:

ByteBuff#getInt() assumes a system-dependent endian-ness, so its behavior is not totally deterministic. That's why I'm using my own method here.

Contributor Author:

(Also, since all the hardware I use is little-endian, ByteBuff#getInt() would actually read this format wrong.)
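
For illustration, an endian-explicit helper that always reads the 4-byte length as big-endian, regardless of the platform's native byte order, might look like this (a sketch, not necessarily the exact code in the patch):

private static int rawReadInt(ByteBuff input) {
  // Assemble the int byte by byte so the result is always interpreted as big-endian,
  // independent of the hardware's native byte order.
  int b1 = input.get() & 0xFF;
  int b2 = input.get() & 0xFF;
  int b3 = input.get() & 0xFF;
  int b4 = input.get() & 0xFF;
  return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
}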

Member:

Okay, makes sense. Please add a comment to the method noting this endian-specific implementation. Maybe in the future we'll update our ByteBuff utilities to account for specific endianness.

import org.apache.yetus.audience.InterfaceAudience;

/**
* Specification of a block-based decompressor, which can be more efficient than the stream-based

Member:

nit: is it "block-based", or "ByteBuff-based"? Nothing in the interface name or methods tells me that it's only decompressing a single serialised HFileBlock. Does it operate on a single block at a time, or can I provide it an inputLen that represents several blocks in the same input buffer?

Maybe all this is sort of assumed by the existing conventions in this package.

Contributor Author:

That comment is my bad. It should say "ByteBuff-based." I initially was using "block" in a vague way meaning the opposite of a stream.

ByteBuffDecompressor decompressor =
  CodecPool.getByteBuffDecompressor((ByteBuffDecompressionCodec) codec);
if (LOG.isTraceEnabled()) {
  LOG.trace("Retrieved decompressor " + decompressor + " from pool.");

Member:

nit: use Logger format string API instead.
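
For reference, the suggestion is the standard SLF4J parameterized form, which avoids the string concatenation and typically makes the isTraceEnabled() guard unnecessary for simple arguments:

LOG.trace("Retrieved decompressor {} from pool.", decompressor);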

public void returnByteBuffDecompressor(ByteBuffDecompressor decompressor) {
  if (decompressor != null) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Returning decompressor " + decompressor + " to pool.");

Member:

And here.

}
}

private boolean canFastDecompress(ByteBuff blockBufferWithoutHeader, ByteBuff onDiskBlock) {

Member:

"fast" is relative and will likely continue to change. Instead, can you use a more descriptive name for this alternative implementation. Maybe canDecompressViaByteBuffDecompressor?

@InterfaceAudience.Private
public class ZstdByteBuffDecompressor implements ByteBuffDecompressor, CanReinit {

  protected int dictId;

Member:

Why are these fields protected instead of private?

Contributor Author:

I'm following the convention established in ZstdDecompressor, but I don't need to be.

@charlesconnell (Contributor, Author):

I've added some unit tests.


@ndimiduk (Member) left a comment:

Thanks @charlesconnell, this looks pretty nice!

input.put(COMPRESSED_PAYLOAD);
input.rewind();
int decompressedSize = decompressor.decompress(output, input, COMPRESSED_PAYLOAD.length);
assertEquals("HBase is awesome", Bytes.toString(output.toBytes(0, decompressedSize)));

Member:

😆
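
As an aside, a minimal self-contained round trip of that same string through zstd-jni, bypassing the HBase codec classes entirely (class and variable names here are illustrative):

import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;

public class ZstdRoundTripSketch {
  public static void main(String[] args) {
    byte[] original = "HBase is awesome".getBytes(StandardCharsets.UTF_8);
    // Compress with default settings, then decompress back to the original length.
    byte[] compressed = Zstd.compress(original);
    byte[] decompressed = Zstd.decompress(compressed, original.length);
    System.out.println(new String(decompressed, StandardCharsets.UTF_8));
  }
}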

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 3m 47s master passed
+1 💚 compile 4m 30s master passed
+1 💚 checkstyle 1m 7s master passed
+1 💚 spotbugs 2m 30s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 2m 59s the patch passed
+1 💚 compile 4m 2s the patch passed
+1 💚 javac 4m 2s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 13s /results-checkstyle-hbase-common.txt hbase-common: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7)
+1 💚 spotbugs 2m 46s the patch passed
+1 💚 hadoopcheck 11m 51s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 43s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
44m 59s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6708/8/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6708
JIRA Issue HBASE-29135
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux f4d426a4ab56 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6d7b98f
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-common hbase-server hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6708/8/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for branch
+1 💚 mvninstall 2m 58s master passed
+1 💚 compile 1m 30s master passed
+1 💚 javadoc 0m 53s master passed
+1 💚 shadedjars 5m 50s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 5s the patch passed
+1 💚 compile 1m 30s the patch passed
+1 💚 javac 1m 30s the patch passed
+1 💚 javadoc 0m 54s the patch passed
+1 💚 shadedjars 5m 55s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 2m 15s hbase-common in the patch passed.
+1 💚 unit 215m 26s hbase-server in the patch passed.
+1 💚 unit 4m 9s hbase-compression-zstd in the patch passed.
250m 2s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6708/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6708
JIRA Issue HBASE-29135
Optional Tests javac javadoc unit compile shadedjars
uname Linux 0a7800f42d28 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6d7b98f
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6708/8/testReport/
Max. process+thread count 5423 (vs. ulimit of 30000)
modules C: hbase-common hbase-server hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6708/8/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk ndimiduk merged commit 13f174b into apache:master Feb 27, 2025
1 check passed
@ndimiduk ndimiduk deleted the HBASE-29135/zstd-fast-decompression branch February 27, 2025 14:56