HBASE-29135: ZStandard decompression can operate directly on ByteBuffs #6708
Conversation
Is the test failure related?

I spoke too soon. The test works on my machine on a 2.6-based version of this branch that I've been testing with. It doesn't work on this branch.
I'm now supporting the
import org.apache.yetus.audience.InterfaceAudience;
import org.apache.yetus.audience.InterfaceStability;

@InterfaceAudience.Public
What's the motivation for making these new classes public? I wonder whether private or limited private (with config exposure) is more appropriate.
I thought that people might want to be able to write their own ByteBuffDecompressionCodecs outside the HBase source tree. But I'm not married to that. I'll change it to Private so it'll be easier to change in the future.
@Override
public void close() {
  ctx.close();
  dict.close();
I think it's possible for this to produce an NPE.
Good point
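For illustration, a null-safe variant might look like the sketch below. The field names follow the snippet above; whether dict can actually be null when no dictionary is configured is an assumption, and the actual fix in the PR may differ.

```java
@Override
public void close() {
  ctx.close();
  // Assumption: dict is only initialized when a dictionary is configured, so it may be null.
  if (dict != null) {
    dict.close();
  }
}
```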
while (decompressedBytesInBlock < decompressedBlockSize) {
  int compressedChunkSize = rawReadInt(input);
  compressedBytesConsumed += 4;
  int n = rawDecompressor.decompress(output, input, compressedChunkSize);
  compressedBytesConsumed += compressedChunkSize;
  decompressedBytesInBlock += n;
  totalDecompressedBytes += n;
}
should we have some sort of check to bail out of the loop if RawDecompressor#decompress returns zero for some reason? Otherwise I think this logic would be stuck
Good idea
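One possible shape for that guard, sketched against the loop above; the exception type and message are illustrative assumptions, not necessarily what the PR ended up using.

```java
while (decompressedBytesInBlock < decompressedBlockSize) {
  int compressedChunkSize = rawReadInt(input);
  compressedBytesConsumed += 4;
  int n = rawDecompressor.decompress(output, input, compressedChunkSize);
  if (n <= 0) {
    // Bail out instead of spinning forever if the decompressor makes no progress.
    throw new IOException("Decompression made no progress; block may be corrupt");
  }
  compressedBytesConsumed += compressedChunkSize;
  decompressedBytesInBlock += n;
  totalDecompressedBytes += n;
}
```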
 * Specification of a block-based decompressor, which can be more efficient than the stream-based
 * Decompressor.
 */
@InterfaceAudience.Public
Any thoughts on also making this private?
done
ndimiduk left a comment:
I have a couple of questions. Also, it would be good to have a basic unit test that verifies the happy path and the obvious unsupported paths.
  return totalDecompressedBytes;
}

private static int rawReadInt(ByteBuff input) {
I think that you don't need to implement this method. Instead, call ByteBuff#getInt(). It uses the Unsafe to read the full 4 bytes at once.
ByteBuff#getInt() assumes a system-dependent endian-ness, so its behavior is not totally deterministic. That's why I'm using my own method here.
(also, since all hardware I use is little-endian, it actually reads this format wrong)
Okay makes sense. Please add a comment to the method that makes note of this endian-specific implementation. Maybe in the future we'll update our ByteBuff utilities to account for specific endian-ness.
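For context, an endian-explicit read can be built from four single-byte gets, which makes the big-endian framing explicit regardless of platform byte order. This is a sketch of the approach being discussed, not necessarily the exact code in the PR.

```java
// Reads a 4-byte big-endian int from the ByteBuff, independent of the platform's native
// byte order. ByteBuff#getInt() can go through Unsafe and follow the native byte order,
// which is why an explicit implementation is used here.
private static int rawReadInt(ByteBuff input) {
  int b1 = input.get() & 0xff;
  int b2 = input.get() & 0xff;
  int b3 = input.get() & 0xff;
  int b4 = input.get() & 0xff;
  return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
}
```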
import org.apache.yetus.audience.InterfaceAudience;

/**
 * Specification of a block-based decompressor, which can be more efficient than the stream-based
nit: is it "block-based", or "ByteBuff-based"? Nothing in the interface name or methods tells me that it's only decompressing a single serialised HFileBlock. Does it operate on a single block at a time, or can I provide it an inputLen that represents several blocks in the same input buffer?
Maybe all this is sort of assumed by the existing conventions in this package.
That comment is my bad. It should say "ByteBuff-based." I was initially using "block" in a vague way, meaning the opposite of a stream.
ByteBuffDecompressor decompressor =
  CodecPool.getByteBuffDecompressor((ByteBuffDecompressionCodec) codec);
if (LOG.isTraceEnabled()) {
  LOG.trace("Retrieved decompressor " + decompressor + " from pool.");
nit: use Logger format string API instead.
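For example, the SLF4J parameterized form avoids both the string concatenation and the isTraceEnabled() guard (illustrative only):

```java
LOG.trace("Retrieved decompressor {} from pool.", decompressor);
```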
public void returnByteBuffDecompressor(ByteBuffDecompressor decompressor) {
  if (decompressor != null) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Returning decompressor " + decompressor + " to pool.");
And here.
  }
}

private boolean canFastDecompress(ByteBuff blockBufferWithoutHeader, ByteBuff onDiskBlock) {
"fast" is relative and will likely continue to change. Instead, can you use a more descriptive name for this alternative implementation. Maybe canDecompressViaByteBuffDecompressor?
@InterfaceAudience.Private
public class ZstdByteBuffDecompressor implements ByteBuffDecompressor, CanReinit {

  protected int dictId;
Why are these fields protected instead of private?
I'm following the convention established in ZstdDecompressor, but I don't need to be.
I've added some unit tests.
ndimiduk left a comment:
Thanks @charlesconnell, this looks pretty nice!
input.put(COMPRESSED_PAYLOAD);
input.rewind();
int decompressedSize = decompressor.decompress(output, input, COMPRESSED_PAYLOAD.length);
assertEquals("HBase is awesome", Bytes.toString(output.toBytes(0, decompressedSize)));
😆
🎊 +1 overall
Each time a block is decoded in HFileBlockDefaultDecodingContext, a new DecompressorStream is allocated and used. This is a lot of allocation, and the use of the streaming pattern requires copying every byte to be decompressed more times than necessary. Each byte is copied from a ByteBuff into a byte[], then decompressed, then copied back to a ByteBuff. For decompressors like org.apache.hadoop.hbase.io.compress.zstd.ZstdDecompressor that only operate on direct memory, two additional copies are introduced to move from a byte[] to a direct NIO ByteBuffer, then back to a byte[]. Aside from the copies inherent in the decompression algorithm, and the necessity of copying from a compressed buffer to an uncompressed buffer, all of these other copies can be avoided without sacrificing functionality. Along the way, we'll also avoid allocating objects.
In this PR:
- A new interface, ByteBuffDecompressor, which does exactly what it sounds like.
- A new ZstdByteBuffDecompressor that uses zstd-jni underneath, which can operate when the input and output are both direct SingleByteBuffs or both heap SingleByteBuffs.
- CodecPool now pools ByteBuffDecompressors the same way that it pools Decompressors.
- The block decoding path detects when decompression can operate directly on the given ByteBuffs, then takes the new fast path.

In a subsequent PR I plan to add glue so that any codec offering an org.apache.hadoop.io.compress.DirectDecompressor, which several in hadoop-common already do, can be used as a ByteBuffDecompressor. I've already been using this code successfully in production at my company.
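To summarize the shape of the new API, here is a rough sketch of the ByteBuffDecompressor concept as inferred from the snippets in this review. The decompress signature matches the test snippet above, but the Closeable supertype, the Javadoc, and the exact interface members are assumptions.

```java
import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.hbase.nio.ByteBuff;

// Sketch only: decompress directly between ByteBuffs, avoiding the byte[] round-trips
// that the stream-based Decompressor API requires.
public interface ByteBuffDecompressor extends Closeable {

  /**
   * Decompresses inputLen bytes read from input and writes the result into output.
   * @return the number of decompressed bytes written into output
   */
  int decompress(ByteBuff output, ByteBuff input, int inputLen) throws IOException;
}
```

Callers would obtain an instance from the pool, as in the CodecPool snippet above, and return it with returnByteBuffDecompressor when finished.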