GH-3530: Bypass Hadoop codec abstraction to optimize compression performance #3570

Open
iemejia wants to merge 1 commit into apache:master from iemejia:parquet-perf-v2-par6-compression

@iemejia commented May 17, 2026

Part of #3530 (Apache Parquet Java Performance Improvements)

Summary

Bypass the Hadoop CompressionCodec abstraction for all six supported codecs, eliminating per-page codec-pool lookups, stream-wrapper allocation, and unnecessary buffer copies in both CodecFactory and DirectCodecFactory.

| Codec | Before | After |
|---|---|---|
| Snappy | Hadoop SnappyCodec stream wrappers | xerial Snappy.compress/uncompress direct calls |
| LZ4_RAW | Hadoop codec abstraction | airlift LZ4Compressor/LZ4Decompressor direct |
| ZSTD | Streaming ZstdOutputStreamNoFinalizer/ZstdInputStreamNoFinalizer | Reusable ZstdCompressCtx/ZstdDecompressCtx single-call APIs |
| GZIP | Hadoop GzipCodec with codec-pool overhead | JDK GZIPOutputStream/GZIPInputStream direct |
| LZO | GPL com.hadoop.compression.lzo.LzoCodec | aircompressor LzoHadoopStreams (Apache 2.0, wire-compatible) |
| Brotli | Abandoned brotli-codec (jbrotli, 2016, x86-only) | brotli4j 1.23.0 (10 platforms incl. aarch64, reflection-loaded) |

Notable side effects:

  • LZO: Removes GPL dependency; uses Apache 2.0 aircompressor. Wire-compatible framing.
  • Brotli: Enables aarch64 support (linux, macOS, Windows). Removes non-aarch64 Maven profile guards and test skips.

JMH benchmarks: CompressionBenchmark, CpuReadBenchmark, CpuWriteBenchmark, FileReadBenchmark, FileWriteBenchmark, ConcurrentReadWriteBenchmark.

Benchmark results

Environment: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

End-to-end file write (100K rows, SingleShotTime, ms/op, lower is better; each cell is before -> after):

| Codec | V1 dict=true | V2 dict=true | V2 Speedup |
|---|---|---|---|
| SNAPPY | 50.6 -> 40.9 (1.24x) | 69.7 -> 38.7 | 1.80x |
| ZSTD | 52.3 -> 43.6 (1.20x) | 70.7 -> 40.6 | 1.74x |
| LZ4_RAW | 49.6 -> 41.3 (1.20x) | 70.2 -> 39.0 | 1.80x |
| GZIP | 149.9 -> 119.3 (1.26x) | 123.4 -> 67.6 | 1.83x |
| BROTLI | 55.4 -> 46.8 (1.18x) | 72.8 -> 41.8 | 1.74x |

End-to-end file read (speedup in ms/op, higher is better):

| Codec | V1 Speedup | V2 Speedup |
|---|---|---|
| SNAPPY | 1.50x | 1.61x |
| ZSTD | 1.49x | 1.60x |
| LZ4_RAW | 1.23x | 1.57x |
| GZIP | 1.47x | 1.49x |
| BROTLI | 1.83x | 1.91x |

Raw codec throughput (DirectCodecFactory): Snappy, ZSTD, LZ4, and GZIP are unchanged because those paths already had native access. Brotli decompression improved 2.3-2.7x, since brotli4j is substantially faster than jbrotli.

V2 shows consistently larger speedups than V1 because V2 encoding produces more, smaller pages. More pages mean more codec invocations per file, so the per-invocation Hadoop overhead accumulated more heavily on the old path.

…n performance

Some of the Parquet compression codecs rely on Hadoop's CompressionCodec.
Performance tests that isolate CPU utilization show that the Hadoop
abstraction introduces considerable overhead.

This PR improves that for Snappy, LZ4_RAW, ZSTD, GZIP, LZO, and BROTLI.
It also migrates Brotli from jbrotli to brotli4j.

Bypass Hadoop CompressionCodec for Snappy (xerial JNI), LZ4_RAW (airlift),
ZSTD (zstd-jni), GZIP (JDK), LZO (airlift), and BROTLI (brotli4j) in both
CodecFactory and DirectCodecFactory, eliminating per-page codec pool lookups,
stream wrapper allocation, and unnecessary buffer copies.

ZSTD: replace streaming ZstdOutputStreamNoFinalizer/ZstdInputStreamNoFinalizer
with reusable ZstdCompressCtx/ZstdDecompressCtx single-call APIs.

GZIP: bypass Hadoop's GzipCodec and its codec-pool/stream-wrapper overhead
with direct JDK GZIPOutputStream/GZIPInputStream. Compression level is
read from the existing "zlib.compress.level" Hadoop configuration key.
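
A minimal sketch of the direct JDK path, using only java.util.zip. The class and method names (DirectGzipSketch, LeveledGzipOutputStream) are hypothetical and are not taken from this patch. The level is passed in as a plain int; how the "zlib.compress.level" value is mapped to that int is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DirectGzipSketch {

  /** GZIPOutputStream takes no level argument, so set it on the protected Deflater. */
  static final class LeveledGzipOutputStream extends GZIPOutputStream {
    LeveledGzipOutputStream(OutputStream out, int level) throws IOException {
      super(out);
      def.setLevel(level); // valid values: 0..9, or -1 for the default
    }
  }

  /** Compresses one page without any codec pool or Hadoop stream wrapper. */
  static byte[] compress(byte[] page, int level) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(Math.max(32, page.length / 2));
    try (GZIPOutputStream gz = new LeveledGzipOutputStream(sink, level)) {
      gz.write(page);
    } // close() finishes the deflate stream and writes the GZIP trailer
    return sink.toByteArray();
  }

  /** Decompresses one page whose uncompressed size is known from the page header. */
  static byte[] decompress(byte[] compressed, int uncompressedSize) throws IOException {
    byte[] page = new byte[uncompressedSize];
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      int off = 0;
      while (off < uncompressedSize) {
        int n = gz.read(page, off, uncompressedSize - off);
        if (n < 0) {
          throw new IOException("GZIP stream ended before the expected page size");
        }
        off += n;
      }
    }
    return page;
  }
}
```

The output is a standard GZIP stream, so pages written this way stay readable by any GZIP reader regardless of which path produced them.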

LZO: bypass the GPL-licensed com.hadoop.compression.lzo.LzoCodec entirely
using aircompressor's LzoHadoopStreams (Apache 2.0). The framing format
(big-endian length-prefixed blocks) is wire-compatible with Hadoop's LzoCodec,
so existing LZO Parquet files remain readable. Removes the GPL dependency
for LZO support. Uncomment previously disabled LZO benchmarks and tests.
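
To illustrate what "big-endian length-prefixed blocks" means, here is a rough sketch of one frame. The layout shown (a 4-byte uncompressed length, then a 4-byte chunk length, then the chunk bytes) is an assumption based on Hadoop's block compressor streams, not something read from this patch, and the class name is hypothetical. The chunk is treated as opaque bytes; no LZO compression is performed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixedFramingSketch {

  /** One parsed frame: the declared uncompressed size and the opaque chunk. */
  static final class Frame {
    final int uncompressedLength;
    final byte[] chunk;

    Frame(int uncompressedLength, byte[] chunk) {
      this.uncompressedLength = uncompressedLength;
      this.chunk = chunk;
    }
  }

  /** Writes [uncompressed length][chunk length][chunk bytes], lengths big-endian. */
  static byte[] write(Frame frame) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(sink); // writeInt is big-endian
    out.writeInt(frame.uncompressedLength);
    out.writeInt(frame.chunk.length);
    out.write(frame.chunk);
    out.flush();
    return sink.toByteArray();
  }

  /** Reads a single frame written by write(). */
  static Frame read(byte[] framed) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
    int uncompressedLength = in.readInt(); // big-endian
    int chunkLength = in.readInt();
    byte[] chunk = new byte[chunkLength];
    in.readFully(chunk); // throws EOFException on a truncated frame
    return new Frame(uncompressedLength, chunk);
  }
}
```

Because both sides agree only on these prefixes and on the LZO payload itself, a reader built on a different library can consume files written by the old codec as long as the prefixes match.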

BROTLI: migrate from abandoned brotli-codec (jbrotli, 2016, x86-only) to
brotli4j 1.23.0 (com.aayushatharva.brotli4j) which supports 10 platforms
including linux/darwin/windows aarch64. brotli4j is a runtime-only optional
dependency accessed via reflection (Encoder.compress and Decoder.decompress)
to avoid a compile-time dependency. Uses Decoder.decompress(byte[], int, int)
instead of DirectDecompress to avoid loading classes that reference Netty.
Remove non-aarch64 Maven profile guards and aarch64 test skips.

ByteBuffer decompressors use native APIs with a slice + manual position
advancement pattern (matching DirectCodecFactory.BaseDecompressor):
- Snappy: Snappy.uncompress(slice, slice)
- ZSTD: Zstd.decompress(slice, slice)
- LZ4_RAW: decompressor.decompress(slice, slice)
- GZIP: ByteBufferInputStream.wrap(slice) -> GZIPInputStream
- LZO: ByteBufferInputStream.wrap(slice) -> LzoHadoopInputStream
- BROTLI: byte[] copy through Decoder.decompress (no direct ByteBuffer API)
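
The slice + manual position advancement pattern can be sketched with JDK classes alone. This example substitutes java.util.zip.Inflater (raw zlib, Java 11+ ByteBuffer overloads) for the codecs listed above so that it runs without extra dependencies. The class and method names are hypothetical, and the real decompressors may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SliceDecompressSketch {

  /** Decompresses one page from input into output, then advances both buffers. */
  static void decompress(ByteBuffer input, int compressedSize,
                         ByteBuffer output, int uncompressedSize) throws DataFormatException {
    // Bounded views: the codec cannot read or write past this page,
    // and the callers' positions are untouched while it runs.
    ByteBuffer in = input.slice();
    in.limit(compressedSize);
    ByteBuffer out = output.slice();
    out.limit(uncompressedSize);

    Inflater inflater = new Inflater();
    try {
      inflater.setInput(in);
      while (out.hasRemaining() && !inflater.finished()) {
        int n = inflater.inflate(out);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new DataFormatException("Compressed page is truncated or unsupported");
        }
      }
      if (out.hasRemaining()) {
        throw new DataFormatException("Page is shorter than its declared size");
      }
    } finally {
      inflater.end();
    }

    // Manual advancement: move the original buffers past the consumed page.
    input.position(input.position() + compressedSize);
    output.position(output.position() + uncompressedSize);
  }

  /** Test helper: zlib-compresses a byte array. */
  static byte[] deflate(byte[] raw) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(raw);
      deflater.finish();
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        sink.write(buf, 0, deflater.deflate(buf));
      }
      return sink.toByteArray();
    } finally {
      deflater.end();
    }
  }
}
```

Working on slices keeps the outcome independent of how a given native API treats buffer positions: after the call, both buffers sit exactly at the page boundary, so consecutive pages in one buffer can be decoded back to back.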

Add BytesInput.toByteArray() zero-copy override in ByteArrayBytesInput.
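
A rough illustration of what a zero-copy toByteArray() override can look like. The classes below (Bytes, ArrayBytes) are hypothetical stand-ins for BytesInput and ByteArrayBytesInput, and the condition used for returning the backing array (the view covers the whole array) is an assumption, not a description of the actual change.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class ZeroCopyBytesSketch {

  /** Stand-in for a generic byte source; the default path copies through a stream. */
  abstract static class Bytes {
    abstract void writeAllTo(OutputStream out) throws IOException;

    abstract int size();

    byte[] toByteArray() throws IOException {
      ByteArrayOutputStream sink = new ByteArrayOutputStream(size());
      writeAllTo(sink);
      return sink.toByteArray();
    }
  }

  /** Stand-in for an array-backed source that can skip the copy. */
  static final class ArrayBytes extends Bytes {
    private final byte[] in;
    private final int offset;
    private final int length;

    ArrayBytes(byte[] in, int offset, int length) {
      this.in = in;
      this.offset = offset;
      this.length = length;
    }

    @Override
    void writeAllTo(OutputStream out) throws IOException {
      out.write(in, offset, length);
    }

    @Override
    int size() {
      return length;
    }

    @Override
    byte[] toByteArray() {
      if (offset == 0 && length == in.length) {
        return in; // zero-copy: the view is the whole backing array
      }
      return Arrays.copyOfRange(in, offset, offset + length); // one copy, no stream
    }
  }
}
```

The trade-off is aliasing: the returned array is the backing array itself, so callers must treat it as read-only.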

Add benchmarks: CompressionBenchmark, CpuReadBenchmark, CpuWriteBenchmark,
FileReadBenchmark, FileWriteBenchmark, InMemoryInputFile, InMemoryOutputFile,
ConcurrentReadWriteBenchmark. Remove encoding/row-group benchmarks.

Add 15 new tests in TestDirectCodecFactory, 3 new tests in TestBytesInput.