GH-3530: Optimize DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, and DELTA_BYTE_ARRAY encoding/decoding #3567
Open
iemejia wants to merge 1 commit into
This was referenced May 17, 2026
… and DELTA_BYTE_ARRAY encoding/decoding

DELTA_BINARY_PACKED reader:
- Cache BytePackerForLong instances (packerCache) to eliminate repeated factory lookups per mini block
- Add unpack32Values bulk method that processes 32 values per call instead of 8, reducing loop overhead
- Replace ByteBuffer miniBlockByteBuffer with byte[] to avoid ByteBuffer.slice() allocation per mini block and enable the faster byte[]-based packer APIs

DELTA_BINARY_PACKED integer writer:
- Cache BytePackerForLong instances (packerCache)
- Add pack32Values bulk packing method (32 values per call)

DELTA_BINARY_PACKED long writer:
- Cache BytePackerForLong instances (packerCache)
- Add pack32Values bulk packing method (32 values per call)

DELTA_BINARY_PACKED base writer:
- Remove unused 3-argument constructor

DELTA_LENGTH_BYTE_ARRAY writer:
- Remove LittleEndianDataOutputStream wrapper; write directly to CapacityByteArrayOutputStream via BytesUtils
- Add writeBytes(byte[],int,int) overload for direct byte array writes

DELTA_BYTE_ARRAY reader:
- Add ByteArraySliceOutputStream to eliminate temporary byte[] copies when materializing prefix+suffix in readBytes()

DELTA_BYTE_ARRAY writer:
- Use copy().getBytesUnsafe() and direct writeBytes(byte[],int,int) to avoid intermediate Binary allocations
- Use Arrays.mismatch for prefix length computation, which is JVM-intrinsified for SIMD acceleration

Test utilities:
- Remove unused writeInts method from Utils

JMH benchmarks:
- DeltaBinaryPackedEncodingBenchmark: INT32/INT64 scalar encode with SEQUENTIAL, RANDOM, LOW_CARDINALITY, HIGH_CARDINALITY data patterns
- DeltaBinaryPackedDecodingBenchmark: INT32/INT64 scalar decode
- DeltaByteArrayEncodingBenchmark: BINARY/FLBA scalar encode with RANDOM/SORTED data and varying string/fixed lengths
- DeltaByteArrayDecodingBenchmark: BINARY/FLBA scalar decode
- DeltaLengthByteArrayEncodingBenchmark: BINARY scalar encode with UNIFORM_LENGTH/VARIABLE_LENGTH distributions
- DeltaLengthByteArrayDecodingBenchmark: BINARY scalar decode
- LongDeltaDecodingBenchmark: INT64 decode with 5 bit-width patterns (SEQUENTIAL_DENSE, SEQUENTIAL_STRIDED, RANDOM_SMALL, RANDOM_WIDE, TIMESTAMP_MILLIS)
- Shared TestDataFactory for deterministic benchmark data generation
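For readers unfamiliar with the Arrays.mismatch item in the DELTA_BYTE_ARRAY writer notes, here is a minimal sketch of a prefix-length computation built on it. This is not the code from this PR: the class name PrefixLengthSketch and both method names are hypothetical, and the byte-by-byte loop is included only as a reference for comparison. The only library call assumed is java.util.Arrays.mismatch(byte[], byte[]), available since Java 9.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the writer code from the PR.
final class PrefixLengthSketch {

  // Length of the longest common prefix of two byte arrays.
  static int commonPrefixLength(byte[] previous, byte[] current) {
    // Arrays.mismatch returns the index of the first differing byte,
    // the shorter length if one array is a proper prefix of the other,
    // or -1 if both arrays are identical.
    int mismatch = Arrays.mismatch(previous, current);
    return mismatch < 0 ? previous.length : mismatch;
  }

  // Reference byte-by-byte loop that computes the same value.
  static int commonPrefixLengthLoop(byte[] previous, byte[] current) {
    int limit = Math.min(previous.length, current.length);
    int i = 0;
    while (i < limit && previous[i] == current[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    byte[] a = "parquet-java".getBytes(StandardCharsets.UTF_8);
    byte[] b = "parquet-format".getBytes(StandardCharsets.UTF_8);
    System.out.println(commonPrefixLength(a, b)); // prints 8 ("parquet-")
  }
}
```

Both methods return the same result; the commit message's stated reason for preferring Arrays.mismatch is that the JVM can intrinsify it.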
7aae9be to
e794ec2
Part of #3530 — Apache Parquet Java Performance Improvements
Summary
Optimize scalar hot paths for DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, and DELTA_BYTE_ARRAY.
- DELTA_BINARY_PACKED reader: Cache BytePackerForLong instances, add unpack32Values bulk method, replace ByteBuffer with reused byte[] for mini-block data.
- DELTA_BINARY_PACKED writers: Cache BytePackerForLong instances, add pack32Values.
- DELTA_LENGTH_BYTE_ARRAY writer: Remove LittleEndianDataOutputStream wrapper; write directly to CapacityByteArrayOutputStream via BytesUtils.
- DELTA_BYTE_ARRAY reader: ByteArraySliceOutputStream to eliminate temporary copies.
- DELTA_BYTE_ARRAY writer: Arrays.mismatch for SIMD prefix-length computation, direct writeBytes(byte[],int,int) to avoid Binary allocations.
- JMH benchmarks: DeltaBinaryPackedEncodingBenchmark, DeltaBinaryPackedDecodingBenchmark, DeltaByteArrayEncodingBenchmark, DeltaByteArrayDecodingBenchmark, DeltaLengthByteArrayEncodingBenchmark, DeltaLengthByteArrayDecodingBenchmark, LongDeltaDecodingBenchmark.

Benchmark results
Environment: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.
Key observations:
ByteArraySliceOutputStream eliminates per-value suffix byte[] allocation.
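To make the allocation point concrete, here is a rough sketch of rebuilding a DELTA_BYTE_ARRAY value without a temporary suffix array. It does not reproduce ByteArraySliceOutputStream, whose API is not shown on this page. The class PrefixSuffixSketch, the method materialize, and its parameters are all hypothetical, and the sketch assumes the suffix bytes are already available in a page-level byte[] at a known offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the reader code from the PR.
final class PrefixSuffixSketch {

  // Rebuilds a value from the first prefixLength bytes of the previous value
  // plus the suffix stored at [suffixOffset, suffixOffset + suffixLength) in page.
  // The suffix is copied straight from the page buffer into the result, so the
  // only array allocated per value is the result itself.
  static byte[] materialize(byte[] previous, int prefixLength,
                            byte[] page, int suffixOffset, int suffixLength) {
    byte[] value = new byte[prefixLength + suffixLength];
    System.arraycopy(previous, 0, value, 0, prefixLength);
    System.arraycopy(page, suffixOffset, value, prefixLength, suffixLength);
    return value;
  }

  public static void main(String[] args) {
    byte[] previous = "parquet-java".getBytes(StandardCharsets.UTF_8);
    // Two filler bytes, then the 6-byte suffix "format".
    byte[] page = "xxformat".getBytes(StandardCharsets.UTF_8);
    byte[] value = materialize(previous, 8, page, 2, 6);
    System.out.println(new String(value, StandardCharsets.UTF_8)); // prints parquet-format
  }
}
```

A version that first copied the suffix into its own byte[] and then concatenated would allocate two arrays per value; reading the suffix in place from the page buffer is the kind of saving the observation describes.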