
GH-3516: Optimize DeltaByteArrayWriter and DeltaLengthByteArrayValuesWriter (+33-55% encodeDeltaByteArray)#3517

Open
iemejia wants to merge 1 commit into apache:master from iemejia:perf-delta-bytearray-writer


@iemejia commented Apr 21, 2026

Summary

Resolves #3516.

Two related changes in the DELTA_BYTE_ARRAY write path:

1. DeltaLengthByteArrayValuesWriter: drop the unused LittleEndianDataOutputStream wrapper

The class wrapped its CapacityByteArrayOutputStream in a LittleEndianDataOutputStream that was used only by Binary.writeTo(). The wrapper added an extra layer of dispatch on every value while never using any of its little-endian functionality (writeInt/writeLong/etc.). Binary.writeTo(arrayOut) works directly with the underlying stream.
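A minimal sketch of why the wrapper was dead weight, using the JDK's java.io.DataOutputStream as a stand-in for a pass-through wrapper whose typed methods are never called. It is not Parquet's LittleEndianDataOutputStream, and WrapperDemo and its methods are illustrative names. Bulk writes produce the same bytes with or without the wrapper; the wrapper only adds a forwarding call per value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: java.io.DataOutputStream stands in for a wrapper whose
// typed methods (writeInt, writeLong, ...) are never used. It is NOT Parquet's
// LittleEndianDataOutputStream.
final class WrapperDemo {

  // Each value goes through the wrapper, which forwards to the sink.
  static byte[] viaWrapper(byte[][] values) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream wrapper = new DataOutputStream(sink);
    try {
      for (byte[] v : values) {
        wrapper.write(v, 0, v.length); // forwards to sink.write(v, 0, v.length)
      }
      wrapper.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Each value is written straight into the underlying buffer.
  static byte[] direct(byte[][] values) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    for (byte[] v : values) {
      sink.write(v, 0, v.length);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[][] values = {
      "foo".getBytes(StandardCharsets.US_ASCII), "barbaz".getBytes(StandardCharsets.US_ASCII)
    };
    // Prints true: for bulk writes the wrapper contributes no encoding behavior.
    System.out.println(Arrays.equals(viaWrapper(values), direct(values)));
  }
}
```

The output is identical; the only difference is the extra virtual call per value in the wrapped path.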

Also adds a new overload:

public void writeBytes(byte[] data, int offset, int length) {
  lengthWriter.writeInteger(length);
  arrayOut.write(data, offset, length);
}

so callers that already have the raw bytes can avoid allocating a Binary wrapper.
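To make the two entry points concrete, here is a simplified, hypothetical model of a DELTA_LENGTH_BYTE_ARRAY writer. SimpleDeltaLengthWriter and ByteView are illustrative names, not Parquet classes; ByteView plays the role of Binary. Lengths and value bytes go to separate buffers. The Parquet spec encodes the lengths with DELTA_BINARY_PACKED, which this sketch skips by storing plain ints.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Binary value: a view over a region of a byte array.
final class ByteView {
  final byte[] bytes;
  final int offset;
  final int length;

  ByteView(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical, simplified DELTA_LENGTH_BYTE_ARRAY writer. Lengths are kept as
// plain ints here (the real encoding packs them with DELTA_BINARY_PACKED).
final class SimpleDeltaLengthWriter {
  private final List<Integer> lengths = new ArrayList<>();
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();

  // Wrapper-based entry point: the caller must first allocate a ByteView.
  void writeBytes(ByteView v) {
    writeBytes(v.bytes, v.offset, v.length);
  }

  // Raw-bytes overload: record the length, then append the bytes; no wrapper object.
  void writeBytes(byte[] bytes, int offset, int length) {
    lengths.add(length);
    data.write(bytes, offset, length);
  }

  List<Integer> lengths() {
    return lengths;
  }

  byte[] data() {
    return data.toByteArray();
  }
}
```

With this shape, a caller holding (buf, 6, 5) can call writer.writeBytes(buf, 6, 5) directly instead of first allocating new ByteView(buf, 6, 5); both paths produce the same lengths and bytes.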

2. DeltaByteArrayWriter: eliminate per-value Binary.slice() allocation in the suffix path

Tightens the suffixWriter field type from ValuesWriter to DeltaLengthByteArrayValuesWriter (it's always constructed as one) so the new raw-bytes overload is callable. The suffix call becomes:

suffixWriter.writeBytes(vb, i, vb.length - i);

instead of suffixWriter.writeBytes(v.slice(i, vb.length - i)). This eliminates one ByteArraySliceBackedBinary allocation and one layer of virtual dispatch per value.
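The prefix/suffix split can be sketched as follows under the same simplifying assumptions. PrefixSuffixWriter and SuffixSink are hypothetical names, and prefix lengths are kept as plain ints rather than DELTA_BINARY_PACKED. Declaring suffixWriter with its concrete type is what makes the raw-bytes overload callable, mirroring the field-type change described above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical suffix sink with a raw-bytes overload (lengths + concatenated bytes).
final class SuffixSink {
  private final List<Integer> lengths = new ArrayList<>();
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();

  void writeBytes(byte[] bytes, int offset, int length) {
    lengths.add(length);
    data.write(bytes, offset, length);
  }

  List<Integer> lengths() {
    return lengths;
  }

  byte[] data() {
    return data.toByteArray();
  }
}

// Hypothetical, simplified DELTA_BYTE_ARRAY-style writer.
final class PrefixSuffixWriter {
  private final List<Integer> prefixLengths = new ArrayList<>();
  // Concrete type, not a supertype, so the (byte[], int, int) overload is visible.
  private final SuffixSink suffixWriter = new SuffixSink();
  // Assumes callers do not mutate arrays after passing them in.
  private byte[] previous = new byte[0];

  void writeBytes(byte[] vb) {
    // Length of the prefix shared with the previous value.
    int limit = Math.min(previous.length, vb.length);
    int i = 0;
    while (i < limit && previous[i] == vb[i]) {
      i++;
    }
    prefixLengths.add(i);
    // Pass the suffix as (array, offset, length): no per-value slice object.
    suffixWriter.writeBytes(vb, i, vb.length - i);
    previous = vb;
  }

  List<Integer> prefixLengths() {
    return prefixLengths;
  }

  SuffixSink suffixes() {
    return suffixWriter;
  }
}
```

For the inputs "apple", "applesauce", "apply", "banana", this records prefix lengths 0, 5, 4, 0 and suffixes "apple", "sauce", "y", "banana".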

Benchmark results

From BinaryEncodingBenchmark.encodeDeltaByteArray / encodeDeltaLengthByteArray (added in #3512):

| Benchmark | Configuration | master | this PR | Speedup |
|---|---|---|---|---|
| encodeDeltaByteArray | LOW card, len=10 | 0.1028 µs | 0.0662 µs | 1.55x |
| encodeDeltaByteArray | HIGH card, len=10 | 0.1704 µs | 0.1124 µs | 1.52x |
| encodeDeltaByteArray | LOW card, len=100 | 0.2079 µs | 0.1678 µs | 1.24x |
| encodeDeltaLengthByteArray | LOW card, len=10 | 0.0481 µs | 0.0397 µs | 1.21x |
| encodeDeltaLengthByteArray | LOW card, len=100 | 0.1503 µs | 0.1374 µs | 1.09x |

Long-string cases (not shown in the table) are flat or show only trivial gains: when each value is hundreds of bytes, the per-value allocation cost is amortized away.
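As a check on the table, the speedup column is the ratio of the two per-operation times. The short sketch below recomputes it. It also assumes, without confirmation from the benchmark output, that the percentages quoted in the title and commit message are throughput gains, i.e. (speedup - 1) * 100; Speedup is an illustrative class name.

```java
// Recomputes the speedup column from the reported per-operation times.
// Assumption: quoted percentages are throughput gains, (speedup - 1) * 100.
final class Speedup {
  static double speedup(double masterMicros, double prMicros) {
    return masterMicros / prMicros;
  }

  static double throughputGainPercent(double masterMicros, double prMicros) {
    return (speedup(masterMicros, prMicros) - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    // encodeDeltaByteArray, LOW card, len=10: 0.1028 us on master, 0.0662 us with this PR
    System.out.printf(
        "%.2fx, +%.0f%%%n", speedup(0.1028, 0.0662), throughputGainPercent(0.1028, 0.0662));
  }
}
```

Running main prints "1.55x, +55%", and the encodeDeltaLengthByteArray LOW card, len=10 row (0.0481 / 0.0397) works out to about +21%, consistent with the upper ends of the ranges in the commit message.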

How to reproduce

The JMH benchmarks cited above are being added to parquet-benchmarks in #3512. Once that lands, reproduce with:

./mvnw clean package -pl parquet-benchmarks -DskipTests \
    -Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true
java -jar parquet-benchmarks/target/parquet-benchmarks.jar \
    'BinaryEncodingBenchmark.encodeDeltaByteArray|BinaryEncodingBenchmark.encodeDeltaLengthByteArray' \
    -wi 5 -i 10 -f 3

Run the benchmarks on master (baseline) and on this branch (optimized), and compare the results.

Validation

  • parquet-column: 573 tests pass
  • Built with -Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true

User-facing changes

None. No existing public API is changed or removed, and there is no file format change.

The only API addition is the new DeltaLengthByteArrayValuesWriter.writeBytes(byte[], int, int) overload, which sits alongside the existing public methods.

Closes #3516

Part of a small series of focused performance PRs from work in parquet-perf. Previous: #3494, #3496, #3500, #3504, #3506, #3510, #3514. Companion benchmarks contribution: #3512.

…luesWriter

Two related changes in the DELTA_BYTE_ARRAY write path:

1. DeltaLengthByteArrayValuesWriter: drop the unused LittleEndianDataOutputStream
   wrapper. Binary.writeTo(arrayOut) works directly with the underlying
   CapacityByteArrayOutputStream; the LE wrapper added an extra layer of
   dispatch on every value but never used any LE functionality
   (writeInt/writeLong/etc.). Add a new writeBytes(byte[], int, int) overload
   so callers that already have the raw bytes can avoid allocating a Binary
   wrapper.

2. DeltaByteArrayWriter: tighten suffixWriter field type to
   DeltaLengthByteArrayValuesWriter (it's always constructed as one) so the
   new writeBytes(byte[], int, int) overload is callable. Replace the suffix
   call with the raw-bytes overload, eliminating the per-value Binary.slice()
   allocation.

Benchmark results (BinaryEncodingBenchmark.encodeDeltaByteArray and
encodeDeltaLengthByteArray, added in apache#3512):

  - encodeDeltaByteArray (LOW card, len=10):        +33% to +55%
  - encodeDeltaLengthByteArray (LOW card, len=10):  +18% to +21%
  - long-string cases: flat (per-value alloc amortized away)

No breaking public API change (the new overload is additive). No file format change.

Validation: parquet-column 573 tests pass. Built with
-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true.
Linked issue that this pull request may close: Optimize DeltaByteArrayWriter and DeltaLengthByteArrayValuesWriter: remove per-value allocation and LittleEndianDataOutputStream wrapper
