Optimize LittleEndianDataOutputStream.writeInt/writeShort with bulk write (resolves existing TODO) #3518

@iemejia

Background

LittleEndianDataOutputStream.writeInt(int) and writeShort(int) decompose the value byte-by-byte and call out.write(int) for each byte:

public final void writeInt(int v) throws IOException {
  out.write((v >>> 0) & 0xFF);
  out.write((v >>> 8) & 0xFF);
  out.write((v >>> 16) & 0xFF);
  out.write((v >>> 24) & 0xFF);
}

When the underlying stream is CapacityByteArrayOutputStream (the typical case in Parquet writers), each out.write(int) performs a hasRemaining check, a Math.addExact for the new size, possibly a slab-grow check, and a single-byte store. For writeInt, that's 4 separate trips through the bookkeeping.
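To make the call-count difference concrete, here is a minimal sketch that counts how many times the underlying stream is entered for one int. CallCountingStream and WriteCallDemo are hypothetical names invented for this illustration; the counting stream only stands in for a stream with per-call bookkeeping and is not CapacityByteArrayOutputStream.

```java
import java.io.OutputStream;

// Hypothetical stand-in for a stream that does bookkeeping on every call.
// It stores nothing; it only counts how often it is entered.
final class CallCountingStream extends OutputStream {
  int calls; // number of write(...) invocations that reached the stream
  int bytes; // total number of bytes passed in

  @Override
  public void write(int b) {
    calls++;
    bytes++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    calls++;
    bytes += len;
  }
}

final class WriteCallDemo {
  // Byte-by-byte little-endian write, shaped like the current writeInt.
  static void writeIntPerByte(CallCountingStream out, int v) {
    out.write((v >>> 0) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 24) & 0xFF);
  }

  // Bulk little-endian write through a reusable scratch buffer.
  static void writeIntBulk(CallCountingStream out, byte[] scratch, int v) {
    scratch[0] = (byte) (v >>> 0);
    scratch[1] = (byte) (v >>> 8);
    scratch[2] = (byte) (v >>> 16);
    scratch[3] = (byte) (v >>> 24);
    out.write(scratch, 0, 4);
  }
}
```

Both methods hand the same four bytes to the stream; only the number of entries into the stream differs.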

The class already has the right pattern in writeLong: build a writeBuffer[] and emit a single out.write(writeBuffer, 0, 8). The buffer is even pre-allocated for that purpose. writeInt and writeShort just don't use it.

There's a TODO comment in writeInt (lines 147–149 in current master) acknowledging this:

// TODO: see note in LittleEndianDataInputStream: maybe faster
// to use Integer.reverseBytes() and then writeInt, or a ByteBuffer
// approach
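For reference, the two alternatives named in that TODO can be sketched with standard JDK classes only. TodoAlternatives is a hypothetical class name, and neither method is taken from the Parquet source; the sketch only shows that both routes yield the same little-endian byte order as the shift-based code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TodoAlternatives {
  // Reverse the byte order first, then let DataOutputStream emit the
  // value in its usual big-endian order. The net result is little-endian.
  static byte[] viaReverseBytes(int v) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DataOutputStream data = new DataOutputStream(sink)) {
      data.writeInt(Integer.reverseBytes(v));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Let a little-endian ByteBuffer lay out the four bytes.
  static byte[] viaByteBuffer(int v) {
    return ByteBuffer.allocate(4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putInt(v)
        .array();
  }
}
```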

Proposal

Extend the existing writeBuffer[] pattern to writeInt and writeShort:

public final void writeInt(int v) throws IOException {
  writeBuffer[0] = (byte) (v >>> 0);
  writeBuffer[1] = (byte) (v >>> 8);
  writeBuffer[2] = (byte) (v >>> 16);
  writeBuffer[3] = (byte) (v >>> 24);
  out.write(writeBuffer, 0, 4);
}

This collapses four write(int) calls into one write(byte[], int, int) call, so the per-call bookkeeping runs once per int instead of four times (roughly a 4x reduction). It matches the existing writeLong pattern in the same file.

Resolves the existing TODO in the source.
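The writeShort counterpart is not spelled out above. A minimal sketch of how it could look, next to writeInt, follows. It assumes the same shared scratch buffer; LittleEndianSketch is a hypothetical stand-alone class that writes into a ByteArrayOutputStream so the result can be inspected, not the actual Parquet class.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; the real class wraps an arbitrary OutputStream.
final class LittleEndianSketch {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  // Shared scratch buffer, sized for the largest primitive (a long).
  private final byte[] writeBuffer = new byte[8];

  // Writes the low 16 bits of v, least significant byte first, in one call.
  void writeShort(int v) {
    writeBuffer[0] = (byte) (v >>> 0);
    writeBuffer[1] = (byte) (v >>> 8);
    out.write(writeBuffer, 0, 2);
  }

  // Writes all 32 bits of v, least significant byte first, in one call.
  void writeInt(int v) {
    writeBuffer[0] = (byte) (v >>> 0);
    writeBuffer[1] = (byte) (v >>> 8);
    writeBuffer[2] = (byte) (v >>> 16);
    writeBuffer[3] = (byte) (v >>> 24);
    out.write(writeBuffer, 0, 4);
  }

  byte[] toByteArray() {
    return out.toByteArray();
  }
}
```

Because the scratch buffer is shared instance state, this pattern is no more thread-safe than the existing writeLong, which already relies on the same buffer.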

Expected impact

Standalone JMH benchmark of the class:

  • IntEncodingBenchmark.encodePlain (when routed through LittleEndianDataOutputStream): roughly +35% throughput (~20.9M → ~28.2M ops/s)

Note on context

PR #3496 deprecates LittleEndianDataOutputStream because Parquet's own writers no longer use it (they write directly into ByteBuffer-backed slabs, which compiles to a single intrinsic store on little-endian architectures and is strictly faster than any wrapper).

This change therefore benefits external callers only: any third-party Parquet-format producer still using the class gets the speedup until it migrates. The change is minimal (~10 lines), easy to verify as correct because it matches the existing writeLong pattern, and resolves a long-standing TODO in the source.

Files affected

  • parquet-common/src/main/java/org/apache/parquet/bytes/LittleEndianDataOutputStream.java

No public API change.
