
GH-3518: Bulk write in LittleEndianDataOutputStream.writeInt/writeShort (+35% encodePlain when used)#3519

Closed
iemejia wants to merge 1 commit into apache:master from iemejia:perf-le-output-bulk

@iemejia iemejia commented Apr 21, 2026

Summary

Resolves #3518.

LittleEndianDataOutputStream.writeInt(int) and writeShort(int) decompose the value byte-by-byte and call out.write(int) for each byte:

public final void writeInt(int v) throws IOException {
  out.write((v >>> 0) & 0xFF);
  out.write((v >>> 8) & 0xFF);
  out.write((v >>> 16) & 0xFF);
  out.write((v >>> 24) & 0xFF);
}

When the underlying stream is CapacityByteArrayOutputStream (the typical case in Parquet writers), each out.write(int) performs a hasRemaining check, a Math.addExact, possibly a slab-grow check, and a single-byte store. For writeInt, that is four trips through this bookkeeping per value; for writeShort, two.
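To make the call-count difference concrete, here is a minimal sketch that counts how many times the underlying stream is entered per value. CallCountingSink and its helper methods are hypothetical names invented for this illustration; the sink only counts calls and does not model CapacityByteArrayOutputStream's actual bookkeeping.

```java
import java.io.OutputStream;

// Hypothetical sink that only counts calls; it stands in for the real
// underlying stream and models none of its slab logic.
final class CallCountingSink extends OutputStream {
  int calls = 0;
  long bytes = 0;

  @Override
  public void write(int b) {
    calls++;
    bytes++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    calls++;
    bytes += len;
  }

  // Per-byte form: one call into the sink for each of the four bytes.
  static void writeIntPerByte(CallCountingSink out, int v) {
    out.write((v >>> 0) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 24) & 0xFF);
  }

  // Bulk form: fill a reusable scratch buffer, then make a single call.
  static void writeIntBulk(CallCountingSink out, byte[] scratch, int v) {
    scratch[0] = (byte) v;
    scratch[1] = (byte) (v >>> 8);
    scratch[2] = (byte) (v >>> 16);
    scratch[3] = (byte) (v >>> 24);
    out.write(scratch, 0, 4);
  }

  // Number of calls into the sink needed to write `values` ints.
  static int callsFor(boolean bulk, int values) {
    CallCountingSink sink = new CallCountingSink();
    byte[] scratch = new byte[8];
    for (int i = 0; i < values; i++) {
      if (bulk) {
        writeIntBulk(sink, scratch, i);
      } else {
        writeIntPerByte(sink, i);
      }
    }
    return sink.calls;
  }

  public static void main(String[] args) {
    System.out.println("per-byte calls: " + callsFor(false, 1000));
    System.out.println("bulk calls:     " + callsFor(true, 1000));
  }
}
```

Both forms hand the same number of bytes to the sink; only the number of calls differs, which is where any per-call bookkeeping cost would be paid.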

The class already has the right pattern in writeLong: build the writeBuffer[] and emit a single out.write(writeBuffer, 0, 8). The buffer is even pre-allocated for that purpose. This PR extends the same pattern to writeInt and writeShort.
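A rough sketch of what the bulk pattern looks like for writeInt and writeShort follows. BulkLittleEndianWriter is a hypothetical stand-in written for this illustration, not the actual Parquet source; only the writeBuffer field name and the out.write(writeBuffer, 0, N) shape are taken from the description above, and the encodeInt/encodeShort helpers are added purely for demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the class discussed above; not the Parquet source.
final class BulkLittleEndianWriter {
  private final OutputStream out;
  // Scratch buffer sized for the widest value (a long), reused across calls.
  private final byte[] writeBuffer = new byte[8];

  BulkLittleEndianWriter(OutputStream out) {
    this.out = out;
  }

  // Low byte first, then one bulk write of 2 bytes.
  void writeShort(int v) throws IOException {
    writeBuffer[0] = (byte) v;
    writeBuffer[1] = (byte) (v >>> 8);
    out.write(writeBuffer, 0, 2);
  }

  // Low byte first, then one bulk write of 4 bytes.
  void writeInt(int v) throws IOException {
    writeBuffer[0] = (byte) v;
    writeBuffer[1] = (byte) (v >>> 8);
    writeBuffer[2] = (byte) (v >>> 16);
    writeBuffer[3] = (byte) (v >>> 24);
    out.write(writeBuffer, 0, 4);
  }

  // Demonstration helper: encode one int into a fresh byte array.
  static byte[] encodeInt(int v) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(4);
    try {
      new BulkLittleEndianWriter(sink).writeInt(v);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Demonstration helper: encode the low 16 bits of v into a fresh byte array.
  static byte[] encodeShort(int v) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(2);
    try {
      new BulkLittleEndianWriter(sink).writeShort(v);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    for (byte b : encodeInt(0x11223344)) {
      System.out.printf("%02x ", b);
    }
    System.out.println();
  }
}
```

Because the scratch buffer is shared instance state, this sketch is not safe for concurrent writers on the same instance.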

This also resolves the long-standing TODO in writeInt:

// TODO: see note in LittleEndianDataInputStream: maybe faster
// to use Integer.reverseBytes() and then writeInt, or a ByteBuffer
// approach
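For comparison, the first alternative named in that TODO can be sketched with the JDK's big-endian DataOutputStream: reversing the bytes and then writing big-endian yields little-endian output. This is an illustration of that idea only, not what this PR implements, and ReverseBytesSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the Integer.reverseBytes() idea from the TODO.
final class ReverseBytesSketch {
  // DataOutputStream.writeInt emits the high byte first (big-endian), so
  // feeding it the byte-reversed value produces little-endian bytes.
  static byte[] littleEndianViaReverseBytes(int v) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(4);
    try (DataOutputStream data = new DataOutputStream(sink)) {
      data.writeInt(Integer.reverseBytes(v));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    for (byte b : littleEndianViaReverseBytes(0x11223344)) {
      System.out.printf("%02x ", b);
    }
    System.out.println();
  }
}
```

Whether this route is faster depends on how the wrapped writeInt is implemented, so it is shown here for equivalence of output rather than as a performance recommendation.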

Benchmark

IntEncodingBenchmark.encodePlain when routed through LittleEndianDataOutputStream:

master:  ~20.9M ops/s
this PR: ~28.2M ops/s   (+35%)

Note on context

PR #3496 deprecates LittleEndianDataOutputStream because Parquet's own writers no longer use it (they write directly into ByteBuffer-backed slabs, which compiles to a single intrinsic store on little-endian platforms and is strictly faster than any wrapper).
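As a rough illustration of the direct approach, the sketch below writes ints into a little-endian ByteBuffer with putInt. It uses a plain heap buffer as a stand-in; Parquet's actual slab management is not shown, and LittleEndianSlabSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for writing into a ByteBuffer-backed slab.
final class LittleEndianSlabSketch {
  // Encode each int as 4 little-endian bytes, with no stream wrapper involved.
  static byte[] encode(int[] values) {
    ByteBuffer slab = ByteBuffer
        .allocate(values.length * Integer.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    for (int v : values) {
      slab.putInt(v);
    }
    // allocate() returns a heap buffer, so the backing array is accessible.
    return slab.array();
  }

  public static void main(String[] args) {
    for (byte b : encode(new int[] {1, 0x11223344})) {
      System.out.printf("%02x ", b);
    }
    System.out.println();
  }
}
```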

After #3496 lands, no Parquet code in any module instantiates LittleEndianDataOutputStream. This PR therefore benefits external Parquet-format producers that still use the class — they get the speedup until they migrate. The change is minimal (~10 lines), obviously correct (matches the existing writeLong pattern in the same file), and resolves the existing TODO.

If the maintainers prefer to leave a deprecated class untouched, this PR is easy to drop. I'm flagging the option because the change is small enough that it costs almost nothing to land and helps anyone outside the Parquet codebase still on the class.

Validation

  • parquet-common: 308 tests pass
  • Built with -Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true

User-facing changes

None. No public API change. Behavior of writeInt, writeShort, and writeLong is identical bit-for-bit.
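One way to spot-check the bit-for-bit claim is to compare the per-byte and bulk encodings over a few edge values. The class below is a hypothetical sketch, not part of this PR's test suite, and it assumes the original writeShort emits the low byte and then the high byte, in the same style as the writeInt shown earlier.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical equivalence check; not taken from the PR's tests.
final class LittleEndianEquivalenceCheck {
  static byte[] intPerByte(int v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(4);
    out.write((v >>> 0) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 24) & 0xFF);
    return out.toByteArray();
  }

  static byte[] intBulk(int v) {
    byte[] buf = new byte[8];
    buf[0] = (byte) v;
    buf[1] = (byte) (v >>> 8);
    buf[2] = (byte) (v >>> 16);
    buf[3] = (byte) (v >>> 24);
    ByteArrayOutputStream out = new ByteArrayOutputStream(4);
    out.write(buf, 0, 4);
    return out.toByteArray();
  }

  // Assumed per-byte shape of the original writeShort: low byte, then high byte.
  static byte[] shortPerByte(int v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(2);
    out.write((v >>> 0) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    return out.toByteArray();
  }

  static byte[] shortBulk(int v) {
    byte[] buf = new byte[8];
    buf[0] = (byte) v;
    buf[1] = (byte) (v >>> 8);
    ByteArrayOutputStream out = new ByteArrayOutputStream(2);
    out.write(buf, 0, 2);
    return out.toByteArray();
  }

  // True when both forms agree for every sample value.
  static boolean allMatch() {
    int[] samples = {0, 1, -1, 0x80, 0xFF00, 0x11223344,
        Integer.MIN_VALUE, Integer.MAX_VALUE};
    for (int v : samples) {
      if (!Arrays.equals(intPerByte(v), intBulk(v))) {
        return false;
      }
      if (!Arrays.equals(shortPerByte(v), shortBulk(v))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("encodings match: " + allMatch());
  }
}
```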

Closes #3518

Part of a small series of focused performance PRs from work in parquet-perf. Previous: #3494, #3496, #3500, #3504, #3506, #3510, #3514, #3517. Companion benchmarks contribution: #3512.

…iteShort

Replace per-byte out.write(int) calls with a single out.write(byte[], 0, N)
using the existing writeBuffer[] field, matching the pattern already used
by writeLong. For writeInt this collapses 4 bookkeeping trips through the
underlying stream (hasRemaining check, Math.addExact, slab-grow check,
single-byte store) into 1.

Resolves the long-standing TODO comment in writeInt that flagged this as
a potential improvement.

Benchmark (IntEncodingBenchmark.encodePlain when routed through
LittleEndianDataOutputStream):
  ~20.9M -> ~28.2M ops/s (+35%)

Note: PR apache#3496 deprecates this class because Parquet's own writers no
longer use it. This change benefits external Parquet-format producers
that still use the class until they migrate.

Validation: parquet-common 308 tests pass. Built with
-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true.

iemejia commented Apr 21, 2026

Folding this change into #3496 (the PR that deprecates LittleEndianDataOutputStream) since both touch the same class and are best reviewed together.


Development

Successfully merging this pull request may close these issues.

Optimize LittleEndianDataOutputStream.writeInt/writeShort with bulk write (resolves existing TODO)
