move readLong, readInt, readShort from ByteStringParser to ByteString so they can be optimized #2847

Merged
pjfanning merged 12 commits into apache:main from pjfanning:copilot/update-swarutil-support-short
Apr 8, 2026

pjfanning (Member) commented Apr 6, 2026

Summary

This PR implements the following changes:

SWARUtil

  • Added getShort(array, index, ByteOrder) method using VarHandle byte array views for performance, with fallback implementations getShortBEWithoutMethodHandle and getShortLEWithoutMethodHandle
  • Added shortBeArrayView and shortLeArrayView VarHandle fields
  • Expanded the class-level Scaladoc to document where the JDK itself uses MethodHandles.byteArrayViewVarHandle (in jdk.internal.util.ByteArray and jdk.internal.util.ByteArrayLittleEndian, backing java.io.DataInputStream and java.util.UUID) and to explain the performance benefits: a single native load instruction with JIT intrinsification, a consolidated bounds check, no alignment requirement, and SWAR arithmetic
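
As a rough illustration of the SWARUtil changes listed above, the following is a minimal sketch of a VarHandle-based short read next to a shift-based fallback. The object name `ShortReadSketch` and the fallback method names are hypothetical; only `MethodHandles.byteArrayViewVarHandle` and `VarHandle.get` are real JDK APIs, and the actual SWARUtil code may be structured differently.

```scala
import java.lang.invoke.{ MethodHandles, VarHandle }
import java.nio.ByteOrder

// Hypothetical stand-in for SWARUtil; names and structure are assumptions.
object ShortReadSketch {
  // Views a byte[] as shorts in the given byte order; plain get() works at any index.
  private val shortBeView: VarHandle =
    MethodHandles.byteArrayViewVarHandle(classOf[Array[Short]], ByteOrder.BIG_ENDIAN)
  private val shortLeView: VarHandle =
    MethodHandles.byteArrayViewVarHandle(classOf[Array[Short]], ByteOrder.LITTLE_ENDIAN)

  // One VarHandle read instead of two separate byte loads.
  def getShort(array: Array[Byte], index: Int, byteOrder: ByteOrder): Short =
    if (byteOrder == ByteOrder.BIG_ENDIAN) shortBeView.get(array, index).asInstanceOf[Short]
    else shortLeView.get(array, index).asInstanceOf[Short]

  // Shift-based fallback, big-endian: the first byte is the high byte.
  def getShortBEFallback(array: Array[Byte], index: Int): Short =
    (((array(index) & 0xFF) << 8) | (array(index + 1) & 0xFF)).toShort

  // Shift-based fallback, little-endian: the first byte is the low byte.
  def getShortLEFallback(array: Array[Byte], index: Int): Short =
    ((array(index) & 0xFF) | ((array(index + 1) & 0xFF) << 8)).toShort
}
```

Both paths should agree for every index at which two bytes are available, which is the property the SWARUtilSpec tests mentioned below are described as covering.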

ByteString

  • Added readShortBE(offset), readShortLE(offset), readIntBE(offset), readIntLE(offset), readLongBE(offset), readLongLE(offset) public methods to the ByteString abstract class
  • All public methods throw IndexOutOfBoundsException if offset is negative or there are insufficient bytes, via a shared checkReadBounds helper
  • All public methods include @throws[IndexOutOfBoundsException] Javadoc and are marked @since 2.0.0
  • Added corresponding private[pekko] unchecked variants (readShortBEUnchecked, readShortLEUnchecked, readIntBEUnchecked, readIntLEUnchecked, readLongBEUnchecked, readLongLEUnchecked) that skip the bounds check; the public methods delegate to these after checking bounds
  • ByteString1C and ByteString1 override the unchecked variants with direct SWARUtil calls for optimal performance; ByteStrings inherits the byte-by-byte default
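
The checked/unchecked split described above can be pictured with a small, self-contained sketch. Everything here (`CheckedReadSketch`, `Bytes`, `ArrayBytes`, `ChunkedBytes`) is a hypothetical miniature, not Pekko's ByteString hierarchy: the public method checks bounds once and delegates, the array-backed subclass overrides the unchecked variant with a VarHandle read, and the chunked subclass inherits the byte-by-byte default.

```scala
import java.lang.invoke.{ MethodHandles, VarHandle }
import java.nio.ByteOrder

// Hypothetical miniature of the pattern; not Pekko's actual classes.
object CheckedReadSketch {
  private val intBeView: VarHandle =
    MethodHandles.byteArrayViewVarHandle(classOf[Array[Int]], ByteOrder.BIG_ENDIAN)

  abstract class Bytes {
    def length: Int
    def apply(index: Int): Byte

    // Public entry point: one bounds check, then delegate.
    final def readIntBE(offset: Int): Int = {
      if (offset < 0 || offset > length - 4)
        throw new IndexOutOfBoundsException(s"offset=$offset, size=4, length=$length")
      readIntBEUnchecked(offset)
    }

    // Default: assemble the value byte by byte, which works for any layout.
    def readIntBEUnchecked(offset: Int): Int =
      ((apply(offset) & 0xFF) << 24) |
      ((apply(offset + 1) & 0xFF) << 16) |
      ((apply(offset + 2) & 0xFF) << 8) |
      (apply(offset + 3) & 0xFF)
  }

  // Backed by one array, so a single VarHandle read is possible.
  final class ArrayBytes(bytes: Array[Byte]) extends Bytes {
    def length: Int = bytes.length
    def apply(index: Int): Byte = bytes(index)
    override def readIntBEUnchecked(offset: Int): Int =
      intBeView.get(bytes, offset).asInstanceOf[Int]
  }

  // Spans several arrays, so it keeps the byte-by-byte default.
  final class ChunkedBytes(chunks: Vector[Array[Byte]]) extends Bytes {
    val length: Int = chunks.map(_.length).sum
    def apply(index: Int): Byte = {
      var i = index
      var c = 0
      while (i >= chunks(c).length) { i -= chunks(c).length; c += 1 }
      chunks(c)(i)
    }
  }
}
```

A value that straddles a chunk boundary is the case the chunked variant has to handle, and it is why a single array-view read is not applicable there.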

ByteStringParser

  • Updated ByteReader.readShortBE/LE, readIntBE/LE, readLongBE/LE to delegate to the new *Unchecked ByteString methods, avoiding a redundant bounds check (the reader already guards with NeedMoreData)
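
To illustrate why a second bounds check would be redundant in the reader, here is a minimal sketch of a reader that guards each read itself before touching the data. `ReaderSketch`, its `ByteReader`, and its `NeedMoreData` are hypothetical stand-ins that operate on a plain array; the real ByteReader wraps a ByteString and calls the new *Unchecked methods.

```scala
import scala.util.control.NoStackTrace

// Hypothetical stand-in for the reader; not Pekko's ByteStringParser.ByteReader.
object ReaderSketch {
  // Signals that the caller must supply more input before retrying.
  object NeedMoreData extends Exception with NoStackTrace

  final class ByteReader(input: Array[Byte]) {
    private var off = 0

    def remainingSize: Int = input.length - off

    // Returns the unsigned 16-bit value as an Int.
    def readShortBE(): Int = {
      // The reader's own guard; after this, an unchecked read is safe.
      if (remainingSize < 2) throw NeedMoreData
      val result = ((input(off) & 0xFF) << 8) | (input(off + 1) & 0xFF)
      off += 2
      result
    }
  }
}
```

Because the guard has already established that enough bytes remain, delegating to a bounds-checked public method would repeat the same comparison on every read.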

ByteIterator

  • Use the SWARUtil methods when we have a byte array backing the iterator

Tests

  • SWARUtilSpec: added getShort tests covering both byte orders, VarHandle and fallback paths
  • ByteStringSpec: added correctness tests for all three ByteString implementations (ByteString1C, ByteString1, ByteStrings) and bounds-checking tests that verify IndexOutOfBoundsException is thrown for negative offsets and insufficient data

Benchmark

The benchmark in this PR shows a significant throughput improvement for array-backed ByteString instances, but a possible degradation for ByteString instances that are concatenated from other ByteStrings (and so are not backed by a single array).

I don't have the ideal setup to run perf tests but if anyone has time to run them, that would be great. If anyone has any thoughts about why there might be a slowdown for the ConcatString case, get in touch.

With Changes

[info] Benchmark                                                    Mode  Cnt         Score          Error  Units
[info] ByteStringParser_readNum_Benchmark.readIntBE                thrpt    3  22054741.054 ± 39434675.372  ops/s
[info] ByteStringParser_readNum_Benchmark.readIntBE_ConcatString   thrpt    3    470976.546 ±   267449.668  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE               thrpt    3  42767459.968 ± 62862466.027  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE_ConcatString  thrpt    3    484250.478 ±    99348.024  ops/s

Without Changes

[info] Benchmark                                                    Mode  Cnt         Score         Error  Units
[info] ByteStringParser_readNum_Benchmark.readIntBE                thrpt    3  12185698.383 ±  496144.228  ops/s
[info] ByteStringParser_readNum_Benchmark.readIntBE_ConcatString   thrpt    3    764037.404 ±  401052.665  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE               thrpt    3  13025344.886 ± 4398418.139  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE_ConcatString  thrpt    3    617643.729 ±  177448.913  ops/s
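
One way to read the two tables is to compute the score ratios and to check whether the reported intervals overlap. The sketch below does that with the numbers copied from the output above. It assumes the ± column can be treated as a symmetric interval around the score; the object and method names are made up for illustration.

```scala
// Numbers are copied from the benchmark output above; the helper names are hypothetical.
object BenchmarkRatioSketch {
  final case class Result(score: Double, error: Double) {
    def low: Double = score - error
    def high: Double = score + error
    // True when the two score ± error intervals share at least one point.
    def overlaps(other: Result): Boolean = low <= other.high && other.low <= high
  }

  val withChanges: Map[String, Result] = Map(
    "readIntBE" -> Result(22054741.054, 39434675.372),
    "readIntBE_ConcatString" -> Result(470976.546, 267449.668),
    "readLongBE" -> Result(42767459.968, 62862466.027),
    "readLongBE_ConcatString" -> Result(484250.478, 99348.024))

  val withoutChanges: Map[String, Result] = Map(
    "readIntBE" -> Result(12185698.383, 496144.228),
    "readIntBE_ConcatString" -> Result(764037.404, 401052.665),
    "readLongBE" -> Result(13025344.886, 4398418.139),
    "readLongBE_ConcatString" -> Result(617643.729, 177448.913))

  // Throughput with the change divided by throughput without it.
  def ratio(name: String): Double =
    withChanges(name).score / withoutChanges(name).score

  // Overlapping intervals mean the run cannot separate the two variants.
  def inconclusive(name: String): Boolean =
    withChanges(name).overlaps(withoutChanges(name))
}
```

Under that assumption the array-backed scores are roughly 1.8x (readIntBE) and 3.3x (readLongBE) higher with the changes, but with only 3 measurements per benchmark the intervals overlap in all four rows, so a longer run would be needed to confirm either the speedup or the ConcatString slowdown.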

@pjfanning pjfanning marked this pull request as draft April 6, 2026 09:25
pjfanning (Member, Author)

@copilot ByteStringParser needs its readShort to be an int return value. Is there a good reason to have SWARUtil support this as well as readShort as a short and accessing this via a new package private unchecked read method in ByteString? Don't add this unless there is a reasonable performance reason to support it.

@pjfanning pjfanning marked this pull request as ready for review April 6, 2026 10:57
@pjfanning pjfanning added this to the 2.0.0-M2 milestone Apr 6, 2026
override def getShort(implicit byteOrder: ByteOrder): Short = {
  if (len < 2) throw new NoSuchElementException("next on empty iterator")
  if (byteOrder == ByteOrder.BIG_ENDIAN) {
    val result = SWARUtil.getShort(array, from)
Contributor

It might be more explicit to pass bigEndian = true as well.

Member Author

I refactored the methods so ByteOrder is passed explicitly

He-Pin (Member) left a comment

Nice move!

He-Pin (Member) left a comment

Overall this PR looks well-structured. A few observations:

Positive:

  • Clean separation between bounds-checked public API and unchecked internal methods
  • Good test coverage for all three ByteString implementations (ByteString1C, ByteString1, ByteStrings)
  • Proper IndexOutOfBoundsException handling with clear error messages
  • SWARUtil VarHandle approach with graceful fallback is the right pattern

Minor suggestions:

  1. The unchecked methods on the abstract ByteString class (lines ~1455-1480) use byte-by-byte apply() calls, which is the correct fallback for ByteStrings. Consider documenting why ByteStrings doesn't override them: it can span multiple chunks, which makes a single VarHandle read unsafe.
  2. ByteStringParser correctly avoids double-bounds-checking by using *Unchecked methods -- good catch.

No blocking issues found.

pjfanning (Member, Author)

I'll look at making a few of the suggested changes

    val result = input.readLongBEUnchecked(off)
    off += 8
    result
  }
Member

It would be nice to have some benchmark attached.

Member Author

Added a benchmark and put the results in the description.

He-Pin (Member) left a comment

Deep CR: PR #2847 - move readLong/Int/Short from ByteStringParser to ByteString

Architecture Review

Design Pattern: Template Method with Unchecked Overrides
The pattern of checkReadBounds() + readXxxUnchecked() is well-executed. ByteString1C and ByteString1 override the unchecked methods with SWARUtil VarHandle calls, while ByteStrings falls back to the byte-by-byte default. This is the correct approach since ByteStrings can span multiple internal chunks, making a single VarHandle read unsafe.

However, I'd recommend adding a comment on the abstract readXxxUnchecked methods in ByteString explaining why ByteStrings does not override them. Future maintainers might assume this is an oversight.

Binary Compatibility

New public API methods (readShortBE, readShortLE, readIntBE, readIntLE, readLongBE, readLongLE) are added to the abstract ByteString class with @since 2.0.0. Since Pekko 2.0 is a major version, binary compatibility breaks are expected.

The private[pekko] def readXxxUnchecked(offset: Int) methods are new private[pekko] methods on ByteString with byte-by-byte default implementations, so concrete subclasses only override them where a faster path exists (ByteString1C and ByteString1 do; ByteStrings does not). Since ByteString is a sealed abstract class, this is safe.

Performance Considerations

The VarHandle-based multi-byte reads are correct. The .asInstanceOf[Short] cast on shortBeArrayView.get(array, index) is necessary because VarHandle.get() returns Object. The JIT should eliminate this at runtime.

The fallback path (getShortBEWithoutMethodHandle) uses the classic byte-shift approach. Since Pekko 2.0 targets Java 17+, the VarHandle path will always be available - the fallback is defensive coding for exotic JVMs.

Test Coverage

Tests cover all 3 ByteString implementations, both byte orders, boundary conditions, and SWARUtil fallback paths.

Missing: No tests for the ByteIterator.getShort/getInt/getLong overrides. The existing ByteIterator tests should exercise these, but explicit tests would be valuable.

Code Quality

  • checkReadBounds uses offset + size > length which can overflow for very large offsets. Safe in practice since ByteString length is bounded by Int.MaxValue.
  • ByteStringParser changes correctly avoid double-bounds-checking by using *Unchecked methods.
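
The overflow concern in the first bullet can be made concrete with a small sketch. The two predicates below are hypothetical; the actual body of checkReadBounds is not shown in this conversation. The first mirrors the `offset + size > length` form mentioned above, and the second rearranges the comparison so that no addition on `offset` is needed.

```scala
// Hypothetical predicates; the real checkReadBounds may differ.
object BoundsCheckSketch {
  // offset + size can wrap around to a negative Int for very large offsets,
  // in which case the comparison against length is false.
  def naiveOutOfBounds(offset: Int, size: Int, length: Int): Boolean =
    offset < 0 || offset + size > length

  // length - size cannot overflow when length >= 0 and size is a small positive constant.
  def safeOutOfBounds(offset: Int, size: Int, length: Int): Boolean =
    offset < 0 || offset > length - size
}
```

For example, with `offset = Int.MaxValue - 2`, `size = 8` and `length = 16`, the sum wraps to a negative number, so the first predicate reports the read as in bounds while the second reports it as out of bounds. For ordinary offsets the two agree.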

Java API Coverage

No Java DSL changes needed - these are methods on ByteString (a Scala class) and are naturally callable from Java. The @since 2.0.0 markers and @throws javadoc are properly included.

Suggestions

  1. Consider adding a comment on the abstract readXxxUnchecked methods explaining why ByteStrings does not override them.
  2. Consider adding explicit tests for ByteIterator.getShort/getInt/getLong overrides.

    val result = SWARUtil.getShort(array, from)
    from += 2
    result
  } else if (byteOrder == ByteOrder.LITTLE_ENDIAN) {
Member

This test is not needed.

He-Pin (Member) commented Apr 6, 2026

@pjfanning btw, I suggest you use the Copilot CLI, which uses fewer premium requests. We can use askUserQuestions to make it ask you, so the whole loop consumes just 1 request.
I'm using /fleet mode, which works great.

@pjfanning pjfanning marked this pull request as ready for review April 8, 2026 10:56
@pjfanning pjfanning merged commit f193ec1 into apache:main Apr 8, 2026
9 checks passed
4 participants