move readLong, readInt, readShort from ByteStringParser to ByteString so they can be optimized by pjfanning · Pull Request #2847 · apache/pekko

pjfanning · 2026-04-06T09:11:11Z

Summary

This PR implements the following changes:

SWARUtil

Added getShort(array, index, ByteOrder) method using VarHandle byte array views for performance, with fallback implementations getShortBEWithoutMethodHandle and getShortLEWithoutMethodHandle
Added shortBeArrayView and shortLeArrayView VarHandle fields
Expanded the class-level Scaladoc to document where the JDK itself uses MethodHandles.byteArrayViewVarHandle (in jdk.internal.util.ByteArray and jdk.internal.util.ByteArrayLittleEndian, backing java.io.DataInputStream and java.util.UUID) and to explain the performance benefits: a single native load instruction with JIT intrinsification, a consolidated bounds check, no alignment requirement, and SWAR arithmetic
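
For readers unfamiliar with byte array view VarHandles, the following is a minimal sketch of the technique described above, not the actual SWARUtil source. The class name ShortViewSketch and the fallback method names are hypothetical; only MethodHandles.byteArrayViewVarHandle and VarHandle.get are real JDK APIs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; it only mirrors the shape of the change.
final class ShortViewSketch {
  // Views that reinterpret a byte[] as a sequence of shorts in a fixed byte order.
  private static final VarHandle SHORT_BE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.BIG_ENDIAN);
  private static final VarHandle SHORT_LE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  // Single VarHandle read; plain get() accepts unaligned byte indices.
  static short getShort(byte[] array, int index, ByteOrder order) {
    if (order == ByteOrder.BIG_ENDIAN) {
      return (short) SHORT_BE.get(array, index);
    }
    return (short) SHORT_LE.get(array, index);
  }

  // Byte-shift fallback, big-endian: first byte is the high byte.
  static short getShortBEFallback(byte[] array, int index) {
    return (short) (((array[index] & 0xFF) << 8) | (array[index + 1] & 0xFF));
  }

  // Byte-shift fallback, little-endian: first byte is the low byte.
  static short getShortLEFallback(byte[] array, int index) {
    return (short) ((array[index] & 0xFF) | ((array[index + 1] & 0xFF) << 8));
  }
}
```

Both paths should return the same value for any in-range index, which is what the fallback tests listed under Tests appear to check.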

ByteString

Added readShortBE(offset), readShortLE(offset), readIntBE(offset), readIntLE(offset), readLongBE(offset), readLongLE(offset) public methods to the ByteString abstract class
All public methods throw IndexOutOfBoundsException if offset is negative or there are insufficient bytes, via a shared checkReadBounds helper
All public methods include @throws[IndexOutOfBoundsException] Javadoc and are marked @since 2.0.0
Added corresponding private[pekko] unchecked variants (readShortBEUnchecked, readShortLEUnchecked, readIntBEUnchecked, readIntLEUnchecked, readLongBEUnchecked, readLongLEUnchecked) that skip the bounds check; the public methods delegate to these after checking bounds
ByteString1C and ByteString1 override the unchecked variants with direct SWARUtil calls for optimal performance; ByteStrings inherits the byte-by-byte default
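
The checked/unchecked split described above could look roughly like the sketch below. It is written in Java for illustration, while the real ByteString is Scala. BytesSketch and ArrayBytesSketch are hypothetical names, and the exact form of the bounds check is an assumption.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical stand-in for the abstract ByteString class.
abstract class BytesSketch {
  abstract int length();

  abstract byte apply(int index);

  // Shared helper: rejects negative offsets and reads that run past the end.
  private void checkReadBounds(int offset, int size) {
    if (offset < 0 || offset > length() - size) {
      throw new IndexOutOfBoundsException(
          "offset " + offset + ", size " + size + ", length " + length());
    }
  }

  // Public API: check once, then delegate to the unchecked variant.
  public final int readIntBE(int offset) {
    checkReadBounds(offset, 4);
    return readIntBEUnchecked(offset);
  }

  // Default unchecked variant: byte-by-byte, works for any layout.
  int readIntBEUnchecked(int offset) {
    return (apply(offset) & 0xFF) << 24
        | (apply(offset + 1) & 0xFF) << 16
        | (apply(offset + 2) & 0xFF) << 8
        | (apply(offset + 3) & 0xFF);
  }
}

// Hypothetical stand-in for an array-backed implementation.
final class ArrayBytesSketch extends BytesSketch {
  private static final VarHandle INT_BE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
  private final byte[] bytes;

  ArrayBytesSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  int length() {
    return bytes.length;
  }

  @Override
  byte apply(int index) {
    return bytes[index];
  }

  // Override: one VarHandle read instead of four apply() calls.
  @Override
  int readIntBEUnchecked(int offset) {
    return (int) INT_BE.get(bytes, offset);
  }
}
```

Internal callers that have already validated the offset can call the unchecked variant directly, which is the route ByteStringParser takes.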

ByteStringParser

Updated ByteReader.readShortBE/LE, readIntBE/LE, readLongBE/LE to delegate to the new *Unchecked ByteString methods, avoiding a redundant bounds check (the reader already guards with NeedMoreData)

ByteIterator

Use the SWARUtil methods when we have a byte array backing the iterator

Tests

SWARUtilSpec: added getShort tests covering both byte orders, VarHandle and fallback paths
ByteStringSpec: added correctness tests for all three ByteString implementations (ByteString1C, ByteString1, ByteStrings) and bounds-checking tests that verify IndexOutOfBoundsException is thrown for negative offsets and insufficient data

Benchmark

The benchmark in this PR shows a significant throughput improvement for array-backed ByteStrings (about 1.8x for readIntBE and 3.3x for readLongBE) but a possible degradation for ByteStrings that are concatenated from other ByteStrings (which are not backed by a single array). Each result comes from only 3 iterations and the error margins are large, so the numbers are indicative at best.

I don't have the ideal setup to run perf tests, but if anyone has time to run them, that would be great. If anyone has thoughts about why there might be a slowdown for the ConcatString case, get in touch.

With Changes

[info] Benchmark                                                    Mode  Cnt         Score          Error  Units
[info] ByteStringParser_readNum_Benchmark.readIntBE                thrpt    3  22054741.054 ± 39434675.372  ops/s
[info] ByteStringParser_readNum_Benchmark.readIntBE_ConcatString   thrpt    3    470976.546 ±   267449.668  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE               thrpt    3  42767459.968 ± 62862466.027  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE_ConcatString  thrpt    3    484250.478 ±    99348.024  ops/s

Without Changes

[info] Benchmark                                                    Mode  Cnt         Score         Error  Units
[info] ByteStringParser_readNum_Benchmark.readIntBE                thrpt    3  12185698.383 ±  496144.228  ops/s
[info] ByteStringParser_readNum_Benchmark.readIntBE_ConcatString   thrpt    3    764037.404 ±  401052.665  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE               thrpt    3  13025344.886 ± 4398418.139  ops/s
[info] ByteStringParser_readNum_Benchmark.readLongBE_ConcatString  thrpt    3    617643.729 ±  177448.913  ops/s

pjfanning · 2026-04-06T09:42:41Z

@copilot ByteStringParser needs its readShort to be an int return value. Is there a good reason to have SWARUtil support this as well as readShort as a short and accessing this via a new package private unchecked read method in ByteString? Don't add this unless there is a reasonable performance reason to support it.

nvollmar · 2026-04-06T17:31:47Z

actor/src/main/scala-2.13/org/apache/pekko/util/ByteIterator.scala

+    override def getShort(implicit byteOrder: ByteOrder): Short = {
+      if (len < 2) throw new NoSuchElementException("next on empty iterator")
+      if (byteOrder == ByteOrder.BIG_ENDIAN) {
+        val result = SWARUtil.getShort(array, from)


Might be more explicit passing bigEndian = true as well

I refactored the methods so ByteOrder is passed explicitly
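
To illustrate the refactoring discussed in this thread, here is a hedged Java sketch of an array-backed iterator whose getShort takes the ByteOrder explicitly. ArrayIteratorSketch is a hypothetical class, not the ByteIterator code from the PR, which is Scala and takes the byte order as an implicit parameter.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an iterator over a byte array slice.
final class ArrayIteratorSketch {
  private static final VarHandle SHORT_BE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.BIG_ENDIAN);
  private static final VarHandle SHORT_LE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] array;
  private int from;
  private final int until;

  ArrayIteratorSketch(byte[] array) {
    this.array = array;
    this.from = 0;
    this.until = array.length;
  }

  int len() {
    return until - from;
  }

  // Reads two bytes in the requested order and advances the cursor.
  short getShort(ByteOrder byteOrder) {
    if (len() < 2) throw new NoSuchElementException("next on empty iterator");
    short result;
    // ByteOrder has exactly two instances, so a plain else covers little-endian.
    if (byteOrder == ByteOrder.BIG_ENDIAN) {
      result = (short) SHORT_BE.get(array, from);
    } else {
      result = (short) SHORT_LE.get(array, from);
    }
    from += 2;
    return result;
  }
}
```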

He-Pin

Nice move!

He-Pin

Overall this PR looks well-structured. A few observations:

Positive:

Clean separation between bounds-checked public API and unchecked internal methods
Good test coverage for all three ByteString implementations (ByteString1C, ByteString1, ByteStrings)
Proper IndexOutOfBoundsException handling with clear error messages
SWARUtil VarHandle approach with graceful fallback is the right pattern

Minor suggestions:

The unchecked methods on the abstract ByteString class (lines ~1455-1480) use byte-by-byte apply() calls which is the correct fallback for ByteStrings, but consider documenting why ByteStrings doesn't override these -- it's because ByteStrings can span multiple chunks making a single VarHandle read unsafe.
ByteStringParser correctly avoids double-bounds-checking by using *Unchecked methods -- good catch.

No blocking issues found.

pjfanning · 2026-04-06T18:47:30Z

I'll look at making a few of the suggested changes

He-Pin · 2026-04-06T18:54:38Z

stream/src/main/scala/org/apache/pekko/stream/impl/io/ByteStringParser.scala

+      val result = input.readLongBEUnchecked(off)
+      off += 8
+      result
+    }


It would be nice to have some benchmark attached.

added a benchmark and put results in the description

He-Pin

Deep CR: PR #2847 - move readLong/Int/Short from ByteStringParser to ByteString

Architecture Review

Design Pattern: Template Method with Unchecked Overrides
The pattern of checkReadBounds() + readXxxUnchecked() is well-executed. ByteString1C and ByteString1 override the unchecked methods with SWARUtil VarHandle calls, while ByteStrings falls back to the byte-by-byte default. This is the correct approach since ByteStrings can span multiple internal chunks, making a single VarHandle read unsafe.

However, I'd recommend adding a comment on the abstract readXxxUnchecked methods in ByteString explaining why ByteStrings does not override them. Future maintainers might assume this is an oversight.
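
A small Java sketch may make the multi-chunk argument concrete. ChunkedBytesSketch is a hypothetical model of a ByteString built from several arrays, not the real ByteStrings class. A 4-byte value can straddle two chunks, so there is no single array to hand to a VarHandle and the read has to go through per-byte lookup.

```java
// Hypothetical model of a byte sequence stored as several separate arrays.
final class ChunkedBytesSketch {
  private final byte[][] chunks;

  ChunkedBytesSketch(byte[]... chunks) {
    this.chunks = chunks;
  }

  // Locates the chunk that holds the logical index, then reads one byte.
  byte apply(int index) {
    int remaining = index;
    for (byte[] chunk : chunks) {
      if (remaining < chunk.length) return chunk[remaining];
      remaining -= chunk.length;
    }
    throw new IndexOutOfBoundsException("index " + index);
  }

  // Byte-by-byte big-endian read; correct even when the value spans chunks.
  int readIntBEUnchecked(int offset) {
    return (apply(offset) & 0xFF) << 24
        | (apply(offset + 1) & 0xFF) << 16
        | (apply(offset + 2) & 0xFF) << 8
        | (apply(offset + 3) & 0xFF);
  }
}
```

With chunks {0x0A, 0x0B} and {0x0C, 0x0D, 0x0E}, a read at offset 0 takes two bytes from each chunk.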

Binary Compatibility

New public API methods (readShortBE, readShortLE, readIntBE, readIntLE, readLongBE, readLongLE) are added to the abstract ByteString class with @since 2.0.0. Since Pekko 2.0 is a major version, binary compatibility breaks are expected.

The private[pekko] def readXxxUnchecked(offset: Int) methods are new private[pekko] methods on the abstract ByteString class. They come with a byte-by-byte default, which ByteString1C and ByteString1 override and ByteStrings inherits - the PR correctly does this. Since ByteString is a sealed abstract class, this is safe.

Performance Considerations

The VarHandle-based multi-byte reads are correct. The .asInstanceOf[Short] cast on shortBeArrayView.get(array, index) is necessary because VarHandle.get() returns Object. The JIT should eliminate this at runtime.

The fallback path (getShortBEWithoutMethodHandle) uses the classic byte-shift approach. Since Pekko 2.0 targets Java 17+, the VarHandle path will always be available - the fallback is defensive coding for exotic JVMs.

Test Coverage

Tests cover all 3 ByteString implementations, both byte orders, boundary conditions, and SWARUtil fallback paths.

Missing: No tests for the ByteIterator.getShort/getInt/getLong overrides. The existing ByteIterator tests should exercise these, but explicit tests would be valuable.

Code Quality

checkReadBounds uses offset + size > length which can overflow for very large offsets. Safe in practice since ByteString length is bounded by Int.MaxValue.
ByteStringParser changes correctly avoid double-bounds-checking by using *Unchecked methods.
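
The overflow concern can be shown with a short Java sketch. BoundsSketch is hypothetical and is not the PR's checkReadBounds; it only contrasts the offset + size > length form quoted above with a rearranged comparison that cannot wrap, assuming size and length are non-negative.

```java
// Hypothetical helpers comparing two ways to write the same bounds test.
final class BoundsSketch {
  // Form quoted in the review: offset + size can wrap to a negative int.
  static boolean naiveOutOfBounds(int offset, int size, int length) {
    return offset < 0 || offset + size > length;
  }

  // Rearranged form: length - size cannot overflow for non-negative inputs.
  static boolean safeOutOfBounds(int offset, int size, int length) {
    return offset < 0 || offset > length - size;
  }
}
```

For offset = Integer.MAX_VALUE - 2, size = 8 and length = 16, the naive form wraps and reports the read as in bounds, while the rearranged form rejects it. The JDK's java.util.Objects.checkFromIndexSize(offset, size, length) performs an equivalent overflow-safe check and throws IndexOutOfBoundsException on failure.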

Java API Coverage

No Java DSL changes needed - these are methods on ByteString (a Scala class) and are naturally callable from Java. The @since 2.0.0 markers and @throws javadoc are properly included.

Suggestions

Consider adding a comment on the abstract readXxxUnchecked methods explaining why ByteStrings does not override them.
Consider adding explicit tests for ByteIterator.getShort/getInt/getLong overrides.

He-Pin · 2026-04-06T19:00:29Z

actor/src/main/scala-2.13/org/apache/pekko/util/ByteIterator.scala

+        val result = SWARUtil.getShort(array, from)
+        from += 2
+        result
+      } else if (byteOrder == ByteOrder.LITTLE_ENDIAN) {


This test is not needed.

He-Pin · 2026-04-06T19:18:07Z

@pjfanning btw, I suggest you use the Copilot CLI, which will use fewer premium requests. We can use askUserQuestions to make it ask you questions, and then the whole loop will consume just 1 request.
I'm using /fleet mode, which works great.

Copilot AI and others added 5 commits April 6, 2026 08:41

Add bounds checking and @throws Javadoc to ByteString read methods

884ed72

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/0eb96aa0-0ec3-47d6-a26f-732a0367640e Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

Add bounds checking, @throws Javadoc, and bounds tests to ByteString read methods

8c62c62

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/0eb96aa0-0ec3-47d6-a26f-732a0367640e Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

Add explicit Short casts to SWARUtil VarHandle getShort calls

fa07c2b

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/0eb96aa0-0ec3-47d6-a26f-732a0367640e Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

Add unchecked read methods, remove semicolons, document VarHandle JDK usage in SWARUtil

8f7de8a

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/6c70b3dc-cb53-4f8c-a1e3-692d6e6265c7 Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

Convert {at}code tags to backticks in SWARUtil Scaladoc

a911db4

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/6c70b3dc-cb53-4f8c-a1e3-692d6e6265c7 Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

pjfanning marked this pull request as draft April 6, 2026 09:25

Override ByteArrayIterator getShort/getInt/getLong to use SWARUtil VarHandle reads

195a0ba

Agent-Logs-Url: https://github.com/pjfanning/incubator-pekko/sessions/db26e214-8a2d-41ef-8720-86247c45c74c Co-authored-by: pjfanning <11783444+pjfanning@users.noreply.github.com>

pjfanning marked this pull request as ready for review April 6, 2026 10:57

pjfanning requested review from He-Pin, Philippus, nvollmar and raboof April 6, 2026 13:23

pjfanning added this to the 2.0.0-M2 milestone Apr 6, 2026

nvollmar reviewed Apr 6, 2026

nvollmar approved these changes Apr 6, 2026

He-Pin approved these changes Apr 6, 2026

He-Pin reviewed Apr 6, 2026

He-Pin mentioned this pull request Apr 6, 2026

VarHandle multi-byte reads for ByteString lastIndexOf #2838

Open

pjfanning marked this pull request as draft April 6, 2026 18:47

He-Pin reviewed Apr 6, 2026

pjfanning added 2 commits April 6, 2026 21:06

refactor swarutil to make ByteOrder param explicit

e3d18da

scaladoc

a540b65

pjfanning added 3 commits April 6, 2026 21:52

javafmt

55e3564

Create ByteStringParser_readNum_Benchmark.scala

db40fe7

Update ByteStringParser_readNum_Benchmark.scala

9734d88

Update ByteStringParser_readNum_Benchmark.scala

cf82ead

pjfanning marked this pull request as ready for review April 8, 2026 10:56

pjfanning merged commit f193ec1 into apache:main Apr 8, 2026
9 checks passed

pjfanning merged 12 commits into apache:main from pjfanning:copilot/update-swarutil-support-short

pjfanning commented Apr 6, 2026 •

edited

Loading

Uh oh!

pjfanning commented Apr 6, 2026

Uh oh!

nvollmar Apr 6, 2026

Uh oh!

pjfanning Apr 6, 2026

Uh oh!

He-Pin left a comment

Uh oh!

He-Pin left a comment

Uh oh!

pjfanning commented Apr 6, 2026

Uh oh!

He-Pin Apr 6, 2026

Uh oh!

pjfanning Apr 6, 2026

Uh oh!

He-Pin left a comment

Uh oh!

He-Pin Apr 6, 2026

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants
