8359419: AArch64: Relax min vector length to 32-bit for short vectors #26057

XiaohongGong · 2025-07-01T05:59:15Z

Background

On AArch64, the minimum supported vector length is 64 bits for basic types, except for byte and boolean (32 bits and 16 bits respectively, to match special Vector API features). This limitation prevents intrinsification of vector type conversions between short and wider types (e.g. long/double) in the Vector API when the entire vector length is within 128 bits, which degrades performance for such conversions.

For example, type conversions between ShortVector.SPECIES_128 and LongVector.SPECIES_128 are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size.

To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors.
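
The lane arithmetic behind this constraint can be checked with a small sketch. The class and method names are hypothetical and are not part of HotSpot or the Vector API; the code only models element sizes and lane counts.

```java
// Illustrative only: models lane counts and sizes, not real HotSpot code.
final class CastWidthSketch {
    // Number of lanes in a vector of the given total size and element size (both in bits).
    static int laneCount(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    // Size in bits of a vector holding `lanes` elements of `elementBits` each.
    static int vectorBits(int lanes, int elementBits) {
        return lanes * elementBits;
    }

    public static void main(String[] args) {
        // A 128-bit long vector has 2 lanes.
        int longLanes = laneCount(128, Long.SIZE);
        // The short vector with the same lane count is only 32 bits wide.
        int shortSideBits = vectorBits(longLanes, Short.SIZE);
        System.out.println(longLanes + " long lanes <-> " + shortSideBits + "-bit short vector");
    }
}
```

With a 64-bit minimum for short vectors, the 2-lane short side of this conversion cannot be represented, which is why the conversion was not intrinsified.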

Impact Analysis

1. Vector types

Only vectors with short element types are affected, since this change adds support only for 32-bit short vectors.

2. Vector API

No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length.

3. Auto-vectorization

Enables vectorization of cases containing only 2 short lanes, with significant performance improvements. Since 32-bit vectors have been supported for the byte type for a long time, extending this to short does not introduce additional risks.
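
As a rough illustration of a "2 short lanes" case, consider the loop shape below. This is a hypothetical sketch, not the benchmark added by this PR, and whether C2 actually vectorizes it depends on the JIT, the platform, and the flags in use.

```java
import java.util.Arrays;

// Hypothetical loop shape: each iteration touches exactly two adjacent shorts.
final class TwoShortsSketch {
    // Adds two short arrays pairwise, two elements per iteration.
    // Assumes all three arrays have the same, even length.
    static void addPairs(short[] a, short[] b, short[] out) {
        for (int i = 0; i < a.length; i += 2) {
            out[i]     = (short) (a[i]     + b[i]);
            out[i + 1] = (short) (a[i + 1] + b[i + 1]);
        }
    }

    public static void main(String[] args) {
        short[] a = {1, 2, 3, 4};
        short[] b = {10, 20, 30, 40};
        short[] out = new short[4];
        addPairs(a, b, out);
        System.out.println(Arrays.toString(out));
    }
}
```

The two statements in the loop body form a pack of two 16-bit lanes, that is, a 32-bit vector, which is below the previous 64-bit minimum for short.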

4. Codegen of vector nodes

NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the upper half of the register can be ignored.

Details:

- Lanewise vector operations are unaffected, as explained above.
- NEON supports vector load/store instructions with 32-bit vector size, which we already use in the relevant IRs (shared by SVE).
- Cross-lane operations such as reductions may be affected, potentially causing incorrect results for min/max/mul/and reductions. The minimum vector size for such operations should remain 64-bit, and we've added assertions in the match rules. Since it is currently not possible to generate such reductions (the Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in match_rule_supported_vector() would be beneficial.
- Missing codegen support for type conversions with 32-bit input or output vector size should be added.
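
The difference between lanewise and cross-lane operations can be modeled in plain Java. The sketch below is a simulation under stated assumptions: a 64-bit register is represented as an array of four 16-bit lanes, only the low two lanes are live, and the values in the upper lanes are arbitrary. None of the names correspond to HotSpot code.

```java
// Simulation only: a 64-bit register modeled as four 16-bit lanes.
final class UpperLanesSketch {
    static final int REG_LANES = 4;   // lanes processed by a 64-bit instruction
    static final int LIVE_LANES = 2;  // lanes that belong to the 32-bit short vector

    // Lanewise add over the whole register, as a 64-bit instruction would perform it.
    static short[] addAllLanes(short[] x, short[] y) {
        short[] r = new short[REG_LANES];
        for (int i = 0; i < REG_LANES; i++) {
            r[i] = (short) (x[i] + y[i]);
        }
        return r;
    }

    // Multiply reduction over the first `lanes` lanes of the register.
    static int mulReduce(short[] reg, int lanes) {
        int product = 1;
        for (int i = 0; i < lanes; i++) {
            product *= reg[i];
        }
        return product;
    }

    public static void main(String[] args) {
        short[] reg = {3, 4, 5, 6}; // lanes 2 and 3 are leftover data, not part of the vector

        // Lanewise: the low two lanes are correct whatever the upper lanes contain.
        short[] sum = addAllLanes(reg, reg);
        System.out.println("low lanes after add: " + sum[0] + ", " + sum[1]);

        // Cross-lane: reducing over the full register mixes in the upper lanes.
        System.out.println("reduce over live lanes: " + mulReduce(reg, LIVE_LANES));
        System.out.println("reduce over all lanes:  " + mulReduce(reg, REG_LANES));
    }
}
```

In this model the lanewise result can simply discard the upper lanes, while the reduction over all four lanes differs from the reduction over the two live lanes. That is the reason for keeping the minimum at 64-bit for reductions.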

Main changes:

- Support vector types with 2 short elements. The minimum supported vector element count for each basic type is:
  - T_BOOLEAN: 2
  - T_BYTE/T_CHAR: 4
  - T_SHORT: 2 (newly supported)
  - T_INT/T_FLOAT/T_LONG/T_DOUBLE: 2
- Add codegen support for Vector[U]Cast with 32-bit input or output vector size. VectorReinterpret already handles the 32-bit vector size cases.
- Explicitly mark reductions with a vector size of less than 8 bytes as unsupported.
- Add additional IR tests for Vector API type conversions.
- Add a JMH benchmark for auto-vectorization with two 16-bit lanes.

Test

Tested hotspot/jdk/langtools - all tests passed.

Performance

The following shows the performance improvement of the relevant Vector API JMH benchmarks on an NVIDIA Grace (128-bit SVE2) machine:

Benchmark                                             SIZE   Mode  Unit   Before     After    Gain
VectorFPtoIntCastOperations.microDouble128ToShort128  512   thrpt ops/ms  731.529  26278.599  35.92
VectorFPtoIntCastOperations.microDouble128ToShort128  1024  thrpt ops/ms  366.461  10595.767  28.91
VectorFPtoIntCastOperations.microFloat64ToShort64     512   thrpt ops/ms  315.791  14327.682  45.37
VectorFPtoIntCastOperations.microFloat64ToShort64     1024  thrpt ops/ms  158.485   7261.847  45.82
VectorZeroExtend.short2Long                           128   thrpt ops/ms 1447.243 898666.972 620.95

And here is the performance improvement of the added JMH on Grace:

Benchmark                          LEN   Mode  Unit   Before    After   Gain
VectorTwoShorts.addVec2S           64    avgt  ns/op   20.948   12.683  1.65
VectorTwoShorts.addVec2S           128   avgt  ns/op   40.073   22.703  1.76
VectorTwoShorts.addVec2S           512   avgt  ns/op  157.447   83.691  1.88
VectorTwoShorts.addVec2S           1024  avgt  ns/op  313.022  165.085  1.89
VectorTwoShorts.mulVec2S           64    avgt  ns/op   20.981   12.647  1.65
VectorTwoShorts.mulVec2S           128   avgt  ns/op   40.279   22.637  1.77
VectorTwoShorts.mulVec2S           512   avgt  ns/op  158.642   83.371  1.90
VectorTwoShorts.mulVec2S           1024  avgt  ns/op  314.788  165.205  1.90
VectorTwoShorts.reverseBytesVec2S  64    avgt  ns/op   17.739    9.106  1.94
VectorTwoShorts.reverseBytesVec2S  128   avgt  ns/op   32.591   15.632  2.08
VectorTwoShorts.reverseBytesVec2S  512   avgt  ns/op  126.154   55.284  2.28
VectorTwoShorts.reverseBytesVec2S  1024  avgt  ns/op  254.592  107.457  2.36

We observe a similar uplift on an AArch64 N1 (NEON) machine.
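
The Gain columns are not defined explicitly. Judging from the numbers, they appear to be After/Before for throughput results (ops/ms, higher is better) and Before/After for average-time results (ns/op, lower is better). The sketch below checks that reading against one row from each table; the class and method names are hypothetical.

```java
import java.util.Locale;

// Assumed definition of "Gain", inferred from the table values.
final class GainSketch {
    // Throughput mode (thrpt): higher is better, so gain = after / before.
    static double throughputGain(double before, double after) {
        return after / before;
    }

    // Average-time mode (avgt): lower is better, so gain = before / after.
    static double avgTimeGain(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // microDouble128ToShort128, SIZE 512
        System.out.println(String.format(Locale.ROOT, "%.2f", throughputGain(731.529, 26278.599)));
        // addVec2S, LEN 64
        System.out.println(String.format(Locale.ROOT, "%.2f", avgTimeGain(20.948, 12.683)));
    }
}
```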

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8359419: AArch64: Relax min vector length to 32-bit for short vectors (Enhancement - P4)

Reviewers

Andrew Haley (@theRealAph - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057
$ git checkout pull/26057

Update a local copy of the PR:
$ git checkout pull/26057
$ git pull https://git.openjdk.org/jdk.git pull/26057/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26057

View PR using the GUI difftool:
$ git pr show -t 26057

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26057.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-07-01T06:00:15Z

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-07-01T06:00:47Z

@XiaohongGong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8359419: AArch64: Relax min vector length to 32-bit for short vectors

Reviewed-by: aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 96 new commits pushed to the master branch:

5cf349c: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken
21f2e9a: 8344332: (bf) Migrate DirectByteBuffer away from jdk.internal.ref.Cleaner
854de8c: 8336147: Clarify CDS documentation about static vs dynamic archive
... and 93 more: https://git.openjdk.org/jdk/compare/1ca008fd02496dc33e2707c102560cae1690fba5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2025-07-01T06:01:16Z

@XiaohongGong The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-07-01T06:04:28Z

Webrevs

src/hotspot/cpu/aarch64/aarch64.ad

src/hotspot/cpu/aarch64/aarch64_vector.ad

XiaohongGong · 2025-07-02T02:36:10Z

Hi @theRealAph, I've updated the patch to address the comment issues. Could you please take another look? Thanks a lot!

src/hotspot/cpu/aarch64/aarch64.ad

XiaohongGong · 2025-07-04T01:35:24Z

Hi @theRealAph, the review comments have been addressed. Would you mind taking another look, please? Thank you so much!

theRealAph

This looks good. Thanks.

XiaohongGong · 2025-07-04T09:15:14Z

This looks good. Thanks.

Thanks so much for your review!

8359419: AArch64: Relax min vector length to 32-bit for short vectors

5af5bd4

openjdk bot added the rfr Pull request is ready for review label Jul 1, 2025

openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jul 1, 2025

theRealAph reviewed Jul 1, 2025

src/hotspot/cpu/aarch64/aarch64.ad (outdated)

theRealAph reviewed Jul 1, 2025

src/hotspot/cpu/aarch64/aarch64.ad (outdated)

theRealAph reviewed Jul 1, 2025

src/hotspot/cpu/aarch64/aarch64_vector.ad (outdated)

Refine comments based on review suggestion

4e15e58

theRealAph reviewed Jul 2, 2025

src/hotspot/cpu/aarch64/aarch64.ad (outdated)

Refine the comment in ad file

dfda42a

theRealAph approved these changes Jul 4, 2025

openjdk bot added the ready Pull request is ready to be integrated label Jul 4, 2025
