-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8359419: AArch64: Relax min vector length to 32-bit for short vectors #26057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back xgong! A progress list of the required criteria for merging this PR into |
@XiaohongGong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 96 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@XiaohongGong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
Hi @theRealAph , I'v updated the patch by fixing the comment issues. Could you please take a look at it again? Thanks a lot! |
Hi @theRealAph , the review comments have been addressed. Would you mind taking another look please? Thank you so much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Thanks.
Thanks so much for your review! |
Background
On AArch64, the minimum vector length supported is 64-bit for basic types, except for
byte
andboolean
(32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions betweenshort
and wider types (e.g.long/double
) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions.For example, type conversions between
ShortVector.SPECIES_128
andLongVector.SPECIES_128
are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size.To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors.
Impact Analysis
1. Vector types
Vectors only with
short
element types will be affected, as we just supported 32-bitshort
vectors in this change.2. Vector API
No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length.
3. Auto-vectorization
Enables vectorization of cases containing only 2
short
lanes, with significant performance improvements. Since we have supported 32-bit vectors forbyte
type for a long time, extending this toshort
did not introduce additional risks.4. Codegen of vector nodes
NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored.
Details:
min/max/mul/and
reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check inmatch_rule_supported_vector()
would be beneficial.Main changes:
T_BOOLEAN
: 2T_BYTE
/T_CHAR
: 4T_SHORT
: 2 (new supported)T_INT
/T_FLOAT
/T_LONG
/T_DOUBLE
: 2Vector[U]Cast
with 32-bit input or output vector size.VectorReinterpret
has already considered the 32-bit vector size cases.Test
Tested hotspot/jdk/langtools - all tests passed.
Performance
Following shows the performance improvement of relative VectorAPI JMHs on a NVIDIA Grace (128-bit SVE2) machine:
And here is the performance improvement of the added JMH on Grace:
We can observe the similar uplift on an AArch64 N1 (NEON) machine.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057
$ git checkout pull/26057
Update a local copy of the PR:
$ git checkout pull/26057
$ git pull https://git.openjdk.org/jdk.git pull/26057/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26057
View PR using the GUI difftool:
$ git pr show -t 26057
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26057.diff
Using Webrev
Link to Webrev Comment