Arm64 SVE: Support scalable constant vectors and masks #127520
a74nh wants to merge 22 commits into dotnet:main
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Force-pushed 4d39083 to 4754486
Adds support to GenTreeVecCon and GenTreeMskCon for constants with unknown sizes. Instead of holding a blob of data, the constant is represented as one of: a repeated value, a sequence with start and step values, or a value in the first lane with the rest zeroed. To handle this, the base type is also required. As this new structure is slightly bigger than a simd16, the simd_t typedef is pushed up to simd32 size.

For vector constants, a vector is scalable if it is of type TYP_SIMD. For mask constants, the type is always TYP_MASK. However, on Arm64, masks are only used by SVE; therefore, to tell whether a mask is scalable, JitUseScalableVectorT is checked.

IsAllBitsSet() on mask constants is updated to take a base type into account. A mask that is all set for TYP_LONG will not be all set for TYP_BYTE, and instead will be 100010001000...

Given two scalable constants, it may not be possible to add them together to produce a third scalable constant. Instead they will remain as two vectors in the IR.

To show this implementation is workable, scalable support is added for:
- Sve.CreateTrueMask*()
- Sve.CreateFalseMask*()
- Vector.Create()
- Vector.CreateScalar()
- Vector.CreateScalarUnsafe()
- Vector.CreateSequence()

Fixes dotnet#125057
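The base-type-dependent IsAllBitsSet() behaviour can be illustrated with a small sketch. An SVE predicate register holds one bit per vector byte, so an all-true mask for a given element width only sets every elemBytes-th bit. `AllTruePredicateBits` below is a hypothetical illustration, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not JIT code): an SVE predicate holds one bit per
// vector byte, so an "all true" mask for elements of elemBytes bytes sets
// every elemBytes-th bit. Shown here for a 16-byte slice of the predicate.
static uint16_t AllTruePredicateBits(unsigned elemBytes)
{
    uint16_t bits = 0;
    for (unsigned byte = 0; byte < 16; byte += elemBytes)
    {
        bits |= (uint16_t)(1u << byte);
    }
    return bits;
}
```

For example, `AllTruePredicateBits(1)` yields 0xFFFF (all bits set), while `AllTruePredicateBits(8)` yields 0x0101, so a mask that is all-set for one element width is not all-set for another.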
Force-pushed 4754486 to 7fac1f9
Taking this out of draft now. Because of the very limited support for scalable SVE, this is currently very hard to test. I've been working off the top of @snickolls-arm's WIP branch with all his code in, which allows me to call handwritten tests. In the current HEAD, there are too many errors before getting to my code. There's still a lot of work to do on top of this, e.g. getting generic ops working, plus all the other Vector APIs which create constants. But I didn't want this PR to grow too big; the important part is that this serves as a base for further constant work. @dotnet/arm64-contrib @jakobbotsch @tannergooding
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR adds Arm64 SVE “scalable VectorT” support across the JIT, including new encodings for scalable vector/mask constants and updates to value numbering, folding, lowering, LSRA, and codegen to recognize and emit SVE-friendly patterns.
Changes:
- Introduce new scalable constant representations (`simdscalable_t`, `simdmaskscalable_t`) and plumb them through `GenTree` constant nodes and hashing.
- Extend value numbering and folding to create/consume scalable SIMD constants on Arm64.
- Implement Arm64 SVE `VectorT` intrinsics import and codegen pathways (create/broadcast/sequence), plus mask handling updates.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/valuenum.h | Adds VN support for scalable SIMD constants on Arm64 |
| src/coreclr/jit/valuenum.cpp | Creates/broadcasts scalable SIMD VN constants and dumps them |
| src/coreclr/jit/simd.h | Defines new scalable vector/mask constant encodings and helper APIs |
| src/coreclr/jit/simd.cpp | Implements scalable vector/mask helpers and conversions |
| src/coreclr/jit/lsraarm64.cpp | Reserves temps for scalable vector constants that can’t be directly encoded |
| src/coreclr/jit/lowerarmarch.cpp | Updates mask lowering + VectorT intrinsic handling |
| src/coreclr/jit/hwintrinsiclistarm64sve.h | Enables VectorT intrinsics for SVE |
| src/coreclr/jit/hwintrinsiccodegenarm64.cpp | Emits SVE instructions for VectorT intrinsics |
| src/coreclr/jit/hwintrinsicarm64.cpp | Imports VectorT intrinsics and updates true/false mask creation |
| src/coreclr/jit/hwintrinsic.h | Marks VectorT_* as special cases for scalar/broadcast creation |
| src/coreclr/jit/gentree.h | Extends vector/mask constants to support scalable encodings |
| src/coreclr/jit/gentree.cpp | Adds scalable constant construction, hashing, folding, and printing |
| src/coreclr/jit/emitarm64.h | Repositions signed-immediate helpers used by new SVE paths |
| src/coreclr/jit/compiler.hpp | Extends bitmask helpers for >64-register targets |
| src/coreclr/jit/compiler.h | Adds new compiler helpers for scalable vector/mask constants |
| src/coreclr/jit/codegenarm64.cpp | Adds emission for scalable vector/mask constants |
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
src/coreclr/jit/gentree.cpp:1
- The size comparison for scalable true masks looks inverted. A predicate that is 'all-true' at a smaller element granularity (e.g., `.h`) should be safe to use for larger element operations (e.g., `.d`), but not vice-versa. The current `>=` check will incorrectly treat a large-granularity mask (e.g., `.d`) as 'true' for smaller element types (e.g., byte), which could enable incorrect optimizations. Consider changing this to `genTypeSize(maskBaseType) <= genTypeSize(simdBaseType)` (and update the comment accordingly), so only equal-or-finer masks are treated as universally true.
src/coreclr/jit/gentree.cpp:1 - Correct spelling in comment: 'optimizatize' -> 'optimize'.
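The granularity argument above can be sketched outside the JIT: SVE consumes one governing bit per element (the lowest predicate bit in each element's byte span), so a finer all-true predicate still reads as all-true at coarser widths, but not the reverse. `ReadsAllTrue` is an illustrative stand-in, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in: check whether a 16-byte slice of predicate bits
// reads as "all true" when interpreted at a given element width, using the
// lowest bit of each element's byte span as the governing bit.
static bool ReadsAllTrue(uint16_t predBits, unsigned elemBytes)
{
    for (unsigned byte = 0; byte < 16; byte += elemBytes)
    {
        if (((predBits >> byte) & 1) == 0)
        {
            return false;
        }
    }
    return true;
}
```

A `.h` all-true pattern (0x5555 in this 16-byte slice) reads as all-true at `.d` width, but a `.d` all-true pattern (0x0101) does not read as all-true at `.b` width.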
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/coreclr/jit/gentree.cpp:1
`gtNewSimdVconNode` doesn't zero-initialize `gtSimdScalableVal` before populating fields. Because `simdscalable_t` contains small fields plus padding/union storage, this can leave padding bytes indeterminate and create nondeterminism when the struct is memcpy'd/printed/hashed elsewhere. Fix (recommended): initialize `vecCon->gtSimdScalableVal = {};` (or equivalent) before setting the individual fields.
```
static unsigned GetHashCode(const simdscalable_t& val)
{
    unsigned hash = 0;
```
The hash for simdscalable_t is a straight XOR of multiple fields, which tends to create many collisions (especially when fields are correlated or differ only by swapped/masked bits). Consider using a stronger mixing strategy (e.g., multiplying by an odd constant + rotate/xor steps, or an existing JIT hash combiner) to reduce collisions in VNMap lookups.
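A boost-style combiner is one possible shape for this suggestion. The sketch below is illustrative only and uses a stand-in signature, not the JIT's existing hashing helpers.

```cpp
#include <cassert>

// Sketch of a stronger mixing step (boost-style hash_combine): the shifted
// seed terms break the symmetry of a plain XOR, so swapped or correlated
// field values no longer collide trivially.
static unsigned HashCombine(unsigned seed, unsigned value)
{
    return seed ^ (value + 0x9E3779B9u + (seed << 6) + (seed >> 2));
}
```

Fields would then be folded in sequentially, e.g. `hash = HashCombine(HashCombine(hash, baseType), kind)`, instead of XOR-ing them directly; note that a plain XOR gives `(1 ^ 2) == (2 ^ 1)` while `HashCombine(1, 2) != HashCombine(2, 1)`.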
This is consistent with all the other hashing in the file. I didn't plan on switching it. Unless someone else thinks I should
```
bool operator==(const simdscalable_t& other) const
{
    if (IsZero() && other.IsZero())
    {
        return true;
    }

    return (gtSimdScalableBaseType == other.gtSimdScalableBaseType) &&
           (gtSimdScalableKind == other.gtSimdScalableKind) && (gtSimdScalableIndex == other.gtSimdScalableIndex) &&
           (gtSimdScalableStep == other.gtSimdScalableStep);
}
```
operator== canonicalizes all-zero encodings but does not canonicalize other semantically equivalent encodings (notably 'all-bits-set' repeated values across different base types). This can lead to multiple distinct constants/VNs/IR nodes that are semantically identical, reducing CSE/value-numbering effectiveness and complicating reasoning. Consider also canonicalizing 'all bits set' (and potentially other universally-representable patterns) either in equality/hash or during construction (e.g., normalize to a single base type/encoding when IsAllBitsSet() is true).
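One way to canonicalize at construction, sketched with a stand-in struct (the field names are illustrative, not the PR's actual `simdscalable_t`):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a repeated-value scalable constant: when the
// repeated value is all-bits-set for its element width, rewrite it to a
// single canonical encoding (byte-sized 0xFF) so that equality and hashing
// see one representation for all semantically identical constants.
struct ScalableConstSketch
{
    unsigned elemBytes; // element width in bytes
    uint64_t value;     // repeated lane value, truncated to elemBytes

    void Normalize()
    {
        const uint64_t allSet = (elemBytes >= 8) ? ~0ULL : ((1ULL << (elemBytes * 8)) - 1);
        if (value == allSet)
        {
            elemBytes = 1;
            value     = 0xFF;
        }
    }
};
```

With this, an all-bits-set broadcast stored at TYP_LONG width and one stored at TYP_SHORT width normalize to the same encoding.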
Agreed. I was leaving that for a later PR. Didn't want to overcomplicate here. Worst case is fewer optimisations.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
src/coreclr/jit/gentree.cpp:1
- For `TYP_SIMD` scalable constants, integral broadcasts currently store the full 64-bit `IntegralValue()` into `gtSimdScalableIndex` without truncating/canonicalizing to the element width. This breaks `simdscalable_t::IsAllBitsSet()` (and possibly equality/hash behavior) for smaller element types (e.g., broadcasting `-1` for `TYP_BYTE` will store `0xFFFF...FFFF` and fail the `0xFF` all-bits-set check). Consider canonicalizing the stored value to the base-type width (e.g., mask to 8/16/32 bits, or memcpy from a correctly-sized scalar like `BroadcastConstantToSimdScalable` does) at construction time for broadcast/scalar/scalarUnsafe/sequence.
src/coreclr/jit/gentree.cpp:1 - Grammar: change 'Attempts to folds' to 'Attempts to fold'.
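The canonicalization suggested for broadcasts amounts to masking the stored value down to the element width. A minimal sketch, where `TruncateToElement` is a hypothetical helper rather than code from this PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low elemBytes bytes of a broadcast
// value so that, e.g., broadcasting -1 as TYP_BYTE stores 0xFF rather than
// the full 64-bit 0xFFFFFFFFFFFFFFFF.
static uint64_t TruncateToElement(uint64_t value, unsigned elemBytes)
{
    if (elemBytes >= 8)
    {
        return value;
    }
    return value & ((1ULL << (elemBytes * 8)) - 1);
}
```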
```
// Ensure simd_t is big enough to contain any simd type
#if defined(TARGET_XARCH)
typedef simd64_t simd_t;
#elif defined(TARGET_ARM64)
typedef simd32_t simd_t;
```
On ARM64, redefining simd_t as simd32_t increases the storage footprint of any structure that embeds simd_t (notably GenTreeVecCon via gtSimdVal). If simd_t is only required as a fixed-size backing store for fixed-width SIMD values, consider keeping it at 16 bytes and relying on the separate gtSimdScalableVal field/union for scalable constants, or otherwise minimizing where the 32-byte type is used. This can reduce IR node size and memory traffic during compilation.
Suggested change:

```diff
-// Ensure simd_t is big enough to contain any simd type
+// Ensure simd_t is big enough to contain any fixed-width simd type without
+// unnecessarily inflating structures that embed it. Larger/scalable ARM64
+// constants should use their dedicated storage rather than increasing simd_t.
 #if defined(TARGET_XARCH)
 typedef simd64_t simd_t;
 #elif defined(TARGET_ARM64)
-typedef simd32_t simd_t;
+typedef simd16_t simd_t;
```
No, I think we need it bigger due to all the memcpy's of simd_t that exist
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/coreclr/jit/gentree.cpp:1
- The parameter name `index` is ambiguous here: it is assigned to `gtSimdMaskScalableIndex` and seems to represent a boolean-like 'all false (0) vs all true (1)' encoding rather than an index. Renaming it to something like `maskValue`, `isAllTrue`, or `predicateValue` (and possibly using a `uint8_t` parameter) would make call sites much clearer.
```
hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableBaseType);
hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableIndex);
```
simdmaskscalable_t::operator== treats all-zero masks as equal regardless of base type (IsZero() short-circuit), but this hash includes gtSimdMaskScalableBaseType. That breaks the required invariant for hash maps (equal keys must hash the same), causing failed lookups or duplicate entries. Canonicalize scalable-zero in GetHashCode (e.g., hash only isScalable + a fixed base type + index 0, or special-case val.scalable.IsZero() similarly to the simdscalable_t hash canonicalization).
Suggested change:

```diff
-hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableBaseType);
-hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableIndex);
+// simdmaskscalable_t::operator== treats all-zero scalable masks as equal
+// regardless of base type, so canonicalize that case in the hash as well.
+if (!val.scalable.IsZero())
+{
+    hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableBaseType);
+    hash = static_cast<unsigned>(hash ^ val.scalable.gtSimdMaskScalableIndex);
+}
```
```
struct simdmaskvalue_t
{
    uint8_t            isScalable;
    simdmaskscalable_t scalable;
    simdmask_t         fixed;

    static simdmaskvalue_t FromFixed(const simdmask_t& mask)
```
simdmaskvalue_t is defined under #if defined(TARGET_ARM64) but is used by non-ARM64 code paths in ValueNumStore changes (e.g., VarTypConv<TYP_MASK> and GetConstantSimdMaskValue are guarded only by FEATURE_MASKED_HW_INTRINSICS). If FEATURE_MASKED_HW_INTRINSICS is enabled on other targets (notably xarch), this will fail to compile because simdmaskvalue_t is incomplete/undefined. Either (1) define simdmaskvalue_t for all targets (with a fixed-only encoding on non-ARM64), or (2) keep simdmaskvalue_t usages fully TARGET_ARM64-guarded and preserve simdmask_t storage on other architectures.
```
if (emitter::isValidSimm<8>(simdVal.gtSimdScalableIndex) ||
    emitter::isValidSimm_MultipleOf<8, 256>(simdVal.gtSimdScalableIndex))
```
isValidSimm takes an ssize_t but the argument is uint64_t. Converting an out-of-range uint64_t to a signed type is implementation-defined in C++, which can make the check non-portable and harder to reason about (notably for negative values represented in two’s complement). Prefer storing gtSimdScalableIndex/Step as int64_t (if these are conceptually signed immediates) or cast explicitly through int64_t/ssize_t in a way that documents the intended interpretation before calling isValidSimm.
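The explicit interpretation might look like the following sketch, where `IsValidSimm8` is a stand-in for the emitter's templated helper, not its actual signature:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the range check: document the two's-complement reading of
// the stored uint64_t by casting through int64_t before comparing against
// the signed 8-bit immediate range.
static bool IsValidSimm8(uint64_t storedBits)
{
    const int64_t imm = (int64_t)storedBits; // intended two's-complement reading
    return (imm >= -128) && (imm <= 127);
}
```

This makes the intended reading of a stored value like 0xFFFFFFFFFFFFFFFF (i.e. -1) explicit at the call site rather than relying on an implicit conversion.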