JIT ARM64-SVE: Add TrueMask and LoadVector #98218

a74nh · 2024-02-09T11:46:38Z

WIP patch to add TrueMask and LoadVector support

Change-Id: I285f8aba668409ca94e11be2489a6d9b50a4ec6e

Change-Id: I3ad4fd9a8d823cb43a9546ba6356006a0907ac57

dotnet-issue-labeler · 2024-02-09T11:46:44Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost · 2024-02-09T11:46:47Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

WIP patch to add TrueMask and LoadVector support

Author:	a74nh
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `new-api-needs-documentation`, `community-contribution`
Milestone:	-

a74nh · 2024-02-09T11:54:14Z

Wanted to show where I am with getting some API code working.

Intention here is to use truemask() to get a full predicate register, and then pass into a load function.

Test in Sve_mine.cs is not intended for merging and is only until I get template testing working.

        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe static Vector<byte> LoadVector_ImplicitMask(byte* address)
        {
            Vector<byte> mask = Sve.TrueMask(SveMaskPattern.All);
            return Sve.LoadVector(mask, address);
        }

Generates to:

G_M32969_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x20]!
            mov     fp, sp
            str     x0, [fp, #0x18]	// [V00 arg0]
						;; size=12 bbWeight=1 PerfScore 2.50
G_M32969_IG02:  ;; offset=0x000C
            ptrue   p7.b
            ldr     x0, [fp, #0x18]	// [V00 arg0]
            ld1b    { z0.b }, p7/z, [x0]
						;; size=12 bbWeight=1 PerfScore 12.00
G_M32969_IG03:  ;; offset=0x0018
            ldp     fp, lr, [sp], #0x20
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

When run, the test sometimes passes with:

1
2
3
....

And sometimes fails with:

0 1 != 0
0
1 2 != 0
0
2 3 != 0
0
....

I assume this is due either to register allocation code that isn't done yet or to something missing in GC or related.

@kunalspathak @tannergooding

ryujit-bot · 2024-02-09T13:15:38Z

Diff results for #98218

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries.pmi.linux.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	-0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	-0.01%
libraries.crossgen2.osx.arm64.checked.mch	-0.01%
libraries.pmi.osx.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	-0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	-0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	-0.01%
coreclr_tests.run.windows.arm64.checked.mch	-0.01%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries.pmi.windows.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	-0.01%

Details here

kunalspathak

looks good overall

kunalspathak · 2024-02-09T15:10:02Z

src/coreclr/jit/hwintrinsic.h

+    // this output can be used as a per-element mask
+    HW_Flag_ReturnsPerElementMask = 0x10000,
+
+    // The intrinsic uses a mask in arg1 to select elements present in the result


arg1: Will it always be the case?

Can we not just check for TYP_MASK to determine this?

arg1: Will it always be the case?

Yes, that's the SVE convention. Result, then mask, then inputs.

Can we not just check for TYP_MASK to determine this?

Ok, that sounds better. I can look and see how this would be done.

Can we not just check for TYP_MASK to determine this?

@tannergooding - Looking closer at this, I'm not quite sure what this would entail.

In hwintrinsiclistxarch.h the only reference to mask is use of HW_Flag_ReturnsPerElementMask.

I can't see any obvious way for the JIT to know that the first arg of the method is expected to be a predicate mask, other than to use the enum or hardcode it with case statements somewhere.

The jit can check the type of the actual arg1 child node, but that only tells us what the type actually is, and not what the expected type is. I imagine I'll have to write code that says if the actual type and expected type don't match, then somehow convert arg1 to the expected type.

I imagine I'll have to write code that says if the actual type and expected type don't match, then somehow convert arg1 to the expected type.

Yes, basically.

Most intrinsics support masking optionally and so you'll have something similar to this https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.cpp#L19988-L20008. That is, you'll have some bool GenTree::isSveEmbeddedMaskingCompatibleHWIntrinsic() which likely looks up a flag in the hwintrinsiclistarm64.h table to see if that particular intrinsic supports embedded masking/predication.
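As a rough illustration of the table-lookup idea in the comment above, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: the flag name `HW_Flag_EmbeddedMaskedOperation` and its value, the toy intrinsic IDs, and the table layout are stand-ins and are not the JIT's real definitions. Only the `HW_Flag_ReturnsPerElementMask = 0x10000` value comes from the diff in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits. Only HW_Flag_ReturnsPerElementMask is taken from the diff above;
// the other name and value are hypothetical.
enum HWIntrinsicFlag : uint32_t
{
    HW_Flag_NoFlag                  = 0,
    HW_Flag_ReturnsPerElementMask   = 0x10000,
    HW_Flag_EmbeddedMaskedOperation = 0x20000, // assumed, for illustration only
};

// Toy intrinsic IDs standing in for real NI_Sve_* entries.
enum ToyIntrinsicId
{
    Toy_CreateTrueMask,
    Toy_MaskedOp,
    Toy_UnmaskedOp,
    Toy_Count
};

// One row of flags per intrinsic, indexed by ID, mimicking a table header.
static const uint32_t toyFlagTable[Toy_Count] = {
    HW_Flag_ReturnsPerElementMask,   // Toy_CreateTrueMask
    HW_Flag_EmbeddedMaskedOperation, // Toy_MaskedOp
    HW_Flag_NoFlag,                  // Toy_UnmaskedOp
};

// Returns true when the table marks the intrinsic as supporting embedded masking.
bool isEmbeddedMaskingCompatible(ToyIntrinsicId id)
{
    assert(id >= 0 && id < Toy_Count);
    return (toyFlagTable[id] & HW_Flag_EmbeddedMaskedOperation) != 0;
}
```

The point of the design is that adding an intrinsic means adding one table row rather than touching a switch in several files.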

There are then a handful of intrinsics which require masking. For example, SVE comparison intrinsics may always return a TYP_MASK, in which case you could either add a new entry to the table such as HW_Flag_ReturnsSveMask or explicitly handle it like xarch does here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L3985-L3999

There are then a handful of intrinsics which require mask inputs and which aren't recognized via pattern matching. You would likewise add a flag or manually handle the few of them like this: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L3970-L3983

The insertion of the ConvertVectorToMask and ConvertMaskToVector intrinsics is important since the user may have passed in something that was of the incorrect type. For example, it might've been a mask of bytes, where we needed a mask of ints; or might've been an actual vector where we needed a mask and vice-versa. Likewise it ensures we don't need to check the type on every other intrinsic that does properly take a vector.
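To make the "actual type vs expected type" step concrete, here is a small sketch on a toy tree. It is not JIT code: `ToyNode`, `ToyType`, `makeLeaf` and `coerceArg` are hypothetical names, and the real importer works on GenTree nodes and also has to track element size, which this sketch ignores.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy value types; the real JIT uses var_types such as TYP_MASK.
enum ToyType { TOY_VECTOR, TOY_MASK };

enum ToyOper { OP_LEAF, OP_CONVERT_VECTOR_TO_MASK, OP_CONVERT_MASK_TO_VECTOR };

struct ToyNode
{
    ToyOper                  oper;
    ToyType                  type;
    std::unique_ptr<ToyNode> child; // null for leaves
};

// Creates a leaf (for example a local or an intrinsic result) of the given type.
std::unique_ptr<ToyNode> makeLeaf(ToyType type)
{
    return std::unique_ptr<ToyNode>(new ToyNode{OP_LEAF, type, nullptr});
}

// Wraps `arg` in a conversion node when its actual type differs from the
// type the consuming intrinsic expects; otherwise returns it unchanged.
std::unique_ptr<ToyNode> coerceArg(std::unique_ptr<ToyNode> arg, ToyType expected)
{
    assert(arg != nullptr);
    if (arg->type == expected)
    {
        return arg;
    }
    ToyOper oper = (expected == TOY_MASK) ? OP_CONVERT_VECTOR_TO_MASK
                                          : OP_CONVERT_MASK_TO_VECTOR;
    return std::unique_ptr<ToyNode>(new ToyNode{oper, expected, std::move(arg)});
}
```

With this shape, every consumer can assume its operand already has the expected type, which is the benefit described in the comment.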

We then make this efficient in morph (see https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L10775-L10827) where we ensure that we aren't unnecessarily converting from mask to vector and back to mask, or vice versa. This allows things that take a mask to consume a produced mask directly and gives the optimal codegen expected in most scenarios.
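The round-trip removal can be pictured with a toy peephole like the one below. This is only a sketch under simplifying assumptions: `Node`, `Kind` and `foldRoundTrip` are hypothetical names, and the sketch ignores the element-size and type checks that a real fold would need before treating two conversions as cancelling.

```cpp
#include <cassert>

enum Kind { LEAF_VECTOR, LEAF_MASK, VECTOR_TO_MASK, MASK_TO_VECTOR };

struct Node
{
    Kind  kind;
    Node* op1; // operand for conversions, nullptr for leaves
};

// Strips pairs of opposite conversions stacked directly on top of each other,
// e.g. VectorToMask(MaskToVector(x)) becomes x.
Node* foldRoundTrip(Node* node)
{
    assert(node != nullptr);
    while (node->op1 != nullptr)
    {
        Node* inner   = node->op1;
        bool  cancels = (node->kind == VECTOR_TO_MASK && inner->kind == MASK_TO_VECTOR) ||
                        (node->kind == MASK_TO_VECTOR && inner->kind == VECTOR_TO_MASK);
        if (!cancels)
        {
            break;
        }
        assert(inner->op1 != nullptr); // a conversion always has an operand
        node = inner->op1;
    }
    return node;
}
```

After the fold, a consumer that takes a mask sees the producing mask node directly, so no vector register is involved.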

That was the comment around

We are notably missing, and need to add, a bit which handles the case where we have LCL_VAR TYP_SIMD = TYP_MASK, because that can currently block the ability to consume a mask directly if it's multi-use. We ideally would have it stored as LCL_VAR TYP_MASK instead (even if the user manually hoisted it as a Vector in C#/IL) and then have the things consume it as ConvertMaskToVector(LCL_VAR) if they actually needed a vector.

This shouldn't be overly complex to add, however, it's just not been done as of yet.

Right. That feels like it might touch quite a few files. Given the size of this PR, do you think it's worth keeping this PR as is, and then putting the LCL_VAR TYP_MASK in a follow on, along with the lowering code?

then putting the LCL_VAR TYP_MASK in a follow on

Yes, I think this would even be the preferred route given it's not required and is really its own isolated change.

along with the lowering code?

Which lowering code is this?

In general I think it's fine for this PR to be the basic plumbing of TYP_MASK support into the Arm64 side of the JIT. As long as TrueMask and LoadVector are minimally working as expected, I think we're golden and we can extend that to other operations and enable optimizations separately. That is exactly what we did for xarch to help with review and scoping.

Which lowering code is this?

I added some code to remove the mask->vector->mask and vector->mask->vector conversions. But nothing in this PR uses it because of the lcl var, so I decided not to push it.

Will mark this as ready now.

Will mark this as ready now.

... but not quite yet, as I need #99049 to merge so I can remove it from this PR.

src/coreclr/jit/hwintrinsic.h

src/coreclr/jit/hwintrinsiclistarm64sve.h

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Enums.cs

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

kunalspathak · 2024-02-09T15:35:26Z

src/coreclr/jit/emitarm64.cpp

+        case INS_sve_ld1h:
+        case INS_sve_ld1w:
+        case INS_sve_ld1d:
+            return emitIns_R_R_R_I(ins, size, reg1, reg2, reg3, 0, opt);


is there any reason why we can't call emitIns_R_R_R_I() directly from the caller?

There are two ways to do this. Without any special coding in the table, it'll just automatically use the R_R_R() version because that's how many args there are in the intrinsic.

I see that elsewhere in NEON etc, this is how it's already done.

Alternatively, we could add an extra flag, something like HW_Flag_extra_zero_arg, or just hardcode via the HW_Flag_specialcodegen route. That feels like a lot of extra code.
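For readers following the thread, the forwarding approach in the diff can be sketched like this. `ToyEmitter` and the `TOY_INS_*` names are hypothetical and the signatures are cut down; the real emitter functions take more parameters (size, options) and encode actual instructions.

```cpp
#include <cassert>
#include <string>

// Toy instruction IDs standing in for INS_sve_ld1h and friends.
enum ToyIns { TOY_INS_ld1h, TOY_INS_ld1w, TOY_INS_ld1d, TOY_INS_other };

struct ToyEmitter
{
    std::string lastForm;     // which emit routine handled the last instruction
    long        lastImm = -1; // immediate recorded by the R_R_R_I form
    ToyIns      lastIns = TOY_INS_other;

    // Four-operand form: three registers plus an immediate offset.
    void emitIns_R_R_R_I(ToyIns ins, int reg1, int reg2, int reg3, long imm)
    {
        assert(reg1 >= 0 && reg2 >= 0 && reg3 >= 0);
        lastForm = "R_R_R_I";
        lastImm  = imm;
        lastIns  = ins;
    }

    // Three-operand form. Contiguous loads are forwarded to the immediate
    // form with an offset of zero, so generic codegen needs no special case.
    void emitIns_R_R_R(ToyIns ins, int reg1, int reg2, int reg3)
    {
        switch (ins)
        {
            case TOY_INS_ld1h:
            case TOY_INS_ld1w:
            case TOY_INS_ld1d:
                emitIns_R_R_R_I(ins, reg1, reg2, reg3, 0);
                return;
            default:
                assert(reg1 >= 0 && reg2 >= 0 && reg3 >= 0);
                lastForm = "R_R_R";
                lastIns  = ins;
                return;
        }
    }
};
```

The trade-off discussed above is that this keeps the intrinsic table entry plain, at the cost of a small amount of routing inside the emitter.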

a74nh · 2024-02-21T11:48:19Z

New version pushed:

Uses latest version of the API
My dummy tests have been removed. These were sometimes failing due to incorrect usage of GC pinning.
Now uses template tests. These pass on real SVE hardware.
Reflection test disabled for now. This requires mask register allocation before it will work as the reflection look up causes the mask variable to be stored to memory.
No fixes due to review comments yet

BEGIN EXECUTION
/home/alahay01/dotnet/runtime_sve/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true HardwareIntrinsics_Arm_ro.dll ''
11:35:25.798 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_float()
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True

Beginning scenario: RunBasicScenario_Load
11:35:25.891 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_float()
11:35:25.905 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_double()
Beginning scenario: RunBasicScenario_Load
11:35:25.914 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_double()
11:35:25.916 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_sbyte()
Beginning scenario: RunBasicScenario_Load
11:35:25.922 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_sbyte()
11:35:25.923 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_short()
Beginning scenario: RunBasicScenario_Load
11:35:25.929 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_short()
11:35:25.931 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_int()
Beginning scenario: RunBasicScenario_Load
11:35:25.937 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_int()
11:35:25.938 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_long()
Beginning scenario: RunBasicScenario_Load
11:35:25.945 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_long()
11:35:25.947 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_byte()
Beginning scenario: RunBasicScenario_Load
11:35:25.953 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_byte()
11:35:25.955 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_ushort()
Beginning scenario: RunBasicScenario_Load
11:35:25.961 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_ushort()
11:35:25.962 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_uint()
Beginning scenario: RunBasicScenario_Load
11:35:25.968 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_uint()
11:35:25.970 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_ulong()
Beginning scenario: RunBasicScenario_Load
11:35:25.976 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_ulong()
11:35:25.978 Running test: JIT/HardwareIntrinsics/Arm/ArmBase/Yield_ro/Yield_ro.dll
11:35:25.979 Passed test: JIT/HardwareIntrinsics/Arm/ArmBase/Yield_ro/Yield_ro.dll
Expected: 100
Actual: 100
END EXECUTION - PASSED

ryujit-bot · 2024-02-21T20:28:51Z

Diff results for #98218

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries.pmi.linux.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.osx.arm64.checked.mch	-0.01%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
coreclr_tests.run.osx.arm64.checked.mch	-0.01%
libraries.crossgen2.osx.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	-0.01%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	-0.01%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	-0.01%

Details here

kunalspathak · 2024-02-22T15:16:33Z

src/coreclr/jit/lsraarm64.cpp

+                    case NI_Sve_CreateTrueMaskUInt16:
+                    case NI_Sve_CreateTrueMaskUInt32:
+                    case NI_Sve_CreateTrueMaskUInt64:
+                        needBranchTargetReg = !intrin.op1->isContainedIntOrIImmed();


this creates internal register for "def". Make sure that we create an internal register for "use" as well. I forgot to do that in one place and fixed it in #98814.

this creates internal register for "def". Make sure that we create an internal register for "use" as well. I forgot to do that in one place and fixed it in #98814.

I think this is ok as is. The code will use all the generic functionality and work down to the buildInternalRegisterUses() call at the end of the function.

kunalspathak · 2024-02-22T15:20:14Z

src/coreclr/jit/lsraarm64.cpp

+            switch (intrin.id)
+            {
+                case NI_Sve_LoadVector:
+                    srcCandidates = RBM_LOWMASK;


Is RBM_LOWMASK true for all variants of ld1* or are there some which could operate on higher mask registers? I am wondering how we can make this easy for development of other APIs, so that developers don't have to think about which candidates to set for a given intrinsic.

Is RBM_LOWMASK true for all variants of ld1* or are there some which could operate on higher mask registers? I am wondering how we can make this easy for development of other APIs, so that developers don't have to think about which candidates to set for a given intrinsic.

Yes, all ld1* should be the same. We should be able to pull this information automatically.

I've pushed an update which adds HW_Flag_LowMaskedOperation instead of the switch. I'm fairly keen on pushing as much as we can into the table as it reduces the number of files touched each time an API is added. But I'm a little concerned we'll run out of space for flags. At that point, we can either get creative with flag reuse or turn some flags back into a switch.
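A sketch of what the flag buys in the register allocator, under stated assumptions: the constant values below are invented for illustration, and the `TOY_RBM_*` masks only model the idea that a "low" predicate set is P0-P7 (SVE predicated loads encode the governing predicate in a 3-bit field). The helper name `maskSrcCandidates` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy predicate register sets: bit n stands for predicate register Pn.
const uint32_t TOY_RBM_ALLMASK = 0xFFFF; // P0-P15
const uint32_t TOY_RBM_LOWMASK = 0x00FF; // P0-P7

// Flag name from the comment above; the value here is assumed.
const uint32_t HW_Flag_LowMaskedOperation = 0x40000;

// Picks the candidate registers for the mask operand from the table flags
// alone, so no per-intrinsic switch is needed in the allocator.
uint32_t maskSrcCandidates(uint32_t intrinsicFlags)
{
    uint32_t candidates = ((intrinsicFlags & HW_Flag_LowMaskedOperation) != 0)
                              ? TOY_RBM_LOWMASK
                              : TOY_RBM_ALLMASK;
    assert((candidates & ~TOY_RBM_ALLMASK) == 0); // always a subset of P0-P15
    return candidates;
}
```

A new API that needs a low predicate then only has to set the flag in its table row.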

Change-Id: I9b3f7d73699e73ada7aa4b74a6cdadaa4574e996

a74nh · 2024-02-29T10:41:35Z

Updated version now produces the move to/from predicates.

Test case dump with 4 annotations:

*************** After end code gen, before unwindEmit()
G_M22300_IG01:        ; func=00, offs=0x000000, size=0x001C, bbWeight=1, PerfScore 6.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN0044: 000000      stp     fp, lr, [sp, #-0x50]!
IN0045: 000004      mov     fp, sp
IN0046: 000008      str     xzr, [fp, #0x30]	// [V01 loc0]
IN0047: 00000C      str     xzr, [fp, #0x38]	// [V01 loc0+0x08]
IN0048: 000010      str     xzr, [fp, #0x20]	// [V02 loc1]
IN0049: 000014      str     xzr, [fp, #0x28]	// [V02 loc1+0x08]
IN004a: 000018      str     x0, [fp, #0x48]	// [V00 this]

G_M22300_IG02:        ; offs=0x00001C, size=0x010C, bbWeight=1, PerfScore 98.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref

IN0001: 00001C      movz    x0, #472
IN0002: 000020      movk    x0, #0xA0B4 LSL #16
IN0003: 000024      movk    x0, #0xFFFF LSL #32
IN0004: 000028      movz    x1, #0xE4C0      // code for TestLibrary.TestFramework:BeginScenario(System.String)
IN0005: 00002C      movk    x1, #0x5631 LSL #16
IN0006: 000030      movk    x1, #0xFFFF LSL #32
IN0007: 000034      ldr     x1, [x1]
IN0008: 000038      blr     x1
IN0009: 00003C      ptrue   p7.d                            - CREATE TRUE MASK
IN000a: 000040      mov     z16.d, p7/z, #1           -  MOVE MASK IN PREDICATE TO Z
IN000b: 000044      str     q16, [fp, #0x30]	// [V01 loc0]
IN000c: 000048      ldr     x0, [fp, #0x48]	// [V00 this]
IN000d: 00004C      ldrsb   wzr, [x0]
IN000e: 000050      ldr     x0, [fp, #0x48]	// [V00 this]
IN000f: 000054      add     x0, x0, #16
IN0010: 000058      movz    x1, #0xE2E0      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVector_ulong+DataTable:get_inArray1Ptr():ulong:this
IN0011: 00005C      movk    x1, #0x5631 LSL #16
IN0012: 000060      movk    x1, #0xFFFF LSL #32
IN0013: 000064      ldr     x1, [x1]
IN0014: 000068      blr     x1
IN0015: 00006C      ldr     q16, [fp, #0x30]	// [V01 loc0]
IN0016: 000070      ptrue   p7.d                                           - CREATE EMBEDDED MASK FOR PREDICATE MOVE
IN0017: 000074      cmpne   p7.d, p7/z, z16.d, #0             - MOVE MASK FROM Z TO PREDICATE
IN0018: 000078      ld1d    { z16.d }, p7/z, [x0]
IN0019: 00007C      str     q16, [fp, #0x20]	// [V02 loc1]
IN001a: 000080      ldr     x0, [fp, #0x48]	// [V00 this]
IN001b: 000084      ldrsb   wzr, [x0]
IN001c: 000088      ldr     x0, [fp, #0x48]	// [V00 this]
IN001d: 00008C      add     x0, x0, #16
IN001e: 000090      movz    x1, #0xE2F8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVector_ulong+DataTable:get_outArrayPtr():ulong:this
IN001f: 000094      movk    x1, #0x5631 LSL #16
IN0020: 000098      movk    x1, #0xFFFF LSL #32
IN0021: 00009C      ldr     x1, [x1]
IN0022: 0000A0      blr     x1
IN0023: 0000A4      ldr     q16, [fp, #0x20]	// [V02 loc1]
IN0024: 0000A8      str     q16, [x0]
IN0025: 0000AC      ldr     x0, [fp, #0x48]	// [V00 this]
IN0026: 0000B0      ldrsb   wzr, [x0]
IN0027: 0000B4      ldr     x0, [fp, #0x48]	// [V00 this]
IN0028: 0000B8      add     x0, x0, #16
IN0029: 0000BC      movz    x1, #0xE2E0      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVector_ulong+DataTable:get_inArray1Ptr():ulong:this
IN002a: 0000C0      movk    x1, #0x5631 LSL #16
IN002b: 0000C4      movk    x1, #0xFFFF LSL #32
IN002c: 0000C8      ldr     x1, [x1]
IN002d: 0000CC      blr     x1
IN002e: 0000D0      str     x0, [fp, #0x18]	// [V04 tmp1]
IN002f: 0000D4      ldr     x0, [fp, #0x48]	// [V00 this]
IN0030: 0000D8      ldrsb   wzr, [x0]
IN0031: 0000DC      ldr     x0, [fp, #0x48]	// [V00 this]
IN0032: 0000E0      add     x0, x0, #16
IN0033: 0000E4      movz    x1, #0xE2F8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVector_ulong+DataTable:get_outArrayPtr():ulong:this
IN0034: 0000E8      movk    x1, #0x5631 LSL #16
IN0035: 0000EC      movk    x1, #0xFFFF LSL #32
IN0036: 0000F0      ldr     x1, [x1]
IN0037: 0000F4      blr     x1
IN0038: 0000F8      str     x0, [fp, #0x10]	// [V05 tmp2]
IN0039: 0000FC      ldr     x2, [fp, #0x10]	// [V05 tmp2]
IN003a: 000100      ldr     x1, [fp, #0x18]	// [V04 tmp1]
IN003b: 000104      ldr     x0, [fp, #0x48]	// [V00 this]
IN003c: 000108      movz    x3, #472
IN003d: 00010C      movk    x3, #0xA0B4 LSL #16
IN003e: 000110      movk    x3, #0xFFFF LSL #32
IN003f: 000114      movz    x4, #0xE3E8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVector_ulong:ValidateResult(ulong,ulong,System.String):this
IN0040: 000118      movk    x4, #0x5631 LSL #16
IN0041: 00011C      movk    x4, #0xFFFF LSL #32
IN0042: 000120      ldr     x4, [x4]
IN0043: 000124      blr     x4

G_M22300_IG03:        ; offs=0x000128, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN004b: 000128      ldp     fp, lr, [sp], #0x50
IN004c: 00012C      ret     lr

Next I need to add the lowering code which spots mask moves and removes them.
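The two annotated conversions in the dump can be modelled lane by lane in plain C++. This is only a scalar emulation for illustration: `Predicate`, `VectorD`, `maskToVector` and `vectorToMask` are hypothetical names, and the lane count of two is chosen to match the 128-bit `q16` spills in the dump rather than any particular hardware vector length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 2; // two 64-bit lanes, for illustration only

using Predicate = std::array<bool, kLanes>;
using VectorD   = std::array<uint64_t, kLanes>;

// Models `mov z.d, p/z, #1`: active lanes become 1, inactive lanes become 0.
VectorD maskToVector(const Predicate& p)
{
    VectorD z{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        z[i] = p[i] ? 1 : 0;
    }
    return z;
}

// Models `ptrue p.d` followed by `cmpne p.d, p/z, z.d, #0`: with an all-true
// governing predicate, each result lane is true where the vector lane is non-zero.
Predicate vectorToMask(const VectorD& z)
{
    Predicate p{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        p[i] = (z[i] != 0);
    }
    return p;
}

// Checks that mask -> vector -> mask gives back the original mask, which is
// why a lowering pass can drop the pair when nothing else uses the vector.
void checkRoundTrip(const Predicate& p)
{
    assert(vectorToMask(maskToVector(p)) == p);
}
```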

...braries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.PlatformNotSupported.cs

src/coreclr/jit/compiler.h

src/coreclr/jit/hwintrinsiclistarm64sve.h

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs

This reverts commit 6beb760.

kunalspathak

LGTM. Thanks!

a74nh added 2 commits February 9, 2024 11:44

JIT ARM64-SVE: Add TrueMask

6f94411

Change-Id: I285f8aba668409ca94e11be2489a6d9b50a4ec6e

LoadVector

864b925

Change-Id: I3ad4fd9a8d823cb43a9546ba6356006a0907ac57

ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 9, 2024

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Feb 9, 2024

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 9, 2024

kunalspathak requested changes Feb 9, 2024

ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Feb 9, 2024

a74nh added 4 commits February 19, 2024 15:46

Add SveLoadUnOpMaskedTest.template

c2031ca

Add CreateTrueMaskByte etc

83194f3

Fix up tests

1c66d45

Remove commented code

fe09128

a74nh added 2 commits February 21, 2024 13:07

Merge main

dce9aef

Explain SveMaskPattern

941db03

kunalspathak reviewed Feb 22, 2024

a74nh mentioned this pull request Feb 28, 2024

ARM64-SVE: Implement IF_SVE_BV_2A #99049

Merged

a74nh added 3 commits February 28, 2024 16:53

Merge main

8bd6507

Change-Id: I9b3f7d73699e73ada7aa4b74a6cdadaa4574e996

ARM64-SVE: Implement IF_SVE_BV_2A

5dc7234

Create vector to/from mask nodes in intrinsic generation

5a2e84e

a74nh changed the title ~~JIT ARM64-SVE: Add TrueMask~~ JIT ARM64-SVE: Add TrueMask and LoadVector Feb 29, 2024

Add HW_Flag_LowMaskedOperation

310812f

a74nh marked this pull request as ready for review March 4, 2024 17:29

a74nh marked this pull request as draft March 4, 2024 17:30

a74nh added 2 commits March 5, 2024 09:52

Revert "ARM64-SVE: Implement IF_SVE_BV_2A"

8fdd381

Merge main

93c33af

a74nh marked this pull request as ready for review March 5, 2024 10:11

This was referenced Mar 5, 2024

Internal CLR error in System.Reflection.FieldAccessor.GetValue #98998

Closed

Interop.0.1 failing on gcstress pipeline #99001

Closed

a74nh mentioned this pull request Mar 6, 2024

Add API call for Arm64 Sve.LoadVectorNonFaulting #97695

Closed

kunalspathak requested review from tannergooding and kunalspathak March 7, 2024 16:31

kunalspathak requested changes Mar 8, 2024

a74nh added 2 commits March 11, 2024 09:10

Use NI_Sve_CreateTrueMaskAll

afdae94

Mark API as experimental

6beb760

kunalspathak reviewed Mar 11, 2024

Revert "Mark API as experimental"

dae6d90

This reverts commit 6beb760.

kunalspathak approved these changes Mar 11, 2024

Merge remote-tracking branch 'origin/main' into api_ptrue_2_github

fa07d6b

build-analysis bot mentioned this pull request Mar 12, 2024

Tracking issue for CI build timeouts #76454

Closed

kunalspathak merged commit 17eb59c into dotnet:main Mar 12, 2024
170 of 191 checks passed

kunalspathak mentioned this pull request Mar 22, 2024

Arm64: Implement SVE APIs #99957

Closed

github-actions bot locked and limited conversation to collaborators Apr 12, 2024
