Add API call for Arm64 Sve.LoadVectorNonFaulting #97695

a74nh · 2024-01-30T13:24:16Z

Adds everything from the API down to calling the codegen.

LoadVectorNonFaulting() was chosen as it has been approved and requires no "hidden" mask nodes.

dotnet-issue-labeler · 2024-01-30T13:24:22Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so that the PR reviewers can sign off on the change.

ghost · 2024-01-30T13:24:25Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	a74nh
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `new-api-needs-documentation`, `community-contribution`
Milestone:	-

a74nh · 2024-01-30T13:32:52Z

A few things missing:

The Sve API needs marking experimental (I couldn't find the exact tag).

The test app is a placeholder and should be replaced with templates. I didn't want to do that until we decide on the format and can then autogenerate them.

When run on real SVE hardware, the test fails because the JIT allocates Z16 as the mask (there is no predicate register allocation yet). The emit functions treat this as P16 and then fail an assert, because the highest predicate register is P15. Ideally, we should add some code to limit the mask to Z15 or lower, but I'm not sure where to do that.

@kunalspathak @tannergooding @dotnet/arm64-contrib

ryujit-bot · 2024-01-30T15:43:36Z

Diff results for #97695

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run_tiered.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	-0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	-0.01%
libraries.crossgen2.osx.arm64.checked.mch	-0.01%
libraries.pmi.osx.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	-0.01%
realworld.run.osx.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run_tiered.windows.arm64.checked.mch	-0.01%
coreclr_tests.run.windows.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	-0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	-0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	-0.01%
coreclr_tests.run.windows.arm64.checked.mch	-0.01%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries.pmi.windows.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	-0.01%
realworld.run.windows.arm64.checked.mch	-0.01%

a74nh · 2024-01-30T16:14:24Z

> When run on real SVE hardware, the test fails because the JIT allocates Z16 as the mask (there is no predicate register allocation yet). The emit functions treat this as P16 and then fail an assert, because the highest predicate register is P15. Ideally, we should add some code to limit the mask to Z15 or lower, but I'm not sure where to do that.

I think I can see where this is done in LSRA. I will add something.
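To make the failure mode concrete, here is a minimal sketch of the kind of restriction being discussed. It is not the JIT's real LSRA code: `RegMask`, `candidatesFor`, and `pickRegister` are hypothetical names, and the model assumes one bit per vector register Z0..Z31, with an operand that will later be encoded as a predicate limited to indices 0..15.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the JIT's real types: one bit per vector register Z0..Z31.
using RegMask = std::uint32_t;

constexpr int     kMaxPredicateIndex = 15; // P0..P15, as stated in the thread
constexpr RegMask kLowVectorRegs     = (1u << (kMaxPredicateIndex + 1)) - 1; // bits for Z0..Z15

// Narrow the candidate set for an operand that the emitter will encode as a
// predicate register. Other operands keep their full candidate set.
constexpr RegMask candidatesFor(bool usedAsPredicate, RegMask candidates)
{
    return usedAsPredicate ? (candidates & kLowVectorRegs) : candidates;
}

// Pick the lowest-numbered register from a non-empty candidate set.
int pickRegister(RegMask candidates)
{
    assert(candidates != 0);
    int index = 0;
    while ((candidates & 1u) == 0)
    {
        candidates >>= 1;
        ++index;
    }
    return index;
}
```

In this model, if Z0..Z15 are all busy, an unrestricted pick lands on index 16, which is the case the emitter rejects. With the restriction the candidate set becomes empty instead, so the allocator would have to free a low register rather than hand out an index that cannot be encoded.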

kunalspathak · 2024-01-30T18:38:34Z

> The Sve API needs marking experimental (I couldn't find the exact tag).

[System.Runtime.Versioning.RequiresPreviewFeaturesAttribute("Sve is in preview.")]

kunalspathak

Once the API is approved, we will have a top-level issue with check boxes for the APIs, similar to the one we have in #93095, to track progress. There, we will upload the autogenerated boilerplate code, such as:

hwintrinsiclistarm64sve.h
Sve.cs
Sve.PlatformNotSupported.cs
System.Runtime.Intrinsics.cs
test templates

kunalspathak · 2024-01-30T18:39:45Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs

+    {
+        internal Sve() { }
+
+        public static new bool IsSupported { get => IsSupported; }


We will have to make sure this returns false for Mono.

a74nh · 2024-01-31

Do you know where that check would be added? I'm not sure whether it belongs in the API or in the part that checks whether the OS supports SVE.

kunalspathak · 2024-01-31

@fanyang-mono, do you know?

fanyang-mono · 2024-02-01

One way of doing it is to add a new element to this array:
https://github.com/dotnet/runtime/blob/52e1ad3779e57c35d2416cd10d8ad7d75b2c0c8b/src/mono/mono/mini/simd-intrinsics.c#L3896C26-L3896C50

It would be something like:

"Sve", MONO_CPU_ARM64_SVE, unsupported, sizeof (unsupported)

Additionally, you need to define the enum value MONO_CPU_ARM64_SVE here:

runtime/src/mono/mono/mini/mini.h

Line 2929 in 52e1ad3

MONO_CPU_ARM64_DP = 1 << 6,

kunalspathak · 2024-02-01

> One way of doing it is to add a new element to this array

Aren't these the entries for things that are supported? So probably no SVE entry is needed in that array?

fanyang-mono · 2024-02-05 (edited)

When you specify unsupported, IsSupported will return false. So the entry is needed.

a74nh · 2024-02-06

Thanks! I can also see some examples of unsupported in supported_x86_intrinsics.
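The point about listing a class with an "unsupported" marker can be illustrated with a small stand-alone model. This is only a sketch and does not reproduce Mono's actual data structures: `IntrinGroup`, `IntrinsicInfo`, `Lookup`, `query`, and the `DEMO_` flags are hypothetical, the bit chosen for SVE is an assumption, and what the real runtime does for a class with no entry at all is not covered by the thread, so the model simply reports it as "not listed".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical feature bits. Only the DP value (1 << 6) is quoted in the thread;
// the SVE bit is an assumption made for this example.
enum DemoCpuFeature : std::uint32_t
{
    DEMO_CPU_ARM64_DP  = 1u << 6,
    DEMO_CPU_ARM64_SVE = 1u << 7,
};

struct IntrinsicInfo { const char* method; };

// One table row per intrinsic class: name, required CPU feature, method list.
struct IntrinGroup
{
    const char*          name;
    std::uint32_t        feature;
    const IntrinsicInfo* intrinsics;
    std::size_t          intrinsicsSize;
};

// Sentinel list: a class that points at it is known but never reported as supported.
static const IntrinsicInfo kUnsupported[] = { {"get_IsSupported"} };
static const IntrinsicInfo kDpMethods[]   = { {"get_IsSupported"}, {"DotProduct"} };

static const IntrinGroup kGroups[] = {
    {"Dp",  DEMO_CPU_ARM64_DP,  kDpMethods,   sizeof(kDpMethods)},
    {"Sve", DEMO_CPU_ARM64_SVE, kUnsupported, sizeof(kUnsupported)},
};

enum class Lookup { NotListed, Unsupported, Supported };

// Answer an IsSupported query for a class name given the CPU's feature bits.
Lookup query(const char* className, std::uint32_t cpuFeatures)
{
    for (const IntrinGroup& g : kGroups)
    {
        if (std::strcmp(g.name, className) != 0)
            continue;
        if (g.intrinsics == kUnsupported)
            return Lookup::Unsupported; // listed, but deliberately disabled
        return (cpuFeatures & g.feature) != 0 ? Lookup::Supported : Lookup::Unsupported;
    }
    return Lookup::NotListed; // the table has no opinion about this class
}
```

In this model, `query("Sve", ...)` reports unsupported even when the CPU feature bit is set, which is the behaviour described for the `unsupported` entry: the class is recognized, and the answer is forced to false.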

kunalspathak · 2024-01-30T18:51:31Z

src/coreclr/jit/emitarm64.cpp

+        case INS_sve_ldnf1h:
+        case INS_sve_ldnf1w:
+        case INS_sve_ldnf1d:
+            return emitIns_R_R_R_I(ins, size, reg1, reg2, reg3, 0, opt);


This doesn't look right. The caller should make sure to call the appropriate emitIns* method.

a74nh · 2024-01-31

Agreed, but this is already done in lots of other places:

case INS_adds:
case INS_subs:
    emitIns_R_R_R_I(ins, attr, reg1, reg2, reg3, 0, opt);
    return;

Doing it this way means everything can use the existing table generation code. We also get a handy shortcut for other places that don't need an immediate offset. Ideally this needs some codegen test cases.

The alternative would be to use HW_Flag_SpecialCodeGen and then add a case in genHWIntrinsic(). That's more code and possibly slower in the long run. I suspect a lot of things will be added to genHWIntrinsic() by the end of the SVE work, so it would be nice to keep it short.
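The forwarding pattern under discussion can be sketched in isolation. The following is a toy model, not the real emitter: `Emitter`, `Encoded`, `emitRRR`, and `emitRRRI` are hypothetical stand-ins for the `emitIns_R_R_R` and `emitIns_R_R_R_I` entry points, and `INS_other` is an invented placeholder for any instruction that has a true three-register form.

```cpp
#include <cassert>
#include <vector>

// Instruction names taken from the thread, plus one invented placeholder.
enum Instruction
{
    INS_adds,
    INS_subs,
    INS_sve_ldnf1h,
    INS_sve_ldnf1w,
    INS_sve_ldnf1d,
    INS_other, // hypothetical: stands for any plain three-register instruction
};

// Record of what was "emitted", so the behaviour can be inspected.
struct Encoded
{
    Instruction ins;
    int         reg1, reg2, reg3;
    bool        hasImm;
    int         imm;
};

struct Emitter
{
    std::vector<Encoded> out;

    // Four-operand form: three registers plus an immediate.
    void emitRRRI(Instruction ins, int r1, int r2, int r3, int imm)
    {
        out.push_back({ins, r1, r2, r3, true, imm});
    }

    // Three-register form. Instructions that only exist with an immediate are
    // forwarded to the immediate form with imm = 0, so callers (for example
    // table-driven codegen) can use one entry point for both kinds.
    void emitRRR(Instruction ins, int r1, int r2, int r3)
    {
        switch (ins)
        {
            case INS_adds:
            case INS_subs:
            case INS_sve_ldnf1h:
            case INS_sve_ldnf1w:
            case INS_sve_ldnf1d:
                emitRRRI(ins, r1, r2, r3, 0);
                return;
            default:
                break;
        }
        out.push_back({ins, r1, r2, r3, false, 0});
    }
};
```

The trade-off raised in the review is visible here: the caller stays simple, but the three-register entry point silently chooses an encoding on the caller's behalf, which is why codegen test cases for the forwarded instructions would be useful.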

ryujit-bot · 2024-01-31T12:46:52Z

Diff results for #97695

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries.pmi.linux.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%
realworld.run.linux.arm64.checked.mch	-0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.osx.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	-0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	-0.01%
libraries.pmi.osx.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	-0.01%
realworld.run.osx.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	-0.01%

MinOpts (-0.01% to -0.00%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	-0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	-0.01%
coreclr_tests.run.windows.arm64.checked.mch	-0.01%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries.pmi.windows.arm64.checked.mch	-0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	-0.01%
realworld.run.windows.arm64.checked.mch	-0.01%

a74nh · 2024-03-06T12:18:25Z

This has been replaced with #98218

Add API call for Arm64 Sve.LoadVectorNonFaulting

01f6029

ghost added the community-contribution label (indicates that the PR has been added by a community member) Jan 30, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and new-api-needs-documentation labels Jan 30, 2024

kunalspathak added the arm-sve label (work related to arm64 SVE/SVE2 support) Jan 30, 2024

kunalspathak reviewed Jan 30, 2024

a74nh added 2 commits January 31, 2024 10:42

Check for predicated results in lsra

db3f33a

Add Sve preview marker

034806e

a74nh mentioned this pull request Feb 1, 2024

Add Sve.IsSupported support #97814

Merged

a74nh closed this Mar 6, 2024

github-actions bot locked and limited conversation to collaborators Apr 7, 2024
