Add API call for Arm64 Sve.LoadVectorNonFaulting #97695
Note: This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, the API implementation in the src *.cs file should be documented with triple-slash comments so that the PR reviewers can sign off on the change.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Adds everything from the API down to calling the codegen.
LoadVectorNonFaulting() was chosen as it has been approved and requires no "hidden" mask nodes.
A few things missing:
- The Sve API needs marking experimental (I couldn't find the exact tag).
- The test app is a placeholder. It should be replaced with templates. I didn't want to do that yet until we decide on the format and then autogenerate them.
- When run on real SVE hardware, the test fails because the jit allocates
@kunalspathak @tannergooding @dotnet/arm64-contrib
Diff results for #97695
Throughput diffs
Throughput diffs for linux/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to +0.00%)
Throughput diffs for osx/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to -0.00%)
Throughput diffs for windows/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to -0.00%)
I think I can see where this is done in LSRA. Will add something.
Once the API is approved, we will have a top level issue with check boxes for the APIs, similar to how we have for #93095 to track the progress. There, we will upload the autogenerated boilerplate code like:
- hwintrinsiclistarm64sve.h
- Sve.cs
- Sve.PlatformNotSupported.cs
- System.Runtime.Intrinsic.cs
- test templates
{
    internal Sve() { }

    public static new bool IsSupported { get => IsSupported; }
We will have to make sure to return false for Mono.
Do you know where that check would be added? I'm not sure whether it would be in the API or in the part that checks whether SVE is supported by the OS.
@fanyang-mono - do you know?
One way of doing it is to add a new element to this array
https://github.com/dotnet/runtime/blob/52e1ad3779e57c35d2416cd10d8ad7d75b2c0c8b/src/mono/mono/mini/simd-intrinsics.c#L3896C26-L3896C50
It will be something like
"Sve", MONO_CPU_ARM64_SVE, unsupported, sizeof (unsupported)
Additionally, you need to define the enum MONO_CPU_ARM64_SVE here:
runtime/src/mono/mono/mini/mini.h, line 2929 in 52e1ad3:
MONO_CPU_ARM64_DP = 1 << 6,
> One way of doing it is to add a new element to this array
Aren't these the entries of things that are supported? So probably no SVE entry is needed in that array?
When you specify unsupported, IsSupported will return false. So it is needed.
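To illustrate the point about the unsupported entry, here is a minimal toy model of such a table. It is not Mono's actual data layout: the struct names, the `is_supported` helper, the `Answer` enum, and the bit value chosen for the SVE flag are all assumptions made for illustration. The only things taken from the thread are the shape of the entry ("Sve", feature flag, unsupported, sizeof (unsupported)) and the `1 << 6` value for the DP flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model only: names and layout are illustrative, not Mono's real definitions.
enum CpuFeature : std::uint32_t {
    CPU_ARM64_DP  = 1u << 6, // mirrors the MONO_CPU_ARM64_DP line quoted above
    CPU_ARM64_SVE = 1u << 7, // assumed value; the real bit is not given in the thread
};

struct IntrinsicInfo {
    const char* method_name;
};

struct IntrinsicGroup {
    const char*          class_name;
    std::uint32_t        feature;
    const IntrinsicInfo* intrinsics;
    std::size_t          intrinsics_size; // size of the table in bytes
};

// Sentinel table: a group pointing here is recognised but reported as unsupported.
static const IntrinsicInfo unsupported[] = {{"get_IsSupported"}};
static const IntrinsicInfo dp_methods[]  = {{"DotProduct"}, {"get_IsSupported"}};

static const IntrinsicGroup groups[] = {
    {"Dp",  CPU_ARM64_DP,  dp_methods,  sizeof(dp_methods)},
    {"Sve", CPU_ARM64_SVE, unsupported, sizeof(unsupported)},
};

enum class Answer { NotHandled, False, True };

// Hypothetical lookup: what IsSupported would resolve to for a given class.
Answer is_supported(const char* class_name, std::uint32_t cpu_features) {
    assert(class_name != nullptr);
    for (const IntrinsicGroup& g : groups) {
        if (std::strcmp(g.class_name, class_name) != 0)
            continue;
        if (g.intrinsics == unsupported)
            return Answer::False; // listed, but explicitly marked unsupported
        return (cpu_features & g.feature) != 0 ? Answer::True : Answer::False;
    }
    return Answer::NotHandled; // no entry: the table gives no answer at all
}
```

In this model the "Sve" entry yields a definite false even when the CPU bit is set, whereas a class with no entry is simply not handled by the table. That is the distinction the comment above draws.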
Thanks! I can also see some examples of unsupported in supported_x86_intrinsics.
case INS_sve_ldnf1h:
case INS_sve_ldnf1w:
case INS_sve_ldnf1d:
    return emitIns_R_R_R_I(ins, size, reg1, reg2, reg3, 0, opt);
This doesn't look right. The caller should make sure to call the appropriate emitIns* method.
Agreed, but there are lots of places this is done elsewhere:
case INS_adds:
case INS_subs:
emitIns_R_R_R_I(ins, attr, reg1, reg2, reg3, 0, opt);
return;
Which means it can all use the existing table generation code. Plus, we get a handy shortcut for elsewhere where we don't need an immediate offset. This ideally needs some codegen test cases.
The alternative would be to use HW_Flag_SpecialCodeGen and then add a case in genHWIntrinsic(). That's more code and possibly slower in the long run? I suspect we'll get a lot of things added in genHWIntrinsic() by the end of SVE so it'd be nice to keep it short.
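The forwarding pattern being debated can be sketched in isolation. This is a simplified stand-in, not the JIT's emitter: `ToyEmitter`, `Format`, and the reduced parameter lists are assumptions for illustration (the real methods also take size/attr and opt arguments, as the snippets above show). It only demonstrates how the three-register entry point redirects selected instructions to the immediate form with an offset of zero.

```cpp
#include <cassert>

// Toy model only: the real emitter methods take more parameters (attr, opt, ...).
enum Instruction {
    INS_add,
    INS_adds,
    INS_subs,
    INS_sve_ldnf1h,
    INS_sve_ldnf1w,
    INS_sve_ldnf1d
};

enum class Format { None, R_R_R, R_R_R_I };

struct ToyEmitter {
    Format lastFormat = Format::None; // which encoding path was taken last
    int    lastImm    = 0;            // immediate recorded by that path

    void emitIns_R_R_R_I(Instruction ins, int reg1, int reg2, int reg3, int imm) {
        assert(reg1 >= 0 && reg2 >= 0 && reg3 >= 0);
        (void)ins;
        lastFormat = Format::R_R_R_I;
        lastImm    = imm;
    }

    void emitIns_R_R_R(Instruction ins, int reg1, int reg2, int reg3) {
        switch (ins) {
            case INS_adds:
            case INS_subs:
            case INS_sve_ldnf1h:
            case INS_sve_ldnf1w:
            case INS_sve_ldnf1d:
                // Shortcut: callers without an offset land on the immediate form with 0.
                emitIns_R_R_R_I(ins, reg1, reg2, reg3, 0);
                return;
            default:
                break;
        }
        assert(reg1 >= 0 && reg2 >= 0 && reg3 >= 0);
        lastFormat = Format::R_R_R;
        lastImm    = 0;
    }
};
```

The trade-off raised in the review is visible here: the caller stays generic and table-driven, at the cost of the callee quietly choosing a different encoding path than the one the caller named.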
Diff results for #97695
Throughput diffs
Throughput diffs for linux/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to -0.00%)
Throughput diffs for osx/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to -0.00%)
Throughput diffs for windows/arm64 ran on windows/x64
- Overall (-0.01% to -0.00%)
- MinOpts (-0.01% to -0.00%)
This has been replaced with #98218.