Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API call for Arm64 Sve.LoadVectorNonFaulting #97695

Closed
wants to merge 3 commits into from

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Jan 30, 2024

Adds everything from the API down to calling the codegen.

LoadVectorNonFaulting() was chosen as it has been approved and requires no "hidden" mask nodes.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 30, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Jan 30, 2024
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Jan 30, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Adds everything from the API down to calling the codegen.

LoadVectorNonFaulting() was chosen as it has been approved and requires no "hidden" mask nodes.

Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr, new-api-needs-documentation, community-contribution

Milestone: -

@a74nh
Copy link
Contributor Author

a74nh commented Jan 30, 2024

A few things missing:

The Sve API needs marking experimental (I couldn't find the exact tag).

Test app is a placeholder. It should be replaced with templates. I didn't want to do that yet until we decide on the format and then autogenerate them.

When run on real SVE hardware, the test fails because the jit allocates Z16 as the mask (due to having no predicate register allocation). The emit functions treat this as P16, and then assert fail because the max predicate register is P15. Ideally, we should add some code to limit the mask to Z15, but I'm not sure where to do that.

@kunalspathak @tannergooding @dotnet/arm64-contrib

@ryujit-bot
Copy link

Diff results for #97695

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.01%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.01%
libraries.crossgen2.linux.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch -0.01%
MinOpts (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.01%
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch -0.01%
libraries.crossgen2.osx.arm64.checked.mch -0.01%
libraries.pmi.osx.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
coreclr_tests.run.windows.arm64.checked.mch -0.01%
MinOpts (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.01%
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
coreclr_tests.run.windows.arm64.checked.mch -0.01%
libraries.crossgen2.windows.arm64.checked.mch -0.01%
libraries.pmi.windows.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.01%

Details here


@a74nh
Copy link
Contributor Author

a74nh commented Jan 30, 2024

When run on real SVE hardware, the test fails because the jit allocates Z16 as the mask (due to having no predicate register allocation). The emit functions treat this as P16, and then assert fail because the max predicate register is P15. Ideally, we should add some code to limit the mask to Z15, but I'm not sure where to do that.

I think I can see where this is done in lsra. Will add something.....

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jan 30, 2024
@kunalspathak
Copy link
Member

The Sve API needs marking experimental (I couldn't find the exact tag).

[System.Runtime.Versioning.RequiresPreviewFeaturesAttribute("Sve is in preview.")]

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the API is approved, we will have a top level issue with check boxes for the APIs, similar to how we have for #93095 to track the progress. There, we will upload the autogenerated boilerplate code like:

  • hwintrinsiclistarm64sve.h
  • Sve.cs
  • Sve.PlatformNotSupported.cs
  • System.Runtime.Intrinsic.cs
  • test templates

{
internal Sve() { }

public static new bool IsSupported { get => IsSupported; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will have to make sure to return false for Mono

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know where that check would be added? not sure if that would be in the API or the part that checks if SVE is supported in the OS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fanyang-mono - do you know?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way of doing it is to add a new element to this array
https://github.com/dotnet/runtime/blob/52e1ad3779e57c35d2416cd10d8ad7d75b2c0c8b/src/mono/mono/mini/simd-intrinsics.c#L3896C26-L3896C50

It will be something like

"Sve", MONO_CPU_ARM64_SVE, unsupported, sizeof (unsupported)

Additionally, you need to define the enum MONO_CPU_ARM64_SVE here:

MONO_CPU_ARM64_DP = 1 << 6,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way of doing it is to add a new element to this array

aren't these the entries of things that are supported? so probably no SVE entry is needed in that array?

Copy link
Member

@fanyang-mono fanyang-mono Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you specify unsupported, IsSupported will return false. So it is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I can also see some examples of unsupported in supported_x86_intrinsics.

case INS_sve_ldnf1h:
case INS_sve_ldnf1w:
case INS_sve_ldnf1d:
return emitIns_R_R_R_I(ins, size, reg1, reg2, reg3, 0, opt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look right. The caller should make sure to call appropriate emitIns* method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but there are lots of places this is done elsewhere:

        case INS_adds:
        case INS_subs:
            emitIns_R_R_R_I(ins, attr, reg1, reg2, reg3, 0, opt);
            return;

Which means it can all use the existing table generation code. Plus, we get a handy shortcut for elsewhere where we don't need an immediate offset. This ideally needs some codegen test cases.

The alternative would be to use HW_Flag_SpecialCodeGen and then add a case in genHWIntrinsic(). That's more code and possibly slower in the long run? I suspect we'll get a lot of things added in genHWIntrinsic() by the end of SVE so it'd be nice to keep it short.

@ryujit-bot
Copy link

Diff results for #97695

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.01%
MinOpts (-0.01% to -0.00%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.01%
libraries.crossgen2.linux.arm64.checked.mch -0.01%
libraries.pmi.linux.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
coreclr_tests.run.osx.arm64.checked.mch -0.01%
MinOpts (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch -0.01%
libraries.pmi.osx.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
coreclr_tests.run.windows.arm64.checked.mch -0.01%
MinOpts (-0.01% to -0.00%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
coreclr_tests.run.windows.arm64.checked.mch -0.01%
libraries.crossgen2.windows.arm64.checked.mch -0.01%
libraries.pmi.windows.arm64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.01%

Details here


@a74nh
Copy link
Contributor Author

a74nh commented Mar 6, 2024

This has been replaced with #98218

@a74nh a74nh closed this Mar 6, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants