JIT ARM64-SVE: Add simple bitwise ops #101762

Merged
merged 5 commits into from May 3, 2024

Contributor

@a74nh a74nh commented May 1, 2024

And, AndAcross, Or, OrAcross, Xor, XorAcross
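As a rough illustration of what the two families of operations compute, here is a scalar C++ reference model. It is only a sketch under stated assumptions: `and_lanes` and `and_across` are hypothetical names, a `std::vector<uint8_t>` stands in for an SVE vector, and predication is ignored. `Or`/`Xor` and their `Across` forms would follow the same shape with `|` and `^`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference model only; real SVE vectors have a hardware-defined length.
// Lane-wise AND: result[i] = a[i] & b[i].
std::vector<uint8_t> and_lanes(const std::vector<uint8_t>& a,
                               const std::vector<uint8_t>& b)
{
    assert(a.size() == b.size());
    std::vector<uint8_t> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        result[i] = static_cast<uint8_t>(a[i] & b[i]);
    }
    return result;
}

// Across-vector AND: fold every lane into a single scalar.
// The identity element for AND is all ones, so an empty input yields 0xFF.
uint8_t and_across(const std::vector<uint8_t>& v)
{
    uint8_t acc = 0xFF;
    for (uint8_t lane : v)
    {
        acc = static_cast<uint8_t>(acc & lane);
    }
    return acc;
}
```

The model returns a plain scalar from `and_across`; how the real `AndAcross` API surfaces its result is not modelled here.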

Test results:

❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_And
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_And
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Or
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Or
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------


❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Xor
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Xor
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

And,AndAcross,Or,OrAcross,Xor,XorAcross

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 1, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@@ -4802,11 +4802,11 @@ void CodeGen::genArm64EmitterUnitTestsSve()
INS_OPTS_SCALABLE_D); // CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

// IF_SVE_CN_3A
theEmitter->emitIns_R_R_R(INS_sve_clasta, EA_2BYTE, REG_V12, REG_P1, REG_V15, INS_OPTS_SCALABLE_H,
Contributor Author

Same changes as for AddAcross in the previous PR: the size arg is not used, as the sizes are dependent on opts.

@@ -2919,7 +2919,10 @@ void emitter::emitInsSve_R_R_R(instruction ins,

if (sopt == INS_SCALABLE_OPTS_UNPREDICATED)
{
assert(opt == INS_OPTS_SCALABLE_D);
// The instruction only has a .D variant. However, this doesn't matter as
Contributor Author

Doing this avoids having to add special cases in hwintrinsiccodegen.
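A minimal sketch of why a `.D`-only encoding is plausibly sufficient for unpredicated bitwise ops: a bitwise AND touches each bit independently, so the chosen element size cannot change the result. The helper names below are hypothetical, and plain byte buffers stand in for vector registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// AND two buffers one byte at a time (models a .B arrangement).
std::vector<uint8_t> and_as_bytes(const std::vector<uint8_t>& a,
                                  const std::vector<uint8_t>& b)
{
    assert(a.size() == b.size());
    std::vector<uint8_t> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        result[i] = static_cast<uint8_t>(a[i] & b[i]);
    }
    return result;
}

// AND the same buffers 64 bits at a time (models a .D arrangement).
// Assumes the buffer length is a multiple of 8 bytes.
std::vector<uint8_t> and_as_doublewords(const std::vector<uint8_t>& a,
                                        const std::vector<uint8_t>& b)
{
    assert(a.size() == b.size() && a.size() % 8 == 0);
    std::vector<uint8_t> result(a.size());
    for (std::size_t i = 0; i < a.size(); i += 8)
    {
        uint64_t x = 0;
        uint64_t y = 0;
        std::memcpy(&x, a.data() + i, sizeof(x));
        std::memcpy(&y, b.data() + i, sizeof(y));
        const uint64_t r = x & y;
        std::memcpy(result.data() + i, &r, sizeof(r));
    }
    return result;
}
```

Both functions produce identical bytes for any input, which is the property the `.D`-only path relies on in this sketch. The same argument does not carry over to predicated forms, where the element size decides which lanes a predicate bit governs.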

@a74nh a74nh marked this pull request as ready for review May 1, 2024 12:49
Contributor Author

a74nh commented May 1, 2024

@dotnet/arm64-contrib @kunalspathak

Member

@kunalspathak kunalspathak left a comment

LGTM. Some nit comments

/// MOVPRFX Zresult, Zop1; AND Zresult.B, Pg/M, Zresult.B, Zop2.B
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
/// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B
Member

Suggested change
/// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B

Why do we have 2 entries of the predicated version? Here and elsewhere.

Contributor Author

Line 250 is saying a = AND(a, b), whereas line 251 is showing b = AND(b, a)

It's a little awkward.

Member

I don't think we need to list every possible variant here, nor the mov instructions required to handle RMW cases. We definitely don't do that for any other intrinsics across Arm64, x64, or WASM.

The main intent is really just to give a brief overview of the C/C++ intrinsic and the primary hardware instruction emitted, so that users can map things more easily and know the primary place to look up the instruction (kind of like a see-also).

Ideally we'd be able to basically quote the Arm64 architecture manual and give a better description (with the notes we currently have as actual see-also), but said manuals come with an explicit copyright/proprietary notice and so cannot be reproduced without express written permission (which means getting legal of both companies involved and getting the relevant agreement put together). So this is the next best thing.

Member

Which is to say, I think we can just do what we do for other ISAs and simplify it down to a few lines:

/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
///   AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
///   AND <Zd>.D, <Zn>.D, <Zm>.D
///   AND <Zdn>.<T>, <Zdn>.<T>, #<const>
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
///  AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

Which covers the 4x C/C++ intrinsics that map to this API and the 4x instruction entries that map, without getting into the implementation details of exactly how operands map to registers, how boilerplate instructions to handle RMW considerations are emitted (like mov, movprfx, etc), and without getting into how predication maps to the instructions (which is something to handle in a general conceptual doc that intrinsics can link to, not repeated per doc page).
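For readers unfamiliar with the `_m`, `_x` and `_z` suffixes listed above, here is a scalar C++ sketch of the three predication modes as the ACLE describes them: merging, "don't care", and zeroing. The function names, the `Vec`/`Pred` aliases, and the use of `std::vector` are assumptions made for illustration; none of this is JIT or runtime code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec  = std::vector<uint8_t>; // stand-in for svuint8_t
using Pred = std::vector<bool>;    // stand-in for svbool_t

// _m (merging): inactive lanes take their value from op1.
Vec and_m(const Pred& pg, const Vec& op1, const Vec& op2)
{
    assert(pg.size() == op1.size() && op1.size() == op2.size());
    Vec result(op1);
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (pg[i])
        {
            result[i] = static_cast<uint8_t>(op1[i] & op2[i]);
        }
    }
    return result;
}

// _z (zeroing): inactive lanes are set to zero.
Vec and_z(const Pred& pg, const Vec& op1, const Vec& op2)
{
    assert(pg.size() == op1.size() && op1.size() == op2.size());
    Vec result(op1.size()); // value-initialised, so every lane starts at 0
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (pg[i])
        {
            result[i] = static_cast<uint8_t>(op1[i] & op2[i]);
        }
    }
    return result;
}

// _x ("don't care"): inactive lanes are unspecified, so callers must not
// rely on them. This model picks merging as one legal choice.
Vec and_x(const Pred& pg, const Vec& op1, const Vec& op2)
{
    return and_m(pg, op1, op2);
}
```

Only the active lanes are comparable across all three modes; the inactive lanes are exactly where the variants differ.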

Member

I agree. I was allowing movprfx because that is something special in SVE land, but again I think it is an implementation RMW detail which we do not need in the summary docs.

Contributor Author

Annoyingly, I don't think that can be scripted. But I agree with the approach; we'll have to simplify manually.

Contributor Author

Ok, fixed, but it's in the same style as existing:

        /// <summary>
        /// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
        /// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
        /// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
        ///   AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
        ///   AND Zresult.D, Zop1.D, Zop2.D
        /// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
        ///   AND Presult.B, Pg/Z, Pop1.B, Pop2.B
        /// </summary>

It's easier to update from the existing autogenerated comments, and the "tied" naming is useful information.

Member

For the autogenerated ones, you can skip outputting those that have movprfx. Can you refresh my memory of what "tied" means?

Contributor Author

"For the autogenerated ones, you can skip outputting those that have movprfx."

I'll do that

"Can you refresh my memory of what is tied?"

Both args marked as tied are the same register, so that register has read-write (RW) semantics.
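To make the tied-register point concrete, here is a small C++ model of a destructive predicated AND, where the first source and the destination are the same register. It is a sketch only: `Reg`, `Mask`, `kLanes` and `and_tied` are hypothetical names, and a fixed four-lane array stands in for a scalable vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // hypothetical fixed lane count
using Reg  = std::array<uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Models "AND Zdn, Pg/M, Zdn, Zm": zdn is read as the first source and then
// overwritten with the result. Inactive lanes keep the old zdn value.
// The register is passed by value and the updated contents are returned.
Reg and_tied(Reg zdn, const Mask& pg, const Reg& zm)
{
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        if (pg[i])
        {
            zdn[i] = static_cast<uint8_t>(zdn[i] & zm[i]);
        }
    }
    return zdn;
}
```

Because AND is commutative, `and_tied(a, pg, b)` and `and_tied(b, pg, a)` agree on every active lane and differ only in which register's old contents survive in the inactive lanes. That matches the two `Ztied1`/`Ztied2` entries discussed earlier in the thread for the `_x` form, where the inactive lanes are unspecified anyway.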

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak kunalspathak merged commit f01a146 into dotnet:main May 3, 2024
161 of 168 checks passed
@a74nh a74nh deleted the simple_bitwise_github branch May 3, 2024 15:55
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
* JIT ARM64-SVE: Add simple bitwise ops

And,AndAcross,Or,OrAcross,Xor,XorAcross

* Fix fadda

* Fix unpkh/fexpa/frecpe

* Reorder System.Runtime.Intrinsics.cs

* Fix API head comments
Labels
area-System.Runtime.Intrinsics arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation