Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arm64] Implement ASIMD widening, narrowing, saturating intrinsics #35612

Conversation

echesakov
Copy link
Contributor

@echesakov echesakov commented Apr 29, 2020

Implement API as approved in #32512 (with corrections regarding their ISA class - see my comment here)

Note that I am using the following temporary names as per comment:

  • AddReturningHighNarrowUpper and AddReturningHighNarrowLower
  • AddReturningRoundedHighNarrowUpper and AddReturningRoundedHighNarrowLower
  • SubtractReturningHighNarrowUpper and SubtractReturningHighNarrowLower.

I will update the methods names after the next round of API review.

The change is quite straightforward - the special handling is required only for AddWideningLower, AddWideningUpper, SubractWideningLower and SubtractWideningUpper since each of those are mapped to four different instructions (saddl{2}, saddw{2}, uaddl{2} and uaddw{2}; ssubl{2}, ssubw{2}, usubl{2} and usubw{2}) and do not fit into our table-driven model.

The change depends on removing HW_Flag_UnfixedSIMDSize in #35594 - I didn't want to keep adding the flag for the added intrinsics for no particular reason.

Fixes #32512

@echesakov echesakov added arch-arm64 area-System.Runtime.Intrinsics area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 29, 2020
@ghost
Copy link

ghost commented Apr 29, 2020

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@echesakov
Copy link
Contributor Author

@CarolEidt @tannergooding PTAL

@echesakov
Copy link
Contributor Author

@TamarChristinaArm PTAL

@echesakov echesakov force-pushed the Arm64-ASIMD-Widening-Narrowing-Saturating-Intrinsics branch from f5004a2 to 97e7d77 Compare April 30, 2020 02:46
@echesakov
Copy link
Contributor Author

Rebased on top of 1f436e0

if ((intrinsic == NI_AdvSimd_AddWideningUpper) || (intrinsic == NI_AdvSimd_SubtractWideningUpper))
{
assert(varTypeIsSIMD(op1->TypeGet()));
retNode->AsHWIntrinsic()->SetOtherBaseType(getBaseTypeOfSIMDType(argClass));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do widening and narrowing need an "other" base type? The return type is always twice the size and of the same signedness of the inputs for these operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do widening and narrowing need an "other" base type?

Narrowing intrinsics do not need - only AddWideningUpper and SubractWideningUpper do.

They don't need the operand 1 type itself (which is always going to be TYP_SIMD16) but the operand 1 element type so I can distinguish between each pair of these three

Vector128<int> AddWideningUpper(Vector128<short> left, Vector128<short> right);
Vector128<short> AddWideningUpper(Vector128<short> left, Vector128<sbyte> right);
Vector128<int> AddWideningUpper(Vector128<int> left, Vector128<short> right);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just use the second argument for the base type? That is what HW_Flag_BaseTypeFromSecondArg is for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nevermind, I missed that int = short + short and int = int + short need to be differentiated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I renamed the gtIndexBaseType to gtOtherBaseType when I restructured the IR, because I figured there would be cases aside from gather where we'd need an additional base type - I didn't think it would be used that quickly :-)

assert(varTypeIsIntegral(intrin.baseType));
if (intrin.op1->TypeGet() == TYP_SIMD8)
{
ins = varTypeIsUnsigned(intrin.baseType) ? INS_uaddl : INS_saddl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to get rid of this logic once we change the tables to take simdSize into account during lookup, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can if we make simdSize and baseType of an intrinsic "independent" on each other - i.e. if we in addition to HW_Flag_BaseTypeFromFirstArg, HW_Flag_BaseTypeFromSecondArg we have HW_Flag_SimdSizeFromFirstArg, HW_Flag_SimdSizeFromSecondArg which would allow, for example, to set simdSize based on op1 and baseType from op2.

Then, the corresponding rows in hwintrinsiclistarm64.h could be updated as

HARDWARE_INTRINSIC(AdvSimd, AddWideningLower,  8, 2, {INS_saddl, INS_uaddl,    … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
HARDWARE_INTRINSIC(AdvSimd, AddWideningLower, 16, 2, {INS_saddw, INS_uaddw,    … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)

HARDWARE_INTRINSIC(AdvSimd, AddWideningUpper,  8, 2, {INS_saddl2, INS_uaddl2,   … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
HARDWARE_INTRINSIC(AdvSimd, AddWideningUpper, 16, 2, {INS_saddw2, INS_uaddw2,   … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)

In other words, if we allow x and y coordinates (where x is baseType and y is a tuple of (intrinsicId, simdSize)) of hwintrinsiclistarm64.h table to be orthogonal then the answer to your question is yes.

…dedHighNarrowUpper in hwintrinsiclistarm64.h
…nd SubtractWideningUpper in hwintrinsiccodegenarm64.cpp
…per and SubtractWideningUpper in hwintrinsic.cpp
@echesakov echesakov force-pushed the Arm64-ASIMD-Widening-Narrowing-Saturating-Intrinsics branch from 97e7d77 to 417c253 Compare April 30, 2020 17:51
@echesakov
Copy link
Contributor Author

Fixed merge conflicts and rebased on top of aa81328

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some non-blocking comments & questions.
I only lightly reviewed the test helper functions and related changes.

if ((intrinsic == NI_AdvSimd_AddWideningUpper) || (intrinsic == NI_AdvSimd_SubtractWideningUpper))
{
assert(varTypeIsSIMD(op1->TypeGet()));
retNode->AsHWIntrinsic()->SetOtherBaseType(getBaseTypeOfSIMDType(argClass));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I renamed the gtIndexBaseType to gtOtherBaseType when I restructured the IR, because I figured there would be cases aside from gather where we'd need an additional base type - I didn't think it would be used that quickly :-)

@@ -205,7 +205,11 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
emitAttr emitSize;
insOpts opt = INS_OPTS_NONE;

if ((intrin.category == HW_Category_SIMDScalar) || (intrin.category == HW_Category_Scalar))
if (intrin.category == HW_Category_SIMDScalar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here highlight the fact that there are many subtleties with regard to emit sizes - the "actual type" of the baseType, the "raw" (emitTypeSize) of the baseType, and the emitSize of the node itself. I think it would be worth some additional comments both here and in the cases where we use emitTypeSize(node) for moves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I will follow up and add the comments

case NI_AdvSimd_AddWideningUpper:
case NI_AdvSimd_SubtractWideningLower:
case NI_AdvSimd_SubtractWideningUpper:
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is identical to case NI_Crc32_ComputeCrc32: et al. Is there a reason you didn't combine it? Is the idea to keep them in order, and rely on the C++ compiler to de-duplicate the code? (I see there's some duplication already).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just missed this. Thanks for spotting this.

I will follow up and de-duplicate this in a separate PR and I also need to address one of your another suggestions here

@echesakov
Copy link
Contributor Author

Both runtime (Mono Product Build Android x86 debug) and runtime (Mono Product Build OSX x64 debug) has succeeded on the second attempt - https://dev.azure.com/dnceng/public/_build/results?buildId=625278 - merging

@echesakov echesakov merged commit a156293 into dotnet:master May 1, 2020
@echesakov echesakov deleted the Arm64-ASIMD-Widening-Narrowing-Saturating-Intrinsics branch May 1, 2020 02:47
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI area-System.Runtime.Intrinsics new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARM additional arithmetic intrinsics
4 participants