[Arm64] Implement FMA, FMS, MLA, MLS #31899

echesakovMSFT commented Feb 7, 2020

This implements the fused multiply-add and fused multiply-subtract intrinsics, as well as the multiply-add and multiply-subtract intrinsics for integer types.

Note that this PR does not implement the by-element form FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]; that will be done in a separate PR.

Part of #24794

case NI_AdvSimd_Arm64_FusedMultiplyAdd:
case NI_AdvSimd_Arm64_FusedMultiplySubtract:
case NI_AdvSimd_MultiplyAdd:
case NI_AdvSimd_MultiplySubtract:

tannergooding Feb 7, 2020

Member

It might be worth checking the x86 handling here. Since the operation takes three operands and computes a * b + c, special handling was added there so that a and b can still be treated as commutative (among other things).

x86 also has an optimization (in lowering, I believe) that converts things like FusedMultiplyAdd(a, b, -c) into FusedMultiplySubtract(a, b, c).
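To make the rewrite concrete, here is a minimal sketch of the idea on a toy expression tree. Everything in it (`Node`, `Kind`, `make`, `lower`, `eval`) is hypothetical and invented for illustration; it is not the JIT's lowering code and does not use its node types. It only shows that folding a negated addend into the opposite operation preserves the computed value, because negation is exact in floating point.

```cpp
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical toy expression tree; these are NOT the JIT's GenTree node types.
enum class Kind { Const, Neg, Fma, Fms };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    Kind kind;
    double value;     // only meaningful for Const
    NodePtr a, b, c;  // Neg uses only `a`; Fma/Fms use all three
};

NodePtr constant(double v) {
    return std::make_shared<Node>(Node{Kind::Const, v, nullptr, nullptr, nullptr});
}

NodePtr make(Kind kind, NodePtr a, NodePtr b = nullptr, NodePtr c = nullptr) {
    return std::make_shared<Node>(Node{kind, 0.0, a, b, c});
}

// Reference semantics: Fma is a * b + c, Fms is a * b - c, both with a single rounding.
double eval(const NodePtr& n) {
    switch (n->kind) {
        case Kind::Const: return n->value;
        case Kind::Neg:   return -eval(n->a);
        case Kind::Fma:   return std::fma(eval(n->a), eval(n->b), eval(n->c));
        case Kind::Fms:   return std::fma(eval(n->a), eval(n->b), -eval(n->c));
    }
    return 0.0;  // unreachable; keeps compilers quiet
}

// Fma(a, b, Neg(c)) becomes Fms(a, b, c); Fms(a, b, Neg(c)) becomes Fma(a, b, c).
// Any other node is returned unchanged.
NodePtr lower(const NodePtr& n) {
    const bool is_fused = (n->kind == Kind::Fma || n->kind == Kind::Fms);
    if (is_fused && n->c && n->c->kind == Kind::Neg) {
        const Kind flipped = (n->kind == Kind::Fma) ? Kind::Fms : Kind::Fma;
        return make(flipped, n->a, n->b, n->c->a);
    }
    return n;
}

// Usage: the rewritten tree has one node fewer and evaluates to the same value.
void demo() {
    NodePtr before = make(Kind::Fma, constant(1.5), constant(2.0),
                          make(Kind::Neg, constant(0.25)));
    NodePtr after = lower(before);
    assert(after->kind == Kind::Fms);
    assert(eval(before) == eval(after));
}
```

The benefit in a real compiler would be one fewer instruction (the explicit negate), assuming the target has a matching subtract form.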

echesakovMSFT Feb 7, 2020

Author Member

@tannergooding

> It might be worth checking the x86 handling here. Given it is three operands and doing a * b + c there was special handling added such that a and b could still be considered commutative (etc).

Yes, there is special handling on x86 but, as far as I understand, it takes into account whether any of the a, b or c operands is a memory operand and whether the upper bits of the operands are preserved.

On arm64 there is only one form of each instruction (e.g. FMLA <Vc>, <Va>, <Vb>), so I am not sure what else we can do here.

> x86 also has a (I believe in lowering) optimization for converting things like FusedMultiplyAdd(a, b, -c) to a FusedMultiplySubtract(a, b, c)

Yes, @EgorBo added this optimization in dotnet/coreclr#27060. I was planning to work on these later, but not as part of this PR.

tannergooding Feb 7, 2020

Member

Some of the x86 handling is because you can have op1 * op3 + op2 or op2 * op1 + op3 or op2 * op3 + op1. That handling is x86 specific.

There is also handling for allowing a * b + c or b * a + c since the multiplication is commutative. That handling also applies to ARM.
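As a rough illustration of the three x86 operand orders listed above, the sketch below models each as a plain function whose return value stands for what gets written back to the first operand (the destination). The function names follow what I understand to be the 132/213/231 encodings; treat that naming as an assumption, since the comment above does not name them. It shows that a * b + c can be computed with any of a, b or c sitting in the destination register.

```cpp
#include <cassert>
#include <cmath>

// Each function returns the value written back to op1, which is the destination.
// Names are assumed to match the x86 132/213/231 forms.
double vfmadd132(double op1, double op2, double op3) { return std::fma(op1, op3, op2); }  // op1 * op3 + op2
double vfmadd213(double op1, double op2, double op3) { return std::fma(op2, op1, op3); }  // op2 * op1 + op3
double vfmadd231(double op1, double op2, double op3) { return std::fma(op2, op3, op1); }  // op2 * op3 + op1

// Usage: compute a * b + c while letting a different input be overwritten each time.
void demo() {
    const double a = 1.5, b = 2.0, c = 0.25;
    const double expected = std::fma(a, b, c);

    assert(vfmadd132(a, c, b) == expected);  // destination initially holds a
    assert(vfmadd213(a, b, c) == expected);  // destination initially holds a
    assert(vfmadd213(b, a, c) == expected);  // destination holds b (a and b swapped)
    assert(vfmadd231(c, a, b) == expected);  // destination initially holds c
}
```

The `vfmadd213(b, a, c)` line is the commutativity case: swapping the two multiplicands changes which register is overwritten without changing the result.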

> I was planning to work on these later but not as part of this PR.

Sounds good to me, as long as it is being tracked.

tannergooding Feb 7, 2020

Member

Ah, I think I see.

Since it takes 4 registers (a destination and three inputs) there is no need for it to be commutative. I had misread and thought it was 3 inputs where one was also the destination.

tannergooding Feb 7, 2020

Member

The vector version (FMLA) looks interesting, however, since it takes only 3 registers and the accumulator is also the destination.

echesakovMSFT Feb 7, 2020

Author Member

What instructions are we talking about here?

  • FMADD, FNMADD, FMSUB, FNMSUB take four registers: a destination and three source registers. These instructions are emitted for FusedMultiplyAddScalar, FusedMultiplyAddNegatedScalar, FusedMultiplySubtractScalar and FusedMultiplySubtractNegatedScalar. LSRA does not have any special logic for those.

  • On the other hand, FMLA, FMLS, MLA, MLS take only three: a register that is both the destination and the incoming accumulator, and two multiplier registers. Those are used for the vector forms of FusedMultiplyAdd, FusedMultiplySubtract, MultiplyAdd and MultiplySubtract. The logic in lsraarm64.cpp applies only to these instructions/intrinsics. My question is: how can we use commutativity of operands in this particular case?
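The reason the three-register forms need care in register allocation can be sketched with a toy register file. This is a simplified model under stated assumptions, not the code in lsraarm64.cpp: `RegFile`, `fmla` and `emit_fma` are hypothetical names, registers hold a single double rather than vector lanes, and the code generator is assumed to copy the accumulator into the target register before issuing the instruction. Under those assumptions, if the target register were allowed to coincide with one of the multiplier registers, the copy would clobber that multiplier.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy register file: four scalar "registers" (real FMLA operates on vector lanes).
using RegFile = std::array<double, 4>;

// Models FMLA d, n, m: the destination is also the accumulator, d = d + n * m.
void fmla(RegFile& r, int d, int n, int m) {
    r[d] = std::fma(r[n], r[m], r[d]);
}

// Hypothetical code generation for target = acc + n * m:
//   mov  target, acc   (only if they differ)
//   fmla target, n, m
// The register file is taken by value so each call starts from the same state.
double emit_fma(RegFile r, int target, int acc, int n, int m) {
    if (target != acc) {
        r[target] = r[acc];
    }
    fmla(r, target, n, m);
    return r[target];
}

void demo() {
    // r0 = accumulator (10), r1 = left (2), r2 = right (3), r3 = free.
    const RegFile regs = {10.0, 2.0, 3.0, 0.0};

    // Target is a free register: 10 + 2 * 3 = 16, as expected.
    assert(emit_fma(regs, 3, 0, 1, 2) == 16.0);

    // Target reuses the left multiplier's register: the mov overwrites r1 with 10
    // before the multiply, so the result is 10 + 10 * 3 = 40, which is wrong.
    assert(emit_fma(regs, 1, 0, 1, 2) == 40.0);
}
```

Keeping the multiplier registers distinct from the target until after the instruction is, as I read it, what marking those uses as delay-free achieves. Commutativity of the two multipliers does not help here, since neither of them is the operand that shares a register with the destination.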

tannergooding Feb 7, 2020

Member

I had been looking at FMLA initially as it only takes three and I missed that FMADD took four.
I had also thought that the src/dest operand was one of the multipliers and so commutativity mattered.

@@ -210,6 +210,18 @@ public abstract partial class AdvSimd : System.Runtime.Intrinsics.Arm.ArmBase
public static System.Runtime.Intrinsics.Vector64<uint> CompareTest(System.Runtime.Intrinsics.Vector64<uint> left, System.Runtime.Intrinsics.Vector64<uint> right) { throw null; }
public static System.Runtime.Intrinsics.Vector64<double> DivideScalar(System.Runtime.Intrinsics.Vector64<double> left, System.Runtime.Intrinsics.Vector64<double> right) { throw null; }
public static System.Runtime.Intrinsics.Vector64<float> DivideScalar(System.Runtime.Intrinsics.Vector64<float> left, System.Runtime.Intrinsics.Vector64<float> right) { throw null; }
public static System.Runtime.Intrinsics.Vector128<float> FusedMultiplyAdd(System.Runtime.Intrinsics.Vector128<float> acc, System.Runtime.Intrinsics.Vector128<float> left, System.Runtime.Intrinsics.Vector128<float> right) { throw null; }

tannergooding Feb 7, 2020

Member

We may want to discuss the ordering in API review.

The scalar versions are FMADD <Sd>, <Sn>, <Sm>, <Sa>, where it does Sd = Sa + Sn * Sm (registers ordered dest, left, right, acc)

The vector versions are FMLA <Vd>, <Vn>, <Vm> where it does Vd = Vd + Vn * Vm (registers ordered dest/acc, left, right as you have here)

x86 intrinsics and the public Math.FusedMultiplyAdd methods are all dest, left, right, acc
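The practical consequence of the two orderings can be seen in a small sketch. The helpers below are hypothetical stand-ins: `fma_acc_first` mimics the accumulator-first parameter order of the signature in this diff (and of FMLA), while `fma_acc_last` mimics the left, right, acc order described for Math.FusedMultiplyAdd. Neither is the real API. The point is only that the same three values produce different results if a caller carries the argument order over from one convention to the other.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Accumulator-first order, lane by lane: out[i] = acc[i] + left[i] * right[i].
Vec4 fma_acc_first(const Vec4& acc, const Vec4& left, const Vec4& right) {
    Vec4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = std::fma(left[i], right[i], acc[i]);
    }
    return out;
}

// Accumulator-last order: left * right + acc.
float fma_acc_last(float left, float right, float acc) {
    return std::fma(left, right, acc);
}

void demo() {
    const Vec4 two   = {2.0f, 2.0f, 2.0f, 2.0f};
    const Vec4 three = {3.0f, 3.0f, 3.0f, 3.0f};
    const Vec4 four  = {4.0f, 4.0f, 4.0f, 4.0f};

    // Intended computation: 2 * 3 + 4 = 10.
    assert(fma_acc_last(2.0f, 3.0f, 4.0f) == 10.0f);
    assert(fma_acc_first(four, two, three)[0] == 10.0f);

    // Passing the arguments in accumulator-last order to the accumulator-first
    // helper silently computes 2 + 3 * 4 = 14 instead.
    assert(fma_acc_first(two, three, four)[0] == 14.0f);
}
```

Because all three parameters have the same type, the compiler cannot catch the mix-up, which is one argument for settling the order in API review.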

CarolEidt left a comment

LGTM

…t-Multiply-Add-Subtract
@echesakovMSFT echesakovMSFT merged commit 7b58790 into dotnet:master Feb 11, 2020
95 checks passed
@echesakovMSFT echesakovMSFT deleted the echesakovMSFT:Arm64-Fused-Or-Not-Multiply-Add-Subtract branch Feb 11, 2020
alistairjevans added a commit to alistairjevans/runtime that referenced this pull request Feb 11, 2020
* Add "FusedMultiplyAdd" and "FusedMultiplyAddScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Add "FusedMultiplySubtract" and "FusedMultiplySubtractScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Add "FusedMultiplyAddNegatedScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Add "FusedMultiplySubtractNegatedScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Add "MultiplyAdd" and "MultiplyAddScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Add "MultiplySubtract" and "MultiplySubtractScalar" in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

* Update System.Runtime.Intrinsics.Experimental.cs

* Add fused multiply-add and multiply-subtract intrinsics in hwintrinsiclistarm64.h

* Add "MultiplyAdd" and "MultiplySubtract" in hwintrinsiclistarm64.h

* Implement fused multiply-add and multiply-subtract intrinsics in hwintrinsiccodegenarm64.cpp

* Add fused multiply-add and multiply-subtract intrinsics in GenerateTests.csx

* Mark uses of op2 and op3 of fma, fms, mla, mls intrinsics as delay free in lsraarm64.cpp

* Add "MultiplyAdd" in GenerateTests.csx

* Add "MultiplySubtract" in GenerateTests.csx

* Update AdvSimd/ AdvSimd.Arm64/

* Update Helpers.cs Helpers.tt
safern added a commit that referenced this pull request Feb 11, 2020
This reverts commit 7b58790.