AMDGPU: Reduce cost of f64 copysign #141944

arsenm · 2025-05-29T13:37:04Z

The real implementation is 1 real instruction plus a constant
materialization. Call that a cost of 1; it's not a real f64 operation.
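Here is a minimal bit-level sketch of why this is cheap. It assumes f64 copysign is selected the way the description implies. The double is handled as two 32-bit halves, and the low half of the magnitude is forwarded untouched. Only the high half is merged, using a bitfield insert with V_BFI_B32-style semantics under the mask 0x7fffffff; that mask is the constant that has to be materialized. The helper names `bfi32`, `copysignF64Bits` and `copysignViaBfi` are hypothetical and are not LLVM APIs.

```cpp
#include <cstdint>
#include <cstring>

// Bitfield insert with V_BFI_B32-style semantics (assumption):
// take bits of A where Mask is 1, and bits of B where Mask is 0.
constexpr std::uint32_t bfi32(std::uint32_t Mask, std::uint32_t A,
                              std::uint32_t B) {
  return (A & Mask) | (B & ~Mask);
}

// Hypothetical helper: copysign on raw f64 bit patterns using only 32-bit ops.
// The low dword of Mag is passed through; only the high dword is rewritten.
constexpr std::uint64_t copysignF64Bits(std::uint64_t Mag, std::uint64_t Sign) {
  // 0x7fffffff keeps the exponent and high mantissa bits of Mag and takes
  // bit 31 (the sign bit) from Sign. This is the materialized constant.
  return (static_cast<std::uint64_t>(
              bfi32(0x7fffffffu, static_cast<std::uint32_t>(Mag >> 32),
                    static_cast<std::uint32_t>(Sign >> 32)))
          << 32) |
         (Mag & 0xffffffffu);
}

// Hypothetical convenience wrapper on doubles; std::memcpy is used for
// well-defined type punning between double and its 64-bit pattern.
inline double copysignViaBfi(double Mag, double Sign) {
  std::uint64_t M = 0, S = 0;
  std::memcpy(&M, &Mag, sizeof M);
  std::memcpy(&S, &Sign, sizeof S);
  const std::uint64_t R = copysignF64Bits(M, S);
  double Out = 0.0;
  std::memcpy(&Out, &R, sizeof Out);
  return Out;
}
```

For example, `copysignViaBfi(1.0, -2.0)` yields -1.0, the same as `std::copysign(1.0, -2.0)`. The per-element work is a single 32-bit operation. That is the basis for pricing it like a full-rate instruction instead of a 64-bit one.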

arsenm · 2025-05-29T13:37:22Z

This stack of pull requests is managed by Graphite.

llvmbot · 2025-05-29T13:37:58Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

The real implementation is 1 real instruction plus a constant
materialization. Call that a cost of 1; it's not a real f64 operation.

Full diff: https://github.com/llvm/llvm-project/pull/141944.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+7-5)
(modified) llvm/test/Analysis/CostModel/AMDGPU/copysign.ll (+16-16)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 0dbaf7c548f89..c1ccc8f6798a6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -718,9 +718,6 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 
   MVT::SimpleValueType SLT = LT.second.getScalarType().SimpleTy;
 
-  if (SLT == MVT::f64)
-    return LT.first * NElts * get64BitInstrCost(CostKind);
-
   if ((ST->hasVOP3PInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
       (ST->hasPackedFP32Ops() && SLT == MVT::f32))
     NElts = (NElts + 1) / 2;
@@ -731,6 +728,11 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   switch (ICA.getID()) {
   case Intrinsic::fma:
   case Intrinsic::fmuladd:
+    if (SLT == MVT::f64) {
+      InstRate = get64BitInstrCost(CostKind);
+      break;
+    }
+
     if ((SLT == MVT::f32 && ST->hasFastFMAF32()) || SLT == MVT::f16)
       InstRate = getFullRateInstrCost();
     else {
@@ -741,8 +743,8 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   case Intrinsic::copysign:
     return NElts * getFullRateInstrCost();
   case Intrinsic::canonicalize: {
-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
     break;
   }
   case Intrinsic::uadd_sat:
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
index 334bb341a3c3e..5b042a8a04603 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
@@ -245,25 +245,25 @@ define void @copysign_bf16() {
 
 define void @copysign_f64() {
 ; ALL-LABEL: 'copysign_f64'
-; ALL-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 96 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 96 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 256 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 320 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
 ; ALL-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: ret void
 ;
 ; ALL-SIZE-LABEL: 'copysign_f64'
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
-; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
+; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
 ; ALL-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %f64 = call double @llvm.copysign.f64(double undef, double undef)
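A simplified model of the two code paths can reproduce the scalar and small-vector rows of copysign.ll. This is a sketch, not the TTI code. The names are hypothetical. The per-element 64-bit rates (4 for the default cost kind, 2 for code size) are read off the removed CHECK lines. The model also assumes `LT.first == 1` and that `NElts` equals the element count. That holds for f64 through v4f64, but not for the wider types, whose legalization changes both factors.

```cpp
// Hypothetical model of the f64 llvm.copysign paths in
// GCNTTIImpl::getIntrinsicInstrCost before and after this patch.
enum class Kind { Throughput, CodeSize };

// Per-element 64-bit rate implied by the removed CHECK lines (assumption).
constexpr unsigned old64BitRate(Kind K) { return K == Kind::CodeSize ? 2 : 4; }

// Stand-in for getFullRateInstrCost(); 1 per the new CHECK lines.
constexpr unsigned kFullRate = 1;

// Before: every f64 intrinsic returned early with
// LT.first * NElts * get64BitInstrCost(CostKind).
constexpr unsigned oldCopysignCost(unsigned LTFirst, unsigned NElts, Kind K) {
  return LTFirst * NElts * old64BitRate(K);
}

// After: f64 reaches the switch, where copysign returns
// NElts * getFullRateInstrCost() regardless of cost kind or LT.first.
constexpr unsigned newCopysignCost(unsigned NElts) { return NElts * kFullRate; }
```

With these definitions, `oldCopysignCost(1, 2, Kind::Throughput)` is 8 and `newCopysignCost(2)` is 2, matching the v2f64 rows. The new path does not depend on the cost kind, which is why the ALL and ALL-SIZE checks now report the same numbers.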

Pierre-vh · 2025-06-02T07:41:58Z

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
    break;
  }
  case Intrinsic::uadd_sat:


Are those cases below fine with handling f64 now?

arsenm · 2025-06-02

They are only integer intrinsics.

arsenm · 2025-06-17T22:54:38Z

Merge activity

Jun 17, 10:54 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Jun 17, 11:07 PM UTC: Graphite rebased this pull request as part of a merge.
Jun 17, 11:10 PM UTC: @arsenm merged this pull request with Graphite.

This was referenced May 29, 2025

AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 #141903

Merged

AMDGPU: Add cost model tests for minimumnum/maximumnum #141904

Merged

AMDGPU: Fix cost model for 16-bit operations on gfx8 #141943

Merged

AMDGPU: Move fpenvIEEEMode into TTI #141945

Merged

AMDGPU: Cost model for minimumnum/maximumnum #141946

Merged

AMDGPU: Add baseline cost model tests for special argument intrinsics #141947

Merged

AMDGPU: Report special input intrinsics as free #141948

Merged

arsenm added backend:AMDGPU llvm:analysis labels May 29, 2025 — with Graphite App

arsenm requested review from jayfoad, mbrkusanin, piotrAMD and Sisyph May 29, 2025 13:37

arsenm marked this pull request as ready for review May 29, 2025 13:37

Pierre-vh reviewed Jun 2, 2025

Pierre-vh approved these changes Jun 4, 2025

arsenm force-pushed the users/arsenm/amdgpu/fix-cost-of-16-bit-ops branch from 017304b to d990f79 on June 17, 2025 13:21

arsenm force-pushed the users/arsenm/amdgpu/reduce-cost-f64-copysign branch from 19ab42a to 598db89 on June 17, 2025 13:21

arsenm force-pushed the users/arsenm/amdgpu/fix-cost-of-16-bit-ops branch from d990f79 to e4fcaab on June 17, 2025 15:30

arsenm force-pushed the users/arsenm/amdgpu/reduce-cost-f64-copysign branch from 598db89 to 0ddc81d on June 17, 2025 15:31

arsenm force-pushed the users/arsenm/amdgpu/fix-cost-of-16-bit-ops branch from e4fcaab to 7fbe4e2 on June 17, 2025 15:33

arsenm force-pushed the users/arsenm/amdgpu/reduce-cost-f64-copysign branch from 0ddc81d to 641ab37 on June 17, 2025 15:33

arsenm force-pushed the users/arsenm/amdgpu/fix-cost-of-16-bit-ops branch 3 times, most recently from 5fdd877 to 90d3969 on June 17, 2025 23:05

Base automatically changed from users/arsenm/amdgpu/fix-cost-of-16-bit-ops to main June 17, 2025 23:07

AMDGPU: Reduce cost of f64 copysign

9f4bc8d

The real implementation is 1 real instruction plus a constant materialization. Call that a cost of 1; it's not a real f64 operation.

arsenm force-pushed the users/arsenm/amdgpu/reduce-cost-f64-copysign branch from 641ab37 to 9f4bc8d on June 17, 2025 23:07

arsenm merged commit 3800a83 into main Jun 17, 2025
5 of 7 checks passed

arsenm deleted the users/arsenm/amdgpu/reduce-cost-f64-copysign branch June 17, 2025 23:10
