[TTI] Provide a cost for memset_pattern which matches the libcall #139978
Conversation
The motivation is that differences in unrolling were noticed when trying to switch from the libcall to the intrinsic. Neither cost is well considered, but for the moment, let's have them be equal to simplify migration.
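To see why a cost mismatch perturbs unrolling, consider a simplified sketch of a size-budget unroller. The numbers here are hypothetical (LLVM's real heuristics are far more involved); the point is only that a higher per-call cost for the intrinsic would shrink the unroll count relative to the libcall:

```python
def max_unroll(body_cost, threshold=24):
    """Pick the largest unroll count whose total estimated size
    stays within a fixed size threshold (a toy model of a
    size-limited unroller, not LLVM's actual heuristic)."""
    count = 1
    while (count + 1) * body_cost <= threshold:
        count += 1
    return count

# Loop body: one memset-pattern call plus 2 units of loop overhead.
# Cost 4 matches the libcall cost this patch also assigns to the
# intrinsic; 10 is a made-up stand-in for a mismatched default.
libcall_body = 4 + 2
mismatched_body = 10 + 2

print(max_unroll(libcall_body))     # unrolls further
print(max_unroll(mismatched_body))  # unrolls less
```

With both forms costed identically, the two bodies land in the same bucket and the unrolling decision no longer depends on which form the frontend emitted.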
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes: The motivation is that differences in unrolling were noticed when trying to switch from the libcall to the intrinsic. There are likely also differences not yet noticed in other cost-based decisions, such as inlining and possibly vectorization. Neither cost is well considered, but for the moment, let's have them be equal to simplify migration. We can come back and refine this once we have it being exercised by default.

Full diff: https://github.com/llvm/llvm-project/pull/139978.diff (2 files affected)
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index ff8778168686d..449e9e8f85561 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2408,6 +2408,9 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
CmpInst::ICMP_ULT, CostKind);
return Cost;
}
+ case Intrinsic::experimental_memset_pattern:
+ // This cost is set to match the cost of the memset_pattern16 libcall
+ return TTI::TCC_Basic * 4;
case Intrinsic::abs:
ISD = ISD::ABS;
break;
diff --git a/llvm/test/Analysis/CostModel/X86/memset-pattern.ll b/llvm/test/Analysis/CostModel/X86/memset-pattern.ll
new file mode 100644
index 0000000000000..aa0c6efdf34fa
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/X86/memset-pattern.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -mtriple=x86_64-apple-darwin10.0.0 -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
+
+target triple = "x86_64-apple-darwin10.0.0"
+
+@.memset_pattern = private unnamed_addr constant [4 x i32] [i32 2, i32 2, i32 2, i32 2], align 16
+
+define void @via_libcall(ptr %p) nounwind ssp {
+; CHECK-LABEL: 'via_libcall'
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @memset_pattern4(ptr %p, ptr @.memset_pattern, i64 200)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @memset_pattern8(ptr %p, ptr @.memset_pattern, i64 200)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @memset_pattern16(ptr %p, ptr @.memset_pattern, i64 200)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ call void @memset_pattern4(ptr %p, ptr @.memset_pattern, i64 200)
+ call void @memset_pattern8(ptr %p, ptr @.memset_pattern, i64 200)
+ call void @memset_pattern16(ptr %p, ptr @.memset_pattern, i64 200)
+ ret void
+}
+
+declare void @memset_pattern4(ptr, ptr, i64)
+declare void @memset_pattern8(ptr, ptr, i64)
+declare void @memset_pattern16(ptr, ptr, i64)
+
+define void @via_intrinsic(ptr %p) {
+; CHECK-LABEL: 'via_intrinsic'
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.experimental.memset.pattern.p0.i16.i64(ptr align 4 %p, i16 2, i64 100, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.experimental.memset.pattern.p0.i32.i64(ptr align 4 %p, i32 2, i64 50, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.experimental.memset.pattern.p0.i64.i64(ptr align 4 %p, i64 2, i64 25, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.experimental.memset.pattern.p0.i128.i64(ptr align 4 %p, i128 2, i64 12, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ call void @llvm.experimental.memset.pattern(ptr align 4 %p, i16 2, i64 100, i1 false)
+ call void @llvm.experimental.memset.pattern(ptr align 4 %p, i32 2, i64 50, i1 false)
+ call void @llvm.experimental.memset.pattern(ptr align 4 %p, i64 2, i64 25, i1 false)
+ call void @llvm.experimental.memset.pattern(ptr align 4 %p, i128 2, i64 12, i1 false)
+ ret void
+}
ping
I think this is a pragmatic migration aid - makes sense to me. Perhaps the comment could be adjusted to make it clearer that this isn't necessarily the best cost long term. e.g. "This cost is set to match the cost of the memset_pattern16 libcall. It should likely be re-evaluated after migration to this intrinsic is complete."
…vm#139978) The motivation is that differences in unrolling were noticed when trying to switch from the libcall to the intrinsic. There are likely also differences not yet noticed in other cost based decisions - such as inlining, and possibly vectorization. Neither cost is a good, well considered, cost but for the moment, let's have them be equal to simplify migration. We can come back and refine this once we have it being exercised by default.