[SLPVectorizer] Use accurate cost for external users of resize shuffles #137419


Merged
merged 3 commits into llvm:main on Jun 17, 2025

Conversation

jrbyrnes
Contributor

When implementing the vectorization, we potentially need to add shuffles for external users. In such cases, we may be shuffling a smaller vector into a larger vector. When this happens, ResizeToVF will just build a poison-padded identity vector. Then, to build the final shuffle, we just use the SK_InsertSubvector mask.

This is perhaps clearest in the included test, SLPVectorizer/AMDGPU/external-shuffle.ll.

In the exit block we have a bunch of shuffles to glue the vectorized tree to the InsertElement users. TMP25 holds the result of resizing the v2i16 vectorized sequence to match the InsertElement size, v16i16. TMP26 is then the final shuffle which replaces the InsertElement sequence; it is just an insertsubvector.

However, when calculating the cost for these shuffles, we aren't modelling this correctly. ResizeToVF will indicate to performExtractsShuffleAction that we cannot use the original mask due to the resize shuffle. The consequence is that the cost calculation uses a different shuffle mask than what is ultimately used.

Going back to the included test, consider TMP26 again. Clearly the shuffle uses the mask {0, 1, 2, 3, 16, 17, poison, ...}. However, we currently calculate the cost with the mask {0, 1, 2, 3, 20, 21, ...}: we have replaced 16 and 17 with 20 and 21 (Index + Vector Size). Queries like BasicTTIImpl::improveShuffleKindFromMask will not recognize this as an SK_InsertSubvector mask, so targets which have reduced costs for SK_InsertSubvector will not accurately calculate the cost.
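
As a concrete illustration of the mismatch, here is a small standalone C++ sketch. It is my own simplified insert-subvector check, not LLVM's actual BasicTTIImpl::improveShuffleKindFromMask; the function name and the main driver are assumptions for illustration only. It shows why the mask codegen actually emits is recognizable as an insert-subvector pattern while the mask used for costing is not.

// Simplified, standalone illustration (not the in-tree LLVM logic) of an
// insert-subvector mask check: second-source elements (indices >= VF) must be
// VF, VF+1, ... in order and land in consecutive destination lanes, while all
// other lanes are poison (-1) or identity lanes of the first source.
#include <cstdio>
#include <vector>

static bool looksLikeInsertSubvector(const std::vector<int> &Mask, int VF) {
  int NextSrc2Elt = VF;   // next expected second-source index (VF, VF+1, ...)
  int LastSrc2Pos = -2;   // destination position of the previous src2 lane
  bool SawSrc2 = false;
  for (int I = 0, E = (int)Mask.size(); I != E; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue;                        // poison lane, ignore
    if (M < VF) {
      if (M != I)
        return false;                  // first-source lane must be identity
    } else {
      if (M != NextSrc2Elt++)
        return false;                  // src2 elements must be in order
      if (SawSrc2 && I != LastSrc2Pos + 1)
        return false;                  // and occupy consecutive destination lanes
      LastSrc2Pos = I;
      SawSrc2 = true;
    }
  }
  return SawSrc2;
}

int main() {
  const int VF = 16;
  std::vector<int> Emitted = {0, 1, 2, 3, 16, 17}; // mask TMP26 actually uses
  std::vector<int> Costed = {0, 1, 2, 3, 20, 21};  // mask used for costing
  Emitted.resize(VF, -1);                          // pad remaining lanes with poison
  Costed.resize(VF, -1);
  std::printf("emitted: %d  costed: %d\n",
              looksLikeInsertSubvector(Emitted, VF),  // 1 (insert-subvector)
              looksLikeInsertSubvector(Costed, VF));  // 0 (generic permute)
}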

@llvmbot
Member

llvmbot commented Apr 25, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Jeffrey Byrnes (jrbyrnes)

Changes



Patch is 28.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137419.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+34-1)
  • (added) llvm/test/Transforms/SLPVectorizer/AMDGPU/external-shuffle.ll (+222)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/extractelement-single-use-many-nodes.ll (+7-7)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/insertelements-with-reused-indices.ll (+5-5)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 53da78ee599b7..8c261930c0a4c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -14344,6 +14344,7 @@ static T *performExtractsShuffleAction(
     [[maybe_unused]] auto *V = ValueSelect::get<T *>(Base);
     assert((!V || GetVF(V) == Mask.size()) &&
            "Expected base vector of VF number of elements.");
+
     Prev = Action(Mask, {nullptr, Res.first});
   } else if (ShuffleMask.size() == 1) {
     // Base is undef and only 1 vector is shuffled - perform the action only for
@@ -14802,7 +14803,39 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals,
                  << " for final shuffle of insertelement external users.\n";
           TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
       Cost += C;
-      return std::make_pair(TE, true);
+
+      bool HasLargeIndex =
+          any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); });
+
+      // If the resize source is just an identity vector, then it will produce
+      // an insert subvector shufflevector.
+      bool NeedsResizeExtract = true;
+      if ((VecVF < VF) && !HasLargeIndex) {
+        NeedsResizeExtract = false;
+        SmallVector<int> ResizeMask(VF, PoisonMaskElem);
+        for (unsigned I = 0; I < VF; ++I) {
+          if (Mask[I] != PoisonMaskElem) {
+            assert((size_t)Mask[I] < ResizeMask.size());
+            ResizeMask[Mask[I]] = Mask[I];
+          }
+        }
+
+        unsigned MinVF = std::min(VF, VecVF);
+        // Check if our mask is a padded identity mask with non-poison
+        // intermediaries. This implies that our resize shuffle is just a
+        // contiguous insertsubvector.
+        for (auto [Position, Mask] : enumerate(ResizeMask)) {
+          if (Position >= MinVF)
+            break;
+
+          if ((int)Position != Mask) {
+            NeedsResizeExtract = true;
+            break;
+          }
+        }
+      }
+
+      return std::make_pair(TE, NeedsResizeExtract);
     }
     return std::make_pair(TE, false);
   };
diff --git a/llvm/test/Transforms/SLPVectorizer/AMDGPU/external-shuffle.ll b/llvm/test/Transforms/SLPVectorizer/AMDGPU/external-shuffle.ll
new file mode 100644
index 0000000000000..1886606ecb12d
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AMDGPU/external-shuffle.ll
@@ -0,0 +1,222 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=slp-vectorizer,dce < %s | FileCheck -check-prefixes=GCN %s
+
+define protected  void @phi_4(ptr addrspace(3) %inptr0, ptr addrspace(3) %inptr1, ptr %out, ptr %out1, ptr %out2, i32 %flag) {
+; GCN-LABEL: define protected void @phi_4(
+; GCN-SAME: ptr addrspace(3) [[INPTR0:%.*]], ptr addrspace(3) [[INPTR1:%.*]], ptr [[OUT:%.*]], ptr [[OUT1:%.*]], ptr [[OUT2:%.*]], i32 [[FLAG:%.*]]) #[[ATTR0:[0-9]+]] {
+; GCN-NEXT:  [[ENTRY:.*]]:
+; GCN-NEXT:    [[GEP0:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 0
+; GCN-NEXT:    [[TMP0:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP0]], align 8
+; GCN-NEXT:    [[GEP2:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 2
+; GCN-NEXT:    [[TMP1:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP2]], align 2
+; GCN-NEXT:    [[GEP4:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 4
+; GCN-NEXT:    [[TMP2:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP4]], align 8
+; GCN-NEXT:    [[GEP6:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 6
+; GCN-NEXT:    [[TMP3:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP6]], align 2
+; GCN-NEXT:    [[GEP8:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 8
+; GCN-NEXT:    [[TMP4:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP8]], align 8
+; GCN-NEXT:    [[GEP10:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 10
+; GCN-NEXT:    [[TMP5:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP10]], align 2
+; GCN-NEXT:    [[GEP12:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 12
+; GCN-NEXT:    [[TMP6:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP12]], align 8
+; GCN-NEXT:    [[GEP14:%.*]] = getelementptr i16, ptr addrspace(3) [[INPTR0]], i32 14
+; GCN-NEXT:    [[TMP7:%.*]] = load <2 x i16>, ptr addrspace(3) [[GEP14]], align 2
+; GCN-NEXT:    br label %[[DO_BODY:.*]]
+; GCN:       [[DO_BODY]]:
+; GCN-NEXT:    [[TMP8:%.*]] = phi <2 x i16> [ [[TMP0]], %[[ENTRY]] ], [ [[TMP16:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP9:%.*]] = phi <2 x i16> [ [[TMP1]], %[[ENTRY]] ], [ [[TMP17:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP10:%.*]] = phi <2 x i16> [ [[TMP2]], %[[ENTRY]] ], [ [[TMP18:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP11:%.*]] = phi <2 x i16> [ [[TMP3]], %[[ENTRY]] ], [ [[TMP19:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP12:%.*]] = phi <2 x i16> [ [[TMP4]], %[[ENTRY]] ], [ [[TMP20:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP13:%.*]] = phi <2 x i16> [ [[TMP5]], %[[ENTRY]] ], [ [[TMP21:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP14:%.*]] = phi <2 x i16> [ [[TMP6]], %[[ENTRY]] ], [ [[TMP22:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP42:%.*]] = phi <2 x i16> [ [[TMP7]], %[[ENTRY]] ], [ [[TMP23:%.*]], %[[DO_BODY]] ]
+; GCN-NEXT:    [[TMP16]] = load <2 x i16>, ptr addrspace(3) [[GEP0]], align 8
+; GCN-NEXT:    [[TMP17]] = load <2 x i16>, ptr addrspace(3) [[GEP2]], align 2
+; GCN-NEXT:    [[TMP18]] = load <2 x i16>, ptr addrspace(3) [[GEP4]], align 8
+; GCN-NEXT:    [[TMP19]] = load <2 x i16>, ptr addrspace(3) [[GEP6]], align 2
+; GCN-NEXT:    [[TMP20]] = load <2 x i16>, ptr addrspace(3) [[GEP8]], align 8
+; GCN-NEXT:    [[TMP21]] = load <2 x i16>, ptr addrspace(3) [[GEP10]], align 2
+; GCN-NEXT:    [[TMP22]] = load <2 x i16>, ptr addrspace(3) [[GEP12]], align 8
+; GCN-NEXT:    [[TMP23]] = load <2 x i16>, ptr addrspace(3) [[GEP14]], align 2
+; GCN-NEXT:    [[CMP:%.*]] = icmp eq i32 [[FLAG]], 0
+; GCN-NEXT:    br i1 [[CMP]], label %[[EXIT:.*]], label %[[DO_BODY]]
+; GCN:       [[EXIT]]:
+; GCN-NEXT:    [[TMP24:%.*]] = shufflevector <2 x i16> [[TMP16]], <2 x i16> [[TMP17]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP25:%.*]] = shufflevector <2 x i16> [[TMP18]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP26:%.*]] = shufflevector <16 x i16> [[TMP24]], <16 x i16> [[TMP25]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP27:%.*]] = shufflevector <2 x i16> [[TMP19]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP28:%.*]] = shufflevector <16 x i16> [[TMP26]], <16 x i16> [[TMP27]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP29:%.*]] = shufflevector <2 x i16> [[TMP20]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP30:%.*]] = shufflevector <16 x i16> [[TMP28]], <16 x i16> [[TMP29]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP31:%.*]] = shufflevector <2 x i16> [[TMP21]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP32:%.*]] = shufflevector <16 x i16> [[TMP30]], <16 x i16> [[TMP31]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP33:%.*]] = shufflevector <2 x i16> [[TMP22]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP47:%.*]] = shufflevector <16 x i16> [[TMP32]], <16 x i16> [[TMP33]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP48:%.*]] = shufflevector <2 x i16> [[TMP23]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP49:%.*]] = shufflevector <16 x i16> [[TMP47]], <16 x i16> [[TMP48]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>
+; GCN-NEXT:    [[TMP37:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> [[TMP1]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP38:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP39:%.*]] = shufflevector <16 x i16> [[TMP37]], <16 x i16> [[TMP38]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP40:%.*]] = shufflevector <2 x i16> [[TMP3]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP41:%.*]] = shufflevector <16 x i16> [[TMP39]], <16 x i16> [[TMP40]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP57:%.*]] = shufflevector <2 x i16> [[TMP4]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP43:%.*]] = shufflevector <16 x i16> [[TMP41]], <16 x i16> [[TMP57]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP44:%.*]] = shufflevector <2 x i16> [[TMP5]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP45:%.*]] = shufflevector <16 x i16> [[TMP43]], <16 x i16> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP46:%.*]] = shufflevector <2 x i16> [[TMP6]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP58:%.*]] = shufflevector <16 x i16> [[TMP45]], <16 x i16> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP60:%.*]] = shufflevector <2 x i16> [[TMP7]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC2157:%.*]] = shufflevector <16 x i16> [[TMP58]], <16 x i16> [[TMP60]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>
+; GCN-NEXT:    [[TMP50:%.*]] = shufflevector <2 x i16> [[TMP8]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[TMP51:%.*]] = shufflevector <2 x i16> [[TMP9]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC231:%.*]] = shufflevector <16 x i16> [[TMP50]], <16 x i16> [[TMP51]], <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; GCN-NEXT:    [[TMP52:%.*]] = shufflevector <2 x i16> [[TMP10]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC252:%.*]] = shufflevector <16 x i16> [[VEC231]], <16 x i16> [[TMP52]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; GCN-NEXT:    [[TMP53:%.*]] = shufflevector <2 x i16> [[TMP11]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC273:%.*]] = shufflevector <16 x i16> [[VEC252]], <16 x i16> [[TMP53]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 16, i32 17, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; GCN-NEXT:    [[TMP54:%.*]] = shufflevector <2 x i16> [[TMP12]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC294:%.*]] = shufflevector <16 x i16> [[VEC273]], <16 x i16> [[TMP54]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; GCN-NEXT:    [[TMP55:%.*]] = shufflevector <2 x i16> [[TMP13]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC2115:%.*]] = shufflevector <16 x i16> [[VEC294]], <16 x i16> [[TMP55]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 16, i32 17, i32 12, i32 13, i32 14, i32 15>
+; GCN-NEXT:    [[TMP56:%.*]] = shufflevector <2 x i16> [[TMP14]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC2136:%.*]] = shufflevector <16 x i16> [[VEC2115]], <16 x i16> [[TMP56]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 14, i32 15>
+; GCN-NEXT:    [[TMP59:%.*]] = shufflevector <2 x i16> [[TMP42]], <2 x i16> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; GCN-NEXT:    [[VEC2151:%.*]] = shufflevector <16 x i16> [[VEC2136]], <16 x i16> [[TMP59]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>
+; GCN-NEXT:    store <16 x i16> [[VEC2157]], ptr [[OUT]], align 32
+; GCN-NEXT:    store <16 x i16> [[TMP49]], ptr [[OUT1]], align 32
+; GCN-NEXT:    store <16 x i16> [[VEC2151]], ptr [[OUT2]], align 32
+; GCN-NEXT:    ret void
+;
+entry:
+  %gep0 = getelementptr i16, ptr addrspace(3) %inptr0, i32 0
+  %ele0 = load i16, ptr addrspace(3) %gep0, align 8
+  %gep1 = getelementptr i16, ptr addrspace(3) %inptr0, i32 1
+  %ele1 = load i16, ptr addrspace(3) %gep1, align 1
+  %gep2 = getelementptr i16, ptr addrspace(3) %inptr0, i32 2
+  %ele2 = load i16, ptr addrspace(3) %gep2, align 2
+  %gep3 = getelementptr i16, ptr addrspace(3) %inptr0, i32 3
+  %ele3 = load i16, ptr addrspace(3) %gep3, align 1
+  %gep4 = getelementptr i16, ptr addrspace(3) %inptr0, i32 4
+  %ele4 = load i16, ptr addrspace(3) %gep4, align 8
+  %gep5 = getelementptr i16, ptr addrspace(3) %inptr0, i32 5
+  %ele5 = load i16, ptr addrspace(3) %gep5, align 1
+  %gep6 = getelementptr i16, ptr addrspace(3) %inptr0, i32 6
+  %ele6 = load i16, ptr addrspace(3) %gep6, align 2
+  %gep7 = getelementptr i16, ptr addrspace(3) %inptr0, i32 7
+  %ele7 = load i16, ptr addrspace(3) %gep7, align 1
+  %gep8 = getelementptr i16, ptr addrspace(3) %inptr0, i32 8
+  %ele8 = load i16, ptr addrspace(3) %gep8, align 8
+  %gep9 = getelementptr i16, ptr addrspace(3) %inptr0, i32 9
+  %ele9 = load i16, ptr addrspace(3) %gep9, align 1
+  %gep10 = getelementptr i16, ptr addrspace(3) %inptr0, i32 10
+  %ele10 = load i16, ptr addrspace(3) %gep10, align 2
+  %gep11 = getelementptr i16, ptr addrspace(3) %inptr0, i32 11
+  %ele11 = load i16, ptr addrspace(3) %gep11, align 1
+  %gep12 = getelementptr i16, ptr addrspace(3) %inptr0, i32 12
+  %ele12 = load i16, ptr addrspace(3) %gep12, align 8
+  %gep13 = getelementptr i16, ptr addrspace(3) %inptr0, i32 13
+  %ele13 = load i16, ptr addrspace(3) %gep13, align 1
+  %gep14 = getelementptr i16, ptr addrspace(3) %inptr0, i32 14
+  %ele14 = load i16, ptr addrspace(3) %gep14, align 2
+  %gep15 = getelementptr i16, ptr addrspace(3) %inptr0, i32 15
+  %ele15 = load i16, ptr addrspace(3) %gep15, align 1
+  br label %do.body
+
+do.body:
+  %phi0 = phi i16 [ %ele0, %entry ], [ %otherele0, %do.body ]
+  %phi1 = phi i16 [ %ele1, %entry ], [ %otherele1, %do.body ]
+  %phi2 = phi i16 [ %ele2, %entry ], [ %otherele2, %do.body ]
+  %phi3 = phi i16 [ %ele3, %entry ], [ %otherele3, %do.body ]
+  %phi4 = phi i16 [ %ele4, %entry ], [ %otherele4, %do.body ]
+  %phi5 = phi i16 [ %ele5, %entry ], [ %otherele5, %do.body ]
+  %phi6 = phi i16 [ %ele6, %entry ], [ %otherele6, %do.body ]
+  %phi7 = phi i16 [ %ele7, %entry ], [ %otherele7, %do.body ]
+  %phi8 = phi i16 [ %ele8, %entry ], [ %otherele8, %do.body ]
+  %phi9 = phi i16 [ %ele9, %entry ], [ %otherele9, %do.body ]
+  %phi10 = phi i16 [ %ele10, %entry ], [ %otherele10, %do.body ]
+  %phi11 = phi i16 [ %ele11, %entry ], [ %otherele11, %do.body ]
+  %phi12 = phi i16 [ %ele12, %entry ], [ %otherele12, %do.body ]
+  %phi13 = phi i16 [ %ele13, %entry ], [ %otherele13, %do.body ]
+  %phi14 = phi i16 [ %ele14, %entry ], [ %otherele14, %do.body ]
+  %phi15 = phi i16 [ %ele15, %entry ], [ %otherele15, %do.body ]
+
+  %otherele0 = load i16, ptr addrspace(3) %gep0, align 8
+  %otherele1 = load i16, ptr addrspace(3) %gep1, align 1
+  %otherele2 = load i16, ptr addrspace(3) %gep2, align 2
+  %otherele3 = load i16, ptr addrspace(3) %gep3, align 1
+  %otherele4 = load i16, ptr addrspace(3) %gep4, align 8
+  %otherele5 = load i16, ptr addrspace(3) %gep5, align 1
+  %otherele6 = load i16, ptr addrspace(3) %gep6, align 2
+  %otherele7 = load i16, ptr addrspace(3) %gep7, align 1
+  %otherele8 = load i16, ptr addrspace(3) %gep8, align 8
+  %otherele9 = load i16, ptr addrspace(3) %gep9, align 1
+  %otherele10 = load i16, ptr ...
[truncated]

@@ -14802,7 +14803,39 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals,
<< " for final shuffle of insertelement external users.\n";
TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
Cost += C;
return std::make_pair(TE, true);

alexey-bataev
Member


Thanks for the patch! What about something like this, which is closer to the codegen:

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index afba099f5154..c269a1efa415 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -14757,25 +14757,43 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals,

   Cost += ExtractCost;
   auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask,
-                                    bool) {
+                                    bool ForSingleMask) {
     InstructionCost C = 0;
     unsigned VF = Mask.size();
     unsigned VecVF = TE->getVectorFactor();
     if (VF != VecVF &&
         (any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
          !ShuffleVectorInst::isIdentityMask(Mask, VF))) {
-      SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
-      std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
-                OrigMask.begin());
-      C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
-                           getWidenedType(TE->getMainOp()->getType(), VecVF),
-                           OrigMask);
-      LLVM_DEBUG(
-          dbgs() << "SLP: Adding cost " << C
-                 << " for final shuffle of insertelement external users.\n";
-          TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
-      Cost += C;
-      return std::make_pair(TE, true);
+      if (any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); })) {
+        SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
+        std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
+                  OrigMask.begin());
+        C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
+                             getWidenedType(TE->getMainOp()->getType(), VecVF),
+                             OrigMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+        Cost += C;
+        return std::make_pair(TE, true);
+      }
+      if (!ForSingleMask) {
+        SmallVector<int> ResizeMask(VF, PoisonMaskElem);
+        for (unsigned I = 0; I < VF; ++I) {
+          if (Mask[I] != PoisonMaskElem)
+            ResizeMask[Mask[I]] = Mask[I];
+        }
+        if (!ShuffleVectorInst::isIdentityMask(Mask, VF))
+          C = ::getShuffleCost(
+              *TTI, TTI::SK_PermuteSingleSrc,
+              getWidenedType(TE->getMainOp()->getType(), VecVF), ResizeMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+        Cost += C;
+      }
     }
     return std::make_pair(TE, false);
   };

Contributor Author


Thanks -- I'll try to adopt it, but it's causing some concerning-looking diffs in the lit tests that I need to work through.

Contributor Author


@alexey-bataev Sorry for the delay, I've updated the PR to include your suggestion. It causes some changes in the lit tests that don't seem to be beneficial. I'm looking into them, but thought I'd post them as well to see if you had any insight.

@@ -0,0 +1,222 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
Member


The test should be precommitted in a separate NFC patch

; CHECK-NEXT: [[TMP10:%.*]] = add nsw <4 x i32> [[TMP5]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = mul nsw <4 x i32> [[TMP5]], [[TMP9]]
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
; CHECK-NEXT: [[T701:%.*]] = shufflevector <4 x i32> [[TMP12]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 poison, i32 3>
Contributor Author

@jrbyrnes jrbyrnes Jun 11, 2025


I think the effect this PR has on this test is fine.

We have a VF of 4 feeding an insertelement user of width 8 (our vectorizable portion starts at insertelement %t65).

The external use mask for our vectorized tree is <-1, -1, -1, -1, 0, 1, -1, 3>.

Previously, we used a resize mask of <-1, -1, -1, -1> to calculate the resize cost and a mask of <0, 1, 2, 3, 12, 13, -1, 15> to calculate the external shuffle cost. These shuffles both have a cost of 1, which means we do not vectorize. Our previous calculations seem to be based on the assumption that we will prepad our vector with poison to resize it; however, that isn't how the resize + shuffle is actually implemented.

In actuality, we resize using the mask <0, 1, -1, 3, -1, -1, -1, -1>, which is just identity with poison. Then, we shuffle using <0, 1, 2, 3, 0, 1, -1, 3>. The new version is much more accurate: we use <0, 1, -1, 3, -1, -1, -1, -1> to calculate the resize cost (identity with poison) and <0, 1, 2, 3, 8, 9, -1, 11> to calculate the shuffle cost. These have costs 0 and 1 respectively, which keeps us within the cost threshold, so we vectorize.

There is a difference in the generated assembly before / after.
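
For reference, here is a minimal standalone sketch of the resize-mask construction described above. The scatter loop mirrors the approach in the suggested diff; the variable names, the use of std::vector, and the printing are my own additions for illustration and are not the in-tree code.

// Standalone sketch of the resize-mask scatter: each used lane of the
// external-use mask is written back to its own index, yielding the
// identity-with-poison mask that the resize shufflevector actually uses.
#include <cstdio>
#include <vector>

int main() {
  const int PoisonMaskElem = -1;
  // External use mask for the vectorized tree in this test.
  std::vector<int> UseMask = {-1, -1, -1, -1, 0, 1, -1, 3};
  std::vector<int> ResizeMask(UseMask.size(), PoisonMaskElem);
  for (int M : UseMask)
    if (M != PoisonMaskElem)
      ResizeMask[M] = M;            // identity on used lanes, poison elsewhere
  for (int M : ResizeMask)
    std::printf("%d ", M);          // prints: 0 1 -1 3 -1 -1 -1 -1
  std::printf("\n");
  return 0;
}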

; CHECK-NEXT: [[I118:%.*]] = fadd double [[TMP19]], [[TMP20]]
; CHECK-NEXT: [[TMP21:%.*]] = fmul <4 x double> zeroinitializer, [[TMP1]]
; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double poison>, <4 x double> [[TMP22]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>
; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double poison>, double [[I82]], i32 3
Contributor Author


The effect this PR has on this test also looks good to me.

The effect is less immediately obvious on this test compared to the last, because we are evaluating vectorizing a list after already vectorizing several others. In the partially vectorized version, we have some insertelements that are not present in the original IR. These insertelements are external users for our tree.

We have a VF of 2 and are inserting the 2nd element into the 4th position of the external insertelement vector (the external user width is 4). The vector we are inserting into is a literal 0.0 vector, and it is the other operand in any external shuffles.

The external use mask for our vectorized tree is <-1, -1, -1, 1>.

Previously, we used a resize mask of <-1, -1> to calculate the resize cost and a mask of <0, 1, 2, 7> to calculate the external shuffle cost. These shuffles both have a cost of 1, so we vectorize. However, this isn't how the SLP codegen works -- we don't shuffle during the resize (though that may be more optimal in this case). In actuality, we resize using the mask <0, 1, -1, -1>, which is just identity with poison. Then, we shuffle using <0, 1, 2, 5>. The new version is much more accurate: we use <-1, 1, -1, -1> to calculate the resize cost (identity with poison) and <0, 1, 2, 5> to calculate the shuffle cost. These have costs 0 and 3 respectively, which pushes us over the cost threshold, so we do not vectorize this part.

There is a difference in the generated assembly before / after.
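
To spell out where the two external shuffle masks come from, here is a sketch based only on the numbers quoted above (it is not the in-tree code, and the loop structure is my own): the old costing rebases a used lane to VF plus its destination position, while the mask actually emitted after an identity-with-poison resize rebases it to VF plus its source element index.

// Mask-rebasing difference for this test (VF = 4, use mask <-1, -1, -1, 1>,
// the base operand is the literal zero vector).
#include <cstdio>
#include <vector>

int main() {
  const int VF = 4;
  std::vector<int> UseMask = {-1, -1, -1, 1};
  std::vector<int> OldCosted, Emitted;
  for (int Pos = 0, E = (int)UseMask.size(); Pos != E; ++Pos) {
    if (UseMask[Pos] < 0) {
      OldCosted.push_back(Pos);               // lane taken from the base vector
      Emitted.push_back(Pos);
    } else {
      OldCosted.push_back(VF + Pos);          // old costing: VF + position -> 7
      Emitted.push_back(VF + UseMask[Pos]);   // actual codegen: VF + element -> 5
    }
  }
  for (int M : OldCosted)
    std::printf("%d ", M);                    // prints: 0 1 2 7
  std::printf("| ");
  for (int M : Emitted)
    std::printf("%d ", M);                    // prints: 0 1 2 5
  std::printf("\n");
}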

@jrbyrnes
Contributor Author

jrbyrnes commented Jun 11, 2025

I took a look at the lit changes and my conclusion is that they reflect our cost model being more accurate -- so I am happy with the current changes.

jrbyrnes added a commit that referenced this pull request Jun 12, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 12, 2025
jrbyrnes added 3 commits June 12, 2025 07:57
Change-Id: I7620d2bd3d65be994bd290b84267832fdb4f1bb4
Change-Id: Ie1209bb7f6c49992d41e3fec8a195a81d04f34d4
Change-Id: I9c4865641974152f6289df79df4aa057cdcdb8ed
@jrbyrnes
Contributor Author

Force push for test precommit

@jrbyrnes jrbyrnes requested a review from RKSimon June 13, 2025 21:37
@jrbyrnes
Contributor Author

Adding @RKSimon for any comments on the x86 test changes. I think they're fine, but it might be good to double-check.

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
@jrbyrnes jrbyrnes merged commit c9a87a5 into llvm:main Jun 17, 2025
7 checks passed
ajaden-codes pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 17, 2025
Collaborator

@RKSimon RKSimon left a comment


x86 LGTM (sorry for late reply)

@jrbyrnes
Contributor Author

x86 LGTM (sorry for late reply)

Thanks for taking a look!
