[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638

lukel97 · 2024-12-04T13:51:07Z

The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand.

The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Because it's a VPHeaderPHIRecipe, the VF operand must be defined in the preheader, which means we can't use the EVL, since that is defined in the loop body.

The gist of this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:

  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %start, %step, %vf
      ...
      EMIT branch-on-count ...
    No successors
  }

gets expanded to:

  vector.ph:
    ...
    vp<%induction.start> = ...
    vp<%induction.increment> = ...
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
      ...
      vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
      EMIT branch-on-count ...
    No successors
  }

This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.

After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).
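
To make the three-way split concrete, here is a minimal standalone sketch of what the start, phi and backedge values compute, written in plain C++ rather than against the VPlan API. The names `inductionStart`, `inductionInc` and `runWidenedIV` are hypothetical and exist only for this illustration; the `increment` parameter stands in for the VF operand (or, later, the EVL).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;

// Models the preheader value: lane i of the initial vector is start + i * step.
Vec inductionStart(int64_t start, int64_t step, unsigned vf) {
  assert(vf > 0 && "need at least one lane");
  Vec v(vf);
  for (unsigned lane = 0; lane < vf; ++lane)
    v[lane] = start + static_cast<int64_t>(lane) * step;
  return v;
}

// Models the backedge value: add the splat of step * increment to every lane.
Vec inductionInc(const Vec &prev, int64_t step, int64_t increment) {
  Vec next(prev);
  for (int64_t &lane : next)
    lane += step * increment;
  return next;
}

// Runs `iterations` vector iterations and records every value the phi takes.
Vec runWidenedIV(int64_t start, int64_t step, unsigned vf, unsigned iterations) {
  Vec seen;
  Vec phi = inductionStart(start, step, vf); // incoming value from vector.ph
  for (unsigned it = 0; it < iterations; ++it) {
    seen.insert(seen.end(), phi.begin(), phi.end());
    phi = inductionInc(phi, step, vf);       // incoming value from the latch
  }
  return seen;
}
```

With `start = 5`, `step = 3`, `vf = 2`, two iterations visit 5, 8, 11, 14, which matches the scalar induction. Because the increment is an ordinary value computed in the loop body, nothing in this model requires it to be loop-invariant.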

llvmbot · 2024-12-04T13:51:46Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

The motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand.

The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Because it's a VPHeaderPHIRecipe, the VF operand must be defined in the preheader, which means we can't use the EVL, since that is defined in the loop body.

The gist of this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:

  Live-in ir<%5> = VF

  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
      ...
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

gets expanded to:

VPlan 'Final VPlan for VF={vscale x 1,vscale x 2},UF={1}' {
  Live-in ir<%5> = VF

  vector.ph:
    vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
      ...
      vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.

I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in createWidenInductionRecipes, but as pointed out in #114305 (comment) it turns out to be a total pain trying to detect these in other places like removeRedundantInductionCasts and unrollHeaderPHIByUF. In a way it's kind of like a pseudo-instruction at the MachineInstr level.

I also tried to avoid the start and increment recipes by expanding them directly into VPInstructions/VPWidenIntrinsicRecipe/VPScalarCastRecipes, but I ran into some difficulties when trying to broadcast a scalar type to use in a widened binary op. (I'm happy to give this another try though!)

I hoped to make this an NFC, but unfortunately some of the splatted values get shuffled about in the preheaders.

I may be duplicating other work here, and this was just the solution I stumbled upon. Open to any other ideas or approaches!
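
As a rough illustration of the "keep one pseudo-recipe, expand it late" idea described above, the following toy model rewrites a placeholder into start, phi and increment entries. It is only a sketch: `ToyPlan` and `expandWidenInductions` are hypothetical names, recipes are modelled as plain strings, and none of this is the VPlan API. The printed recipe names are taken from the example output in this description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a plan: a preheader block and a loop body whose last
// entry is the terminator.
struct ToyPlan {
  std::vector<std::string> preheader;
  std::vector<std::string> body;
};

// Replaces each "WIDEN-INDUCTION" placeholder in the body with a phi at the
// same position, adds a start value to the preheader, and places the
// increment just before the body's terminator.
void expandWidenInductions(ToyPlan &plan) {
  assert(!plan.body.empty() && "body needs a terminator");
  std::vector<std::string> incs;
  for (std::string &recipe : plan.body) {
    if (recipe == "WIDEN-INDUCTION") {
      plan.preheader.push_back("WIDEN-INDUCTION-START");
      recipe = "WIDEN-INDUCTION-PHI";
      incs.push_back("WIDEN-INDUCTION-INC");
    }
  }
  // Insert all increments before the terminator, which is the last entry.
  plan.body.insert(plan.body.end() - 1, incs.begin(), incs.end());
}
```

Earlier transforms only ever see the single placeholder, so they need no pattern matching across three recipes; the split happens once, right before execution.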

Patch is 84.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118638.diff

19 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+6-16)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+151-6)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+11-8)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+67-43)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+79)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-any-of-reductions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+35-35)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+18-18)

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index b801d1863e252c..4e4c4dfd461824 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1043,18 +1043,12 @@ void VPlan::execute(VPTransformState *State) {
     if (isa<VPWidenPHIRecipe>(&R))
       continue;
 
-    if (isa<VPWidenPointerInductionRecipe>(&R) ||
-        isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-      PHINode *Phi = nullptr;
-      if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-        Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
-      } else {
-        auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
-        assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
-               "recipe generating only scalars should have been replaced");
-        auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
-        Phi = cast<PHINode>(GEP->getPointerOperand());
-      }
+    if (isa<VPWidenPointerInductionRecipe>(&R)) {
+      auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+      assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
+             "recipe generating only scalars should have been replaced");
+      auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
+      PHINode *Phi = cast<PHINode>(GEP->getPointerOperand());
 
       Phi->setIncomingBlock(1, VectorLatchBB);
 
@@ -1062,10 +1056,6 @@ void VPlan::execute(VPTransformState *State) {
       // consistent placement of all induction updates.
       Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
       Inc->moveBefore(VectorLatchBB->getTerminator()->getPrevNode());
-
-      // Use the steps for the last part as backedge value for the induction.
-      if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
-        Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
       continue;
     }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e1d828f038f9a2..513e973af8fc1a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -2089,7 +2089,9 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
 };
 
 /// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
+/// producing their vector values. This won't execute any LLVM IR and will get
+/// expanded later into VPWidenIntOrFpInitialRecipe, VPWidenIntOrFpPHIRecipe and
+/// VPWidenIntOrFpBackedgeRecipe.
 class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
   PHINode *IV;
   TruncInst *Trunc;
@@ -2122,9 +2124,10 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
 
   VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
 
-  /// Generate the vectorized and scalarized versions of the phi node as
-  /// needed by their users.
-  void execute(VPTransformState &State) override;
+  void execute(VPTransformState &State) override {
+    llvm_unreachable("cannot execute this recipe, should be expanded via "
+                     "expandVPWidenIntOrFpInductionRecipe");
+  }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   /// Print the recipe.
@@ -2180,10 +2183,152 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
   }
 
   /// Returns the VPValue representing the value of this induction at
-  /// the last unrolled part, if it exists. Returns itself if unrolling did not
+  /// the last unrolled part, if it exists. Returns nullptr if unrolling did not
   /// take place.
   VPValue *getLastUnrolledPartOperand() {
-    return getNumOperands() == 5 ? getOperand(4) : this;
+    return getNumOperands() == 5 ? getOperand(4) : nullptr;
+  }
+};
+
+/// A recipe to compute the initial value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionInitialRecipe : public VPSingleDefRecipe {
+  Instruction *IV;
+  const InductionDescriptor &ID;
+
+public:
+  VPWidenIntOrFpInductionInitialRecipe(Instruction *IV, VPValue *Start,
+                                       VPValue *Step,
+                                       const InductionDescriptor &ID)
+      : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionStartSC, {Start, Step}),
+        IV(IV), ID(ID) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+  }
+
+  ~VPWidenIntOrFpInductionInitialRecipe() override = default;
+
+  VPWidenIntOrFpInductionInitialRecipe *clone() override {
+    return new VPWidenIntOrFpInductionInitialRecipe(IV, getOperand(0),
+                                                    getOperand(1), ID);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionStartSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  VPValue *getStartValue() { return getOperand(0); }
+  const VPValue *getStartValue() const { return getOperand(0); }
+
+  VPValue *getStepValue() { return getOperand(1); }
+  const VPValue *getStepValue() const { return getOperand(1); }
+
+  /// Returns the scalar type of the induction.
+  Type *getScalarType() const { return IV->getType(); }
+
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return true;
+  }
+};
+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {
+  Instruction *IV;
+
+public:
+  VPWidenIntOrFpInductionPHIRecipe(Instruction *IV, VPValue *Start)
+      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionPHISC, IV, Start),
+        IV(IV) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+  }
+
+  ~VPWidenIntOrFpInductionPHIRecipe() override = default;
+
+  VPWidenIntOrFpInductionPHIRecipe *clone() override {
+    auto *R = new VPWidenIntOrFpInductionPHIRecipe(IV, getOperand(0));
+    R->addOperand(getBackedgeValue());
+    return R;
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionPHISC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
+/// A recipe to compute the backedge value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionBackedgeRecipe : public VPSingleDefRecipe {
+  Instruction *IV;
+  const InductionDescriptor &ID;
+
+public:
+  VPWidenIntOrFpInductionBackedgeRecipe(Instruction *IV, VPValue *Step,
+                                        VPValue *VF, VPValue *Prev,
+                                        VPValue *SplatVF,
+                                        const InductionDescriptor &ID)
+      : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionSC, {Step, VF, Prev}),
+        IV(IV), ID(ID) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+    if (SplatVF)
+      addOperand(SplatVF);
+  }
+
+  ~VPWidenIntOrFpInductionBackedgeRecipe() override = default;
+
+  VPWidenIntOrFpInductionBackedgeRecipe *clone() override {
+    return new VPWidenIntOrFpInductionBackedgeRecipe(
+        IV, getOperand(0), getOperand(1), getOperand(2), getOperand(3), ID);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionIncSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  VPValue *getStepValue() { return getOperand(0); }
+  const VPValue *getStepValue() const { return getOperand(0); }
+
+  VPValue *getVFValue() { return getOperand(1); }
+  const VPValue *getVFValue() const { return getOperand(1); }
+
+  VPValue *getPrevValue() { return getOperand(2); }
+  const VPValue *getPrevValue() const { return getOperand(2); }
+
+  VPValue *getSplatVFValue() {
+    // If the recipe has been unrolled (4 operands), return the VPValue for the
+    // induction increment.
+    return getNumOperands() == 4 ? getOperand(3) : nullptr;
+  }
+
+  /// Returns the scalar type of the induction.
+  Type *getScalarType() const { return IV->getType(); }
+
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return Op == getOperand(0) || Op == getOperand(1);
   }
 };
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 969d07b229e469..8a9e64b00850e2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -214,14 +214,17 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
           .Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
                 VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
                 VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe,
-                VPScalarPHIRecipe>([this](const auto *R) {
-            // Handle header phi recipes, except VPWidenIntOrFpInduction
-            // which needs special handling due it being possibly truncated.
-            // TODO: consider inferring/caching type of siblings, e.g.,
-            // backedge value, here and in cases below.
-            return inferScalarType(R->getStartValue());
-          })
-          .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
+                VPScalarPHIRecipe, VPWidenIntOrFpInductionPHIRecipe>(
+              [this](const auto *R) {
+                // Handle header phi recipes, except VPWidenIntOrFpInduction
+                // which needs special handling due it being possibly truncated.
+                // TODO: consider inferring/caching type of siblings, e.g.,
+                // backedge value, here and in cases below.
+                return inferScalarType(R->getStartValue());
+              })
+          .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe,
+                VPWidenIntOrFpInductionInitialRecipe,
+                VPWidenIntOrFpInductionBackedgeRecipe>(
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
                 VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index ef5f6e22f82206..8ae1e7382bce38 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1631,47 +1631,73 @@ static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
                            : ConstantFP::get(Ty, C);
 }
 
-void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
+void VPWidenIntOrFpInductionInitialRecipe::execute(VPTransformState &State) {
   assert(!State.Lane && "Int or FP induction being replicated.");
 
-  Value *Start = getStartValue()->getLiveInIRValue();
-  const InductionDescriptor &ID = getInductionDescriptor();
-  TruncInst *Trunc = getTruncInst();
+  Value *Start = State.get(getStartValue(), true);
   IRBuilderBase &Builder = State.Builder;
-  assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");
   assert(State.VF.isVector() && "must have vector VF");
 
-  // The value from the original loop to which we are mapping the new induction
-  // variable.
-  Instruction *EntryVal = Trunc ? cast<Instruction>(Trunc) : IV;
-
   // Fast-math-flags propagate from the original induction instruction.
   IRBuilder<>::FastMathFlagGuard FMFG(Builder);
-  if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))
+  if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
     Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
 
   // Now do the actual transformations, and start with fetching the step value.
   Value *Step = State.get(getStepValue(), VPLane(0));
 
-  assert((isa<PHINode>(EntryVal) || isa<TruncInst>(EntryVal)) &&
-         "Expected either an induction phi-node or a truncate of it!");
-
-  // Construct the initial value of the vector IV in the vector loop preheader
-  auto CurrIP = Builder.saveIP();
-  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
-  Builder.SetInsertPoint(VectorPH->getTerminator());
-  if (isa<TruncInst>(EntryVal)) {
-    assert(Start->getType()->isIntegerTy() &&
-           "Truncation requires an integer type");
-    auto *TruncType = cast<IntegerType>(EntryVal->getType());
-    Step = Builder.CreateTrunc(Step, TruncType);
-    Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);
-  }
-
+  // Construct the initial value of the vector IV
   Value *Zero = getSignedIntOrFpConstant(Start->getType(), 0);
   Value *SplatStart = Builder.CreateVectorSplat(State.VF, Start);
   Value *SteppedStart = getStepVector(
       SplatStart, Zero, Step, ID.getInductionOpcode(), State.VF, State.Builder);
+  State.set(this, SteppedStart);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionInitialRecipe::print(
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-START ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionPHIRecipe::execute(VPTransformState &State) {
+  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+
+  Value *Start = State.get(getOperand(0));
+  PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, "vec.ind");
+  Phi->addIncoming(Start, VectorPH);
+  Phi->setDebugLoc(IV->getDebugLoc());
+  State.set(this, Phi);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionPHIRecipe::print(raw_ostream &O,
+                                             const Twine &Indent,
+                                             VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-PHI ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionBackedgeRecipe::execute(VPTransformState &State) {
+  IRBuilderBase &Builder = State.Builder;
+
+  // Fast-math-flags propagate from the original induction instruction.
+  IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+  if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
+    Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
+
+  Value *Step = State.get(getStepValue(), VPLane(0));
+
+  auto CurrIP = Builder.saveIP();
+  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+  Builder.SetInsertPoint(VectorPH->getTerminator());
 
   // We create vector phi nodes for both integer and floating-point induction
   // variables. Here, we determine the kind of arithmetic we will perform.
@@ -1706,29 +1732,27 @@ void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
   }
 
   Builder.restoreIP(CurrIP);
-
-  // We may need to add the step a number of times, depending on the unroll
-  // factor. The last of those goes into the PHI.
-  PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind");
-  VecInd->insertBefore(State.CFG.PrevBB->getFirstInsertionPt());
-  VecInd->setDebugLoc(EntryVal->getDebugLoc());
-  State.set(this, VecInd);
+  Value *PrevVal = State.get(getPrevValue());
 
   Instruction *LastInduction = cast<Instruction>(
-      Builder.CreateBinOp(AddOp, VecInd, SplatVF, "vec.ind.next"));
-  if (isa<TruncInst>(EntryVal))
-    State.addMetadata(LastInduction, EntryVal);
-  LastInduction->setDebugLoc(EntryVal->getDebugLoc());
+      Builder.CreateBinOp(AddOp, PrevVal, SplatVF, "vec.ind.next"));
+  if (isa<TruncInst>(IV))
+    State.addMetadata(LastInduction, IV);
+  LastInduction->setDebugLoc(IV->getDebugLoc());
 
-  VecInd->addIncoming(SteppedStart, VectorPH);
-  // Add induction update using an incorrect block temporarily. The phi node
-  // will be fixed after VPlan execution. Note that at this point the latch
-  // block cannot be used, as it does not exist yet.
-  // TODO: Model increment value in VPlan, by turning the recipe into a
-  // multi-def and a subclass of VPHeaderPHIRecipe.
-  VecInd->addIncoming(LastInduction, VectorPH);
+  State.set(this, LastInduction);
 }
 
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionBackedgeRecipe::print(
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-INC ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenIntOrFpInductionRecipe::print(raw_ostream &O, const Twine &Indent,
                                           VPSlotTracker &SlotTracker) const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index cee83d1015b536..9abf5f28936d53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1820,12 +1820,91 @@ void VPlanTransforms::createInterleaveGroups(
   }
 }
 
+/// Expand a VPWidenIntOrFpInduction into separate recipes for the initial
+/// value, phi and backedge value. In the following example:
+///
+///  vector.ph:
+///  Successor(s): vector loop
+///
+///  <x1> vector loop: {
+///    vector.body:
+///      WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
+///      ...
+///      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+///    No successors
+///  }
+///
+/// WIDEN-INDUCTION will get expanded to:
+///
+///  vector.ph:
+///    vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
+///  Successor(s): vector loop
+///
+///  <x1> vector loop: {
+///    vector.body:
+///      ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
+///      ...
+///      vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
+///      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+///    No successors
+///  }
+static void
+expandVPWidenIntOrFpInduction(VPWidenIntOrFpInductionRecipe *WidenIVR) {
+  VPlan *Plan = WidenIVR->getParent()->getPlan();
+  PHINode *PHI = WidenIVR->getPHINode();
+  VPValue *Start = WidenIVR->getStartValue();
+  VPValue *Step = WidenIVR->getStepValue();
+  VPValue *VF = WidenIVR->getVFValue();
+  const InductionDescriptor &ID = WidenIVR->getInductionDescriptor();
+  TruncInst *Trunc = WidenIVR->getTruncInst();
+
+  // The value from the original loop to which we are mapping the new induction
+  // variable.
+  Instruction *IV = Trunc ? cast<Instruction>(Trunc) : PHI;
+
+  // If the phi is truncated, truncate the start and step values.
+  VPBuilder Builder(Plan->getVectorPreheader());
+  if (isa<TruncInst>(IV)) {
+    assert(Start->getUnderlyingValue()->getType()->isIntegerTy() &&
+           "Truncation requires an integer type");
+    auto *TruncType = cast<IntegerType>(IV->getType());
+    Step = Builder.createScalarCast(Instruction::Trunc, Step, TruncType);
+    Start = Builder.createScalarCast(Instruction::Trunc, Start, TruncType);
+  }
+
+  // Construct the initial value of the vector IV in the vector loop preheader.
+  auto *StartR = new VPWidenIntOrFpInductionInitialRecipe(IV, Start, Step, ID);
+  Plan->getVectorPreheader()->insert(StartR, Builder.getInsertPoint());
+
+  // Create the widened phi of the vector IV.
+  auto *PhiR = new VPWidenIntOrFpInductionPHIRecipe(IV, StartR);
+  PhiR->insertBefore(WidenIVR);
+
+  // Create the backedge value for the vector IV.
+  VPValue *Prev = PhiR;
+  // If unrolled, use the last unrolled part in the increment.
+  if (auto *UnrolledPart = WidenIVR->getLastUnrolledPartOperand())
+    Prev = UnrolledPart;
+  auto *IncR = new VPWidenIntOrFpInductionBackedgeRecipe(
+      IV, Step, VF, Prev, WidenIVR->getSplatVFValue(), ID);
+  VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock();
+  ExitingBB->insert(IncR, ExitingBB->getTer...
[truncated]

lukel97 · 2024-12-04T13:51:52Z

cc @nikolaypanchenko

We can reuse VPWidenPHI in llvm#118638, but it requires us to allow it in the non-native path. We also need to propagate the DebugLoc and use a different name in the generated PHI, so this splits these parts off in case we want it. We lose some debug info in dbg-outer-loop-vect.ll, but I think this is because the underlying phi node didn't have a DebugLoc to begin with. I think the current version is just carrying over the DebugLoc from the previous state.

lukel97 · 2024-12-04T16:17:06Z

llvm/lib/Transforms/Vectorize/VPlan.h

+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {


We could reuse VPWidenPHI for this, as is done in #115274, but we need to enable it on the non-native VPlan path and pass through the DebugLoc, so I've opened up a separate PR for that in #118662.

llvm/lib/Transforms/Vectorize/VPlan.cpp

fhahn · 2024-12-05T19:58:23Z

The overall direction aligns well with the current and future VPlan roadmap. As is, though, the current version adds a bunch of complexity; I will need some time to see how/if this could be further simplified.

Mel-Chen · 2024-12-09T11:32:24Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

-  // TODO: Model increment value in VPlan, by turning the recipe into a
-  // multi-def and a subclass of VPHeaderPHIRecipe.


Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.

Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021). What are the key distinctions, advantages, or trade-offs between these two approaches?

Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.

The VPHeaderPHIRecipe part of this todo is already done, but I'm not sure if the part about turning this into a multi-def will fix the issue.

If I'm understanding this correctly, if we had a multi-def recipe we would still be executing the increment/backedge value in the header before the rest of the loop body, and so we wouldn't be able to use the EVL in it.

Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021).

I actually initially tried to avoid the new recipes and just use VPInstructions, but I struggled with trying to splat two scalar VPInstructions into a vector operand. Now that I realise #82021 does it, I'll give it another try!

I would prefer to avoid adding extra recipes if possible, I don't see any particular advantage to having explicit recipes.

Yes, in general at this late stage there's less advantage from specialized recipes; better to use simpler recipes.

arcbbb · 2024-12-10T09:54:40Z

Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes?

lukel97 · 2024-12-10T10:57:03Z

Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes?

I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion?

Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

As a side note on RISC-V, I wonder if RISCVVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well.

Mel-Chen · 2024-12-10T11:10:02Z

I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion?

Yes, VPInstructions should be good enough for correctness. Transforming mul/add to vp.mul/vp.add is for performance.

As a side note on RISC-V, I wonder if RISCVVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well.

I have the same thought, but it doesn't seem to be accepted.
#109614 (comment)

npanchen · 2024-12-10T14:54:13Z

Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2: if the increment is VLMAX, the last two iterations may over-increment.
FYI: QEMU has a special rvv_vl_half_avl option to properly test 6.3.2.

lukel97 · 2024-12-10T15:30:36Z

> Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

> I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2: if the increment is VLMAX, the last two iterations may over-increment. FYI: QEMU has a special rvv_vl_half_avl option to properly test 6.3.2.

This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like

  %phi.next = add <vscale x 2 x i32> %phi.cur, %evl  ; %evl comes from @llvm.experimental.get.vector.length

My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration?
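To make the over-increment concern concrete, here is a minimal standalone sketch, not LLVM code. It assumes "6.3.2" refers to the RVV rule that lets vl be anywhere between ceil(AVL/2) and VLMAX when AVL < 2*VLMAX, and it models that with a hypothetical `pickVL` helper. `runLoop` records the IV value each active lane sees when the widened IV is stepped by VLMAX versus by the runtime EVL.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the vector length chosen for a given AVL.
// With HalfAVL set and VLMAX < AVL < 2*VLMAX it returns ceil(AVL/2),
// which is the behaviour the rvv_vl_half_avl QEMU option is meant to exercise.
uint64_t pickVL(uint64_t AVL, uint64_t VLMAX, bool HalfAVL) {
  if (AVL <= VLMAX)
    return AVL;
  if (HalfAVL && AVL < 2 * VLMAX)
    return (AVL + 1) / 2;
  return VLMAX;
}

// Runs a tail-folded loop and returns the IV value seen by every active lane,
// in execution order. Lane L of the widened IV holds Lane0 + L.
std::vector<uint64_t> runLoop(uint64_t TripCount, uint64_t VLMAX, bool HalfAVL,
                              bool StepByEVL) {
  std::vector<uint64_t> Seen;
  uint64_t Lane0 = 0;
  for (uint64_t Done = 0; Done < TripCount;) {
    uint64_t EVL = pickVL(TripCount - Done, VLMAX, HalfAVL);
    for (uint64_t L = 0; L < EVL; ++L)
      Seen.push_back(Lane0 + L);
    // The backedge value: step by EVL, or by the fixed VF (VLMAX here).
    Lane0 += StepByEVL ? EVL : VLMAX;
    Done += EVL;
  }
  return Seen;
}

// True if the lanes observed exactly 0, 1, 2, ... in order.
bool isIota(const std::vector<uint64_t> &V) {
  for (size_t I = 0; I < V.size(); ++I)
    if (V[I] != I)
      return false;
  return true;
}
```

With a trip count of 10 and VLMAX of 4, the half-AVL policy yields vector lengths 4, 3, 3. Stepping by VLMAX makes the last iteration start at 8 instead of 7, while stepping by EVL keeps the lanes in sequence. Whether the add itself is a vp.add or a plain widened add does not affect this model, since only the step amount changes.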

npanchen · 2024-12-10T16:38:16Z

> This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like
>
> My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration?

I see, I did misunderstand you then. Yes, then it should be correct to use a regular add there. However, it does make more sense to stick with the same approach: either always emit vp-intrinsics out of the vectorizer, or have a minimal subset plus a postprocessing pass after the vectorizer. The current vision is to do the former.
By the way, unary and binary EVL recipes are already supported (#93854), so it should be fine to use that, right?

lukel97 · 2024-12-10T16:41:37Z

> However, it does make more sense to stick with the same approach: either always emit vp-intrinsics out of the vectorizer, or have a minimal subset plus a postprocessing pass after the vectorizer. The current vision is to do the former. By the way, unary and binary EVL recipes are already supported (#93854), so it should be fine to use that, right?

Agreed, that seems like a sensible way forwards. And just to echo what @Mel-Chen and @arcbbb are saying, we can do the EVL-recipe conversion as a separate incremental follow up to #115274 (which hopefully in turn can become a follow up to this PR)

lukel97 · 2024-12-12T17:19:30Z

I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion.

I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction.
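As a rough illustration of why splats and step vectors are the building blocks here, the following standalone sketch (plain C++, not VPlan code; all names are hypothetical) computes the values a widened integer induction takes: the initial value is `splat(start) + stepvector * step`, and the loop-invariant increment is `splat(step * VF)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;

// Broadcast a scalar to all VF lanes (the "splat" building block).
Vec splat(int64_t S, size_t VF) { return Vec(VF, S); }

// Lane indices <0, 1, ..., VF-1> (the "step vector" building block).
Vec stepVector(size_t VF) {
  Vec V(VF);
  for (size_t I = 0; I < VF; ++I)
    V[I] = static_cast<int64_t>(I);
  return V;
}

// Returns the value of the widened phi in each of the first NumIters
// vector iterations.
std::vector<Vec> widenedIV(int64_t Start, int64_t Step, size_t VF,
                           size_t NumIters) {
  // "Preheader": build the initial value and the loop-invariant increment.
  Vec Phi = splat(Start, VF);
  Vec Lanes = stepVector(VF);
  for (size_t I = 0; I < VF; ++I)
    Phi[I] += Lanes[I] * Step;
  Vec Inc = splat(Step * static_cast<int64_t>(VF), VF);

  std::vector<Vec> Out;
  for (size_t It = 0; It < NumIters; ++It) {
    Out.push_back(Phi);
    // "Latch": the backedge value, phi + increment.
    for (size_t I = 0; I < VF; ++I)
      Phi[I] += Inc[I];
  }
  return Out;
}
```

For a start of 10, a step of 3 and VF = 4, the phi holds <10, 13, 16, 19> in the first iteration and <22, 25, 28, 31> in the second.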

We could also get rid of the PHI recipe if we relax the native-path restriction on VPWidenPHIRecipe, see #118662

There's some extra diff now because this doesn't emit an identity add when building the initial value. I've also opened a separate PR to split this off: #119668

fhahn · 2024-12-12T17:23:43Z

> I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion.
>
> I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction.

For the induction step, there's already code that materializes it (for unrolling); #119284 generalizes it to a VPInstruction opcode. Would that help?

lukel97 · 2024-12-12T17:36:07Z

> For the induction step, there's already code that materializes it (for unrolling); #119284 generalizes it to a VPInstruction opcode. Would that help?

I think I can reuse that here, that would take care of the casting part. And I guess we would get the benefit of the constant step optimization too. I'll rebase this PR on top of it

I think we would need to expand the VPWidenIntOrFpInductionRecipes first in convertToConcreteRecipes so that the WideIVSteps get a chance to be expanded afterwards?

After updating llvm#118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into:

  VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock();
  Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator());
  auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags,
                                    WidenIVR->getDebugLoc(), "vec.ind.next");

However, after llvm#117506 the region is dissolved, so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes.

I noticed this after updating llvm#118638 on top of llvm#117506, and seeing that some broadcasts were no longer being hoisted into the loop preheader. It was calling VPlan::getVectorPreheader(), which at this point is dissolved and returns null. This fixes it by getting the header from vputils::getFirstLoopHeader and getting the preheader from its first successor. I've also added an assertion in getVectorPreheader() to make sure it's not called after the regions are dissolved.

lukel97 · 2025-06-11T23:24:40Z

llvm/test/Transforms/LoopVectorize/scalable-inductions.ll

+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP8]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer


Making a note here that if we're removing the hoisting logic in #142594 anyway then these diffs where the broadcast is sunk shouldn't be important. LICM will hoist them out afterwards anyway

lukel97 · 2025-06-13T11:18:29Z

Ping, all dependent patches should be landed now

fhahn · 2025-06-17T08:51:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  // The value from the original loop to which we are mapping the new induction
+  // variable.
+  Instruction *IV = Trunc ? cast<Instruction>(Trunc) : WidenIVR->getPHINode();
+  Type *Ty = IV->getType();


Suggested change

-  Type *Ty = IV->getType();
+  Type *Ty = TypeInfo.inferScalarType(WidenIVR);

Done in 7cf85b9

fhahn · 2025-06-17T08:52:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  // If the phi is truncated, truncate the start and step values.
+  VPBuilder Builder(Plan->getVectorPreheader());
+  if (isa<TruncInst>(IV)) {


Can we check the types here instead to see if casts are needed?

Good idea, done in 7cf85b9

fhahn · 2025-06-17T12:00:32Z

...orms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll

 ; VF4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; VF4-NEXT:    [[VEC_IND_NEXT]] = add <4 x i8> [[VEC_IND]], splat (i8 4)


Ah OK, thanks for checking!

fhahn

LGTM with a few more small comments inline, thanks!

fhahn · 2025-06-17T14:40:39Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  WidePHI->addOperand(Next);
+
+  WidenIVR->replaceAllUsesWith(WidePHI);


I think this should be added to ToRemove instead of being removed directly.

Thanks, done in e8d64a4
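For readers unfamiliar with the ToRemove pattern, here is a generic sketch of deferred removal, using a `std::list` of strings as a hypothetical stand-in for a block of recipes. It is not the code from the patch; it only shows the idea of queuing replaced entries during the walk and erasing them once the walk is finished.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block holding recipes.
using Block = std::list<std::string>;

// Replaces every "WIDEN-INDUCTION" entry with a "WIDEN-PHI" entry.
// Old entries are only queued while iterating and are erased afterwards,
// so the loop never erases the element it is currently standing on.
// Returns the number of entries that were replaced.
size_t expandAll(Block &B) {
  std::vector<Block::iterator> ToRemove;
  for (auto It = B.begin(); It != B.end(); ++It) {
    if (*It != "WIDEN-INDUCTION")
      continue;
    // Insert the replacement before the old entry; std::list insertion
    // does not invalidate existing iterators.
    B.insert(It, "WIDEN-PHI");
    ToRemove.push_back(It);
  }
  for (Block::iterator It : ToRemove)
    B.erase(It);
  return ToRemove.size();
}
```

Keeping the erasure in one place after the walk also makes it easier to reason about which entries are still live while the transformation runs.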

fhahn · 2025-06-17T14:41:28Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  // FIXME: The newly created binary instructions should contain nsw/nuw
+  // flags, which can be found from the original scalar operations.


Would the FIXME be better moved to where Flags is defined?

fhahn · 2025-06-17T14:43:11Z

llvm/test/Transforms/LoopVectorize/pointer-induction.ll

+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %iv.ptr = phi ptr [ null, %entry ], [ %iv.ptr.next, %for.body ]


Probably best to start at a concrete pointer, to avoid poison.

fhahn · 2025-06-17T14:44:04Z

llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll

+; CHECK-NEXT: Successor(s): middle.block, vector.body
+entry:
+  br label %loop
+loop:


nit:

Suggested change

loop:

loop:

fhahn · 2025-06-17T14:44:27Z

llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll

+entry:
+  br label %loop
+loop:
+  %i = phi i64 [0, %entry], [%i.next, %loop]


nit for consistency

Suggested change

-  %i = phi i64 [0, %entry], [%i.next, %loop]
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]

fhahn · 2025-06-17T14:44:42Z

llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll

+  %i.next = add i64 %i, 1
+  %done = icmp eq i64 %i.next, %n
+  br i1 %done, label %exit, label %loop
+exit:


nit:

Suggested change

exit:

exit:

llvm-ci · 2025-06-17T17:53:20Z

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/20325

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=riscv64 -mattr='+v' -O0 -global-isel -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll -o /b/ml-opt-rel-x86-64-b1/build/test/CodeGen/RISCV/GlobalISel/irtranslator/Output/fallback.ll.tmp.out 2> /b/ml-opt-rel-x86-64-b1/build/test/CodeGen/RISCV/GlobalISel/irtranslator/Output/fallback.ll.tmp.err # RUN: at line 1
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=riscv64 -mattr=+v -O0 -global-isel -global-isel-abort=2 '-pass-remarks-missed=gisel*' -verify-machineinstrs /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll -o /b/ml-opt-rel-x86-64-b1/build/test/CodeGen/RISCV/GlobalISel/irtranslator/Output/fallback.ll.tmp.out
/b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll --check-prefix=FALLBACK-WITH-REPORT-OUT < /b/ml-opt-rel-x86-64-b1/build/test/CodeGen/RISCV/GlobalISel/irtranslator/Output/fallback.ll.tmp.out # RUN: at line 2
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll --check-prefix=FALLBACK-WITH-REPORT-OUT
/b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll --check-prefix=FALLBACK-WITH-REPORT-ERR < /b/ml-opt-rel-x86-64-b1/build/test/CodeGen/RISCV/GlobalISel/irtranslator/Output/fallback.ll.tmp.err # RUN: at line 3
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll --check-prefix=FALLBACK-WITH-REPORT-ERR
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll:25:29: error: FALLBACK-WITH-REPORT-ERR: expected string not found in input
; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to translate instruction: call:
                            ^
<stdin>:1:1: note: scanning from here
remark: <unknown>:0:0: unable to translate instruction: call llvm.riscv.vadd.nxv1i8.nxv1i8.i64: ' %a = call <vscale x 1 x i8> @llvm.riscv.vadd.nxv1i8.nxv1i8.i64(<vscale x 1 x i8> undef, <vscale x 1 x i8> %0, <vscale x 1 x i8> %1, i64 %2)' (in function: scalable_arg)
^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: remark: <unknown>:0:0: unable to translate instruction: call llvm.riscv.vadd.nxv1i8.nxv1i8.i64: ' %a = call <vscale x 1 x i8> @llvm.riscv.vadd.nxv1i8.nxv1i8.i64(<vscale x 1 x i8> undef, <vscale x 1 x i8> %0, <vscale x 1 x i8> %1, i64 %2)' (in function: scalable_arg) 
check:25     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
          2: warning: Instruction selection used fallback path for scalable_arg 
check:25     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          3: remark: <unknown>:0:0: unable to translate instruction: call llvm.riscv.vadd.nxv1i8.nxv1i8.i64: ' %a = call <vscale x 1 x i8> @llvm.riscv.vadd.nxv1i8.nxv1i8.i64(<vscale x 1 x i8> undef, <vscale x 1 x i8> undef, <vscale x 1 x i8> undef, i64 %0)' (in function: scalable_inst) 
check:25     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          4: warning: Instruction selection used fallback path for scalable_inst 
check:25     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          5: remark: <unknown>:0:0: unable to translate instruction: alloca: ' %local0 = alloca <vscale x 16 x i8>, align 16' (in function: scalable_alloca) 
check:25     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          6: warning: Instruction selection used fallback path for scalable_alloca 
check:25     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************

Following on from llvm#118638, this handles widened induction variables with EVL tail folding by setting the VF operand to be EVL, calculated in the vector body. We need to do this for correctness since with EVL tail folding the number of elements processed in the penultimate iteration may not be VF, but the runtime EVL, and we need to increment induction variables as such.

- Because the VF may now not be a live-in, we need to move the builder to just after its definition.
- We also need to avoid truncating it when it's the same size as the step type; previously this wasn't a problem for live-ins.
- Also, because the VF may be smaller than the IV type, since the EVL is always i32, we may need to zext it.

On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and 42.8% more loops vectorized on SPEC CPU 2017.
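The type handling described above can be sketched in isolation. This is a simplified model in plain C++ rather than the actual transform: `castNeeded` and `convert` are hypothetical names, and bit widths stand in for LLVM integer types. The VF operand (an i32 EVL, for instance) is zero-extended when the step type is wider, truncated when it is narrower, and left alone when the widths match.

```cpp
#include <cstdint>

enum class Cast { None, ZExt, Trunc };

// Decide which cast, if any, brings a FromBits-wide VF/EVL value to the
// ToBits-wide step type. Equal widths need no cast at all.
Cast castNeeded(unsigned FromBits, unsigned ToBits) {
  if (FromBits < ToBits)
    return Cast::ZExt;
  if (FromBits > ToBits)
    return Cast::Trunc;
  return Cast::None;
}

// Apply the cast to an unsigned value held in a uint64_t. Zero extension
// leaves the value unchanged; truncation keeps only the low ToBits bits.
// Assumes both widths are between 1 and 64.
uint64_t convert(uint64_t V, unsigned FromBits, unsigned ToBits) {
  if (castNeeded(FromBits, ToBits) != Cast::Trunc)
    return V;
  return V & ((uint64_t{1} << ToBits) - 1);
}
```

For example, an i32 EVL feeding an i64 induction needs a zext, while the same EVL feeding an i8 induction would be truncated, so 300 becomes 44.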

lukel97 requested review from fhahn, arcbbb, npanchen, alexey-bataev, Mel-Chen and ayalz December 4, 2024 13:51

llvmbot added vectorizers llvm:transforms labels Dec 4, 2024

lukel97 requested a review from preames December 4, 2024 13:55

lukel97 mentioned this pull request Dec 4, 2024

[VPlan] Allow VPWidenPHI in non-native path. NFC #118662

Closed

lukel97 commented Dec 4, 2024

lukel97 changed the title ~~[LV] Expand VPWidenIntOrFpInductionRecipe into separate recipes~~ [VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes Dec 4, 2024

alexey-bataev reviewed Dec 5, 2024

llvm/lib/Transforms/Vectorize/VPlan.cpp (outdated, resolved)

Mel-Chen reviewed Dec 9, 2024

lukel97 force-pushed the loop-vectorize/split-VPWidenIntOrFpInductionRecipe branch from cd020fd to 4247eaa Compare December 12, 2024 17:07

lukel97 mentioned this pull request Dec 13, 2024

[VPlan] Add opcode to create step for wide inductions. #119284

Merged

lukel97 added 6 commits May 14, 2025 15:22

Fix typo, use TypeInfo in assert

61bd641

Don't move VPWidenPointerInductionRecipe, instead fixup location of p…

ec5fe59

…hi in VPWidenPHI::execute

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

466ed14

…ze/split-VPWidenIntOrFpInductionRecipe

Fix comments

b315afb

Remove VPWidenPHIRecipe change

993ba23

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

894bde0

…ze/split-VPWidenIntOrFpInductionRecipe

lukel97 mentioned this pull request May 29, 2025

[VPlan] Convert to concrete recipes before dissolving loop regions. NFCI #141999

Merged

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

8037a19

…ze/split-VPWidenIntOrFpInductionRecipe

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

ddca9b9

…ze/split-VPWidenIntOrFpInductionRecipe

lukel97 mentioned this pull request Jun 3, 2025

[VPlan] Remove hoisting to preheader in VPTransformState::get #142594

Open

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

27c724e

…ze/split-VPWidenIntOrFpInductionRecipe

lukel97 commented Jun 11, 2025

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

b83af78

…ze/split-VPWidenIntOrFpInductionRecipe

fhahn reviewed Jun 17, 2025

lukel97 added 2 commits June 17, 2025 13:29

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

55deb1c

…ze/split-VPWidenIntOrFpInductionRecipe

Infer and check types instead of peeking through underlying value for…

7cf85b9

… trunc

fhahn approved these changes Jun 17, 2025

lukel97 added 3 commits June 17, 2025 17:42

Move comment to flags

4f3acf0

Use ToRemove

e8d64a4

Remove null ptr in test, update variable names

f34b569

lukel97 merged commit 9dd1c66 into llvm:main Jun 17, 2025
6 of 7 checks passed

lukel97 mentioned this pull request Jun 18, 2025

[VPlan] Support VPWidenIntOrFpInductionRecipes with EVL tail folding #144666

Open

