-
Notifications
You must be signed in to change notification settings - Fork 14k
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesThe motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:
gets expanded to:
This allows us to a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in I also tried to avoid the start and increment recipes by expanding them directly into VPInstructions/VPWidenIntrinsicRecipe/VPScalarCastRecipes, but I ran into some difficulties when trying to broadcast a scalar type to use in a widened binary op. (I'm happy to give this another try though!) I hoped to make this an NFC, but unfortunately some of the splatted values get shuffled about in the preheaders. I may be duplicating other work here, and this was just the solution I stumbled upon. Open to any other ideas or approaches! Patch is 84.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118638.diff 19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index b801d1863e252c..4e4c4dfd461824 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1043,18 +1043,12 @@ void VPlan::execute(VPTransformState *State) {
if (isa<VPWidenPHIRecipe>(&R))
continue;
- if (isa<VPWidenPointerInductionRecipe>(&R) ||
- isa<VPWidenIntOrFpInductionRecipe>(&R)) {
- PHINode *Phi = nullptr;
- if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
- Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
- } else {
- auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
- assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
- "recipe generating only scalars should have been replaced");
- auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
- Phi = cast<PHINode>(GEP->getPointerOperand());
- }
+ if (isa<VPWidenPointerInductionRecipe>(&R)) {
+ auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+ assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
+ "recipe generating only scalars should have been replaced");
+ auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
+ PHINode *Phi = cast<PHINode>(GEP->getPointerOperand());
Phi->setIncomingBlock(1, VectorLatchBB);
@@ -1062,10 +1056,6 @@ void VPlan::execute(VPTransformState *State) {
// consistent placement of all induction updates.
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
Inc->moveBefore(VectorLatchBB->getTerminator()->getPrevNode());
-
- // Use the steps for the last part as backedge value for the induction.
- if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
- Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
continue;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e1d828f038f9a2..513e973af8fc1a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -2089,7 +2089,9 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
};
/// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
+/// producing their vector values. This won't execute any LLVM IR and will get
+/// expanded later into VPWidenIntOrFpInitialRecipe, VPWidenIntOrFpPHIRecipe and
+/// VPWidenIntOrFpBackedgeRecipe.
class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
PHINode *IV;
TruncInst *Trunc;
@@ -2122,9 +2124,10 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
- /// Generate the vectorized and scalarized versions of the phi node as
- /// needed by their users.
- void execute(VPTransformState &State) override;
+ void execute(VPTransformState &State) override {
+ llvm_unreachable("cannot execute this recipe, should be expanded via "
+ "expandVPWidenIntOrFpInductionRecipe");
+ }
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
@@ -2180,10 +2183,152 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
}
/// Returns the VPValue representing the value of this induction at
- /// the last unrolled part, if it exists. Returns itself if unrolling did not
+ /// the last unrolled part, if it exists. Returns nullptr if unrolling did not
/// take place.
VPValue *getLastUnrolledPartOperand() {
- return getNumOperands() == 5 ? getOperand(4) : this;
+ return getNumOperands() == 5 ? getOperand(4) : nullptr;
+ }
+};
+
+/// A recipe to compute the initial value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionInitialRecipe : public VPSingleDefRecipe {
+ Instruction *IV;
+ const InductionDescriptor &ID;
+
+public:
+ VPWidenIntOrFpInductionInitialRecipe(Instruction *IV, VPValue *Start,
+ VPValue *Step,
+ const InductionDescriptor &ID)
+ : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionStartSC, {Start, Step}),
+ IV(IV), ID(ID) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ }
+
+ ~VPWidenIntOrFpInductionInitialRecipe() override = default;
+
+ VPWidenIntOrFpInductionInitialRecipe *clone() override {
+ return new VPWidenIntOrFpInductionInitialRecipe(IV, getOperand(0),
+ getOperand(1), ID);
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionStartSC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ VPValue *getStartValue() { return getOperand(0); }
+ const VPValue *getStartValue() const { return getOperand(0); }
+
+ VPValue *getStepValue() { return getOperand(1); }
+ const VPValue *getStepValue() const { return getOperand(1); }
+
+ /// Returns the scalar type of the induction.
+ Type *getScalarType() const { return IV->getType(); }
+
+ bool onlyFirstLaneUsed(const VPValue *Op) const override {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+ return true;
+ }
+};
+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {
+ Instruction *IV;
+
+public:
+ VPWidenIntOrFpInductionPHIRecipe(Instruction *IV, VPValue *Start)
+ : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionPHISC, IV, Start),
+ IV(IV) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ }
+
+ ~VPWidenIntOrFpInductionPHIRecipe() override = default;
+
+ VPWidenIntOrFpInductionPHIRecipe *clone() override {
+ auto *R = new VPWidenIntOrFpInductionPHIRecipe(IV, getOperand(0));
+ R->addOperand(getBackedgeValue());
+ return R;
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionPHISC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
+/// A recipe to compute the backedge value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionBackedgeRecipe : public VPSingleDefRecipe {
+ Instruction *IV;
+ const InductionDescriptor &ID;
+
+public:
+ VPWidenIntOrFpInductionBackedgeRecipe(Instruction *IV, VPValue *Step,
+ VPValue *VF, VPValue *Prev,
+ VPValue *SplatVF,
+ const InductionDescriptor &ID)
+ : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionSC, {Step, VF, Prev}),
+ IV(IV), ID(ID) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ if (SplatVF)
+ addOperand(SplatVF);
+ }
+
+ ~VPWidenIntOrFpInductionBackedgeRecipe() override = default;
+
+ VPWidenIntOrFpInductionBackedgeRecipe *clone() override {
+ return new VPWidenIntOrFpInductionBackedgeRecipe(
+ IV, getOperand(0), getOperand(1), getOperand(2), getOperand(3), ID);
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionIncSC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ VPValue *getStepValue() { return getOperand(0); }
+ const VPValue *getStepValue() const { return getOperand(0); }
+
+ VPValue *getVFValue() { return getOperand(1); }
+ const VPValue *getVFValue() const { return getOperand(1); }
+
+ VPValue *getPrevValue() { return getOperand(2); }
+ const VPValue *getPrevValue() const { return getOperand(2); }
+
+ VPValue *getSplatVFValue() {
+ // If the recipe has been unrolled (4 operands), return the VPValue for the
+ // induction increment.
+ return getNumOperands() == 4 ? getOperand(3) : nullptr;
+ }
+
+ /// Returns the scalar type of the induction.
+ Type *getScalarType() const { return IV->getType(); }
+
+ bool onlyFirstLaneUsed(const VPValue *Op) const override {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+ return Op == getOperand(0) || Op == getOperand(1);
}
};
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 969d07b229e469..8a9e64b00850e2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -214,14 +214,17 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
.Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe,
- VPScalarPHIRecipe>([this](const auto *R) {
- // Handle header phi recipes, except VPWidenIntOrFpInduction
- // which needs special handling due it being possibly truncated.
- // TODO: consider inferring/caching type of siblings, e.g.,
- // backedge value, here and in cases below.
- return inferScalarType(R->getStartValue());
- })
- .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
+ VPScalarPHIRecipe, VPWidenIntOrFpInductionPHIRecipe>(
+ [this](const auto *R) {
+ // Handle header phi recipes, except VPWidenIntOrFpInduction
+ // which needs special handling due it being possibly truncated.
+ // TODO: consider inferring/caching type of siblings, e.g.,
+ // backedge value, here and in cases below.
+ return inferScalarType(R->getStartValue());
+ })
+ .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe,
+ VPWidenIntOrFpInductionInitialRecipe,
+ VPWidenIntOrFpInductionBackedgeRecipe>(
[](const auto *R) { return R->getScalarType(); })
.Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index ef5f6e22f82206..8ae1e7382bce38 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1631,47 +1631,73 @@ static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
: ConstantFP::get(Ty, C);
}
-void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
+void VPWidenIntOrFpInductionInitialRecipe::execute(VPTransformState &State) {
assert(!State.Lane && "Int or FP induction being replicated.");
- Value *Start = getStartValue()->getLiveInIRValue();
- const InductionDescriptor &ID = getInductionDescriptor();
- TruncInst *Trunc = getTruncInst();
+ Value *Start = State.get(getStartValue(), true);
IRBuilderBase &Builder = State.Builder;
- assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");
assert(State.VF.isVector() && "must have vector VF");
- // The value from the original loop to which we are mapping the new induction
- // variable.
- Instruction *EntryVal = Trunc ? cast<Instruction>(Trunc) : IV;
-
// Fast-math-flags propagate from the original induction instruction.
IRBuilder<>::FastMathFlagGuard FMFG(Builder);
- if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))
+ if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
// Now do the actual transformations, and start with fetching the step value.
Value *Step = State.get(getStepValue(), VPLane(0));
- assert((isa<PHINode>(EntryVal) || isa<TruncInst>(EntryVal)) &&
- "Expected either an induction phi-node or a truncate of it!");
-
- // Construct the initial value of the vector IV in the vector loop preheader
- auto CurrIP = Builder.saveIP();
- BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
- Builder.SetInsertPoint(VectorPH->getTerminator());
- if (isa<TruncInst>(EntryVal)) {
- assert(Start->getType()->isIntegerTy() &&
- "Truncation requires an integer type");
- auto *TruncType = cast<IntegerType>(EntryVal->getType());
- Step = Builder.CreateTrunc(Step, TruncType);
- Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);
- }
-
+ // Construct the initial value of the vector IV
Value *Zero = getSignedIntOrFpConstant(Start->getType(), 0);
Value *SplatStart = Builder.CreateVectorSplat(State.VF, Start);
Value *SteppedStart = getStepVector(
SplatStart, Zero, Step, ID.getInductionOpcode(), State.VF, State.Builder);
+ State.set(this, SteppedStart);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionInitialRecipe::print(
+ raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-START ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionPHIRecipe::execute(VPTransformState &State) {
+ BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+
+ Value *Start = State.get(getOperand(0));
+ PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, "vec.ind");
+ Phi->addIncoming(Start, VectorPH);
+ Phi->setDebugLoc(IV->getDebugLoc());
+ State.set(this, Phi);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionPHIRecipe::print(raw_ostream &O,
+ const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-PHI ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionBackedgeRecipe::execute(VPTransformState &State) {
+ IRBuilderBase &Builder = State.Builder;
+
+ // Fast-math-flags propagate from the original induction instruction.
+ IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+ if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
+ Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
+
+ Value *Step = State.get(getStepValue(), VPLane(0));
+
+ auto CurrIP = Builder.saveIP();
+ BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+ Builder.SetInsertPoint(VectorPH->getTerminator());
// We create vector phi nodes for both integer and floating-point induction
// variables. Here, we determine the kind of arithmetic we will perform.
@@ -1706,29 +1732,27 @@ void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
}
Builder.restoreIP(CurrIP);
-
- // We may need to add the step a number of times, depending on the unroll
- // factor. The last of those goes into the PHI.
- PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind");
- VecInd->insertBefore(State.CFG.PrevBB->getFirstInsertionPt());
- VecInd->setDebugLoc(EntryVal->getDebugLoc());
- State.set(this, VecInd);
+ Value *PrevVal = State.get(getPrevValue());
Instruction *LastInduction = cast<Instruction>(
- Builder.CreateBinOp(AddOp, VecInd, SplatVF, "vec.ind.next"));
- if (isa<TruncInst>(EntryVal))
- State.addMetadata(LastInduction, EntryVal);
- LastInduction->setDebugLoc(EntryVal->getDebugLoc());
+ Builder.CreateBinOp(AddOp, PrevVal, SplatVF, "vec.ind.next"));
+ if (isa<TruncInst>(IV))
+ State.addMetadata(LastInduction, IV);
+ LastInduction->setDebugLoc(IV->getDebugLoc());
- VecInd->addIncoming(SteppedStart, VectorPH);
- // Add induction update using an incorrect block temporarily. The phi node
- // will be fixed after VPlan execution. Note that at this point the latch
- // block cannot be used, as it does not exist yet.
- // TODO: Model increment value in VPlan, by turning the recipe into a
- // multi-def and a subclass of VPHeaderPHIRecipe.
- VecInd->addIncoming(LastInduction, VectorPH);
+ State.set(this, LastInduction);
}
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionBackedgeRecipe::print(
+ raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-INC ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPWidenIntOrFpInductionRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index cee83d1015b536..9abf5f28936d53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1820,12 +1820,91 @@ void VPlanTransforms::createInterleaveGroups(
}
}
+/// Expand a VPWidenIntOrFpInduction into separate recipes for the initial
+/// value, phi and backedge value. In the followng example:
+///
+/// vector.ph:
+/// Successor(s): vector loop
+///
+/// <x1> vector loop: {
+/// vector.body:
+/// WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
+/// ...
+/// EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+/// No successors
+/// }
+///
+/// WIDEN-INDUCTION will get expanded to:
+///
+/// vector.ph:
+/// vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
+/// Successor(s): vector loop
+///
+/// <x1> vector loop: {
+/// vector.body:
+/// ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
+/// ...
+/// vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
+/// EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+/// No successors
+/// }
+static void
+expandVPWidenIntOrFpInduction(VPWidenIntOrFpInductionRecipe *WidenIVR) {
+ VPlan *Plan = WidenIVR->getParent()->getPlan();
+ PHINode *PHI = WidenIVR->getPHINode();
+ VPValue *Start = WidenIVR->getStartValue();
+ VPValue *Step = WidenIVR->getStepValue();
+ VPValue *VF = WidenIVR->getVFValue();
+ const InductionDescriptor &ID = WidenIVR->getInductionDescriptor();
+ TruncInst *Trunc = WidenIVR->getTruncInst();
+
+ // The value from the original loop to which we are mapping the new induction
+ // variable.
+ Instruction *IV = Trunc ? cast<Instruction>(Trunc) : PHI;
+
+ // If the phi is truncated, truncate the start and step values.
+ VPBuilder Builder(Plan->getVectorPreheader());
+ if (isa<TruncInst>(IV)) {
+ assert(Start->getUnderlyingValue()->getType()->isIntegerTy() &&
+ "Truncation requires an integer type");
+ auto *TruncType = cast<IntegerType>(IV->getType());
+ Step = Builder.createScalarCast(Instruction::Trunc, Step, TruncType);
+ Start = Builder.createScalarCast(Instruction::Trunc, Start, TruncType);
+ }
+
+ // Construct the initial value of the vector IV in the vector loop preheader.
+ auto *StartR = new VPWidenIntOrFpInductionInitialRecipe(IV, Start, Step, ID);
+ Plan->getVectorPreheader()->insert(StartR, Builder.getInsertPoint());
+
+ // Create the widened phi of the vector IV.
+ auto *PhiR = new VPWidenIntOrFpInductionPHIRecipe(IV, StartR);
+ PhiR->insertBefore(WidenIVR);
+
+ // Create the backedge value for the vector IV.
+ VPValue *Prev = PhiR;
+ // If unrolled, use the last unrolled part in the increment.
+ if (auto *UnrolledPart = WidenIVR->getLastUnrolledPartOperand())
+ Prev = UnrolledPart;
+ auto *IncR = new VPWidenIntOrFpInductionBackedgeRecipe(
+ IV, Step, VF, Prev, WidenIVR->getSplatVFValue(), ID);
+ VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock();
+ ExitingBB->insert(IncR, ExitingBB->getTer...
[truncated]
|
We can reuse VPWidenPHI in llvm#118638, but it requires us to allow it in the non-native path. We also need to propagate the DebugLoc and use a different name in the generated PHI, so this splits these parts off in case we want it. We lose some debug info in dbg-outer-loop-vect.ll, but I think this is because the underlying phi node didn't have a DebugLoc to begin with. I think the current version is just carrying over the DebugLoc from the previous state.
|
||
/// A recipe to generate the PHI of a widened IV, expanded from | ||
/// VPWidenIntOrFpInductionRecipe. | ||
class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall direction aligns well with current and future VPlan roadmap. As is the current version adds a bunch of complexity though, will need some time to see how/if this could be further simplified |
// TODO: Model increment value in VPlan, by turning the recipe into a | ||
// multi-def and a subclass of VPHeaderPHIRecipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.
Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe
, and combining VPInstruction::Mul
and VPInstruction::Add
(#82021). What are the key distinctions, advantages, or trade-offs between these two approaches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.
The VPHeaderPHIRecipe part of this todo is already done, but I'm not sure if the part about turning this into a multi-def will fix the issue.
If I'm understanding this correctly, if we had a multi-def recipe we would still be executing the increment/backedge value in the header before the rest of the loop body, and so we wouldn't be able to use the EVL in it.
Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021).
I actually initially tried to avoid the new recipes and just use VPInstructions but I struggled with trying to splat two scalar VPInstructions into a vector operand. But actually now that I realise that #82021 does it I'll give it another try!
I would prefer to avoid adding extra recipes if possible, I don't see any particular advantage to having explicit recipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in general at this late stage there's less advantage from specialized recipes, better to use simpler recipes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in general at this late stage there's less advantage from specialized recipes, better to use simpler recipes
Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes? |
I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion? Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness? As a side note on RISC-V, I wonder if RISCVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well |
Yes, VPInstructions should be good enough for correctness. Transform mul/ add to vp.mul/ vp.add is for performance.
I have the same thought, but it doesn't seem to be accepted. |
I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2 so that if increment is VLMAX, last two iteration may result to overincrement. |
This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like %phi.next = add <vscale x 2 x i32> %phi.cur, %evl <- from @llvm.experimental.get.vector.length My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration? |
I see, I did misunderstand you then. Yes, than it should be correct to use regular add there. However, it does make more sense to stick with same approach: either emit vp-intrinsics always out of vectorizer or to have a minimal subset + postprocessing pass after vectorizer. Current vision is to do former. |
Agreed, that seems like a sensible way forwards. And just to echo what @Mel-Chen and @arcbbb are saying, we can do the EVL-recipe conversion as a separate incremental follow up to #115274 (which hopefully in turn can become a follow up to this PR) |
cd020fd
to
4247eaa
Compare
I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion. I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction. We could also get rid of the PHI recipe if we relax the native-path restriction on VPWidenPHIRecipe, see #118662 There's some extra diff now due to how this doesn't emit an identity add when building the initial value, I've also opened a separate PR to split this off: #119668 |
For the induction step, there's already code the materializes it (for unrolling), #119284 generalizes it to a VPInstruction opcode. Would that help? |
I think I can reuse that here, that would take care of the casting part. And I guess we would get the benefit of the constant step optimization too. I'll rebase this PR on top of it I think we would need to expand the VPWidenIntOrFpInductionRecipes first in convertToConcreteRecipes so that the WideIVSteps get a chance to be expanded afterwards? |
…hi in VPWidenPHI::execute
…ze/split-VPWidenIntOrFpInductionRecipe
…ze/split-VPWidenIntOrFpInductionRecipe
After updating llvm#118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into: VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock(); Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator()); auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags, WidenIVR->getDebugLoc(), "vec.ind.next"); However after llvm#117506, the region is dissolved so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes
…ze/split-VPWidenIntOrFpInductionRecipe
…FCI (#141999) After updating #118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into: VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock(); Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator()); auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags, WidenIVR->getDebugLoc(), "vec.ind.next"); However after #117506, the region is dissolved so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes
…ze/split-VPWidenIntOrFpInductionRecipe
I noticed this after updating llvm#118638 on top of llvm#117506, and seeing that some broadcasts were no longer being hoisted into the loop preheader. It was calling VPlan::getVectorPreheader(), which at this point now is dissolved and returns null. This fixes it by getting the header from vputils::getFirstLoopHeader and getting the preheader from its first successor. I've also added an assertion in getVectorPreheader() to make sure its not called after the regions are dissolved.
…FCI (llvm#141999) After updating llvm#118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into: VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock(); Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator()); auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags, WidenIVR->getDebugLoc(), "vec.ind.next"); However after llvm#117506, the region is dissolved so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes
…ze/split-VPWidenIntOrFpInductionRecipe
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP8]], i64 0 | ||
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a note here that if we're removing the hoisting logic in #142594 anyway then these diffs where the broadcast is sunk shouldn't be important. LICM will hoist them out afterwards anyway
…FCI (llvm#141999) After updating llvm#118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into: VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock(); Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator()); auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags, WidenIVR->getDebugLoc(), "vec.ind.next"); However after llvm#117506, the region is dissolved so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes
…ze/split-VPWidenIntOrFpInductionRecipe
Ping, all dependent patches should be landed now |
// The value from the original loop to which we are mapping the new induction | ||
// variable. | ||
Instruction *IV = Trunc ? cast<Instruction>(Trunc) : WidenIVR->getPHINode(); | ||
Type *Ty = IV->getType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type *Ty = IV->getType(); | |
Type *Ty = TypeInfo.inferScalarType(WidenIVR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7cf85b9
|
||
// If the phi is truncated, truncate the start and step values. | ||
VPBuilder Builder(Plan->getVectorPreheader()); | ||
if (isa<TruncInst>(IV)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check the types here instead if casts are needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done in 7cf85b9
; VF4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 | ||
; VF4-NEXT: [[VEC_IND_NEXT]] = add <4 x i8> [[VEC_IND]], splat (i8 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, thanks for checking!
…ze/split-VPWidenIntOrFpInductionRecipe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few more small comments inline, thanks!
|
||
WidePHI->addOperand(Next); | ||
|
||
WidenIVR->replaceAllUsesWith(WidePHI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added to ToRemove
instead of directly removing I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done in e8d64a4
// FIXME: The newly created binary instructions should contain nsw/nuw | ||
// flags, which can be found from the original scalar operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixme better moved to where Flags
is defined?
br label %for.body | ||
|
||
for.body: ; preds = %for.body, %entry | ||
%iv.ptr = phi ptr [ null, %entry ], [ %iv.ptr.next, %for.body ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably best to start at a concrete pointer, to avoid poison.
; CHECK-NEXT: Successor(s): middle.block, vector.body | ||
entry: | ||
br label %loop | ||
loop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
loop: | |
loop: |
entry: | ||
br label %loop | ||
loop: | ||
%i = phi i64 [0, %entry], [%i.next, %loop] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit for consistency
%i = phi i64 [0, %entry], [%i.next, %loop] | |
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ] |
%i.next = add i64 %i, 1 | ||
%done = icmp eq i64 %i.next, %n | ||
br i1 %done, label %exit, label %loop | ||
exit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
exit: | |
exit: |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/20325 Here is the relevant piece of the build log for the reference
|
Following on from llvm#118638, this handles widened induction variables with EVL tail folding by setting the VF operand to be EVL, calculated in the vector body. We need to do this for correctness since with EVL tail folding the number of elements processed in the penultimate iteration may not be VF, but the runtime EVL, and we need to increment induction variables as such. - Because the VF may now not be a live-in we need to move the builder to just after its definition - We also need to avoid truncating it when it's the same size as the step type, previously this wasn't a problem for live-ins. - Also because the VF may be smaller than the IV type, since the EVL is always i32, we may need to zext it. On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and 42.8% more loops vectorized on SPEC CPU 2017
The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand.
The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body.
The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:
gets expanded to:
This allows us to a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.
After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).