
[CodeGen][Spill2Reg] Initial patch #118832

Open · wants to merge 8 commits into main
Conversation

@vporpo (Contributor) commented Dec 5, 2024

This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to vector registers. This can help remove back-end stalls in x86.

Old code review: https://reviews.llvm.org/D118298

RFC:
https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html
https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630
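Not part of the patch, but for illustration: on x86 the intended transformation looks roughly like the sketch below (pseudocode-level assembly, assuming a 32-bit GPR spill and a free `%xmm0`):

```
# Before: spill/reload through the stack (occupies a memory unit)
mov    %eax, -8(%rsp)      # spill %eax to a stack slot
...
mov    -8(%rsp), %eax      # reload

# After Spill2Reg: spill into a free vector register instead
movd   %eax, %xmm0         # spill %eax into %xmm0
...
movd   %xmm0, %eax         # reload
```

The register-to-register copies bypass the memory pipeline, which is where the back-end stall relief comes from.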

@llvmbot (Member) commented Dec 12, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-regalloc

Author: vporpo (vporpo)

Changes

This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to vector registers. This can help remove back-end stalls in x86.

Old code review: https://reviews.llvm.org/D118298

RFC:
https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html
https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630


Full diff: https://github.com/llvm/llvm-project/pull/118832.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/Passes.h (+3)
  • (modified) llvm/include/llvm/InitializePasses.h (+1)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (+1)
  • (modified) llvm/lib/CodeGen/CodeGen.cpp (+1)
  • (added) llvm/lib/CodeGen/Spill2Reg.cpp (+56)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (+9)
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index d1fac4a304cffe..77d305aa7d0a9c 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -608,6 +608,9 @@ namespace llvm {
 
   /// Lowers KCFI operand bundles for indirect calls.
   FunctionPass *createKCFIPass();
+
+  /// This pass replaces spills to stack with spills to registers.
+  extern char &Spill2RegID;
 } // End llvm namespace
 
 #endif
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 7b81c9a8e143a3..7467844ec34038 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -321,6 +321,7 @@ void initializeWasmEHPreparePass(PassRegistry &);
 void initializeWinEHPreparePass(PassRegistry &);
 void initializeWriteBitcodePassPass(PassRegistry &);
 void initializeXRayInstrumentationPass(PassRegistry &);
+void initializeSpill2RegPass(PassRegistry &);
 
 } // end namespace llvm
 
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index 7b47c0e6f75dbe..8cbd5650fdd10c 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -219,6 +219,7 @@ add_llvm_component_library(LLVMCodeGen
   SjLjEHPrepare.cpp
   SlotIndexes.cpp
   SpillPlacement.cpp
+  Spill2Reg.cpp
   SplitKit.cpp
   StackColoring.cpp
   StackFrameLayoutAnalysisPass.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 59428818c1ee7c..2e599451a4b4a2 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -143,4 +143,5 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeWasmEHPreparePass(Registry);
   initializeWinEHPreparePass(Registry);
   initializeXRayInstrumentationPass(Registry);
+  initializeSpill2RegPass(Registry);
 }
diff --git a/llvm/lib/CodeGen/Spill2Reg.cpp b/llvm/lib/CodeGen/Spill2Reg.cpp
new file mode 100644
index 00000000000000..09ffa71b891cb5
--- /dev/null
+++ b/llvm/lib/CodeGen/Spill2Reg.cpp
@@ -0,0 +1,56 @@
+//===- Spill2Reg.cpp - Spill To Register Optimization ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//
+/// \file This file implements Spill2Reg, an optimization which selectively
+/// replaces spills/reloads to/from the stack with register copies to/from the
+/// vector register file. This works even on targets where load/stores have
+/// similar latency to register copies because it can free up memory units which
+/// helps avoid back-end stalls.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+
+using namespace llvm;
+
+namespace {
+
+class Spill2Reg : public MachineFunctionPass {
+public:
+  static char ID;
+  Spill2Reg() : MachineFunctionPass(ID) {
+    initializeSpill2RegPass(*PassRegistry::getPassRegistry());
+  }
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+  void releaseMemory() override;
+  bool runOnMachineFunction(MachineFunction &) override;
+};
+
+} // namespace
+
+void Spill2Reg::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.setPreservesCFG();
+  MachineFunctionPass::getAnalysisUsage(AU);
+}
+
+void Spill2Reg::releaseMemory() {}
+
+bool Spill2Reg::runOnMachineFunction(MachineFunction &MFn) {
+  llvm_unreachable("Unimplemented");
+}
+
+char Spill2Reg::ID = 0;
+
+char &llvm::Spill2RegID = Spill2Reg::ID;
+
+INITIALIZE_PASS_BEGIN(Spill2Reg, "spill2reg", "Spill2Reg", false, false)
+INITIALIZE_PASS_END(Spill2Reg, "spill2reg", "Spill2Reg", false, false)
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index d407e9f0871d4c..87ee076db7a9f3 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -214,6 +214,11 @@ static cl::opt<bool> DisableReplaceWithVecLib(
     "disable-replace-with-vec-lib", cl::Hidden,
     cl::desc("Disable replace with vector math call pass"));
 
+// Enable the Spill2Reg pass.
+static cl::opt<bool> EnableSpill2Reg("enable-spill2reg", cl::Hidden,
+                                     cl::init(false),
+                                     cl::desc("Enable Spill2Reg pass"));
+
 /// Option names for limiting the codegen pipeline.
 /// Those are used in error reporting and we didn't want
 /// to duplicate their names all over the place.
@@ -1415,6 +1420,10 @@ bool TargetPassConfig::addRegAssignAndRewriteOptimized() {
   // Finally rewrite virtual registers.
   addPass(&VirtRegRewriterID);
 
+  // Replace spills to stack with spills to registers.
+  if (EnableSpill2Reg)
+    addPass(&Spill2RegID);
+
   // Regalloc scoring for ML-driven eviction - noop except when learning a new
   // eviction policy.
   addPass(createRegAllocScoringPass());

@vporpo (Contributor, Author) commented Dec 20, 2024

Per @nvjle's request I uploaded the rest of the patches for reference.

@williamweixiao self-requested a review on December 22, 2024.
@williamweixiao (Contributor) commented:

There are some DFS traversals on the CFG. Do you have data about the impact on compilation speed?

MachineBasicBlock *MBB = Reload->getParent();
bool IsSpillBlock = SpillMBBs.count(MBB);
// Add all MBB's live-outs.
LRU.addLiveOuts(*MBB);
Contributor:

Do we need "stepBackward" code here similar to the snippet below, as in "GetReloadLRU"?

    // Start at the bottom of the BB and walk up until we find `Reload`.
    for (MachineInstr &MI : llvm::reverse(*MBB)) {
      if (&MI == Reload)
        break;
      ReloadLRU.stepBackward(MI);
    }

Contributor Author:

That's a good question. I think LRU.accumulate() is the correct one because LRU.stepBackward() seems to be removing the defined regs from the set. But what we need is to collect all the registers that are used at any point to avoid using them as the target vector register. I added a TODO to check this later.

Contributor:

I agree that LRU.accumulate() is correct but conservative.
Consider the example below:

spill rax
...
reload rax
xmm1 = ...
xmm2 = ...

we can actually reuse xmm1/xmm2 for the spill.

Contributor Author:

If the spill and reload are in the same block, then we walk from the reload to the spill (see line 458: `bool FoundSpill = AccumulateLRUUntilSpillFn(Reload, ReloadLRU);`), so the xmm registers won't be in the set.

I think accumulate() is the correct function to use because of cases like:

spill rax
xmm1 = ...
... = xmm1
reload rax

In this case if we stepBackward() from reload rax to spill rax then I think the register set at spill rax won't contain xmm1. But it's not safe to use.

Contributor:

Using my example above for discussion: the xmm1/xmm2 registers will be in the set by line 392 (i.e., "LRU.addLiveOuts(*MBB);").
The loop from line 394 to 400 is OK and we should use "accumulate()" there.
But we can remove the xmm1/xmm2 registers from the set before that loop with the code below:

    // Start at the bottom of the BB and walk up until we find `Reload`.
    for (MachineInstr &MI : llvm::reverse(*MBB)) {
      if (&MI == Reload)
        break;
      ReloadLRU.stepBackward(MI);
    }

Contributor Author:

I think you are right:

AccumulateLRUUntilSpillFn() is called in two cases:

  1. when we calculate live regs from reload until a spill (after the call to GetReloadLRU()), in which case we have already calculated the live-outs at the point of the reload using GetReloadLRU(), starting from the live-outs and stepping backwards until the reload.
  2. when we are looking for the spills, in which case we start from the bottom of the BB and walk up until the spill (called inside AccumulateLRUFn()). In this case we need to initialize the set with the live-outs at MBB.

So I think the issue is that we call LRU.addLiveOuts() inside AccumulateLRUUntilSpillFn(), instead of calling it only in AccumulateLRUFn() just before we call AccumulateLRUUntilSpillFn().

I have added some comments to make the code more readable.

SmallVector<MIDataWithLiveIn, 1> Reloads;

/// \Returns the physical register being spilled.
Register getSpilledReg() const { return Spills.front().MO->getReg(); }
Contributor:

The function name is somewhat confusing. One stack slot can receive spills from multiple registers. What we really need is the TargetRegisterClass, right?

Contributor Author:

I think you are right. We are only using this to get the register class. I replaced this with getSpilledRegClass().

if (const MachineOperand *MO = TII->isStoreToStackSlotMO(MI, StackSlot)) {
MachineInstr *Spill = &MI;
auto &Entry = StackSlotData[StackSlot];
if (SkipEntry(StackSlot, MO->getReg())) {
Contributor:

check "Entry.Disable" first?

Contributor Author:

Good point, I changed this to: if (Entry.Disable || SkipEntry(StackSlot, MO->getReg())).

} else {
// This should capture uses of the stack in instructions that access
// memory (e.g., folded spills/reloads) and non-memory instructions,
// like x86 LEA.
Contributor:

My hunch is that most cases may come from memory folding instructions.

Contributor Author:

Yeah, this should be the common case at least in x86.


bool Spill2Reg::run() {
// Walk over each instruction in the code keeping track of the processor's
// port pressure and look for memory unit hot-spots.
Contributor:

I guess "port pressure and look for memory unit hot-spots." is "TODO" work, right?

Contributor Author:

Yeah, this is not currently modeled properly. Ideally we should feed the instructions into a pipeline model and check for bottlenecks.

if (X86::VK16RegClass.contains(Reg))
return false;

switch (unsigned Bits = TRI->getRegSizeInBits(Reg, *MRI)) {
Contributor:

Are "double" and "float" legal here?

Contributor Author:

No, float or double is not legal because they are already in a vector register. I think Bits is 128 or more for those, depending on the target.

if (!MBB->isLiveIn(VectorReg))
MBB->addLiveIn(VectorReg);
}
for (MachineBasicBlock *PredMBB : Reload->getParent()->predecessors())
Contributor:

We don't need to do "DFS" if "ReloadData.IsLiveIn" is false.

Contributor Author:

Good catch, fixed.

@vporpo (Contributor, Author) left a review:

There are some DFS traversals on the CFG. Do you have data about the impact on compilation speed?

It's been a while since I tested this pass, but as far as I remember it wasn't too bad. The traversals done in the code generation phase of the pass are not too frequent as they only happen when we need to spill to vector registers, which shouldn't be too often.


/// \Returns the register class of the register being spilled.
const TargetRegisterClass *
getSpilledRegClass(const TargetInstrInfo *TII,
Contributor:

dead code?

Contributor Author:

Yeah, this must have been the result of a bad rebase, I removed it.

LiveRegUnits LRU(*TRI);
calculateLiveRegs(Entry, LRU);

// Look for a physical register that in LRU.
Contributor:

that is not in LRU?

Contributor Author:

Fixed.

const unsigned MinVecBits =
TRI->getRegSizeInBits(*TRI->getRegClass(X86::VR128RegClassID));
if (MF->getFrameInfo().getObjectSize(MO.getIndex()) >= MinVecBits)
return true;
Contributor:

could you please give me an instruction example that can return "true" here (i.e., vector-size stack access without any vector register operand)?

Contributor Author:

I don't have such an instruction in mind and I can't recall if I did back when I wrote this. It's probably a conservative check. I added a TODO to check if this is needed.


// compilation time.
for (auto &MID : Entry.Reloads)
if (MID.MI->getParent() == &MBB)
MID.IsLiveIn = false;
Contributor:

Do we need "live-in" for the case below?

...
reload stack.0
...
spill  stack.0
...
reload stack.0

Contributor Author:

Yes, this looks wrong. I removed this code.

if (X86::VK16RegClass.contains(Reg))
return false;

switch (unsigned Bits = TRI->getRegSizeInBits(Reg, *MRI)) {
Contributor:

"Bits" is unused variable.

Contributor Author:

Removed.

@RKSimon requested review from RKSimon and topperc on January 2, 2025.
@RKSimon (Collaborator) left a review:

I'd like to ensure this patch doesn't get too focused on just working for gpr->vector spills. In my experience those profitable cases are pretty rare. What has been more useful has been cases such as storing scalar f32/f64 in the upper elements of xmm registers, or even using ymm upper halves to store xmm vector data, and to a lesser extent storing an i32 in the upper 32 bits of an i64 GPR.

@@ -294,6 +294,11 @@ class TargetInstrInfo : public MCInstrInfo {
return isLoadFromStackSlot(MI, FrameIndex);
}

virtual const MachineOperand *isLoadFromStackSlotMO(const MachineInstr &MI,
int &FrameIndex) const {
llvm_unreachable("target did not implement");
Collaborator:

Why not just return nullptr by default?

Contributor Author:

The reasoning is that if I was implementing this for a new target I would prefer getting a crash telling me that I should override this function, rather than getting it to silently skip spill2reg because some functions are not overridden. Wdyt?

}

virtual const TargetRegisterClass *
getVectorRegisterClassForSpill2Reg(const TargetRegisterInfo *TRI,
Collaborator:

I'd prefer we don't use Vector in the spill2reg naming convention as I'd like to see this work for more cases than spilling gpr to vector registers - maybe getCandidateRegisterClassForSpill2Reg?

Contributor Author:

Makes sense, done.

There are a couple more of these, like spill2RegExtractFromVectorReg(). I am thinking of using the term Host instead of Vector to describe the register used by spill2reg, what do you think?

const MachineRegisterInfo *MRI) const {
llvm_unreachable(
"Target didn't implement TargetInstrInfo::isLegalToSpill2Reg!");
}
Collaborator:

Why are we putting all of these in here and not TargetRegisterInfo?

Contributor Author:

I think most of them have to do with instructions rather than registers, with the exception of isLegalToSpill2Reg() and getVectorRegisterClassForSpill2Reg(). I will move these to TargetRegisterInfo.

  • isStoreToStackSlotMO() inspects an instruction
  • targetSupportsSpill2Reg() could be placed in either file as it does not check an instruction or register
  • isSpill2RegProfitable() checks the instruction sequence
  • spill2RegInsertToVectorReg() emits instructions
  • spill2RegExtractFromVectorReg() emits instructions

@williamweixiao (Contributor) commented:

I'd like to ensure this patch doesn't get too focused on just working for gpr->vector spills. In my experience those profitable cases are pretty rare. What has been more useful has been cases such as storing scalar f32/f64 in the upper elements of xmm registers, or even using ymm upper halves to store xmm vector data, and to a lesser extent storing an i32 in the upper 32 bits of an i64 GPR.

Yes, we also observed some cases in which spilling a float value into a GPR can help performance (assuming GPR register pressure is low at the time).

@vporpo (Contributor, Author) commented Jan 9, 2025

I'd like to ensure this patch doesn't get too focused on just working for gpr->vector spills. In my experience those profitable cases are pretty rare. What has been more useful has been cases such as storing scalar f32/f64 in the upper elements of xmm registers, or even using ymm upper halves to store xmm vector data, and to a lesser extent storing an i32 in the upper 32 bits of an i64 GPR.

Yes, we also observed some cases in which spilling a float value into a GPR can help performance (assuming GPR register pressure is low at the time).

I think that the structure of the pass is already fairly agnostic to the variant of spill2reg (like GPR->lower vector, GPR->upper vector, GPR->upper GPR, F32/64->upper vector). The candidates are filtered by TRI callbacks, like isLegalToSpillToReg(), and code generation is done with TII callbacks: spill2RegInsertToVectorReg() and spill2RegExtractFromVectorReg(). All of these can be updated to work with different spill2reg variants.

Once we add support for more than one spill2reg variant, during the collection phase we would need to determine the variant and set it in StackSlotDataEntry using an enum. This would then be used later in the pass (in generateCode()) to generate the corresponding code.

@vporpo (Contributor, Author) commented Jan 9, 2025

Changed TII function names spill2RegInsertToVectorReg() to spill2RegInsertToS2RReg() and spill2RegExtractFromVectorReg() to spill2RegExtractFromS2RReg(). Rebased.

github-actions bot commented Jan 9, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vporpo force-pushed the Spill2Reg branch 3 times, most recently from 8b212b6 to d2147f1 on January 14, 2025.
@vporpo (Contributor, Author) commented Jan 14, 2025

Shall we start focusing on one patch at a time for the code reviews? @williamweixiao @RKSimon any comments on the first patch?


/// already walking through the code there. Otherwise we would need to
/// walk through the code again in `updateLiveIns()` just to check for
/// other spills in the block, which would waste compilation time.
bool IsLiveIn = true;

Contributor:

It seems that "IsLiveIn" is always "true". Do we really need it?

Contributor Author:

I think a better solution is to set the IsLiveIn flag whenever we visit a reload in collectSpillsAndReloads(). I have updated the code.

Commit 1:
This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to other registers. This can help remove back-end stalls in x86.

Old code review: https://reviews.llvm.org/D118298

RFC:
https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html
https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630

Commit 2:
Walk through the code looking for spills and reloads and group them per stack slot.

Original review: https://reviews.llvm.org/D118299

Commit 3:
This patch adds the main structure of the code generation phase of Spill2Reg. Iterate through the spills/reloads collected earlier and generate the new instructions.

Original review: https://reviews.llvm.org/D118300

Commit 4:
Spill2Reg can now emit spill and reload instructions. This will not generate correct code, as it does not keep track of live regs.

Original review: https://reviews.llvm.org/D118302

Commit 5:
This patch implements tracking of live registers. This is used to look for free vector registers. It works by walking up the CFG from the reloads all the way to the spills, accumulating the register units being used. This implementation caches the live register units used by each MBB for faster compilation.

Note: Live register tracking relies on MBB live-ins/outs being maintained correctly, which is implemented in a follow-up patch. So this patch will still not generate correct code for all but some simple cases.

Original review: https://reviews.llvm.org/D118303

Commit 6:
This patch implements updates of the MBB live-ins for the newly introduced instructions emitted by Spill2Reg. This is required for correct tracking of live register usage.

Original review: https://reviews.llvm.org/D118304

Commit 7:
This patch adds support for 8/16-bit values in x86.

Original review: https://reviews.llvm.org/D118305

Commit 8:
This patch updates the vector spill/reload instructions to use the AVX opcodes by default if the target supports it. This can be turned off with the -spill2reg-no-avx flag.

Original review: https://reviews.llvm.org/D118951
@vporpo (Contributor, Author) commented Jan 17, 2025

Thank you for the comments, @williamweixiao. Do you think we should start focusing on the individual patches of this PR one by one? There are 8 patches in the chain at this point, which makes it hard to review and maintain.

@RKSimon (Collaborator) commented Jan 21, 2025

Have you tried graphite to maintain the patch series?

@vporpo (Contributor, Author) commented Jan 21, 2025

Have you tried graphite to maintain the patch series?

Thanks for the suggestion, I will give it a try. My main concern is testing: while the patch series is off-tree, we can only do so much testing on each change we make, which can lead to regressions that go unnoticed. Given that there is enough interest in this project, my suggestion is to start working on this in-tree.
