[CodeGen][Spill2Reg] Initial patch #118832
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-regalloc

Author: vporpo (vporpo)

Changes

This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to vector registers. This can help remove back-end stalls on x86.

Old code review: https://reviews.llvm.org/D118298
RFC:

Full diff: https://github.com/llvm/llvm-project/pull/118832.diff

6 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index d1fac4a304cffe..77d305aa7d0a9c 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -608,6 +608,9 @@ namespace llvm {
/// Lowers KCFI operand bundles for indirect calls.
FunctionPass *createKCFIPass();
+
+ /// This pass replaces spills to stack with spills to registers.
+ extern char &Spill2RegID;
} // End llvm namespace
#endif
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 7b81c9a8e143a3..7467844ec34038 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -321,6 +321,7 @@ void initializeWasmEHPreparePass(PassRegistry &);
void initializeWinEHPreparePass(PassRegistry &);
void initializeWriteBitcodePassPass(PassRegistry &);
void initializeXRayInstrumentationPass(PassRegistry &);
+void initializeSpill2RegPass(PassRegistry &);
} // end namespace llvm
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index 7b47c0e6f75dbe..8cbd5650fdd10c 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -219,6 +219,7 @@ add_llvm_component_library(LLVMCodeGen
SjLjEHPrepare.cpp
SlotIndexes.cpp
SpillPlacement.cpp
+ Spill2Reg.cpp
SplitKit.cpp
StackColoring.cpp
StackFrameLayoutAnalysisPass.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 59428818c1ee7c..2e599451a4b4a2 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -143,4 +143,5 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeWasmEHPreparePass(Registry);
initializeWinEHPreparePass(Registry);
initializeXRayInstrumentationPass(Registry);
+ initializeSpill2RegPass(Registry);
}
diff --git a/llvm/lib/CodeGen/Spill2Reg.cpp b/llvm/lib/CodeGen/Spill2Reg.cpp
new file mode 100644
index 00000000000000..09ffa71b891cb5
--- /dev/null
+++ b/llvm/lib/CodeGen/Spill2Reg.cpp
@@ -0,0 +1,56 @@
+//===- Spill2Reg.cpp - Spill To Register Optimization ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//
+/// \file This file implements Spill2Reg, an optimization which selectively
+/// replaces spills/reloads to/from the stack with register copies to/from the
+/// vector register file. This works even on targets where load/stores have
+/// similar latency to register copies because it can free up memory units which
+/// helps avoid back-end stalls.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+
+using namespace llvm;
+
+namespace {
+
+class Spill2Reg : public MachineFunctionPass {
+public:
+ static char ID;
+ Spill2Reg() : MachineFunctionPass(ID) {
+ initializeSpill2RegPass(*PassRegistry::getPassRegistry());
+ }
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+ void releaseMemory() override;
+ bool runOnMachineFunction(MachineFunction &) override;
+};
+
+} // namespace
+
+void Spill2Reg::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.setPreservesCFG();
+ MachineFunctionPass::getAnalysisUsage(AU);
+}
+
+void Spill2Reg::releaseMemory() {}
+
+bool Spill2Reg::runOnMachineFunction(MachineFunction &MFn) {
+ llvm_unreachable("Unimplemented");
+}
+
+char Spill2Reg::ID = 0;
+
+char &llvm::Spill2RegID = Spill2Reg::ID;
+
+INITIALIZE_PASS_BEGIN(Spill2Reg, "spill2reg", "Spill2Reg", false, false)
+INITIALIZE_PASS_END(Spill2Reg, "spill2reg", "Spill2Reg", false, false)
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index d407e9f0871d4c..87ee076db7a9f3 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -214,6 +214,11 @@ static cl::opt<bool> DisableReplaceWithVecLib(
"disable-replace-with-vec-lib", cl::Hidden,
cl::desc("Disable replace with vector math call pass"));
+// Enable the Spill2Reg pass.
+static cl::opt<bool> EnableSpill2Reg("enable-spill2reg", cl::Hidden,
+ cl::init(false),
+ cl::desc("Enable Spill2Reg pass"));
+
/// Option names for limiting the codegen pipeline.
/// Those are used in error reporting and we didn't want
/// to duplicate their names all over the place.
@@ -1415,6 +1420,10 @@ bool TargetPassConfig::addRegAssignAndRewriteOptimized() {
// Finally rewrite virtual registers.
addPass(&VirtRegRewriterID);
+ // Replace spills to stack with spills to registers.
+ if (EnableSpill2Reg)
+ addPass(&Spill2RegID);
+
// Regalloc scoring for ML-driven eviction - noop except when learning a new
// eviction policy.
addPass(createRegAllocScoringPass());
Per @nvjle's request I uploaded the rest of the patches for reference.
There are some DFS traversals on the CFG. Do you have data about the impact on compilation speed?
llvm/lib/CodeGen/Spill2Reg.cpp
Outdated
MachineBasicBlock *MBB = Reload->getParent();
bool IsSpillBlock = SpillMBBs.count(MBB);
// Add all MBB's live-outs.
LRU.addLiveOuts(*MBB);
Do we need similar "stepBackward" code here, as in "GetReloadLRU"?

// Start at the bottom of the BB and walk up until we find `Reload`.
for (MachineInstr &MI : llvm::reverse(*MBB)) {
  if (&MI == Reload)
    break;
  ReloadLRU.stepBackward(MI);
}
That's a good question. I think `LRU.accumulate()` is the correct one because `LRU.stepBackward()` seems to be removing the defined regs from the set. But what we need is to collect all the registers that are used at any point, to avoid using them as the target vector register. I added a TODO to check this later.
I agree that `LRU.accumulate()` is correct, but conservative. Consider the example below:

spill rax
...
reload rax
xmm1 = ...
xmm2 = ...

We can actually reuse xmm1/xmm2 for the spill.
If the spill and reload are in the same block then we walk from the reload to the spill (see line 458, `bool FoundSpill = AccumulateLRUUntilSpillFn(Reload, ReloadLRU);`), so the xmm registers won't be in the set.

I think `accumulate()` is the correct function to use because of cases like:

spill rax
xmm1 = ...
... = xmm1
reload rax

In this case, if we `stepBackward()` from `reload rax` to `spill rax`, then I think the register set at `spill rax` won't contain xmm1. But it's not safe to use.
Using my example above for discussion: the xmm1/xmm2 registers will be in the set because of line 392 (i.e. `LRU.addLiveOuts(*MBB);`).
The loop from line 394 to 400 is OK and we should use `accumulate()` there.
But we can remove the xmm1/xmm2 registers from the set before that loop with the code below:

// Start at the bottom of the BB and walk up until we find `Reload`.
for (MachineInstr &MI : llvm::reverse(*MBB)) {
  if (&MI == Reload)
    break;
  ReloadLRU.stepBackward(MI);
}
I think you are right. `AccumulateLRUUntilSpillFn()` is called in two cases:

- When we calculate the live regs from a reload until a spill (after the call to `GetReloadLRU()`), in which case we have already calculated the live-outs at the point of the reload using `GetReloadLRU()`, starting from the live-outs and stepping backwards until the reload.
- When we are looking for the spills, in which case we start from the bottom of the BB and walk up until the spill (called inside `AccumulateLRUFn()`). In this case we need to initialize the set with the live-outs of the MBB.

So I think the issue is that we call `LRU.addLiveOuts()` inside `AccumulateLRUUntilSpillFn()`, instead of calling it only in `AccumulateLRUFn()` just before we call `AccumulateLRUUntilSpillFn()`.
I have added some comments to make the code more readable.
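For reference, here is a minimal sketch of the scheme agreed on above, with assumed names (an illustration, not the code in the patch): seed the set with the block's live-outs, `stepBackward()` from the bottom of the block up to the reload, then `accumulate()` from the reload up to the spill.

#include "llvm/CodeGen/LiveRegUnits.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Sketch only: compute the register units that must not be picked as the
// host register for a spill/reload pair that both live in `MBB`.
static void computeLiveUnitsSketch(const MachineBasicBlock &MBB,
                                   const MachineInstr *Reload,
                                   const MachineInstr *Spill,
                                   LiveRegUnits &LRU) {
  // Registers live out of the block are always off-limits.
  LRU.addLiveOuts(MBB);
  auto It = MBB.rbegin(), End = MBB.rend();
  // Below the reload: stepBackward() removes defs, so registers that are only
  // defined after the reload (xmm1/xmm2 in the example above) become free.
  for (; It != End && &*It != Reload; ++It)
    LRU.stepBackward(*It);
  // From the reload up to the spill: accumulate() everything that is read or
  // written, so none of it can be reused as the host register.
  for (; It != End; ++It) {
    LRU.accumulate(*It);
    if (&*It == Spill)
      break;
  }
}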
llvm/lib/CodeGen/Spill2Reg.cpp
Outdated
SmallVector<MIDataWithLiveIn, 1> Reloads;

/// \Returns the physical register being spilled.
Register getSpilledReg() const { return Spills.front().MO->getReg(); }
The function name is somewhat confusing. One stack slot can receive spills from multiple registers. What we really need is the TargetRegisterClass, right?
I think you are right. We are only using this to get the register class. I replaced it with `getSpilledRegClass()`.
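A possible shape for the renamed accessor, shown only as a sketch (the signature in the patch actually takes a `TargetInstrInfo *` as well, and the body may differ):

// Hypothetical sketch, living in the same per-stack-slot entry struct as
// getSpilledReg() above: derive the class from the first spill's operand.
// All spills grouped under one stack slot are assumed to share a class.
const TargetRegisterClass *
getSpilledRegClass(const TargetRegisterInfo *TRI) const {
  Register SpilledReg = Spills.front().MO->getReg();
  return TRI->getMinimalPhysRegClass(SpilledReg.asMCReg());
}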
llvm/lib/CodeGen/Spill2Reg.cpp
Outdated
if (const MachineOperand *MO = TII->isStoreToStackSlotMO(MI, StackSlot)) {
  MachineInstr *Spill = &MI;
  auto &Entry = StackSlotData[StackSlot];
  if (SkipEntry(StackSlot, MO->getReg())) {
check "Entry.Disable" first?
Good point, I changed this to: `if (Entry.Disable || SkipEntry(StackSlot, MO->getReg()))`.
} else {
  // This should capture uses of the stack in instructions that access
  // memory (e.g., folded spills/reloads) and non-memory instructions,
  // like x86 LEA.
My hunch is that most cases may come from memory folding instructions.
Yeah, this should be the common case at least in x86.
bool Spill2Reg::run() {
  // Walk over each instruction in the code keeping track of the processor's
  // port pressure and look for memory unit hot-spots.
I guess "port pressure and look for memory unit hot-spots." is "TODO" work, right?
Yeah, this is not currently modeled properly. Ideally we should feed the instructions into a pipeline model and check for bottlenecks.
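To make that concrete, here is a very rough sketch of the kind of heuristic that could stand in until a proper pipeline model exists; the helper name and threshold are assumptions for illustration, not what the patch does:

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Sketch: treat a block as "memory bound" when more than ThresholdPercent of
// its instructions touch memory. A real implementation would consult the
// scheduling model rather than a flat instruction count.
static bool looksMemoryBound(const MachineBasicBlock &MBB,
                             unsigned ThresholdPercent = 50) {
  unsigned MemOps = 0, Total = 0;
  for (const MachineInstr &MI : MBB) {
    if (MI.isDebugInstr())
      continue; // Don't let debug instructions skew the ratio.
    ++Total;
    if (MI.mayLoad() || MI.mayStore())
      ++MemOps;
  }
  return Total != 0 && MemOps * 100 >= ThresholdPercent * Total;
}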
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
if (X86::VK16RegClass.contains(Reg))
  return false;

switch (unsigned Bits = TRI->getRegSizeInBits(Reg, *MRI)) {
Are "double" and "float" legal here?
No, float or double is not legal because they are already in a vector register. I think `Bits` is 128 or more for those, depending on the target.
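To illustrate the point being discussed, here is a hedged sketch of what such a legality check could look like on x86. The rule it encodes follows this thread (reject mask registers, accept GPR sizes, reject anything already vector-sized), but the body is only an illustration, not the code under review:

#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
// Assumes the X86 register enums are visible (X86-specific code).

using namespace llvm;

static bool isLegalToSpill2RegSketch(Register Reg,
                                     const TargetRegisterInfo *TRI,
                                     const MachineRegisterInfo *MRI) {
  // Mask registers are not candidates.
  if (X86::VK16RegClass.contains(Reg))
    return false;
  switch (TRI->getRegSizeInBits(Reg, *MRI)) {
  case 8:
  case 16:
  case 32:
  case 64:
    // GPR-sized values can be parked in a vector register lane.
    return true;
  default:
    // 128 bits or more: the value already lives in a vector register
    // (this is where f32/f64 end up), so there is nothing to gain.
    return false;
  }
}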
if (!MBB->isLiveIn(VectorReg))
  MBB->addLiveIn(VectorReg);
}
for (MachineBasicBlock *PredMBB : Reload->getParent()->predecessors())
We don't need to do "DFS" if "ReloadData.IsLiveIn" is false.
Good catch, fixed.
> There are some DFS traversals on the CFG. Do you have data about the impact on compilation speed?
It's been a while since I tested this pass, but as far as I remember it wasn't too bad. The traversals done in the code generation phase of the pass are not too frequent as they only happen when we need to spill to vector registers, which shouldn't be too often.
llvm/lib/CodeGen/Spill2Reg.cpp
Outdated
/// \Returns the register class of the register being spilled.
const TargetRegisterClass *
getSpilledRegClass(const TargetInstrInfo *TII,
dead code?
Yeah, this must have been the result of a bad rebase, I removed it.
llvm/lib/CodeGen/Spill2Reg.cpp
Outdated
LiveRegUnits LRU(*TRI);
calculateLiveRegs(Entry, LRU);

// Look for a physical register that in LRU.
that is not in LRU?
Fixed.
const unsigned MinVecBits =
    TRI->getRegSizeInBits(*TRI->getRegClass(X86::VR128RegClassID));
if (MF->getFrameInfo().getObjectSize(MO.getIndex()) >= MinVecBits)
  return true;
Could you please give me an example of an instruction that can return "true" here (i.e., a vector-size stack access without any vector register operand)?
I don't have such an instruction in mind and I can't recall if I did back when I wrote this. It's probably a conservative check. I added a TODO to check if this is needed.
// compilation time.
for (auto &MID : Entry.Reloads)
  if (MID.MI->getParent() == &MBB)
    MID.IsLiveIn = false;
do we need "live-in" for below case?
...
reload stack.0
...
spill stack.0
...
reload stack.0
Yes, this looks wrong. I removed this code.
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
if (X86::VK16RegClass.contains(Reg))
  return false;

switch (unsigned Bits = TRI->getRegSizeInBits(Reg, *MRI)) {
"Bits" is unused variable.
Removed.
I'd like to ensure this patch doesn't get too focused on just working for gpr->vector spills. In my experience those profitable cases are pretty rare. What has been more useful has been cases such as storing scalar f32/f64 in the upper elements of xmm registers, or even using ymm upper halves to store xmm vector data, and, to a lesser extent, storing an i32 in the upper 32 bits of an i64 gpr.
@@ -294,6 +294,11 @@ class TargetInstrInfo : public MCInstrInfo {
  return isLoadFromStackSlot(MI, FrameIndex);
}

virtual const MachineOperand *isLoadFromStackSlotMO(const MachineInstr &MI,
                                                    int &FrameIndex) const {
  llvm_unreachable("target did not implement");
Why not just return nullptr by default?
The reasoning is that if I was implementing this for a new target I would prefer getting a crash telling me that I should override this function, rather than getting it to silently skip spill2reg because some functions are not overridden. Wdyt?
}

virtual const TargetRegisterClass *
getVectorRegisterClassForSpill2Reg(const TargetRegisterInfo *TRI,
I'd prefer we don't use Vector in the spill2reg naming convention as I'd like to see this work for more cases than spilling gpr to vector registers - maybe getCandidateRegisterClassForSpill2Reg?
Makes sense, done.
There are a couple more of these, like `spill2RegExtractFromVectorReg()`. I am thinking of using the term `Host` instead of `Vector` to describe the register used by spill2reg, what do you think?
    const MachineRegisterInfo *MRI) const {
  llvm_unreachable(
      "Target didn't implement TargetInstrInfo::isLegalToSpill2Reg!");
}
Why are we putting all of these in here and not TargetRegisterInfo?
I think most of them have to do with instructions rather than registers, with the exception of `isLegalToSpill2Reg()` and `getVectorRegisterClassForSpill2Reg()`. I will move these two to TargetRegisterInfo.

- `isStoreToStackSlotMO()` inspects an instruction
- `targetSupportsSpill2Reg()` could be placed in either file as it does not check an instruction or register
- `isSpill2RegProfitable()` checks the instruction sequence
- `spill2RegInsertToVectoReg()` emits instructions
- `spill2RegExtractFromVectorReg()` emits instructions
Yes, we also observed some cases in which spilling a float value into a GPR can help performance (assuming GPR register pressure is low at the time).
I think that the structure of the pass is already fairly agnostic to the variant of spill2reg (like GPR->lower vector, GPR->upper vector, GPR->upper GPR, F32/64->upper vector). The candidates are filtered by [...]. Once we add support for more than one spill2reg variant, then during the collection phase we would need to determine the variant and set it in [...].
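As a purely hypothetical illustration of the variants listed above (not part of the patch), the collection phase could record something like:

// Hypothetical modeling of spill2reg variants; names are made up.
enum class Spill2RegVariant {
  GPRToVecLow,  // GPR -> low element of a vector register
  GPRToVecHigh, // GPR -> upper element of a vector register
  GPRToGPRHigh, // i32 -> upper 32 bits of an i64 GPR
  FPToVecHigh,  // f32/f64 -> upper elements of an XMM register
};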
Changed TII function names.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 8b212b6 to d2147f1.
Shall we start focusing on one patch at a time for the code reviews? @williamweixiao @RKSimon any comments on the first patch?
/// already walking through the code there. Otherwise we would need to
/// walk throught the code again in `updateLiveIns()` just to check for
/// other spills in the block, which would waste compilation time.
bool IsLiveIn = true;
It seems that "IsLiveIn" is always "true". Do we really need it?
I think a better solution is to set the `IsLiveIn` flag whenever we visit a reload in `collectSpillsAndReloads()`. I have updated the code.
This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to other registers. This can help remove back-end stalls in x86. Old code review: https://reviews.llvm.org/D118298 RFC: https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630
Walk through the code looking for spills and reloads and group them per stack slot. Original review: https://reviews.llvm.org/D118299
This patch adds the main structure of the code generation phase of Spill2Reg. Iterate through the spills/reloads collected earlier and generate the new instructions. Original review: https://reviews.llvm.org/D118300
Spill2Reg can now emit spill and reload instructions. This will not generate correct code, as it does not keep track of live regs. Original review: https://reviews.llvm.org/D118302
This patch implements tracking of live registers. This is used to look for free vector registers. It works by walking up the CFG from the reloads all the way to the spills, accumulating the register units being used. This implementation caches the live register units used by each MBB for faster compilation time. Note: Live register tracking relies on MBB live-ins/outs being maintained correctly, which is implemented in a follow-up patch. So this patch will still not generate correct code, except for some simple cases. Original review: https://reviews.llvm.org/D118303
This patch implements updates for the MBB live-ins due to the newly introduced instructions emitted by spill2reg. This is required for correct tracking of live register usage. Original review: https://reviews.llvm.org/D118304
This patch adds support for 8/16 bit values in x86. Original review: https://reviews.llvm.org/D118305
This patch updates the vector spill/reload instructions to use the AVX opcodes by default if the target supports it. This can be turned off with the -spill2reg-no-avx flag. Original review: https://reviews.llvm.org/D118951
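A hedged sketch of the opcode selection this commit describes; the helper name and the flag plumbing are assumptions, while the opcodes are the existing movq/vmovq moves between GR64 and VR128:

// Sketch only: prefer the VEX-encoded move when AVX is available and the
// -spill2reg-no-avx escape hatch is not set.
static unsigned getGR64ToVecSpillOpcode(const X86Subtarget &STI, bool NoAVX) {
  if (STI.hasAVX() && !NoAVX)
    return X86::VMOV64toPQIrr; // vmovq %gpr64 -> %xmm
  return X86::MOV64toPQIrr;    // movq  %gpr64 -> %xmm
}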
Thank you for the comments @williamweixiao. Do you think we should start focusing on the individual patches of this PR one by one? There are 8 patches in the chain at this point, which makes it hard to review and maintain.
Have you tried Graphite to maintain the patch series?
Thanks for the suggestion, I will give it a try. My main concern is testing: when the patch series is off-tree we can only do so much testing on each change we make, which can lead to regressions that go unnoticed. Given that there is enough interest in this project, my suggestion is to start working on this in-tree.
This is the first commit for the Spill2Reg optimization pass. The goal of this pass is to selectively replace spills to the stack with spills to vector registers. This can help remove back-end stalls on x86.
Old code review: https://reviews.llvm.org/D118298
RFC:
https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html
https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630
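To visualize the transformation, here is a minimal sketch of how a single 64-bit GPR spill/reload pair could be rewritten on x86. The helper name is made up for illustration; the actual pass goes through target hooks rather than hard-coding opcodes like this:

#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
// Assumes the X86 opcode enums are visible (X86-specific code).

using namespace llvm;

// Sketch: replace "store SpilledReg to stack slot / load SpilledReg from
// stack slot" with "movq SpilledReg -> VectorReg / movq VectorReg ->
// SpilledReg", then delete the memory accesses.
static void replacePairWithVecCopies(MachineInstr *Spill, MachineInstr *Reload,
                                     Register SpilledReg, Register VectorReg,
                                     const TargetInstrInfo *TII) {
  // GPR -> XMM copy where the store to the stack slot used to be.
  BuildMI(*Spill->getParent(), Spill, Spill->getDebugLoc(),
          TII->get(X86::MOV64toPQIrr), VectorReg)
      .addReg(SpilledReg);
  // XMM -> GPR copy where the load from the stack slot used to be.
  BuildMI(*Reload->getParent(), Reload, Reload->getDebugLoc(),
          TII->get(X86::MOVPQIto64rr), SpilledReg)
      .addReg(VectorReg);
  // The original memory spill and reload are no longer needed.
  Spill->eraseFromParent();
  Reload->eraseFromParent();
}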