-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[RISCV] Support RVV register overlapping constraints #145004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In this PR, it models the RVV register overlapping constraints by:
As a result, it implements the RVV operand constraints and reduces the register pressure. There are some issues with this approach. First, it breaks the early-clobber flag only at the RA stage, but other passes could be based on the early-clobber attribute to perform optimizations (for example: MachineCopyPropagation). Second, it causes several workarounds across RegAllocGreedy, LiveRegMatrix, and RegAllocEvictionAdvisor. I post the PR for two reason.
|
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- llvm/include/llvm/CodeGen/LiveRegMatrix.h llvm/include/llvm/CodeGen/TargetRegisterInfo.h llvm/lib/CodeGen/LiveRegMatrix.cpp llvm/lib/CodeGen/MachineCopyPropagation.cpp llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp llvm/lib/CodeGen/RegAllocGreedy.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.h llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp llvm/lib/Target/RISCV/RISCVRegisterInfo.h View the diff from clang-format here.diff --git a/llvm/lib/CodeGen/LiveRegMatrix.cpp b/llvm/lib/CodeGen/LiveRegMatrix.cpp
index 64e1695e9..6c2336bae 100644
--- a/llvm/lib/CodeGen/LiveRegMatrix.cpp
+++ b/llvm/lib/CodeGen/LiveRegMatrix.cpp
@@ -259,21 +259,19 @@ LiveRegMatrix::checkInterference(const LiveInterval &VirtReg,
return IK_VirtReg;
// Check the matrix for virtual register interference.
- bool Interference = foreachUnit(TRI, VirtReg, PhysReg,
- [&](MCRegUnit Unit, const LiveRange &LR) {
- LiveRange NewLR;
- if (TRI->enableTargetInterference() &&
- TRI->needUpdateECSlot(
- LR, NewLR = copyLiveRange(LR), *LIS)) {
- // Update LiveRange could make cache
- // information stable. Refresh cache to
- // handle it.
- invalidateVirtRegs();
- return query(NewLR, Unit)
- .checkInterference();
- }
- return query(LR, Unit).checkInterference();
- });
+ bool Interference = foreachUnit(
+ TRI, VirtReg, PhysReg, [&](MCRegUnit Unit, const LiveRange &LR) {
+ LiveRange NewLR;
+ if (TRI->enableTargetInterference() &&
+ TRI->needUpdateECSlot(LR, NewLR = copyLiveRange(LR), *LIS)) {
+ // Update LiveRange could make cache
+ // information stable. Refresh cache to
+ // handle it.
+ invalidateVirtRegs();
+ return query(NewLR, Unit).checkInterference();
+ }
+ return query(LR, Unit).checkInterference();
+ });
if (Interference)
return IK_VirtReg;
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 8cf119929..138fdc861 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -921,7 +921,8 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
// later.
if (MO.isTied())
ReadRegister(Reg, MI, RegularUse);
- // If it be used by another instruction, it should not be deleted.Add commentMore actions
+ // If it be used by another instruction, it should not be deleted.Add
+ // commentMore actions
for (const MachineOperand &UseMO : MI.uses()) {
if (!UseMO.isReg())
continue;
|
One idea is to overhaul the early-clobber flag and use partial early-clobber to model the correct constraints between the destination operand and source operand. This may need changes from tablegen to register allocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, we have done an evaluation before (the overlapping constraints in RVV are not reasonable for OoO cores with renaming so we wanted to know how big the impact was):
- We removed all the overlapping constraints in TableGen.
- Compiled SPEC CPU 2017 and llvm-testsuite.
- Collected the spill/reload statistics. No COPY data because we think it doesn't matter.
The benchmarks were compiled with -mllvm -riscv-v-register-bit-width-lmul=N
to control the register pressure when vectorizing, and the baselines were compiled with the same commands but with an unmodified compiler (with RA early-clobber constraints).
The results are:
llvm-testsuite | Spill | Reload |
---|---|---|
M1 | -0.46% | -0.14% |
M2 | -0.65% | -0.29% |
M4 | -0.64% | +0.10% |
M8 | -0.70% | +0.23% |
SPEC CPU 2017 | Spill | Reload |
---|---|---|
M1 | -0.01% | -0.0% |
M2 | -0.10% | -0.10% |
M4 | -0.40% | -0.40% |
M8 | -0.60% | -0.30% |
I won't interpret some of the result here (I don't know why there are some increases of reloads in llvm-testsuite). I just want to show how many theoretical gains we can get from supporting the partial overlapping.
if (!RVV) | ||
return 0; | ||
|
||
return RVV->TargetOverlapConstraintType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be stored in TSFlags?
static bool isVectorVirtRegClass(MachineOperand MO, Register R, | ||
const MachineRegisterInfo *MRI) { | ||
// Register like sub_vrm4_0:vrn2m4 also need check. | ||
return RISCV::VRRegClass.hasSubClassEq(MRI->getRegClass(R)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use isRVVRegClass?
} | ||
|
||
unsigned RISCVRegisterInfo::getMCRegLMUL(MCRegister Reg) { | ||
if (RISCVMCRegisterClasses[RISCV::VRRegClassID].contains(Reg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it from RISCVRI::getLMul?
Glad to see the investigation. I've stumbled into this a few times myself, and agree that the first step is trying to get some sense of how much impact fixing this would be. All of the approaches I've come up with involve significant investment, so getting a bound on the benefit seems quite worthwhile. @wangpc-pp - Did you get any estimate of runtime impact? I realize this might be quite challenging, but even something like dynamic icount for a modified qemu might be interesting. Essentially, I'm curious as to where the spills and fills are, and whether removing some of them actually helps. Worth noting is that once we get to implementation, there's a couple possible sub-cases we can handle without solving the whole problem. e.g. If I remember my prior investigation correctly, I think we can handle many of the mask generating instructions with some extra pseudos in the current scheme. |
No, we haven't evaluated the runtime impact because as you have said, the efforts we need to make qemu runnable could be large and we just wanted a quick estimation. The overlapping constraints will only affect the RA (the pressure) so I think we can only pay attention to the spill/reload part. I posted the changes here: #145353. I checked some of the spills/reloads, and I think some mask cases are interesting. |
Relate to #144289
This PR aims to demonstrate how we can model the RVV register overlapping constraints instead of using early-clobber, and how it affects the codegen results.
I don't expect this version to be the final version, but it provides a proof of concept to measure the performance gain (register pressure decrease) after implementing RVV register overlapping constraints.