Skip to content

[RISCV] Support RVV register overlapping constraints #145004

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

BeMg
Copy link
Contributor

@BeMg BeMg commented Jun 20, 2025

Relate to #144289

This PR aims to demonstrate how we can model the RVV register overlapping constraints instead of using early-clobber, and how it affects the codegen results.

I don't expect this version to be the final version, but it provides a proof of concept to measure the performance gain (register pressure decrease) after implementing RVV register overlapping constraints.

@BeMg
Copy link
Contributor Author

BeMg commented Jun 20, 2025

In this PR, it models the RVV register overlapping constraints by:

  1. Letting RVV pseudo instructions have a TargetConstraints attribute to retrieve the type of RVV overlapping rule

https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands
A destination vector register group can overlap a source vector register group only if one of the following holds:

  • The destination EEW equals the source EEW.

  • The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).

  • The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).

  1. Provide a series of target hooks to check the real constraints in RVV instructions.
  2. Extend the register allocation interference feature with target constraints to replace the early-clobber constraints.

As a result, it implements the RVV operand constraints and reduces the register pressure.


There are some issues with this approach.

First, it breaks the early-clobber flag only at the RA stage, but other passes could be based on the early-clobber attribute to perform optimizations (for example: MachineCopyPropagation). Second, it causes several workarounds across RegAllocGreedy, LiveRegMatrix, and RegAllocEvictionAdvisor.


I post the PR for two reason.

  1. It could measure the benefit of supporting the RVV register overlapping constraints.
  2. If we decide to support it, how to support it formally.

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- llvm/include/llvm/CodeGen/LiveRegMatrix.h llvm/include/llvm/CodeGen/TargetRegisterInfo.h llvm/lib/CodeGen/LiveRegMatrix.cpp llvm/lib/CodeGen/MachineCopyPropagation.cpp llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp llvm/lib/CodeGen/RegAllocGreedy.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.h llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp llvm/lib/Target/RISCV/RISCVRegisterInfo.h
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/LiveRegMatrix.cpp b/llvm/lib/CodeGen/LiveRegMatrix.cpp
index 64e1695e9..6c2336bae 100644
--- a/llvm/lib/CodeGen/LiveRegMatrix.cpp
+++ b/llvm/lib/CodeGen/LiveRegMatrix.cpp
@@ -259,21 +259,19 @@ LiveRegMatrix::checkInterference(const LiveInterval &VirtReg,
     return IK_VirtReg;
 
   // Check the matrix for virtual register interference.
-  bool Interference = foreachUnit(TRI, VirtReg, PhysReg,
-                                  [&](MCRegUnit Unit, const LiveRange &LR) {
-                                    LiveRange NewLR;
-                                    if (TRI->enableTargetInterference() &&
-                                        TRI->needUpdateECSlot(
-                                            LR, NewLR = copyLiveRange(LR), *LIS)) {
-                                      // Update LiveRange could make cache
-                                      // information stable. Refresh cache to
-                                      // handle it.
-                                      invalidateVirtRegs();
-                                      return query(NewLR, Unit)
-                                          .checkInterference();
-                                    } 
-                                    return query(LR, Unit).checkInterference();
-                                  });
+  bool Interference = foreachUnit(
+      TRI, VirtReg, PhysReg, [&](MCRegUnit Unit, const LiveRange &LR) {
+        LiveRange NewLR;
+        if (TRI->enableTargetInterference() &&
+            TRI->needUpdateECSlot(LR, NewLR = copyLiveRange(LR), *LIS)) {
+          // Update LiveRange could make cache
+          // information stable. Refresh cache to
+          // handle it.
+          invalidateVirtRegs();
+          return query(NewLR, Unit).checkInterference();
+        }
+        return query(LR, Unit).checkInterference();
+      });
   if (Interference)
     return IK_VirtReg;
 
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 8cf119929..138fdc861 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -921,7 +921,8 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
         // later.
         if (MO.isTied())
           ReadRegister(Reg, MI, RegularUse);
-        // If it be used by another instruction, it should not be deleted.Add commentMore actions
+        // If it be used by another instruction, it should not be deleted.Add
+        // commentMore actions
         for (const MachineOperand &UseMO : MI.uses()) {
           if (!UseMO.isReg())
             continue;

@BeMg
Copy link
Contributor Author

BeMg commented Jun 20, 2025

@BeMg
Copy link
Contributor Author

BeMg commented Jun 20, 2025

  1. If we decide to support it, how to support it formally.

One idea is to overhaul the early-clobber flag and use partial early-clobber to model the correct constraints between the destination operand and source operand. This may need changes from tablegen to register allocation.

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, we have done an evaluation before (the overlapping constraints in RVV are not reasonable for OoO cores with renaming so we wanted to know how big the impact was):

  1. We removed all the overlapping constraints in TableGen.
  2. Compiled SPEC CPU 2017 and llvm-testsuite.
  3. Collected the spill/reload statistics. No COPY data because we think it doesn't matter.

The benchmarks were compiled with -mllvm -riscv-v-register-bit-width-lmul=N to control the register pressure when vectorizing, and the baselines were compiled with the same commands but with an unmodified compiler (with RA early-clobber constraints).

The results are:

llvm-testsuite Spill Reload
M1 -0.46% -0.14%
M2 -0.65% -0.29%
M4 -0.64% +0.10%
M8 -0.70% +0.23%
SPEC CPU 2017 Spill Reload
M1 -0.01% -0.0%
M2 -0.10% -0.10%
M4 -0.40% -0.40%
M8 -0.60% -0.30%

I won't interpret some of the result here (I don't know why there are some increases of reloads in llvm-testsuite). I just want to show how many theoretical gains we can get from supporting the partial overlapping.

if (!RVV)
return 0;

return RVV->TargetOverlapConstraintType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be stored in TSFlags?

static bool isVectorVirtRegClass(MachineOperand MO, Register R,
const MachineRegisterInfo *MRI) {
// Register like sub_vrm4_0:vrn2m4 also need check.
return RISCV::VRRegClass.hasSubClassEq(MRI->getRegClass(R)) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use isRVVRegClass?

}

unsigned RISCVRegisterInfo::getMCRegLMUL(MCRegister Reg) {
if (RISCVMCRegisterClasses[RISCV::VRRegClassID].contains(Reg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it from RISCVRI::getLMul?

@preames
Copy link
Collaborator

preames commented Jun 23, 2025

Glad to see the investigation. I've stumbled into this a few times myself, and agree that the first step is trying to get some sense of how much impact fixing this would be. All of the approaches I've come up with involve significant investment, so getting a bound on the benefit seems quite worthwhile.

@wangpc-pp - Did you get any estimate of runtime impact? I realize this might be quite challenging, but even something like dynamic icount for a modified qemu might be interesting. Essentially, I'm curious as to where the spills and fills are, and whether removing some of them actually helps.

Worth noting is that once we get to implementation, there's a couple possible sub-cases we can handle without solving the whole problem. e.g. If I remember my prior investigation correctly, I think we can handle many of the mask generating instructions with some extra pseudos in the current scheme.

@wangpc-pp
Copy link
Contributor

Did you get any estimate of runtime impact? I realize this might be quite challenging, but even something like dynamic icount for a modified qemu might be interesting. Essentially, I'm curious as to where the spills and fills are, and whether removing some of them actually helps.

No, we haven't evaluated the runtime impact because as you have said, the efforts we need to make qemu runnable could be large and we just wanted a quick estimation. The overlapping constraints will only affect the RA (the pressure) so I think we can only pay attention to the spill/reload part.

I posted the changes here: #145353. I checked some of the spills/reloads, and I think some mask cases are interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants