[RISCV] Set AllocationPriority in line with LMUL #131176
Conversation
This mechanism causes the greedy register allocator to prefer allocating register classes with higher priority first. This helps to ensure that high LMUL registers obtain a register without having to go through the eviction mechanism. In practice, it seems to cause a bunch of code churn, and some minor improvement around widening and narrowing operations.

In a few of the widening tests, we have what look like code size regressions because we end up with two smaller register class copies instead of one larger one after the instruction. However, in any larger code sequence, these are likely to be folded into the producing instructions. (But so were the wider copies after the operation.)

Two observations:

1) We're not setting the greedy-regclass-priority-trumps-globalness flag on the register class, so this doesn't help long mask ranges. I thought about doing that, but the benefit is non-obvious, so I decided it was worth a separate change at minimum.

2) We could arguably set the priority higher for the register classes that exclude v0. I tried that, and it caused a whole bunch of further churn. I may return to it in a separate patch.
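For context, the knob being set in this patch is the AllocationPriority field that TableGen's RegisterClass exposes (in llvm/include/llvm/Target/Target.td); the greedy allocator ranks live ranges from higher-priority classes ahead of lower-priority ones. Below is a trimmed, paraphrased sketch of that hook, not a verbatim excerpt: the comment wording is mine, and the per-class "trumps globalness" setting mentioned in observation 1 is a separate hook that isn't shown.

```tablegen
// Paraphrased, heavily trimmed view of Target.td's RegisterClass;
// most fields and template arguments are omitted.
class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
                    dag regList> {
  // ...
  // Live ranges from classes with a higher AllocationPriority are allocated
  // first by the greedy register allocator; 0 is the default.  The RISC-V
  // vector classes in the diff below override this per LMUL.
  int AllocationPriority = 0;
  // ...
}
```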
@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes: (same as the description above)
Patch is 1.47 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131176.diff 179 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index a5dfb5ba1a2fc..1e0541e667895 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -752,18 +752,24 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
+let AllocationPriority = 2 in
def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
(sequence "V%uM2", 6, 0, 2)), 2>;
+let AllocationPriority = 2 in
def VRM2NoV0 : VReg<VM2VTs, (sub VRM2, V0M2), 2>;
+let AllocationPriority = 4 in
def VRM4 : VReg<VM4VTs, (add V8M4, V12M4, V16M4, V20M4,
V24M4, V28M4, V4M4, V0M4), 4>;
+let AllocationPriority = 4 in
def VRM4NoV0 : VReg<VM4VTs, (sub VRM4, V0M4), 4>;
+let AllocationPriority = 8 in
def VRM8 : VReg<VM8VTs, (add V8M8, V16M8, V24M8, V0M8), 8>;
+let AllocationPriority = 8 in
def VRM8NoV0 : VReg<VM8VTs, (sub VRM8, V0M8), 8>;
def VMV0 : VReg<VMaskVTs, (add V0), 1>;
diff --git a/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll b/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
index 5d588ad66b9ca..15b5698c22e81 100644
--- a/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
+++ b/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
@@ -20,10 +20,10 @@ define signext i32 @sum(ptr %a, i32 signext %n, i1 %prof.min.iters.check, <vscal
; CHECK-NEXT: ret
; CHECK-NEXT: .LBB0_4: # %vector.ph
; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
-; CHECK-NEXT: vmv.s.x v8, zero
-; CHECK-NEXT: vmv.v.i v12, 0
+; CHECK-NEXT: vmv.s.x v12, zero
+; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: vsetivli zero, 1, e32, m4, ta, ma
-; CHECK-NEXT: vredsum.vs v8, v12, v8, v0.t
+; CHECK-NEXT: vredsum.vs v8, v8, v12, v0.t
; CHECK-NEXT: vmv.x.s a0, v8
; CHECK-NEXT: ret
entry:
diff --git a/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll b/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
index 4ade6c09fe43d..ec422a8fbb928 100644
--- a/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
@@ -106,12 +106,12 @@ define <32 x i1> @fv32(ptr %p, i64 %index, i64 %tc) {
; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
; CHECK-NEXT: addi a0, a0, %lo(.LCPI8_0)
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
-; CHECK-NEXT: vle8.v v8, (a0)
-; CHECK-NEXT: vid.v v16
-; CHECK-NEXT: vsaddu.vx v16, v16, a1
-; CHECK-NEXT: vmsltu.vx v0, v16, a2
-; CHECK-NEXT: vsext.vf8 v16, v8
-; CHECK-NEXT: vsaddu.vx v8, v16, a1
+; CHECK-NEXT: vle8.v v16, (a0)
+; CHECK-NEXT: vid.v v8
+; CHECK-NEXT: vsaddu.vx v8, v8, a1
+; CHECK-NEXT: vmsltu.vx v0, v8, a2
+; CHECK-NEXT: vsext.vf8 v8, v16
+; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v16, v8, a2
; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
; CHECK-NEXT: vslideup.vi v0, v16, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll b/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
index 482cf83d540c4..496755738e6fa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
@@ -9,21 +9,21 @@ define void @test(ptr %ref_array, ptr %sad_array) {
; RV32: # %bb.0: # %entry
; RV32-NEXT: th.lwd a2, a3, (a0), 0, 3
; RV32-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
-; RV32-NEXT: vle8.v v8, (a2)
+; RV32-NEXT: vle8.v v12, (a2)
; RV32-NEXT: vsetivli zero, 16, e32, m4, ta, ma
-; RV32-NEXT: vzext.vf4 v12, v8
-; RV32-NEXT: vmv.s.x v8, zero
-; RV32-NEXT: vredsum.vs v9, v12, v8
-; RV32-NEXT: vmv.x.s a0, v9
+; RV32-NEXT: vzext.vf4 v8, v12
+; RV32-NEXT: vmv.s.x v12, zero
+; RV32-NEXT: vredsum.vs v8, v8, v12
+; RV32-NEXT: vmv.x.s a0, v8
; RV32-NEXT: th.swia a0, (a1), 4, 0
; RV32-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
-; RV32-NEXT: vle8.v v9, (a3)
-; RV32-NEXT: vmv.v.i v10, 0
+; RV32-NEXT: vle8.v v13, (a3)
+; RV32-NEXT: vmv.v.i v8, 0
; RV32-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; RV32-NEXT: vslideup.vi v9, v10, 4
+; RV32-NEXT: vslideup.vi v13, v8, 4
; RV32-NEXT: vsetivli zero, 16, e32, m4, ta, ma
-; RV32-NEXT: vzext.vf4 v12, v9
-; RV32-NEXT: vredsum.vs v8, v12, v8
+; RV32-NEXT: vzext.vf4 v8, v13
+; RV32-NEXT: vredsum.vs v8, v8, v12
; RV32-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV32-NEXT: vse32.v v8, (a1)
; RV32-NEXT: ret
@@ -32,21 +32,21 @@ define void @test(ptr %ref_array, ptr %sad_array) {
; RV64: # %bb.0: # %entry
; RV64-NEXT: th.ldd a2, a3, (a0), 0, 4
; RV64-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
-; RV64-NEXT: vle8.v v8, (a2)
+; RV64-NEXT: vle8.v v12, (a2)
; RV64-NEXT: vsetivli zero, 16, e32, m4, ta, ma
-; RV64-NEXT: vzext.vf4 v12, v8
-; RV64-NEXT: vmv.s.x v8, zero
-; RV64-NEXT: vredsum.vs v9, v12, v8
-; RV64-NEXT: vmv.x.s a0, v9
+; RV64-NEXT: vzext.vf4 v8, v12
+; RV64-NEXT: vmv.s.x v12, zero
+; RV64-NEXT: vredsum.vs v8, v8, v12
+; RV64-NEXT: vmv.x.s a0, v8
; RV64-NEXT: th.swia a0, (a1), 4, 0
; RV64-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
-; RV64-NEXT: vle8.v v9, (a3)
-; RV64-NEXT: vmv.v.i v10, 0
+; RV64-NEXT: vle8.v v13, (a3)
+; RV64-NEXT: vmv.v.i v8, 0
; RV64-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; RV64-NEXT: vslideup.vi v9, v10, 4
+; RV64-NEXT: vslideup.vi v13, v8, 4
; RV64-NEXT: vsetivli zero, 16, e32, m4, ta, ma
-; RV64-NEXT: vzext.vf4 v12, v9
-; RV64-NEXT: vredsum.vs v8, v12, v8
+; RV64-NEXT: vzext.vf4 v8, v13
+; RV64-NEXT: vredsum.vs v8, v8, v12
; RV64-NEXT: vsetivli zero, 1, e32, m1, ta, ma
; RV64-NEXT: vse32.v v8, (a1)
; RV64-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll b/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
index 1845c0e4bd3b6..7649d9ad6059f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
@@ -8,10 +8,11 @@ define dso_local <16 x i16> @interleave(<8 x i16> %v0, <8 x i16> %v1) {
; CHECK-LABEL: interleave:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT: vwaddu.vv v10, v8, v9
+; CHECK-NEXT: vmv1r.v v10, v9
+; CHECK-NEXT: vmv1r.v v11, v8
+; CHECK-NEXT: vwaddu.vv v8, v11, v10
; CHECK-NEXT: li a0, -1
-; CHECK-NEXT: vwmaccu.vx v10, a0, v9
-; CHECK-NEXT: vmv2r.v v8, v10
+; CHECK-NEXT: vwmaccu.vx v8, a0, v10
; CHECK-NEXT: ret
entry:
%v2 = shufflevector <8 x i16> %v0, <8 x i16> poison, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef, i32 4, i32 undef, i32 5, i32 undef, i32 6, i32 undef, i32 7, i32 undef>
diff --git a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
index 61fb457a7eb65..69822e9d9d2e3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
@@ -200,12 +200,12 @@ define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data
; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
; RV64-NEXT: vmv1r.v v7, v8
; RV64-NEXT: li a2, 128
-; RV64-NEXT: vslidedown.vi v9, v0, 1
+; RV64-NEXT: vslidedown.vi v8, v0, 1
; RV64-NEXT: vmv.x.s a3, v0
; RV64-NEXT: vsetvli zero, a2, e8, m8, ta, ma
; RV64-NEXT: vle8.v v24, (a1)
; RV64-NEXT: vsetvli zero, a2, e64, m1, ta, ma
-; RV64-NEXT: vmv.x.s a1, v9
+; RV64-NEXT: vmv.x.s a1, v8
; RV64-NEXT: vsetvli zero, a2, e8, m8, ta, ma
; RV64-NEXT: vcompress.vm v8, v16, v0
; RV64-NEXT: vcpop.m a4, v0
@@ -227,14 +227,14 @@ define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data
; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
; RV32-NEXT: vmv1r.v v7, v8
; RV32-NEXT: li a2, 128
-; RV32-NEXT: vslidedown.vi v9, v0, 1
+; RV32-NEXT: vslidedown.vi v8, v0, 1
; RV32-NEXT: li a3, 32
; RV32-NEXT: vmv.x.s a4, v0
; RV32-NEXT: vsetvli zero, a2, e8, m8, ta, ma
; RV32-NEXT: vle8.v v24, (a1)
; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
-; RV32-NEXT: vsrl.vx v6, v9, a3
-; RV32-NEXT: vmv.x.s a1, v9
+; RV32-NEXT: vsrl.vx v6, v8, a3
+; RV32-NEXT: vmv.x.s a1, v8
; RV32-NEXT: vsrl.vx v5, v0, a3
; RV32-NEXT: vsetvli zero, a2, e8, m8, ta, ma
; RV32-NEXT: vcompress.vm v8, v16, v0
@@ -438,16 +438,16 @@ define void @test_compresstore_v128i16(ptr %p, <128 x i1> %mask, <128 x i16> %da
; RV64-NEXT: vcompress.vm v24, v8, v0
; RV64-NEXT: vcpop.m a2, v0
; RV64-NEXT: vsetivli zero, 8, e8, m1, ta, ma
-; RV64-NEXT: vslidedown.vi v8, v0, 8
+; RV64-NEXT: vslidedown.vi v7, v0, 8
; RV64-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; RV64-NEXT: vcompress.vm v0, v16, v8
-; RV64-NEXT: vcpop.m a1, v8
+; RV64-NEXT: vcompress.vm v8, v16, v7
+; RV64-NEXT: vcpop.m a1, v7
; RV64-NEXT: vsetvli zero, a2, e16, m8, ta, ma
; RV64-NEXT: vse16.v v24, (a0)
; RV64-NEXT: slli a2, a2, 1
; RV64-NEXT: add a0, a0, a2
; RV64-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; RV64-NEXT: vse16.v v0, (a0)
+; RV64-NEXT: vse16.v v8, (a0)
; RV64-NEXT: ret
;
; RV32-LABEL: test_compresstore_v128i16:
@@ -635,16 +635,16 @@ define void @test_compresstore_v64i32(ptr %p, <64 x i1> %mask, <64 x i32> %data)
; RV64-NEXT: vsetvli zero, a2, e32, m8, ta, ma
; RV64-NEXT: vse32.v v24, (a0)
; RV64-NEXT: vsetivli zero, 4, e8, mf2, ta, ma
-; RV64-NEXT: vslidedown.vi v8, v0, 4
+; RV64-NEXT: vslidedown.vi v24, v0, 4
; RV64-NEXT: vsetvli zero, a1, e32, m8, ta, ma
; RV64-NEXT: vmv.x.s a1, v0
-; RV64-NEXT: vcompress.vm v24, v16, v8
-; RV64-NEXT: vcpop.m a2, v8
+; RV64-NEXT: vcompress.vm v8, v16, v24
+; RV64-NEXT: vcpop.m a2, v24
; RV64-NEXT: cpopw a1, a1
; RV64-NEXT: slli a1, a1, 2
; RV64-NEXT: add a0, a0, a1
; RV64-NEXT: vsetvli zero, a2, e32, m8, ta, ma
-; RV64-NEXT: vse32.v v24, (a0)
+; RV64-NEXT: vse32.v v8, (a0)
; RV64-NEXT: ret
;
; RV32-LABEL: test_compresstore_v64i32:
@@ -654,16 +654,16 @@ define void @test_compresstore_v64i32(ptr %p, <64 x i1> %mask, <64 x i32> %data)
; RV32-NEXT: vcompress.vm v24, v8, v0
; RV32-NEXT: vcpop.m a2, v0
; RV32-NEXT: vsetivli zero, 4, e8, mf2, ta, ma
-; RV32-NEXT: vslidedown.vi v8, v0, 4
+; RV32-NEXT: vslidedown.vi v7, v0, 4
; RV32-NEXT: vsetvli zero, a1, e32, m8, ta, ma
-; RV32-NEXT: vcompress.vm v0, v16, v8
-; RV32-NEXT: vcpop.m a1, v8
+; RV32-NEXT: vcompress.vm v8, v16, v7
+; RV32-NEXT: vcpop.m a1, v7
; RV32-NEXT: vsetvli zero, a2, e32, m8, ta, ma
; RV32-NEXT: vse32.v v24, (a0)
; RV32-NEXT: slli a2, a2, 2
; RV32-NEXT: add a0, a0, a2
; RV32-NEXT: vsetvli zero, a1, e32, m8, ta, ma
-; RV32-NEXT: vse32.v v0, (a0)
+; RV32-NEXT: vse32.v v8, (a0)
; RV32-NEXT: ret
entry:
tail call void @llvm.masked.compressstore.v64i32(<64 x i32> %data, ptr align 4 %p, <64 x i1> %mask)
@@ -796,18 +796,18 @@ define void @test_compresstore_v32i64(ptr %p, <32 x i1> %mask, <32 x i64> %data)
; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, ma
; RV64-NEXT: vse64.v v24, (a0)
; RV64-NEXT: vsetivli zero, 2, e8, mf4, ta, ma
-; RV64-NEXT: vslidedown.vi v8, v0, 2
+; RV64-NEXT: vslidedown.vi v24, v0, 2
; RV64-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
; RV64-NEXT: vmv.x.s a1, v0
; RV64-NEXT: vsetivli zero, 16, e64, m8, ta, ma
-; RV64-NEXT: vcompress.vm v24, v16, v8
+; RV64-NEXT: vcompress.vm v8, v16, v24
; RV64-NEXT: zext.h a1, a1
; RV64-NEXT: cpopw a1, a1
; RV64-NEXT: slli a1, a1, 3
; RV64-NEXT: add a0, a0, a1
-; RV64-NEXT: vcpop.m a1, v8
+; RV64-NEXT: vcpop.m a1, v24
; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, ma
-; RV64-NEXT: vse64.v v24, (a0)
+; RV64-NEXT: vse64.v v8, (a0)
; RV64-NEXT: ret
;
; RV32-LABEL: test_compresstore_v32i64:
@@ -818,18 +818,18 @@ define void @test_compresstore_v32i64(ptr %p, <32 x i1> %mask, <32 x i64> %data)
; RV32-NEXT: vsetvli zero, a1, e64, m8, ta, ma
; RV32-NEXT: vse64.v v24, (a0)
; RV32-NEXT: vsetivli zero, 2, e8, mf4, ta, ma
-; RV32-NEXT: vslidedown.vi v8, v0, 2
+; RV32-NEXT: vslidedown.vi v24, v0, 2
; RV32-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
; RV32-NEXT: vmv.x.s a1, v0
; RV32-NEXT: vsetivli zero, 16, e64, m8, ta, ma
-; RV32-NEXT: vcompress.vm v24, v16, v8
+; RV32-NEXT: vcompress.vm v8, v16, v24
; RV32-NEXT: zext.h a1, a1
; RV32-NEXT: cpop a1, a1
; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a0, a0, a1
-; RV32-NEXT: vcpop.m a1, v8
+; RV32-NEXT: vcpop.m a1, v24
; RV32-NEXT: vsetvli zero, a1, e64, m8, ta, ma
-; RV32-NEXT: vse64.v v24, (a0)
+; RV32-NEXT: vse64.v v8, (a0)
; RV32-NEXT: ret
entry:
tail call void @llvm.masked.compressstore.v32i64(<32 x i64> %data, ptr align 8 %p, <32 x i1> %mask)
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
index 208735b18cbab..97e1a7f41b92f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
@@ -162,12 +162,12 @@ define <vscale x 4 x i8> @ctlz_nxv4i8(<vscale x 4 x i8> %va) {
; CHECK-F-LABEL: ctlz_nxv4i8:
; CHECK-F: # %bb.0:
; CHECK-F-NEXT: vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-F-NEXT: vzext.vf2 v9, v8
+; CHECK-F-NEXT: vzext.vf2 v10, v8
; CHECK-F-NEXT: li a0, 134
-; CHECK-F-NEXT: vfwcvt.f.xu.v v10, v9
-; CHECK-F-NEXT: vnsrl.wi v8, v10, 23
+; CHECK-F-NEXT: vfwcvt.f.xu.v v8, v10
+; CHECK-F-NEXT: vnsrl.wi v10, v8, 23
; CHECK-F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-F-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-F-NEXT: vnsrl.wi v8, v10, 0
; CHECK-F-NEXT: vrsub.vx v8, v8, a0
; CHECK-F-NEXT: li a0, 8
; CHECK-F-NEXT: vminu.vx v8, v8, a0
@@ -176,12 +176,12 @@ define <vscale x 4 x i8> @ctlz_nxv4i8(<vscale x 4 x i8> %va) {
; CHECK-D-LABEL: ctlz_nxv4i8:
; CHECK-D: # %bb.0:
; CHECK-D-NEXT: vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-D-NEXT: vzext.vf2 v9, v8
+; CHECK-D-NEXT: vzext.vf2 v10, v8
; CHECK-D-NEXT: li a0, 134
-; CHECK-D-NEXT: vfwcvt.f.xu.v v10, v9
-; CHECK-D-NEXT: vnsrl.wi v8, v10, 23
+; CHECK-D-NEXT: vfwcvt.f.xu.v v8, v10
+; CHECK-D-NEXT: vnsrl.wi v10, v8, 23
; CHECK-D-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-D-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-D-NEXT: vnsrl.wi v8, v10, 0
; CHECK-D-NEXT: vrsub.vx v8, v8, a0
; CHECK-D-NEXT: li a0, 8
; CHECK-D-NEXT: vminu.vx v8, v8, a0
@@ -225,13 +225,13 @@ define <vscale x 8 x i8> @ctlz_nxv8i8(<vscale x 8 x i8> %va) {
; CHECK-F-LABEL: ctlz_nxv8i8:
; CHECK-F: # %bb.0:
; CHECK-F-NEXT: vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-F-NEXT: vzext.vf2 v10, v8
+; CHECK-F-NEXT: vzext.vf2 v12, v8
; CHECK-F-NEXT: li a0, 134
-; CHECK-F-NEXT: vfwcvt.f.xu.v v12, v10
-; CHECK-F-NEXT: vnsrl.wi v8, v12, 23
+; CHECK-F-NEXT: vfwcvt.f.xu.v v8, v12
+; CHECK-F-NEXT: vnsrl.wi v12, v8, 23
; CHECK-F-NEXT: vsetvli zero, zero, e8, m1, ta, ma
-; CHECK-F-NEXT: vnsrl.wi v10, v8, 0
-; CHECK-F-NEXT: vrsub.vx v8, v10, a0
+; CHECK-F-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-F-NEXT: vrsub.vx v8, v8, a0
; CHECK-F-NEXT: li a0, 8
; CHECK-F-NEXT: vminu.vx v8, v8, a0
; CHECK-F-NEXT: ret
@@ -239,13 +239,13 @@ define <vscale x 8 x i8> @ctlz_nxv8i8(<vscale x 8 x i8> %va) {
; CHECK-D-LABEL: ctlz_nxv8i8:
; CHECK-D: # %bb.0:
; CHECK-D-NEXT: vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-D-NEXT: vzext.vf2 v10, v8
+; CHECK-D-NEXT: vzext.vf2 v12, v8
; CHECK-D-NEXT: li a0, 134
-; CHECK-D-NEXT: vfwcvt.f.xu.v v12, v10
-; CHECK-D-NEXT: vnsrl.wi v8, v12, 23
+; CHECK-D-NEXT: vfwcvt.f.xu.v v8, v12
+; CHECK-D-NEXT: vnsrl.wi v12, v8, 23
; CHECK-D-NEXT: vsetvli zero, zero, e8, m1, ta, ma
-; CHECK-D-NEXT: vnsrl.wi v10, v8, 0
-; CHECK-D-NEXT: vrsub.vx v8, v10, a0
+; CHECK-D-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-D-NEXT: vrsub.vx v8, v8, a0
; CHECK-D-NEXT: li a0, 8
; CHECK-D-NEXT: vminu.vx v8, v8, a0
; CHECK-D-NEXT: ret
@@ -288,13 +288,13 @@ define <vscale x 16 x i8> @ctlz_nxv16i8(<vscale x 16 x i8> %va) {
; CHECK-F-LABEL: ctlz_nxv16i8:
; CHECK-F: # %bb.0:
; CHECK-F-NEXT: vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-F-NEXT: vzext.vf2 v12, v8
+; CHECK-F-NEXT: vzext.vf2 v16, v8
; CHECK-F-NEXT: li a0, 134
-; CHECK-F-NEXT: vfwcvt.f.xu.v v16, v12
-; CHECK-F-NEXT: vnsrl.wi v8, v16, 23
+; CHECK-F-NEXT: vfwcvt.f.xu.v v8, v16
+; CHECK-F-NEXT: vnsrl.wi v16, v8, 23
; CHECK-F-NEXT: vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-F-NEXT: vnsrl.wi v12, v8, 0
-; CHECK-F-NEXT: vrsub.vx v8, v12, a0
+; CHECK-F-NEXT: vnsrl.wi v8, v16, 0
+; CHECK-F-NEXT: vrsub.vx v8, v8, a0
; CHECK-F-NEXT: li a0, 8
; CHECK-F-NEXT: vminu.vx v8, v8, a0
; CHECK-F-NEXT: ret
@@ -302,13 +302,13 @@ define <vscale x 16 x i8> @ctlz_nxv16i8(<vscale x 16 x i8> %va) {
; CHECK-D-LABEL: ctlz_nxv16i8:
; CHECK-D: # %bb.0:
; CHECK-D-NEXT: vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-D-NEXT: vzext.vf2 v12, v8
+; CHECK-D-NEXT: vzext.vf2 v16, v8
; CHECK-D-NEXT: li a0, 134
-; CHECK-D-NEXT: vfwcvt.f.xu.v v16, v12
-; CHECK-D-NEXT: vnsrl.wi v8, v16, 23
+; CHECK-D-NEXT: vfwcvt.f.xu.v v8, v16
+; CHECK-D-NEXT: vnsrl.wi v16, v8, 23
; CHECK-D-NEXT: vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-D-NEXT: vnsrl.wi v12, v8, 0
-; CHECK-D-NEXT: vrsub.vx v8, v12, a0
+; CHECK-D-NEXT: vnsrl.wi v8, v16, 0
+; CHECK-D-NEXT: vrsub.vx v8, v8, a0
; CHECK-D-NEXT: li a0, 8
; CHECK-D-NEXT: vminu.vx v8, v8, a0
; CHECK-D-NEXT: ret
@@ -1375,12 +1375,12 @@ define <vscale x 2 x i64> @ctlz_nxv2i64(<vscale x 2 x i64> %va) {
; CHECK-F-NEXT: fsrmi a1, 1
; CHECK-F-NEXT: vsetvli a2, zero, e32, m1, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v10, v8
-; CHECK-F-NEXT: vmv.v.x v8, a0
-; CHECK-F-NEXT: vsrl.vi v9, v10, 23
-; CHECK-F-NEXT: vwsubu.vv v10, v8, v9
+; CHECK-F-NEXT: vmv.v.x v11, a0
+; CHECK-F-NEXT: vsrl.vi v10, v10, 23
+; CHECK-F-NEXT: vwsubu.vv v8, v11, v10
; CHECK-F-NEXT: li a0, 64
; CHECK-F-NEXT: vsetvli zero, zero, e64, m2, ta, ma
-; CHECK-F-NEXT: vminu.vx v8, v10, a0
+; CHECK-F-NEXT: vminu.vx v8, v8, a0
; CHECK-F-NEXT: fsrm a1
; CHECK-F-NEXT: ret
;
@@ -1515,12 +1515,12 @@ define <vscale x 4 x i64> @ctlz_nxv4i64(<vscale x 4 x i64> %va) {
; CHECK-F-NEXT: fsrmi a1, 1
; CHECK-F-NEXT: vsetvli a2, zero, e32, m2, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v12, v8
-; CHECK-F-NEXT: vmv.v.x v8, a0
-; CHECK-F-NEXT: vsrl.vi v10, v12, 23
-; CHECK-F-NEXT: vwsubu.vv v12, v8, v10
+; CHECK-F-NEXT: vmv.v.x v14, a0
+; CHECK-F-NEXT: vsrl.vi v12, v12, 23
+; CHECK-F-NEXT: vwsubu.vv v8, v14, v12
; CHECK-F-NEXT: li a0, 64
; CHECK-F-NEXT: vsetvli zero, zero, e64, m4, ta, ma
-; CHECK-F-NEXT: vminu.vx v8, v12, a0
+; CHECK-F-NEXT: vminu.vx v8, v8, a0
; CHECK-F-NEXT: fsrm a1
; CHECK-F-NEXT: ret
;
@@ -1655,12 +1655,12 @@ define <vscale x 8 x i64> @ctlz_nxv8i64(<vscale x 8 x i64> %va) {
; CHECK-F-NEXT: fsrmi a1, 1
; CHECK-F-NEXT: vsetvli a2, zero, e32, m4, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v16, v8
-; CHECK-F-NEXT: vmv.v.x v8, a0
-; CHECK-F-NEXT: vsrl.vi v12, v16, 23
-; CHECK-F-NEXT: vwsubu.vv v16, v8, v12
+; CHECK-F-NEXT: vmv.v.x v20, a0
+; CHECK-F-NEXT: vsrl.vi v16, v16, 23
+; CHECK-F-NEXT: vwsubu.vv v8, v20, v16
; CHECK-F-NEXT: li a0, 64
; CHECK-F-NEXT: vsetvli zero, zero, e64, m8, ta, ma
-; CHECK-F-NEXT: vminu.vx v8, v16, a0
+; CHECK-F-NEXT: vminu.vx v8, v8, a0
; CHECK-F-NEXT: fsrm a1
; CHECK-F-NEXT: ret
;
@@ -1832,11 +1832,11 @@ define <vscale x 4 x i8> @ctlz_zero_undef_nxv4i8(<vscale x 4 x i8> %va) {
; CHECK-F-LABEL: ctlz_zero_undef_nxv4i8:
; CHECK-F: # %bb.0:
; CHECK-F-NEXT: vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-F-NEXT: vzext.vf2 v9, v8
-; CHECK-F-NEXT: vfwcvt.f.xu.v v10, v9
-; CHECK-F-NEXT: vnsrl.wi v8, v10, 23
+; CHECK-F-NEXT: vzext.vf2 v10, v8
+; CHECK-F-NEXT: vfwcvt.f.xu.v v8, v10
+; CHECK-F-NEXT: vnsrl.wi v10, v8, 23
; CHECK-F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-F-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-F-NEXT: vnsrl.wi v8, v10, 0
; CHECK-F-NEXT: li a0, 134
; CHECK-F-NEXT: vrsu...
[truncated]
@@ -752,18 +752,24 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),

def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;

let AllocationPriority = 2 in
Can we do this inside of VReg using the lmul?
In the current change, yes. I'd done it this way because I'd originally planned to have the NoV0 cases have different values. I'll switch.
Doing this revealed that I hadn't handled the segment tuple register classes. I decided to treat those as having the same priority as their lmul component; that is, I ignored NF. There might be a better heuristic here.
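To make the discussion above concrete, here is a minimal sketch of what deriving the priority inside VReg from LMUL could look like. The base class and field set are simplified (the real class derives from a RISC-V-specific RegisterClass wrapper), and the VRegTuple class is purely hypothetical (the actual segment register classes are defined differently), but the priority values match the ones in the posted diff and the NF-is-ignored choice described above.

```tablegen
// Sketch: compute the priority once from LMUL instead of repeating
// "let AllocationPriority = N in" at every def.  Base class simplified.
class VReg<list<ValueType> regTypes, dag regList, int Vlmul>
    : RegisterClass<"RISCV", regTypes, 64, regList> {
  int VLMul = Vlmul;
  // LMUL 1 keeps the default priority of 0; LMUL 2/4/8 become 2/4/8,
  // matching the values in the diff above.
  let AllocationPriority = !if(!gt(Vlmul, 1), Vlmul, 0);
}

// Hypothetical segment-tuple class: the priority follows the LMUL of each
// component register and ignores NF, as described in the comment above.
class VRegTuple<list<ValueType> regTypes, dag regList, int NF, int Vlmul>
    : RegisterClass<"RISCV", regTypes, 64, regList> {
  let AllocationPriority = !if(!gt(Vlmul, 1), Vlmul, 0);
}
```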
Thanks for looking at this! I tried this before but I can't remember why I dropped it (maybe it was simply because setting AllocationPriority couldn't fix #113489).
I do see some improvements and regressions, but I don't think they are significant; this PR's value may lie in improving compile time, since it avoids the later eviction? @lukel97 Can you also evaluate the performance please?
; CHECK-RV32-NEXT: sub sp, sp, a1
; CHECK-RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 24 * vlenb
; CHECK-RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 32 * vlenb |
Regression.
; RV32-NEXT: mul a2, a2, a3
; RV32-NEXT: sub sp, sp, a2
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe4, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 100 * vlenb
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe0, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 96 * vlenb |
Improvement.
; RV32-NEXT: mul a2, a2, a3
; RV32-NEXT: add a2, sp, a2
; RV32-NEXT: addi a2, a2, 16
; RV32-NEXT: vs2r.v v14, (a2) # Unknown-size Folded Spill |
Uses less memory but causes more spills?
I've kicked off a run on a Banana Pi now; it should be done over the weekend.
Results on rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/311
Looks like we saw a small improvement in x264 and not much else. That's actually a bit better than I'd expected; definitely nothing problematic at least.
I'd like to see this landed.
LGTM, but please wait for one more approval since this changes a lot.
The cost of a vector spill/reload may vary highly depending on the size of the vector register being spilled, i.e. LMUL, so the usual regalloc.NumSpills/regalloc.NumReloads statistics may not be an accurate reflection of the total cost. This adds two new statistics for RISCVInstrInfo that collect the total LMUL for vector register spills and reloads. It can be used to get a better idea of regalloc changes in e.g. llvm#131176 llvm#113675
This seems to increase the overall LMUL spilled on 538.imagick_r:
Program riscv-instr-info.TotalLMULSpilled riscv-instr-info.TotalLMULReloaded
lhs rhs diff lhs rhs diff
FP2017rate/538.imagick_r/538.imagick_r 4239.00 5082.00 19.9% 6697.00 7321.00 9.3%
FP2017speed/638.imagick_s/638.imagick_s 4239.00 5082.00 19.9% 6697.00 7321.00 9.3%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s 132.00 134.00 1.5% 274.00 248.00 -9.5%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r 132.00 134.00 1.5% 274.00 248.00 -9.5%
INT2017rate/520.omnetpp_r/520.omnetpp_r 4.00 4.00 0.0% 5.00 5.00 0.0%
INT2017speed/625.x264_s/625.x264_s 83.00 83.00 0.0% 93.00 93.00 0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s 6.00 6.00 0.0% 6.00 6.00 0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 4.00 4.00 0.0% 5.00 5.00 0.0%
INT2017speed/602.gcc_s/602.gcc_s 85.00 85.00 0.0% 91.00 91.00 0.0%
INT2017spe...00.perlbench_s/600.perlbench_s 4.00 4.00 0.0% 8.00 8.00 0.0%
INT2017rate/525.x264_r/525.x264_r 83.00 83.00 0.0% 93.00 93.00 0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r 6.00 6.00 0.0% 6.00 6.00 0.0%
FP2017rate/508.namd_r/508.namd_r 1.00 1.00 0.0% 6.00 6.00 0.0%
FP2017rate/510.parest_r/510.parest_r 1084.00 1084.00 0.0% 1368.00 1368.00 0.0%
INT2017rat...00.perlbench_r/500.perlbench_r 4.00 4.00 0.0% 8.00 8.00 0.0%
FP2017speed/644.nab_s/644.nab_s 25.00 25.00 0.0% 25.00 25.00 0.0%
FP2017speed/619.lbm_s/619.lbm_s 38.00 38.00 0.0% 38.00 38.00 0.0%
FP2017rate/544.nab_r/544.nab_r 25.00 25.00 0.0% 25.00 25.00 0.0%
FP2017rate/519.lbm_r/519.lbm_r 38.00 38.00 0.0% 38.00 38.00 0.0%
FP2017rate/511.povray_r/511.povray_r 121.00 121.00 0.0% 138.00 138.00 0.0%
INT2017rate/502.gcc_r/502.gcc_r 85.00 85.00 0.0% 91.00 91.00 0.0%
FP2017rate/526.blender_r/526.blender_r 1159.00 1155.00 -0.3% 1292.00 1298.00 0.5%
With that said, though, there's no actual impact on the runtime performance of that benchmark, so I presume these spills are on the cold path. I looked at the codegen changes and the code there is pretty bad to begin with anyway.
I'm just flagging this FYI; I'm happy for this to land in any case.
I wrote this down to follow up on. At a minimum, it may be an interesting register allocation case I can learn something from.
The cost of a vector spill/reload may vary highly depending on the size of the vector register being spilled, i.e. LMUL, so the usual regalloc.NumSpills/regalloc.NumReloads statistics may not be an accurate reflection of the total cost. This adds two new statistics for RISCVInstrInfo that collect the total number of vector registers spilled/reloaded within groups. It can be used to get a better idea of regalloc changes in e.g. #131176 #113675
This mechanism causes the greedy register allocator to prefer allocating register classes with higher priority first. This helps to ensure that high LMUL registers obtain a register without having to go through the eviction mechanism. In practice, it seems to cause a bunch of code churn, and some minor improvement around widening and narrowing operations.
In a few of the widening tests, we have what look like code size regressions because we end up with two smaller register class copies instead of one larger one after the instruction. However, in any larger code sequence, these are likely to be folded into the producing instructions. (But so were the wider copies after the operation.)
Two observations:
1) We're not setting the greedy-regclass-priority-trumps-globalness flag
   on the register class, so this doesn't help long mask ranges. I
   thought about doing that, but the benefit is non-obvious, so I
   decided it was worth a separate change at minimum.
2) We could arguably set the priority higher for the register classes
   that exclude v0 (sketched below). I tried that, and it caused a whole bunch of
   further churn. I may return to it in a separate patch.
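As a purely hypothetical illustration of observation 2 (not something this patch does), bumping the NoV0 classes slightly above their unconstrained counterparts, on top of the per-def form in the posted diff, might look like the following. The specific values (3/5/9) are arbitrary and only need to stay within the allocator's supported priority range.

```tablegen
// Hypothetical only: prefer the constrained NoV0 classes a little, so values
// that must avoid v0 get first pick at a register group.  This patch
// deliberately does not do this; trying it caused a lot of further churn.
let AllocationPriority = 3 in
def VRM2NoV0 : VReg<VM2VTs, (sub VRM2, V0M2), 2>;
let AllocationPriority = 5 in
def VRM4NoV0 : VReg<VM4VTs, (sub VRM4, V0M4), 4>;
let AllocationPriority = 9 in
def VRM8NoV0 : VReg<VM8VTs, (sub VRM8, V0M8), 8>;
```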