[RISCV] Set AllocationPriority in line with LMUL #131176

Merged
merged 3 commits into llvm:main from preames:pr-riscv-allocation-priority on Mar 18, 2025

Conversation

preames (Collaborator) commented Mar 13, 2025

This mechanism causes the greedy register allocator to allocate virtual registers from higher-priority register classes first. This helps ensure that high-LMUL virtual registers obtain a register group without having to go through the eviction mechanism. In practice, it seems to mostly cause churn in the generated code, along with some minor improvements around widening and narrowing operations.

In a few of the widening tests, we see what look like code-size regressions because we end up with two copies from a smaller register class instead of one copy from a larger one after the instruction. However, in any larger code sequence, these copies are likely to be folded into the producing instructions. (But so were the wider copies after the operation.)

Two observations:

  1. We're not setting the greedy-regclass-priority-trumps-globalness flag
     on the register class, so this doesn't help long mask ranges. I
     thought about doing that, but the benefit is non-obvious, so I
     decided it deserves a separate change at a minimum.
  2. We could arguably set the priority higher for the register classes
     that exclude v0. I tried that, and it caused a lot of further
     churn. I may return to it in a separate patch.
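For readers less familiar with the field, here is a minimal sketch of the mechanism described above. The two defs are taken from this patch's diff (shown further below); the commented-out bit is the separate per-class flag from observation 1, spelled here on the assumption that it matches the TableGen field name behind the flag, and it is deliberately not set by this change.

// Give larger (higher-LMUL) vector register classes a higher
// AllocationPriority so the greedy allocator assigns their live ranges
// first, before smaller classes fragment the vector register file.
// Only the relative ordering of the priorities matters.
let AllocationPriority = 2 in
def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
                             (sequence "V%uM2", 6, 0, 2)), 2>;

let AllocationPriority = 8 in
def VRM8 : VReg<VM8VTs, (add V8M8, V16M8, V24M8, V0M8), 8>;

// Not set by this patch (see observation 1):
// let GreedyRegClassPriorityTrumpsGlobalness = true;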

llvmbot (Member) commented Mar 13, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes


Patch is 1.47 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131176.diff

179 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+6)
  • (modified) llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll (+4-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/compressstore.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll (+68-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll (+46-46)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-sdnode.ll (+103-103)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll (+65-65)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+646-636)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll (+116-116)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-conv.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll (+39-27)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-setcc.ll (+108-108)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll (+728-728)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i-sat.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpext-vp.ll (+8-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fptosi-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fptoui-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-i2fp.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-exttrunc.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll (+48-33)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll (+23-23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+252-231)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-inttoptr-ptrtoint.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-llrint-vp.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-llrint.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-lrint-vp.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-lrint.ll (+38-38)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-sad.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scalarized.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll (+274-258)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-sext-vp.ll (+18-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll (+138-138)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-rotate.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shufflevector-vnsrl.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-sitofp-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-uitofp-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfadd-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfdiv-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmax.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmin.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmul-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfpext-constrained-sdnode.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfptoi-constrained-sdnode.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfsub-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwmacc.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vitofp-constrained-sdnode.ll (+26-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vrol.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vror.ll (+38-38)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd-mask.ll (+17-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsll.ll (+147-141)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsub-mask.ll (+13-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-zext-vp.ll (+18-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/float-round-conv.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fptosi-sat.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fptoui-sat.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/half-round-conv.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/interleave-crash.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/intrinsic-vector-match.ll (+90-90)
  • (modified) llvm/test/CodeGen/RISCV/rvv/llrint-sdnode.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/llrint-vp.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/lrint-sdnode.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/lrint-vp.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/mgather-sdnode.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/named-vector-shuffle-reverse.ll (+140-140)
  • (modified) llvm/test/CodeGen/RISCV/rvv/narrow-shift-extend.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/pr61561.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/pr95865.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+66-64)
  • (modified) llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vcpop-shl-zext-opt.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+58-58)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll (+58-50)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll (+394-376)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vexts-sdnode.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfadd-constrained-sdnode.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfadd-vp.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfcopysign-sdnode.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfdiv-constrained-sdnode.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfdiv-vp.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll (+537-659)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll (+25-57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmul-constrained-sdnode.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmul-vp.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll (+17-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfpext-constrained-sdnode.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfpext-sdnode.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfpext-vp.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptoi-constrained-sdnode.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptoi-sdnode.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptosi-vp.ll (+9-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptoui-vp.ll (+9-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptrunc-vp.ll (+17-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfsub-constrained-sdnode.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfsub-vp.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwadd-sdnode.ll (+21-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwadd.ll (+54-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwadd.w.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-f-f.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-f-x.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-f-xu.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-rtz-x-f.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-rtz-xu-f.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-x-f.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvt-xu-f.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwcvtbf16-f-f.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmacc-vp.ll (+38-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmsac-vp.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmul-sdnode.ll (+21-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwmul.ll (+54-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmacc-vp.ll (+56-58)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmsac-vp.ll (+56-58)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwsub-sdnode.ll (+21-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwsub.ll (+54-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfwsub.w.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vitofp-constrained-sdnode.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vitofp-sdnode.ll (+56-56)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt-instrs.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vloxei.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vluxei.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vp-inttoptr-ptrtoint.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vp-vector-interleaved-access.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vpgather-sdnode.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vpmerge-sdnode.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vrol-sdnode.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vror-sdnode.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsext-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsext.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsitofp-vp.ll (+9-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vtrunc-vp.ll (+12-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vuitofp-vp.ll (+9-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd-mask-sdnode.ll (+17-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd-sdnode.ll (+58-52)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd.w.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwaddu.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwaddu.w.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwmul-sdnode.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwmul.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwmulsu.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwmulu.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsll-sdnode.ll (+132-126)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsll-vp.ll (+111-105)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsll.ll (+63-54)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsub-mask-sdnode.ll (+13-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsub-sdnode.ll (+54-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsub.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsub.w.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsubu.ll (+45-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsubu.w.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vzext-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vzext.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/zvbb-demanded-bits.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+13-13)
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index a5dfb5ba1a2fc..1e0541e667895 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -752,18 +752,24 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
 
 def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
 
+let AllocationPriority = 2 in
 def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
                              (sequence "V%uM2", 6, 0, 2)), 2>;
 
+let AllocationPriority = 2 in
 def VRM2NoV0 : VReg<VM2VTs, (sub VRM2, V0M2), 2>;
 
+let AllocationPriority = 4 in
 def VRM4 : VReg<VM4VTs, (add V8M4, V12M4, V16M4, V20M4,
                              V24M4, V28M4, V4M4, V0M4), 4>;
 
+let AllocationPriority = 4 in
 def VRM4NoV0 : VReg<VM4VTs, (sub VRM4, V0M4), 4>;
 
+let AllocationPriority = 8 in
 def VRM8 : VReg<VM8VTs, (add V8M8, V16M8, V24M8, V0M8), 8>;
 
+let AllocationPriority = 8 in
 def VRM8NoV0 : VReg<VM8VTs, (sub VRM8, V0M8), 8>;
 
 def VMV0 : VReg<VMaskVTs, (add V0), 1>;
diff --git a/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll b/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
index 5d588ad66b9ca..15b5698c22e81 100644
--- a/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
+++ b/llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll
@@ -20,10 +20,10 @@ define signext i32 @sum(ptr %a, i32 signext %n, i1 %prof.min.iters.check, <vscal
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB0_4: # %vector.ph
 ; CHECK-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
-; CHECK-NEXT:    vmv.s.x v8, zero
-; CHECK-NEXT:    vmv.v.i v12, 0
+; CHECK-NEXT:    vmv.s.x v12, zero
+; CHECK-NEXT:    vmv.v.i v8, 0
 ; CHECK-NEXT:    vsetivli zero, 1, e32, m4, ta, ma
-; CHECK-NEXT:    vredsum.vs v8, v12, v8, v0.t
+; CHECK-NEXT:    vredsum.vs v8, v8, v12, v0.t
 ; CHECK-NEXT:    vmv.x.s a0, v8
 ; CHECK-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll b/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
index 4ade6c09fe43d..ec422a8fbb928 100644
--- a/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
@@ -106,12 +106,12 @@ define <32 x i1> @fv32(ptr %p, i64 %index, i64 %tc) {
 ; CHECK-NEXT:    lui a0, %hi(.LCPI8_0)
 ; CHECK-NEXT:    addi a0, a0, %lo(.LCPI8_0)
 ; CHECK-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a0)
-; CHECK-NEXT:    vid.v v16
-; CHECK-NEXT:    vsaddu.vx v16, v16, a1
-; CHECK-NEXT:    vmsltu.vx v0, v16, a2
-; CHECK-NEXT:    vsext.vf8 v16, v8
-; CHECK-NEXT:    vsaddu.vx v8, v16, a1
+; CHECK-NEXT:    vle8.v v16, (a0)
+; CHECK-NEXT:    vid.v v8
+; CHECK-NEXT:    vsaddu.vx v8, v8, a1
+; CHECK-NEXT:    vmsltu.vx v0, v8, a2
+; CHECK-NEXT:    vsext.vf8 v8, v16
+; CHECK-NEXT:    vsaddu.vx v8, v8, a1
 ; CHECK-NEXT:    vmsltu.vx v16, v8, a2
 ; CHECK-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
 ; CHECK-NEXT:    vslideup.vi v0, v16, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll b/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
index 482cf83d540c4..496755738e6fa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-store-extract-crash.ll
@@ -9,21 +9,21 @@ define void @test(ptr %ref_array, ptr %sad_array) {
 ; RV32:       # %bb.0: # %entry
 ; RV32-NEXT:    th.lwd a2, a3, (a0), 0, 3
 ; RV32-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
-; RV32-NEXT:    vle8.v v8, (a2)
+; RV32-NEXT:    vle8.v v12, (a2)
 ; RV32-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV32-NEXT:    vzext.vf4 v12, v8
-; RV32-NEXT:    vmv.s.x v8, zero
-; RV32-NEXT:    vredsum.vs v9, v12, v8
-; RV32-NEXT:    vmv.x.s a0, v9
+; RV32-NEXT:    vzext.vf4 v8, v12
+; RV32-NEXT:    vmv.s.x v12, zero
+; RV32-NEXT:    vredsum.vs v8, v8, v12
+; RV32-NEXT:    vmv.x.s a0, v8
 ; RV32-NEXT:    th.swia a0, (a1), 4, 0
 ; RV32-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
-; RV32-NEXT:    vle8.v v9, (a3)
-; RV32-NEXT:    vmv.v.i v10, 0
+; RV32-NEXT:    vle8.v v13, (a3)
+; RV32-NEXT:    vmv.v.i v8, 0
 ; RV32-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; RV32-NEXT:    vslideup.vi v9, v10, 4
+; RV32-NEXT:    vslideup.vi v13, v8, 4
 ; RV32-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV32-NEXT:    vzext.vf4 v12, v9
-; RV32-NEXT:    vredsum.vs v8, v12, v8
+; RV32-NEXT:    vzext.vf4 v8, v13
+; RV32-NEXT:    vredsum.vs v8, v8, v12
 ; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
 ; RV32-NEXT:    vse32.v v8, (a1)
 ; RV32-NEXT:    ret
@@ -32,21 +32,21 @@ define void @test(ptr %ref_array, ptr %sad_array) {
 ; RV64:       # %bb.0: # %entry
 ; RV64-NEXT:    th.ldd a2, a3, (a0), 0, 4
 ; RV64-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
-; RV64-NEXT:    vle8.v v8, (a2)
+; RV64-NEXT:    vle8.v v12, (a2)
 ; RV64-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV64-NEXT:    vzext.vf4 v12, v8
-; RV64-NEXT:    vmv.s.x v8, zero
-; RV64-NEXT:    vredsum.vs v9, v12, v8
-; RV64-NEXT:    vmv.x.s a0, v9
+; RV64-NEXT:    vzext.vf4 v8, v12
+; RV64-NEXT:    vmv.s.x v12, zero
+; RV64-NEXT:    vredsum.vs v8, v8, v12
+; RV64-NEXT:    vmv.x.s a0, v8
 ; RV64-NEXT:    th.swia a0, (a1), 4, 0
 ; RV64-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
-; RV64-NEXT:    vle8.v v9, (a3)
-; RV64-NEXT:    vmv.v.i v10, 0
+; RV64-NEXT:    vle8.v v13, (a3)
+; RV64-NEXT:    vmv.v.i v8, 0
 ; RV64-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; RV64-NEXT:    vslideup.vi v9, v10, 4
+; RV64-NEXT:    vslideup.vi v13, v8, 4
 ; RV64-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV64-NEXT:    vzext.vf4 v12, v9
-; RV64-NEXT:    vredsum.vs v8, v12, v8
+; RV64-NEXT:    vzext.vf4 v8, v13
+; RV64-NEXT:    vredsum.vs v8, v8, v12
 ; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
 ; RV64-NEXT:    vse32.v v8, (a1)
 ; RV64-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll b/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
index 1845c0e4bd3b6..7649d9ad6059f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/common-shuffle-patterns.ll
@@ -8,10 +8,11 @@ define dso_local <16 x i16> @interleave(<8 x i16> %v0, <8 x i16> %v1) {
 ; CHECK-LABEL: interleave:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT:    vwaddu.vv v10, v8, v9
+; CHECK-NEXT:    vmv1r.v v10, v9
+; CHECK-NEXT:    vmv1r.v v11, v8
+; CHECK-NEXT:    vwaddu.vv v8, v11, v10
 ; CHECK-NEXT:    li a0, -1
-; CHECK-NEXT:    vwmaccu.vx v10, a0, v9
-; CHECK-NEXT:    vmv2r.v v8, v10
+; CHECK-NEXT:    vwmaccu.vx v8, a0, v10
 ; CHECK-NEXT:    ret
 entry:
   %v2 = shufflevector <8 x i16> %v0, <8 x i16> poison, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef, i32 4, i32 undef, i32 5, i32 undef, i32 6, i32 undef, i32 7, i32 undef>
diff --git a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
index 61fb457a7eb65..69822e9d9d2e3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
@@ -200,12 +200,12 @@ define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data
 ; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV64-NEXT:    vmv1r.v v7, v8
 ; RV64-NEXT:    li a2, 128
-; RV64-NEXT:    vslidedown.vi v9, v0, 1
+; RV64-NEXT:    vslidedown.vi v8, v0, 1
 ; RV64-NEXT:    vmv.x.s a3, v0
 ; RV64-NEXT:    vsetvli zero, a2, e8, m8, ta, ma
 ; RV64-NEXT:    vle8.v v24, (a1)
 ; RV64-NEXT:    vsetvli zero, a2, e64, m1, ta, ma
-; RV64-NEXT:    vmv.x.s a1, v9
+; RV64-NEXT:    vmv.x.s a1, v8
 ; RV64-NEXT:    vsetvli zero, a2, e8, m8, ta, ma
 ; RV64-NEXT:    vcompress.vm v8, v16, v0
 ; RV64-NEXT:    vcpop.m a4, v0
@@ -227,14 +227,14 @@ define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data
 ; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vmv1r.v v7, v8
 ; RV32-NEXT:    li a2, 128
-; RV32-NEXT:    vslidedown.vi v9, v0, 1
+; RV32-NEXT:    vslidedown.vi v8, v0, 1
 ; RV32-NEXT:    li a3, 32
 ; RV32-NEXT:    vmv.x.s a4, v0
 ; RV32-NEXT:    vsetvli zero, a2, e8, m8, ta, ma
 ; RV32-NEXT:    vle8.v v24, (a1)
 ; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
-; RV32-NEXT:    vsrl.vx v6, v9, a3
-; RV32-NEXT:    vmv.x.s a1, v9
+; RV32-NEXT:    vsrl.vx v6, v8, a3
+; RV32-NEXT:    vmv.x.s a1, v8
 ; RV32-NEXT:    vsrl.vx v5, v0, a3
 ; RV32-NEXT:    vsetvli zero, a2, e8, m8, ta, ma
 ; RV32-NEXT:    vcompress.vm v8, v16, v0
@@ -438,16 +438,16 @@ define void @test_compresstore_v128i16(ptr %p, <128 x i1> %mask, <128 x i16> %da
 ; RV64-NEXT:    vcompress.vm v24, v8, v0
 ; RV64-NEXT:    vcpop.m a2, v0
 ; RV64-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
-; RV64-NEXT:    vslidedown.vi v8, v0, 8
+; RV64-NEXT:    vslidedown.vi v7, v0, 8
 ; RV64-NEXT:    vsetvli zero, a1, e16, m8, ta, ma
-; RV64-NEXT:    vcompress.vm v0, v16, v8
-; RV64-NEXT:    vcpop.m a1, v8
+; RV64-NEXT:    vcompress.vm v8, v16, v7
+; RV64-NEXT:    vcpop.m a1, v7
 ; RV64-NEXT:    vsetvli zero, a2, e16, m8, ta, ma
 ; RV64-NEXT:    vse16.v v24, (a0)
 ; RV64-NEXT:    slli a2, a2, 1
 ; RV64-NEXT:    add a0, a0, a2
 ; RV64-NEXT:    vsetvli zero, a1, e16, m8, ta, ma
-; RV64-NEXT:    vse16.v v0, (a0)
+; RV64-NEXT:    vse16.v v8, (a0)
 ; RV64-NEXT:    ret
 ;
 ; RV32-LABEL: test_compresstore_v128i16:
@@ -635,16 +635,16 @@ define void @test_compresstore_v64i32(ptr %p, <64 x i1> %mask, <64 x i32> %data)
 ; RV64-NEXT:    vsetvli zero, a2, e32, m8, ta, ma
 ; RV64-NEXT:    vse32.v v24, (a0)
 ; RV64-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vi v8, v0, 4
+; RV64-NEXT:    vslidedown.vi v24, v0, 4
 ; RV64-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
 ; RV64-NEXT:    vmv.x.s a1, v0
-; RV64-NEXT:    vcompress.vm v24, v16, v8
-; RV64-NEXT:    vcpop.m a2, v8
+; RV64-NEXT:    vcompress.vm v8, v16, v24
+; RV64-NEXT:    vcpop.m a2, v24
 ; RV64-NEXT:    cpopw a1, a1
 ; RV64-NEXT:    slli a1, a1, 2
 ; RV64-NEXT:    add a0, a0, a1
 ; RV64-NEXT:    vsetvli zero, a2, e32, m8, ta, ma
-; RV64-NEXT:    vse32.v v24, (a0)
+; RV64-NEXT:    vse32.v v8, (a0)
 ; RV64-NEXT:    ret
 ;
 ; RV32-LABEL: test_compresstore_v64i32:
@@ -654,16 +654,16 @@ define void @test_compresstore_v64i32(ptr %p, <64 x i1> %mask, <64 x i32> %data)
 ; RV32-NEXT:    vcompress.vm v24, v8, v0
 ; RV32-NEXT:    vcpop.m a2, v0
 ; RV32-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
-; RV32-NEXT:    vslidedown.vi v8, v0, 4
+; RV32-NEXT:    vslidedown.vi v7, v0, 4
 ; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
-; RV32-NEXT:    vcompress.vm v0, v16, v8
-; RV32-NEXT:    vcpop.m a1, v8
+; RV32-NEXT:    vcompress.vm v8, v16, v7
+; RV32-NEXT:    vcpop.m a1, v7
 ; RV32-NEXT:    vsetvli zero, a2, e32, m8, ta, ma
 ; RV32-NEXT:    vse32.v v24, (a0)
 ; RV32-NEXT:    slli a2, a2, 2
 ; RV32-NEXT:    add a0, a0, a2
 ; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
-; RV32-NEXT:    vse32.v v0, (a0)
+; RV32-NEXT:    vse32.v v8, (a0)
 ; RV32-NEXT:    ret
 entry:
   tail call void @llvm.masked.compressstore.v64i32(<64 x i32> %data, ptr align 4 %p, <64 x i1> %mask)
@@ -796,18 +796,18 @@ define void @test_compresstore_v32i64(ptr %p, <32 x i1> %mask, <32 x i64> %data)
 ; RV64-NEXT:    vsetvli zero, a1, e64, m8, ta, ma
 ; RV64-NEXT:    vse64.v v24, (a0)
 ; RV64-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
-; RV64-NEXT:    vslidedown.vi v8, v0, 2
+; RV64-NEXT:    vslidedown.vi v24, v0, 2
 ; RV64-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; RV64-NEXT:    vmv.x.s a1, v0
 ; RV64-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
-; RV64-NEXT:    vcompress.vm v24, v16, v8
+; RV64-NEXT:    vcompress.vm v8, v16, v24
 ; RV64-NEXT:    zext.h a1, a1
 ; RV64-NEXT:    cpopw a1, a1
 ; RV64-NEXT:    slli a1, a1, 3
 ; RV64-NEXT:    add a0, a0, a1
-; RV64-NEXT:    vcpop.m a1, v8
+; RV64-NEXT:    vcpop.m a1, v24
 ; RV64-NEXT:    vsetvli zero, a1, e64, m8, ta, ma
-; RV64-NEXT:    vse64.v v24, (a0)
+; RV64-NEXT:    vse64.v v8, (a0)
 ; RV64-NEXT:    ret
 ;
 ; RV32-LABEL: test_compresstore_v32i64:
@@ -818,18 +818,18 @@ define void @test_compresstore_v32i64(ptr %p, <32 x i1> %mask, <32 x i64> %data)
 ; RV32-NEXT:    vsetvli zero, a1, e64, m8, ta, ma
 ; RV32-NEXT:    vse64.v v24, (a0)
 ; RV32-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
-; RV32-NEXT:    vslidedown.vi v8, v0, 2
+; RV32-NEXT:    vslidedown.vi v24, v0, 2
 ; RV32-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; RV32-NEXT:    vmv.x.s a1, v0
 ; RV32-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
-; RV32-NEXT:    vcompress.vm v24, v16, v8
+; RV32-NEXT:    vcompress.vm v8, v16, v24
 ; RV32-NEXT:    zext.h a1, a1
 ; RV32-NEXT:    cpop a1, a1
 ; RV32-NEXT:    slli a1, a1, 3
 ; RV32-NEXT:    add a0, a0, a1
-; RV32-NEXT:    vcpop.m a1, v8
+; RV32-NEXT:    vcpop.m a1, v24
 ; RV32-NEXT:    vsetvli zero, a1, e64, m8, ta, ma
-; RV32-NEXT:    vse64.v v24, (a0)
+; RV32-NEXT:    vse64.v v8, (a0)
 ; RV32-NEXT:    ret
 entry:
   tail call void @llvm.masked.compressstore.v32i64(<32 x i64> %data, ptr align 8 %p, <32 x i1> %mask)
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
index 208735b18cbab..97e1a7f41b92f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
@@ -162,12 +162,12 @@ define <vscale x 4 x i8> @ctlz_nxv4i8(<vscale x 4 x i8> %va) {
 ; CHECK-F-LABEL: ctlz_nxv4i8:
 ; CHECK-F:       # %bb.0:
 ; CHECK-F-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-F-NEXT:    vzext.vf2 v9, v8
+; CHECK-F-NEXT:    vzext.vf2 v10, v8
 ; CHECK-F-NEXT:    li a0, 134
-; CHECK-F-NEXT:    vfwcvt.f.xu.v v10, v9
-; CHECK-F-NEXT:    vnsrl.wi v8, v10, 23
+; CHECK-F-NEXT:    vfwcvt.f.xu.v v8, v10
+; CHECK-F-NEXT:    vnsrl.wi v10, v8, 23
 ; CHECK-F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-F-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-F-NEXT:    vnsrl.wi v8, v10, 0
 ; CHECK-F-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-F-NEXT:    li a0, 8
 ; CHECK-F-NEXT:    vminu.vx v8, v8, a0
@@ -176,12 +176,12 @@ define <vscale x 4 x i8> @ctlz_nxv4i8(<vscale x 4 x i8> %va) {
 ; CHECK-D-LABEL: ctlz_nxv4i8:
 ; CHECK-D:       # %bb.0:
 ; CHECK-D-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-D-NEXT:    vzext.vf2 v9, v8
+; CHECK-D-NEXT:    vzext.vf2 v10, v8
 ; CHECK-D-NEXT:    li a0, 134
-; CHECK-D-NEXT:    vfwcvt.f.xu.v v10, v9
-; CHECK-D-NEXT:    vnsrl.wi v8, v10, 23
+; CHECK-D-NEXT:    vfwcvt.f.xu.v v8, v10
+; CHECK-D-NEXT:    vnsrl.wi v10, v8, 23
 ; CHECK-D-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-D-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-D-NEXT:    vnsrl.wi v8, v10, 0
 ; CHECK-D-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-D-NEXT:    li a0, 8
 ; CHECK-D-NEXT:    vminu.vx v8, v8, a0
@@ -225,13 +225,13 @@ define <vscale x 8 x i8> @ctlz_nxv8i8(<vscale x 8 x i8> %va) {
 ; CHECK-F-LABEL: ctlz_nxv8i8:
 ; CHECK-F:       # %bb.0:
 ; CHECK-F-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-F-NEXT:    vzext.vf2 v10, v8
+; CHECK-F-NEXT:    vzext.vf2 v12, v8
 ; CHECK-F-NEXT:    li a0, 134
-; CHECK-F-NEXT:    vfwcvt.f.xu.v v12, v10
-; CHECK-F-NEXT:    vnsrl.wi v8, v12, 23
+; CHECK-F-NEXT:    vfwcvt.f.xu.v v8, v12
+; CHECK-F-NEXT:    vnsrl.wi v12, v8, 23
 ; CHECK-F-NEXT:    vsetvli zero, zero, e8, m1, ta, ma
-; CHECK-F-NEXT:    vnsrl.wi v10, v8, 0
-; CHECK-F-NEXT:    vrsub.vx v8, v10, a0
+; CHECK-F-NEXT:    vnsrl.wi v8, v12, 0
+; CHECK-F-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-F-NEXT:    li a0, 8
 ; CHECK-F-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-F-NEXT:    ret
@@ -239,13 +239,13 @@ define <vscale x 8 x i8> @ctlz_nxv8i8(<vscale x 8 x i8> %va) {
 ; CHECK-D-LABEL: ctlz_nxv8i8:
 ; CHECK-D:       # %bb.0:
 ; CHECK-D-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-D-NEXT:    vzext.vf2 v10, v8
+; CHECK-D-NEXT:    vzext.vf2 v12, v8
 ; CHECK-D-NEXT:    li a0, 134
-; CHECK-D-NEXT:    vfwcvt.f.xu.v v12, v10
-; CHECK-D-NEXT:    vnsrl.wi v8, v12, 23
+; CHECK-D-NEXT:    vfwcvt.f.xu.v v8, v12
+; CHECK-D-NEXT:    vnsrl.wi v12, v8, 23
 ; CHECK-D-NEXT:    vsetvli zero, zero, e8, m1, ta, ma
-; CHECK-D-NEXT:    vnsrl.wi v10, v8, 0
-; CHECK-D-NEXT:    vrsub.vx v8, v10, a0
+; CHECK-D-NEXT:    vnsrl.wi v8, v12, 0
+; CHECK-D-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-D-NEXT:    li a0, 8
 ; CHECK-D-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-D-NEXT:    ret
@@ -288,13 +288,13 @@ define <vscale x 16 x i8> @ctlz_nxv16i8(<vscale x 16 x i8> %va) {
 ; CHECK-F-LABEL: ctlz_nxv16i8:
 ; CHECK-F:       # %bb.0:
 ; CHECK-F-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-F-NEXT:    vzext.vf2 v12, v8
+; CHECK-F-NEXT:    vzext.vf2 v16, v8
 ; CHECK-F-NEXT:    li a0, 134
-; CHECK-F-NEXT:    vfwcvt.f.xu.v v16, v12
-; CHECK-F-NEXT:    vnsrl.wi v8, v16, 23
+; CHECK-F-NEXT:    vfwcvt.f.xu.v v8, v16
+; CHECK-F-NEXT:    vnsrl.wi v16, v8, 23
 ; CHECK-F-NEXT:    vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-F-NEXT:    vnsrl.wi v12, v8, 0
-; CHECK-F-NEXT:    vrsub.vx v8, v12, a0
+; CHECK-F-NEXT:    vnsrl.wi v8, v16, 0
+; CHECK-F-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-F-NEXT:    li a0, 8
 ; CHECK-F-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-F-NEXT:    ret
@@ -302,13 +302,13 @@ define <vscale x 16 x i8> @ctlz_nxv16i8(<vscale x 16 x i8> %va) {
 ; CHECK-D-LABEL: ctlz_nxv16i8:
 ; CHECK-D:       # %bb.0:
 ; CHECK-D-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; CHECK-D-NEXT:    vzext.vf2 v12, v8
+; CHECK-D-NEXT:    vzext.vf2 v16, v8
 ; CHECK-D-NEXT:    li a0, 134
-; CHECK-D-NEXT:    vfwcvt.f.xu.v v16, v12
-; CHECK-D-NEXT:    vnsrl.wi v8, v16, 23
+; CHECK-D-NEXT:    vfwcvt.f.xu.v v8, v16
+; CHECK-D-NEXT:    vnsrl.wi v16, v8, 23
 ; CHECK-D-NEXT:    vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-D-NEXT:    vnsrl.wi v12, v8, 0
-; CHECK-D-NEXT:    vrsub.vx v8, v12, a0
+; CHECK-D-NEXT:    vnsrl.wi v8, v16, 0
+; CHECK-D-NEXT:    vrsub.vx v8, v8, a0
 ; CHECK-D-NEXT:    li a0, 8
 ; CHECK-D-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-D-NEXT:    ret
@@ -1375,12 +1375,12 @@ define <vscale x 2 x i64> @ctlz_nxv2i64(<vscale x 2 x i64> %va) {
 ; CHECK-F-NEXT:    fsrmi a1, 1
 ; CHECK-F-NEXT:    vsetvli a2, zero, e32, m1, ta, ma
 ; CHECK-F-NEXT:    vfncvt.f.xu.w v10, v8
-; CHECK-F-NEXT:    vmv.v.x v8, a0
-; CHECK-F-NEXT:    vsrl.vi v9, v10, 23
-; CHECK-F-NEXT:    vwsubu.vv v10, v8, v9
+; CHECK-F-NEXT:    vmv.v.x v11, a0
+; CHECK-F-NEXT:    vsrl.vi v10, v10, 23
+; CHECK-F-NEXT:    vwsubu.vv v8, v11, v10
 ; CHECK-F-NEXT:    li a0, 64
 ; CHECK-F-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
-; CHECK-F-NEXT:    vminu.vx v8, v10, a0
+; CHECK-F-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-F-NEXT:    fsrm a1
 ; CHECK-F-NEXT:    ret
 ;
@@ -1515,12 +1515,12 @@ define <vscale x 4 x i64> @ctlz_nxv4i64(<vscale x 4 x i64> %va) {
 ; CHECK-F-NEXT:    fsrmi a1, 1
 ; CHECK-F-NEXT:    vsetvli a2, zero, e32, m2, ta, ma
 ; CHECK-F-NEXT:    vfncvt.f.xu.w v12, v8
-; CHECK-F-NEXT:    vmv.v.x v8, a0
-; CHECK-F-NEXT:    vsrl.vi v10, v12, 23
-; CHECK-F-NEXT:    vwsubu.vv v12, v8, v10
+; CHECK-F-NEXT:    vmv.v.x v14, a0
+; CHECK-F-NEXT:    vsrl.vi v12, v12, 23
+; CHECK-F-NEXT:    vwsubu.vv v8, v14, v12
 ; CHECK-F-NEXT:    li a0, 64
 ; CHECK-F-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
-; CHECK-F-NEXT:    vminu.vx v8, v12, a0
+; CHECK-F-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-F-NEXT:    fsrm a1
 ; CHECK-F-NEXT:    ret
 ;
@@ -1655,12 +1655,12 @@ define <vscale x 8 x i64> @ctlz_nxv8i64(<vscale x 8 x i64> %va) {
 ; CHECK-F-NEXT:    fsrmi a1, 1
 ; CHECK-F-NEXT:    vsetvli a2, zero, e32, m4, ta, ma
 ; CHECK-F-NEXT:    vfncvt.f.xu.w v16, v8
-; CHECK-F-NEXT:    vmv.v.x v8, a0
-; CHECK-F-NEXT:    vsrl.vi v12, v16, 23
-; CHECK-F-NEXT:    vwsubu.vv v16, v8, v12
+; CHECK-F-NEXT:    vmv.v.x v20, a0
+; CHECK-F-NEXT:    vsrl.vi v16, v16, 23
+; CHECK-F-NEXT:    vwsubu.vv v8, v20, v16
 ; CHECK-F-NEXT:    li a0, 64
 ; CHECK-F-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
-; CHECK-F-NEXT:    vminu.vx v8, v16, a0
+; CHECK-F-NEXT:    vminu.vx v8, v8, a0
 ; CHECK-F-NEXT:    fsrm a1
 ; CHECK-F-NEXT:    ret
 ;
@@ -1832,11 +1832,11 @@ define <vscale x 4 x i8> @ctlz_zero_undef_nxv4i8(<vscale x 4 x i8> %va) {
 ; CHECK-F-LABEL: ctlz_zero_undef_nxv4i8:
 ; CHECK-F:       # %bb.0:
 ; CHECK-F-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-F-NEXT:    vzext.vf2 v9, v8
-; CHECK-F-NEXT:    vfwcvt.f.xu.v v10, v9
-; CHECK-F-NEXT:    vnsrl.wi v8, v10, 23
+; CHECK-F-NEXT:    vzext.vf2 v10, v8
+; CHECK-F-NEXT:    vfwcvt.f.xu.v v8, v10
+; CHECK-F-NEXT:    vnsrl.wi v10, v8, 23
 ; CHECK-F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-F-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-F-NEXT:    vnsrl.wi v8, v10, 0
 ; CHECK-F-NEXT:    li a0, 134
 ; CHECK-F-NEXT:    vrsu...
[truncated]

@@ -752,18 +752,24 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),

def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;

let AllocationPriority = 2 in
Collaborator

Can we do this inside of VReg using the LMUL?

Collaborator Author

In the current change, yes. I'd done it this way because I'd originally planned to have the NoV0 cases have different values. I'll switch.

Collaborator Author

Doing this revealed that I hadn't handled the segment tuple register classes. I decided to treat those as having the same priority as their lmul component; that is, I ignored NF. There might be a better heuristic here.
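
As a concrete illustration, here is a hypothetical sketch of folding the priority into the class itself. This is not the verbatim upstream definition; it only assumes that VReg takes the LMUL as its trailing operand (as the defs in this diff do) and, per the comment above, that segment tuple classes would pass the LMUL of their component while ignoring NF.

// Sketch only: derive the priority once from the LMUL operand rather than
// repeating `let AllocationPriority = ...` at every def site.
class VReg<list<ValueType> regTypes, dag regList, int Vlmul>
  : RegisterClass<"RISCV", regTypes, 64, regList> {
  // Other fields of the in-tree class are elided here.
  // LMUL 1/2/4/8 maps to priority 0/1/2/3; only the relative ordering
  // matters to the greedy allocator.
  let AllocationPriority = !logtwo(Vlmul);
}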

@wangpc-pp (Contributor) left a comment

Thanks for looking at this! I tried this before but I can't remember why I dropped it (maybe it was simply because setting AllocationPriority couldn't fix #113489).
I do see some improvements and regressions, but I don't think they are significant; the value of this PR may lie in improving compile time, since it avoids the later eviction? @lukel97 Can you also evaluate the performance please?

; CHECK-RV32-NEXT: sub sp, sp, a1
; CHECK-RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 24 * vlenb
; CHECK-RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 32 * vlenb
Contributor

Regression.

; RV32-NEXT: mul a2, a2, a3
; RV32-NEXT: sub sp, sp, a2
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe4, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 100 * vlenb
; RV32-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe0, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 96 * vlenb
Contributor

Improvement.

; RV32-NEXT: mul a2, a2, a3
; RV32-NEXT: add a2, sp, a2
; RV32-NEXT: addi a2, a2, 16
; RV32-NEXT: vs2r.v v14, (a2) # Unknown-size Folded Spill
Contributor

Uses less memory but causes more spills?

lukel97 (Contributor) commented Mar 14, 2025

Thanks for looking at this! I tried this before but I can't remember why I dropped it (maybe it was simply because setting AllocationPriority couldn't fix #113489). I do see some improvements and regressions, but I don't think they are significant; the value of this PR may lie in improving compile time, since it avoids the later eviction? @lukel97 Can you also evaluate the performance please?

I've kicked off a run on a Banana Pi now; it should be done over the weekend.

lukel97 (Contributor) commented Mar 17, 2025

Results on rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/311

preames (Collaborator Author) commented Mar 17, 2025

Results on rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/311

Looks like we saw a small improvement in x264 and not much else. That's actually a bit better than I'd expected; definitely nothing problematic at least.

@wangpc-pp (Contributor) left a comment

I'd like to see this land.
LGTM, but please wait for one more approval since this is a large change.

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Mar 18, 2025
The cost of a vector spill/reload may vary widely depending on the size of the vector register being spilled, i.e. its LMUL, so the usual regalloc.NumSpills/regalloc.NumReloads statistics may not accurately reflect the total cost.

This adds two new statistics to RISCVInstrInfo that collect the total LMUL for vector register spills and reloads. They can be used to get a better idea of regalloc changes in e.g. llvm#131176 llvm#113675
@lukel97 (Contributor) left a comment

This seems to increase the overall LMUL spilled on 538.imagick_r:

Program                                       riscv-instr-info.TotalLMULSpilled               riscv-instr-info.TotalLMULReloaded              
                                              lhs                               rhs     diff  lhs                                rhs     diff 
FP2017rate/538.imagick_r/538.imagick_r        4239.00                           5082.00 19.9% 6697.00                            7321.00  9.3%
FP2017speed/638.imagick_s/638.imagick_s       4239.00                           5082.00 19.9% 6697.00                            7321.00  9.3%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s    132.00                            134.00  1.5%  274.00                             248.00 -9.5%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r    132.00                            134.00  1.5%  274.00                             248.00 -9.5%
INT2017rate/520.omnetpp_r/520.omnetpp_r          4.00                              4.00  0.0%    5.00                               5.00  0.0%
INT2017speed/625.x264_s/625.x264_s              83.00                             83.00  0.0%   93.00                              93.00  0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s      6.00                              6.00  0.0%    6.00                               6.00  0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s      4.00                              4.00  0.0%    5.00                               5.00  0.0%
INT2017speed/602.gcc_s/602.gcc_s                85.00                             85.00  0.0%   91.00                              91.00  0.0%
INT2017spe...00.perlbench_s/600.perlbench_s      4.00                              4.00  0.0%    8.00                               8.00  0.0%
INT2017rate/525.x264_r/525.x264_r               83.00                             83.00  0.0%   93.00                              93.00  0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r      6.00                              6.00  0.0%    6.00                               6.00  0.0%
FP2017rate/508.namd_r/508.namd_r                 1.00                              1.00  0.0%    6.00                               6.00  0.0%
FP2017rate/510.parest_r/510.parest_r          1084.00                           1084.00  0.0% 1368.00                            1368.00  0.0%
INT2017rat...00.perlbench_r/500.perlbench_r      4.00                              4.00  0.0%    8.00                               8.00  0.0%
FP2017speed/644.nab_s/644.nab_s                 25.00                             25.00  0.0%   25.00                              25.00  0.0%
FP2017speed/619.lbm_s/619.lbm_s                 38.00                             38.00  0.0%   38.00                              38.00  0.0%
FP2017rate/544.nab_r/544.nab_r                  25.00                             25.00  0.0%   25.00                              25.00  0.0%
FP2017rate/519.lbm_r/519.lbm_r                  38.00                             38.00  0.0%   38.00                              38.00  0.0%
FP2017rate/511.povray_r/511.povray_r           121.00                            121.00  0.0%  138.00                             138.00  0.0%
INT2017rate/502.gcc_r/502.gcc_r                 85.00                             85.00  0.0%   91.00                              91.00  0.0%
FP2017rate/526.blender_r/526.blender_r        1159.00                           1155.00 -0.3% 1292.00                            1298.00  0.5%

With that said, there's no actual impact on the runtime performance of that benchmark, so I presume these spills are on a cold path. I looked at the codegen changes, and the code there is pretty bad to begin with anyway.

I'm just flagging this FYI; I'm happy for this to land in any case.

preames merged commit 2175c6c into llvm:main on Mar 18, 2025 (6 of 9 checks passed)
preames deleted the pr-riscv-allocation-priority branch on March 18, 2025 at 15:25

preames (Collaborator Author) commented Mar 18, 2025

This seems to increase the overall LMUL spilled on 538.imagick_r:
(snip)
With that said, there's no actual impact on the runtime performance of that benchmark, so I presume these spills are on a cold path. I looked at the codegen changes, and the code there is pretty bad to begin with anyway.

I'm just flagging this FYI; I'm happy for this to land in any case.

I wrote this down to follow up on. At a minimum, it may be an interesting register allocation case that I can learn something from.

lukel97 added a commit that referenced this pull request Mar 19, 2025
The cost of a vector spill/reload may vary widely depending on the size
of the vector register being spilled, i.e. its LMUL, so the usual
regalloc.NumSpills/regalloc.NumReloads statistics may not accurately
reflect the total cost.

This adds two new statistics to RISCVInstrInfo that collect the total
number of vector registers spilled/reloaded within groups. They can be
used to get a better idea of regalloc changes in e.g. #131176 #113675