[RISC-V] Use fine-grained fence variants for memory barriers#126566
[RISC-V] Use fine-grained fence variants for memory barriers#126566JamieMagee wants to merge 2 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR improves RISC-V64 JIT codegen for memory barriers by honoring BarrierKind and emitting a more specific fence variant instead of always using the heaviest fence rw,rw.
Changes:
- Extends RISC-V64
insBarrierimmediates to represent multiplefencepredecessor/successor masks. - Updates
CodeGen::instGen_MemoryBarrier(RISC-V64) to select thefenceimmediate based onBarrierKind. - Cleans up outdated comments that claimed only full barriers were supported.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/instr.h | Adds additional RISC-V64 insBarrier values intended to represent different fence masks. |
| src/coreclr/jit/codegenriscv64.cpp | Uses BarrierKind to pick an insBarrier immediate when emitting INS_fence. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
The JIT always emitted fence rw,rw (full barrier) regardless of the requested BarrierKind. RISC-V's fence instruction supports finer predecessor/successor bits, so use them: - BARRIER_FULL -> fence rw,rw (0x33) - unchanged - BARRIER_LOAD_ONLY -> fence r,rw (0x23) - acquire semantics - BARRIER_STORE_ONLY -> fence rw,w (0x31) - release semantics Same idea as ARM64, which uses DMB ISHLD for load-only and DMB ISH for full. Volatile reads and post-CpBlk barriers were paying for a full fence when a lighter one is sufficient under RVWMO.
9414867 to
398c025
Compare
There was a problem hiding this comment.
Pull request overview
This PR refines RISC-V64 JIT memory barrier emission so instGen_MemoryBarrier(BarrierKind) uses the appropriate fence predecessor/successor mask instead of always emitting the heaviest fence rw,rw, aligning barrier strength with .NET volatile semantics.
Changes:
- Add RISC-V-specific
insBarrierencodings for full, load-only (acquire), and store-only (release) fences. - Update
CodeGen::instGen_MemoryBarrieron RISC-V64 to select the correct fence variant based onBarrierKind. - Remove an outdated TODO that stated only full barriers were supported on RISC-V64.
Show a summary per file
| File | Description |
|---|---|
src/coreclr/jit/instr.h |
Defines RISC-V64 fence immediate values for full/load-only/store-only barrier variants. |
src/coreclr/jit/codegenriscv64.cpp |
Maps BarrierKind to the corresponding RISC-V64 fence variant during codegen. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 1
| @@ -2200,7 +2200,6 @@ void CodeGen::genCodeForCpObj(GenTreeBlk* cpObjNode) | |||
| if (cpObjNode->IsVolatile()) | |||
| { | |||
| // issue a INS_BARRIER_RMB after a volatile CpObj operation | |||
There was a problem hiding this comment.
The comment says this emits an INS_BARRIER_RMB after a volatile CpObj, but on RISC-V there is no INS_BARRIER_RMB variant and the code is emitting BARRIER_FULL here. This is misleading (and now that load/store-only barriers exist it’s easy to misread). Please update the comment to match what is actually emitted (e.g., “full memory barrier after ...” or “load barrier after ...” if you decide to use BARRIER_LOAD_ONLY here).
| // issue a INS_BARRIER_RMB after a volatile CpObj operation | |
| // issue a full memory barrier after a volatile CpObj operation |
instGen_MemoryBarrieron RISC-V ignored thebarrierKindparameter and always emittedfence rw,rw. The RISC-V fence instruction has per-bit control over predecessor and successor ordering (read, write, or both), so there's no reason to always use the heaviest option.This maps
BarrierKindto the matching fence variant:BARRIER_FULLfence rw,rwfence rw,rw(unchanged)BARRIER_LOAD_ONLYfence rw,rwfence r,rw(acquire)BARRIER_STORE_ONLYfence rw,rwfence rw,w(release)ARM64 does the same thing with
DMB ISHLDfor load-only andDMB ISHfor full. The load/store variants use acquire/release semantics rather than just load-load or store-store ordering, since .NET's volatile semantics require it.The two call sites that benefit today are
GT_MEMORYBARRIERwithGTF_MEMORYBARRIER_LOAD/GTF_MEMORYBARRIER_STOREflags, and the post-volatile-CpBlk load barrier.Validated the encoding on QEMU (
fence rw,rw=0x0330000f,fence r,rw=0x0230000f,fence rw,w=0x0310000f).