Conversation
Refactor the lowering-time store coalescing machinery so adjacent constant GT_STORE_LCL_FLD writes can reuse the existing combine logic that was previously limited to STOREIND/STORE_BLK. Gate the new local-field path from LowerStoreLocCommon with JitEnableStoreLclFldCoalescing. Extend the struct-promotion regression coverage for the newly handled cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR refactors the JIT lowering-time store coalescing logic to extend existing adjacent-constant-store combining (previously focused on indirections) to also handle adjacent GT_STORE_LCL_FLD writes, guarded by a new JIT config switch, and adds regression coverage for struct-promotion-related cases.
Changes:
- Extend store coalescing to support GT_STORE_LCL_FLD (including some overlapping-store scenarios) behind JitEnableStoreLclFldCoalescing.
- Refactor the coalescing machinery to share logic across store kinds and centralize atomicity checks.
- Add JIT regression tests covering non-address-exposed, address-exposed, and overlapping local-field store patterns.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/tests/JIT/Directed/StructPromote/SpAddr.cs | Adds new regression scenarios for local-field store coalescing, including overlap and address-exposed cases. |
| src/coreclr/jit/lower.h | Extends coalescing data structure and declares new shared coalescing/atomicity helpers. |
| src/coreclr/jit/lower.cpp | Implements shared coalescing path for STORE_LCL_FLD and refactors existing store-indir coalescing to reuse it. |
| src/coreclr/jit/jitconfigvalues.h | Adds JitEnableStoreLclFldCoalescing release config gate (default enabled). |
```cpp
    GenTree*                       prevStore,
    const LoadStoreCoalescingData& currData,
    const LoadStoreCoalescingData& prevData)
{
```
LowerCheckCoalescedStoreAtomicity takes a prevStore parameter but never uses it. This can trigger unused-parameter warnings (depending on toolchain) and makes the API misleading. Either drop the parameter, explicitly mark it unused (e.g., `(void)prevStore;` / `UNUSED(prevStore)`), or use it in an assert/JITDUMP to justify its presence.
```cpp
{
    (void)prevStore;
```
```cpp
auto isNodeInvariant = [](Compiler* compiler, GenTree* node) {
    if (node->OperIsConst())
    {
        return true;
    }

    return node->OperIs(GT_LCL_VAR) && !compiler->lvaVarAddrExposed(node->AsLclVar()->GetLclNum());
};

auto* lclStore = store->AsLclFld();
if (!isNodeInvariant(m_compiler, lclStore->Data()))
{
    return false;
}
```
Am I missing something, or is this lambda only used once?

I assume it was copied from GetLoadStoreCoalescingData; it probably needs to be a separate static function then.
```cpp
#ifdef TARGET_ARM64
    if (currData.accessSize == TARGET_POINTER_SIZE)
    {
        // Per Arm ARM, a 128-bit SIMD write that is 64-bit aligned is treated as a pair of
```
A citation URL or something here would be cool.
```diff
         // items at once. Although, it may be profitable to do "stp q0, q0, [addr]".
-        if (!varTypeIsIntegral(ind) && !varTypeIsSIMD(ind))
+        if (!varTypeIsIntegral(node) && !varTypeIsSIMD(node))
         {
```
A lot of comments were removed in this method, and it seems like some of them might have been worth keeping. Just want to double-check that they weren't blanket-removed accidentally, like this CQ one:
```diff
         // it on ARM64 where it's pretty efficient to do "stp xzr, xzr, [addr]" to clear two
         // items at once. Although, it may be profitable to do "stp q0, q0, [addr]".
-        if (!varTypeIsIntegral(ind) && !varTypeIsSIMD(ind))
+        if (!varTypeIsIntegral(node) && !varTypeIsSIMD(node))
```
I wonder why we don't support coalescing for adjacent floats?

I don't remember why it wasn't done, but it was like that before, so it could be a task for a follow-up.

I would expect both float and SIMD to be supportable if integers are supported.

It's not up to this PR anyway. I think we already lower `*x = 3.14` into a TYP_INT store today before we get here.

Sure, was just noting in agreement that this seems like an artificial limitation we can reduce in the future.
```cpp
            case 4:
                newType = TYP_INT;
                break;
#ifdef TARGET_64BIT
```
```cpp
    {
        // LA, RISC-V and ARM32 more likely to receive a terrible performance hit from
        // unaligned accesses making this optimization questionable.
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
```
Do we want to enable this for wasm while we're here, or add a todo-wasm comment to remind ourselves to come back later and turn it on?
```cpp
    {
        // RetBuf is a private stack memory, so we don't need to worry about atomicity.
        allowsNonAtomic = true;
        BlockRange().Remove(prevData.rangeStart, prevData.rangeEnd);
```
Maybe add a JITDUMP message when doing this so it's clear what happened?

Yeah, I think it's worth a comment or JITDUMP noting that in this case we remove a store because the next store overwrites it.
```cpp
                newType = TYP_LONG;
                break;

#if defined(FEATURE_HW_INTRINSICS)
```
Aren't there targets that have FEATURE_HW_INTRINSICS but not FEATURE_SIMD? Should this be gated on FEATURE_SIMD instead?

In theory. I think it should indeed be FEATURE_SIMD here.

FEATURE_HW_INTRINSICS is built on top of FEATURE_SIMD (unless one of the community-driven platforms has introduced a deviation). It's something that ideally we'd merge into a single feature, particularly given how co-dependent they are now for practical usage.
```cpp
#endif
#endif
#endif
```
These would benefit from comments specifying which `#if` they're attached to.
```cpp
                newType = TYP_INT;
                break;

#ifdef TARGET_64BIT
```
```cpp
        if (prevData.offset > currData.offset)
        {
            std::swap(lowerCns, upperCns);
        }
```
Why was this swap moved from outside of here to inside of the 64-bit + HW-intrinsics + SIMD guards? Does it fix a bug?
```cpp
    } while (true);
#endif // TARGET_XARCH || TARGET_ARM64
#endif
```
```cpp
    }

    data->targetType = ind->TypeGet();
    data->accessSize = ind->Size();
```
If we add the size, we probably no longer need the targetType field.
```diff
-    if (prevData.offset == currData.offset)
+    if ((currData.offset == prevData.offset) && (currData.targetType == prevData.targetType))
     {
         if (m_compiler->gtTreeHasSideEffects(prevData.value, GTF_SIDE_EFFECT | GTF_GLOB_EFFECT | GTF_ASG) ||
```
I'm confused by `GTF_SIDE_EFFECT | GTF_GLOB_EFFECT | GTF_ASG`: doesn't GTF_GLOB_EFFECT alone imply them all?
```cpp
        //
        // but we don't want to load managed references into SIMD registers (we can only do so
        // when we can issue a nongc region for a block)
        return;
```
Is there a reason this comment was removed? I think it explains why we bail out here.
```cpp
        int8_t* upperCns = currData.value->AsVecCon()->gtSimdVal.i8;

        // if the previous store was at a higher address, swap the constants
        if (prevData.offset > currData.offset)
```
The comment explained why we guarded `#if defined(TARGET_AMD64)`; I think we should keep it.
|
I think we need to look at CI results before the next round of reviews.