
JIT: STORE_LCL_FLD coalescing #126562

Open

AndyAyersMS wants to merge 1 commit into dotnet:main from AndyAyersMS:RefactorAtomicStoreCheck

Conversation

@AndyAyersMS (Member)

Refactor the lowering-time store coalescing machinery so adjacent constant GT_STORE_LCL_FLD writes can reuse the existing combine logic that was previously limited to STOREIND/STORE_BLK. Gate the new local-field path from LowerStoreLocCommon with JitEnableStoreLclFldCoalescing. Extend the struct-promotion regression coverage for the newly handled cases.

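As background (this sketch is not from the PR itself), coalescing two adjacent constant stores amounts to packing the two constants by address order into one wider constant. A minimal little-endian illustration:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only, not JIT code: on a little-endian target, the store at
// the lower offset supplies the low 32 bits of the combined 64-bit constant,
// so two adjacent 4-byte constant stores become one 8-byte store.
uint64_t CombineAdjacentU32Stores(uint32_t cnsAtLowerOffset, uint32_t cnsAtHigherOffset)
{
    return (static_cast<uint64_t>(cnsAtHigherOffset) << 32) | cnsAtLowerOffset;
}
```

For example, storing 0x11223344 at offset 0 and 0x55667788 at offset 4 is equivalent to a single 8-byte store of 0x5566778811223344.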

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 5, 2026 15:46
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 5, 2026
@AndyAyersMS (Member, Author)

@EgorBo PTAL
fyi @dotnet/jit-contrib

Supersedes #126178

@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment


Pull request overview

This PR refactors the JIT lowering-time store coalescing logic to extend existing adjacent-constant-store combining (previously focused on indirections) to also handle adjacent GT_STORE_LCL_FLD writes, guarded by a new JIT config switch, and adds regression coverage for struct-promotion-related cases.

Changes:

  • Extend store coalescing to support GT_STORE_LCL_FLD (including some overlapping-store scenarios) behind JitEnableStoreLclFldCoalescing.
  • Refactor coalescing machinery to share logic across store kinds and centralize atomicity checks.
  • Add JIT regression tests covering non-address-exposed, address-exposed, and overlapping local-field store patterns.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/tests/JIT/Directed/StructPromote/SpAddr.cs Adds new regression scenarios for local-field store coalescing, including overlap and address-exposed cases.
src/coreclr/jit/lower.h Extends coalescing data structure and declares new shared coalescing/atomicity helpers.
src/coreclr/jit/lower.cpp Implements shared coalescing path for STORE_LCL_FLD and refactors existing store-indir coalescing to reuse it.
src/coreclr/jit/jitconfigvalues.h Adds JitEnableStoreLclFldCoalescing release config gate (default enabled).

GenTree* prevStore,
const LoadStoreCoalescingData& currData,
const LoadStoreCoalescingData& prevData)
{

Copilot AI Apr 5, 2026


LowerCheckCoalescedStoreAtomicity takes a prevStore parameter but never uses it. This can trigger unused-parameter warnings (depending on toolchain) and makes the API misleading. Either drop the parameter or explicitly mark it unused (e.g., (void)prevStore; / UNUSED(prevStore)) or use it in an assert/JITDUMP to justify its presence.

Suggested change:
{
    (void)prevStore;
Comment on lines +10528 to +10541
auto isNodeInvariant = [](Compiler* compiler, GenTree* node) {
if (node->OperIsConst())
{
return true;
}

return node->OperIs(GT_LCL_VAR) && !compiler->lvaVarAddrExposed(node->AsLclVar()->GetLclNum());
};

auto* lclStore = store->AsLclFld();
if (!isNodeInvariant(m_compiler, lclStore->Data()))
{
return false;
}
Member


Am I missing something or is this lambda only used once?

Member


I assume it was copied from GetLoadStoreCoalescingData; it probably needs to be a separate static function, then.

#ifdef TARGET_ARM64
if (currData.accessSize == TARGET_POINTER_SIZE)
{
// Per Arm ARM, a 128-bit SIMD write that is 64-bit aligned is treated as a pair of
Member


A citation URL or something here would be cool

// items at once. Although, it may be profitable to do "stp q0, q0, [addr]".
-if (!varTypeIsIntegral(ind) && !varTypeIsSIMD(ind))
+if (!varTypeIsIntegral(node) && !varTypeIsSIMD(node))
{
Member


A lot of comments were removed in this method, and it seems like some of them might have been worth keeping. Just want to double-check that they weren't blanket-removed accidentally, like this CQ one.

// it on ARM64 where it's pretty efficient to do "stp xzr, xzr, [addr]" to clear two
// items at once. Although, it may be profitable to do "stp q0, q0, [addr]".
-if (!varTypeIsIntegral(ind) && !varTypeIsSIMD(ind))
+if (!varTypeIsIntegral(node) && !varTypeIsSIMD(node))
Member


I wonder why we don't support coalescing for adjacent floats?

Member


I don't remember why it wasn't done, but it was like that before, so it could be a task for a follow-up.

Member


I would expect both float and SIMD to be supportable if integers are supported.

Member

@EgorBo Apr 7, 2026


It's not up to this PR anyway. I think we already lower *x = 3.14 into a TYP_INT store today before we get here.

Member


Sure, was just noting in agreement that this seems like an artificial limitation we can reduce in the future.

case 4:
newType = TYP_INT;
break;
#ifdef TARGET_64BIT
Member


or WASM (eventually)
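The switch being reviewed maps the combined access size to a wider store type. A standalone sketch of that shape (the enumerator names and function here are illustrative stand-ins, not the JIT's actual types):

```cpp
#include <cassert>

// Illustrative stand-ins for the JIT's store types; not real enumerators.
enum CombinedStoreType { CST_NONE, CST_INT, CST_LONG, CST_SIMD16 };

// Map the size of two coalesced stores to the type of a single wider store.
CombinedStoreType TypeForCombinedSize(unsigned sizeBytes, bool target64Bit, bool hasSimd)
{
    switch (sizeBytes)
    {
        case 4:
            return CST_INT;
        case 8:
            // Needs 64-bit GPR stores (TARGET_64BIT today; or WASM, eventually).
            return target64Bit ? CST_LONG : CST_NONE;
        case 16:
            // Needs SIMD support to emit a single 16-byte store.
            return hasSimd ? CST_SIMD16 : CST_NONE;
        default:
            return CST_NONE;
    }
}
```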

{
// LA, RISC-V and ARM32 more likely to receive a terrible performance hit from
// unaligned accesses making this optimization questionable.
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
Member


Do we want to enable this for wasm while we're here, or add a todo-wasm comment to remind ourselves to come back later and turn it on?

{
// RetBuf is a private stack memory, so we don't need to worry about atomicity.
allowsNonAtomic = true;
BlockRange().Remove(prevData.rangeStart, prevData.rangeEnd);
Member


Maybe add a JITDUMP message when doing this so it's clear what happened?

Member


Yeah, I think it's worth a comment or a JITDUMP message noting that in this case we remove a store because the next store overwrites it.
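The condition under discussion can be sketched independently of the JIT's data structures (the struct and field names below are assumptions for illustration, not the actual LoadStoreCoalescingData):

```cpp
#include <cassert>

// Assumed illustrative record of one store; not the JIT's internal struct.
struct StoreInfo
{
    unsigned offset;              // byte offset of the store
    unsigned size;                // bytes written
    bool     valueHasSideEffects; // whether the stored value's computation has side effects
};

// A previous store is dead when the current store fully covers the same
// bytes and the previous value computation has no side effects to preserve.
bool PrevStoreIsDead(const StoreInfo& prev, const StoreInfo& curr)
{
    return (curr.offset <= prev.offset) &&
           (curr.offset + curr.size >= prev.offset + prev.size) &&
           !prev.valueHasSideEffects;
}
```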

newType = TYP_LONG;
break;

#if defined(FEATURE_HW_INTRINSICS)
Member


Aren't there targets that have FEATURE_HW_INTRINSICS but not FEATURE_SIMD? Should this be gated on SIMD instead?

Member


In theory. I think it indeed should be FEATURE_SIMD here.

Member


FEATURE_HW_INTRINSICS is built on top of FEATURE_SIMD (unless one of the community driven platforms has introduced a deviation).

It's something that ideally we'd merge together into a single feature, particularly given how co-dependent they are now for practical usage.

Comment on lines +10886 to +10888
#endif
#endif
#endif
Member


These would benefit from comments specifying which #if they're attached to.

newType = TYP_INT;
break;

#ifdef TARGET_64BIT
Member


or WASM

Comment on lines +10981 to +10985
if (prevData.offset > currData.offset)
{
std::swap(lowerCns, upperCns);
}

Member


Why was this swap moved from outside of here to inside of 64bit+hwintrins+simd? Does it fix a bug?
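For context, the swap in question orders the two constants by store address before they are packed into one wider constant. A self-contained sketch (illustrative names; not the JIT's code):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Illustrative only: lowerCns starts as the previous store's constant and
// upperCns as the current one. If the previous store was at the higher
// address, swap so that lowerCns always corresponds to the lower offset.
void OrderConstantsByOffset(unsigned prevOffset, unsigned currOffset,
                            uint64_t& lowerCns, uint64_t& upperCns)
{
    if (prevOffset > currOffset)
    {
        std::swap(lowerCns, upperCns);
    }
}
```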


} while (true);
#endif // TARGET_XARCH || TARGET_ARM64
#endif
Member


The #endif could use a comment.

}

data->targetType = ind->TypeGet();
data->accessSize = ind->Size();
Member


If we add size, we probably no longer need the targetType field.

-if (prevData.offset == currData.offset)
+if ((currData.offset == prevData.offset) && (currData.targetType == prevData.targetType))
{
if (m_compiler->gtTreeHasSideEffects(prevData.value, GTF_SIDE_EFFECT | GTF_GLOB_EFFECT | GTF_ASG) ||
Member


I'm confused by GTF_SIDE_EFFECT | GTF_GLOB_EFFECT | GTF_ASG - doesn't GTF_GLOB_EFFECT imply them all alone?

//
// but we don't want to load managed references into SIMD registers (we can only do so
// when we can issue a nongc region for a block)
return;
Member

@EgorBo Apr 7, 2026


Is there a reason this comment was removed? I think it explains why we bail out here.

int8_t* upperCns = currData.value->AsVecCon()->gtSimdVal.i8;

// if the previous store was at a higher address, swap the constants
if (prevData.offset > currData.offset)
Member

@EgorBo Apr 7, 2026


The comment explained why we guarded #if defined(TARGET_AMD64); I think we should keep it.

@EgorBo (Member) commented Apr 7, 2026

I think we need to look at CI results before the next round of reviews

@AndyAyersMS AndyAyersMS closed this Apr 8, 2026
@AndyAyersMS AndyAyersMS reopened this Apr 8, 2026

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
