
Switch to ymm after zmm in genZeroInitFrameUsingBlockInit #115981


Merged
7 commits merged into dotnet:main on May 27, 2025

Conversation

EgorBo (Member) commented May 25, 2025

Closes #114274

 vxorps   xmm4, xmm4, xmm4
 vmovdqu32 zmmword ptr [rsp+0x20], zmm4
-vmovdqa  xmmword ptr [rsp+0x60], xmm4
-vmovdqa  xmmword ptr [rsp+0x70], xmm4
+vmovdqu  ymmword ptr [rsp+0x60], ymm4
 vmovdqu  xmmword ptr [rsp+0x80], xmm4

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 25, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo (Member, Author) commented May 26, 2025

PTAL @dotnet/jit-contrib: small change, a couple of diffs (only reproduces on AVX-512)

@EgorBo EgorBo marked this pull request as ready for review May 26, 2025 11:04
@Copilot Copilot AI review requested due to automatic review settings May 26, 2025 11:04
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR refactors the zero-initialization of stack frames to use YMM after ZMM, consolidates loops into a single while-based approach, and replaces aligned vmovdqa with unaligned vmovdqu for YMM registers.

  • Replaces two for-loops with a while-loop driven by lenRemaining
  • Computes regSize dynamically via roundDownSIMDSize and chooses aligned vs. unaligned moves
  • Introduces ALIGN_UP(blkSize, 16) to drive the loop and switches mov instructions
Comments suppressed due to low confidence (1)

src/coreclr/jit/codegenxarch.cpp:11261

  • Add tests for block sizes not divisible by SIMD widths (e.g., sizes between 1–15, 17–31 bytes) to verify that remainders are handled correctly by this loop.
while (lenRemaining > 0)


assert(i == blkSize);
// frameReg is definitely not known to be 32B/64B aligned -> switch to unaligned movs
instruction ins = regSize > XMM_REGSIZE_BYTES ? simdUnalignedMovIns() : simdMov;
Member


Does simdUnalignedMovIns() get hoisted out of the loop, or will it be a lookup each time?

Is there a reason to not just always use the unaligned instruction since they're the same perf for accesses that are actually aligned?

Member Author


I presume it acts as a validation of the assumption that the frame pointer is 16-byte aligned, but I'm not sure; I just copied it from the previous logic.

@EgorBo EgorBo enabled auto-merge (squash) May 27, 2025 17:17
EgorBo (Member, Author) commented May 27, 2025

/ba-g "windows-x86 Debug Libraries_CheckedCoreCLR is stuck"

@EgorBo EgorBo merged commit 97d1fc2 into dotnet:main May 27, 2025
106 of 108 checks passed
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

Suboptimal stack zeroing on AVX512
3 participants