Switch to ymm after zmm in genZeroInitFrameUsingBlockInit #115981
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
PTAL @dotnet/jit-contrib: small change, a couple of diffs (only reproduces on AVX-512)
Pull Request Overview
This PR refactors the zero-initialization of stack frames to use YMM stores for the remainder after ZMM stores, consolidates the loops into a single while-based loop, and replaces aligned vmovdqa with unaligned vmovdqu for YMM registers.
- Replaces two for-loops with a while-loop driven by lenRemaining
- Computes regSize dynamically via roundDownSIMDSize and chooses aligned vs. unaligned moves
- Introduces ALIGN_UP(blkSize, 16) to drive the loop and switches mov instructions
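The loop described in the overview can be modeled as a small standalone C++ sketch. It is not the JIT's code: Store, planZeroInit, alignUp, roundDownSimdSize (a stand-in for roundDownSIMDSize) and the maxSimd parameter are hypothetical. The sketch assumes the block start is only 16-byte aligned and that the available store widths are 16 (xmm), 32 (ymm) and 64 (zmm) bytes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the planned stores; not the JIT's actual code.
struct Store {
    unsigned offset;  // byte offset from the start of the zeroed block
    unsigned size;    // store width in bytes: 16 (xmm), 32 (ymm) or 64 (zmm)
    bool unaligned;   // true -> vmovdqu-style move, false -> aligned move
};

constexpr unsigned XMM_BYTES = 16;

// Rounds x up to a multiple of a (a must be a power of two), like ALIGN_UP.
constexpr unsigned alignUp(unsigned x, unsigned a) { return (x + a - 1) & ~(a - 1); }

// Stand-in for roundDownSIMDSize: widest SIMD size <= len, capped at maxSimd.
// Assumes len is a non-zero multiple of 16 and maxSimd is 16, 32 or 64.
unsigned roundDownSimdSize(unsigned len, unsigned maxSimd) {
    for (unsigned w = maxSimd; w > XMM_BYTES; w /= 2) {
        if (len >= w) {
            return w;
        }
    }
    return XMM_BYTES;
}

// Single while-loop driven by lenRemaining, as described in the PR overview.
std::vector<Store> planZeroInit(unsigned blkSize, unsigned maxSimd) {
    std::vector<Store> stores;
    unsigned offset = 0;
    unsigned lenRemaining = alignUp(blkSize, XMM_BYTES);
    while (lenRemaining > 0) {
        unsigned regSize = roundDownSimdSize(lenRemaining, maxSimd);
        // Only 16-byte alignment of the start is assumed, so every store
        // offset is a multiple of 16 but not necessarily of 32 or 64.
        assert(offset % XMM_BYTES == 0);
        // Stores wider than 16 bytes use the unaligned form in this model.
        stores.push_back({offset, regSize, regSize > XMM_BYTES});
        offset += regSize;
        lenRemaining -= regSize;
    }
    return stores;
}
```

For example, planZeroInit(112, 64) plans stores of 64, 32 and 16 bytes at offsets 0, 64 and 96, whereas finishing with xmm stores only after the zmm store would take 64 + 16 + 16 + 16, one more instruction. In this model, sizes that are not a multiple of 16 are first rounded up by the ALIGN_UP step, so 1 to 15 bytes become one 16-byte store and 17 to 31 bytes become one 32-byte store when ymm is available.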
Comments suppressed due to low confidence (1)
src/coreclr/jit/codegenxarch.cpp:11261
- Add tests for block sizes not divisible by SIMD widths (e.g., sizes between 1–15, 17–31 bytes) to verify that remainders are handled correctly by this loop.
while (lenRemaining > 0)
assert(i == blkSize);
// frameReg is definitely not known to be 32B/64B aligned -> switch to unaligned movs
instruction ins = regSize > XMM_REGSIZE_BYTES ? simdUnalignedMovIns() : simdMov;
Does simdUnalignedMovIns() get hoisted out of the loop, or will it be a lookup each time?
Is there a reason not to just always use the unaligned instruction, since it performs the same as the aligned one for accesses that are actually aligned?
I presume it acts as a validation of the assumption that the frame pointer is 16-byte aligned, but I'm not sure; I just copied it from the previous logic.
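On the hoisting question: in the line under review, simdUnalignedMovIns() is only evaluated when regSize > XMM_REGSIZE_BYTES, and whether the C++ compiler hoists the call is not guaranteed. A sketch that hoists it explicitly, assuming its result does not change within one method, is below; Ins, simdUnalignedMovInsStub and chooseMovs are hypothetical stand-ins, not JIT APIs. On the alignment point, aligned forms such as movdqa raise a fault on a misaligned address, which is presumably the validation described in the reply.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; the JIT uses its own `instruction` enum and emitter.
enum class Ins { AlignedMov, UnalignedMov };

constexpr unsigned XMM_BYTES = 16;

// Stand-in for simdUnalignedMovIns(); counts calls to show how often it runs.
int g_lookups = 0;
Ins simdUnalignedMovInsStub() {
    ++g_lookups;
    return Ins::UnalignedMov;
}

// Picks a move per store width, evaluating the unaligned-move lookup only once.
std::vector<Ins> chooseMovs(const std::vector<unsigned>& regSizes, Ins simdMov) {
    const Ins unalignedMov = simdUnalignedMovInsStub(); // hoisted out of the loop
    std::vector<Ins> movs;
    for (unsigned regSize : regSizes) {
        // Store widths are multiples of 16 in this model.
        assert(regSize % XMM_BYTES == 0);
        // Widths above 16 bytes cannot rely on the frame's 16-byte alignment.
        movs.push_back(regSize > XMM_BYTES ? unalignedMov : simdMov);
    }
    return movs;
}
```

The trade-off is that the hoisted call runs even when every store is 16 bytes wide; always using the unaligned form, as the reviewer suggests, would remove the per-store choice entirely.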
/ba-g "windows-x86 Debug Libraries_CheckedCoreCLR is stuck"
Closes #114274