Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 31 additions & 15 deletions src/coreclr/jit/codegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11253,29 +11253,45 @@ void CodeGen::genZeroInitFrameUsingBlockInit(int untrLclHi, int untrLclLo, regNu
// movups xmmword ptr [ebp/esp-OFFS], xmm4
// ...
// movups xmmword ptr [ebp/esp-OFFS], xmm4
// mov qword ptr [ebp/esp-OFFS], rax
//

// NOTE: it implicitly zeroes YMM4 and ZMM4 as well.
emit->emitIns_SIMD_R_R_R(INS_xorps, EA_16BYTE, zeroSIMDReg, zeroSIMDReg, zeroSIMDReg, INS_OPTS_NONE);

int i = 0;
if (maxSimdSize > XMM_REGSIZE_BYTES)
assert((blkSize % XMM_REGSIZE_BYTES) == 0);

int regSize = (int)compiler->roundDownSIMDSize(blkSize);
int lenRemaining = blkSize;
while (lenRemaining > 0)
{
for (; i <= blkSize - maxSimdSize; i += maxSimdSize)
// Overlap with the previously zeroed memory if we can clear the remainder
// with just single store. Example: say we have 112 bytes to clear:
//
// Option 1 (no overlapping):
// movups zmmword ptr [+0]
// movups ymmword ptr [+64]
// movups xmmword ptr [+96]
//
// Option 2 (overlapping):
// movups zmmword ptr [+0]
// movups zmmword ptr [+48]
//
if ((regSize > lenRemaining) && !isPow2(lenRemaining))
{
// We previously aligned data to 16 bytes which might not be aligned to maxSimdSize
emit->emitIns_AR_R(simdUnalignedMovIns(), EA_ATTR(maxSimdSize), zeroSIMDReg, frameReg,
alignedLclLo + i);
lenRemaining = regSize;
}
// Remainder will be handled by the xmm loop below
}

for (; i < blkSize; i += XMM_REGSIZE_BYTES)
{
emit->emitIns_AR_R(simdMov, EA_ATTR(XMM_REGSIZE_BYTES), zeroSIMDReg, frameReg, alignedLclLo + i);
}
// Use the largest SIMD register size that fits in the remaining length
regSize = (int)compiler->roundDownSIMDSize(lenRemaining);
assert(regSize >= XMM_REGSIZE_BYTES);

// frameReg is definitely not known to be 32B/64B aligned -> switch to unaligned movs
instruction ins = regSize > XMM_REGSIZE_BYTES ? simdUnalignedMovIns() : simdMov;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does simdUnalignedMovIns() get hoisted out of the loop, or will it be a lookup each time?

Is there a reason to not just always use the unaligned instruction since they're the same perf for accesses that are actually aligned?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume it acts as a validation of the assumption that the frame pointer is 16 bytes aligned, but not sure, I just copied it from the previous logic

const int offset = blkSize - lenRemaining;
emit->emitIns_AR_R(ins, EA_ATTR(regSize), zeroSIMDReg, frameReg, alignedLclLo + offset);

assert(i == blkSize);
lenRemaining -= regSize;
}
assert(lenRemaining == 0);
}
else
{
Expand Down
Loading