Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GT_STORE_BLK - do not call memset for blocks containg gc pointers on heap #95530

Merged
merged 27 commits into from
Jan 4, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 35 additions & 9 deletions docs/design/coreclr/botr/clr-abi.md
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ For non-rude thread abort, the VM walks the stack, running any catch handler tha

For example:

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -223,7 +223,7 @@ L:

In this case, if the address returned in catch 2 corresponding to label L is outside try 1, then the ThreadAbortException re-raised by the VM will not be caught by catch 1, as is expected. The JIT needs to insert a block such that this is the effective code generation:

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -240,7 +240,7 @@ L:

Similarly, the automatic re-raise address for a ThreadAbortException can't be within a finally handler, or the VM will abort the re-raise and swallow the exception. This can happen due to call-to-finally thunks marked as "cloned finally", as described above. For example (this is pseudo-assembly-code, not C#):

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -256,7 +256,7 @@ L:

This would generate something like:

```
```asm
// beginning of 'try 1'
// beginning of 'try 2'
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -281,7 +281,7 @@ Finally1:

Note that the JIT must already insert a "step" block so the finally will be called. However, this isn't sufficient to support ThreadAbortException processing, because "L1" is marked as "cloned finally". In this case, the JIT must insert another step block that is within "try 1" but outside the cloned finally block, that will allow for correct re-raise semantics. For example:

```
```asm
// beginning of 'try 1'
// beginning of 'try 2'
System.Threading.Thread.CurrentThread.Abort();
Expand Down Expand Up @@ -399,7 +399,7 @@ To implement this requirement, for any function with EH, we create a frame-local

Note that the since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could be possibly be invoked at the same time. For example, for:

```
```cs
try {
...
} catch {
Expand All @@ -419,7 +419,7 @@ When calling a finally, we set the appropriate level to 0xFC (aka "finally call"

Thus, calling a finally from JIT generated code looks like:

```
```asm
mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written
mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC
push G_M52300_IG07
Expand All @@ -430,7 +430,7 @@ In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'c

The code this finally returns to looks like this:

```
```asm
mov dword ptr [L_02+0x8 ebp-0CH], 0
jmp SHORT G_M52300_IG05
```
Expand Down Expand Up @@ -479,7 +479,7 @@ Because a main function body will **always** be on the stack when one of its fun

There is one "corner case" in the VM implementation of WantsReportOnlyLeaf model that has implications for the code the JIT is allowed to generate. Consider this function with nested exception handling:

```
```cs
public void runtest() {
try {
try {
Expand Down Expand Up @@ -804,3 +804,29 @@ In addition to the usual registers it also preserves all float registers and `rc
`CORINFO_HELP_DISPATCH_INDIRECT_CALL` takes the call address in `rax` and it reserves the right to use and trash `r10` and `r11`.
The JIT uses the dispatch helper on x64 whenever possible as it is expected that the code size benefits outweighs the less accurate branch prediction.
However, note that the use of `r11` in the dispatcher makes it incompatible with VSD calls where the JIT must fall back to the validator and a manual call.

# Notes on Memset/Memcpy

Generally, `memset` and `memcpy` do not provide any guarantees of atomicity. This implies that they should only be used when the memory being modified by `memset`/`memcpy` is not observable by any other thread (including GC), or when there are no atomicity requirements according to our [Memory Model](../../specs/Memory-model.md). It's especially important when we modify heap containing managed pointers - those must be updated atomically, e.g. using pointer-sized `mov` instruction (managed pointers are always aligned) - see [Atomic Memory Access](../../specs/Memory-model.md#Atomic-memory-accesses). It's worth noting that by "update" it's implied "set to zero", otherwise, we need a write barrier.

Examples:

```cs
struct MyStruct
{
long a;
string b;
}

void Test1(ref MyStruct m)
{
// We're not allowed to use memset here
m = default;
}

MyStruct Test2()
{
// We can use memset here
return default;
}
```
1 change: 1 addition & 0 deletions src/coreclr/jit/codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -1234,6 +1234,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#ifndef TARGET_X86
void genCodeForInitBlkHelper(GenTreeBlk* initBlkNode);
#endif
void genCodeForInitBlkLoop(GenTreeBlk* initBlkNode);
void genCodeForInitBlkRepStos(GenTreeBlk* initBlkNode);
void genCodeForInitBlkUnroll(GenTreeBlk* initBlkNode);
void genJumpTable(GenTree* tree);
Expand Down
71 changes: 71 additions & 0 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3243,6 +3243,72 @@ void CodeGen::genCodeForInitBlkHelper(GenTreeBlk* initBlkNode)
genEmitHelperCall(CORINFO_HELP_MEMSET, 0, EA_UNKNOWN);
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
GenTree* const zeroNode = initBlkNode->Data();

genConsumeReg(dstNode);
genConsumeReg(zeroNode);

const regNumber dstReg = dstNode->GetRegNum();
const regNumber zeroReg = zeroNode->GetRegNum();

// TODO-ARM64: mark initBlkNode->Data() as contained and use WZR/XZR
EgorBo marked this conversation as resolved.
Show resolved Hide resolved
EgorBo marked this conversation as resolved.
Show resolved Hide resolved

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

// mov zeroReg, wzr
EgorBo marked this conversation as resolved.
Show resolved Hide resolved
// str zeroReg, [dstReg]
// mov offsetReg, <block size>
//.LOOP:
// str zeroReg, [dstReg, offsetReg]
// subs offsetReg, offsetReg, #8
// bne .LOOP

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R(INS_str, EA_PTRSIZE, zeroReg, dstReg);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber offsetReg = initBlkNode->GetSingleTempReg();
instGen_Set_Reg_To_Imm(EA_PTRSIZE, offsetReg, size - TARGET_POINTER_SIZE);

BasicBlock* loop = genCreateTempLabel();
genDefineTempLabel(loop);

GetEmitter()->emitIns_R_R_R(INS_str, EA_PTRSIZE, zeroReg, dstReg, offsetReg);
#ifdef TARGET_ARM64
GetEmitter()->emitIns_R_R_I(INS_subs, EA_PTRSIZE, offsetReg, offsetReg, TARGET_POINTER_SIZE);
#else
GetEmitter()->emitIns_R_R_I(INS_sub, EA_PTRSIZE, offsetReg, offsetReg, TARGET_POINTER_SIZE, INS_FLAGS_SET);
#endif
inst_JMP(EJ_ne, loop);

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

//------------------------------------------------------------------------
// genCall: Produce code for a GT_CALL node
//
Expand Down Expand Up @@ -4513,6 +4579,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
assert(!blkOp->gtBlkOpGcUnsafe);
if (isCopyBlk)
Expand Down
65 changes: 65 additions & 0 deletions src/coreclr/jit/codegenloongarch64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6386,6 +6386,66 @@ void CodeGen::genCodeForInitBlkHelper(GenTreeBlk* initBlkNode)
genEmitHelperCall(CORINFO_HELP_MEMSET, 0, EA_UNKNOWN);
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
GenTree* const zeroNode = initBlkNode->Data();

genConsumeReg(dstNode);
genConsumeReg(zeroNode);

const regNumber dstReg = dstNode->GetRegNum();
const regNumber zeroReg = zeroNode->GetRegNum();

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R_I(INS_st_d, EA_PTRSIZE, zeroReg, dstReg, 0);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber offsetReg = initBlkNode->ExtractTempReg();
const regNumber tempReg = initBlkNode->ExtractTempReg();
instGen_Set_Reg_To_Imm(EA_PTRSIZE, offsetReg, size - TARGET_POINTER_SIZE);

BasicBlock* loop = genCreateTempLabel();
genDefineTempLabel(loop);
GetEmitter()->emitDisableGC(); // TODO: add gcinfo to tempReg and remove nogc

// tempReg = dstReg + offset (a new interior pointer, but in a nongc region)
GetEmitter()->emitIns_R_R_R(INS_add_d, EA_PTRSIZE, tempReg, dstReg, offsetReg);
// *tempReg = 0
GetEmitter()->emitIns_R_R_R(INS_st_d, EA_PTRSIZE, zeroReg, tempReg, 0);
EgorBo marked this conversation as resolved.
Show resolved Hide resolved
// offsetReg = offsetReg - 8
GetEmitter()->emitIns_R_R_I(INS_addi_d, EA_PTRSIZE, offsetReg, offsetReg, -8);
// if (offsetReg != 0) goto loop;
GetEmitter()->emitIns_J_cond_la(INS_beq, loop, offsetReg, zeroReg);
GetEmitter()->emitEnableGC();

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

// Generate code for a load from some address + offset
// base: tree node which can be either a local address or arbitrary node
// offset: distance from the base from which to load
Expand Down Expand Up @@ -7297,6 +7357,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
if (isCopyBlk)
{
Expand Down
65 changes: 65 additions & 0 deletions src/coreclr/jit/codegenriscv64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6077,6 +6077,66 @@ void CodeGen::genCodeForCpBlkUnroll(GenTreeBlk* cpBlkNode)
}
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
GenTree* const zeroNode = initBlkNode->Data();

genConsumeReg(dstNode);
genConsumeReg(zeroNode);

const regNumber dstReg = dstNode->GetRegNum();
const regNumber zeroReg = zeroNode->GetRegNum();

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R_I(INS_sd, EA_PTRSIZE, zeroReg, dstReg, 0);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber offsetReg = initBlkNode->ExtractTempReg();
const regNumber tempReg = initBlkNode->ExtractTempReg();
EgorBo marked this conversation as resolved.
Show resolved Hide resolved
instGen_Set_Reg_To_Imm(EA_PTRSIZE, offsetReg, size - TARGET_POINTER_SIZE);

BasicBlock* loop = genCreateTempLabel();
genDefineTempLabel(loop);
GetEmitter()->emitDisableGC(); // TODO: add gcinfo to tempReg and remove nogc

// tempReg = dstReg + offset (a new interior pointer, but in a nongc region)
GetEmitter()->emitIns_R_R_R(INS_add, EA_PTRSIZE, tempReg, dstReg, offsetReg);
// *tempReg = 0
GetEmitter()->emitIns_R_R_I(INS_sd, EA_PTRSIZE, zeroReg, tempReg, 0);
// offsetReg = offsetReg - 8
GetEmitter()->emitIns_R_R_I(INS_addi, EA_PTRSIZE, offsetReg, offsetReg, -8);
// if (offsetReg != 0) goto loop;
GetEmitter()->emitIns_J(INS_bnez, loop, offsetReg);
GetEmitter()->emitEnableGC();

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

//------------------------------------------------------------------------
// genCodeForInitBlkHelper - Generate code for an InitBlk node by the means of the VM memcpy helper call
//
Expand Down Expand Up @@ -6966,6 +7026,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
if (isCopyBlk)
{
Expand Down
Loading
Loading