Collected delegate diagnostic #15809
@jkotas PTAL
src/vm/amd64/cgenamd64.cpp
Outdated
m_movR10[1] = 0xBF;
#endif

FlushInstructionCache(GetCurrentProcess(), &m_movR10[0], &m_jmpRAX[3]-&m_movR10[0]);
Could you please use ClrFlushInstructionCache? It is a no-op on Intel platforms that guarantee code cache coherency.
Yes, thank you for the suggestion. Should we use ClrFlushInstructionCache instead of direct calls to FlushInstructionCache in other places (for example, in UMEntryThunkCode::Encode in arm/stubs.cpp, arm64/stubs.cpp, and i386/cgenx86.cpp)?
I think so.
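For illustration, a wrapper of the kind discussed in this thread could look roughly like the sketch below. It is not the actual ClrFlushInstructionCache: the name FlushCodeRangeIfNeeded is hypothetical, and the non-x86 branch assumes the GCC/Clang builtin __builtin___clear_cache is available.

```cpp
#include <cstddef>

// Hypothetical helper, not the CLR's ClrFlushInstructionCache.
// Returns true when an explicit flush was issued, false when it was skipped.
inline bool FlushCodeRangeIfNeeded(void* start, std::size_t size)
{
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
    // x86/x64 keep the instruction cache coherent with data writes,
    // so the flush can be skipped on these targets.
    (void)start;
    (void)size;
    return false;
#else
    // Assumption: GCC/Clang builtin; other toolchains need their own call.
    char* begin = static_cast<char*>(start);
    __builtin___clear_cache(begin, begin + size);
    return true;
#endif
}
```

A caller would invoke this after writing the thunk bytes, passing the start of the encoded region and its size.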
    MODE_COOPERATIVE;
    PRECONDITION(CheckPointer(pEntryThunk));
}
CONTRACTL_END;
It would be useful to add a comment here saying that this diagnostic is best effort: it won't report the problem in 100% of cases, and it may sometimes crash while trying to report the problem.
src/utilcode/loaderheap.cpp
Outdated
MergeBlock(pNewBlock, pHeap);
if (pHeap->IsFIFO())
I am wondering how this would interact with other places that use the executable heap, and whether we can make this generally more reliable.
Would it be better to move the FIFO one level up, to where UMEntryThunks are allocated? I am thinking about something like:
UMEntryThunk::CreateUMEntryThunk
{
    if (number of cached thunks < 100)
        Allocate a new thunk using GetGlobalLoaderAllocator()->GetExecutableHeap()
    else
        Use thunk from the LIFO cache
}
UMEntryThunk::Terminate
{
    Add a thunk to LIFO cache
}
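As a rough, self-contained sketch of the caching policy in the pseudocode above: the types Thunk and ThunkCache are hypothetical stand-ins, heap allocation replaces the executable LoaderHeap, and locking is left out. The sketch reuses the oldest freed thunk first, on the assumption that the goal is to delay reuse for as long as possible; the pseudocode itself does not spell out the order.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

struct Thunk { int id = 0; };  // hypothetical stand-in for UMEntryThunk

class ThunkCache
{
public:
    explicit ThunkCache(std::size_t threshold) : m_threshold(threshold) {}

    // Mirrors CreateUMEntryThunk in the pseudocode.
    Thunk* Create()
    {
        if (m_free.size() < m_threshold)
        {
            // Stand-in for allocating from the executable heap.
            m_storage.push_back(std::make_unique<Thunk>());
            return m_storage.back().get();
        }
        // Enough thunks are cached: reuse the one freed longest ago.
        Thunk* p = m_free.front();
        m_free.pop_front();
        return p;
    }

    // Mirrors Terminate: the thunk is cached, never handed back to the heap.
    void Free(Thunk* p) { m_free.push_back(p); }

    std::size_t CachedCount() const { return m_free.size(); }

private:
    std::size_t m_threshold;
    std::deque<Thunk*> m_free;                      // oldest freed thunk at the front
    std::vector<std::unique_ptr<Thunk>> m_storage;  // owns every thunk ever allocated
};
```

With a threshold of N, a freed thunk is not handed out again until at least N other thunks have been freed after it, which is what gives a stale call a chance to hit a poisoned thunk instead of a live one.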
It looks good to me overall, modulo comments. Thank you for implementing it!
e025ec5 to c6e61e3
Thank you for the review! I've updated the PR.
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
src/vm/dllimportcallback.cpp
Outdated
if (p == NULL)
{
    // On the phone, use loader heap to save memory commit of regular executable heap
Nit: Delete this comment. It is not relevant anymore.
Done.
c6e61e3 to 44eb523
src/vm/dllimportcallback.cpp
Outdated
    ++m_count;
}

m_list.InsertTail(new SListElem<UMEntryThunk*>(pThunk));
Allocating here is problematic. This method needs to have a
CONTRACTL
{
    NOTHROW;
}
CONTRACTL_END;
contract, because UMEntryThunk::FreeUMEntryThunk, which calls it, is nothrow.
We should just use the UMEntryThunk's memory itself to maintain the list.
It may be better not to use SList for this, and just implement a custom linked list for UMEntryThunk here.
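One way to read this suggestion is an intrusive list whose link pointer lives inside the freed thunk, so adding to the list never allocates and cannot throw. The sketch below only illustrates that idea under stated assumptions: Thunk, MarshInfo, ThunkFreeList and the member names are hypothetical, and the lock that the real code would need is omitted.

```cpp
#include <cstddef>

struct MarshInfo;  // hypothetical stand-in for UMThunkMarshInfo

struct Thunk       // hypothetical stand-in for UMEntryThunk
{
    union
    {
        MarshInfo* m_pMarshInfo;  // meaningful while the thunk is live
        Thunk*     m_pNextFree;   // meaningful while the thunk is on the free list
    };
};

class ThunkFreeList
{
public:
    // No allocation happens here, so the method can honor a NOTHROW-style contract.
    void AddToList(Thunk* p) noexcept
    {
        p->m_pNextFree = nullptr;
        if (m_pTail == nullptr)
            m_pHead = p;
        else
            m_pTail->m_pNextFree = p;
        m_pTail = p;
        ++m_count;
    }

    // Returns the oldest freed thunk, or nullptr when the list is empty.
    Thunk* GetFromList() noexcept
    {
        Thunk* p = m_pHead;
        if (p == nullptr)
            return nullptr;
        m_pHead = p->m_pNextFree;
        if (m_pHead == nullptr)
            m_pTail = nullptr;
        --m_count;
        return p;
    }

    std::size_t Count() const noexcept { return m_count; }

private:
    Thunk*      m_pHead = nullptr;
    Thunk*      m_pTail = nullptr;
    std::size_t m_count = 0;
};
```

Because the link reuses storage that is dead once the thunk has been freed, the list needs no node objects of its own and nothing has to be released when it goes away.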
src/vm/dllimportcallback.cpp
Outdated
if (pElem != NULL)
{
    UMEntryThunkFreeList::FreeThunk(pElem->GetValue());
We can put the thunk to the list all the time. No need to ever return it back to the LoaderHeap.
If you do this, we can also delete the LHF_ZEROINIT flag on the LoaderHeap that was added by the previous iteration of this change.
Why don't we need to return allocated chunks to the LoaderHeap? I think they could be reused, since the GlobalLoaderAllocator's executable heap is also used in other places.
Thank you!
Outside UMEntryThunks, the executable heap is used in very few, rarely used places. There is close to zero chance that the returned memory would be reused for anything but UMEntryThunks.
Thank you for the explanation. I've removed returning memory to the LoaderHeap in UMEntryThunkFreeList.
Is the LHF_ZEROINIT flag not useful? I think it can reduce allocation time if we don't need initialized memory.
LoaderHeap allocates memory directly from the OS using mmap. This memory is zero-initialized, so the zero initialization is free for normal loader heap use.
!LHF_ZEROINIT can only save something in cases where the memory is returned to the LoaderHeap. This was only done by UMEntryThunks on mainline paths. After this change, it will pretty much never happen. The memory is generally returned to the LoaderHeap on exceptional paths only, for example when a complex operation like type loading fails in the middle and we need to back out the memory allocated so far, so that we do not leak memory if the operation is repeated again and again. We do not optimize performance on exceptional paths. We prefer simplicity for exceptional paths.
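The point about mmap can be checked with a small standalone sketch. It assumes a Linux-like system where MAP_ANONYMOUS is available; the function name FreshMappingIsZeroed is made up for this example and has nothing to do with the LoaderHeap code.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Maps a fresh anonymous region and reports whether every byte reads as zero.
// Returns false if the mapping itself fails.
bool FreshMappingIsZeroed(std::size_t size)
{
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;

    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    bool allZero = true;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (bytes[i] != 0)
        {
            allZero = false;
            break;
        }
    }

    munmap(p, size);
    return allZero;
}
```

Since the kernel already hands out zero-filled pages, an extra memset is only needed for memory that has been used and returned to the heap, which is the case the comment above says will now be rare.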
Thank you! I've removed this option.
@@ -33,6 +33,90 @@ struct UM2MThunk_Args
    int argLen;
};
You can delete the MDA_SUPPORTED code in dllimportcallback.* because it is superseded by this change.
We can delete the MDA_SUPPORTED code in dllimportcallback, yes?
Yes, it is what I meant.
Done
src/vm/dllimportcallback.cpp
Outdated
~UMEntryThunkFreeList()
{
    SListElem<UMEntryThunk*> *pElem = m_list.GetHead();
This destructor won't be necessary once we stop allocating our own heap. (You do not need to worry about freeing memory allocated on LoaderHeap.)
44eb523 to 757f7a8
cc @parjong
#define DEFAULT_THUNK_FREE_LIST_THRESHOLD 64

static UMEntryThunkFreeList s_thunkFreeList(DEFAULT_THUNK_FREE_LIST_THRESHOLD);
This contains a Crst. Crsts in static variables have to be CrstStatic to avoid issues like: https://github.com/dotnet/coreclr/issues/13779#issuecomment-328007409
Done.
union
{
    // Pointer to the shared structure containing everything else
    PTR_UMThunkMarshInfo m_pUMThunkMarshInfo;
👍
Thanks for making it explicit where the link lives.
src/vm/i386/cgenx86.cpp
Outdated
@@ -1607,14 +1607,19 @@ void UMEntryThunkCode::Encode(BYTE* pTargetCode, void* pvSecretParam)
    m_jmp = X86_INSTR_JMP_REL32;
    m_execstub = (BYTE*) ((pTargetCode) - (4+((BYTE*)&m_execstub)));

    FlushInstructionCache(GetCurrentProcess(),GetEntryPoint(),sizeof(UMEntryThunkCode));
Could you please keep the full FlushInstructionCache in Encode (on x86 and x64 at least)?
We used to have issues with Time Travel Debugging that required an explicit FlushInstructionCache to fix or work around. I am not sure whether these issues still exist.
Using ClrFlushInstructionCache in Poison should be fine.
Done.
src/inc/CrstTypes.def
Outdated
@@ -745,6 +745,10 @@ Crst WrapperTemplate
    AcquiredBefore IbcProfile
End

Crst UMEntryThunkFreeList
    AcquiredBefore LoaderHeap
Ordering this with LoaderHeap should not be needed - you are not calling into LoaderHeap when the lock is taken.
Since your lock is a pretty simple leaf lock, you can just use CrstLeafLock for it and not add to this file at all.
Thank you for the suggestion!
Improve UMEntryThunkCode::Poison to produce a diagnostic message when a collected delegate is called.
757f7a8 to 8238c90
Use a free list to delay reusing deleted thunks. This improves the diagnostic for calls to collected delegates.
This option was used for UMEntryThunkCode::Poison. Now we use our own free list to store freed thunks and don't return allocated memory to the LoaderHeap, so reused thunks are always uninitialized.
8238c90 to 564db77
Thank you!
Improve collected delegate calls diagnostic (https://github.com/dotnet/coreclr/issues/15465):
UMEntryThunkCode::Poison to produce diagnostic message
Example: