Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collected delegate diagnostic #15809

Merged
merged 4 commits into from Jan 12, 2018
Merged

Conversation

@kbaladurin
Copy link
Member

@kbaladurin kbaladurin commented Jan 10, 2018

Improve collected delegate calls diagnostic (https://github.com/dotnet/coreclr/issues/15465):

  • Improve UMEntryThunkCode::Poison to produce diagnostic message
  • Store freed thunks into FIFO free list, in this case it takes longer before the thunks get reused

Example:

$ ./corerun delegategc.dll
Press ESC to stop
register_callback:0x7f293afa316c ----
force GC!!!
upcall ----
FailFast: A callback was made on a garbage collected delegate of type 'delegategc!DelegateGC.Program+Callback::Invoke'.

   at DelegateGC.Program.upcall()
   at DelegateGC.Program.upcall()
   at DelegateGC.Program.MyView_KeyEvent(System.ConsoleKeyInfo)
   at DelegateGC.Program.Main(System.String[])
Aborted (core dumped)
@kbaladurin
Copy link
Member Author

@kbaladurin kbaladurin commented Jan 10, 2018

@jkotas PTAL

m_movR10[1] = 0xBF;
#endif

FlushInstructionCache(GetCurrentProcess(), &m_movR10[0], &m_jmpRAX[3]-&m_movR10[0]);
Copy link
Member

@jkotas jkotas Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use ClrFlushInstructionCache? It is no-op on Intel platforms that guarantee code cache coherency.

Copy link
Member Author

@kbaladurin kbaladurin Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you for suggestion. Should we use ClrFlushInstructionCache instead of direct calls of the FlushInstructionCache in other places (for example in UMEntryThunkCode::Encode in arm/stubs.cpp, arm64/stubs.cpp, i386/cgenx86.cpp)?

Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.

MODE_COOPERATIVE;
PRECONDITION(CheckPointer(pEntryThunk));
}
CONTRACTL_END;
Copy link
Member

@jkotas jkotas Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be useful to add a comment here that this diagnostic is best effort, it won't report the problem in 100% of cases, and it may sometime crash while trying to report the problem.


MergeBlock(pNewBlock, pHeap);
if (pHeap->IsFIFO())
Copy link
Member

@jkotas jkotas Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering how this would can interact with other places that use the executable heap, and whether we can make this generally more reliable.

Would it be better to move the FIFO into one level up to where UMEntryThunks are allocate? I am thinking about something like:

UMEntryThunk::CreateUMEntryThunk
{
    if (number of cached thunks < 100)
        Allocate a new thunk using GetGlobalLoaderAllocator()->GetExecutableHeap()
    else
        Use thunk from the LIFO cache
}

UMEntryThunk::Terminate
{
    Add a thunk to LIFO cache
}

@jkotas
Copy link
Member

@jkotas jkotas commented Jan 10, 2018

It looks good to me overall, modulo comments. Thank you for implementing it!

@kbaladurin kbaladurin force-pushed the collected-delegate-diagnostic branch 2 times, most recently from e025ec5 to c6e61e3 Jan 11, 2018
@kbaladurin
Copy link
Member Author

@kbaladurin kbaladurin commented Jan 11, 2018

Thank you for review! I've updated PR.

@kbaladurin
Copy link
Member Author

@kbaladurin kbaladurin commented Jan 11, 2018

@kbaladurin
Copy link
Member Author

@kbaladurin kbaladurin commented Jan 11, 2018

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test


if (p == NULL)
{
// On the phone, use loader heap to save memory commit of regular executable heap
Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Delete this comment. It is not relevant anymore.

Copy link
Member Author

@kbaladurin kbaladurin Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kbaladurin kbaladurin force-pushed the collected-delegate-diagnostic branch from c6e61e3 to 44eb523 Jan 11, 2018
++m_count;
}

m_list.InsertTail(new SListElem<UMEntryThunk*>(pThunk));
Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating here is problematic. This method needs to have:

   CONTRACTL
   {
       NOTHROW;
   }
   CONTRACTL_END;

contract because of UMEntryThunk::FreeUMEntryThunk that calls it is nothrow.

We should just use the UMEntryThunks memory itself to maintain the list.

It may be better to not use SList for this, and just implement a custom linked list just for the UMEntryThunk here.


if (pElem != NULL)
{
UMEntryThunkFreeList::FreeThunk(pElem->GetValue());
Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can put the thunk to the list all the time. No need to ever return it back to the LoaderHeap.

Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do this, we can also delete LHF_ZEROINIT flag on the LoaderHeap added by previous iteration of this change.

Copy link
Member Author

@kbaladurin kbaladurin Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't need to return allocated chunks to the LoaderHeap? I think they can be reused since GlobalLoaderAllocator's executable heap is also used in other places.
Thank you!

Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside UMEntryThunks, the executable heap is used in a very few rarely used places. There is a close to zero chance that the returned memory would be reused for anything but UMEntryThunks.

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation, I've removed returning memory to the LoaderHeap in UMEntryThunkFreeList.
Is LHF_ZEROINIT flag not useful? I think it can reduce time of allocation if we don't need initialized memory.

Copy link
Member

@jkotas jkotas Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoaderHeap allocates memory directly from the OS using mmap. This memory is zero initialized, so the zero initialization is free for the normal loader heap use.

!LHF_ZEROINIT can only save something for cases where the memory is returned back to LoaderHeap. This was only done by UMEntryThunks on mainline paths. After this change, it will pretty much never happen. The memory is generally returned to LoaderHeap on exceptional paths only, like a complex operations like type loading fails in the middle and we need to backout the memory allocated so far - so that we do not have a memory leak if this operation is repeated again and again. We do not optimize performance on exceptional paths. We prefer simplicity for exceptional paths.

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I've removed this option.

@@ -33,6 +33,90 @@ struct UM2MThunk_Args
int argLen;
};
Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can the MDA_SUPPORTED code in dllimportcallback.* because of it is superceeded by this change.

Copy link
Member Author

@kbaladurin kbaladurin Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can delete MDA_SUPPORTED code in dllimportcallback, yes?

Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is what I meant.

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


~UMEntryThunkFreeList()
{
SListElem<UMEntryThunk*> *pElem = m_list.GetHead();
Copy link
Member

@jkotas jkotas Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This destructor won't be necessary once we stop allocating our own heap. (You do not need to worry about freeing memory allocated on LoaderHeap.)

@kbaladurin kbaladurin force-pushed the collected-delegate-diagnostic branch from 44eb523 to 757f7a8 Jan 12, 2018
@kbaladurin
Copy link
Member Author

@kbaladurin kbaladurin commented Jan 12, 2018


#define DEFAULT_THUNK_FREE_LIST_THRESHOLD 64

static UMEntryThunkFreeList s_thunkFreeList(DEFAULT_THUNK_FREE_LIST_THRESHOLD);
Copy link
Member

@jkotas jkotas Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contains Crst. Crsts in static variables has to be CrstStatic to avoid issues like: https://github.com/dotnet/coreclr/issues/13779#issuecomment-328007409

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

union
{
// Pointer to the shared structure containing everything else
PTR_UMThunkMarshInfo m_pUMThunkMarshInfo;
Copy link
Member

@jkotas jkotas Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Thanks for making it explicit where the link lives.

@@ -1607,14 +1607,19 @@ void UMEntryThunkCode::Encode(BYTE* pTargetCode, void* pvSecretParam)
m_jmp = X86_INSTR_JMP_REL32;
m_execstub = (BYTE*) ((pTargetCode) - (4+((BYTE*)&m_execstub)));

FlushInstructionCache(GetCurrentProcess(),GetEntryPoint(),sizeof(UMEntryThunkCode));
Copy link
Member

@jkotas jkotas Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please keep the full FlushInstructionCache in Encode (on x86 and x64 at least)?

We used to have issues with Time Travel Debugging that required explicit FlushInstructionCache to fix/workaround. I am not sure whether these issues still exist.

Using ClrFlushInstructionCache in Poison should be fine.

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -745,6 +745,10 @@ Crst WrapperTemplate
AcquiredBefore IbcProfile
End

Crst UMEntryThunkFreeList
AcquiredBefore LoaderHeap
Copy link
Member

@jkotas jkotas Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering this with LoaderHeap should not be needed - you are not calling into LoaderHeap when the lock is taken.

Since your lock is a pretty simple leaf lock, you can just use CrstLeafLock for it and not add to this file at all.

Copy link
Member Author

@kbaladurin kbaladurin Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for suggestion!

Improve UMEntryThunkCode::Poison to produce diagnostic message
when collected delegate was called.
@kbaladurin kbaladurin force-pushed the collected-delegate-diagnostic branch from 757f7a8 to 8238c90 Jan 12, 2018
Konstantin Baladurin added 3 commits Jan 12, 2018
Use free list to delay reusing deleted thunks. It improves
collected delegate calls diagnostic.
This option was used for UMEntryThunkCode::Poison. Now we use own free list
to store freed thunks and don't return allocated memory to the LoaderHeap.
So reused thunks are always uninitialized.
@kbaladurin kbaladurin force-pushed the collected-delegate-diagnostic branch from 8238c90 to 564db77 Jan 12, 2018
jkotas
jkotas approved these changes Jan 12, 2018
Copy link
Member

@jkotas jkotas left a comment

Thank you!

@jkotas jkotas merged commit 02f172c into dotnet:master Jan 12, 2018
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants