Description
While I was hunting the strange memory unmapping issue over the last few weeks, one of the instrumentations I added locally was a forced failure of the n-th memory allocation in PAL. Today I ran it in a loop, iterating n from 1 up to some high number, to see whether I could hit a problem. I ran it with my single-threaded test app, which allocates GC memory like mad until it hits OOM and also catches the OOM exception.
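To make the instrumentation concrete, here is a minimal sketch of failing the n-th allocation. It is not the actual PAL change: `FailNthAllocator` and `first_failure` are hypothetical names, and the workload is just a loop of small `malloc` calls standing in for the real test app.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical fault injector (an assumption, not the PAL code):
// fails exactly one allocation, the fail_at-th one (1-based); 0 disables it.
struct FailNthAllocator {
    std::size_t fail_at;
    std::size_t count;

    explicit FailNthAllocator(std::size_t n) : fail_at(n), count(0) {}

    void* allocate(std::size_t size) {
        ++count;
        if (fail_at != 0 && count == fail_at) {
            return nullptr;  // simulated out-of-memory
        }
        return std::malloc(size);
    }
};

// Stand-in workload: performs up to `allocs` allocations and returns the
// 1-based index of the first one that failed, or 0 if none failed.
std::size_t first_failure(std::size_t fail_at, std::size_t allocs) {
    FailNthAllocator allocator(fail_at);
    for (std::size_t i = 1; i <= allocs; ++i) {
        void* p = allocator.allocate(16);
        if (p == nullptr) {
            return i;
        }
        std::free(p);
    }
    return 0;
}
```

Calling `first_failure(n, allocs)` for n = 1, 2, 3, ... mirrors the loop described above: each run fails a different allocation, so every allocation site gets its turn to see an OOM.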
I found a case where the app ends up hanging in gc_heap::wait_for_gc_done forever. I have looked into it and I think I understand what's going on. The root of the issue is that SetupUnstartedThread uses the throwing operator new to allocate a new thread. When that allocation fails and throws, the stack looks as shown below. The exception unwinds all the frames until it is caught in JIT_NewArr1 (frame #18 below), and gc_heap::gc_started stays set. We then try to allocate memory for the Throwable for the exception, so we go to the GC heap, and since there is not enough space, we end up calling gc_heap::try_allocate_more_space, which calls gc_heap::wait_for_gc_done. And that's the end of it: gc_started is still set, so we wait forever.
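The failure pattern itself can be reduced to a small model. This is only a sketch under stated assumptions, not the runtime code: `HeapModel`, `setup_unstarted_thread_model`, `collect_unguarded`, `collect_guarded`, `GcStartedGuard` and `flag_stuck_after_oom` are all hypothetical names. The RAII guard is shown as one possible way to keep the flag consistent when an exception unwinds through the collection; it is not necessarily the fix chosen for the runtime.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the GC heap state; only the flag matters here.
struct HeapModel {
    bool gc_started = false;
};

// Models an allocation done with the throwing operator new.
void setup_unstarted_thread_model(bool fail) {
    if (fail) {
        throw std::bad_alloc();
    }
}

// Problematic shape: an exception skips the reset, so the flag stays set.
void collect_unguarded(HeapModel& heap, bool fail) {
    heap.gc_started = true;
    setup_unstarted_thread_model(fail);  // may throw
    heap.gc_started = false;             // never reached on failure
}

// One possible remedy (an assumption): clear the flag during unwinding.
struct GcStartedGuard {
    HeapModel& heap;
    explicit GcStartedGuard(HeapModel& h) : heap(h) { heap.gc_started = true; }
    ~GcStartedGuard() { heap.gc_started = false; }
};

void collect_guarded(HeapModel& heap, bool fail) {
    GcStartedGuard guard(heap);
    setup_unstarted_thread_model(fail);
}

// Returns true if the flag is still set after a failed collection, which is
// the state in which a later wait on it would never finish.
template <typename Collect>
bool flag_stuck_after_oom(Collect collect) {
    HeapModel heap;
    try {
        collect(heap, true);
    } catch (const std::bad_alloc&) {
        // Corresponds to the catch far up the stack.
    }
    return heap.gc_started;
}
```

With these definitions, `flag_stuck_after_oom(collect_unguarded)` reports the stuck flag, while `flag_stuck_after_oom(collect_guarded)` does not. Using a non-throwing allocation and handling the null result would be another way to avoid unwinding past the flag.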