Add heuristic to trigger GC to finalize dead threads and clean up han… #10413

kouvel · 2017-03-23T09:16:33Z

…dles and memory

A thread that is started allocates some handles in Thread::AllocHandles(). After it terminates, the managed thread object needs to be collected by a GC for the handles to be released. Some applications that are mostly idle but have a timer ticking periodically will cause a thread pool thread to be created on each tick, since the thread pool preserves threads for a maximum of 20 seconds currently. Over time the number of handles accumulates to a high value. Thread creation adds some memory pressure, but that does not force a GC until a sufficiently large handle count, and for some mostly idle apps, that can be very long. The same issue arises with directly starting threads as well.

Fixes #6602:

Track a dead thread count separately from the current dead thread count. This is the count that will contribute to triggering a GC, once it reaches a threshold. The count is tracked separately so that it may be reset to zero when a max-generation GC occurs, preventing dead threads that survive a GC due to references from contributing to triggering a GC again in this fashion.
If the count exceeds a threshold, enumerate dead threads to see which GC generation the corresponding managed thread objects are in. If the duration of time since the last GC of the desired generation also exceeds a threshold, trigger a preferably non-blocking GC on the finalizer thread.
Removed a couple of handles and some code paths specific to user-requested thread suspension, which is not supported on CoreCLR

kouvel · 2017-03-23T09:24:38Z

I investigated whether it would be possible to delete handles and some unmanaged memory associated with a thread upon the thread exiting. This does not seem to be feasible due to races that would exist without the use of a lock to read thread state, which would be necessary on almost every Thread API call. The current mechanism eliminates the possibility of the native Thread object and its handles from being deleted concurrently with a Thread API call by associating their lifetime with the managed Thread object's finalizer.

So, going back to the original solution of adding a heuristic to trigger GC based on the number of dead threads accumulated since the last finalizing GC and time elapsed since the last finalizing GC.

@jkotas @Maoni0

jkotas · 2017-03-23T13:44:23Z

src/gc/gcinterface.h

    virtual HRESULT GarbageCollect(int generation = -1, BOOL low_memory_p = FALSE, int mode = collection_blocking) = 0;

+    // Gets the smallest GC generation that runs finalizers.
+    virtual unsigned GetMinFinalizingGeneration() = 0;


I do not understand what this means or why you need it. The finalizers run even after gen0 GC:

using System; using System.Runtime.CompilerServices; class Program { ~Program() { Console.WriteLine("Here"); } [MethodImpl(MethodImplOptions.NoInlining)] static void foo() { GC.KeepAlive(new Program()); } static void Main(string[] args) { GC.Collect(); foo(); GC.Collect(0); for (;;); } }

Hmm somehow I was under the other impression, looks like gen 0 is fine. I'll fix to specify 1 or max based on the other discussion. Doing only gen 0 GC probably would not cover enough cases.

jkotas · 2017-03-23T13:53:37Z

src/vm/threads.cpp

+    m_TriggerFinalizingGC = false;
+
+    IGCHeap *gcHeap = GCHeapUtilities::GetGCHeap();
+    gcHeap->GarbageCollect(gcHeap->GetMinFinalizingGeneration(), FALSE, collection_non_blocking);


If the waking threads have done just enough work to get them promoted to Gen2, this won't cleanup anything. Should this rather be full GC?

I had discussed this with @Maoni0 before, the concern was that the mostly idle apps that are running into this issue may have a large GC heap and this would cause those apps to do a full GC every period. Also, in all of the cases where I saw this issue manifest, the scenario was such that neither gen 1 nor gen 2 GCs were happening frequently enough. So doing a gen 1 GC should cover most of the cases that are running into this issue. On the other hand, as you said this would not cover thread objects that are promoted to gen 2. Thoughts on the tradeoff? Maybe it could trigger a gen 2 GC between a number of gen 1 GCs. Or if there is a way to see which generation an object is in, it could walk the dead threads' exposed objects and trigger a GC for the highest generation containing a dead thread (but that would include reachable dead threads too).

Or if there is a way to see which generation an object is in,

WhichGeneration

that would include reachable dead threads too

If you wanted to handle these better, you can have a flag on the thread that you will set once you have considered it once.

I would think that this needs to do Gen2 GCs at least sometimes if we do not ever want to hear about it again.

I thought you wanted this for the idling scenario where nothing is happening aside from the timer creating threads? If there's work going on, you'd have gen1 GCs (or sometimes gen2 GCs) happening anyway so you don't need to worry about this. So if you are triggering GCs, I'd think a gen1 GC would suffice 'cause all the previously created threads would be dead.

The work that's happening is not sufficient to automatically trigger a gen 2 GC (and usually not enough to trigger a gen 1 GC either), but in cases where a gen 1 GC is triggered automatically it's possible that threads that are still referenced get promoted to gen 2, but a gen 2 GC is not triggered frequently enough to clean up those threads, which can continue to increase the handle count as before.

Ok, I think this is different from the scenario I understood when we talked. The scenario I understood was the threads that you created before were all dead (except the current one) - nothing holding onto them - if something is holding onto them it means they are useful in some way and if that's the case, some work is in progress and GCs will probably be triggered anyway. But it sounds like this is not the case because you worry about a little bit of work is going on that's enough to trigger gen1's. I am hesitant to have it trigger gen2 GCs because then we might be back to square one where it's triggering too many gen2 GCs so we have to be careful about triggering gen2's. You could call WhichGeneration and see if the percentage of gen2 objects is high enough, then trigger a gen2 but I would experiment with this carefully.

As far as the observed scenarios are concerned, a gen 1 GC would suffice. Slightly different scenarios may trigger gen 1 GCs as a result of medium-level work, while referencing created threads for a short enough amount of time that they are still garbage but are alive long enough to get promoted to gen 2. I think it's beneficial to not leave any room for this being a continuing problem in the future. I plan on using WhichGeneration to trigger precisely the generation of GC that would benefit for this purpose, so this shouldn't cause excessive GCs in any scenario.

okie that sounds good

jkotas · 2017-03-23T13:54:17Z

src/vm/threads.h


 private:
    // For suspends:
+#ifndef FEATURE_CORECLR // CoreCLR does not support user-requested thread suspension


You can delete this. No need to add FEATURE_CORECLR ifdefs. All FEATURE_CORECLR ifdefs in VM were deleted some time ago.

swgillespie · 2017-03-23T16:54:56Z

src/gc/gcinterface.h

    SVAL_DECL(uint32_t, gcHeapType);
 #endif

+    static const uint32_t minFinalizingGeneration;


Ideally there won't be any static fields on IGCHeap - the two existing ones (gcHeapType and maxGeneration) will be getting removed soon, they were there because of DAC trickiness. Can we avoid putting this on the interface itself? (Provided it's necessary, per Jan's comment)

I'll remove this and will just specify a number relative to 0 or max

you can just use (max_generation - 1).

kouvel · 2017-03-24T11:47:04Z

Made all changes from above

kouvel · 2017-03-24T12:02:59Z

@dotnet-bot test this please

…dles and memory A thread that is started allocates some handles in Thread::AllocHandles(). After it terminates, the managed thread object needs to be collected by a GC for the handles to be released. Some applications that are mostly idle but have a timer ticking periodically will cause a thread pool thread to be created on each tick, since the thread pool preserves threads for a maximum of 20 seconds currently. Over time the number of handles accumulates to a high value. Thread creation adds some memory pressure, but that does not force a GC until a sufficiently large handle count, and for some mostly idle apps, that can be very long. The same issue arises with directly starting threads as well. Fixes #6602: - Track a dead thread count separately from the current dead thread count. This is the count that will contribute to triggering a GC, once it reaches a threshold. The count is tracked separately so that it may be reset to zero when a max-generation GC occurs, preventing dead threads that survive a GC due to references from contributing to triggering a GC again in this fashion. - If the count exceeds a threshold, enumerate dead threads to see which GC generation the corresponding managed thread objects are in. If the duration of time since the last GC of the desired generation also exceeds a threshold, trigger a preferably non-blocking GC on the finalizer thread. - Removed a couple of handles and some code paths specific to user-requested thread suspension, which is not supported on CoreCLR

kouvel · 2017-03-24T12:07:33Z

Updated commit description

jkotas · 2017-03-24T17:42:06Z

src/vm/threads.cpp

+    SIZE_T newDeadThreadGenerationCounts[3] = {0};
+    int countedMaxGeneration = _countof(newDeadThreadGenerationCounts) - 1;
+    {
+        GCX_COOP();


It looks dangerous to be switching to cooperating mode here after the thread is marked as dead. Can we just do the simple checks of the time and dead thread count thresholds here, kick the finalizer thread if they are hit, and do the more fine grained checks on the finalizer thread?

Done. Also fixed to clear the count even if a GC is not triggered after scanning threads due to not enough time having elapsed since the last GC at the same generation, so that it would not scan threads every time a thread becomes dead after the count threshold is reached and before the time threshold is reached.

jkotas · 2017-03-24T20:42:15Z

It may be nice to add a test that sets DeadThreadGCTriggerPeriodMilliseconds to very low number and then creates a ton of threads - to verify that there is nothing completely broken.

Maoni0 · 2017-03-24T21:56:54Z

src/vm/threads.cpp

+    IGCHeap *gcHeap = GCHeapUtilities::GetGCHeap();
+    _ASSERTE(gcHeap != nullptr);
+    SIZE_T generationCountThreshold = static_cast<SIZE_T>(s_DeadThreadCountThresholdForGCTrigger) / 2;
+    SIZE_T newDeadThreadGenerationCounts[3] = {0};


3 [](start = 41, length = 1)

this should really be max_generation + 1

Maoni0 · 2017-03-24T22:04:01Z

src/vm/threads.cpp

+                continue;
+            }
+
+            int exposedObjectGeneration = min(countedMaxGeneration, static_cast<int>(gcHeap->WhichGeneration(exposedObject)));


min [](start = 42, length = 3)

not sure why you needed to do this min - WhichGeneration is not going to return anything that's > max_generation

Maoni0 · 2017-03-24T22:22:50Z

src/vm/threads.cpp

+        }
+
+        // Make sure that enough time has elapsed since the last GC of the desired generation. Otherwise, the memory pressure
+        // would be sufficient to trigger GCs automatically.


not sure what to make of this comment - if not enough time has elapsed, we wouldn't want to trigger a GC anyway just because we don't want to trigger GCs too often.

Maoni0 · 2017-03-24T22:24:26Z

src/vm/threads.cpp

+            return;
+        }
+
+        // For threads whose exposed objects are in the generation of GC that will be triggered or in a lower GC generation,


threads whose exposed objects are in the generation of GC that will be triggered or in a lower GC generation, [](start = 14, length = 110)

When are dead threads removed from the thread list?

They're removed in the Thread destructor, which is typically called from the finalizer thread (CallFinalizerOnThreadObject, ..., Thread::DoExtraWorkForFinalizer => Thread::CleanupDetachedThreads => Thread::DecExternalCount)

kouvel · 2017-03-25T00:24:17Z

Made changes from above

dnfclas added the cla-already-signed label Mar 23, 2017

jkotas reviewed Mar 23, 2017

View reviewed changes

swgillespie reviewed Mar 23, 2017

View reviewed changes

Koundinya Veluri added 3 commits March 24, 2017 05:04

Address some feedback

8fca6ac

Trigger an appropriate generation of GC

81690dd

jkotas reviewed Mar 24, 2017

View reviewed changes

Move determination of GC generation to trigger to finalizer thread

dfcb32e

jkotas approved these changes Mar 24, 2017

View reviewed changes

Maoni0 reviewed Mar 24, 2017

View reviewed changes

Address feedback and add sanity test

a2d5058

kouvel closed this Mar 25, 2017

kouvel reopened this Mar 25, 2017

dnfclas added the cla-already-signed label Mar 25, 2017

kouvel merged commit 2401b6e into dotnet:master Mar 27, 2017

kouvel deleted the DeadThreads branch March 27, 2017 18:10

karelz modified the milestone: 2.0.0 Aug 28, 2017

Add heuristic to trigger GC to finalize dead threads and clean up han… #10413

Add heuristic to trigger GC to finalize dead threads and clean up han… #10413

Uh oh!

Conversation

kouvel commented Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kouvel commented Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

swgillespie Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kouvel commented Mar 24, 2017

Uh oh!

kouvel commented Mar 24, 2017

Uh oh!

kouvel commented Mar 24, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkotas commented Mar 24, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kouvel commented Mar 25, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

kouvel commented Mar 23, 2017 •

edited

Loading

kouvel commented Mar 23, 2017 •

edited

Loading

jkotas Mar 23, 2017 •

edited

Loading

swgillespie Mar 23, 2017 •

edited

Loading