Garbage Collection Thread is blocked waiting for another thread for 10 seconds or more. #44698
It's Dynatrace. I have not tried disabling it, as that involves a process restart, and in prod there is a whole RFC process, sadly :) By looking at the stack trace, can we say explicitly that Dynatrace is at fault?
Interestingly, this issue only happened when the file share was having latency issues. One API stores & reads data from it.
Some Dynatrace .NET Agent insights:
Here are some things you can try on the Dynatrace side:
If this does not help and it still looks like the Dynatrace thread is the culprit, please feel free to open a support ticket!
If there is a consistent repro, yeah, the next step would be to ensure it reproduces with Dynatrace disabled or with the updated config as suggested above.
First of all, thanks for taking a look. We are in the process of doing diagnostics on the Dynatrace end. Given that I am a newbie to WinDbg, I got the following message, and I do not fully understand what it means.
The .NET CPU method hotspots feature, in the variation used, needs to suspend threads to do a stack walk. It might be the case that the thread is not resumed any more, for whatever reason. This is then handled by some watchdog thread.
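The suspend/walk/resume cycle described above can be sketched conceptually. This is an assumption about how such a sampler behaves, not Dynatrace code: the sampler suspends a thread, walks its stack, and resumes it; a watchdog resumes any thread left suspended too long (for example, because the walk failed partway through).

```python
# Conceptual sketch (assumed sampler behavior, not Dynatrace code):
# a profiler suspends a thread for a stack walk; a watchdog rescues
# threads that were never resumed.
import time

class SampledThread:
    def __init__(self, name):
        self.name = name
        self.suspended_since = None  # None means running

    def suspend(self):
        self.suspended_since = time.monotonic()

    def resume(self):
        self.suspended_since = None

def sample(thread, walk_ok):
    """Suspend, walk the stack, resume. On failure, forget to resume."""
    thread.suspend()
    if walk_ok:
        stack = ["frameA", "frameB"]  # stand-in for a real stack walk
        thread.resume()
        return stack
    return None  # sampler failed mid-walk; thread stays suspended

def watchdog(threads, max_suspend_seconds):
    """Resume any thread suspended longer than the allowed limit."""
    rescued = []
    now = time.monotonic()
    for t in threads:
        if t.suspended_since is not None and now - t.suspended_since > max_suspend_seconds:
            t.resume()
            rescued.append(t.name)
    return rescued

t41 = SampledThread("thread-41")
sample(t41, walk_ok=False)  # leaves the thread suspended
time.sleep(0.05)
rescued = watchdog([t41], max_suspend_seconds=0.01)
print("watchdog resumed:", rescued)
```

If the watchdog interval is long (or the watchdog itself is stalled), everything that needs the suspended thread to make progress, including GC suspension, waits in the meantime.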
I was able to grab details of the critical section from the dumps. It says "Uninitialized or deleted." Both memory dumps show the same message when I do `!cs <DebugInfo address>`. Also, what does this line in the stack trace mean?
Critical Section Info
Please show the stack for the GC thread so we know what state the GC is in.
Finalizer Thread
State of Finalizer Thread
CLR Stack of Finalizer Thread
Full detail of threads - stack traces of all GC threads are towards the end. Sorry, it's very long.
Yeah, as I mentioned on Twitter, GC hasn't started yet; it's just waiting for SuspendEE to return, because that one thread is not getting suspended. Interop folks might want to take a look - CC @AaronRobinsonMSFT. Aaron, could you please take a look at the first call stack on this issue? My guess is that clr!JIT_InitPInvokeFrame is perhaps trying to suspend the thread because it observes suspension in progress, and isn't successfully suspending it?
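The "waiting for SuspendEE" state can be sketched as a rendezvous: before collecting, the GC thread asks every managed thread to park at a safe point and waits for all of them to acknowledge. A single thread that never reaches a safe point (here, because an external profiler left it suspended mid-transition) stalls the whole collection. This is a minimal simulation of that protocol, not CLR code:

```python
# Minimal sketch (not CLR code) of why SuspendEE can stall: the GC
# thread must rendezvous with every managed thread, and one thread
# that never acknowledges the suspension request blocks the GC.
import threading
import time

suspend_requested = threading.Event()

class ManagedThread(threading.Thread):
    def __init__(self, name, reaches_safe_point):
        super().__init__(name=name, daemon=True)
        self.reaches_safe_point = reaches_safe_point
        self.acknowledged = threading.Event()

    def run(self):
        while not suspend_requested.is_set():
            time.sleep(0.01)  # managed work, polling at safe points
        if self.reaches_safe_point:
            self.acknowledged.set()  # parked at the safe point
        # else: stuck in a native transition, never acknowledges

def suspend_ee(threads, timeout):
    """GC-side rendezvous: wait for every thread to acknowledge."""
    suspend_requested.set()
    deadline = time.monotonic() + timeout
    stuck = []
    for t in threads:
        remaining = max(0.0, deadline - time.monotonic())
        if not t.acknowledged.wait(remaining):
            stuck.append(t.name)
    return stuck

threads = [ManagedThread("worker-1", True),
           ManagedThread("thread-41", False),  # the unsuspendable thread
           ManagedThread("worker-2", True)]
for t in threads:
    t.start()
stuck = suspend_ee(threads, timeout=0.5)
print("GC blocked waiting on:", stuck)
```

In the real runtime there is no timeout here, which is exactly why the dump shows the GC thread blocked for 10+ seconds: it simply keeps waiting for the one thread that never acknowledges.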
Thanks again for taking the time to look at it. In case this is helpful, I did !ThreadState on all GC threads - much slimmer output. Oh, it indicates a stack overflow; I do not know what to make of that.
@Maoni0 Yes, I believe that is the case. The managed function …
That would be my first avenue for investigation.
@AaronRobinsonMSFT thanks. Is there a non-intrusive way to collect a trace using PerfView in prod for a classic GC-starvation issue, specifically this one?
@marafiq Not that I am aware of. Let me break this down a bit more. There are two P/Invoke mechanisms that are considered an implementation detail from the CLR's perspective. The first is a … The key distinction between the … I suppose it would be possible for the sampler to create a good/bad list of methods that indicate when sampling is to be avoided, or at least temporarily suspended. This could be done by inspecting the metadata of all methods in …
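The "good/bad list" idea floated above can be sketched as follows. This is a hedged illustration of the mitigation, not an existing sampler feature: before walking a suspended thread's stack, the sampler checks whether the thread's instruction pointer falls inside a do-not-sample method (such as the CLR's P/Invoke frame-setup helper) and, if so, resumes the thread immediately without sampling. The method name is from the stack trace in this issue; the address ranges are entirely made up for the example.

```python
# Hypothetical "avoid list" for a sampling profiler: skip samples whose
# instruction pointer lands inside known-unsafe runtime helpers.
# Address ranges below are illustrative, not real CLR addresses.
AVOID = {
    "clr!JIT_InitPInvokeFrame": (0x7FF800001000, 0x7FF8000010A0),
}

def safe_to_sample(ip):
    """True if the instruction pointer is outside every avoided method."""
    return not any(lo <= ip < hi for lo, hi in AVOID.values())

def try_sample(ip):
    """Walk the stack only when it is safe; otherwise skip and resume."""
    if not safe_to_sample(ip):
        return None  # resume the thread immediately, drop this sample
    return ["walk", "the", "stack"]  # stand-in for a real stack walk

print(try_sample(0x7FF800001040))  # inside JIT_InitPInvokeFrame
print(try_sample(0x7FF800002000))  # outside any avoided method
```

The trade-off is a small sampling blind spot around those helpers, in exchange for never parking a thread in a state the runtime cannot tolerate.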
Thanks, that makes a lot of sense now.
@marafiq I am going to close this issue since it is "by design" from the CLR perspective. Please let us know if we can help in any way to work around this issue.
Description
Performance issues were observed in a web application when one of the API endpoints started responding slowly due to a slow file-share dependency. But this triggered unexpected behavior we had not seen before: 503s, timeout (Redis) exceptions, and intermittent hangs - primarily because a GC thread is blocked by another thread.
Configuration
Analysis
To do further analysis, I took two memory dumps (10 seconds apart) on one of the hosts for the affected process. The command below was used.
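For reference, a typical way to capture two full dumps 10 seconds apart is with Sysinternals ProcDump. This is a hypothetical invocation - the author's actual command is not shown above, and `w3wp.exe` is an assumed process name for the IIS worker:

```shell
# Hypothetical example (not necessarily the command used here):
# -ma  full memory dump
# -s   10 seconds between consecutive dumps
# -n   write 2 dumps
procdump -ma -s 10 -n 2 w3wp.exe
```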
DebugDiag Analysis Report
Stack Trace of Thread 41 - which happens to be doing the work for the APM agent.
Complete CLR STACK of Thread 41 - via WinDbg
I cannot make much sense of it - does this tell us that it blocked the GC thread? But why/how?
thirdparty-APM is the DLL of the third-party APM platform.
WinDbg Analysis
I am a newbie to WinDbg, but I was able to run some basic diagnostics to rule things out. There are a lot of interesting details that I do not fully understand.
Note: Below are details from one of the memory dumps; the other memory dump also reports very similar stats.
Thread Pool Details
GC Handles
DotMemory finalizable objects - which seem to match the GC handles. Is that because the GC thread is blocked?
!Threads
I tried to see what the stack trace of Thread 41 can tell me, but I cannot make much sense of it.
!dumpheap -live -thinlock
Note: All thin-lock objects have the same size, 24, which indicates a Free object - and according to Maoni Stephens, "You can not lock a free object."
I am using WinDbg Preview - Debugger engine version: 10.0.20153.1000 Debugger client version: 1.0.2007.06001
0:041> !SyncBlk
Note: This also produces interesting output. I believe these entries are coming from bundling, but I am failing to conclude whether this is the root cause. Help!
I would appreciate any help in finding the root cause. Kindly let me know if you need more information, or what further analysis can be done on these memory dumps. Since the latency, 503, and Redis timeout errors were observed only when the file share was responding slowly, I have not collected further traces in production.