
Possible finalizer thread stuck on System.RuntimeMethodHandle.Destroy #31782

Closed
vtortola opened this issue Sep 13, 2023 · 26 comments
Labels: area-perf, area-query, closed-fixed, customer-reported, Servicing-approved, type-enhancement

Comments

@vtortola

vtortola commented Sep 13, 2023

Description

When a single application instance reaches ~1700 RPS and ~85% CPU usage for a short period of time due to a traffic spike, around 25% of our servers experience what seems to be a blockage in the finalizer thread, and the application starts hoarding memory and increasing latency. Eventually we have to kill the server, when memory is at 10x what it normally uses and latency is not acceptable.

Exploring with dotnet-dump, we see that after the request spike the finalizer queue starts to accumulate objects. Taking multiple dumps shows that the "Ready for finalization" count keeps growing in each heap.
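For reference, the dumps were collected and inspected roughly like this (the PID and dump path are placeholders, not the exact commands used here):

# dotnet-dump collect --process-id <pid>
# dotnet-dump analyze <dump-file>
> finalizequeue -stat
> clrstack -a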

When exploring the threads, there is always a thread 0x001B that is stuck in this frame:

> clrstack -a                                                                                                                                                                                                                     
OS Thread Id: 0x1b (21)
        Child SP               IP Call Site
00007F0002EC9930 00007f00081947b2 [InlinedCallFrame: 00007f0002ec9930] 
00007F0002EC9930 00007eff88696957 [InlinedCallFrame: 00007f0002ec9930] 
00007F0002EC9920 00007EFF88696957 System.RuntimeMethodHandle.Destroy(System.RuntimeMethodHandleInternal)
    PARAMETERS:
        method = <no data>

00007F0002EC99C0 00007EFF9623E087 System.Reflection.Emit.DynamicResolver+DestroyScout.Finalize() [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicILGenerator.cs @ 669]
    PARAMETERS:
        this = <no data>

00007F0002EC9D30 00007f0007957cc6 [DebuggerU2MCatchHandlerFrame: 00007f0002ec9d30] 

If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.

Those servers keep showing a higher % of time in GC (see the attached "% time in GC" chart).

Configuration

Server is an Azure Standard_F16s_v2 (16 cores, 32 GiB RAM).

Docker image mcr.microsoft.com/dotnet/aspnet:7.0.11.

# dotnet --info                                                                                                                                                                               

Host:
  Version:      7.0.11
  Architecture: x64
  Commit:       ecb34f85ec

.NET SDKs installed:
  No SDKs were found.

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.11 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.11 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

# uname -a
Linux live-api-75fcf998d-hcwvz 5.15.0-1041-azure #48-Ubuntu SMP Tue Jun 20 20:34:08 UTC 2023 x86_64 GNU/Linux

Regression?

We are not completely sure, but we have the feeling it started happening when we moved to .NET 7 from .NET 6.
When we were on .NET 6 we did not see this kind of situation. After we moved to .NET 7 we started seeing some machines using an unexpectedly large amount of memory, starting from a traffic-spike situation.

Data

> syncblk                                                                                                                                                                                                                         
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
-----------------------------
Total           2661
Free            793

> finalizequeue -stat                                                                                                                                                                                                             
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------

Heap 0
generation 0 has 29 objects (7ef79c121920->7ef79c121a08)
generation 1 has 2,464 objects (7ef79c11cc20->7ef79c121920)
generation 2 has 0 objects (7ef79c11cc20->7ef79c11cc20)
Ready for finalization 560 objects (7ef79c123438->7ef79c1245b8)
------------------------------
Heap 1
generation 0 has 5 objects (7ef5f820ada8->7ef5f820add0)
generation 1 has 2,013 objects (7ef5f8206ec0->7ef5f820ada8)
generation 2 has 0 objects (7ef5f8206ec0->7ef5f8206ec0)
Ready for finalization 309 objects (7ef5f820b480->7ef5f820be28)
------------------------------
Heap 2
generation 0 has 7 objects (7ef5bc101488->7ef5bc1014c0)
generation 1 has 1,409 objects (7ef5bc0fe880->7ef5bc101488)
generation 2 has 0 objects (7ef5bc0fe880->7ef5bc0fe880)
Ready for finalization 321 objects (7ef5bc101ea0->7ef5bc1028a8)
------------------------------
Heap 3
generation 0 has 14 objects (7ef6142535b8->7ef614253628)
generation 1 has 1,295 objects (7ef614250d40->7ef6142535b8)
generation 2 has 0 objects (7ef614250d40->7ef614250d40)
Ready for finalization 321 objects (7ef6142540a0->7ef614254aa8)
------------------------------
Heap 4
generation 0 has 2 objects (7ef7a0bbfa88->7ef7a0bbfa98)
generation 1 has 729 objects (7ef7a0bbe3c0->7ef7a0bbfa88)
generation 2 has 0 objects (7ef7a0bbe3c0->7ef7a0bbe3c0)
Ready for finalization 322 objects (7ef7a0bc05c0->7ef7a0bc0fd0)
------------------------------
Heap 5
generation 0 has 10 objects (7eff84129200->7eff84129250)
generation 1 has 992 objects (7eff84127300->7eff84129200)
generation 2 has 0 objects (7eff84127300->7eff84127300)
Ready for finalization 388 objects (7eff84129bd0->7eff8412a7f0)
------------------------------
Heap 6
generation 0 has 5 objects (7ef694166578->7ef6941665a0)
generation 1 has 1,511 objects (7ef694163640->7ef694166578)
generation 2 has 0 objects (7ef694163640->7ef694163640)
Ready for finalization 339 objects (7ef694166fa8->7ef694167a40)
------------------------------
Heap 7
generation 0 has 2 objects (7ef65c27acc0->7ef65c27acd0)
generation 1 has 1,692 objects (7ef65c2777e0->7ef65c27acc0)
generation 2 has 0 objects (7ef65c2777e0->7ef65c2777e0)
Ready for finalization 316 objects (7ef65c27b690->7ef65c27c070)
------------------------------
Heap 8
generation 0 has 5 objects (7ef79c180c20->7ef79c180c48)
generation 1 has 594 objects (7ef79c17f990->7ef79c180c20)
generation 2 has 0 objects (7ef79c17f990->7ef79c17f990)
Ready for finalization 376 objects (7ef79c1818f0->7ef79c1824b0)
------------------------------
Heap 9
generation 0 has 7 objects (7ef770377290->7ef7703772c8)
generation 1 has 832 objects (7ef770375890->7ef770377290)
generation 2 has 0 objects (7ef770375890->7ef770375890)
Ready for finalization 388 objects (7ef770377bd0->7ef7703787f0)
------------------------------
Heap 10
generation 0 has 10 objects (7ef79c245ce0->7ef79c245d30)
generation 1 has 804 objects (7ef79c2443c0->7ef79c245ce0)
generation 2 has 0 objects (7ef79c2443c0->7ef79c2443c0)
Ready for finalization 390 objects (7ef79c246408->7ef79c247038)
------------------------------
Heap 11
generation 0 has 8 objects (7ef6881c80c8->7ef6881c8108)
generation 1 has 1,297 objects (7ef6881c5840->7ef6881c80c8)
generation 2 has 0 objects (7ef6881c5840->7ef6881c5840)
Ready for finalization 327 objects (7ef6881c8b78->7ef6881c95b0)
------------------------------
Heap 12
generation 0 has 10 objects (7ef79c2af180->7ef79c2af1d0)
generation 1 has 2,222 objects (7ef79c2aac10->7ef79c2af180)
generation 2 has 0 objects (7ef79c2aac10->7ef79c2aac10)
Ready for finalization 485 objects (7ef79c2afc38->7ef79c2b0b60)
------------------------------
Heap 13
generation 0 has 13 objects (7ef74815af60->7ef74815afc8)
generation 1 has 350 objects (7ef74815a470->7ef74815af60)
generation 2 has 0 objects (7ef74815a470->7ef74815a470)
Ready for finalization 329 objects (7ef74815b948->7ef74815c390)
------------------------------
Heap 14
generation 0 has 8 objects (7ef79c79e070->7ef79c79e0b0)
generation 1 has 322 objects (7ef79c79d660->7ef79c79e070)
generation 2 has 0 objects (7ef79c79d660->7ef79c79d660)
Ready for finalization 25,489 objects (7ef79c79e9b8->7ef79c7d0640)
------------------------------
Heap 15
generation 0 has 7 objects (7ef6802dadd0->7ef6802dae08)
generation 1 has 508 objects (7ef6802d9df0->7ef6802dadd0)
generation 2 has 0 objects (7ef6802d9df0->7ef6802d9df0)
Ready for finalization 404,766 objects (7ef6802db588->7ef6805f1e78)
------------------------------
Statistics for all finalizable objects (including all objects ready for finalization):
Statistics:
          MT   Count  TotalSize Class Name

[...]

7eff8de90098     538     17,216 Microsoft.Win32.SafeHandles.SafeEvpMdCtxHandle
7eff917fa6f8      70     18,480 Microsoft.Data.SqlClient.SNI.SNISslStream
7eff90117f28     355     19,880 System.Runtime.CompilerServices.ConditionalWeakTable<System.Text.Json.JsonSerializerOptions, System.Object>+Container
7eff9433c1f0     361     20,216 System.IO.Compression.Deflater
7eff8c2c7b88     316     20,224 System.Net.Sockets.SafeSocketHandle
7eff8d8ce2c8     663     21,216 Microsoft.Win32.SafeHandles.SafeX509Handle
7eff8d830920   1,255     30,120 System.Threading.TimerHolder
7eff9433efe8     394     31,520 System.IO.Compression.ZLibNative+ZLibStreamHandle
7eff8c2c5228     316     37,920 System.Net.Sockets.Socket
7eff927f5370     180     44,640 Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver
7eff8d859958     223     62,440 System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs
7eff9146acb8     389     80,912 Microsoft.Data.SqlClient.SqlConnection
7eff9149f348     980    290,080 Microsoft.Data.SqlClient.SqlCommand
7eff89253d70   4,192    301,824 System.Threading.Thread
7eff90b81468 247,708  5,944,992 System.Reflection.Emit.DynamicResolver+DestroyScout
7eff8ad24450 199,413 14,357,736 System.Reflection.Emit.DynamicResolver
Total 460,012 objects, 21,441,768 bytes
> dumpheap -stat

[...]

7eff8acfc688   207,010   9,936,480 System.Reflection.Emit.DynamicMethod+RTDynamicMethod
7eff8ad200a8   207,010   9,936,480 System.Reflection.Emit.ScopeTree
7eff8acfcf68   208,325   9,999,600 System.Reflection.Emit.SignatureHelper
7eff8919ec38   605,088  14,522,112 System.RuntimeTypeHandle
7eff8ad24450   207,001  14,904,072 System.Reflection.Emit.DynamicResolver
7eff89817a20   212,441  20,394,336 System.RuntimeMethodInfoStub
7eff8acfbda0   207,010  23,185,120 System.Reflection.Emit.DynamicMethod
7eff8912a508   358,123  25,959,536 System.Object[]
7eff8acfb6e8   207,010  31,465,520 System.Reflection.Emit.DynamicILGenerator
7eff89251038 1,241,746 120,095,736 System.String
7eff89773338 1,070,259 136,469,834 System.Byte[]
7eff892b7aa8    90,512 260,066,802 System.Char[]
55fb5eb4e900    55,769 364,019,016 Free
Total 10,141,843 objects, 1,321,079,065 bytes
[System.Runtime]
    % Time in GC since last GC (%)                                         5    
    Allocation Rate (B / 1 sec)                                       6.2155e+08   
    CPU Usage (%)                                                         57.81 
    Exception Count (Count / 1 sec)                                       94    
    GC Committed Bytes (MB)                                            1,675.366
    GC Fragmentation (%)                                                  44.356
    GC Heap Size (MB)                                                    637.815
    Gen 0 GC Count (Count / 1 sec)                                         2    
    Gen 0 Size (B)                                                 4,236,872    
    Gen 1 GC Count (Count / 1 sec)                                         1    
    Gen 1 Size (B)                                                11,126,616    
    Gen 2 GC Count (Count / 1 sec)                                         0    
    Gen 2 Size (B)                                                    5.2245e+08   
    IL Bytes Jitted (B)                                               3.1923e+08   
    LOH Size (B)                                                      5.7157e+08   
    Monitor Lock Contention Count (Count / 1 sec)                      1,212    
    Number of Active Timers                                               46    
    Number of Assemblies Loaded                                          252    
    Number of Methods Jitted                                       9,647,443    
    POH (Pinned Object Heap) Size (B)                              4,465,760    
    ThreadPool Completed Work Item Count (Count / 1 sec)              23,982    
@jkotas
Member

jkotas commented Sep 13, 2023

If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.

This fragment does not show the handle of the method being finalized, so it is not proof that the finalizer is stuck finalizing the same dynamic method. What is the full stack trace from the k command? Can you find the handle of the method being finalized from it?

The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough. Try to find out what is producing these dynamic methods. Dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening.

The problem can be magnified by enabling tracing. Tracing makes the cleanup of DynamicMethods more expensive, so it is more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.
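For illustration, a minimal sketch of the caching pattern (the cache key, delegate shape and class name are placeholders, not code from this application):

using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

static class CompiledDelegateCache
{
    // Compile each distinct expression once and reuse the delegate.
    // Calling Compile() per request creates a new DynamicMethod every time,
    // and each of those eventually has to be cleaned up by the finalizer thread.
    private static readonly ConcurrentDictionary<string, Func<object, object>> Cache = new();

    public static Func<object, object> GetOrCompile(string key, Expression<Func<object, object>> expression)
        => Cache.GetOrAdd(key, _ => expression.Compile());
}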

@ghost

ghost commented Sep 13, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


@vtortola
Author

vtortola commented Sep 13, 2023

What is the full stacktrace from k command? Can you find the handle of the method being finalized from it?

I am using dotnet-dump to analyze the memory dump, can that be found out with it?

The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough.

Note this happens due to a traffic spike that lasts just a few minutes. After that, most servers' memory levels go back to normal when traffic volume returns to normal; it is just a few of them where memory keeps growing and latency keeps getting worse, despite receiving the same traffic as the rest of the servers.

Try to find out what is producing these dynamic methods. The dynamic methods are typically meant to be cached, and it is not unusual to a have a bug where it is not happening.

Tracking those references using gcroot, they seem to come mostly from Entity Framework and System.Text.Json. We do not do much reflection ourselves.
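(Roughly the kind of commands involved, using the DynamicResolver method table from the finalizequeue output above; the object address is a placeholder:)

> dumpheap -mt 7eff8ad24450
> gcroot <address of one of the listed objects>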

The problem can be magnified by enable tracing. Tracing makes the cleanup of DynamicMethods more expensive and so it makes it more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.

We do not have tracing enabled.

@jkotas
Member

jkotas commented Sep 13, 2023

I am using dotnet-dump to analyze the memory dump, can that be found out with it?

dotnet-dump is not able to print native stack traces. The dump has to be opened in a native debugger for that. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps and https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnetcore/practice-troubleshoot-linux/lab-1-2-analyze-core-dumps-lldb-debugger have instructions for how to do that.
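A typical sequence on Linux (illustrative; it assumes the SOS plugin is installed via dotnet-sos, and the dump and host paths need adjusting):

# dotnet tool install -g dotnet-sos
# dotnet-sos install
# lldb --core <dump-file> /usr/share/dotnet/dotnet
(lldb) clrthreads
(lldb) setthread <index of the finalizer thread>
(lldb) thread backtrace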

it seems to be mostly Entity Framework and System.Text.Json. We do not do much reflection by ourselves.

The way you are calling into Entity Framework and System.Text.Json may be creating too many dynamic methods, which leads to the problem. For example, are you reusing JsonSerializerOptions instances (https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/configure-options#reuse-jsonserializeroptions-instances)?
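A minimal sketch of that reuse pattern (illustrative; the class name is a placeholder):

using System.Text.Json;

public static class Json
{
    // Created once and reused across requests. Constructing a new JsonSerializerOptions
    // per call forces System.Text.Json to rebuild its per-options metadata/converter caches.
    public static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);
}

// usage: var payload = JsonSerializer.Serialize(value, Json.Options);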

Ideally, you should not see any dynamic methods created and destroyed per request in steady state.

We do not have tracing enabled.

The monitoring software may be enabling tracing for you. How are you producing the graph with GC utilization you shared above? It is most likely done via tracing.

@ghost

ghost commented Sep 14, 2023

This issue has been marked needs-author-action and may be missing some important information.

@vtortola
Author

vtortola commented Sep 15, 2023

Thank you. I will try getting that information from the dump.

Yes, we cache the JsonSerializerOptions.

We get that GC metric from Event Counters.
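For reference, counters such as the [System.Runtime] block above are typically read with dotnet-counters, which itself uses an EventPipe (tracing) session; the PID is a placeholder:

# dotnet-counters monitor --process-id <pid> --counters System.Runtime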

@vtortola
Author

vtortola commented Sep 17, 2023

I have LLDB installed now. I can see the backtrace of the thread in question:

(lldb) setthread 22

(lldb) clrstack
OS Thread Id: 0x1b (22)
        Child SP               IP Call Site
00007F71C8784930 00007f71cda917b2 [InlinedCallFrame: 00007f71c8784930] 
00007F71C8784930 00007f714dfa6957 [InlinedCallFrame: 00007f71c8784930] 
00007F71C8784920 00007F714DFA6957 System.RuntimeMethodHandle.Destroy(System.RuntimeMethodHandleInternal)
00007F71C87849C0 00007F715748F237 System.Reflection.Emit.DynamicResolver+DestroyScout.Finalize() [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicILGenerator.cs @ 669]
00007F71C8784D30 00007f71cd254af6 [DebuggerU2MCatchHandlerFrame: 00007f71c8784d30] 

(lldb) thread backtrace
* thread #22, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38cf3f libcoreclr.so`EEJitManager::Unload(this=0x00007f2ec80d9340, pAllocator=0x00007f71cd473eb8) at codeman.cpp:3600:21
    frame #9: 0x00007f71ccf9e22b libcoreclr.so`LCGMethodResolver::Destroy(this=0x00007f7168112010) at dynamicmethod.cpp:1022:31
    frame #10: 0x00007f71ccf9dff7 libcoreclr.so`DynamicMethodDesc::Destroy(this=0x00007f71680ff738) at dynamicmethod.cpp:893:29
    frame #11: 0x00007f71cd154206 libcoreclr.so`RuntimeMethodHandle::GetMethodBody(pMethodUNSAFE=<unavailable>, pDeclaringTypeUNSAFE=<unavailable>) at runtimehandles.cpp:2370:9
    frame #12: 0x00007f714dfa6964
    frame #13: 0x00007f715748f237
    frame #14: 0x00007f71cd254af6 libcoreclr.so`VarargPInvokeGenILStub at unixasmmacrosamd64.inc:871
    frame #15: 0x00007f71cd00163c libcoreclr.so`MethodTable::CallFinalizer(Object*) at methodtable.cpp:4050:5
    frame #16: 0x00007f71cd0015dc libcoreclr.so`MethodTable::CallFinalizer(obj=0x00007f715748f200) at methodtable.cpp:4168
    frame #17: 0x00007f71cd0c9f32 libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetBit(this=<unavailable>) at gcrefmap.h:170:30
    frame #18: 0x00007f71cd0c9f1e libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) [inlined] GCRefMapDecoder::GetInt() at gcrefmap.h:190
    frame #19: 0x00007f71cd0c9ecb libcoreclr.so`ScanTailCallArgBufferRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) at gcrefmap.h:231
    frame #20: 0x00007f71cd0c9e74 libcoreclr.so`ScanTailCallArgBufferRoots(pThread=<unavailable>, fn=(libcoreclr.so`g_szFailFastBuffer + 440), sc=0x00007f71cd475a10)(Object**, ScanContext*, unsigned int), ScanContext*) at gcenv.ee.cpp:210
    frame #21: 0x00007f71cd0ca155 libcoreclr.so`GCToEEInterface::SyncBlockCacheDemote(max_gen=-850937008) at gcenv.ee.cpp:371:42
    frame #22: 0x00007f71cd04c39a libcoreclr.so at threads.cpp:0
    frame #23: 0x00007f71cd04ca3d libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(PAL_SEHException&) [inlined] ManagedThreadBase_DispatchOuter(this=0x0000000000000001, ex=0x00007f71cd47b960)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)::operator()(PAL_SEHException&) const at threads.cpp:7502:9
    frame #24: 0x00007f71cd04c9f2 libcoreclr.so`NativeExceptionHolder<ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(PAL_SEHException&)>::InvokeFilter(this=<unavailable>, ex=0x00007f71cd47b960) at pal.h:4902
    frame #25: 0x00007f71cd0ca3f8 libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at profilepriv.h:193:37
    frame #26: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] void ProfControlBlock::IterateProfilers<void (*)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*, int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*), int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*, int*) at profilepriv.h:207
    frame #27: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int ProfControlBlock::DoProfilerCallback<int (*)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*>(this=0x00007f71c8784e00, callbackType=ActiveOrInitializing)(ProfilerInfo*), int (*)(EEToProfInterfaceImpl*, int*), int*) at profilepriv.h:295
    frame #28: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] int AnyProfilerPassesCondition<int (*)(ProfilerInfo*)>(int (*)(ProfilerInfo*)) at profilepriv.inl:101
    frame #29: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] ProfControlBlock::IsCallback5Supported(this=0x00007f71c8784e00) at profilepriv.inl:263
    frame #30: 0x00007f71cd0ca3eb libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() [inlined] CORProfilerTrackConditionalWeakTableElements() at profilepriv.inl:1980
    frame #31: 0x00007f71cd0ca3cc libcoreclr.so`ProfilerShouldTrackConditionalWeakTableElements() at gcenv.ee.cpp:548
    frame #32: 0x00007f71cd3e1cee libcoreclr.so`::ExitThread(dwExitCode=<unavailable>) at thread.cpp:826:5
    frame #33: 0x00007f71cda8aea7 libpthread.so.0`start_thread + 215
    frame #34: 0x00007f71cd679a2f libc.so.6`__clone + 63

Unfortunately I only have this dump. I will force the situation again in the next few days and take two of them.

From the CLR thread stacks, I see 108 threads whose top stack frames are:

System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
00007F3A4EFFA620 00007F7D2FC45A31 System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)

When comparing with a dump from a healthy instance of the application, only one thread was using CreateDelegate.
If that CompileMethod is being slow for some reason, that may explain why the whole application becomes slow.

Checking some of those threads' backtraces, I see they are also blocked on pthread_cond_wait:

(lldb) thread backtrace
* thread #76, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #77, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38dcc3 libcoreclr.so`EEJitManager::LazyGetFunctionEntry(this=<unavailable>, pCodeInfo=<unavailable>) at codeman.cpp:4279:82
(lldb) thread backtrace
* thread #78, stop reason = signal 0
  * frame #0: 0x00007f71cda917b2 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 482
    frame #1: 0x00007f71cd3d0481 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] void VolatileStore<PalCsInitState>(pt=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:271:23
    frame #2: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::Store(this=0x00005617b1779048) at volatile.h:393
    frame #3: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] Volatile<PalCsInitState>::operator=(this=0x00005617b1779048, val=PalCsFullyInitialized) at volatile.h:434
    frame #4: 0x00007f71cd3d047b libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) at cs.cpp:1117
    frame #5: 0x00007f71cd3d0436 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) [inlined] CorUnix::PALCS_WaitOnCS(pPalCriticalSection=0x00005617b1779028, lInc=<unavailable>) at cs.cpp:1159
    frame #6: 0x00007f71cd3d0380 libcoreclr.so`CorUnix::InternalEnterCriticalSection(pThread=<unavailable>, pCriticalSection=0x00005617b1779028) at cs.cpp:802
    frame #7: 0x00007f71ccf961c4 libcoreclr.so`CrstBase::Enter(this=0x00005617b1779028) at crst.cpp:322:5
    frame #8: 0x00007f71cd38c70c libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::GetClassification(this=0x00007f715d8e7f90) const at method.hpp:1726:17
    frame #9: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(_hpCodeHdr*, unsigned int, unsigned long*) [inlined] MethodDesc::IsLCGMethod(this=0x00007f715d8e7f90) at method.inl:99
    frame #10: 0x00007f71cd38c70b libcoreclr.so`EEJitManager::allocEHInfoRaw(this=<unavailable>, pCodeHeader=<unavailable>, blockSize=2977402920, pAllocationSize=0x00007f2f967f6518) at codeman.cpp:3265
[...]

@jkotas
Member

jkotas commented Sep 18, 2023

I see 108 threads whose top stack frames are:

System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)

This is the problem. EF Core is generating a dynamic method per request, which is overwhelming the system.

It should be fixed in the current EF version by #29815. cc @roji

@jkotas jkotas transferred this issue from dotnet/runtime Sep 18, 2023
@vtortola
Author

vtortola commented Sep 18, 2023

Thank you @jkotas

For the record, we use Entity Framework 7.0.10 on .NET 7. It was also happening with Entity Framework 6.0.6 on .NET 7.

Our application usually reads a single record from the database using FindAsync, and loads related entities explicitly depending on the usage.

We load related entities both with Where and without:

  • await Entry(entity).Collection(u => u.Related).Query().LoadAsync(cancel);
  • await Entry(entity).Collection(u => u.Related).Query().Where(x => x.Member == "member").LoadAsync(cancel);

Although I can also see stack traces using CreateDelegate when calling FindAsync:

00007F3B1EFF9408 00007f7da80147b2 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9408 00007f7d2fc3b8c3 [InlinedCallFrame: 00007f3b1eff9408] System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
00007F3B1EFF9400 00007F7D2FC3B8C3 System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object) [/_/src/coreclr/System.Private.CoreLib/src/System/Reflection/Emit/DynamicMethod.cs @ 370]
00007F3B1EFF94A0 00007F7D2FC45B8B System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/LambdaCompiler.cs @ 190]
00007F3B1EFF94D0 00007F7D2FC45A31 System.Linq.Expressions.Expression`1[[System.__Canon, System.Private.CoreLib]].Compile() [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/LambdaExpression.cs @ 221]
00007F3B1EFF94F0 00007F7D3688F156 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.GetValue(System.Linq.Expressions.Expression, System.String ByRef)
00007F3B1EFF9560 00007F7D36899ED5 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Evaluate(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 288]
00007F3B1EFF95A0 00007F7D35678B83 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 105]
00007F3B1EFF95D0 00007F7D2FC4AFE6 System.Linq.Expressions.ExpressionVisitor.VisitBinary(System.Linq.Expressions.BinaryExpression)
00007F3B1EFF9610 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF9640 00007F7D35679673 System.Linq.Expressions.ExpressionVisitor.VisitLambda[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression`1<System.__Canon>) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 346]
00007F3B1EFF9670 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96A0 00007F7D3566CC99 System.Linq.Expressions.ExpressionVisitor.VisitUnary(System.Linq.Expressions.UnaryExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 540]
00007F3B1EFF96C0 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF96F0 00007F7D319DFF4E System.Dynamic.Utils.ExpressionVisitorUtils.VisitArguments(System.Linq.Expressions.ExpressionVisitor, System.Linq.Expressions.IArgumentProvider) [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/Utils/ExpressionVisitorUtils.cs @ 66]
00007F3B1EFF9740 00007F7D319E1303 System.Linq.Expressions.ExpressionVisitor.VisitMethodCall(System.Linq.Expressions.MethodCallExpression) [/_/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionVisitor.cs @ 406]
00007F3B1EFF9780 00007F7D35678B9D Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.Visit(System.Linq.Expressions.Expression) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 108]
00007F3B1EFF97B0 00007F7D368AB1C6 Microsoft.EntityFrameworkCore.Query.Internal.ParameterExtractingExpressionVisitor.ExtractParameters(System.Linq.Expressions.Expression, Boolean) [/_/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs @ 75]
00007F3B1EFF9800 00007F7D368AAE98 Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync[[System.__Canon, System.Private.CoreLib]](System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Query/Internal/QueryCompiler.cs @ 109]
00007F3B1EFF9860 00007F7D369381CA Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ExecuteAsync[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Reflection.MethodInfo, System.Linq.IQueryable`1<System.__Canon>, System.Linq.Expressions.Expression, System.Threading.CancellationToken) [/_/src/EFCore/Extensions/EntityFrameworkQueryableExtensions.cs @ 3081]
00007F3B1EFF98E0 00007F7D37A49E21 Microsoft.EntityFrameworkCore.Internal.EntityFinder`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[], System.Threading.CancellationToken) [/_/src/EFCore/Internal/EntityFinder.cs @ 88]
00007F3B1EFF9960 00007F7D37A43D30 Microsoft.EntityFrameworkCore.Internal.InternalDbSet`1[[System.__Canon, System.Private.CoreLib]].FindAsync(System.Object[]) [/_/src/EFCore/Internal/InternalDbSet.cs @ 170]
[...]

The application where these stack frames were taken had been running for days without problems, until a sudden traffic spike made the CPU hit 85%; some servers, including this one, then started to have memory issues and sluggishness.
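One way to keep this particular expression compilation off the per-request path while waiting for a fix is an EF compiled query; a sketch only, where the context, entity and key names are placeholders rather than code from this application:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class MyEntity { public int Id { get; set; } }                 // placeholder entity
public class MyDbContext : DbContext
{
    public DbSet<MyEntity> Entities => Set<MyEntity>();               // placeholder context
}

public static class CompiledQueries
{
    // The query is compiled once and the delegate reused, instead of EF
    // re-processing (and partially compiling) the expression tree on every call.
    private static readonly Func<MyDbContext, int, Task<MyEntity>> GetById =
        EF.CompileAsyncQuery((MyDbContext db, int id) =>
            db.Entities.FirstOrDefault(e => e.Id == id));

    public static Task<MyEntity> FindByIdAsync(MyDbContext db, int id) => GetById(db, id);
}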

@roji
Member

roji commented Sep 18, 2023

@jkotas thanks for digging into this... We should backport that specific fix to 7.0 and 6.0.

@vtortola
Author

@roji in which version is this fixed?

@roji
Member

roji commented Sep 18, 2023

@vtortola 8.0.0-rc.1 no longer has this problem - can you please give that a try and confirm that it resolves the problem for you? Note that 8.0 (non-preview/rc) will be out in November.

@vtortola
Author

@roji I am sorry, but I cannot deploy our application on .NET 8 in our production system, which is where the problem happens. I am afraid I will need to wait until November. If you backport it to EF7 on .NET 7, let me know and I can give it a try.

Thanks.

@roji
Member

roji commented Sep 18, 2023

@vtortola thanks. I'll submit a PR for patching 6 and 7; keep an eye on this issue to stay updated on progress.

@vtortola
Author

@jkotas although it may be true that Entity Framework calls System.Reflection.Emit.DynamicMethod.CreateDelegate too much and @roji's fix will help, it still does not explain why the application does not recover from the traffic spike even when traffic goes back to normal levels, and why it gets worse as time goes by. It is as if whatever System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod (called from request processing) has in common with System.RuntimeMethodHandle.Destroy (called by the finalizer thread) somehow deteriorates and becomes more and more sluggish.

@jkotas
Member

jkotas commented Sep 19, 2023

Yes, I agree that we do not have a full explanation of the behavior. System.RuntimeMethodHandle.Destroy can be slower when there are a lot of active DynamicMethods.

@roji
Member

roji commented Sep 20, 2023

FYI #31784 has been approved for patching for 6.0 and 7.0.

@vtortola
Author

Any idea in which 7.0 version it will be released?

@ajcvickers
Member

Looks like it should be in 7.0.12.

@ajcvickers ajcvickers added this to the 6.0.23 milestone Sep 20, 2023
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Sep 20, 2023
@stevendarby
Contributor

Looks like it should be in 7.0.12.

Is this still the case? I could only find the 6.0 PR.

@ajcvickers
Member

@stevendarby It will get merged from 6 into 7.

@vtortola
Author

vtortola commented Oct 10, 2023

@ajcvickers @roji I see 7.0.12 is out, can you please confirm whether this fix is in it? Thanks!

@ajcvickers
Member

@vtortola All non-security fixes got pulled from 7.0.12, so it will be in 7.0.13 instead.

@vtortola
Author

Alright, we will wait for 7.0.13, many thanks!

@ajcvickers ajcvickers modified the milestones: 6.0.23, 6.0.24 Oct 17, 2023
@vtortola
Author

vtortola commented Nov 1, 2023

@ajcvickers @roji hi again! I see 7.0.13 is out, can you please confirm whether this fix is in it? Thanks!

@vtortola
Author

vtortola commented Nov 2, 2023

I can see the commit 28c0abe in v7.0.13...main 🎊 I will let you know how it goes
