dotnet core 3.1 __pv_queued_spin_lock_slowpath #34678
This is my flamegraph:
```
[root@VM_16_46_centos ~]# dotnet --info
Runtime Environment:
Host (useful for support):
.NET Core SDKs installed:
.NET Core runtimes installed:
To install additional .NET Core runtimes or SDKs:
```
@egmkang Can you switch the view of the profiler samples to show inclusive times? That might help to highlight the actual functions in the runtime that are related to the problematic calls to the mutex functions.
@janvorli I'm sorry, English is not my native language. Can you tell me how to switch the view of the profiler samples to show inclusive times?
@egmkang I am sorry for the delayed response. I am actually not that familiar with the perf tool details, but I assume you've produced the second image in this issue using the `perf report` command, correct? From the docs it seems that from perf tool version 3.16 on, the inclusive ("children") view is the default unless you've passed `--no-children`.
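For reference, a minimal sketch of the perf invocations being discussed, assuming perf 3.16 or newer and that the data was recorded with call graphs (the process ID is a placeholder):

```sh
# record CPU samples with call stacks so time can be attributed inclusively
perf record -g -p <pid> -- sleep 30

# inclusive ("children") view -- the default since perf 3.16
perf report --children

# exclusive ("self") view only
perf report --no-children
```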
@janvorli I ran some tests and found that only the Intel CPU reproduces this problem! I tested on both Intel and AMD CPUs:
On the Intel CPU, the spin lock costs more time in both GC modes; on the AMD CPU, only WKS mode spends significant time allocating objects.
(Truncated CPU info for both machines; each ends with `processor : 31`, i.e. 32 logical CPUs.)
Here are all the flamegraph SVG files:
On the Intel CPU cloud VM, the server can only support a few players online. On the AMD VM, WKS can support 4000+ players (though the frame rate is unstable), and SVR works fine, just as on Windows.
For the WKS GC, more locking is expected compared to SVR. But I'll defer further help here to the dotnet perf folks. @dotnet/dotnet-perf, could someone please help @egmkang investigate this further? @Maoni0 - this may be of interest to you too - there seems to be a substantial difference in the time spent on locks for GC (between AMD and Intel CPUs with the same app) and also a substantial delta between the WKS / SVR differences on AMD and Intel.
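As a point of reference, a hedged sketch of how the two GC modes can be toggled on .NET Core 3.1 without rebuilding (the `GameServer.dll` name is a placeholder, not taken from this issue):

```sh
# Server GC (SVR)
export COMPlus_gcServer=1
dotnet GameServer.dll

# Workstation GC (WKS), the default unless the project opts into server GC
export COMPlus_gcServer=0
dotnet GameServer.dll
```

The same choice can also be made at build time with the `ServerGarbageCollection` MSBuild property in the project file.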
Interesting that this only happens on Intel CPUs. Unfortunately, it looks like the stack is broken, and so we can't tell from the flame graph what's actually calling `__pv_queued_spin_lock_slowpath`.
Either of these options is reasonable to try, but I suspect you will have faster results with option 2. You'll just want to make sure that you're using a recent version of the tool. Here are some instructions on how to install it:
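For reference, a hedged sketch of the two usual ways to collect a CPU trace with managed stacks on Linux; I'm assuming the options referred to above are perfcollect and dotnet-trace (the trace name and process ID below are placeholders):

```sh
# Option 1 (assumed): perfcollect, which wraps perf/LTTng and resolves managed frames
curl -OL https://aka.ms/perfcollect
chmod +x perfcollect
sudo ./perfcollect install
export COMPlus_PerfMapEnabled=1   # set before starting the app so perf can map JIT-compiled frames
sudo ./perfcollect collect mytrace

# Option 2 (assumed): dotnet-trace, which needs no native perf setup
dotnet tool install --global dotnet-trace
dotnet-trace collect --process-id 1234 --profile cpu-sampling
```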
The 32-core Intel VM has been recycled. Another 8-core Intel 8255C VM looks normal. I tested several GameServer versions (spanning a month), and they are normal too.
I reverted the code to the 2020-04-08 version, which had a great number of New/NewArray operations, and ran all the tests on one cloud VM with an Intel CPU.
OS: CentOS 7.6 64bit
dotnet core: 3.1.201
hardware: Intel Xeon Cascade Lake 8255C (2.5 GHz), 32 cores, 64 GB cloud VM
My app is an MMOG game server; it handles about 5,000 input messages per second and about 100,000 output messages per second.
I used Windows Server 2012 R2 to profile and optimize, and on Windows it runs very well.
But on CentOS 7.6 it's bad: about half of the process time is spent in `__pv_queued_spin_lock_slowpath`, and I don't know what calls `__pv_queued_spin_lock_slowpath` or why. It does not seem to be my logic code calling it.
What is the difference between the Windows and Linux strategies? How can I solve this problem?
Thanks a lot.
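As a rough sketch of one way to see what calls `__pv_queued_spin_lock_slowpath` (an assumption on my part, not something suggested in this thread), perf can record kernel call graphs system-wide and then filter the report to that symbol:

```sh
# sample the whole system for 30 seconds, keeping call stacks
sudo perf record -a -g -- sleep 30

# inclusive view filtered to the spin-lock slow path, so its callers show up in the call chains
sudo perf report --symbols=__pv_queued_spin_lock_slowpath
```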