Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant performance drop with Server GC #5506

Closed
syalachigere opened this issue Apr 6, 2016 · 24 comments
Closed

Significant performance drop with Server GC #5506

syalachigere opened this issue Apr 6, 2016 · 24 comments

Comments

@syalachigere
Copy link

There is a significant performance drop of up to 80% for workloads built with .Net CLI. We used TechEmpower plaintext as well as our internal workload to verify. Initial analysis indicates Server GC to be an issue as performance is as expected when choosing Workstation GC.

We also observed that for Workstation GC, n number of GC threads were created where n = total logical processors. For Server GC, there was only one gc thread created. As per design documents of CoreCLR, it should have been equal to total logical processors.

We have checked performance drop on two configurations. Configuration-1 has 36 logical processors and Configuration-2 has 24 logical processors. This performance drop is significant when running on a system with many logical threads ( >20).

Used complus_gcserver environment variable to enable\disable Server GC and Workstation GC.

Runtime Configuration

Runtime: Microsoft.NETCore.Runtime.CoreCLR: 1.0.2-rc2-23826
.NET Command Line Tools (1.0.0-beta-001808)
Product Information
Version:     1.0.0-beta-001808
Commit Sha:  5c8a3e22fe

Runtime Environment:
OS Name:     Windows
OS Version:  6.3.9600
OS Platform: Windows
Runtime Id:  win81-x64

Configuration 1

App server
Intel® Xeon® Processor E5 2699 v3 (HSW-EP)
@2.3 GHz
18 cores per Socket
Total Logical Cores: 36
Hyper Threading: OFF
NUMA: Enabled
Windows Server 2012 R2

Load server
Intel® Xeon® Processor E5 2699 v3(HSW-EP)
@2.3 GHz
18 cores per Socket
Total Logical Cores: 36
Hyper Threading: OFF
NUMA: Enabled
Ubuntu Server 14.04;  wrk version 4.0.0

Connected by 10GB switch

Configuration 2

App server
Intel® Xeon® Processor E5 2695 v2 (IVT-EP)
@2.35 GHz
12 cores per Socket
Total Logical Cores: 24
Hyper Threading: OFF
NUMA: Enabled
Windows Server 2012 R2

Load server
Intel® Xeon® Processor E5 2699 v3(HSW-EP)
@2.3 GHz
18 cores per Socket
Total Logical Cores: 36
Hyper Threading: OFF
NUMA: Enabled
Ubuntu Server 14.04;  wrk version 4.0.0

Connected by 10GB switch
@swgillespie
Copy link
Contributor

cc @Maoni0 @sergiy-k

Thanks for the report!

We also observed that for Workstation GC, n number of GC threads were created where n = total logical processors. For Server GC, there was only one gc thread created. As per design documents of CoreCLR, it should have been equal to total logical processes.

This is curious but unfortunately I can't repro this on my machine. When using Workstation GC on a program with five threads and a machine with 8 procs, I see 12 threads (1 BGC thread + 5 user threads + 1 finalizer thread + 3 ntdll threads + 2 debugger-related threads), and with server GC, I see 26 threads (8 BGC threads + 8 Server GC threads + 5 user threads + 1 finalizer thread + 3 ntdll threads + debugger-related threads).

Do you mind attaching the thread stack traces of when you observe Workstation GC spawning many GC threads and Server GC not doing so? As you say, server GC should be creating as many GC threads as there are procs.

@syalachigere
Copy link
Author

We used WPA to get details about threads count. CPU Usage sampled "Utilization by process and Thread” gave us list of threads. Then we matched thread Ids to GCJoin-events threads ids.

Will get thread stack trace as soon as possible.

@syalachigere
Copy link
Author

@swgillespie Can you try this on a server with more than 20 logical processors? we have seen this issue only on systems with more than 20 logical processors.

@swgillespie
Copy link
Contributor

@syalachigere Just tried this on a machine with 48 logical processors, and I see 9 threads under workstation GC and 56 threads with server GC. This is by setting server GC and workstation GC using corerun as a host.

I'm going to try and run the TechEmpower benchmark + dotnet CLI on the 48-proc machine to see if this repros - it's possible some wires are getting crossed while trying to set the GC flavor in the CLI.

@syalachigere
Copy link
Author

@swgillespie I did not run using corerun as a host. We build and published with .Net CLI tool. Then ran the published executable to measure the performance.
Default mode is Server GC and we saw low performance. We then observed expected performance with Workstation GC (set complus_gcserver=false)

@sergiy-k
Copy link
Contributor

sergiy-k commented Apr 6, 2016

@syalachigere, do you have NUMA enabled on your machine?

@syalachigere
Copy link
Author

Yes. NUMA was enabled.

@swgillespie
Copy link
Contributor

I figure there are several possible issues here:

  1. that setting the "System.GC.Server" key in the project.json does not result in Server GC being enabled in the target process,
  2. that workstation GC results in server GC-like threads being spawned per-proc and server GC behaves like workstation GC,
  3. that server GC regresses in performance significantly over workstation GC.

I can reproduce (1) by running the techempower benchmark, which has the "System.GC.Server" key set to "true" but is not starting with server GC:
capture

To reproduce (2), I looked in the debugger and saw typical Workstation GC behavior when in Workstation GC and the same for Server GC - I didn't see any abnormal thread-spawning in Workstation GC or Server GC using a single thread. I'll look at this more in tandem with (1) since they may be related.

To reproduce (3), I did a 1 minute test of the server with both server GC and workstation GC set and got the following data:

Workstation GC

Running 1m test @ http://<ip>:5000/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.31ms   14.54ms 382.24ms   95.85%
    Req/Sec    18.75k     3.05k   42.01k    73.64%
  35676890 requests in 1.00m, 4.39GB read
Requests/sec: 593646.84
Transfer/sec:     74.73MB

Server GC

Running 1m test @ http://<ip>:5000/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.72ms   23.16ms 464.25ms   94.47%
    Req/Sec    23.13k     3.82k   50.13k    78.62%
  43755410 requests in 1.00m, 5.38GB read
Requests/sec: 728055.18
Transfer/sec:     91.65MB

Your load server is much more powerful than mine, so it's possible that my server isn't hammering asp.net as hard as your load server. I'm also not sure whether or not this particular benchmark spawns a bunch of threads, I'll have to double-check. I have another workload that I can run that can spawn arbitrarily many threads that I can use to try and repro this.

@sergiy-k
Copy link
Contributor

sergiy-k commented Apr 7, 2016

@syalachigere, may please ask you to run your test with NUMA disabled and see if it will have any effect on the results. I just want to make sure this is not the same as https://github.com/dotnet/coreclr/issues/3380.

You also seem to be using an old build of dotnet/cli command line tools and CoreCLR runtime (more than a month old). I could be wrong but I don't think that dotnet v1.0.0-beta-001808 (or later) enables Server GC by default (you need to do "set COREHOST_SERVER_GC=1" to make the dotnet host to pass the Server GC option to the runtime). In any case, it would be great to get thread stack traces (like @swgillespie suggested) to be 100% sure. Please also note that the COREHOST_SERVER_GC environment variable is no longer supported on recent builds of dotnet tools. Or maybe you are still using DNX (which indeed enables Server GC by default on Windows)?

@swgillespie, what version of dotnet/cli do you have installed? Propagation of the runtime options (like System.GC.Server) from project.json into runtimeconfig.json has been implemented only 2 days ago (dotnet/cli#2227).

@swgillespie
Copy link
Contributor

@sergiy-k This is the latest published nightly from the CLI and built-from-the-tip CoreCLR. I'll double-check to see if it made it into the build I was using.

@syalachigere
Copy link
Author

@swgillespie @sergiy-k

"System.GC.Server" setting did not work for us either. We are using complus_gcserver. Did the following steps to reproduce the issue with the latest .Net CLI.

dotnet.exe HelloWeb.dll
Could not load host policy library [HelloWeb.dll]
  • So, started the web app from the project folder
C:\Projects\cli-samples-master\HelloWeb>\Tools\dotnet\dotnet.exe run  -c Release
Project HelloWeb (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
Hosting environment: Production
Content root path: C:\Projects\cli-samples-master\HelloWeb
Now listening on: http://*:8080
Application started. Press Ctrl+C to shut down.

With NUMA enabled

  • Run wrk, without setting any GC mode
wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.07ms   12.90ms 171.13ms   98.10%
    Req/Sec     6.38k     1.96k   13.16k    81.87%
  955078 requests in 5.10s, 112.03MB read
  Socket errors: connect 0, read 0, write 400, timeout 0
Requests/sec: 187289.84
Transfer/sec:     21.97MB

  • Run wrk with Workstation GC modecomplus_gcserver=false
wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.92ms    5.80ms 167.98ms   95.93%
    Req/Sec     7.74k     1.24k   10.88k    59.01%
  1256585 requests in 5.10s, 147.40MB read
Requests/sec: 246405.40
Transfer/sec:     28.90MB

  • Run wrk with Server GC mode complus_gcserver=true.
wosuser@wos-ubuntu1:~/wrk$ wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.18ms   13.74ms 179.41ms   98.14%
    Req/Sec     6.58k     2.07k   12.86k    78.84%
  985561 requests in 5.10s, 115.61MB read
  Socket errors: connect 0, read 0, write 800, timeout 0
Requests/sec: 193235.21
Transfer/sec:     22.67MB

There is a performance drop in Server GC mode.

With NUMA disabled

  • Run wrk, without setting any GC mode
wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.40ms   13.58ms 183.04ms   98.12%
    Req/Sec     5.75k     1.64k   13.03k    83.01%
  861575 requests in 5.10s, 101.06MB read
  Socket errors: connect 0, read 1, write 399, timeout 0
Requests/sec: 168945.42
Transfer/sec:     19.82MB
  • Run wrk with Workstation GC modecomplus_gcserver=false
 wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.07ms   13.73ms 173.07ms   97.20%
    Req/Sec     8.78k     2.23k   16.55k    78.87%
  1323767 requests in 5.10s, 155.28MB read
  Socket errors: connect 0, read 1, write 399, timeout 0
Requests/sec: 259575.59
Transfer/sec:     30.45MB
  • Run wrk with Server GC mode complus_gcserver=true.
 wrk -c256 -t32 -d5 http://10.20.30.10:8080
Running 5s test @ http://10.20.30.10:8080
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.11ms   13.13ms 186.71ms   98.26%
    Req/Sec     6.75k     1.94k   13.32k    82.67%
  1009559 requests in 5.10s, 118.42MB read
  Socket errors: connect 0, read 1, write 399, timeout 0
Requests/sec: 197970.83
Transfer/sec:     23.22MB

Again, there is a performance drop in Server GC mode.

Working on getting thread stack traces.

@swgillespie
Copy link
Contributor

@sergiy-k @syalachigere I just tried again with the most recent .NET CLI build that I could find (1.0.0-beta-002284). My steps:

  1. dotnet.exe restore on the HelloWeb folder of aspnet/cli-samples.
  2. Added a line in Program.cs to print out whether or not server GC is enabled,
  3. dotnet.exe run --configuration Release - the program outputs that Server GC is not on.
  4. Add this to the project.json:
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true
    }
  }
  1. dotnet.exe run --configuration Release - the program outputs that Server GC is on.

Running with dotnet.exe bin\Release\netcoreapp1.0\publish\HelloWeb.dll also worked for me with this build of the CLI.

@syalachigere - Can you try your scenario with CLI version 1.0.0-beta-002284? I got it from here. As Sergiy said, this is a very new feature that landed very recently in the CLI.

@syalachigere
Copy link
Author

I got the latest build 1.0.0-beta-002275 last night. I am trying again with 1.0.0-beta-002284.

I also want to mention here that we have seen the performance drop on our internal non-asp.net transaction-workload (a console app)

@sergiy-k
Copy link
Contributor

sergiy-k commented Apr 7, 2016

Hi @syalachigere, you might want to delete your bin and obj folders after upgrading the version of dotnet/cli. That can be the reason for the "Could not load host policy library" error that you mentioned in your earlier message.

@syalachigere
Copy link
Author

We did the following.

  1. Downloaded .Net CLI 1.0.0-beta-002286

  2. dotnet.exe run -c release on the HelloWeb folder of aspnet/cli-samples.

  • Saw performance drop (default settings).
  • set complus_gcserver=true gave us low performance
  • set complus_gcserver=false gave us better performance
  1. We then added a line in Program.cs to print out whether or not server GC is enabled, and added
    runtimeOptions to the project.json:
  "runtimeOptions": {
      "configProperties": {
      "System.GC.Server": true
     }
  }
  1. After adding runtimeOptions, we saw
    expected performance with "System.GC.Server": true and
    lower performance with "System.GC.Server": false

  2. How is set complus_gcserver=true as environment variable different from "System.GC.Server": true in project.json?

@sergiy-k
Copy link
Contributor

sergiy-k commented Apr 7, 2016

@syalachigere, I just noticed that you are setting complus_gcserver to true/false. This will have no effect; the correct (supported) values are 1 and 0.

@sergiy-k
Copy link
Contributor

sergiy-k commented Apr 7, 2016

I also wonder if there is some instability in the benchmark's results because you are running it for 5sec only. Do you get the same numbers if you run the benchmarks let's say for 30sec?

@swgillespie
Copy link
Contributor

@syalachigere (nevermind, Aditya has a better idea than I do)

@adityamandaleeka
Copy link
Member

@syalachigere

  1. How is set complus_gcserver=true as environment variable different from "System.GC.Server": true in project.json?

They will do the same thing. JSON configuration is new for .NET CLI, but the COMPlus_ version still works and will override the JSON version if both are set.

@swgillespie
Copy link
Contributor

@syalachigere - @sergiy-k just discovered the reason why you observed a regression in performance with complus_gcserver=true over complus_gcserver=false. complus_gcserver is expected to be a DWORD and, when it is being retrieved from the environment, is parsed as a hexadecimal number here: https://github.com/dotnet/coreclr/blob/master/src/utilcode/regutil.cpp#L239. The call to uniwcst is a macro that expands to _wcstoui64. _wcstoui64 stops the scan at the first non-digit character, so for true it returns 0, but for false it parses 0xFA and stops. Since 0xFA is nonzero, server GC is enabled, and since true is parsed as 0x0, server GC is disabled.

So, complus_gcserver=true has the effect of disabling server GC, while complus_gcserver=false has the effect of enabling server GC, strangely enough.

@syalachigere
Copy link
Author

We saw the performance drop when we moved to CLI from DNX. GC settings in [appname].runtimeconfig.json file didn’t work and setting complus_gcserver=true\false led us to an incorrect conclusion.

We verified the performance by setting complus_gcserver=1\0 and it works.

I am closing this issue. Thank you @swgillespie @sergiy-k

@swgillespie
Copy link
Contributor

@syalachigere Can you share the contents of your json file (or at least the runtime part)? If GC settings are not getting propagated from there to the runtime, that's an issue we should fix.

@syalachigere
Copy link
Author

@swgillespie From cli documentation.

HelloWeb.runtimeconfig.json
{
    "runtimeOptions": {
        "configProperties": {
            "System.GC.Server": true
        }
    }
}

@swgillespie
Copy link
Contributor

@syalachigere I was able to reproduce this on a CLI nightly, so I filed a bug to track it: https://github.com/dotnet/cli/issues/2485 . As a workaround, it works for me if you put the runtimeOptions object in your project.json file instead of in the HelloWeb.runtimeconfig.json file.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants