
Proxy scenario perf regression #9404

Open · sebastienros opened this issue Apr 15, 2019 · 26 comments · 5 participants

sebastienros (Member) commented Apr 15, 2019

This is actually in CoreFX, but I'm opening it in aspnet until I get more information.

[benchmark results graph showing the regression]

Updated

3.0.0-preview-27408-1 -> 3.0.0-preview-27409-1

Microsoft.NetCore.App / Core FX
dotnet/corefx@c2bbb6a...e1c0e1d

Microsoft.NetCore.App / Core CLR
dotnet/coreclr@a51af08...4e5df11

Command lines:

dotnet run -- --server http://10.10.10.2:5001 --client http://10.10.10.4:5002 `
-j ..\Benchmarks\benchmarks.httpclient.json -n HttpClientFactory --sdk Latest --self-contained --aspnetcoreversion 3.0.0-preview3-19116-01 --runtimeversion 3.0.0-preview-27408-1

dotnet run -- --server http://10.10.10.2:5001 --client http://10.10.10.4:5002 `
-j ..\Benchmarks\benchmarks.httpclient.json -n HttpClientFactory --sdk Latest --self-contained --aspnetcoreversion 3.0.0-preview3-19116-01 --runtimeversion 3.0.0-preview-27409-1

benaadams (Contributor) commented Apr 16, 2019

Looks like two drops?

[benchmark graph showing two drops]

sebastienros (Member, Author) commented Apr 17, 2019

The first drop corresponds to when I changed the benchmarks to actually flow the headers.

sebastienros (Member, Author) commented Apr 17, 2019

@stephentoub this benchmark is mostly about HttpClient under load, no SSL. Do you see anything in the changes I linked to that could explain the regression?

stephentoub commented Apr 17, 2019

> Do you see anything in the changes I linked to that could explain the regression?

Nothing jumps out at me.

sebastienros (Member, Author) commented Apr 17, 2019

I will investigate then.

sebastienros (Member, Author) commented Apr 18, 2019

After investigating, it looks like a regression between

3.0.0-preview-27408-1 -> 3.0.0-preview-27408-3

Microsoft.NetCore.App / Core FX
dotnet/corefx@c2bbb6a...ede577c

Microsoft.NetCore.App / Core CLR
dotnet/coreclr@a51af08...4e5df11

Then I fetched both runtimes and tried to find which changed assemblies might cause this; I can recover the previous performance by using the System.Globalization.Native.so from the previous runtime.

Looking at the diff, this file has changed:

src/corefx/System.Globalization.Native/pal_collation.c

I should also mention that once this runtime version is taken into account, only this benchmark changes. Everything else seems stable.

Here is the code that the benchmark is running:

https://github.com/aspnet/Benchmarks/blob/711f2b81cbd5dc1c2022cae0606e76a7eb79edc1/src/Proxy/Program.cs#L134-L144

sebastienros (Member, Author) commented Apr 18, 2019

/cc @filipnavara @janvorli as it seems to come from this commit: https://github.com/dotnet/coreclr/pull/22390/files

I'd say it's not because of PR dotnet/coreclr@34d50b0, though the two are pretty close, since the diff starts on Feb 04. To be sure, I could run a benchmark if someone could give me a System.Globalization.Native.so that only reverts this commit.

filipnavara commented Apr 18, 2019

Are you absolutely positive about those commit lists? I'd expect dotnet/coreclr#22378 to be the primary suspect if System.Globalization.Native is involved; it's just one commit prior to the lists you linked. The change in https://github.com/dotnet/coreclr/pull/22390/files should be harmless unless the glibc C runtime library is significantly broken.

I'll try to reproduce the results on the weekend, but if you find anything out please share it here.

filipnavara commented Apr 18, 2019

I'm going through possible regressions from dotnet/coreclr#22378.

This line is missing a customRules->size == 0 condition. Some callocs could be replaced with mallocs to avoid zeroing memory. There's potential for some performance regression in some of the *SortHandle functions, but I'm not sure it's big enough to account for this. I will submit a patch to address the obvious things.
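
For illustration, the shape of that micro-optimization looks roughly like the following sketch. This is hypothetical code, not the actual pal_collation.c source; the TCollatorMap type and function names here are invented:

    #include <stdlib.h>

    /* Invented stand-in for the map type used by the sort handles. */
    typedef struct
    {
        void* root;
        int size;
    } TCollatorMap;

    /* calloc zeroes the entire allocation, which is wasted work when
       every field is assigned immediately afterwards. */
    TCollatorMap* CreateMapWithCalloc(void)
    {
        return (TCollatorMap*)calloc(1, sizeof(TCollatorMap));
    }

    /* malloc plus explicit initialization avoids the redundant zeroing. */
    TCollatorMap* CreateMapWithMalloc(void)
    {
        TCollatorMap* map = (TCollatorMap*)malloc(sizeof(TCollatorMap));
        if (map != NULL)
        {
            map->root = NULL;
            map->size = 0;
        }
        return map;
    }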

janvorli commented Apr 18, 2019

I have applied a delta to the current master that undoes the change made in dotnet/coreclr#22390 and shared the System.Globalization.Native.so with @sebastienros. He has verified that it fixes the perf regression.

sebastienros (Member, Author) commented Apr 18, 2019

@filipnavara if you are not sure whether a change is beneficial or not, just build many versions and I can run the numbers for each build. Put them on //scratch2.

filipnavara commented Apr 18, 2019

@janvorli Wow. That's very surprising. I'll look at it again. The change itself doesn't fix any bug, and it should generally have improved performance, which makes it surprising that quite the opposite happened. If I don't come up with a good explanation, I'll submit a PR to revert it.

sebastienros (Member, Author) commented Apr 18, 2019

I am quite confident in the way I ran it, but there is always room for human error (me). To be extra sure, feel free to give me builds with obvious regressions, even based on the reverted commit, so we can check that I get the expected results. That would also make me feel more confident.

Something I also want to stress is that I patched only this file, no other, and I am not sure whether one version or the other could trigger some behavior coming from a separate module. I can also share the full standalone application if you want to check which bits are measured, but you will see that only this file changed in the end.

filipnavara commented Apr 18, 2019

I don't doubt your findings. It must be some weird thing with the way the tsearch function is used. I'll try to produce a more localized test to see if I can reproduce the behaviour. I'd actually be happy if it turns out to be caused by this small change, since the one preceding it was huge and much harder to check for regressions.

filipnavara commented Apr 18, 2019

I'm starting to suspect that the problem is actually the changed structure layout in that commit, which causes the pthread mutex to be aligned differently. However, that's kind of hard to measure.

@janvorli Would it be possible for you to make a build that swaps the two lines at https://github.com/dotnet/coreclr/blob/42b5509b1b8f3be12876579aa23a3cb3ed2ca7a4/src/corefx/System.Globalization.Native/pal_collation.c#L45-L46 ? My only Linux build machines are in the cloud and it would take me a while to set one up properly.
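
To make the alignment suspicion concrete: swapping two field declarations changes the offset of the mutex inside the struct, which can change how it falls relative to cache-line boundaries. A hypothetical sketch, with invented names rather than the real pal_collation.c definitions:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Invented stand-in for the collator map stored in the sort handle. */
    typedef struct { void* root; } TCollatorMap;

    /* One hypothetical field order... */
    typedef struct
    {
        TCollatorMap collatorsPerOption;
        pthread_mutex_t collatorsLockObject;
    } SortHandleA;

    /* ...and the same fields with the two declarations swapped. */
    typedef struct
    {
        pthread_mutex_t collatorsLockObject;
        TCollatorMap collatorsPerOption;
    } SortHandleB;

    int main(void)
    {
        /* The mutex lands at a different offset in each layout. */
        printf("mutex offset: A=%zu B=%zu\n",
               offsetof(SortHandleA, collatorsLockObject),
               offsetof(SortHandleB, collatorsLockObject));
        return 0;
    }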

janvorli commented Apr 18, 2019

Sure, I will do that.

janvorli commented Apr 18, 2019

@filipnavara that change had no effect.

filipnavara commented Apr 18, 2019

Thanks for testing it. I will submit a PR to revert the change and continue investigating on my Linux machine to see if it's anything I can reproduce.

filipnavara commented Apr 18, 2019

I reduced it to the following test case in plain C: https://gist.github.com/filipnavara/31791b216078706e0b465f7a103ff01f

On macOS, using tsearch is a notch faster than tfind, but the difference is within measurement error for a sparse distribution. On Ubuntu (Linux mono-linux-build 4.15.0-1019-azure #19-Ubuntu SMP Thu Jul 26 19:54:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux) running in Azure, tsearch is consistently around 15%-18% slower.
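
As a rough outline, the test has the following shape (a condensed, hypothetical sketch, not the gist verbatim; the key set and iteration count are arbitrary). The point is that tsearch takes the insert-or-find path on every call, while tfind only searches:

    #include <search.h>
    #include <stdint.h>
    #include <stdio.h>

    static int compare(const void* a, const void* b)
    {
        intptr_t ka = (intptr_t)a, kb = (intptr_t)b;
        return (ka > kb) - (ka < kb);
    }

    int main(void)
    {
        void* root = NULL;

        /* Populate the tree once with a small set of keys. */
        for (intptr_t key = 1; key <= 12; key++)
            tsearch((void*)key, &root, compare);

        /* Hammer lookups of keys that are already present. */
        long hits = 0;
        for (long i = 0; i < 50000000; i++)
        {
            void* key = (void*)(intptr_t)((i % 12) + 1);
    #ifdef USE_TFIND
            if (tfind(key, &root, compare) != NULL) hits++;   /* search only */
    #else
            if (tsearch(key, &root, compare) != NULL) hits++; /* insert-or-find */
    #endif
        }
        printf("%ld\n", hits);
        return 0;
    }

Building it once with -DUSE_TFIND and once without, then timing the two binaries, is enough to compare the two calls.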

sebastienros (Member, Author) commented Apr 18, 2019

How do you measure the differences? I ran the code but it simply outputs 12.

filipnavara commented Apr 18, 2019

@sebastienros By running it under the time command:

gcc tsearch.c -o tsearch
time ./tsearch
gcc tfind.c -o tfind
time ./tfind

sebastienros (Member, Author) commented Apr 18, 2019

On my physical machines I get this:

Cores       tsearch   tfind
6/12 HT     0.900s    0.750s
14/28 HT    1.113s    0.927s

benaadams (Contributor) commented Apr 19, 2019

A side question: if the regression is in globalization, how is HttpClient using globalization? I wouldn't have thought it would be involved, though I may be wrong.

I.e., isn't it all mostly ASCII formats and fairly invariant?

stephentoub commented Apr 19, 2019

It's not just HttpClient involved here... isn't this an ASP.NET site that in turn makes HttpClient requests (or HttpClient requests to an ASP.NET site)?

benaadams (Contributor) commented Apr 19, 2019

Yeah, but it doesn't do anything particularly interesting, except this line...

new Uri(UriHelper.BuildAbsolute(_scheme, _host, _pathBase, context.Request.Path, context.Request.QueryString.Add(_appendQuery)))

Checking UriHelper, it's doing a bunch of string.IndexOf(string) calls, which default to CurrentCulture comparison, so it might be that...

benaadams (Contributor) commented Apr 19, 2019

Made a PR for that class: #9537
