Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upHuge performance difference in string.EndsWith between Linux and Windows for non Invariant, non en-US. #5612
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
stephentoub
Jun 8, 2016
Member
cc: @ellismg
Interesting. I don't have that exact set up handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002493
00:00:00.0464325
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002399
00:00:00.0572944
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002480
00:00:00.0502375
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0004543
00:00:00.0643471
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002738
00:00:00.0627142
so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.
00:00:00.0076654
00:00:00.1033137
But... my LANG is en-US. If I do:
export LANG=en-GB
and then run it again, I get numbers much closer to yours:
00:00:00.0002939
00:00:04.8001375
Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement?
|
cc: @ellismg Interesting. I don't have that exact set up handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:
so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.
But... my LANG is en-US. If I do:
and then run it again, I get numbers much closer to yours:
Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement? |
stephentoub
added this to the 1.0.0-rtm milestone
Jun 8, 2016
stephentoub
added
the
tenet-performance
label
Jun 8, 2016
stephentoub
assigned
ellismg
Jun 8, 2016
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ellismg
Jun 8, 2016
Contributor
If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are specific today to en-US and Invariant (see the code here.)
Unlike Windows, having to do a full linguistic StartsWith or EndsWith is slow because we have to construct some ICU searching objects which we can't cache across runs (since the object is specific to a target string you are searching for, and we don't maintain a cache of searcher objects).
For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB.
We could also consider trying to re-implement IndexOf in terms of some lower level ICU primitives that we might be able to cache across calls.
|
If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are specific today to en-US and Invariant (see the code here.) Unlike Windows, having to do a full linguistic StartsWith or EndsWith is slow because we have to construct some ICU searching objects which we can't cache across runs (since the object is specific to a target string you are searching for, and we don't maintain a cache of searcher objects). For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB. We could also consider trying to re-implement IndexOf in terms of some lower level ICU primitives that we might be able to cache across calls. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jskeet
Jun 8, 2016
@stephentoub: Bingo! Yes, with
$ LANG=en-US dotnet runI get:
00:00:00.0003087
00:00:00.0630077
Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.)
jskeet
commented
Jun 8, 2016
|
@stephentoub: Bingo! Yes, with $ LANG=en-US dotnet runI get:
Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ellismg
Jun 8, 2016
Contributor
@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk.
|
@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jskeet
Jun 8, 2016
@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now...
jskeet
commented
Jun 8, 2016
|
@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now... |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ellismg
Jun 8, 2016
Contributor
@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non ASCII strings or strings where collation for ASCII characters differs (e.g tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisions when you don't need linguistic behavior that would be great.
|
@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non ASCII strings or strings where collation for ASCII characters differs (e.g tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisions when you don't need linguistic behavior that would be great. |
ellismg
modified the milestones:
1.1.0,
1.0.0-rtm
Jun 8, 2016
ellismg
changed the title from
Huge performance difference in string.EndsWith between Linux and Windows
to
Huge performance difference in string.EndsWith between Linux and Windows for non Invariant, non en-US.
Jun 8, 2016
jskeet
referenced this issue
Jun 8, 2016
Closed
EndsWith calls in Constraint constructor can cause major perf issues #1555
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jskeet
Jun 8, 2016
Yup. Would be good if every call to Assert.AreEqual didn't need to do so much work in general, but that's a matter for another day :) Fingers crossed we can get this into NUnit quickly - it seems about as trivial a fix as one could wish for given the crazy perf benefits. (Admittedly Noda Time may be slightly unusual in its number of assertions...)
jskeet
commented
Jun 8, 2016
|
Yup. Would be good if every call to |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
wpostma
Jun 9, 2016
Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .net core that lets you specify ie StartsWith(arg, forceAscii:true) ?
wpostma
commented
Jun 9, 2016
|
Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .net core that lets you specify ie StartsWith(arg, forceAscii:true) ? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jskeet
Jun 9, 2016
@wpostma: What would forceAscii do? There's already string.StartsWith(string, StringComparison) so you can specify an ordinal comparison, which is precisely what I'm proposing in NUnit.
jskeet
commented
Jun 9, 2016
|
@wpostma: What would |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
wpostma
commented
Jun 9, 2016
|
Um, forget I suggested that. :-) |
jtattermusch
referenced this issue
Oct 23, 2016
Merged
C# performance: string.EndsWith is really slow on CoreCLR #8472
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
@ellismg Is this actionable for 1.1.0? |
ellismg
modified the milestones:
1.2.0,
1.1.0
Oct 26, 2016
gkhanna79
added
the
area-corelib
label
Jan 12, 2017
danmosemsft
added
area-System.Runtime
and removed
area-corelib
labels
Jan 12, 2017
AlexGhiondea
modified the milestones:
Future,
2.0.0
Mar 20, 2017
AlexGhiondea
added
the
up-for-grabs
label
Mar 20, 2017
AlexGhiondea
unassigned
ellismg
Mar 20, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mazong1123
Aug 16, 2017
Collaborator
Anyone working on this issue? If not I have time to take a look.
|
Anyone working on this issue? If not I have time to take a look. |
janvorli
assigned
mazong1123
Aug 16, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
janvorli
Aug 16, 2017
Member
@mazong1123 that would be great! Thank you. No-one is assigned, so that means no-one is working on this. I've just assigned you.
|
@mazong1123 that would be great! Thank you. No-one is assigned, so that means no-one is working on this. I've just assigned you. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mazong1123
Aug 20, 2017
Collaborator
Today I investigate a little bit of the related source code. Just put my learning here:
There're two path for the EndsWith comparison:
-
Slow path:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Globalization/CompareInfo.Unix.cs#L283
It would call https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/Interop/Unix/System.Globalization.Native/Interop.Collation.cs#L38
Ultimately goes to native functionGlobalizationNative_EndsWith
It has some search operations which slow down the performance. -
Fast path: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Globalization/CompareInfo.Unix.cs#L280
or
The fast path would not go toGlobalizationNative_EndsWith. It either goes to
Ultimately goes to
or
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Globalization/CompareInfo.cs#L476
It avoids the search operations in the slow path so its performance is better.
|
Today I investigate a little bit of the related source code. Just put my learning here: There're two path for the
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
@mazong1123 have you had a chance to look further? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mazong1123
Nov 23, 2017
Collaborator
@danmosemsft sorry I'm afraid I don't have so much time for working on this issue right now. I'm busy with my day job and for .NET Core I'm only focusing on #14646
However, I can support anyone who wants to work on this issue in my spare time.
|
@danmosemsft sorry I'm afraid I don't have so much time for working on this issue right now. I'm busy with my day job and for .NET Core I'm only focusing on #14646 However, I can support anyone who wants to work on this issue in my spare time. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
abk-x
Mar 13, 2018
I have noticed this issue when running netcore code along side mono:
var x = System.IO.File.ReadAllText("/tmp/test/random25charstring.txt");
for (int i=0; i< 900000; i++) {
if (x.StartsWith("notlong/")) { Console.WriteLine("Yes");}
The above code takes 6 seconds on my PC with 2.1.300-preview1-008174, and 370 milliseconds on mono. Interestingly, when I change my LANG to en-US, it takes 7 seconds (my default is en_AU.UTF-8).
The fix as mentioned above is to add Ordinal as the string comparison option:
if (x.StartsWith("notlong/",StringComparison.Ordinal))
This brings it back down to an acceptable 31 milliseconds on my PC.
abk-x
commented
Mar 13, 2018
|
I have noticed this issue when running netcore code along side mono:
The above code takes 6 seconds on my PC with 2.1.300-preview1-008174, and 370 milliseconds on mono. Interestingly, when I change my LANG to en-US, it takes 7 seconds (my default is en_AU.UTF-8). The fix as mentioned above is to add Ordinal as the string comparison option:
This brings it back down to an acceptable 31 milliseconds on my PC. |
jskeet commentedJun 8, 2016
•
edited
Edited 1 time
-
jskeet
edited Jun 8, 2016 (most recent)
Very simple code:
The project.json is mostly the default from
dotnet new:{ "buildOptions": { "emitEntryPoint": true, "optimize": true }, "frameworks": { "netcoreapp1.0": { "dependencies": { "Microsoft.NETCore.App": { "type": "platform", "version": "1.0.0-*" } }, "imports": "dnxcore50" } } }On both Linux (Ubuntu 15.10 and Ubuntu 16.04) the output is something like:
On Windows, on the same hardware, it's:
Version info: .NET Command Line Tools (1.0.0-preview1-002702) on Windows and Ubuntu 15.10; 1.0.0-preview2-002886 on Ubuntu 16.04.
Note that under NUnit, every equality assertion involves three
EndsWithcalls, making NUnit assertions basically horrifically expensive on Linux...