New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge performance difference in string.EndsWith between Linux and Windows for non Invariant, non en-US. #5612

Open
jskeet opened this Issue Jun 8, 2016 · 17 comments

Comments

Projects
None yet
@jskeet

jskeet commented Jun 8, 2016

Very simple code:

using System;
using System.Diagnostics;

namespace ConsoleApplication
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Time(1);
            Time(1000000);
        }

        private static void Time(int iterations)
        {
            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                "abcfeg".EndsWith("123");
            }
            stopwatch.Stop();
            Console.WriteLine(stopwatch.Elapsed);
        }
    }
}

The project.json is mostly the default from dotnet new:

{
  "buildOptions": {
    "emitEntryPoint": true,
    "optimize": true
  },
  "frameworks": {
    "netcoreapp1.0": {
      "dependencies": {
        "Microsoft.NETCore.App": {
          "type": "platform",
          "version": "1.0.0-*"
        }
      },
      "imports": "dnxcore50"
    }
  }
}

On both Linux (Ubuntu 15.10 and Ubuntu 16.04) the output is something like:

00:00:00.0003456
00:00:05.2664877

On Windows, on the same hardware, it's:

00:00:00.0000506
00:00:00.1633352

Version info: .NET Command Line Tools (1.0.0-preview1-002702) on Windows and Ubuntu 15.10; 1.0.0-preview2-002886 on Ubuntu 16.04.

Note that under NUnit, every equality assertion involves three EndsWith calls, making NUnit assertions basically horrifically expensive on Linux...

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Jun 8, 2016

Member

cc: @ellismg

Interesting. I don't have that exact set up handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:

stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002493
00:00:00.0464325
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002399
00:00:00.0572944
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002480
00:00:00.0502375
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0004543
00:00:00.0643471
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002738
00:00:00.0627142

so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.

00:00:00.0076654
00:00:00.1033137

But... my LANG is en-US. If I do:

export LANG=en-GB

and then run it again, I get numbers much closer to yours:

00:00:00.0002939
00:00:04.8001375

Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement?

Member

stephentoub commented Jun 8, 2016

cc: @ellismg

Interesting. I don't have that exact set up handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:

stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002493
00:00:00.0464325
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002399
00:00:00.0572944
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002480
00:00:00.0502375
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0004543
00:00:00.0643471
stoub@stoublinux2:~/tmp3$ dotnet run
Project tmp3 (.NETCoreApp,Version=v1.0) was previously compiled. Skipping compilation.
00:00:00.0002738
00:00:00.0627142

so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.

00:00:00.0076654
00:00:00.1033137

But... my LANG is en-US. If I do:

export LANG=en-GB

and then run it again, I get numbers much closer to yours:

00:00:00.0002939
00:00:04.8001375

Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement?

@ellismg

This comment has been minimized.

Show comment
Hide comment
@ellismg

ellismg Jun 8, 2016

Contributor

If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are specific today to en-US and Invariant (see the code here.)

Unlike Windows, having to do a full linguistic StartsWith or EndsWith is slow because we have to construct some ICU searching objects which we can't cache across runs (since the object is specific to a target string you are searching for, and we don't maintain a cache of searcher objects).

For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB.

We could also consider trying to re-implement IndexOf in terms of some lower level ICU primitives that we might be able to cache across calls.

Contributor

ellismg commented Jun 8, 2016

If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are specific today to en-US and Invariant (see the code here.)

Unlike Windows, having to do a full linguistic StartsWith or EndsWith is slow because we have to construct some ICU searching objects which we can't cache across runs (since the object is specific to a target string you are searching for, and we don't maintain a cache of searcher objects).

For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB.

We could also consider trying to re-implement IndexOf in terms of some lower level ICU primitives that we might be able to cache across calls.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Jun 8, 2016

@stephentoub: Bingo! Yes, with

$ LANG=en-US dotnet run

I get:

00:00:00.0003087
00:00:00.0630077

Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.)

jskeet commented Jun 8, 2016

@stephentoub: Bingo! Yes, with

$ LANG=en-US dotnet run

I get:

00:00:00.0003087
00:00:00.0630077

Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.)

@ellismg

This comment has been minimized.

Show comment
Hide comment
@ellismg

ellismg Jun 8, 2016

Contributor

@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk.

Contributor

ellismg commented Jun 8, 2016

@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Jun 8, 2016

@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now...

jskeet commented Jun 8, 2016

@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now...

@ellismg

This comment has been minimized.

Show comment
Hide comment
@ellismg

ellismg Jun 8, 2016

Contributor

@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non ASCII strings or strings where collation for ASCII characters differs (e.g tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisions when you don't need linguistic behavior that would be great.

Contributor

ellismg commented Jun 8, 2016

@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non ASCII strings or strings where collation for ASCII characters differs (e.g tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisions when you don't need linguistic behavior that would be great.

@ellismg ellismg modified the milestones: 1.1.0, 1.0.0-rtm Jun 8, 2016

@ellismg ellismg changed the title from Huge performance difference in string.EndsWith between Linux and Windows to Huge performance difference in string.EndsWith between Linux and Windows for non Invariant, non en-US. Jun 8, 2016

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Jun 8, 2016

Yup. Would be good if every call to Assert.AreEqual didn't need to do so much work in general, but that's a matter for another day :) Fingers crossed we can get this into NUnit quickly - it seems about as trivial a fix as one could wish for given the crazy perf benefits. (Admittedly Noda Time may be slightly unusual in its number of assertions...)

jskeet commented Jun 8, 2016

Yup. Would be good if every call to Assert.AreEqual didn't need to do so much work in general, but that's a matter for another day :) Fingers crossed we can get this into NUnit quickly - it seems about as trivial a fix as one could wish for given the crazy perf benefits. (Admittedly Noda Time may be slightly unusual in its number of assertions...)

@wpostma

This comment has been minimized.

Show comment
Hide comment
@wpostma

wpostma Jun 9, 2016

Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .net core that lets you specify ie StartsWith(arg, forceAscii:true) ?

wpostma commented Jun 9, 2016

Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .net core that lets you specify ie StartsWith(arg, forceAscii:true) ?

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Jun 9, 2016

@wpostma: What would forceAscii do? There's already string.StartsWith(string, StringComparison) so you can specify an ordinal comparison, which is precisely what I'm proposing in NUnit.

jskeet commented Jun 9, 2016

@wpostma: What would forceAscii do? There's already string.StartsWith(string, StringComparison) so you can specify an ordinal comparison, which is precisely what I'm proposing in NUnit.

@wpostma

This comment has been minimized.

Show comment
Hide comment
@wpostma

wpostma Jun 9, 2016

Um, forget I suggested that. :-)

wpostma commented Jun 9, 2016

Um, forget I suggested that. :-)

@gkhanna79

This comment has been minimized.

Show comment
Hide comment
@gkhanna79

gkhanna79 Oct 26, 2016

Member

@ellismg Is this actionable for 1.1.0?

Member

gkhanna79 commented Oct 26, 2016

@ellismg Is this actionable for 1.1.0?

@ellismg ellismg modified the milestones: 1.2.0, 1.1.0 Oct 26, 2016

@AlexGhiondea AlexGhiondea modified the milestones: Future, 2.0.0 Mar 20, 2017

@mazong1123

This comment has been minimized.

Show comment
Hide comment
@mazong1123

mazong1123 Aug 16, 2017

Collaborator

Anyone working on this issue? If not I have time to take a look.

Collaborator

mazong1123 commented Aug 16, 2017

Anyone working on this issue? If not I have time to take a look.

@janvorli

This comment has been minimized.

Show comment
Hide comment
@janvorli

janvorli Aug 16, 2017

Member

@mazong1123 that would be great! Thank you. No-one is assigned, so that means no-one is working on this. I've just assigned you.

Member

janvorli commented Aug 16, 2017

@mazong1123 that would be great! Thank you. No-one is assigned, so that means no-one is working on this. I've just assigned you.

@mazong1123

This comment has been minimized.

Show comment
Hide comment
@mazong1123

mazong1123 Aug 20, 2017

Collaborator

Today I investigate a little bit of the related source code. Just put my learning here:

There're two path for the EndsWith comparison:

Collaborator

mazong1123 commented Aug 20, 2017

Today I investigate a little bit of the related source code. Just put my learning here:

There're two path for the EndsWith comparison:

@danmosemsft

This comment has been minimized.

Show comment
Hide comment
@danmosemsft

danmosemsft Nov 23, 2017

Member

@mazong1123 have you had a chance to look further?

Member

danmosemsft commented Nov 23, 2017

@mazong1123 have you had a chance to look further?

@mazong1123

This comment has been minimized.

Show comment
Hide comment
@mazong1123

mazong1123 Nov 23, 2017

Collaborator

@danmosemsft sorry I'm afraid I don't have so much time for working on this issue right now. I'm busy with my day job and for .NET Core I'm only focusing on #14646

However, I can support anyone who wants to work on this issue in my spare time.

Collaborator

mazong1123 commented Nov 23, 2017

@danmosemsft sorry I'm afraid I don't have so much time for working on this issue right now. I'm busy with my day job and for .NET Core I'm only focusing on #14646

However, I can support anyone who wants to work on this issue in my spare time.

@abk-x

This comment has been minimized.

Show comment
Hide comment
@abk-x

abk-x Mar 13, 2018

I have noticed this issue when running netcore code along side mono:

var x = System.IO.File.ReadAllText("/tmp/test/random25charstring.txt");
for (int i=0; i< 900000; i++) {
    if (x.StartsWith("notlong/")) { Console.WriteLine("Yes");}

The above code takes 6 seconds on my PC with 2.1.300-preview1-008174, and 370 milliseconds on mono. Interestingly, when I change my LANG to en-US, it takes 7 seconds (my default is en_AU.UTF-8).

The fix as mentioned above is to add Ordinal as the string comparison option:

if (x.StartsWith("notlong/",StringComparison.Ordinal))

This brings it back down to an acceptable 31 milliseconds on my PC.

abk-x commented Mar 13, 2018

I have noticed this issue when running netcore code along side mono:

var x = System.IO.File.ReadAllText("/tmp/test/random25charstring.txt");
for (int i=0; i< 900000; i++) {
    if (x.StartsWith("notlong/")) { Console.WriteLine("Yes");}

The above code takes 6 seconds on my PC with 2.1.300-preview1-008174, and 370 milliseconds on mono. Interestingly, when I change my LANG to en-US, it takes 7 seconds (my default is en_AU.UTF-8).

The fix as mentioned above is to add Ordinal as the string comparison option:

if (x.StartsWith("notlong/",StringComparison.Ordinal))

This brings it back down to an acceptable 31 milliseconds on my PC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment