New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of string.IndexOfAny for 2 & 3 char searches #13219

Merged
merged 4 commits into from Aug 8, 2017

Conversation

Projects
None yet
9 participants
@bbowyersmyth
Contributor

bbowyersmyth commented Aug 4, 2017

Improves the performance of string.IndexOfAny, special casing common 2 and 3 character searches.

2 char search
Beats a minimal hard coded loop after 3rd string character.
Small string (22chars) - 67% faster than existing
Large string (366 chars) - 56% faster than existing

3 char search
Beats a minimal hard coded loop after 8th string character.
Small string (22chars) - 49% faster than existing
Large string (366 chars) - 28% faster than existing

InitializeProbabilisticMap
Very common to search for ASCII symbols. The high map always writes the same value to the same location. Only doing that once gives a 13% improvement (isolated, 10 char array).

Issues
dotnet/corefx#22771
aspnet/Mvc#5362
NuGet/NuGet.Client#1590 (comment)
#7017

cc @davkean @benaadams

@bbowyersmyth

This comment has been minimized.

Contributor

bbowyersmyth commented Aug 4, 2017

Intention was to also improve ProbablyContains but there is an existing 80% difference between x86 and x64 (isolated test) that made it difficult. That also limited what could be done to InitializeProbabilisticMap.
Test code here if anyone wants to tackle that but taking 2 and 3 char searches out of this path helps.

@davkean

This comment has been minimized.

Member

davkean commented Aug 4, 2017

while (count > 0)
{
if (*pCh == value1)

This comment has been minimized.

@jkotas

jkotas Aug 4, 2017

Member

This depends on JIT CSEing *pCh for best performance. The JIT will probably do it, but it is best to write hand-optimized code like this one to be as close as possible to the native code.

Why not:

 char c = *pCh; 
 if (c == value1 || c == value2)
     goto ReturnIndex;

like in the 3 argument overload?

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Aug 4, 2017

Contributor

I got the same time between the two variations so I just went with the simpler one. 3 char was better with a variable. Will change.

if (anyOf == null)
{
throw new ArgumentNullException(nameof(anyOf));
}

This comment has been minimized.

@stephentoub

stephentoub Aug 4, 2017

Member

It may not be common enough to warrant it (and thus may hurt the more common cases of 2/3), but I wonder if it's worthwhile adding here:

if (anyOf.Length == 1)
{
    return IndexOf(anyOf[0], startIndex, count);
}

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Aug 4, 2017

Contributor

@stephentoub The only place I could think of that would be using a 1 char array was the Unix path handling code. That looked to be taken care of with the Path internal changes so I didn't bother with it.
But true to performance testing form the 2 char call is actually 10% better if it has another if before it. Not so if I change that to a switch though which is interesting. Will run a few more tests on the other paths.

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Aug 4, 2017

Contributor

Nope sorry it was the 3 char test that was 10% better. 2 char is 4% worse. Thoughts?

This comment has been minimized.

@jkotas

jkotas Aug 5, 2017

Member

Nope sorry it was the 3 char test that was 10% better. 2 char is 4% worse. Thoughts?

These kind of odd variations are typically caused by things like code alignment or branch prediction that will vary from build to build. One has to run it under Intel profiler to tell what is going on. I typically just look at the JITed code whether it looks fine.

This comment has been minimized.

@jkotas

jkotas Aug 7, 2017

Member

I wonder if it's worthwhile adding here

I think it is worthwhile to add it, and also Length == 0. Otherwise, the API has unexpected perf cliff.

This comment has been minimized.

@benaadams

benaadams Aug 7, 2017

Collaborator

😢 but don't optimize for them early? e.g.

if (anyOf.Length == 2)
{
    // Very common optimization for directory separators (/, \), quotes (", '), brackets, etc
    return IndexOfAny(anyOf[0], anyOf[1], startIndex, count);
}
else if (anyOf.Length == 3)
{
    return IndexOfAny(anyOf[0], anyOf[1], anyOf[2], startIndex, count);
}
else if (anyOf.Length > 3)
{
    return IndexOfCharArray(anyOf, startIndex, count);
}
else if (anyOf.Length == 1)
{
    return IndexOf(anyOf[0], startIndex, count);
}
else // anyOf.Length == 0
{
    return -1;
}

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Aug 8, 2017

Contributor

That seems like a good compromise. Done.

{
char c = *pCh;
if (c == value1)

This comment has been minimized.

@jkotas

jkotas Aug 5, 2017

Member

Nit: if (c == value1 || c == value2) ?

This comment has been minimized.

@benaadams

benaadams Aug 5, 2017

Collaborator

Also below

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Aug 7, 2017

Contributor

Done

bbowyersmyth added some commits Aug 7, 2017

@jkotas

This comment has been minimized.

Member

jkotas commented Aug 8, 2017

@dotnet-bot test Tizen armel Cross Release Build

@jkotas

This comment has been minimized.

Member

jkotas commented Aug 8, 2017

@dotnet-bot test Windows_NT x64 corefx_baseline

@jkotas

jkotas approved these changes Aug 8, 2017

Thanks!

@jkotas jkotas merged commit 9a405f2 into dotnet:master Aug 8, 2017

17 checks passed

CROSS Check Build finished.
Details
CentOS7.1 x64 Debug Build and Test Build finished.
Details
FreeBSD x64 Checked Build Build finished.
Details
OSX10.12 x64 Checked Build and Test Build finished.
Details
Tizen armel Cross Debug Build Build finished.
Details
Tizen armel Cross Release Build Build finished.
Details
Ubuntu arm Cross Release Build Build finished.
Details
Ubuntu x64 Checked Build and Test Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
Ubuntu16.04 arm Cross Debug Build Build finished.
Details
Windows_NT x64 Checked Build and Test (Jit - CoreFx) Build finished.
Details
Windows_NT x64 CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 Debug Build and Test Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release Priority 1 Build and Test Build finished.
Details
Windows_NT x86 Checked Build and Test Build finished.
Details
Windows_NT x86 CoreCLR Perf Tests Correctness Build finished.
Details

@bbowyersmyth bbowyersmyth deleted the bbowyersmyth:String_IndexOfAny branch Aug 26, 2017

@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017

@dnickless

I'm sory for being late. ;)

if (c == value1 || c == value2)
goto ReturnIndex1;
pCh += 2;

This comment has been minimized.

@dnickless

dnickless Sep 9, 2017

Contributor

Is there any reason why we only unroll 2x here? The other methods in this file, IndexOf and LastIndexOf both use 4x. My tests runs also show that switching to 4x cuts the time to 60-70% (actually, 8x is even faster on my machine but let's not go over the top...). Also, I would see value in keeping the two new explicit implementations for 2/3 elements char arrays as close as possible to the existing one for IndexOf() just with 2/3 if checks instead of just 1?!

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Sep 10, 2017

Contributor

Most of the usages of IndexOfAny looked like they would be on short strings, path segments and the like. IndexOf use is much more varied and needs to have great performance on all string lengths.
It's a bit of a trade off as the more you unroll, the more time you spend outside that loop in slower code going a char at a time which hurts small string performance.
It's quite possible that with further analysis this function would be better off with 4x unroll but I locked in the win when it achieved its first two objectives. Which was to avoid the relative expensive probability map creation/comparison and to alleviate the need for the hard coded loops that have been popping up.

This comment has been minimized.

@dnickless

dnickless Sep 11, 2017

Contributor

You might want to look at that test code once you've got too much time: https://gist.github.com/dnickless/eb4807d698b7032b625a8d04794e90d8

if (c == value1 || c == value2 || c == value3)
goto ReturnIndex;
pCh++;

This comment has been minimized.

@dnickless

dnickless Sep 9, 2017

Contributor

Would 4x unrolling here not make sense, too? I didn't test this case but I would expect it to have a similar effect like with all other methods in this file.

This comment has been minimized.

@bbowyersmyth

bbowyersmyth Sep 10, 2017

Contributor

3 char is a lot less common. This was really about giving a lot better performance with minimal code.

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Nov 21, 2017

Marked port consider per associated request via MSBuild profiles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment