Improve performance of string.IndexOfAny for 2 & 3 char searches #13219
Intention was to also improve
tag @stephentoub @jkotas
while (count > 0)
{
    if (*pCh == value1)
This depends on the JIT CSEing *pCh for best performance. The JIT will probably do it, but it is best to write hand-optimized code like this one to be as close as possible to the native code.
Why not:
char c = *pCh;
if (c == value1 || c == value2)
    goto ReturnIndex;
like in the 3 argument overload?
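A minimal sketch of the suggested shape, written in safe C# over a string instead of the PR's unsafe pointer loop. The class and method names are hypothetical and are not taken from the PR. Each character is read into a local once, so the two comparisons do not rely on the JIT eliminating a repeated load.

```csharp
using System;

// Hypothetical helper for illustration only; not the CoreCLR implementation.
public static class TwoCharSearchSketch
{
    // Returns the index of the first occurrence of value1 or value2 in
    // text[startIndex .. startIndex + count), or -1 if neither is found.
    public static int IndexOfAny(string text, char value1, char value2, int startIndex, int count)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (startIndex < 0 || startIndex > text.Length)
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (count < 0 || count > text.Length - startIndex)
            throw new ArgumentOutOfRangeException(nameof(count));

        int end = startIndex + count;
        for (int i = startIndex; i < end; i++)
        {
            // Read the character once and compare the local twice.
            char c = text[i];
            if (c == value1 || c == value2)
                return i;
        }

        return -1;
    }
}
```

For example, `TwoCharSearchSketch.IndexOfAny("a/b\\c", '/', '\\', 0, 5)` finds the forward slash at index 1.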
I got the same time between the two variations so I just went with the simpler one. 3 char was better with a variable. Will change.
if (anyOf == null)
{
    throw new ArgumentNullException(nameof(anyOf));
}
It may not be common enough to warrant it (and thus may hurt the more common cases of 2/3), but I wonder if it's worthwhile adding here:
if (anyOf.Length == 1)
{
    return IndexOf(anyOf[0], startIndex, count);
}
@stephentoub The only place I could think of that would be using a 1 char array was the Unix path handling code. That looked to be taken care of with the Path internal changes so I didn't bother with it.
But true to performance testing form, the 2 char call is actually 10% better if it has another if before it. Not so if I change that to a switch though, which is interesting. Will run a few more tests on the other paths.
Nope sorry it was the 3 char test that was 10% better. 2 char is 4% worse. Thoughts?
> Nope sorry it was the 3 char test that was 10% better. 2 char is 4% worse. Thoughts?

These kinds of odd variations are typically caused by things like code alignment or branch prediction, which vary from build to build. One has to run it under the Intel profiler to tell what is going on. I typically just look at the JITed code to check whether it looks fine.
> I wonder if it's worthwhile adding here

I think it is worthwhile to add it, and also Length == 0. Otherwise, the API has an unexpected perf cliff.
😢 but don't optimize for them early? e.g.
if (anyOf.Length == 2)
{
    // Very common optimization for directory separators (/, \), quotes (", '), brackets, etc
    return IndexOfAny(anyOf[0], anyOf[1], startIndex, count);
}
else if (anyOf.Length == 3)
{
    return IndexOfAny(anyOf[0], anyOf[1], anyOf[2], startIndex, count);
}
else if (anyOf.Length > 3)
{
    return IndexOfCharArray(anyOf, startIndex, count);
}
else if (anyOf.Length == 1)
{
    return IndexOf(anyOf[0], startIndex, count);
}
else // anyOf.Length == 0
{
    return -1;
}
That seems like a good compromise. Done.
{
    char c = *pCh;

    if (c == value1)
Nit: if (c == value1 || c == value2)?
Also below
Done
@dotnet-bot test Tizen armel Cross Release Build
@dotnet-bot test Windows_NT x64 corefx_baseline
Thanks!
I'm sorry for being late. ;)
if (c == value1 || c == value2)
    goto ReturnIndex1;

pCh += 2;
Is there any reason why we only unroll 2x here? The other methods in this file, IndexOf and LastIndexOf, both use 4x. My test runs also show that switching to 4x cuts the time to 60-70% (actually, 8x is even faster on my machine but let's not go over the top...). Also, I would see value in keeping the two new explicit implementations for 2/3 element char arrays as close as possible to the existing one for IndexOf(), just with 2/3 if checks instead of 1.
Most of the usages of IndexOfAny looked like they would be on short strings, path segments and the like. IndexOf use is much more varied and needs to have great performance on all string lengths.
It's a bit of a trade-off: the more you unroll, the more time you spend outside that loop in slower code going a char at a time, which hurts small string performance.
It's quite possible that with further analysis this function would be better off with a 4x unroll, but I locked in the win when it achieved its first two objectives: to avoid the relatively expensive probability map creation/comparison, and to alleviate the need for the hard coded loops that have been popping up.
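The trade-off described above can be sketched roughly as follows. This is an illustration in safe C# with hypothetical names, not the PR's pointer-based code. The unrolled body handles two characters per iteration, and whatever is left over falls through to a one-character-at-a-time tail. With 2x unrolling the tail is at most one character; with Nx unrolling it can be up to N - 1 characters, which is the cost that matters for short strings.

```csharp
using System;

// Hypothetical illustration of 2x unrolling; not the PR's actual code.
public static class UnrolledSearchSketch
{
    // Returns the index of the first occurrence of value1 or value2, or -1.
    public static int IndexOfAny2x(string text, char value1, char value2)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));

        int i = 0;
        int length = text.Length;

        // Unrolled body: examine two characters per loop iteration.
        while (length - i >= 2)
        {
            char c = text[i];
            if (c == value1 || c == value2)
                return i;

            c = text[i + 1];
            if (c == value1 || c == value2)
                return i + 1;

            i += 2;
        }

        // Tail: at most one leftover character when unrolling 2x.
        if (i < length)
        {
            char c = text[i];
            if (c == value1 || c == value2)
                return i;
        }

        return -1;
    }
}
```

Whether 2x or 4x is faster overall depends on the length distribution of the inputs, which is the point of disagreement in this thread; the sketch takes no position on that.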
You might want to look at that test code once you've got too much time: https://gist.github.com/dnickless/eb4807d698b7032b625a8d04794e90d8
if (c == value1 || c == value2 || c == value3)
    goto ReturnIndex;

pCh++;
Would 4x unrolling not make sense here, too? I didn't test this case, but I would expect it to have a similar effect as with all the other methods in this file.
3 char is a lot less common. This was really about giving a lot better performance with minimal code.
Marked port consider per associated request via MSBuild profiles. |
Improves the performance of string.IndexOfAny, special casing common 2 and 3 character searches.
2 char search
Beats a minimal hard coded loop after 3rd string character.
Small string (22 chars) - 67% faster than existing
Large string (366 chars) - 56% faster than existing
3 char search
Beats a minimal hard coded loop after 8th string character.
Small string (22 chars) - 49% faster than existing
Large string (366 chars) - 28% faster than existing
InitializeProbabilisticMap
Very common to search for ASCII symbols. The high map always writes the same value to the same location. Only doing that once gives a 13% improvement (isolated, 10 char array).
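The "only doing that once" idea can be sketched as below. The layout is an assumption and is not stated in the text above: the map is taken to be a 256-bit bitmap with one bit set for each search character's low byte and one for its high byte. All names are hypothetical. Under that assumption every ASCII character has a high byte of 0, so the same bit would be written repeatedly; the sketch records that a zero high byte was seen and writes the bit once after the loop.

```csharp
using System;

// Hypothetical sketch; the bitmap layout and all names are assumptions.
public static class ProbabilisticMapSketch
{
    // 256 bits, one per possible byte value, packed into eight 32-bit words.
    private const int MapWords = 8;

    private static void SetBit(uint[] map, int byteValue)
    {
        map[byteValue & 7] |= 1u << (byteValue >> 3);
    }

    private static bool IsBitSet(uint[] map, int byteValue)
    {
        return (map[byteValue & 7] & (1u << (byteValue >> 3))) != 0;
    }

    public static uint[] Build(char[] anyOf)
    {
        if (anyOf == null)
            throw new ArgumentNullException(nameof(anyOf));

        var map = new uint[MapWords];
        bool sawZeroHighByte = false;

        foreach (char ch in anyOf)
        {
            SetBit(map, ch & 0xFF);          // low byte

            int high = ch >> 8;              // high byte, 0 for ASCII
            if (high == 0)
                sawZeroHighByte = true;      // defer: it is the same bit every time
            else
                SetBit(map, high);
        }

        if (sawZeroHighByte)
            SetBit(map, 0);                  // written once for all such characters

        return map;
    }

    // false: the character is definitely not in the set.
    // true: the character may be in the set and needs an exact check.
    public static bool MightContain(uint[] map, char ch)
    {
        return IsBitSet(map, ch & 0xFF) && IsBitSet(map, ch >> 8);
    }
}
```

A `true` result from `MightContain` is only a hint, because low-byte and high-byte bits share one bitmap; a caller would still compare against the actual characters before reporting a match.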
Issues
https://github.com/dotnet/corefx/issues/22771
aspnet/Mvc#5362
NuGet/NuGet.Client#1590 (comment)
https://github.com/dotnet/coreclr/issues/7017
cc @davkean @benaadams