-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Vectorize String.IndexOf(char) and String.LastIndexOf(char) #16392
Conversation
It would be good to get additional perf numbers on hardware without VEX support and potentially on hardware where an unaligned read/write is not as fast as an aligned read/write. |
I would be useful to get some data about the distribution of inputs for IndexOf (e.g. |
|
||
while (count >= 4) | ||
int nLength = count; | ||
bool useVectorization = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason why this is using this bool variable? https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/SpanHelpers.byte.cs#L96 does not use it. The bool variable does not look like an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was so it wouldn't have to do the index calculation and checks when we aren't using the vectorized approach (ex. when the string/count is too short).
https://github.com/dotnet/coreclr/pull/16392/files#diff-f7e6389a6519754f04d20215ff42efcdR126
The SpanHelpers approach uses a ref byte searchSpace
and keeps track of the index. Here I kept the same char*
pointer approach. So to get the index, I either need to calculate it, or also keep track of the index as we go.
Doing this saved ~1ns in the IndexOfShort_Miss
case on my machine.
I can remove it, if you don't think it is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two long-lived variables: pStartCh and useVectorization. They will burn two registers or two stack spill slots. I think it maybe be more efficient if you replace them by just one pEndCh
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need both pCh
and index
?
We will soon have the span extension methods, like IndexOf (and LastIndexOf) in coreclr (https://github.com/dotnet/corefx/issues/25182). Does it make sense to keep the vectorization implementation in one place only and have the String APIs call the Span ones or is there a significant cost here to merit duplicating the implementation? Granted, the span indexof only special-cases T=byte. cc @atsushikan |
@@ -79,28 +82,77 @@ public unsafe int IndexOf(char value, int startIndex, int count) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: reduce the comparisons in the bounds checks:
if (startIndex < 0 || startIndex > Length)
=> if ((uint)startIndex > (uint)Length)
if (count < 0 || count > Length - startIndex)
=> if ((uint)count > (uint)(Length - startIndex))
@tannergooding - I don't think I have access to the hardware you are asking for. I updated the OP with 2 more machines (the worst machines in my house). If you had machines I could remote into, please let me know. |
You can just set |
Check @ViktorHofer is not on the Q6600 box before you use it... we need to buy another $20 box.. |
cc @ViktorHofer |
So I've spent some time data gathering.
OrchardCore (had to kill it after 200k results)
dotnet new console
This leads me to believe this change is valuable, since:
I tried getting E2E perf numbers of msbuild with this change, but the results are so inconsistent (probably due to I/O), that it is impossible to get hard numbers. |
The one main place that might cause issues is the |
@jkotas - thoughts on whether this approach should be pursued or not? |
This is a performance improvement. Performance improvements should be pursued strictly based on data. If the data say that this is a improvement on average (which it sounds like that they do), then it is worth pursuing. |
Clean up IndexOf vectorization.
This PR should be ready for real review now. I've removed |
Any feedback on this? I'd like to get this in today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits (ignore if there is no other changes, since CI is green already). Otherwise, LGTM.
{ | ||
unchecked | ||
{ | ||
const int elementsPerByte = sizeof(ushort) / sizeof(byte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: PascalCasing for constants
{ | ||
count = (int)((pEndCh - pCh) & ~(Vector<ushort>.Count - 1)); | ||
// Get comparison Vector | ||
Vector<ushort> vComparison = new Vector<ushort>(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use var
private static int LocateLastFoundChar(ulong match) | ||
{ | ||
// Find the most significant char that has its highest bit set | ||
int index = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does index start at 3 here? Can you add a comment please?
using System.Runtime.InteropServices; | ||
using Internal.Runtime.CompilerServices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add space between System.*
and Internal.*
using directives
@dotnet-bot test Windows_NT arm Cross Checked corefx_baseline Build and Test |
@eerhardt, the CI legs were interrupted. Can you run the string tests locally (for instance https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/System/StringTests.cs#L1314)?
As long as they pass, I think we should merge this change in. The CI did manage to run some:
|
I've run the System.Runtime tests locally and they all passed. I wanted to beef those tests up a little bit in light of this new change -
But that needs to go into corefx in a separate PR. |
LGTM. @tarekgh, any comments before we merge this in? |
I'll take a look |
public unsafe int IndexOf(char value, int startIndex, int count) | ||
{ | ||
if (startIndex < 0 || startIndex > Length) | ||
if ((uint)startIndex > (uint)Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f ((uint)startIndex > (uint)Length) [](start = 13, length = 35)
wouldn't this can generate a problem or a different exception if passing startIndex as negative value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
If start was negative, casting it to uint would make it larger than Int32.MaxValue and would automatically be larger than segment.Count. So the check is already there (just optimized).
We have such checks everywhere.
See #16658 (comment)
And #16392 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if someone calling this API inside unchecked block would result in undesired results I guess:
for example
Console.WriteLine(unchecked((int)-4294967294));
is printing
2
In reply to: 171733137 [](ancestors = 171733137)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not true if using unchecked blocks.
for example
Console.WriteLine(unchecked((int)-4294967294));
is printing
2
In reply to: 171733480 [](ancestors = 171733480)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, the value would be just 2. I think this will work I guess
In reply to: 171734390 [](ancestors = 171734390,171733480)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, the value would be just 2. I think this will work I guess.
could you add some comment mentioning the limits and how this is safe?
In reply to: 171733980 [](ancestors = 171733980,171733137)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some comment mentioning the limits and how this is safe?
Are you suggesting adding a comment everywhere we do this? We have several Span/Memory APIs that all do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least adding the comment on the newly introduced code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are hundreds of places in coreclr/corefx that use this idiom. IMHO, I do not think that the comment is needed.
In general LGTM. The only worry now is the code complexity became high but I hope we'll not touch this code again for any reason. |
This is cool but it confuses me a little bit: Did you make Is this an intended change, did you announce it? Am I missing something?
Famous last words 😆 |
Can you point to the docs that you are referring to? |
@eerhardt You are right, my bad! There's a big difference when the needle is a I checked some docs to verify my intuition before posting here and you have to give them a very careful read. For instance on Best practices for using string on .NET:
But then if you continue reading, the "default interpretation" gets split:
So all good, sorry! |
Vectorize String.IndexOf(char) using the same algorithm as SpanHelpers.IndexOf(byte).
I also plan on doing the same for String.LastIndexOf, which is why I marked this as WIP. I wanted to get early feedback on the approach to ensure continuing this work is worth the effort.
Perf results
I ran the following tests Test Code
Machine 1 (windows desktop) (click to expand)
Machine 1 (windows desktop):
Without changes
With changes
Machine 2 (MacBook Air ~2012) (click to expand)
### Machine 2 (MacBook Air ~2012):Without changes
With changes
Machine 3 (Lenvo Carbon X1) (click to expand)
Machine 3 (Lenvo Carbon X1):
Without changes
With changes
Machine 4 (Tanner's Q6600): (click to expand)
Machine 4 (Tanner's Q6600):
Without changes
With changes
As you can see, for short strings, there is a little degradation, especially when the match is towards the beginning of the string. But for longer strings, where the match is towards the end or doesn't match at all, the gains are substantial.
/cc @benaadams