-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Avoid defensive copy in String.Concat(string[]) #4559
Avoid defensive copy in String.Concat(string[]) #4559
Conversation
Today the String.Concat(params string[]) implementation makes a defensive copy of the input string[] in order to avoid issues where another thread mutates the array concurrently with the concatenation. Such a situation is possible but exceedingly rare, so we can redo the implementation to make it cheaper when there isn't concurrent mutation and more expensive when there is, rather than always having it be more expensive.
This commit changes string.Concat to optimistically assume there won't be such concurrent mutation. It avoids allocating the defensive string[] array copy, and instead just proceeds to allocate a string of the right length and copy the input strings into it. If along the way it discovers that something has changed such that the string lengths no longer add up to exactly what was precomputed, then it falls back to the original implementation.
Example microbenchmark:
```C#
using System;
using System.Diagnostics;
public class Program
{
public static void Main()
{
string result;
string[] inputs = new[] { "abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz" };
var sw = new Stopwatch();
while (true)
{
int gen0 = GC.CollectionCount(0);
sw.Restart();
for (int i = 0; i < 20000000; i++) result = string.Concat(inputs);
sw.Stop();
Console.WriteLine($"{GC.CollectionCount(0) - gen0}: {sw.Elapsed.TotalSeconds}");
}
}
}
```
Before the change:
```
> corerun test.exe
1525: 2.0733829
1526: 2.0599474
1526: 2.0717786
1526: 2.0318797
^C
```
After the change:
```
763: 1.4700695
762: 1.446919
763: 1.4581139
763: 1.4568889
^C
```
src/mscorlib/src/System/String.cs
Outdated
| // fall back should be extremely rare. | ||
| return copiedLength == totalLength ? | ||
| result : | ||
| ConcatWithDefensiveCopy(values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need to duplicate the implementation for defensive copy - it should be fine sufficient to just call the method recursively as Concat((string[])values.Clone()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean Concat:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will change.
|
Do we have tests in corefx to exercise the rare path? |
Not currently checked in, as the special case didn't previously exist. But I have a little mini-stress outerloop test locally that typically ends up hitting both of the special paths (too long in the middle, incorrect length at the end), and I'll put up a PR for it tomorrow once this change is checked in and consumable in the packages. I don't want to put it up yet as it also includes a few more tests that don't currently pass, e.g. checking for the 0-length optimization that exists in the other string.Concat overloads, doesn't currently exist in the string.Concat(string[]) overload, but now with this change does exist in that one. EDIT: I put up the PR at dotnet/corefx#8052, but it can't be merged until corefx consumes this change. |
|
Updated. |
|
|
||
| return ConcatArray(internalValues, totalLength); | ||
|
|
||
| // If it's too long, fail, or if it's empty, return an empty string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConcatArray is called from a single place now - can fold it into its caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
LGTM otherwise |
Use a recursive call to Concat rather than using a separate ConcatWithDefensiveCopy method.
424a7c6 to
0b088d2
Compare
|
Nice. Do we have a similar situation with other |
Not as-is, in that they use StringBuilderCache and Append directly to the received StringBuilder. The StringBuilderCache approach is a good one when it's expected that the returned string will fit within the limits of the StringBuilder cache, and it's a good approach for string.Format where there's a lot more complexity involved. But... it might be interesting to look at string.Join specifically to see whether changing to use an approach like the one now employed by Concat would help with performance. At the moment I'm not seeing significant perf downsides to doing so (other than the super rare case of concurrent manipulation), but there may be something I'm missing, and I expect the implementation would end up being slightly more complex, though not terribly. For the cases where the string ends up being longer than what fits in a StringBuilder from the cache, I'd expect this approach would be more efficient. |
|
LGTM as well. Thanks, Steve. |
|
Test Windows_NT x64 Debug Build and Test please |
|
Not sure why the Windows_NT x86 Debug Legacy Backend Build and Test leg is still showing as yellow; clicking on the Details link shows that it completed successfully. |
Avoid defensive copy in String.Concat(string[])
* Avoid defensive copy in String.Concat(string[])
Today the String.Concat(params string[]) implementation makes a defensive copy of the input string[] in order to avoid issues where another thread mutates the array concurrently with the concatenation. Such a situation is possible but exceedingly rare, so we can redo the implementation to make it cheaper when there isn't concurrent mutation and more expensive when there is, rather than always having it be more expensive.
This commit changes string.Concat to optimistically assume there won't be such concurrent mutation. It avoids allocating the defensive string[] array copy, and instead just proceeds to allocate a string of the right length and copy the input strings into it. If along the way it discovers that something has changed such that the string lengths no longer add up to exactly what was precomputed, then it falls back to the original implementation.
Example microbenchmark:
```C#
using System;
using System.Diagnostics;
public class Program
{
public static void Main()
{
string result;
string[] inputs = new[] { "abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz" };
var sw = new Stopwatch();
while (true)
{
int gen0 = GC.CollectionCount(0);
sw.Restart();
for (int i = 0; i < 20000000; i++) result = string.Concat(inputs);
sw.Stop();
Console.WriteLine($"{GC.CollectionCount(0) - gen0}: {sw.Elapsed.TotalSeconds}");
}
}
}
```
Before the change:
```
> corerun test.exe
1525: 2.0733829
1526: 2.0599474
1526: 2.0717786
1526: 2.0318797
^C
```
After the change:
```
> corerun test.exe
763: 1.4700695
762: 1.446919
763: 1.4581139
763: 1.4568889
^C
```
Commit migrated from dotnet/coreclr@de32aed
Today the String.Concat(params string[]) implementation makes a defensive copy of the input string[] in order to avoid issues where another thread mutates the array concurrently with the concatenation. Such a situation is possible but exceedingly rare, so we can redo the implementation to make it cheaper when there isn't concurrent mutation and more expensive when there is, rather than always having it be more expensive.
This commit changes string.Concat to optimistically assume there won't be such concurrent mutation. It avoids allocating the defensive string[] array copy, and instead just proceeds to allocate a string of the right length and copy the input strings into it. If along the way it discovers that something has changed such that the string lengths no longer add up to exactly what was precomputed, then it falls back to the original implementation.
Example microbenchmark:
Before the change:
After the change:
cc: @jkotas, @ellismg