Adds String.Join overload to accept char separator #2945
This needs to go through API review |
@jkotas, could you please shed some light on why these two tests are failing for Windows_NT x64 Release:
Are these failures related to these changes? I have tried altering the code but have been unable to reproduce them locally on FreeBSD x64.
These failures do not look related. cscbench is known to fail spuriously https://github.com/dotnet/coreclr/issues/2728 . @RussKeldorph could you please disable it from running in CI to avoid false alarms until the failure is understood? And b151440_static_object_static_object too since it seems to be another flaky test. |
@jkotas The spurious CscBench failures were only on one specific Ubuntu test machine, and the failure was in resolving System.Runtime. I've already disabled this test off Windows. This failure is on Windows and is a timeout, so it looks different from what we were seeing before.
Looking at the history for CscBench, timeouts (meaning > 10mins) for this test on windows are a recent problem -- however all of the timeouts have occurred while testing #2945, and other PR tests interleaved have finished in the normal ~7sec time. So I think the failure is likely related to the change somehow. @jasonwilliams200OK do you notice this test running significantly longer with your changes? |
@AndyAyersMS, thanks for looking into it. |
If this change is really causing a slowdown of this magnitude the root cause should be easy to spot with some profiling. It might be related to increased use of shared generics, but we should measure and see. Can you reproduce the slowdown locally? |
@AndyAyersMS, I think it was the use of generics that was causing those performance tests to fail.

    // current String.Join with string separator:
    char x = 'x';
    var arr = new[] { "foo", "bar" };
    string result = string.Empty;
    for(int i=0;i<1000000;++i)
        result = String.Join(x.ToString(), arr);

compared to:

    // new addition of String.Join with char separator:
    char x = 'x';
    var arr = new[] { "foo", "bar" };
    string result = string.Empty;
    for(int i=0;i<1000000;++i)
        result += String.Join(x, arr);

I observed a 17-18% performance gain over 1 million iterations, and the Windows_NT x64 CI now seems to be happy. :)
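A minimal sketch of how the two loops above could be timed side by side. It assumes a runtime that already ships the `String.Join(char, params string[])` overload (for example .NET Core 2.0 or later); the `JoinBenchmark` class and its `Measure` helper are hypothetical names introduced for illustration. Both cases use plain assignment so that only the `Join` call itself is compared.

```csharp
using System;
using System.Diagnostics;

public static class JoinBenchmark
{
    // Hypothetical helper: runs `join` `iterations` times and returns elapsed milliseconds.
    public static long Measure(Func<string> join, int iterations)
    {
        string result = string.Empty;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            result = join();        // plain assignment in both cases
        watch.Stop();
        GC.KeepAlive(result);       // keep the last result observable
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        char x = 'x';
        var arr = new[] { "foo", "bar" };
        const int iterations = 1000000;

        // Existing overload: the char has to be converted to a string first.
        long viaString = Measure(() => string.Join(x.ToString(), arr), iterations);
        // Char overload (assumed to be available in the target runtime).
        long viaChar = Measure(() => string.Join(x, arr), iterations);

        Console.WriteLine($"string separator: {viaString} ms");
        Console.WriteLine($"char separator:   {viaChar} ms");
    }
}
```

Timings from a single `Stopwatch` run are noisy, so repeating the measurement several times gives a more trustworthy comparison.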
@jasonwilliams200OK I see that in one example you are doing |
@AlexGhiondea, it was indeed += in both samples as I did |
@jkotas, if the API review at https://github.com/dotnet/corefx/issues/5552 is not going to happen anytime soon, would it be OK to merge this without the model.xml changes? I have resolved the merge conflicts several times to keep this PR green. PR #895 is in a similar situation.
@jasonwilliams200OK The framework team is busy with bringing back existing APIs for netstandard2.0 right now. @weshaggard @danmosemsft @karelz When do you expect we will be able to start looking at APIs additions like this one? |
[System.Security.SecuritySafeCritical] // auto-generated
public void AppendChar( char charToAppend ) {
    if( charToAppend == '\0' ) {
Question: why did you add this? `\0` is a perfectly valid character in .NET strings; they are length-prefixed as well as null-terminated.
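A small sketch illustrating the point in the review comment above, that an embedded `\0` is an ordinary character in a .NET string. The `NullCharDemo` class and its helper are hypothetical names used only for this example.

```csharp
using System;

public static class NullCharDemo
{
    // Hypothetical helper: a three-character string with '\0' in the middle.
    public static string WithEmbeddedNull() => "a\0b";

    public static void Main()
    {
        string s = WithEmbeddedNull();

        // The length counts the embedded null like any other character.
        Console.WriteLine(s.Length);       // 3
        Console.WriteLine(s[1] == '\0');   // True

        // Using "\0" as a separator keeps every element in the result.
        string joined = string.Join("\0", new[] { "foo", "bar" });
        Console.WriteLine(joined.Length);  // 7
    }
}
```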
Good point. Fixed by a011339.
@jkotas I'm not aware of any hold on API review. @terrajobst ? Assuming API is reviewed, I don't know why we would not take a community submission. |
Because of the netstandard 2.0 work, we do not have a path currently to add new APIs to reference assemblies and to add tests for the new APIs. |
Ah right. @weshaggard ? |
@jkotas, I have resolved the merge conflicts with #6800 merge and previously 4 or 5 more pull requests like that. 😵 My question was in relation with #6304 and friends, where some new signatures were added to CoreCLR, but not yet exposed by CoreFX BCL because System.Runtime is not ready for the change. Can this also enter that same lane, where CoreCLR has the signatures ready, but BCL is awaiting next version of desktop etc.? |
@jasonwilliams200OK Thanks for your work here. As I mentioned in #895, we will let these go into the implementation once they are API-approved and properly code reviewed. However, it doesn't look like we have actually approved this API yet: https://github.com/dotnet/corefx/issues/5552 isn't marked as api-approved, which means we haven't gotten around to reviewing it yet (http://aka.ms/apireview). As @jkotas said, we are pretty busy with other work, so we haven't had many cycles for new API addition reviews. I would prefer we not merge this until the API is reviewed and approved.
@weshaggard, don't get me wrong; I'm all for the API review and won't mind if it doesn't get accepted. In this particular case, I saw a sibling PR giving String.Split char overloads and saw it as an excellent opportunity to issue a PR for String.Join to complete the set. But if, for some reason, it is decided that String.Join shouldn't have these overloads (or doesn't really need them), that's totally fine. I have still learned quite a lot from this exercise. (again I am not
@jasonwilliams200OK I completely agree with your sentiment that our API review process is far from well right now. The fact that you refer to 10-15 weeks as quickly makes me cringe inside -- we've clearly failed to set a good precedent on expediency 😢 Let me provide a bit more context on what was already said and hinted at above. We're currently still in the middle of figuring out the architecture and engineering plan in order to realize the convergence I talked about here. I think we're on a good trajectory to finalize the plan soon. This process is about creating a version of .NET Standard that represents a much larger API surface than what .NET Core has today and that allows us to share this between .NET Framework, Xamarin, Unity, Mono, and, of course, .NET Core. This requires adding a bunch of APIs to .NET Core, which in turn requires non-trivial work in our implementation and engineering system for .NET Core. The reason we're not trigger happy right now with adding net-new APIs is twofold:
However, I agree with you that this isn't a good trade-off. I'll talk a bit more with other folks on our side to see if there is a way to unblock folks like you who add APIs that are simple to review, non-controversial, and generally useful. In the end, we didn't create .NET Core to stand still...
@terrajobst thanks for the insights. I am very happy to learn that APIs pending reviews are not forgotten and even more so about convergence with Unity (hoping to see someday Unity's il2cpp getting replaced by .NET Native)! 😃 |
This adds five additional overload signatures for string.Join to accept char separator. Since the implementation of:
```c#
Join(String separator, String[] value, int startIndex, int count)
```
relies on `UnSafeCharBuffer.AppendString(string)` (which is not exposed to user land), I have added a char substitute for same. Also removed an additional check for `separator==null`, which wasn't present in `Join(String separator, IEnumerable<String> values)` either. `StringBuilder.Append` and `UnSafeCharBuffer.AppendString` both handle this case explicitly.
@weshaggard @terrajobst is there anything else we would need before taking this change? |
From the API review process, no; it has been approved: https://github.com/dotnet/corefx/issues/5552.
As @weshaggard said, the API has been reviewed and we now have a way to add APIs to .NET Core only.
@stephentoub can you take a look? |
    return StringBuilderCache.GetStringAndRelease(result);
}

// Joins an object array of strings together as one string with a char separator between each original string.
Nit: comment is incorrect; this overload takes an array of objects, not strings
public static String Join(Char separator, params Object[] values) {
    if (values == null)
        throw new ArgumentNullException("values");
Nit: Contract.EndContractBlock wasn't added here but it was added in the overload below. We don't use the Contract stuff anymore, but we should be consistent.
// Joins a string IEnumerable together as one string with a char separator between each original string.
//
[ComVisible(false)]
public static String Join(Char separator, IEnumerable<String> values) {
Do we need this overload? The use case is covered entirely by the `Join(char, IEnumerable<T>)` overload, and the only benefit I see to this one is avoiding a per-element branch and a `ToString` call (which will just return `this`). Is the performance difference between, say, `string.Join('a', Enumerable.Repeat("b", 10000000))` and `string.Join<string>('a', Enumerable.Repeat("b", 10000000))` large enough that it's worth having the extra overload?
@terrajobst @weshaggard as you guys reviewed these APIs do you have any second thoughts about this?
int jointLength = 0;
//Figure out the total length of the strings in value
int endIndex = startIndex + count - 1;
for (int stringToJoinIndex = startIndex; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
Why not:
int endIndex = startIndex + count;
for (int stringToJoinIndex = startIndex; stringToJoinIndex < endIndex; stringToJoinIndex++) {
?
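A short sketch, under the assumption that `count` is non-negative, showing that the inclusive bound in the diff and the exclusive bound suggested above visit the same indices. The `LoopBoundsDemo` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LoopBoundsDemo
{
    // Form used in the diff: inclusive upper bound, compared with <=.
    public static List<int> InclusiveBound(int startIndex, int count)
    {
        var visited = new List<int>();
        int endIndex = startIndex + count - 1;
        for (int i = startIndex; i <= endIndex; i++)
            visited.Add(i);
        return visited;
    }

    // Form suggested in the review: exclusive upper bound, compared with <.
    public static List<int> ExclusiveBound(int startIndex, int count)
    {
        var visited = new List<int>();
        int endIndex = startIndex + count;
        for (int i = startIndex; i < endIndex; i++)
            visited.Add(i);
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", InclusiveBound(2, 3)));  // 2,3,4
        Console.WriteLine(string.Join(",", ExclusiveBound(2, 3)));  // 2,3,4
    }
}
```

The exclusive form avoids the `- 1` adjustment and matches the usual half-open range idiom, which is presumably the motivation behind the suggestion.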
This person has deleted their account on GitHub, which is why they are shown as ghost. I don't think they can take this PR any further, from GitHub at least. 😄
@AlexGhiondea should be able either to push the changes through on behalf of the deleted user, or close it and mark it again up for grabs if it needs additional substantial work. |
Does this still need API approval? If so, rather than having this PR sit around even longer, we should file an up for grabs issue tracking it. Tag it properly to get it in the api approval pipeline, link to this PR from the new issue and close this PR. Any objections? |
No, from Wes and Immo earlier in the thread: |
Ah, missed that. Thanks! @AlexGhiondea , can you finish this one up for @ghost at this point? |
@Petermarcu I will take care of it. |
Closing this in favor of #7621 |
This adds five additional overload signatures for string.Join to accept char separator.

I observed 17-18% performance gain for 1 million iterations when joining a simple string array with a char: `char x = ';'; while(countdown()) result+=String.Join(x, arr);` vs. `char x = ';'; while(countdown()) String.Join(x.ToString(), arr)`.

Since the implementation of `Join(String separator, String[] value, int startIndex, int count)` relies on `UnSafeCharBuffer.AppendString(string)` (which is not exposed to user land), I have added a char substitute for same.

Also removed an additional check for `separator==null`, which wasn't present in `Join(String separator, IEnumerable<String> values)` either. `StringBuilder.Append` and `UnSafeCharBuffer.AppendString` both handle this case explicitly.

Fixes dotnet/corefx#5552.