Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

ghost
Copy link

@ghost ghost commented Jan 31, 2016

This adds five additional overload signatures for string.Join to accept
char separator.
I observed 17-18% performance gain for 1 million iterations when joining a simple string array with a char char x = ';'; while(countdown()) result+=String.Join(x, arr); vs. char x = ';'; while(countdown()) String.Join(x.ToString(), arr).
Since the implementation of:

Join(String separator, String[] value, int startIndex, int count)

relies on UnSafeCharBuffer.AppendString(string) (which is not exposed
to user land), I have added a char substitute for same.

Also removed an additional check for separator==null, which wasn't
present in Join(String separator, IEnumerable<String> values) either.
StringBuilder.Append and UnSafeCharBuffer.AppendString both handle
this case explicitly.

Fixes dotnet/corefx#5552.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2016

This needs to go through API review

@ghost
Copy link
Author

ghost commented Feb 1, 2016

@jkotas, could you please shed some light on why these two tests are failing for Windows_NT x64 Release:

Are these failures related to these changes? I have tried altering the code but unable to reproduce them on FreeBSD x64 locally.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2016

These failures do not look related.

cscbench is known to fail spuriously https://github.com/dotnet/coreclr/issues/2728 . @RussKeldorph could you please disable it from running in CI to avoid false alarms until the failure is understood? And b151440_static_object_static_object too since it seems to be another flaky test.

@AndyAyersMS
Copy link
Member

@jkotas The spurious CscBench failures were only on one specific ubuntu test machine, and the failure was in resolving system.runtime. I've already disabled this test off-windows.

This failure is on windows and is a timeout. So it looks different than what we were seeing before,

@AndyAyersMS
Copy link
Member

Looking at the history for CscBench, timeouts (meaning > 10mins) for this test on windows are a recent problem -- however all of the timeouts have occurred while testing #2945, and other PR tests interleaved have finished in the normal ~7sec time.

So I think the failure is likely related to the change somehow. @jasonwilliams200OK do you notice this test running significantly longer with your changes?

@ghost
Copy link
Author

ghost commented Feb 1, 2016

@AndyAyersMS, thanks for looking into it.
In the proposed API, I suggested having private generic signatures for String.Join and separate overloads for caller Join(String separator ..) and Join(Char separator ..), will keep the implementation clean and follow the DRY principle. But in 7a88d0d, i just tried another approach by repeating all five methods for new Char separator overloads and it seems to be passing those tests. Since I have tested it many times and CI was consistently failing those tests, I am now under the impression that the code gen with static generic method in effort to reuse the implementation is bit inefficient than having the implementations separate and redundant. Does this make sense?

@AndyAyersMS
Copy link
Member

If this change is really causing a slowdown of this magnitude the root cause should be easy to spot with some profiling. It might be related to increased use of shared generics, but we should measure and see.

Can you reproduce the slowdown locally?

@ghost
Copy link
Author

ghost commented Feb 1, 2016

@AndyAyersMS, I think it was use of generic which was causing those performance tests to fail.
As for the reset, given the code:

// current String.Join with string separator:
char x = 'x';
var arr = new[] { "foo", "bar" };
string result = string.Empty;

for(int i=0;i<1000000;++i)
  result = String.Join(x.ToString(), arr);

compared to:

// new addition of String.Join with char separator:
char x = 'x';
var arr = new[] { "foo", "bar" };
string result = string.Empty;

for(int i=0;i<1000000;++i)
  result += String.Join(x, arr);

I observed 17-18% performance gain for 1 million iterations and the Windows_NT x64 CI now seems to be happy. :)

@AlexGhiondea
Copy link

@jasonwilliams200OK I see that in one example you are doing
result = String.Join(x.ToString(), arr)
while in the other you are doing
result += String.Join(x, arr)
Was that just an artifact of putting the sample on GitHub or did the test actually used different code?

@ghost
Copy link
Author

ghost commented Feb 3, 2016

@AlexGhiondea, it was indeed += in both samples as I did Console.WriteLine(result.Length); at the end of both cases. :)

@ghost
Copy link
Author

ghost commented Aug 20, 2016

@jkotas, if the API review at https://github.com/dotnet/corefx/issues/5552 is not going to happen anytime soon, would it be ok to merge this without model.xml changes? I have resolved the merge conflicts serveral times to keep this PR green. PR #895 is in a similar situation..

@jkotas
Copy link
Member

jkotas commented Aug 20, 2016

@jasonwilliams200OK The framework team is busy with bringing back existing APIs for netstandard2.0 right now.

@weshaggard @danmosemsft @karelz When do you expect we will be able to start looking at APIs additions like this one?


[System.Security.SecuritySafeCritical] // auto-generated
public void AppendChar( char charToAppend ) {
if( charToAppend == '\0' ) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, why did you add this? \0 is a perfectly valid character in .NET strings, they are length-prefixed as well as null-terminated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed by a011339.

@danmoseley
Copy link
Member

@jkotas I'm not aware of any hold on API review. @terrajobst ? Assuming API is reviewed, I don't know why we would not take a community submission.

@jkotas
Copy link
Member

jkotas commented Aug 21, 2016

why we would not take a community submission

Because of the netstandard 2.0 work, we do not have a path currently to add new APIs to reference assemblies and to add tests for the new APIs.

@danmoseley
Copy link
Member

Ah right. @weshaggard ?

@ghost
Copy link
Author

ghost commented Aug 22, 2016

@jkotas, I have resolved the merge conflicts with #6800 merge and previously 4 or 5 more pull requests like that. 😵

My question was in relation with #6304 and friends, where some new signatures were added to CoreCLR, but not yet exposed by CoreFX BCL because System.Runtime is not ready for the change. Can this also enter that same lane, where CoreCLR has the signatures ready, but BCL is awaiting next version of desktop etc.?

@jkotas
Copy link
Member

jkotas commented Aug 22, 2016

I am sorry we do not have a path to add new public APIs to core assemblies right now. I hope it will get resolved soon.

I have mentioned in #895 that I would like to see #6304 to go through end-to-end (ie tests added and reference assemblies updated) before merging more of them.

@weshaggard
Copy link
Member

@jasonwilliams200OK Thanks for your work here as I mentioned in #895 we will let these go into the implementation once they are api approved and properly code reviewed. However it doesn't look like we have actually approved this API yet, as https://github.com/dotnet/corefx/issues/5552 isn't marked as api-approved which means we haven't gotten around to reviewing them yet (http://aka.ms/apireview). As @jkotas said we are pretty busy with other work so we haven't had much cycles to do new API addition reviews, so I would prefer we not merge this until the API is reviewed and approved.

@ghost
Copy link
Author

ghost commented Aug 24, 2016

@weshaggard, don't get me wrong; I'm all for the API review and won't mind if it doesn't get accepted. In this particular case, I saw a sibling PR String.Split getting char overloads and saw it as an excellent opportunity to issue a PR for String.Join to complete the set. But for reason, it is decided that String.Join shouldn't have these overloads (or don't really need them), it's totally fine. I have still learned quite much out of this exercise.

(again I am not :trollface:'ing here)
Just that on the flip side, if the APIs are reviewed bit quickly (perhaps within 10-15 weeks period?), that would make the API proposal as a process more reliable IMHO. Some proposals are sitting there for long time with no reply from the authorities. Mine is not alone is what I am saying, for example: https://github.com/dotnet/corefx/issues?q=is%3Aopen+label%3Aapi-ready-for-review+sort%3Acreated-asc (and then there are some un-tagged proposals as well). It would be good if this fact also being taken into consideration.

@terrajobst
Copy link

terrajobst commented Aug 25, 2016

@jasonwilliams200OK

I completely agree with your sentiment that our API review process is far from well right now. The fact that you refer to 10-15 weeks as quickly makes me cringe inside -- we've clearly failed to set good precedence on expediency 😢

Let me provide a bit more context on what was already said and hinted at above. We're currently still in middle of figuring out the architecture and engineering plan in order to realize the convergence I talked about here. I think we're on good trajectory to finalize the plan soon.

This process is about creating a version of .NET Standard that represents a much larger API surface than what .NET Core has today and that allows us to share this between .NET Framework, Xamarin, Unity, Mono, and, of course, .NET Core. This requires adding a bunch of APIs to .NET Core. This convergence requires non-trivial work in our implementation and engineering system for .NET Core.

The reason we're not trigger happy right now with adding net-new APIs is twofold:

  1. They would be .NET Core only. While there is nothing wrong with APIs appearing on .NET Core first (in fact, that's the model we want to simplify innovation) our engineering system for .NET Core is currently building both, .NET Standard and .NET Core. We need to separate them first in order to enable this scenario.
  2. The people that need to review these APIs are the same people that work on the architecture and engineering system that allows the convergence to happen. Since this all non-trivial work, it's fair to say we're under water enough that we simply don't have the cycles to work as much on new APIs as is necessary to have a smooth review experience. It's not that we don't want to do it; it's simply that the other work is so much more impactful and thus important at this given moment.

However, I agree with you that this isn't a good trade-off. I'll talk a bit more with other folks on our side to see if there is way to unblock folks like you that add APIs that are simple to review, are non-controversial, and generally useful. In the end, we didn't create .NET Core to stand still...

@ghost
Copy link
Author

ghost commented Aug 25, 2016

@terrajobst thanks for the insights. I am very happy to learn that APIs pending reviews are not forgotten and even more so about convergence with Unity (hoping to see someday Unity's il2cpp getting replaced by .NET Native)! 😃

This adds five additional overload signatures for string.Join to accept
char separator.
Since the implementation of:
```c#
Join(String separator, String[] value, int startIndex, int count)
```
relies on `UnSafeCharBuffer.AppendString(string)` (which is not exposed
to user land), I have added a char substitute for same.

Also removed an additional check for `separator==null`, which wasn't
present in `Join(String separator, IEnumerable<String> values)` either.
`StringBuilder.Append` and `UnSafeCharBuffer.AppendString` both handle
this case explicitly.
@AlexGhiondea
Copy link

@weshaggard @terrajobst is there anything else we would need before taking this change?

@weshaggard
Copy link
Member

From the API review process no it has been approved. https://github.com/dotnet/corefx/issues/5552.

@terrajobst
Copy link

As @weshaggard said, the API has been reviewed and we know have a way to add APIs to .NET Core only.

@AlexGhiondea
Copy link

@stephentoub can you take a look?

return StringBuilderCache.GetStringAndRelease(result);
}

// Joins an object array of strings together as one string with a char separator between each original string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comment is incorrect; this overload takes an array of objects, not strings

public static String Join(Char separator, params Object[] values) {
if (values == null)
throw new ArgumentNullException("values");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Contract.EndContractBlock wasn't added here but it was added in the overload below. We don't use the Contract stuff anymore, but we should be consistent.

// Joins a string IEnumerable together as one string with a char separator between each original string.
//
[ComVisible(false)]
public static String Join(Char separator, IEnumerable<String> values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this overload? The use case is covered entirely by the Join(char, IEnumerable<T>) overload, and the only benefit I see to this one is avoiding per-element a branch and a ToString call (which will just return this). Is the performance difference between, say, string.Join('a', Enumerable.Repeat("b", 10000000)) and string.Join<string>('a', Enumerable.Repeat("b", 10000000)) different enough that it's worth having the extra overload?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terrajobst @weshaggard as you guys reviewed these APIs do you have any second thoughts about this?

int jointLength = 0;
//Figure out the total length of the strings in value
int endIndex = startIndex + count - 1;
for (int stringToJoinIndex = startIndex; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not:

int endIndex = startIndex + count;
for (int stringToJoinIndex = startIndex; stringToJoinIndex < endIndex; stringToJoinIndex++) {

?

@shahid-pk
Copy link

shahid-pk commented Oct 13, 2016

this person has deleted his/her account on github that is why he is shown as ghost , i don't think he/she can get this pr any further, from github atleast. 😄

@karelz
Copy link
Member

karelz commented Oct 13, 2016

@AlexGhiondea should be able either to push the changes through on behalf of the deleted user, or close it and mark it again up for grabs if it needs additional substantial work.
@shahid-pk if you want to help here on behalf of @ghost, we will be more than happy :)

@Petermarcu
Copy link
Member

Does this still need API approval? If so, rather than having this PR sit around even longer, we should file an up for grabs issue tracking it. Tag it properly to get it in the api approval pipeline, link to this PR from the new issue and close this PR.

Any objections?

@stephentoub
Copy link
Member

Does this still need API approval?

No, from Wes and Immo earlier in the thread:
"From the API review process no it has been approved. dotnet/corefx#5552."
"As @weshaggard said, the API has been reviewed and we know have a way to add APIs to .NET Core only."

@Petermarcu
Copy link
Member

Ah, missed that. Thanks! @AlexGhiondea , can you finish this one up for @ghost at this point?

@AlexGhiondea
Copy link

@Petermarcu I will take care of it.

@AlexGhiondea AlexGhiondea mentioned this pull request Oct 14, 2016
@AlexGhiondea
Copy link

Closing this in favor of #7621

@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions and removed api addition labels Mar 17, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement Product code improvement that does NOT require public API changes/additions

Projects

None yet

Development

Successfully merging this pull request may close these issues.