Added support for splitting on ReadOnlySpan<char> #295
Is the input span to split required to be a single item of length n, or n items of length 1? string.Split has both single-item and array-of-items overloads, which are commonly used. So for example if I call this new method with
@Wraith2 You are correct, it will just treat it as one single value to split on. The reason for that is because when the API was reviewed, the team decided to not include a way to split on multiple chars. So I decided not to include it here. See 'SplitAny()' in #934 (comment)
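To make the two readings concrete, here is a minimal sketch. It uses the existing string.Split overloads, not the API added in this PR, and assumes .NET Core 2.0 or later, where Split(string, StringSplitOptions) is available. The first call treats the separator as one value of length 2 (the behaviour described above); the second treats it as two values of length 1 (the 'SplitAny()' style behaviour that was left out).

```csharp
using System;

class SeparatorSemantics
{
    static void Main()
    {
        string input = "a, b,c";

        // One separator of length 2: only the exact sequence ", " splits.
        string[] single = input.Split(", ");
        Console.WriteLine(string.Join("|", single)); // a|b,c

        // Two separators of length 1: either ',' or ' ' splits.
        string[] any = input.Split(new[] { ',', ' ' });
        Console.WriteLine(string.Join("|", any));    // a||b|c
    }
}
```

The empty entry in the second result comes from the adjacent ',' and ' ' in "a, b".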
@bbartels, you have this marked WIP. Are you still working on it? It hasn't seen progress in over a month. Thanks.
There is not much I can currently do before someone reviews and/or addresses the points in the Todo above. Maybe I should have made that a little clearer (and possibly omitted WIP as it may not be the appropriate label for this PR).
I see, ok, thanks. Yes, the way the PR reads right now, it appears you're saying there's nothing for anyone to do until you do more work on it (at least that's how I read it).
My mistake, could have sworn I made it clear enough, but upon rereading I am unsure myself what conclusion to make!
Adding area owners. Apologies for the delay.
Thanks for the PR.
We need to expose the API additions through the System.Memory ref - https://github.com/dotnet/runtime/blob/master/src/libraries/System.Memory/ref/System.Memory.cs.
Tests should be added here: https://github.com/dotnet/runtime/tree/master/src/libraries/System.Memory/tests/ReadOnlySpan.
src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Split.cs
@layomia Finished merging the implementations, as well as added tests to the specified directory. There are a couple points I need feedback on below.
private readonly ReadOnlySpan<T> _separators;
private readonly T _separator;
private readonly bool _isSequence;
private readonly int _separatorLength;
This could be replaced with a cleaner:
private int SeparatorLength => _isSequence ? _separators.Length : 1;
However, according to my benchmarks it will incur a 3% perf penalty on the tested scenario.
What would be preferred in this case?
The current way is preferred to avoid the extra calculation.
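For readers following along, the trade-off can be sketched as below. SeparatorInfo<T> is a hypothetical type written only for illustration; it is not the enumerator from this PR, and the sketch makes no claim about the measured 3% difference. It only shows that the cached field is computed once at construction, while the property alternative branches on every access.

```csharp
using System;

// Hypothetical type for illustration only; not the type added in this PR.
readonly ref struct SeparatorInfo<T>
{
    private readonly ReadOnlySpan<T> _separators;
    private readonly T _separator;
    private readonly bool _isSequence;
    private readonly int _separatorLength; // computed once in the constructor

    // Split on a sequence of items treated as one separator.
    public SeparatorInfo(ReadOnlySpan<T> separators)
    {
        _separators = separators;
        _separator = default!;
        _isSequence = true;
        _separatorLength = separators.Length;
    }

    // Split on a single item.
    public SeparatorInfo(T separator)
    {
        _separators = default;
        _separator = separator;
        _isSequence = false;
        _separatorLength = 1;
    }

    // Current approach: read the cached field.
    public int CachedLength => _separatorLength;

    // Alternative from the comment above: evaluate the branch on each access.
    public int ComputedLength => _isSequence ? _separators.Length : 1;
}

class Program
{
    static void Main()
    {
        var sequence = new SeparatorInfo<char>(", ".AsSpan());
        var single = new SeparatorInfo<char>(',');

        // Both approaches agree on the value; they differ only in when it is computed.
        Console.WriteLine(sequence.CachedLength == sequence.ComputedLength); // True
        Console.WriteLine(single.CachedLength == single.ComputedLength);     // True
    }
}
```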
Can you share your benchmarks source/results for this PR?
For some reason I cannot comment on 'outdated' review comments, not sure if this is by design.
Trailing whitespace 😢
@danmosemsft Oh my, no clue how they managed to sneak in. Should be fixed now!
Overall, LGTM.
@bbartels, can you highlight the comments? All my feedback seems to be addressed.
@layomia I think one of them was #295 (comment), but honestly completely forgot what the other one was. Probably should have noted it down 😅, don't think it was too important though! EDIT: So it turns out I never pressed Submit Review and my comments weren't published...
Ah. That's resolved - #295 (comment).
Okay, then there are no further objections from me :) Thank you for the guidance throughout the review, was a good learning experience!
Thanks a lot, @bbartels!
It would be great if you could add your benchmarks to the dotnet/performance repo here. This doc goes over how to write/run benchmarks for new API. Here's an example PR for new API in System.Text.Json - dotnet/performance#1130.
I wonder how we could find places we could potentially use this ourselves in this repo. Perhaps someone out there's interested in looking.
Finally!!!!
Nice! If you have some more time, an additional way to widen the impact of this new API is this task - #295 (comment). I imagine this would involve searching this repo for usages of
I meant this - https://github.com/dotnet/performance/blob/2c15891e126a397fed9b3c34120228ae21b1ea1f/docs/benchmarking-workflow-dotnet-runtime.md#benchmarking-new-api 😅 |
Out of curiosity, why is the parameterless This also speaks to @danmosemsft's earlier comment. We need to be careful to account for any behavioral differences between this and It's still a worthwhile exercise though! :) |
@GrabYourPitchforks I faintly remember watching the API review on YouTube and there being arguments against introducing anything that has multiple separators (which would be needed to match string.Split semantics). Though I also remember that there was discussion about sending this back to API Review once the implementation is done. Maybe that's not such a bad idea, to decide on Split() semantics, as it would require a constructor taking a ReadOnlySpan argument for multiple separators, which was previously postponed for UTF-8 reasons.
What I understand from watching the API review discussion is that, for simplicity, the Another area where the current implementation differs from |
The parameterless Split should split on anything for which char.IsWhiteSpace returns true. There's nothing fundamentally wrong with that behavior for string, and any such subtle differences like that from string are going to lead to bugs.
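As a reference point for the behaviour being asked for, here is a minimal sketch of what the existing parameterless string.Split does. It shows only the string API; how the span-based Split should behave is exactly what is under discussion here.

```csharp
using System;

class WhitespaceSplit
{
    static void Main()
    {
        string input = "a\tb\nc d";

        // No separator given: string.Split splits on white-space characters,
        // so the tab, the newline and the space all act as separators.
        string[] onWhitespace = input.Split();
        Console.WriteLine(onWhitespace.Length); // 4

        // Explicit ' ' separator: only the space splits.
        string[] onSpace = input.Split(' ');
        Console.WriteLine(onSpace.Length);      // 2
    }
}
```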
The discussion about semantics of a |
Why? If you want to split on just ' ', you can do so just as with string by passing that to split. If you want to split on all whitespace, you can do that just as with string by not passing any parameters.
Oh, my bad, I forgot the |
Moving discussion regarding ROS.Split() here: #37746
This reverts commit 78ed8e8.
This implementation complies with the specifications discussed in: #934
I tried to add the tests I wrote (adapted from original StringSplitTests), but I was not sure which directory pertains to CoreLib tests.
My best guess is src/libraries/System.Runtime/tests/System, but would like someone to confirm.
This implementation uses multiple types to achieve splitting by a single char and by multiple chars. This results in almost identical code in the two types, which could be mitigated by merging the two implementations. I am not sure which implementation is preferred in this scenario.
I need help with the following: