String-like extension methods to ReadOnlySpan<char> Epic #22434
I'm particularly interested in |
Some of these extensions may also be useful for types other than char:
public static bool Equals<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
where T : struct, IEquatable<T>; // Non-allocating Enumerable.SequenceEqual
public static bool StartsWith<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
where T : struct, IEquatable<T>;
// and overloads with IEqualityComparer<T> instead of the type constraint. |
We have IndexOf already. Does that cover your scenario? https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L93
We have StartsWith already. Also, is the semantics of Equals the same as SequenceEqual? If so, that exists as well. |
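For readers following along, here is a minimal sketch of the generic helpers mentioned above. The class and method names (`ExistingSpanHelpersDemo`, `HasPrefix`, `SameContents`) are made up for illustration; only `StartsWith` and `SequenceEqual` on `ReadOnlySpan<T>` come from the discussion.

```csharp
using System;

// Hypothetical demo class; not part of any proposal in this thread.
public static class ExistingSpanHelpersDemo
{
    // Element-wise prefix check without allocating.
    public static bool HasPrefix(int[] data, int[] prefix)
    {
        ReadOnlySpan<int> span = data;     // implicit array-to-span conversion
        ReadOnlySpan<int> value = prefix;
        return span.StartsWith(value);
    }

    // Element-wise equality, the span counterpart of Enumerable.SequenceEqual.
    public static bool SameContents(int[] left, int[] right)
    {
        ReadOnlySpan<int> a = left;
        ReadOnlySpan<int> b = right;
        return a.SequenceEqual(b);
    }
}
```

Both calls compare elements through `IEquatable<T>`, so for `char` they behave as ordinal comparisons.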
What about adding these as well?
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span); |
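Trimming is the one family here that never needs to allocate, because the result is just a narrower view over the same memory. A rough sketch of that idea follows; `TrimSketch`, `TrimStartSketch` and `TrimEndSketch` are illustrative names, not the proposed implementation.

```csharp
using System;

// Hypothetical helper; shows that trimming only moves the slice bounds.
public static class TrimSketch
{
    public static ReadOnlySpan<char> TrimStartSketch(ReadOnlySpan<char> span)
    {
        int start = 0;
        while (start < span.Length && char.IsWhiteSpace(span[start]))
        {
            start++;
        }
        return span.Slice(start); // same memory, later start
    }

    public static ReadOnlySpan<char> TrimEndSketch(ReadOnlySpan<char> span)
    {
        int end = span.Length - 1;
        while (end >= 0 && char.IsWhiteSpace(span[end]))
        {
            end--;
        }
        return span.Slice(0, end + 1); // same memory, shorter length
    }
}
```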
Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425
Proposed API Additions
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
// this will need to allocate if any chars are removed from the middle
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
// this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue);
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
} |
Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like? The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating? |
I was thinking, similar to APIs like StartsWith, SequenceEqual, etc. (https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/MemoryExtensions.cs#L281), that there will only be an OOB version. What would be the value of having inbox versions as well?
Yes. I will work on the API signatures (adding destination span that is passed in as well), to try to get them to be non-allocating. |
The problem with ToLower, ToUpper and other similar globalization APIs is that it is impossible to implement them efficiently in the OOB version. The implementation would always have to allocate the strings from the Span first. That makes them kind of pointless to have in the OOB System.Memory. Have you thought about having these APIs in System.Runtime or a similar contract instead? |
Wouldn't work for string (as immutable); but for regular public static void ToUpper(this Span<char> span);
public static void ToLower(this Span<char> span); |
@benaadams |
any thoughts on supporting running Regex on ReadOnlySpan? |
I think APIs that don't specify otherwise (including the ToUpper/Lower) should be culture invariant. If we want to provide culture-aware APIs, they should take an explicit CultureInfo argument. |
@rafael-aero There's a separate issue for that: https://github.com/dotnet/corefx/issues/24145. |
FWIW, @Tragetaschen
This is not correct anymore; there has been a ẞ in Unicode for years now, U+1E9E. It has been part of German orthography since July 2017. https://en.wikipedia.org/wiki/Capital_ẞ |
It would be great if we could also |
@SomeAnon42 Special-casing |
@Tragetaschen @benaadams CoreCLR already operates under the assumption that the case change produces the same length string. https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/TextInfo.Windows.cs#L26-L76 |
Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425 Added:
Updated Proposed API Additions
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Contains(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
// this will need to allocate if any chars are removed from the middle
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
// non-allocating alternative:
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, Span<char> result);
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> result);
public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
// this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue);
// non-allocating alternative:
public static bool Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue, Span<char> result);
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToUpperInvariant(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLowerInvariant(this ReadOnlySpan<char> span);
// these will allocate
public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int totalWidth);
public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int totalWidth, char paddingChar);
public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int totalWidth);
public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int totalWidth, char paddingChar);
// non-allocating alternative
public static bool PadLeft(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
public static bool PadLeft(this ReadOnlySpan<char> span, int totalWidth, char paddingChar, Span<char> result);
public static bool PadRight(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
public static bool PadRight(this ReadOnlySpan<char> span, int totalWidth, char paddingChar, Span<char> result);
} |
Are the signatures of Replace methods right? They seem to be missing an argument. Also, why is the allocating variant of Replace returning ReadOnlySpan? Shouldn't it rather return writeable span? |
We don't think this is ready yet. @ahsonkhan, please redesign the APIs to avoid allocations by returning new spans. Instead, they should follow our |
Update:
Most of the APIs that are designed to avoid allocations do not need to return bytesWritten since that will be known to the user before the call. For example, we know that Remove will write exactly
Proposed API
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span) { throw null; }
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar) { throw null; }
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span) { throw null; }
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar) { throw null; }
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span) { throw null; }
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar) { throw null; }
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool IsWhiteSpace(this ReadOnlySpan<char> span) { throw null; }
// APIs designed to avoid allocations:
// do we need bytesWritten? It should be source.Length - count on success
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination) { throw null; }
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten) { throw null; }
// do we need bytesWritten? It should be source.Length on success
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
// do we need bytesWritten? It should be totalWidth on success or source.Length if totalWidth < source.Length
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; }
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; }
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
}
Sample Usage
Approach 1 to use PadLeft:
ReadOnlySpan<char> span; // input
int totalWidth = 50; // known value, assume >= span.Length
Span<char> destination = new char[totalWidth]; //if set to less than totalWidth, PadLeft will return false
if (span.PadLeft(totalWidth, destination))
{
// destination has some pad characters (if there is extra space) followed by the entire input
// input is at the end of the destination buffer
}
else
{
// resize destination to be >= totalWidth
var result = span.PadLeft(totalWidth, destination);
Debug.Assert(result);
}
// only if destination.Length > totalWidth, it will be necessary to slice to get the precise segment.
destination = destination.Slice(0, totalWidth);
Approach 2 to use PadLeft:
ReadOnlySpan<char> span; // input
int totalWidth = x; // unknown value, x >= 0, could be less than span.Length
Span<char> destination = new char[y]; // y could be less than x
if (span.Length > totalWidth) return; // user can choose to do nothing, or call PadLeft which will just do a copy
if (destination.Length < totalWidth) destination = new char[totalWidth];
span.PadLeft(totalWidth, destination); // no need to slice, the destination is of the right size.
Sample usage for Replace:
string testStr = "abcdefghibclmbcp";
ReadOnlySpan<char> source = testStr.AsReadOnlySpan();
Span<char> destination = new char[source.Length];
// oldValue.Length >= newValue.Length
source.Replace("a".AsReadOnlySpan(), "z".AsReadOnlySpan(), destination, out int bytesWritten);
// not necessary here
destination = destination.Slice(0, bytesWritten);
// oldValue.Length < newValue.Length
while (!source.Replace("z".AsReadOnlySpan(), "ab".AsReadOnlySpan(), destination, out bytesWritten))
{
// enlarge destination
// maximum size required: string.Length + (string.Length/oldValue.Length * (newValue.Length - oldValue.Length))
// maximum size required: 16 + (16/1 * (2 - 1)) = 16 + 16 = 32
// growing the destination to 32 will guarantee success in this case
}
destination = destination.Slice(0, bytesWritten);
Open Questions:
|
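The worst-case size given in the Replace sample's comments can be written as a small helper. This is only a sketch of that arithmetic; `ReplaceSizing` and `MaxReplacedLength` are hypothetical names and were not proposed in the thread.

```csharp
using System;

// Hypothetical helper; computes the upper bound quoted in the Replace sample.
public static class ReplaceSizing
{
    public static int MaxReplacedLength(int sourceLength, int oldValueLength, int newValueLength)
    {
        if (sourceLength < 0) throw new ArgumentOutOfRangeException(nameof(sourceLength));
        if (oldValueLength <= 0) throw new ArgumentOutOfRangeException(nameof(oldValueLength));
        if (newValueLength < 0) throw new ArgumentOutOfRangeException(nameof(newValueLength));

        // Replacing with something no longer than oldValue can never grow the text.
        if (newValueLength <= oldValueLength) return sourceLength;

        // At most sourceLength / oldValueLength non-overlapping matches can occur,
        // and each one grows the output by (newValueLength - oldValueLength).
        int maxMatches = sourceLength / oldValueLength;
        return checked(sourceLength + maxMatches * (newValueLength - oldValueLength));
    }
}
```

For the sample above (16 characters, oldValue of length 1, newValue of length 2) this yields 32, matching the comment. Sizing the destination this way up front would let a caller avoid the retry loop, at the cost of possibly over-allocating.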
// Approved, but needs to go somewhere else due to globalization
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
}
// Needs more work
public static class MemoryExtensions
{
// APIs designed to avoid allocations:
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
// do we need bytesWritten? It should be source.Length on success
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
// We probably don't want these
public static class MemoryExtensions
{
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination);
} |
Time for a NuGet span left-pad? Will become the most popular library and the world will come to depend on it - mwhahahaha! |
What about the Ordinal versions?
public static ReadOnlySpan<T> Trim<T>(this ReadOnlySpan<T> span, T trimElement);
public static ReadOnlySpan<T> Trim<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> trimElements);
// etc |
I understand that we need to consider StringComparison overload for Replace, but what additional work/design decisions are there for the Remove API? It doesn't have any concerns regarding comparison/globalization/etc. |
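To make the point about Remove concrete, here is a sketch of how a destination-based version could work with two copies and no comparison logic at all. `RemoveSketch.TryRemove` is an illustrative name; the argument checks and the choice to return false on a short destination are assumptions, not decisions from this thread.

```csharp
using System;

// Hypothetical helper; copies everything except [startIndex, startIndex + count).
public static class RemoveSketch
{
    public static bool TryRemove(ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination)
    {
        if ((uint)startIndex > (uint)source.Length)
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        if ((uint)count > (uint)(source.Length - startIndex))
            throw new ArgumentOutOfRangeException(nameof(count));

        // The caller can compute the required size up front: source.Length - count.
        if (destination.Length < source.Length - count)
            return false;

        source.Slice(0, startIndex).CopyTo(destination);                          // head
        source.Slice(startIndex + count).CopyTo(destination.Slice(startIndex));   // tail
        return true;
    }
}
```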
// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
// If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison
// (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
// Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
// To me, these are complementary to the Trim APIs and hence we should add them.
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}
// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, StringComparison comparison, Span<char> destination, out int bytesWritten);
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
} |
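As a rough illustration of the bool-returning PadLeft shape listed above, the following sketch fills the leading gap and then copies the input. `PadSketch.TryPadLeft` is a hypothetical name, and the handling of a totalWidth smaller than the source (plain copy) follows the earlier sample usage rather than any approved behavior.

```csharp
using System;

// Hypothetical helper; pads on the left into a caller-supplied buffer.
public static class PadSketch
{
    public static bool TryPadLeft(ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar = ' ')
    {
        if (totalWidth < 0) throw new ArgumentOutOfRangeException(nameof(totalWidth));

        // If totalWidth is smaller than the input, this degenerates to a copy.
        int required = Math.Max(totalWidth, source.Length);
        if (destination.Length < required)
            return false;

        int padCount = required - source.Length;
        destination.Slice(0, padCount).Fill(paddingChar); // leading padding
        source.CopyTo(destination.Slice(padCount));       // input lands at the end
        return true;
    }
}
```

If the destination is longer than the required width, the caller still has to slice it, as noted in the sample usage earlier in the thread.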
Can I throw in another suggestion? I'd really like to see some ability to split a
// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, StringSplitOptions options);
The reason for choosing |
@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample? I would replace the char[] overloads with ReadOnlySpan<char>. However, I am not sure about adding the split APIs, in general, given they have to allocate. Is there a way to avoid allocating? Also, given these are string-like APIs for span, it is strange to have an overload that takes a string[]. Maybe all these would fit better on ReadOnlyMemory instead, especially given the return type. |
My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it in pieces at a time. It'd be pretty nice if this can be facilitated by the Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time. Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this. |
Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation). I would instead expect a design more like an iterator implemented as a ref struct, e.g.
public ref struct CharSpanSplitter
{
public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
public bool TryMoveNext(out ReadOnlySpan<char> result);
} |
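One way the ref struct iterator above might be fleshed out is sketched below. `CharSpanSplitterSketch` is a made-up name, and the StringSplitOptions parameter is left out to keep it short, so empty segments are always returned. It uses `AsSpan()` in the usage, which is the name the string-to-span helper eventually shipped under.

```csharp
using System;

// Hypothetical splitter; yields slices of the input without allocating.
public ref struct CharSpanSplitterSketch
{
    private ReadOnlySpan<char> _remaining;
    private readonly char _separator;
    private bool _done;

    public CharSpanSplitterSketch(ReadOnlySpan<char> value, char separator)
    {
        _remaining = value;
        _separator = separator;
        _done = false;
    }

    public bool TryMoveNext(out ReadOnlySpan<char> result)
    {
        if (_done)
        {
            result = default;
            return false;
        }

        int index = _remaining.IndexOf(_separator);
        if (index < 0)
        {
            // Last segment: everything that is left (possibly empty).
            result = _remaining;
            _remaining = default;
            _done = true;
            return true;
        }

        result = _remaining.Slice(0, index);
        _remaining = _remaining.Slice(index + 1); // skip the separator
        return true;
    }
}
```

A caller would loop with `while (splitter.TryMoveNext(out ReadOnlySpan<char> part)) { ... }` and parse each part in place.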
@Joe4evr, would it be fine for me to split the conversation related to the split API into a new issue, especially since it would require additional design? Edit: Created https://github.com/dotnet/corefx/issues/26528 |
|
@stephentoub Fair enough. Glad I at least got the ball rolling on the matter. |
Added the proper overload without optional arguments (updated the first post).
How so? StringComparison is just an enum with culture specific options and intersects with some of CompareOptions.
Edit: I think I get what you mean. We may need to expose the compare APIs on CompareInfo that take Span, publicly. This is used by Equals/Compare (and can probably remain internal): We would need to add additional span-based overloads for IsPrefix/IsSuffix/IndexOf. |
It would be odd for the overloads to remain internal. We have made the Span overloads public in all other similar cases - Stream, BinaryReader/Writer, ... . |
@stephentoub @Joe4evr Why stop with a "string splitter"? I'd love to see something like Google Guava's |
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
public static void Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
// Those need access to globalization APIs. We'll also expose them from
// the .NET Framework OOB (slow span). They will try to extract the string
// from the underlying span (because slow span stores it) -- or -- allocate
// a new string. This avoids bifurcating the API surface.
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static void ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static void ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static void ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static void ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
} |
One last note regarding
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
I want this API to also return, via an out parameter, the matched length of the searched characters, as it is not always equal to the length of the value span. Example: imagine the source span has the characters
so I suggest the API be
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison, out int matchedLength);
The benefit of having the out matchedLength is:
|
Given that the proposed API can result in correctness issues, I would agree that we need the
Change:
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value,
StringComparison comparison);
To:
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value,
StringComparison comparison, out int matchedLength); |
Then maybe we want an Ordinal overload without this out parameter?
Could we add the example to docs (remarks) of the method? |
I think matchedLength would be useful in the ordinal case too. If you write some code to do a repeated search inside the span, it would just work for either case (ordinal or non-ordinal). If we have the ordinal overload, the user has to use Span.Length to perform the subsequent operation. Here is a code sample for IndexOf:
// string has A character followed by the COMBINING GRAVE ACCENT ̀ which is equivalent to À (0x00C0)
string source = "A\u0300 Some Other text A\u0300";
string value = "\u00C0"; // À
ReadOnlySpan<char> sourceSpan = source.AsReadOnlySpan();
ReadOnlySpan<char> valueSpan = value.AsReadOnlySpan();
while (true)
{
int index = sourceSpan.IndexOf(valueSpan, StringComparison.InvariantCulture, out int matchedLength);
if (index < 0 || index + matchedLength >= sourceSpan.Length)
{
break; // done searching
}
sourceSpan = sourceSpan.Slice(index + matchedLength); // Slice for next search
} |
We already have that, essentially, with IndexOf<T>:
public static int IndexOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T> |
Ah, makes sense. |
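For the ordinal case, the repeated-search loop discussed above can be written today with the generic IndexOf<T>, because an ordinal match always spans exactly value.Length characters. The sketch below counts occurrences that way; `OrdinalSearchSketch.CountOrdinal` is a hypothetical name.

```csharp
using System;

// Hypothetical helper; repeated ordinal search by re-slicing the source.
public static class OrdinalSearchSketch
{
    public static int CountOrdinal(ReadOnlySpan<char> source, ReadOnlySpan<char> value)
    {
        if (value.IsEmpty) return 0; // avoid an endless loop on empty needles

        int count = 0;
        while (true)
        {
            int index = source.IndexOf(value); // generic IndexOf<T>, ordinal for char
            if (index < 0) break;

            count++;
            // In the ordinal case the matched length is always value.Length.
            source = source.Slice(index + value.Length);
        }
        return count;
    }
}
```

With a culture-sensitive comparison this shortcut is not valid, which is the motivation for the matchedLength out parameter.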
|
The APIs where |
Based on the discussion here, we want to adjust the following APIs to return an int (the number of characters written to the destination). Also, rather than throwing when the destination is too small, they would return -1. Changing the return type from void to int:
public static int ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination); |
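The int-returning contract described above can be sketched as follows. `CasingSketch.ToUpperInvariantSketch` is an illustrative name, and converting one char at a time with char.ToUpperInvariant is a simplification chosen for this example, not a statement about how the real implementation works.

```csharp
using System;

// Hypothetical helper; returns chars written, or -1 if destination is too small.
public static class CasingSketch
{
    public static int ToUpperInvariantSketch(ReadOnlySpan<char> source, Span<char> destination)
    {
        if (destination.Length < source.Length)
            return -1; // signal "too small" instead of throwing

        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = char.ToUpperInvariant(source[i]);
        }
        return source.Length; // case mapping here keeps the length unchanged
    }
}
```

Returning the count lets the caller slice an oversized destination without tracking the source length separately.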
Thanks! A nice addition that rounds out what can be done with Spans. |
As we look to support more efficient parsing/formatting/usage of ReadOnlySpan<char> as slices of System.Strings, we're planning to add a variety of APIs across corefx that operate with ReadOnlySpan<char> (https://github.com/dotnet/corefx/issues/21281). But for such APIs to be truly useful, and for ReadOnlySpan<char> to be generally helpful as a string-like type, we need a set of extension methods on ReadOnlySpan<char> that mirror the corresponding operations on string, e.g. Equals with various kinds of string comparisons, Trim, Replace, etc. We need to define, implement, test, and ship a core set of these (more can of course be added in the future), e.g.
Edit by @ahsonkhan - Updated APIs:
Original Proposal