Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String-like extension methods to ReadOnlySpan<char> Epic #22434

Closed
stephentoub opened this issue Jun 23, 2017 · 48 comments
Closed

String-like extension methods to ReadOnlySpan<char> Epic #22434

stephentoub opened this issue Jun 23, 2017 · 48 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@stephentoub
Copy link
Member

As we look to support more efficient parsing/formatting/usage of ReadOnlySpan<char> as slices of System.Strings, we're planning to add a variety of APIs across corefx that operate with ReadOnlySpan<char> (https://github.com/dotnet/corefx/issues/21281). But for such APIs to be truly useful, and for ReadOnlySpan<char> to be generally helpful as a string-like type, we need a set of extension methods on ReadOnlySpan<char> that mirror the corresponding operations on string, e.g. Equals with various kinds of string comparisons, Trim, Replace, etc. We need to define, implement, test, and ship a core set of these (more can of course be added in the future), e.g.

Edit by @ahsonkhan - Updated APIs:

// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
    // If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison 
    // (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);

    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);

    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);

    // Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
    
    // To me, these are complementary to the Trim APIs and hence we should add them.
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}

// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, StringComparison comparison, Span<char> destination, out int bytesWritten);

    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
Original Proposal
public static class SpanExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count); // this will need to allocate if any chars are removed from the middle
    ...
}
@jnm2
Copy link
Contributor

jnm2 commented Sep 15, 2017

I'm particularly interested in IndexOf(char) and LastIndexOf(char) so far.

@safakgur
Copy link

safakgur commented Oct 27, 2017

Some of these extensions may also be useful for types other than char:

public static bool Equals<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
    where T : struct, IEquatable<T>; // Non-allocating Enumerable.SequenceEquals

public static bool StartsWith(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
    where T : struct, IEquatable<T>;

// and overloads with IEqualityComparer<T> instead of the type constraint.

@ahsonkhan
Copy link
Member

I'm particularly interested in IndexOf(char) and LastIndexOf(char) so far.

We have IndexOf already. Does that your scenario? https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L93

Some of these extensions may also be useful for types other than char:
Equals
StartsWith

We have StartsWith already. Also, is the semantics of Equals the same as SequenceEqual? If so, that exists as well.
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L137

@ahsonkhan
Copy link
Member

What about adding these as well?

public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);

@ahsonkhan
Copy link
Member

ahsonkhan commented Dec 12, 2017

Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425

Proposed API Additions

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
   
    // this will need to allocate if any chars are removed from the middle
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count); 
    
    public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
    
    // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
    public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); 
    
    public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
}

@jkotas
Copy link
Member

jkotas commented Dec 12, 2017

Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?

The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?

@ahsonkhan
Copy link
Member

ahsonkhan commented Dec 12, 2017

Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?

I was thinking, similar to APIs like StartsWith, SequenceEqual, etc. (https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/MemoryExtensions.cs#L281), that there will only be an OOB version. What would be the value of having inbox versions as well?

The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?

Yes. I will work on the API signatures (adding destination span that is passed in as well), to try to get them to be non-allocating.

@jkotas
Copy link
Member

jkotas commented Dec 12, 2017

The problem with ToLower, ToUpper and other similar globalization APIs is that it is impossible to implement them efficiently in the OOB version. The implementation would always have to allocate the strings from the Span first. It make them kind of pointless to have in the OOB System.Memory. Have you though about having these APIs in System.Runtime or similar contract instead?

@benaadams
Copy link
Member

Wouldn't work for string (as immuatable); but for regular Span<char>?

public static void ToUpper(this Span<char> span);
public static void ToLower(this Span<char> span);

@Tragetaschen
Copy link
Contributor

@benaadams
string.ToUpper depends on the current culture and there's no one-to-one mapping between characters. For example the one-character, lowercase ß in German becomes the two-character SS (groß => GROSS)

@theolivenbaum
Copy link

any thoughts on supporting running Regex on ReadOnlySpan?

@KrzysztofCwalina
Copy link
Member

I think APIs that don't specify otherwise (including the ToUpper/Lower) should be culture invariant. If We want to provide culture aware APIs, they should take an explicit CultureInfo argument.

@svick
Copy link
Contributor

svick commented Dec 20, 2017

@rafael-aero There's a separate issue for that: https://github.com/dotnet/corefx/issues/24145.

@Tornhoof
Copy link
Contributor

Tornhoof commented Dec 21, 2017

FWIW, @Tragetaschen

For example the one-character, lowercase ß in German becomes the two-character SS (groß => GROSS)

This is not correct anymore, there is a ẞ in Unicode for years now, U+1E9E. It's part of german Orthography since July 2017. https://en.wikipedia.org/wiki/Capital_ẞ
I Imagine there are still other examples though.

@SomeAnonDev
Copy link

It would be great if we could also switch on Spans of char as if they were strings.

@jnm2
Copy link
Contributor

jnm2 commented Dec 27, 2017

@SomeAnon42 Special-casing ReadOnlySpan<char> the way string is special-cased would be a language-level change. (https://github.com/dotnet/csharplang)

@bbowyersmyth
Copy link
Contributor

@Tragetaschen @benaadams CoreCLR already operates under the assumption that the case change produces the same length string. https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/TextInfo.Windows.cs#L26-L76
This may be a bug. If not, it would be consistent for span to do the same thing.

@danmoseley danmoseley changed the title Need core set of ReadOnlySpan<char> string-like extension methods String-like extension methods to ReadOnlySpan<char> Epic Jan 3, 2018
@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 12, 2018

Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425

Added:

  • Contains
  • Non-allocating alternatives
  • Additional overloads

Updated Proposed API Additions

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    
    public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Contains(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
   
    // this will need to allocate if any chars are removed from the middle
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
    // non-allocating alternative:
    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, Span<char> result);
    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> result); 
    
    public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
    
    // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
    public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue);
    // non-allocating alternative:
    public static bool Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue, Span<char> result);
    
    public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> ToUpperInvariant(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> ToLowerInvariant(this ReadOnlySpan<char> span);

    // these will allocate
    public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int totalWidth);
    public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int paddingChar);
    public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int totalWidth);
    public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int paddingChar);
    // non-allocating alternative
    public static bool PadLeft(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
    public static bool PadLeft(this ReadOnlySpan<char> span, int paddingChar, Span<char> result);
    public static bool PadRight(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
    public static bool PadRight(this ReadOnlySpan<char> span, int paddingChar, Span<char> result);
}

@jkotas
Copy link
Member

jkotas commented Jan 12, 2018

Are the signatures of Replace methods right? They seem to be missing an argument. Also, why is the allocating variant of Replace returning ReadOnlySpan? Shouldn't it rather return writeable span?

@terrajobst
Copy link
Member

terrajobst commented Jan 12, 2018

Video

We don't think this is ready yet. @ahsonkhan, please redesign the APIs to avoid allocations by returning new spans. Instead, they should follow our TryXxx pattern. But we have to keep in mind that many times folks will start with ReadOnlySpan<char> from a string where in-place isn't possible. It would help to have some sample code that shows how these APIs are used.

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 13, 2018

Update:

  • Removed the allocating versions of the API
  • Fixed Replace
  • Fixed argument names and return values of the non-allocating versions of the APIs

Most of the APIs that are designed to avoid allocations do not need to return bytesWritten since that will be known to the user before the call. For example, we know that Remove will write exactly source.Length - count chars to the destination span. The exception to this is Replace, which is why it needs bytesWritten.

Proposed API

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span) { throw null; }
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar) { throw null; }
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span) { throw null; }
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar) { throw null; }
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span) { throw null; }
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar) { throw null; }
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars) { throw null; }
    
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
    public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }

    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
    public static bool IsWhiteSpace(this ReadOnlySpan<char> span) { throw null; }


    // APIs designed to avoid allocations:

    // do we need bytesWritten? It should be source.Length - count on success
    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination) { throw null; }
    
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten) { throw null; }
    
    // do we need bytesWritten? It should be source.Length on success
    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
    public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
    public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }

    // do we need bytesWritten? It should be totalWidth on success or source.Length if totalWidth < source.Length
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; } 
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; }
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
}

Sample Usage

Approach 1 to use PadLeft:

ReadOnlySpan<char> span; // input
int totalWidth = 50; // known value, assume >= span.Length
Span<char> destination = new char[totalWidth]; //if set to less than totalWidth, PadLeft will return false
if (span.PadLeft(totalWidth, destination))
{
    // destination has some pad characters (if there is extra space) followed by the entire input
    // input is at the end of the destination buffer
}
else
{
    // resize destination to be >= totalWidth
    var result = span.PadLeft(totalWidth, destination);
    Debug.Assert(result);
}
// only if destination.Length > totalWidth, it will be necessary to slice to get the precise segment.
destination = destination.Slice(0, totalWidth);

Approach 2 to use PadLeft:

ReadOnlySpan<char> span; // input
int totalWidth = x; // unknown value, x >= 0, could be less than span.Length
Span<char> destination = new char[y] // y could be less than x

if (source.Length > totalWidth) return; // user can choose to do nothing, or call PadLeft which will just do a copy

if (destination.Length < totalWidth) destination = new char[totalWidth];
span.PadLeft(totalWidth, destination); // no need to slice, the destination is of the right size.

Sample usage for Replace:

string testStr = "abcdefghibclmbcp";
ReadOnlySpan<char> source = testStr.AsReadOnlySpan();
Span<char> destination = new char[source.Length];

// oldValue.Length >= newValue.Length
source.Replace("a".AsReadOnlySpan(), "z".AsReadOnlySpan(), destination, out int bytesWritten)
// not necessary here
destination = destination.Slice(0, bytesWritten);


// oldValue.Length < newValue.Length
while (!source.Replace("z".AsReadOnlySpan(), "ab".AsReadOnlySpan(), destination, out int bytesWritten))
{
    // enlarge destination
    // maximum size required: string.Length + (string.Length/oldValue.Length * (newValue.Length - oldValue.Length))
    // maximum size required: 16 + (16/1 * (2 - 1)) = 16 + 16 = 32
    // growing the destination to 32 will gaurantee success in this case
}
destination = destination.Slice(0, bytesWritten);

Open Questions:

  1. Should these methods be generic? If not, should we provide byte specific overloads?
  2. When should we provide the bytesWritten overload? The APIs are currently designed to only provide it if it cannot be calculated, or already known, by the caller (in constant time) before the method is called.

@terrajobst
Copy link
Member

terrajobst commented Jan 16, 2018

Video

// Approved, but needs to go somewhere else due to globalization

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);

    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
}

// Needs more work

public static class MemoryExtensions
{
    // APIs designed to avoid allocations:

    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
    
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
    
    // do we need bytesWritten? It should be source.Length on success
    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}

// We probably don't want these    

public static class MemoryExtensions
{
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination); 
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination);
}

@benaadams
Copy link
Member

We probably don't want these
PadLeft

Time for a NuGet span left-pad? Will become the most popular library and the world will come to depend on it - mwhahahaha!

@benaadams
Copy link
Member

needs to go somewhere else due to globalization

What about the Ordinal versions?

public static ReadOnlySpan<T> Trim(this ReadOnlySpan<T> span, T trimElement);
public static ReadOnlySpan<T> Trim(this ReadOnlySpan<T> span, ReadOnlySpan<T> trimElements);
// etc

@ahsonkhan
Copy link
Member

// Needs more work

I understand that we need to consider StringComparison overload for Replace, but what additional work/design decisions are there for the Remove API? It doesn't have any concerns regarding comparison/globalization/etc.

@ahsonkhan ahsonkhan self-assigned this Jan 17, 2018
@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 20, 2018

// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
    // If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison 
    // (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);

    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);

    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);

    // Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
    
    // To me, these are complementary to the Trim APIs and hence we should add them.
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}

// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, StringComparison comparison, Span<char> destination, out int bytesWritten);

    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
    public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
    public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}

@Joe4evr
Copy link
Contributor

Joe4evr commented Jan 22, 2018

Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>. Obviously, you can't return a collection of Spans directly, but isn't that what Memory<T> is for?

// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, StringSplitOptions options);

The reason for choosing IReadOnlyList<T> is to make the resulting collection indexable, just like string[]. It would be nice if the implementation is ImmutableArray<T>, but I'm not sure if that's a concern for distribution and such.

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 22, 2018

@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?

I would replace the char[] overloads with ReadOnlySpan<char>. However, I am not sure about adding the split APIs, in general, given they have to allocate. Is there a way to avoid allocating? Also, given these are string-like APIs for span, it is strange to have an overload that takes a string[]. Maybe all these would fit better on ReadOnlyMemory instead, especially given the return type.

@Joe4evr
Copy link
Contributor

Joe4evr commented Jan 22, 2018

My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it in pieces at a time. It'd be pretty nice if this can be facilitated by the Span/Memory<T> APIs so that this code won't have to allocate an extra 30-40 tiny strings whenever it runs.

Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.

Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.

@stephentoub
Copy link
Member Author

Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).

I would instead expect a design more like an iterator implemented as a ref struct, e.g.

public ref struct CharSpanSplitter
{
    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
    public bool TryMoveNext(out ReadOnlySpan<char> result);
}

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 22, 2018

Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>.

I would instead expect a design more like an iterator implemented as a ref struct

@Joe4evr, would it be fine for me to split the conversation related to the split API into a new issue, especially since it would require additional design?

Edit: Created https://github.com/dotnet/corefx/issues/26528

@jkotas
Copy link
Member

jkotas commented Jan 23, 2018

bool Equals(this ReadOnlySpan span, ReadOnlySpan value, StringComparison comparison);

StringComparison is just a conveniece wrapper over System.Globalization.CompareInfo. Do you also need new overloads that take Span on System.Globalization.CompareInfo to make this work? Same for all other stuff that deals with globalization.

public static bool ToUpper(this ReadOnlySpan source, Span destination, CultureInfo culture = CultureInfo.CurrentCulture);

CultureInfo.CurrentCulture cannot use a property as the default value. Default value has to be a constant.

@Joe4evr
Copy link
Contributor

Joe4evr commented Jan 23, 2018

@stephentoub Fair enough. Glad I at least got the ball rolling on the matter.

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 23, 2018

CultureInfo.CurrentCulture cannot use a property as the default value. Default value has to be a constant.

Added the proper overload without optional arguments (updated the first post).

StringComparison is just a conveniece wrapper over System.Globalization.CompareInfo.

How so? StringComparison is just an enum with culture specific options and intersects with some of CompareOptions.

Do you also need new overloads that take Span on System.Globalization.CompareInfo to make this work?

I don't know what you mean by this. Are you asking if we would need additional overloads that take CompareInfo instead of StringComparison as arguments?

Edit: I think I get what you mean. We may need to expose the compare APIs on CompareInfo that take Span, publicly. This is used by Equals/Compare (and can probably remain internal):
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L393

We would need to add additional span-based overloads for IsPrefix/IsSuffix/IndexOf.

@jkotas
Copy link
Member

jkotas commented Jan 23, 2018

This is used by Equals/Compare (and can probably remain internal

It would be odd for the overloads to remain internal. We have made the Span overloads public in all other similar cases - Stream, BinaryReader/Writer, ... .

@khellang
Copy link
Member

khellang commented Jan 23, 2018

I would instead expect a design more like an iterator implemented as a ref struct

@stephentoub @Joe4evr Why stop with a "string splitter"? I'd love to see something like Google Guava's CharMatcher, which could be much more useful (might be a case for a separate proposal, though.)

@terrajobst
Copy link
Member

terrajobst commented Jan 23, 2018

Video

  • We can expose all these APIs from MemoryExtensions, even the ones that need globalization APIs as this just means we can't expose them (yet).
  • We'll not expose APIs that make StringComparison or CultureInfo implicit. We might get feedback, but we think forcing the developer to make a decision is less prone to errors than selecting a default.
  • We've removed Replace as we don't think the API interaction / pattern works out. It need more thought/design.
public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);

    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);

    public static void Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);    
    public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
    public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);

    // Those need access to globalization APIs. We'll also expose them from
    // the .NET Framework OOB (slow span). They will try to extract the string
    // from the underlying span (because slow span stores it) -- or -- allocate
    // a new string. This avoids bifurcating the API surface.

    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);

    public static void ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static void ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static void ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static void ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}

@tarekgh
Copy link
Member

tarekgh commented Jan 24, 2018

one last note regarding

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);

I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.

Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.

so I suggest the API will be

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison, out int matchedLength);

The benefit having the outmatched length is:

  • It will help in the scenario of searching more than one time inside the span. when hit first match, you can know where you can start the next search.
  • As we have removed the Replace API from the proposal, we are giving people the way to implement their Replace APIs

@ahsonkhan
Copy link
Member

Given that the proposed API can result in correctness issues, I would agree that we need the out int matchedLength parameter for the non-ordinal IndexOf.

Change:

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, 
                                       StringComparison comparison);

To:

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, 
                                       StringComparison comparison, out int matchedLength);

@KrzysztofCwalina
Copy link
Member

I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.

Then maybe we want a Ordinal overload without this out parameter?

Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.

Could we add the example to docs (remarks) of the method?

@tarekgh
Copy link
Member

tarekgh commented Jan 24, 2018

Then maybe we want an Ordinal overload without this out parameter?

I think matchedLength would be useful even in the ordinal case too. if you write some code to do a repeated search inside the span would just work for any case (either ordinal or non-ordinal). if we have the ordinal overload, the user has to use Span.Length to perform the subsequent operation.
I don't have a strong feeling though about that :-)

here is some code sample for IndexOf

            // string has A character followed by the COMBINING GRAVE ACCENT ̀  which is equivalent to À (0x00C0)
            string source = "A\u0300 Some Other text A\u0300"; 
            string value = "\u00C0"; // À

            ReadOnlySpan<char> sourceSpan = source.AsReadOnlySpan();
            ReadOnlySpan<char> valueSpan = value.AsReadOnlySpan();

            while (true)
            {
                int index = sourceSpan.IndexOf(valueSpan, StringComparison.InvariantCulture, out int matachedLength);
                if (index < 0 || index + matachedLength >= sourceSpan.Length)
                {
                    break; // done searching
                }
                sourceSpan = sourceSpan.Slice(index + matachedLength); // Slice for next search
            }

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 25, 2018

Then maybe we want a Ordinal overload without this out parameter?

We already have that, essentially, with IndexOf<T>
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L42

public static int IndexOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>

@KrzysztofCwalina
Copy link
Member

We already have that, essentially, with IndexOf

Ah, makes sense.

@justinvp
Copy link
Contributor

justinvp commented Feb 6, 2018

  • The name of the StringComparison parameters should be comparisonType, not comparison, to be consistent with the existing naming used everywhere else.
  • Some extension methods use source while others use span as the name of the this parameter. It'd be nice to be consistent with the naming.

@ahsonkhan
Copy link
Member

ahsonkhan commented Feb 6, 2018

Some extension methods use source while others use span as the name of the this parameter. It'd be nice to be consistent with the naming.

The APIs where this is source have another span parameter as destination. The other ones don't. That is the reason for the inconsistency. If we feel strongly about the parameter renaming, I can do it in all methods. We should probably address this together with other inconsistencies like https://github.com/dotnet/corefx/issues/26894 (and dotnet/corefx#26694 (comment)).

@ahsonkhan
Copy link
Member

Based on the discussion here, we want to make an adjustment to the following APIs to return an int (number of characters written to the destination). Also, instead of throwing when the destination is too small, we would return -1 isntead.

Changing return type from void to int.

public static int ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);

@KrzysztofCwalina
Copy link
Member

Thanks! Nice addition that nicely rounds up what can be done with Spans.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@msftbot msftbot bot locked as resolved and limited conversation to collaborators Dec 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

No branches or pull requests