New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String-like extension methods to ReadOnlySpan<char> Epic #21395

Closed
stephentoub opened this Issue Jun 23, 2017 · 48 comments

Comments

@stephentoub
Member

stephentoub commented Jun 23, 2017

As we look to support more efficient parsing/formatting/usage of ReadOnlySpan<char> as slices of System.Strings, we're planning to add a variety of APIs across corefx that operate with ReadOnlySpan<char> (#21281). But for such APIs to be truly useful, and for ReadOnlySpan<char> to be generally helpful as a string-like type, we need a set of extension methods on ReadOnlySpan<char> that mirror the corresponding operations on string, e.g. Equals with various kinds of string comparisons, Trim, Replace, etc. We need to define, implement, test, and ship a core set of these (more can of course be added in the future), e.g.

Edit by @ahsonkhan - Updated APIs:

// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
    // If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison 
    // (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);

    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);

    public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);

    // Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
    
    // To me, these are complementary to the Trim APIs and hence we should add them.
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}

// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, StringComparison comparison, Span<char> destination, out int bytesWritten);

    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
    public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
Original Proposal
public static class SpanExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count); // this will need to allocate if any chars are removed from the middle
    ...
}
@jnm2

This comment has been minimized.

Collaborator

jnm2 commented Sep 15, 2017

I'm particularly interested in IndexOf(char) and LastIndexOf(char) so far.

@safakgur

This comment has been minimized.

safakgur commented Oct 27, 2017

Some of these extensions may also be useful for types other than char:

public static bool Equals<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
    where T : struct, IEquatable<T>; // Non-allocating Enumerable.SequenceEquals

public static bool StartsWith(this ReadOnlySpan<T> source, ReadOnlySpan<T> value)
    where T : struct, IEquatable<T>;

// and overloads with IEqualityComparer<T> instead of the type constraint.
@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Dec 12, 2017

I'm particularly interested in IndexOf(char) and LastIndexOf(char) so far.

We have IndexOf already. Does that your scenario? https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L93

Some of these extensions may also be useful for types other than char:
Equals
StartsWith

We have StartsWith already. Also, is the semantics of Equals the same as SequenceEqual? If so, that exists as well.
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L137

@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Dec 12, 2017

What about adding these as well?

public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Dec 12, 2017

Updated proposal here: #21395 (comment)

Proposed API Additions

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
   
    // this will need to allocate if any chars are removed from the middle
    public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count); 
    
    public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
    
    // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
    public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); 
    
    public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
}
@jkotas

This comment has been minimized.

Member

jkotas commented Dec 12, 2017

Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?

The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?

@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Dec 12, 2017

Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?

I was thinking, similar to APIs like StartsWith, SequenceEqual, etc. (https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/MemoryExtensions.cs#L281), that there will only be an OOB version. What would be the value of having inbox versions as well?

The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?

Yes. I will work on the API signatures (adding destination span that is passed in as well), to try to get them to be non-allocating.

@jkotas

This comment has been minimized.

Member

jkotas commented Jan 23, 2018

This is used by Equals/Compare (and can probably remain internal

It would be odd for the overloads to remain internal. We have made the Span overloads public in all other similar cases - Stream, BinaryReader/Writer, ... .

@khellang

This comment has been minimized.

Collaborator

khellang commented Jan 23, 2018

I would instead expect a design more like an iterator implemented as a ref struct

@stephentoub @Joe4evr Why stop with a "string splitter"? I'd love to see something like Google Guava's CharMatcher, which could be much more useful (might be a case for a separate proposal, though.)

@terrajobst

This comment has been minimized.

Member

terrajobst commented Jan 23, 2018

  • We can expose all these APIs from MemoryExtensions, even the ones that need globalization APIs as this just means we can't expose them (yet).
  • We'll not expose APIs that make StringComparison or CultureInfo implicit. We might get feedback, but we think forcing the developer to make a decision is less prone to errors than selecting a default.
  • We've removed Replace as we don't think the API interaction / pattern works out. It need more thought/design.
public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
    public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);

    public static bool IsWhiteSpace(this ReadOnlySpan<char> span);

    public static void Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);    
    public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
    public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
    public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);

    // Those need access to globalization APIs. We'll also expose them from
    // the .NET Framework OOB (slow span). They will try to extract the string
    // from the underlying span (because slow span stores it) -- or -- allocate
    // a new string. This avoids bifurcating the API surface.

    public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
    public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);

    public static void ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static void ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
    public static void ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
    public static void ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
@tarekgh

This comment has been minimized.

Member

tarekgh commented Jan 24, 2018

one last note regarding

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);

I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.

Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.

so I suggest the API will be

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison, out int matchedLength);

The benefit having the outmatched length is:

  • It will help in the scenario of searching more than one time inside the span. when hit first match, you can know where you can start the next search.
  • As we have removed the Replace API from the proposal, we are giving people the way to implement their Replace APIs
@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Jan 24, 2018

Given that the proposed API can result in correctness issues, I would agree that we need the out int matchedLength parameter for the non-ordinal IndexOf.

Change:

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, 
                                       StringComparison comparison);

To:

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, 
                                       StringComparison comparison, out int matchedLength);
@KrzysztofCwalina

This comment has been minimized.

Member

KrzysztofCwalina commented Jan 24, 2018

I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.

Then maybe we want a Ordinal overload without this out parameter?

Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.

Could we add the example to docs (remarks) of the method?

@tarekgh

This comment has been minimized.

Member

tarekgh commented Jan 24, 2018

Then maybe we want an Ordinal overload without this out parameter?

I think matchedLength would be useful even in the ordinal case too. if you write some code to do a repeated search inside the span would just work for any case (either ordinal or non-ordinal). if we have the ordinal overload, the user has to use Span.Length to perform the subsequent operation.
I don't have a strong feeling though about that :-)

here is some code sample for IndexOf

            // string has A character followed by the COMBINING GRAVE ACCENT ̀  which is equivalent to À (0x00C0)
            string source = "A\u0300 Some Other text A\u0300"; 
            string value = "\u00C0"; // À

            ReadOnlySpan<char> sourceSpan = source.AsReadOnlySpan();
            ReadOnlySpan<char> valueSpan = value.AsReadOnlySpan();

            while (true)
            {
                int index = sourceSpan.IndexOf(valueSpan, StringComparison.InvariantCulture, out int matachedLength);
                if (index < 0 || index + matachedLength >= sourceSpan.Length)
                {
                    break; // done searching
                }
                sourceSpan = sourceSpan.Slice(index + matachedLength); // Slice for next search
            }
@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Jan 25, 2018

Then maybe we want a Ordinal overload without this out parameter?

We already have that, essentially, with IndexOf<T>
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L42

public static int IndexOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>
@KrzysztofCwalina

This comment has been minimized.

Member

KrzysztofCwalina commented Jan 25, 2018

We already have that, essentially, with IndexOf

Ah, makes sense.

@justinvp

This comment has been minimized.

Collaborator

justinvp commented Feb 6, 2018

  • The name of the StringComparison parameters should be comparisonType, not comparison, to be consistent with the existing naming used everywhere else.
  • Some extension methods use source while others use span as the name of the this parameter. It'd be nice to be consistent with the naming.
@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Feb 6, 2018

Some extension methods use source while others use span as the name of the this parameter. It'd be nice to be consistent with the naming.

The APIs where this is source have another span parameter as destination. The other ones don't. That is the reason for the inconsistency. If we feel strongly about the parameter renaming, I can do it in all methods. We should probably address this together with other inconsistencies like #26894 (and #26694 (comment)).

@ahsonkhan

This comment has been minimized.

Member

ahsonkhan commented Feb 15, 2018

Based on the discussion here, we want to make an adjustment to the following APIs to return an int (number of characters written to the destination). Also, instead of throwing when the destination is too small, we would return -1 isntead.

Changing return type from void to int.

public static int ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
@ahsonkhan

This comment has been minimized.

@ahsonkhan ahsonkhan closed this Feb 28, 2018

@KrzysztofCwalina

This comment has been minimized.

Member

KrzysztofCwalina commented Feb 28, 2018

Thanks! Nice addition that nicely rounds up what can be done with Spans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment