Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API proposal: StringSplitOptions.TrimEntries #31038

Closed
GrabYourPitchforks opened this issue Oct 2, 2019 · 12 comments · Fixed by #35740
Closed

API proposal: StringSplitOptions.TrimEntries #31038

GrabYourPitchforks opened this issue Oct 2, 2019 · 12 comments · Fixed by #35740
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime

Comments

@GrabYourPitchforks
Copy link
Member

How many times have we all written the below code?

string[] entries = someString.Split(',');

for (int i = 0; i < entries.Length; i++)
{
    string entry = entries[i].Trim();
    /* operate on entry */
}

Even just a cursory search of our own Framework reference sources shows that this is a common pattern:

I propose adding an enum value StringSplitOptions.TrimEntries that would make this pattern a bit easier for folks by eliminating the need for developers to call Trim.

[Flags]
public enum StringSplitOptions
{
    None = 0,
    RemoveEmptyEntries = 1,
    TrimEntries = 2 /* new value */
}

The behavior of string.Split and related APIs would be as follows:

If None or RemoveEmpyEntries is provided, the behavior of string.Split under this proposal is unchanged from its behavior today.

If TrimEntries is provided by itself, the string.Split method performs the equivalent of calling string.Trim() on each element of the returned array. That is, no element in the returned array will have leading or trailing whitespace. The array may contain zero-length strings for entries.

Assert.Equal(new[] { "Doe", "John" }, "Doe, John".Split(',', StringSplitOptions.TrimEntries));
Assert.Equal(new[] { "1", "", "2", "3" }, "1, , 2 , 3 ".Split(',', StringSplitOptions.TrimEntries));
Assert.Equal(new[] { "", "" }, ", ".Split(',', StringSplitOptions.TrimEntries));

If TrimEntries and RemoveEmptyEntries are both provided, the entries are trimmed first, and then zero-length entries are removed from the returned array.

Assert.Equal(new[] { "Doe", "John" }, "Doe, John".Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries));
Assert.Equal(new[] { "1", "2", "3" }, "1, , 2 , 3 ".Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries));
Assert.Equal(new string[0], ", ".Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries));

Adding this enum value may be considered a breaking change if we expect that third-party components aside from string.Split are consuming the StringSplitOptions enum, not simply generating the value. In theory any such third-party components should be checking their inputs and rejecting unknown flags.

@scalablecory
Copy link
Contributor

👀 I'm suspicious whenever I see Split with a comma separator.

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Jan 13, 2020

A frequent usage for me is StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyOrWhiteSpaceEntries 😄 missing both.

return element?.InnerText?.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyOrWhiteSpaceEntries);
return element?.InnerText?.Split(',', StringSplitOptions.RemoveEmptyEntries).Where(value => !string.IsNullOrWhiteSpace(value)).Select(value => value.Trim()).ToArray();

Another real case(InnerText)

<Exclude>
  [lib]ns1.*,
  [lib]ns2.*
</Exclude>

or

<Exclude> </Exclude>

👀 I'm suspicious whenever I see Split with a comma separator.

Why? When you work with command line parameters or file(i.e. runsettings file parameters) parameters usually you use this way, not hot loop or csv parsing for sure.

@GrabYourPitchforks
Copy link
Member Author

API review - see if there's any usage of Split followed by TrimStart and TrimEnd as opposed to just Trim. That'll inform us on whether these enum values being added are sufficient for common use cases.

@GrabYourPitchforks
Copy link
Member Author

I dug into this a bit further, and there doesn't seem to be much of a scenario for having a separate TrimStart or TrimEnd enum value. There were two "interesting" use cases of string.Split followed by string.TrimStart in the Framework, but they're both achievable though other means.

In both cases the call sites could've used simply Trim instead of TrimStart. In the case of the second example there's already a bug due to the nested Trim calls because they'll inadvertently trim the white space from the string Content-Type: abc; parameter=" quoted because whitespace is important! ". Quoted semicolons are likewise not honored correctly. The code should probably be using a proper lexer rather than string.Split if feasible.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@maryamariyan maryamariyan removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@GrabYourPitchforks GrabYourPitchforks added the blocking Marks issues that we want to fast track in order to unblock other important work label May 4, 2020
@terrajobst terrajobst self-assigned this May 26, 2020
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review blocking Marks issues that we want to fast track in order to unblock other important work labels May 26, 2020
@terrajobst
Copy link
Member

namespace System
{
    [Flags]
    public enum StringSplitOptions
    {
        // Existing:
        // None = 0,
        //RemoveEmptyEntries = 1,

        // New:
        TrimEntries = 2
    }
}

@danmoseley
Copy link
Member

@GrabYourPitchforks I was going to note that #295 would need updating for "TrimEntries" but I don't see that API accepting StringSplitOptions. I assume that was intentional (I didn't read all of #934)
cc @layomia

@GrabYourPitchforks
Copy link
Member Author

@danmosemsft For spans, I'm still partial to a design that allows me to write something like the following. It gets rid of the whole iterator mess for some common scenarios.

ROS<char> span = GetName(); // "Doe, John"
(var lastName, var firstName) = span.Split(',', StringSplitOptions.TrimEntries);
// At this point, lastName := "Doe" and firstName := "John"

Per the "accepted" proposal at #934 (comment), there are no split options being passed in. So the changes in this issue shouldn't affect that PR.

@danmoseley
Copy link
Member

@GrabYourPitchforks I'm unclear how that would work, if Split resulted in 50 fragments, would I receive a 50 element tuple?

Do you know why #934 did not include StringSplitOptions? It seems like a natural extension - either we would want #934 plus SSO, or something different entirely.

@GrabYourPitchforks
Copy link
Member Author

@danmosemsft it's equivalent to passing in a "max elements" parameter and then decomposing the result into that number of components.

That is:

(var lastName, var firstName) = span.Split(','); // essentially equivalent to Split(',', maxValues: 2);

ROS<char> span = "Doe, John, III";
(var lastName, var firstName) = span.Split(',', SSO.Trim);

// lastName := "Doe"
// firstName := "John, III"

Which is generally what most callers want when doing this kind of split.

Calling (a, b, c, d, e) = span.Split(',') is essentially equivalent to passing maxValues: 5. And so on. There's a lot more on this over at dotnet/corefxlab#2350 (search for text API Usage Samples).

Honestly I've tried to avoid thinking about #934. The API feels very wrong to me, but I'm having trouble vocalizing why.

@GrabYourPitchforks
Copy link
Member Author

To your question re: MemoryExtensions.Split(..., StringSplitOptions), the discussion starting at https://youtu.be/dfzXrQCINHM?t=2948 gave the reason. The minimum viable product was to have no options at all, adding in the future if we get feedback that people need it.

@stephentoub
Copy link
Member

Calling (a, b, c, d, e) = span.Split(',') is essentially equivalent to passing maxValues: 5.

I'm not understanding what the API signature would be. You're saying you could write one method that would support any decomposition, dictated by the call site? How do you do that without explicitly adding a decompose method for each desired arity?

@layomia
Copy link
Contributor

layomia commented May 27, 2020

I don't see that API accepting StringSplitOptions. I assume that was intentional

The minimum viable product was to have no options at all, adding in the future if we get feedback that people need it.

This being said, I'll carry on with #295 to unblock existing scenarios, and we extend/add more API as needed.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants