Feature request: variant of string.split(param maxsplit) that returns a (maxsplit+1)*string tuple #21226

Open
cassella opened this issue Dec 10, 2022 · 3 comments


@cassella
Contributor

Summary of Problem

I think it would be convenient to have a string/bytes split-like method that returns a tuple instead of being an iterator.

If there aren't actually maxsplit splits available in the string, the remainder of the returned tuple can be "".

I can't think how to use the method name split() for it and still disambiguate from the existing iterators. (Without adding a new named arg to use as a flag.)

A function like this is sketched out in #15606, but that issue is about error handling in casting the resulting tuple.
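As a minimal sketch of the requested behavior, assuming a hypothetical method name `splitAsTuple` (it is not part of Chapel's standard library), the existing `split()` iterator can be drained into a `(maxsplit+1)*string` tuple, with unused slots left as `""`:

```chapel
// Hypothetical helper for illustration only; the name and signature are assumptions.
proc string.splitAsTuple(sep: string, param maxsplit: int) where maxsplit >= 0 {
  // A tuple of strings is default-initialized to empty strings,
  // so any slots that are never assigned stay "".
  var result: (maxsplit+1)*string;
  var i = 0;
  // The existing iterator yields at most maxsplit+1 pieces.
  for piece in this.split(sep, maxsplit) {
    result[i] = piece;
    i += 1;
  }
  return result;
}

// Exactly maxsplit splits are available: the last element keeps the remainder.
const (a, b, rest) = "a,b,c,d".splitAsTuple(",", 2);
writeln(a, " | ", b, " | ", rest);

// Fewer splits than requested: the trailing element is "".
writeln("a,b".splitAsTuple(",", 2));
```

Because `maxsplit` is a `param`, the tuple size is known at compile time, which is what allows the result to be destructured directly.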

Associated Future Test(s):

None yet

@bradcray
Member

I generally like this idea. I've been using partition to do this for the "split into 2" case, but split feels more natural and it'd be nice to be able to do more than just two.
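For reference, the "split into 2" workaround with `partition` might look like the sketch below. The sample string and variable names are made up for illustration; `partition` returns a 3-tuple of the text before the separator, the separator itself, and the text after it.

```chapel
// Illustrative input; any "key=value" style string would do.
const line = "name=chapel=lang";

// partition() splits on the first occurrence of the separator only.
const (key, sep, value) = line.partition("=");
writeln(key);    // text before the first "="
writeln(value);  // everything after the first "="

// If the separator is absent, the whole string comes back first,
// followed by two empty strings.
const (whole, noSep, nothing) = "novalue".partition("=");
writeln(whole, " [", noSep, "] [", nothing, "]");
```

This covers only the two-way case, which is why a tuple-returning `split` with more than two results would be useful.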

To make sure I'm correctly anticipating your reluctance to simply overload: Is it that it would be too weird to have var a = myString.split(3) and var b = myString.split(myVar) generate different types for a and b? If so, I think I agree with that. (At first I wasn't sure whether we even supported iter and proc overloads, but it seems we do.)

Just throwing out a strawperson, what about:

proc string.splitToTuple(separator: string = " ", param numElems) { ... }   // or numStrings? (but that doesn't have a good bytes equivalent) or numResults?

An obvious downside to my choice is that it doesn't match maxsplit for the current signature... But somehow I feel that saying "Please give me n things back" is so much simpler than saying "Please cut this k times". Probably the existing precedent is too strong to change here, though (I'm assuming we took maxsplit from ... Python?).
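To make the "n things back" versus "k cuts" difference concrete, here is one possible sketch of the strawperson signature, built on repeated `partition` calls. The body is an assumption for illustration, not a proposed implementation; the point is that asking for `numElems` results corresponds to at most `numElems - 1` cuts.

```chapel
// Hypothetical body for the strawperson signature; not in the standard library.
proc string.splitToTuple(separator: string = " ", param numElems: int)
    where numElems > 0 {
  var result: numElems*string;  // elements default to ""
  var rest = this;
  // n results means at most n-1 cuts.
  for param i in 0..<numElems-1 {
    const (head, matched, tail) = rest.partition(separator);
    result[i] = head;
    rest = tail;  // becomes "" once the separator no longer occurs
  }
  // Whatever is left (possibly "") fills the last slot.
  result[numElems-1] = rest;
  return result;
}

writeln("a,b,c,d".splitToTuple(",", 3));  // three results, two cuts
writeln("a,b".splitToTuple(",", 3));      // too few pieces: last slot stays empty
```

Under this reading, `splitToTuple(sep, numElems=n)` would return the same pieces as a tuple-returning split with `maxsplit = n - 1`.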

@cassella
Contributor Author

> I generally like this idea. I've been using partition to do this for the "split into 2" case, but split feels more natural and it'd be nice to be able to do more than just two.

I figured you would, since you wrote it in that other issue. :)

> To make sure I'm correctly anticipating your reluctance to simply overloading: Is it that it would be too weird to have var a = myString.split(3) and var b = myString.split(myVar) generate different types for a and b? If so, I think I agree with that.

Yes. I didn't even consider that as an option due to its weirdness
and change in existing code.

> Just throwing out a strawperson, what about:
>
> proc string.splitToTuple(separator: string = " ",param numElems) { ... }   // or numStrings? (but that doesn't have a good bytes equivalent) or numResults?

That seems reasonable to me.

> An obvious downside to my choice is that it doesn't match maxsplit for the current signature... But somehow I feel that saying "Please give me n things back" is so much simpler than saying "Please cut this k times". Probably the existing precedent is too strong to change here, though (I'm assuming we took maxsplit from ... Python?).

It also seems too confusing to me to have the two versions have
different ways of specifying how many things to split into, though I
agree maxsplit isn't very intuitive. I see a maxsplit in Python's
split().

I'm still trying to figure out how to keep using the split() name,
since split() is what I reach for when I want this behavior. :)

Like

proc string.split(separator: string = " ", maxsplit = -1, param tupleMaxSplit = -1) where tupleMaxSplit > 0 {
  // Generate the tuple and ignore maxsplit even if given
}

or

proc string.split(separator: string = " ", param maxsplit, param makeTuple: bool) where makeTuple {
  // Generate the tuple according to maxsplit
}

Those feel kinda weird though. The latter seems less weird, but
would require specifying two arguments.

Maybe this is making it more complicated than it needs to be, but there
could be both a tupleMaxSplit version and a numTupleElems variant. I also think the word "tuple" should appear when this is called, so that in uses like

A = split(maxsplit=3);
B = split(tupleMaxSplit=3);

it's obvious that they're returning different types of things,
even to someone not familiar with the new functions. (Maybe that
argues for not having it be an overload of split().)

I don't remember, is there a way to make these new ones keyword-only
arguments? If there is, my first version above could drop the ignored
maxsplit = -1 as split(3) wouldn't conflict with it.

@bradcray
Member

Here's a different take on this which takes in the tuple type as an argument:

proc string.splitToTuple(sep: string, type t) where isTuple(t) {
  var stringArr = this.split(sep);
  var tup: t;
  for param i in 0..<tup.size {
    if i < stringArr.size {
      tup[i] = stringArr[i]: tup[i].type;
    }
  }
  return tup;
}

// destructure into a 3-tuple of strings
var t1 = "Hello, there, Roger.".splitToTuple(",", 3*string);
writeln(t1, ": ", t1.type:string);

// destructure into distinct types
var t2 = "Label, 42.5, 1001".splitToTuple(",", (string, real, int));
writeln(t2, ": ", t2.type:string);

// destructure into a too-big tuple (extra slots left unfilled)
var t3 = "Hello, there, Roger.".splitToTuple(",", 5*string);
writeln(t3, ": ", t3.type:string);

// destructure into a too-big tuple of distinct types (extra slots left unfilled)
var t4 = "Label, 42.5, 1001".splitToTuple(",", (string, real, int, bool, complex));
writeln(t4, ": ", t4.type:string);
