-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Make String.split behaviour more like Python #2713
Conversation
stdlib/src/builtin/string.mojo
Outdated
return self.split(str(delimiter), maxsplit) | ||
|
||
fn split( | ||
self, delimiter: Optional[String] = None, maxsplit: Int = Int.MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having delimiter
taking an Optional
, we should probably use an overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the practical advantage be of doing it that way you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the implementation in the two branches to overloads.
fn split(self, maxsplit: Int = ...)
fn split(self, delimiter: String, maxsplit: Int = ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I wasn't convinced at first but on second thought I think you're right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer fewer overloads in this case. Firstly, it wouldn't be necessary to define the default value for maxsplit
twice. Secondly, a call to some_string.split(bar)
could become ambiguous if bar
is implicitly convertible to both String
and Int
. Given that we cannot make these arguments keyword-only (because the Python API dictates that these are positional-or-keyword passable), a single overload is the safest way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think it's actually the other way around: it doesn't make much sense to have this function taking an optional value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have it take an Optional since some_string.split(None)
is completely valid in Python.
Also, maxsplit
should default to -1
. (since some_string.split(maxsplit=-1)
is the same as doing some_string.split()
)
stdlib/src/builtin/string.mojo
Outdated
fn split( | ||
self, delimiter: StringRef, maxsplit: Int = Int.MAX | ||
) raises -> List[String]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this overload? I could understand having StringRef.split
first, and maybe delegating to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my testing implicit conversion between StringLiteral
and Optional[String]
doesn't work so any invocations like String.split(",")
must become String.split(String(","))
without this.
- Allow delimiter arg to be optional, default to all whitespace if is None - Implement the slightly distinct behaviour between passing an explicit delimiter and not doing so - Add maxsplit arg Signed-off-by: Brian Grenier <grenierb96@gmail.com>
assert_equal( | ||
String("a b c d e").split(maxsplit=2).__str__(), "['a', 'b', 'c d e']" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are two more edge cases, I don't think they'll pass at the moment:
assert_equal(String("a ").split(" ", 1).__str__(), "['a', '']") | |
assert_equal(String("a ").split(None, 1).__str__(), "['a']") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #2686 as I noticed our implementation of
String.split
doesn't behave like the Python version