Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Make String.split behaviour more like Python #2713

Closed
wants to merge 1 commit into from

Conversation

bgreni
Copy link

@bgreni bgreni commented May 17, 2024

Related to #2686 as I noticed our implementation of String.split doesn't behave like the Python version

  • Allow delimiter arg to be optional, default to all whitespace if is None
  • Implement the slightly distinct behaviour between passing an explicit delimiter and not doing so
  • Add maxsplit arg

@bgreni bgreni requested a review from a team as a code owner May 17, 2024 15:52
@bgreni bgreni changed the title Make String.split behaviour more like Python [stdlib] Make String.split behaviour more like Python May 17, 2024
return self.split(str(delimiter), maxsplit)

fn split(
self, delimiter: Optional[String] = None, maxsplit: Int = Int.MAX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having delimiter taking an Optional, we should probably use an overload.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the practical advantage be of doing it that way you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the implementation in the two branches to overloads.

fn split(self, maxsplit: Int = ...)
fn split(self, delimiter: String, maxsplit: Int = ...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I wasn't convinced at first but on second thought I think you're right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer fewer overloads in this case. Firstly, it wouldn't be necessary to define the default value for maxsplit twice. Secondly, a call to some_string.split(bar) could become ambiguous if bar is implicitly convertible to both String and Int. Given that we cannot make these arguments keyword-only (because the Python API dictates that these are positional-or-keyword passable), a single overload is the safest way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think it's actually the other way around: it doesn't make much sense to have this function taking an optional value.

Copy link

@siitron siitron May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have it take an Optional since some_string.split(None) is completely valid in Python.

Also, maxsplit should default to -1. (since some_string.split(maxsplit=-1) is the same as doing some_string.split())

Comment on lines 1337 to 1339
fn split(
self, delimiter: StringRef, maxsplit: Int = Int.MAX
) raises -> List[String]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this overload? I could understand having StringRef.split first, and maybe delegating to that?

Copy link
Author

@bgreni bgreni May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my testing implicit conversion between StringLiteral and Optional[String] doesn't work so any invocations like String.split(",") must become String.split(String(",")) without this.

- Allow delimiter arg to be optional, default to all whitespace if is None
- Implement the slightly distinct behaviour between passing
  an explicit delimiter and not doing so
- Add maxsplit arg

Signed-off-by: Brian Grenier <grenierb96@gmail.com>
@martinvuyk
Copy link

please close this PR since I'm working on the same issue in #2711 and I opened the issue itself #2686

assert_equal(
String("a b c d e").split(maxsplit=2).__str__(), "['a', 'b', 'c d e']"
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are two more edge cases, I don't think they'll pass at the moment:

Suggested change
assert_equal(String("a ").split(" ", 1).__str__(), "['a', '']")
assert_equal(String("a ").split(None, 1).__str__(), "['a']")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please stop reviewing this PR since I've solved these issues and others in #2711 and I opened the issue itself #2686

@bgreni bgreni closed this May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants