Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] [Feature Request] Refactor String to not have so many raises and use Optional #2686

Closed
1 task done
martinvuyk opened this issue May 16, 2024 · 6 comments
Closed
1 task done
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label

Comments

@martinvuyk
Copy link

martinvuyk commented May 16, 2024

Review Mojo's priorities

What is your request?

As title.

What is your motivation for this change?

Could have a fn () raising -> Int and an fn () -> Optional[Int]

fn atol(str: String, base: Int = 10) raises -> Int:
    return _atol(str._strref_dangerous(), base)

fn atol_safe(str: String, base: Int = 10) -> Optional[Int]:
    try:
      return _atol(str._strref_dangerous(), base)(str, base)
    except:
      return None

This could just return a list with one item, it being the whole String. if not delimiter: return List[String](self)

    fn split(self, delimiter: String) raises -> List[String]:
        """Split the string by a delimiter.

        Args:
          delimiter: The string to split on.

        Returns:
          A List of Strings containing the input split by the delimiter.

        Raises:
          Error if an empty delimiter is specified.
        """
        if not delimiter:
            raise Error("empty delimiter not allowed to be passed to split.")

This will most certainly always be checked against anyway with if index == -1: ... so it could also be made an Optional[Int]

fn find(self, substr: String, start: Int = 0) -> Int:
        """Finds the offset of the first occurrence of `substr` starting at
        `start`. If not found, returns -1.

        Args:
          substr: The substring to find.
          start: The offset from which to find.

        Returns:
          The offset of `substr` relative to the beginning of the string.
        """

        return self._strref_dangerous().find(
            substr._strref_dangerous(), start=start
        )

Any other details?

No response

@martinvuyk martinvuyk added enhancement New feature or request mojo-repo Tag all issues with this label labels May 16, 2024
@martinvuyk martinvuyk changed the title [Feature Request] Refactor String to not have so many raises and use Optional [stdlib] [Feature Request] Refactor String to not have so many raises and use Optional May 16, 2024
@JoeLoser JoeLoser added mojo-stdlib Tag for issues related to standard library good first issue Good for newcomers labels May 17, 2024
@siitron
Copy link

siitron commented May 17, 2024

For Python interop split should default to split on any whitespace, and find should return -1 if the substring isn't found.

@martinvuyk
Copy link
Author

For Python interop split should default to split on any whitespace, and find should return -1 if the substring isn't found.

understood. how does this look ?

    fn split(self, owned delimiter: String = " ") -> List[String]:
        """Split the string by a delimiter.

        Args:
          delimiter: The string to split on.

        Returns:
          A List of Strings containing the input split by the delimiter.

        """
        if not delimiter:
            delimiter = " "
        ...

@siitron
Copy link

siitron commented May 17, 2024

Whitespace could be more than just spaces. I think all of these count as whitespace:

  • Spaces: ' '
  • Tabs: '\t'
  • Newlines: '\n'
  • Carriage returns: '\r'
  • Vertical tabs: '\v'
  • Form feeds: '\f'

Also, note that, for example, "a\n\na".split() isn't the same as "a\n\na".split("\n") in Python. The default split will treat continuous whitespace as one separator.

EDIT: There's also File Separator, Group Separator, Record Separator, & Unit Separator.

@martinvuyk
Copy link
Author

Whitespace could be more than just spaces. I think all of these count as whitespace:

wow I had no idea. The current implementation doesn't take that into account, will try to imitate python. But this will absolutely kill performance lol.

@bgreni
Copy link

bgreni commented May 17, 2024

@martinvuyk Sorry I didn't notice you already had a PR making changes to the String.split.

I've tried to address these differences and added a missing maxsplit arg in #2713

@martinvuyk
Copy link
Author

@martinvuyk Sorry I didn't notice you already had a PR making changes to the String.split.

I've tried to address these differences and added a missing maxsplit arg in #2713

please close that PR I'm currently working on that

also there are some characters missing still in String.WHITESPACE to have Python's behaviour which I'm also adressing. And I'm also trying to document this thoroughly and add more tests.

Also what do you think of putting the maxsplit in params? at least that way the compiled code wont have so many branches. Iit'll be less pythonic but I think python devs will manage (?).

    fn split[
        maxsplit: Int = -1
    ](self, owned delimiter: String = "") -> List[String]:
    ...

@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 20, 2024
modularbot added a commit that referenced this issue May 29, 2024
… behavior to be pythonic (#40714)

[External] [stdlib] Make `String.split()` default to whitespace & fix
behavior to be pythonic

Closes #2686
`String.split()` now defaults to whitespace and has pythonic behavior in
that it removes all adjacent whitespaces by default.

ORIGINAL_AUTHOR=martinvuyk
<110240700+martinvuyk@users.noreply.github.com>
PUBLIC_PR_LINK=#2711

---------

Co-authored-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com>
Closes #2711
MODULAR_ORIG_COMMIT_REV_ID: b6a05e97f8de09a2c77b272b6cd8b96da3c5c782
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

5 participants