
String maxsplit #10385

Open
dlangBugzillaToGithub opened this issue Aug 31, 2019 · 6 comments

Comments

@dlangBugzillaToGithub

srpen6 reported this on 2019-08-31T16:17:43Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=20184

CC List

  • dlang (@Vild)
  • jrdemail2000-dlang
  • sascha.orlov

Description

D seems to have no way to limit the number of splits done on a string. This is
possible with Go:

    strings.SplitN("one two three", " ", 2)

also Nim:

    "one two three".split(maxsplit = 1)

also Python:

    'one two three'.split(maxsplit = 1)

also PHP:

    explode(' ', 'one two three', 2);

also Ruby:

    'one two three'.split(nil, 2)
@dlangBugzillaToGithub

jrdemail2000-dlang commented on 2019-09-01T20:16:52Z

This can be achieved using 'splitter' and 'take' or another range iteration algorithm that limits the number of candidates selected.

e.g.

    import std.algorithm : equal, splitter;
    import std.range : take;

    assert("a|bc|def".splitter('|').take(4).equal([ "a", "bc", "def" ]));
    assert("a|bc|def".splitter('|').take(3).equal([ "a", "bc", "def" ]));
    assert("a|bc|def".splitter('|').take(2).equal([ "a", "bc" ]));
    assert("a|bc|def".splitter('|').take(1).equal([ "a" ]));

'splitter' (from std.algorithm) is a lazy version of 'split' (from std.array), which is eager. It produces a range that computes each piece on demand. 'take' (from std.range) takes the first N elements from an input range, and it is also lazy. To convert the result to a fully realized array similar to what 'split' returns, use 'array' (from std.array) or another eager range algorithm, e.g.

    import std.algorithm : splitter;
    import std.array : array;
    import std.range : take;
    auto x = "a|bc|def".splitter('|').take(2).array;
    assert(x.length == 2);
    assert(x[0] == "a");
    assert(x[1] == "bc");
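To make the eager/lazy distinction concrete, here is a small editorial sketch (not part of the original comment) comparing std.array's eager split with splitter followed by array; the variable names are arbitrary.

```d
import std.algorithm : splitter;
import std.array : array, split;
import std.range : take;

void main()
{
    auto s = "a|bc|def";

    // Eager: std.array.split allocates and returns a string[] immediately.
    string[] eager = s.split('|');

    // Lazy: splitter yields pieces on demand; array() materializes them.
    auto lazyPieces = s.splitter('|');
    assert(lazyPieces.array == eager);

    // take is lazy as well: array() stops after collecting two pieces,
    // and everything after "bc" is dropped from the result.
    assert(s.splitter('|').take(2).array == ["a", "bc"]);
}
```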

@dlangBugzillaToGithub

srpen6 commented on 2019-09-01T20:24:46Z

(In reply to Jon Degenhardt from comment #1)
> This can be achieved using 'splitter' and 'take' or another range iteration
> algorithm that limits the number of candidates selected.
> 
> e.g.
> 
> assert("a|bc|def".splitter('|').take(4).equal([ "a", "bc", "def" ]));
> assert("a|bc|def".splitter('|').take(3).equal([ "a", "bc", "def" ]));
> assert("a|bc|def".splitter('|').take(2).equal([ "a", "bc" ]));

It seems you have a profound misunderstanding of what split limiting is. Here is
the result with Python:

    >>> 'one two three'.split(maxsplit = 1)
    ['one', 'two three']

As you can see, it doesn't discard any part of the original input; instead, it
stops splitting after the specified number of splits and puts the rest of the
string in the final element.
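A minimal D sketch of the Python behavior described above, assuming an explicit, non-empty string separator (Python's default of splitting on runs of whitespace is not reproduced). The function name splitMax is hypothetical and not part of Phobos; it repeatedly applies findSplit from std.algorithm.searching and keeps whatever is left as the last element.

```d
import std.algorithm.searching : findSplit;
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): split `s` on `sep` at most
// `maxsplit` times and keep the unsplit remainder as the last element,
// like Python's str.split(sep, maxsplit) with an explicit separator.
// Assumes `sep` is non-empty.
string[] splitMax(string s, string sep, size_t maxsplit)
{
    string[] result;
    foreach (i; 0 .. maxsplit)
    {
        // findSplit returns (before, separator, after) and converts to
        // false when `sep` does not occur in `s`.
        auto parts = s.findSplit(sep);
        if (!parts)
            break;
        result ~= parts[0];
        s = parts[2];
    }
    result ~= s; // the rest of the input is kept, not discarded
    return result;
}

void main()
{
    writeln("one two three".splitMax(" ", 1)); // ["one", "two three"]
    writeln("one two three".splitMax(" ", 5)); // ["one", "two", "three"]
}
```

Unlike splitter/take, nothing is lost when the limit is reached; with maxsplit = 0 the whole input comes back as a single element, as in Python.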

@dlangBugzillaToGithub

jrdemail2000-dlang commented on 2019-09-01T21:43:11Z

(In reply to svnpenn from comment #2)
> (In reply to Jon Degenhardt from comment #1)
> Here is a result with Python:
> 
>     >>> 'one two three'.split(maxsplit = 1)
>     ['one', 'two three']
> 
> as you can see, it doesn't discard any part of the original input, instead it
> stops splitting after the specified amount, and puts the rest of the string
> as the final element.

Thanks for clarifying what you are looking for. This is a useful refinement of the original description, which is:

> D seems to have no way to limit the number of splits done on a string.

D does have a way to limit the number of splits, but as you point out, this mechanism doesn't preserve the remainder of the string in the fashion available in a number of other libraries.

@dlangBugzillaToGithub

sascha.orlov commented on 2019-09-02T06:56:02Z

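As a workaround, this is possible:

```d
import std;

void main()
{
    "one two three four".fun1(1).writeln; // ["one", "two three four"]
    "one two three four".fun2(2).writeln; // ["one", "two", "three four"]
}

// Both helpers assume a single-character separator and that num is smaller
// than the number of fields; otherwise the remainder slice goes out of bounds.
auto fun1(string s, size_t num)
{
    size_t summe; // characters consumed so far, one separator per piece
    auto r = s.splitter(' ').take(num).tee!(a => summe += a.length + 1).array;
    return r ~ s[summe .. $]; // append the untouched remainder
}

auto fun2(string s, size_t num)
{
    auto i = s.splitter(' ').take(num);
    // consumed length = total length of the taken pieces + one separator each
    return i.array ~ s[i.map!(el => el.length).sum + num .. $];
}
```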

If the splitter construct allowed public access to its underlying range, more convenient solutions would be possible.

@dlangBugzillaToGithub

srpen6 commented on 2019-09-07T19:33:29Z

Here is a better workaround:

    import std.format, std.stdio;
    void main() {
       string s1 = "one two three", s2, s3;
       s1.formattedRead("%s %s", s2, s3);
       writeln(s2);
       writeln(s3);
    }
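For the single-split case, findSplit from std.algorithm.searching is another option that needs no format string: it returns the part before the first separator, the separator itself, and the part after it, and the result converts to false when the separator is not found. A sketch:

```d
import std.algorithm.searching : findSplit;
import std.stdio : writeln;

void main()
{
    string s1 = "one two three";

    // Split once, at the first space; everything after it is kept intact.
    if (auto parts = s1.findSplit(" "))
    {
        writeln(parts[0]); // one
        writeln(parts[2]); // two three
    }
    else
    {
        writeln("no separator found");
    }
}
```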

@dlangBugzillaToGithub

dlang (@Vild) commented on 2019-09-19T18:55:04Z

I've had a look at this. I don't think it's feasible to add another parameter "maxsplit" to split. Internally, split uses splitter, and splitter works with BidirectionalRange. With a split limit counted from the front, the position of the last piece depends on how many separators precede it, so to implement back, splitter would have to go through all elements from the front to find the correct breakpoint. That breaks laziness, which in my eyes is not desirable.

Therefore I think it would be better to implement separate functions splitN and splitterN. splitterN would then be restricted to ForwardRange.
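A rough sketch of what a lazy splitterN could look like, assuming string input and a non-empty string separator only. The names SplitterN and splitterN are hypothetical; this is not a Phobos API. It is a forward range with no back/popBack, in line with the proposal above, and each popFront performs at most one findSplit, so the input is not scanned ahead of time.

```d
import std.algorithm.searching : findSplit;
import std.array : array;
import std.range.primitives : isForwardRange;
import std.stdio : writeln;

// Hypothetical lazy "splitterN" (not in Phobos). Yields at most
// maxsplit + 1 pieces; the last piece is the unsplit remainder.
struct SplitterN
{
    private string rest;       // input not yet split
    private string sep;        // separator to split on
    private size_t splitsLeft; // how many more splits are allowed
    private string current;    // the current front piece
    private bool isLast;       // current is the final (remainder) piece
    private bool isEmpty;      // set once the final piece has been popped

    this(string input, string sep, size_t maxsplit)
    {
        rest = input;
        this.sep = sep;
        splitsLeft = maxsplit;
        advance();
    }

    // Compute the next front piece, performing at most one split.
    private void advance()
    {
        if (splitsLeft > 0)
        {
            if (auto parts = rest.findSplit(sep))
            {
                current = parts[0];
                rest = parts[2];
                --splitsLeft;
                return;
            }
        }
        // No splits left, or separator not found: emit the remainder as is.
        current = rest;
        rest = null;
        isLast = true;
    }

    @property bool empty() { return isEmpty; }
    @property string front() { return current; }

    void popFront()
    {
        if (isLast)
            isEmpty = true;
        else
            advance();
    }

    // All fields are values or immutable slices, so a plain copy is a valid save.
    @property SplitterN save() { return this; }
}

static assert(isForwardRange!SplitterN);

SplitterN splitterN(string input, string sep, size_t maxsplit)
{
    return SplitterN(input, sep, maxsplit);
}

void main()
{
    writeln(splitterN("one two three four", " ", 2).array);
    // ["one", "two", "three four"]
}
```

An eager splitN could then simply be splitterN(...).array.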

LightBender removed the P4 label on Dec 6, 2024