Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 15735 #4030

Merged
merged 1 commit into from
Apr 26, 2016
Merged

Fix issue 15735 #4030

merged 1 commit into from
Apr 26, 2016

Conversation

dcarp
Copy link
Contributor

@dcarp dcarp commented Feb 29, 2016

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
15735 std.algorithm.iteration.splitter returns empty range

@jmdavis
Copy link
Member

jmdavis commented Mar 1, 2016

Honestly, I'm inclined to argue that the documentation be changed, not the implementation. Changing the implementation risks breaking code, and I don't understand why the behavior that the documentation gives would even be desirable. If you have an empty range, why would splitting it result in anything but an empty range? What would be the point?

@dcarp
Copy link
Contributor Author

dcarp commented Mar 1, 2016

It is quite improbable that the change will break code. The existing code should be able to work with empty elements in result, or it should uses filter as suggested (which will still work with this PR).
In my case it was the opposite: I expected that the result contains at least one element (return the input if no split found) and crashed some months later.
I argue that splitter should return the empty result only if it would skip the empty elements, but this is not the case.

@jmdavis
Copy link
Member

jmdavis commented Mar 1, 2016

But how can you split an empty range? There's nothing to split! I don't see how splitting an empty range into more empty ranges even makes sense conceptually. I can understand running into problems when the documentation says one thing, and the code does another, but I don't understand why any code would ever be looking to have an empty range be split and end up with anything but an empty range. You end up with weird stuff like

foreach(e; splitter("", ';'))
{
    // this code gets executed.
}

I can't think of any case where we have a range-based algorithm that gives you a non-empty range from an empty range. That just seems like it's begging for trouble.

What use case do you have where it actually matters that you get a non-empty range of empty ranges from an empty range with splitter?

@dcarp
Copy link
Contributor Author

dcarp commented Mar 1, 2016

It is the same as with

foreach (e; "foo;bar;;baz".splitter(';'))
{
    // need to handle ""
}

if you don't want "" then use filter as in the documentation

foreach (e; "foo;bar;;baz".splitter(';').filter!(a => !a.empty))
{
    // no need to handle ""
}

Actually testing for empty in splitter is additional work. boost::algorithm::split, Python or Qt are also returning the empty element.

@dcarp
Copy link
Contributor Author

dcarp commented Mar 1, 2016

I can't think of any case where we have a range-based algorithm that gives you a non-empty range from an empty range. That just seems like it's begging for trouble.

splitter returns a range of ranges. It is not the same type.

What use case do you have where it actually matters that you get a non-empty range of empty ranges from an empty range with splitter?

result = foo.splitter(';');
writeln("%s", result);
writeln("%s delimiters found", result.length - 1);

@schuetzm
Copy link
Contributor

schuetzm commented Mar 6, 2016

@jmdavis I think this PR makes sense, because it actually makes the behaviour more consistent:

assert(equal(splitter("abc", ' '), ["abc"]));
assert(equal(splitter("ab",  ' '), ["ab" ]));
assert(equal(splitter("a",   ' '), ["a"  ]));
assert(equal(splitter("",    ' '), [""   ])); // before this PR: []

@wilzbach
Copy link
Member

boost::algorithm::split, Python or Qt are also returning the empty element.

I am not sure whether I can follow, here's what happens in Python:

" ".split() // []
"".split() // []

And that's what I expected - an array/range with zero elements and it works nicely with loops!

If you have an empty range, why would splitting it result in anything but an empty range? What would be the point?

+1 @jmdavis - maybe the docs should be updated then?

@dcarp
Copy link
Contributor Author

dcarp commented Mar 12, 2016

@greenify try the same in python, but specify the separator

" ".split(" ") # ['', '']
"".split(" ") # ['']

What you found corresponds to the 4th splitter overload, the one with no separator specified. It joins the whitespace separators and returns, as in python, the empty range for the empty input. This overload is correct documented and is not affected by this PR.

@dcarp dcarp mentioned this pull request Mar 19, 2016
@schveiguy
Copy link
Member

Agree with @dcarp and @schuetzm

assert(";;".splitter(";").equal(["", "", ""]));
assert(";" .splitter(";").equal(["", ""]));
assert(""  .splitter(";").equal([""])); // fails without PR

This is terribly inconsistent, and requires extra user code to handle this case.

Noting: even with this PR, "".splitter is empty. This is inconsistent, but changing would require some extra machinery. IMO, we should change this too.

Regarding deprecation, I'm trying to imagine that there's code that uses splitter that would fail if this now becomes non-empty. It seems to me, this is simply a special case that will no longer trigger, no?

@dcarp
Copy link
Contributor Author

dcarp commented Mar 19, 2016

@schveiguy foo.splitter joins the white-spaces and in this case an empty token is not expected. Here the implementation is consistent with the documentation. I would not change this.

The consistency check looks like this:

assert(" a b c".splitter.equal(["a", "b", "c"]);
assert(" a b"  .splitter.equal(["a", "b"]);
assert(" a"    .splitter.equal(["a"]);
assert(""      .splitter.equal([]);

Regarding the deprecation of the first cases, you are right. The special case will not trigger, and the empty element will be handled as a empty element in the middle of the result.

@schveiguy
Copy link
Member

in this case an empty token is not expected

right, that makes sense. Thanks.

@dcarp
Copy link
Contributor Author

dcarp commented Mar 28, 2016

Other objections regarding this PR? I think we should bring the documentation in sync with the implementation ASAP.

@wilzbach
Copy link
Member

@greenify try the same in python, but specify the separator

Thanks, but at least in Python the behavior is different to the one you mentioned:

'  '.split(' ') \\ ['', '', '']
' '.split(' ') \\ ['', '']
''.split(' ') \\ ['']

Maybe you want to add this to the unittest ;-)

@dcarp
Copy link
Contributor Author

dcarp commented Mar 28, 2016

@greenify isn't it the test on line 3734-3735?

    compare("", [""]);
    compare(" ", ["", ""]);

@wilzbach
Copy link
Member

@dcarp sorry - my fault.

@DmitryOlshansky
Copy link
Member

Status of this? I think it makes sense. Going to merge?

@schveiguy
Copy link
Member

@jmdavis can you think of a situation where this will break code? I think any code that processes splitter has to handle the case that there is an empty element anyway. The special situation for there being an empty range would have to just be extra code that isn't triggered. But it shouldn't actually break the code.

@jondegenhardt
Copy link
Contributor

FWIW: Just ran into this. Situation: reading text lines, splitting on a delimiter. Adjacent delimiters form an empty string. A line with a single delimiter becomes two strings. A empty line forms a single, empty string. In-line with the arguments about "consistent behavior". Using strings for each line, and '#' as delimiter, the logical interpretation is:

"abc#def" => ["abc", "def"]
"abc#" => ["abc", ""]
"#abc" => ["", "abc"]
"#" => ["", ""]
"abc" => ["abc"]
"" => [""]    

The last case doesn't work at present, but should be fixed by the pull request. The obvious work-around I'll have to write (check for an empty range) will continue to work after the pull request.

@dcarp
Copy link
Contributor Author

dcarp commented Apr 25, 2016

Maybe somebody can merge this PR. It seams that there are no concerns that were not addressed.

@Hackerpilot
Copy link
Member

Auto-merge toggled on

@Hackerpilot Hackerpilot merged commit edbb7a1 into dlang:master Apr 26, 2016
@dcarp dcarp deleted the issue15735 branch April 26, 2016 04:59
@CyberShadow
Copy link
Member

This pull request introduced a regression:

https://issues.dlang.org/show_bug.cgi?id=16587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants