Fix issue 15735 #4030
Honestly, I'm inclined to argue that the documentation be changed, not the implementation. Changing the implementation risks breaking code, and I don't understand why the behavior that the documentation gives would even be desirable. If you have an empty range, why would splitting it result in anything but an empty range? What would be the point? |
|
It is quite unlikely that the change will break code. Existing code should already be able to handle empty elements in the result, or it should use |
|
But how can you split an empty range? There's nothing to split! I don't see how splitting an empty range into more empty ranges even makes sense conceptually. I can understand running into problems when the documentation says one thing, and the code does another, but I don't understand why any code would ever be looking to have an empty range be split and end up with anything but an empty range. You end up with weird stuff like I can't think of any case where we have a range-based algorithm that gives you a non-empty range from an empty range. That just seems like it's begging for trouble. What use case do you have where it actually matters that you get a non-empty range of empty ranges from an empty range with |
|
It is the same as with
    foreach (e; "foo;bar;;baz".splitter(';'))
    {
        // need to handle ""
    }
if you don't want
    foreach (e; "foo;bar;;baz".splitter(';').filter!(a => !a.empty))
    {
        // no need to handle ""
    }
Actually testing for |
    auto result = foo.splitter(';');
    writefln("%s", result);
    // walkLength (std.range) is used because the splitter result has no .length
    writefln("%s delimiters found", result.walkLength - 1); |
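For comparison, here is a minimal Python sketch of the same two styles. It is an analogy only: it assumes that Python's `str.split` with an explicit separator mirrors the separator-based `splitter` discussed here, and the names `fields` and `non_empty` are illustrative.

```python
line = "foo;bar;;baz"

# With an explicit separator, adjacent delimiters yield an empty field,
# so the consumer has to be prepared to see "".
fields = line.split(";")

# Dropping empty fields up front, roughly what filter!(a => !a.empty) does in D.
non_empty = [f for f in fields if f]

print(fields)
print(non_empty)
```

Either way the caller decides explicitly whether empty fields matter, which is the point being made about `splitter`.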
|
@jmdavis I think this PR makes sense, because it actually makes the behaviour more consistent:
    assert(equal(splitter("abc", ' '), ["abc"]));
    assert(equal(splitter("ab", ' '), ["ab" ]));
    assert(equal(splitter("a", ' '), ["a" ]));
    assert(equal(splitter("", ' '), ["" ])); // before this PR: [] |
I am not sure whether I can follow, here's what happens in Python: And that's what I expected - an array/range with zero elements and it works nicely with loops!
+1 @jmdavis - maybe the docs should be updated then? |
|
@greenify try the same in Python, but specify the separator:
    " ".split(" ") # ['', '']
    "".split(" ") # ['']
What you found corresponds to the 4th splitter overload, the one with no separator specified. It joins the whitespace separators and returns, as in Python, the empty range for the empty input. This overload is correctly documented and is not affected by this PR. |
|
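The difference between the two Python call forms mentioned above can be checked side by side. This is a small sketch using only the built-in `str.split`; the sample inputs and the names `with_sep` and `without_sep` are illustrative.

```python
samples = ("a b", " ", "")

# Explicit separator: every separator delimits a field, so empty fields survive
# and the empty string still produces one (empty) field.
with_sep = [s.split(" ") for s in samples]

# No separator: runs of whitespace are collapsed, and empty or all-whitespace
# input produces no fields at all.
without_sep = [s.split() for s in samples]

for s, a, b in zip(samples, with_sep, without_sep):
    print(repr(s), a, b)
```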
Agree with @dcarp and @schuetzm
    assert(";;".splitter(";").equal(["", "", ""]));
    assert(";" .splitter(";").equal(["", ""]));
    assert("" .splitter(";").equal([""])); // fails without PR
This is terribly inconsistent, and requires extra user code to handle this case. Noting: even with this PR, Regarding deprecation, I'm trying to imagine that there's code that uses |
|
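The consistency argument in the asserts above amounts to an invariant: splitting on a separator yields exactly one more field than there are separators, including when there are none. The following Python sketch models that rule. `split_on` is a hypothetical helper written for illustration, not the Phobos implementation.

```python
def split_on(text, sep):
    """Illustrative model: always yields text.count(sep) + 1 fields."""
    if not sep:
        raise ValueError("separator must be non-empty")
    fields = []
    start = 0
    while True:
        idx = text.find(sep, start)
        if idx == -1:
            # No further separator: the remainder (possibly empty) is the last field.
            fields.append(text[start:])
            return fields
        fields.append(text[start:idx])
        start = idx + len(sep)

for s in (";;", ";", ""):
    print(repr(s), split_on(s, ";"))
```

Under this model the empty input is not a special case: zero separators give one field, which happens to be empty.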
@schveiguy The consistency check looks like this:
    assert(" a b c".splitter.equal(["a", "b", "c"]));
    assert(" a b" .splitter.equal(["a", "b"]));
    assert(" a" .splitter.equal(["a"]));
    assert("" .splitter.equal([]));
Regarding the deprecation of the first cases, you are right. The special case will not trigger, and the empty element will be handled as an empty element in the middle of the result. |
right, that makes sense. Thanks. |
|
Other objections regarding this PR? I think we should bring the documentation in sync with the implementation ASAP. |
Thanks, but at least in Python the behavior is different to the one you mentioned: Maybe you want to add this to the unittest ;-) |
|
@greenify isn't it the test on lines 3734-3735?
    compare("", [""]);
    compare(" ", ["", ""]); |
|
@dcarp sorry - my fault. |
|
Status of this? I think it makes sense. Going to merge? |
|
@jmdavis can you think of a situation where this will break code? I think any code that processes splitter has to handle the case that there is an empty element anyway. The special situation for there being an empty range would have to just be extra code that isn't triggered. But it shouldn't actually break the code. |
|
FWIW: Just ran into this. Situation: reading text lines, splitting on a delimiter. Adjacent delimiters form an empty string. A line with a single delimiter becomes two strings. An empty line forms a single, empty string. In line with the arguments about "consistent behavior". Using strings for each line, and '#' as delimiter, the logical interpretation is: The last case doesn't work at present, but should be fixed by the pull request. The obvious work-around I'll have to write (check for an empty range) will continue to work after the pull request. |
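The workaround described above can be sketched in Python. Everything here is an assumption made for illustration: `split_pre_fix` is a hypothetical model of the reported behavior (empty input yields no fields), `fields_of` is the caller-side workaround, and the sample lines are invented rather than taken from the comment.

```python
def split_pre_fix(line, delim="#"):
    """Hypothetical model of the reported behavior: empty input yields no fields."""
    return line.split(delim) if line else []

def fields_of(line, delim="#"):
    """Workaround: treat an empty result as a single empty field."""
    fields = split_pre_fix(line, delim)
    return fields if fields else [""]

for line in ("a#b", "a##b", "#", ""):
    print(repr(line), "->", fields_of(line))
```

If the underlying split already returned one empty field for an empty line, the check in `fields_of` would simply never trigger, which matches the observation that the workaround keeps working after the change.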
|
Maybe somebody can merge this PR. It seems that all concerns have been addressed. |
|
Auto-merge toggled on |
|
This pull request introduced a regression: |
https://issues.dlang.org/show_bug.cgi?id=15735