monarchdodra
Collaborator

This fixes:

  • Issue 6730 - std.algorithm.splitter conflicts with std.array.splitter
  • Issue 6791 - std.algorithm.splitter random indexes utf strings
  • Issue 8013 - splitter() and split() give different results

It also "unlocks" splitter!pred for forward ranges, and fixes the splitting behavior of splitter!pred, which was simply wrong. That, in turn, fixes std.array.splitter.


FYI, this is a toned-down version of a failed pull I had opened 9 months ago: #934.

This fix can be reviewed in 3 parts:
1: A complete splitter!pred rewrite

I did this because splitter!pred was completely inconsistent with the rest of the splitter functions:

  • leading delimiters were kept, but trailing delimiters were omitted
  • empty tokens were kept, but runs of multiple empty tokens were merged into a single token.

The new implementation has the exact same behavior as the other splitters. Also, the new implementation works on forward ranges. The new implementation is arguably simpler, even though it handles more types.
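
As a rough illustration of the consistency described here, a minimal sketch, assuming a Phobos release that includes this rewrite. The input array and the expected tokens are made up for the example; the point is only that the element form and the predicate form are expected to keep leading, trailing, and consecutive empty tokens in the same way.

```d
import std.algorithm : equal, splitter;

void main()
{
    // Hypothetical input: leading, doubled, and trailing separators (0).
    int[] a = [0, 1, 2, 0, 0, 3, 0];
    int[][] expected = [[], [1, 2], [], [3], []];

    // Element-based splitting keeps every empty token.
    assert(equal(splitter(a, 0), expected));

    // Predicate-based splitting is expected to produce the same tokens.
    assert(equal(splitter!(x => x == 0)(a), expected));
}
```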

2: This leads to a std.array.splitter(string) rewrite:

std.array.split has a string-specific overload that takes no terminator: this special case takes a string, splits it on Unicode whitespace, and returns no empty tokens. Unfortunately, std.array.splitter(string) simply returned std.algorithm.splitter!(std.uni.isWhite), which keeps empty tokens. This meant that split and splitter did not have the same output, which is unacceptable. Now they do.
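
The difference between the two behaviors can be sketched as follows. This is an illustration only: the input strings are made up, and the lambda wrapping std.uni.isWhite stands in for the old forwarding implementation rather than reproducing it.

```d
import std.algorithm : equal, splitter;
import std.array : split;
import std.uni : isWhite;

void main()
{
    // split() with no separator: whitespace-delimited words, no empty tokens.
    assert("  hello   world ".split() == ["hello", "world"]);
    assert(" \t\n".split().length == 0);

    // A plain predicate split on isWhite keeps the empty token between
    // the two consecutive spaces, so it is not equivalent to split().
    assert(equal("a  b".splitter!(c => isWhite(c))(), ["a", "", "b"]));
}
```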

3: Deprecate std.algorithm.splitter(string)

This string-specific function has no reason to be in algorithm. The only thing it does is create ambiguity for users who import both algorithm and string.

(4: Also included: std.array.split!pred, which was missing for some strange reason)

@ghost

ghost commented Aug 22, 2013

Fixes Issue 6730 - std.algorithm.splitter conflicts with std.array.splitter

Just about the best thing I've read all day!

static if (fullSlicing)
    return _input[0 .. _end];
else
    return _input.take(_end);

Shouldn't this be _end - 1 to match the full slicing behavior?

Collaborator Author

There are n elements in [0 .. n], so ... not really. Ideally, I could just use return _input.take(_end); for all code paths, as take knows how to slice... except for strings. Also, I need to change that code to return _input.save.takeExactly(_end);
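
A minimal sketch of the counting argument, using a made-up array and a local `end` variable standing in for `_end` (both are assumptions for illustration, not code from the pull):

```d
import std.algorithm : equal;
import std.range : take, takeExactly;

void main()
{
    int[] input = [10, 20, 30, 40];
    size_t end = 2; // hypothetical value standing in for _end

    // The half-open slice [0 .. end] holds exactly `end` elements.
    assert(input[0 .. end].length == end);

    // take(end) and takeExactly(end) yield the same `end` elements.
    assert(equal(input.take(end), input[0 .. end]));
    assert(equal(input.takeExactly(end), [10, 20]));
}
```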

Right, forget I said anything, I misread the code.

@monarchdodra
Collaborator Author

Fixes Issue 6730 - std.algorithm.splitter conflicts with std.array.splitter

Just about the best thing I've read all day!

Don't get your hopes up too high yet. I just marked one function as deprecated, so it's not fully fixed until it is removed.

if (_next.empty)
{
    _input = _next;
    _end = _end.max;

OT: I know you didn't introduce this, but I don't like the usage of .max on a variable of a non-UDT. size_t.max or typeof(_end).max could work, but just .max really confused me here (since .max is really the property of the type and not the variable).

Anyway you can leave it in, no big deal.
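
For readers unfamiliar with the point, a small sketch (the variable name `end` is hypothetical): `.max` on a variable of a built-in type resolves to the property of its type, so all three spellings give the same value and the variable itself is untouched.

```d
void main()
{
    size_t end = 5;

    // .max on the variable is the type's property, not something derived
    // from the variable's current value.
    assert(end.max == size_t.max);

    // The more explicit spellings suggested in the review.
    assert(typeof(end).max == size_t.max);

    // Reading .max does not modify the variable.
    assert(end == 5);
}
```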

Collaborator Author

Yeah, sure, I'll fix it. No problem.

@ghost

ghost commented Oct 24, 2013

This will need a rebase. Otherwise LGTM. There's a semantic change here (a good one I believe), so I'll need some input from others on whether it's ok to move forward. @jmdavis @andralex.

@monarchdodra
Collaborator Author

All points addressed.

@jmdavis
Member

jmdavis commented Nov 9, 2013

@monarchdodra Can you please add some tests which specifically verify that split's behavior and splitter's behavior match? That should reduce the odds of the two getting out of sync again and highlights the fact that they're supposed to act the same.

Other than that, I think that this is ready to merge. I think that these changes are sorely needed as splitter is highly broken without this.
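
A sketch of what such a consistency test could look like, assuming a Phobos version in which split(range, elem) and split!pred exist. The inputs are invented for illustration, and this is not the test that was actually added to the pull.

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    // Hypothetical inputs, including the empty range and edge separators.
    int[][] inputs = [[], [0], [1, 0, 2], [0, 1, 0, 0, 2, 0]];
    foreach (input; inputs)
    {
        // Element separator.
        assert(equal(input.splitter(0), input.split(0)));
        // Range separator.
        assert(equal(input.splitter([0]), input.split([0])));
        // Predicate.
        assert(equal(input.splitter!(x => x == 0)(),
                     input.split!(x => x == 0)()));
    }
}
```

The test only checks that the lazy and eager forms agree; it deliberately does not pin down what the tokens are for each input.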

@monarchdodra
Collaborator Author

Can you please add some tests which specifically verify that split's behavior and splitter's behavior match.

Nice, I was able to catch that there is actually no split(range, elem), so I added that.

I also found a small pre-existing discrepancy, though, when splitting an empty range:

import std.stdio, std.algorithm;

void main()
{
    int[] input;
    writefln("%s: %s", `splitter(input, 0)`, splitter(input, 0));
    writefln("%s: %s", `splitter(input, [0])`, splitter(input, [0]));
    writefln("%s: %s", `splitter!"a == 0"(input)`, splitter!"a == 0"(input));
}

Produces:

splitter(input, 0): [[]]
splitter(input, [0]): []
splitter!"a == 0"(input): []

This seems wrong to me. I don't think splitting an empty range with an element should create any tokens at all. In particular, while I could accept a difference in behavior between elem/range splitting, I do not accept a difference between elem/pred. The buggy behavior is in the elem split, right?

I'll see if I can easily fix it here, but if not, I'll leave it as is, and deal with it later (I was planning to make some fixup changes to the other splitter in another pull).

@jmdavis
Member

jmdavis commented Nov 13, 2013

@monarchdodra Splitting on an empty range should definitely result in an empty range. I actually had to switch to using split in some of my code recently, precisely because splitter currently does this wrong (and I'd much prefer to be using splitter, because I don't want the extra allocation that split incurs). And I see no reason for splitting on an element or a range to be any different. Conceptually, I see no difference between them, and I see no reason why it would ever make sense for splitting nothing to give you something - regardless of what you're trying to split that nothing on.

@monarchdodra
Collaborator Author

@jmdavis : Fixed. Good for final review.

//@@@6730@@@ This exists already in std.array, so this declaration, at best, will only create ambiguity.
//unfortunatly, an alias will conflict with the existing splitter in std.algorithm.
//It needs to be removed.
deprecated("std.algorithm.splitter(string) is deprecated in favor of std.algortihm.splitter(string)")
Member

Just a quick nitpick. You misspelled "algorithm" in the message (you also misspelled "unfortunately" in the comment, but the deprecation message actually matters, since that'll end up in the deprecation warning). Also, this says that you're deprecating std.algorithm.splitter in favor of... itself. So, presumably, the misspelled algorithm is actually supposed to be array.

Collaborator Author

Strange, I remember fixing that already...

jmdavis added a commit that referenced this pull request Nov 18, 2013
Fix splitter!pred and splitter(string)
@jmdavis jmdavis merged commit d408470 into dlang:master Nov 18, 2013
@monarchdodra monarchdodra deleted the splitterPred branch December 19, 2013 18:46
@andralex
Member

This seems to have caused https://d.puremagic.com/issues/show_bug.cgi?id=11701. What's the problem with just leaving a forwarding function in std.algorithm?

@monarchdodra
Collaborator Author

It's ambiguous. The idea is to remove the deprecated function ASAP, at which point the problem will resolve itself.

@monarchdodra
Collaborator Author

I'll give a thoroughly detailed explanation and motivation about this tomorrow. Sleep time :)
