Implement groupBy. #1453

Merged
merged 4 commits into from
Oct 9, 2014

Conversation

quickfur
Member

@quickfur quickfur commented Aug 5, 2013

This is the preliminary implementation of chunkBy taken from my calendar example program. Several people have expressed interest in including it in Phobos, so here it is.

It probably needs some cleanups, though; refInputRange, for example, should be put in a common place so that other Phobos unittests can use it. And we should probably create analogues for forward/bidi/random-access ranges too. (Unless these wrappers are already present somewhere and I'm just ignorant of it?)

* Returns: A range of ranges in which all elements in a given subrange share
* the same attribute with each other.
*/
auto chunkBy(alias attrFun, Range)(Range r)

I think in previous pulls we've sort of agreed to not introduce any more functions with string-based lambda parameters.

Member Author

This is the second time I've heard that. What's the rationale for this? I haven't been keeping up with phobos pulls.

Member

With the new lambda syntax there was growing support for completely replacing string lambdas with the new syntax. I got ahold of Andrei and asked him whether std.algorithm and std.range should continue to use them in their documentation. He got back to me at some point (presumably after discussing with Walter) and gave me the go ahead to switch the documentation examples over to the new style of lambda (while retaining support for string lambdas for backwards compatibility). This resulted in #707 which didn't get pulled (well, it did but got reverted because of a dmd bug) and then bitrot but the decision to retire string lambdas in phobos has been made by Andrei and Walter.

I think the discussion in #707 is probably the extent of the public discussion about no longer using them. My original conversation with Andrei took place on IRC.

Member Author

Hmm. It would be nice if such decisions were posted in a more widely-read place, like the D forums or dmd release notes. Otherwise people ignorant of the change of direction would just continue writing code in the old style.

Member

Yeah, I agree that's what really should have happened. In any case, I don't see any harm in supporting them other than maybe making maintainability a bit harder. std.algorithm/.range both still use them in all of their examples, so nothing has really changed since this decision (which seems to be a trend with D; silent deprecations seem to have become very common).

Member

so should unaryFun stay, or not?

I vote yes.

Member

Arguments for string lambdas are shot down time and again. They weren't a failure - they were the best we could do to fix an important problem, but by now they're definitely obsolete and I'd hate to propagate them any further.

Member

How are they shot down? String lambdas are way better for short stuff. I'd much prefer to do something like foo!"a == b"(args) than foo!((a, b) => a == b)(args). I wouldn't consider that obsolete at all. More complex lambdas obviously need to use the newer lambda syntax, but short, simple stuff is just plain cleaner using string lambdas. I think that it's a definite step backward to get rid of string lambdas. string lambdas and the newer lambda syntax can coexist just fine, and I see no reason to get rid of string lambdas.

Member

I made a thread in the NewsGroup. Hopefully we can find some kind of consensus.

Member

I'd say new functions should support string lambdas for now. If we decide later to ditch them, we'll do it in one shot.

@quickfur
Member Author

ping

I've replaced the string lambdas in the ddoc unittest, so now all the examples will show only the new lambda syntax. Is this good enough for this pull?

I left unaryFun as-is in order not to introduce an inconsistency (other ranges support string lambdas, this one doesn't). IMO, if we want to get rid of string lambdas, we better do it in one go, not scattered across random marginally-related pulls.
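
For readers unfamiliar with the two styles being debated, here is a minimal sketch of how std.functional.unaryFun accepts both a string lambda and a regular lambda. The alias names byString and byLambda are made up for illustration.

```d
import std.functional : unaryFun;

void main()
{
    // String lambda: the parameter is implicitly named `a`.
    alias byString = unaryFun!"a % 3";

    // Regular lambda: unaryFun simply forwards to the callable.
    alias byLambda = unaryFun!(a => a % 3);

    assert(byString(7) == 1);
    assert(byLambda(7) == 1);
}
```

A range function that wraps its predicate in unaryFun therefore supports both spellings without any extra code.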


@property bool empty()
{
return r.empty || !(curAttr == attr(r.front));

Why the !(a == b) instead of a != b ?

Member Author

I wanted to minimize assumptions about the return type of attrFun. As long as the type can be comparable with ==, it should be good enough. Although, come to think of it, I think in D != is always translated into !opEquals, right? If so, then it should probably be written as != instead. :)
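
A small sketch of the point about != above: for a user-defined type, D rewrites a != b as !(a == b), which goes through opEquals, so defining opEquals alone is enough. The Attr struct is a made-up stand-in for whatever attrFun might return.

```d
// Hypothetical attribute type that only defines opEquals.
struct Attr
{
    int value;

    bool opEquals(const Attr rhs) const
    {
        return value == rhs.value;
    }
}

void main()
{
    // Both spellings go through the same opEquals.
    assert(Attr(1) != Attr(2));
    assert(!(Attr(1) == Attr(2)));
    assert(Attr(3) == Attr(3));
}
```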

@ghost

ghost commented Aug 24, 2013

LGTM. There are some styling issues I'd note, but it's all just bikeshedding so I'll let it pass.

alias attr = unaryFun!attrFun;
alias AttrType = typeof(attr(r.front));

static struct Chunk
Collaborator

Currently, when declared as a "non-template internal struct", no inference gets triggered, meaning all the functions in Chunk will basically be impure, system, and throwing.

If you instead declare it in private global namespace (as ChunkByResult(alias attrFun, Range)), then the struct becomes a template, and inference is triggered.

Please add a @safe pure nothrow unittest, and you will see (it should be anyways).

Also, when a struct can depend on a delegate (attrFun), I think you can't use a Voldemort type, due to problems with said delegate failing to retrieve the correct context pointer under certain situations (or something of that order). This is why the Results of map (amongst others) were migrated to outside the body of their functions.

TL;DR: Please declare Chunk as a private ChunkByResult in global space.

Collaborator

EDIT: I just noticed you had Chunk and ChunkBy, so, yeah, my comment applies to both. You'll just have to look for more colorful names, in particular, ChunkResult may already exist (and is the reason I suggested ChunkByResult to begin with). Maybe ChunkByResult (the range itself) and ChunkByChunk (the subranges)?


Does http://d.puremagic.com/issues/show_bug.cgi?id=7511 influence this in any way (it was fixed)? I mean whether Chunk still has to be put outside of chunkBy, or if the fix for 7511 has not changed the current behavior.

Collaborator

7511 didn't go deep enough.

It fixed this:

struct S(T)
{
    void foo(){} //Inferred 
}

but this still doesn't work:

auto foo(T)()
{
    struct S
    {
        void bar(){} //NOT inferred.
    }
    return S();
}

Which is the case we are in. But even if it were fixed, this template depends on a predicate parameter, and is subject to issue 5939: http://d.puremagic.com/issues/show_bug.cgi?id=5939 .

Note that 5939 is marked "resolved fixed", but the conclusion was "Voldemort types need to be moved out", and it was considered fixed once that was done in std.algorithm. Also, I notice that Chunk is declared static. As such, this code will fail to compile:

int x = 2;
int doit(int a){return a/x;}
auto r = [1, 2, 3, 5, 6];
chunkBy!doit(r);
Error: function main.main.chunkBy!(doit, int[]).chunkBy.Chunk.empty cannot access frame of function D main

@monarchdodra
Collaborator

I wanted to comment that I find the implementation for ChunkBy quite particular. Not that it is bad or anything, but conceptually, it is not very far from a splitter, yet the implementation is completely different. What is interesting is that you created a Chunk struct, which itself, will iterate on the subranges.

I find there is 1 big advantage to this: It accepts input ranges. Currently, you can't do things like stdin.asRange().splitter!isWhite(), as splitter needs at least forward.

However, I also see two disadvantages. First, for non-reference ranges, you evaluate your condition twice: you iterate and check in Chunk itself, and you iterate and check in ChunkBy as well. Second, if your range is sliceable, you are returning a Chunk object when you could just be returning a slice of the range. E.g.:

int[] a = [1, 2, 3, 4, 4, 5, 5];
auto chunks = chunkBy!"a % 2"(a);
int[] firstChunk = chunks.front();

I'd recommend that you look at splitters' implementations, but they are currently not very good (I think). I re-implemented splitter!pred recently, which should be more of interest to you. In particular it is the version that accepts a predicate. The pull is #1502 .

When all that is said and done, your code works, so we can leave it at that. I'm just saying there are open doors to making it even better.


void popFront()
{
assert(!r.empty);
Collaborator

Nitpick: The "new" standard way of doing this in phobos (apparently) is

version(assert) if (empty) throw new RangeError();

Yes, it's a bit (a lot) more verbose, but the resulting error (when triggered) feels much more "native".


It's a shame assert can't take an optional Throwable type, like enforce can. It would be cute if we could write:

assert!RangeError(!r.empty);

Of course, assert itself isn't a template, but if we end up using version(assert) a lot maybe it's time we make assert and enforce more consistent with each other.

Collaborator

I had asked for that before on the boards. It was shot down. Oh well. Also, I think the assert wouldn't know how to build the RangeError (__LINE__/__FILE__ args) anyways.


Also, I think the assert wouldn't know how to build the RangeError (__LINE__/__FILE__ args) anyways.

This is pretty standardized though.

Maybe instead we should introduce an assertEx template that uses assert internally? The mirror to enforceEx.

Collaborator

If I remember correctly, the problem with that is that an assert is outright removed from a release build. In contrast, an assertEx!RangeError(!r.empty) would force the compiler to evaluate (!r.empty) to a boolean before discarding it. Either that, or you'd make the argument lazy, but that implies performance penalties.
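
To make the eager-versus-lazy trade-off concrete, here is a minimal sketch. The names evaluations, check, eagerCheck and lazyCheck are invented for illustration; the point is only that an ordinary argument is evaluated at the call site even when the callee ignores it, while a lazy argument is evaluated only if the callee actually reads it.

```d
int evaluations;

bool check()
{
    ++evaluations;
    return true;
}

// The argument is computed by the caller before the call.
void eagerCheck(bool cond) {}

// The argument is wrapped in a delegate and only computed when read.
void lazyCheck(lazy bool cond) {}

void main()
{
    eagerCheck(check());
    assert(evaluations == 1); // evaluated although eagerCheck ignores it

    lazyCheck(check());
    assert(evaluations == 1); // never evaluated: lazyCheck does not read cond
}
```

The delegate that backs a lazy parameter is the source of the performance concern raised above.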


I can't believe how awful DMD is at removing dead-code. I've tried implementing it like this:

import std.range;

import core.exception;

void assertEx(E : Throwable = AssertError)(lazy bool exp, lazy string msg = null, string file = __FILE__, size_t line = __LINE__)
{
    version(assert)
    if (!exp)
    {
        static if (is(typeof( new E(msg, file, line) )))
            throw new E(msg, file, line);
        else
            throw new E(file, line);
    }
}

void main()
{
    int[] array;
    assertEx!RangeError(!array.empty);
}

But even with -release, DMD keeps the template around and it calls it.

Even with -inline, it keeps the template around.

Collaborator

In that case, a simpler

version (assert) enforceEx!RangeError(!r.empty);

Could be a good idiomatic compromise.

Member Author

I like the name assertOrThrow. :)

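void assertOrThrow(T : Throwable)(bool condition, lazy string msg = null)
{
    version(assert)
    {
        // Checks enabled: throw T on failure.
        // Assumes T can be constructed from a single string argument.
        if (!condition) throw new T(msg);
    }
    // With checks disabled (e.g. -release) the body is empty.
}
...
assertOrThrow!RangeError(!array.empty);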

Or something like that.

@quickfur
Member Author

@monarchdodra I revised the code, and discovered that I don't like version(assert) if .... Using assert directly is far more self-documenting. Anyway, after thinking about it a bit, I realized that these asserts really are in-contracts, not internal asserts to verify internal logic. So I moved them into in-contracts instead. What do you think?

Also, I deliberately implemented it in such a way that input ranges are supported. :) Generally, I like things to work with minimum requirements. Having said that, though, you're quite right that it can be improved. If it's a forward range, it could be .save'd so that the condition is only iterated once. And if it's sliceable, a slice should be returned instead.
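
As an illustration of the in-contract style described in this comment, here is a minimal sketch on a toy range. Countdown is a made-up type, not code from the pull request, and the `do` keyword assumes a reasonably recent compiler (older ones spell it `body`).

```d
// Hypothetical toy range that counts n, n-1, ..., 1.
struct Countdown
{
    int n;

    @property bool empty() const { return n == 0; }

    @property int front() const
    in { assert(!empty, "front called on an empty range"); }
    do { return n; }

    void popFront()
    in { assert(!empty, "popFront called on an empty range"); }
    do { --n; }
}

void main()
{
    int total;
    foreach (x; Countdown(3))
        total += x;
    assert(total == 6); // 3 + 2 + 1
}
```

The precondition is stated once in the signature, which reads as documentation for callers and is compiled out together with other contracts in release builds.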

@quickfur
Member Author

Oops... I think my rebase orphaned the discussion thread on Voldemorts... I'm not sure how to get back into the discussion thread again from github, but anyway, here's what I wanted to say:

@monarchdodra wrote that 7511 didn't go deep enough. Yes, arguably we should file an enhancement about it. But the following workaround does work:

auto makeRange(alias fun,R)(R range) {
    struct Result() { // note: force attribute inference by making Result a template
        R range;
        auto front() { return fun(range.front); }
        // ...
    }
    return Result!()(range);
}
void func() pure @safe nothrow {
    auto r = makeRange(...);
    r.front ... // OK, attributes correctly inferred for Result
}

Now, as @monarchdodra said further, Voldemorts involving alias functions need to be moved out (issue 5939) because otherwise .save wouldn't work (you can't instantiate a Voldemort). I didn't test this, but wouldn't the following work?

auto func(...)(...) {
    struct Voldemort {
        typeof(this) save() {
            auto copy = this; // note: no default initialization for Voldemorts
            copy.internal_state = this.internal_state.dup;
            return copy;
        }
    }
}

Note that .save can't just declare typeof(this) copy; because Voldemorts can't be created outside of their original scope; but they are, in theory, copyable, so we simply make a copy of the existing Voldemort instance then modify it accordingly and return it.

@quickfur
Member Author

As for the problem of using alias functions with static struct, you're right, I didn't know this before, but it does preclude functions that close over scope variables. :-( So much for Voldemorts being a useful idiom... they seem to keep running into nasty issues like this. :-(

@monarchdodra
Collaborator

Note that .save can't just declare typeof(this) copy; because Voldemorts can't be created outside of their original scope; but they are, in theory, copyable, so we simply make a copy of the existing Voldemort instance then modify it accordingly and return it.

Maybe. But it's playing awfully close to fire. And the bugs are so hard to track down, I would go out of my way to minimize the chances of it ever happening.

@monarchdodra
Collaborator

So I moved them into in-contracts instead. What do you think?

Even better! If one day we finally get our "in contracts are caller-conditional", then it's exactly what we want. It's what I'll do from now on anyways.

PS: Just hit this: http://d.puremagic.com/issues/show_bug.cgi?id=10939

@quickfur
Member Author

ping

Any other changes that should be applied to this pull before it's good to merge?

@monarchdodra monarchdodra mentioned this pull request Dec 25, 2013
@andralex
Member

Just found out today about this, which duplicates at least a fair amount of #1186. The differences are (I'm mostly aggregating others' comments):

  • chunkBy accepts an InputRange (nice), groupBy needs a forward range.
  • chunkBy takes a unary predicate, groupBy takes a binary predicate. I argued in #1186 (Fix issue 5968) that binary predicates are to be preferred (in keeping with all of Phobos, and more powerful), though I may be wrong about the second point.
  • For forward ranges whose save does the same thing as plain copying, a complete chunkBy iteration spans them twice: once in each group and then once for the chunks container to "catch up". groupBy has dedicated code to cope with that issue efficiently.
  • For genuine input ranges (not forward) chunkBy has a dependency between the chunks container and individual chunks: advancing the chunks container renders the current chunk empty, although in fact it wasn't. This is expected to someone who understands the mechanics and inherent limitations of the approach, but raises the question whether groupBy should support input ranges at all.

Perhaps the most productive next step would be to focus on #1186 and figure out whether that should support input ranges, and with what semantics. Thoughts?

@monarchdodra
Collaborator

chunkBy accepts an InputRange (nice), groupBy needs a forward range.

I think they are both in the same awkward situation I mentioned in your pull: if it supports an input range, then the returned Group/Chunk is transient, and only valid until a popGroup happens.

@quickfur
Member Author

When I wrote chunkBy, I had a few things in mind. The overall philosophy was that I wanted to make it work with the minimum requirements possible.

The first thing was that the context of chunkBy was the calendar example program that I was writing for my component programming article. In that context, I really only needed to iterate over the range of dates once and segment it into subranges as I go. I never need to go back once a particular date has been processed, so once a subrange is exhausted, I don't need to iterate over it again and I can just move straight on to the next subrange. That's why chunkBy works with an input range the way it does.

The second thing I had in mind was that it should be transience-agnostic: that is to say, it should make no assumptions that a copy of the previous value of .front will remain valid once .popFront is called. If a previous value of .front is needed, then since D doesn't have a generic deep-copy, the only guarantee is if you have a .save'd copy of the range at that previous point, then you can access the .front of that copy. But input ranges can't be .save'd, so that meant that I couldn't rely on any previous values of .front. This is why chunkBy only takes a unary predicate -- a binary predicate would require saving a previous copy of .front.
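
To show how the two predicate styles relate, here is a minimal sketch that expresses a unary attribute (parity) as a binary predicate comparing the attributes of two elements. It assumes a recent Phobos in which, to my knowledge, this functionality is available as std.algorithm.iteration.chunkBy; it is not the code from this pull request.

```d
import std.algorithm.iteration : chunkBy, map;
import std.array : array;

void main()
{
    auto data = [1, 3, 5, 2, 4, 7];

    // Unary attribute a % 2, written as a binary equivalence relation.
    auto groups = data
        .chunkBy!((a, b) => a % 2 == b % 2)
        .map!(g => g.array)   // materialize each subrange for comparison
        .array;

    assert(groups == [[1, 3, 5], [2, 4], [7]]);
}
```

The binary form needs both a and b at once, which is exactly the previous-value dependency that the unary design avoids for input ranges.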

@andralex
Member

@quickfur there's still a liability with transitory ranges. If one uses e.g. "a" as the unary function ("I want to group by name" or whatevs) and the input range is transitory, chunkBy won't work correctly.

I'm not sure how to solve transitory-ness in the general case. I have a few starts, but none of them comes at no cost. First we'd need to make sure that transitory-ness is a problem that must be solved.

@quickfur
Member Author

Transient ranges are permitted by the current definition of ranges, therefore we must deal with them somehow (File.byLine is a prime example of how this trips up D newbies all the time). Either that, or we should change the definition of ranges to exclude transient ranges, then we don't have to worry about them anymore and we can safely assume that copying the value of .front will always be valid. The current situation where they are permitted by definition but not correctly supported by implementation (many algorithms in std.algorithm don't work correctly with them) is a liability to D.
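
The pitfall can be reproduced with a small sketch. TransientLines is a made-up input range that, like the File.byLine behaviour described above, reuses one buffer for every element; the 64-character buffer size is an arbitrary assumption for the example.

```d
// Hypothetical transient input range: front always points into one shared buffer.
struct TransientLines
{
    private string[] source;
    private char[] buffer;
    private size_t len;

    this(string[] source)
    {
        this.source = source;
        buffer = new char[](64); // assumes every line fits in 64 chars
        load();
    }

    @property bool empty() const { return source.length == 0; }
    @property char[] front() { return buffer[0 .. len]; }

    void popFront()
    {
        source = source[1 .. $];
        load();
    }

    private void load()
    {
        if (source.length == 0) return;
        len = source[0].length;
        buffer[0 .. len] = source[0]; // overwrite the shared buffer in place
    }
}

void main()
{
    auto r = TransientLines(["abc", "xyz"]);

    char[] kept = r.front;        // aliases the shared buffer
    string copied = r.front.idup; // independent snapshot

    r.popFront();

    assert(kept == "xyz");   // the "saved" front silently changed
    assert(copied == "abc"); // the explicit copy survived
}
```

Any algorithm that stores front in a temporary and then calls popFront hits the first assertion's behaviour rather than the second's.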

@andralex
Member

Here are two thoughts on solving that:

  1. Have transitory ranges also provide a frontCopy primitive, which ensures a snapshot of front that will survive a subsequent popFront. Ranges that need to copy front to work would use frontCopy if present, front otherwise.
  2. Use the following heuristics:
template isTransitory(R)
{
    static if (isInputRange!R && !isForwardRange!R)  // no better than input
        enum isTransitory =
            is(typeof(&R.init.front)) // front returns by reference, presumably a cached buffer
            ||
            isTransitory!(ElementType!R); // front returns by value another range, which itself is transitory
    else
        enum isTransitory = false; // also ends the recursion for non-range element types
}

@quickfur
Member Author

  1. Not a bad idea, though it does mean a lot of range-based code will have to be updated to use it. It also complicates the current range API, which Jonathan has expressed dislike about when we last talked about this subject.

  2. OK, the heuristic isn't perfect, but let's suppose it works OK. Then what? What do we do with isTransitory? Reject it at compile-time if the algorithm doesn't support it? Use static if to generate code specifically designed to deal with it?

Either way, we'll still have to update a lot of range code that assigns .front to temporary variables, since all of those would be wrong (even if they currently, by chance, work 'cos they've only been tested on non-transitory ranges). The correct way would be to .save the range instead of copying .front.

@quickfur
Member Author

Overall, (1) seems like the lesser of two evils. If we also provide a default implementation via UFCS that simply returns front, then only transitory ranges (which should be relatively rare) need to worry about frontCopy, and algorithms that need to copy front will just call frontCopy instead. For "normal" ranges, this just forwards to front, so everything works as before.

Unfortunately, quite a few algorithms do copy front, so they'll have to be updated accordingly.
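
A minimal sketch of what option (1) with a UFCS default might look like. frontCopy is the name proposed in this thread, not an existing Phobos primitive, and OneLine is a made-up transient range used only to show that a member frontCopy takes precedence over the free function.

```d
import std.range.primitives : front, isInputRange;

// Hypothetical default: for ordinary ranges, a copy of front is just front.
auto frontCopy(R)(R r) if (isInputRange!R)
{
    return r.front;
}

// Hypothetical transient range that provides its own frontCopy.
struct OneLine
{
    char[] buffer;
    bool empty;

    @property char[] front() { return buffer; }
    void popFront() { empty = true; }

    // Member lookup wins over UFCS, so this overrides the default.
    @property string frontCopy() { return buffer.idup; }
}

void main()
{
    int[] a = [1, 2, 3];
    assert(a.frontCopy == 1); // falls back to the UFCS default

    auto r = OneLine("abc".dup);
    auto snapshot = r.frontCopy;
    r.buffer[0] = 'X';        // simulate the buffer being reused

    assert(snapshot == "abc"); // snapshot is unaffected
    assert(r.front == "Xbc");
}
```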

@monarchdodra
Collaborator

and algorithms that need to copy front will just call frontCopy instead

Keeping around a copy in a really generic way is actually very very hard. You have to take into account objects that don't postblit, or ranges that hold const elements. A lot of the implementations go out of their way specifically to not make copies of elements. Instead, they simply save the range at the location they want to reference.

I have some doubts about the generic usefulness (and usability as a whole) of "frontCopy" for input ranges.

@andralex
Member

@monarchdodra then how?

@quickfur
Member Author

He already said it: keep a save'd copy of the range and access .front when you need it. Obviously, though, this doesn't apply to input ranges.

@quickfur
Member Author

Rebased and updated.

@quickfur
Member Author

ping @Dicebot

@mihails-strasuns

Acknowledged, reading

@mihails-strasuns

Overall looks like pretty much what I would have expected. Need to think a bit more before commenting on equivalence problem though.

@quickfur quickfur changed the title from "Implement chunkBy." to "Implement groupBy." Sep 27, 2014
@bearophile

I've just seen this PR. In Phobos we have this pattern to build an associative array in a common case:

void main() {
    import std.stdio, std.range, std.array;

    auto keys = iota(10);
    auto vals = iota(10, 100, 10);
    auto aa = keys.zip(vals).assocArray;

    aa.writeln;
    // Output:
    // [0:10, 4:50, 8:90, 1:20, 5:60, 2:30, 6:70, 3:40, 7:80]
}

But we don't have a functional/range version to build it with an accumulation:

void main() {
    import std.stdio, std.range;

    auto data = 10.iota;

    int[][int] aa;
    foreach (x; data)
        aa[x % 3] ~= x;

    aa.writeln;
    // Output:
    // [0:[0, 3, 6, 9], 1:[1, 4, 7], 2:[2, 5, 8]]
}

Some time ago I suggested a hashGroup function:
https://d.puremagic.com/issues/show_bug.cgi?id=9842
A possible usage example:

void main() {
    import std.stdio, std.range;

    auto data = 10.iota;

    auto aa = data.hashGroup!q{ a % 3 };
}
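
For illustration, one way such a function could be written. This is only a sketch of the suggested hashGroup (which is not part of Phobos), using a regular lambda and an eager loop that mirrors the foreach version above.

```d
import std.range.primitives : ElementType, isInputRange;

// Hypothetical helper: groups elements into an associative array keyed by keyFun(element).
auto hashGroup(alias keyFun, R)(R r) if (isInputRange!R)
{
    alias E = ElementType!R;
    alias K = typeof(keyFun(E.init));

    E[][K] result;
    foreach (e; r)
        result[keyFun(e)] ~= e; // appending to a missing key creates the entry
    return result;
}

void main()
{
    import std.range : iota;

    auto aa = iota(10).hashGroup!(a => a % 3);

    assert(aa.length == 3);
    assert(aa[0] == [0, 3, 6, 9]);
    assert(aa[1] == [1, 4, 7]);
    assert(aa[2] == [2, 5, 8]);
}
```

Unlike chunkBy/groupBy, this collects non-adjacent elements with the same key into one group, at the cost of consuming the whole range up front.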

@mihails-strasuns

I am inclined to merge it as is right now and address any deficiencies after getting some practical feedback. I simply lack the imagination to see what the best design would be from a purely theoretical point of view :(

Can anyone else chime in to review for some fresh opinion? @monarchdodra @JakobOvrum I think that is something from your expertise.

{
while (!r.empty && equiv(prev, r.front))
r.popFront();
if (!r.empty)
Member

Would it be worth refactoring this so it only needs to call r.empty once?

@quickfur
Member Author

quickfur commented Oct 6, 2014

Done: refactored to avoid extra call to r.empty; marked savePrev and prev as private.

@mihails-strasuns

Auto-merge toggled on

mihails-strasuns pushed a commit that referenced this pull request Oct 9, 2014
@mihails-strasuns mihails-strasuns merged commit 753ad3a into dlang:master Oct 9, 2014
@bearophile

Perhaps I have not fully understood the usage. This is Haskell:

Prelude> import Data.List
Prelude Data.List> groupBy (\x y -> (mod (x*y) 3) == 0) [1,2,3,4,5,6,7,8,9]
[[1],[2,3],[4],[5,6],[7],[8,9]]

D code:

void main() {
    import std.stdio, std.algorithm, std.array, std.range;
    [1,2,3,4,5,6,7,8,9]
    .groupBy!((x, y) => ((x * y) % 3) == 0)
    .take(20)
    .writeln;
}

Output:

[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

@mihails-strasuns

It happens because the internal struct initially runs the predicate on r.front as both the previous and the current element. That allows for a certain code elegance but fails if such a call evaluates to false.

I am investigating what the least intrusive fix would be.

(previous stupid comment removed :))

@mihails-strasuns

I have a proof-of-concept fix that adds a small additional _empty boolean to the chunk range and moves the predicate call from empty() to popFront. This, however, doubles the predicate call count, as the original range needs to skip the whole chunk group. @quickfur do you think it is ok?

@quickfur quickfur deleted the chunkBy branch October 20, 2014 14:36
@mihails-strasuns

ping @quickfur

@quickfur
Member Author

quickfur commented Nov 2, 2014

A similar issue has been filed: https://issues.dlang.org/show_bug.cgi?id=13595

It looks like what we need is to rework groupBy to do adjacent predicate testing, rather than assuming transitivity and reflexivity, which is what the current code does. It would appear that the most interesting use cases of groupBy are when it's not an equivalence relation.

This would introduce a slight performance hit when the predicate is an equivalence relation; I was thinking that perhaps an optional compile-time parameter could be specified to fall back to the current implementation when the user knows that the predicate is an equivalence relation, and therefore doesn't need the additional work required for adjacency testing.

@mihails-strasuns
Copy link

Well, I don't feel that snippets like the one bearophile provided are critical (hence my initial comment that requiring equivalence is OK), but I have realized that it also makes the most trivial degenerate predicate (one that always returns false) break groupBy. And that sounds like a very common thing to try.

@quickfur
Member Author

quickfur commented Nov 2, 2014

While the current code does work correctly for equivalence relations, I think many use cases actually need non-equivalence relations. Even when I was writing up the ddoc'd unittests, I had trouble coming up with interesting examples that don't require something beyond equivalence relations. Even relatively simple things like segmenting a range by maximum adjacent difference (groupBy!((a,b) => abs(a-b) < n) for some n) don't work with the current implementation because they are not equivalence relations.

Furthermore, since groupBy already doesn't deal with reordering elements, it seems too restrictive to demand equivalence relations rather than relations that only hold between adjacent elements.
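
A minimal sketch of the adjacent-testing semantics described here, restricted to arrays so that slices can be returned directly. adjacentGroups is a made-up helper for illustration, not the implementation that went into #2654.

```d
import std.math : abs;

// Hypothetical helper: starts a new group whenever pred(previous, current) is false.
T[][] adjacentGroups(alias pred, T)(T[] r)
{
    T[][] result;
    size_t start = 0;
    foreach (i; 1 .. r.length + 1)
    {
        // Close the current group at the end of input or when two neighbours fail pred.
        if (i == r.length || !pred(r[i - 1], r[i]))
        {
            result ~= r[start .. i];
            start = i;
        }
    }
    return result;
}

void main()
{
    // Segment by maximum adjacent difference.
    assert(adjacentGroups!((a, b) => abs(a - b) < 3)([1, 2, 3, 10, 11, 20])
           == [[1, 2, 3], [10, 11], [20]]);

    // Not transitive: 1 and 5 differ by 4, yet they share a group via 3.
    assert(adjacentGroups!((a, b) => abs(a - b) < 3)([1, 3, 5]) == [[1, 3, 5]]);

    // A predicate that is always false puts every element in its own group.
    assert(adjacentGroups!((a, b) => false)([1, 2, 3]) == [[1], [2], [3]]);
}
```

Because only neighbours are compared, the predicate never needs to be reflexive or transitive, and the always-false case mentioned above degrades gracefully.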

@quickfur
Member Author

quickfur commented Nov 5, 2014

Implementation now ready for review: #2654

8 participants