-
-
Notifications
You must be signed in to change notification settings - Fork 739
Implement groupBy. #1453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement groupBy. #1453
Conversation
* Returns: A range of ranges in which all elements in a given subrange share | ||
* the same attribute with each other. | ||
*/ | ||
auto chunkBy(alias attrFun, Range)(Range r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in previous pulls we've sort of agreed to not introduce any more functions with string-based lambda parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the second time I've heard that. What's the rationale for this? I haven't been keeping up with phobos pulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new lambda syntax there was growing support for completely replacing string lambdas with the new syntax. I got ahold of Andrei and asked him whether std.algorithm and std.range should continue to use them in their documentation. He got back to me at some point (presumably after discussing with Walter) and gave me the go ahead to switch the documentation examples over to the new style of lambda (while retaining support for string lambdas for backwards compatibility). This resulted in #707 which didn't get pulled (well, it did but got reverted because of a dmd bug) and then bitrot but the decision to retire string lambdas in phobos has been made by Andrei and Walter.
I think the discussion in #707 is probably the extent of the public discussion about no longer using them. My original conversation with Andrei took place on IRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It would be nice if such decisions were posted in a more widely-read place, like the D forums or dmd release notes. Otherwise people ignorant of the change of direction would just continue writing code in the old style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that's what really should have happened. In any case, I don't see any harm in supporting them other than maybe making maintainability a bit harder. std.algorithm/.range both still use them in all of their examples so nothing has really changed since this decision (which seems to a trend with D, silent deprecations seem to have become very common).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so should unaryFun stay, or not?
I vote yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguments for string lambdas are shot down time and again. They weren't a failure - they were the best we could do to fix an important problem, but by now they're definitely obsolete and I'd hate to propagate them any further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are they shot down? String lambdas are way better for short stuff. I'd much prefer to do something like foo!"a == b"(args)
than foo!((a, b) => a == b)(args)
. I wouldn't consider that obsolete at all. More complex lambdas obviously need to use the newer lambda syntax, but short, simple stuff is just plain cleaner using string lambdas. I think that it's a definite step backward to get rid of string lambdas. string lambdas and the newer lambda syntax can coexist just fine, and I see no reason to get rid of string lambdas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a thread in the NewsGroup. Hopefully we can find some kind of consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say new functions should support string lambdas for now. If we decide later to ditch them, we'll do it in one shot.
ping I've replaced the string lambdas in the ddoc unittest, so now all the examples will show only the new lambda syntax. Is this good enough for this pull? I left unaryFun as-is in order not to introduce an inconsistency (other ranges support string lambdas, this one doesn't). IMO, if we want to get rid of string lambdas, we better do it in one go, not scattered across random marginally-related pulls. |
|
||
@property bool empty() | ||
{ | ||
return r.empty || !(curAttr == attr(r.front)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the !(a == b)
instead of a != b
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to minimize assumptions about the return type of attrFun. As long as the type can be comparable with ==, it should be good enough. Although, come to think of it, I think in D != is always translated into !opEquals, right? If so, then it should probably be written as != instead. :)
LGTM. There's some styling issues I'd note but it's all just bikeshedding so I'll let it pass. |
alias attr = unaryFun!attrFun; | ||
alias AttrType = typeof(attr(r.front)); | ||
|
||
static struct Chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, when declared as a "non-template internal struct", no inference gets triggered, meaning all the functions in Chunk
will basically be impure, system, and throwing.
If you instead declare it in private global namespace (as ChunkByResult(alias attrFun, Range)
), then the struct becomes a template, and inference is triggered.
Please add a @safe pure nothrow unittest
, and you will see (it should be anyways).
Also, when a struct can depend on a delegate (attrFun
), I think you can't use voldemort, due to problems with said delegate failing to retrieve the correct context pointer under certain situations (or something of that order). This is why the Result
s of map
(amongst others), where migrated to outside the body of their function.
TL.DR: Please declare Chunk
as a private ChunkByResult
in global space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: I just noticed you had Chunk
and ChunkBy
, so, yeah, my comment applies to both. You'll just have to look for more colorful names, in particular, ChunkResult
may already exist (and is the reason I suggested ChunkByResult
to begin with). Maybe ChunkByResult
(the range itself) and ChunkByChunk
(the subranges)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does http://d.puremagic.com/issues/show_bug.cgi?id=7511 influence this in any way (it was fixed)? I mean whether Chunk still has to be put outside of chunkBy
, or if the fix for 7511 has not changed the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7511 didn't go deep enough.
It fixed this:
struct S(T)
{
void foo(){} //Inferred
}
but this still doesn't work:
auto foo(T)()
{
struct S
{
void bar(){} //NOT infered.
}
return S();
}
Which is the case we are in. But even where it fixed, this template depends on a predicate parameter, and is subject to issue 5939: http://d.puremagic.com/issues/show_bug.cgi?id=5939 .
Note that 5939 is "resolved fix", but the conclusion was "voldermort need to be moved out", and it was fixed when it was done in std.algorithm. Also, I'm noticing it is declared static. As such, this code will fail to compile:
int x = 2;
int doit(int a){return a/x;}
auto r = [1, 2, 3, 5, 6];
chunkBy!doit(r);
Error: function main.main.chunkBy!(doit, int[]).chunkBy.Chunk.empty cannot access frame of function D main
I wanted to comment that I find the implementation for I find there is 1 big advantage to this: It accepts input ranges. Currently, you can't do things like However, I also see two disadvantages: First, for non reference ranges, you iterate your condition twice: You iterate and check in chunk itself, and you iterate and check in ChunkBy also. 2: If your range is sliceable, then you are returning a int[] a = [1, 2, 3, 4, 4, 5, 5];
auto chunks = chunkBy!"a % 2"(a);
int[] firstChunk = chunks.front(); I'd recommend that you look at When all that is said and done, your code works, so we can leave it at that. I'm just saying there are open doors to making it even better. |
|
||
void popFront() | ||
{ | ||
assert(!r.empty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: The "new" standard way of doing this in phobos (apparently) is
version(assert) if (empty) throw new RangeError();
Yes, it's a bit (a lot) more verbose, but the resulting error (when triggered) feels much more "native".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame assert
can't take an optional Throwable
type, like enforce can. It would be cute if we could write:
assert!RangeError(!r.empty);
Of course, assert
itself isn't a template, but if we end up using version(assert)
a lot maybe it's time we make assert and enforce more consistent with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had asked for that before on the boards. It was shot down. Oh well. Also, I think the assert
wouldn't know how to build the RangeError
(__LINE__
/__FILE__
args) anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think the assert wouldn't know how to build the RangeError (LINE/FILE args) anyways.
This is pretty standardized though.
Maybe instead we should introduce an assertEx
template that uses assert
internally? The mirror to enforceEx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, the problem with that is that an assert is outright removed from a release build. In contrast, an assertEx!RangeError(!r.empty)
would force the compiler to either evaluate (!r.empty
) to a boolean before discarding it. Either that, or you'd make the argument passed lazily, but that implies performance penalties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't believe how awful DMD is at removing dead-code. I've tried implementing it like this:
import std.range;
import core.exception;
void assertEx(E : Throwable = AssertError)(lazy bool exp, lazy string msg = null, string file = __FILE__, size_t line = __LINE__)
{
version(assert)
if (!exp)
{
static if (is(typeof( new E(msg, file, line) )))
throw new E(msg, file, line);
else
throw new E(file, line);
}
}
void main()
{
int[] array;
assertEx!RangeError(!array.empty);
}
But even with -release
, DMD keeps the template around and it calls it.
Even with -inline
, it keeps the template around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, a simpler
version (assert) enforceEx!RangeError(!r.empty);
Could be a good idiomatic compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the name assertOrThrow
. :)
template assertOrThrow(T : Throwable)(bool condition, lazy string msg)
{
version(assert)
void assertOrThrow()
{
if (!condition) throw new T(msg);
}
else
void assertOrThrow() {}
}
...
assertOrThrow!RangeError(!array.empty);
Or something like that.
@monarchdodra I revised the code, and discovered that I don't like Also, I deliberately implemented it in such a way that input ranges are supported. :) Generally, I like things to work with minimum requirements. Having said that, though, you're quite right that it can be improved. If it's a forward range, it could be |
Oops... I think my rebase orphaned the discussion thread on Voldemorts... I'm not sure how to get back into the discussion thread again from github, but anyway, here's what I wanted to say: @monarchdodra wrote that 7511 didn't go deep enough. Yes, arguably we should file an enhancement about it. But the following workaround does work: auto makeRange(alias fun,R)(R range) {
struct Result() { // note: force attribute inference by making Result a template
R range;
auto front() { return fun(range.front); }
// ...
}
return Result!()(range);
}
void func() pure @safe nothrow {
auto r = makeRange(...);
r.front ... // OK, attributes correctly inferred for Result
} Now, as @monarchdodra said further, Voldemorts involving alias functions need to be moved out (issue 5939) because otherwise auto func(...)(...) {
struct Voldemort {
typeof(this) save() {
auto copy = this; // note: no default initialization for Voldemorts
copy.internal_state = this.internal_state.dup;
return copy;
}
}
} Note that |
As for the problem of using alias functions with |
Maybe. But it's playing awefully close with fire. And the bugs are so hard to track down, I would go out of my way to minimize the chances of it ever happening. |
Even better! If one day we finally get our "in contracts are caller-conditional", then it's exactly what we want. It's what I'll do from now on anyways. PS: Just hit this: http://d.puremagic.com/issues/show_bug.cgi?id=10939 |
ping Any other changes that should be applied to this pull before it's good to merge? |
Just found out today about this, which duplicates at least a fair amount of #1186. The differences are (I'm mostly aggregating others' comments):
Perhaps the most productive next step would be to focus on #1186 and figure out whether that should support input ranges, and with what semantics. Thoughts? |
I think they are both in the same awkward situation I mentioned in your pull: If it supporst an input range, then the returned Group/Chunk is transient, and only valid until a |
When I wrote The first thing was that the context of The second thing I had in mind was that it should be transience-agnostic: that is to say, it should make no assumptions that a copy of the previous value of |
@quickfur there's still a liability with transitory ranges. If one uses e.g. "a" as the unary function ("I want to group by name" or whatevs) and the input range is transitory, I'm not sure how to solve transitory-ness in the general case. I have a few starts, but neither comes at no cost. First we'd need to make sure that transitory-ness is a problem that must be solved. |
Transient ranges are permitted by the current definition of ranges, therefore we must deal with them somehow ( |
Here are two thoughts on solving that:
template isTransitory(R)
{
enum isTransitory =
isInputRange!R && !isForwardRange!R // no better than input
&& (
is(typeof(&r.front))) // front returns by reference, presumably a cached buffer
||
isTransitory!(ElementType!R) // front returns by value another range, which itself is transitory
)
} |
Either way, we'll still have to update a lot of range code that assign .front to temporary variables, since all of those would be wrong (even if they currently, by chance, work 'cos they've only been tested on non-transitory ranges). The correct way would be to .save the range instead of copying .front. |
Overall, (1) seems like the lesser of two evils. If we also provide a default implementation via UFCS that simply returns Unfortunately, quite a few algorithms do copy |
Keeping around a copy in a really generic way is actually very very hard. You have to take into account objects that don't postblit, or ranges that hold const elements. A lot of the implementations go out of their way specifically to not make copies of elements. Instead, they simply save the range at the location they want to reference. I have some doubts about the generic usefulness (and useability as a whole) of "copyFront" for input ranges. |
@monarchdodra then how? |
He already said it: keep a |
Rebased and updated. |
ping @Dicebot |
Acknowledged, reading |
Overall looks like pretty much what I would have expected. Need to think a bit more before commenting on equivalence problem though. |
I've just seen this PR. In Phobos we have this pattern to build an associative array in a common case:
But we don't have a functional/range version to build it with an accumulation:
So time ago I suggested a hashGroup function:
|
I am inclined to merge it as is right now and address any deficiencies ater getting some practical feedback. Simply lacking the imagination to see what would be the best design from pure theoretical point of view :( Can anyone else chime in to review for some fresh opinion? @monarchdodra @JakobOvrum I think that is something from your expertise. |
{ | ||
while (!r.empty && equiv(prev, r.front)) | ||
r.popFront(); | ||
if (!r.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth refactoring this so it only needs to call r.empty
once?
.savePrev and .prev should be private.
Done: refactored to avoid extra call to |
Auto-merge toggled on |
Perhaps I have not fully understood the usage. This is Haskell:
D code:
Output:
|
It happens because internal struct initially runs predicate on I am investigating what can be most non-intrusive fix (previous stupid comment removed :)) |
I got proof of working fix by adding small additional |
ping @quickfur |
A similar issue has been filed: https://issues.dlang.org/show_bug.cgi?id=13595 It looks like what we need is to rework This would introduce a slight performance hit for when the predicate is an equivalence relation; I was thinking that perhaps an optional compile-time parameter can be specified to fallback to the current implementation when the user knows that his predicate is an equivalence relation, and therefore doesn't need to additional work required for adjacency testing. |
Well don't feel like snippets like bearophile has provided are critical (thus initial comment about equivalence ok) but I have realized that it also makes most trivial degraded predicate (one that always returns |
While the current code does work correctly for equivalence relations, I think many use cases actually need non-equivalence relations. Even when I was writing up the ddoc'd unittests, I had trouble coming up with interesting examples that don't require something beyond equivalence relations. Even relatively simple things like segmenting a range by maximum adjacent difference ( Furthermore, since |
Implementation now ready for review: #2654 |
This is the preliminary implementation of chunkBy taken from my calendar example program. Several people have expressed interest in including it in Phobos, so here it is.
It probably needs some cleanups, though; refInputRange, for example, should be put in a common place so that other Phobos unittests can use it. And we should probably create analogues for forward/bidi/random-access ranges too. (Unless these wrappers are already present somewhere and I'm just ignorant of it?)