
Add std.range.enumerate #1866

Merged 1 commit into dlang:master on Sep 20, 2014

Conversation

JakobOvrum
Member

http://d.puremagic.com/issues/show_bug.cgi?id=5550

This could also be implemented in the language (though potentially ambiguous with automatic foreach unpacking), but here's a library solution for review.

Ping @bearophile

@JakobOvrum
Member Author

Added the start parameter per bearophile's suggestion in the issue report.

@Kozzi11
Contributor

Kozzi11 commented Jan 17, 2014

Is it possible to add opApply for faster iteration?

@monarchdodra
Collaborator

Is it possible to add opApply for faster iteration?

I believe that unless you can really justify it, ranges with opApply are slower because of an inlining issue: the compiler creates a delegate out of the loop body and passes it to opApply, which in turn makes a run-time call to said delegate, throwing all inlining out the window.

...is what I remember from last time I touched opApply.
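A minimal sketch of the two iteration styles being compared (hypothetical Counter type, not from this PR); note that when both are defined, foreach prefers opApply:

```d
import std.stdio : writeln;

// Hypothetical type illustrating the two iteration styles discussed above.
struct Counter
{
    int n;

    // Range primitives: ordinary member functions that the compiler
    // can inline into the foreach loop.
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }

    // opApply: foreach wraps the loop body in a delegate and opApply
    // calls it indirectly, which tends to defeat inlining.
    int opApply(scope int delegate(int) dg)
    {
        for (int i = n; i > 0; --i)
            if (auto result = dg(i))
                return result;
        return 0;
    }
}

void main()
{
    foreach (x; Counter(3)) // uses opApply, since it takes precedence
        writeln(x);
}
```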

@Kozzi11
Contributor

Kozzi11 commented Jan 17, 2014

Wow, I always thought that opApply would be faster. Thanks.

@Kozzi11
Contributor

Kozzi11 commented Jan 17, 2014

dlang.org/phobos/std_range.html#.opApply

@JakobOvrum
Member Author

dlang.org/phobos/std_range.html#.opApply

That's for the InputRange interface, i.e. its range primitives are virtual. Most range types, including the one returned from enumerate, are implemented using structs, which have no virtual functions.
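A small sketch of that distinction, using std.range's inputRangeObject wrapper (simplified, not from this PR):

```d
import std.range : iota, inputRangeObject, InputRange;

void main()
{
    // A struct range: empty/front/popFront are ordinary member functions.
    auto structRange = iota(3);

    // The class-based wrapper: the same primitives become virtual calls,
    // which is the case that InputRange's opApply documentation covers.
    InputRange!int classRange = inputRangeObject(iota(3));

    foreach (x; classRange)
        assert(x >= 0 && x < 3);
}
```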

@Kozzi11
Contributor

Kozzi11 commented Jan 18, 2014

OK. Thanks again for the explanation. By the way, I tried both and it seems there is no difference between them.

static struct Result
{
private Range _range;
private size_t index;
Member

Minor nitpick: probably better to explicitly write private size_t index = 0; to make the code clearer, even though .init is 0.

Member Author

It used to have that - but now it's explicitly initialized unconditionally at the bottom of enumerate. The default is never used.

Collaborator

Nitpick, but you have _range and index. I understand why you did it like that, but I think you should use a consistent naming scheme. Arguably, using range is perfectly fine, as the shadowing actually protects you from calling the wrong "range".

Member Author

Oh right, shadowing here is legal (makes sense). Thanks, I changed it.

@quickfur
Member

Other than those minor nitpicks, LGTM.

ElemType front() @property
{
assert(!_range.empty);
return typeof(return)(index, _range.front);
Contributor

This construction seems not so clear. Wouldn't it be better to just use:

auto front() @property
{
    assert(!_range.empty);
    return tuple(index, _range.front);
}

Member Author

Tuple!(size_t, ElementType!Range) is not implicitly convertible to ElemType. Tuple.opAssign can handle this kind of conversion, but it doesn't work when ElementType!Range is non-mutable.

edit:

Note how ElemType is used to introduce the named fields.
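A simplified sketch (with char standing in for ElementType!Range) of how constructing the named-field Tuple type differs from a plain tuple(...):

```d
import std.typecons : Tuple, tuple;

void main()
{
    alias ElemType = Tuple!(size_t, "index", char, "value");

    // tuple(0, 'a') would produce an unnamed Tuple!(int, char);
    // constructing ElemType directly is what introduces the
    // .index and .value field names.
    auto e = ElemType(0, 'a');
    assert(e.index == 0);
    assert(e.value == 'a');
}
```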

Contributor

And what about:

return ElemType(index, _range.front);

Member Author

DRY

edit:

It's a fairly common Phobos idiom:

$ grep -r "typeof(return)(" std
std/algorithm.d:    return typeof(return)(r);
std/algorithm.d:    return typeof(return)(needle);
std/algorithm.d:    return typeof(return)(range, sentinel, openRight);
std/algorithm.d:    return typeof(return)(range, openRight);
std/algorithm.d:    return typeof(return)(r);
std/algorithm.d:    return typeof(return)(index);
std/algorithm.d:    return typeof(return)(rs);
std/algorithm.d:    return typeof(return)(ranges);
std/algorithm.d:    return typeof(return)(r1, r2);
std/algorithm.d:    return typeof(return)(r1, r2);
std/algorithm.d:    return typeof(return)(ror);
std/base64.d:        return typeof(return)(range);
std/base64.d:        return typeof(return)(range);
std/complex.d:        return typeof(return)(ab * std.math.cos(ar), ab * std.math.sin(ar));
std/complex.d:    return typeof(return)(cs.im * csh.re, cs.re * csh.im);
std/complex.d:    return typeof(return)(cs.re * csh.re, - cs.im * csh.im);
std/complex.d:        c = typeof(return)(0, 0);
std/complex.d:            c = typeof(return)(w, z_im / (w + w));
std/complex.d:            c = typeof(return)(z_im / (w + w), w);
std/net/curl.d:            return typeof(return)(size_t.max, cast(Char[])null);
std/net/curl.d:    return typeof(return)(startLen-data.length, res);
std/numeric.d:    return typeof(return)(r1, r2, penalty);
std/range.d:    return typeof(return)(state);
std/range.d:    return typeof(return)(tuple(args));
std/range.d:    return typeof(return)(rr);
std/range.d:    return typeof(return)(rr, n);
std/range.d:    return typeof(return)(source, indices);
std/range.d:    return typeof(return)(source, chunkSize);
std/range.d:            return typeof(return)(index, _range.front);
std/range.d:                    return typeof(return)(index + _range.length - 1, _range.back);
std/range.d:                return typeof(return)(index + i, _range[i]);
std/stdio.d:        return typeof(return)(f, format);
std/stdio.d:            return typeof(return)(this, format);
std/typecons.d:    return typeof(return)(args);

Contributor

So then it is ok :-)

Member

Explicitly using ElemType is IMO better looking. If you're really worried about DRY you could use return type inference. I assume most of the uses in phobos do not have a simple alias for the type already available.

Member Author

I don't like relying on return type inference when the return type is trivial, as I think it helps readability to know the return type without looking at the body of the function. Also, while it may not make a big difference here (though it's not impossible that some time in the future we may decide that we wanted a public Enumerate type, i.e. the old style), I think it's good practice in general, as it improves documentation and DI generation too. Putting the trivial return type explicitly and then using typeof(return) in the body seems like the overall best style to me.

Member

I disagree. If you feel strongly about the return type being present (even though the user will have to look up what ElemType is anyway) I think using the alias in both places is best. typeof(return) does not give any advantage over just using the type.

Member Author

It has all the advantages of DRY.

@JakobOvrum
Member Author

Added assert(!_range.empty); to back and popBack.

@JakobOvrum
Member Author

The tests are now pure @safe nothrow.

auto enumerate(Range)(Range range, size_t start = 0)
if (isInputRange!Range)
{
alias ElemType = Tuple!(size_t, "index", ElementType!Range, "value");
Collaborator

Shouldn't this be inside the result struct itself?

Member Author

There are no correctness issues with it, but stylistically it could be argued either way. It's probably best inside the struct, I'll move it.

edit:

Done.

Member Author

Oops, forgot to make it private when I moved it in. Fixed (changed to use private: because of the 80 column limit).

@monarchdodra
Collaborator

I discovered and filed this:
https://d.puremagic.com/issues/show_bug.cgi?id=11955

In short, if you explicitly request size_t in the loop, it might fail. E.g.:

foreach (size_t lineNum, line; stdin.byLine().enumerate(1)) //Note "size_t"
    stdout.writefln("line #%s: %s", lineNum, line);

This fails on my win_32 (2.064.2). I haven't tested any other builds.

{
return Result(range[i .. j], index + i);
}
}
Collaborator

If Range is infinite, this will fail, as the type of range[i .. j] will not match Range's. Because the function is not a template, it will straight up create a compile error for something like foreach (i, v; repeat(1).enumerate()) (so I guess that means infinite ranges aren't tested...).

If you want to support them, then it goes something like this:

        static if (hasSlicing!Range)
        {
            static if (hasLength!Range) //Finite RA
            {
                Result opSlice(size_t i, size_t j)
                {
                    return Result(range[i .. j], index + i);
                }
            }
            else //Infinite with slicing
            {
                static struct DollarToken {}
                enum opDollar = DollarToken.init;

                //Slice to end with matching type.
                Result opSlice(size_t i, DollarToken)
                {
                    return Result(range[i .. $], index + i);
                }

                //Finite slice.
                //The type returned needs to be Take!Result to satisfy the hasSlicing primitive.
                auto opSlice(size_t i, size_t j)
                {
                    return this[i .. $].take(j - i);
                }
            }
        }

Do NOT simply have opSlice return auto. It is important for infinite ranges that myEnumerate[1 .. 2] return a Take!(Result!Range) type, and not a Result!(Take!Range) type (which is what you'd get with the "dumb" implementation).
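A quick sketch of that convention using std.range.repeat, which is infinite and already slices this way (assuming its documented slicing behaviour):

```d
import std.range : repeat;

void main()
{
    auto r = repeat(1); // infinite range

    // Slicing with a finite upper bound yields a take-style result
    // rather than a value of the infinite range's own type:
    auto s = r[0 .. 5];
    assert(s.length == 5);

    // Slicing to $ preserves the original (infinite) type:
    auto t = r[2 .. $];
    static assert(is(typeof(t) == typeof(r)));
}
```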

Member Author

Thanks, I'd never used infinite range slicing before so this is useful info. This has been fixed in the latest push. One nitpick: I chose to use takeExactly instead of take - of course, they're equivalent when used on infinite ranges, but take is conceptually more complicated than takeExactly from a readability standpoint.

@JakobOvrum
Member Author

Now supports infinite ranges properly and has a documented and enforced overflow policy.

The tests would have caught the infinite range slice problem if dummyRanges covered slicing and such... I'm trying to find the sweet spot where the tests are comprehensive but not unwieldy. dummyRanges helps with that, so maybe we should give it some attention to bring it up to date (I guess I just volunteered).

At any rate, please take another look (docs in particular); feedback on the overflow stuff would be much appreciated.

P.S.
Would you (open question) prefer that I make these incremental changes as separate commits instead of amending every time? Which would be easier to review? Those separate commits could of course be squashed or otherwise fixed up before pulling.

@JakobOvrum
Member Author

One approach that the overflow policy does not incorporate is the choice of using a BigInt index type, which I can imagine has serious uses. I think we can safely defer that to a later date when we have an isIntegerLike or whatever ends up happening.

@JakobOvrum
Member Author

Improved wording in docs.

@dnadlinger
Member

I wonder whether it would be nicer to actually merge the counter into the tuple for auto-expanding ranges, so that you could do foreach(i, a, b; zip(a, b).enumerate). Might not be worth the additional complications, though…

@JakobOvrum
Member Author

I wonder whether it would be nicer to actually merge the counter into the tuple for auto-expanding ranges, so that you could do foreach(i, a, b; zip(a, b).enumerate). Might not be worth the additional complications, though…

It's fairly easy to implement, and either way is pretty much equally composable, so I guess it's just a matter of choosing a default. The current way is more regular, which I guess can mean fewer special cases in generic code, and it also means one less thing the user has to know. Maybe we should have some kind of tuple flattening function instead? e.g. a.zip(b).enumerate().flatten() or a.zip(b).enumerate().map!flatten() etc.

@monarchdodra
Collaborator

One approach that the overflow policy does not incorporate is the choice of using a BigInt index type, which I can imagine has serious uses.

I'm not sure I can see any immediate uses for this actually.

size_t on 64 bits is already really freaking huge. Furthermore, given that BigInt is copy-on-write, operator++ is actually quite an expensive operation.

If anybody really needs such a huge number of counted iterations, I'd be willing to wager the algorithm is more complicated than a simple "foreach" over a range.

Worst case scenario, I see enumerate as a convenience. That doesn't mean users can't still use for loops...

@monarchdodra
Collaborator

Would you (open question) prefer that I make these incremental changes as separate commits instead of amending every time? Which would be easier to review? Those separate commits could of course be squashed or otherwise fixed up before pulling.

I find amending tends to be better, but it doesn't make much of a difference. What I've come to learn, though, is that I much prefer reviewing 10 trivial pulls to a single one that does a lot of things.

@JakobOvrum
Member Author

I'm not sure I can see any immediate uses for this actually.

size_t on 64 bits is already really freaking huge. Furthermore, given that BigInt is copy-on-write, operator++ is actually quite an expensive operation.

If anybody really needs such a huge amount of counted iterations, I'd be willing to wager the algorithm is more complicated than a simple "foreach" over a range.

enumerate is not just for foreach, but regardless, part of the beauty of ranges is that even complicated algorithms can be distilled into simple loops or no loops at all.

Regarding BigInt, I was considering cases like long-running programs/threads where the input range is blocking, waiting for some kind of input. For example, it could be a server that reads application-level packets from a socket and does some processing on them, attaching an ID to each with enumerate. In this particular case, though, it's common practice to just use wrap-around (so it's nice that enumerate now defines it).

@bearophile

@klickverbot:

foreach(i, a, b; zip(a, b).enumerate)

The auto expansion (unpacking) of tuples in foreach (which I think was introduced by Kenji) is a broken non-feature that should be killed as soon as possible (warned against, deprecated, and then removed) and replaced by a better, more general tuple unpacking feature. I keep saying similar things in some Bugzilla entries.

You can see a problem here:

import std.stdio, std.algorithm, std.range;
void main() {
    foreach (x, y; zip([10], [20]))
        writeln(x, " ", y);
    foreach (x, y; zip([10], [20]).array)
        writeln(x, " ", y);
}

Output:

10 20
0 Tuple!(int, int)(10, 20)

@dnadlinger
Member

@bearophile: I agree that we should have general tuple unpacking in the language, and your example indeed shows undesirable behavior.

@monarchdodra
Collaborator

I can see value in having enumerate so that a range may also "attach" an index to its members. However, I'm worried about two things: this pull seems mainly aimed at enabling indexed foreach loops. Why isn't this a language solution? Is it because it requires deprecating tuple unpacking? And if so, wouldn't users of this just have their code broken if/when we deprecate tuple unpacking in foreach loops?

@JakobOvrum
Member Author

I can see value in having enumerate so that a range may also "attach" an index to its members. However, I'm worried about two things: this pull seems mainly aimed at enabling indexed foreach loops. Why isn't this a language solution? Is it because it requires deprecating tuple unpacking? And if so, wouldn't users of this just have their code broken if/when we deprecate tuple unpacking in foreach loops?

Anything that can be achieved with foreach can be achieved functionally with copy and such, and enumerate facilitates this with the named fields. While I prefer that style in my own code, I couldn't think of a good example for the documentation.

As for a library vs. language solution for the range foreach index: I don't feel strongly about it either way, but I don't think tuple unpacking is necessarily broken. What I do think is that we should solve this problem fairly soon, as the absence of a solution keeps surprising people.

One thing to keep in mind is, of course, that enumerate works on ranges without length as well as infinite ranges, with defined behaviour detailed in the documentation. With the current forms of foreach, I don't think it's possible for the index to overflow. Also, the start parameter can help readability in some code, as shown in the example. The equivalent done manually with a + expression inside the loop body wouldn't benefit from the start + length overflow check, either.

@monarchdodra
Collaborator

Don't know if you noticed, but you are failing the release unittests. It's because of http://d.puremagic.com/issues/show_bug.cgi?id=10939

@JakobOvrum
Member Author

Don't know if you noticed, but you are failing the release unittests. It's because of http://d.puremagic.com/issues/show_bug.cgi?id=10939

I had no idea, thanks. I'm short on time in the next few days but I'll fix it after that; this PR was never in a hurry anyway.

@JakobOvrum
Member Author

When the range does not provide length, overflow is defined and expected (see docs). It's a useful feature for infinite ranges and such.

@JakobOvrum
Member Author

@WalterBright, can I somehow use core.checkedint to check whether start + length overflows, when start can be an integer of any size and signedness, and length is size_t? Not sure what to do with the possibly mixed signedness.

@bearophile

@JakobOvrum:

Not sure what to do with the possibly mixed signedness.

This is a general question. A core.checkedint function like muls() accepts two ints or two longs. What if you have a uint and an int? An example:

void main() {
    import std.stdio, core.checkedint;
    int x = -1;
    uint y = 3_000_000_000;
    writeln(x, " ", y);
    writeln(x * y);
    bool overflow = false;
    immutable r1 = muls(x, y, overflow);
    writeln(r1, " ", overflow);
    overflow = false;
    immutable r2 = mulu(x, y, overflow);
    writeln(r2, " ", overflow);
}

It outputs:

-1 3000000000
1294967296
1294967296 false
1294967296 true

Here the overflow boolean from muls() is false, but the result is wrong (the true result is -3_000_000_000, which is representable by neither int nor uint).

@DmitryOlshansky
Member

So is this blocked on the overflow-check decision?

@DmitryOlshansky
Member

In this particular case, if at least one operand is signed, then we need to convert (and test) the second one with to!(Signed!T)(x). Then there are 2 possible overflows: during the addition, and during the conversion of length to signed.
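A rough sketch of that approach as a hypothetical helper (not the PR's actual code), using core.checkedint's adds/addu:

```d
import core.checkedint : adds, addu;
import std.traits : isIntegral, isSigned;

// Hypothetical helper: would start + length overflow, where start may be
// signed or unsigned and length is size_t?
bool addWouldOverflow(T)(T start, size_t length)
    if (isIntegral!T)
{
    bool overflow = false;
    static if (isSigned!T)
    {
        // First overflow point: converting length to a signed type.
        if (length > long.max)
            return true;
        // Second overflow point: the addition itself.
        adds(cast(long) start, cast(long) length, overflow);
    }
    else
    {
        addu(cast(ulong) start, cast(ulong) length, overflow);
    }
    return overflow;
}

void main()
{
    assert(!addWouldOverflow(0, 5));
    assert(addWouldOverflow(long.max, 1));
    assert(addWouldOverflow(ulong.max, 1));
}
```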

@JakobOvrum
Member Author

Then there are 2 possible overflows: during the addition, and during the conversion of length to signed.

Thanks, I think I can work with that. I'm travelling at the moment and won't be settled for another few weeks, so I probably won't be able to work on D until then.

Fixes issue 5550
@JakobOvrum
Member Author

Alright, overflow checking now works. Thanks @DmitryOlshansky.

The two points of interest are the in-contract that does the overflow checking, and the last unittest, which tests overflow. Coverage shows all paths are tested (except the assert(false) paths).

If anyone wants to test the last unittest locally, as it's not tested by the auto tester (see the linked bug), remember to change version(none) to version(all) .

Please review.

@mihails-strasuns

Overall it looks solid, but I don't have any good knowledge of the checkedint stuff.

Is the versioned-out test failing because the auto-tester builds unittests in release mode?

@JakobOvrum
Member Author

Is the versioned-out test failing because the auto-tester builds unittests in release mode?

I think so; asserts are enabled but in-contracts are removed, so the asserts testing the functionality of the in-contract fail.

@DmitryOlshansky
Member

This seems to be pointlessly sitting there waiting for more reviewers.
Unless some objection comes, I'm going to merge in a day or two.

@mihails-strasuns

The only controversial issue is not directly related to this PR, so there is no reason not to merge it. I'll go ahead.

@mihails-strasuns

Auto-merge toggled on

mihails-strasuns pushed a commit that referenced this pull request Sep 20, 2014
@mihails-strasuns mihails-strasuns merged commit 2ad0698 into dlang:master Sep 20, 2014
@DmitryOlshansky
Member

Going to mark new things as changelog_2.067 for now, though it's apparent that in reality this will end up in 2.068.

@mihails-strasuns

Well, it is quite possible that 2.067 will be re-branched from master when Andrew is able to return to this. No release activity was happening for 2.067 anyway.
