-
-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add std.range.enumerate #1866
Add std.range.enumerate #1866
Conversation
Added the |
Is it possible to add opApply for faster iteration? |
I believe that unless you can really justify it, ranges with opApply are slower, because of inlining issue: The compiler creates a delegate of the loop body, and passes that to the foreach, which in turn makes a run-time call to said delegate, throwing all inlining out the window. ...is what I remember from last time I touched |
Wow I always think that opApply should be faster. Thanks.
|
dlang.org/phobos/std_range.html#.opApply
|
That's for the |
OK. Thanks again for explanation. Btw. I try both and it seems there is no
|
static struct Result | ||
{ | ||
private Range _range; | ||
private size_t index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick: probably better to explicit write private size_t index=0;
to make code clearer, even though .init
is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to have that - but now it's explicitly initialized unconditionally at the bottom of enumerate
. The default is never used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but you have _range
and index
. I understand why you did it like that, but I think you should use a consistent naming scheme. Arguably, using range
is perfectly fine, as the shadowing actually protects you from calling the wrong "range".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, shadowing here is legal (makes sense). Thanks, I changed it.
Other than those minor nitpicks, LGTM. |
ElemType front() @property | ||
{ | ||
assert(!_range.empty); | ||
return typeof(return)(index, _range.front); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this construction seems to be not so clear. Would not be bettr to just use:
auto front() @property
{
assert(!_range.empty);
return tuple(index, _range.front);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple!(size_t, ElementType!Range)
is not implicitly convertible to ElemType
. Tuple.opAssign
can handle this kind of conversion, but it doesn't work when ElementType!Range
is non-mutable.
edit:
Note how ElemType
is used to introduce the named fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what about:
return ElemType(index, _range.front);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY
edit:
It's a fairly common Phobos idiom:
$ grep -r "typeof(return)(" std
std/algorithm.d: return typeof(return)(r);
std/algorithm.d: return typeof(return)(needle);
std/algorithm.d: return typeof(return)(range, sentinel, openRight);
std/algorithm.d: return typeof(return)(range, openRight);
std/algorithm.d: return typeof(return)(r);
std/algorithm.d: return typeof(return)(index);
std/algorithm.d: return typeof(return)(rs);
std/algorithm.d: return typeof(return)(ranges);
std/algorithm.d: return typeof(return)(r1, r2);
std/algorithm.d: return typeof(return)(r1, r2);
std/algorithm.d: return typeof(return)(ror);
std/base64.d: return typeof(return)(range);
std/base64.d: return typeof(return)(range);
std/complex.d: return typeof(return)(ab * std.math.cos(ar), ab * std.math.sin(ar));
std/complex.d: return typeof(return)(cs.im * csh.re, cs.re * csh.im);
std/complex.d: return typeof(return)(cs.re * csh.re, - cs.im * csh.im);
std/complex.d: c = typeof(return)(0, 0);
std/complex.d: c = typeof(return)(w, z_im / (w + w));
std/complex.d: c = typeof(return)(z_im / (w + w), w);
std/net/curl.d: return typeof(return)(size_t.max, cast(Char[])null);
std/net/curl.d: return typeof(return)(startLen-data.length, res);
std/numeric.d: return typeof(return)(r1, r2, penalty);
std/range.d: return typeof(return)(state);
std/range.d: return typeof(return)(tuple(args));
std/range.d: return typeof(return)(rr);
std/range.d: return typeof(return)(rr, n);
std/range.d: return typeof(return)(source, indices);
std/range.d: return typeof(return)(source, chunkSize);
std/range.d: return typeof(return)(index, _range.front);
std/range.d: return typeof(return)(index + _range.length - 1, _range.back);
std/range.d: return typeof(return)(index + i, _range[i]);
std/stdio.d: return typeof(return)(f, format);
std/stdio.d: return typeof(return)(this, format);
std/typecons.d: return typeof(return)(args);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then it is ok :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly using ElemType
is IMO better looking. If you're really worried about DRY you could use return type inference. I assume most of the uses in phobos do not have a simple alias for the type already available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like relying on return type inference when the return type is trivial, as I think it helps readability to know the return type without looking at the body of the function. Also, while it may not make a big difference here (though it's not impossible that some time in the future we may decide that we wanted a public Enumerate
type, i.e. the old style), I think it's good practice in general, as it improves documentation and DI generation too. Putting the trivial return type explicitly and then using typeof(return)
in the body seems like the overall best style to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. If you feel strongly about the return type being present (even though the user will have to look up what ElemType is anyway) I think using the alias in both places is best. typeof(return)
does not give any advantage over just using the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has all the advantages of DRY.
Added |
The tests are now |
auto enumerate(Range)(Range range, size_t start = 0) | ||
if (isInputRange!Range) | ||
{ | ||
alias ElemType = Tuple!(size_t, "index", ElementType!Range, "value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be inside the result struct itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no correctness issues with it, but stylistically it could be argued either way. It's probably best inside the struct, I'll move it.
edit:
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, forgot to make it private when I moved it in. Fixed (changed to use private:
because of the 80 column limit).
I discovered and filed this: In a word, if you explicit request foreach (size_t lineNum, line; stdin.byLine().enumerate(1)) //Note "size_t"
stdout.writefln("line #%s: %s", lineNum, line); This fails on my win_32 (2.064.2). I haven't tested any other builds. |
{ | ||
return Result(range[i .. j], index + i); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Range
is infinite, this will fail, as the type of range[i .. j]
will not match Range
's. Because the function is not template, it will straight up create a compile error for something like: foreach(i, v; repeat(1).enumerate())
(so I guess that means they aren't tested...).
If you want to support them, then it goes something like this:
static if (hasSlicing!Range)
{
static if (hasLength!Range) //Finite RA
{
Result opSlice(size_t i, size_t j)
{
return Result(range[i .. j], index + i);
}
}
else //Infinite with slicing
{
static struct DollarToken {}
enum opDollar = DollarToken.init;
//Slice to end with matching type.
Result opSlice(size_t i, DollarToken)
{
return Result(range[i .. $], index + i);
}
//Finite slice.
//The type returned needs to be Take!Result to satisfy the hasSlicing primitive.
auto opSlice(size_t i, size_t j)
{
return this[i .. $].take(j - 1);
}
}
}
Do NOT simply have opSlice
return auto
. It is important for infinite ranges that myEnumerate[1 .. 2]
should return a Take!(Result!Range)
type, and not a Result!(Range!Take)
type (which would be what you'd get with the "dumb" implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'd never used infinite range slicing before so this is useful info. This has been fixed in the latest push. One nitpick: I chose to use takeExactly
instead of take
- of course, they're equivalent when used on infinite ranges, but take
is conceptually more complicated than takeExactly
from a readability standpoint.
Now supports infinite ranges properly and has a documented and enforced overflow policy. The tests would have catched the infinite range slice problem if At any rate, please take another look (docs in particular); feedback on the overflow stuff would be much appreciated. P.S. |
One approach that the overflow policy does not incorporate is the choice of using a |
Improved wording in docs. |
I wonder whether it would be nicer to actually merge the counter into the tuple for auto-expanding ranges, so that you could do |
It's fairly easy to implement, and either way is pretty much equally composable, so I guess it's just a matter of choosing a default. The current way is more regular which I guess can mean less special cases in generic code, and it also means one less thing the user has to know. Maybe we should have some kind of tuple flattening function instead? e.g. |
I'm not sure I can see any immediate uses for this actually. size_t on 64 bits is already really freaking huge. Further more, given that BigInt is copy on write, operator++ is actually quite the expansive operation. If anybody really needs such a huge amount of counted iterations, I'd be willing to wager the algorithm is more complicated than a simple "foreach" over a range. Worst case scenario, I see |
I find amend tends to be better, but it don't make much of a change. What I've come to learn though is that I much prefer having to review 10 trivial pulls, then a single one that does a lot of things. |
Regarding |
@klickverbot:
The auto expansion (unpacking) of tuples in foreach (that I think was introduced by Kenji) is a broken not-feature that should be killed as soon as possible (warned against, deprecated and then removed), and replaced by a better and more general feature of tuple unpacking. I keep saying similar things in some Bugzilla entries. You can see a problem here:
Ouput:
|
@bearophile: I agree that we should have general tuple unpacking in the language, and your example indeed shows undesirable behavior. |
I can see value in having enumerate so that a range may also "attach" an index to its members. However, I'm worried about two things: This pull seems to mainly be aimed at being able to have indexed foreach loops: Why isn't this a language solution? Is it because it requires deprecating tuple un-packing? And if so, wouldn't users that use this just have their code broken if/when we deprecate tuple unpacking in foreach loops? |
Anything that can be achieved with foreach can be achieved functionally with As for library vs language range foreach index; I don't feel strongly about it either way, but I don't think tuple unpacking is necessarily broken. What I do think is that we should solve this problem fairly soon, as the absence of a solution to this keeps surprising people. One thing to keep in mind is of course that |
Don't know if you noticed, but you are failing the release unittests. It's because of http://d.puremagic.com/issues/show_bug.cgi?id=10939 |
I had no idea, thanks. I'm short on time in the next few days but I'll fix it after that; this PR was never in a hurry anyway. |
When the range does not provide length, overflow is defined and expected (see docs). It's a useful feature for infinite ranges and such. |
@WalterBright, can I somehow use core.checkedint to check if |
This is a general question. A core.checkedint function like muls() accepts two ints or two longs. What if you have a uint and int? An example:
It outputs:
Here the overflow boolean from the muls() is false, but the result is wrong (it's -3_000_000_000 that is not representable by both ints and uints). |
So is this blocked on overflow check decision ? |
In this particular case if at least one operand is signed then we need to convert (and test) the second one |
Thanks, I think I can work with that. I'm travelling at the moment and won't settle for another few weeks, so I probably won't be able to work with D until then. |
e52338a
to
83c2a57
Compare
Fixes issue 5550
83c2a57
to
8ee6d8f
Compare
Alright, overflow checking now works. Thanks @DmitryOlshansky. The two points of interest are the in-contract that does the overflow checking, and the last unittest, which tests overflow. Coverage shows all paths are tested (except the If anyone wants to test the last unittest locally, as it's not tested by the auto tester (see the linked bug), remember to change Please review. |
Overall looks solid but I don't have any good knowledge of Is versioned out test failing because auto-tester builds unittests in release mode? |
I think so; asserts are enabled but in-contracts are removed, so the asserts testing the functionality of the in-contract fail. |
Seems to pointlessly sit there waiting for more reviewers. |
Only controversial issue is not directly related to this PR so there is no reason to not merge it. I'l go ahead. |
Auto-merge toggled on |
Add std.range.enumerate
Going to mark new things as changelog_2.067 for now, though it's apparent that it's going to get into 2.068 in reality. |
Well it is quite possible that 2.067 will be re-branched from master when Andrew will be able to return to this. No release activity was happening for 2.067 anyway |
http://d.puremagic.com/issues/show_bug.cgi?id=5550
This could also be implemented in the language (though potentially ambiguous with automatic foreach unpacking), but here's a library solution for review.
Ping @bearophile