Conversation

monarchdodra
Collaborator

As discussed in this thread:
http://forum.dlang.org/thread/ovbjcmogezbvsxrwfcol@forum.dlang.org

This provides a range adaptor that caches the result of another range.

Meant as a lazy alternative to array. Not much else to say...?

Implemented as an obscure type.

The wording in the documentation might suck.

Please destroy.
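For readers skimming the thread, a minimal sketch of the intended behaviour (hypothetical usage; it assumes the adaptor works as described, evaluating each element of the wrapped range at most once):

```d
import std.algorithm : cache, map;

void main()
{
    int evaluations;
    auto r = [1, 2, 3]
        .map!((x) { ++evaluations; return x * 10; })
        .cache;

    auto a = r.front;  // the map lambda already ran when cache primed front
    auto b = r.front;  // served from the cache, no re-evaluation
    assert(evaluations == 1);
    assert(a == 10 && b == 10);
}
```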

std/algorithm.d Outdated
Contributor

Typo on cacheBidir

@andralex
Member

I think a complete caching proposal would add two more overloads of cache: cache(range, 42) caches up to 42 items ahead of the current one and offers a method auto ref ElementType!R ahead(size_t) that offers access for up to 42 elements and size_t aheadAvailable() returning how many items ahead are available.

Also, cache(range, 42, 1024) caches 42 elements ahead and 1024 elements behind. It should offer auto ref ElementType!R behind(size_t) and size_t behindAvailable().

We should also offer cacheBehind for the rare case in which lookbehind is needed but not lookahead.

As a rule of thumb for index calculations, cache(range, 0, 0) should only cache front.
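A rough restatement of the proposed overloads as signatures (hypothetical API only; none of these exist in Phobos, this is just the proposal above written down):

```d
// Hypothetical API, restating the proposal; not actual Phobos code.
auto cache(R)(R range);                               // caches front only
auto cache(R)(R range, size_t ahead);                 // look-ahead buffer
auto cache(R)(R range, size_t ahead, size_t behind);  // both directions

// The look-ahead/look-behind variants would additionally offer:
//   auto ref ElementType!R ahead(size_t i);   // i-th buffered element ahead
//   size_t aheadAvailable();                  // how many are buffered ahead
//   auto ref ElementType!R behind(size_t i);  // i-th element behind
//   size_t behindAvailable();                 // how many are kept behind
```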

@andralex
Member

As a rule of thumb for index calculations, cache(range, 0, 0) should only cache front.

(Not sure whether that's the best rule.)

@monarchdodra
Collaborator Author

For, say, cache(range, 42), what would be the semantics (before I get working on it, I mean)? Is this OK:
"Eagerly evaluates the first 42 elements of the range, then caches them in a private buffer. As elements are popped from the front, more items are eagerly evaluated ahead."

"Calls to index[i] that are outside of that cache will be lazily evaluated, but will not raise an error."

Would aheadAvailable() really be very useful? Would it be any different from max(42, length)? It might be fishy for infinite ranges...? I guess it can't hurt to have it.

Implementation-wise, would this be implemented via a circular buffer (Cycle)?

Would it be OK to instead have this signature: cache!1024(range) (the look-ahead size must be static)?

I'm not sure how "cache behind" would work: ranges don't "grow", and indices can't be negative...? The semantic usage seems very strange to me.

@DmitryOlshansky
Member

I think a complete caching proposal would add two more overloads of cache: cache(range, 42) caches up to 42 items ahead of the current one and offers a method auto ref ElementType!R ahead(size_t) that offers access for up to 42 elements and size_t aheadAvailable() returning how many items ahead are available.

[...]

Truth be told, I don't see how this design will carry its own weight without a good enough use case. Also, I'd say that having behind or ahead return a random-access range instead would eliminate an extra primitive.

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

Fixed-sized look-ahead/look-behind (commonly with length=1) could be useful but it turns out that having these things as "always-on" is a bad idea performance-wise.

Now, the main trouble with enable/disable and a std.algorithm-style adapter is this: starting with an input range and calling auto withLookahead = lookahead(range), you may have fun with the new range withLookahead. But how do you then stop this accumulation and keep the perceived position intact? Another obstacle here is that each adapter is a new type, unrelated to the original source.

The second point is that e.g. a parser requires unrestricted lookahead length.

The third point is that where and how (as in container/allocation policy) this adapter stores items is important.

Would it be OK to instead have this signature: cache!1024(range) (look ahead size must be static ?)

No, nor do I think that a limit defined at run-time will suffice.

@monarchdodra
Collaborator Author

So which direction should we take? Truth be told, I don't really see much point in having cache(range, howMany): ranges are either iterated RA-style or forward-style. I know of few algorithms that pop the range while accessing only the first N elements.

When I proposed this range, I really only had simple caching in mind. I think it provides useful and simple functionality.

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

I'm actually unsure it would help much: as soon as popFront is called, the cache is refreshed and the internal range popped too, so the problem is still there.

@DmitryOlshansky
Member

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

I'm actually unsure it would help much: as soon as popFront is called, the cache is refreshed and the internal range popped too, so the problem is still there.

Well, by itself, no, since one has to deep copy first, but something like:

range.map!(x => x.idup).cache

will at least avoid allocations on each access to front down the pipeline.

In short, it makes sure you do the deep copy only once. In fact, the irony is that map used to cache values, but then it was fixed not to. Now we finally allow folks to do it again, but explicitly.
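A sketch of that pattern in a fuller pipeline, using byLine as the canonical transient-front range (hypothetical pipeline, for illustration only):

```d
import std.algorithm : cache, filter, map;
import std.stdio : stdin;

void process()
{
    // byLine reuses its internal buffer, so .front is transient.
    // idup makes a durable copy; cache ensures that copy is made
    // exactly once per line, however often the rest of the pipeline
    // reads .front.
    auto lines = stdin
        .byLine
        .map!(l => l.idup)             // one allocation per line...
        .cache                         // ...done once, then reused
        .filter!(l => l.length != 0);  // may read .front twice, cheaply
}
```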

@quickfur
Member

The transient .front problem can't (at the moment) be solved generically, certainly not with Cache. You basically have to know what the element type is, and how to make a copy of it. If it's a shallow array, you can use .idup (you may have to do more than .idup if the array elements themselves are reference types). Otherwise, if it's a reference type, you have to know how to make a copy of said type. Barring a generic deep-copy function in Phobos, this is the best we can do right now.

@quickfur
Member

ping

Might want to rebase this to prod the autotester onward.

@monarchdodra
Collaborator Author

Might want to rebase this to prod the autotester onward.

Rebased.

ping

I'm here. I don't recall there being anything else that requires changing?

@quickfur
Member

Hmm. Now there's a whole bunch of compile errors. Rebase again?

@monarchdodra
Collaborator Author

Hmm. Now there's a whole bunch of compile errors. Rebase again?

Fixed. I also did a bit of cleanup.

@brad-anderson
Contributor

I just hit the problem this solves so I'm eager to see this included.

I had a long pseudo-member chain with a std.net.curl call early in it. It was taking far longer than it should have to process, so sticking some writelns in revealed that everything was happening twice because of a filter() call near the end of the chain.

Once this is merged I think it would be useful to add some warnings to functions like filter() that advise people to use cache() if the preceding elements of the chain are costly.

@monarchdodra
Collaborator Author

I just hit the problem this solves so I'm eager to see this included.

That's good to hear, because I was actually wondering myself how useful this is, and was having trouble coming up with a good use case.

Indeed, filter "double"-calls front, and cache is a good adapter to place right before it (in particular, as you say, if the previous call is expensive, such as a map).

Once this is merged I think it would be useful to add some warnings to functions like filter() that advise people to use cache() if the preceding elements of the chain are costly.

No reason not to do it now :) if it gets merged.
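A small self-contained illustration of the double evaluation being discussed (the counts in the comments are what I'd expect from filter testing each element and the loop body re-reading the passing ones; exact numbers may vary with filter's implementation, so they are left unasserted):

```d
import std.algorithm : filter, map;
import std.stdio : writeln;

void main()
{
    int evaluations;
    auto costly = [1, 2, 3, 4]
        .map!((x) { ++evaluations; return x * x; });

    // filter reads .front to test each element, and the loop body
    // reads .front again for the ones that pass, so the upstream
    // map lambda runs more than once per surviving element.
    foreach (e; costly.filter!(x => x > 4)) {}
    writeln(evaluations);  // more than 4: passing elements evaluated twice

    // Inserting .cache between map and filter (the fix discussed here)
    // brings this back down to one evaluation per element.
}
```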

@quickfur
Member

quickfur commented Jan 2, 2014

On Thu, Jan 02, 2014 at 01:54:15PM -0800, monarch dodra wrote:

I just hit the problem this solves so I'm eager to see this
included.

That's good to hear, because I myself was actually wondering how
useful this actually is, and was having trouble coming up with a good
usecace.

Indeed, filter does "double" calls to front, and cache is a good
adapter to place right before (in particular, as you say, if the
previous call is expensive, such as a map).
[...]

Just out of curiosity, why can't filter be changed so that it doesn't
access .front twice?

T


@brad-anderson
Contributor

(in particular, as you say, if the previous call is expensive, such as a map)

@monarchdodra It's actually not just the previous call. The std.net.curl call was several steps further up the chain, so everything was getting double-called all the way up to the beginning of the chain, which is terribly unexpected. I actually had to comment out each step of the chain over and over until I isolated what was causing it.

Here's the code.

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

@monarchdodra
Collaborator Author

Just out of curiosity, why can't filter be changed so that it doesn't access .front twice?

Intrinsically, I'm not sure it's possible, because of the way it works: each element must be tested, and then the user calls front, leading to a second call to front. I don't think there's any way to work around that.

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

filter could cache the result internally, but doing that is actually difficult, and not 100% generic: it'll fail for ranges with const elements or non-assignable elements. That's a restriction that's not acceptable. In contrast, cache is opt-in. Furthermore, cache (as I wrote it) will error out with a very explicit explanation (IMO) of the requirements on the element type (assignability of ElementType to Unqual!ElementType).
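For concreteness, a stripped-down sketch of a filter-like adapter (illustration only, not Phobos's actual implementation) showing where the two reads of .front come from:

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Simplified filter-like adapter; not Phobos code.
struct SimpleFilter(alias pred, R) if (isInputRange!R)
{
    R input;

    this(R input)
    {
        this.input = input;
        skipFailing();
    }

    private void skipFailing()
    {
        // First read of .front: the predicate must inspect it.
        while (!input.empty && !pred(input.front))
            input.popFront();
    }

    @property bool empty() { return input.empty; }

    // Second read: the user asks for the element the predicate just saw.
    @property auto front() { return input.front; }

    void popFront()
    {
        input.popFront();
        skipFailing();
    }
}
```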

@quickfur
Member

ping all reviewers
Are we moving forward with this, or is this a bad idea and we should close it?

@monarchdodra
Collaborator Author

I still think it would be useful to have. It can be a useful complement in otherwise long and costly UFCS chains.

But I'm not passionate about it. Anybody of the opinion this range is actually bad/wrong?

Semi-on-topic: I'm not too sure about providing bidirectional access anymore; 99% of the functionality is in forward iteration anyway. Bidirectional iteration could have a surprising effect, as the "last" element would be evaluated twice.

@DmitryOlshansky
Member

IMO we should go with it. Thoughts about a cached range as a new kind of range need some public discussion.

@mihails-strasuns

I have this bookmarked for some more detailed experiments during the upcoming weekend.

@brad-anderson
Contributor

I'm 100% in favor of this. Right now I have to use array() which eagerly caches everything unnecessarily in situations I'd use cache(). cache() would be a big improvement because it lets you retain some laziness and uses far less memory.

I can't really think of uses for the multi-element cache. Maybe to compartmentalize part of a long chain so there is less pressure on the CPU cache? I welcome the feature in case it becomes handy for something though.

Unless there is some huge defect in the concept that I'm not seeing, it should definitely go forward once everyone who wants to review the implementation has done so.

Contributor

I wonder if @DmitryOlshansky's sliding window range idea could be used here and, more broadly, if there is actually some overlap between cache() and his concept.

Member

There is, which is something I find intriguing after looking at this again. And @andralex wasn't objecting to adding a couple of primitives, which is a good sign.

I'm going to (hopefully soon) do a short writeup on a buffer range design and new i/o.

Member

I like the idea of a sliding window range. It could even be applicable to the prospective std.io.

@quickfur
Member

quickfur commented Aug 5, 2014

Looks like we have merge conflicts. Please rebase.

std/algorithm.d Outdated
Contributor

s/useful a function/a useful function/
s/functons/functions/ (typo)

@ghost

ghost commented Sep 10, 2014

I don't know how the changelog works. Should I do anything else here in regards to that?

No need to worry, we'll handle it.

@DmitryOlshansky
Member

@AndrejMitrovic Cool. Versionized labels feel just right.

@mihails-strasuns

Note that this won't be in 2.067, as the branch for it has already been forked: https://github.com/D-Programming-Language/phobos/tree/2.067

Unless it is going to be restarted, of course.

@ghost

ghost commented Sep 10, 2014

Note that this won't be in 2.067 as branch for it has been already forked

I don't get it, 2.066 just came out. 2.067 can't come out that quickly, can it? And what commits is it going to contain anyway, there haven't been that many?

@yebblies
Contributor

I don't get it, 2.066 just came out. 2.067 can't come out that quickly, can it? And what commits is it going to contain anyway, there haven't been that many?

Everything that went into master while 2.066 was in alpha/beta except for the regression fixes which went into both.

@ghost

ghost commented Sep 10, 2014

Ah. I'm clearly out of my depth now for how things are done. I guess we'll have to label it 2.068 then? But isn't that, like, a year from now (judging by how infrequent our releases are)? :/

@yebblies
Contributor

The idea is that releases should be more frequent. Who knows what will actually happen, though.

@monarchdodra
Collaborator Author

AndrejMitrovic added the changelog_v2.067 label a day ago

Oh, OK. I get it now.

@DmitryOlshansky
Member

It seems very close to the idea of Scala's sliding (they had it way back in 2009):
http://daily-scala.blogspot.ru/2009/11/iteratorsliding.html
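For what it's worth, present-day Phobos has a close analogue in std.range.slide (overlapping fixed-size windows), which maps directly onto Scala's Iterator.sliding:

```d
import std.algorithm.comparison : equal;
import std.range : slide;

void main()
{
    // Windows of size 2, advancing one element at a time,
    // like Scala's sliding(2).
    auto windows = [1, 2, 3, 4].slide(2);
    assert(windows.front.equal([1, 2]));
    windows.popFront();
    assert(windows.front.equal([2, 3]));
}
```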

@monarchdodra
Collaborator Author

Addressed the points raised by @AndrejMitrovic in regards to style. Also improved the unittest to show the effects of using cache over not using cache (in a fashion I find well expressed, mind you).

I also removed "support" for objects with a disabled this(), as it led to using said objects without actually initializing them, so I wasn't actually supporting them.

If all is well and good, I'll give it a squash.

std/algorithm.d Outdated
Member

Nitpick: "instantiate" not "instanciate".

@monarchdodra
Collaborator Author

Anything else?

Contributor

Shouldn't this be moved to the template constraints?

Collaborator Author

It would be hard to express in a short and "grokkable" way for the end user.

Also, I still believe that template constraints are for "logical disambiguation", not "internal validation". This is the overload that works on input ranges. From there, the static assert is only designed as a way to cleanly express a compile error.

We do not want to allow someone else to write a "cache" that also operates on input ranges, with completely different semantics, just because the element type has a particular trait. That would require explicit disambiguation (even if one of them wouldn't compile anyway, it is close enough logically to be ambiguous).

@quickfur
Member

ping
Anything else? This pull has been green for a while now. Are we ready to merge?

@monarchdodra
Collaborator Author

Anything else?

AFAIK, no.

@DmitryOlshansky
Member

Let's do it.

@DmitryOlshansky
Member

Auto-merge toggled on

DmitryOlshansky added a commit that referenced this pull request Sep 25, 2014
@DmitryOlshansky DmitryOlshansky merged commit 83e90a8 into dlang:master Sep 25, 2014
@monarchdodra
Collaborator Author

Merged #1364.

Cool. Thanks.

@quickfur
Member

On Thu, Sep 25, 2014 at 04:20:43PM -0700, Михаил Страшун wrote:

..and yet another epic PR taken care of! :)

Go, team Phobos! :-P

@MartinNowak
Member

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

Filter could just simply cache its front element like any other range, at least if front has a side effect.
Could someone please file a bug report?

@monarchdodra
Collaborator Author

Filter could just simply cache its front element like any other range, at least if front has a side effect.

The point is that filter doesn't have a side effect on its elements. It merely (sometimes) evaluates front more than once. Making filter systematically cache its elements would bloat it, potentially be counterproductive (there is no overhead for accessing array elements), and, most of all, restrict it: no lvalue access, requiring isAssignable!(ElementType!R)...

@CyberShadow
Member

Please add new functions to the cheat sheet as well as the function list.

@CyberShadow
Member

#2578

@CyberShadow
Member

A regression has been filed, caused by this pull request:
https://issues.dlang.org/show_bug.cgi?id=14301

@ntrel
Contributor

ntrel commented Mar 6, 2016

Maybe it's too late now, but IMO cache should really be in std.range; it's not an algorithm.
