Conversation

monarchdodra
Collaborator

As discussed in this thread:
http://forum.dlang.org/thread/ovbjcmogezbvsxrwfcol@forum.dlang.org

This provides a range adaptor that caches the result of another range.

Meant as a lazy alternative to array. Not much else to say...?

Implemented as an obscure type.

The wording in the documentation might suck.

Please destroy.
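For readers skimming the thread, a minimal sketch of the intended behaviour (hypothetical usage; it assumes the adaptor works as described, evaluating each element of the wrapped range at most once):

```d
import std.algorithm : cache, map;

void main()
{
    int evaluations;
    auto r = [1, 2, 3]
        .map!((x) { ++evaluations; return x * 10; })
        .cache;

    auto a = r.front;  // the map lambda already ran when cache primed front
    auto b = r.front;  // served from the cache, no re-evaluation
    assert(evaluations == 1);
    assert(a == 10 && b == 10);
}
```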

std/algorithm.d Outdated
Contributor

Typo on cacheBidir

@andralex
Member

I think a complete caching proposal would add two more overloads of cache: cache(range, 42) caches up to 42 items ahead of the current one and offers a method auto ref ElementType!R ahead(size_t) that offers access for up to 42 elements and size_t aheadAvailable() returning how many items ahead are available.

Also, cache(range, 42, 1024) caches 42 elements ahead and 1024 elements behind. It should offer auto ref ElementType!R behind(size_t) and size_t behindAvailable().

We should also offer cacheBehind for the rare case in which lookbehind is needed but not lookahead.

As a rule of thumb for index calculations, cache(range, 0, 0) should only cache front.
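A rough restatement of the proposed overloads as signatures (hypothetical API only; none of these exist in Phobos, this is just the proposal above written down):

```d
// Hypothetical API, restating the proposal; not actual Phobos code.
auto cache(R)(R range);                               // caches front only
auto cache(R)(R range, size_t ahead);                 // look-ahead buffer
auto cache(R)(R range, size_t ahead, size_t behind);  // both directions

// The look-ahead/look-behind variants would additionally offer:
//   auto ref ElementType!R ahead(size_t i);   // i-th buffered element ahead
//   size_t aheadAvailable();                  // how many are buffered ahead
//   auto ref ElementType!R behind(size_t i);  // i-th element behind
//   size_t behindAvailable();                 // how many are kept behind
```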

@andralex
Member

As a rule of thumb for index calculations, cache(range, 0, 0) should only cache front.

(Not sure whether that's the best rule.)

@monarchdodra
Collaborator Author

For, say, cache(range, 42), what would be the semantics (before I get working on it, I mean)? Is this OK:
"Eagerly evaluates the first 42 elements of the range, then caches them in a private buffer. As elements are popped from the front, more items are eagerly evaluated ahead."

"Calls to index[i] that are outside of that cache will be lazily evaluated, but will not raise an error."

Would aheadAvailable() really be very useful? Would it be any different from max(42, length)? It might be fishy for infinite ranges...? I guess it can't hurt to have it.

Implementation-wise, would this be implemented via a circular buffer (Cycle)?

Would it be OK to instead have this signature: cache!1024(range) (the look-ahead size must be static)?

I'm not sure how "cache behind" would work: ranges don't "grow", and indices can't be negative...? The semantic usage seems very strange to me.

@DmitryOlshansky
Member

I think a complete caching proposal would add two more overloads of cache: cache(range, 42) caches up to 42 items ahead of the current one and offers a method auto ref ElementType!R ahead(size_t) that offers access for up to 42 elements and size_t aheadAvailable() returning how many items ahead are available.

[...]

Truth be told, I don't see how this design will carry its own weight without a good enough use case. Also, I'd say that having behind or ahead return a random-access range instead would eliminate an extra primitive.

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

Fixed-sized look-ahead/look-behind (commonly with length=1) could be useful but it turns out that having these things as "always-on" is a bad idea performance-wise.

Now, the main trouble with enable/disable and a std.algorithm-style adapter is this: starting with an input range and calling auto withLookahead = lookahead(range), you may have fun with the new range withLookahead. But how do you then stop this accumulation and keep the perceived position intact? Another obstacle here is that each adapter is a new type, unrelated to the original source.

The second point is that e.g. a parser requires unrestricted lookahead length.

The third point is that where and how (as in container/allocation policy) this adapter stores items is important.

Would it be OK to instead have this signature: cache!1024(range) (look ahead size must be static ?)

No, nor do I think that a limit defined at run-time will suffice.

@monarchdodra
Collaborator Author

So which direction should we take? Truth be told, I don't really see much point in having cache(range, howMany): ranges are either iterated RA-style or forward-style. I know of few algorithms that pop the range while accessing only the first N elements.

When I proposed this range, I really only had simple caching in mind. I think it provides useful and simple functionality.

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

I'm actually unsure it would help much: as soon as popFront is called, the cache is refreshed and the internal range popped too, so the problem is still there.

@DmitryOlshansky
Member

I see the immediate value in using singular .cache in an UFCS chain as a bit of aspirin to ease the transient .front headache.

I'm actually unsure it would help much: as soon as popFront is called, the cache is refreshed and the internal range popped too, so the problem is still there.

Well, by itself, no, since one has to deep copy first, but something like:

range.map!(x => x.idup).cache

will at least avoid allocations on each access to front down the pipeline.

In short, it makes sure you do the deep copy only once. In fact, the irony is that map used to cache values, but then it was fixed not to. Now we finally allow folks to do it again, but explicitly.
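A sketch of that pattern in a fuller pipeline, using byLine as the canonical transient-front range (hypothetical pipeline, for illustration only):

```d
import std.algorithm : cache, filter, map;
import std.stdio : stdin;

void process()
{
    // byLine reuses its internal buffer, so .front is transient.
    // idup makes a durable copy; cache ensures that copy is made
    // exactly once per line, however often the rest of the pipeline
    // reads .front.
    auto lines = stdin
        .byLine
        .map!(l => l.idup)             // one allocation per line...
        .cache                         // ...done once, then reused
        .filter!(l => l.length != 0);  // may read .front twice, cheaply
}
```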

@quickfur
Member

The transient .front problem can't (at the moment) be solved generically, certainly not with Cache. You basically have to know what the element type is, and how to make a copy of it. If it's a shallow array, you can use .idup (you may have to do more than .idup if the array elements themselves are reference types). Otherwise, if it's a reference type, you have to know how to make a copy of said type. Barring a generic deep-copy function in Phobos, this is the best we can do right now.

@quickfur
Member

ping

Might want to rebase this to prod the autotester onward.

@monarchdodra
Collaborator Author

Might want to rebase this to prod the autotester onward.

Rebased.

ping

I'm here. I don't recall there being anything else that requires changing?

@quickfur
Member

Hmm. Now there's a whole bunch of compile errors. Rebase again?

@monarchdodra
Collaborator Author

Hmm. Now there's a whole bunch of compile errors. Rebase again?

Fixed. I also did a bit of cleanup.

@brad-anderson
Contributor

I just hit the problem this solves so I'm eager to see this included.

I had a long pseudo-member chain with a std.net.curl call early in it. It was taking far longer than it should have to process, so sticking some writelns in revealed that everything was happening twice because of a filter() call near the end of the chain.

Once this is merged I think it would be useful to add some warnings to functions like filter() that advise people to use cache() if the preceding elements of the chain are costly.

@monarchdodra
Collaborator Author

I just hit the problem this solves so I'm eager to see this included.

That's good to hear, because I was actually wondering myself how useful this is, and was having trouble coming up with a good use case.

Indeed, filter "double"-calls front, and cache is a good adapter to place right before it (in particular, as you say, if the previous call is expensive, such as a map).

Once this is merged I think it would be useful to add some warnings to functions like filter() that advise people to use cache() if the preceding elements of the chain are costly.

No reason not to do it now :) if it gets merged.
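A small self-contained illustration of the double evaluation being discussed (the counts in the comments are what I'd expect from filter testing each element and the loop body re-reading the passing ones; exact numbers may vary with filter's implementation, so they are left unasserted):

```d
import std.algorithm : filter, map;
import std.stdio : writeln;

void main()
{
    int evaluations;
    auto costly = [1, 2, 3, 4]
        .map!((x) { ++evaluations; return x * x; });

    // filter reads .front to test each element, and the loop body
    // reads .front again for the ones that pass, so the upstream
    // map lambda runs more than once per surviving element.
    foreach (e; costly.filter!(x => x > 4)) {}
    writeln(evaluations);  // more than 4: passing elements evaluated twice

    // Inserting .cache between map and filter (the fix discussed here)
    // brings this back down to one evaluation per element.
}
```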

@quickfur
Member

quickfur commented Jan 2, 2014

On Thu, Jan 02, 2014 at 01:54:15PM -0800, monarch dodra wrote:

I just hit the problem this solves so I'm eager to see this
included.

That's good to hear, because I myself was actually wondering how
useful this actually is, and was having trouble coming up with a good
usecace.

Indeed, filter does "double" calls to front, and cache is a good
adapter to place right before (in particular, as you say, if the
previous call is expensive, such as a map).
[...]

Just out of curiosity, why can't filter be changed so that it doesn't
access .front twice?

T


@brad-anderson
Contributor

(in particular, as you say, if the previous call is expensive, such as a map)

@monarchdodra It's actually not just the previous call. The std.net.curl call was several steps further up the chain, so everything was getting double-called all the way up to the beginning of the chain, which is terribly unexpected. I actually had to comment out each step of the chain over and over until I isolated what was causing it.

Here's the code.

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

@monarchdodra
Collaborator Author

Just out of curiosity, why can't filter be changed so that it doesn't access .front twice?

Intrinsically, I'm not sure it's possible, because of the way it works: each element must be tested, and then the user calls front, leading to a second call to front. I don't think there's any way to work around that.

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

filter could cache the result internally, but doing that is actually difficult, and not 100% generic: it'll fail for ranges with const elements or non-assignable elements. That's a restriction that's not acceptable. In contrast, cache is opt-in. Furthermore, cache (as I wrote it) will error out with a very explicit explanation (IMO) of the requirements on the element type (assignability of ElementType to Unqual!ElementType).
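For concreteness, a stripped-down sketch of a filter-like adapter (illustration only, not Phobos's actual implementation) showing where the two reads of .front come from:

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Simplified filter-like adapter; not Phobos code.
struct SimpleFilter(alias pred, R) if (isInputRange!R)
{
    R input;

    this(R input)
    {
        this.input = input;
        skipFailing();
    }

    private void skipFailing()
    {
        // First read of .front: the predicate must inspect it.
        while (!input.empty && !pred(input.front))
            input.popFront();
    }

    @property bool empty() { return input.empty; }

    // Second read: the user asks for the element the predicate just saw.
    @property auto front() { return input.front; }

    void popFront()
    {
        input.popFront();
        skipFailing();
    }
}
```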

@quickfur
Member

ping all reviewers
Are we moving forward with this, or is this a bad idea and we should close it?

@monarchdodra
Collaborator Author

I still think it would be useful to have. It can be a useful complement in otherwise long and costly UFCS chains.

But I'm not passionate about it. Anybody of the opinion this range is actually bad/wrong?

Semi-on-topic: I'm not too sure about providing bidirectional access anymore; 99% of the functionality is in forward iteration anyway. Bidirectional iteration could have a surprising effect, as the "last" element would be evaluated twice.

@DmitryOlshansky
Member

IMO we should go with it. Thoughts about a cached range as a new kind of range need some public discussion.

@mihails-strasuns

I have this bookmarked for some more detailed experiments during the upcoming weekend.

@brad-anderson
Contributor

I'm 100% in favor of this. Right now I have to use array() which eagerly caches everything unnecessarily in situations I'd use cache(). cache() would be a big improvement because it lets you retain some laziness and uses far less memory.

I can't really think of uses for the multi-element cache. Maybe to compartmentalize part of a long chain so there is less pressure on the CPU cache? I welcome the feature in case it becomes handy for something though.

Unless there is some huge defect in the concept that I'm not seeing, it should definitely go forward once everyone who wants to review the implementation has done so.

Contributor

I wonder if @DmitryOlshansky's sliding window range idea could be used here and, more broadly, if there is actually some overlap between cache() and his concept.

Member

There is, which is something I find intriguing after looking at this again. And @andralex wasn't objecting to adding a couple of primitives, which is a good sign.

I'm going to (hopefully soon) do a short writeup on a buffer range design and new i/o.

Member

I like the idea of a sliding window range. It could even be applicable to the prospective std.io.

@quickfur
Member

quickfur commented Aug 5, 2014

Looks like we have merge conflicts. Please rebase.

std/algorithm.d Outdated
Contributor

s/useful a function/a useful function/
s/functons/functions/ (typo)

@ghost

ghost commented Sep 10, 2014

I don't know how the changelog works. Should I do anything else here in regards to that?

No need to worry, we'll handle it.

@DmitryOlshansky
Member

@AndrejMitrovic Cool. Versionized labels feel just right.

@mihails-strasuns

Note that this won't be in 2.067, as the branch for it has already been forked: https://github.com/D-Programming-Language/phobos/tree/2.067

Unless it is going to be restarted, of course.

@ghost

ghost commented Sep 10, 2014

Note that this won't be in 2.067 as branch for it has been already forked

I don't get it, 2.066 just came out. 2.067 can't come out that quickly, can it? And what commits is it going to contain anyway, there haven't been that many?

@yebblies
Contributor

I don't get it, 2.066 just came out. 2.067 can't come out that quickly, can it? And what commits is it going to contain anyway, there haven't been that many?

Everything that went into master while 2.066 was in alpha/beta except for the regression fixes which went into both.

@ghost

ghost commented Sep 10, 2014

Ah. I'm clearly out of my depth now for how things are done. I guess we'll have to label it 2.068 then? But isn't that, like, a year from now (judging by how infrequent our releases are)? :/

@yebblies
Contributor

The idea is that releases should be more frequent. Who knows what will actually happen, though.

@monarchdodra
Collaborator Author

AndrejMitrovic added the changelog_v2.067 label a day ago

Oh, OK. I get it now.

@DmitryOlshansky
Member

It seems very close to the idea of Scala's sliding (they had it way back in 2009):
http://daily-scala.blogspot.ru/2009/11/iteratorsliding.html
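For what it's worth, present-day Phobos has a close analogue in std.range.slide (overlapping fixed-size windows), which maps directly onto Scala's Iterator.sliding:

```d
import std.algorithm.comparison : equal;
import std.range : slide;

void main()
{
    // Windows of size 2, advancing one element at a time,
    // like Scala's sliding(2).
    auto windows = [1, 2, 3, 4].slide(2);
    assert(windows.front.equal([1, 2]));
    windows.popFront();
    assert(windows.front.equal([2, 3]));
}
```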

@monarchdodra
Collaborator Author

Addressed the points raised by @AndrejMitrovic in regards to style. Also improved the unittest to show the effects of using cache over not using cache (in a fashion I find well expressed, mind you).

I also removed "support" for objects with a disabled this(), as it led to using said objects without actually initializing them, so I wasn't actually supporting them.

If all is well and good, I'll give it a squash.

std/algorithm.d Outdated
Member

Nitpick: "instantiate" not "instanciate".

@monarchdodra
Collaborator Author

Anything else?

Contributor

Shouldn't this be moved to the template constraints?

Collaborator Author

It would be hard to express in a short and "grokkable" way for the end user.

Also, I still believe that template constraints are for "logical disambiguation", not "internal validation". This is the overload that works on input ranges. From there, the static assert is only designed as a way to cleanly express a compile error.

We do not want to allow someone else to write a "cache" that also operates on input ranges, with completely different semantics, just because the element type has a particular trait. That would require explicit disambiguation (even if one of them wouldn't compile anyway, it is close enough logically to be ambiguous).

@quickfur
Member

ping
Anything else? This pull has been green for a while now. Are we ready to merge?

@monarchdodra
Collaborator Author

Anything else?

AFAIK, no.

@DmitryOlshansky
Member

Let's do it.

@DmitryOlshansky
Member

Auto-merge toggled on

DmitryOlshansky added a commit that referenced this pull request Sep 25, 2014
@DmitryOlshansky DmitryOlshansky merged commit 83e90a8 into dlang:master Sep 25, 2014
@monarchdodra
Collaborator Author

Merged #1364.

Cool. Thanks.

@quickfur
Member

On Thu, Sep 25, 2014 at 04:20:43PM -0700, Михаил Страшун wrote:

..and yet another epic PR taken care of! :)

Go, team Phobos! :-P

@MartinNowak
Member

Perhaps filter() should make use of cache() internally to avoid the issue entirely.

Filter could just simply cache its front element like any other range, at least if front has a side effect.
Could someone please file a bug report?

@monarchdodra
Collaborator Author

Filter could just simply cache its front element like any other range, at least if front has a side effect.

The point is that filter doesn't have a side effect on its elements. It merely (sometimes) evaluates front more than once. Making filter systematically cache its elements would bloat it, potentially be counterproductive (there is no overhead for accessing array elements), and, most of all, restrict it: no lvalue access, requiring isAssignable!(ElementType!R)...

@CyberShadow
Member

Please add new functions to the cheat sheet as well as the function list.

@CyberShadow
Member

#2578

@CyberShadow
Member

A regression has been filed, caused by this pull request:
https://issues.dlang.org/show_bug.cgi?id=14301

@ntrel
Contributor

ntrel commented Mar 6, 2016

Maybe it's too late now, but IMO cache should really be in std.range; it's not an algorithm.
