Initialize `std.algorithm.filter` lazily #1987

schuetzm · 2014-03-09T15:13:44Z

Timon Gehr recently complained [1] about the fact that std.algorithm.filter initializes itself in its constructor, i.e. potentially performs a lot of computation by calling popFront() repeatedly.

With this PR, I reworked filter and filterBidirectional to initialize themselves only when they are accessed the first time.

I don't know however whether it is actually worth it. Usually, a constructed filter range will be used at least once. This implementation needs to carry around an additional flag, and needs to check it before each operation, which will probably degrade performance slightly. On the other hand, there might be corner cases where initializing upon construction is too expensive, like when you construct hundreds of filters, but use only a handful of them and can't tell which one until you actually need them.

Comments?

[1] http://forum.dlang.org/thread/lfap6l$2eb1$1@digitalmars.com?page=3#post-lfenmi:241lve:241:40digitalmars.com

andralex · 2014-03-09T18:54:45Z

std/algorithm.d

-    this(R r)
-    {
-        _input = r;
+    void popFrontUntilNextMatch() {


no egyptians please

andralex · 2014-03-09T19:06:04Z

I'm torn about this. On one hand it does make life easier for people who create a filter but never get to use it, which is in keep with the spirit of laziness. On the other hand it penalizes all other uses by inserting one more test on each primitive. I think that's too much penalty.

monarchdodra · 2014-03-09T21:24:17Z

This kind of change would make more sense to me if we made each iteration lazy, and not just the first. With this new change, I'd have expected popFront to become something like:

void popFront()
{
    initialize();
    _input.popFront();
    _initialized = false;
}

With this, you'd make each call to front lazy.

I don't see why the first front would get special treatment: It's not actually special. And it's definitely not a change I agree with. I agree with @andralex in regards to On the other hand it penalizes all other uses by inserting one more test on each primitive.: I say either leave out the test, or let every primitive enjoy it.

Personally, I'm not a huge fan of such "micro-lazyness", but others seem to enjoy it, so why not.

schuetzm · 2014-03-10T10:10:13Z

I changed the egyptian braces and removed the unnecessary return this, which was a left-over from an experiment.

schuetzm · 2014-03-10T10:17:02Z

@monarchdodra: I see... I can implement that, if desired. But I'll prefer to wait until it's decided whether laziness is wanted at all. I also tend to prefer the eager version. A possible compromise would be to use a template flag to distinguish between lazy and eager, or have an eagerFilter() and a lazyFilter().

schveiguy · 2014-03-10T13:04:11Z

@andralex consider the following discussion, which may provide a generic solution for "initialization heavy" ranges:

http://forum.dlang.org/post/op.xbyehzvzeav7ka@stevens-macbook-pro.local

Basically, the idea is to have a range that wraps and lazily constructs the full range on the first call to empty. Argument of only doing empty is, popFront and front are invalid if empty is true, so one must always call empty if you don't know the state of it.

monarchdodra · 2014-03-10T14:10:14Z

one must always call empty if you don't know the state of it.

One should call empty, if the range may be empty. There are plenty of scenarios where you may know a range is not empty, even without calling empty. Any range-adaptor that doesn't change the underlying range's length comes to mind (such as map, or the potential cache).

It's less obvious with filter, but I could definitely see a case where one would want to filter out punctuation out of a sentence, where we already know the sentence is punctuation-terminated. In such a case, the user will call front without checking empty.

Besides, once you insert a check in any of the empty/front/popFront trio, I'm not sure there is much performance penalty in adding the same check to the others. Especially if it means the difference between being correct or incorrect.

Still, in the context if this specific pull, I'm not convinced there is a "construction cost" issue. As mentioned, there may be a "lazyness" issue, which can be fixed for the same (AFAIK) cost in both runtime and object size. So I think we should just stick to the safe, simple and straightforward solution.

IMO.

schveiguy · 2014-03-10T14:43:31Z

One can leave an assert(initialized) in the front/popFront to ensure that at least in debug mode, the code fails when you use it improperly. I would be suspect of any code that accepts a range but fails to call empty on it first. If you are writing code with specific requirements or knowledge of the ranges involved, there are better ways to lazily initialize.

In any case, I think a general solution which provides an option when needed, would be advantageous to making everyone pay for the lazy initialization penalty, even if it lazily initializes on any of the 3 mechanisms.

Besides, once you insert a check in any of the empty/front/popFront trio, I'm not sure there is much performance penalty in adding the same check to the others.

One may call front repeatedly during the processing of a specific element. Also, typically, one calls popFront, empty, front all at once during iteration, it makes no sense to penalize all 3.

monarchdodra · 2014-03-10T15:30:35Z

One can leave an assert(initialized) in the front/popFront to ensure that at least in debug mode, the code fails when you use it improperly. I would be suspect of any code that accepts a range but fails to call empty on it first.

All the more reason to not have to rely on empty having a side effect: If empty is called in an assert, then code that worked in debug would start mysteriously failing in release (!).

If you are writing code with specific requirements or knowledge of the ranges involved, there are better ways to lazily initialize.

Well, the issue is that the implementer of the range has the responsibility of initialization. The user should not have to care how its done, only that the range works.

it makes no sense to penalize all 3.

The easiest way to avoid that is with a little bit of eager initialization :)

I agree though, but I don't see how it could be done, in a 100% safe way.

schuetzm · 2014-03-10T16:22:07Z

@schveiguy, @monarchdodra: I first implemented it in a way that assumed that empty has to be called. But it turned out that lots of places in Phobos don't do it that way; in particular, there are takeExactly and std.bitmanip.peek(), std.bitmanip.read(), and probably others.

I tried to insert the empty calls where needed to make the unittests work again, but there were many of them, and sometimes filter was directly called on array literals, so I decided it wasn't worth the trouble, and would probably break too much code.

And indeed, the documentation of std.range.isInputRange() states:

Calling r.front is allowed only if calling r.empty has, or would have, returned false.

schveiguy · 2014-03-10T16:25:11Z

That's good evidence that empty isn't enough. Thanks.

andralex · 2014-03-16T00:48:16Z

Yah, we can't assume people will call empty before front.

I think we need to drop this. Basically filter should be as good as a simple loop, and this doubles the amount of testing performed. If enough evidence comes about, it would be nice to add lazyFilter or filterLazily (or even a more general approach) to do what this diff does.

andralex · 2014-03-17T00:38:54Z

Upon more thinking - if filter is more than a loop with a conditional it failed. I'll close this at least until new ideas come about.

andralex reviewed Mar 9, 2014
View reviewed changes

std/algorithm.d

this(R r)

{

_input = r;

void popFrontUntilNextMatch() {

Copy link

Member

andralex Mar 9, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no egyptians please

Initialize std.algorithm.filter lazily

d731f44

andralex closed this Mar 17, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Initialize `std.algorithm.filter` lazily #1987

Initialize `std.algorithm.filter` lazily #1987

Uh oh!

schuetzm commented Mar 9, 2014

Uh oh!

andralex Mar 9, 2014

Uh oh!

andralex commented Mar 9, 2014

Uh oh!

monarchdodra commented Mar 9, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

monarchdodra commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

monarchdodra commented Mar 10, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

andralex commented Mar 16, 2014

Uh oh!

andralex commented Mar 17, 2014

Uh oh!

Uh oh!

Uh oh!

Initialize std.algorithm.filter lazily #1987

Initialize std.algorithm.filter lazily #1987

Uh oh!

Conversation

schuetzm commented Mar 9, 2014

Uh oh!

andralex Mar 9, 2014

Choose a reason for hiding this comment

Uh oh!

andralex commented Mar 9, 2014

Uh oh!

monarchdodra commented Mar 9, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

monarchdodra commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

monarchdodra commented Mar 10, 2014

Uh oh!

schuetzm commented Mar 10, 2014

Uh oh!

schveiguy commented Mar 10, 2014

Uh oh!

andralex commented Mar 16, 2014

Uh oh!

andralex commented Mar 17, 2014

Uh oh!

Uh oh!

Initialize `std.algorithm.filter` lazily #1987

Initialize `std.algorithm.filter` lazily #1987