Skip to content

Initialize std.algorithm.filter lazily #1987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

schuetzm
Copy link
Contributor

@schuetzm schuetzm commented Mar 9, 2014

Timon Gehr recently complained [1] about the fact that std.algorithm.filter initializes itself in its constructor, i.e. potentially performs a lot of computation by calling popFront() repeatedly.

With this PR, I reworked filter and filterBidirectional to initialize themselves only when they are accessed the first time.

I don't know however whether it is actually worth it. Usually, a constructed filter range will be used at least once. This implementation needs to carry around an additional flag, and needs to check it before each operation, which will probably degrade performance slightly. On the other hand, there might be corner cases where initializing upon construction is too expensive, like when you construct hundreds of filters, but use only a handful of them and can't tell which one until you actually need them.

Comments?

[1] http://forum.dlang.org/thread/lfap6l$2eb1$1@digitalmars.com?page=3#post-lfenmi:241lve:241:40digitalmars.com

this(R r)
{
_input = r;
void popFrontUntilNextMatch() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no egyptians please

@andralex
Copy link
Member

andralex commented Mar 9, 2014

I'm torn about this. On one hand it does make life easier for people who create a filter but never get to use it, which is in keep with the spirit of laziness. On the other hand it penalizes all other uses by inserting one more test on each primitive. I think that's too much penalty.

@monarchdodra
Copy link
Collaborator

This kind of change would make more sense to me if we made each iteration lazy, and not just the first. With this new change, I'd have expected popFront to become something like:

void popFront()
{
    initialize();
    _input.popFront();
    _initialized = false;
}

With this, you'd make each call to front lazy.

I don't see why the first front would get special treatment: It's not actually special. And it's definitely not a change I agree with. I agree with @andralex in regards to On the other hand it penalizes all other uses by inserting one more test on each primitive.: I say either leave out the test, or let every primitive enjoy it.

Personally, I'm not a huge fan of such "micro-lazyness", but others seem to enjoy it, so why not.

@schuetzm
Copy link
Contributor Author

I changed the egyptian braces and removed the unnecessary return this, which was a left-over from an experiment.

@schuetzm
Copy link
Contributor Author

@monarchdodra: I see... I can implement that, if desired. But I'll prefer to wait until it's decided whether laziness is wanted at all. I also tend to prefer the eager version. A possible compromise would be to use a template flag to distinguish between lazy and eager, or have an eagerFilter() and a lazyFilter().

@schveiguy
Copy link
Member

@andralex consider the following discussion, which may provide a generic solution for "initialization heavy" ranges:

http://forum.dlang.org/post/op.xbyehzvzeav7ka@stevens-macbook-pro.local

Basically, the idea is to have a range that wraps and lazily constructs the full range on the first call to empty. Argument of only doing empty is, popFront and front are invalid if empty is true, so one must always call empty if you don't know the state of it.

@monarchdodra
Copy link
Collaborator

one must always call empty if you don't know the state of it.

One should call empty, if the range may be empty. There are plenty of scenarios where you may know a range is not empty, even without calling empty. Any range-adaptor that doesn't change the underlying range's length comes to mind (such as map, or the potential cache).

It's less obvious with filter, but I could definitely see a case where one would want to filter out punctuation out of a sentence, where we already know the sentence is punctuation-terminated. In such a case, the user will call front without checking empty.

Besides, once you insert a check in any of the empty/front/popFront trio, I'm not sure there is much performance penalty in adding the same check to the others. Especially if it means the difference between being correct or incorrect.

Still, in the context if this specific pull, I'm not convinced there is a "construction cost" issue. As mentioned, there may be a "lazyness" issue, which can be fixed for the same (AFAIK) cost in both runtime and object size. So I think we should just stick to the safe, simple and straightforward solution.

IMO.

@schveiguy
Copy link
Member

One can leave an assert(initialized) in the front/popFront to ensure that at least in debug mode, the code fails when you use it improperly. I would be suspect of any code that accepts a range but fails to call empty on it first. If you are writing code with specific requirements or knowledge of the ranges involved, there are better ways to lazily initialize.

In any case, I think a general solution which provides an option when needed, would be advantageous to making everyone pay for the lazy initialization penalty, even if it lazily initializes on any of the 3 mechanisms.

Besides, once you insert a check in any of the empty/front/popFront trio, I'm not sure there is much performance penalty in adding the same check to the others.

One may call front repeatedly during the processing of a specific element. Also, typically, one calls popFront, empty, front all at once during iteration, it makes no sense to penalize all 3.

@monarchdodra
Copy link
Collaborator

One can leave an assert(initialized) in the front/popFront to ensure that at least in debug mode, the code fails when you use it improperly. I would be suspect of any code that accepts a range but fails to call empty on it first.

All the more reason to not have to rely on empty having a side effect: If empty is called in an assert, then code that worked in debug would start mysteriously failing in release (!).

If you are writing code with specific requirements or knowledge of the ranges involved, there are better ways to lazily initialize.

Well, the issue is that the implementer of the range has the responsibility of initialization. The user should not have to care how its done, only that the range works.

it makes no sense to penalize all 3.

The easiest way to avoid that is with a little bit of eager initialization :)

I agree though, but I don't see how it could be done, in a 100% safe way.

@schuetzm
Copy link
Contributor Author

@schveiguy, @monarchdodra: I first implemented it in a way that assumed that empty has to be called. But it turned out that lots of places in Phobos don't do it that way; in particular, there are takeExactly and std.bitmanip.peek(), std.bitmanip.read(), and probably others.

I tried to insert the empty calls where needed to make the unittests work again, but there were many of them, and sometimes filter was directly called on array literals, so I decided it wasn't worth the trouble, and would probably break too much code.

And indeed, the documentation of std.range.isInputRange() states:

Calling r.front is allowed only if calling r.empty has, or would have, returned false.

@schveiguy
Copy link
Member

That's good evidence that empty isn't enough. Thanks.

@andralex
Copy link
Member

Yah, we can't assume people will call empty before front.

I think we need to drop this. Basically filter should be as good as a simple loop, and this doubles the amount of testing performed. If enough evidence comes about, it would be nice to add lazyFilter or filterLazily (or even a more general approach) to do what this diff does.

@andralex
Copy link
Member

Upon more thinking - if filter is more than a loop with a conditional it failed. I'll close this at least until new ideas come about.

@andralex andralex closed this Mar 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants