Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep a position cache to speed up sibling iteration #287

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

haarg
Copy link
Collaborator

@haarg haarg commented Jan 26, 2023

Iterating over siblings using ->next_sibling or similar methods will take quadratic time if it has to search the parent's child list every time.

Add a cache of the position in the parent's child list. When the position of a child is requested, cache the position of every child of the parent. Verifying that the cache is accurate is nearly instant, so we can use that to invalidate the cache rather than invalidating on modifications to the structure.

Update the __position method to use this cache, and change all of the child iterating functions to use __position.

Iterating over siblings using ->next_sibling or similar methods will
take quadratic time if it has to search the parent's child list every
time.

Add a cache of the position in the parent's child list. When the
position of a child is requested, cache the position of every child of
the parent. Verifying that the cache is accurate is nearly instant, so
we can use that to invalidate the cache rather than invalidating on
modifications to the structure.

Update the __position method to use this cache, and change all of the
child iterating functions to use __position.
@haarg
Copy link
Collaborator Author

haarg commented Jan 26, 2023

This is meant to be an optimization, so I'm not sure of a good way to test it. For a demonstration of the impact it has though:

$ cat iterate-siblings.pl
use strict;
use warnings;

use PPI::Document;

my $code = "{\n" . join('', map "foo$_ => $_,\n", 1 .. 3000) . "};\n";

my $doc = PPI::Document->new(\$code);
my $expr = $doc->find_first('PPI::Statement::Expression');

my $element = $expr->first_element;
while ($element = $element->next_sibling) {
  1;
}

$ time perl iterate-siblings.pl # using latest PPI release

real	0m14.398s
user	0m14.150s
sys	0m0.110s
$ time perl -Ilib iterate-siblings.pl # with position cache

real	0m0.249s
user	0m0.167s
sys	0m0.038s

@haarg
Copy link
Collaborator Author

haarg commented Jan 27, 2023

I should also add that we ran into this problem when using the ValuesAndExpressions::ProhibitDuplicateHashKeys critic policy on files with large hashes.

@wchristian
Copy link
Member

Ahead: This looks excellent and my direct urge would be to throw it in immediately. That said: I had a heck of a week and will check this tomorrow to see if i can spot any issues. Don't think i will, but due diligence and such.

@wchristian
Copy link
Member

Quick question without having looked for it in the code: Does this invalidate the cache if an element's position within its parent is changed by inserting or removing something?

@haarg
Copy link
Collaborator Author

haarg commented Jan 27, 2023

No, it doesn't. The cached position is always checked before being used, so it doesn't need to be invalidated on modifications. Validating the position doesn't slow anything down, so this seemed like a safer invalidation strategy than needing to ensure every write operation cleared the cache.

@haarg
Copy link
Collaborator Author

haarg commented Mar 6, 2023

I guess it's worth pointing out that this could slow down iteration in some specific cases. If you are only iterating through a small number of siblings at the start of a long list, the new code will still end up calculating the positions of every sibling where the old code would not. I don't think this is a large concern though. Iterating through the list of siblings once is not slow, so it seems unlikely it would ever be a meaningful slowdown.

I did think of an alternate strategy that could avoid this. When checking the position of a given element, it could cache (and validate/recache) the positions of the requested element, as well as its previous and next sibling. Compared to what this PR implements, it ends up doing a bit more work for each individual operation, but could be cheaper if not iterating through all children. And iterating through all siblings would still be linear, not quadratic. It also makes the code a bit more complex. I felt that on balance the routine in this PR was a better solution.

@wchristian
Copy link
Member

I'm prepping for merging this, but first need to actually get a solid set of dependent tests, which i'm fiddling with over here: https://github.com/Perl-Critic/PPI/actions/workflows/test-dependents.yml

I saw that it seems you adapted some of the code style conventions you saw, like shift shift shift, and forgoing postfix constructs. If you care and feel like, you could abbreviate some of them to not have the code be unnecessarily verbose. Otherwise it looks fine in execution flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants