keep a position cache to speed up sibling iteration #287

haarg · 2023-01-26T14:54:22Z

Iterating over siblings using ->next_sibling or similar methods will take quadratic time if it has to search the parent's child list every time.

Add a cache of the position in the parent's child list. When the position of a child is requested, cache the position of every child of the parent. Verifying that the cache is accurate is nearly instant, so we can use that to invalidate the cache rather than invalidating on modifications to the structure.

Update the __position method to use this cache, and change all of the child iterating functions to use __position.

Iterating over siblings using ->next_sibling or similar methods will take quadratic time if it has to search the parent's child list every time. Add a cache of the position in the parent's child list. When the position of a child is requested, cache the position of every child of the parent. Verifying that the cache is accurate is nearly instant, so we can use that to invalidate the cache rather than invalidating on modifications to the structure. Update the __position method to use this cache, and change all of the child iterating functions to use __position.

haarg · 2023-01-26T14:57:15Z

This is meant to be an optimization, so I'm not sure of a good way to test it. For a demonstration of the impact it has though:

$ cat iterate-siblings.pl
use strict;
use warnings;

use PPI::Document;

my $code = "{\n" . join('', map "foo$_ => $_,\n", 1 .. 3000) . "};\n";

my $doc = PPI::Document->new(\$code);
my $expr = $doc->find_first('PPI::Statement::Expression');

my $element = $expr->first_element;
while ($element = $element->next_sibling) {
  1;
}

$ time perl iterate-siblings.pl # using latest PPI release

real	0m14.398s
user	0m14.150s
sys	0m0.110s
$ time perl -Ilib iterate-siblings.pl # with position cache

real	0m0.249s
user	0m0.167s
sys	0m0.038s

haarg · 2023-01-27T18:04:41Z

I should also add that we ran into this problem when using the ValuesAndExpressions::ProhibitDuplicateHashKeys critic policy on files with large hashes.

wchristian · 2023-01-27T18:48:10Z

Ahead: This looks excellent and my direct urge would be to throw it in immediately. That said: I had a heck of a week and will check this tomorrow to see if i can spot any issues. Don't think i will, but due diligence and such.

wchristian · 2023-01-27T18:51:18Z

Quick question without having looked for it in the code: Does this invalidate the cache if an element's position within its parent is changed by inserting or removing something?

haarg · 2023-01-27T20:16:45Z

No, it doesn't. The cached position is always checked before being used, so it doesn't need to be invalidated on modifications. Validating the position doesn't slow anything down, so this seemed like a safer invalidation strategy than needing to ensure every write operation cleared the cache.

haarg · 2023-03-06T15:13:06Z

I guess it's worth pointing out that this could slow down iteration in some specific cases. If you are only iterating through a small number of siblings at the start of a long list, the new code will still end up calculating the positions of every sibling where the old code would not. I don't think this is a large concern though. Iterating through the list of siblings once is not slow, so it seems unlikely it would ever be a meaningful slowdown.

I did think of an alternate strategy that could avoid this. When checking the position of a given element, it could cache (and validate/recache) the positions of the requested element, as well as its previous and next sibling. Compared to what this PR implements, it ends up doing a bit more work for each individual operation, but could be cheaper if not iterating through all children. And iterating through all siblings would still be linear, not quadratic. It also makes the code a bit more complex. I felt that on balance the routine in this PR was a better solution.

wchristian · 2023-09-23T14:30:58Z

I'm prepping for merging this, but first need to actually get a solid set of dependent tests, which i'm fiddling with over here: https://github.com/Perl-Critic/PPI/actions/workflows/test-dependents.yml

I saw that it seems you adapted some of the code style conventions you saw, like shift shift shift, and forgoing postfix constructs. If you care and feel like, you could abbreviate some of them to not have the code be unnecessarily verbose. Otherwise it looks fine in execution flow.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

keep a position cache to speed up sibling iteration #287

keep a position cache to speed up sibling iteration #287

haarg commented Jan 26, 2023

haarg commented Jan 26, 2023

haarg commented Jan 27, 2023

wchristian commented Jan 27, 2023

wchristian commented Jan 27, 2023

haarg commented Jan 27, 2023

haarg commented Mar 6, 2023

wchristian commented Sep 23, 2023

keep a position cache to speed up sibling iteration #287

Are you sure you want to change the base?

keep a position cache to speed up sibling iteration #287

Conversation

haarg commented Jan 26, 2023

haarg commented Jan 26, 2023

haarg commented Jan 27, 2023

wchristian commented Jan 27, 2023

wchristian commented Jan 27, 2023

haarg commented Jan 27, 2023

haarg commented Mar 6, 2023

wchristian commented Sep 23, 2023