reportInfiniteRecursion pass is excessively slow #457

Open · nene opened this issue Jan 2, 2024 · 8 comments

nene (Contributor) commented Jan 2, 2024

I'm working with a pretty large grammar (6000 LOC), and running Peggy on it has become prohibitively slow, with compilation taking more than a minute.

So I looked into what might be causing such a drag and discovered that the main culprit is the reportInfiniteRecursion pass. I added some logging to the Peggy compiler code that loops over all the compiler passes, which produced the following output:

Pass check.reportUndefinedRules took 19ms
Pass check.reportDuplicateRules took 1ms
Pass check.reportDuplicateLabels took 6ms
Pass check.reportInfiniteRecursion took 57791ms  !!!
Pass check.reportInfiniteRepetition took 3ms
Pass check.reportIncorrectPlucking took 1ms
Pass transform.removeProxyRules took 5ms
Pass transform.inferenceMatchResult took 19ms
Pass generate.generateBytecode took 1505ms
Pass generate.generateJS took 189ms
✨  Done in 62.10s.

As I understand it, this pass goes through all the rules and checks that they aren't left-recursive. But to check each rule, it recursively walks through all the rules that rule references, which I suspect makes the code run in O(n^2) time.

I'm sure there is a way to optimize this. I can try to figure it out myself, but perhaps somebody more intimately familiar with Peggy can easily spot a solution. It's a bit of a shame that this non-essential compiler pass takes over 90% of the compilation time.
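
For context, the current check has roughly the following shape - a simplified JavaScript sketch, not the actual Peggy source, with an illustrative AST shape and a made-up canStartWith helper:

// Simplified sketch of the check (illustrative, not the actual Peggy
// source). `rules` is a list of { name, expression } AST nodes.
function reportInfiniteRecursion(rules) {
  const byName = new Map(rules.map(r => [r.name, r]));

  // Can `node` invoke the rule `target` before consuming any input?
  function canStartWith(node, target, visited) {
    switch (node.type) {
      case "rule_ref":
        if (node.name === target) return true;
        if (visited.has(node.name)) return false;
        visited.add(node.name);
        return canStartWith(byName.get(node.name).expression, target, visited);
      case "choice":
        return node.alternatives.some(a => canStartWith(a, target, visited));
      case "sequence":
        // Simplification: only the first element matters for left recursion.
        return canStartWith(node.elements[0], target, visited);
      default:
        return false; // literals, classes, etc. must consume input
    }
  }

  for (const rule of rules) {
    // A fresh visited set per rule is what makes this O(rules × AST size):
    // a big shared rule like expr gets re-walked once per rule that
    // (transitively) references it.
    if (canStartWith(rule.expression, rule.name, new Set())) {
      throw new Error(`Possible infinite loop in rule "${rule.name}"`);
    }
  }
}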

nene (Contributor, Author) commented Jan 2, 2024

Did some more investigating. In my grammar I have rules like these:

select_columns
  = head:column_list_item tail:(__ "," __ column_list_item)* trailing:(__ "," __ empty)? {
      ...
    }

column_list_item
  = expr:star_or_qualified_star kw:(__ EXCEPT __) columns:paren$list$column {
    ...
  }
  / expr:star_or_qualified_star kw:(__ REPLACE __) columns:paren$list$alias$expr {
    ...
  }
  / star_or_qualified_star
  / expr:expr alias:(__ alias)? {
    ...
  }

expr = mysql_assign_expr / or_expr

mysql_assign_expr = ... // ends up referencing same stuff that or_expr references

or_expr = ...

reportInfiniteRecursion takes:

  • 4.1 seconds to check the select_columns rule,
  • 3.9 seconds to check the column_list_item rule,
  • 3.7 seconds to check the expr rule.

Essentially it comes down to this main expr rule being really big and complex and taking a whole lot of time to fully traverse.

Mingun (Member) commented Jan 2, 2024

Yes, it definitely can be optimized. I think the problem is that it checks each rule independently and as a result can traverse the same nodes many times.

The best strategy would be to introduce a new pass that calculates some useful properties for each AST node, including:

  • lookahead: how many characters can this node look ahead? For example, a literal always looks ahead by literal.length characters, and a character class by 1 character
  • minMatched: the minimum number of characters that would be consumed if the node matched. For example, for . (dot) it is 1 character and for .? it is 0 characters
  • maxMatched: the maximum number of characters that would be consumed if the node matched. For example, for . (dot) and .? it is 1 character and for .+ it is infinity
  • haveUserCode: does this node have user actions or predicates (which could have side effects)?

Then, to find left recursion, we need to find the rules where minMatched is 0.
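
A minimal sketch of how such a pass could compute minMatched, assuming an illustrative AST shape (the node type names here are not necessarily Peggy's real ones). Every rule starts at Infinity ("can never match") and the values are iterated down to a fixed point, so nodes are only revisited while some rule's value can still shrink:

function computeMinMatched(rules) {
  const minByRule = new Map(rules.map(r => [r.name, Infinity]));

  function minMatched(node) {
    switch (node.type) {
      case "literal":
        return node.value.length;
      case "class":
      case "any":
        return 1;
      case "optional":
      case "zero_or_more":
      case "simple_and":   // predicates consume nothing
      case "simple_not":
        return 0;
      case "one_or_more":
        return minMatched(node.expression);
      case "sequence":
        return node.elements.reduce((sum, el) => sum + minMatched(el), 0);
      case "choice":
        return Math.min(...node.alternatives.map(a => minMatched(a)));
      case "rule_ref":
        return minByRule.get(node.name); // current estimate for that rule
      default:
        return minMatched(node.expression); // labeled, group, action, ...
    }
  }

  for (let changed = true; changed; ) {
    changed = false;
    for (const rule of rules) {
      const m = minMatched(rule.expression);
      if (m < minByRule.get(rule.name)) {
        minByRule.set(rule.name, m);
        changed = true;
      }
    }
  }

  // 0 means the rule can succeed on empty input; a rule stuck at
  // Infinity can never successfully match at all.
  return minByRule;
}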

hildjj (Contributor) commented Jan 2, 2024

What we're trying to do is prove that there is NOT a loop, which means we only need to find the first loop of any size. In this case, speed matters more than completeness.

I bet if we took one O(n) pass over the rules to generate an adjacency matrix indexed by rule number, then did a depth-first search from each not-yet-seen rule, we'd be able to prove there were no loops more quickly.
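
For the no-loop proof, a standard three-color depth-first search over that adjacency data runs in O(rules + references). A sketch, assuming edges[i] already lists the rules that rule i can call before consuming any input (building edges would be the single O(n) pass over the AST):

function hasLeftRecursion(edges) {
  const WHITE = 0, GRAY = 1, BLACK = 2; // unseen / on current path / done
  const color = new Array(edges.length).fill(WHITE);

  function dfs(i) {
    color[i] = GRAY;
    for (const j of edges[i]) {
      if (color[j] === GRAY) return true; // back edge: a loop exists
      if (color[j] === WHITE && dfs(j)) return true;
    }
    color[i] = BLACK; // everything below rule i is provably loop-free
    return false;
  }

  return edges.some((_, i) => color[i] === WHITE && dfs(i));
}

// hasLeftRecursion([[1], [2], [0]]) === true   (0 → 1 → 2 → 0)
// hasLeftRecursion([[1], [2], []])  === false  (no cycle)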

hildjj (Contributor) commented Jan 2, 2024

In the same pass, we could prove that certain rules are not reachable from any of the allowedStartRules, marking them for removal.
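
A sketch of that reachability check, reusing the same kind of adjacency data (here edges would be built from all rule references, not just the ones reachable before any input is consumed):

function reachableRules(edges, startIndices) {
  const seen = new Set(startIndices);
  const stack = [...startIndices];
  while (stack.length > 0) {
    const i = stack.pop();
    for (const j of edges[i]) {
      if (!seen.has(j)) {
        seen.add(j);
        stack.push(j);
      }
    }
  }
  return seen; // any rule index not in this set is dead and can be dropped
}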

hildjj (Contributor) commented Feb 8, 2024

I wonder if #467 will help with this?

nene (Contributor, Author) commented Feb 8, 2024

I don't think this will help at all. Though I might misunderstand - I didn't test.

The main problem concerns the case where no errors are found, which in my case is like 99.9999% of the time. Frankly, I wouldn't actually mind if it were slow when errors are present.

I assume that moving this to the prepare pass will also not affect the performance in any way.

hildjj (Contributor) commented Feb 8, 2024

Nod. There are a couple of speedups for big grammars that I can add after 4.0, but I expect most of those to have minimal impact on this. Someone will have to help with the approach Mingun suggested above; I'm unlikely to get to that myself.

nene (Contributor, Author) commented Feb 8, 2024

I've been thinking of tackling this, but each time I look at it, the problem sort of overwhelms me, as I can't seem to fully wrap my head around the current implementation. Mainly it's me not being familiar with the code base :(
