New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can exceed regex complex recursion limit #8369
Comments
From @shlomifCreated by @shlomifThe following program segfaults after printing "A" and before printing "B": <<<<<<<< use strict; This code was adapted from a more complex example that was discovered by http://perl.org.il/pipermail/perl/2006-March/007695.html So it will be a good idea to see that the segfault there does not happen Perl Info
|
From @demerphqOn 3/12/06, via RT Shlomi Fish <perlbug-followup@perl.org> wrote:
This is a well known issue with the regex engine. If you rewrite the $string =~ s{a|'(?:[^'\\]+|\\.)*'}{}; then odds are that you wont see the segfault occur. When alternating $string =~ s{a|'(?:[^'\\]+|(?:\\.)+)*'}{}; continues on in this theme, but given the general rarity of escaped In theory the above patterns are identical (they should produce the Maybe an optimisation hack would be if an alternation can't be trie'd yves cheers, -- |
The RT System itself - Status changed from 'new' to 'open' |
From @demerphqOn 3/12/06, demerphq <demerphq@gmail.com> wrote:
Although on second thought maybe the behaviour of later perls on your Complex regular subexpression recursion limit (32766) exceeded at Wheras with the "optimisation" I mentioned above the re takes a Also changing the pattern to two makes it finish instantly: s{'(?:[^'\\]+|\\.)*'}{} || s{a}{} As does using a zero width assertion: $string =~ s{(?=[a''])(?:a|'(?:[^''\\]+|(?:\\.)+)*')}{}; seems to work nicely. :-) Yves -- |
From @smpeters
With change #27598, the core dump has gone away, but a panic is still steve@kirk:~/smoke/perl-current$ ./perl -Ilib rt_38717.pl use strict; |
From roland@roland-illig.deCreated by roland@baccf5ee.roland-illig.deThe function S_regmatch calls itself recursively, resulting in a stack For example: Note: The regular expression splits a line from a Makefile into two Perl Info
|
From @smpetersOn Sun, Jul 09, 2006 at 01:17:30PM -0700, roland@roland-illig.de wrote:
In a recent development version of Perl, the core dump goes away, but it steve@kirk:~/perl-current$ ./perl -wle'("aaaaaaaaaa " x 8000) =~ Steve Peters |
The RT System itself - Status changed from 'new' to 'open' |
From @demerphqOn 7/9/06, via RT roland @ roland-illig. de <perlbug-followup@perl.org> wrote:
Expressions of this form: (?:\\#|[^#])* Are always going to cause perls regex engine (indeed any NFA regex (?:[^#\\]+|(?:\\#?)+)* This is because A) the most likely to match possibility is first in It would be nice if perl knew this and did this optimisation IMO we should probably add something to the docs about this type of Yves -- |
From @jimcdemerphq wrote:
<mulling out loud> When I read the Owl, I found myself wondering if a *non-transparent* given: sub friedl_unroll ($) ... - simplify the regex - make readable
-jimc |
p5p@spam.wizbit.be - Status changed from 'open' to 'stalled' |
The limit as been doubled to 65K |
This ticket now comes down to that we do have recursion limits. I think that's a given in Perl, and we should just close this ticket. |
I do find it sad that we have a built-in hard limit: with perl-level recursion, IIRC we take care to store the bookkeeping on the heap rather than the C stack, and we limit it to a warning so that if you have the memory you can recurse more deeply. 2^16 as a hard limit regardless of your available memory seems rather restrictive for modern hardware. That said, Perl no longer seems to be the language of choice for bioinformatics, and I don't know who else has a real-world need for this - I don't remember it ever being an issue for my mathematical work. |
On Thu, Apr 14, 2022 at 06:07:15AM -0700, Hugo van der Sanden wrote:
I do find it sad that we have a built-in _hard_ limit: with perl-level
recursion, IIRC we take care to store the bookkeeping on the heap rather
than the C stack, and we limit it to a warning so that if you have the
memory you can recurse more deeply. 2^16 as a hard limit regardless of
your available memory seems rather restrictive for modern hardware.
regmatch() used to be recursive and so a small hard limit made sense. But
for the last 10-15 years we now store the recursing state in malloced
memory, so I no longer see any need for a hard limit.
…--
You never really learn to swear until you learn to drive.
|
perldiag says this
So is it no longer true that recursion is used sometimes? |
On Sun, 17 Apr 2022, 02:03 Karl Williamson, ***@***.***> wrote:
perldiag says this
=item Complex regular subexpression recursion limit (%d) exceeded
(W regexp) The regular expression engine uses recursion in complex
situations where back-tracking is required. Recursion depth is limited
to 32766, or perhaps less in architectures where the stack cannot grow
arbitrarily. ("Simple" and "medium" situations are handled without
recursion and are not subject to a limit.) Try shortening the string
under examination; looping in Perl code (e.g. with C) rather than
in the regular expression engine; or rewriting the regular expression so
that it is simpler or backtracks less. (See L for information
on I.)
So is it no longer true that recursion is used sometimes?
Regexec does not recurse anymore. It is a push down automata, but it's
"recursion" uses heap allocated storage, it does not use the C stack nor C
level recursion. The compiler and optimizer do use C level recursion.
Yves
… |
So is this message obsolete? I commented out the two places in regexec.c that stop looking and lead to this message, and ran the reduction program in this ticket. Everything worked. I propose removing those in early 5.37 and see what happens @shlomif, I can't find the original program you created the reduction from. |
Can you show me a patch of what you tried @khwilliamson please? |
I see. I personally would be fine with removing those blocks right away:
shouldn't break anything, the execution engine manages its own stack using heap memory.
Note that |
There is no recursion to exceed limits of.
There is no recursion to exceed limits of. This fixes Perl#8369
I have created a proper PR |
There is no recursion to exceed limits of. This fixes #8369
There is no recursion to exceed limits of. This fixes Perl#8369
This fixes GH Perl#19781
Migrated from rt.perl.org#38717 (status was 'stalled')
Searchable as RT38717$
The text was updated successfully, but these errors were encountered: