Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re eval stack unwinding #16197

Open
p5pRT opened this issue Oct 12, 2017 · 9 comments
Open

re eval stack unwinding #16197

p5pRT opened this issue Oct 12, 2017 · 9 comments

Comments

@p5pRT
Copy link

p5pRT commented Oct 12, 2017

Migrated from rt.perl.org#132277 (status was 'new')

Searchable as RT132277$

@p5pRT
Copy link
Author

p5pRT commented Oct 12, 2017

From @hvds

Created by @hvds

[Re-edited after github import]

The code below shows an unexpected difference in stack unwinding the second time we enter a GOSUB inside a SUSPEND, that does not happen if only one or neither of those is present. This causes problems, for example, if you attempt to use SUSPEND inside rules in a Regexp::Grammars grammar.

Bisect suggests this has always been there, so I'm guessing this is original behaviour preserved through DaveM's rework of the handling of regexp code blocks.

I haven't yet attempted to follow this into the code, that's next. (But I hope Dave will take a look too.)

The use re qw{ debug } and show_stack() are optional, the intent is to show via the Frame->new and Frame->DESTROY diagnostics that when running the program with argument 0, 1, 2 or 3, the behaviour is identical for 0, 1 or 2, but differs in stack unwinding for 3. This is the full match for that case:

'a9b8' =~ m{
    ^ ( [a-z]
        (?>                  (?# 1: cut or uncut )
            (?{ local @stack = push_frame('pre-D') })
            (?&D)            (?# 2: gosub or paren )
            (?{ local @stack = push_frame('post-D') })
        )    
    )* $
    (?(DEFINE) (?<D> [0-9] ) )
}x

I assume this is a bug primarily because I can see no reason for case 3 to differ from case 2.

Hugo

#!perl -w
use strict;
use re qw{ eval };
use re qw{ debug };

package Frame {
    my $next_frame;
    sub new {
        my($class, $tag) = @_;
        $tag .= ' ' . ++$next_frame;
        print "add frame $tag\n";
        return bless \$tag, $class;
    }
    sub DESTROY { print "remove frame ${ $_[0] }\n"; }
};

our @stack = Frame->new('start');
sub show_stack {
    print @_, sprintf "[%s]\n",
    join ', ', map $$_, @stack;
}

sub push_frame {
    my($tag) = @_;
    my $new = Frame->new($tag);
    show_stack("before push $tag: ");
    return (@stack, $new);
}

my @re = map {
    # pattern to match a string of (letter-digit) pairs
    my $res = sprintf q{
        ^ ( [a-z]
            %s                (?# 1: cut or uncut )
            (?{ local @stack = push_frame('pre-D') })
            %s                (?# 2: gosub or paren )
            (?{ local @stack = push_frame('post-D') })
            )    
        )* $
        (?(DEFINE) (?<D> [0-9] ) )
    }, ($_ & 1) ? '(?>'   : '(?:',         # 1: cut or uncut 
       ($_ & 2) ? '(?&D)' : '( [0-9] )';   # 2: gosub or paren
    qr{$res}x;
} 0 .. 3;

my $string = 'a9b8';
my $arg = shift(@ARGV) || 0;
my $legend = join ' ',
    ($arg & 1) ? 'cut' : 'uncut',
    ($arg & 2) ? 'gosub' : 'paren';
if ($string =~ $re[$arg]) {
    print "ok $string ($legend)\n";
    show_stack('final');
} else {
    print "not ok $string ($legend)\n";
}   
__END__
Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.27.5:

Configured by hv at Wed Oct 11 09:20:09 BST 2017.

Summary of my perl5 (revision 5 version 27 subversion 5) configuration:
  Commit id: 0c541dc5633a341cf44b818014b58e7f8be532e9
  Platform:
    osname=linux
    osvers=3.13.0-101-generic
    archname=x86_64-linux
    uname='linux shad2 3.13.0-101-generic #148-ubuntu smp thu oct 20 22:08:32 utc 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dcc=gcc -Dprefix=/opt/blead -Doptimize=-g -O6 -Dusedevel -Uversiononly'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-g -O6'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion=''
    gccversion='4.8.4'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.19.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -g -O6 -L/usr/local/lib -fstack-protector'



@INC for perl 5.27.5:
    /opt/blead/lib/perl5/site_perl/5.27.5/x86_64-linux
    /opt/blead/lib/perl5/site_perl/5.27.5
    /opt/blead/lib/perl5/5.27.5/x86_64-linux
    /opt/blead/lib/perl5/5.27.5
    /opt/blead/lib/perl5/site_perl/5.21.9
    /opt/blead/lib/perl5/site_perl


Environment for perl 5.27.5:
    HOME=/home/hv
    LANG=C
    LANGUAGE=en_GB:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/hv/bin:/and/more
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@hvds
Copy link
Contributor

hvds commented Jan 23, 2020

Confirmed this problem persists in blead@6568ef821.

@hvds
Copy link
Contributor

hvds commented Jan 24, 2020

I've spent some time trying to trace through this. The unexpected stack unwinding (which results in the diagnostic "remove frame post-D 3" appearing prematurely) is triggered in the FREETMPS of the PP(nextstate) that is the first op of the CALLRUNOPS invoked by the third EVAL. At this point in the above program, running with argument 1 vs argument 3, in both cases we're trying to go from PL_tmps_ix == 16 down to PL_tmps_floor == 6. The last 4 of those (PL_tmps_stack[6..9]) look like this with argument 1:

{   
    sv_any = 0x555555caa3d0,
    sv_refcnt = 1,
    sv_flags = 541699,  # SVt_PV, SVf_POK, SVp_POK, SVs_TEMP
    sv_u.svu_pv = 0x555555c92ee0 "EVAL"
}, {
    sv_any = 0x555555cf3118,
    sv_refcnt = 2,
    sv_flags = 2049,    # SVt_IV, SVf_ROK
    sv_u.svu_pv = 0x555555c87028 ""
}, {
    sv_any = 0x555555d00f98,
    sv_refcnt = 2,
    sv_flags = 2049,    # SVt_IV, SVf_ROK
    sv_u.svu_pv = 0x555555c62f58 "\340\324\305UUU"
}, {
    sv_any = 0x555555d00fe0,
    sv_refcnt = 2,
    sv_flags = 2049,    # SVt_IV, SVf_ROK
    sv_u.svu_pv = 0x555555d010e0 "\020\325\305UUU"
}

.. and with argument 3 it is identical, except the last 3 of those have a refcnt of 1. The refcnt_dec of the last of them (PL_tmps_stack[9]) triggers the releasing of the frame object.

Tracking where those refcounts diverge between the two cases with a hardware watchpoint on PL_tmps_stack[9]->sv_refcnt points me to the regcpblow(ST.cp); at regexec.c:7777, part of the case EVAL_postponed_AB logic, which is described in commit 4ee1652 with:

    The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB
    to make it clear it's only used on postponed /(??{A})B/ regexes, [...]

.. but in this case I assume the EVAL_postponed_AB state has been pushed via the GOSUB case which finishes with:

            /* and then jump to the code we share with EVAL */
            goto eval_recurse_doit;

@iabyn can you maybe explain a bit more about the difference between EVAL_postponed_AB and EVAL_B, and whether it is correct that GOSUB is using the former? I've been focused enough on regcomp that it's quite a while since I delved into regexec.

In case it helps here are the one-linish versions of the two cases:

# arg=1
./perl -e 'package Frame { my $next_frame; sub new { my($class, $tag) = @_; $tag .= " " . ++$next_frame; return bless \$tag, $class } sub DESTROY { print "remove frame ${ $_[0] }\n" } } our @stack = Frame->new("start"); sub push_frame { my($tag) = @_; my $new = Frame->new($tag); return (@stack, $new) } "a9b8" =~ m{ ^ ( [a-z] (?> (?{ local @stack = push_frame("pre-D") }) ( [0-9] ) (?{ local @stack = push_frame("post-D") }) ) )* $ (?(DEFINE) (?<D> [0-9] ) ) }x'

# arg=3
./perl -e 'package Frame { my $next_frame; sub new { my($class, $tag) = @_; $tag .= " " . ++$next_frame; return bless \$tag, $class } sub DESTROY { print "remove frame ${ $_[0] }\n" } } our @stack = Frame->new("start"); sub push_frame { my($tag) = @_; my $new = Frame->new($tag); return (@stack, $new) } "a9b8" =~ m{ ^ ( [a-z] (?> (?{ local @stack = push_frame("pre-D") }) (?&D) (?{ local @stack = push_frame("post-D") }) ) )* $ (?(DEFINE) (?<D> [0-9] ) ) }x'

Hugo

@hvds
Copy link
Contributor

hvds commented Jan 24, 2020

Tracking where those refcounts diverge between the two cases points me to the regcpblow(ST.cp); at regexec.c:7777, part of the case EVAL_postponed_AB logic, [...]
.. but in this case I assume the EVAL_postponed_AB state has been pushed via the GOSUB case which finishes with:

            /* and then jump to the code we share with EVAL */
            goto eval_recurse_doit;

Oh, it's a bit more complicated than that: -Mre=Debug,ALL shows there are actually 2 EVAL_postponed_AB in quick succession, one pushed by the (non-deferred) EVAL, the other by the GOSUB. Hacking the code to skip the regcpblow in just one case - either if it came from GOSUB or if it came from EVAL - does not fix the problem, but hacking it out entirely does. If my hackery was accurate, at least.

@hvds
Copy link
Contributor

hvds commented Jan 24, 2020

$^R is also a confounding factor, since it is capturing the unintended return from the evals. Avoiding that, we see an additional premature unwinding, and more variation. With the main pattern:

m{
  ^ (
    [a-z]
    (?>
      (?{ local @stack = push_frame("pre-D"); () })
      (?&D)
      (?{ local @stack = push_frame("post-D"); () })
     )
  )* $
  (?(DEFINE) (?<D> [0-9] ) )
}x

.. I see remove frame post-D 3 between the first pair of SUCCEED: subpattern success... and CLOSE1 and remove frame post-D 5 between the second pair.

Replacing (?&D) with ([0-9]) I see no premature removals, and we finish with:

Match successful!
remove frame post-D 5
remove frame pre-D 4
remove frame post-D 3
remove frame pre-D 2
remove frame start 1

Replacing instead (?> with (?:, I see no premature removals, but we now finish with:

remove frame post-D 5
remove frame pre-D 4
remove frame post-D 3
Match successful!
remove frame pre-D 2
remove frame start 1

My expectation is that none of the localizations should be unwound unless we backtrack, and there is no backtracking going on here.

@demerphq
Copy link
Contributor

demerphq commented Jan 25, 2020 via email

@demerphq
Copy link
Contributor

demerphq commented Jan 25, 2020 via email

@demerphq
Copy link
Contributor

demerphq commented Jan 25, 2020 via email

@hvds
Copy link
Contributor

hvds commented Jan 25, 2020

I have worked on all of this at some point or another, but I am not sure I understand what you are reporting as wrong.

Precisely the last line you quoted of mine: my expectation is that none of the localizations should be unwound unless we backtrack, and there is no backtracking going on here.

In this bit in particular: the "remove frame" diagnostic shows that some level of localization of @stack has been unwound, which I don't believe should have been.

$ ./perl -Ilib hv_bug.pl 3
[...]

                             |   3|         #1   WHILEM_A_max
                             |   4|     Setting an EVAL scope, savestack=49,
  re EVAL PL_op=0x563ab08ba270
remove frame post-D 3
add frame pre-D 4
before push pre-D: [start 1, pre-D 2]
                             |   4|     push #4   EVAL_B
                             |   4|          #3   IFMATCH_A  yes

Also, are you aware of use re Debug => 'ALL'; use re Debug => 'State'; use re Debug => 'STATE'; all of which allow you see what is happening in the regex engine state machine.

I've used ALL occasionally, I haven't looked specifically at STATE.

Hugo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants