Null pointer dereference in S_SvREFCNT_dec #16627

Closed
p5pRT opened this issue Jul 13, 2018 · 9 comments

@p5pRT

p5pRT commented Jul 13, 2018

Migrated from rt.perl.org#133369 (status was 'new')

Searchable as RT133369$

@p5pRT
Author

p5pRT commented Jul 13, 2018

From @geeknik

./perl -e 'for$0(qw(0 0)){push@r,qr/@r(?{})/}' triggers a null pointer
dereference and segfault in v5.29.0-87-ga13f1de.

==10676==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x0000007060c8 bp 0x7fffb4efe100 sp 0x7fffb4efe0e0 T0)
==10676==The signal is caused by a READ memory access.
==10676==Hint: address points to the zero page.
  #0 0x7060c7 in S_SvREFCNT_dec /root/perl/./inline.h:212:11
  #1 0x7305f0 in S_free_codeblocks /root/perl/regcomp.c:6268:9
  #2 0x9b59ca in Perl_leave_scope /root/perl/scope.c
  #3 0x9df198 in Perl_dounwind /root/perl/pp_ctl.c:1549:9
  #4 0x5b5846 in S_my_exit_jump /root/perl/perl.c:5240:9
  #5 0x5c092e in Perl_my_failure_exit /root/perl/perl.c:5227:5
  #6 0x9e1d2f in Perl_die_unwind /root/perl/pp_ctl.c:1796:5
  #7 0x7ce081 in Perl_vcroak /root/perl/util.c:1715:5
  #8 0x7c6b5b in Perl_croak /root/perl/util.c:1760:5
  #9 0x70362b in S_reg /root/perl/regcomp.c
  #10 0x77618a in S_regatom /root/perl/regcomp.c:12960:15
  #11 0x77232c in S_regpiece /root/perl/regcomp.c:12004:11
  #12 0x762ea2 in S_regbranch /root/perl/regcomp.c:11932:18
  #13 0x6f451c in S_reg /root/perl/regcomp.c:11663:10
  #14 0x77618a in S_regatom /root/perl/regcomp.c:12960:15
  #15 0x77232c in S_regpiece /root/perl/regcomp.c:12004:11
  #16 0x762ea2 in S_regbranch /root/perl/regcomp.c:11932:18
  #17 0x6f451c in S_reg /root/perl/regcomp.c:11663:10
  #18 0x6dc4ba in Perl_re_op_compile /root/perl/regcomp.c:7224:9
  #19 0x9c4274 in Perl_pp_regcomp /root/perl/pp_ctl.c:108:14
  #20 0x7c17d8 in Perl_runops_debug /root/perl/dump.c:2536:23
  #21 0x5b0831 in S_run_body /root/perl/perl.c
  #22 0x5afe7b in perl_run /root/perl/perl.c:2617:2
  #23 0x50da47 in main /root/perl/perlmain.c:122:9
  #24 0x7f39127c182f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
  #25 0x436d28 in _start (/root/perl/perl+0x436d28)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/perl/./inline.h:212:11 in S_SvREFCNT_dec
==10676==ABORTING

@khwilliamson
Contributor

This is still present in v5.39.10

==62279==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x55e6f775dee0 bp 0x7fffab24da10 sp 0x7fffab24d9d0 T0)
==62279==The signal is caused by a READ memory access.
==62279==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used.
#0 0x55e6f775dee0 in Perl_SvREFCNT_dec_NN /home/khw/perl/locales/straight/./sv_inline.h:710:14
#1 0x55e6f7777ae2 in S_free_codeblocks /home/khw/perl/locales/straight/regcomp.c:498:13
#2 0x55e6f7a3c0cc in Perl_leave_scope /home/khw/perl/locales/straight/scope.c:1537:13
#3 0x55e6f74bcbb5 in Perl_dounwind /home/khw/perl/locales/straight/pp_ctl.c:1775:9
#4 0x55e6f707e980 in S_my_exit_jump /home/khw/perl/locales/straight/perl.c:5519:9
#5 0x55e6f7093ca7 in Perl_my_failure_exit /home/khw/perl/locales/straight/perl.c:5506:5
#6 0x55e6f74c3996 in Perl_die_unwind /home/khw/perl/locales/straight/pp_ctl.c:2092:5
#7 0x55e6f7d7f9e6 in Perl_vcroak /home/khw/perl/locales/straight/util.c:1897:5
#8 0x55e6f7d8077d in Perl_croak /home/khw/perl/locales/straight/util.c:1948:5
#9 0x55e6f77446a4 in S_reg /home/khw/perl/locales/straight/regcomp.c:3797:21
#10 0x55e6f77c2880 in S_regatom /home/khw/perl/locales/straight/regcomp.c:5609:15
#11 0x55e6f77b55ae in S_regpiece /home/khw/perl/locales/straight/regcomp.c:4745:11
#12 0x55e6f778d187 in S_regbranch /home/khw/perl/locales/straight/regcomp.c:4510:18
#13 0x55e6f7755896 in S_reg /home/khw/perl/locales/straight/regcomp.c:4182:10
#14 0x55e6f77c2880 in S_regatom /home/khw/perl/locales/straight/regcomp.c:5609:15
#15 0x55e6f77b55ae in S_regpiece /home/khw/perl/locales/straight/regcomp.c:4745:11
#16 0x55e6f778d187 in S_regbranch /home/khw/perl/locales/straight/regcomp.c:4510:18
#17 0x55e6f7755896 in S_reg /home/khw/perl/locales/straight/regcomp.c:4182:10
#18 0x55e6f76fb3be in Perl_re_op_compile /home/khw/perl/locales/straight/regcomp.c:1714:9
#19 0x55e6f7494e3f in Perl_pp_regcomp /home/khw/perl/locales/straight/pp_ctl.c:119:14
#20 0x55e6f7155438 in Perl_runops_debug /home/khw/perl/locales/straight/dump.c:2866:23
#21 0x55e6f7074fa2 in S_run_body /home/khw/perl/locales/straight/perl.c:2889:9
#22 0x55e6f707219d in perl_run /home/khw/perl/locales/straight/perl.c:2804:9
#23 0x55e6f6f58cf4 in main /home/khw/perl/locales/straight/perlmain.c:127:9
#24 0x7fad5cf0dd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#25 0x7fad5cf0de3f in __libc_start_main csu/../csu/libc-start.c:392:3
#26 0x55e6f6e9ad24 in _start (/home/khw/perl/locales/straight/perl+0x28fd24) (BuildId: 0cb917dd7204b49daa8373c79aba4404dbac8aea)

@mauke
Contributor

mauke commented Apr 21, 2024

I don't get a null pointer dereference, but I'm not using ASan either.

$ ./perl -Ilib -Mre=eval -e 'for$0(qw(0 0)){push@r,qr/@r(?{})/}' 
Segmentation fault
$ gdb --args ./perl -Ilib -Mre=eval -e 'for$0(qw(0 0)){push@r,qr/@r(?{})/}'
...
(gdb) r
Starting program: /home/mauke/Projects/perl5/perl -Ilib -Mre=eval -e for\$0\(qw\(0\ 0\)\)\{push@r,qr/@r\(\?\{\}\)/\}
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
S_compile_runtime_code (plen=<optimized out>, pat=0x555555a520f0 "(?^:(?{}))(?{})", pRExC_state=0x7fffffffd280) at regcomp.c:1053
1053                assert(pat[src->start] == '(');
(gdb) bt
#0  S_compile_runtime_code (plen=<optimized out>, pat=0x555555a520f0 "(?^:(?{}))(?{})", pRExC_state=0x7fffffffd280) at regcomp.c:1053
#1  Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized out>, expr=<optimized out>, eng=0x555555a235c0 <PL_core_reg_engine>, old_re=0x555555a2f678, is_bare_re=<optimized out>, orig_rx_flags=0, pm_flags=2013265920)
    at regcomp.c:1602
#2  0x000055555568c1e0 in Perl_pp_regcomp () at pp_ctl.c:121
#3  0x00005555555f0252 in Perl_runops_debug () at dump.c:2866
#4  0x00005555555d5ed0 in S_run_body (oldscope=1) at perl.c:2865
#5  perl_run (my_perl=<optimized out>) at perl.c:2780
#6  0x000055555559e19c in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at perlmain.c:127
(gdb) p src
$1 = (struct reg_code_block *) 0x555555a63350
(gdb) p src->start
$2 = 2314885530818453536
(gdb) 

@mauke
Contributor

mauke commented Apr 21, 2024

Slightly reduced:

perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'

This segfaults about 50% of the time for me.

@jkeenan
Contributor

jkeenan commented Apr 21, 2024

> Slightly reduced:
>
> perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
>
> This segfaults about 50% of the time for me.

... which is what I also get on Ubuntu Linux 22.04 LTS. But on FreeBSD-13, I don't get any message or segfault.

$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'
$ perl -e 'use re "eval"; my @x = qr/(?{})/; push @x, qr/(?{})@x/'

mauke added a commit to mauke/perl5 that referenced this issue Apr 22, 2024
n looks like a fill pointer for pRExC_state->code_blocks->cb. It is a
local variable in S_concat_pat that is incremented in several places.
S_concat_pat also calls itself, but the recursive call has its own n.
That feels wrong.

Use a pointer to make all recursive invocations of S_concat_pat share
the same n, which is actually "allocated" at the top level (where
S_concat_pat is invoked with pn = NULL).

Cf. Perl#16627.
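
The idea in this commit message can be illustrated with a minimal C sketch. It is not the actual patch: `demo_item` and `demo_concat` are hypothetical names, slots hold plain ints instead of code block records, and the caller is assumed to supply enough slots. It only shows how a fill index owned by the top-level call (where `pn` is NULL) can be shared with every recursive call through a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one interpolated item: either a single code
 * block (kids == NULL) or a nested array of further items. */
struct demo_item {
    int id;                        /* label of the code block, if a leaf */
    const struct demo_item *kids;  /* nested items, or NULL for a leaf */
    size_t nkids;
};

/* Store each leaf's id into slots[], in order. The fill index lives in one
 * place: the top-level call (pn == NULL) owns it, and every recursive call
 * receives a pointer to that same variable. Returns the number of slots
 * filled so far. */
static int demo_concat(const struct demo_item *items, size_t nitems,
                       int *slots, int *pn)
{
    int n = 0;          /* only used as storage by the top-level call */

    assert(slots != NULL);
    if (pn == NULL)
        pn = &n;

    for (size_t i = 0; i < nitems; i++) {
        if (items[i].kids != NULL)
            demo_concat(items[i].kids, items[i].nkids, slots, pn);
        else
            slots[(*pn)++] = items[i].id;
    }
    return *pn;
}
```

Called with `pn` set to NULL on a leaf followed by a one-element nested array, this fills two distinct slots instead of writing slot 0 twice.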
@mauke
Contributor

mauke commented Apr 22, 2024

I have a patch that seems to fix this issue: mauke@039cadf

However, this patch was written based on vibes, not any deep understanding of the code. I'd appreciate any review of what's going on here. @iabyn ?

@iabyn
Contributor

iabyn commented Apr 22, 2024

I have a very similar patch in development (passing address of n as a pointer), but there are some subtleties you've missed (notably at line 734, *pn shouldn't be reset to zero). I wasn't entirely happy with my branch, so I was going to leave it until tomorrow to see if there was a more elegant fix. Probably best to leave this one to me - it was my mistake in the first place.

iabyn added a commit that referenced this issue May 8, 2024
Split the 'count' field of reg_code_blocks structures into separate
'count' and 'size' fields to make the code less fragile; and as an
intended side-effect, fix GH #16627.

Background:

When a pattern includes embedded perl code, such as /(?{ CODE })/, then
at compile-time the op trees associated with each of those code blocks
are stored within the compiled regex, in a reg_code_blocks structure.

This structure contains some basic info, plus a pointer to an array of
reg_code_block structures, each of which contains a pointer to the
optree for that code block, plus string offsets to where the (?{..}) or
similar expression starts and ends within the pattern string.

For a runtime pattern, perl tries to reuse any original compiled code
blocks rather than recompiling them, to maintain correct closure
behaviour.

So for example, in the following:

    my $x = 1;
    { my $x = 2; $r = qr/(??{$x})/ }
    my $y = 3;
    my $s = '(??{$y})';

    my $pat = qr/A (??{$x}) B $r C $s/x;

at perl compile time, the two '$x' code blocks are compiled, and their
optrees stored.

At runtime, when the $pat pattern is compiled, the third code block,
'$y', is compiled, and the two earlier optrees are retrieved. A new
three-element 'struct reg_code_block' array is malloc()ed, and the
pointers to the two old, and one new, optrees are stored in it.
Overall, $pat has the same effect as qr/A1B2C3/.

The assembly of this reg_code_block array is mostly performed by
S_concat_pat() and S_compile_runtime_code(). It is done incrementally,
since the total number of code blocks isn't known in advance.

Prior to this commit, the array was often realloced() and grown one
element at a time, as each new run-time code block was discovered,
with a corresponding pRExC_state->code_blocks->count++.

This count field served twin purposes: it indicated both how many code
blocks had been found and stored so far, and the malloc()ed size of the
array. But some parts of the regex compiler allocate more than one slot
at a time, and so the two meanings of the 'count' field temporarily
diverge. This became noticeable when S_concat_pat() recursed to
interpolate the contents of an array, such as qr/$a$b@c/, where
interpolating $a, $b was done iteratively at the top level, then it
recursed to process each element of @c. S_concat_pat() had a local var,
'int n', which counted how many code blocks had been found so far, and
this value sometimes represented the difference between the two meanings
of the 'count' field.

However when it recursed, n started from zero again and things got out
of whack, which led to GH #16627. The bug in that ticket can be reduced
to:

    my @x = ( qr/(?{A})/ );
    qr/(?{B})@x/;

Here the B code block is stored in pRExC_state->code_blocks->cb[0],
but then S_concat_pat recurses, n is reset to 0, and the A code block is
also stored into slot 0. Then things would start to crash.

The quick and dirty fix would be to share n between recursive calls to
S_concat_pat(), by passing a pointer to it. Instead, this commit takes
the approach of adding a 'size' field to pRExC_state->code_blocks,
so that ->count now only indicates the current number of code blocks
stored (replacing the local var n) while ->size indicates the current
number of slots malloc()ed.

This makes the code more conventional and simpler to understand, and
allows the realloc() to pre-allocate rather than incrementing the array
size by 1 each time. By removing the fragile double meaning of the
'count' field, it should make any future bugs easier to diagnose, at the
cost of this initial commit being more complex.
@iabyn
Contributor

iabyn commented May 8, 2024

I've now created PR #22201, which has a more general fix to this issue.

iabyn added a commit that referenced this issue May 20, 2024
Split the 'count' field of reg_code_blocks structures into separate
'count' and 'size' fields to make the code less fragile; and as an
intended side-effect, fix GH #16627.

Background:

When a pattern includes embedded perl code, such as /(?{ CODE })/, then
at compile-time the op trees associated with each of those code blocks
are stored within the compiled regex, in a reg_code_blocks structure.

This structure contains some basic info, plus a pointer to an array of
reg_code_block structures, each of which contains a pointer to the
optree for that code block, plus string offsets to where the (?{..}) or
similar expression starts and ends within the pattern string.
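
The layout described in the last two paragraphs might be sketched in C roughly as follows. This is a simplified illustration, not perl's actual definitions: the `demo_` names are hypothetical, the op tree pointer is a plain `void *`, and it is an assumption of the sketch that `end` is the offset of the closing parenthesis. The helper checks the same kind of invariant as the `assert(pat[src->start] == '(')` seen in the gdb session earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical mirror of the layout described above; the real
 * structures carry more fields. */
struct demo_code_block {
    size_t start;   /* offset of the opening '(' of (?{..}) in the pattern */
    size_t end;     /* offset of the closing ')' (assumed for this sketch) */
    void  *optree;  /* stand-in for the compiled op tree pointer */
};

struct demo_code_blocks {
    int count;                    /* number of entries in cb[] */
    struct demo_code_block *cb;   /* array of per-block records */
};

/* Check that every recorded block delimits a parenthesised group inside
 * pat. Returns 1 if all offsets are plausible, 0 otherwise. */
static int demo_offsets_ok(const char *pat,
                           const struct demo_code_blocks *cbs)
{
    assert(pat != NULL && cbs != NULL);
    size_t len = strlen(pat);

    for (int i = 0; i < cbs->count; i++) {
        const struct demo_code_block *b = &cbs->cb[i];
        if (b->start >= len || b->end >= len || b->start >= b->end)
            return 0;   /* offset outside the string, e.g. garbage data */
        if (pat[b->start] != '(' || pat[b->end] != ')')
            return 0;
    }
    return 1;
}
```

A record whose `start` holds an uninitialised value, like the huge `src->start` printed by gdb above, fails the range check instead of being used as an index.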

For a runtime pattern, perl tries to reuse any original compiled code
blocks rather than recompiling them, to maintain correct closure
behaviour.

So for example, consider the following:

    my $x = 1;
    { my $x = 2; $r = qr/(??{$x})/ }
    my $y = 3;
    my $s = '(??{$y})';

    my $pat = qr/A (??{$x}) B $r C $s/x;

At perl compile time, the two '$x' code blocks are compiled, and their
optrees stored.

At runtime, when the $pat pattern is compiled, the third code block,
'$y', is compiled, and the two earlier optrees are retrieved. A new
three-element 'struct reg_code_block' array is malloc()ed, and the
pointers to the two old, and one new, optrees are stored in it.

So when $pat gets compiled, it becomes equivalent to:

    qr/A (??{$x}) B (??{$x}) C (??{$y})/x;

except that the $x's have different values since they are from different
closures. When the pattern is executed, the sub-patterns returned by the
various (??{..})'s result in $pat having the same overall effect as
qr/A1B2C3/.

The assembly of this reg_code_block array is mostly performed by
S_concat_pat() and S_compile_runtime_code(). It is done incrementally,
since the total number of code blocks isn't known in advance.

Prior to this commit, the array was often realloced() and grown one
element at a time as each new run-time code block was discovered, with
a corresponding pRExC_state->code_blocks->count++.

This count field served twin purposes: it indicated both how many code
blocks had been found and stored so far, and the malloc()ed size of the
array. But some parts of the regex compiler allocate more than one slot
at a time, and so the two meanings of the 'count' field temporarily
diverge. This became noticeable when S_concat_pat() recursed to
interpolate the contents of an array, such as qr/$a$b@c/, where
interpolating $a, $b was done iteratively at the top level, then it
recursed to process each element of @c. S_concat_pat() had a local var,
'int n', which counted how many code blocks had been found so far, and
this value sometimes represented the difference between the two meanings
of the 'count' field.

However when it recursed, n started from zero again and things got out
of whack, which led to GH #16627. The bug in that ticket can be reduced
to:

    my @x = ( qr/(?{A})/ );
    qr/(?{B})@x/;

Here the B code block is stored in pRExC_state->code_blocks->cb[0],
but then S_concat_pat() recurses, n is reset to 0, and the A code block
is also stored into slot 0. Then things would start to crash.
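
The overwrite described here can be reproduced in miniature. The following C sketch is a toy model under stated assumptions, not the regex compiler's code: `demo_node` and `demo_fill_local_n` are hypothetical names, and each slot holds a single character label rather than a code block record. It shows what happens when the fill index is a fresh local variable in every invocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an interpolated item: a code block with a label,
 * or an array whose elements are processed by a recursive call. */
struct demo_node {
    char label;                    /* 'A', 'B', ... for a code block */
    const struct demo_node *kids;  /* elements of an interpolated array */
    size_t nkids;
};

/* Buggy variant: n is a fresh local in every invocation, so a recursive
 * call starts writing at slot 0 again. Returns this invocation's n. */
static int demo_fill_local_n(const struct demo_node *items, size_t nitems,
                             char *slots)
{
    int n = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < nitems; i++) {
        if (items[i].kids != NULL)
            demo_fill_local_n(items[i].kids, items[i].nkids, slots);
        else
            slots[n++] = items[i].label;
    }
    return n;
}
```

Modelling qr/(?{B})@x/ as a B leaf followed by a one-element array holding A, the B label is written to slot 0 first and then replaced by A, while slot 1 is never written.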

The quick and dirty fix would be to share n between recursive calls to
S_concat_pat(), by passing a pointer to it. Instead, this commit takes
the approach of adding a 'size' field to pRExC_state->code_blocks,
so that ->count now only indicates the current number of code blocks
stored (replacing the local var n) while ->size indicates the current
number of slots malloc()ed.

This makes the code more conventional and simpler to understand, and
allows the realloc() to pre-allocate rather than incrementing the array
size by 1 each time. By removing the fragile double meaning of the
'count' field, it should make any future bugs easier to diagnose, at the
cost of this initial commit being more complex.
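
As a rough illustration of the count/size split, here is a minimal C sketch of an append operation on such a structure. The names `demo_blocks` and `demo_blocks_push` are hypothetical, the entries are plain ints, and the doubling growth policy is an assumption of the sketch: the commit message only says that the realloc() can now pre-allocate.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_blocks {
    int  count;   /* slots currently in use */
    int  size;    /* slots currently allocated */
    int *cb;      /* stand-in for the array of code block records */
};

/* Append one entry, growing the allocation only when it is full.
 * Returns 0 on success, or -1 if realloc() fails, in which case the
 * existing array is left intact. */
static int demo_blocks_push(struct demo_blocks *b, int value)
{
    assert(b != NULL && b->count <= b->size);

    if (b->count == b->size) {
        /* Assumed policy: start at 4 slots, then double. */
        int new_size = b->size ? b->size * 2 : 4;
        int *p = realloc(b->cb, (size_t)new_size * sizeof *p);
        if (p == NULL)
            return -1;
        b->cb = p;
        b->size = new_size;
    }
    b->cb[b->count++] = value;
    return 0;
}
```

Because count is the only write index and size is the only capacity, a recursive caller has no private counter that can fall out of step with the array.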
iabyn added a commit that referenced this issue Jun 18, 2024
@iabyn
Contributor

iabyn commented Jun 18, 2024

Now fixed with v5.41.0-45-g40727c420c.

iabyn closed this as completed Jun 18, 2024