Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak in Perl v5.32.1 Regex #18604

Closed
perlscallion opened this issue Feb 23, 2021 · 17 comments
Closed

Memory Leak in Perl v5.32.1 Regex #18604

perlscallion opened this issue Feb 23, 2021 · 17 comments

Comments

@perlscallion
Copy link

There's a memory leak here. Perl v5.32.1 on Fedora 33
Just plain code; a subroutine, a regex, no modules

Steps to Reproduce
perl-leak.txt

Perl configuration
perl-V.txt

@perlscallion
Copy link
Author

Additional Information
Same code works on Fedora 32 (Perl v5.30.3)
perl-V-5.30.3.txt

Expected Result
The code will run indefinitely, taking 100% of one CPU. Memory use should be stable

Observed Result
With v5.32.1, memory use increases steadily; observed through htop.
The alternative "this doesn't leak" code doesn't leak...

@perlscallion
Copy link
Author

Additional Information
Also tested on Fedora 30, Perl v5.28.2, works OK
perl-V-5.28.2.txt

@hvds
Copy link
Contributor

hvds commented Feb 25, 2021

I can confirm the leak, it appears that the culprit is an invlist. With this code:

use Devel::Leak;
my $count = shift @ARGV;

$string = "STRING 1";

my $handle;
Devel::Leak::NoteSV($handle);
for (1 .. $count) {
    $x = test_leak($string, "string 1");
    $x = test_leak($string, "string 2");
}     
Devel::Leak::CheckSV($handle);
exit;

sub test_leak {
    local ($string, $match) = @_;
    $match =~ s/\s/\\s/g;  # This Leaks
    return ($string =~ /$match/i);
}  

.. and comparing the output of running with argument 1 or 2, it appears that an extra two copies of an invlist are created for each iteration, each looking like this:

new 0x55880214d018 : SV = INVLIST(0x55880213b450) at 0x55880214d018
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x5588021948a0 
    [0] 0x0085
    [2] 0x00A0
  CUR = 40
  LEN = 41

This appears to have been introduced after v5.30.0, I'll try a bisect.

@jkeenan
Copy link
Contributor

jkeenan commented Feb 25, 2021

I can confirm the leak, it appears that the culprit is an invlist. With this code:

Hugo, two points:

  • Does the leak at all depend on the fact that this code does not work under use strict;?
  • When I run your code, I only get this output:
[perlmonger: p5p] $ bleadperl  gh-18604-hvds.pl 1
new 0x800adeba0 : 
new 0x800adbb88 : 
new 0x800adbc78 : 
new 0x800adbcf0 : 
new 0x800a9e810 : 
[perlmonger: p5p] $ bleadperl  gh-18604-hvds.pl 2
new 0x800ade0f0 : 
new 0x800adeba0 : 
new 0x800adede0 : 
new 0x800adbb88 : 
new 0x800adbba0 : 
new 0x800adbc78 : 
new 0x800adbd20 : 

What am I missing? @hvds

Thank you very much.
Jim Keenan

@hvds
Copy link
Contributor

hvds commented Feb 25, 2021

Hugo, two points:

* Does the leak at all depend on the fact that this code does not work under `use strict;`?

No, adding our($string, $match, $x); at the top is enough to make it work under strict, and shows the same results.

* When I run your code, I only get this output:

[...]
What am I missing?

Devel::Leak requires a perl built with debugging enabled for full functionality.

@jkeenan
Copy link
Contributor

jkeenan commented Feb 25, 2021

* When I run your code, I only get this output:
[perlmonger: p5p] $ bleadperl  gh-18604-hvds.pl 1
new 0x800adeba0 : 
new 0x800adbb88 : 
new 0x800adbc78 : 
new 0x800adbcf0 : 
new 0x800a9e810 : 
[perlmonger: p5p] $ bleadperl  gh-18604-hvds.pl 2
new 0x800ade0f0 : 
new 0x800adeba0 : 
new 0x800adede0 : 
new 0x800adbb88 : 
new 0x800adbba0 : 
new 0x800adbc78 : 
new 0x800adbd20 : 

What am I missing? @hvds

To answer my own (second) question: I need a perl built with -DDEBUGGING. When I then install Devel::Leak, I get output like Hugo's.

@hvds
Copy link
Contributor

hvds commented Feb 25, 2021

I can confirm the leak, it appears that the culprit is an invlist. With this code:

(tightened up a bit from above)

#!/usr/bin/perl
use strict;
use warnings;

use Devel::Leak;
my $count = shift @ARGV;
my $handle;

Devel::Leak::NoteSV($handle);
for (1 .. $count) {
    test_leak('a b');
    test_leak('a c');
}
Devel::Leak::CheckSV($handle);
exit;

sub test_leak {
    my($match) = @_;
    $match =~ s/\s/\\s/g;  
    qr{$match};
    undef;
}

.. and comparing the output of running with argument 1 or 2, it appears that an extra two copies of an invlist are created for each iteration, each looking like this:

new 0x55880214d018 : SV = INVLIST(0x55880213b450) at 0x55880214d018
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x5588021948a0 
    [0] 0x0085
    [2] 0x00A0
  CUR = 40
  LEN = 41

This appears to have been introduced after v5.30.0, I'll try a bisect.

Manual bisect points to cfba2ec, @khwilliamson can you take a look?

@hvds
Copy link
Contributor

hvds commented Feb 25, 2021

Oh, it doesn't need s/// either, just qr{} on the same variable with two different strings, that need an invlist:

#!/usr/bin/perl
use strict;
use warnings;

use Devel::Leak;
my $count = shift @ARGV;
my $handle;

Devel::Leak::NoteSV($handle);
for (1 .. $count) {
    test_leak("a\\sb");
    test_leak("a\\sc");
}
Devel::Leak::CheckSV($handle);
exit;

sub test_leak {
    my($match) = @_;
    qr{$match};
    undef;
}

@khwilliamson
Copy link
Contributor

khwilliamson commented Feb 26, 2021 via email

@perlscallion
Copy link
Author

... tightened up a bit from above
... Oh, it doesn't need s/// either, just qr{}

Na ja... I missed a few tricks. Nevertheless, good that it could be replicated.
Good luck and Thanks!

@hvds
Copy link
Contributor

hvds commented Feb 26, 2021

I found a likely candidate, which caused me to look through that function for others. There were several that would leak if there is a syntax error or warnings are made fatal. I'm thinking the best solution would be to make these things mortal, and not try to catch all possible paths through the code and decrement the reference counts explicitly.

If there's a risk that such a "best solution" becomes somewhat complex, it may be worth having a commit first to fix the direct leak (that doesn't need syntax errors or fatal warnings), since that will be a priority to get to a maintenance release.

@hvds
Copy link
Contributor

hvds commented Feb 26, 2021

... tightened up a bit from above
... Oh, it doesn't need s/// either, just qr{}

Na ja... I missed a few tricks. Nevertheless, good that it could be replicated.
Good luck and Thanks!

No worries, we appreciate the reports, we have the expertise (usually) to minimize them. :)

khwilliamson added a commit that referenced this issue Feb 27, 2021
This fixes GH #18604.  The symptoms of that there was a path through the
code where a particular SV did not get its reference count decremented.

I did an audit of the function and came up with several other
possiblities that are included in this commit.

Further, there would be leaks for some instances of finding syntax
errors in the input pattern, or when warnings are fatalized.  Those
would require mortalizing some SVs, but that is beyond the scope of this
commit.
@khwilliamson
Copy link
Contributor

khwilliamson commented Feb 28, 2021 via email

@jkeenan
Copy link
Contributor

jkeenan commented Feb 28, 2021

smoke-me/khw-18604

Hugo's debugging program appears to have run successfully on this branch on FreeBSD-12 and OpenBSD-6.6. Smoke-test results of the branch (without the debugging program) will be available later today.

@hvds
Copy link
Contributor

hvds commented Feb 28, 2021

A minimal patch should be smoking at smoke-me/khw-18604 if someone wants to confirm that it fixes the issue.

Looks good to me.

khwilliamson added a commit that referenced this issue Feb 28, 2021
This fixes GH #18604.  There was a path through the code where a
particular SV did not get its reference count decremented.

I did an audit of the function and came up with several other
possiblities that are included in this commit.

Further, there would be leaks for some instances of finding syntax
errors in the input pattern, or when warnings are fatalized.  Those
would require mortalizing some SVs, but that is beyond the scope of this
commit.
@perlscallion
Copy link
Author

An overly simple question perhaps, how do I see when/whether this patch has got to 'whatever' distribution. Is there something I look for in 'perl -V'?

@dur-randir
Copy link
Member

An overly simple question perhaps, how do I see when/whether this patch has got to 'whatever' distribution. Is there something I look for in 'perl -V'?

It'll 100% get into the next release, 5.34.0. It might get into 5.32.2, but there's no definite date on that and no -V output to look for. You can either detect this from perl5322delta or by checking whether or not this commit will have been merged into the maint-5.32 branch.

jnavila added a commit to jnavila/po4a that referenced this issue May 20, 2021
This change mitigates mquinson#302 when perl exhibits the bug
Perl/perl5#18604

and seems to speed up flag matching.

fix mquinson#302
mquinson pushed a commit to mquinson/po4a that referenced this issue May 20, 2021
This change mitigates #302 when perl exhibits the bug
Perl/perl5#18604

and seems to speed up flag matching.

fix #302
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue May 24, 2021
= Core Enhancements

= Experimental Try/Catch Syntax

An initial experimental attempt at providing C<try>/C<catch> notation has
been added.

    use feature 'try';

    try {
        a_function();
    }
    catch ($e) {
        warn "An error occurred: $e";
    }

For more information, see L<perlsyn/"Try Catch Exception Handling">.

= C<qr/{,n}/> is now accepted

An empty lower bound is now accepted for regular expression quantifiers,
like C<{,3}>.

= Blanks freely allowed within but adjacent to curly braces

(in double-quotish contexts and regular expression patterns)

This means you can write things like S<C<\x{ FFFC }>> if you like.  This
applies to all such constructs, namely C<\b{}>, C<\g{}>, C<\k{}>,
C<\N{}>, C<\o{}>, and C<\x{}>; as well as the regular expression
quantifier C<{I<m>,I<n>}>.  C<\p{}> and C<\P{}> retain their
already-existing, even looser, rules mandated by the Unicode standard
(see L<perluniprops/Properties accessible through \p{} and \P{}>).

This ability is in effect regardless of the presence of the C</x>
regular expression pattern modifier.

Additionally, the comma in a regular expression braced quantifier may
have blanks (tabs or spaces) before and/or after the comma, like
S<C<qr/a{ 5, 7 }/>>.

= New octal syntax C<0oI<ddddd>>

It is now possible to specify octal literals with C<0o> prefixes,
as in C<0o123_456>, parallel to the existing construct to specify
hexadecimal literal C<0xI<ddddd>> and binary literal C<0bI<ddddd>>.
Also, the builtin C<oct()> function now accepts this new syntax.

See L<perldata/Scalar value constructors> and L<perlfunc/oct EXPR>.

= Performance Enhancements

=item *

Fix a memory leak in RegEx
[L<GH #18604|Perl/perl5#18604>]

= Modules and Pragmata

= New Modules and Pragmata

=item *

L<ExtUtils::PL2Bat> 0.004 has been added to the Perl core.

This module is a generalization of the C<pl2bat> script. It being a script has
led to at least two forks of this code; this module will unify them under one
implementation with tests.

(and lots more changes)
Corion pushed a commit to Corion/perl5 that referenced this issue Jun 20, 2021
This fixes GH Perl#18604.  There was a path through the code where a
particular SV did not get its reference count decremented.

I did an audit of the function and came up with several other
possiblities that are included in this commit.

Further, there would be leaks for some instances of finding syntax
errors in the input pattern, or when warnings are fatalized.  Those
would require mortalizing some SVs, but that is beyond the scope of this
commit.
atoomic pushed a commit to atoomic/perl5 that referenced this issue Mar 30, 2022
This fixes GH Perl#18604.  There was a path through the code where a
particular SV did not get its reference count decremented.

I did an audit of the function and came up with several other
possiblities that are included in this commit.

Further, there would be leaks for some instances of finding syntax
errors in the input pattern, or when warnings are fatalized.  Those
would require mortalizing some SVs, but that is beyond the scope of this
commit.

(cherry picked from commit 5f41fa4)
Signed-off-by: Nicolas Rochelemagne <rochelemagne@cpanel.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants