Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in regex appears in 5.20.1 #14236

Closed
p5pRT opened this issue Nov 13, 2014 · 16 comments
Closed

Memory leak in regex appears in 5.20.1 #14236

p5pRT opened this issue Nov 13, 2014 · 16 comments
Labels

Comments

@p5pRT
Copy link
Collaborator

@p5pRT p5pRT commented Nov 13, 2014

Migrated from rt.perl.org#123198 (status was 'resolved')

Searchable as RT123198$

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 13, 2014

From arocker@vex.net

A memory leak from a previously-working regex has been posted in
LinkedIn's Perl group. The reporter tried but failed to submit it with
perlbug.

This is obviously not a satisfactory report, but if a maintainer contacts
me, I will try to get the necessary information supplied.

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 14, 2014

From @iabyn

On Thu, Nov 13, 2014 at 06​:30​:47AM -0800, arocker@​vex.net wrote​:

A memory leak from a previously-working regex has been posted in
LinkedIn's Perl group. The reporter tried but failed to submit it with
perlbug.

This is obviously not a satisfactory report, but if a maintainer contacts
me, I will try to get the necessary information supplied.

Yes, please supply some more info

--
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
  -- Life of Brian

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 14, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 15, 2014

From arocker@vex.net

---------------------------- Original Message ----------------------------
From​: "Dave Mitchell via RT" <perlbug-followup@​perl.org>
Date​: Fri, November 14, 2014 7​:03 am
To​: arocker@​vex.net


The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen

Initial message​:

I switched this week to using perl-5.20.1, and I try to run an old (but
working) script on it. in this loop in loop in one function which iterates
around 50000 time

foreach my $key_word (grep {/\w+/} @​keywords) {
if ($string_to_check =~
m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
$found_key_words{lc($key_word)}=1;
}

}

the memory usage increases very fast. at the next call of that function it
continues to increase, until it just freezes my machine.


Second​:

I simplified the regexp to this​:
$string_to_check =~ m/(?<=\W)\Q$key_word\E(?=\W|(s|es|ies\W))/i)
and now it's working great. Though I still wonder when will the next
eruption of memory occur :)


Response to a request for sample data​:

there are no two different datasets. just different perls 5.18.2 and
5-20.1(where the memory leak occurs) the dataset is very simple​: about
50000 terms​: one words,two words or three words.

something like​: "lantern,green lantern,green lantern
movie,movie,reviews,bad reviews,bad movie,bad movie reviews"...I can't
give the real set...


It looks like a failure to free memory in the handling of regex patterns.
If that area was changed between 5.18.2 & 5.20, that's probably where it
is.
If this isn't sufficient, I'll try to elicit more clues.

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 17, 2014

From @iabyn

On Fri, Nov 14, 2014 at 09​:38​:38AM -0500, arocker@​Vex.Net wrote​:

The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen

Initial message​:

I switched this week to using perl-5.20.1, and I try to run an old (but
working) script on it. in this loop in loop in one function which iterates
around 50000 time

foreach my $key_word (grep {/\w+/} @​keywords) {
if ($string_to_check =~
m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
$found_key_words{lc($key_word)}=1;
}

}

I've managed to reproduce it with the following self-contained script​:

  my @​keywords;
  for (1..20_000) {
  push @​keywords, "lantern$_";
  push @​keywords, "green lantern$_";
  push @​keywords, "green lantern movie$_";
  }
  my $string_to_check = "xntsstnstnstnstnstnstnstbsgthsthsthsthsthshsrharhar";

  foreach my $key_word (grep {/\w+/} @​keywords) {
  if ($string_to_check =~
  m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i
  ) {
  $found_key_words{lc($key_word)}=1;
  }

  }

and bisected it using this (the perl binary is passed as a command-line
arg)​:

  $x = `/usr/bin/time -v $ARGV[0] /home/davem/tmp/p 2>&1`;

  $x =~ /Maximum resident set size \(kbytes\)​: (\d+)/
  or exit 2;
  $n = $1;
  print "n=[$n]\n";
  exit 1 if $n > 1_000_000;
  exit 0;

It bisects to the following. I haven't looked further yet. Karl, is this
something you want to look at?

commit 3c075fe
Author​: Karl Williamson <public@​khwilliamson.com>
AuthorDate​: Sun Feb 2 22​:30​:10 2014 -0700
Commit​: Karl Williamson <public@​khwilliamson.com>
CommitDate​: Sun Feb 2 22​:59​:58 2014 -0700

  PATCH [perl #121144]​: \S, \W, etc fail for above ASCII
 
  There were three things wrong with these couple of lines of code.that
  help generate the synthetic start class (SSC).
 
  1) It used PL_regkind, instead of straight OP. This meant that /u
  matches were treated the same as /d matches since they both have the
  same regkind.
  2) For what it thought was just for /d, it used the complement of
  ASCII, which matches 128-INFINITY, whereas it wanted 128-255 only..
  3) It did a union of this complement, instead of a subtract of the
  non-complement, forgetting that we are about to complement the
  result, so that if we want the end result to have something, we
  better have the input not have that something, or the complementing
  will screw it up.

--
The warp engines start playing up a bit, but seem to sort themselves out
after a while without any intervention from boy genius Wesley Crusher.
  -- Things That Never Happen in "Star Trek" #17

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 17, 2014

From @khwilliamson

On 11/17/2014 05​:14 AM, Dave Mitchell wrote​:

It bisects to the following. I haven't looked further yet. Karl, is this
something you want to look at?

Not particularly, but I'm doing it anyway

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

From @khwilliamson

On 11/17/2014 05​:14 AM, Dave Mitchell wrote​:

On Fri, Nov 14, 2014 at 09​:38​:38AM -0500, arocker@​Vex.Net wrote​:

The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen

Initial message​:

I switched this week to using perl-5.20.1, and I try to run an old (but
working) script on it. in this loop in loop in one function which iterates
around 50000 time

foreach my $key_word (grep {/\w+/} @​keywords) {
if ($string_to_check =~
m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
$found_key_words{lc($key_word)}=1;
}

}

I've managed to reproduce it with the following self-contained script​:

 my @&#8203;keywords;
 for \(1\.\.20\_000\) \{
     push @&#8203;keywords\, "lantern$\_";
     push @&#8203;keywords\, "green lantern$\_";
     push @&#8203;keywords\, "green lantern movie$\_";
 \}
 my $string\_to\_check = "xntsstnstnstnstnstnstnstbsgthsthsthsthsthshsrharhar";

 foreach my $key\_word \(grep \{/\\w\+/\} @&#8203;keywords\) \{
     if \($string\_to\_check =~
     m/^\\Q$key\_word\\E$|^\\Q$key\_word\\E\[\\W\]|\[\\W\]\\Q$key\_word\\E\[s\\W\]|\[\\W\]\\Q$key\_word\\E$|^\\Q$key\_word\\Eies|\[\\W\]\\Q$key\_word\\Eies|^\\Q$key\_word\\Ees|\[\\W\]\\Q$key\_word\\Ees/i
     \) \{
         $found\_key\_words\{lc\($key\_word\)\}=1;
     \}

 \}

and bisected it using this (the perl binary is passed as a command-line
arg)​:

 $x = \`/usr/bin/time \-v  $ARGV\[0\] /home/davem/tmp/p 2>&1\`;

 $x =~ /Maximum resident set size \\\(kbytes\\\)&#8203;: \(\\d\+\)/
     or exit 2;
 $n = $1;
 print "n=\[$n\]\\n";
 exit 1 if $n > 1\_000\_000;
 exit 0;

It bisects to the following. I haven't looked further yet. Karl, is this
something you want to look at?

commit 3c075fe
Author​: Karl Williamson <public@​khwilliamson.com>

A fix is now smoking in

http​://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/khw-leak

This passes your test, but I tried getting blead to fail with the
infrastructure in t/op/svleak.t, and couldn't, even knowing the exact
cause of the problem.

(For future readers trying to reproduce this, the file in the script
above "/home/davem/tmp/p" should contain the perl program designamted as
the "self-contained script".)

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

From @cpansprout

On Mon Nov 17 21​:26​:52 2014, public@​khwilliamson.com wrote​:

A fix is now smoking in

http​://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/khw-
leak

This passes your test, but I tried getting blead to fail with the
infrastructure in t/op/svleak.t, and couldn't, even knowing the exact
cause of the problem.

I have just added a to-do test in commit 43275f0.

--

Father Chrysostomos

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

From @khwilliamson

Now fixed by commit 512e01a
in blead

Thanks for the help in tracking this down and making a test for it.

This should be a candidate for the next 5.20 maint release

--
Karl Williamson

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

@khwilliamson - Status changed from 'open' to 'resolved'

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

From @khwilliamson

Reopen so can be moved to pending release, instead of resolved
--
Karl Williamson

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

@khwilliamson - Status changed from 'resolved' to 'open'

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Nov 18, 2014

From @cpansprout

On Tue Nov 18 09​:13​:53 2014, khw wrote​:

Now fixed by commit 512e01a
in blead

Thanks for the help in tracking this down and making a test for it.

This should be a candidate for the next 5.20 maint release

It gets my wote.

--

Father Chrysostomos

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket

The issue should be resolved with the release today of Perl v5.22, available at http​://www.perl.org/get.html
If you find that the problem persists, feel free to reopen this ticket

--
Karl Williamson for the Perl 5 porters team

@p5pRT
Copy link
Collaborator Author

@p5pRT p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'

@p5pRT p5pRT closed this Jun 2, 2015
@p5pRT p5pRT added the Severity Low label Oct 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant
You can’t perform that action at this time.