Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks (doubles the input buffer size at regular intervals, with no limit on that) #154

Open
gitcnd opened this issue May 20, 2022 · 10 comments

Comments

@gitcnd
Copy link

gitcnd commented May 20, 2022

Seems to indefinitely grow the input buffer by double at regular intervals, resulting in system memory starvation eventually.

Example of problem:-

# echo '  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND ';for I in {0..999};do top -n 1 | grep ipcrun_leak.pl;sleep 10;done         
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND 
15660 root      20   0  129696  56252   4724 S  47.1   0.1   0:03.41 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0  181544 135032   4724 R  41.2   0.3   0:08.12 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0  224924 169084   4724 S  37.5   0.4   0:12.57 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0  292704 239724   4724 S  43.8   0.6   0:17.12 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2445980 404572   4724 S  35.3   1.0   0:21.56 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2445980 404572   4724 S  43.8   1.0   0:26.09 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2445980 404572   4724 S  43.8   1.0   0:30.59 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2504816 486744   4724 S  41.2   1.2   0:35.19 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2578364 570816   4724 R  50.0   1.4   0:39.81 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2670296 652012   4724 S  41.2   1.6   0:44.40 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2785216 735044   4724 S  43.8   1.8   0:49.00 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2928860   1.3g   4724 R  43.8   3.4   0:53.76 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2928860   1.3g   4724 R  47.1   3.4   0:58.36 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2928860   1.3g   4724 R  35.3   3.4   1:02.89 perl ipcrun_leak.pl                                                                                                                                                                                                                                         
15660 root      20   0 2928860   1.3g   4724 S  50.0   3.4   1:07.48 perl ipcrun_leak.pl  
[etc]
15660 root      20   0   14.3g  12.6g   4724 S  43.8  32.1   3:51.89 perl ipcrun_leak.pl  

The code causing the leak:-

#!/usr/bin/perl -w

use strict;
use IPC::Run qw( start pump finish timeout );

my ($fh, $got);
open($fh,'<','data.json.bz2') or die $!; # Put a huge (e.g. 2gb) .bz2-compressed file here
my @cmd=('bzip2','-d'); my($in,$out)=('','');
my $h = start \@cmd, \$in, \$out;
my($ol)=0;

do {
  my $buf; $got=sysread($fh, $buf, 1024);
  if(length($in) < $ol) { print "$ol " . length($in) . "\n"; } # Example showing where the bug is
  $ol=length($in);
  $in.=$buf; $h->pump();
  if(length($out)) {
    #$md5->add($out); # Process the decompressed stream
    $out='';
  }
} until($got<1024);
$h->finish();
if(length($out)) {
  #$md5->add($out); # Process the decompressed stream
  $out='';
}

Outputs:-

# perl   ipcrun_leak.pl
180224 0
579584 0
1741824 0
5782528 0
18942976 0
60907520 0
190749696 0
570285056 0

Example input file size:-

# dir data.json.bz2
-rwxr-xr-x. 1 root cnd 2052715775 Jan  7  2018 data.json.bz2
# 

Version:-

cpan[1]> i IPC::Run
Reading '/home/cnd/.local/share/.cpan/Metadata'
  Database was generated on Thu, 19 May 2022 13:17:03 GMT
Module id = IPC::Run
    CPAN_USERID  TODDR (Todd Rinaldo <toddr@cpan.org>)
    CPAN_VERSION 20200505.0
    CPAN_FILE    T/TO/TODDR/IPC-Run-20200505.0.tar.gz
    UPLOAD_DATE  2020-05-05
    MANPAGE      IPC::Run - system() and background procs w/ piping, redirs, ptys (Unix, Win32)
    INST_FILE    /usr/local/share/perl5/IPC/Run.pm
    INST_VERSION 20200505.0
@gitcnd
Copy link
Author

gitcnd commented May 20, 2022

The size issue is almost exactly 3 x (not double, like I said before).

i.e. Each time it's consumed X bytes, it flushes its buffer, and X=X*3 for next time...

Same problem using brotli instead of bzip2 (in case you're wondering)

#!/usr/bin/perl -w
  
use strict;
use IPC::Run qw( start pump finish timeout );

my ($fh, $got);
open($fh,'<','ticks-2021_12_04.json.br') or die $!; # Put a huge (e.g. 2gb) file here
my @cmd=('brotli','-d','-f'); my($in,$out)=('','');
my $h = start \@cmd, \$in, \$out;
my($ol,$l)=(0,0);
my $buffsz=102400;

do {
  my $buf; $got=sysread($fh, $buf, $buffsz);
  if(length($in) < $l) { print "$l " . length($in) . "\n"; } # Example showing where the bug is
  $in.=$buf; $h->pump();
  $ol=length($in); $l=$ol;
  while(length($out) || $ol) {
    $out='';
    $h->pump_nb(); # This consumes stuff - but doesn't stop the buffer tripling in size periodically
    if($ol==length($in)) { $ol=0; } else {$ol=length($in);}
  }
} until($got<$buffsz);
$h->finish();
if(length($out)) {
  $out='';
}

@gitcnd
Copy link
Author

gitcnd commented May 20, 2022

Digging further into this... I notice that using a buffer size of 310 or less bytes tends not to cause the problem

Using 320 or more bytes for the buffer, and the problem rapidly grows.

Strange, but true.

i.e. This works, albeit annoyingly slowly, owing to the tiny buffer...

#!/usr/bin/perl -w

use strict;
use IPC::Run qw( start pump finish timeout );

my ($fh, $got);
open($fh,'<','ticks.json.br') or die $!; 
my @cmd=('brotli','-d','-f'); my($in,$out)=('','');
#open($fh,'<','data.bz2') or die $!; # both brotli and bzip2 work with 310 or less byte buffers
#my @cmd=('bzip2','-d'); my($in,$out)=('','');
my $h = start \@cmd, \$in, \$out;
my($ol)=0; my $buffsz=310; my %seen;

do {
  my $buf; $got=sysread($fh, $buf, $buffsz);
  if(length($in) < $ol) { print "$ol " . length($in) . "\n" unless($seen{$ol}++); }
  $ol=length($in);
  $in.=$buf;
  my $more=0;
  do {
    $more=! $h->pump(); #$in='';
     warn length($in) . "\t" . length($out) if($more); # Never happens
    if(length($out)) {
      #$md5->add($out); # Process the decompressed stream
      #print $out;
      $out='';
    }
  } while($more); # pump
} until($got<$buffsz);
$h->finish();
if(length($out)) {
  #$md5->add($out); # Process the decompressed stream
  $out='';
}

# print $md5->hexdigest;
print " end.\n";
exit(0);

Output:-

# perl  ipcrun_leak0.pl
43400 0
2170 0
55800 0
3100 0
55490 0
3410 0
55180 0
54870 0
54560 0
2790 0
54250 0
53320 0
58590 0
58280 0
53940 0
57350 0
57970 0
57660 0
59520 0
57040 0
58900 0
52700 0
59210 0
56730 0
56110 0
53630 0
53010 0
56420 0
51150 0
52080 0
49910 0
52390 0
51770 0
51460 0
48360 0
2480 0
50530 0
50840 0
3720 0
48670 0
114390 0
117490 0
62310 0
113770 0
62000 0
50220 0
111910 0
117180 0
115320 0
115940 0
49290 0
41850 0
34410 0
1860 0
25110 0
1240 0
37820 0
36890 0
 end.

Memory:-

# echo '  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND ';for I in {0..999};do top -n 1 | grep ipcrun_leak;sleep 10;done
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND 
24826 root      20   0   41936  10688   4648 S  52.9   0.0   5:16.35 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 R  62.5   0.0   5:22.33 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 R  62.5   0.0   5:28.28 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 R  56.2   0.0   5:34.25 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 S  52.9   0.0   5:40.15 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 S  56.2   0.0   5:46.01 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 S  52.9   0.0   5:51.97 perl ipcrun_leak0.pl                                                                                                                                                                                                                                        
24826 root      20   0   41936  10688   4648 R  52.9   0.0   5:58.02 perl ipcrun_leak0.pl   

@gitcnd
Copy link
Author

gitcnd commented May 22, 2022

I spent all day trying to find and fix the bug, with no luck. I attempted to use IPC::Run3 as an alternative, but that has even more bugs, and doesn't seem to have any pump() mechanism, making it not very useful.

If anyone can give me some clues about where and/or why this buffer-size-tripling might be taking place, do please let me know!! The code is almost beyond comprehension to me - I only have 26 years perl experience...

@gitcnd
Copy link
Author

gitcnd commented May 22, 2022

OK - so another whole day on this problem, and I discovered this workaround. Basically - if IPC::Run does not consume the entire input you last gave it, you must back-off the amount of data you give it next time (to 240 bytes max) until pump() has finally consumed it all, then you can go back to sending larger buffers. 65536 seems to be the most it ever reads in one go - so use that.

Working example:-

#!/usr/bin/perl -w

use strict;
use IPC::Run qw( start pump finish timeout );

my ($fh, $got);
open($fh,'<','data.json.bz2') or die $!; # Put a huge (e.g. 2gb) .bz2-compressed file here
my @cmd=('bzip2','-d'); my($in,$out)=('','');
my $h = start \@cmd, \$in, \$out;
my($inl)=0; my($outl)=0; my ($oinl,$ooutl);  my $buffsz;

do {
  my $buf='';
  $buffsz = (length($in) > 0) ? 248 : 65536;
  $got=sysread($fh, $buf, $buffsz);

  $in.=$buf; $h->pump();
  $inl=length($in);

  do { # Flush as much in and out as we can, before sending any more.
    $h->pump_nb() if($inl);
    if(length($out)) { $out=''; } # Read and process the output here.
    ($oinl,$ooutl)=($inl,$outl);
    $inl=length($in); $outl=length($out);
  } while((($inl!=$oinl) || $outl!=$ooutl) && ( $inl>0 || $outl>0 ));

} until($got<$buffsz);
$h->finish();
if(length($out)) { $out=''; } # Finish processing the output

exit(0);

@gitcnd
Copy link
Author

gitcnd commented May 25, 2022

Two+ more days on this problem, again... the bug appears to be inside Run.pm in the _select_loop handler:

When the pipe is finish()'d it enters a loop which drains all the process output into the buffer "while ( $self->pumpable )"

That's wrong. e.g. A brotli-compressed logfile can easily return tens or hundreds of gigs at this point.

I'm not sure how to fix...

@mohawk2
Copy link
Collaborator

mohawk2 commented May 25, 2022

The docs show the example code assigning to (i.e. overwriting) the contents of $in, whereas you're appending to it. Is that really what you intend to do, i.e. keep sending the same input but with more stuff added to the end? Apologies if I am misunderstanding something, I am highly rusty on this module.

@gitcnd
Copy link
Author

gitcnd commented May 26, 2022

$in is not necessarily always consumed entirely in each "pump" (depends on the simplicity/triviality of the receiving process I expect, but for brotli and bzip2, and in-general all well-behaved code etc, you can't just overwrite $in)

The bug is in the finish() sub in Run.pm:-

    while ( $self->pumpable ) {
        $self->_select_loop($options); 
    }

It doesn't yield to the caller at any time until it has accumulated the entirety of the decompressed-output in RAM... which, for my brotli-compressed log files when you finally reach the end of the compressed input, is way more ram and swap than I've got...

@mohawk2
Copy link
Collaborator

mohawk2 commented May 26, 2022

Having stared at the various versions of your code for about 15 mins, I am somewhat stumped: all the abbreviated variable names may make perfect sense to you, but I am not going to write myself a lookup chart to constantly refer back to, to keep decoding what e.g. $ol or $inl mean. I'm a fan of short variable names if there's a couple, but this is too many for my small brain.

What you say about $in not always being fully consumed makes sense, thank you. It looks like you're:

  • reading a compressed .bz2 file a chunk at a time, sending it using IPC::Run to a decompressor, then in a loop:
    • reading output from that
    • then (quite rightly, for debugging purposes) throwing it away
  • stopping when no more input

I don't understand how there is room for any large amount of data to be pending in the above scenario, for finish to have to deal with. It does seem to me like finish is doing the right thing (keep processing everything) because that's what I would expect finish to do. I'm not saying there's no error here, but I am saying I don't see it yet.

@gitcnd
Copy link
Author

gitcnd commented May 29, 2022

@mohawk2 - compression programs consume input, and spit back expanded output - but it's not a "give one get one" relationship - you might give it data, and get nothing back (yet). When it comes to the end of the file, the decompressor, before it knows that no more input is coming, has not necessarily given you back all the expanded output it can - only when you finally close the input handle, does the decompressor know that no more data is coming in - and at this point, it finally returns the expansion of everything you gave it before.

That's where the bug is. It's an architectural error on the part of the module author - not something that can be trivially fixed by changing the code - the entire way the module works needs to be changed... The bug is that after you close the handle, it accumulates all the output into memory, because there's no way that the caller can continue to iterate over the (now closed, or half-closed) pump loop after performing the close.

I fixed this by writing my own code to replace the flawed IPC::Run version entirely - you can see my approach here: https://github.com/gitcnd/perl-IPC-Open3-example/blob/master/ipc_open3_example.pl

@mohawk2
Copy link
Collaborator

mohawk2 commented May 29, 2022

It sounds like this module's API needs a close_fh type of method? That would let you do that, yet keep pumping for output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants