index segfaults on 2G strings in 64bit perl #13700
Comments
From mike.chamberlain@pirum.comCreated by mike.chamberlain@pirum.comThis is a bug report for perl from mike.chamberlain@pirum.com, ----------------------------------------------------------------- Hi We're attemping to parse a large file over 2G in size, and it's segfaulting. We are scanning chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' Regards Mike Perl Info
|
From @tonycozOn Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote:
Reproduced in blead: [tonyc@dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi It looks like pp_index suffers from the I32 bug, assuming fbm_instr() and rninstr() are safe it should be easy to fix. Tony |
The RT System itself - Status changed from 'new' to 'open' |
From @tonycozOn Thu Apr 03 15:44:40 2014, tonyc wrote:
Fix attached, for 5.21, though perhaps it should be in 5.20. Tony |
From @tonycoz0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patchFrom eca5fe3ca0af1af58bcf4ae90fb24ede7c9d4c56 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 9 Apr 2014 06:15:08 +0200
Subject: [perl #121562] fix the I32 bug for index() and rindex()
---
MANIFEST | 1 +
pp.c | 6 +++---
t/bigmem/index.t | 26 ++++++++++++++++++++++++++
3 files changed, 30 insertions(+), 3 deletions(-)
create mode 100644 t/bigmem/index.t
diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works
t/base/term.t See if various terms work
t/base/while.t See if while work
t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t Check that index() handles large offsets
t/bigmem/pos.t Check that pos() handles large offsets
t/bigmem/read.t Check read() handles large offsets
t/bigmem/regexp.t Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..071b4f0 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
SV *temp = NULL;
STRLEN biglen;
STRLEN llen = 0;
- I32 offset;
- I32 retval;
+ SSize_t offset;
+ SSize_t retval;
const char *big_p;
const char *little_p;
bool big_utf8;
@@ -3287,7 +3287,7 @@ PP(pp_index)
}
if (offset < 0)
offset = 0;
- else if (offset > (I32)biglen)
+ else if (offset > (SSize_t)biglen)
offset = biglen;
if (!(little_p = is_index
? fbm_instr((unsigned char*)big_p + offset,
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..0c3658c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,26 @@
+#!perl
+BEGIN {
+ chdir 't';
+ unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# some copying means we end up using 4GB, checked with top
+$ENV{PERL_TEST_MEMORY} >= 4
+ or skip_all("Need ~4GB for this test");
+$Config{ptrsize} >= 8
+ or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 2);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+my $work = $space x 0x80000000 . "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
--
1.7.1
|
From @ilmari"Tony Cook via RT" <perlbug-followup@perl.org> writes:
It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to handle pp.c: In function ‘Perl_pp_index’: And adding utf8::upgrade($work) to t/bigmem/index.t, gives: ilmari@nurket:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib bigmem/index.t -- |
From @karenetheridgeOn Thu, Apr 03, 2014 at 03:44:41PM -0700, Tony Cook via RT wrote:
FWIW, I cannot reproduce this on 64-bit darwin: $; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' $; perl -v |
From @shlomifOn Wed Apr 09 17:32:07 2014, perl@froods.org wrote:
For what it’s worth, I can reproduce this on Mageia Linux x86-64 5/Cauldron with both the Mageia perl-5.18.2-4.mga5 ( /usr/bin/perl ) and bleadperl: [SHELL] This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.9-308-g935db47*)) built for x86_64-linux Copyright 1987-2014, Larry Wall Perl may be copied only under the terms of either the Artistic License or the Complete documentation for Perl, including FAQ lists, should be found on shlomif@telaviv1:~$ perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-thread-multi Copyright 1987-2013, Larry Wall Perl may be copied only under the terms of either the Artistic License or the Complete documentation for Perl, including FAQ lists, should be found on shlomif@telaviv1:~$
Does valgrind complain about something when doing that? Regards, -- Shlomi Fish |
From @tonycozOn Wed Apr 09 17:02:58 2014, ilmari wrote:
Thanks, I looked for warnings when I built it, but must have missed them.
I should have tested the unicode path too. Tony |
From @rjbs* Tony Cook via RT <perlbug-followup@perl.org> [2014-04-09T00:30:36]
Putting aside the subsequent amendment to the patch, do we want this in 5.20? -- |
From @tonycozOn Mon Apr 14 04:22:05 2014, tonyc wrote:
...
Here's a new patch. Tony |
From @tonycoz0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patchFrom ab05702adb71d31562278ae9c3946d757c09d534 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 15 Apr 2014 03:57:57 +0200
Subject: [PATCH] [perl #121562] fix the I32 bug for index() and rindex()
---
MANIFEST | 1 +
pp.c | 10 +++++-----
t/bigmem/index.t | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 43 insertions(+), 5 deletions(-)
create mode 100644 t/bigmem/index.t
diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works
t/base/term.t See if various terms work
t/base/while.t See if while work
t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t Check that index() handles large offsets
t/bigmem/pos.t Check that pos() handles large offsets
t/bigmem/read.t Check read() handles large offsets
t/bigmem/regexp.t Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..04c1f29 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
SV *temp = NULL;
STRLEN biglen;
STRLEN llen = 0;
- I32 offset;
- I32 retval;
+ SSize_t offset = 0;
+ SSize_t retval;
const char *big_p;
const char *little_p;
bool big_utf8;
@@ -3281,13 +3281,13 @@ PP(pp_index)
offset = is_index ? 0 : biglen;
else {
if (big_utf8 && offset > 0)
- sv_pos_u2b(big, &offset, 0);
+ offset = sv_pos_u2b_flags(big, offset, 0, SV_CONST_RETURN);
if (!is_index)
offset += llen;
}
if (offset < 0)
offset = 0;
- else if (offset > (I32)biglen)
+ else if (offset > (SSize_t)biglen)
offset = biglen;
if (!(little_p = is_index
? fbm_instr((unsigned char*)big_p + offset,
@@ -3298,7 +3298,7 @@ PP(pp_index)
else {
retval = little_p - big_p;
if (retval > 0 && big_utf8)
- sv_pos_b2u(big, &retval);
+ retval = sv_pos_b2u_flags(big, retval, SV_CONST_RETURN);
}
SvREFCNT_dec(temp);
fail:
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..fdd502c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,37 @@
+#!perl
+BEGIN {
+ chdir 't';
+ unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# memory usage checked with top
+$ENV{PERL_TEST_MEMORY} >= 2
+ or skip_all("Need ~2GB for this test");
+$Config{ptrsize} >= 8
+ or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 4);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+# concatenation here increases memory usage significantly
+my $work = $space x 0x80000002;
+substr($work, 0x80000000) = "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
+
+utf8::upgrade($work);
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark (utf8-ish)");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark (utf8-ish)");
+
--
1.7.1
|
@tonycoz - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#121562 (status was 'resolved')
Searchable as RT121562$
The text was updated successfully, but these errors were encountered: