-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index segfaults on 2G strings in 64bit perl #13700
Comments
From mike.chamberlain@pirum.comCreated by mike.chamberlain@pirum.comThis is a bug report for perl from mike.chamberlain@pirum.com, ----------------------------------------------------------------- Hi We're attemping to parse a large file over 2G in size, and it's segfaulting. We are scanning chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' Regards Mike Perl Info
|
From @tonycozOn Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote:
Reproduced in blead: [tonyc@dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi It looks like pp_index suffers from the I32 bug, assuming fbm_instr() and rninstr() are safe it should be easy to fix. Tony |
The RT System itself - Status changed from 'new' to 'open' |
From @tonycozOn Thu Apr 03 15:44:40 2014, tonyc wrote:
Fix attached, for 5.21, though perhaps it should be in 5.20. Tony |
From @tonycoz0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patchFrom eca5fe3ca0af1af58bcf4ae90fb24ede7c9d4c56 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 9 Apr 2014 06:15:08 +0200
Subject: [perl #121562] fix the I32 bug for index() and rindex()
---
MANIFEST | 1 +
pp.c | 6 +++---
t/bigmem/index.t | 26 ++++++++++++++++++++++++++
3 files changed, 30 insertions(+), 3 deletions(-)
create mode 100644 t/bigmem/index.t
diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works
t/base/term.t See if various terms work
t/base/while.t See if while work
t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t Check that index() handles large offsets
t/bigmem/pos.t Check that pos() handles large offsets
t/bigmem/read.t Check read() handles large offsets
t/bigmem/regexp.t Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..071b4f0 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
SV *temp = NULL;
STRLEN biglen;
STRLEN llen = 0;
- I32 offset;
- I32 retval;
+ SSize_t offset;
+ SSize_t retval;
const char *big_p;
const char *little_p;
bool big_utf8;
@@ -3287,7 +3287,7 @@ PP(pp_index)
}
if (offset < 0)
offset = 0;
- else if (offset > (I32)biglen)
+ else if (offset > (SSize_t)biglen)
offset = biglen;
if (!(little_p = is_index
? fbm_instr((unsigned char*)big_p + offset,
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..0c3658c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,26 @@
+#!perl
+BEGIN {
+ chdir 't';
+ unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# some copying means we end up using 4GB, checked with top
+$ENV{PERL_TEST_MEMORY} >= 4
+ or skip_all("Need ~4GB for this test");
+$Config{ptrsize} >= 8
+ or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 2);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+my $work = $space x 0x80000000 . "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
--
1.7.1
|
From @ilmari"Tony Cook via RT" <perlbug-followup@perl.org> writes:
It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to handle pp.c: In function ‘Perl_pp_index’: And adding utf8::upgrade($work) to t/bigmem/index.t, gives: ilmari@nurket:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib bigmem/index.t -- |
From @karenetheridgeOn Thu, Apr 03, 2014 at 03:44:41PM -0700, Tony Cook via RT wrote:
FWIW, I cannot reproduce this on 64-bit darwin: $; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' $; perl -v |
From @shlomifOn Wed Apr 09 17:32:07 2014, perl@froods.org wrote:
For what it’s worth, I can reproduce this on Mageia Linux x86-64 5/Cauldron with both the Mageia perl-5.18.2-4.mga5 ( /usr/bin/perl ) and bleadperl: [SHELL] This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.9-308-g935db47*)) built for x86_64-linux Copyright 1987-2014, Larry Wall Perl may be copied only under the terms of either the Artistic License or the Complete documentation for Perl, including FAQ lists, should be found on shlomif@telaviv1:~$ perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-thread-multi Copyright 1987-2013, Larry Wall Perl may be copied only under the terms of either the Artistic License or the Complete documentation for Perl, including FAQ lists, should be found on shlomif@telaviv1:~$
Does valgrind complain about something when doing that? Regards, -- Shlomi Fish |
From @tonycozOn Wed Apr 09 17:02:58 2014, ilmari wrote:
Thanks, I looked for warnings when I built it, but must have missed them.
I should have tested the unicode path too. Tony |
From @rjbs* Tony Cook via RT <perlbug-followup@perl.org> [2014-04-09T00:30:36]
Putting aside the subsequent amendment to the patch, do we want this in 5.20? -- |
From @tonycozOn Mon Apr 14 04:22:05 2014, tonyc wrote:
...
Here's a new patch. Tony |
From @tonycoz0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patchFrom ab05702adb71d31562278ae9c3946d757c09d534 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 15 Apr 2014 03:57:57 +0200
Subject: [PATCH] [perl #121562] fix the I32 bug for index() and rindex()
---
MANIFEST | 1 +
pp.c | 10 +++++-----
t/bigmem/index.t | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 43 insertions(+), 5 deletions(-)
create mode 100644 t/bigmem/index.t
diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works
t/base/term.t See if various terms work
t/base/while.t See if while work
t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t Check that index() handles large offsets
t/bigmem/pos.t Check that pos() handles large offsets
t/bigmem/read.t Check read() handles large offsets
t/bigmem/regexp.t Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..04c1f29 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
SV *temp = NULL;
STRLEN biglen;
STRLEN llen = 0;
- I32 offset;
- I32 retval;
+ SSize_t offset = 0;
+ SSize_t retval;
const char *big_p;
const char *little_p;
bool big_utf8;
@@ -3281,13 +3281,13 @@ PP(pp_index)
offset = is_index ? 0 : biglen;
else {
if (big_utf8 && offset > 0)
- sv_pos_u2b(big, &offset, 0);
+ offset = sv_pos_u2b_flags(big, offset, 0, SV_CONST_RETURN);
if (!is_index)
offset += llen;
}
if (offset < 0)
offset = 0;
- else if (offset > (I32)biglen)
+ else if (offset > (SSize_t)biglen)
offset = biglen;
if (!(little_p = is_index
? fbm_instr((unsigned char*)big_p + offset,
@@ -3298,7 +3298,7 @@ PP(pp_index)
else {
retval = little_p - big_p;
if (retval > 0 && big_utf8)
- sv_pos_b2u(big, &retval);
+ retval = sv_pos_b2u_flags(big, retval, SV_CONST_RETURN);
}
SvREFCNT_dec(temp);
fail:
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..fdd502c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,37 @@
+#!perl
+BEGIN {
+ chdir 't';
+ unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# memory usage checked with top
+$ENV{PERL_TEST_MEMORY} >= 2
+ or skip_all("Need ~2GB for this test");
+$Config{ptrsize} >= 8
+ or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 4);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+# concatenation here increases memory usage significantly
+my $work = $space x 0x80000002;
+substr($work, 0x80000000) = "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
+
+utf8::upgrade($work);
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark (utf8-ish)");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark (utf8-ish)");
+
--
1.7.1
|
@tonycoz - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#121562 (status was 'resolved')
Searchable as RT121562$
The text was updated successfully, but these errors were encountered: