-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c) #16143
Comments
From imdb95@gmail.comGreetings, **********Build Date & Hardware********** This is perl 5, version 27, subversion 4 (v5.27.4 (v5.27.3-14-gd2dccc0)) Copyright 1987-2017, Larry Wall Perl may be copied only under the terms of either the Artistic License or Complete documentation for Perl, including FAQ lists, should be found on OS: Ubuntu 16.04 Desktop Compilation: **********Reproduce**********
|
From @khwilliamsonI looked at this enough so I believe I understand the cause. I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are. |
From @khwilliamson |
From @khwilliamsonI looked at this enough so I believe I understand the cause. I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are. |
1 similar comment
From @khwilliamsonI looked at this enough so I believe I understand the cause. I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are. |
From @khwilliamson |
@khwilliamson - Status changed from 'new' to 'open' |
From @khwilliamsonOn 09/26/2017 10:28 PM, Karl Williamson via RT wrote:
By "these", I meant the ones in regexec.c |
From @khwilliamsonOn 09/26/2017 11:03 PM, Karl Williamson wrote:
This should be fixed by the attached patch. The consequences of this bug are that if a perl program were to allow I don't know if this is a real security issue or not. I tried the test without using /l as a pattern modifier, and it did not Tell me how to proceed. |
From @khwilliamson0002-PATCH-perl-132063-Heap-buffer-overflow.patchFrom 4c9e25a6146d63470ac98a35766bbf4f4b6db459 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sun, 21 Jan 2018 17:48:22 -0700
Subject: [PATCH 2/2] PATCH: [perl #132063]: Heap buffer overflow
This was caused by several instances in regexec.c of the code assuming
that the input was valid UTF-8, whereas the input was too short for what
the start byte claimed it would be.
This commit changes to use the actual length available.
I grepped through the core for any other similar uses, and did not find
any.
---
regexec.c | 47 ++++++++++++++++++++++++++++-------------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 35 insertions(+), 19 deletions(-)
diff --git a/regexec.c b/regexec.c
index 9ee7e6bab5..0cdaa49487 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1626,7 +1626,13 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* Convenience, because 'e' is easier to use than a size */
+#define utf8_to_uvchr_buf_flags(s, e, lenp, flags) \
+ utf8n_to_uvchr(s, (U8*)(e) - (U8*)(s), lenp, flags)
+
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1634,7 +1640,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1643,14 +1649,14 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8_to_uvchr_buf_flags( (const U8*) uscan, foldbuf + sizeof(foldbuf), &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
} else { \
- len = UTF8SKIP(uc); \
- uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \
+ uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \
flags); \
+ len = UTF8SKIP(uc); \
skiplen = UVCHR_SKIP( uvc ); \
foldlen -= skiplen; \
uscan = foldbuf + skiplen; \
@@ -1661,7 +1667,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8_to_uvchr_buf_flags( (const U8*) uscan, foldbuf + sizeof(foldbuf), &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1676,11 +1682,11 @@ STMT_START {
case trie_utf8l: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end); \
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8_to_uvchr_buf_flags( (const U8*) uc, uc_end, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -1821,7 +1827,7 @@ REXEC_FBC_SCAN( /* Loops while (s < strend) */ \
} \
else { /* Back-up to the start of the previous character */ \
U8 * const r = reghop3((U8*)s, -1, (U8*)reginfo->strbeg); \
- tmp = utf8n_to_uvchr(r, (U8*) reginfo->strend - r, \
+ tmp = utf8_to_uvchr_buf_flags(r, (U8*) reginfo->strend, \
0, UTF8_ALLOW_DEFAULT); \
} \
tmp = TEST_UV(tmp); \
@@ -2785,10 +2791,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -5847,9 +5853,10 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
- charcount++;
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
+ charcount++;
if (foldlen>0)
ST.longfold = TRUE;
if (charid &&
@@ -5983,8 +5990,10 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8_to_uvchr_buf_flags(uscan,
+ foldbuf + sizeof(foldbuf),
+ &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
@@ -9559,7 +9568,7 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
if (! UTF8_IS_INVARIANT(c) && utf8_target) {
STRLEN c_len = 0;
const U32 utf8n_flags = UTF8_ALLOW_DEFAULT;
- c = utf8n_to_uvchr(p, p_end - p, &c_len, utf8n_flags | UTF8_CHECK_ONLY);
+ c = utf8_to_uvchr_buf_flags(p, p_end, &c_len, utf8n_flags | UTF8_CHECK_ONLY);
if (c_len == (STRLEN)-1) {
_force_out_malformed_utf8_message(p, p_end,
utf8n_flags,
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozOn Tue, 26 Sep 2017 21:28:51 -0700, khw wrote:
./pp_pack.c:2585: if (utf8) end -= UTF8_MAXLEN-1; These can both lead to end pointing before the beginning of the allocated PV for the W and U pack formats. This is classified as undefined behaviour by the C standard. The end pointer is only used for comparisons to ensure that enough space is allocated when adding new characters to the buffer, with the small offset (UTF8_MAXLEN) the pointer is unlikely to wrap on modern systems. There's no memory accesses through the end pointers, so while I think the pack() issues are bugs, I don't think they're security issues.
It sounds like a security issue to me. Tony |
From @tonycozOn Wed, 24 Jan 2018 20:26:09 -0800, tonyc wrote:
Karl asked me in IRC what needs to be done with security patches. I believe we need to do the following (not necessarily in this order). 1) produce a fix (Karl's done this), including fixes for currently supported maint branches. These should not be pushed to the public repository unless the issue is public. Hopefully you'll get votes from the people monitoring the security list for the change to be included in maint. 2) Allocate a CVE number, this is done with the form at: If the issue is not public yet then the description should be vague and the rest of form should not contain the details of the problem. You might want to think about those details for when we update the CVE. I can do this if you can get me to understand the issue. 3) Notify downstreams, with the fixes. Last time Sawyer did this. 4) Deal with queries from downstreams, it's more than likely they'll want to know: a) whether the issue applies to the unsupported maint releases they ship, and if it does b) whether the fixes work for those older maints and possibly platform issues. 5) Pick a publication date in consultation with downstreams. On publication date: 6) push the fix to blead, maint 7) Update the CVE entry with the details of the issue (if the issue wasn't public.) 8) Once downstreams are ready, make the ticket public. Tony |
From @khwilliamsonOn 01/31/2018 08:30 PM, Tony Cook via RT wrote:
So is this a real security problem? The original code is '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' The trick here is that the pattern has to compile as a trie. It doesn't It turns out that tries are vulnerable, because they just assume there
|
From @khwilliamsonOn 01/24/2018 09:26 PM, Tony Cook via RT wrote:
Hmm. I missed this email until just now. My analysis is slightly |
From @tonycozOn Wed, Jan 31, 2018 at 09:00:09PM -0700, Karl Williamson wrote:
There's two potential attacks that I can see with this bug: 1) the regexp engine can access beyond the end of a malloced block. 2) the error message may reveal sensitive information, eg. if the This all assumes that the regexp engine is intended to work with Tony |
From @xsawyerxOn 1 February 2018 at 04:30, Tony Cook via RT
Thanks, Tony.
Yes. I handle that.
This happens sometimes and when it does, I normally contact the person
This is usually about two weeks, a month, or far longer. Vendors
These can be done prior to a release, too, to avoid the fix being |
From @demerphqOn 27 September 2017 at 06:28, Karl Williamson via RT
I have reviewed this ticket and I disagree with Karls analysis. U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; Anyway, when we use UTF8_MAXBYTES in uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); if you check you will find that uscan points into foldbuf[] mentioned uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, flags); \ would have to construct a malformed utf8 sequence inside of foldbuf. What is actually is happening is that the logic to choose what What is needed is the patch I am attaching here for blead. This affects v5.22 and later, so we need patches for 22, 24, and 26, Me personally, if i had found this without the security report I I will let you guys figure this out. All i know is that m//li is not Yves -- |
From @demerphqblead_fix_132063.patchcommit d94aa431d493f1e6a8ad4ad335bad04f7ceccc82
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Feb 2 21:29:21 2018 +0100
fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
diff --git a/regexec.c b/regexec.c
index 08bf713..2b54a7d 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1805,7 +1805,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1815,10 +1815,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
@@ -1827,7 +1829,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
} \
goto do_trie_utf8_fold; \
@@ -1850,10 +1852,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 61b8c87..3db034c 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1989,7 +1989,7 @@ AB\s+\x{100} AB \x{100}X y - -
([[:ascii:]]+)\x81 a\x80b\x81 y $& b\x81
[[:^ascii:]]+b \x80a\x81b y $& \x81b
[[:^ascii:]]+b \x80a\x81\x{100}b y $& \x81\x{100}b
-
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
|
From @demerphqv5224_fix_132063.patchcommit 02920531c2f12d3dcbbf885a4203b6dd4662dcb6
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Feb 2 21:29:21 2018 +0100
v5.22.4: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
diff --git a/regexec.c b/regexec.c
index b7335ae..25d4aaa 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1451,7 +1451,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1461,10 +1461,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
@@ -1473,7 +1475,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
} \
goto do_trie_utf8_fold; \
@@ -1495,10 +1497,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 85ce7f4..bac789e 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1941,6 +1941,7 @@ A+(*PRUNE)BC(?{}) AAABC y $& AAABC
.{1}?? - c - Nested quantifiers
.{1}?+ - c - Nested quantifiers
(?:.||)(?|)000000000@ 000000000@ y $& 000000000@ # [perl #126405]
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
|
From @demerphqv5243_fix_132063.patchcommit 6e4801235e12ad2e5f742955f7270f034789b34b
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Feb 2 21:29:21 2018 +0100
v5.24.3: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
diff --git a/regexec.c b/regexec.c
index 5735b99..2eef483 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1451,7 +1451,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1461,10 +1461,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
@@ -1473,7 +1475,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
} \
goto do_trie_utf8_fold; \
@@ -1495,10 +1497,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 7e8522d..ab7ddbb 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1968,6 +1968,7 @@ ab(?#Comment){2}c abbc y $& abbc
(?:.||)(?|)000000000@ 000000000@ y $& 000000000@ # [perl #126405]
aa$|a(?R)a|a aaa y $& aaa # [perl 128420] recursive matches
(?:\1|a)([bcd])\1(?:(?R)|e)\1 abbaccaddedcb y $& abbaccaddedcb # [perl 128420] recursive match with backreferences
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
|
From @demerphqv5261_fix_132063.patchcommit 80853ef9d835d1f24f3913a0b78eb3b5ca71e34f
Author: Yves Orton <demerphq@gmail.com>
Date: Fri Feb 2 21:29:21 2018 +0100
5.26.1: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
diff --git a/regexec.c b/regexec.c
index 134b196..9b8581f 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1472,7 +1472,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1482,10 +1482,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
@@ -1494,7 +1496,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
} \
goto do_trie_utf8_fold; \
@@ -1517,10 +1519,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 410fcea..78baed6 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1985,6 +1985,7 @@ AB\s+\x{100} AB \x{100}X y - -
/(?x)[a b]/xx \N{SPACE} yS $& # Note a space char here
/(?xx)[a b]/x \N{SPACE} n - -
/(?-x:[a b])/xx \N{SPACE} yS $& # Note a space char here
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
|
From @khwilliamsonI agree with Yves' analysis. Mine was flawed about the cause of this particular issue. I think I still think that when dealing with UTF-8, it's best to use the real space available size, but I'd have to examine it more closely in light of the new analysis. What Yves didn't mention is that this bug requires the following things in order to manifest: 1) the pattern is compiled under /il Items 1) and 2) together would be extremely rare in the field, unless the attacker gets to choose the pattern as well. No non-UTF-8 string may match this pattern, so if we could just short circuit the rest of trying the match, it would work. |
From @tonycozOn Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
No, it points into the string we're matching: tony@mars:.../git/perl$ valgrind --vgdb-error=1 --vgdb=full -q ./perl -Ilib -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' tony@mars:.../git/perl$ gdb ./perl so the bytes being dumped come from the SV, not from foldbuf[] If I continue until we get the invalid access: Program received signal SIGTRAP, Trace/breakpoint trap. Still in the string we're matching.
I suspect code that uses the :utf8 layer to read from a file could hit this bug with this confusion[1], but I tend to think that such code is buggy in itself. I don't know if the regexp engine is intended to handle badly encoded SVf_UTF8 SVs.
Thanks for the version range, saves me having to work it out for the CVE. Tony [1] since :utf8 could return invalidly encoded SVf_UTF8 SVs. |
From @tonycozOn Fri, 02 Feb 2018 13:15:17 -0800, khw wrote:
I can imagine an older application containing/managing non-Unicode data allowing a regular expression search that could trigger this issue. CVE form entry for once this is public: Vulnerability type: Vendor of the product: Product: Version: Has vendor confirmed or acknowledged the vulnerability? Attack vector: Impact: - the other choices for Impact are: Affected Components: Attack vector Matching a locale dependent regular expression against a non-UTF-8 encoded scalar could result in a heap buffer read overflow while reporting an error message. That error message includes bytes beyond the end of the string, and possibly beyond the end of the buffer, providing a potential information disclosure if the memory had contained any sensitive information. Suggested description of the vulnerability for use in the CVE Matching a crafted locale dependent regular expression can cause a heap buffer read overflow and potentially information disclosure. Discoverer(s)/Credits Nguyen Duc Manh <imdb95@gmail.com> Reference(s) https://rt.perl.org/Public/Bug/Display.html?id=132063 Additional Information (blank) |
From @demerphqOn 5 February 2018 at 05:32, Tony Cook via RT
Yes, because it is not going through the correct code path. It is If it went through the right code path it would not blow up. IOW, Karl This ticket is extremely misleading FWIW. The bug says its about Perl__byte_dump_string. It isnt. That is a If you look at the stack the problem is caused here
while we are in
That Perl__to_utf8_fold_flags() is the same thing as uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, flags); And as you can see the *target* buffer is foldbuf. The *source* is the We should *never* end up in the code to read utf8, however a bug in If you look at the patch I posted you can see how this was fixed.
That wasnt the point. That function reads a character of utf8, and However in this case it is intentionally (albeit incorrectly) being fed latin1.
And still not relevant. If you construct a sting with a single 0xFF The thing is you shouldnt be able to construct a string with a single
Well, I would need to a see a proper case. This ticket isn't such.
No, it is not, and it shouldn't be. If you load malformed utf8 without The regex engine might read the same utf8 character millions of times
Then that is a bug in the :utf8 layer, not in the regex engine, and Yves -- |
From @khwilliamsonOn 02/05/2018 01:09 AM, demerphq wrote:
Yves is correct that the trie code is wrong (and I made it wrong) and if But I also think we should look that any UTF-8 we handle isn't too long So this bug wouldn't have manifested as being a security issue if we had I want to emphasize that no extra time is required here to process using
|
From @demerphqOn 5 February 2018 at 19:04, Karl Williamson <public@khwilliamson.com> wrote:
Well that really isnt clear to me at all. I assume you mean the patch The changes with utf8_to_uvchr_buf_flags() I am not fond of. The What is left after that looks fine to me.
Sure, but we dont need to do the things that patch that does. foldlen
Actually I beg to differ. The *security* issue comes from the utf8 The security issue doesn't come from the trie code at all, nor does it _toFOLD_utf8_flags() and then from the internal error throwing logic in the utf8 code. If If you look at the stack trace: READ of size 1 at 0x602000001b3a thread T0 The security issues comes from S_unexpected_non_continuation_text() or On the other hand the bug that leads to the security issue is that we Anyway, yes we can and should change the UTF8_MAXLEN, and stop using
Yeah I get it. Lets redo that patch reflecting the responses above and Yves |
From @tonycozOn Mon, 05 Feb 2018 00:09:08 -0800, demerphq wrote:
It's true the bug isn't in Perl__byte_dump_string(), but that is where an attacker takes advantage of the bug.
The title simply says there's a heap buffer overflow in Perl__byte_dump_string(), which is the case, it just happens because of a bug elsewhere.
Sorry, I misunderstood the point you were trying to make.
The regular expression engine does have code supporting badly encoded inputs - all the calls to utf8n_to_uvchr() and its variants could be replaced with calls to a much simpler function with less error checking/handling if it wasn't trying to handle such badly encoded data (by croaking instead of crashing). I think your change is needed to fix both the behaviour bug (treating the Latin-1 data as UTF-8) which results in a mistaken call that results in a heap buffer read overflow. I think Karl's change is useful to improve robustness - it's not a direct fix the behaviour bug - which would have resulted in an error with only Karl's change, rather than a heap buffer read overflow. Tony |
From @tonycozOn Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
I've requested a CVE id for this issue. Tony |
From @xsawyerxPlease let me know when we decide on a patch (or patches, depending on On 6 February 2018 at 07:47, Tony Cook via RT
|
From @khwilliamsonOn 02/05/2018 01:38 PM, demerphq wrote:
The point of dumping out the data is to save a lot of debugging time. It occurs to me as I write, that the dump function could assume that a I could change blead to do this, marked as a defensive coding measure, The function already has a flag to indicate that this particular call But it is also good housekeeping to not assume that there is sufficient And this is the final place in blead that still just just passes this So now we have three things Or, this security bug becomes non-security if the calls to the utf8 Or, this security bug becomes non-security if the trie handler is fixed. I now think the first one is what we should do for the security fix. It The attached patch is testing for something that won't happen if either |
From @khwilliamson0004-Subject-PATCH-perl-132063-Heap-buffer-overflow.patchFrom e4d9f67624f2547cb3a48377b7840e74ad024f9d Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 6 Feb 2018 14:50:48 -0700
Subject: [PATCH 4/4] Subject: PATCH: [perl #132063]: Heap buffer overflow
The proximal cause is several instances in regexec.c of the code
assuming that the input was valid UTF-8, whereas the input was too short
for what the start byte claimed it would be.
I grepped through the core for any other similar uses, and did not find
any.
---
regexec.c | 33 ++++++++++++++++++---------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/regexec.c b/regexec.c
index 31a133f20b..163e4c4c5b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1820,7 +1820,9 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1828,7 +1830,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1837,14 +1839,14 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
} else { \
- len = UTF8SKIP(uc); \
- uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \
+ uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \
flags); \
+ len = UTF8SKIP(uc); \
skiplen = UVCHR_SKIP( uvc ); \
foldlen -= skiplen; \
uscan = foldbuf + skiplen; \
@@ -1855,7 +1857,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1874,7 +1876,7 @@ STMT_START {
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -3000,10 +3002,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -6062,8 +6064,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
charcount++;
if (foldlen>0)
ST.longfold = TRUE;
@@ -6198,8 +6201,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8n_to_uvchr(uscan, foldlen, &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozOn Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
This is CVE-2018-6798 Tony |
From @khwilliamsonOn 02/07/2018 03:23 PM, Tony Cook via RT wrote:
Attached is my utf8.c patch that solves this at the lowest level. I |
From @khwilliamson0006-utf8.c-Don-t-dump-malformation-past-first-NUL.patchFrom 2578c13f9160090fb3c40f799148df8dc60c328d Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 13 Feb 2018 07:03:43 -0700
Subject: [PATCH 6/6] utf8.c: Don't dump malformation past first NUL
When a UTF-8 string contains a malformation, the bytes are dumped out as
a debugging aid. One should exercise caution, however, and not dump out
bytes that are actually past the end of the string. Commit 99a765e9e37
from 2016 added the capability to signal to the dumping routines that
we're not sure where the string ends, and to dump the minimal possible.
It occurred to me that an additional safety measure can be easily added,
which this commit does. And that is, in the dumping routines to stop at
the first NUL. All strings automatically get a traiing NUL added, even
if they contain embedded NULs. A NUL can never be part of a
malformation, and so its presence likely signals the end of the string.
---
utf8.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/utf8.c b/utf8.c
index 9cd4fa0927..e586c120ae 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1121,7 +1121,7 @@ Perl__byte_dump_string(pTHX_ const U8 * const start, const STRLEN len, const boo
PERL_STATIC_INLINE char *
S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
- /* How many bytes to print */
+ /* Max number of bytes to print */
STRLEN print_len,
/* Which one is the non-continuation */
@@ -1137,6 +1137,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
? "immediately"
: Perl_form(aTHX_ "%d bytes",
(int) non_cont_byte_pos);
+ const U8 * x = s + non_cont_byte_pos;
+ const U8 * e = s + print_len;
PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT;
@@ -1144,10 +1146,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
* calculated, it's likely faster to pass it; verify under DEBUGGING */
assert(expect_len == UTF8SKIP(s));
+ /* As a defensive coding measure, don't output anything past a NUL. Such
+ * bytes shouldn't be in the middle of a malformation, and could mark the
+ * end of the allocated string, and what comes after is undefined */
+ for (; x < e; x++) {
+ if (*x == '\0') {
+ x++; /* Output this particular NUL */
+ break;
+ }
+ }
+
return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x,"
" %s after start byte 0x%02x; need %d bytes, got %d)",
malformed_text,
- _byte_dump_string(s, print_len, 0),
+ _byte_dump_string(s, x - s, 0),
*(s + non_cont_byte_pos),
where,
*s,
--
2.11.0
|
From @khwilliamsonOn 02/07/2018 03:23 PM, Tony Cook via RT wrote:
Attached a patch for 5.24, since the other one doesn't apply. |
From @khwilliamson0375-5.24-maint-perl-132063-Heap-buffer-overflow-in-byte_.patchFrom cb4f483cf79debd0177f5695f501f173ab24a986 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Wed, 14 Feb 2018 20:05:26 -0700
Subject: [PATCH 375/375] 5.24 maint [perl #132063] Heap-buffer-overflow in
byte_dump_strin)
This is a specially formulated patch for 5.24 maint, as the regular
patch does not apply. It adds a check and dies if malformed. Later
releases dump what's in error as a debugging aid, but to keep this patch
minimal for a maint release, it dispenses with that, and just dies with
a generic message.
---
regexec.c | 32 +++++++++++++++++++-------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/regexec.c b/regexec.c
index 5735b997fd..cb3fb4bfdc 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1466,7 +1466,9 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1474,7 +1476,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1483,11 +1485,14 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
} else { \
+ if (uc + UTF8SKIP(uc) > uc_end) { \
+ Perl_croak(aTHX_ "Malformed UTF-8 character (fatal)"); \
+ } \
uvc = _to_utf8_fold_flags( (const U8*) uc, foldbuf, &foldlen, flags); \
len = UTF8SKIP(uc); \
skiplen = UVCHR_SKIP( uvc ); \
@@ -1500,7 +1505,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1519,7 +1524,7 @@ STMT_START {
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -2599,10 +2604,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -5511,8 +5516,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
charcount++;
if (foldlen>0)
ST.longfold = TRUE;
@@ -5642,8 +5648,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8n_to_uvchr(uscan, foldlen, &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozOn Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote:
Patches for 5.24 for this ticket, all merged and de-conflicted. These apply after the changes in 131844 (though there should be no conflicts there.) |
From @tonycoz5.24-0002-perl-132063-Heap-buffer-overflow.patchFrom 29231d73407542051a287cab5e18546e5a622f4a Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 6 Feb 2018 14:50:48 -0700
Subject: [perl #132063]: Heap buffer overflow
The proximal cause is several instances in regexec.c of the code
assuming that the input was valid UTF-8, whereas the input was too short
for what the start byte claimed it would be.
I grepped through the core for any other similar uses, and did not find
any.
---
regexec.c | 29 ++++++++++++++++-------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/regexec.c b/regexec.c
index 5735b997fd..ea432c39d3 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1466,7 +1466,9 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1474,7 +1476,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1483,7 +1485,7 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1500,7 +1502,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1519,7 +1521,7 @@ STMT_START {
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -2599,10 +2601,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -5511,8 +5513,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
charcount++;
if (foldlen>0)
ST.longfold = TRUE;
@@ -5642,8 +5645,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8n_to_uvchr(uscan, foldlen, &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycoz5.24-0003-v5.24.3-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acc.patchFrom a59dc1f157bd0f626b6b864b9996480f9bac44aa Mon Sep 17 00:00:00 2001
From: Yves Orton <demerphq@gmail.com>
Date: Mon, 19 Feb 2018 13:49:46 +1100
Subject: v5.24.3: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for
non-utf8 target
---
regexec.c | 14 ++++++++++----
t/re/re_tests | 1 +
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/regexec.c b/regexec.c
index ea432c39d3..ff8e89cb65 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1451,7 +1451,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1461,10 +1461,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
* 'foldbuf+sizeof(foldbuf)' */
@@ -1475,7 +1477,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
@@ -1497,10 +1499,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 7e8522da98..ab7ddbb848 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1968,6 +1968,7 @@ ab(?#Comment){2}c abbc y $& abbc
(?:.||)(?|)000000000@ 000000000@ y $& 000000000@ # [perl #126405]
aa$|a(?R)a|a aaa y $& aaa # [perl 128420] recursive matches
(?:\1|a)([bcd])\1(?:(?R)|e)\1 abbaccaddedcb y $& abbaccaddedcb # [perl 128420] recursive match with backreferences
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
--
2.11.0
|
From @tonycoz5.24-0004-perl-132063-we-should-no-longer-warn-for-this-code.patchFrom 9dd4e0280eca2ba666cc0671ec3724610ed7d366 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 19 Feb 2018 15:11:42 +1100
Subject: (perl #132063) we should no longer warn for this code
The first patch for 132063 prevented the buffer read overflow when
dumping the warning but didn't fix the underlying problem.
The next change treats the supplied buffer correctly, preventing the
non-UTF-8 SV from being treated as UTF-8, preventing the warning.
---
t/lib/warnings/regexec | 3 ---
1 file changed, 3 deletions(-)
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 6635142dea..c370ddc3c7 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
EXPECT
########
# NAME perl #132063, read beyond buffer end
-# OPTION fatal
"\xff" =~ /(?il)\x{100}|\x{100}/;
EXPECT
-Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
-Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozOn Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote:
Patches for 5.26. |
From @tonycoz5.26-0002-perl-132063-Heap-buffer-overflow.patchFrom 7dc63e39d2d84920efd005092bc4d03b6ab24e1c Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 6 Feb 2018 14:50:48 -0700
Subject: [perl #132063]: Heap buffer overflow
The proximal cause is several instances in regexec.c of the code
assuming that the input was valid UTF-8, whereas the input was too short
for what the start byte claimed it would be.
I grepped through the core for any other similar uses, and did not find
any.
---
regexec.c | 33 ++++++++++++++++++---------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/regexec.c b/regexec.c
index 134b196fc4..fa888823bd 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1487,7 +1487,9 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1495,7 +1497,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1504,14 +1506,14 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
} else { \
- len = UTF8SKIP(uc); \
- uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \
+ uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \
flags); \
+ len = UTF8SKIP(uc); \
skiplen = UVCHR_SKIP( uvc ); \
foldlen -= skiplen; \
uscan = foldbuf + skiplen; \
@@ -1522,7 +1524,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1541,7 +1543,7 @@ STMT_START {
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -2623,10 +2625,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -5686,8 +5688,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
charcount++;
if (foldlen>0)
ST.longfold = TRUE;
@@ -5822,8 +5825,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8n_to_uvchr(uscan, foldlen, &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycoz5.26-0003-5.26.1-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acco.patchFrom d58a6811d830c2f2f850a03a18129c38cb732791 Mon Sep 17 00:00:00 2001
From: Yves Orton <demerphq@gmail.com>
Date: Tue, 13 Feb 2018 16:11:55 +1100
Subject: 5.26.1: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8
target
---
regexec.c | 14 ++++++++++----
t/re/re_tests | 1 +
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/regexec.c b/regexec.c
index fa888823bd..cf81b07e30 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1472,7 +1472,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1482,10 +1482,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
* 'foldbuf+sizeof(foldbuf)' */
@@ -1496,7 +1498,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
@@ -1519,10 +1521,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 410fceadac..78baed6ffc 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1985,6 +1985,7 @@ AB\s+\x{100} AB \x{100}X y - -
/(?x)[a b]/xx \N{SPACE} yS $& # Note a space char here
/(?xx)[a b]/x \N{SPACE} n - -
/(?-x:[a b])/xx \N{SPACE} yS $& # Note a space char here
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
--
2.11.0
|
From @tonycoz5.26-0004-perl-132063-we-should-no-longer-warn-for-this-code.patchFrom 3c4ea4886efdb477335f6931d0a553e818ee172f Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 19 Feb 2018 15:11:42 +1100
Subject: (perl #132063) we should no longer warn for this code
The first patch for 132063 prevented the buffer read overflow when
dumping the warning but didn't fix the underlying problem.
The next change treats the supplied buffer correctly, preventing the
non-UTF-8 SV from being treated as UTF-8, preventing the warning.
---
t/lib/warnings/regexec | 3 ---
1 file changed, 3 deletions(-)
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 6635142dea..c370ddc3c7 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
EXPECT
########
# NAME perl #132063, read beyond buffer end
-# OPTION fatal
"\xff" =~ /(?il)\x{100}|\x{100}/;
EXPECT
-Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
-Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycoz5.26-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patchFrom d72ba890c8d8ac800c9d00a1f542deca11551f33 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 13 Feb 2018 07:03:43 -0700
Subject: utf8.c: Don't dump malformation past first NUL
When a UTF-8 string contains a malformation, the bytes are dumped out as
a debugging aid. One should exercise caution, however, and not dump out
bytes that are actually past the end of the string. Commit 99a765e9e37
from 2016 added the capability to signal to the dumping routines that
we're not sure where the string ends, and to dump the minimal possible.
It occurred to me that an additional safety measure can be easily added,
which this commit does. And that is, in the dumping routines to stop at
the first NUL. All strings automatically get a traiing NUL added, even
if they contain embedded NULs. A NUL can never be part of a
malformation, and so its presence likely signals the end of the string.
---
utf8.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/utf8.c b/utf8.c
index a3d5f61b64..61346f0cb6 100644
--- a/utf8.c
+++ b/utf8.c
@@ -810,7 +810,7 @@ Perl__byte_dump_string(pTHX_ const U8 * s, const STRLEN len, const bool format)
PERL_STATIC_INLINE char *
S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
- /* How many bytes to print */
+ /* Max number of bytes to print */
STRLEN print_len,
/* Which one is the non-continuation */
@@ -826,6 +826,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
? "immediately"
: Perl_form(aTHX_ "%d bytes",
(int) non_cont_byte_pos);
+ const U8 * x = s + non_cont_byte_pos;
+ const U8 * e = s + print_len;
PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT;
@@ -833,10 +835,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
* calculated, it's likely faster to pass it; verify under DEBUGGING */
assert(expect_len == UTF8SKIP(s));
+ /* As a defensive coding measure, don't output anything past a NUL. Such
+ * bytes shouldn't be in the middle of a malformation, and could mark the
+ * end of the allocated string, and what comes after is undefined */
+ for (; x < e; x++) {
+ if (*x == '\0') {
+ x++; /* Output this particular NUL */
+ break;
+ }
+ }
+
return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x,"
" %s after start byte 0x%02x; need %d bytes, got %d)",
malformed_text,
- _byte_dump_string(s, print_len, 0),
+ _byte_dump_string(s, x - s, 0),
*(s + non_cont_byte_pos),
where,
*s,
--
2.11.0
|
From @tonycozOn Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote:
Patches for blead. |
From @tonycozblead-0002-perl-132063-Heap-buffer-overflow.patchFrom af3b52eb96c0db51ca5efb7c605f59b4c89c3ae3 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 6 Feb 2018 14:50:48 -0700
Subject: [perl #132063]: Heap buffer overflow
The proximal cause is several instances in regexec.c of the code
assuming that the input was valid UTF-8, whereas the input was too short
for what the start byte claimed it would be.
I grepped through the core for any other similar uses, and did not find
any.
---
regexec.c | 33 ++++++++++++++++++---------------
t/lib/warnings/regexec | 7 +++++++
2 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/regexec.c b/regexec.c
index 1cda2e8e88..47793e6e16 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1822,7 +1822,9 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_fold \
: trie_latin_utf8_fold)))
-#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
+/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
+ * 'foldbuf+sizeof(foldbuf)' */
+#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \
STMT_START { \
STRLEN skiplen; \
U8 flags = FOLD_FLAGS_FULL; \
@@ -1830,7 +1832,7 @@ STMT_START {
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
- _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \
+ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
case trie_utf8_exactfa_fold: \
@@ -1839,14 +1841,14 @@ STMT_START {
case trie_utf8_fold: \
do_trie_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
} else { \
- len = UTF8SKIP(uc); \
- uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \
+ uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \
flags); \
+ len = UTF8SKIP(uc); \
skiplen = UVCHR_SKIP( uvc ); \
foldlen -= skiplen; \
uscan = foldbuf + skiplen; \
@@ -1857,7 +1859,7 @@ STMT_START {
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
- uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
uscan += len; \
len=0; \
@@ -1876,7 +1878,7 @@ STMT_START {
} \
/* FALLTHROUGH */ \
case trie_utf8: \
- uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \
+ uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \
break; \
case trie_plain: \
uvc = (UV)*uc; \
@@ -3038,10 +3040,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
}
points[pointpos++ % maxlen]= uc;
if (foldlen || uc < (U8*)strend) {
- REXEC_TRIE_READ_CHAR(trie_type, trie,
- widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
+ (U8 *) strend, uscan, len, uvc,
+ charid, foldlen, foldbuf,
+ uniflags);
DEBUG_TRIE_EXECUTE_r({
dump_exec_pos( (char *)uc, c, strend,
real_start, s, utf8_target, 0);
@@ -6100,8 +6102,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
if ( base && (foldlen || uc < (U8*)(reginfo->strend))) {
I32 offset;
REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc,
- uscan, len, uvc, charid, foldlen,
- foldbuf, uniflags);
+ (U8 *) reginfo->strend, uscan,
+ len, uvc, charid, foldlen,
+ foldbuf, uniflags);
charcount++;
if (foldlen>0)
ST.longfold = TRUE;
@@ -6236,8 +6239,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
while (foldlen) {
if (!--chars)
break;
- uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len,
- uniflags);
+ uvc = utf8n_to_uvchr(uscan, foldlen, &len,
+ uniflags);
uscan += len;
foldlen -= len;
}
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 900dd6ee7f..6635142dea 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
"k" =~ /(?[ \N{KELVIN SIGN} ])/i;
":" =~ /(?[ \: ])/;
EXPECT
+########
+# NAME perl #132063, read beyond buffer end
+# OPTION fatal
+"\xff" =~ /(?il)\x{100}|\x{100}/;
+EXPECT
+Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
+Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozblead-0003-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-account-for.patchFrom 7cb7018008ced2cac377126364a3842d3f91d275 Mon Sep 17 00:00:00 2001
From: Yves Orton <demerphq@gmail.com>
Date: Wed, 14 Feb 2018 10:29:26 +1100
Subject: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
---
regexec.c | 14 ++++++++++----
t/re/re_tests | 2 +-
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/regexec.c b/regexec.c
index 47793e6e16..1630b676ac 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1807,7 +1807,7 @@ Perl_re_intuit_start(pTHX_
#define DECL_TRIE_TYPE(scan) \
const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \
trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \
- trie_utf8l, trie_flu8 } \
+ trie_utf8l, trie_flu8, trie_flu8_latin } \
trie_type = ((scan->flags == EXACT) \
? (utf8_target ? trie_utf8 : trie_plain) \
: (scan->flags == EXACTL) \
@@ -1817,10 +1817,12 @@ Perl_re_intuit_start(pTHX_
? trie_utf8_exactfa_fold \
: trie_latin_utf8_exactfa_fold) \
: (scan->flags == EXACTFLU8 \
- ? trie_flu8 \
+ ? (utf8_target \
+ ? trie_flu8 \
+ : trie_flu8_latin) \
: (utf8_target \
? trie_utf8_fold \
- : trie_latin_utf8_fold)))
+ : trie_latin_utf8_fold)))
/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is
* 'foldbuf+sizeof(foldbuf)' */
@@ -1831,7 +1833,7 @@ STMT_START {
switch (trie_type) { \
case trie_flu8: \
_CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
- if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \
+ if (UTF8_IS_ABOVE_LATIN1(*uc)) { \
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \
} \
goto do_trie_utf8_fold; \
@@ -1854,10 +1856,14 @@ STMT_START {
uscan = foldbuf + skiplen; \
} \
break; \
+ case trie_flu8_latin: \
+ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \
+ goto do_trie_latin_utf8_fold; \
case trie_latin_utf8_exactfa_fold: \
flags |= FOLD_FLAGS_NOMIX_ASCII; \
/* FALLTHROUGH */ \
case trie_latin_utf8_fold: \
+ do_trie_latin_utf8_fold: \
if ( foldlen>0 ) { \
uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \
foldlen -= len; \
diff --git a/t/re/re_tests b/t/re/re_tests
index 61b8c875e2..3db034c67e 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1989,7 +1989,7 @@ AB\s+\x{100} AB \x{100}X y - -
([[:ascii:]]+)\x81 a\x80b\x81 y $& b\x81
[[:^ascii:]]+b \x80a\x81b y $& \x81b
[[:^ascii:]]+b \x80a\x81\x{100}b y $& \x81\x{100}b
-
+(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
--
2.11.0
|
From @tonycozblead-0004-perl-132063-we-should-no-longer-warn-for-this-code.patchFrom f7234720f553e29e4629b88d3928052def520da3 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 19 Feb 2018 15:11:42 +1100
Subject: (perl #132063) we should no longer warn for this code
The first patch for 132063 prevented the buffer read overflow when
dumping the warning but didn't fix the underlying problem.
The next change treats the supplied buffer correctly, preventing the
non-UTF-8 SV from being treated as UTF-8, preventing the warning.
---
t/lib/warnings/regexec | 3 ---
1 file changed, 3 deletions(-)
diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec
index 6635142dea..c370ddc3c7 100644
--- a/t/lib/warnings/regexec
+++ b/t/lib/warnings/regexec
@@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale);
EXPECT
########
# NAME perl #132063, read beyond buffer end
-# OPTION fatal
"\xff" =~ /(?il)\x{100}|\x{100}/;
EXPECT
-Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2.
-Malformed UTF-8 character (fatal) at - line 2.
--
2.11.0
|
From @tonycozblead-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patchFrom f348493a414bc87cf78b4c882c409e2d4b775595 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 13 Feb 2018 07:03:43 -0700
Subject: utf8.c: Don't dump malformation past first NUL
When a UTF-8 string contains a malformation, the bytes are dumped out as
a debugging aid. One should exercise caution, however, and not dump out
bytes that are actually past the end of the string. Commit 99a765e9e37
from 2016 added the capability to signal to the dumping routines that
we're not sure where the string ends, and to dump the minimal possible.
It occurred to me that an additional safety measure can be easily added,
which this commit does. And that is, in the dumping routines to stop at
the first NUL. All strings automatically get a traiing NUL added, even
if they contain embedded NULs. A NUL can never be part of a
malformation, and so its presence likely signals the end of the string.
---
utf8.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/utf8.c b/utf8.c
index a30182a43f..4bec553d56 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1121,7 +1121,7 @@ Perl__byte_dump_string(pTHX_ const U8 * const start, const STRLEN len, const boo
PERL_STATIC_INLINE char *
S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
- /* How many bytes to print */
+ /* Max number of bytes to print */
STRLEN print_len,
/* Which one is the non-continuation */
@@ -1137,6 +1137,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
? "immediately"
: Perl_form(aTHX_ "%d bytes",
(int) non_cont_byte_pos);
+ const U8 * x = s + non_cont_byte_pos;
+ const U8 * e = s + print_len;
PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT;
@@ -1144,10 +1146,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
* calculated, it's likely faster to pass it; verify under DEBUGGING */
assert(expect_len == UTF8SKIP(s));
+ /* As a defensive coding measure, don't output anything past a NUL. Such
+ * bytes shouldn't be in the middle of a malformation, and could mark the
+ * end of the allocated string, and what comes after is undefined */
+ for (; x < e; x++) {
+ if (*x == '\0') {
+ x++; /* Output this particular NUL */
+ break;
+ }
+ }
+
return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x,"
" %s after start byte 0x%02x; need %d bytes, got %d)",
malformed_text,
- _byte_dump_string(s, print_len, 0),
+ _byte_dump_string(s, x - s, 0),
*(s + non_cont_byte_pos),
where,
*s,
--
2.11.0
|
From @xsawyerxThe public disclosure date is set for March 15th. |
From @tonycozOn Sat, 24 Feb 2018 08:56:44 -0800, xsawyerx@gmail.com wrote:
This has been delayed. Tony |
From @xsawyerxThe disclosure date has been postponed officially to April 14th. On 19 March 2018 at 00:13, Tony Cook via RT
|
@xsawyerx - Status changed from 'open' to 'pending release' |
From @xsawyerxOn Mon, 19 Mar 2018 15:26:47 -0700, xsawyerx@gmail.com wrote:
Moved to public queue. |
From @khwilliamsonThank you for filing this report. You have helped make Perl better. With the release yesterday of Perl 5.28.0, this and 185 other issues have been Perl 5.28.0 may be downloaded via: If you find that the problem persists, feel free to reopen this ticket. |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#132063 (status was 'resolved')
Searchable as RT132063$
The text was updated successfully, but these errors were encountered: