Commit 095b7b9

Merge branch 'merge/merge-pcre' into 10.0
2 parents 359ae59 + e7591a1 commit 095b7b9

39 files changed (+2225, -1264 lines)

pcre/ChangeLog

Lines changed: 176 additions & 0 deletions
@@ -1,6 +1,182 @@
 ChangeLog for PCRE
 ------------------

+Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
+development is happening in the PCRE2 10.xx series.
+
+Version 8.38 23-November-2015
+-----------------------------
+
+1. If a group that contained a recursive back reference also contained a
+    forward reference subroutine call followed by a non-forward-reference
+    subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
+    compile correct code, leading to undefined behaviour or an internally
+    detected error. This bug was discovered by the LLVM fuzzer.
+
+2. Quantification of certain items (e.g. atomic back references) could cause
+    incorrect code to be compiled when recursive forward references were
+    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
+    This bug was discovered by the LLVM fuzzer.
+
+3. A repeated conditional group whose condition was a reference by name caused
+    a buffer overflow if there was more than one group with the given name.
+    This bug was discovered by the LLVM fuzzer.
+
+4. A recursive back reference by name within a group that had the same name as
+    another group caused a buffer overflow. For example:
+    /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
+
+5. A forward reference by name to a group whose number is the same as the
+    current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
+    a buffer overflow at compile time. This bug was discovered by the LLVM
+    fuzzer.
+
+6. A lookbehind assertion within a set of mutually recursive subpatterns could
+    provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
+
+7. Another buffer overflow bug involved duplicate named groups with a
+    reference between their definition, with a group that reset capture
+    numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
+    fixed by always allowing for more memory, even if not needed. (A proper fix
+    is implemented in PCRE2, but it involves more refactoring.)
+
+8. There was no check for integer overflow in subroutine calls such as (?123).
+
+9. The table entry for \l in EBCDIC environments was incorrect, leading to its
+    being treated as a literal 'l' instead of causing an error.
+
+10. There was a buffer overflow if pcre_exec() was called with an ovector of
+    size 1. This bug was found by american fuzzy lop.
+
+11. If a non-capturing group containing a conditional group that could match
+    an empty string was repeated, it was not identified as matching an empty
+    string itself. For example: /^(?:(?(1)x|)+)+$()/.
+
+12. In an EBCDIC environment, pcretest was mishandling the escape sequences
+    \a and \e in test subject lines.
+
+13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
+    instead of the EBCDIC value.
+
+14. The handling of \c in an EBCDIC environment has been revised so that it is
+    now compatible with the specification in Perl's perlebcdic page.
+
+15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
+    ASCII/Unicode. This has now been added to the list of characters that are
+    recognized as white space in EBCDIC.
+
+16. When PCRE was compiled without UCP support, the use of \p and \P gave an
+    error (correctly) when used outside a class, but did not give an error
+    within a class.
+
+17. \h within a class was incorrectly compiled in EBCDIC environments.
+
+18. A pattern with an unmatched closing parenthesis that contained a backward
+    assertion which itself contained a forward reference caused buffer
+    overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
+
+19. JIT should return with error when the compiled pattern requires more stack
+    space than the maximum.
+
+20. A possessively repeated conditional group that could match an empty string,
+    for example, /(?(R))*+/, was incorrectly compiled.
+
+21. Fix infinite recursion in the JIT compiler when certain patterns such as
+    /(?:|a|){100}x/ are analysed.
+
+22. Some patterns with character classes involving [: and \\ were incorrectly
+    compiled and could cause reading from uninitialized memory or an incorrect
+    error diagnosis.
+
+23. Pathological patterns containing many nested occurrences of [: caused
+    pcre_compile() to run for a very long time.
+
+24. A conditional group with only one branch has an implicit empty alternative
+    branch and must therefore be treated as potentially matching an empty
+    string.
+
+25. If (?R was followed by - or + incorrect behaviour happened instead of a
+    diagnostic.
+
+26. Arrange to give up on finding the minimum matching length for overly
+    complex patterns.
+
+27. Similar to (4) above: in a pattern with duplicated named groups and an
+    occurrence of (?| it is possible for an apparently non-recursive back
+    reference to become recursive if a later named group with the relevant
+    number is encountered. This could lead to a buffer overflow. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+28. If pcregrep was given the -q option with -c or -l, or when handling a
+    binary file, it incorrectly wrote output to stdout.
+
+29. The JIT compiler did not restore the control verb head in case of *THEN
+    control verbs. This issue was found by Karl Skomski with a custom LLVM
+    fuzzer.
+
+30. Error messages for syntax errors following \g and \k were giving inaccurate
+    offsets in the pattern.
+
+31. Added a check for integer overflow in conditions (?(<digits>) and
+    (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
+    fuzzer.
+
+32. Handling recursive references such as (?2) when the reference is to a group
+    later in the pattern uses code that is very hacked about and error-prone.
+    It has been re-written for PCRE2. Here in PCRE1, a check has been added to
+    give an internal error if it is obvious that compiling has gone wrong.
+
+33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
+    This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+34. The JIT compiler should restore the control chain for empty possessive
+    repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+35. Match limit check added to JIT recursion. This issue was found by Karl
+    Skomski with a custom LLVM fuzzer.
+
+36. Yet another case similar to 27 above has been circumvented by an
+    unconditional allocation of extra memory. This issue is fixed "properly" in
+    PCRE2 by refactoring the way references are handled. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+38. Fixed a corner case of range optimization in JIT.
+
+39. An incorrect error "overran compiling workspace" was given if there were
+    exactly enough group forward references such that the last one extended
+    into the workspace safety margin. The next one would have expanded the
+    workspace. The test for overflow was not including the safety margin.
+
+40. A match limit issue is fixed in JIT which was found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
+    work under Windows. (Why has it taken so long for anyone to notice?)
+
+42. In a character class such as [\W\p{Any}] where both a negative-type escape
+    ("not a word character") and a property escape were present, the property
+    escape was being ignored.
+
+43. Fix crash caused by very long (*MARK) or (*THEN) names.
+
+44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
+    by a single ASCII character in a class item, was incorrectly compiled in
+    UCP mode. The POSIX class got lost, but only if the single character
+    followed it.
+
+45. [:punct:] in UCP mode was matching some characters in the range 128-255
+    that should not have been matched.
+
+46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
+    class, all characters with code points greater than 255 are in the class.
+    When a Unicode property was also in the class (if PCRE_UCP is set, escapes
+    such as \w are turned into Unicode properties), wide characters were not
+    correctly handled, and could fail to match.
+
+
 Version 8.37 28-April-2015
 --------------------------

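Note on ChangeLog items 8 and 31: both add an integer-overflow check when a decimal group number such as (?123) is read. The Python sketch below only illustrates the general technique of testing the bound before the multiply-and-add step. It is not PCRE's code: the name `parse_group_number` and the `INT_MAX` limit are assumptions chosen for illustration.

```python
INT_MAX = 2**31 - 1  # assumed width of a C int; illustrative only


def parse_group_number(digits: str, limit: int = INT_MAX) -> int:
    """Accumulate decimal digits, refusing any value that would exceed `limit`."""
    if not digits or not all("0" <= ch <= "9" for ch in digits):
        raise ValueError("expected one or more decimal digits")
    value = 0
    for ch in digits:
        d = ord(ch) - ord("0")
        # value * 10 + d <= limit  is equivalent to  value <= (limit - d) // 10,
        # so the bound is tested before the arithmetic that could overflow in C.
        if value > (limit - d) // 10:
            raise OverflowError("number is too big")
        value = value * 10 + d
    return value
```

With these assumptions, `parse_group_number("123")` returns 123, while a digit string one past the limit raises `OverflowError` instead of wrapping around.
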
pcre/NEWS

Lines changed: 8 additions & 0 deletions
@@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------

+Release 8.38 23-November-2015
+-----------------------------
+
+This is a bug-fix release. Note that this library (now called PCRE1) is now being
+maintained for bug fixes only. New projects are advised to use the new PCRE2
+libraries.
+
+
 Release 8.37 28-April-2015
 --------------------------

pcre/NON-AUTOTOOLS-BUILD

Lines changed: 4 additions & 4 deletions
@@ -764,9 +764,9 @@ required. For details, please see this web site:

 http://www.zaconsultants.net

-There is also a mirror here:
-
-http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015

pcre/RunGrepTest

Lines changed: 8 additions & 0 deletions
@@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
 (cd $srcdir; $valgrind $pcregrep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep

+echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
 # Now compare the results.

 $cf $srcdir/testdata/grepoutput testtrygrep

pcre/configure.ac

Lines changed: 5 additions & 5 deletions
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [37])
+m4_define(pcre_minor, [38])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2015-04-28])
+m4_define(pcre_date, [2015-11-23])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:5:2])
-m4_define(libpcre16_version, [2:5:2])
-m4_define(libpcre32_version, [0:5:0])
+m4_define(libpcre_version, [3:6:2])
+m4_define(libpcre16_version, [2:6:2])
+m4_define(libpcre32_version, [0:6:0])
 m4_define(libpcreposix_version, [0:3:0])
 m4_define(libpcrecpp_version, [0:1:0])

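Note on the libtool version bump in configure.ac: only the revision field of each current:revision:age triple changes (5 to 6), which is the usual libtool convention for a release that changes the source but not the interface. The Python sketch below shows how such a triple typically maps to a shared-library file suffix on Linux, where libtool commonly names the file with (current - age).age.revision. The helper name `linux_soname_suffix` is hypothetical, and other platforms use different naming schemes.

```python
def linux_soname_suffix(version: str) -> str:
    """Map a libtool 'current:revision:age' triple to the usual Linux suffix."""
    current, revision, age = (int(part) for part in version.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    # Assumed Linux convention: major = current - age, then age, then revision.
    return f"{current - age}.{age}.{revision}"


for name, triple in [("libpcre", "3:6:2"),
                     ("libpcre16", "2:6:2"),
                     ("libpcre32", "0:6:0")]:
    print(f"{name}.so.{linux_soname_suffix(triple)}")
```

Under that convention the major number (current - age) is unchanged by this commit, so programs linked against the 8.37 libraries should keep loading the 8.38 ones.
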
pcre/doc/html/NON-AUTOTOOLS-BUILD.txt

Lines changed: 4 additions & 4 deletions
@@ -764,9 +764,9 @@ required. For details, please see this web site:

 http://www.zaconsultants.net

-There is also a mirror here:
-
-http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015

pcre/doc/html/pcrepattern.html

Lines changed: 28 additions & 13 deletions
@@ -329,7 +329,8 @@ <h1>pcrepattern man page</h1>
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 <pre>
 \a alarm, that is, the BEL character (hex 07)
 \cx "control-x", where x is any ASCII character
@@ -353,19 +354,33 @@ <h1>pcrepattern man page</h1>
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 </P>
 <P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\c? becomes either 255 (hex FF) or 95 (hex 5F).
+</P>
+<P>
+Thus, apart from \c?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+</P>
+<P>
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 </P>
@@ -3249,9 +3264,9 @@ <h1>pcrepattern man page</h1>
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.