File tidies for 10.43-RC1 release
PhilipHazel committed Dec 28, 2023
1 parent 2bba84b commit aadef0c
Showing 18 changed files with 450 additions and 379 deletions.
6 changes: 3 additions & 3 deletions AUTHORS
@@ -8,7 +8,7 @@ Email domain: gmail.com
Retired from University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2022 University of Cambridge
Copyright (c) 1997-2023 University of Cambridge
All rights reserved


@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2010-2022 Zoltan Herczeg
Copyright(c) 2010-2023 Zoltan Herczeg
All rights reserved.


@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2009-2022 Zoltan Herczeg
Copyright(c) 2009-2023 Zoltan Herczeg
All rights reserved.

####
4 changes: 2 additions & 2 deletions ChangeLog
@@ -5,8 +5,8 @@ Before the move to GitHub, this was the only record of changes to PCRE2. Now
there is often more detail in the pull requests.


Version 10.43 xx-xxx-202x
-------------------------
Version 10.43 27-December-2023
------------------------------

1. The test program added by change 2 of 10.42 didn't work when the default
newline setting didn't include \n as a newline. One test needed (*LF) to ensure
6 changes: 3 additions & 3 deletions LICENCE
@@ -26,7 +26,7 @@ Email domain: gmail.com
Retired from University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2022 University of Cambridge
Copyright (c) 1997-2023 University of Cambridge
All rights reserved.


@@ -37,7 +37,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2010-2022 Zoltan Herczeg
Copyright(c) 2010-2023 Zoltan Herczeg
All rights reserved.


@@ -48,7 +48,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu

Copyright(c) 2009-2022 Zoltan Herczeg
Copyright(c) 2009-2023 Zoltan Herczeg
All rights reserved.


46 changes: 46 additions & 0 deletions NEWS
@@ -2,6 +2,52 @@ News about PCRE2 releases
-------------------------


Version 10.43 27-December-2023
------------------------------

There are quite a lot of changes in this release (see ChangeLog and git log for
a list). Those that are not bugfixes or code tidies are:

* A new function pcre2_get_match_data_heapframes_size() for finer heap control.

* New option flags to restrict the interaction between ASCII and non-ASCII
characters for caseless matching and \d and friends. There are also new
pattern constructs to control these flags from within a pattern.

* Upgrade to Unicode 15.0.0.

* Treat a NULL pattern with zero length as an empty string.

* Added support for limited-length variable-length lookbehind assertions, with
a default maximum length of 255 characters (same as Perl) but with a function
to adjust the limit.

* Added JIT support for LoongArch.

@carenas (Contributor) commented Jan 2, 2024:

There was also a JIT upgrade that basically removes support for ARMv5 CPUs and might also affect really old x86 CPUs that might have SSE2 support only when using SIMD.


* Perl changed the meaning of (for example) {,3}, which was not previously
recognized as a quantifier. Now it means {0,3} and PCRE2 has also changed.
Note that {,} is still not a quantifier.

* Following Perl, allow spaces and tabs after { and before } in all Perl-
compatible items that use braces, and also around commas in quantifiers. The
one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
PCRE2 follows ECMAScript usage.

* Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
mode to follow Perl. It now matches characters whose general categories are L
or N or whose particular categories are Mn (non-spacing mark) or Pc
(connector punctuation).

* Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
be used to keep it ASCII only.

* Make PCRE2_UCP the default in UTF mode in pcre2grep and add --no-ucp,
--case-restrict and --posix-digit.

* Add --group-separator and --no-group-separator to pcre2grep.
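
The two brace-related items above can be illustrated with a small parser. This is only a sketch in Python and not PCRE2's implementation: the helper name `parse_quantifier` is hypothetical, and it models just the rules stated in the NEWS text (a missing minimum means 0, spaces and tabs are ignored after {, before } and around the comma, and {,} is not a quantifier). The \u{...} exception is out of scope.

```python
import re

# Hypothetical helper: models only the brace-quantifier rules listed in NEWS.
# Spaces and tabs may follow "{", precede "}", and surround the comma.
_QUANT = re.compile(r"\{[ \t]*(\d*)[ \t]*(?:(,)[ \t]*(\d*)[ \t]*)?\}")


def parse_quantifier(text):
    """Return (min, max) for a brace quantifier, or None if text is not one.

    max is None when the quantifier has no upper bound, as in {3,}.
    """
    m = _QUANT.fullmatch(text)
    if m is None:
        return None
    lo, comma, hi = m.group(1), m.group(2), m.group(3)
    if comma is None:
        # {n}: an exact count needs at least one digit
        return (int(lo), int(lo)) if lo else None
    if not lo and not hi:
        # {,} is still not a quantifier
        return None
    return (int(lo) if lo else 0, int(hi) if hi else None)
```

With this sketch, `parse_quantifier("{,3}")` and `parse_quantifier("{ 0 , 3 }")` both return `(0, 3)`, while `parse_quantifier("{,}")` returns `None`.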


Version 10.42 11-December-2022
------------------------------

12 changes: 6 additions & 6 deletions configure.ac
@@ -10,14 +10,14 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.

m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [43])
m4_define(pcre2_prerelease, [-DEV])
m4_define(pcre2_date, [2023-04-14])
m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2023-12-27])

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [11:2:11])
m4_define(libpcre2_16_version, [11:2:11])
m4_define(libpcre2_32_version, [11:2:11])
m4_define(libpcre2_posix_version, [3:4:0])
m4_define(libpcre2_8_version, [12:0:12])
m4_define(libpcre2_16_version, [12:0:12])
m4_define(libpcre2_32_version, [12:0:12])
m4_define(libpcre2_posix_version, [3:5:0])

# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
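
To see what the (current:revision:age) bump means in practice, the sketch below maps a triple to a shared-library file suffix. It assumes the usual libtool convention on Linux, where the suffix is (current - age).age.revision; other platforms use different schemes, and the helper name `linux_so_suffix` is hypothetical.

```python
def linux_so_suffix(version_info):
    """Map a libtool "current:revision:age" triple to a Linux .so suffix.

    Assumes the usual libtool convention on Linux, where the file name ends
    in (current - age).age.revision. Other platforms use different schemes.
    """
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"
```

Under that assumption, the old value `11:2:11` gives `0.11.2` and the new value `12:0:12` gives `0.12.0`, so the leading number stays at 0 for the 8-, 16- and 32-bit libraries.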
49 changes: 29 additions & 20 deletions doc/html/pcre2grep.html
@@ -71,15 +71,16 @@ <h1>pcre2grep man page</h1>
<pre>
pcre2grep some-pattern file1 - file3
</pre>
By default, input files are searched line by line. Each line that matches a
pattern is copied to the standard output, and if there is more than one file,
the file name is output at the start of each line, followed by a colon.
However, there are options that can change how <b>pcre2grep</b> behaves. For
example, the <b>-M</b> option makes it possible to search for strings that span
line boundaries. What defines a line boundary is controlled by the <b>-N</b>
(<b>--newline</b>) option. The <b>-h</b> and <b>-H</b> options control whether or
not file names are shown, and the <b>-Z</b> option changes the file name
terminator to a zero byte.
By default, input files are searched line by line, so pattern assertions about
the beginning and end of a subject string (^, $, \A, \Z, and \z) match at
the beginning and end of each line. When a line matches a pattern, it is copied
to the standard output, and if there is more than one file, the file name is
output at the start of each line, followed by a colon. However, there are
options that can change how <b>pcre2grep</b> behaves. For example, the <b>-M</b>
option makes it possible to search for strings that span line boundaries. What
defines a line boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
The <b>-h</b> and <b>-H</b> options control whether or not file names are shown,
and the <b>-Z</b> option changes the file name terminator to a zero byte.
</P>
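
The effect of searching line by line can be sketched as follows. This uses Python's `re` module as a stand-in for PCRE2, so it is an analogy and not pcre2grep's code; the function name `grep_lines` is hypothetical. Each line is handed to the regex engine as a separate subject string, which is why start and end assertions apply to each line.

```python
import re


def grep_lines(pattern, text):
    """Return the lines of text that match pattern, searching line by line.

    Each line is passed to the regex engine as its own subject string, so
    start and end assertions are tested against the line, not the whole input.
    """
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]
```

For the input `"alpha\nbeta\ngamma"`, the pattern `\Abeta` selects the line `beta` in this sketch, although the same pattern does not match when the whole three-line text is searched as a single subject.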
<P>
The amount of memory used for buffering files that are being scanned is
@@ -563,16 +564,24 @@ <h1>pcre2grep man page</h1>
<P>
<b>-M</b>, <b>--multiline</b>
Allow patterns to match more than one line. When this option is set, the PCRE2
library is called in "multiline" mode. This allows a matched string to extend
past the end of a line and continue on one or more subsequent lines. Patterns
used with <b>-M</b> may usefully contain literal newline characters and internal
occurrences of ^ and $ characters. The output for a successful match may
consist of more than one line. The first line is the line in which the match
started, and the last line is the line in which the match ended. If the matched
string ends with a newline sequence, the output ends at the end of that line.
If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
match has been handled, scanning restarts at the beginning of the line after
the one in which the match ended.
library is called in "multiline" mode, and a match is allowed to continue past
the end of the initial line and onto one or more subsequent lines.
<br>
<br>
Patterns used with <b>-M</b> may usefully contain literal newline characters and
internal occurrences of ^ and $ characters, because in multiline mode these can
match at internal newlines. Because <b>pcre2grep</b> is scanning multiple lines,
the \Z and \z assertions match only at the end of the last line in the file.
The \A assertion matches at the start of the first line of a match. This can
be any line in the file; it is not anchored to the first line.
<br>
<br>
The output for a successful match may consist of more than one line. The first
line is the line in which the match started, and the last line is the line in
which the match ended. If the matched string ends with a newline sequence, the
output ends at the end of that line. If <b>-v</b> is set, none of the lines in a
multi-line match are output. Once a match has been handled, scanning restarts
at the beginning of the line after the one in which the match ended.
<br>
<br>
The newline sequence that separates multiple lines must be matched as part of
@@ -1107,7 +1116,7 @@ <h1>pcre2grep man page</h1>
</P>
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
<P>
Last updated: 20 November 2023
Last updated: 22 December 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>
34 changes: 17 additions & 17 deletions doc/html/pcre2pattern.html
@@ -328,10 +328,10 @@ <h1>pcre2pattern man page</h1>
Brace characters { and } are also used to enclose data for constructions such
as \g{2} or \k{name}. In almost all uses of braces, space and/or horizontal
tab characters that follow { or precede } are allowed and are ignored. In the
case of quantifiers, they may also appear before or after the comma. The
case of quantifiers, they may also appear before or after the comma. The
exception to this is \u{...} which is an ECMAScript compatibility feature
that is recognized only when the PCRE2_EXTRA_ALT_BSUX option is set. ECMAScript
does not ignore such white space; it causes the item to be interpreted as
that is recognized only when the PCRE2_EXTRA_ALT_BSUX option is set. ECMAScript
does not ignore such white space; it causes the item to be interpreted as
literal.
</P>
<P>
@@ -472,7 +472,7 @@ <h1>pcre2pattern man page</h1>
(carriage return) character.
</P>
<P>
An error occurs if \c is not followed by a character whose ASCII code point
An error occurs if \c is not followed by a character whose ASCII code point
is in the range 32 to 126. The precise effect of \cx is as follows: if x is a
lower case letter, it is converted to upper case. Then bit 6 of the character
(hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is
@@ -694,8 +694,8 @@ <h1>pcre2pattern man page</h1>
\s any character that matches \p{Z} or \h or \v
\w any character that matches \p{L}, \p{N}, \p{Mn}, or \p{Pc}
</pre>
The addition of \p{Mn} (non-spacing mark) and the replacement of an explicit
test for underscore with a test for \p{Pc} (connector punctuation) happened in
The addition of \p{Mn} (non-spacing mark) and the replacement of an explicit
test for underscore with a test for \p{Pc} (connector punctuation) happened in
PCRE2 release 10.43. This brings PCRE2 into line with Perl.
</P>
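
The UCP-mode definition of \w given above can be approximated with Python's `unicodedata` module. This is a sketch only: the function name `is_ucp_word_char` is hypothetical, and the Unicode version behind `unicodedata` depends on the Python build, so it may not be 15.0.0.

```python
import unicodedata


def is_ucp_word_char(ch):
    """Approximate the UCP-mode word-character test for one character.

    True when the general category is L (letter) or N (number), or the
    particular category is Mn (non-spacing mark) or Pc (connector punctuation).
    """
    category = unicodedata.category(ch)
    return category[0] in "LN" or category in ("Mn", "Pc")
```

Underscore still qualifies because its category is Pc, and a combining acute accent (U+0301) qualifies through Mn.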
<P>
@@ -1074,7 +1074,7 @@ <h1>pcre2pattern man page</h1>
carriage return, and any other character that has the Z (separator) property.
Xsp is the same as Xps; in PCRE1 it used to exclude vertical tab, for Perl
compatibility, but Perl changed. Xwd matches the same characters as Xan, plus
those that match Mn (non-spacing mark) or Pc (connector punctuation, which
those that match Mn (non-spacing mark) or Pc (connector punctuation, which
includes underscore).
</P>
<P>
@@ -1586,7 +1586,7 @@ <h1>pcre2pattern man page</h1>
</P>
<P>
The other POSIX classes are unchanged by PCRE2_UCP, and match only characters
with code points less than 256.
with code points less than 256.
</P>
<P>
There are two options that can be used to restrict the POSIX classes to ASCII
@@ -1613,8 +1613,8 @@ <h1>pcre2pattern man page</h1>
<a href="#smallassertions">"Simple assertions"</a>
above), and in a Perl-style pattern the preceding or following character
normally shows which is wanted, without the need for the assertions that are
used above in order to give exactly the POSIX behaviour. Note also that the
PCRE2_UCP option changes the meaning of \w (and therefore \b) by default, so
used above in order to give exactly the POSIX behaviour. Note also that the
PCRE2_UCP option changes the meaning of \w (and therefore \b) by default, so
it also affects these POSIX sequences.
</P>
<br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br>
@@ -1682,8 +1682,8 @@ <h1>pcre2pattern man page</h1>
above, it sets (or unsets) all the ASCII options.
</P>
<P>
PCRE2_EXTRA_ASCII_DIGIT has no additional effect when PCRE2_EXTRA_ASCII_POSIX
is set, but including it in (?aP) means that (?-aP) suppresses all ASCII
PCRE2_EXTRA_ASCII_DIGIT has no additional effect when PCRE2_EXTRA_ASCII_POSIX
is set, but including it in (?aP) means that (?-aP) suppresses all ASCII
restrictions for POSIX classes.
</P>
<P>
@@ -1993,7 +1993,7 @@ <h1>pcre2pattern man page</h1>
X{,4} is interpreted as X{0,4}
</pre>
This is a change in behaviour that happened in Perl 5.34.0 and PCRE2 10.43. In
earlier versions such a sequence was not interpreted as a quantifier. Other
earlier versions such a sequence was not interpreted as a quantifier. Other
regular expression engines may behave either way.
</P>
<P>
@@ -2287,7 +2287,7 @@ <h1>pcre2pattern man page</h1>
The sequence \g{-1} is a reference to the capture group whose number is one
less than the number of the next group to be started, so in this example (where
the next group would be numbered 3) it is equivalent to \2, and \g{-2} would
be equivalent to \1. Note that if this construct is inside a capture group,
be equivalent to \1. Note that if this construct is inside a capture group,
that group is included in the count, so in this example \g{-2} also refers to
group 1:
<pre>
@@ -2323,8 +2323,8 @@ <h1>pcre2pattern man page</h1>
</P>
<P>
There are several different ways of writing backreferences to named capture
groups. The .NET syntax is \k{name}, the Python syntax is (?P=name), and the
original Perl syntax is \k&#60;name&#62; or \k'name'. All of these are now supported
groups. The .NET syntax is \k{name}, the Python syntax is (?P=name), and the
original Perl syntax is \k&#60;name&#62; or \k'name'. All of these are now supported
by both Perl and PCRE2. Perl 5.10's unified backreference syntax, in which \g
can be used for both numeric and named references, is also supported by PCRE2.
We could rewrite the above example in any of the following ways:
@@ -2778,7 +2778,7 @@ <h1>pcre2pattern man page</h1>
condition is true if a capture group of that number has previously matched. If
there is more than one capture group with the same number (see the earlier
<a href="#recursion">section about duplicate group numbers),</a>
the condition is true if any of them have matched. An alternative notation,
the condition is true if any of them have matched. An alternative notation,
which is a PCRE2 extension, not supported by Perl, is to precede the digits
with a plus or minus sign. In this case, the group number is relative rather
than absolute. The most recently opened capture group (which could be enclosing
4 changes: 2 additions & 2 deletions doc/html/pcre2syntax.html
@@ -408,8 +408,8 @@ <h1>pcre2syntax man page</h1>
(?-...) unset the given option(s)
(?^) unset imnrsx options
</pre>
(?aP) implies (?aT) as well, though this has no additional effect. However, it
means that (?-aP) is really (?-PT) which disables all ASCII restrictions for
(?aP) implies (?aT) as well, though this has no additional effect. However, it
means that (?-aP) is really (?-PT) which disables all ASCII restrictions for
POSIX classes.
</P>
<P>
