Merged
9 changes: 6 additions & 3 deletions doc/html/pcre2limits.html
@@ -64,8 +64,11 @@ <h2>
a compile context.
</p>
<p>
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+128 code units, and the maximum number of such groups to
+10000.
</p>
<p>
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@@ -96,7 +99,7 @@ <h2>
REVISION
</h2>
<p>
-Last updated: 16 August 2023
+Last updated: 17 August 2025
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>
6 changes: 3 additions & 3 deletions doc/html/pcre2pattern.html
@@ -2007,8 +2007,8 @@ <h2><a name="SEC18" href="#TOC1">NAMED CAPTURE GROUPS</a></h2>
</p>
<p>
In PCRE2, a capture group can be named in one of three ways: (?&#60;name&#62;...) or
-(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
alphanumeric characters and underscores, but must start with a non-digit. When
PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
letter or Unicode decimal digit. In other words, group names must match one of
@@ -4183,7 +4183,7 @@ <h2><a name="SEC33" href="#TOC1">AUTHOR</a></h2>
</p>
<h2><a name="SEC34" href="#TOC1">REVISION</a></h2>
<p>
-Last updated: 28 March 2025
+Last updated: 17 August 2025
<br>
Copyright &copy; 1997-2024 University of Cambridge.
<br>
10 changes: 6 additions & 4 deletions doc/pcre2.txt
@@ -6238,8 +6238,10 @@ SIZE AND OTHER LIMITATIONS
is set to 250. An application can change this limit by calling
pcre2_set_parens_nest_limit() to set the limit in a compile context.

-The maximum length of name for a named capture group is 32 code units,
-and the maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the
+number of such groups is configurable at build time. The maximum length
+for the name defaults to 128 code units, and the maximum number of such
+groups to 10000.

The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or
(*THEN) verb is 255 code units for the 8-bit library and 65535 code
@@ -6262,7 +6264,7 @@ AUTHOR

REVISION

-Last updated: 16 August 2023
+Last updated: 17 August 2025
Copyright (c) 1997-2023 University of Cambridge.


@@ -10747,7 +10749,7 @@ AUTHOR

REVISION

-Last updated: 28 March 2025
+Last updated: 17 August 2025
Copyright (c) 1997-2024 University of Cambridge.


11 changes: 8 additions & 3 deletions doc/pcre2limits.3
@@ -47,8 +47,13 @@ when PCRE2 is built; if not, the default is set to 250. An application can
change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
a compile context.
.P
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+.\" DEFINE MAX_NAME_SIZE
+128 code units, and the maximum number of such groups to
+.\" DEFINE MAX_NAME_COUNT
+10000.
.P
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
@@ -76,6 +81,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 16 August 2023
+Last updated: 17 August 2025
Copyright (c) 1997-2023 University of Cambridge.
.fi
7 changes: 4 additions & 3 deletions doc/pcre2pattern.3
@@ -2015,8 +2015,9 @@ the naming of capture groups. This feature was not added to Perl until release
using the Python syntax. PCRE2 supports both the Perl and the Python syntax.
.P
In PCRE2, a capture group can be named in one of three ways: (?<name>...) or
-(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to
+.\" DEFINE MAX_NAME_SIZE
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
alphanumeric characters and underscores, but must start with a non-digit. When
PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
letter or Unicode decimal digit. In other words, group names must match one of
@@ -4229,6 +4230,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 28 March 2025
+Last updated: 17 August 2025
Copyright (c) 1997-2024 University of Cambridge.
.fi
1 change: 1 addition & 0 deletions maint/CheckMan
@@ -39,6 +39,7 @@ while (scalar(@ARGV) > 0)
^\.P\s*$|
^\.PP\s*$|
^\.\\"(?:\ HREF)?\s*$|
+^\.\\"\sDEFINE\s\w+$|
^\.\\"\sHTML\s<a\shref="[^"]+?">\s*$|
^\.\\"\sHTML\s<a\sname="[^"]+?"><\/a>\s*$|
^\.\\"\s<\/a>\s*$|
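
The alternative added to CheckMan accepts the new `.\" DEFINE <name>` marker lines that this PR places in the man pages. As a rough illustration only, the snippet below transcribes that one alternative into Python's `re` module (it is not the Perl script itself, and every sample line other than the two markers used in this PR is made up):

```python
import re

# Python transcription of the alternative added to CheckMan: a roff comment
# of the form `.\" DEFINE NAME` with nothing after the name.
DEFINE_MARKER = re.compile(r'^\.\\"\sDEFINE\s\w+$')

samples = [
    r'.\" DEFINE MAX_NAME_SIZE',      # marker used in this PR
    r'.\" DEFINE MAX_NAME_COUNT',     # marker used in this PR
    r'.\" DEFINE',                    # hypothetical: name missing
    r'.\" DEFINE MAX_NAME_SIZE 128',  # hypothetical: trailing text
    r'.\" define MAX_NAME_SIZE',      # hypothetical: lowercase keyword
]

# True where the line would be accepted as a DEFINE marker.
results = [bool(DEFINE_MARKER.match(line)) for line in samples]
for line, ok in zip(samples, results):
    print(("accept" if ok else "reject") + ": " + line)
```

Only the first two samples are accepted; the value itself lives on the line after the marker, so a marker with trailing text does not match.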
66 changes: 66 additions & 0 deletions maint/LintMan
@@ -0,0 +1,66 @@
#!/usr/bin/perl

use warnings;
use strict;
use Getopt::Long;
use vars qw /$opt_verbose/;

# A script to scan PCRE2's man pages to check for values that might need to
# be updated to match the code.
#
# It updates numerical values after \" DEFINE <name> or errors if name is
# not found.

my $file;
my %defs;

foreach $file ("../src/config.h.generic")
Member

I made one change here. If the user configures their PCRE2 differently, they should still (probably) get the same documentation as everyone else?

I've made the docs match the generic config. Users don't receive the maintainer scripts in maint/ as part of a tarball release, so they won't be customising the values in the docs for their installation anyway.

Finally, I've checked that it's running nicely in CI - and it seems to work well. Great!

Contributor Author

But isn't config.h.generic created by the Makefile?

In that scenario, if a new setting is added to the documentation, then this will break CI, right?

Member

Well, if you want to reference a new constant using your .\" DEFINE syntax then it has to be defined in config.h. That's not really changed from what you did. config.h.generic is just an exact copy of config.h when you run ./configure with no arguments, so the two config files always have the same constants in.

Contributor Author

@carenas carenas Aug 29, 2025

But I only change config.h.in; configure creates config.h, and config.h.generic is created from it only after make runs.

But CI checks that last file, which hasn't been updated yet because configure and make haven't run, right? So when LintMan runs, it finds a DEFINE that doesn't exist in that file and dies.

Member

CI ensures that config.h.generic is kept up to date.

{
  open (INCLUDE, $file) or die "Failed to open include $file\n";

  while (<INCLUDE>)
  {
    next unless /^#define ([[:upper:]_\d]+)\s+(\d+)/a;
    $defs{$1} = $2;
  }

  close(INCLUDE);
}

GetOptions("verbose");
while (scalar(@ARGV) > 0)
{
  $file = shift @ARGV;

  open my $fh, "+<", $file or die "Failed to open $file\n";

  my @lines = <$fh>;
  my $updated = 0;

  foreach my $index (0 .. $#lines)
  {
    if ($lines[$index] =~ /^\.\\"\sDEFINE\s([[:upper:]_\d]+)$/a)
    {
      my $l = $index + 1;
      die "Invalid DEFINE line $l of $file\n" unless defined $lines[$l];

      my $key = $1;
      die "Bad DEFINE key $key line $l of $file\n" unless exists $defs{$key};

      my $value = $defs{$key};
      if ($lines[$index + 1] !~ /^$value\b/)
      {
        $updated += $lines[$index + 1] =~ s/^\d+/$value/a;
        print "Updated $key in $file to $value\n" if $opt_verbose;
      }
    }
  }

  if ($updated > 0)
  {
    seek($fh, 0, 0);
    print $fh @lines;
    truncate($fh, tell($fh));
  }
  close($fh);
}
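
To make the flow of LintMan easier to follow, here is a minimal Python sketch of the same idea: collect integer-valued `#define`s from a config header, then rewrite the leading number on the line that follows each `.\" DEFINE <name>` marker. It is not part of the PR. The function names `parse_defines` and `sync_man_page` are hypothetical, the sample man page text deliberately starts with the stale value 32, and the sketch works on in-memory strings rather than rewriting files in place as the Perl script does.

```python
import re

# Matches lines such as `#define MAX_NAME_SIZE 128` (integer values only).
CONFIG_DEFINE = re.compile(r'^#define ([A-Z_\d]+)\s+(\d+)', re.ASCII)
# Matches marker lines such as `.\" DEFINE MAX_NAME_SIZE`.
MARKER = re.compile(r'^\.\\"\sDEFINE\s([A-Z_\d]+)$', re.ASCII)


def parse_defines(config_text):
    """Collect NAME -> value (as a string) for each integer-valued #define."""
    defs = {}
    for line in config_text.splitlines():
        m = CONFIG_DEFINE.match(line)
        if m:
            defs[m.group(1)] = m.group(2)
    return defs


def sync_man_page(man_text, defs):
    """Return (new_text, n_updated) after syncing the line after each marker."""
    lines = man_text.splitlines()
    updated = 0
    for i, line in enumerate(lines):
        m = MARKER.match(line)
        if not m:
            continue
        key = m.group(1)
        if i + 1 >= len(lines):
            raise ValueError(f"DEFINE on line {i + 1} has no following line")
        if key not in defs:
            raise KeyError(f"unknown DEFINE key {key} on line {i + 1}")
        value = defs[key]
        # Leave the line alone if it already starts with the right number.
        if not re.match(re.escape(value) + r'\b', lines[i + 1]):
            lines[i + 1], n = re.subn(r'^\d+', value, lines[i + 1],
                                      flags=re.ASCII)
            updated += n
    return "\n".join(lines) + "\n", updated


# Hypothetical inputs: the config values match this PR, the page is stale.
config = "#define MAX_NAME_COUNT 10000\n#define MAX_NAME_SIZE 128\n"
page = (
    "defaults to\n"
    '.\\" DEFINE MAX_NAME_SIZE\n'
    "32 code units, and the maximum number of such groups to\n"
    '.\\" DEFINE MAX_NAME_COUNT\n'
    "10000.\n"
)
new_page, count = sync_man_page(page, parse_defines(config))
print(count)
print(new_page, end="")
```

In this sketch a marker whose name is missing from the parsed header raises an error, which mirrors the `die` in the script and is the failure mode discussed in the review thread above.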
4 changes: 4 additions & 0 deletions maint/README
@@ -60,6 +60,10 @@ GenerateUcpTables.py
GenerateCommon.py and Unicode data files. The generated file contains tables
for looking up Unicode property names.

LintMan
A Perl script to check and update magic numbers in the documentation that
correspond to configurable settings in the codebase.

manifest-*
Data files used to verify the contents of the distribution tarball and
`make install` file lists.
7 changes: 7 additions & 0 deletions maint/UpdateAlways
@@ -19,6 +19,8 @@

# Detrail A Perl script that removes trailing spaces from files.

# LintMan A Perl script that lints man pages looking for inconsistencies.

# doc/index.html.src
# A file that is copied as index.html into the doc/html directory
# when the HTML documentation is built. It works like this so that
@@ -54,6 +56,11 @@ echo Processing documentation
perl ../maint/CheckMan *.1 *.3
if [ $? != 0 ] ; then exit 1; fi

if [ -f ../src/config.h.generic ] ; then
perl ../maint/LintMan -v *.3
if [ $? != 0 ] ; then exit 1; fi
fi

# Verify the version number in the man pages

for file in *.1 *.3 ; do
2 changes: 1 addition & 1 deletion vms/configure.com
@@ -905,7 +905,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PCRE2_EXPORT
#define LINK_SIZE 2
#define MAX_NAME_COUNT 10000
-#define MAX_NAME_SIZE 32
+#define MAX_NAME_SIZE 128
#define MATCH_LIMIT 10000000
#define HEAP_LIMIT 20000000
#define NEWLINE_DEFAULT 2