From ce6e6e69d2cd678d454fa73a006032980e83a5ac Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 6 May 2025 07:05:46 -0600
Subject: [PATCH 1/3] perlintro: Define metacharacter before using the term
This adds a bit of text about metacharacters that was missing from this
introductory pod.
---
pod/perlintro.pod | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/pod/perlintro.pod b/pod/perlintro.pod
index 4fdee7b16796..da428397ae7b 100644
--- a/pod/perlintro.pod
+++ b/pod/perlintro.pod
@@ -584,10 +584,32 @@ the meantime, here's a quick cheat sheet:
^ start of string
$ end of string
-Quantifiers can be used to specify how many of the previous thing you
-want to match on, where "thing" means either a literal character, one
-of the metacharacters listed above, or a group of characters or
-metacharacters in parentheses.
+Note that in the above, C<$> doesn't match a dollar sign. Similarly,
+C<.>, C<\>, C<[>, C<]>, C<(>, C<)>, and C<^> don't match the characters
+you might expect. These are called "metacharacters". In contrast, the
+characters C, C, C, C, and C, for example, are not
+metacharacters; they match themselves literally. Metacharacters
+normally match something other than their literal value. There are a few
+more metacharacters than the ones above: the quantifier metacharacters
+are given below, and the full list is in L.
+
+To make a metacharacter match its literal value, you "escape" (or "quote")
+it by preceding it with a backslash. Hence, C<\$> does match a dollar sign,
+and C<\\> matches a literal backslash.
+
+Note also that in the above, the string C<\s>, for example, doesn't match a
+backslash followed by the letter C<s>. In this case, preceding the
+non-metacharacter C<s> with a backslash turns it into something that
+doesn't match its literal value. Such a sequence is called an "escape
+sequence". L documents all of the current ones.
+
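+A warning is raised (when warnings are enabled) if you escape a letter
+that isn't part of a currently defined escape sequence. A backslash
+before any non-alphanumeric character, metacharacter or not, simply
+makes that character match literally.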
+
+You can specify how many of the previous thing you want to match on by
+using quantifiers (where "thing" means one of: a literal character, one
+of the constructs listed above, or a group of either of them in
+parentheses).
* zero or more of the previous thing
+ one or more of the previous thing
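The following is a minimal Perl sketch, not part of the patch, of the behaviors the new perlintro text describes: unescaped "$" and "." act as metacharacters, a backslash makes them literal, "\s" is an escape sequence rather than a backslash followed by "s", and escaping a letter with no defined meaning (here "\y", chosen for illustration) warns, while escaping a non-alphanumeric such as "-" does not. The sample strings and variable names are illustrative; the warning text is assumed to be perl's "Unrecognized escape ... passed through" diagnostic.

```perl
use strict;
use warnings;

# Illustrative cases (hypothetical, not from the patch):
# [ description, subject string, compiled pattern ].
my @cases = (
    [ 'unescaped $ anchors at end of string', 'cost',     qr/t$/    ],
    [ 'escaped \$ matches a dollar sign',     'cost: $5', qr/\$5/   ],
    [ 'unescaped . matches any character',    'a-b',      qr/a.b/   ],
    [ 'escaped \. matches only a dot',        'a-b',      qr/a\.b/  ],
    [ '\s matches whitespace',                'a b',      qr/a\sb/  ],
    [ '\s does not match backslash + s',      'a\sb',     qr/a\sb/  ],
    [ '\\\\s matches backslash + s',          'a\sb',     qr/a\\sb/ ],
);

for my $case (@cases) {
    my ($what, $string, $re) = @$case;
    my $result = $string =~ $re ? 'match' : 'no match';
    print "$what: '$string' => $result\n";
}

# Compile two patterns at run time so that their warnings can be captured.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };
my ($unknown, $hyphen) = ('\y', '\-');
my $y_re      = qr/$unknown/;   # backslash + letter with no meaning: warns
my $hyphen_re = qr/$hyphen/;    # backslash + non-alphanumeric: no warning
print "warnings raised: ", scalar(@warnings), "\n";
print "  $_" for @warnings;
```

In both the warning and the non-warning case the escaped character still matches itself ("y" and "-" respectively).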
From 7a6669488174ee5438a9db1bfe15bfadd52f19f6 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Tue, 6 May 2025 20:57:43 -0600
Subject: [PATCH 2/3] pod and comments: Note escape vs quote
Fixes #15221
The documentation and comments were misleading about conflating quoting
a metacharacter and escaping it. Since \Q stands for quote, we have to
continue to use that terminology. This commit clarifies that the two
terms are often equivalent.
This also adds detail about quotemeta and \Q.
---
pod/perldiag.pod | 8 +++---
pod/perlfunc.pod | 5 ++++
pod/perlre.pod | 61 ++++++++++++++++++++++++++++++-----------
pod/perlrebackslash.pod | 14 +++++-----
pod/perlreref.pod | 2 +-
pod/perlretut.pod | 2 +-
pp.c | 4 +--
7 files changed, 65 insertions(+), 31 deletions(-)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 5cf5fc7b3fde..6c9948f861e9 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2602,8 +2602,8 @@ and perl's F emulation was unable to create an empty temporary file.
(W regexp)(F) A character class range must start and end at a literal
character, not another character class like C<\d> or C<[:alpha:]>. The "-"
in your false range is interpreted as a literal "-". In a C<(?[...])>
-construct, this is an error, rather than a warning. Consider quoting
-the "-", "\-". The S<<-- HERE> shows whereabouts in the regular expression
+construct, this is an error, rather than a warning. Consider escaping
+the "-" as "\-". The S<<-- HERE> shows whereabouts in the regular expression
the problem was discovered. See L.
=item Fatal VMS error (status=%d) at %s, line %d
@@ -5453,7 +5453,7 @@ S<<-- HERE> in m/%s/
(F) Within regular expression character classes ([]) the syntax beginning
with "[." and ending with ".]" is reserved for future extensions. If you
need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[."
+character class, just escape the square brackets with the backslash: "\[."
and ".\]". The S<<-- HERE> shows whereabouts in the regular expression the
problem was discovered. See L.
@@ -5463,7 +5463,7 @@ S<<-- HERE> in m/%s/
(F) Within regular expression character classes ([]) the syntax beginning
with "[=" and ending with "=]" is reserved for future extensions. If you
need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[="
+character class, just escape the square brackets with the backslash: "\[="
and "=\]". The S<<-- HERE> shows whereabouts in the regular expression the
problem was discovered. See L.
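A minimal Perl sketch, not part of the patch, of the two perldiag remedies above: escaping "-" as "\-" inside a character class, and escaping the square brackets so that "[." and ".]" can be matched. The sample strings are illustrative, and the captured warning is assumed to be perl's "False [] range" diagnostic.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Unescaped: "\d-z" cannot be a range, so perl warns ("False [] range")
# and treats the "-" as a literal hyphen anyway. Compiled at run time so
# the warning is captured.
my $unescaped   = '[\d-z]';
my $false_range = qr/^$unescaped$/;

# Escaped as "\-", the hyphen is explicitly literal and no warning is
# raised: the class matches a digit, a hyphen, or "z".
my @class_hits = grep { /^[\d\-z]$/ } qw(5 - z y);

# "[." and ".]" are reserved inside a class, so escape the brackets: this
# class matches "[", ".", or "]". (Sample strings are illustrative.)
my @bracket_hits = grep { /^[\[.\]]+$/ } ('[.]', '[x]', '.');

print "false-range warnings: ", scalar(@warnings), "\n";
print "unescaped class matches '-': ", ('-' =~ $false_range ? 'yes' : 'no'), "\n";
print "escaped class hits: @class_hits\n";
print "bracket class hits: @bracket_hits\n";
```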
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 99f40e54c6d2..388b4b9ee1a4 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6536,6 +6536,11 @@ the C<\Q> escape in double-quoted strings.
If EXPR is omitted, uses L|perlvar/$_>.
+The purpose of this function is to make every character in EXPR match
+itself literally when the result is used in a regular expression.
+Otherwise, any metacharacters in EXPR could trigger their "magic"
+matching behaviors. Characters that this function has been applied to
+are said to be "quoted" or "escaped".
+
quotemeta (and C<\Q> ... C<\E>) are useful when interpolating strings into
regular expressions, because by default an interpolated variable will be
considered a mini-regular expression. For example:
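The new perlfunc paragraph says quotemeta exists so that every character in EXPR matches itself literally. A minimal Perl sketch of that, not taken from the patch (the variable names and sample strings are illustrative), comparing an unquoted interpolation with quotemeta and with the inline \Q...\E form:

```perl
use strict;
use warnings;

# Illustrative input (hypothetical): text we want to find literally.
my $needle = 'a.b*c';
my $quoted = quotemeta $needle;    # returns 'a\.b\*c'; $needle is unchanged
print "$quoted\n";

# Interpolated unquoted, "." matches any character and "b*" means
# "zero or more b", so the pattern also matches unrelated text.
my $unquoted_hit = 'axc' =~ /$needle/ ? 1 : 0;

# Quoted (via quotemeta, or inline with \Q...\E), every character
# matches only itself.
my $quoted_hit  = 'axc' =~ /$quoted/ ? 1 : 0;
my $literal_hit = 'x a.b*c y' =~ /\Q$needle\E/ ? 1 : 0;

(my $replaced = 'x a.b*c y') =~ s/\Q$needle\E/foo/;
print "$unquoted_hit $quoted_hit $literal_hit $replaced\n";
```

quotemeta returns a new string rather than modifying its argument, so its result has to be assigned (or used directly); \Q...\E avoids the extra variable by quoting in place inside the pattern.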
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 3d046ac64f26..b834f9e1423d 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1350,24 +1350,42 @@ X
X
=head2 Quoting metacharacters
-Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
-C<\w>, C<\n>. Unlike some other regular expression languages, there
-are no backslashed symbols that aren't alphanumeric. So anything
-that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
-always
-interpreted as a literal character, not a metacharacter. This was
-once used in a common idiom to disable or quote the special meanings
-of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-"word" characters:
+(Also known as "escaping".)
- $pattern =~ s/(\W)/\\$1/g;
+To cause a metacharacter to match its literal self, you precede it with
+a backslash. Unlike some other regular expression languages, any
+sequence consisting of a backslash followed by a non-alphanumeric
+character matches that character literally. So things like C<\\>, C<\(>,
+C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are always interpreted as the
+literal character that follows the backslash.
+
+(That's not true when an alphanumeric character is preceded by a
+backslash. There are a few such "escape sequences", like C<\w>, which have
+special matching behaviors in Perl. All such are currently limited to
+ASCII-range alphanumerics.)
+
+The best method to escape metacharacters is to use the
+C<quotemeta()> function, or the equivalent (but more flexible, and
+often more convenient) C<\Q> metaquoting escape sequence:
+
+ $pattern = quotemeta $pattern;
+
+This changes C<$pattern> so that the metacharacters are quoted. You can
+then do
+
+ $string =~ s/$pattern/foo/;
-(If C
X
-=head2 Quoting metacharacters
-
-(Also known as "escaping".)
+=head2 Quoting (escaping) metacharacters
To cause a metacharacter to match its literal self, you precede it with
a backslash. Unlike some other regular expression languages, any
@@ -3413,6 +3411,10 @@ Subroutine call to a named capture group. Equivalent to C<< (?&I) >>.
=back
+=head2 Quoting metacharacters
+
+This section has been replaced by L.
+
=head1 BUGS
There are a number of issues with regard to case-insensitive matching