Reword discussion of /d regexp modifier.

The phrasing as it stood confused UTF8-flagged strings with “UTF-8 encoded” ones. The latter term should refer to strings that the Perl application has actually encode()d, which probably *won’t* be UTF8-flagged and thus, per the /d modifier rules, won’t get Unicode treatment.

This also removes an incorrect statement that only ASCII characters can match in the absence of UTF-8 encoding (i.e., the UTF8 flag). That is trivially false, given that "\xff" =~ /\xff/ is truthy.

Finally, this reorders and rewords some parts to make clear that new code should avoid this modifier, including by using the 'unicode_strings' feature to avoid selecting it implicitly.
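The claims above can be checked with a short script. The sketch below makes some assumptions. It assumes an ASCII platform. It also assumes a plain script run without `use v5.12` (or later), without `perl -E`, and without a file-scoped `use feature 'unicode_strings'`. Any of those would stop /d from being the default modifier.

```perl
#!/usr/bin/perl
# Sketch only. Assumes an ASCII platform and that nothing at file scope
# enables 'unicode_strings' (no "use v5.12"+, no perl -E), so /d is the
# default character set modifier for the patterns below.
use strict;
use warnings;
use Encode qw(encode);

# 1. A non-ASCII character matches itself even without the UTF8 flag.
my $str = "\xff";
utf8::downgrade($str);                      # make sure the UTF8 flag is off
my $literal = ($str =~ /\xff/) ? 1 : 0;     # 1: matching is not ASCII-only
printf "flag=%d literal=%d\n", (utf8::is_utf8($str) ? 1 : 0), $literal;

# 2. Under /d, whether \w matches "\xDF" depends only on the UTF8 flag.
my $sharp_s = "\xDF";
utf8::downgrade($sharp_s);
my $native = ($sharp_s =~ /^\w/) ? 1 : 0;   # 0: native rules apply
utf8::upgrade($sharp_s);                    # same character, flag now on
my $unicode = ($sharp_s =~ /^\w/) ? 1 : 0;  # 1: Unicode rules apply
printf "native=%d unicode=%d\n", $native, $unicode;

# 3. encode()d output is a byte string without the UTF8 flag,
#    so /d does not give it Unicode treatment.
my $encoded = encode('UTF-8', "\x{DF}");    # the two octets "\xC3\x9F"
printf "encoded_flag=%d octets=%d\n",
    (utf8::is_utf8($encoded) ? 1 : 0), length $encoded;

# 4. Patterns compiled inside the scope of the 'unicode_strings' feature
#    default to /u, so the result no longer depends on the flag.
my $with_feature;
{
    use feature 'unicode_strings';
    my $s = "\xDF";
    utf8::downgrade($s);
    $with_feature = ($s =~ /^\w/) ? 1 : 0;  # 1 even without the flag
}
printf "unicode_strings=%d\n", $with_feature;
```

Expected output:

    flag=0 literal=1
    native=0 unicode=1
    encoded_flag=0 octets=2
    unicode_strings=1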
Perl · Aug 27, 2021 · commit 5fbaad5 (1 parent: 3bbdeca)
Showing 1 changed file with 31 additions and 20 deletions.
diff --git a/pod/perlre.pod b/pod/perlre.pod
@@ -678,18 +678,27 @@ X</u>
 
 =head4 /d
 
+B<IMPORTANT:> Because of the unpredictable behaviors this
+modifier causes, only use it to maintain weird backward compatibilities.
+Use the
+L<< C<unicode_strings>|feature/"The 'unicode_strings' feature" >>
+feature
+in new code to avoid inadvertently enabling this modifier by default.
+
 This modifier means to use the "Default" native rules of the platform
 except when there is cause to use Unicode rules instead, as follows:
 
 =over 4
 
 =item 1
 
-the target string is encoded in UTF-8; or
+the target string's L<UTF8 flag|perlunifaq/What is "the UTF8 flag"?>
+(see below) is set; or
 
 =item 2
 
-the pattern is encoded in UTF-8; or
+the pattern's L<UTF8 flag|perlunifaq/What is "the UTF8 flag"?>
+(see below) is set; or
 
 =item 3
 
@@ -718,30 +727,32 @@ the pattern uses L<C<(*script_run: ...)>|/Script Runs>
 
 =back
 
-Another mnemonic for this modifier is "Depends", as the rules actually
-used depend on various things, and as a result you can get unexpected
-results.  See L<perlunicode/The "Unicode Bug">.  The Unicode Bug has
-become rather infamous, leading to yet other (without swearing) names
-for this modifier, "Dicey" and "Dodgy".
-
-Unless the pattern or string are encoded in UTF-8, only ASCII characters
-can match positively.
+Another mnemonic for this modifier is "Depends", because the rules
+actually used depend on the "UTF8 flag" referenced above. That flag is
+part of Perl's internals, not something Perl applications should have
+to think about, and its state can change whenever Perl sees fit. C</d>
+may thus cause unpredictable results. See L<perlunicode/The "Unicode Bug">.
+This bug has become rather infamous, leading to yet other (without
+swearing) names for this modifier like "Dicey" and "Dodgy".
 
 Here are some examples of how that works on an ASCII platform:
 
- $str =  "\xDF";      # $str is not in UTF-8 format.
- $str =~ /^\w/;       # No match, as $str isn't in UTF-8 format.
- $str .= "\x{0e0b}";  # Now $str is in UTF-8 format.
- $str =~ /^\w/;       # Match! $str is now in UTF-8 format.
+ $str =  "\xDF";        # LATIN SMALL LETTER SHARP S
+ utf8::downgrade($str); # $str is not UTF8-flagged.
+ $str =~ /^\w/;         # No match, since no UTF8 flag.
+
+ $str .= "\x{0e0b}";    # Now $str is UTF8-flagged.
+ $str =~ /^\w/;         # Match! $str is now UTF8-flagged.
  chop $str;
- $str =~ /^\w/;       # Still a match! $str remains in UTF-8 format.
+ $str =~ /^\w/;         # Still a match! $str retains its UTF8 flag.
 
-This modifier is automatically selected by default when none of the
-others are, so yet another name for it is "Default".
+In Perl's default configuration, this modifier is automatically
+selected when none of the others are, so yet another name for it
+(unfortunately) is "Default".
 
-Because of the unexpected behaviors associated with this modifier, you
-probably should only explicitly use it to maintain weird backward
-compatibilities.
+Whenever you can, use the
+L<< C<unicode_strings>|feature/"The 'unicode_strings' feature" >>
+feature to make C</u> the default instead.
 
 =head4 /a (and /aa)