Add reference to Whitespace (#4334)

tbrowder · web-flow · commit 84d30c3c5c1e · 2023-07-30T14:36:36.000-04:00
+ Grammar tweaks
+ Add more examples
+ Align examples of Unicode naming corrections
  for clarity
  - Use comments to distinguish correct versus
    incorrect naming versions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -77,7 +77,7 @@ channel and/or the [issues for this repository](https://github.com/Raku/doc/issu
 before you proceed further. After you get consensus on a title, subtitle,
 section, and filename, you can add the document by following these steps:
 
-+ create a **filename.pod6** file in the **doc/Language** directory and
++ create a **filename.rakudoc** file in the **doc/Language** directory and
   ensure it adheres to the conventions in
   [CREATING-NEW-DOCS.md](writing-docs/CREATING-NEW-DOCS.md).
 
@@ -97,9 +97,9 @@ with the helper tool `util/new-type.raku`. Say you want to create `MyFunnyRole`:
 
     $ raku util/new-type.raku --kind=role MyFunnyRole
 
-Fill the documentation file `doc/Type/MyFunnyRole.pod6` like this:
+Fill the documentation file `doc/Type/MyFunnyRole.rakudoc` like this:
 
-```perl6
+```raku
 =TITLE role MyFunnyRole
 
 =SUBTITLE Sentence or half-sentence about what it does
@@ -131,7 +131,7 @@ comment `Z<>`.
 
 When providing a code example result or output, use this style:
 
-```perl6
+```raku
 # For the result of an expression.
 1 + 2;     # RESULT: «3»
 # For the output.
@@ -182,7 +182,7 @@ to display heading numbers.
 
 Report issues with the content on [github](https://github.com/Raku/doc/issues).
 This includes missing or incorrect documentation, as well as information about
-versioning (e.g. "method foo" only available in raku v6.d).
+versioning (e.g., "method foo" only available in raku v6.d).
 
 For issues with the website functionality (as opposed to the content), for
 examples issues with search,
diff --git a/doc/Language/regexes.rakudoc b/doc/Language/regexes.rakudoc
@@ -738,8 +738,8 @@ write the backslashed forms for character classes between the C<[ ]>.
 
 You can include Unicode properties in the list as well:
 
-    /<:Zs + [\x9] - [\xA0]>/
-    # Any character with "Zs" property, or a tab, but not a "no-break space"
+    /<:Zs + [\x9] - [\xA0] - [\x202F] >/
+    # Any character with "Zs" property, or a tab, but not a "no-break space" or "narrow no-break space"
 
 To negate a character class, put a C<-> after the opening angle bracket:
 
diff --git a/doc/Language/unicode.rakudoc b/doc/Language/unicode.rakudoc
@@ -35,21 +35,21 @@ Raku will turn both these inputs into one codepoint, as is specified for
 Normalization Form C (B<X<NFC|Reference,NFC>>). In most cases this is useful and means
 that two inputs that are equivalent are both treated the same. Unicode has a concept
 of canonical equivalence which allows us to determine the canonical form of a string,
-allowing us to properly compare strings and manipulate them, without having to worry
+thus allowing us to properly compare strings and manipulate them without having to worry
 about the text losing these properties. By default, any text you process or output
 from Raku will be in this “canonical” form, even when making modifications or
 concatenations to the string (see below for how to avoid this). For more detailed information
 about Normalization Form C and canonical equivalence, see the Unicode Foundation's page on
 L<Normalization and Canonical Equivalence|https://unicode.org/reports/tr15/#Canon_Compat_Equivalence>.
 
-One case where we don't default to this, is for the names of files. This is because
+One case where we don't default to this is for the names of files. This is because
 the names of files must be accessed exactly as the bytes are written on the disk.
 
 To avoid normalization you can use a special encoding format called L<UTF8-C8|#UTF8-C8>.
 Using this encoding with any filehandle will allow you to read the exact bytes as they are
-on disk, without normalization. They may look funny when printed out, if you print it out using a
+on disk without normalization. They may look funny when printed out if you use a
 UTF8 handle. If you print it out to a handle where the output encoding is UTF8-C8,
-then it will render as you would normally expect, and be a byte for byte exact
+then it will render as you would normally expect and be a byte-for-byte exact
 copy. More technical details on L<UTF8-C8|#UTF8-C8> on MoarVM are described below.
 
 =head2 X<UTF8-C8|Reference,UTF8-C8>
@@ -60,7 +60,7 @@ UTF-8, or that would not round-trip due to normalization, it will use
 L<NFG synthetics|/language/glossary#NFG>
 to keep track of the original bytes involved.
 This means that encoding back to UTF-8 Clean-8 will be able to recreate the
-bytes as they originally existed. The synthetics contain 4 codepoints:
+bytes as they originally existed. The synthetics contain four codepoints:
 
 =item The codepoint 0x10FFFD (which is a private use codepoint)
 =item The codepoint 'x'
@@ -71,13 +71,13 @@ Under normal UTF-8 encoding, this means the unrepresentable characters will
 come out as something like C<?xFF>.
 
 UTF-8 Clean-8 is used in places where MoarVM receives strings from the
-environment, command line arguments, and filesystem queries, for instance when decoding buffers:
+environment, command line arguments, and filesystem queries, for instance, when decoding buffers:
 
     say Buf.new(ord('A'), 0xFE, ord('Z')).decode('utf8-c8');
     #  OUTPUT: «A􏿽xFEZ␤»
 
-You can see how the two initial codepoints used by UTF8-C8 show up here, right
-before the "FE". You can use this type of encoding to read files with unknown
+You can see how the two initial codepoints used by UTF8-C8 show up above, right
+before the 'FE'. You can use this type of encoding to read files with unknown
 encoding:
 
     my $test-file = "/tmp/test";
@@ -126,13 +126,13 @@ the L<uniparse|/routine/uniparse>:
 
 See L<uniname|/routine/uniname> and L<uninames|/routine/uninames> for routines
 that work in the opposite direction with a single codepoint and multiple
-codepoints respectively.
+codepoints, respectively.
 
 =head2 Name aliases
 
 Name Aliases are used mainly for codepoints without an official
 name, for abbreviations, or for corrections (Unicode names never change).
-For full list of them see L<here|https://www.unicode.org/Public/UCD/latest/ucd/NameAliases.txt>.
+For a full list of them see L<here|https://www.unicode.org/Public/UCD/latest/ucd/NameAliases.txt>.
 
 Control codes without any official name:
 
@@ -141,16 +141,22 @@ Control codes without any official name:
 
 Corrections:
 
-    say "\c[LATIN CAPITAL LETTER GHA]"; # OUTPUT: «Ƣ␤»
+    #   Correct name as input:
+    say                     "\c[LATIN CAPITAL LETTER GHA]"; # OUTPUT: «Ƣ␤»
+    #   Original, erroneous name as output:
     say "Ƣ".uniname; # OUTPUT: «LATIN CAPITAL LETTER OI␤»
+
     # This one is a spelling mistake that was corrected in a Name Alias:
-    say "\c[PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET]".uniname;
+    #   Correct name as input:
+    say    "\c[PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET]".uniname;
+    #   Original, erroneous name as output:
     # OUTPUT: «PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET␤»
 
 Abbreviations:
 
-    say "\c[ZWJ]".uniname;  # OUTPUT: «ZERO WIDTH JOINER␤»
-    say "\c[NBSP]".uniname; # OUTPUT: «NO-BREAK SPACE␤»
+    say "\c[ZWJ]".uniname;   # OUTPUT: «ZERO WIDTH JOINER␤»
+    say "\c[NBSP]".uniname;  # OUTPUT: «NO-BREAK SPACE␤»
+    say "\c[NNBSP]".uniname; # OUTPUT: «NARROW NO-BREAK SPACE␤»
 
 =head2 Named sequences
 
@@ -166,7 +172,7 @@ Raku supports Emoji sequences.
 For all of them see:
 L<Emoji ZWJ Sequences|https://www.unicode.org/Public/emoji/4.0/emoji-zwj-sequences.txt>
 and L<Emoji Sequences|https://www.unicode.org/Public/emoji/4.0/emoji-sequences.txt>.
-Note that any names with commas should have their commas removed, since Raku uses
+Note that any names with commas should have their commas removed since Raku uses
 commas to separate different codepoints/sequences inside the same C<\c> sequence.
 
     say "\c[woman gesturing OK]";         # OUTPUT: «🙆‍♀️␤»
diff --git a/doc/Language/unicode_ascii.rakudoc b/doc/Language/unicode_ascii.rakudoc
@@ -31,7 +31,7 @@ Any codepoint that has the C<Nd> (Number, decimal digit) property, can
 be used as a digit in any number.  For example:
 
   my $var = １９; # U+FF11 U+FF19
-  say $var + 2;  # OUTPUT: «21␤»
+  say $var + 2;   # OUTPUT: «21␤»
 
 =head1 Numeric values
 
@@ -40,14 +40,20 @@ property can be used standalone as a numeric value, such as ½ and ⅓. (These
 aren't decimal digit characters, so can't be combined.) For example:
 
   my $var = ⅒ + 2 + Ⅻ; # here ⅒ is No and Rat and Ⅻ is Nl and Int
-  say $var;            # OUTPUT: «14.1␤»
+  say $var;              # OUTPUT: «14.1␤»
 
-=head1 Whitespace characters
+=head1 X<Whitespace characters|Language,Whitespace>
 
-Besides spaces and tabs you can use any other unicode whitespace
+Besides spaces and tabs, you can use any other Unicode whitespace
 character that has the C<Zs> (Separator, space), C<Zl> (Separator,
 line), or C<Zp> (Separator, paragraph) property.
 
+See Wikipedia's L<Whitespace|https://en.m.wikipedia.org/wiki/Whitespace_character>
+article for detailed
+tables of the Unicode codepoints with (or associated with)
+whitespace characteristics. These tables are especially useful to authors
+of Raku digital typography modules for print or web use.
+
 =head1 Other acceptable single codepoints
 
 This list contains the single codepoints [and their ASCII