Add NFG entry.

WIP! THESE COMMITS WILL BE SQUASHED
Raku · Mar 8, 2023 · a14318c
1 parent 95f08d0
commit a14318c

Showing 6 changed files with 34 additions and 6 deletions.
diff --git a/doc/Language/NFG.rakudoc b/doc/Language/NFG.rakudoc
@@ -0,0 +1,27 @@
+=begin pod :kind("Language") :subkind("Language") :category("reference")
+
+=TITLE NFG
+
+=SUBTITLE Normal Form Grapheme
+
+=head1 Overview
+
+All Strings in Raku are represented in Normal Form Grapheme, or C<NFG>.
+A grapheme is what a B<user> thinks a character is.
+
+    my $str = "D\c[COMBINING DOT BELOW]";
+    say $str.chars;                        # OUTPUT: «1␤»
+    say $str.uniname;                      # OUTPUT: «LATIN CAPITAL LETTER D WITH DOT BELOW␤»
+    say $str.ord.base(16);                 # OUTPUT: «1E0C␤»
+    say "D\c[COMBINING DOT BELOW]" eq "Ḍ"; # OUTPUT: «True␤»
+
+In this example, we create a string by combining two different codepoints.
+Raku composes this into the combined codepoint, which you can then introspect.
+From Raku's standpoint, there's no difference between a string assembled from
+the combined codepoints, or a precomposed character.
+
+=head1 What's a normalization form? - Go through the unicode ones.
+
+=head1 Related classes like Uni
+
+=end pod
diff --git a/doc/Language/glossary.rakudoc b/doc/Language/glossary.rakudoc
@@ -695,7 +695,7 @@ and are subject to L<Multi-Dispatch|#Multi-dispatch>.
 
 =head1 X<NFG|Reference,NFG>
 
-Normal Form Grapheme is the way Raku implements graphemes, using a normal form
+L<Normal Form Grapheme|language/NFG> is the way Raku implements graphemes, using a normal form
 in which strings with the same graphemes can be easily compared in constant
 time. More on that on L<these articles|https://6guts.wordpress.com/2015/04/12/this-week-unicode-normalization-many-rts/> L<in 6guts|https://6guts.wordpress.com/2015/04/20/this-week-digging-into-nfg-fixing-use-fatal-and-more/>
 and a fun explanation of how NFG works in L<this IRC log|https://colabti.org/irclogger/irclogger_log/perl6?date=2018-04-29#l465>.

diff --git a/doc/Language/regexes.rakudoc b/doc/Language/regexes.rakudoc
@@ -534,7 +534,7 @@ that specific character. For example:
 C<\C> matches a single character that is not the named Unicode character.
 
 Note that the word "character" is used, here, in the sense that the UCD
-does, but because Raku uses L<NFG|/language/glossary#NFG>, combining
+does, but because Raku uses L<NFG|/language/NFG>, combining
 code points and the base characters to which they are attached,
 will generally not match individually. For example if you compose
 C<"ü"> as C<"u\x[0308]">, that works just fine, but matching may surprise

diff --git a/doc/Language/unicode.rakudoc b/doc/Language/unicode.rakudoc
@@ -57,7 +57,7 @@ copy. More technical details on L<UTF8-C8|#UTF8-C8> on MoarVM are described belo
 X<UTF-8 Clean-8|Reference,UTF-8 Clean-8> is an encoder/decoder that primarily works as the UTF-8 one.
 However, upon encountering a byte sequence that will either not decode as valid
 UTF-8, or that would not round-trip due to normalization, it will use
-L<NFG synthetics|/language/glossary#NFG>
+L<NFG synthetics|/language/NFG>
 to keep track of the original bytes involved.
 This means that encoding back to UTF-8 Clean-8 will be able to recreate the
 bytes as they originally existed. The synthetics contain 4 codepoints:

diff --git a/doc/Type/Unicode.rakudoc b/doc/Type/Unicode.rakudoc
@@ -31,13 +31,13 @@ version.
     method NFG(Unicode:)
 
 Returns a L«C<Bool>|/type/Bool» indicating whether complete
-L«C<Normalization Form Grapheme>|/language/glossary#NFG» support is
+L«C<Normalization Form Grapheme>|/language/NFG» support is
 available.
 
-    # on MoarVM
+    # on Rakudo running on MoarVM
     say Unicode.NFG; # OUTPUT: «True␤»
 
-    # on JVM
+    # on Rakudo running on JVM
     say Unicode.NFG; # OUTPUT: «False␤»
 
 =end pod
diff --git a/xt/pws/words.pws b/xt/pws/words.pws
@@ -1068,6 +1068,7 @@ precompile
 precompiled
 precompiles
 precompiling
+precomposed
 pred
 predictiveiterator
 preincrement