Add NFG entry.
WIP!  THESE COMMITS WILL BE SQUASHED
coke committed Mar 8, 2023
1 parent 95f08d0 commit a14318c
Showing 6 changed files with 34 additions and 6 deletions.
27 changes: 27 additions & 0 deletions doc/Language/NFG.rakudoc
@@ -0,0 +1,27 @@
=begin pod :kind("Language") :subkind("Language") :category("reference")

=TITLE NFG

=SUBTITLE Normal Form Grapheme

=head1 Overview

All Strings in Raku are represented in Normal Form Grapheme, or C<NFG>.
A grapheme is what a B<user> thinks a character is.

my $str = "D\c[COMBINING DOT BELOW]";
say $str.chars; # OUTPUT: «1␤»
say $str.uniname; # OUTPUT: «LATIN CAPITAL LETTER D WITH DOT BELOW␤»
say $str.ord.base(16); # OUTPUT: «1E0C␤»
say "D\c[COMBINING DOT BELOW]" eq "Ḍ"; # OUTPUT: «True␤»

In this example, we create a string by combining two different codepoints.
Raku composes this into the combined codepoint, which you can then introspect.
From Raku's standpoint, there's no difference between a string assembled from
the combined codepoints, or a precomposed character.

=head1 What's a normalization form? - Go through the unicode ones.

=head1 Related classes like Uni

=end pod
2 changes: 1 addition & 1 deletion doc/Language/glossary.rakudoc
@@ -695,7 +695,7 @@ and are subject to L<Multi-Dispatch|#Multi-dispatch>.

=head1 X<NFG|Reference,NFG>

-Normal Form Grapheme is the way Raku implements graphemes, using a normal form
+L<Normal Form Grapheme|language/NFG> is the way Raku implements graphemes, using a normal form
in which strings with the same graphemes can be easily compared in constant
time. More on that on L<these articles|https://6guts.wordpress.com/2015/04/12/this-week-unicode-normalization-many-rts/> L<in 6guts|https://6guts.wordpress.com/2015/04/20/this-week-digging-into-nfg-fixing-use-fatal-and-more/>
and a fun explanation of how NFG works in L<this IRC log|https://colabti.org/irclogger/irclogger_log/perl6?date=2018-04-29#l465>.
2 changes: 1 addition & 1 deletion doc/Language/regexes.rakudoc
@@ -534,7 +534,7 @@ that specific character. For example:
C<\C> matches a single character that is not the named Unicode character.

Note that the word "character" is used, here, in the sense that the UCD
-does, but because Raku uses L<NFG|/language/glossary#NFG>, combining
+does, but because Raku uses L<NFG|/language/NFG>, combining
code points and the base characters to which they are attached,
will generally not match individually. For example if you compose
C<"ü"> as C<"u\x[0308]">, that works just fine, but matching may surprise
2 changes: 1 addition & 1 deletion doc/Language/unicode.rakudoc
@@ -57,7 +57,7 @@ copy. More technical details on L<UTF8-C8|#UTF8-C8> on MoarVM are described belo
X<UTF-8 Clean-8|Reference,UTF-8 Clean-8> is an encoder/decoder that primarily works as the UTF-8 one.
However, upon encountering a byte sequence that will either not decode as valid
UTF-8, or that would not round-trip due to normalization, it will use
-L<NFG synthetics|/language/glossary#NFG>
+L<NFG synthetics|/language/NFG>
to keep track of the original bytes involved.
This means that encoding back to UTF-8 Clean-8 will be able to recreate the
bytes as they originally existed. The synthetics contain 4 codepoints:
6 changes: 3 additions & 3 deletions doc/Type/Unicode.rakudoc
@@ -31,13 +31,13 @@ version.
method NFG(Unicode:)

Returns a L«C<Bool>|/type/Bool» indicating whether complete
-L«C<Normalization Form Grapheme>|/language/glossary#NFG» support is
+L«C<Normalization Form Grapheme>|/language/NFG» support is
available.

-# on MoarVM
+# on Rakudo running on MoarVM
say Unicode.NFG; # OUTPUT: «True␤»

-# on JVM
+# on Rakudo running on JVM
say Unicode.NFG; # OUTPUT: «False␤»

=end pod
1 change: 1 addition & 0 deletions xt/pws/words.pws
@@ -1068,6 +1068,7 @@ precompile
precompiled
precompiles
precompiling
+precomposed
pred
predictiveiterator
preincrement