Commit a14318c

Add NFG entry.
WIP! THESE COMMITS WILL BE SQUASHED
1 parent 95f08d0 commit a14318c

6 files changed: +34 -6 lines changed


doc/Language/NFG.rakudoc

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+=begin pod :kind("Language") :subkind("Language") :category("reference")
+
+=TITLE NFG
+
+=SUBTITLE Normal Form Grapheme
+
+=head1 Overview
+
+All Strings in Raku are represented in Normal Form Grapheme, or C<NFG>.
+A grapheme is what a B<user> thinks a character is.
+
+    my $str = "D\c[COMBINING DOT BELOW]";
+    say $str.chars; # OUTPUT: «1␤»
+    say $str.uniname; # OUTPUT: «LATIN CAPITAL LETTER D WITH DOT BELOW␤»
+    say $str.ord.base(16); # OUTPUT: «1E0C␤»
+    say "D\c[COMBINING DOT BELOW]" eq "Ḍ"; # OUTPUT: «True␤»
+
+In this example, we create a string by combining two different codepoints.
+Raku composes this into the combined codepoint, which you can then introspect.
+From Raku's standpoint, there's no difference between a string assembled from
+the combined codepoints, or a precomposed character.
+
+=head1 What's a normalization form? - Go through the unicode ones.
+
+=head1 Related classes like Uni
+
+=end pod
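
As a rough cross-check of the Overview example in the new file above, the same composition can be sketched outside Raku. This is only an analogy and not part of the commit: it assumes Python's standard `unicodedata` module and uses NFC, since Python has no NFG. For this particular string, canonical composition yields the same single precomposed codepoint that the example reports. The helper name `compose` is hypothetical.

```python
import unicodedata


def compose(text: str) -> str:
    """Return the canonically composed (NFC) form of text.

    NFC is used as a stand-in here; it is not the same thing as Raku's NFG.
    """
    return unicodedata.normalize("NFC", text)


# "D" followed by U+0323 COMBINING DOT BELOW: two codepoints before composing.
decomposed = "D\N{COMBINING DOT BELOW}"
composed = compose(decomposed)

print(len(decomposed))             # codepoint count before composition
print(len(composed))               # codepoint count after composition
print(unicodedata.name(composed))  # Unicode name of the composed codepoint
print(format(ord(composed), "X"))  # codepoint value in hexadecimal
print(composed == "\u1E0C")        # compare against the precomposed character
```

Unlike Raku, Python compares strings codepoint by codepoint, so `decomposed == "\u1E0C"` is false until the string has been normalized explicitly.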

doc/Language/glossary.rakudoc

Lines changed: 1 addition & 1 deletion
@@ -695,7 +695,7 @@ and are subject to L<Multi-Dispatch|#Multi-dispatch>.
 
 =head1 X<NFG|Reference,NFG>
 
-Normal Form Grapheme is the way Raku implements graphemes, using a normal form
+L<Normal Form Grapheme|language/NFG> is the way Raku implements graphemes, using a normal form
 in which strings with the same graphemes can be easily compared in constant
 time. More on that on L<these articles|https://6guts.wordpress.com/2015/04/12/this-week-unicode-normalization-many-rts/> L<in 6guts|https://6guts.wordpress.com/2015/04/20/this-week-digging-into-nfg-fixing-use-fatal-and-more/>
 and a fun explanation of how NFG works in L<this IRC log|https://colabti.org/irclogger/irclogger_log/perl6?date=2018-04-29#l465>.

doc/Language/regexes.rakudoc

Lines changed: 1 addition & 1 deletion
@@ -534,7 +534,7 @@ that specific character. For example:
 C<\C> matches a single character that is not the named Unicode character.
 
 Note that the word "character" is used, here, in the sense that the UCD
-does, but because Raku uses L<NFG|/language/glossary#NFG>, combining
+does, but because Raku uses L<NFG|/language/NFG>, combining
 code points and the base characters to which they are attached,
 will generally not match individually. For example if you compose
 C<"ü"> as C<"u\x[0308]">, that works just fine, but matching may surprise

doc/Language/unicode.rakudoc

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ copy. More technical details on L<UTF8-C8|#UTF8-C8> on MoarVM are described belo
 X<UTF-8 Clean-8|Reference,UTF-8 Clean-8> is an encoder/decoder that primarily works as the UTF-8 one.
 However, upon encountering a byte sequence that will either not decode as valid
 UTF-8, or that would not round-trip due to normalization, it will use
-L<NFG synthetics|/language/glossary#NFG>
+L<NFG synthetics|/language/NFG>
 to keep track of the original bytes involved.
 This means that encoding back to UTF-8 Clean-8 will be able to recreate the
 bytes as they originally existed. The synthetics contain 4 codepoints:

doc/Type/Unicode.rakudoc

Lines changed: 3 additions & 3 deletions
@@ -31,13 +31,13 @@ version.
     method NFG(Unicode:)
 
 Returns a L«C<Bool>|/type/Bool» indicating whether complete
-L«C<Normalization Form Grapheme>|/language/glossary#NFG» support is
+L«C<Normalization Form Grapheme>|/language/NFG» support is
 available.
 
-    # on MoarVM
+    # on Rakudo running on MoarVM
     say Unicode.NFG; # OUTPUT: «True␤»
 
-    # on JVM
+    # on Rakudo running on JVM
     say Unicode.NFG; # OUTPUT: «False␤»
 
 =end pod

xt/pws/words.pws

Lines changed: 1 addition & 0 deletions
@@ -1068,6 +1068,7 @@ precompile
 precompiled
 precompiles
 precompiling
+precomposed
 pred
 predictiveiterator
 preincrement
