Commit 7620cb1

Author: Karl Williamson
Unicode 6.1
This commit delivers the official Unicode character database files for release 6.1, plus the final bits needed to cope with the changes in them from release 6.0, including documentation.
Parent: 1f3b488

56 files changed, 9089 lines added, 2861 lines removed

l1_char_class_tab.h

Lines changed: 2 additions & 2 deletions
@@ -172,7 +172,7 @@
 /* U+A4 CURRENCY SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+A5 YEN SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+A6 BROKEN BAR */ _CC_GRAPH_L1|_CC_PRINT_L1,
-/* U+A7 SECTION SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1,
+/* U+A7 SECTION SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1,
 /* U+A8 DIAERESIS */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+A9 COPYRIGHT SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+AA FEMININE ORDINAL INDICATOR */ _CC_ALNUMC_L1|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1,
@@ -187,7 +187,7 @@
 /* U+B3 SUPERSCRIPT THREE */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+B4 ACUTE ACCENT */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+B5 MICRO SIGN */ _CC_NONLATIN1_FOLD|_CC_ALNUMC_L1|_CC_ALPHA_L1|_CC_CHARNAME_CONT|_CC_GRAPH_L1|_CC_IDFIRST_L1|_CC_LOWER_L1|_CC_PRINT_L1|_CC_WORDCHAR_L1,
-/* U+B6 PILCROW SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1,
+/* U+B6 PILCROW SIGN */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1,
 /* U+B7 MIDDLE DOT */ _CC_GRAPH_L1|_CC_PRINT_L1|_CC_PUNCT_L1,
 /* U+B8 CEDILLA */ _CC_GRAPH_L1|_CC_PRINT_L1,
 /* U+B9 SUPERSCRIPT ONE */ _CC_GRAPH_L1|_CC_PRINT_L1,
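The two changed entries give U+A7 SECTION SIGN and U+B6 PILCROW SIGN the _CC_PUNCT_L1 bit that U+B7 MIDDLE DOT already carried, which appears to track Unicode 6.1's reclassification of these two characters as punctuation. Each entry in the table ORs together class bits for one Latin-1 code point, so a class test is one array lookup plus one mask. The C sketch below models that layout; the flag names and values (CC_GRAPH, CC_PRINT, CC_PUNCT), the helper names, and the partially filled table are hypothetical and are not perl's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class bits, for illustration only; the real _CC_*_L1
 * values in perl's l1_char_class_tab.h are not shown in this diff. */
enum {
    CC_GRAPH = 1u << 0,
    CC_PRINT = 1u << 1,
    CC_PUNCT = 1u << 2
};

/* A 256-entry table indexed directly by Latin-1 code point.  Only some
 * entries visible in the diff are filled in (with their 6.1 values);
 * every other entry is left zero here, which does NOT match reality. */
static const uint8_t l1_class[256] = {
    [0xA4] = CC_GRAPH | CC_PRINT,            /* CURRENCY SIGN */
    [0xA5] = CC_GRAPH | CC_PRINT,            /* YEN SIGN */
    [0xA6] = CC_GRAPH | CC_PRINT,            /* BROKEN BAR */
    [0xA7] = CC_GRAPH | CC_PRINT | CC_PUNCT, /* SECTION SIGN: gained PUNCT */
    [0xA8] = CC_GRAPH | CC_PRINT,            /* DIAERESIS */
    [0xB6] = CC_GRAPH | CC_PRINT | CC_PUNCT, /* PILCROW SIGN: gained PUNCT */
    [0xB7] = CC_GRAPH | CC_PRINT | CC_PUNCT, /* MIDDLE DOT */
    [0xB8] = CC_GRAPH | CC_PRINT,            /* CEDILLA */
};

/* True if any bit of 'flag' is set for Latin-1 code point cp. */
static bool l1_has_class(unsigned cp, unsigned flag)
{
    assert(cp < 256);   /* the table only covers Latin-1 */
    return (l1_class[cp] & flag) != 0;
}

/* Punctuation test built on the table. */
static bool l1_is_punct(unsigned cp)
{
    return l1_has_class(cp, CC_PUNCT);
}
```

With this layout, the Unicode 6.1 update for these two characters amounts to setting one more bit in each of their entries; no code has to change.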

lib/Unicode/UCD.pm

Lines changed: 51 additions & 10 deletions
@@ -2252,20 +2252,56 @@ Devanagari, Gurmukhi, and Oriya scripts.

 The Name_Alias property is of this form. But each scalar consists of two
 components: 1) the name, and 2) the type of alias this is. They are
-separated by a colon and a space. In Unicode 6.0, there are two alias types:
-C<"correction">, which indicates that the name is a corrected form for the
-original name (which remains valid) for the same code point; and C<"control">,
-which adds a new name for a control character.
+separated by a colon and a space. In Unicode 6.1, there are several alias types:
+
+=over
+
+=item C<correction>
+
+indicates that the name is a corrected form for the
+original name (which remains valid) for the same code point.
+
+=item C<control>
+
+adds a new name for a control character.
+
+=item C<alternate>
+
+is an alternate name for a character
+
+=item C<figment>
+
+is a name for a character that has been documented but was never in any
+actual standard.
+
+=item C<abbreviation>
+
+is a common abbreviation for a character
+
+=back
+
+The lists are ordered (roughly) so the most preferred names come before less
+preferred ones.

 For example,

-@aliases_ranges @alias_maps
+@aliases_ranges @alias_maps
+...
+0x009E [ 'PRIVACY MESSAGE: control', 'PM: abbreviation' ]
+0x009F [ 'APPLICATION PROGRAM COMMAND: control',
+'APC: abbreviation'
+]
+0x00A0 'NBSP: abbreviation'
+0x00A1 ""
+0x00AD 'SHY: abbreviation'
+0x00AE ""
+0x01A2 'LATIN CAPITAL LETTER GHA: correction'
+0x01A3 'LATIN SMALL LETTER GHA: correction'
+0x01A4 ""
 ...
-0x01A2 LATIN CAPITAL LETTER GHA: correction
-0x01A3 LATIN SMALL LETTER GHA: correction

-Unicode 6.1 will introduce other types, and some map entries will be lists of
-multiple name-alias pairs for a single code point.
+A map to the empty string means that there is no alias defined for the code
+point.

 =item C<r>

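The @aliases_ranges/@alias_maps pair in the example above is an inversion map: each range start applies to every code point up to the next start, and its map value is a list of "NAME: type" strings, a single such string, or the empty string. The C sketch below models a lookup in that excerpt and splits one entry at its colon-and-space separator. The struct layout, the helper names, and the choice to return NULL for code points outside the excerpt are illustrative assumptions; Unicode::UCD itself is written in Perl, and this is only a model of the data layout. Because Unicode character names contain only uppercase letters, digits, spaces, and hyphens, splitting at the first ": " is unambiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One inversion-map entry (illustrative layout): every code point from
 * 'start' up to, but not including, the next entry's start maps to the
 * same list of "NAME: type" aliases.  count == 0 means no alias. */
struct alias_entry {
    unsigned start;
    size_t count;
    const char *aliases[2];   /* at most two per code point in this excerpt */
};

/* Only the excerpt shown in the POD example; entries before 0x009E and
 * after 0x01A4 were elided there ("...") and are not reproduced here. */
static const struct alias_entry name_alias_excerpt[] = {
    { 0x009E, 2, { "PRIVACY MESSAGE: control", "PM: abbreviation" } },
    { 0x009F, 2, { "APPLICATION PROGRAM COMMAND: control", "APC: abbreviation" } },
    { 0x00A0, 1, { "NBSP: abbreviation" } },
    { 0x00A1, 0, { NULL } },
    { 0x00AD, 1, { "SHY: abbreviation" } },
    { 0x00AE, 0, { NULL } },
    { 0x01A2, 1, { "LATIN CAPITAL LETTER GHA: correction" } },
    { 0x01A3, 1, { "LATIN SMALL LETTER GHA: correction" } },
    { 0x01A4, 0, { NULL } },
};

#define N_EXCERPT (sizeof name_alias_excerpt / sizeof name_alias_excerpt[0])

/* Binary search for the last entry whose start is <= cp.  Returns NULL
 * when cp lies outside what this excerpt can answer. */
static const struct alias_entry *lookup_alias(unsigned cp)
{
    /* invariant: start[lo] <= cp, and hi == N_EXCERPT or start[hi] > cp */
    size_t lo = 0, hi = N_EXCERPT;

    if (cp < name_alias_excerpt[0].start
        || cp > name_alias_excerpt[N_EXCERPT - 1].start)
        return NULL;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (name_alias_excerpt[mid].start <= cp)
            lo = mid;
        else
            hi = mid;
    }
    return &name_alias_excerpt[lo];
}

/* Split a "NAME: type" string at its ": " separator.  Copies the name
 * (truncated to fit) into 'name' and returns a pointer to the type, or
 * NULL if the separator is missing. */
static const char *split_alias(const char *entry, char *name, size_t name_cap)
{
    const char *sep;
    size_t len;

    assert(entry != NULL && name != NULL && name_cap > 0);
    sep = strstr(entry, ": ");
    if (sep == NULL)
        return NULL;
    len = (size_t)(sep - entry);
    if (len >= name_cap)
        len = name_cap - 1;
    memcpy(name, entry, len);
    name[len] = '\0';
    return sep + 2;
}
```

Since the lists are ordered roughly by preference, aliases[0] of a returned entry is the preferred alias when a code point has more than one; for example, U+0080..U+009D falls outside this excerpt, while U+00A5 lands in the range starting at 0x00A1 and has no alias.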
@@ -2409,7 +2445,9 @@ the function L<charnames/charnames::viacode(code)>.

 Note that for control characters (C<Gc=cc>), Unicode's data files have the
 string "C<E<lt>controlE<gt>>", but the real name of each of these characters is the empty
-string. This function returns that real name, the empty string.
+string. This function returns that real name, the empty string. (There are
+names for these characters, but they are aliases, not the real name, and are
+contained in the C<Name_Alias> property.)

 =item C<d>

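To make the distinction concrete: for a control character the real name is empty, so any human-readable label has to come from Name_Alias. The C sketch below picks a label under that rule, using the real name when it is non-empty, otherwise the first alias of type control, otherwise a U+XXXX string. The function name and this fallback order are hypothetical choices for illustration, not a description of what charnames::viacode or Unicode::UCD does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose a printable label for a code point.
 * real_name is the character's Name property (empty for controls);
 * aliases are its Name_Alias entries in "NAME: type" form, most
 * preferred first.  Preference: real name, then the first "control"
 * alias, then "U+XXXX". */
static void display_label(unsigned cp, const char *real_name,
                          const char *const *aliases, size_t n_aliases,
                          char *out, size_t out_cap)
{
    size_t i;

    assert(out != NULL && out_cap > 0);
    if (real_name != NULL && real_name[0] != '\0') {
        snprintf(out, out_cap, "%s", real_name);
        return;
    }
    for (i = 0; i < n_aliases; i++) {
        const char *sep = strstr(aliases[i], ": ");
        if (sep != NULL && strcmp(sep + 2, "control") == 0) {
            /* copy only the NAME part, before the ": " separator */
            snprintf(out, out_cap, "%.*s", (int)(sep - aliases[i]), aliases[i]);
            return;
        }
    }
    snprintf(out, out_cap, "U+%04X", cp);
}
```

A caller that wants the abbreviation instead (for example "PM" for U+009E) would scan for type abbreviation in the same way.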
@@ -3179,6 +3217,9 @@ To convert from new-style to old-style, follow this recipe:
 gets the lower end of the range (0th element) and then looks up the old name
 for its block using C<charblock>).

+Note that starting in Unicode 6.1, many of the block names have shorter
+synonyms. These are always given in the new style.
+
 =head1 BUGS

 Does not yet support EBCDIC platforms.

lib/Unicode/UCD.t

Lines changed: 2 additions & 2 deletions
@@ -342,7 +342,7 @@ is($bt->{AL}, 'Right-to-Left Arabic', 'AL is Right-to-Left Arabic');

 # If this fails, then maybe one should look at the Unicode changes to see
 # what else might need to be updated.
-is(Unicode::UCD::UnicodeVersion, '6.0.0', 'UnicodeVersion');
+is(Unicode::UCD::UnicodeVersion, '6.1.0', 'UnicodeVersion');

 use Unicode::UCD qw(compexcl);

@@ -470,7 +470,7 @@ is(Unicode::UCD::_getcode('U+123x'), undef, "_getcode(x123)");
 {
 my $r1 = charscript('Latin');
 my $n1 = @$r1;
-is($n1, 30, "number of ranges in Latin script (Unicode 6.0.0)");
+is($n1, 30, "number of ranges in Latin script (Unicode 6.1.0)");
 shift @$r1 while @$r1;
 my $r2 = charscript('Latin');
 is(@$r2, $n1, "modifying results should not mess up internal caches");
