
* Since some mappings are in Unicode but not in the range of Big5/GBK, Unicode
  mappings (trad-simp and simp-trad) are maintained separately now.

* Preserve unicode character as-is if it is not in the mapping table trad-simp
  and simp-trad.
1 parent 963f876 commit f0f96b6d24730d02edcafa4ea155df9be0cd986b @kcwu kcwu committed Dec 11, 2006
Showing with 905 additions and 5,302 deletions.
  1. +8 −0 Changes
  2. +1 −0 MANIFEST
  3. +867 −0 map/DerivedAge.txt
  4. +0 −63 map/b2g_map.utf8
  5. +0 −5,052 map/g2b_map.utf8
  6. +29 −187 map/umap2ucm.pl
8 Changes
@@ -5,6 +5,14 @@
* Fix a bug that b2g.pl and g2b.pl with -u flag won't set encoding correct if
providing file name in command line.
+* Fix lots of questionable pairs in b2g_map.txt.
+
+* Since some mapping in unicode and not in the range of big5/gbk, unicode
+ mappings (trad-simp and simp-trad) are maintained seperately now.
+
+* Preserve unicode character as-is if it is not in the mapping table trad-simp
+ and simp-trad.
+
____________________________________________________________________________
[ 10742] By: autrijus on 2004/06/04 07:15:15
Log: * This be 0.31.
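
The "preserve as-is" entry in the Changes hunk above can be illustrated with a minimal Python sketch. This is not the module's Perl implementation: the `TRAD_TO_SIMP` table and the `convert` helper are hypothetical names invented for illustration, and the two table entries are a tiny stand-in for the real map files.

```python
# Hypothetical two-entry trad-to-simp table; the real tables live under map/.
TRAD_TO_SIMP = {
    "\u570b": "\u56fd",  # U+570B -> U+56FD
    "\u5b78": "\u5b66",  # U+5B78 -> U+5B66
}


def convert(text, table):
    """Map each character through `table`; characters without an entry pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # The ASCII letters and the space have no table entry, so they are preserved.
    print(convert("\u570b\u5b78 abc", TRAD_TO_SIMP))
```

The lookup-with-default (`table.get(ch, ch)`) is what keeps unmapped characters intact instead of dropping or replacing them.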
1 MANIFEST
@@ -1,5 +1,6 @@
bin/b2g.pl Convert from Big5 to GBK (CP936)
bin/g2b.pl Convert from GBK (CP936) to Big5
+map/DerivedAge.txt Unicode code points - Not Installed
map/b2g_map.txt Big5 to GBK Map - Not Installed
map/g2b_map.txt GBK to Big5 Map - Not Installed
map/b2g_map.utf8 Trad to Simp Map - Not Installed
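
The `map/DerivedAge.txt` file listed in the MANIFEST uses the Unicode Character Database layout: a code point or `START..END` range, a semicolon, the version in which it was first assigned, and a `#` comment. Below is a rough Python sketch of reading that layout. How `map/umap2ucm.pl` actually consumes the file is not shown here, so the function names `parse_derived_age` and `age_of` are assumptions, and the sketch expects raw file lines without the diff's leading `+`.

```python
import re

# Matches "XXXX ; age" or "XXXX..YYYY ; age"; the trailing "# ..." comment is ignored.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\d+\.\d+)")


def parse_derived_age(lines):
    """Return a list of (first, last, age) tuples from DerivedAge-style lines."""
    ranges = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank line, comment, or anything not in the expected shape
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, m.group(3)))
    return ranges


def age_of(code_point, ranges):
    """Return the version that first assigned `code_point`, or "unassigned" if it is not listed."""
    for first, last, age in ranges:
        if first <= code_point <= last:
            return age
    return "unassigned"


# A few lines copied from the file in this commit.
SAMPLE = [
    "# Assigned as of Unicode 1.1.0 (June, 1993)",
    "4E00..9FA5    ; 1.1 # [20902] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FA5",
    "20AC          ; 2.1 #       EURO SIGN",
    "100000..10FFFD; 2.0 # [65534] <private-use-100000>..<private-use-10FFFD>",
]

if __name__ == "__main__":
    ranges = parse_derived_age(SAMPLE)
    print(age_of(0x4E2D, ranges), age_of(0x20AC, ranges), age_of(0x0378, ranges))
```

The `"unassigned"` fallback mirrors the file's own `@missing` default. A linear scan is enough for a sketch; a sorted list with binary search would suit the full file better.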
867 map/DerivedAge.txt
@@ -0,0 +1,867 @@
+# DerivedAge-5.0.0.txt
+# Date: 2006-07-14, 17:25:00 PST [MD/KW]
+#
+# Unicode Character Database
+# Copyright (c) 1991-2006 Unicode, Inc.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
+# For documentation, see UCD.html
+#
+# Unicode Character Database: Derived Property Data
+# This file shows when various code points were first assigned in Unicode.
+#
+# Caution: When using the Age *property*, all assigned code points
+# in each version are included, not just the newly assigned code points.
+# For more information, see http://www.unicode.org/reports/tr18/
+#
+# Notes:
+#
+# - The term 'assigned' means that a previously reserved code point was assigned
+# to be a character (graphic, format, control, or private-use);
+# a noncharacter code point; or a surrogate code point.
+# For more information, see The Unicode Standard Section 2.4
+#
+# - Versions are only tracked from 1.1 onwards, since version 1.0
+# predated changes required by the ISO 10646 merger.
+#
+# - The Hangul Syllables that were removed from 2.0 are not included in the 1.1 listing.
+#
+# - The supplementary private use code points and the non-character code points
+# were assigned in version 2.0, but not specifically listed in the UCD
+# until versions 3.0 and 3.1 respectively.
+#
+# - Contiguous ranges are broken into separate lines where they would cross code point
+# types: graphic, format, control, private-use, surrogate, noncharacter
+#
+# For details on the contents of each version, see
+# http://www.unicode.org/versions/enumeratedversions.html.
+
+# ================================================
+
+# Property: Age
+
+# All code points not explicitly listed for Age
+# have the value unassigned.
+
+# @missing: 0000..10FFFF; unassigned
+
+# ================================================
+
+# Assigned as of Unicode 1.1.0 (June, 1993)
+# [excluding removed Hangul Syllables]
+
+0000..001F ; 1.1 # [32] <control-0000>..<control-001F>
+0020..007E ; 1.1 # [95] SPACE..TILDE
+007F..009F ; 1.1 # [33] <control-007F>..<control-009F>
+00A0..00AC ; 1.1 # [13] NO-BREAK SPACE..NOT SIGN
+00AD ; 1.1 # SOFT HYPHEN
+00AE..01F5 ; 1.1 # [328] REGISTERED SIGN..LATIN SMALL LETTER G WITH ACUTE
+01FA..0217 ; 1.1 # [30] LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE..LATIN SMALL LETTER U WITH INVERTED BREVE
+0250..02A8 ; 1.1 # [89] LATIN SMALL LETTER TURNED A..LATIN SMALL LETTER TC DIGRAPH WITH CURL
+02B0..02DE ; 1.1 # [47] MODIFIER LETTER SMALL H..MODIFIER LETTER RHOTIC HOOK
+02E0..02E9 ; 1.1 # [10] MODIFIER LETTER SMALL GAMMA..MODIFIER LETTER EXTRA-LOW TONE BAR
+0300..0345 ; 1.1 # [70] COMBINING GRAVE ACCENT..COMBINING GREEK YPOGEGRAMMENI
+0360..0361 ; 1.1 # [2] COMBINING DOUBLE TILDE..COMBINING DOUBLE INVERTED BREVE
+0374..0375 ; 1.1 # [2] GREEK NUMERAL SIGN..GREEK LOWER NUMERAL SIGN
+037A ; 1.1 # GREEK YPOGEGRAMMENI
+037E ; 1.1 # GREEK QUESTION MARK
+0384..038A ; 1.1 # [7] GREEK TONOS..GREEK CAPITAL LETTER IOTA WITH TONOS
+038C ; 1.1 # GREEK CAPITAL LETTER OMICRON WITH TONOS
+038E..03A1 ; 1.1 # [20] GREEK CAPITAL LETTER UPSILON WITH TONOS..GREEK CAPITAL LETTER RHO
+03A3..03CE ; 1.1 # [44] GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER OMEGA WITH TONOS
+03D0..03D6 ; 1.1 # [7] GREEK BETA SYMBOL..GREEK PI SYMBOL
+03DA ; 1.1 # GREEK LETTER STIGMA
+03DC ; 1.1 # GREEK LETTER DIGAMMA
+03DE ; 1.1 # GREEK LETTER KOPPA
+03E0 ; 1.1 # GREEK LETTER SAMPI
+03E2..03F3 ; 1.1 # [18] COPTIC CAPITAL LETTER SHEI..GREEK LETTER YOT
+0401..040C ; 1.1 # [12] CYRILLIC CAPITAL LETTER IO..CYRILLIC CAPITAL LETTER KJE
+040E..044F ; 1.1 # [66] CYRILLIC CAPITAL LETTER SHORT U..CYRILLIC SMALL LETTER YA
+0451..045C ; 1.1 # [12] CYRILLIC SMALL LETTER IO..CYRILLIC SMALL LETTER KJE
+045E..0486 ; 1.1 # [41] CYRILLIC SMALL LETTER SHORT U..COMBINING CYRILLIC PSILI PNEUMATA
+0490..04C4 ; 1.1 # [53] CYRILLIC CAPITAL LETTER GHE WITH UPTURN..CYRILLIC SMALL LETTER KA WITH HOOK
+04C7..04C8 ; 1.1 # [2] CYRILLIC CAPITAL LETTER EN WITH HOOK..CYRILLIC SMALL LETTER EN WITH HOOK
+04CB..04CC ; 1.1 # [2] CYRILLIC CAPITAL LETTER KHAKASSIAN CHE..CYRILLIC SMALL LETTER KHAKASSIAN CHE
+04D0..04EB ; 1.1 # [28] CYRILLIC CAPITAL LETTER A WITH BREVE..CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS
+04EE..04F5 ; 1.1 # [8] CYRILLIC CAPITAL LETTER U WITH MACRON..CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+04F8..04F9 ; 1.1 # [2] CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS..CYRILLIC SMALL LETTER YERU WITH DIAERESIS
+0531..0556 ; 1.1 # [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH
+0559..055F ; 1.1 # [7] ARMENIAN MODIFIER LETTER LEFT HALF RING..ARMENIAN ABBREVIATION MARK
+0561..0587 ; 1.1 # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
+0589 ; 1.1 # ARMENIAN FULL STOP
+05B0..05B9 ; 1.1 # [10] HEBREW POINT SHEVA..HEBREW POINT HOLAM
+05BB..05C3 ; 1.1 # [9] HEBREW POINT QUBUTS..HEBREW PUNCTUATION SOF PASUQ
+05D0..05EA ; 1.1 # [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
+05F0..05F4 ; 1.1 # [5] HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW PUNCTUATION GERSHAYIM
+060C ; 1.1 # ARABIC COMMA
+061B ; 1.1 # ARABIC SEMICOLON
+061F ; 1.1 # ARABIC QUESTION MARK
+0621..063A ; 1.1 # [26] ARABIC LETTER HAMZA..ARABIC LETTER GHAIN
+0640..0652 ; 1.1 # [19] ARABIC TATWEEL..ARABIC SUKUN
+0660..066D ; 1.1 # [14] ARABIC-INDIC DIGIT ZERO..ARABIC FIVE POINTED STAR
+0670..06B7 ; 1.1 # [72] ARABIC LETTER SUPERSCRIPT ALEF..ARABIC LETTER LAM WITH THREE DOTS ABOVE
+06BA..06BE ; 1.1 # [5] ARABIC LETTER NOON GHUNNA..ARABIC LETTER HEH DOACHASHMEE
+06C0..06CE ; 1.1 # [15] ARABIC LETTER HEH WITH YEH ABOVE..ARABIC LETTER YEH WITH SMALL V
+06D0..06DC ; 1.1 # [13] ARABIC LETTER E..ARABIC SMALL HIGH SEEN
+06DD ; 1.1 # ARABIC END OF AYAH
+06DE..06ED ; 1.1 # [16] ARABIC START OF RUB EL HIZB..ARABIC SMALL LOW MEEM
+06F0..06F9 ; 1.1 # [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
+0901..0903 ; 1.1 # [3] DEVANAGARI SIGN CANDRABINDU..DEVANAGARI SIGN VISARGA
+0905..0939 ; 1.1 # [53] DEVANAGARI LETTER A..DEVANAGARI LETTER HA
+093C..094D ; 1.1 # [18] DEVANAGARI SIGN NUKTA..DEVANAGARI SIGN VIRAMA
+0950..0954 ; 1.1 # [5] DEVANAGARI OM..DEVANAGARI ACUTE ACCENT
+0958..0970 ; 1.1 # [25] DEVANAGARI LETTER QA..DEVANAGARI ABBREVIATION SIGN
+0981..0983 ; 1.1 # [3] BENGALI SIGN CANDRABINDU..BENGALI SIGN VISARGA
+0985..098C ; 1.1 # [8] BENGALI LETTER A..BENGALI LETTER VOCALIC L
+098F..0990 ; 1.1 # [2] BENGALI LETTER E..BENGALI LETTER AI
+0993..09A8 ; 1.1 # [22] BENGALI LETTER O..BENGALI LETTER NA
+09AA..09B0 ; 1.1 # [7] BENGALI LETTER PA..BENGALI LETTER RA
+09B2 ; 1.1 # BENGALI LETTER LA
+09B6..09B9 ; 1.1 # [4] BENGALI LETTER SHA..BENGALI LETTER HA
+09BC ; 1.1 # BENGALI SIGN NUKTA
+09BE..09C4 ; 1.1 # [7] BENGALI VOWEL SIGN AA..BENGALI VOWEL SIGN VOCALIC RR
+09C7..09C8 ; 1.1 # [2] BENGALI VOWEL SIGN E..BENGALI VOWEL SIGN AI
+09CB..09CD ; 1.1 # [3] BENGALI VOWEL SIGN O..BENGALI SIGN VIRAMA
+09D7 ; 1.1 # BENGALI AU LENGTH MARK
+09DC..09DD ; 1.1 # [2] BENGALI LETTER RRA..BENGALI LETTER RHA
+09DF..09E3 ; 1.1 # [5] BENGALI LETTER YYA..BENGALI VOWEL SIGN VOCALIC LL
+09E6..09FA ; 1.1 # [21] BENGALI DIGIT ZERO..BENGALI ISSHAR
+0A02 ; 1.1 # GURMUKHI SIGN BINDI
+0A05..0A0A ; 1.1 # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
+0A0F..0A10 ; 1.1 # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI
+0A13..0A28 ; 1.1 # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA
+0A2A..0A30 ; 1.1 # [7] GURMUKHI LETTER PA..GURMUKHI LETTER RA
+0A32..0A33 ; 1.1 # [2] GURMUKHI LETTER LA..GURMUKHI LETTER LLA
+0A35..0A36 ; 1.1 # [2] GURMUKHI LETTER VA..GURMUKHI LETTER SHA
+0A38..0A39 ; 1.1 # [2] GURMUKHI LETTER SA..GURMUKHI LETTER HA
+0A3C ; 1.1 # GURMUKHI SIGN NUKTA
+0A3E..0A42 ; 1.1 # [5] GURMUKHI VOWEL SIGN AA..GURMUKHI VOWEL SIGN UU
+0A47..0A48 ; 1.1 # [2] GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN AI
+0A4B..0A4D ; 1.1 # [3] GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA
+0A59..0A5C ; 1.1 # [4] GURMUKHI LETTER KHHA..GURMUKHI LETTER RRA
+0A5E ; 1.1 # GURMUKHI LETTER FA
+0A66..0A74 ; 1.1 # [15] GURMUKHI DIGIT ZERO..GURMUKHI EK ONKAR
+0A81..0A83 ; 1.1 # [3] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN VISARGA
+0A85..0A8B ; 1.1 # [7] GUJARATI LETTER A..GUJARATI LETTER VOCALIC R
+0A8D ; 1.1 # GUJARATI VOWEL CANDRA E
+0A8F..0A91 ; 1.1 # [3] GUJARATI LETTER E..GUJARATI VOWEL CANDRA O
+0A93..0AA8 ; 1.1 # [22] GUJARATI LETTER O..GUJARATI LETTER NA
+0AAA..0AB0 ; 1.1 # [7] GUJARATI LETTER PA..GUJARATI LETTER RA
+0AB2..0AB3 ; 1.1 # [2] GUJARATI LETTER LA..GUJARATI LETTER LLA
+0AB5..0AB9 ; 1.1 # [5] GUJARATI LETTER VA..GUJARATI LETTER HA
+0ABC..0AC5 ; 1.1 # [10] GUJARATI SIGN NUKTA..GUJARATI VOWEL SIGN CANDRA E
+0AC7..0AC9 ; 1.1 # [3] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN CANDRA O
+0ACB..0ACD ; 1.1 # [3] GUJARATI VOWEL SIGN O..GUJARATI SIGN VIRAMA
+0AD0 ; 1.1 # GUJARATI OM
+0AE0 ; 1.1 # GUJARATI LETTER VOCALIC RR
+0AE6..0AEF ; 1.1 # [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
+0B01..0B03 ; 1.1 # [3] ORIYA SIGN CANDRABINDU..ORIYA SIGN VISARGA
+0B05..0B0C ; 1.1 # [8] ORIYA LETTER A..ORIYA LETTER VOCALIC L
+0B0F..0B10 ; 1.1 # [2] ORIYA LETTER E..ORIYA LETTER AI
+0B13..0B28 ; 1.1 # [22] ORIYA LETTER O..ORIYA LETTER NA
+0B2A..0B30 ; 1.1 # [7] ORIYA LETTER PA..ORIYA LETTER RA
+0B32..0B33 ; 1.1 # [2] ORIYA LETTER LA..ORIYA LETTER LLA
+0B36..0B39 ; 1.1 # [4] ORIYA LETTER SHA..ORIYA LETTER HA
+0B3C..0B43 ; 1.1 # [8] ORIYA SIGN NUKTA..ORIYA VOWEL SIGN VOCALIC R
+0B47..0B48 ; 1.1 # [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
+0B4B..0B4D ; 1.1 # [3] ORIYA VOWEL SIGN O..ORIYA SIGN VIRAMA
+0B56..0B57 ; 1.1 # [2] ORIYA AI LENGTH MARK..ORIYA AU LENGTH MARK
+0B5C..0B5D ; 1.1 # [2] ORIYA LETTER RRA..ORIYA LETTER RHA
+0B5F..0B61 ; 1.1 # [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
+0B66..0B70 ; 1.1 # [11] ORIYA DIGIT ZERO..ORIYA ISSHAR
+0B82..0B83 ; 1.1 # [2] TAMIL SIGN ANUSVARA..TAMIL SIGN VISARGA
+0B85..0B8A ; 1.1 # [6] TAMIL LETTER A..TAMIL LETTER UU
+0B8E..0B90 ; 1.1 # [3] TAMIL LETTER E..TAMIL LETTER AI
+0B92..0B95 ; 1.1 # [4] TAMIL LETTER O..TAMIL LETTER KA
+0B99..0B9A ; 1.1 # [2] TAMIL LETTER NGA..TAMIL LETTER CA
+0B9C ; 1.1 # TAMIL LETTER JA
+0B9E..0B9F ; 1.1 # [2] TAMIL LETTER NYA..TAMIL LETTER TTA
+0BA3..0BA4 ; 1.1 # [2] TAMIL LETTER NNA..TAMIL LETTER TA
+0BA8..0BAA ; 1.1 # [3] TAMIL LETTER NA..TAMIL LETTER PA
+0BAE..0BB5 ; 1.1 # [8] TAMIL LETTER MA..TAMIL LETTER VA
+0BB7..0BB9 ; 1.1 # [3] TAMIL LETTER SSA..TAMIL LETTER HA
+0BBE..0BC2 ; 1.1 # [5] TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN UU
+0BC6..0BC8 ; 1.1 # [3] TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN AI
+0BCA..0BCD ; 1.1 # [4] TAMIL VOWEL SIGN O..TAMIL SIGN VIRAMA
+0BD7 ; 1.1 # TAMIL AU LENGTH MARK
+0BE7..0BF2 ; 1.1 # [12] TAMIL DIGIT ONE..TAMIL NUMBER ONE THOUSAND
+0C01..0C03 ; 1.1 # [3] TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
+0C05..0C0C ; 1.1 # [8] TELUGU LETTER A..TELUGU LETTER VOCALIC L
+0C0E..0C10 ; 1.1 # [3] TELUGU LETTER E..TELUGU LETTER AI
+0C12..0C28 ; 1.1 # [23] TELUGU LETTER O..TELUGU LETTER NA
+0C2A..0C33 ; 1.1 # [10] TELUGU LETTER PA..TELUGU LETTER LLA
+0C35..0C39 ; 1.1 # [5] TELUGU LETTER VA..TELUGU LETTER HA
+0C3E..0C44 ; 1.1 # [7] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN VOCALIC RR
+0C46..0C48 ; 1.1 # [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
+0C4A..0C4D ; 1.1 # [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
+0C55..0C56 ; 1.1 # [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
+0C60..0C61 ; 1.1 # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
+0C66..0C6F ; 1.1 # [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
+0C82..0C83 ; 1.1 # [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
+0C85..0C8C ; 1.1 # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
+0C8E..0C90 ; 1.1 # [3] KANNADA LETTER E..KANNADA LETTER AI
+0C92..0CA8 ; 1.1 # [23] KANNADA LETTER O..KANNADA LETTER NA
+0CAA..0CB3 ; 1.1 # [10] KANNADA LETTER PA..KANNADA LETTER LLA
+0CB5..0CB9 ; 1.1 # [5] KANNADA LETTER VA..KANNADA LETTER HA
+0CBE..0CC4 ; 1.1 # [7] KANNADA VOWEL SIGN AA..KANNADA VOWEL SIGN VOCALIC RR
+0CC6..0CC8 ; 1.1 # [3] KANNADA VOWEL SIGN E..KANNADA VOWEL SIGN AI
+0CCA..0CCD ; 1.1 # [4] KANNADA VOWEL SIGN O..KANNADA SIGN VIRAMA
+0CD5..0CD6 ; 1.1 # [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
+0CDE ; 1.1 # KANNADA LETTER FA
+0CE0..0CE1 ; 1.1 # [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL
+0CE6..0CEF ; 1.1 # [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
+0D02..0D03 ; 1.1 # [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
+0D05..0D0C ; 1.1 # [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
+0D0E..0D10 ; 1.1 # [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
+0D12..0D28 ; 1.1 # [23] MALAYALAM LETTER O..MALAYALAM LETTER NA
+0D2A..0D39 ; 1.1 # [16] MALAYALAM LETTER PA..MALAYALAM LETTER HA
+0D3E..0D43 ; 1.1 # [6] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN VOCALIC R
+0D46..0D48 ; 1.1 # [3] MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN AI
+0D4A..0D4D ; 1.1 # [4] MALAYALAM VOWEL SIGN O..MALAYALAM SIGN VIRAMA
+0D57 ; 1.1 # MALAYALAM AU LENGTH MARK
+0D60..0D61 ; 1.1 # [2] MALAYALAM LETTER VOCALIC RR..MALAYALAM LETTER VOCALIC LL
+0D66..0D6F ; 1.1 # [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
+0E01..0E3A ; 1.1 # [58] THAI CHARACTER KO KAI..THAI CHARACTER PHINTHU
+0E3F..0E5B ; 1.1 # [29] THAI CURRENCY SYMBOL BAHT..THAI CHARACTER KHOMUT
+0E81..0E82 ; 1.1 # [2] LAO LETTER KO..LAO LETTER KHO SUNG
+0E84 ; 1.1 # LAO LETTER KHO TAM
+0E87..0E88 ; 1.1 # [2] LAO LETTER NGO..LAO LETTER CO
+0E8A ; 1.1 # LAO LETTER SO TAM
+0E8D ; 1.1 # LAO LETTER NYO
+0E94..0E97 ; 1.1 # [4] LAO LETTER DO..LAO LETTER THO TAM
+0E99..0E9F ; 1.1 # [7] LAO LETTER NO..LAO LETTER FO SUNG
+0EA1..0EA3 ; 1.1 # [3] LAO LETTER MO..LAO LETTER LO LING
+0EA5 ; 1.1 # LAO LETTER LO LOOT
+0EA7 ; 1.1 # LAO LETTER WO
+0EAA..0EAB ; 1.1 # [2] LAO LETTER SO SUNG..LAO LETTER HO SUNG
+0EAD..0EB9 ; 1.1 # [13] LAO LETTER O..LAO VOWEL SIGN UU
+0EBB..0EBD ; 1.1 # [3] LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN NYO
+0EC0..0EC4 ; 1.1 # [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI
+0EC6 ; 1.1 # LAO KO LA
+0EC8..0ECD ; 1.1 # [6] LAO TONE MAI EK..LAO NIGGAHITA
+0ED0..0ED9 ; 1.1 # [10] LAO DIGIT ZERO..LAO DIGIT NINE
+0EDC..0EDD ; 1.1 # [2] LAO HO NO..LAO HO MO
+10A0..10C5 ; 1.1 # [38] GEORGIAN CAPITAL LETTER AN..GEORGIAN CAPITAL LETTER HOE
+10D0..10F6 ; 1.1 # [39] GEORGIAN LETTER AN..GEORGIAN LETTER FI
+10FB ; 1.1 # GEORGIAN PARAGRAPH SEPARATOR
+1100..1159 ; 1.1 # [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
+115F..11A2 ; 1.1 # [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
+11A8..11F9 ; 1.1 # [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
+1E00..1E9A ; 1.1 # [155] LATIN CAPITAL LETTER A WITH RING BELOW..LATIN SMALL LETTER A WITH RIGHT HALF RING
+1EA0..1EF9 ; 1.1 # [90] LATIN CAPITAL LETTER A WITH DOT BELOW..LATIN SMALL LETTER Y WITH TILDE
+1F00..1F15 ; 1.1 # [22] GREEK SMALL LETTER ALPHA WITH PSILI..GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
+1F18..1F1D ; 1.1 # [6] GREEK CAPITAL LETTER EPSILON WITH PSILI..GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA
+1F20..1F45 ; 1.1 # [38] GREEK SMALL LETTER ETA WITH PSILI..GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA
+1F48..1F4D ; 1.1 # [6] GREEK CAPITAL LETTER OMICRON WITH PSILI..GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA
+1F50..1F57 ; 1.1 # [8] GREEK SMALL LETTER UPSILON WITH PSILI..GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI
+1F59 ; 1.1 # GREEK CAPITAL LETTER UPSILON WITH DASIA
+1F5B ; 1.1 # GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA
+1F5D ; 1.1 # GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA
+1F5F..1F7D ; 1.1 # [31] GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI..GREEK SMALL LETTER OMEGA WITH OXIA
+1F80..1FB4 ; 1.1 # [53] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI
+1FB6..1FC4 ; 1.1 # [15] GREEK SMALL LETTER ALPHA WITH PERISPOMENI..GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI
+1FC6..1FD3 ; 1.1 # [14] GREEK SMALL LETTER ETA WITH PERISPOMENI..GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
+1FD6..1FDB ; 1.1 # [6] GREEK SMALL LETTER IOTA WITH PERISPOMENI..GREEK CAPITAL LETTER IOTA WITH OXIA
+1FDD..1FEF ; 1.1 # [19] GREEK DASIA AND VARIA..GREEK VARIA
+1FF2..1FF4 ; 1.1 # [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
+1FF6..1FFE ; 1.1 # [9] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK DASIA
+2000..200A ; 1.1 # [11] EN QUAD..HAIR SPACE
+200B..200F ; 1.1 # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
+2010..2027 ; 1.1 # [24] HYPHEN..HYPHENATION POINT
+2028..202E ; 1.1 # [7] LINE SEPARATOR..RIGHT-TO-LEFT OVERRIDE
+2030..2046 ; 1.1 # [23] PER MILLE SIGN..RIGHT SQUARE BRACKET WITH QUILL
+206A..206F ; 1.1 # [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES
+2070 ; 1.1 # SUPERSCRIPT ZERO
+2074..208E ; 1.1 # [27] SUPERSCRIPT FOUR..SUBSCRIPT RIGHT PARENTHESIS
+20A0..20AA ; 1.1 # [11] EURO-CURRENCY SIGN..NEW SHEQEL SIGN
+20D0..20E1 ; 1.1 # [18] COMBINING LEFT HARPOON ABOVE..COMBINING LEFT RIGHT ARROW ABOVE
+2100..2138 ; 1.1 # [57] ACCOUNT OF..DALET SYMBOL
+2153..2182 ; 1.1 # [48] VULGAR FRACTION ONE THIRD..ROMAN NUMERAL TEN THOUSAND
+2190..21EA ; 1.1 # [91] LEFTWARDS ARROW..UPWARDS WHITE ARROW FROM BAR
+2200..22F1 ; 1.1 # [242] FOR ALL..DOWN RIGHT DIAGONAL ELLIPSIS
+2300 ; 1.1 # DIAMETER SIGN
+2302..237A ; 1.1 # [121] HOUSE..APL FUNCTIONAL SYMBOL ALPHA
+2400..2424 ; 1.1 # [37] SYMBOL FOR NULL..SYMBOL FOR NEWLINE
+2440..244A ; 1.1 # [11] OCR HOOK..OCR DOUBLE BACKSLASH
+2460..24EA ; 1.1 # [139] CIRCLED DIGIT ONE..CIRCLED DIGIT ZERO
+2500..2595 ; 1.1 # [150] BOX DRAWINGS LIGHT HORIZONTAL..RIGHT ONE EIGHTH BLOCK
+25A0..25EF ; 1.1 # [80] BLACK SQUARE..LARGE CIRCLE
+2600..2613 ; 1.1 # [20] BLACK SUN WITH RAYS..SALTIRE
+261A..266F ; 1.1 # [86] BLACK LEFT POINTING INDEX..MUSIC SHARP SIGN
+2701..2704 ; 1.1 # [4] UPPER BLADE SCISSORS..WHITE SCISSORS
+2706..2709 ; 1.1 # [4] TELEPHONE LOCATION SIGN..ENVELOPE
+270C..2727 ; 1.1 # [28] VICTORY HAND..WHITE FOUR POINTED STAR
+2729..274B ; 1.1 # [35] STRESS OUTLINED WHITE STAR..HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
+274D ; 1.1 # SHADOWED WHITE CIRCLE
+274F..2752 ; 1.1 # [4] LOWER RIGHT DROP-SHADOWED WHITE SQUARE..UPPER RIGHT SHADOWED WHITE SQUARE
+2756 ; 1.1 # BLACK DIAMOND MINUS WHITE X
+2758..275E ; 1.1 # [7] LIGHT VERTICAL BAR..HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT
+2761..2767 ; 1.1 # [7] CURVED STEM PARAGRAPH SIGN ORNAMENT..ROTATED FLORAL HEART BULLET
+2776..2794 ; 1.1 # [31] DINGBAT NEGATIVE CIRCLED DIGIT ONE..HEAVY WIDE-HEADED RIGHTWARDS ARROW
+2798..27AF ; 1.1 # [24] HEAVY SOUTH EAST ARROW..NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
+27B1..27BE ; 1.1 # [14] NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW..OPEN-OUTLINED RIGHTWARDS ARROW
+3000..3037 ; 1.1 # [56] IDEOGRAPHIC SPACE..IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL
+303F ; 1.1 # IDEOGRAPHIC HALF FILL SPACE
+3041..3094 ; 1.1 # [84] HIRAGANA LETTER SMALL A..HIRAGANA LETTER VU
+3099..309E ; 1.1 # [6] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..HIRAGANA VOICED ITERATION MARK
+30A1..30FE ; 1.1 # [94] KATAKANA LETTER SMALL A..KATAKANA VOICED ITERATION MARK
+3105..312C ; 1.1 # [40] BOPOMOFO LETTER B..BOPOMOFO LETTER GN
+3131..318E ; 1.1 # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
+3190..319F ; 1.1 # [16] IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRAPHIC ANNOTATION MAN MARK
+3200..321C ; 1.1 # [29] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED HANGUL CIEUC U
+3220..3243 ; 1.1 # [36] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH REACH
+3260..327B ; 1.1 # [28] CIRCLED HANGUL KIYEOK..CIRCLED HANGUL HIEUH A
+327F..32B0 ; 1.1 # [50] KOREAN STANDARD SYMBOL..CIRCLED IDEOGRAPH NIGHT
+32C0..32CB ; 1.1 # [12] IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY..IDEOGRAPHIC TELEGRAPH SYMBOL FOR DECEMBER
+32D0..32FE ; 1.1 # [47] CIRCLED KATAKANA A..CIRCLED KATAKANA WO
+3300..3376 ; 1.1 # [119] SQUARE APAATO..SQUARE PC
+337B..33DD ; 1.1 # [99] SQUARE ERA NAME HEISEI..SQUARE WB
+33E0..33FE ; 1.1 # [31] IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ONE..IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY THIRTY-ONE
+4E00..9FA5 ; 1.1 # [20902] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FA5
+E000..F8FF ; 1.1 # [6400] <private-use-E000>..<private-use-F8FF>
+F900..FA2D ; 1.1 # [302] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA2D
+FB00..FB06 ; 1.1 # [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
+FB13..FB17 ; 1.1 # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
+FB1E..FB36 ; 1.1 # [25] HEBREW POINT JUDEO-SPANISH VARIKA..HEBREW LETTER ZAYIN WITH DAGESH
+FB38..FB3C ; 1.1 # [5] HEBREW LETTER TET WITH DAGESH..HEBREW LETTER LAMED WITH DAGESH
+FB3E ; 1.1 # HEBREW LETTER MEM WITH DAGESH
+FB40..FB41 ; 1.1 # [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER SAMEKH WITH DAGESH
+FB43..FB44 ; 1.1 # [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH
+FB46..FBB1 ; 1.1 # [108] HEBREW LETTER TSADI WITH DAGESH..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
+FBD3..FD3F ; 1.1 # [365] ARABIC LETTER NG ISOLATED FORM..ORNATE RIGHT PARENTHESIS
+FD50..FD8F ; 1.1 # [64] ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM..ARABIC LIGATURE MEEM WITH KHAH WITH MEEM INITIAL FORM
+FD92..FDC7 ; 1.1 # [54] ARABIC LIGATURE MEEM WITH JEEM WITH KHAH INITIAL FORM..ARABIC LIGATURE NOON WITH JEEM WITH YEH FINAL FORM
+FDF0..FDFB ; 1.1 # [12] ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM..ARABIC LIGATURE JALLAJALALOUHOU
+FE20..FE23 ; 1.1 # [4] COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF
+FE30..FE44 ; 1.1 # [21] PRESENTATION FORM FOR VERTICAL TWO DOT LEADER..PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
+FE49..FE52 ; 1.1 # [10] DASHED OVERLINE..SMALL FULL STOP
+FE54..FE66 ; 1.1 # [19] SMALL SEMICOLON..SMALL EQUALS SIGN
+FE68..FE6B ; 1.1 # [4] SMALL REVERSE SOLIDUS..SMALL COMMERCIAL AT
+FE70..FE72 ; 1.1 # [3] ARABIC FATHATAN ISOLATED FORM..ARABIC DAMMATAN ISOLATED FORM
+FE74 ; 1.1 # ARABIC KASRATAN ISOLATED FORM
+FE76..FEFC ; 1.1 # [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LAM WITH ALEF FINAL FORM
+FEFF ; 1.1 # ZERO WIDTH NO-BREAK SPACE
+FF01..FF5E ; 1.1 # [94] FULLWIDTH EXCLAMATION MARK..FULLWIDTH TILDE
+FF61..FFBE ; 1.1 # [94] HALFWIDTH IDEOGRAPHIC FULL STOP..HALFWIDTH HANGUL LETTER HIEUH
+FFC2..FFC7 ; 1.1 # [6] HALFWIDTH HANGUL LETTER A..HALFWIDTH HANGUL LETTER E
+FFCA..FFCF ; 1.1 # [6] HALFWIDTH HANGUL LETTER YEO..HALFWIDTH HANGUL LETTER OE
+FFD2..FFD7 ; 1.1 # [6] HALFWIDTH HANGUL LETTER YO..HALFWIDTH HANGUL LETTER YU
+FFDA..FFDC ; 1.1 # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
+FFE0..FFE6 ; 1.1 # [7] FULLWIDTH CENT SIGN..FULLWIDTH WON SIGN
+FFE8..FFEE ; 1.1 # [7] HALFWIDTH FORMS LIGHT VERTICAL..HALFWIDTH WHITE CIRCLE
+FFFD ; 1.1 # REPLACEMENT CHARACTER
+FFFE..FFFF ; 1.1 # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
+
+# Total code points: 33979
+
+# ================================================
+
+# Newly assigned in Unicode 2.0.0 (July, 1996)
+
+0591..05A1 ; 2.0 # [17] HEBREW ACCENT ETNAHTA..HEBREW ACCENT PAZER
+05A3..05AF ; 2.0 # [13] HEBREW ACCENT MUNAH..HEBREW MARK MASORA CIRCLE
+05C4 ; 2.0 # HEBREW MARK UPPER DOT
+0F00..0F47 ; 2.0 # [72] TIBETAN SYLLABLE OM..TIBETAN LETTER JA
+0F49..0F69 ; 2.0 # [33] TIBETAN LETTER NYA..TIBETAN LETTER KSSA
+0F71..0F8B ; 2.0 # [27] TIBETAN VOWEL SIGN AA..TIBETAN SIGN GRU MED RGYINGS
+0F90..0F95 ; 2.0 # [6] TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOINED LETTER CA
+0F97 ; 2.0 # TIBETAN SUBJOINED LETTER JA
+0F99..0FAD ; 2.0 # [21] TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOINED LETTER WA
+0FB1..0FB7 ; 2.0 # [7] TIBETAN SUBJOINED LETTER YA..TIBETAN SUBJOINED LETTER HA
+0FB9 ; 2.0 # TIBETAN SUBJOINED LETTER KSSA
+1E9B ; 2.0 # LATIN SMALL LETTER LONG S WITH DOT ABOVE
+20AB ; 2.0 # DONG SIGN
+AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
+D800..DFFF ; 2.0 # [2048] <surrogate-D800>..<surrogate-DFFF>
+1FFFE..1FFFF ; 2.0 # [2] <noncharacter-1FFFE>..<noncharacter-1FFFF>
+2FFFE..2FFFF ; 2.0 # [2] <noncharacter-2FFFE>..<noncharacter-2FFFF>
+3FFFE..3FFFF ; 2.0 # [2] <noncharacter-3FFFE>..<noncharacter-3FFFF>
+4FFFE..4FFFF ; 2.0 # [2] <noncharacter-4FFFE>..<noncharacter-4FFFF>
+5FFFE..5FFFF ; 2.0 # [2] <noncharacter-5FFFE>..<noncharacter-5FFFF>
+6FFFE..6FFFF ; 2.0 # [2] <noncharacter-6FFFE>..<noncharacter-6FFFF>
+7FFFE..7FFFF ; 2.0 # [2] <noncharacter-7FFFE>..<noncharacter-7FFFF>
+8FFFE..8FFFF ; 2.0 # [2] <noncharacter-8FFFE>..<noncharacter-8FFFF>
+9FFFE..9FFFF ; 2.0 # [2] <noncharacter-9FFFE>..<noncharacter-9FFFF>
+AFFFE..AFFFF ; 2.0 # [2] <noncharacter-AFFFE>..<noncharacter-AFFFF>
+BFFFE..BFFFF ; 2.0 # [2] <noncharacter-BFFFE>..<noncharacter-BFFFF>
+CFFFE..CFFFF ; 2.0 # [2] <noncharacter-CFFFE>..<noncharacter-CFFFF>
+DFFFE..DFFFF ; 2.0 # [2] <noncharacter-DFFFE>..<noncharacter-DFFFF>
+EFFFE..EFFFF ; 2.0 # [2] <noncharacter-EFFFE>..<noncharacter-EFFFF>
+F0000..FFFFD ; 2.0 # [65534] <private-use-F0000>..<private-use-FFFFD>
+FFFFE..FFFFF ; 2.0 # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
+100000..10FFFD; 2.0 # [65534] <private-use-100000>..<private-use-10FFFD>
+10FFFE..10FFFF; 2.0 # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
+
+# Total code points: 144521
+
+# ================================================
+
+# Newly assigned in Unicode 2.1.2 (May, 1998)
+
+20AC ; 2.1 # EURO SIGN
+FFFC ; 2.1 # OBJECT REPLACEMENT CHARACTER
+
+# Total code points: 2
+
+# ================================================
+
+# Newly assigned in Unicode 3.0.0 (September, 1999)
+
+01F6..01F9 ; 3.0 # [4] LATIN CAPITAL LETTER HWAIR..LATIN SMALL LETTER N WITH GRAVE
+0218..021F ; 3.0 # [8] LATIN CAPITAL LETTER S WITH COMMA BELOW..LATIN SMALL LETTER H WITH CARON
+0222..0233 ; 3.0 # [18] LATIN CAPITAL LETTER OU..LATIN SMALL LETTER Y WITH MACRON
+02A9..02AD ; 3.0 # [5] LATIN SMALL LETTER FENG DIGRAPH..LATIN LETTER BIDENTAL PERCUSSIVE
+02DF ; 3.0 # MODIFIER LETTER CROSS ACCENT
+02EA..02EE ; 3.0 # [5] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER DOUBLE APOSTROPHE
+0346..034E ; 3.0 # [9] COMBINING BRIDGE ABOVE..COMBINING UPWARDS ARROW BELOW
+0362 ; 3.0 # COMBINING DOUBLE RIGHTWARDS ARROW BELOW
+03D7 ; 3.0 # GREEK KAI SYMBOL
+03DB ; 3.0 # GREEK SMALL LETTER STIGMA
+03DD ; 3.0 # GREEK SMALL LETTER DIGAMMA
+03DF ; 3.0 # GREEK SMALL LETTER KOPPA
+03E1 ; 3.0 # GREEK SMALL LETTER SAMPI
+0400 ; 3.0 # CYRILLIC CAPITAL LETTER IE WITH GRAVE
+040D ; 3.0 # CYRILLIC CAPITAL LETTER I WITH GRAVE
+0450 ; 3.0 # CYRILLIC SMALL LETTER IE WITH GRAVE
+045D ; 3.0 # CYRILLIC SMALL LETTER I WITH GRAVE
+0488..0489 ; 3.0 # [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
+048C..048F ; 3.0 # [4] CYRILLIC CAPITAL LETTER SEMISOFT SIGN..CYRILLIC SMALL LETTER ER WITH TICK
+04EC..04ED ; 3.0 # [2] CYRILLIC CAPITAL LETTER E WITH DIAERESIS..CYRILLIC SMALL LETTER E WITH DIAERESIS
+058A ; 3.0 # ARMENIAN HYPHEN
+0653..0655 ; 3.0 # [3] ARABIC MADDAH ABOVE..ARABIC HAMZA BELOW
+06B8..06B9 ; 3.0 # [2] ARABIC LETTER LAM WITH THREE DOTS BELOW..ARABIC LETTER NOON WITH DOT BELOW
+06BF ; 3.0 # ARABIC LETTER TCHEH WITH DOT ABOVE
+06CF ; 3.0 # ARABIC LETTER WAW WITH DOT ABOVE
+06FA..06FE ; 3.0 # [5] ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC SIGN SINDHI POSTPOSITION MEN
+0700..070D ; 3.0 # [14] SYRIAC END OF PARAGRAPH..SYRIAC HARKLEAN ASTERISCUS
+070F ; 3.0 # SYRIAC ABBREVIATION MARK
+0710..072C ; 3.0 # [29] SYRIAC LETTER ALAPH..SYRIAC LETTER TAW
+0730..074A ; 3.0 # [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
+0780..07B0 ; 3.0 # [49] THAANA LETTER HAA..THAANA SUKUN
+0D82..0D83 ; 3.0 # [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA
+0D85..0D96 ; 3.0 # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
+0D9A..0DB1 ; 3.0 # [24] SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA LETTER DANTAJA NAYANNA
+0DB3..0DBB ; 3.0 # [9] SINHALA LETTER SANYAKA DAYANNA..SINHALA LETTER RAYANNA
+0DBD ; 3.0 # SINHALA LETTER DANTAJA LAYANNA
+0DC0..0DC6 ; 3.0 # [7] SINHALA LETTER VAYANNA..SINHALA LETTER FAYANNA
+0DCA ; 3.0 # SINHALA SIGN AL-LAKUNA
+0DCF..0DD4 ; 3.0 # [6] SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA
+0DD6 ; 3.0 # SINHALA VOWEL SIGN DIGA PAA-PILLA
+0DD8..0DDF ; 3.0 # [8] SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOWEL SIGN GAYANUKITTA
+0DF2..0DF4 ; 3.0 # [3] SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHALA PUNCTUATION KUNDDALIYA
+0F6A ; 3.0 # TIBETAN LETTER FIXED-FORM RA
+0F96 ; 3.0 # TIBETAN SUBJOINED LETTER CHA
+0FAE..0FB0 ; 3.0 # [3] TIBETAN SUBJOINED LETTER ZHA..TIBETAN SUBJOINED LETTER -A
+0FB8 ; 3.0 # TIBETAN SUBJOINED LETTER A
+0FBA..0FBC ; 3.0 # [3] TIBETAN SUBJOINED LETTER FIXED-FORM WA..TIBETAN SUBJOINED LETTER FIXED-FORM RA
+0FBE..0FCC ; 3.0 # [15] TIBETAN KU RU KHA..TIBETAN SYMBOL NOR BU BZHI -KHYIL
+0FCF ; 3.0 # TIBETAN SIGN RDEL NAG GSUM
+1000..1021 ; 3.0 # [34] MYANMAR LETTER KA..MYANMAR LETTER A
+1023..1027 ; 3.0 # [5] MYANMAR LETTER I..MYANMAR LETTER E
+1029..102A ; 3.0 # [2] MYANMAR LETTER O..MYANMAR LETTER AU
+102C..1032 ; 3.0 # [7] MYANMAR VOWEL SIGN AA..MYANMAR VOWEL SIGN AI
+1036..1039 ; 3.0 # [4] MYANMAR SIGN ANUSVARA..MYANMAR SIGN VIRAMA
+1040..1059 ; 3.0 # [26] MYANMAR DIGIT ZERO..MYANMAR VOWEL SIGN VOCALIC LL
+1200..1206 ; 3.0 # [7] ETHIOPIC SYLLABLE HA..ETHIOPIC SYLLABLE HO
+1208..1246 ; 3.0 # [63] ETHIOPIC SYLLABLE LA..ETHIOPIC SYLLABLE QO
+1248 ; 3.0 # ETHIOPIC SYLLABLE QWA
+124A..124D ; 3.0 # [4] ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE
+1250..1256 ; 3.0 # [7] ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO
+1258 ; 3.0 # ETHIOPIC SYLLABLE QHWA
+125A..125D ; 3.0 # [4] ETHIOPIC SYLLABLE QHWI..ETHIOPIC SYLLABLE QHWE
+1260..1286 ; 3.0 # [39] ETHIOPIC SYLLABLE BA..ETHIOPIC SYLLABLE XO
+1288 ; 3.0 # ETHIOPIC SYLLABLE XWA
+128A..128D ; 3.0 # [4] ETHIOPIC SYLLABLE XWI..ETHIOPIC SYLLABLE XWE
+1290..12AE ; 3.0 # [31] ETHIOPIC SYLLABLE NA..ETHIOPIC SYLLABLE KO
+12B0 ; 3.0 # ETHIOPIC SYLLABLE KWA
+12B2..12B5 ; 3.0 # [4] ETHIOPIC SYLLABLE KWI..ETHIOPIC SYLLABLE KWE
+12B8..12BE ; 3.0 # [7] ETHIOPIC SYLLABLE KXA..ETHIOPIC SYLLABLE KXO
+12C0 ; 3.0 # ETHIOPIC SYLLABLE KXWA
+12C2..12C5 ; 3.0 # [4] ETHIOPIC SYLLABLE KXWI..ETHIOPIC SYLLABLE KXWE
+12C8..12CE ; 3.0 # [7] ETHIOPIC SYLLABLE WA..ETHIOPIC SYLLABLE WO
+12D0..12D6 ; 3.0 # [7] ETHIOPIC SYLLABLE PHARYNGEAL A..ETHIOPIC SYLLABLE PHARYNGEAL O
+12D8..12EE ; 3.0 # [23] ETHIOPIC SYLLABLE ZA..ETHIOPIC SYLLABLE YO
+12F0..130E ; 3.0 # [31] ETHIOPIC SYLLABLE DA..ETHIOPIC SYLLABLE GO
+1310 ; 3.0 # ETHIOPIC SYLLABLE GWA
+1312..1315 ; 3.0 # [4] ETHIOPIC SYLLABLE GWI..ETHIOPIC SYLLABLE GWE
+1318..131E ; 3.0 # [7] ETHIOPIC SYLLABLE GGA..ETHIOPIC SYLLABLE GGO
+1320..1346 ; 3.0 # [39] ETHIOPIC SYLLABLE THA..ETHIOPIC SYLLABLE TZO
+1348..135A ; 3.0 # [19] ETHIOPIC SYLLABLE FA..ETHIOPIC SYLLABLE FYA
+1361..137C ; 3.0 # [28] ETHIOPIC WORDSPACE..ETHIOPIC NUMBER TEN THOUSAND
+13A0..13F4 ; 3.0 # [85] CHEROKEE LETTER A..CHEROKEE LETTER YV
+1401..1676 ; 3.0 # [630] CANADIAN SYLLABICS E..CANADIAN SYLLABICS NNGAA
+1680..169C ; 3.0 # [29] OGHAM SPACE MARK..OGHAM REVERSED FEATHER MARK
+16A0..16F0 ; 3.0 # [81] RUNIC LETTER FEHU FEOH FE F..RUNIC BELGTHOR SYMBOL
+1780..17B3 ; 3.0 # [52] KHMER LETTER KA..KHMER INDEPENDENT VOWEL QAU
+17B4..17B5 ; 3.0 # [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA
+17B6..17DC ; 3.0 # [39] KHMER VOWEL SIGN AA..KHMER SIGN AVAKRAHASANYA
+17E0..17E9 ; 3.0 # [10] KHMER DIGIT ZERO..KHMER DIGIT NINE
+1800..180E ; 3.0 # [15] MONGOLIAN BIRGA..MONGOLIAN VOWEL SEPARATOR
+1810..1819 ; 3.0 # [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
+1820..1877 ; 3.0 # [88] MONGOLIAN LETTER A..MONGOLIAN LETTER MANCHU ZHA
+1880..18A9 ; 3.0 # [42] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI DAGALGA
+202F ; 3.0 # NARROW NO-BREAK SPACE
+2048..204D ; 3.0 # [6] QUESTION EXCLAMATION MARK..BLACK RIGHTWARDS BULLET
+20AD..20AF ; 3.0 # [3] KIP SIGN..DRACHMA SIGN
+20E2..20E3 ; 3.0 # [2] COMBINING ENCLOSING SCREEN..COMBINING ENCLOSING KEYCAP
+2139..213A ; 3.0 # [2] INFORMATION SOURCE..ROTATED CAPITAL Q
+2183 ; 3.0 # ROMAN NUMERAL REVERSED ONE HUNDRED
+21EB..21F3 ; 3.0 # [9] UPWARDS WHITE ARROW ON PEDESTAL..UP DOWN WHITE ARROW
+2301 ; 3.0 # ELECTRIC ARROW
+237B ; 3.0 # NOT CHECK MARK
+237D..239A ; 3.0 # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL
+2425..2426 ; 3.0 # [2] SYMBOL FOR DELETE FORM TWO..SYMBOL FOR SUBSTITUTE FORM TWO
+25F0..25F7 ; 3.0 # [8] WHITE SQUARE WITH UPPER LEFT QUADRANT..WHITE CIRCLE WITH UPPER RIGHT QUADRANT
+2619 ; 3.0 # REVERSED ROTATED FLORAL HEART BULLET
+2670..2671 ; 3.0 # [2] WEST SYRIAC CROSS..EAST SYRIAC CROSS
+2800..28FF ; 3.0 # [256] BRAILLE PATTERN BLANK..BRAILLE PATTERN DOTS-12345678
+2E80..2E99 ; 3.0 # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
+2E9B..2EF3 ; 3.0 # [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
+2F00..2FD5 ; 3.0 # [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
+2FF0..2FFB ; 3.0 # [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+3038..303A ; 3.0 # [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
+303E ; 3.0 # IDEOGRAPHIC VARIATION INDICATOR
+31A0..31B7 ; 3.0 # [24] BOPOMOFO LETTER BU..BOPOMOFO FINAL LETTER H
+3400..4DB5 ; 3.0 # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
+A000..A48C ; 3.0 # [1165] YI SYLLABLE IT..YI SYLLABLE YYR
+A490..A4A1 ; 3.0 # [18] YI RADICAL QOT..YI RADICAL GA
+A4A4..A4B3 ; 3.0 # [16] YI RADICAL DDUR..YI RADICAL JO
+A4B5..A4C0 ; 3.0 # [12] YI RADICAL JJY..YI RADICAL SHAT
+A4C2..A4C4 ; 3.0 # [3] YI RADICAL SHOP..YI RADICAL ZZIET
+A4C6 ; 3.0 # YI RADICAL KE
+FB1D ; 3.0 # HEBREW LETTER YOD WITH HIRIQ
+FFF9..FFFB ; 3.0 # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
+
+# Total code points: 10307
+
+# ================================================
+
+# Newly assigned in Unicode 3.1.0 (March, 2001)
+
+03F4..03F5 ; 3.1 # [2] GREEK CAPITAL THETA SYMBOL..GREEK LUNATE EPSILON SYMBOL
+FDD0..FDEF ; 3.1 # [32] <noncharacter-FDD0>..<noncharacter-FDEF>
+10300..1031E ; 3.1 # [31] OLD ITALIC LETTER A..OLD ITALIC LETTER UU
+10320..10323 ; 3.1 # [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY
+10330..1034A ; 3.1 # [27] GOTHIC LETTER AHSA..GOTHIC LETTER NINE HUNDRED
+10400..10425 ; 3.1 # [38] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER ENG
+10428..1044D ; 3.1 # [38] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER ENG
+1D000..1D0F5 ; 3.1 # [246] BYZANTINE MUSICAL SYMBOL PSILI..BYZANTINE MUSICAL SYMBOL GORGON NEO KATO
+1D100..1D126 ; 3.1 # [39] MUSICAL SYMBOL SINGLE BARLINE..MUSICAL SYMBOL DRUM CLEF-2
+1D12A..1D172 ; 3.1 # [73] MUSICAL SYMBOL DOUBLE SHARP..MUSICAL SYMBOL COMBINING FLAG-5
+1D173..1D17A ; 3.1 # [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
+1D17B..1D1DD ; 3.1 # [99] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL PES SUBPUNCTIS
+1D400..1D454 ; 3.1 # [85] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL ITALIC SMALL G
+1D456..1D49C ; 3.1 # [71] MATHEMATICAL ITALIC SMALL I..MATHEMATICAL SCRIPT CAPITAL A
+1D49E..1D49F ; 3.1 # [2] MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL SCRIPT CAPITAL D
+1D4A2 ; 3.1 # MATHEMATICAL SCRIPT CAPITAL G
+1D4A5..1D4A6 ; 3.1 # [2] MATHEMATICAL SCRIPT CAPITAL J..MATHEMATICAL SCRIPT CAPITAL K
+1D4A9..1D4AC ; 3.1 # [4] MATHEMATICAL SCRIPT CAPITAL N..MATHEMATICAL SCRIPT CAPITAL Q
+1D4AE..1D4B9 ; 3.1 # [12] MATHEMATICAL SCRIPT CAPITAL S..MATHEMATICAL SCRIPT SMALL D
+1D4BB ; 3.1 # MATHEMATICAL SCRIPT SMALL F
+1D4BD..1D4C0 ; 3.1 # [4] MATHEMATICAL SCRIPT SMALL H..MATHEMATICAL SCRIPT SMALL K
+1D4C2..1D4C3 ; 3.1 # [2] MATHEMATICAL SCRIPT SMALL M..MATHEMATICAL SCRIPT SMALL N
+1D4C5..1D505 ; 3.1 # [65] MATHEMATICAL SCRIPT SMALL P..MATHEMATICAL FRAKTUR CAPITAL B
+1D507..1D50A ; 3.1 # [4] MATHEMATICAL FRAKTUR CAPITAL D..MATHEMATICAL FRAKTUR CAPITAL G
+1D50D..1D514 ; 3.1 # [8] MATHEMATICAL FRAKTUR CAPITAL J..MATHEMATICAL FRAKTUR CAPITAL Q
+1D516..1D51C ; 3.1 # [7] MATHEMATICAL FRAKTUR CAPITAL S..MATHEMATICAL FRAKTUR CAPITAL Y
+1D51E..1D539 ; 3.1 # [28] MATHEMATICAL FRAKTUR SMALL A..MATHEMATICAL DOUBLE-STRUCK CAPITAL B
+1D53B..1D53E ; 3.1 # [4] MATHEMATICAL DOUBLE-STRUCK CAPITAL D..MATHEMATICAL DOUBLE-STRUCK CAPITAL G
+1D540..1D544 ; 3.1 # [5] MATHEMATICAL DOUBLE-STRUCK CAPITAL I..MATHEMATICAL DOUBLE-STRUCK CAPITAL M
+1D546 ; 3.1 # MATHEMATICAL DOUBLE-STRUCK CAPITAL O
+1D54A..1D550 ; 3.1 # [7] MATHEMATICAL DOUBLE-STRUCK CAPITAL S..MATHEMATICAL DOUBLE-STRUCK CAPITAL Y
+1D552..1D6A3 ; 3.1 # [338] MATHEMATICAL DOUBLE-STRUCK SMALL A..MATHEMATICAL MONOSPACE SMALL Z
+1D6A8..1D7C9 ; 3.1 # [290] MATHEMATICAL BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL
+1D7CE..1D7FF ; 3.1 # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
+20000..2A6D6 ; 3.1 # [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
+2F800..2FA1D ; 3.1 # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
+E0001 ; 3.1 # LANGUAGE TAG
+E0020..E007F ; 3.1 # [96] TAG SPACE..CANCEL TAG
+
+# Total code points: 44978
+
+# ================================================
+
+# Newly assigned in Unicode 3.2.0 (March, 2002)
+
+0220 ; 3.2 # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
+034F ; 3.2 # COMBINING GRAPHEME JOINER
+0363..036F ; 3.2 # [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
+03D8..03D9 ; 3.2 # [2] GREEK LETTER ARCHAIC KOPPA..GREEK SMALL LETTER ARCHAIC KOPPA
+03F6 ; 3.2 # GREEK REVERSED LUNATE EPSILON SYMBOL
+048A..048B ; 3.2 # [2] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER SHORT I WITH TAIL
+04C5..04C6 ; 3.2 # [2] CYRILLIC CAPITAL LETTER EL WITH TAIL..CYRILLIC SMALL LETTER EL WITH TAIL
+04C9..04CA ; 3.2 # [2] CYRILLIC CAPITAL LETTER EN WITH TAIL..CYRILLIC SMALL LETTER EN WITH TAIL
+04CD..04CE ; 3.2 # [2] CYRILLIC CAPITAL LETTER EM WITH TAIL..CYRILLIC SMALL LETTER EM WITH TAIL
+0500..050F ; 3.2 # [16] CYRILLIC CAPITAL LETTER KOMI DE..CYRILLIC SMALL LETTER KOMI TJE
+066E..066F ; 3.2 # [2] ARABIC LETTER DOTLESS BEH..ARABIC LETTER DOTLESS QAF
+07B1 ; 3.2 # THAANA LETTER NAA
+10F7..10F8 ; 3.2 # [2] GEORGIAN LETTER YN..GEORGIAN LETTER ELIFI
+1700..170C ; 3.2 # [13] TAGALOG LETTER A..TAGALOG LETTER YA
+170E..1714 ; 3.2 # [7] TAGALOG LETTER LA..TAGALOG SIGN VIRAMA
+1720..1736 ; 3.2 # [23] HANUNOO LETTER A..PHILIPPINE DOUBLE PUNCTUATION
+1740..1753 ; 3.2 # [20] BUHID LETTER A..BUHID VOWEL SIGN U
+1760..176C ; 3.2 # [13] TAGBANWA LETTER A..TAGBANWA LETTER YA
+176E..1770 ; 3.2 # [3] TAGBANWA LETTER LA..TAGBANWA LETTER SA
+1772..1773 ; 3.2 # [2] TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U
+2047 ; 3.2 # DOUBLE QUESTION MARK
+204E..2052 ; 3.2 # [5] LOW ASTERISK..COMMERCIAL MINUS SIGN
+2057 ; 3.2 # QUADRUPLE PRIME
+205F ; 3.2 # MEDIUM MATHEMATICAL SPACE
+2060..2063 ; 3.2 # [4] WORD JOINER..INVISIBLE SEPARATOR
+2071 ; 3.2 # SUPERSCRIPT LATIN SMALL LETTER I
+20B0..20B1 ; 3.2 # [2] GERMAN PENNY SIGN..PESO SIGN
+20E4..20EA ; 3.2 # [7] COMBINING ENCLOSING UPWARD POINTING TRIANGLE..COMBINING LEFTWARDS ARROW OVERLAY
+213D..214B ; 3.2 # [15] DOUBLE-STRUCK SMALL GAMMA..TURNED AMPERSAND
+21F4..21FF ; 3.2 # [12] RIGHT ARROW WITH SMALL CIRCLE..LEFT RIGHT OPEN-HEADED ARROW
+22F2..22FF ; 3.2 # [14] ELEMENT OF WITH LONG HORIZONTAL STROKE..Z NOTATION BAG MEMBERSHIP
+237C ; 3.2 # RIGHT ANGLE WITH DOWNWARDS ZIGZAG ARROW
+239B..23CE ; 3.2 # [52] LEFT PARENTHESIS UPPER HOOK..RETURN SYMBOL
+24EB..24FE ; 3.2 # [20] NEGATIVE CIRCLED NUMBER ELEVEN..DOUBLE CIRCLED NUMBER TEN
+2596..259F ; 3.2 # [10] QUADRANT LOWER LEFT..QUADRANT UPPER RIGHT AND LOWER LEFT AND LOWER RIGHT
+25F8..25FF ; 3.2 # [8] UPPER LEFT TRIANGLE..LOWER RIGHT TRIANGLE
+2616..2617 ; 3.2 # [2] WHITE SHOGI PIECE..BLACK SHOGI PIECE
+2672..267D ; 3.2 # [12] UNIVERSAL RECYCLING SYMBOL..PARTIALLY-RECYCLED PAPER SYMBOL
+2680..2689 ; 3.2 # [10] DIE FACE-1..BLACK CIRCLE WITH TWO WHITE DOTS
+2768..2775 ; 3.2 # [14] MEDIUM LEFT PARENTHESIS ORNAMENT..MEDIUM RIGHT CURLY BRACKET ORNAMENT
+27D0..27EB ; 3.2 # [28] WHITE DIAMOND WITH CENTRED DOT..MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
+27F0..27FF ; 3.2 # [16] UPWARDS QUADRUPLE ARROW..LONG RIGHTWARDS SQUIGGLE ARROW
+2900..2AFF ; 3.2 # [512] RIGHTWARDS TWO-HEADED ARROW WITH VERTICAL STROKE..N-ARY WHITE VERTICAL BAR
+303B..303D ; 3.2 # [3] VERTICAL IDEOGRAPHIC ITERATION MARK..PART ALTERNATION MARK
+3095..3096 ; 3.2 # [2] HIRAGANA LETTER SMALL KA..HIRAGANA LETTER SMALL KE
+309F..30A0 ; 3.2 # [2] HIRAGANA DIGRAPH YORI..KATAKANA-HIRAGANA DOUBLE HYPHEN
+30FF ; 3.2 # KATAKANA DIGRAPH KOTO
+31F0..31FF ; 3.2 # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
+3251..325F ; 3.2 # [15] CIRCLED NUMBER TWENTY ONE..CIRCLED NUMBER THIRTY FIVE
+32B1..32BF ; 3.2 # [15] CIRCLED NUMBER THIRTY SIX..CIRCLED NUMBER FIFTY
+A4A2..A4A3 ; 3.2 # [2] YI RADICAL ZUP..YI RADICAL CYT
+A4B4 ; 3.2 # YI RADICAL NZUP
+A4C1 ; 3.2 # YI RADICAL ZUR
+A4C5 ; 3.2 # YI RADICAL NBIE
+FA30..FA6A ; 3.2 # [59] CJK COMPATIBILITY IDEOGRAPH-FA30..CJK COMPATIBILITY IDEOGRAPH-FA6A
+FDFC ; 3.2 # RIAL SIGN
+FE00..FE0F ; 3.2 # [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
+FE45..FE46 ; 3.2 # [2] SESAME DOT..WHITE SESAME DOT
+FE73 ; 3.2 # ARABIC TAIL FRAGMENT
+FF5F..FF60 ; 3.2 # [2] FULLWIDTH LEFT WHITE PARENTHESIS..FULLWIDTH RIGHT WHITE PARENTHESIS
+
+# Total code points: 1016
+
+# ================================================
+
+# Newly assigned in Unicode 4.0.0 (April, 2003)
+
+0221 ; 4.0 # LATIN SMALL LETTER D WITH CURL
+0234..0236 ; 4.0 # [3] LATIN SMALL LETTER L WITH CURL..LATIN SMALL LETTER T WITH CURL
+02AE..02AF ; 4.0 # [2] LATIN SMALL LETTER TURNED H WITH FISHHOOK..LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL
+02EF..02FF ; 4.0 # [17] MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER LETTER LOW LEFT ARROW
+0350..0357 ; 4.0 # [8] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING RIGHT HALF RING ABOVE
+035D..035F ; 4.0 # [3] COMBINING DOUBLE BREVE..COMBINING DOUBLE MACRON BELOW
+03F7..03FB ; 4.0 # [5] GREEK CAPITAL LETTER SHO..GREEK SMALL LETTER SAN
+0600..0603 ; 4.0 # [4] ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
+060D..0615 ; 4.0 # [9] ARABIC DATE SEPARATOR..ARABIC SMALL HIGH TAH
+0656..0658 ; 4.0 # [3] ARABIC SUBSCRIPT ALEF..ARABIC MARK NOON GHUNNA
+06EE..06EF ; 4.0 # [2] ARABIC LETTER DAL WITH INVERTED V..ARABIC LETTER REH WITH INVERTED V
+06FF ; 4.0 # ARABIC LETTER HEH WITH INVERTED V
+072D..072F ; 4.0 # [3] SYRIAC LETTER PERSIAN BHETH..SYRIAC LETTER PERSIAN DHALATH
+074D..074F ; 4.0 # [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE
+0904 ; 4.0 # DEVANAGARI LETTER SHORT A
+09BD ; 4.0 # BENGALI SIGN AVAGRAHA
+0A01 ; 4.0 # GURMUKHI SIGN ADAK BINDI
+0A03 ; 4.0 # GURMUKHI SIGN VISARGA
+0A8C ; 4.0 # GUJARATI LETTER VOCALIC L
+0AE1..0AE3 ; 4.0 # [3] GUJARATI LETTER VOCALIC LL..GUJARATI VOWEL SIGN VOCALIC LL
+0AF1 ; 4.0 # GUJARATI RUPEE SIGN
+0B35 ; 4.0 # ORIYA LETTER VA
+0B71 ; 4.0 # ORIYA LETTER WA
+0BF3..0BFA ; 4.0 # [8] TAMIL DAY SIGN..TAMIL NUMBER SIGN
+0CBC..0CBD ; 4.0 # [2] KANNADA SIGN NUKTA..KANNADA SIGN AVAGRAHA
+17DD ; 4.0 # KHMER SIGN ATTHACAN
+17F0..17F9 ; 4.0 # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON
+1900..191C ; 4.0 # [29] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER HA
+1920..192B ; 4.0 # [12] LIMBU VOWEL SIGN A..LIMBU SUBJOINED LETTER WA
+1930..193B ; 4.0 # [12] LIMBU SMALL LETTER KA..LIMBU SIGN SA-I
+1940 ; 4.0 # LIMBU SIGN LOO
+1944..196D ; 4.0 # [42] LIMBU EXCLAMATION MARK..TAI LE LETTER AI
+1970..1974 ; 4.0 # [5] TAI LE LETTER TONE-2..TAI LE LETTER TONE-6
+19E0..19FF ; 4.0 # [32] KHMER SYMBOL PATHAMASAT..KHMER SYMBOL DAP-PRAM ROC
+1D00..1D6B ; 4.0 # [108] LATIN LETTER SMALL CAPITAL A..LATIN SMALL LETTER UE
+2053..2054 ; 4.0 # [2] SWUNG DASH..INVERTED UNDERTIE
+213B ; 4.0 # FACSIMILE SIGN
+23CF..23D0 ; 4.0 # [2] EJECT SYMBOL..VERTICAL LINE EXTENSION
+24FF ; 4.0 # NEGATIVE CIRCLED DIGIT ZERO
+2614..2615 ; 4.0 # [2] UMBRELLA WITH RAIN DROPS..HOT BEVERAGE
+268A..2691 ; 4.0 # [8] MONOGRAM FOR YANG..BLACK FLAG
+26A0..26A1 ; 4.0 # [2] WARNING SIGN..HIGH VOLTAGE SIGN
+2B00..2B0D ; 4.0 # [14] NORTH EAST WHITE ARROW..UP DOWN BLACK ARROW
+321D..321E ; 4.0 # [2] PARENTHESIZED KOREAN CHARACTER OJEON..PARENTHESIZED KOREAN CHARACTER O HU
+3250 ; 4.0 # PARTNERSHIP SIGN
+327C..327D ; 4.0 # [2] CIRCLED KOREAN CHARACTER CHAMKO..CIRCLED KOREAN CHARACTER JUEUI
+32CC..32CF ; 4.0 # [4] SQUARE HG..LIMITED LIABILITY SIGN
+3377..337A ; 4.0 # [4] SQUARE DM..SQUARE IU
+33DE..33DF ; 4.0 # [2] SQUARE V OVER M..SQUARE A OVER M
+33FF ; 4.0 # SQUARE GAL
+4DC0..4DFF ; 4.0 # [64] HEXAGRAM FOR THE CREATIVE HEAVEN..HEXAGRAM FOR BEFORE COMPLETION
+FDFD ; 4.0 # ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
+FE47..FE48 ; 4.0 # [2] PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET..PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
+10000..1000B ; 4.0 # [12] LINEAR B SYLLABLE B008 A..LINEAR B SYLLABLE B046 JE
+1000D..10026 ; 4.0 # [26] LINEAR B SYLLABLE B036 JO..LINEAR B SYLLABLE B032 QO
+10028..1003A ; 4.0 # [19] LINEAR B SYLLABLE B060 RA..LINEAR B SYLLABLE B042 WO
+1003C..1003D ; 4.0 # [2] LINEAR B SYLLABLE B017 ZA..LINEAR B SYLLABLE B074 ZE
+1003F..1004D ; 4.0 # [15] LINEAR B SYLLABLE B020 ZO..LINEAR B SYLLABLE B091 TWO
+10050..1005D ; 4.0 # [14] LINEAR B SYMBOL B018..LINEAR B SYMBOL B089
+10080..100FA ; 4.0 # [123] LINEAR B IDEOGRAM B100 MAN..LINEAR B IDEOGRAM VESSEL B305
+10100..10102 ; 4.0 # [3] AEGEAN WORD SEPARATOR LINE..AEGEAN CHECK MARK
+10107..10133 ; 4.0 # [45] AEGEAN NUMBER ONE..AEGEAN NUMBER NINETY THOUSAND
+10137..1013F ; 4.0 # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT
+10380..1039D ; 4.0 # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU
+1039F ; 4.0 # UGARITIC WORD DIVIDER
+10426..10427 ; 4.0 # [2] DESERET CAPITAL LETTER OI..DESERET CAPITAL LETTER EW
+1044E..1049D ; 4.0 # [80] DESERET SMALL LETTER OI..OSMANYA LETTER OO
+104A0..104A9 ; 4.0 # [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
+10800..10805 ; 4.0 # [6] CYPRIOT SYLLABLE A..CYPRIOT SYLLABLE JA
+10808 ; 4.0 # CYPRIOT SYLLABLE JO
+1080A..10835 ; 4.0 # [44] CYPRIOT SYLLABLE KA..CYPRIOT SYLLABLE WO
+10837..10838 ; 4.0 # [2] CYPRIOT SYLLABLE XA..CYPRIOT SYLLABLE XE
+1083C ; 4.0 # CYPRIOT SYLLABLE ZA
+1083F ; 4.0 # CYPRIOT SYLLABLE ZO
+1D300..1D356 ; 4.0 # [87] MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING
+1D4C1 ; 4.0 # MATHEMATICAL SCRIPT SMALL L
+E0100..E01EF ; 4.0 # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
+
+# Total code points: 1226
+
+# ================================================
+
+# Newly assigned in Unicode 4.1.0 (March, 2005)
+
+0237..0241 ; 4.1 # [11] LATIN SMALL LETTER DOTLESS J..LATIN CAPITAL LETTER GLOTTAL STOP
+0358..035C ; 4.1 # [5] COMBINING DOT ABOVE RIGHT..COMBINING DOUBLE BREVE BELOW
+03FC..03FF ; 4.1 # [4] GREEK RHO WITH STROKE SYMBOL..GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
+04F6..04F7 ; 4.1 # [2] CYRILLIC CAPITAL LETTER GHE WITH DESCENDER..CYRILLIC SMALL LETTER GHE WITH DESCENDER
+05A2 ; 4.1 # HEBREW ACCENT ATNAH HAFUKH
+05C5..05C7 ; 4.1 # [3] HEBREW MARK LOWER DOT..HEBREW POINT QAMATS QATAN
+060B ; 4.1 # AFGHANI SIGN
+061E ; 4.1 # ARABIC TRIPLE DOT PUNCTUATION MARK
+0659..065E ; 4.1 # [6] ARABIC ZWARAKAY..ARABIC FATHA WITH TWO DOTS
+0750..076D ; 4.1 # [30] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER SEEN WITH TWO DOTS VERTICALLY ABOVE
+097D ; 4.1 # DEVANAGARI LETTER GLOTTAL STOP
+09CE ; 4.1 # BENGALI LETTER KHANDA TA
+0BB6 ; 4.1 # TAMIL LETTER SHA
+0BE6 ; 4.1 # TAMIL DIGIT ZERO
+0FD0..0FD1 ; 4.1 # [2] TIBETAN MARK BSKA- SHOG GI MGO RGYAN..TIBETAN MARK MNYAM YIG GI MGO RGYAN
+10F9..10FA ; 4.1 # [2] GEORGIAN LETTER TURNED GAN..GEORGIAN LETTER AIN
+10FC ; 4.1 # MODIFIER LETTER GEORGIAN NAR
+1207 ; 4.1 # ETHIOPIC SYLLABLE HOA
+1247 ; 4.1 # ETHIOPIC SYLLABLE QOA
+1287 ; 4.1 # ETHIOPIC SYLLABLE XOA
+12AF ; 4.1 # ETHIOPIC SYLLABLE KOA
+12CF ; 4.1 # ETHIOPIC SYLLABLE WOA
+12EF ; 4.1 # ETHIOPIC SYLLABLE YOA
+130F ; 4.1 # ETHIOPIC SYLLABLE GOA
+131F ; 4.1 # ETHIOPIC SYLLABLE GGWAA
+1347 ; 4.1 # ETHIOPIC SYLLABLE TZOA
+135F..1360 ; 4.1 # [2] ETHIOPIC COMBINING GEMINATION MARK..ETHIOPIC SECTION MARK
+1380..1399 ; 4.1 # [26] ETHIOPIC SYLLABLE SEBATBEIT MWA..ETHIOPIC TONAL MARK KURT
+1980..19A9 ; 4.1 # [42] NEW TAI LUE LETTER HIGH QA..NEW TAI LUE LETTER LOW XVA
+19B0..19C9 ; 4.1 # [26] NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW TAI LUE TONE MARK-2
+19D0..19D9 ; 4.1 # [10] NEW TAI LUE DIGIT ZERO..NEW TAI LUE DIGIT NINE
+19DE..19DF ; 4.1 # [2] NEW TAI LUE SIGN LAE..NEW TAI LUE SIGN LAEV
+1A00..1A1B ; 4.1 # [28] BUGINESE LETTER KA..BUGINESE VOWEL SIGN AE
+1A1E..1A1F ; 4.1 # [2] BUGINESE PALLAWA..BUGINESE END OF SECTION
+1D6C..1DC3 ; 4.1 # [88] LATIN SMALL LETTER B WITH MIDDLE TILDE..COMBINING SUSPENSION MARK
+2055..2056 ; 4.1 # [2] FLOWER PUNCTUATION MARK..THREE DOT PUNCTUATION
+2058..205E ; 4.1 # [7] FOUR DOT PUNCTUATION..VERTICAL FOUR DOTS
+2090..2094 ; 4.1 # [5] LATIN SUBSCRIPT SMALL LETTER A..LATIN SUBSCRIPT SMALL LETTER SCHWA
+20B2..20B5 ; 4.1 # [4] GUARANI SIGN..CEDI SIGN
+20EB ; 4.1 # COMBINING LONG DOUBLE SOLIDUS OVERLAY
+213C ; 4.1 # DOUBLE-STRUCK SMALL PI
+214C ; 4.1 # PER SIGN
+23D1..23DB ; 4.1 # [11] METRICAL BREVE..FUSE
+2618 ; 4.1 # SHAMROCK
+267E..267F ; 4.1 # [2] PERMANENT PAPER SIGN..WHEELCHAIR SYMBOL
+2692..269C ; 4.1 # [11] HAMMER AND PICK..FLEUR-DE-LIS
+26A2..26B1 ; 4.1 # [16] DOUBLED FEMALE SIGN..FUNERAL URN
+27C0..27C6 ; 4.1 # [7] THREE DIMENSIONAL ANGLE..RIGHT S-SHAPED BAG DELIMITER
+2B0E..2B13 ; 4.1 # [6] RIGHTWARDS ARROW WITH TIP DOWNWARDS..SQUARE WITH BOTTOM HALF BLACK
+2C00..2C2E ; 4.1 # [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
+2C30..2C5E ; 4.1 # [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
+2C80..2CEA ; 4.1 # [107] COPTIC CAPITAL LETTER ALFA..COPTIC SYMBOL SHIMA SIMA
+2CF9..2D25 ; 4.1 # [45] COPTIC OLD NUBIAN FULL STOP..GEORGIAN SMALL LETTER HOE
+2D30..2D65 ; 4.1 # [54] TIFINAGH LETTER YA..TIFINAGH LETTER YAZZ
+2D6F ; 4.1 # TIFINAGH MODIFIER LETTER LABIALIZATION MARK
+2D80..2D96 ; 4.1 # [23] ETHIOPIC SYLLABLE LOA..ETHIOPIC SYLLABLE GGWE
+2DA0..2DA6 ; 4.1 # [7] ETHIOPIC SYLLABLE SSA..ETHIOPIC SYLLABLE SSO
+2DA8..2DAE ; 4.1 # [7] ETHIOPIC SYLLABLE CCA..ETHIOPIC SYLLABLE CCO
+2DB0..2DB6 ; 4.1 # [7] ETHIOPIC SYLLABLE ZZA..ETHIOPIC SYLLABLE ZZO
+2DB8..2DBE ; 4.1 # [7] ETHIOPIC SYLLABLE CCHA..ETHIOPIC SYLLABLE CCHO
+2DC0..2DC6 ; 4.1 # [7] ETHIOPIC SYLLABLE QYA..ETHIOPIC SYLLABLE QYO
+2DC8..2DCE ; 4.1 # [7] ETHIOPIC SYLLABLE KYA..ETHIOPIC SYLLABLE KYO
+2DD0..2DD6 ; 4.1 # [7] ETHIOPIC SYLLABLE XYA..ETHIOPIC SYLLABLE XYO
+2DD8..2DDE ; 4.1 # [7] ETHIOPIC SYLLABLE GYA..ETHIOPIC SYLLABLE GYO
+2E00..2E17 ; 4.1 # [24] RIGHT ANGLE SUBSTITUTION MARKER..DOUBLE OBLIQUE HYPHEN
+2E1C..2E1D ; 4.1 # [2] LEFT LOW PARAPHRASE BRACKET..RIGHT LOW PARAPHRASE BRACKET
+31C0..31CF ; 4.1 # [16] CJK STROKE T..CJK STROKE N
+327E ; 4.1 # CIRCLED HANGUL IEUNG U
+9FA6..9FBB ; 4.1 # [22] CJK UNIFIED IDEOGRAPH-9FA6..CJK UNIFIED IDEOGRAPH-9FBB
+A700..A716 ; 4.1 # [23] MODIFIER LETTER CHINESE TONE YIN PING..MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR
+A800..A82B ; 4.1 # [44] SYLOTI NAGRI LETTER A..SYLOTI NAGRI POETRY MARK-4
+FA70..FAD9 ; 4.1 # [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
+FE10..FE19 ; 4.1 # [10] PRESENTATION FORM FOR VERTICAL COMMA..PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
+10140..1018A ; 4.1 # [75] GREEK ACROPHONIC ATTIC ONE QUARTER..GREEK ZERO SIGN
+103A0..103C3 ; 4.1 # [36] OLD PERSIAN SIGN A..OLD PERSIAN SIGN HA
+103C8..103D5 ; 4.1 # [14] OLD PERSIAN SIGN AURAMAZDAA..OLD PERSIAN NUMBER HUNDRED
+10A00..10A03 ; 4.1 # [4] KHAROSHTHI LETTER A..KHAROSHTHI VOWEL SIGN VOCALIC R
+10A05..10A06 ; 4.1 # [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O
+10A0C..10A13 ; 4.1 # [8] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI LETTER GHA
+10A15..10A17 ; 4.1 # [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
+10A19..10A33 ; 4.1 # [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA
+10A38..10A3A ; 4.1 # [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
+10A3F..10A47 ; 4.1 # [9] KHAROSHTHI VIRAMA..KHAROSHTHI NUMBER ONE THOUSAND
+10A50..10A58 ; 4.1 # [9] KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCTUATION LINES
+1D200..1D245 ; 4.1 # [70] GREEK VOCAL NOTATION SYMBOL-1..GREEK MUSICAL LEIMMA
+1D6A4..1D6A5 ; 4.1 # [2] MATHEMATICAL ITALIC SMALL DOTLESS I..MATHEMATICAL ITALIC SMALL DOTLESS J
+
+# Total code points: 1273
+
+# ================================================
+
+# Newly assigned in Unicode 5.0.0 (July, 2006)
+
+0242..024F ; 5.0 # [14] LATIN SMALL LETTER GLOTTAL STOP..LATIN SMALL LETTER Y WITH STROKE
+037B..037D ; 5.0 # [3] GREEK SMALL REVERSED LUNATE SIGMA SYMBOL..GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL
+04CF ; 5.0 # CYRILLIC SMALL LETTER PALOCHKA
+04FA..04FF ; 5.0 # [6] CYRILLIC CAPITAL LETTER GHE WITH STROKE AND HOOK..CYRILLIC SMALL LETTER HA WITH STROKE
+0510..0513 ; 5.0 # [4] CYRILLIC CAPITAL LETTER REVERSED ZE..CYRILLIC SMALL LETTER EL WITH HOOK
+05BA ; 5.0 # HEBREW POINT HOLAM HASER FOR VAV
+07C0..07FA ; 5.0 # [59] NKO DIGIT ZERO..NKO LAJANYALAN
+097B..097C ; 5.0 # [2] DEVANAGARI LETTER GGA..DEVANAGARI LETTER JJA
+097E..097F ; 5.0 # [2] DEVANAGARI LETTER DDDA..DEVANAGARI LETTER BBA
+0CE2..0CE3 ; 5.0 # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
+0CF1..0CF2 ; 5.0 # [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
+1B00..1B4B ; 5.0 # [76] BALINESE SIGN ULU RICEM..BALINESE LETTER ASYURA SASAK
+1B50..1B7C ; 5.0 # [45] BALINESE DIGIT ZERO..BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PING
+1DC4..1DCA ; 5.0 # [7] COMBINING MACRON-ACUTE..COMBINING LATIN SMALL LETTER R BELOW
+1DFE..1DFF ; 5.0 # [2] COMBINING LEFT ARROWHEAD ABOVE..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+20EC..20EF ; 5.0 # [4] COMBINING RIGHTWARDS HARPOON WITH BARB DOWNWARDS..COMBINING RIGHT ARROW BELOW
+214D..214E ; 5.0 # [2] AKTIESELSKAB..TURNED SMALL F
+2184 ; 5.0 # LATIN SMALL LETTER REVERSED C
+23DC..23E7 ; 5.0 # [12] TOP PARENTHESIS..ELECTRICAL INTERSECTION
+26B2 ; 5.0 # NEUTER
+27C7..27CA ; 5.0 # [4] OR WITH DOT INSIDE..VERTICAL BAR WITH HORIZONTAL STROKE
+2B14..2B1A ; 5.0 # [7] SQUARE WITH UPPER RIGHT DIAGONAL HALF BLACK..DOTTED SQUARE
+2B20..2B23 ; 5.0 # [4] WHITE PENTAGON..HORIZONTAL BLACK HEXAGON
+2C60..2C6C ; 5.0 # [13] LATIN CAPITAL LETTER L WITH DOUBLE BAR..LATIN SMALL LETTER Z WITH DESCENDER
+2C74..2C77 ; 5.0 # [4] LATIN SMALL LETTER V WITH CURL..LATIN SMALL LETTER TAILLESS PHI
+A717..A71A ; 5.0 # [4] MODIFIER LETTER DOT VERTICAL BAR..MODIFIER LETTER LOWER RIGHT CORNER ANGLE
+A720..A721 ; 5.0 # [2] MODIFIER LETTER STRESS AND HIGH TONE..MODIFIER LETTER STRESS AND LOW TONE
+A840..A877 ; 5.0 # [56] PHAGS-PA LETTER KA..PHAGS-PA MARK DOUBLE SHAD
+10900..10919 ; 5.0 # [26] PHOENICIAN LETTER ALF..PHOENICIAN NUMBER ONE HUNDRED
+1091F ; 5.0 # PHOENICIAN WORD SEPARATOR
+12000..1236E ; 5.0 # [879] CUNEIFORM SIGN A..CUNEIFORM SIGN ZUM
+12400..12462 ; 5.0 # [99] CUNEIFORM NUMERIC SIGN TWO ASH..CUNEIFORM NUMERIC SIGN OLD ASSYRIAN ONE QUARTER
+12470..12473 ; 5.0 # [4] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL TRICOLON
+1D360..1D371 ; 5.0 # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
+1D7CA..1D7CB ; 5.0 # [2] MATHEMATICAL BOLD CAPITAL DIGAMMA..MATHEMATICAL BOLD SMALL DIGAMMA
+
+# Total code points: 1369
+
+# EOF
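
The data lines in DerivedAge.txt above all have the shape `FIRST[..LAST] ; AGE # comment`, where the bracketed number in the comment is the size of the range. As a rough illustration of that layout (not the commit's own Perl, which appears further down in umap2ucm.pl), here is a minimal Python sketch. The names `LINE_RE` and `parse_age_line` are hypothetical, and the optional leading `+` is an assumption made only so that lines copied straight from this diff can be parsed.

```python
import re

# "XXXX ; age" or "XXXX..YYYY ; age" at the start of a data line.
# The optional "+" tolerates the diff marker shown on this page (assumption).
LINE_RE = re.compile(r'^\+?([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([0-9.]+)')


def parse_age_line(line):
    """Return (first, last, age) for a data line, or None for comments and blanks."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    first = int(m.group(1), 16)
    # A single code point is treated as a range of length one.
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3)


sample = [
    "1000..1021 ; 3.0 # [34] MYANMAR LETTER KA..MYANMAR LETTER A",
    "1248 ; 3.0 # ETHIOPIC SYLLABLE QWA",
    "# Total code points: 10307",
]
for text in sample:
    print(parse_age_line(text))
```

The bracketed count can be cross-checked as `last - first + 1`; for `1000..1021` that gives 34, matching the comment on that line.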
View
63 map/b2g_map.utf8
@@ -408,7 +408,6 @@ B5 GB
ˊ ˊ
ˇ ˇ
ˋ `
-€ □
一 一
乙 乙
丁 丁
@@ -6128,65 +6127,14 @@ B5 GB
冈 网
 耠
𧘇 衣
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
 寝
- □
- □
 嵴
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
- □
¬ 「
¦ ┆
' ’
" ”
-㈱ □
№ №
-℡ □
゛ 犏
-゜ □
-⺀ □
-⺄ □
-⺆ □
-⺇ □
-⺈ □
-⺊ □
-⺌ □
-⺍ □
-⺕ □
-⺜ □
⺝ 亻
⺥ 勹
⺧ 阝
@@ -6205,19 +6153,8 @@ B5 GB
⻗ 骺
⻞ 鲅
⻣ 锿
-�� □
-�� □
-�� □
-ʃ □
-ɐ □
-ɛ □
-ɔ □
-ɵ □
œ 镢
ø 馓
-ŋ □
-ʊ □
-ɪ □
乂 乂
乜 乜
凵 凵
View
5,052 map/g2b_map.utf8
0 additions, 5,052 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
View
216 map/umap2ucm.pl
@@ -1,21 +1,21 @@
#!/usr/bin/perl
-# $File: //member/autrijus/Encode-HanConvert/map/umap2ucm.pl $ $Author: autrijus $
-# $Revision: #2 $ $Change: 3939 $ $DateTime: 2003/01/27 22:52:26 $
+# $Id$
use strict;
use Encode 1.41;
use File::Spec;
use File::Basename;
my $path = dirname($0);
-conv(File::Spec->catdir($path, 'b2g_map.txt') => 'trad-simp', 'gbk', 'big5');
-conv(File::Spec->catdir($path, 'g2b_map.txt') => 'simp-trad', 'big5', 'gbk');
+conv(File::Spec->catdir($path, 'b2g_map.utf8') => 'trad-simp');
+conv(File::Spec->catdir($path, 'g2b_map.utf8') => 'simp-trad');
sub conv {
- my ($src, $target, $enc, $fenc) = @_;
+ my ($src, $target) = @_;
my %count;
+ my @has;
- open IN, $src or die $!;
+ open IN, '<:utf8', $src or die $!;
open OUT, ">$target.ucm" or die $!;
print OUT << ".";
@@ -24,23 +24,37 @@ sub conv {
<code_set_name> "$target"
.
print OUT +HEADER();
- print OUT +B5HEADER() unless $target =~ /gbk/i;
<IN>; <IN>;
while (<IN>) {
- my $uchar = decode($enc, substr($_, 3, 2)) or next;
- my $fchar = encode_utf8(decode($fenc, substr($_, 0, 2))) or next;
- printf OUT "<U%04X> %s |%u\n",
- ord($uchar),
- join('', map sprintf('\\x%02X', ord($_)), split('', $fchar)),
- 0; # XXX - suggestions welcome to the fallback char here
+ my ($fchar, $tchar) = m/^(.) (.)/;
+ print OUT ucm_entry($fchar, $tchar);
+ $has[ord $fchar] = 1;
+ }
+ close IN;
+
+ open IN, File::Spec->catdir($path, 'DerivedAge.txt') or die $!;
+ while(<IN>) {
+ next if /<noncharacter/ || /<surrogate/;
+ if (/^([0-9A-F]+)\s+;/) {
+ $has[hex $1] || print OUT ucm_entry(chr hex $1, chr hex $1);
+ } elsif(/^([0-9A-F]+)\.\.([0-9A-F]+)\s+;/) {
+ $has[$_] || print OUT ucm_entry(chr $_, chr $_) for hex $1 .. hex $2;
+ }
}
- print OUT +B5FOOTER() unless $target =~ /gbk/i;
print OUT +FOOTER();
close OUT;
- close IN;
+}
+
+sub ucm_entry {
+ my ($fchar, $tchar) = @_;
+ my $utf8 = encode_utf8($fchar);
+ return sprintf("<U%04X> %s |%u\n",
+ ord($tchar),
+ join('', map sprintf('\\x%02X', ord($_)), split('', $utf8)),
+ 0); # XXX - suggestions welcome to the fallback char here
}
use constant HEADER => << '.';
@@ -49,180 +63,8 @@ sub conv {
<subchar> \x3F
#
CHARMAP
-<U0000> \x00 |0 # NULL
-<U0001> \x01 |0 # START OF HEADING
-<U0002> \x02 |0 # START OF TEXT
-<U0003> \x03 |0 # END OF TEXT
-<U0004> \x04 |0 # END OF TRANSMISSION
-<U0005> \x05 |0 # ENQUIRY
-<U0006> \x06 |0 # ACKNOWLEDGE
-<U0007> \x07 |0 # BELL
-<U0008> \x08 |0 # BACKSPACE
-<U0009> \x09 |0 # HORIZONTAL TABULATION
-<U000A> \x0A |0 # LINE FEED
-<U000B> \x0B |0 # VERTICAL TABULATION
-<U000C> \x0C |0 # FORM FEED
-<U000D> \x0D |0 # CARRIAGE RETURN
-<U000E> \x0E |0 # SHIFT OUT
-<U000F> \x0F |0 # SHIFT IN
-<U0010> \x10 |0 # DATA LINK ESCAPE
-<U0011> \x11 |0 # DEVICE CONTROL ONE
-<U0012> \x12 |0 # DEVICE CONTROL TWO
-<U0013> \x13 |0 # DEVICE CONTROL THREE
-<U0014> \x14 |0 # DEVICE CONTROL FOUR
-<U0015> \x15 |0 # NEGATIVE ACKNOWLEDGE
-<U0016> \x16 |0 # SYNCHRONOUS IDLE
-<U0017> \x17 |0 # END OF TRANSMISSION BLOCK
-<U0018> \x18 |0 # CANCEL
-<U0019> \x19 |0 # END OF MEDIUM
-<U001A> \x1A |0 # SUBSTITUTE
-<U001B> \x1B |0 # ESCAPE
-<U001C> \x1C |0 # FILE SEPARATOR
-<U001D> \x1D |0 # GROUP SEPARATOR
-<U001E> \x1E |0 # RECORD SEPARATOR
-<U001F> \x1F |0 # UNIT SEPARATOR
-<U0020> \x20 |0 # SPACE
-<U0021> \x21 |0 # EXCLAMATION MARK
-<U0022> \x22 |0 # QUOTATION MARK
-<U0023> \x23 |0 # NUMBER SIGN
-<U0024> \x24 |0 # DOLLAR SIGN
-<U0025> \x25 |0 # PERCENT SIGN
-<U0026> \x26 |0 # AMPERSAND
-<U0027> \x27 |0 # APOSTROPHE
-<U0028> \x28 |0 # LEFT PARENTHESIS
-<U0029> \x29 |0 # RIGHT PARENTHESIS
-<U002A> \x2A |0 # ASTERISK
-<U002B> \x2B |0 # PLUS SIGN
-<U002C> \x2C |0 # COMMA
-<U002D> \x2D |0 # HYPHEN-MINUS
-<U002E> \x2E |0 # FULL STOP
-<U002F> \x2F |0 # SOLIDUS
-<U0030> \x30 |0 # DIGIT ZERO
-<U0031> \x31 |0 # DIGIT ONE
-<U0032> \x32 |0 # DIGIT TWO
-<U0033> \x33 |0 # DIGIT THREE
-<U0034> \x34 |0 # DIGIT FOUR
-<U0035> \x35 |0 # DIGIT FIVE
-<U0036> \x36 |0 # DIGIT SIX
-<U0037> \x37 |0 # DIGIT SEVEN
-<U0038> \x38 |0 # DIGIT EIGHT
-<U0039> \x39 |0 # DIGIT NINE
-<U003A> \x3A |0 # COLON
-<U003B> \x3B |0 # SEMICOLON
-<U003C> \x3C |0 # LESS-THAN SIGN
-<U003D> \x3D |0 # EQUALS SIGN
-<U003E> \x3E |0 # GREATER-THAN SIGN
-<U003F> \x3F |0 # QUESTION MARK
-<U0040> \x40 |0 # COMMERCIAL AT
-<U0041> \x41 |0 # LATIN CAPITAL LETTER A
-<U0042> \x42 |0 # LATIN CAPITAL LETTER B
-<U0043> \x43 |0 # LATIN CAPITAL LETTER C
-<U0044> \x44 |0 # LATIN CAPITAL LETTER D
-<U0045> \x45 |0 # LATIN CAPITAL LETTER E
-<U0046> \x46 |0 # LATIN CAPITAL LETTER F
-<U0047> \x47 |0 # LATIN CAPITAL LETTER G
-<U0048> \x48 |0 # LATIN CAPITAL LETTER H
-<U0049> \x49 |0 # LATIN CAPITAL LETTER I
-<U004A> \x4A |0 # LATIN CAPITAL LETTER J
-<U004B> \x4B |0 # LATIN CAPITAL LETTER K
-<U004C> \x4C |0 # LATIN CAPITAL LETTER L
-<U004D> \x4D |0 # LATIN CAPITAL LETTER M
-<U004E> \x4E |0 # LATIN CAPITAL LETTER N
-<U004F> \x4F |0 # LATIN CAPITAL LETTER O
-<U0050> \x50 |0 # LATIN CAPITAL LETTER P
-<U0051> \x51 |0 # LATIN CAPITAL LETTER Q
-<U0052> \x52 |0 # LATIN CAPITAL LETTER R
-<U0053> \x53 |0 # LATIN CAPITAL LETTER S
-<U0054> \x54 |0 # LATIN CAPITAL LETTER T
-<U0055> \x55 |0 # LATIN CAPITAL LETTER U
-<U0056> \x56 |0 # LATIN CAPITAL LETTER V
-<U0057> \x57 |0 # LATIN CAPITAL LETTER W
-<U0058> \x58 |0 # LATIN CAPITAL LETTER X
-<U0059> \x59 |0 # LATIN CAPITAL LETTER Y
-<U005A> \x5A |0 # LATIN CAPITAL LETTER Z
-<U005B> \x5B |0 # LEFT SQUARE BRACKET
-<U005C> \x5C |0 # REVERSE SOLIDUS
-<U005D> \x5D |0 # RIGHT SQUARE BRACKET
-<U005E> \x5E |0 # CIRCUMFLEX ACCENT
-<U005F> \x5F |0 # LOW LINE
-<U0060> \x60 |0 # GRAVE ACCENT
-<U0061> \x61 |0 # LATIN SMALL LETTER A
-<U0062> \x62 |0 # LATIN SMALL LETTER B
-<U0063> \x63 |0 # LATIN SMALL LETTER C
-<U0064> \x64 |0 # LATIN SMALL LETTER D
-<U0065> \x65 |0 # LATIN SMALL LETTER E
-<U0066> \x66 |0 # LATIN SMALL LETTER F
-<U0067> \x67 |0 # LATIN SMALL LETTER G
-<U0068> \x68 |0 # LATIN SMALL LETTER H
-<U0069> \x69 |0 # LATIN SMALL LETTER I
-<U006A> \x6A |0 # LATIN SMALL LETTER J
-<U006B> \x6B |0 # LATIN SMALL LETTER K
-<U006C> \x6C |0 # LATIN SMALL LETTER L
-<U006D> \x6D |0 # LATIN SMALL LETTER M
-<U006E> \x6E |0 # LATIN SMALL LETTER N
-<U006F> \x6F |0 # LATIN SMALL LETTER O
-<U0070> \x70 |0 # LATIN SMALL LETTER P
-<U0071> \x71 |0 # LATIN SMALL LETTER Q
-<U0072> \x72 |0 # LATIN SMALL LETTER R
-<U0073> \x73 |0 # LATIN SMALL LETTER S
-<U0074> \x74 |0 # LATIN SMALL LETTER T
-<U0075> \x75 |0 # LATIN SMALL LETTER U
-<U0076> \x76 |0 # LATIN SMALL LETTER V
-<U0077> \x77 |0 # LATIN SMALL LETTER W
-<U0078> \x78 |0 # LATIN SMALL LETTER X
-<U0079> \x79 |0 # LATIN SMALL LETTER Y
-<U007A> \x7A |0 # LATIN SMALL LETTER Z
-<U007B> \x7B |0 # LEFT CURLY BRACKET
-<U007C> \x7C |0 # VERTICAL LINE
-<U007D> \x7D |0 # RIGHT CURLY BRACKET
-<U007E> \x7E |0 # TILDE
-<U007F> \x7F |0 # DELETE
-<U0080> \x80 |0 # <control>
-.
-
-use constant B5HEADER => << '.';
-<U0081> \x81 |0 # <control>
-<U0082> \x82 |0 # BREAK PERMITTED HERE
-<U0083> \x83 |0 # NO BREAK HERE
-<U0084> \x84 |0 # <control>
-<U0085> \x85 |0 # NEXT LINE
-<U0086> \x86 |0 # START OF SELECTED AREA
-<U0087> \x87 |0 # END OF SELECTED AREA
-<U0088> \x88 |0 # CHARACTER TABULATION SET
-<U0089> \x89 |0 # CHARACTER TABULATION WITH JUSTIFICATION
-<U008A> \x8A |0 # LINE TABULATION SET
-<U008B> \x8B |0 # PARTIAL LINE DOWN
-<U008C> \x8C |0 # PARTIAL LINE UP
-<U008D> \x8D |0 # REVERSE LINE FEED
-<U008E> \x8E |0 # SINGLE SHIFT TWO
-<U008F> \x8F |0 # SINGLE SHIFT THREE
-<U0090> \x90 |0 # DEVICE CONTROL STRING
-<U0091> \x91 |0 # PRIVATE USE ONE
-<U0092> \x92 |0 # PRIVATE USE TWO
-<U0093> \x93 |0 # SET TRANSMIT STATE
-<U0094> \x94 |0 # CANCEL CHARACTER
-<U0095> \x95 |0 # MESSAGE WAITING
-<U0096> \x96 |0 # START OF GUARDED AREA
-<U0097> \x97 |0 # END OF GUARDED AREA
-<U0098> \x98 |0 # START OF STRING
-<U0099> \x99 |0 # <control>
-<U009A> \x9A |0 # SINGLE CHARACTER INTRODUCER
-<U009B> \x9B |0 # CONTROL SEQUENCE INTRODUCER
-<U009C> \x9C |0 # STRING TERMINATOR
-<U009D> \x9D |0 # OPERATING SYSTEM COMMAND
-<U009E> \x9E |0 # PRIVACY MESSAGE
-<U009F> \x9F |0 # APPLICATION PROGRAM COMMAND
-<U00A0> \xA0 |0 # NO-BREAK SPACE
-.
-
-use constant B5FOOTER => << '.';
-<U00FA> \xFA |0 # LATIN SMALL LETTER U WITH ACUTE
-<U00FB> \xFC |0 # LATIN SMALL LETTER U WITH CIRCUMFLEX
-<U00FD> \xFD |0 # LATIN SMALL LETTER Y WITH ACUTE
-<U00FE> \xFE |0 # LATIN SMALL LETTER THORN
.
use constant FOOTER => << '.';
-<U00FF> \xFF |0 # LATIN SMALL LETTER Y WITH DIAERESIS
END CHARMAP
.
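
The rewritten umap2ucm.pl above does two things: it emits one UCM line per pair in the map file, and it then adds an identity line for every code point listed in DerivedAge.txt that the map did not cover. The following is a minimal Python sketch of that logic for readers who want to check the output format outside Perl. It is an approximation, not the script itself: `ucm_entry` mirrors the Perl sub of the same name, while `identity_entries` and the use of a set for the `has` bookkeeping are hypothetical, and the 萬/万 pair is only an illustrative input.

```python
def ucm_entry(fchar, tchar):
    """Format one UCM line: target code point, then the source char's UTF-8 bytes."""
    byte_seq = "".join("\\x%02X" % b for b in fchar.encode("utf-8"))
    # The trailing 0 is the fallback indicator, as in the Perl version.
    return "<U%04X> %s |%d\n" % (ord(tchar), byte_seq, 0)


def identity_entries(first, last, has):
    """Identity lines for code points in [first, last] that the map did not cover."""
    return [
        ucm_entry(chr(cp), chr(cp))
        for cp in range(first, last + 1)
        if cp not in has
    ]


# One mapped pair, plus a small range where one code point is already covered.
print(ucm_entry("萬", "万"), end="")
for entry in identity_entries(0x1248, 0x124A, {0x1249}):
    print(entry, end="")
```

Because unmapped code points get identity lines, a character outside the trad-simp or simp-trad table passes through unchanged, which is the behaviour the Changes entry describes.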
