Consolidation of Additional Glyph & Character Suggestions (See Issue #180) #115

ShikiSuen · 2015-06-26T13:19:14Z

Currently, Source Han Sans TW does not include the Simplified Chinese glyphs used in the PRC. However, Taiwan's MOE (Ministry of Education) has shown in its 「全字庫正宋體」 and 「全字庫正楷體」 fonts that its glyph standards also apply to such characters.

The 「全字庫正宋體」 and 「全字庫正楷體」 fonts can be downloaded from this website:
http://data.gov.tw/node/5961

Meanwhile, other MOE-oriented fonts also apply these MOE-standard glyph shapes to PRC Simplified Chinese characters:
// PingFang, the default Traditional Chinese fallback font since OS X El Capitan and iOS 9

// DFKai-SB (a.k.a. 標楷體), which has supported Simplified Chinese glyphs since Windows Vista

// MOE Sung UN

// PMingLiU, which has followed the MOE CNS 11643 standard since Windows Vista

I created this thread to follow Ken Lunde's request in issue #99, so that related discussions can take place here (instead of in issue #99) before Ken Lunde makes his final decisions.

ShikiSuen · 2015-06-26T13:26:57Z

Here are my opinions:

As an adult PRC passport holder from mainland China, I feel that the Simplified Chinese glyphs are better designed in 「全字庫正宋體」 and 「全字庫正楷體」, since they are easier and faster to write. They also make each glyph look more distinctive.

Update: PingFang is produced solely by DynaComware.

ShikiSuen · 2015-06-26T13:27:54Z

(carbon copy sent to @jimmymasaru .)

kenlunde · 2015-06-26T13:41:18Z

PingFang is a Pan-Chinese typeface family that does not use region-specific subsets, which means that the Simplified and Traditional Chinese fonts have the same Unicode coverage. The character in question, U+604B (恋), is in CNS 11643 Plane 3, but the scope of Traditional Chinese in SHS is capped at Big Five Levels 1 and 2, which are equivalent to CNS 11643 Planes 1 and 2.

ShikiSuen · 2015-06-26T13:54:04Z

Reference:
https://zh.wikipedia.org/zh-hant/%E5%A4%A7%E4%BA%94%E7%A2%BC

According to this reference, some characters are not included in Big5. Some of them are:

However, some of them are still used in Taiwan today even though they are not in Big5, such as 峯 (in the name of Taiwanese singer-songwriter 吳青峯), 栢 (in the name of Hong Kong movie star 張栢芝), 邨, and 啓 (as in 天啓坦克, the "Apocalypse Tank" in the Traditional Chinese version of Command & Conquer: Red Alert 2).

I am not familiar with this because I never use fonts that support only Big5 glyphs; thus, I cannot tell whether SHS TW (the region-specific release) supports these characters.

tamcy · 2015-06-30T08:14:38Z

Unlike 恋, the characters listed in the table above (蟎綫綉滙栢峯頴邨着双啓) are all covered by HKSCS-2008, which should mean that they are also covered by Source Han Sans TW.

kenlunde · 2015-07-20T14:42:43Z

The scope of Source Han Sans TW, which is a subset of the 65,535-glyph glyph set, is Big Five + Hong Kong SCS (in terms of code points for hanzi). The best workaround is to simply use Source Han Sans TC, which includes all 65,535 glyphs, and thus has the maximum coverage of code points. (This suggestion is separate from having a glyph that is appropriate for TW.)
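As a rough way to check whether a character falls within the Big Five + Hong Kong SCS scope, Python's standard `big5` and `big5hkscs` codecs can serve as a proxy. This is an approximation and an assumption on our part: the codecs use their own mapping tables, and `big5hkscs` may not match HKSCS-2008 exactly, so they do not define the actual SHS TW subset. The helper names are hypothetical.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if the single character `ch` can be encoded with `codec`."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def in_tw_scope(ch: str) -> bool:
    """Approximate 'Big Five + Hong Kong SCS' membership using Python codecs."""
    return encodable(ch, "big5") or encodable(ch, "big5hkscs")


# 戀 is a Big Five character; 恋 is in CNS 11643 Plane 3, outside Big Five.
for ch in "戀恋":
    print(f"U+{ord(ch):04X} {ch} big5={encodable(ch, 'big5')} tw_scope~{in_tw_scope(ch)}")
```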

In terms of actually extending TW coverage, which means using glyphs that are appropriate for TW use (and thus follow MOE guidelines), the issue is one of scope. Big Five is used because it represents the most common hanzi in use, and the problem that we will run into is the lack of available CIDs.

In any case, I am now working on the plan and scope for Version 2.000, and am taking all of this into consideration, though the highest priority is proper Hong Kong support.

kenlunde · 2015-07-20T14:45:57Z

I am consolidating Issue #118 (for adding a CN glyph for Extension E U+2C386, 𬎆, ⿰王莹) here.

kenlunde · 2015-08-02T13:49:19Z

I am consolidating Issue #121 (for adding glyphs for U+9FD1 through U+9FE9 and U+2B7F7) here. (Note added on 2017-02-21: U+2B7F7 𫟷 is the Simplified Chinese name of Element 116, which is an outlier in terms of covering all of the elements.)

extc · 2015-08-02T17:36:43Z

Adobe-CNS1-6 is the character collection used by Traditional Chinese OpenType CID-keyed fonts. It is based on Big5, extended characters from DynaLab and Monotype, GCCS, and HKSCS-1999, HKSCS-2001, HKSCS-2004, and HKSCS-2008. Ken Lunde has released PDFs of CNS 11643-1992 Planes 1 through 7 and Plane 15 at
ftp://ftp.oreilly.com/examples/cjkvinfo/AppG/

Big5 is a very old standard (1984). CNS 11643 now includes 107,171 characters. I know it is not possible to include them all, as only 86,655 of them are mapped to Unicode. But CNS 11643-1992 was a de facto standard: it was implemented as the EUC-TW encoding in UNIX terminals, and all EUC-TW characters have already been mapped to Unicode. Adobe-CNS1 should at least include all the characters in CNS 11643-1992 so as to live up to its name. Because the development of Source Han Sans depends on CIDs, I think Adobe should update Adobe-CNS1 in parallel with the development of Source Han Sans.
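For readers unfamiliar with EUC-TW, the sketch below shows how a CNS 11643 code point (plane, plus row and cell bytes in the range 0x21-0x7E) maps to EUC-TW bytes. Plane 1 can use a two-byte form with the high bit set on both bytes, and any of Planes 1-16 can use the four-byte form: SS2 (0x8E), a plane byte from 0xA1 (Plane 1) to 0xB0 (Plane 16), then the two high-bit bytes. The function name is hypothetical, and the example positions are arbitrary rather than specific characters.

```python
SS2 = 0x8E  # Single Shift 2, which introduces the four-byte EUC-TW form


def cns_to_euc_tw(plane: int, row: int, cell: int, four_byte: bool = False) -> bytes:
    """Encode a CNS 11643 code point as EUC-TW.

    `row` and `cell` are the raw CNS bytes (0x21-0x7E), so a code written
    as '1-4421' is plane=1, row=0x44, cell=0x21.
    """
    if not 1 <= plane <= 16:
        raise ValueError("EUC-TW addresses planes 1-16")
    if not (0x21 <= row <= 0x7E and 0x21 <= cell <= 0x7E):
        raise ValueError("row and cell must be in 0x21-0x7E")
    pair = bytes([row | 0x80, cell | 0x80])  # set the high bit on both bytes
    if plane == 1 and not four_byte:
        return pair  # Plane 1: plain two-byte form
    return bytes([SS2, 0xA0 + plane]) + pair  # SS2 + plane byte (0xA1 = Plane 1)


print(cns_to_euc_tw(1, 0x44, 0x21).hex())  # c4a1
print(cns_to_euc_tw(3, 0x21, 0x21).hex())  # 8ea3a1a1
```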

kenlunde · 2015-08-03T14:51:35Z

@extc: I read and re-read your note above, and am still at a loss as to what you are requesting, but perhaps what I wrote below may help.

Adobe-CNS1-6 is cumulative, meaning that glyphs are added incrementally, so there is a diachronic effect. Supplement 0 supported only Big Five and the ETen extensions, but the /Ordering was set to CNS1 because of Big Five's relationship to CNS 11643, and such a name opened the possibility of extending the glyph set to cover additional CNS 11643 planes. Supplement 1 added support for Hong Kong GCCS and the Hong Kong extensions from DynaComware (Dynalab) and Monotype Imaging (Monotype). Supplement 2 simply added pre-rotated versions of non–full-width glyphs that are accessible via the (now-deprecated) 'vrt2' GSUB feature. Supplements 3 through 6 were for supporting the 1999, 2001, 2004, and 2008 versions of Hong Kong SCS, respectively.
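The cumulative nature of Supplements can be illustrated with a small sketch: a Registry-Ordering-Supplement (ROS) string such as `Adobe-CNS1-6` is split into its parts, and Supplement N is taken to cover everything in Supplements 0 through N. The supplement descriptions are paraphrased from the comment above; the function names are hypothetical.

```python
import re

# What each Adobe-CNS1 Supplement added, paraphrased from the comment above.
SUPPLEMENTS = {
    0: "Big Five + ETen extensions",
    1: "Hong Kong GCCS + DynaComware and Monotype Hong Kong extensions",
    2: "pre-rotated non-full-width glyphs for the 'vrt2' GSUB feature",
    3: "Hong Kong SCS-1999",
    4: "Hong Kong SCS-2001",
    5: "Hong Kong SCS-2004",
    6: "Hong Kong SCS-2008",
}


def parse_ros(name: str) -> tuple[str, str, int]:
    """Split a Registry-Ordering-Supplement string such as 'Adobe-CNS1-6'."""
    m = re.fullmatch(r"([A-Za-z]+)-([A-Za-z0-9]+)-(\d+)", name)
    if m is None:
        raise ValueError(f"not an ROS string: {name!r}")
    return m.group(1), m.group(2), int(m.group(3))


def covered(name: str) -> list[str]:
    """Supplements are cumulative: Supplement N includes Supplements 0..N."""
    _, ordering, supplement = parse_ros(name)
    if ordering != "CNS1":
        raise ValueError("this sketch only describes Adobe-CNS1")
    return [SUPPLEMENTS[n] for n in range(supplement + 1)]


for item in covered("Adobe-CNS1-6"):
    print(item)
```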

The PDFs from CJKV Information Processing (First Edition) were made by using an experimental Adobe-CNS2-0 glyph set whose purpose was to simply show all characters in CNS 11643-1992, along with Plane 15.

Although CNS 11643 is large, and has expanded beyond the 1992 version, it is not nearly as interesting as the CJK Unified Ideographs in Unicode, meaning the URO and Extensions A through E. The latter has excellent interchange, but the former has very poor interchange. CNS standards are also quite messy, and provide little or no metadata, such as dictionary mappings or other ways to verify a character's meaning or shape.

fei0316 · 2015-11-08T16:25:01Z

Add the character 𧒽(U+274BD) #133

Although this character is not in any of the supported standards, it should be added. It is used in the name of a Guangzhou Metro station (𧒽岗站), in the name of a park near that station (𧒽岗公园), and in the name of a type of seafood produced in that area. The character is supported by the MingLiU_HKSCS-ExtB font, and it is displayed correctly by default on OS X 10.10 Yosemite and Windows 10. It was proposed for inclusion in 通用规范汉字表 (the Table of General Standard Chinese Characters), but was later removed. Documents, banners, and websites that need this character usually write it as 「虫雷」 or 「礌」. People have also reported problems finding the station in the mobile phone app, and maps showing the station or the park have to substitute other characters for the unsupported one. As the goal of this font is to maximize compatibility, adding this character would benefit many people, considering that all Android devices running Android 5.0 or above use this font.
Reference:
https://zh.wikipedia.org/wiki/%F0%A7%92%BD%E5%B4%97%E7%AB%99
https://zh.wikipedia.org/wiki/%E9%BB%83%E6%B2%99%E8%9C%86
http://news.sina.com.cn/o/2014-07-24/142430572774.shtml
http://baike.baidu.com/view/4731307.htm
https://www.google.com/maps/place/%E7%A4%8C%E5%B2%97/@23.0442069,113.1465266,16z/data=!4m5!1m2!2m1!1z6Jmr6Zu35bKX5YWs5ZyS!3m1!1s0x0000000000000000:0x84aea54ce06ea2e9
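𧒽 (U+274BD) lies in CJK Unified Ideographs Extension B, outside the Basic Multilingual Plane, so a font needs a 32-bit 'cmap' subtable (such as format 12) to map it, and UTF-16 text stores it as a surrogate pair. The sketch below uses only Python's standard library to show both the surrogate pair and the UTF-8 percent-encoding; the latter matches the `%F0%A7%92%BD` at the start of the first Wikipedia URL above. The `describe` helper is hypothetical.

```python
from urllib.parse import quote


def describe(ch: str) -> dict:
    """Summarize how a single character is encoded."""
    cp = ord(ch)
    info = {"code_point": f"U+{cp:04X}", "bmp": cp <= 0xFFFF}
    if cp > 0xFFFF:
        # Supplementary-plane characters need a UTF-16 surrogate pair.
        v = cp - 0x10000
        info["utf16"] = (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))
    info["url"] = quote(ch)  # UTF-8 bytes, percent-encoded
    return info


print(describe("𧒽"))
```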

kenlunde · 2015-12-23T18:44:35Z

Consider adding (KR) glyphs for U+200D7 𠃗, U+2042D 𠐭, U+224E1 𢓡, and U+23D18 𣴘. The last three are used in traditional Korean musical notation.

hfhchan · 2015-12-24T14:50:40Z

The HKSCS 2015 update redefines some mappings from Big5 to UCS. Would that affect character coverage, especially for the full-width symbols?

kenlunde · 2015-12-24T20:45:21Z

@hfhchan: With regard to Hong Kong support, we sort of have a fresh slate, because to date the project does not include any Hong Kong font resources. This effectively means that accommodating mapping changes should not be problematic.

kenlunde · 2016-01-07T23:47:56Z

New CN glyphs for U+35F4 (㗴) and U+6D73 (浳), uni35F4-CN and uni6D73-CN, need to be added.
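The glyph names above follow the familiar `uniXXXX` pattern plus a region suffix. A hypothetical helper that builds such names is sketched below. The `uXXXXX` form for code points beyond the BMP comes from the Adobe Glyph List Specification; applying a `-CN`-style suffix to it is our assumption, not a documented Source Han convention.

```python
def region_glyph_name(cp: int, region: str = "") -> str:
    """Build a glyph name like 'uni35F4-CN' (hypothetical helper)."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode scalar value")
    base = f"uni{cp:04X}" if cp <= 0xFFFF else f"u{cp:X}"
    return f"{base}-{region}" if region else base


print(region_glyph_name(0x35F4, "CN"))   # uni35F4-CN
print(region_glyph_name(0x6D73, "CN"))   # uni6D73-CN
print(region_glyph_name(0x2C386, "CN"))  # u2C386-CN
```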

kenlunde · 2016-01-13T12:54:01Z

Consider adding KR glyphs for Extension B characters 𪓟 (U+2A4DF) and 𣖄 (U+23584) per Issue #136.

kenlunde · 2016-01-13T13:33:50Z

Per Issue #137, VN (Chữ nôm) glyphs will be supported when Extension B and beyond are supported in their entirety.

hfhchan · 2016-03-21T19:27:45Z

Is "𠻹" (H-9E77) supported? It does not display correctly with Noto Sans TC (http://fonts.googleapis.com/earlyaccess/notosanstc.css) on hk01.com; the character falls back to SimSun-ExtB in both Microsoft Edge and Chrome.

Edit: Neither does 䮎 (H-92D7), nor 𦉘 (H-9DBC). On the other hand, 罉 (H-9DD1) displays correctly.

kenlunde · 2016-03-21T19:40:39Z

@hfhchan: 𠻹 (Extension B U+20EF9; CID+59693) is supported by Source Han Sans / Noto Sans CJK, and is also included in the region-specific subset OTFs for Traditional Chinese (which are the fonts that are referenced in that CSS file). I inspected one of the OTFs that is referenced by the CSS file, and it has been further subsetted, and includes only 9,876 glyphs, and only three characters outside the BMP are supported: U+210C1, U+24A12, and U+25683. This is therefore a question to pose to Google.

kenlunde · 2016-03-21T19:50:47Z

罉 (URO U+7F49; CID+32230) is among the 9,876 glyphs in the OTFs that are referenced by that CSS file. 䮎 (Extension A U+4B8E; CID+9231) and 𦉘 (Extension B U+26258; CID+60806), on the other hand, are not. The glyphs for all three characters are in the official region-specific subset OTFs for Traditional Chinese. I recommend that you ask Google here.
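The check described above (which requested characters a further-subsetted web font actually maps, and which of its mapped code points lie outside the BMP) can be sketched as follows. The function works on a plain dict of code point to glyph name; fontTools' `TTFont(path).getBestCmap()` returns a dict of that shape, but loading a real font is left out so the sketch stays self-contained. The sample cmap is made up: only the code points echo the comments above, and the glyph names other than CID+32230 are placeholders.

```python
def coverage_report(cmap: dict[int, str], text: str) -> dict:
    """Compare the characters in `text` against a font's cmap."""
    wanted = sorted({ord(ch) for ch in text})
    missing = [cp for cp in wanted if cp not in cmap]
    non_bmp = sorted(cp for cp in cmap if cp > 0xFFFF)
    return {
        "missing": [f"U+{cp:04X} {chr(cp)}" for cp in missing],
        "non_bmp_supported": [f"U+{cp:04X}" for cp in non_bmp],
    }


# Made-up cmap resembling the subsetted OTF: 罉 is mapped, 䮎/𠻹/𦉘 are not.
sample_cmap = {0x7F49: "cid32230", 0x210C1: "g1", 0x24A12: "g2", 0x25683: "g3"}
print(coverage_report(sample_cmap, "罉䮎𠻹𦉘"))
```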

kenlunde · 2016-11-12T13:55:08Z

@acuteaccent: These have been on my Version 2.000 list for some time, and as Frank mentioned, that list specifies that U+1F10B and U+1F10C will be handled as double mappings. Also, U+312E, U+312F, U+9FD1 through U+9FEA, and U+1F12F are on the same list.

jungshik · 2017-02-07T21:50:54Z

from: notofonts/noto-cjk#80

I compared the character repertoire of Noto Sans CJK 1.004 against the list of characters allowed for South Korean family registry and found that 47 characters are missing.

The list is attached: kr_names_missing_in_noto_sans.txt

The first column is the Korean reading in Hangul, the second is the Unicode code point, and the third is the character.

kenlunde · 2017-02-07T22:10:01Z

@jungshik: Thank you. I count 48 characters in your list, not 47, but U+23343 𣍃 appears twice, making it actually 47.

jungshik · 2017-02-07T23:36:00Z

@kenlunde Yes, that's why I said there are 47 characters :-) (I should have deleted the 2nd line with U+23343 before uploading).
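The 48-versus-47 discrepancy comes from a duplicated row. A small sketch for reading a three-column list like `kr_names_missing_in_noto_sans.txt` and counting distinct code points is shown below. The exact file layout (whitespace-separated columns, code points written as `U+XXXX`) is an assumption, and the sample rows are illustrative only, not the contents of the real file.

```python
def distinct_code_points(lines: list[str]) -> list[int]:
    """Return unique code points from 'reading U+XXXX char' rows, in first-seen order."""
    seen: dict[int, None] = {}  # dicts preserve insertion order
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].upper().startswith("U+"):
            continue  # skip blank or malformed rows
        seen.setdefault(int(fields[1][2:], 16), None)
    return list(seen)


rows = ["가 U+4F73 佳", "한 U+6F22 漢", "한 U+6F22 漢"]  # illustrative rows with one duplicate
print(len(rows), len(distinct_code_points(rows)))  # 3 2
```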

acuteaccent · 2017-03-24T06:15:14Z

@kenlunde @jungshik Well, in fact, there are indeed 48 missing, as there is unencoded ⿰氵恩 (은). If Source Han Sans is targeting all the South Korean personal name hanja, one glyph needs to be reserved for ⿰氵恩.

Also, I think the suggestion in notofonts/noto-cjk#80 (comment) is a very good idea, as no one actually uses or needs halfwidth Hangul jamo. I wonder why they were encoded in Unicode in the first place.

acuteaccent · 2017-03-24T06:43:13Z

(This is in regard to #115 (comment))

Oh, the suggestion about U+02EA and U+02EB was already made before (notofonts/noto-cjk#56). As I usually don't check the Noto Sans CJK side, I was not aware of it until now.
FYI, I learned about those two characters from here: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf#page=27

acuteaccent · 2017-04-04T10:26:27Z

BTW, if you are running out of glyphs, you can get rid of Œ, œ, and ƒ, as they are not used in CJKV languages (including common romanization systems).
(If Œ and œ are included to cover French, then Ÿ also needs to be included.)

justinrleung · 2017-04-04T19:07:21Z

œ might be used in IPA and its derivative romanizations, like S. L. Wong (phonetic symbols). It might be useful to keep it for people who need to use IPA (e.g. when dealing with a Chinese dialect that does not have a romanization system).

acuteaccent · 2017-04-04T22:39:19Z

Well, I don't think the IPA is the reason for the inclusion of œ though. Source Han Sans does not cover most letters used in the IPA and its derivative romanizations (ɐ, ɛ, ɔ, ŋ, etc.) anyway.

acuteaccent · 2017-04-04T22:41:14Z

U+2780 ➀ to U+2789 ➉ and U+278A ➊ to U+2793 ➓ can be covered by using the glyphs at U+2460 ① to U+2469 ⑩ and the ones at U+2776 ❶ to U+277F ❿ respectively, as Source Han Sans is a sans-serif font. As this can simply be done by inserting additional code point mappings to existing glyphs, no new glyphs are needed.
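This proposal, like the double mappings mentioned for U+1F10B and U+1F10C, amounts to adding extra 'cmap' entries that point at glyphs already in the font. A minimal sketch, assuming the cmap is represented as a dict of code point to glyph name (the shape used by fontTools cmap subtables); the glyph names and the helper are hypothetical, not the real Source Han Sans names.

```python
def add_double_mappings(cmap: dict[int, str], src_start: int, dst_start: int, count: int) -> None:
    """Map code points src_start.. to the glyphs already used by dst_start..

    No new glyphs are created; each new code point reuses an existing glyph name.
    """
    for i in range(count):
        target = dst_start + i
        if target not in cmap:
            raise KeyError(f"U+{target:04X} has no glyph to reuse")
        cmap.setdefault(src_start + i, cmap[target])  # keep any existing mapping


# Toy cmap with placeholder glyph names for ①-⑩ and ❶-❿.
cmap = {0x2460 + i: f"circled{i + 1}" for i in range(10)}
cmap.update({0x2776 + i: f"negcircled{i + 1}" for i in range(10)})
add_double_mappings(cmap, 0x2780, 0x2460, 10)  # ➀-➉ reuse ①-⑩
add_double_mappings(cmap, 0x278A, 0x2776, 10)  # ➊-➓ reuse ❶-❿
print(cmap[0x2780], cmap[0x2793])  # circled1 negcircled10
```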

jimmymasaru · 2017-04-04T22:42:30Z

Well, œ and other Latin letters are probably included in Adobe-Japan1-6, which is why they are included in SHS.

acuteaccent · 2017-04-06T12:00:09Z

Another suggestion if you are running out of glyphs:
You can remove the glyphs for Cyrillic letters, as Cyrillic letters are not used in CJK texts.
(Greek letters are needed because they are used in mathematics and science. But Cyrillic letters are not used anywhere in CJK texts.)

jungshik · 2017-04-07T17:35:50Z

@acuteaccent There are a lot of characters I personally want to drop (not just Cyrillic but also an incomplete set of box drawing, various symbol characters, Latin outside ASCII, Korean Half-width Jamo, etc) to make room for more/better CJK coverage.

jungshik · 2017-04-07T17:40:21Z

@acuteaccent wrote:

Also, I think notofonts/noto-cjk#80 (comment) this is a very good idea, as no one actually uses/needs halfwidth hangul jamo. To begin with, I wonder why they are encoded in Unicode.

My guess is that they're encoded in Unicode because they were encoded in a pre-KS C 5601-1987 (pre-KS X 1001) standard. Nonetheless, they're completely useless, and nobody would notice if they were gone. If we want to keep the character coverage, we can just map them to the corresponding glyphs for U+313x (Hangul Compatibility Jamo).
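The suggested remapping can be derived from the Unicode Character Database: each Halfwidth Hangul letter (U+FFA0 to U+FFDC) has a `<narrow>` compatibility decomposition to a Hangul Compatibility Jamo code point, which identifies the existing glyph to reuse. The sketch below reads that via Python's `unicodedata.decomposition`; the function name is hypothetical.

```python
import unicodedata


def halfwidth_jamo_targets() -> dict[int, int]:
    """Map each Halfwidth Hangul letter to its Hangul Compatibility Jamo.

    Uses the single-level '<narrow>' decomposition from the Unicode Character
    Database. NFKC is deliberately avoided: it would decompose further, down
    to conjoining jamo (U+11xx).
    """
    targets = {}
    for cp in range(0xFFA0, 0xFFDD):
        decomp = unicodedata.decomposition(chr(cp))  # '' for unassigned code points
        if decomp.startswith("<narrow> "):
            targets[cp] = int(decomp.split()[1], 16)
    return targets


t = halfwidth_jamo_targets()
print(len(t), hex(t[0xFFA1]))  # 52 0x3131
```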

@acuteaccent also wrote:

Well, in fact, there are indeed 48 missing, as there is unencoded ⿰氵恩 (은). If Source Han Sans is targeting all the South Korean personal name hanja, one glyph needs to be reserved for ⿰氵恩.

You're right, @acuteaccent

jungshik · 2017-04-07T17:43:54Z

As for the South Korean Hanja list for names, see also http://www.unicode.org/L2/L2017/17084-korean-name-var.pdf (Jaemin Chung's proposal).

acuteaccent · 2017-04-12T10:43:38Z

For Latin letters, I think covering ISO/IEC 8859-1 (or Windows-1252) and the characters used in Hanyu Pinyin and McCune-Reischauer (as McCune-Reischauer is used by libraries around the world) is good enough.
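The proposed Latin repertoire can be reasoned about as set arithmetic. The sketch below builds the printable Latin-1 set, adds the lowercase tone-marked Pinyin vowels and ü, plus the breve letters and apostrophe used in McCune-Reischauer, and lists what falls outside Latin-1. The Pinyin and McCune-Reischauer sets are a minimal assumption for illustration, not an exhaustive specification (Pinyin capitals, for example, are omitted).

```python
# Latin-1 (ISO/IEC 8859-1) printable characters: U+0020-007E and U+00A0-00FF.
LATIN1 = {chr(c) for c in [*range(0x20, 0x7F), *range(0xA0, 0x100)]}

# Lowercase Hanyu Pinyin tone-marked vowels and ü (minimal assumed set).
PINYIN = set("āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü")

# McCune-Reischauer: breves on o and u, plus the apostrophe for aspiration.
MCCUNE_REISCHAUER = set("ŏŭŎŬ'")

needed = LATIN1 | PINYIN | MCCUNE_REISCHAUER
extra = sorted(needed - LATIN1)  # characters beyond Latin-1
print(len(extra), "".join(extra))
```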
Source Han Sans and Serif don't need to cover Vietnamese Latin letters, as modern Vietnamese writing has little to do with CJK characters. Vietnamese can (and should) be covered by Latin fonts, not by CJK fonts.

I agree with removing glyphs for box drawing characters (retracted). I also agree with removing glyphs for halfwidth Hangul jamo (mapping them to the Hangul Compatibility Jamo glyphs instead).

rschiang · 2017-04-12T17:48:10Z

Box drawing characters are still widely used in Traditional Chinese context, primarily on BBS (e.g. PTT) and plain-text documents with tabular content; removing these characters would break these preformatted tables, rendering them unreadable.

If glyph count really matters, I would suggest completing the character set only in the TC / half-width variants, or moving these glyphs into a separate fallback font compatible with SHS metrics.

acuteaccent · 2017-04-12T18:34:20Z

Oh, okay. Then I take back what I said about box drawing characters.

acuteaccent · 2017-04-12T19:02:51Z

Come to think of it, since box drawing characters are in the two-byte range of GB 18030, the glyphs for them will not be removed (as Ken wants to completely cover the mandatory portion of GB 18030).
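Whether a given box-drawing character falls in GB 18030's two-byte range can be spot-checked with Python's built-in `gb18030` codec, which encodes each character as 1, 2, or 4 bytes. The sketch reports counts for the whole Box Drawing block (U+2500 to U+257F) rather than assuming that every character in it is two-byte; the helper name is hypothetical.

```python
def gb18030_width(ch: str) -> int:
    """Number of bytes `ch` occupies in GB 18030 (1, 2, or 4)."""
    return len(ch.encode("gb18030"))


box_drawing = [chr(cp) for cp in range(0x2500, 0x2580)]  # Box Drawing block
two_byte = [ch for ch in box_drawing if gb18030_width(ch) == 2]
four_byte = [ch for ch in box_drawing if gb18030_width(ch) == 4]
print(f"two-byte: {len(two_byte)}, four-byte: {len(four_byte)}")
print("four-byte:", "".join(four_byte))
```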

kenlunde · 2017-04-12T19:15:02Z

@acuteaccent wrote:

as Ken wants to completely cover the mandatory portion of GB 18030

Precisely.

Also note that glyphs will not be removed on a whim. Everything discussed in this issue will be considered, but there are several factors that will play into the actual decisions.

kenlunde · 2017-05-26T19:53:50Z

Consolidated with Issue #180.

kenlunde added the enhancement label Jun 26, 2015

kenlunde self-assigned this Jun 26, 2015

kenlunde changed the title ~~Discussions regarding necessity of extending TW glyphs.~~ Consolidation of Additional Glyphs & Characters, Mainly for TW Jul 20, 2015

kenlunde mentioned this issue Jul 20, 2015

Please add U+2C386 (𬎆, ⿰王莹) #118

Closed

kenlunde changed the title ~~Consolidation of Additional Glyphs & Characters, Mainly for TW~~ Consolidation of Additional Glyph & Character Suggestions, Mainly for TW Jul 20, 2015

kenlunde mentioned this issue Aug 2, 2015

Five characters (U+9FD1-U+9FD5，鿑鿒鿓鿔鿕) in Unicode 8.0 should be added. #121

Closed

kenlunde mentioned this issue Nov 8, 2015

Add the character 𧒽(U+274BD) #133

Closed

kenlunde changed the title ~~Consolidation of Additional Glyph & Character Suggestions, Mainly for TW~~ Consolidation of Additional Glyph & Character Suggestions Jan 13, 2016

This was referenced Jan 13, 2016

Require missing characters #136

Closed

Add Chữ nôm characters #137

Closed

jungshik mentioned this issue Feb 7, 2017

47 ideographs missing from the South Korean list of Hanja for family registry notofonts/noto-cjk#80

Closed

This was referenced May 2, 2017

Consolidation of Glyph Correction Suggestions (See Issue #178) #99

Closed

Mapping Difference between Source Han Sans and Source Han Serif adobe-fonts/source-han-serif#53

Closed

kenlunde changed the title ~~Consolidation of Additional Glyph & Character Suggestions~~ Consolidation of Additional Glyph & Character Suggestions (TO CLOSE) May 26, 2017

kenlunde added the consolidated label May 26, 2017

kenlunde changed the title ~~Consolidation of Additional Glyph & Character Suggestions (TO CLOSE)~~ Consolidation of Additional Glyph & Character Suggestions (See Issue #180) May 26, 2017

kenlunde closed this as completed May 26, 2017

kenlunde mentioned this issue May 27, 2017

Consolidation of Character/Glyph Addition Suggestions #180

Closed

adobe-fonts locked as resolved and limited conversation to collaborators Nov 20, 2018
