Consolidation of Glyph Correction Suggestions #39

Open
kenlunde opened this issue Apr 11, 2017 · 68 comments

@kenlunde
Contributor

kenlunde commented Apr 11, 2017

This issue is meant for tracking and submitting suggestions for glyph corrections. Issues that were submitted before this consolidation issue was opened are referenced by issue number.

The following changes were made in Version 1.001:

  • Fixed the glyphs uni11ED, uni11ED.tjmo01 through uni11ED.tjmo04 (4), uniD7F5, uniD7F5.tjmo01 through uniD7F5.tjmo04 (4), uniD7F6, and uniD7F6.tjmo01 through uniD7F6.tjmo04 (4) per Issue Glyphs of U+11ED ᇭ, U+D7F5 ퟵ, U+D7F6 ퟶ #6. Also to be fixed are the glyphs uni118C.vjmo01, uni1190.vjmo01, uni1192.vjmo01, and uni1112uni119Euni11D9 per Sandoll's designer.
  • Fixed the CN glyphs for U+5F73 彳 and U+6C11 民, uni5F73-CN and uni6C11-CN, respectively, so that they are centered or more balanced within the em-box per Issue Glyph issues regarding 彳(U+5F73), 民(U+6C11, TW & CN) and 魘 (U+9B58, JP & KR) #11.
  • Fixed the interpolation issue in the JP glyph for U+9B58 魘, uni9B58-JP (Adobe-Japan1-6 CID+7307), per Issue Glyph issues regarding 彳(U+5F73), 民(U+6C11, TW & CN) and 魘 (U+9B58, JP & KR) #11.
  • Fixed the CN glyphs for U+5316 化 and U+82B1 花, uni5316uE0101-JP (Adobe-Japan1-6 CID+13665) and uni82B1uE0101-JP (Adobe-Japan1-6 CID+13666), respectively, per Issue Glyph issue concerning 化 (U+5316) for CN. #14.
  • Fixed the glyphs uni1178, uniD7B5, and uniD7B5.vjmo01 per Issue Nominal form of U+1178 ᅸ, nominal form & vjmo01 glyph of U+D7B5 ힵ #25.
  • Fixed the glyphs uni1140uni1175uni11D9 and uni114Cuni116Funi11D9 per Issue Consolidation of Glyph Correction Suggestions (See Issue #39) #27. Also to be adjusted are the glyphs for U+C625 옥 and U+C73D 윽, uniC625 and uniC73D, respectively, per Sandoll's designer.
  • Fixed the CN glyph for U+4FB9 侹, uni4FB9-CN, so that the second stroke of the top-right component is the shortest per Issue Consolidation of Glyph Correction Suggestions (See Issue #39) #27.
  • Fixed the TW glyph for U+5FB5 徵, uni5FB5-TW, so that the 9th stroke is shorter than the 8th and 11th ones per Issue Consolidation of Glyph Correction Suggestions (See Issue #39) #27.
  • Fixed the CN glyphs for U+4F5B 佛, U+602B 怫, U+62C2 拂, U+6C1F 氟, U+6CB8 沸, U+7829 砩, and U+7ECB 绋, uni4F5B-CN, uni602B-CN, uni62C2-CN, uni6C1F-CN, uni6CB8-CN, uni7829-CN, and uni7ECB-CN, respectively, so that the two common vertical strokes are of uniform weight per Issue Consolidation of Glyph Correction Suggestions (See Issue #39) #27.
  • Fixed the JP glyph for U+3CDA 㳚, uni3CDA-JP, so that the dot touches the curved stroke to its left per Issue Consolidation of Glyph Correction Suggestions (See Issue #39) #27.
  • Fixed the CN glyph for U+2CD9F 𬶟, u2CD9F-CN, so that the center element is a box per Issue Consolidation of Glyph Redesign Suggestions #36.
  • Fixed the TW glyph for U+5A6C 婬, uni5A6C-TW, so that the 9th stroke is shorter than the 8th and 11th ones.
  • Fixed the JP glyph for U+8285 芅, uni8285-JP, to remove its final stroke.
  • Fixed the TW glyph for U+83E1 菡, uni83E1-TW so that the four dots are not touching the vertical stroke.
  • Fixed the CN glyphs for U+3E76 㹶, U+414D 䅍, U+4A60 䩠, U+4BD5 䯕, U+4C53 䱓, U+5A17 娗, U+5EAD 庭, U+5EF7 廷, U+633A 挺, U+6883 梃, U+6D8F 涏, U+70F6 烶, U+73FD 珽, U+7D8E 綎, U+8121 脡, U+8247 艇, U+8713 蜓, U+8A94 誔, U+92CC 鋌, U+94E4 铤, U+95AE 閮, and U+9F2E 鼮, uni3E76-CN, uni414D-CN, uni4A60-CN, uni4BD5-CN, uni4C53-CN, uni5A17-CN, uni5EAD-CN, uni5EF7-CN, uni633A-CN, uni6883-CN, uni6D8F-CN, uni70F6-CN, uni73FD-CN, uni7D8E-CN, uni8121-CN, uni8247-CN, uni8713-CN, uni8A94-CN, uni92CC-CN, uni94E4-CN, uni95AE-CN, and uni9F2E-CN, respectively, so that the center horizontal stroke of the upper-right portion of the common 廷 component is the longest of the three horizontal strokes.
  • Fixed the JP glyph for U+7669 癩, uni7669-JP, by fixing the connection of the 13th and 14th strokes (SemiBold through Heavy).
  • Fixed the TW glyph for U+750B 甋, uni750B-TW, by making its dot touch the stroke to its left.
  • Fixed the TW glyph for U+7D73 絳, uni7D73-TW, by adjusting its ExtraLight master so that the diagonal stroke in the lower-right component does not penetrate the horizontal stroke to which it connects.
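
The glyph names in the list above appear to follow a regular pattern: `uni` plus four hex digits or `u` plus five or six hex digits (repeated for sequences), optionally followed by a dotted suffix such as `.tjmo01` or a region tag such as `-CN`. The following is a minimal sketch for decoding such names when cross-checking a report against code points. The pattern is inferred only from the names quoted in this issue, and `parse_glyph_name` is a hypothetical helper, not part of any Source Han tooling.

```python
import re

# Assumption: pattern inferred from the glyph names quoted in this issue.
_COMPONENT = re.compile(r"uni([0-9A-F]{4})|u([0-9A-F]{5,6})")
_NAME = re.compile(
    r"^((?:uni[0-9A-F]{4}|u[0-9A-F]{5,6})+)"  # one or more code point parts
    r"(?:\.(\w+))?"                            # optional suffix, e.g. tjmo01
    r"(?:-([A-Z]{2}))?$"                       # optional region, e.g. CN
)


def parse_glyph_name(name):
    """Split a glyph name into (code_points, suffix, region)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized glyph name: {name!r}")
    base, suffix, region = match.groups()
    # Each findall() hit fills exactly one of the two capture groups.
    code_points = [int(bmp or sup, 16) for bmp, sup in _COMPONENT.findall(base)]
    return code_points, suffix, region


if __name__ == "__main__":
    for name in ("uni5F73-CN", "u2CD9F-CN", "uni5316uE0101-JP", "uni11ED.tjmo01"):
        cps, suffix, region = parse_glyph_name(name)
        print(name, [f"U+{cp:04X}" for cp in cps], suffix, region)
```

Names that do not fit the assumed pattern raise `ValueError` rather than being guessed at.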

Post Version 1.001 Fixes:

  • Fix the TW glyphs for U+64FB 擻, U+6578 數, U+7C54 籔, and U+85EA 藪, uni64FB-TW, uni6578-TW, uni7C54-TW and uni85EA-TW, respectively, so that the third stroke of the Radical 38 component is horizontal, not diagonal, per Issue Consolidation of Glyph Redesign Suggestions #36.
  • Fix the TW glyph for U+9AD3 髓, uni9AD3-TW, so that the lower-right stroke doesn't extend outside the em-box per Issue Consolidation of Glyph Redesign Suggestions #36.
  • Fix the CN glyphs for 䙶 U+4676, 䜛 U+471B, and 䞅 U+4785, uni4676-CN, uni471B-CN, and uni4785-CN, respectively, per Issue Design difference between Source Han Sans SC and Source Han Serif SC 2 #56.
  • Fix the TW glyph for U+5DD5 巕, uni5DD5-TW, by changing its lower-right component from 子 (Radical 39) to 女 (Radical 38).
  • Fix the CN glyph for U+8A7C 詼, uni8A7C-CN, by fixing the LE connection error in the lower-left component (Heavy master only).
  • Adjust the glyph for U+20DE, uni20DE, by shifting it to the left of the origin (0,0), making it zero-width, and adding it to the 'vert' GPOS (not GSUB) feature. This is the same treatment as the glyph for U+20DD, uni20DD.
  • Adjust the glyphs for U+2015 and U+FE31, uni2015 (Adobe-Japan1-6 CID+661) and uniFE31 (Adobe-Japan1-6 CID+7892), respectively, to match those of Source Han Sans in that they do not touch the edges of the em-box.
  • Fix the LE issue in the lower-right component of the Heavy master of the JP glyph for U+5D1E 崞, uni5D1E-JP, which is outside the scope of Adobe-Japan1-6 and slated for removal in Version 2.000 to make room for HK glyphs.
  • Adjust the glyph for U+4E3F 丿, uni4E3F-JP (Adobe-Japan1-6 CID+4097), by adjusting its stroke so that it better fills the em-box or is centered within the em-box, like it does in Kozuka Mincho and Source Han Sans.
@hfhchan

hfhchan commented Apr 14, 2017

image
U+6DEB: The middle stroke of the bottom left component for TW must be shorter than the bottom stroke, because 𡈼 (to grow) is the semantic component of U+6DEB (in excess), used instead of 壬.

@hfhchan

hfhchan commented Apr 14, 2017

image
U+5A6C is also affected.

@hfhchan

hfhchan commented Apr 16, 2017

image
A new glyph is needed for U+76E4 CN, it cannot share the same glyph with JP.

@hfhchan

hfhchan commented Apr 16, 2017

image
CN glyph is wrong/unnecessary; it should be the same glyph as TW according to code charts:
image

@kenlunde
Contributor Author

@hfhchan: U+76EC 盬 is best handled by swapping the glyphs for uni76EC-CN and uni76EC-TW (in my backend sources), then marking the latter (uni76EC-TW) for removal (see Issue #38).

@kenlunde
Contributor Author

U+76E4 盤 will be handled by adding a new CN glyph, uni76E4-CN (see Issue #40).

@hfhchan

hfhchan commented Apr 16, 2017

image

U+7A07 稇: Second last stroke should be a dot instead of 捺.

@kenlunde
Contributor Author

Add CN glyph for U+7A07 稇, uni7A07-CN (see Issue #40).

@hfhchan

hfhchan commented Apr 20, 2017

image
U+8285, the JP glyph is incorrect.

@kenlunde
Contributor Author

Nice catch. Thanks. Noted. Although this particular glyph is a candidate for removal for Version 2.000, we'll go ahead and fix it for the Version 1.000 dot-release.

@hfhchan

hfhchan commented Apr 21, 2017

For the whole lot of characters containing 廷, Source Han Serif has unified the shapes of the CN and TW in error. Excerpt:
image

According to Tongyong Guifan Hanzi Biao, the second horizontal stroke should be the longest, while for MOE Standard Song, the second horizontal stroke should be the shortest. The length of the second stroke was a point of contention as it was inconsistently printed as the shortest stroke in various standards preceding Tongyong Guifan Hanzi Biao.

Affected characters include at least U+5EF7, U+4FB9, U+5A17, U+5EAD, U+633A, U+6883, U+6D8F, U+70F6, U+73FD, U+7B73, U+7D8E, U+8121, U+8247, U+839B, U+8713, U+8A94, U+92CC, U+94E4, U+95AE, U+9706, U+9832, U+988B, U+9F2E, U+3E76, U+414D, U+4A60, U+4BD5 and U+4C53.

(As a side note, the Code Charts should probably be updated to use the fonts for Tongyong Guifan Hanzi Biao instead.)

@hfhchan

hfhchan commented Apr 21, 2017

image

For U+63EF, the second stroke of the right-most component of the CN, TW and KR glyph should be a slanted 豎, not 撇.

@hfhchan

hfhchan commented Apr 21, 2017

image
The dots for 35144 (U+83E1-TW) should be spread out like the JP glyph (35142)

@kenlunde
Contributor Author

@hfhchan: U+63EF 揯 will be handled by adding a new CN glyph, uni63EF-CN (and its JP glyph, uni63EF-JP, is targeted for removal), and the TW glyph for U+83E1 菡, uni83E1-TW, will be adjusted.

@kenlunde
Contributor Author

@hfhchan: I am going through the CN and TW glyphs for the characters that include the shared 廷 component, and the current CN and TW glyphs for three of the 28 that you referenced above—U+7B73 筳, U+839B 莛, and U+9706 霆—appear to be okay as-is:

three-glyphs

@hfhchan

hfhchan commented Apr 22, 2017

@kenlunde Yes, these three examples are fine. I must have copied the wrong string.

@kenlunde
Contributor Author

With regard to the other 25 characters, I finished my analysis and action plan.

The CN glyphs for the following 22 characters will be adjusted: U+3E76 㹶, U+414D 䅍, U+4A60 䩠, U+4BD5 䯕, U+4C53 䱓, U+5A17 娗, U+5EAD 庭, U+5EF7 廷, U+633A 挺, U+6883 梃, U+6D8F 涏, U+70F6 烶, U+73FD 珽, U+7D8E 綎, U+8121 脡, U+8247 艇, U+8713 蜓, U+8A94 誔, U+92CC 鋌, U+94E4 铤, U+95AE 閮, and U+9F2E 鼮.

New TW glyphs for the following two characters will be added: U+4FB9 侹 and U+9832 頲.

New TW glyphs for the following 10 characters will be added by using the current (Version 1.000) CN glyphs as-is: U+5EAD 庭, U+5EF7 廷, U+633A 挺, U+6883 梃, U+6D8F 涏, U+73FD 珽, U+7D8E 綎, U+8713 蜓, U+92CC 鋌, and U+95AE 閮.

The CN glyph for U+988B 颋, which is specific to CN, looks okay as-is.
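
The counts in this action plan can be cross-checked with plain set arithmetic. The sketch below uses only the code points listed in this thread; the variable names are illustrative and do not correspond to anything in the font sources.

```python
def cps(text):
    """Parse a whitespace-separated list of hex code points into a set."""
    return {int(token, 16) for token in text.split()}


# The 28 characters originally reported as affected.
referenced = cps(
    "5EF7 4FB9 5A17 5EAD 633A 6883 6D8F 70F6 73FD 7B73 7D8E 8121 8247 839B "
    "8713 8A94 92CC 94E4 95AE 9706 9832 988B 9F2E 3E76 414D 4A60 4BD5 4C53"
)
# The three confirmed to be okay as-is.
fine_as_is = cps("7B73 839B 9706")
# The 22 whose CN glyphs will be adjusted.
cn_adjusted = cps(
    "3E76 414D 4A60 4BD5 4C53 5A17 5EAD 5EF7 633A 6883 6D8F 70F6 73FD 7D8E "
    "8121 8247 8713 8A94 92CC 94E4 95AE 9F2E"
)
# The two that get newly designed TW glyphs, and the CN-only one left alone.
new_tw = cps("4FB9 9832")
cn_ok = cps("988B")
# The 10 whose TW glyphs reuse the Version 1.000 CN glyphs.
tw_from_cn = cps("5EAD 5EF7 633A 6883 6D8F 73FD 7D8E 8713 92CC 95AE")

remaining = referenced - fine_as_is
assert len(referenced) == 28 and len(remaining) == 25
# Every remaining character falls into exactly one of the three buckets.
assert remaining == cn_adjusted | new_tw | cn_ok
assert len(cn_adjusted) + len(new_tw) + len(cn_ok) == 25
# The reused TW glyphs all come from characters whose CN glyph changes.
assert tw_from_cn <= cn_adjusted
```

All four assertions hold for the lists as written, so the 22 + 2 + 1 split accounts for every one of the 25 remaining characters.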

@tamcy

tamcy commented Apr 23, 2017

The strokes of 刀 component in 癩 U+7669 of JP/KR version (CID 28287) aren't aligned properly in Heavy weight.

u 7669_28297

@kenlunde
Contributor Author

@tamcy: Thank you for finding and reporting this. To clarify, this is an Adobe-designed JP glyph that corresponds to Adobe-Japan1-6 CID+5777, and the affected weights are SemiBold through Heavy.

@hfhchan

hfhchan commented Apr 24, 2017

image
TW glyph of U+750B: the second last dot should be completely joined to the side of 瓦 for "correctness" and consistency, compare with the other circled characters.

@GamenRoll

GamenRoll commented Dec 6, 2017

微软雅黑 = Microsoft YaHei
Microsoft YaHei UI is the UI-optimized version

Wow, yes! I tried that on 【Noto Serif CJK JP/KR】, and they are perfect!
But 【Noto Serif CJK SC/TC】 has the same issues. Can we fix this?

I noticed that 【Microsoft YaHei UI】 has already fixed these issues from 【微软雅黑】,
and even 【微软雅黑 Light】 fixes these issues from 【微软雅黑】.

The best default for the single and double smart/curly quotes, at least for Chinese, is the full-width glyphs.

In Chinese, double quotes are used 99% of the time, and single quotes only 1%.
Usually, double quotes start and end with spaces, so there is no problem when they show up in English.
But single quotes are totally different, because they account for only 1% of usage in Chinese, and they have to appear inside double quotes.

Even when we start a paper document, an A4 document, in the end we use a Latin font (like Times New Roman) to "flash" the whole text so that the punctuation marks sit tight against the 汉字 (Chinese characters) and match the style of the numbers.
That's the standard operating procedure for every editor in China.
If you don't do this, that means you are not a pro.
And that's common sense in the market today.

So, what I want to say is: don't worry about the single quotes, or even the punctuation marks as a whole, because we don't use them at all (thanks to Microsoft) on paper.

But we have to read them on screen, and that we can't edit.

@Explorer09

@opumps
I would argue that your usage percentages are unsourced; besides, they are not a sufficient argument for changing the default. Otherwise we might break the formatting in more contexts than we fix.
I would suggest using language tagging as a workaround solution.

@Explorer09

Explorer09 commented Dec 6, 2017

@kenlunde
Sorry to bother you again, but I wonder if it's possible to work around the halfwidth/fullwidth quotes problem by utilizing kerning or ligature features in the font files? That is, have the curly quotes fullwidth by default in the CN version, but kern them to proportional if the quote precedes or follows a halfwidth Latin character (or halfwidth punctuation)? Do you know what I mean?
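
To make the proposed rule concrete, here is a rough simulation of it in Python. It is only a sketch of the decision logic, not of how a font would implement it (that would need contextual OpenType lookups), and it rests on assumptions not stated in the thread: "halfwidth" is approximated by the Unicode East Asian Width classes `Na` and `H`, and a neighboring space is ignored. The names `quote_widths` and `is_halfwidth_neighbor` are hypothetical.

```python
import unicodedata

# Single and double curly quotes: U+2018, U+2019, U+201C, U+201D.
CURLY_QUOTES = "\u2018\u2019\u201C\u201D"


def is_halfwidth_neighbor(ch):
    """Assumption: narrow/halfwidth classes count, but spaces do not."""
    return ch != " " and unicodedata.east_asian_width(ch) in ("Na", "H")


def quote_widths(text):
    """Return (index, 'proportional' | 'fullwidth') for each curly quote."""
    result = []
    for i, ch in enumerate(text):
        if ch not in CURLY_QUOTES:
            continue
        neighbors = []
        if i > 0:
            neighbors.append(text[i - 1])
        if i + 1 < len(text):
            neighbors.append(text[i + 1])
        # Fullwidth by default; proportional next to halfwidth characters.
        proportional = any(is_halfwidth_neighbor(c) for c in neighbors)
        result.append((i, "proportional" if proportional else "fullwidth"))
    return result


if __name__ == "__main__":
    print(quote_widths("他说\u201c你好\u201d。"))
    print(quote_widths("He said \u201chello\u201d."))
```

With these assumptions, quotes surrounded by ideographs and fullwidth punctuation stay fullwidth, while quotes that touch Latin letters or ASCII punctuation become proportional.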

@kenlunde
Contributor Author

kenlunde commented Dec 6, 2017

While I prefer to stop the discussion about the smart/curly quotes in this issue, please see L2/17-056, which is my preliminary proposal to use SVSes to handle these and similar characters. If you have any meaningful or constructive feedback, please send it to me offline (aka via email).

@tamcy

tamcy commented Dec 28, 2017

There is a bug in the heaviest master of the JP/KR glyph of U+5D1E(崞). Intermediate weights are affected.
5d1e

@kenlunde
Contributor Author

Noted, and thank you. I also noted that this particular glyph, uni5D1E-JP, is slated for removal in Version 2.000 in order to make room for HK glyphs.

@Man-Ting-Fang

@jimmymasaru

A new CN glyph for · (U+00B7) is needed since a full-width form is usually preferred in this region and is a de facto standard.

GB/T 15834―2011 标点符号用法 (General Rules for Punctuation)

In Simplified Chinese, U+00B7 usually has a full-width form in practice; however, in theory it should be half-width according to GB/T 15834-2011, see the following figure. (Ironically, U+00B7 in GB/T 15834-2011 itself is NOT half-width.) Fortunately, GB/T 15834-2011 is NOT a compulsory standard ("GB/T" means that this is a recommendatory standard, T = 推荐 (Tuījiàn)).

00

@kenlunde
Contributor Author

kenlunde commented Jan 5, 2018

@Man-Ting-Fang: I have been thinking about U+00B7 recently, in my attempt to revise L2/17-056 by splitting it into two separate (and revised) proposals, and I think that the best way to address this issue for both the Simplified Chinese and Traditional Chinese fonts is to map U+00B7 to the glyph for U+30FB ・, uni30FB, which functions in a similar way, and I see no reason why the glyphs cannot be shared for the Source Han families.
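
As an illustration of what sharing the glyph would mean at the character-mapping level, here is a small sketch. The `cmap` dictionary and the `share_glyph` helper are hypothetical stand-ins for a font's character-to-glyph table, and the glyph name `uni00B7` is assumed for the example. The East Asian Width lookup uses Python's standard `unicodedata` module and shows why U+00B7 is troublesome: it is classed as ambiguous width, while U+30FB is wide.

```python
import unicodedata

# Hypothetical, simplified character-to-glyph map for a Chinese font.
cmap = {0x00B7: "uni00B7", 0x30FB: "uni30FB"}


def share_glyph(cmap, source_cp, target_cp):
    """Return a copy of cmap in which source_cp uses target_cp's glyph."""
    updated = dict(cmap)
    updated[source_cp] = cmap[target_cp]
    return updated


chinese_cmap = share_glyph(cmap, 0x00B7, 0x30FB)

# East Asian Width property values: "A" = ambiguous, "W" = wide.
widths = {
    f"U+{cp:04X}": unicodedata.east_asian_width(chr(cp))
    for cp in (0x00B7, 0x30FB)
}

if __name__ == "__main__":
    print(chinese_cmap)
    print(widths)
```

The original map is left untouched, so a Japanese or Korean font could keep its own glyph for U+00B7.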

@tamcy

tamcy commented Jan 11, 2018

The JP/KR glyph of 魢(U+9B62, ⿰魚己) is ⿰魚巳, which is the exact composition of U+29D57 𩵗. U+29D57 isn't covered by SHS, while U+9B62 doesn't have a JP/KR source in the code chart. Not sure if this is an issue, I'm reporting it here anyway.

@kenlunde
Contributor Author

@tamcy: The good news is that the JP glyph for U+9B62 魢, uni9B62-JP, is slated for removal as part of the Version 2.000 update in order to make room for a large number of incoming HK glyphs, which means that this is likely to become a non-issue.

@extc

extc commented Jul 6, 2018

I'd like to comment about the uni9FE6-TW. The version 1.001 is like this:
image

[英微] It should be 一ル instead of 一几 under the Taiwan/Hong Kong writing convention. The last stroke of 英 should be ㇏(捺) instead of ㇔(長點).
I refer to IRGN1954R.pdf (page 16) from the IRG42 website. Also, from the source document cited in that PDF file, I found the original image in the "New Testament in Chinese. (1864). (Archimandrite Gurias, Trans.) Beijing: Russian Mission." Retrieved from http://archive.wul.waseda.ac.jp/kosho/bunko08/bunko08_d0417/

[英微] should be like this:
image

Please refer to the design of 兀 in the 微 of the above image. Thank you.

@CNMan

CNMan commented Sep 24, 2018

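GB/T 22321.1-2018 信息技术 中文编码字符集 汉字48点阵字型 第1部分:宋体 (Information technology, Chinese coded character set, 48-dot matrix font of Chinese ideograms, Part 1: Songti)
Issued 2018-06-07; effective 2019-01-01.

This is probably the first glyph standard in mainland China to include CJK Extension C/D/E characters.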

@kenlunde
Contributor Author

@CNMan I suspect that the Extension C, D, and E additions are to support 通用规范汉字表, which includes 44 Extension C characters, eight Extension D characters, and 108 Extension E characters. The Source Han typefaces already support 通用规范汉字表 in its entirety.

@hfhchan

hfhchan commented Sep 25, 2018

I recommend against relying on that standard, because it was published before the new GB18030 checking efforts were finalized, and so it does not incorporate proper normalization of the C/D/E glyphs, nor corrections to existing URO/AB glyphs.

@tamcy

tamcy commented Oct 16, 2018

卩 (U+5369) should be center-aligned (the glyph leans to the right; the same goes for Source Han Sans 1.x).

@kenlunde
Contributor Author

kenlunde commented Oct 17, 2018

@tamcy Don't hold your breath for this one, particularly for Source Han Sans Version 2.000. Fixing it in Source Han Serif is now on the radar.

@kenlunde
Contributor Author

@tamcy To clarify, the glyph for U+5369 卩 should basically be identical to that of U+536A 卪, but without the center dot, right?

@tamcy

tamcy commented Oct 17, 2018

Yes, this is what I meant, at least for HK. But after some digging, I'm not so sure about the CN form. Here is some more information for your reference:

卩 U+5369 is a CJK Unified Ideograph. Source Han Sans/Serif supports it as an HK and probably a CN character (not 100% sure about the latter). Here's what the Unicode chart shows:

unicode

And here's what the HKSCS document shows:

hkscs

So I am confident that it should be center-aligned, or that its form should look like 卪 without the dot, as you said. This makes sense because 卪 and 卩 are essentially the ancient forms of 節, so there's no reason for 卩 as a word not to be center-aligned. Also, "⼙" as a radical is encoded separately (U+2F19), but it also appears center-aligned in the chart.

But for the CN glyph, I did find the "right-aligned" form on the Unihan website and some other websites:

unihan_zdic

I suppose the latest Unicode chart is more authoritative?

@CNMan

CNMan commented Oct 17, 2018

@kenlunde
Contributor Author

@CNMan I don't trust any one standard, and given that most standards, including older GB ones (a good example is GB/T 12345-1990), center the representative glyph for U+5369 卩, doing so makes the most sense. And, given that this glyph serves double-duty as a Kangxi Radical (U+2F19 ⼙), it makes sense for it to be centered in the em-box.

@kenlunde
Contributor Author

@extc With regard to the Traditional Chinese (TW and HK) glyph for U+9FE6 鿦, uni9FE6-HK, it will be good in Source Han Sans Version 2.000. I made a note to fix the Source Han Serif glyph in the next update, which is likely to be Version 2.000.

@lapomme

lapomme commented Nov 1, 2018

I think the JP/KR glyph for 这 would look better, and more consistent with other characters if it utilized a 長點 instead of a 捺 for the last stroke of 文. (Rough mockup at the bottom)

uni8fd9

@kenlunde
Contributor Author

kenlunde commented Nov 1, 2018

@lapomme This may or may not be good news, but this issue is likely to become a no-op, because the JP glyph for U+8FD9 这, uni8FD9-JP, is flagged for removal in Version 2.000, in order to make room for a large number of new HK glyphs.

@tamcy

tamcy commented May 8, 2019

uni5929-CN (天): I suggest tuning the Heavy master so that the third stroke doesn't protrude past the top of the first horizontal stroke.
uni5929-CN

@stone-zeng

uni3514 (㔔): the points on the "circle" are not smooth:

@KrasnayaPloshchad

@tamcy @stone-zeng I'm shocked by these problems too; they look just like how Rolls-Royce produced the oil pipe that caused the engine explosion on Qantas Flight 32.
