Atom doesn't soft wrap CJK(Chinese/Japanese/Korean) correctly #1783

Closed
izuzak opened this Issue Mar 22, 2014 · 59 comments

Member

izuzak commented Mar 22, 2014

Reported in Halp:

  • support/b47caffcaba511e3903bfcd96bef4da2

If I have a paragraph of CJK characters and toggle soft wrap, the wrapping point (which should be the right edge of the current window) is far beyond the actual edge, so some characters are displayed outside the window and are not visible.
I think this may be caused by incorrect handling of character width. A Latin letter has a width of "one character width", but a CJK character is usually twice as wide as a Latin letter, i.e. "two character widths". I hope this information helps.
The automatically attached debug info should include the test text I have open now. It includes (from top to bottom): Chinese, Japanese, Korean, English. Actually they all say the same thing :)

I can reproduce this in Atom 0.75.0. (notice how CJK characters are not wrapped correctly at the right edge of the app)

screen shot 2014-03-22 at 7 24 05 pm

Member

izuzak commented Mar 31, 2014

Another report: support/f362892cb61211e39df45686cfed1b84

Member

izuzak commented Apr 22, 2014

More: support/ce87b354c9fc11e384d02a11a8594098

leafduo commented May 12, 2014

I have encountered the same issue, but I'm not familiar with CoffeeScript/JavaScript. Is there any guidance on how to solve it or which file to read?

littlebat commented Jun 25, 2014

I found the same issue in the current git version of Atom (version 0.106.0-781a51a), compiled from source on Debian Jessie amd64. I once reported a similar issue for the Dillo2 browser, and they solved it.

The noticeable difference between Chinese/Japanese and Latin text is that there are no spaces between Chinese or Japanese characters, whereas there are spaces between Latin words. So treating each Chinese or Japanese character as a word may solve this kind of soft-wrapping issue.

However, there do seem to be spaces in Korean text, yet Atom still can't soft wrap Korean characters properly. I don't know why this happens.

Dillo2 browser changeset: treat ideographic characters (Chinese/Japanese) as words http://hg.dillo.org/dillo/rev/5d6869b28e4d

dillo
changeset 1255:5d6869b28e4d

treat ideographic characters (Chinese/Japanese) as words
author corvid corvid@lavabit.com
date Sun Aug 02 03:59:14 2009 +0000 (2009-08-02)
parents 68190badd2bf
children 6a1e98ad782e
files src/html.cc src/utf8.cc src/utf8.hh
line diff

--- a/src/html.cc  Sun Aug 02 03:31:55 2009 +0000
+++ b/src/html.cc  Sun Aug 02 03:59:14 2009 +0000
@@ -1189,17 +1189,30 @@
          }
       }
       for (start = i = 0; Pword[i]; start = i) {
+         int len;
+
          if (isspace(Pword[i])) {
             while (Pword[++i] && isspace(Pword[i])) ;
             Html_process_space(html, Pword + start, i - start);
-         } else {
-            while (Pword[++i] && !isspace(Pword[i])) ;
+         } else if (a_Utf8_ideographic(Pword+i, Pword_end, &len)) {
+            i += len;
             ch = Pword[i];
             Pword[i] = '\0';
             HT2TB(html)->addText(Pword + start,
                                  html->styleEngine->wordStyle ());
             Pword[i] = ch;
             html->PrevWasSPC = false;
+         } else {
+            do {
+               i += len;
+            } while (Pword[i] && !isspace(Pword[i]) &&
+                     (!a_Utf8_ideographic(Pword+i, Pword_end, &len)));
+            ch = Pword[i];
+            Pword[i] = 0;
+            HT2TB(html)->addText(Pword + start,
+                                 html->styleEngine->wordStyle ());
+            Pword[i] = ch;
+            html->PrevWasSPC = false;
          }
       }
       if (word != Pword)

--- a/src/utf8.cc  Sun Aug 02 03:31:55 2009 +0000
+++ b/src/utf8.cc  Sun Aug 02 03:59:14 2009 +0000
@@ -11,6 +11,7 @@
 
 #include <fltk/utf.h>
 
+#include "../dlib/dlib.h"    /* TRUE/FALSE */
 #include "utf8.hh"
 
 // C++ functions with C linkage ----------------------------------------------
@@ -64,3 +65,30 @@
 {
    return utf8test(src, srclen);
 }
+
+/*
+ * Does s point to a UTF-8-encoded ideographic character?
+ *
+ * This is based on http://unicode.org/reports/tr14/#ID plus some guesses
+ * for what might make the most sense for Dillo. Surprisingly, they include
+ * Hangul Compatibility Jamo, but they're the experts, so I'll follow along.
+ */
+bool_t a_Utf8_ideographic(const char *s, const char *end, int *len)
+{
+   bool_t ret = FALSE;
+
+   if ((uchar_t)*s >= 0xe2) {
+      /* Unicode char >= U+2000. */
+      unsigned unicode = a_Utf8_decode(s, end, len);
+
+      if (unicode >= 0x2e80 &&
+           ((unicode <= 0xa4cf) ||
+            (unicode >= 0xf900 && unicode <= 0xfaff) ||
+            (unicode >= 0xff00 && unicode <= 0xff9f))) {
+         ret = TRUE;
+     }
+   } else {
+      *len = 1 + (int)a_Utf8_end_of_char(s, 0);
+   }
+   return ret;
+}

--- a/src/utf8.hh  Sun Aug 02 03:31:55 2009 +0000
+++ b/src/utf8.hh  Sun Aug 02 03:59:14 2009 +0000
@@ -19,6 +19,7 @@
 uint_t a_Utf8_decode(const char*, const char* end, int* len);
 int a_Utf8_encode(unsigned int ucs, char *buf);
 int a_Utf8_test(const char* src, unsigned int srclen);
+bool_t a_Utf8_ideographic(const char *s, const char *end, int *len);
 
 #ifdef __cplusplus
 }
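
For readers who don't read C++, the idea in the Dillo changeset can be sketched in JavaScript roughly as follows. This is only an illustration, not Atom's or Dillo's actual code: the names `isIdeographic` and `splitWords` are made up here, and the code point ranges are simply copied from `a_Utf8_ideographic` in the diff.

```javascript
// Same code point ranges as a_Utf8_ideographic in the Dillo changeset.
function isIdeographic(codePoint) {
  return codePoint >= 0x2e80 &&
    (codePoint <= 0xa4cf ||
     (codePoint >= 0xf900 && codePoint <= 0xfaff) ||
     (codePoint >= 0xff00 && codePoint <= 0xff9f));
}

// Split text into wrap candidates ("words"): whitespace is dropped,
// each ideographic character becomes its own word, and any other run
// of non-space characters stays together.
function splitWords(text) {
  const words = [];
  let current = "";
  for (const ch of text) { // a string iterates by code point
    if (/\s/.test(ch)) {
      if (current) words.push(current);
      current = "";
    } else if (isIdeographic(ch.codePointAt(0))) {
      if (current) words.push(current);
      words.push(ch);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current) words.push(current);
  return words;
}
```

With these ranges, Hangul syllables are not treated as ideographic, so Korean text would only be split at spaces.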

littlebat commented Jun 25, 2014

However, there do seem to be spaces in Korean text, yet Atom still can't soft wrap Korean characters properly. I don't know why this happens.

littlebat commented Jun 25, 2014

However, I found that the Iceweasel browser and the Gedit, Kate and Leafpad editors soft wrap at every single Korean character, whereas the Dillo3 browser soft wraps Korean text at spaces.

Below is the test text produced by Google Translate (translated from English into CJK). It includes (from top to bottom): Chinese, Japanese, Korean, English. (Of course, you need CJK fonts installed to view it.)

我们的论坛是一个地方,数千名学生,业余爱好者和来自世界各地的共享知识与理念的专业人士。您正在浏览我们的板作为客人,让你有限的访问,查看大多数的讨论和访问我们的其他功能。加入我们的免费社区你将有机会发表主题,与其他成员(下午)私下沟通,回应民意调查,上传内容和访问其他许多特殊功能。注册是快速,简单,并且完全免费的,所以请

私たちのフォーラムは、場所です学生、愛好家と世界シェアの知識やアイデア各国からの専門家の数千。現在、ほとんどの議論、回答を見ると、私たちの他の機能にアクセスするためにあなたの限られたアクセスを提供しますゲストとして私たちのボードを表示しています。私達の自由なコミュニティに参加することで、あなたは、トピックを投稿他のメンバー(PM)と個人的に通信し、投票に対応する、プロモーションおよび特別な機能をアップロードするためにアクセスできるようになります。登録は、高速でシンプルで絶対に無料ですので、ご覧ください

우리 포럼은 장소입니다 학생, 취미 및 세계 지식 공유 및 아이디어 각국에서 전문가의 수천. 현재는 대부분의 논의를 볼 수 있으며 다른 기능에 액세스 할 수 당신에게 제한된 액세스를 제공하는 게스트로 우리 보드를 볼 수 있습니다. 에 가입함으로써 우리 무료 커뮤니티는 주제를 게시 다른 회원 (PM) 개인적으로 통신, 설문 조사에 응답 콘텐츠 및 액세스 많은 다른 특수 기능을 업로드 할 수 액세스 할 수 있습니다. 등록은 빠르고, 간단하고 절대적으로 무료 이렇게하세요

Our forum is a place where thousands of students, hobbyists and professionals from around the world share knowledge and ideas. You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please

littlebat commented Jun 25, 2014

Below is a screenshot of the test text above viewed in Dillo3, the Iceweasel browser, Leafpad and the Atom editor (from left to right, top to bottom, on Debian Jessie amd64 with the zh_CN.UTF-8 locale):
test_soft_wrap

azu commented Jul 30, 2014

👍 I have encountered the same problem.

zhanzhenzhen commented Aug 18, 2014

Still not fixed. I recommend:

  1. Treat every CJK character (except punctuation characters) as a word.
  2. Recognize CJK punctuation characters (,。“”!、?‘’:《》;…()...). CJK punctuation characters are square-shaped and different from latin punctuation characters.

dice commented Aug 18, 2014

There's this recently released package for handling Japanese encodings. It is still a work in progress, but perhaps it can be a starting point for packages for Chinese and/or Korean: https://github.com/raccy/japanese-wrap

be5invis commented Aug 18, 2014

Korean text does not need a special word-wrapping handler: there ARE spaces between words.
A Japanese word wrapper should also work for Chinese, because the punctuation systems are similar.

zhanzhenzhen commented Aug 18, 2014

My proposal:

Group all CJK punctuation characters into 3 groups:

suffix:

,。”!?、’:;》…)]}』」

prefix:

“《‘([{『「

neutral:

—·

For example, a sentence 我说:“文本编辑器Atom真是太——棒——了!” should be parsed into:

我
说:
“文
本
编
辑
器
Atom
真
是
太
——
棒
——
了!”

Note:

  • The "neutral" punctuation characters often appear in doubled form (——). There shouldn't be a word break between doubled punctuation marks.
  • It should be error-tolerant. For example, although 我说:“。” is not syntactically valid, it should be parsed into
我
说:
“。”
  • Some other CJK characters such as are treated as punctuation characters by some browsers, but not by all browsers. So it would be better that we treat them as normal CJK characters.
  • I think the Latin punctuation marks ()[]{} should also be included, because people sometimes use the thin Latin ones instead of the wide CJK ones. For example, with Latin parentheses, 这是Atom(一款编辑器)。 should be parsed into:
这
是
Atom
(一
款
编
辑
器)。

As for Korean text, I don't understand Korean, but my browser test tells me it also has a word break between each character.
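
The grouping in this proposal could be sketched as below. This is an illustrative sketch under assumptions, not an implementation from Atom: the names `kind` and `splitUnits` are hypothetical, the CJK code point ranges are borrowed from the Dillo changeset quoted earlier in the thread, and the fullwidth `\uFFxx` variants of the punctuation marks are added as a guess about what real text contains.

```javascript
// Punctuation groups from the proposal. The \uFFxx escapes are the
// fullwidth forms of ( [ { and , ! ? : ; ) ] }, added as an assumption.
const PREFIX = new Set("\u201c\u2018《([{『「\uff08\uff3b\uff5b");
const SUFFIX = new Set(
  ",。\u201d!?、\u2019:;》\u2026)]}』」\uff0c\uff01\uff1f\uff1a\uff1b\uff09\uff3d\uff5d");
const NEUTRAL = new Set("\u2014\u00b7"); // em dash and middle dot

// Classify one character. Punctuation is checked before the CJK ranges
// because marks such as 。 and 《 fall inside those ranges.
function kind(ch) {
  if (PREFIX.has(ch)) return "prefix";
  if (SUFFIX.has(ch)) return "suffix";
  if (NEUTRAL.has(ch)) return "neutral";
  if (/\s/.test(ch)) return "space";
  const cp = ch.codePointAt(0);
  if ((cp >= 0x2e80 && cp <= 0xa4cf) || (cp >= 0xf900 && cp <= 0xfaff)) {
    return "cjk";
  }
  return "other";
}

// A unit is: prefix marks, then an optional core, then suffix marks.
// The core is one CJK character, a run of neutral marks, or a run of
// other (e.g. Latin) characters. Whitespace separates units.
function splitUnits(text) {
  const chars = Array.from(text);
  const units = [];
  let i = 0;
  const skip = (k) => {
    while (i < chars.length && kind(chars[i]) === k) i++;
  };
  while (i < chars.length) {
    if (kind(chars[i]) === "space") { i++; continue; }
    const start = i;
    skip("prefix");
    const core = i < chars.length ? kind(chars[i]) : "none";
    if (core === "cjk") i++;
    else if (core === "neutral" || core === "other") skip(core);
    skip("suffix"); // also covers the error-tolerant case with no core
    units.push(chars.slice(start, i).join(""));
  }
  return units;
}
```

A wrapper built on this would only be allowed to break lines between units, which keeps closing punctuation off the start of a line and opening punctuation off the end.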

be5invis commented Aug 18, 2014

The regular expression /[$£¥‘“〈《「『【〔$([{「£¥]*[\u3000-\u9fff][!%,.:;?¢°’”‰′″℃、。々〉》」』】〕ぁぃぅぇぉっゃゅょゎ゛゜ゝゞァィゥェォッャュョヮヵヶゕゖㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇺㇻㇼㇽㇾㇿ・ーヽヾ!%),.:;?]}。」、・ァィゥェォャュョッー゙゚¢]*/ should match "CJK words" correctly.

saschanaz commented Sep 22, 2014

Hi.

Note that some characters including can also be double-spaced. Some fonts even disagree on certain character widths.

I once used the measureText method of the HTML5 canvas context to implement soft wrap in JavaScript, so I wonder whether a similar approach is available in Atom.

zhanzhenzhen commented Sep 22, 2014

In fact, CJK characters are not double-width even in monospace fonts. In Menlo they take approximately 1.6-1.7 spaces. In a terminal, all non-Latin characters take 2 spaces, so they look ugly (usually the character spacing is too big, but it may be too small for wider Unicode characters).

I think the only way for Atom is to "test" the actual width, using something like measureText.
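
A minimal sketch of what "testing the actual width" could look like, assuming a browser-like environment with a 2D canvas. The helper names `canvasMeasurer` and `wrapByMeasuredWidth` are hypothetical and this is not Atom's implementation; the wrapping function takes the measuring callback as a parameter so the logic does not depend on any particular font or on a fixed width ratio.

```javascript
// Browser-only helper: returns a function that measures text in pixels.
// `font` is a CSS font string, e.g. "14px Menlo" (an assumed value).
function canvasMeasurer(font) {
  const ctx = document.createElement("canvas").getContext("2d");
  ctx.font = font;
  return (text) => ctx.measureText(text).width;
}

// Greedy wrap. `units` is any iterable of strings: a plain string is
// walked one code point at a time, an array of words is walked word by
// word. A new line starts when adding the next unit would make the
// measured width exceed maxWidth.
function wrapByMeasuredWidth(units, maxWidth, measure) {
  const lines = [];
  let line = "";
  for (const unit of units) {
    if (line !== "" && measure(line + unit) > maxWidth) {
      lines.push(line);
      line = unit;
    } else {
      line += unit;
    }
  }
  if (line !== "") lines.push(line);
  return lines;
}
```

In a browser this might be called as `wrapByMeasuredWidth(text, editorWidthInPixels, canvasMeasurer("14px Menlo"))`, where `editorWidthInPixels` is a placeholder for however the editor width is obtained. A unit wider than `maxWidth` is still placed on a line of its own.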

saschanaz commented Sep 22, 2014

I built a quick JavaScript sample using measureText as a proof of concept. The upper text is wrapped automatically by the browser, and the lower one is wrapped using measureText. Try resizing the window.

be5invis commented Sep 22, 2014

@saschanaz Well... you break up words? There is no Kinsoku Shori at all.
As @zhanzhenzhen mentioned, a Han character is not exactly twice as wide as a Latin character. Grid fitting makes the situation even worse: to increase readability, glyph widths can be changed by hinting instructions, so the actual glyph advance does not equal the value recorded in the font.

saschanaz commented Sep 22, 2014

@be5invis Well, this sample loads each text line, breaks it, and tests its width with the HTML5 measureText method. It makes no grid-fitting or double-width assumptions; it uses the line widths reported directly by the browser from the actual font data.

be5invis commented Sep 22, 2014

@saschanaz We should calculate the width using "words" instead of characters, and place "words" instead of characters as well.

zhanzhenzhen commented Sep 22, 2014

@saschanaz: Thanks for the example! It seems measureText works!
@be5invis: I think this example just serves as a demo of measureText, so word breaks aren't important here.

saschanaz commented Sep 22, 2014

@be5invis: That's right, but I think it can be achieved with some more tweaks to this sample. I posted it to show that we can use measureText to test the actual text width.

I think this wrapping problem comes from the false assumption that every character is strictly monospaced. Even an English-only document cannot be wrapped correctly if you choose a non-monospace font.

dynamic space

Testing by measureText may help us correct this issue.

(ConEmu has a similar wrapping problem. It does not wrap a CJK text line until its width is almost twice the window width.)

@zhanzhenzhen: Great! :D

raccy commented Apr 17, 2015

@saschanaz Great work!
I'm looking forward to your work being complete.
And... I'll be happy, because then I won't have to fix japanese-wrap with measureText,
or extend it for Chinese and Korean.
やっとメンテナンスの重圧から逃れられる(-д-; (Do not translate!)

saschanaz commented Apr 18, 2015

@raccy A big 👍 for your pioneering work. :D I'm also looking forward to seeing this issue fixed.

@raccy raccy referenced this issue May 3, 2015

Closed

Soft wrap for CJK #14

huangxg commented May 27, 2015

@saschanaz Thanks for your AtomicChar. Once I enable the package, lines with Chinese characters do wrap at the window edge, but not at the Preferred Line Length of 80. Lines with Latin characters no longer wrap; I have to disable the package to get wrapping on Latin lines back.

OS: Mac OS X Yosemite 10.10.3
Atom 0.201.0
AtomicChar 0.3.6

Tedko commented May 29, 2015

2015-05-28 18_41_54-undeliverable_ visual studio code cannot wrap cjk characters properly when soft

flyisland commented Jul 31, 2015

@saschanaz Thanks for your AtomicChar. Both Chinese and English characters are wrapped correctly now!

OS: Win 7
Atom 1.03
AtomicChar 0.3.8

Contributor

frantic1048 commented Aug 6, 2015

@saschanaz

Thank you, it works well!

BTW, is the gap at the right edge sometimes too big?

preview

OS : Linux 4.1.4-1-ARCH
Package : archlinuxcn/atom-editor-bin, Precompiled binary from official repository
Atom Version : 1.03
AtomicChar Version : 0.3.8

jkiss commented Aug 8, 2015

japanese-wrap works well, thanks!

Superpencil commented Aug 21, 2015

+1

Blaisorblade commented Aug 27, 2015

Have you considered using the wcwidth API to measure how many columns a string takes? I suggested that earlier, but that was somewhere else. I also pointed to the code I believe should be changed to fix this cleanly:
https://discuss.atom.io/t/preferred-line-length-dont-support-chinese/4666/3?u=blaisorblade
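
As a rough illustration of the column-counting idea (this is not the POSIX `wcwidth` function itself, which is a C API), a simplified JavaScript stand-in might look like this. The names `charColumns` and `stringColumns` are hypothetical, and the range list is a deliberately reduced approximation, not the full Unicode East Asian Width data.

```javascript
// Simplified list of "wide" code point ranges. This is an assumption
// for illustration, NOT the full Unicode East Asian Width table.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo initial consonants
  [0x2e80, 0xa4cf], // CJK radicals through Yi
  [0xac00, 0xd7a3], // Hangul syllables
  [0xf900, 0xfaff], // CJK compatibility ideographs
  [0xfe30, 0xfe4f], // CJK compatibility forms
  [0xff00, 0xff60], // fullwidth forms
  [0xffe0, 0xffe6], // fullwidth signs
];

// 2 columns for a code point in a wide range, 1 for everything else.
// Unlike a real wcwidth, combining marks and control characters are
// not special-cased here.
function charColumns(codePoint) {
  return WIDE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi)
    ? 2
    : 1;
}

// Total number of columns a string would occupy under that model.
function stringColumns(text) {
  let columns = 0;
  for (const ch of text) columns += charColumns(ch.codePointAt(0));
  return columns;
}
```

As noted earlier in the thread, a CJK glyph is not always exactly two Latin columns wide in an editor font, so a column count like this is only an approximation compared with measuring the rendered text.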

Superpencil commented Sep 1, 2015

Can't we just write some JS that breaks the line at the window width by adding a line break?

Contributor

benogle commented Oct 15, 2015

Hi all, there is a PR open to fix this: #9162. If any of you are building from master, please give this PR a try!

cc @as-cii

@benogle benogle referenced this issue Oct 16, 2015

Merged

CJK soft wrap #9162

Contributor

benogle commented Oct 16, 2015

This has been fixed by #9162. It will be out in the beta channel next week, and is in master now! ❤️ @as-cii

@benogle benogle closed this Oct 16, 2015

chloerei commented Oct 30, 2015

There are still some problems.

Text:

文字文字 English 文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字

1.2.0-beta0:

8

When English and Chinese (or Japanese) are mixed with whitespace, the Chinese (or Japanese) sentence can itself be broken, instead of breaking at the whitespace, as in a browser:

文字文字 English 文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字文字

Contributor

benogle commented Nov 4, 2015

Can you make a new issue @chloerei?

chloerei commented Nov 5, 2015

webhacking commented Feb 11, 2016

using support fonts CamingoCode, Liberation Mono, DejaVu Sans Mono, monospace

lock bot commented Apr 14, 2018

This issue has been automatically locked since there has not been any recent activity after it was closed. If you can still reproduce this issue in Safe Mode then please open a new issue and fill out the entire issue template to ensure that we have enough information to address your issue. Thanks!

@lock lock bot locked and limited conversation to collaborators Apr 14, 2018
