New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect spacing in REPL for combining characters #6939

Closed
stevengj opened this Issue May 23, 2014 · 34 comments

Comments

Projects
None yet
8 participants
@stevengj
Member

stevengj commented May 23, 2014

If you type e.g. \alpha<TAB>\hat<TAB>, it makes α̂. However, on my machine (MacOS) it displays an extra space after the character, which weirdly disappears when you hit <RETURN> or <TAB>.

cc: @loladiro

@stevengj stevengj added REPL labels May 23, 2014

@aelg

This comment has been minimized.

Show comment
Hide comment
@aelg

aelg May 27, 2014

Also doing julia> \hat<TAB> puts the ^ on the > in the prompt.

aelg commented May 27, 2014

Also doing julia> \hat<TAB> puts the ^ on the > in the prompt.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj May 27, 2014

Member

@aelg, \hat generates a Unicode combining character, which applies the hat to the previous character. So it isn't going to work quite like LaTeX (which applies the hat to the subsequent character).

Member

stevengj commented May 27, 2014

@aelg, \hat generates a Unicode combining character, which applies the hat to the previous character. So it isn't going to work quite like LaTeX (which applies the hat to the subsequent character).

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno May 27, 2014

Member

We could probably put some kind of noncombining separator after the prompt though to prevent that from happening.

Member

Keno commented May 27, 2014

We could probably put some kind of noncombining separator after the prompt though to prevent that from happening.

@aelg

This comment has been minimized.

Show comment
Hide comment
@aelg

aelg May 27, 2014

@stevengj No I understand that, I just thought it was worth mentioning, as it's probably not what anyone would want. It seems related enough, to the bug you reported, to mention it here instead of creating a new issue.

aelg commented May 27, 2014

@stevengj No I understand that, I just thought it was worth mentioning, as it's probably not what anyone would want. It seems related enough, to the bug you reported, to mention it here instead of creating a new issue.

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 9, 2014

Member

What should the behavior for navigating across combining characters be?
Some options:

  1. Do the same thing as now and just fix the display. This would mean that a|^ and a^| (where ^ indicates combining hat and | indicates cursor position) look the same.
  2. Step over the combined character.
  3. Do something more fancy such as splitting the combining characters when the cursor is in between them.
Member

Keno commented Jun 9, 2014

What should the behavior for navigating across combining characters be?
Some options:

  1. Do the same thing as now and just fix the display. This would mean that a|^ and a^| (where ^ indicates combining hat and | indicates cursor position) look the same.
  2. Step over the combined character.
  3. Do something more fancy such as splitting the combining characters when the cursor is in between them.
@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Jun 9, 2014

Member

Option 1 sounds good.

Member

JeffBezanson commented Jun 9, 2014

Option 1 sounds good.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 9, 2014

Member

Option 2 sounds better to me. (But I'd still like to have to hit twice to delete the combined character, so that I can delete just the decoration.) But option 1 should be fine for now.

Note that utf8proc will identify graphemes for you, if you want to move the cursor in units of graphemes.

Member

stevengj commented Jun 9, 2014

Option 2 sounds better to me. (But I'd still like to have to hit twice to delete the combined character, so that I can delete just the decoration.) But option 1 should be fine for now.

Note that utf8proc will identify graphemes for you, if you want to move the cursor in units of graphemes.

@carlobaldassi

This comment has been minimized.

Show comment
Hide comment
@carlobaldassi

carlobaldassi Jun 9, 2014

Member

FWIW in vim the behaviour is option 2. I don't know about other editors but it should be easy to test now that all of them implement latex substitution.

Member

carlobaldassi commented Jun 9, 2014

FWIW in vim the behaviour is option 2. I don't know about other editors but it should be easy to test now that all of them implement latex substitution.

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 9, 2014

Member

I've been using option 1 for the past 5 minutes and I hate it, so I'll try option 2 now.

Member

Keno commented Jun 9, 2014

I've been using option 1 for the past 5 minutes and I hate it, so I'll try option 2 now.

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 9, 2014

Member

Any by that I mean just navigation. Deletion will still delete the combining character.

Member

Keno commented Jun 9, 2014

Any by that I mean just navigation. Deletion will still delete the combining character.

Keno added a commit that referenced this issue Jun 9, 2014

@Keno Keno closed this in 953a1d4 Jun 10, 2014

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 10, 2014

Member

Still doesn't work for me in MacOS 10.8.5 Terminal. Typing x\hat<TAB> gives an extra space (and then typing subsequent characters, arrow keys, delete etc. is wonky).

Member

stevengj commented Jun 10, 2014

Still doesn't work for me in MacOS 10.8.5 Terminal. Typing x\hat<TAB> gives an extra space (and then typing subsequent characters, arrow keys, delete etc. is wonky).

@stevengj stevengj reopened this Jun 10, 2014

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 10, 2014

Member

Odd, let me see.

Member

Keno commented Jun 10, 2014

Odd, let me see.

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 10, 2014

Member

Works for me on OS X 10.9.3. What is strwidth(string(:x\hat<TAB>)) ?

Member

Keno commented Jun 10, 2014

Works for me on OS X 10.9.3. What is strwidth(string(:x\hat<TAB>)) ?

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 10, 2014

Member

strwidth(string(:x̂)) gives 2.

Member

stevengj commented Jun 10, 2014

strwidth(string(:x̂)) gives 2.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 10, 2014

Member

charwidth(char(0x0302)) returns 1 for me. Looks like the wcwidth function may not be trustworthy on MacOS 8?

Member

stevengj commented Jun 10, 2014

charwidth(char(0x0302)) returns 1 for me. Looks like the wcwidth function may not be trustworthy on MacOS 8?

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Jun 10, 2014

Member

You must mean OS X 10.8, right, not actually MacOS 8? (MacOS 8 predates Unicode.)

Member

StefanKarpinski commented Jun 10, 2014

You must mean OS X 10.8, right, not actually MacOS 8? (MacOS 8 predates Unicode.)

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 10, 2014

Member

Ah, that's wrong. Maybe we should include the appropriate table? Last time the policy that @StefanKarpinski proposed on that was "Get a better OS", but maybe now that it's important that's different.

Member

Keno commented Jun 10, 2014

Ah, that's wrong. Maybe we should include the appropriate table? Last time the policy that @StefanKarpinski proposed on that was "Get a better OS", but maybe now that it's important that's different.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Jun 10, 2014

Member

Haha. I can't be held to every asinine thing I've ever said ;-)

Member

StefanKarpinski commented Jun 10, 2014

Haha. I can't be held to every asinine thing I've ever said ;-)

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Jun 10, 2014

Member

At least what you said wasn't "arsenate".

Member

jiahao commented Jun 10, 2014

At least what you said wasn't "arsenate".

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 10, 2014

Member

We are already using a replacement wcwidth function (src/support/wcwidth.c) for Windows, as I understand it. Maybe just use it elsewhere as well? At least on MacOS 10.8 and earlier (including MacOS 8, of course).

(Though it might be a bit out of date; it looks like it needs to be updated for Unicode 6.)

Member

stevengj commented Jun 10, 2014

We are already using a replacement wcwidth function (src/support/wcwidth.c) for Windows, as I understand it. Maybe just use it elsewhere as well? At least on MacOS 10.8 and earlier (including MacOS 8, of course).

(Though it might be a bit out of date; it looks like it needs to be updated for Unicode 6.)

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 10, 2014

Member

Or maybe we should just use utf8proc to get the unicode category, and assign a width of 0 to combining characters and 1 to everything else?

@jiahao, does the latest REPL handle CJK characters sensibly if they are assigned a charwidth of 2 (which is what src/support/wcwidth.c seems to do)?

Member

stevengj commented Jun 10, 2014

Or maybe we should just use utf8proc to get the unicode category, and assign a width of 0 to combining characters and 1 to everything else?

@jiahao, does the latest REPL handle CJK characters sensibly if they are assigned a charwidth of 2 (which is what src/support/wcwidth.c seems to do)?

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Jun 11, 2014

Member

I haven't noticed much craziness with displaying CJK characters. Korean input however relies heavily on combining vowels and consonants (which can be input separately) into syllables (which are rendered as individual characters); those should be doublewidth.

Member

jiahao commented Jun 11, 2014

I haven't noticed much craziness with displaying CJK characters. Korean input however relies heavily on combining vowels and consonants (which can be input separately) into syllables (which are rendered as individual characters); those should be doublewidth.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 11, 2014

Member

@jiahao, we might only use our custom wcwidth on Windows. What is the charwidth of a CJK character on your machine?

Member

stevengj commented Jun 11, 2014

@jiahao, we might only use our custom wcwidth on Windows. What is the charwidth of a CJK character on your machine?

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Jun 12, 2014

Member

Most of them are 2 but there are a few exceptions (notably, half-width katakana variants in Japanese):

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
1

julia> charwidth('')
2
Member

jiahao commented Jun 12, 2014

Most of them are 2 but there are a few exceptions (notably, half-width katakana variants in Japanese):

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
2

julia> charwidth('')
1

julia> charwidth('')
2
@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Jun 12, 2014

Member

I don't speak Malayalam, but this seems off. (OSX 10.9.3):

julia> charwidth('') #U+0d60
1
Member

jiahao commented Jun 12, 2014

I don't speak Malayalam, but this seems off. (OSX 10.9.3):

julia> charwidth('') #U+0d60
1
@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Jun 12, 2014

Member

This gist identifies 281 discrepancies between charwidth on OSX 10.9.3 and the officially documented character width specifications in the latest (v6.3) Unicode tables.

Member

jiahao commented Jun 12, 2014

This gist identifies 281 discrepancies between charwidth on OSX 10.9.3 and the officially documented character width specifications in the latest (v6.3) Unicode tables.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jun 12, 2014

Member

It would be nice if we could get this from utf8proc, but I don't see a charwidth there at first glance. Of course, first utf8proc has to be updated for Unicode 6, and maybe at the same time its database could be updated to include character widths. (Unfortunately, there is no public version-control repository for utf8proc, although the author told me in February that he was willing to do so, pending some cleanup.)

Member

stevengj commented Jun 12, 2014

It would be nice if we could get this from utf8proc, but I don't see a charwidth there at first glance. Of course, first utf8proc has to be updated for Unicode 6, and maybe at the same time its database could be updated to include character widths. (Unfortunately, there is no public version-control repository for utf8proc, although the author told me in February that he was willing to do so, pending some cleanup.)

@Keno

This comment has been minimized.

Show comment
Hide comment
@Keno

Keno Jun 12, 2014

Member

Yes, that would be ideal.

Member

Keno commented Jun 12, 2014

Yes, that would be ideal.

jiahao added a commit to jiahao/jin that referenced this issue Jun 13, 2014

Added quick check of charwidth
Checks output of charwidth against latest Unicode charcater tables (see
UAX #11)

Ref: JuliaLang/julia#6939
@mbauman

This comment has been minimized.

Show comment
Hide comment
@mbauman

mbauman Jul 2, 2014

Member

Hrm. Now some of the super- and sub-script latex characters are behaving funny in the REPL, too. Mac OS 10.9.2 seems to think that all super- and sub-script letters have width 0. Symbols and numbers seem to be ok, though.

julia> charwidth('ᴿ')
0

julia> charwidth('ᵦ')
0

julia> charwidth('₁')
1

julia> charwidth('⁺')
1

I haven't had a chance to figure out when this broke, but I'm pretty sure this worked at one point.

Member

mbauman commented Jul 2, 2014

Hrm. Now some of the super- and sub-script latex characters are behaving funny in the REPL, too. Mac OS 10.9.2 seems to think that all super- and sub-script letters have width 0. Symbols and numbers seem to be ok, though.

julia> charwidth('ᴿ')
0

julia> charwidth('ᵦ')
0

julia> charwidth('₁')
1

julia> charwidth('⁺')
1

I haven't had a chance to figure out when this broke, but I'm pretty sure this worked at one point.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jul 3, 2014

Member

@mbauman, charwidth('ᴿ') == 0 on MacOS 10.7.5 as well.

Member

stevengj commented Jul 3, 2014

@mbauman, charwidth('ᴿ') == 0 on MacOS 10.7.5 as well.

@mbauman

This comment has been minimized.

Show comment
Hide comment
@mbauman

mbauman Jul 3, 2014

Member

Thinking about this more, I bet a bisect would blame the fix for this issue (953a1d4). These super- and sub-scripts are just collateral damage in making combining characters work properly.

Member

mbauman commented Jul 3, 2014

Thinking about this more, I bet a bisect would blame the fix for this issue (953a1d4). These super- and sub-scripts are just collateral damage in making combining characters work properly.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Jul 3, 2014

Member

@mbauman, I don't follow you. The charwidth function is simply a thin wrapper around the wcwidth C library function, so I don't see how it would have been affected by REPL patches.

What is happening seems to be that the OS X wcwidth function is simply buggy. (On GNU/Linux, charwidth('ᴿ') returns 1.) And on Windows the wcwidth function is utterly broken because it takes a 16-bit argument, so it has no chance of working except for a subset of Unicode (the BMP), which is why on Windows we already use our own wcwidth replacement function.

Member

stevengj commented Jul 3, 2014

@mbauman, I don't follow you. The charwidth function is simply a thin wrapper around the wcwidth C library function, so I don't see how it would have been affected by REPL patches.

What is happening seems to be that the OS X wcwidth function is simply buggy. (On GNU/Linux, charwidth('ᴿ') returns 1.) And on Windows the wcwidth function is utterly broken because it takes a 16-bit argument, so it has no chance of working except for a subset of Unicode (the BMP), which is why on Windows we already use our own wcwidth replacement function.

@mbauman

This comment has been minimized.

Show comment
Hide comment
@mbauman

mbauman Jul 3, 2014

Member

Yup, exactly. It's just that (I think) the REPL didn't honor charwidth until that patch (actually, maybe it was a different patch; I haven't looked closely at the changes). It's the correct behavior… it just stinks that we need to work around buggy implementations.

Member

mbauman commented Jul 3, 2014

Yup, exactly. It's just that (I think) the REPL didn't honor charwidth until that patch (actually, maybe it was a different patch; I haven't looked closely at the changes). It's the correct behavior… it just stinks that we need to work around buggy implementations.

@stevengj

This comment has been minimized.

Show comment
Hide comment
@stevengj

stevengj Dec 17, 2014

Member

Couldn't the charwidth(c) != 0 tests in LineEdit.jl be replaced by isprint(c)?

Member

stevengj commented Dec 17, 2014

Couldn't the charwidth(c) != 0 tests in LineEdit.jl be replaced by isprint(c)?

stevengj added a commit to stevengj/julia that referenced this issue Mar 28, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 28, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 29, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

@stevengj stevengj closed this in 58578b0 Mar 30, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment