# incorrect spacing in REPL for combining characters #6939

Closed
opened this Issue May 23, 2014 · 34 comments

8 participants

### stevengj commented May 23, 2014

 If you type e.g. \alpha\hat, it makes α̂. However, on my machine (MacOS) it displays an extra space after the character, which weirdly disappears when you hit  or . cc: @loladiro

### aelg commented May 27, 2014

 Also, typing \hat right after the julia> prompt puts the ^ on the > of the prompt.

### stevengj commented May 27, 2014

 @aelg, \hat generates a Unicode combining character, which applies the hat to the previous character. So it isn't going to work quite like LaTeX (which applies the hat to the subsequent character).
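To make the distinction concrete, here is a small sketch in present-day Julia (1.x, not the 2014 REPL). Typing `\alpha` then `\hat` yields two codepoints: α followed by U+0302 COMBINING CIRCUMFLEX ACCENT. Together they form a single grapheme that should occupy one terminal column. `textwidth` and `Unicode.graphemes` are current Julia functions that postdate this thread.

```julia
using Unicode

s = "α\u0302"                           # α followed by U+0302 COMBINING CIRCUMFLEX ACCENT
println(length(s))                      # 2 codepoints
println(length(collect(graphemes(s))))  # 1 user-perceived character
println(textwidth(s))                   # 1 terminal column
```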

### Keno commented May 27, 2014

 We could probably put some kind of noncombining separator after the prompt though to prevent that from happening.
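The sketch below shows one variant of that idea in current Julia. `guard_leading_mark` is a hypothetical display helper, not part of the REPL. When the line begins with a zero-width character, it prepends U+25CC DOTTED CIRCLE. Unicode conventionally uses that character as a placeholder base for showing an isolated combining mark, so the mark attaches to it instead of the prompt's `>`. Whether every terminal renders this cleanly is an assumption.

```julia
# Hypothetical display helper: give a leading combining mark a base to sit on.
# Only the displayed text changes; the edit buffer itself is untouched.
function guard_leading_mark(line::AbstractString)
    if !isempty(line) && textwidth(first(line)) == 0
        return "\u25cc" * line   # U+25CC DOTTED CIRCLE as a placeholder base
    end
    return line
end

println("julia> ", guard_leading_mark("\u0302"))
```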

### aelg commented May 27, 2014

 @stevengj No, I understand that; I just thought it was worth mentioning, since it's probably not what anyone would want. It seems related enough to the bug you reported to mention it here instead of opening a new issue.

### Keno commented Jun 9, 2014

 What should the behavior be for navigating across combining characters? Some options:

1. Do the same thing as now and just fix the display. This would mean that a|^ and a^| (where ^ indicates the combining hat and | indicates the cursor position) look the same.
2. Step over the combined character.
3. Do something fancier, such as splitting the combining characters when the cursor is between them.
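Option 2 amounts to moving the cursor in units of grapheme clusters rather than codepoints. The sketch below assumes the cursor is tracked as a codepoint offset; the real LineEdit.jl works on byte positions in a buffer. `grapheme_boundaries`, `move_right` and `move_left` are hypothetical names.

```julia
using Unicode

# Codepoint offsets at which grapheme clusters begin/end (0 = start of line).
function grapheme_boundaries(line::AbstractString)
    bounds = [0]
    for g in graphemes(line)
        push!(bounds, bounds[end] + length(g))
    end
    return bounds
end

# Option 2: step right over a whole grapheme (base character plus its marks).
function move_right(line::AbstractString, pos::Int)
    for b in grapheme_boundaries(line)
        b > pos && return b
    end
    return pos   # already at the end
end

# Step left to the previous grapheme boundary.
function move_left(line::AbstractString, pos::Int)
    prev = 0
    for b in grapheme_boundaries(line)
        b >= pos && return prev
        prev = b
    end
    return prev
end

line = "a\u0302b"     # â followed by b: 3 codepoints, 2 graphemes
move_right(line, 0)   # 2: past the hat, never between a and ^
move_left(line, 3)    # 2
```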

### JeffBezanson commented Jun 9, 2014

 Option 1 sounds good.

### stevengj commented Jun 9, 2014

 Option 2 sounds better to me. (But I'd still like to have to hit twice to delete the combined character, so that I can delete just the decoration.) But option 1 should be fine for now. Note that utf8proc will identify graphemes for you, if you want to move the cursor in units of graphemes.

### carlobaldassi commented Jun 9, 2014

 FWIW, in vim the behaviour is option 2. I don't know about other editors, but it should be easy to test now that all of them implement LaTeX substitution.

### Keno commented Jun 9, 2014

 I've been using option 1 for the past 5 minutes and I hate it, so I'll try option 2 now.

### Keno commented Jun 9, 2014

 And by that I mean just navigation. Deletion will still delete the combining character.
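Under this scheme, navigation steps over whole graphemes while backspace still removes a single codepoint. That is what lets you strip just the decoration. The sketch below assumes the same codepoint-offset cursor; `backspace` is a hypothetical name.

```julia
# Remove the codepoint just before the cursor; returns (new_line, new_cursor).
function backspace(line::AbstractString, pos::Int)
    pos == 0 && return (String(line), 0)
    chars = collect(line)
    deleteat!(chars, pos)
    return (String(chars), pos - 1)
end

backspace("a\u0302b", 2)   # ("ab", 1): only the hat is removed
```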

### Keno added a commit that referenced this issue Jun 9, 2014

 Fix #6939 
 982f7c7 

### Keno added a commit that referenced this issue Jun 10, 2014

 Merge pull request #7210 from JuliaLang/kf/replcombine 
Fix #6939
 10f5754 

### stevengj commented Jun 10, 2014

 Still doesn't work for me in MacOS 10.8.5 Terminal. Typing x\hat gives an extra space (and then typing subsequent characters, arrow keys, delete etc. is wonky).


### Keno commented Jun 10, 2014

 Odd, let me see.

### Keno commented Jun 10, 2014

 Works for me on OS X 10.9.3. What is strwidth(string(:x\hat))?

### stevengj commented Jun 10, 2014

 strwidth(string(:x̂)) gives 2.

### stevengj commented Jun 10, 2014

 charwidth(char(0x0302)) returns 1 for me. Looks like the wcwidth function may not be trustworthy on MacOS 8?
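One bad table entry is enough to cause the extra space. A string's display width is the sum of its per-character widths. So if the C library reports width 1 for U+0302, `x̂` comes out as 2 columns and the REPL reserves an extra cell. The sketch below uses current Julia; `buggy_width` is a hypothetical stand-in that mimics the value reported above.

```julia
# Width of a string = sum of the widths of its characters.
string_width(width, s) = sum(width, s; init = 0)

# Hypothetical stand-in for the faulty wcwidth: reports 1 for U+0302.
buggy_width(c::Char) = c == '\u0302' ? 1 : textwidth(c)

string_width(textwidth, "x\u0302")     # 1
string_width(buggy_width, "x\u0302")   # 2
```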

### StefanKarpinski commented Jun 10, 2014

 You must mean OS X 10.8, right, not actually MacOS 8? (MacOS 8 predates Unicode.)

### Keno commented Jun 10, 2014

 Ah, that's wrong. Maybe we should include the appropriate table? Last time the policy that @StefanKarpinski proposed on that was "Get a better OS", but maybe now that it's important that's different.

### StefanKarpinski commented Jun 10, 2014

 Haha. I can't be held to every asinine thing I've ever said ;-)

### jiahao commented Jun 10, 2014

 At least what you said wasn't "arsenate".

### stevengj commented Jun 10, 2014

 We are already using a replacement wcwidth function (src/support/wcwidth.c) for Windows, as I understand it. Maybe just use it elsewhere as well? At least on MacOS 10.8 and earlier (including MacOS 8, of course). (Though it might be a bit out of date; it looks like it needs to be updated for Unicode 6.)

### stevengj commented Jun 10, 2014

 Or maybe we should just use utf8proc to get the unicode category, and assign a width of 0 to combining characters and 1 to everything else? @jiahao, does the latest REPL handle CJK characters sensibly if they are assigned a charwidth of 2 (which is what src/support/wcwidth.c seems to do)?
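The proposed rule can be sketched in today's Julia. `Base.Unicode.category_code` and the `UTF8PROC_CATEGORY_*` constants are unexported Base internals that wrap utf8proc's category lookup, so treat them as an implementation detail. `simple_width` is a hypothetical name.

```julia
const U = Base.Unicode

# Proposed rule: nonspacing (Mn) and enclosing (Me) marks take 0 columns, everything else 1.
function simple_width(c::Char)
    code = U.category_code(c)
    return (code == U.UTF8PROC_CATEGORY_MN || code == U.UTF8PROC_CATEGORY_ME) ? 0 : 1
end

simple_width('\u0302')   # 0
simple_width('x')        # 1
simple_width('零')       # 1, although terminals draw CJK ideographs 2 columns wide
```

The last line is why the CJK question matters: a category-only rule has no way to express double width.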

### jiahao commented Jun 11, 2014

 I haven't noticed much craziness with displaying CJK characters. Korean input, however, relies heavily on combining vowels and consonants (which can be input separately) into syllables (which are rendered as individual characters); those should be double-width.
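Korean input methods usually compose syllables themselves. At the Unicode level, though, a syllable can also be spelled as a sequence of conjoining jamo. The sketch below (current Julia) shows that such a sequence normalizes to one precomposed syllable. It also forms a single grapheme cluster, and the syllable is double width.

```julia
using Unicode

jamo = "\u1109\u116e"   # HANGUL CHOSEONG SIOS + HANGUL JUNGSEONG U (conjoining jamo)

Unicode.normalize(jamo, :NFC)      # "수" (U+C218), one precomposed syllable
length(collect(graphemes(jamo)))   # 1: the two jamo form a single grapheme cluster
textwidth("수")                    # 2: the composed syllable is double width
```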

### stevengj commented Jun 11, 2014

 @jiahao, we might only use our custom wcwidth on Windows. What is the charwidth of a CJK character on your machine?

### jiahao commented Jun 12, 2014

 Most of them are 2, but there are a few exceptions (notably, half-width katakana variants in Japanese):

```
julia> charwidth('零')
2

julia> charwidth('や')
2

julia> charwidth('수')
2

julia> charwidth('ㅅ')
2

julia> charwidth('ｶ')
1

julia> charwidth('カ')
2
```

### jiahao commented Jun 12, 2014

 I don't speak Malayalam, but this seems off (OSX 10.9.3):

```
julia> charwidth('ൠ') #U+0d60
1
```

### jiahao commented Jun 12, 2014

 This gist identifies 281 discrepancies between charwidth on OSX 10.9.3 and the officially documented character width specifications in the latest (v6.3) Unicode tables.
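Here is a toy version of that kind of check in current Julia. It compares a width function against a table of expected widths and lists the mismatches. The five-entry table is illustrative only: the wide/narrow values follow UAX #11's East_Asian_Width, and the combining mark is assumed to be zero-width. The real check used the full Unicode data files. `EXPECTED` and `width_discrepancies` are hypothetical names.

```julia
# (character, expected terminal width) pairs; illustrative, not a full table
const EXPECTED = [('零', 2), ('カ', 2), ('ｶ', 1), ('x', 1), ('\u0302', 0)]

# Return (char, reported, expected) for every entry the width function gets wrong.
width_discrepancies(width, table) =
    [(c, width(c), w) for (c, w) in table if width(c) != w]

width_discrepancies(textwidth, EXPECTED)   # empty with current Julia
width_discrepancies(c -> 1, EXPECTED)      # flags '零', 'カ' and U+0302
```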

### stevengj commented Jun 12, 2014

 It would be nice if we could get this from utf8proc, but I don't see a charwidth there at first glance. Of course, first utf8proc has to be updated for Unicode 6, and maybe at the same time its database could be updated to include character widths. (Unfortunately, there is no public version-control repository for utf8proc, although the author told me in February that he was willing to do so, pending some cleanup.)
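utf8proc did later gain a width table, `utf8proc_charwidth`, which the commits at the end of this thread switch to. Current Julia builds bundle utf8proc inside the runtime, so the function can be called directly. This is a sketch under that assumption; `utf8proc_width` is a hypothetical wrapper name. In everyday code `textwidth` is the supported interface.

```julia
# Call utf8proc's width function directly; in current Julia, textwidth(::Char)
# wraps the same C function (plus an ASCII fast path).
utf8proc_width(c::Char) = Int(ccall(:utf8proc_charwidth, Cint, (UInt32,), UInt32(c)))

utf8proc_width('x')        # 1
utf8proc_width('\u0302')   # 0
utf8proc_width('零')       # 2
```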

### Keno commented Jun 12, 2014

 Yes, that would be ideal.

### jiahao added a commit to jiahao/jin that referenced this issue Jun 13, 2014

 Added quick check of charwidth 
Checks output of charwidth against latest Unicode character tables (see UAX #11)

Ref: JuliaLang/julia#6939
 c5b5474 


### mbauman commented Jul 2, 2014

 Hrm. Now some of the super- and subscript LaTeX characters are behaving funny in the REPL, too. Mac OS 10.9.2 seems to think that all super- and subscript letters have width 0. Symbols and numbers seem to be OK, though:

```
julia> charwidth('ᴿ')
0

julia> charwidth('ᵦ')
0

julia> charwidth('₁')
1

julia> charwidth('⁺')
1
```

I haven't had a chance to figure out when this broke, but I'm pretty sure this worked at one point.
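For comparison, current Julia's `textwidth` no longer consults the platform `wcwidth`. It reports one column for each of these characters. Here is a quick regression-style check, assuming Julia 1.x:

```julia
# Modifier/subscript letters and sub/superscript symbols should all occupy one column.
for c in ('ᴿ', 'ᵦ', '₁', '⁺')
    @assert textwidth(c) == 1 "unexpected width for $(repr(c))"
end
```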

### stevengj commented Jul 3, 2014

 @mbauman, charwidth('ᴿ') == 0 on MacOS 10.7.5 as well.

### mbauman commented Jul 3, 2014

 Thinking about this more, I bet a bisect would blame the fix for this issue (953a1d4). These super- and sub-scripts are just collateral damage in making combining characters work properly.

### stevengj commented Jul 3, 2014

 @mbauman, I don't follow you. The charwidth function is simply a thin wrapper around the wcwidth C library function, so I don't see how it would have been affected by REPL patches. What is happening seems to be that the OS X wcwidth function is simply buggy. (On GNU/Linux, charwidth('ᴿ') returns 1.) And on Windows the wcwidth function is utterly broken because it takes a 16-bit argument, so it has no chance of working except for a subset of Unicode (the BMP), which is why on Windows we already use our own wcwidth replacement function.
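The Windows limitation can be illustrated as follows. There, `wchar_t` is 16 bits, so any codepoint above U+FFFF cannot be passed to `wcwidth` as a single value. One example is U+1D431 MATHEMATICAL BOLD SMALL X; in UTF-16 it becomes a surrogate pair. The sketch below uses current Julia; `fits_in_16bit` is a hypothetical name.

```julia
# A 16-bit wcwidth argument can only represent the Basic Multilingual Plane.
fits_in_16bit(c::Char) = codepoint(c) <= 0xffff

c = '𝐱'                        # U+1D431 MATHEMATICAL BOLD SMALL X
fits_in_16bit(c)               # false
transcode(UInt16, string(c))   # UInt16[0xd835, 0xdc31], a surrogate pair
```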

### mbauman commented Jul 3, 2014

 Yup, exactly. It's just that (I think) the REPL didn't honor charwidth until that patch (actually, maybe it was a different patch; I haven't looked closely at the changes). It's the correct behavior… it just stinks that we need to work around buggy implementations.


### stevengj commented Dec 17, 2014

 Couldn't the charwidth(c) != 0 tests in LineEdit.jl be replaced by isprint(c)?
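The two predicates disagree on combining marks. In current Julia, `isprint` is true for a nonspacing mark, because category Mn counts as printable, while the mark's width is 0. A direct swap would therefore appear to change behavior for exactly the characters this issue is about. A quick comparison:

```julia
mark = '\u0302'        # COMBINING CIRCUMFLEX ACCENT
isprint(mark)          # true
textwidth(mark) != 0   # false

(isprint('x'), textwidth('x') != 0)   # (true, true)
```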


### stevengj added a commit to stevengj/julia that referenced this issue Mar 28, 2015

 replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939) 
 40d5719 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 28, 2015

 replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939) 
 7d908f9 


### stevengj added a commit to stevengj/julia that referenced this issue Mar 29, 2015

 replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939) 
 6bf5e61 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939) 
 e48d8d0 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 baefdfb 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 9f58ecd 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 0fb98b3 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 90c0fb4 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 95e29d2 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 4813b29 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 60aee18 

### stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015

 update libmojibake -> utf8proc 1.2 (closes #10654); replace wcwidth by utf8proc_charwidth (fixes #3721, closes #6939)
 c3c0411