
Ace Editor changes certain UTF8 characters to be equivalent (but not equal) codepoints #716

Closed
paulcbetts opened this Issue Apr 12, 2012 · 7 comments

4 participants

@paulcbetts

A user on GitHub used the Ace editor to edit a file and submit the following pull request:

robsimmons/utf8islove#1

In doing so, Ace made several unintended changes to the file (in this case, the diff should contain only two added lines). Here's an excerpt from the original, as hex:

00038f0: bb29 0a20 2822 5c5c 6c61 6e67 6c65 2220  .). ("\\langle" 
0003900: 3fe2 9fa8 293b 3b28 225c 5c6c 616e 676c  ?...);;("\\langl
0003910: 6522 203f e380 8829 0a20 2822 5c5c 6c62  e" ?...). ("\\lb

And the result (note how the third two-byte group at 0003910 has changed from e380 to e28c; the whole three-byte sequence e3 80 88 became e2 8c a9):

00038f0: bb29 0a20 2822 5c5c 6c61 6e67 6c65 2220  .). ("\\langle" 
0003900: 3fe2 9fa8 293b 3b28 225c 5c6c 616e 676c  ?...);;("\\langl
0003910: 6522 203f e28c a929 0a20 2822 5c5c 6c62  e" ?...). ("\\lb
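To check the byte-level reading of the two dumps, here is a short Python sketch (not part of the original report) that decodes the eight bytes at offset 0003910 from each dump as UTF-8 and lists their code points. It assumes the file is UTF-8 encoded.

```python
# The 8 bytes at offset 0003910 in each dump: e, ", space, ?, the bracket, )
original = bytes.fromhex("6522203fe3808829")
edited = bytes.fromhex("6522203fe28ca929")


def code_points(data: bytes) -> list[str]:
    """Decode UTF-8 bytes and return each code point as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]


print(code_points(original))  # ['U+0065', 'U+0022', 'U+0020', 'U+003F', 'U+3008', 'U+0029']
print(code_points(edited))    # ['U+0065', 'U+0022', 'U+0020', 'U+003F', 'U+2329', 'U+0029']

# Byte positions that differ between the two versions.
changed = [i for i, (a, b) in enumerate(zip(original, edited)) if a != b]
print(changed)                # [4, 5, 6]
```

Only bytes 4 to 6 differ, and in both versions they form exactly one valid code point, so the editor swapped one character for another rather than corrupting the byte stream.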

Here are the specific code points being transformed, from:

U+0065 LATIN SMALL LETTER E character 
U+0022 QUOTATION MARK character (") 
U+0020 SPACE character 
U+003F QUESTION MARK character (?) 
U+3008 LEFT ANGLE BRACKET character (〈) 

to

U+0065 LATIN SMALL LETTER E character 
U+0022 QUOTATION MARK character (") 
U+0020 SPACE character 
U+003F QUESTION MARK character (?) 
U+2329 LEFT-POINTING ANGLE BRACKET character (〈) 
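The "equivalent (but not equal)" in the title can be checked with Python's `unicodedata` module. In the Unicode Character Database, U+2329 has a canonical singleton decomposition to U+3008, so normalization maps U+2329 onto U+3008, not the other way round. The change reported here (U+3008 to U+2329) is therefore not plain NFC/NFD normalization; this sketch only demonstrates the equivalence and does not claim to reproduce whatever altered the file.

```python
import unicodedata

left_angle = "\u3008"     # U+3008 LEFT ANGLE BRACKET (original file)
left_pointing = "\u2329"  # U+2329 LEFT-POINTING ANGLE BRACKET (after the edit)

# The Unicode Character Database gives U+2329 a canonical (singleton)
# decomposition to U+3008.
print(unicodedata.decomposition(left_pointing))  # 3008

# Both canonical normalization forms therefore map U+2329 onto U+3008 ...
for form in ("NFC", "NFD"):
    assert unicodedata.normalize(form, left_pointing) == left_angle

# ... but the unnormalized strings, and their UTF-8 bytes, are different.
print(left_angle == left_pointing)               # False
print(left_angle.encode("utf-8").hex())          # e38088
print(left_pointing.encode("utf-8").hex())       # e28ca9
```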
@nightwing
Ajax.org B.V. member

I've tried but couldn't reproduce this in Firefox (nightwing/utf8islove@2cb03d1: I typed something at the end and copy-pasted the whole text).

Maybe some browsers do this if the page's character encoding isn't set correctly, but then it should happen with a plain textarea as well.

@jackpoz
jackpoz commented Aug 8, 2012

I reproduced a similar issue with arcemu/arcemu@17c962b. The GitHub online text editor modified a few non-ASCII characters used in German, like Ü and ä. I used Google Chrome 21 on Windows 7 (32-bit).

@nightwing
Ajax.org B.V. member

Did you notice whether the characters were already modified when you started editing?

I can easily reproduce this by setting the page encoding to something other than UTF-8, but the same happens with a plain textarea too.

@xpaulbettsx maybe GitHub should warn if the user has overridden the page encoding?
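One plausible mechanism for the Ü/ä damage, sketched in Python under two assumptions the thread does not confirm: the page encoding was overridden to Windows-1252 (`cp1252`), and the edited text was then submitted back as UTF-8. Each two-byte UTF-8 letter is displayed as two Windows-1252 characters, and re-encoding those as UTF-8 changes the file.

```python
# German text as stored in the file (UTF-8).
text = "Ü ä"
utf8_bytes = text.encode("utf-8")           # b'\xc3\x9c \xc3\xa4'

# Assumption: the page encoding is overridden to Windows-1252, so each
# UTF-8 byte is shown as a separate character in the editor.
misread = utf8_bytes.decode("cp1252")
print(misread)                              # Ãœ Ã¤

# Assumption: the textarea contents are submitted as UTF-8, so the
# misread characters are written back and the bytes change.
saved = misread.encode("utf-8")
print(saved != utf8_bytes)                  # True
```

Because Windows-1252 maps every byte used here, this particular damage can be undone with `misread.encode("cp1252").decode("utf-8")`, provided you know which wrong encoding was applied.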

@rick
rick commented Aug 20, 2012

I may be seeing this same issue in another context, if it helps with diagnosis...

If I go to https://github.com/megastep/makeself/blob/master/makeself.sh and look at line 71, there is a Unicode character there that's represented/displayed properly.

I click "Edit" to use the GitHub online editor and I see this:

If I then save, the changed character is what gets saved in place of the original one.

@nightwing
Ajax.org B.V. member

Thanks @rick.
The page here is served as UTF-8, but the file itself is Western (ISO-8859-1)
(open https://raw.github.com/megastep/makeself/master/makeself.sh to check).

A plain textarea has the same error; see document.querySelector(".file-editor-textarea").value.
So I think we can close this issue.

By the way, is there an issue tracker for GitHub? I've always wanted to report a mouse bug on the network graph but never found where.
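The makeself case is the opposite mismatch: ISO-8859-1 bytes decoded as UTF-8. The Python sketch below uses "café" as a stand-in (the thread doesn't say which character is on line 71) and `errors="replace"` as an approximation of how browsers' UTF-8 decoders substitute U+FFFD for invalid bytes. The `is_valid_utf8` helper is hypothetical, not a GitHub or Ace feature; it only shows the kind of check an editor could run before opening a file for editing.

```python
# Hypothetical stand-in for the non-ASCII character in makeself.sh.
latin1_bytes = "café".encode("latin-1")        # b'caf\xe9': é is one byte, 0xE9

# Decoding the file as UTF-8 (as a UTF-8 page would): 0xE9 starts a
# three-byte sequence that never completes, so it becomes U+FFFD.
shown = latin1_bytes.decode("utf-8", errors="replace")

# Saving from the editor encodes U+FFFD as UTF-8 (EF BF BD); the original
# byte 0xE9 is gone and cannot be recovered from the saved file.
saved = shown.encode("utf-8")
print(saved)                                   # b'caf\xef\xbf\xbd'


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical pre-edit check: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(latin1_bytes))             # False
print(is_valid_utf8("café".encode("utf-8")))   # True
```

Unlike the Windows-1252 case, this loss is not reversible: every invalid byte collapses to the same U+FFFD, so a check like `is_valid_utf8` has to run before the text reaches the editor.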

@rick
rick commented Aug 20, 2012

@nightwing: Thanks -- I'll pass this along to our folks and see what comes out. Also, the best way currently to report issues for github.com is just to fire off an email to support@github.com.

@nightwing
Ajax.org B.V. member

Closing this issue for now; please reopen if you find any related problem in Ace.
And thanks for the support address.

@nightwing nightwing closed this Aug 25, 2012
@cdjackson referenced this issue in cdjackson/HABmin on Apr 23, 2014: Rules editor character encoding problem #121 (Closed)
