
GM should fail gracefully when encountering scripts with invalid encoding #1588

Closed
Ventero opened this issue Jul 10, 2012 · 4 comments

@Ventero
Contributor

Ventero commented Jul 10, 2012

When opening a script file with invalid encoding (for example http://ventero.de/temp/encoding.user.js, which doesn't contain valid UTF-8), Greasemonkey throws two errors:

Error: Component returned failure code: 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIScriptableUnicodeConverter.convertFromByteArray]
Source: resource://greasemonkey/remoteScript.js
Line: 104

and

Error: this._uri is undefined
Source: resource://greasemonkey/remoteScript.js
Line: 188

After these errors, Greasemonkey displays neither the install dialog nor the script's source.
It would be better to show a message explaining the problem.
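One way to fail gracefully is to decode the downloaded bytes strictly and turn a decoding failure into a message for the user. The sketch below is in Python for brevity; `decode_script` and its message text are hypothetical and are not taken from Greasemonkey's `remoteScript.js`.

```python
# Hypothetical sketch (not Greasemonkey code): decode a downloaded user script
# strictly as UTF-8 and turn a failure into a message for the user instead of
# letting the exception escape.
from typing import Optional, Tuple


def decode_script(body: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (source, None) on success or (None, error_message) on failure."""
    try:
        # Strict decoding: any invalid byte raises UnicodeDecodeError.
        return body.decode("utf-8"), None
    except UnicodeDecodeError as err:
        message = (
            f"This script is not valid UTF-8: byte 0x{body[err.start]:02x} "
            f"at offset {err.start} ({err.reason}). "
            "Please save it as UTF-8 and try again."
        )
        return None, message


if __name__ == "__main__":
    source, error = decode_script(b"var x = '\xe8';")
    print(error if error else source)
```

Returning the message rather than raising lets the caller show it where the install dialog would have appeared.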

@arantius
Collaborator

I agree that things should fail gracefully. But the linked script does not contain UTF-8.

$ curl -s http://ventero.de/temp/encoding.user.js | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 e8 27 3b  |erScript==..'.';|

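The character between the single quotes is the byte 0xe8, or 0b11101000 in binary. In UTF-8, a byte with the MSB set is part of a multi-byte sequence; the 1110 prefix makes 0xe8 the lead byte of a three-byte sequence, so it must be followed by two continuation bytes (0b10xxxxxx). Here it is followed by 0x27 ('), so a lone 0xe8 byte is not valid UTF-8.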
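The byte-level argument can be checked mechanically. This Python sketch labels each byte from the hexdump tail by its UTF-8 bit pattern; `utf8_role` is an illustrative helper, and it classifies by high-order bits only (it does not flag bytes such as 0xc0, 0xc1 or 0xf5 to 0xf7, which are never valid in UTF-8 either).

```python
# Sketch: label each byte by its UTF-8 bit pattern to show why the lone 0xe8
# in '\xe8'; cannot be decoded. Bit-pattern check only; it does not flag
# lead bytes that are never valid in practice (0xc0, 0xc1, 0xf5-0xf7).


def utf8_role(byte: int) -> str:
    """Classify a single byte by its high-order bits."""
    if byte < 0x80:
        return "ASCII (0xxxxxxx)"
    if byte < 0xC0:
        return "continuation (10xxxxxx)"
    if byte < 0xE0:
        return "lead of 2-byte sequence (110xxxxx)"
    if byte < 0xF0:
        return "lead of 3-byte sequence (1110xxxx)"
    if byte < 0xF8:
        return "lead of 4-byte sequence (11110xxx)"
    return "never valid in UTF-8"


if __name__ == "__main__":
    # The last four bytes of the hexdump line: ' 0xe8 ' ;
    tail = bytes.fromhex("27 e8 27 3b")
    for b in tail:
        print(f"0x{b:02x}  {utf8_role(b)}")
    # 0xe8 announces a 3-byte sequence, but the next byte (0x27) is not a
    # continuation byte, so a strict decoder rejects the input.
    try:
        tail.decode("utf-8")
    except UnicodeDecodeError as err:
        print(err)
```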

If I load that script with GM disabled (to see the source directly), I see a question mark in a box there. If I manually set the character encoding to ISO-8859-1 (Firefox guessed UTF-8, since the headers give no charset), I get an 'è'.

http://www.fileformat.info/info/unicode/char/e8/index.htm
This character (U+00E8, LATIN SMALL LETTER E WITH GRAVE) is correctly encoded in UTF-8 as the two bytes 0xC3 0xA8.
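To make the byte values concrete, this Python sketch encodes U+00E8 both ways and also shows the mojibake ("Ã¨") that appears when the UTF-8 bytes are misread as ISO-8859-1. It is an illustration only; the variable names are arbitrary.

```python
# Sketch: U+00E8 ("è") in the two encodings discussed in this thread.
E_GRAVE = "\u00e8"

iso_bytes = E_GRAVE.encode("iso-8859-1")   # one byte: 0xe8
utf8_bytes = E_GRAVE.encode("utf-8")       # two bytes: 0xc3 0xa8

# Reading the UTF-8 bytes as ISO-8859-1 yields two wrong characters
# (U+00C3 and U+00A8) instead of one "è".
mojibake = utf8_bytes.decode("iso-8859-1")

if __name__ == "__main__":
    print(iso_bytes.hex(), utf8_bytes.hex(), mojibake)
```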

I still get the same errors from a script that really is delivered as UTF-8 without a charset in the headers:

$ curl -v http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js 2>&1 | grep Content-Type
< Content-Type: text/html
$ curl -s http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 c3 a8 27  |erScript==..'..'|

Interestingly, in this case Firefox guesses ISO-8859-1 even though the content is UTF-8.

@Ventero
Contributor Author

Ventero commented Jul 10, 2012

Yeah, sorry, I was experimenting a bit, so I've now uploaded a new version and edited my post. The script now indeed contains valid ISO-8859-1.

It's a bit odd, though, that Firefox guesses UTF-8 as the charset for you, while for me it guesses ISO-8859-1 (correctly, since RFC 2616 makes ISO-8859-1 the default charset for text/* responses that don't declare one). If I disable GM and open the script, I see the è just fine.

Even so, GM still throws the errors.

@Ventero
Contributor Author

Ventero commented Jul 10, 2012

Interestingly, a script that contains valid UTF-8 and is served without a charset (but with "Content-Type: text/javascript") works just fine for me (http://ventero.de/temp/encoding-utf8.user.js). Your example is sent with "Content-Type: text/html", so GM doesn't try to install it in the first place.

@arantius
Collaborator

Greasemonkey: 1.0beta5
Firefox: 14.0

Test scripts:

$ for x in iso no utf8; do echo -en "$x\t"; curl -sv "http://arantius.com/misc/gm-test/utf8-encoding-$x-header.user.js" 2>&1 | grep Content-Type; done
iso     < Content-Type: text/plain; charset=iso-8859-1
no      < Content-Type: text/plain
utf8    < Content-Type: text/plain; charset=utf-8
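The header-driven part of the browser's guess can be sketched by reading the charset parameter and falling back to ISO-8859-1 for text/* types, as RFC 2616 section 3.7.1 specifies. This Python sketch uses the standard library's `email.message.Message` to parse the header; `effective_charset` is a hypothetical helper and models only the header rule, not Firefox's full detection logic.

```python
# Sketch: the charset a header-driven client would use for a response,
# applying the RFC 2616 default (ISO-8859-1) to text/* without a charset.
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the declared charset (lowercased) or the text/* default."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is None and msg.get_content_maintype() == "text":
        return "iso-8859-1"  # RFC 2616 default for text/* without charset
    return charset


if __name__ == "__main__":
    # The three headers served by the test scripts above.
    for header in (
        "text/plain; charset=iso-8859-1",
        "text/plain",
        "text/plain; charset=utf-8",
    ):
        print(f"{header:<32} -> {effective_charset(header)}")
```

Under this rule the first two test scripts are read as ISO-8859-1, which matches what the browser chose for them.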

If I disable Greasemonkey, navigate to each as a page, and inspect View > Character Encoding, the first two both choose ISO-8859-1 and display incorrectly; the last displays correctly. In all cases, if I enable Greasemonkey and install the script, it works, because the delivered content really is UTF-8. This is as I believe it should be: it's easy for script authors to control the content of hosted files, and often harder for them to get the HTTP headers set correctly.

Now, if I put ISO-8859-1 content in the file (all six test cases are at http://arantius.com/misc/gm-test/ ), Firefox makes the same guesses. These work for the first two cases, where the guess is right, but when the headers explicitly declare UTF-8 while the body contains ISO-8859-1, Firefox fails to display it correctly.

In every case where the file contains ISO-8859-1, Greasemonkey dies during the install. This needs to be fixed.
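The six results described above can be reproduced with a simplified model: three headers times two body encodings, each decoded under two policies. The policies are assumptions chosen to mirror the behaviour reported in this thread, not Firefox's or Greasemonkey's actual code: "header" uses the declared charset (ISO-8859-1 if none) and substitutes U+FFFD for bad bytes like a browser display would, while "utf-8 strict" always decodes as UTF-8 and raises on bad bytes.

```python
# Simplified model of the six test cases (three headers x two body encodings).
# Neither policy is Firefox's or Greasemonkey's real implementation.
from email.message import Message

EXPECTED = "'\u00e8'"
HEADERS = {
    "iso": "text/plain; charset=iso-8859-1",
    "no": "text/plain",
    "utf8": "text/plain; charset=utf-8",
}
BODIES = {
    "utf8": EXPECTED.encode("utf-8"),
    "iso": EXPECTED.encode("iso-8859-1"),
}


def header_charset(content_type: str) -> str:
    """Declared charset, or the RFC 2616 default for text/*."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or "iso-8859-1"


def outcome(body: bytes, charset: str, strict: bool) -> str:
    """'ok', 'garbled' (wrong text), or 'error' (decoder raised)."""
    try:
        text = body.decode(charset, errors="strict" if strict else "replace")
    except UnicodeDecodeError:
        return "error"
    return "ok" if text == EXPECTED else "garbled"


def run_matrix() -> dict:
    """Map (body encoding, header) to (header policy, utf-8 strict policy)."""
    results = {}
    for body_name, body in BODIES.items():
        for header_name, content_type in HEADERS.items():
            results[(body_name, header_name)] = (
                outcome(body, header_charset(content_type), strict=False),
                outcome(body, "utf-8", strict=True),
            )
    return results


if __name__ == "__main__":
    print("body  header  by-header       utf-8 strict")
    for (body_name, header_name), (by_header, strict) in run_matrix().items():
        print(f"{body_name:<5} {header_name:<7} {by_header:<15} {strict}")
```

In this model every UTF-8 body passes the strict UTF-8 path and every ISO-8859-1 body fails it, which is exactly the case that needs a graceful error message rather than an uncaught exception.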


2 participants