command-line-arguments can't read umlauts with utf-8 encoding #81
Comments
I've tested this out on my Mac using the standard terminal, and it also fails here if the expression editor is enabled:
However, when I turn the expeditor off, it seems to work fine:
Additionally, echoing the instructions in from the terminal worked in both cases:
and with
However, command-line arguments (which allow us to name files to load) also fail. So, for instance, if we have two files
and
Then Chez can load
But not
However, Chez can load the
So, anyway, it seems like there are two problems here:
I think in both cases OS X is probably providing the characters in UTF-8, but I was a little surprised by the number of ? characters in the load error report. So, there are some workarounds (though not being able to use the expression editor is a pretty big bummer). Worth noting, though, is that file and console I/O seem to do the right thing when the expression editor isn't involved. I'll also try to look into this and see what I can figure out.
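To illustrate the bytes-versus-characters distinction discussed here, the following minimal C sketch compares the byte length of a UTF-8 string with its number of code points. The helper name `utf8_code_point_count` is hypothetical and not part of Chez Scheme, and the sketch assumes the input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Chez Scheme function): counts the code points
   in a NUL-terminated string, assuming it is well-formed UTF-8.
   Continuation bytes have the bit pattern 10xxxxxx and are skipped, so
   only the first byte of each encoded character is counted. */
size_t utf8_code_point_count(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For example, "ä" is encoded in UTF-8 as the two bytes 0xC3 0xA4, so `strlen` reports 2 while the helper reports 1. Code that maps each byte to one character would therefore see more characters than the user typed.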
The inability to enter non-Latin characters in the expression editor is #32. This issue would be more accurately titled "Command line arguments always treated as bytes". The C spec (at least as of C99) says that In any case, if you want to take a stab at making things better, looks like
Yes, I was just looking at that file and that Stack Overflow article. The pertinent code for the expression editor is in The
The command-line arguments are converted to Scheme strings using The command-line argument handling should account for the encoding used by the operating system. For Unix-like systems, it is UTF-8. For Windows, it's UTF-16LE when the arguments are obtained from
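As a rough illustration of decoding UTF-8 argument bytes into Unicode code points rather than treating each byte as a character, here is a minimal C sketch. The function `utf8_decode` is a hypothetical name, not the Chez Scheme implementation, and for brevity it does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the Chez Scheme implementation): decodes the
   NUL-terminated UTF-8 string s into out, which has room for cap code
   points. Returns the number of code points written, or -1 if the input
   is malformed or out is too small. Overlong forms and surrogates are
   not rejected in this sketch. */
long utf8_decode(const char *s, uint32_t *out, size_t cap) {
    assert(s != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p != 0) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */
        p++;
        for (int i = 0; i < extra; i++, p++) {
            /* A NUL or any non-continuation byte here means the
               sequence was cut short. */
            if ((*p & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }
        if (n == cap) return -1; /* output buffer full */
        out[n++] = cp;
    }
    return (long)n;
}
```

With this sketch, the two bytes 0xC3 0xA4 decode to the single code point U+00E4 (ä). On Windows the analogous step would start from UTF-16 code units instead of UTF-8 bytes.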
It would be helpful to add
@dybvig, do you think we should add Sstring_utf8, or update S_string to process UTF-8? I don't see any cases in the C code that use an 8-bit encoding other than UTF-8.
Commit aa1c2c4 addresses this issue. The command-line arguments and environment variables are now processed for Unicode. I added
Thank you for fixing this issue. Closed.
Chez Scheme doesn't recognize umlauts read by 'command-line' and 'command-line-arguments'. Tested with version 9.4 and 9.4.1 commit a664335.
Link to mailing list discussion.