Problems with encoding (with non-ASCII chars) #28
Comments
Just to double check: what platform are you on?

Good point, this was Windows 2008. I will check Linux in a few hours and report back.

In a way that's good -- Windows seems to be the worst. So if

On Linux: Parsing character value:
and parsing files:
So the problem persists (the encoding attribute is lost), but the consequences are negligible because "unknown" encoding on modern Linux is "UTF-8" anyway.

I suggest
The first would solve non-file input. The second would solve both file and non-file input, given that the user is assumed to provide UTF-8 files per the TOML spec.

Sounds good. Also see #20 which is pretty much the same, no?

Yeah, sorry about that. I will try to come up with a pull request soon, as this bug bites my application quite a bit.

That's ok. You are being careful, and you are constructing good examples. One change at a time...

Ok, I just pushed a PR with a change I had following the 0.1.4 release. I'd like to make one more change in there and properly document your last change -- see ChangeLog which is a standard (older) format well supported by Emacs :) Can you drop me a full name please, either here or if you prefer by email? And if you want an email different from the one used by
My name is Václav Hausenblas, nice to meet you :-)
My pleasure! I am going to play with encoding now...

Ok, you're in the ChangeLog now :) And I got my other issue taken care of -- the (new in 0.5.0) local_time type comes back to us now too (as a string, there is no real type for it and I don't think I want to pull in
So if/when you have something for encoding, feel free to branch or fork again and show it :)

Fixed in #30
fix #28 again (declare UTF-8 in arrays of strings)
I run into encoding issues when parsing files as well as R characters.
Parsing from text (R characters):
The encoding attribute is lost, but it can be set again.
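Re-setting the mark could look roughly like this in base R. This is only a sketch, not the original snippet from this report: the byte values and the variable names are illustrative assumptions, and the string is built with `rawToChar()` to stand in for text that arrives without a declared encoding.

```r
# Illustrative bytes: "Václav" encoded as UTF-8 (an assumption for this sketch).
bytes <- as.raw(c(0x56, 0xc3, 0xa1, 0x63, 0x6c, 0x61, 0x76))

# rawToChar() returns a string without an encoding mark.
x <- rawToChar(bytes)
before <- Encoding(x)

# Declare the bytes to be UTF-8; this changes only the mark, not the bytes.
Encoding(x) <- "UTF-8"
after <- Encoding(x)
```

Comparing `before` and `after` shows the change of the mark, while `charToRaw(x)` stays the same, so this is a declaration rather than a conversion.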
Parsing from files
Example: test.txt
TOML files are assumed to be UTF-8 Unicode text. However, R character strings obtained from parsing via parseTOML() are labeled as "unknown" encoding.
In the case of files, the solution may be relatively easy, I think: we can assume that the input is UTF-8 and label every string output as "UTF-8".
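As a user-side illustration of "label every string output", one could walk the parsed result and mark each character vector as UTF-8. This is a hedged sketch only: `mark_utf8` is a hypothetical helper, not part of the package, and the hand-built nested list merely stands in for a parser result, which I assume to be a nested list.

```r
# Hypothetical helper: mark every character vector in a nested list as UTF-8.
mark_utf8 <- function(x) {
  rapply(
    x,
    function(s) {
      Encoding(s) <- "UTF-8"  # declare the mark; the bytes are left untouched
      s
    },
    classes = "character",
    how = "replace"           # keep the list structure and non-character leaves
  )
}

# Stand-in for a parsed result (assumption: a nested list of strings and numbers).
raw_str <- function(bytes) rawToChar(as.raw(bytes))
parsed <- list(
  name   = raw_str(c(0x56, 0xc3, 0xa1, 0x63, 0x6c, 0x61, 0x76)),
  nested = list(tags = c(raw_str(c(0xc5, 0xbe)), "ascii"), n = 1L)
)

fixed <- mark_utf8(parsed)
```

Pure ASCII strings keep reporting "unknown" from `Encoding()`, which is expected in R and harmless. Declaring the encoding inside the parser itself would of course be the more robust place for this.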