Bad UTF-8 Continuation type while regenning CACHE #455
RoR.log (exceeded pastebin's max size limit): http://puu.sh/lrqiT/166c8fc19f.log
Thank you for taking time to look this over. Due to this I can't even play RoR. Good luck fixing it if you find any problem!!
I assume this is the vehicle: http://www.rigsofrods.com/repository/view/751 EDIT: Nope, it spawns fine for me.
"@" is an ASCII character, that's not the issue. The problem is somewhere else.
And yes, this is the same problem as #364
I'm starting to see the issue: I used MyGUI's UStrings to handle loaded data. UStrings don't just store the data, they also try to decode it.
@Hiradur I reproduced the other issue.
I suggest incorporating the "utf8cpp" library (tiny, header-only) http://sourceforge.net/p/utfcpp/code/HEAD/tree/v2_0/source/, licensed under the Boost license (OSI approved) https://tldrlegal.com/license/boost-software-license-1.0-explained. With this library, I can sanitize input from truckfiles, which are (by nature) saved in a variety of ANSI/OEM encodings.
@only-a-ptr Maybe I'm overlooking something obvious, but I don't think input from truckfiles can be easily sanitized with that library. Its main application seems to be to facilitate iteration over UTF-8 code points (which may have varying byte length). This is probably only useful if you plan to do actual text editing, or need to count the number of visible characters instead of bytes for some other reason. Besides that, utf8cpp supports conversion between the different Unicode encodings (UTF-8, UTF-16, UTF-32). There is no support for arbitrary encodings such as the Latin-1 code page. To my knowledge it is sadly impossible to identify with 100% certainty the encoding used in an arbitrary text file.
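To illustrate the difference between bytes and code points mentioned above, here is a minimal standalone sketch. It does not use utf8cpp; `count_code_points` is a hypothetical helper name, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a string that is ASSUMED to be valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}
```

For example, the UTF-8 bytes `C3 A9 74 C3 A9` have a `size()` of 5 but contain only 3 code points. Note that a code point is not always one visible character (combining marks exist), so this is only an approximation of "visible length".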
Regarding the use of MyGUI::UString: it internally uses UTF-16 encoding. Is there a good reason not to straightforwardly use std::string from the standard library (assuming UTF-8 encoding) consistently across all RoR sources (excluding parts which are concerned with the actual user interface/MyGUI)?
edit: Heuristic detection and conversion of a text file with unknown encoding can be performed with the ICU library (see http://userguide.icu-project.org/conversion/detection#TOC-Detected-Encodings).
Regarding UString: #364 crashes at the moment the "Spawner report" MyGUI window is initialized. The report contains the text "invalid syntax in line: [author]", which contains the invalid char. The text is passed as std::string, but MyGUI uses UStrings internally, so there isn't a way around it.
Regarding utf8cpp: UTF-8 has a scheme (https://en.wikipedia.org/wiki/UTF-8#Description) which defines legal/illegal input. If the input passes a check against this scheme, it's valid UTF-8. The worst thing that can happen is that multiple ANSI characters, by coincidence, form a legal UTF-8 sequence. In that case the user will see a stray character, but no technical issues will occur. This is unlikely enough, though.
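A rough sketch of what such a check could look like, written standalone so it doesn't depend on any library. The function names (`utf8_sequence_length`, `is_valid_utf8`, `sanitize_utf8`) and the `'?'` replacement are my own assumptions for illustration, not RoR or utf8cpp code. As far as I know, utf8cpp ships comparable helpers (`utf8::is_valid`, `utf8::replace_invalid`), which would be the ones to use in practice.

```cpp
#include <cstddef>
#include <string>

// Returns the length (1-4) of the well-formed UTF-8 sequence starting at
// s[pos], or 0 if the bytes there do not form one. Requires pos < s.size().
std::size_t utf8_sequence_length(const std::string& s, std::size_t pos)
{
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t length = 0;
    unsigned long min_value = 0;   // smallest code point allowed for this length
    unsigned long code_point = 0;

    if (lead < 0x80)                { return 1; }  // 0xxxxxxx: plain ASCII
    else if ((lead & 0xE0) == 0xC0) { length = 2; min_value = 0x80;    code_point = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; min_value = 0x800;   code_point = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; min_value = 0x10000; code_point = lead & 0x07; }
    else                            { return 0; }  // stray continuation or illegal lead byte

    if (pos + length > s.size()) { return 0; }     // sequence is cut off

    for (std::size_t i = 1; i < length; ++i)
    {
        const unsigned char next = static_cast<unsigned char>(s[pos + i]);
        if ((next & 0xC0) != 0x80) { return 0; }   // continuation bytes must be 10xxxxxx
        code_point = (code_point << 6) | (next & 0x3F);
    }

    if (code_point < min_value)                       { return 0; }  // overlong encoding
    if (code_point >= 0xD800 && code_point <= 0xDFFF) { return 0; }  // UTF-16 surrogates
    if (code_point > 0x10FFFF)                        { return 0; }  // beyond Unicode range
    return length;
}

// True if every byte of s belongs to a well-formed UTF-8 sequence.
bool is_valid_utf8(const std::string& s)
{
    std::size_t pos = 0;
    while (pos < s.size())
    {
        const std::size_t length = utf8_sequence_length(s, pos);
        if (length == 0) { return false; }
        pos += length;
    }
    return true;
}

// Copies s, replacing each byte that is not part of a well-formed
// sequence with `replacement` (hypothetical policy, '?' by default).
std::string sanitize_utf8(const std::string& s, char replacement = '?')
{
    std::string out;
    std::size_t pos = 0;
    while (pos < s.size())
    {
        const std::size_t length = utf8_sequence_length(s, pos);
        if (length == 0) { out += replacement; ++pos; }
        else             { out.append(s, pos, length); pos += length; }
    }
    return out;
}
```

With this, a lone Latin-1 byte such as `E9` fails the check and gets replaced, while the byte pair `C3 A9` passes. That pair is exactly the "coincidence" case: it could be two ANSI characters, but it is also a legal UTF-8 sequence, so it is kept as-is.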