Bad UTF-8 Continuation type while regenning CACHE #455

Closed
disloyalpick opened this Issue Nov 19, 2015 · 8 comments

5 participants

disloyalpick commented Nov 19, 2015

I'm not sure which vehicles or other content are causing this, but I will delete the contents of all the log files, try to clear/regen the CACHE, and upload the log files for someone to check out.
[Screenshot: ss 2015-11-19 at 02 43 03]

AngleScript.log: http://pastebin.com/9rb2hNFq

ConfigLog.txt: http://pastebin.com/2WgNXuYn

mygui.log: http://pastebin.com/cHSNU6Pc

RoRConfig.log: http://pastebin.com/xw13p3d4

RoR.log (exceeded pastebin max size limit): http://puu.sh/lrqiT/166c8fc19f.log

Thank you for taking time to look this over. Due to this I can't even play RoR. Good luck fixing it if you find any problem!!

Contributor

Hiradur commented Nov 19, 2015

Could you check if caboverpete.truck has a similar problem to what is described here: #364

disloyalpick commented Nov 19, 2015

DirtGamer301 commented Nov 19, 2015

You can try removing the e-mail address, but it's actually supposed to be there, so I personally don't think it causes problems.

Member

only-a-ptr commented Nov 19, 2015

I assume this is the vehicle: http://www.rigsofrods.com/repository/view/751 EDIT: Nope, it spawns fine for me.
EDIT2: Trying http://www.rigsofrods.com/repository/view/1574. EDIT3: Nope, also spawned fine.

"@" is an ASCII character, so that's not the issue. The problem is somewhere else.

And yes, this is the same problem as #364

I'm starting to see the issue: I used MyGUI's UStrings to handle loaded data. UStrings don't just store the data; they also try to decode it.

@only-a-ptr only-a-ptr added this to the 0.4.6.0 milestone Nov 19, 2015

Member

only-a-ptr commented Nov 19, 2015

@Hiradur I reproduced the other issue.

I suggest incorporating the "utf8cpp" library (tiny, header-only) http://sourceforge.net/p/utfcpp/code/HEAD/tree/v2_0/source/, licensed under the Boost Software License (OSI approved) https://tldrlegal.com/license/boost-software-license-1.0-explained. With this library, I can sanitize input from truckfiles, which are (by nature) saved in a variety of ANSI/OEM encodings.
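
A minimal sketch of what such sanitizing could look like. It is written without utf8cpp so that it stays self-contained; the names `utf8_sequence_length` and `sanitize_utf8` and the `'?'` replacement character are assumptions made for illustration, not code from RoR. Every byte that does not start a well-formed UTF-8 sequence is replaced, so the result can be handed to a UTF-8 decoder safely. The utf8cpp documentation describes a `utf8::replace_invalid` function intended for the same job.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte length of the well-formed UTF-8 sequence that
// starts at s[i] (requires i < s.size()), or 0 if the bytes are not valid UTF-8.
std::size_t utf8_sequence_length(const std::string& s, std::size_t i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) return 1;                       // plain ASCII

    std::size_t len = 0;
    unsigned long cp = 0;      // decoded code point
    unsigned long min_cp = 0;  // smallest code point allowed at this length
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else return 0;                                   // stray continuation or invalid lead byte

    if (len > s.size() - i) return 0;                // sequence cut off by end of input
    for (std::size_t k = 1; k < len; ++k)
    {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) return 0;            // bad continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp) return 0;                       // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      // UTF-16 surrogate
    if (cp > 0x10FFFF) return 0;                     // beyond the Unicode range
    return len;
}

// Hypothetical sanitizer: copies valid sequences unchanged and replaces each
// offending byte with `replacement`.
std::string sanitize_utf8(const std::string& input, char replacement = '?')
{
    std::string out;
    out.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size())
    {
        const std::size_t len = utf8_sequence_length(input, i);
        if (len == 0) { out += replacement; ++i; }
        else          { out.append(input, i, len); i += len; }
    }
    return out;
}
```

With this sketch, an author line containing a Latin-1 "é" (byte 0xE9) comes out with a "?" in that position, while ASCII characters such as "@" pass through untouched.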

Contributor

mikadou commented Nov 20, 2015

@only-a-ptr Maybe I'm overlooking something obvious, but I don't think input from truckfiles can be easily sanitized with that library. Its main application seems to be iterating over UTF-8 code points (which may have varying byte lengths). This is probably only useful if you plan to do actual text editing, or need to count the number of visible characters instead of bytes for some other reason. Besides that, utf8cpp supports conversion between the different Unicode encodings (UTF-8, UTF-16, UTF-32). There is no support for arbitrary encodings such as the Latin-1 code page. To my knowledge it is sadly impossible to identify with 100% certainty the encoding used in an arbitrary text file.

Regarding the use of MyGUI::UString: it uses UTF-16 encoding internally. Is there a good reason not to simply use std::string from the standard library (assuming UTF-8 encoding) consistently across all RoR sources (excluding the parts concerned with the actual user interface/MyGUI)?

Some good resources:
https://www.youtube.com/watch?v=n0GK-9f4dl8
http://utf8everywhere.org/

edit: Heuristic detection and conversion from textfile with unknown encoding can be performed with the ICU library (see http://userguide.icu-project.org/conversion/detection#TOC-Detected-Encodings).
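
If one is willing to assume a specific legacy encoding instead of detecting it, the conversion itself is mechanical. The sketch below assumes the input is ISO-8859-1 (Latin-1), which is only a guess for any given truckfile; the function name `latin1_to_utf8` is hypothetical. Each Latin-1 byte maps directly to the code point with the same value, so bytes of 0x80 and above become two-byte UTF-8 sequences.

```cpp
#include <string>

// Hypothetical converter: treats every input byte as a Latin-1 character
// (code points U+0000..U+00FF) and re-encodes it as UTF-8.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);  // worst case: every byte expands to two
    for (const char ch : latin1)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            out += static_cast<char>(c);                  // ASCII is unchanged
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte 110000xx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

Applying this to text that is already UTF-8 would double-encode it, so it only makes sense as a fallback for input that has failed a UTF-8 validity check.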

Member

only-a-ptr commented Nov 20, 2015

@mikadou 👍 for http://utf8everywhere.org/. I'll watch the video later, looks interesting, thanks.
To the best of my knowledge, there really isn't a way to detect the charset used by plaintext files with ANSI (1 byte per character) / OEM (single/multi-byte, non-Unicode) encodings.

Regarding UString: #364 crashes at the moment the "Spawner report" MyGUI window is initialized. The report contains the text "invalid syntax in line: [author]", which contains the invalid char. The text is passed as std::string, but MyGUI uses UStrings internally, so there isn't a way around it.

Regarding utf8cpp: UTF-8 has a scheme (https://en.wikipedia.org/wiki/UTF-8#Description) which defines legal/illegal input. If the input passes a check against this scheme, it is valid UTF-8. The worst thing that can happen is that multiple ANSI characters, by coincidence, form a legal UTF-8 sequence. In that case the user will see a stray character, but no technical issues will occur, and such a coincidence is unlikely.
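
The check against the scheme can be sketched in a few lines. This is a self-contained illustration, not utf8cpp's code and not what RoR ships; the name `is_valid_utf8` is an assumption. It rejects stray or missing continuation bytes (presumably the kind of condition behind the "Bad UTF-8 continuation" error in the title), overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical validator: returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after the lead byte
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest code point allowed at this length

        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence cut off by end of input
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // bad continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond the Unicode range
        i += extra + 1;
    }
    return true;
}
```

The coincidence case mentioned above is easy to construct: the two Latin-1 characters "Ã©" are the bytes 0xC3 0xA9, which pass this check because they are also the UTF-8 encoding of "é". A lone Latin-1 "é" (0xE9) followed by an ASCII character fails it.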

disloyalpick commented Dec 11, 2015

I think I may have found a workaround for this issue. Simply take all files out of the packs folder, clear the cache, and put them back 20 files at a time, regenerating the cache after each batch.
