Mixed character encodings caused by SAUCE #169

Open
bengarrett opened this issue Oct 29, 2020 · 2 comments

@bengarrett

I don't know if this is an intentional design, but ANS files created in Moebius can have mixed character encodings that technically could be considered corrupt?

The drawing canvas uses the legacy character encoding determined by the font choice, so with IBM VGA the text encoding is CP-437. The SAUCE metadata fields, however, accept Unicode input and embed those characters into the otherwise CP-437 ANS file.

Should the SAUCE metadata (title, author, group, comments fields) use the same character encoding as the rest of the ANS document?

Here, I have drawn full block characters and then added UTF-8 blocks copied from the web into the SAUCE data.

input

When viewed in a UTF-8 terminal, the CP437 blocks show up as unknown characters.

Screenshot from 2020-10-29 12-37-32

When the file is converted from CP437, the SAUCE blocks are malformed.

Screenshot from 2020-10-29 12-38-49
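
Both failure modes can be reproduced with Python's built-in codecs. This is only a sketch under assumptions: it supposes the canvas stores the full block as CP437 byte 0xDB and that the pasted SAUCE block is U+2588 written out as UTF-8 bytes, which is what the screenshots suggest rather than something confirmed from the Moebius source. The variable names are made up for illustration.

```python
# Assumption: the canvas holds CP437 full blocks (byte 0xDB) and the SAUCE
# field holds U+2588 FULL BLOCK stored as UTF-8 bytes.
canvas_bytes = "\u2588\u2588".encode("cp437")  # b'\xdb\xdb'
sauce_bytes = "\u2588\u2588".encode("utf-8")   # b'\xe2\x96\x88\xe2\x96\x88'

# A UTF-8 reader cannot decode the CP437 bytes, so it substitutes U+FFFD,
# the replacement character, for the blocks.
canvas_in_utf8_terminal = canvas_bytes.decode("utf-8", errors="replace")

# A CP437 reader treats every UTF-8 byte as a separate character, so each
# block turns into three unrelated glyphs.
sauce_read_as_cp437 = sauce_bytes.decode("cp437")

print(repr(canvas_in_utf8_terminal))
print(sauce_read_as_cp437)
```

Neither reading is "wrong" on its own; the file simply has no single encoding under which both parts decode correctly.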

Here is a real-world example, on the last line. The left quotation mark is an ASCII-compatible decimal 34, while the right is U+201D, a right double quotation mark. The rest of the document is in CP-437.
Charles Martin "Terminal ColloquyΓÇ¥

Screenshot from 2020-10-29 12-44-03
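
The three odd characters after "Colloquy" are consistent with U+201D having been stored as UTF-8 and then read back as CP-437. A small Python check of that reading, using only the standard codecs (the variable names are illustrative):

```python
# The closing quote from the example: U+201D RIGHT DOUBLE QUOTATION MARK.
right_quote = "\u201d"

utf8_bytes = right_quote.encode("utf-8")    # three bytes: E2 80 9D
seen_as_cp437 = utf8_bytes.decode("cp437")  # each byte becomes one glyph

print(utf8_bytes.hex(" "))  # e2 80 9d
print(seen_as_cp437)        # ΓÇ¥

# The opening quote is plain ASCII (decimal 34), identical in both encodings,
# which is why only the closing quote is damaged.
assert '"'.encode("utf-8") == '"'.encode("cp437") == bytes([34])
```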

@bart-d
Contributor

bart-d commented Oct 29, 2020

I don't know if this is an intentional design, but ANS files created in Moebius can have mixed character encodings that technically could be considered corrupt?
Should the SAUCE metadata (title, author, group, comments fields) use the same character encoding as the rest of the ANS document?

We have the SAUCE spec as a reference here, although it might not always be 100% clear. Note 3 of the layout specifically mentions that prior to revision 00.5, Character fields were expected to be in CP437, but that other 'codepages' were used along the way for both the file and SAUCE. So one could assume they need to be, or at least should be, in the same encoding.

I don't think any tool prior to Moebius allowed UTF-8 (or any other non-IBM codepage) in the SAUCE fields. Our options would be either to prevent non-ASCII characters in those fields or to show a warning whenever a user enters them.
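
As a rough illustration of the warning option, the check itself is small. The sketch below is in Python rather than Moebius's own code, and the function names `non_ascii_characters` and `sauce_field_warning` are hypothetical, not part of any existing API.

```python
from typing import List, Optional, Tuple


def non_ascii_characters(value: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]


def sauce_field_warning(name: str, value: str) -> Optional[str]:
    """Build a warning for one SAUCE field, or None if it is pure ASCII."""
    offenders = non_ascii_characters(value)
    if not offenders:
        return None
    listed = ", ".join(
        f"{ch!r} (U+{ord(ch):04X}) at position {i}" for i, ch in offenders
    )
    return f"SAUCE {name} contains non-ASCII characters: {listed}"


# Hypothetical usage with the title from the real-world example.
print(sauce_field_warning("title", "Terminal Colloquy\u201d"))
print(sauce_field_warning("author", "Charles Martin"))
```

Preventing the input outright would use the same test, rejecting the keystroke or paste instead of returning a message.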

@bengarrett
Author

Moebius already features the Export As UTF-8 option. Maybe that could be the method for saving files that have any UTF-8-only characters in the SAUCE fields? The warning could also be brought up, alerting the user and stating why the file can only be saved as UTF-8 and what the drawbacks of doing so are.
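
One way to picture that decision, as a sketch only: try to encode every SAUCE field as CP437 and fall back to UTF-8 export when any field cannot be represented. The function name `choose_export_encoding` and the dictionary of fields are assumptions for illustration, and the sketch also assumes fields would be written as CP437 whenever they fit, which is not how Moebius is described as behaving today.

```python
from typing import Dict, List, Tuple


def choose_export_encoding(sauce_fields: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the export encoding and the fields that forced it.

    "cp437" is chosen when every field fits in CP437; otherwise "utf-8",
    together with the names of the fields that do not fit.
    """
    unrepresentable = []
    for name, value in sauce_fields.items():
        try:
            value.encode("cp437")
        except UnicodeEncodeError:
            unrepresentable.append(name)
    encoding = "utf-8" if unrepresentable else "cp437"
    return encoding, unrepresentable


# Hypothetical usage: the curly quote forces a UTF-8 export.
encoding, culprits = choose_export_encoding(
    {"title": "Terminal Colloquy\u201d", "author": "Charles Martin"}
)
if culprits:
    print(f"Can only be saved as {encoding}; check fields: {', '.join(culprits)}")
```

Characters that do exist in CP437, such as the full block or accented Latin letters, would not trigger the fallback under this approach.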

I would guess that with SAUCE, they assumed you would not mix up codepages because technically you couldn't. The legacy 256-character encodings all share the same code points, so there was no way to mix in outside characters. I think that's why the web eventually moved on from ASCII and ISO-8859-x to UTF-8: it was impossible to mix languages and, say, display 日本人, ไทย or русский on the same page.
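
That point about single-byte codepages can be checked with Python's standard codecs. In the sketch below the helper `fits` is a made-up name, and the four codepages are just a sample chosen for illustration.

```python
every_byte = bytes(range(256))

# Under a single-byte codepage every byte value is some character, so the
# same bytes decode under any of them; only the meaning changes.
as_cp437 = every_byte.decode("cp437")
as_latin1 = every_byte.decode("latin-1")
assert len(as_cp437) == len(as_latin1) == 256

# Byte 0xDB is a full block in CP437 but a capital U with circumflex in
# ISO-8859-1.
print(as_cp437[0xDB], as_latin1[0xDB])


def fits(text: str, codepage: str) -> bool:
    """Report whether every character of text exists in the codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


samples = ["日本人", "ไทย", "русский"]
codepages = ["cp437", "latin-1", "iso8859_5", "iso8859_11"]

# Some codepages hold one of the samples, but none of these holds all three.
for codepage in codepages:
    print(codepage, [fits(sample, codepage) for sample in samples])

# UTF-8 carries all three in a single byte stream.
mixed = " ".join(samples)
assert mixed.encode("utf-8").decode("utf-8") == mixed
```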
