Deparse Depends on Locales For Strings #289

Closed
brodieG opened this issue Jan 12, 2022 · 2 comments

Comments

@brodieG
Owner

brodieG commented Jan 12, 2022

For example, in an ISO-8859 locale we see:

nchar_ctl("\033\200")

vs.

nchar_ctl("\033\x80")

This is probably because "\x80" is a non-character in ISO-8859 but not in Latin-1.

@brodieG brodieG added the bug label Jan 12, 2022
@brodieG brodieG modified the milestones: 1.4.17, 1.4.19, 1.4.18 Jan 12, 2022
@brodieG
Owner Author

brodieG commented Mar 15, 2022

There is not a good solution to this. We considered adding a parse/deparse cycle on load to guarantee the same deparse, but the original deparse could be invalid if the parse/deparse is done in a different locale. If the raw bytes are valid in the reference locale, the deparser outputs them as themselves, but the parser will refuse to ingest them if they are invalid in the new locale.

Potentially we could use the newer RDS file format, which records the locale. In that case the literals would be translated from their original locale to the new one, but whether that matches the intent is highly questionable, particularly since there is no way to directly mark a literal with its encoding (Unicode escapes partly do this).

So we probably just need to make sure this is clearly documented.

@brodieG
Owner Author

brodieG commented Mar 15, 2022

Documented as a "fix"
