UTF-8 implicitly doing bad stuff on Windows #584

Open
stucka opened this issue Oct 19, 2023 · 2 comments

stucka commented Oct 19, 2023

#201 was caused by two different libraries making assumptions about how to save text files.

This project appears to have been developed on Macs and Linux, where I think Python assumes text files get saved as UTF-8 by default. On Windows, though, I think Python may be defaulting to the system's own character set, which IIRC is cp1252. None of these text file operations sets a character encoding explicitly.

So on Windows, many text operations are going to fail, because characters that cp1252 can't represent are getting pulled in from web pages and Excel files.
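A minimal sketch of the two failure modes, using made-up sample strings rather than anything from the scrapers. It assumes the Windows default really is cp1252 and simulates that by naming the codec explicitly, so it behaves the same on any platform:

```python
# Hypothetical sample text, not taken from any scraper output.
accented = "é"          # present in both UTF-8 and cp1252
outside = "\u0142"      # "ł", which has no cp1252 mapping

# Failure mode 1: UTF-8 bytes read back as cp1252 come out garbled.
garbled = accented.encode("utf-8").decode("cp1252")


def fits_cp1252(text):
    """Return True if every character in text can be written as cp1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Failure mode 2: writing a character cp1252 lacks raises an error.
print(repr(garbled), fits_cp1252(accented), fits_cp1252(outside))
```

The first case fails silently (mojibake in the output file), while the second raises `UnicodeEncodeError` in the middle of a write.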

Per PEP 20, explicit is better than implicit.

It may not affect anything to permanently add encoding="utf-8" to these file operations, but I'd want to test that better first.

It's a decent bet similar flaws exist in warn-transformer, as well.

in cache

cache.write includes this:
with open(out, "w", newline="") as fh:

cache.read includes this:
with open(path, newline="") as infile:

cache.read_csv includes this:
with open(path) as fh:

and in utils ...

utils.write_dict_rows_to_csv includes this:
with open(output_path, mode, newline="") as f:

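As a rough sketch of what the explicit version of those calls could look like: `write_rows` and `read_rows` below are hypothetical stand-ins, not the actual `cache` or `utils` functions, and the sample rows are invented. The only change that matters is the added `encoding="utf-8"` argument on each `open()`.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, rows):
    """Write rows to a CSV file, stating the encoding explicitly."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)


def read_rows(path):
    """Read a CSV file back with the same explicit encoding."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


# Hypothetical rows with characters that cp1252 cannot represent.
rows = [["company", "city"], ["Café Łódź", "Zürich"]]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sample.csv"
    write_rows(out, rows)
    round_tripped = read_rows(out)

print(round_tripped == rows)
```

With the encoding stated on both the write and the read, the round trip should not depend on the platform default.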


stucka commented Oct 19, 2023

To clarify:
"It may not affect anything to permanently add encoding="utf-8" to these file operations, but I'd want to test that better first." I meant that for Linux and Macs. Hopefully the change would also prevent other problems on Windows.

Opening a bug in warn-transformer along similar lines.


stucka commented Oct 21, 2023

Looks like the encoding default is indeed UTF-8 on Linux and Mac, but cp1252 on Windows. Fixing the libraries would be easy.

https://peps.python.org/pep-0686/
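One way to check what a given machine will actually use when `open()` is called without `encoding=`. This is only a diagnostic sketch using the standard library; the variable names are mine:

```python
import locale
import sys

# The encoding open() falls back to when none is given.
default_encoding = locale.getpreferredencoding(False)

# True when Python's UTF-8 mode is on (for example via -X utf8 or PYTHONUTF8=1).
utf8_mode = bool(sys.flags.utf8_mode)

print(f"platform={sys.platform} default={default_encoding} utf8_mode={utf8_mode}")
```

Running this on a Windows box and on a Linux or Mac box should show whether the defaults differ in the way described above.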
