Encoding issue #5

ghost · 2022-04-12T18:01:05Z

OS: Windows 10 21H2
Python: Python 3.10.4

Steps:

pip install pyquotes

example.py (file is saved as UTF-8 with/without a BOM)

print('こんにちは世界')

pyquotes example.py

    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 11: character maps to <undefined>

If read_text() and write_text() are replaced with their read_bytes() and write_bytes() equivalents, the quote processing works (not tested on Linux). However, extra newlines are erroneously added; I haven't had a chance to look into why.

Thank you for your time; this library has saved me many hours.
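A hedged guess at the extra newlines, not verified against pyquotes' code: read_bytes() keeps Windows "\r\n" line endings intact, whereas read_text() (text mode) translates them to "\n". If text decoded from read_bytes() then goes through a text-mode write, each "\n" is written as "\r\n" again while the original "\r" is kept, producing "\r\r\n", which some editors display as an extra line break. The sketch below isolates that mechanism; doubled_line_endings_demo is a hypothetical name, and newline="\r\n" is forced so it behaves the same on any OS:

```python
import os
import tempfile


def doubled_line_endings_demo() -> bytes:
    """Write CRLF text that was decoded from raw bytes through a text-mode file."""
    original = b"a = 'x'\r\nb = 'y'\r\n"  # bytes of a file saved with Windows line endings
    text = original.decode("utf-8")  # decoding raw bytes keeps each "\r\n" pair intact
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.py")
        # newline="\r\n" mimics the Windows text-mode default on any OS:
        # each "\n" is written as "\r\n", and the "\r" already in the str is kept.
        with open(path, "w", encoding="utf-8", newline="\r\n") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


print(doubled_line_endings_demo())  # b"a = 'x'\r\r\nb = 'y'\r\r\n"
```

If this is the cause, keeping reads and writes symmetric (both text mode with an explicit encoding, or both bytes) avoids it.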


ThiefMaster · 2022-04-12T18:05:07Z

Is your default encoding UTF8? What does import sys; sys.getdefaultencoding() return?

ghost · 2022-04-12T18:06:03Z

Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.getdefaultencoding()
'utf-8'
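The two "defaults" are easy to mix up: sys.getdefaultencoding() is the encoding for str/bytes conversions and is always 'utf-8' on Python 3, while open() and Path.read_text()/write_text() without an encoding argument fall back to the locale encoding reported by locale.getpreferredencoding(False) (outside UTF-8 Mode). A minimal sketch that prints both; encoding_report is a hypothetical helper name. On the affected machine the second value would be expected to be cp1252, as the later traceback suggests:

```python
import locale
import sys


def encoding_report() -> dict:
    """Return the two 'default' encodings that are easy to confuse."""
    return {
        # Used by str.encode()/bytes.decode() when no encoding is given; 'utf-8' on Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # The encoding open()/Path.read_text()/write_text() use when none is given
        # (unless UTF-8 Mode is on); on a Western-locale Windows 10 typically 'cp1252'.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```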

ThiefMaster · 2022-04-12T18:07:28Z

Can you provide a git repo with your test file so I have the exact same content and not whatever I copy from the github issue?

ThiefMaster · 2022-04-12T18:07:39Z

Also, please post full tracebacks, not just the last line...

ghost · 2022-04-12T18:13:34Z

example.txt

Switched to Python 3.9 to match your environment

pyquotes example.txt
Error while processing example.txt
Traceback (most recent call last):
  File "C:\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python39\Scripts\pyquotes.exe\__main__.py", line 7, in <module>
  File "C:\Python39\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python39\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Python39\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python39\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Python39\lib\site-packages\pyquotes\cli.py", line 89, in main
    changed = _process_file(file, config=config)
  File "C:\Python39\lib\site-packages\pyquotes\cli.py", line 113, in _process_file
    old_code = file.read_text()
  File "C:\Python39\lib\pathlib.py", line 1267, in read_text
    return f.read()
  File "C:\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>

ThiefMaster · 2022-04-12T18:15:31Z

For some weird reason it's using cp1252 instead of utf-8, even though utf-8 should be the default in Python 3...
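The failure can be reproduced on any OS by decoding the file's UTF-8 bytes as cp1252 explicitly. In UTF-8, こ is E3 81 93, and 0x81 is one of the few bytes cp1252 leaves undefined, so decoding stops at the first 0x81: offset 8 without a BOM and offset 11 with the 3-byte BOM, matching the two tracebacks (assuming example.txt holds the same line without a BOM). A small sketch; first_undecodable_byte is a hypothetical helper:

```python
from typing import Optional, Tuple


def first_undecodable_byte(data: bytes, encoding: str = "cp1252") -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte `encoding` cannot decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


source = "print('こんにちは世界')\n".encode("utf-8")
with_bom = b"\xef\xbb\xbf" + source  # the same line saved with a UTF-8 BOM

print(first_undecodable_byte(source))    # (8, 129), i.e. byte 0x81 at position 8
print(first_undecodable_byte(with_bom))  # (11, 129), i.e. byte 0x81 at position 11
print(first_undecodable_byte(source, "utf-8"))  # None: the bytes are valid UTF-8
```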

ThiefMaster · 2022-04-12T18:22:56Z

After reading https://discuss.python.org/t/pep-597-use-utf-8-for-default-text-file-encoding/1819 and asking on IRC: apparently text files default to whatever encoding the OS tells Python to use... and of course on Windows you get whatever legacy code page is configured (cp1252 here) instead of consistent UTF-8 :)

A PR that forces UTF-8 there is welcome (read_text has an encoding arg). To be honest I don't have much interest in Windows, but if I get a good PR I don't mind getting it fixed.
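Until a fix is released, a user-side workaround that needs no change to pyquotes is Python's UTF-8 Mode (PEP 540, Python 3.7+): with PYTHONUTF8=1 or -X utf8, open() and friends default to UTF-8 regardless of the locale. The sketch below checks this by starting a child interpreter with the environment variable set; child_default_encoding is a hypothetical helper name:

```python
import codecs
import os
import subprocess
import sys


def child_default_encoding(utf8_mode: bool) -> str:
    """Hypothetical helper: report the default text-file encoding of a fresh interpreter."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1" if utf8_mode else "0"  # PEP 540 switch, read at startup
    result = subprocess.run(
        [sys.executable, "-c", "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalize spellings such as "UTF-8" / "utf-8" to the codec's canonical name.
    return codecs.lookup(result.stdout.strip()).name


print(child_default_encoding(utf8_mode=True))   # utf-8
print(child_default_encoding(utf8_mode=False))  # locale-dependent, e.g. cp1252 on Windows
```

On Windows that would mean running set PYTHONUTF8=1 in cmd.exe (or $env:PYTHONUTF8 = "1" in PowerShell) before pyquotes example.txt. It only changes the default for that environment, so passing an explicit encoding in the code remains the more robust fix.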

ghost · 2022-04-12T18:35:25Z

I'm not very familiar with GitHub's interface; apologies.

I have confirmed the issue is fixed after implementing your suggestion:

cli.py, line 113:
    old_code = file.read_text(encoding='UTF-8')
cli.py, line 148:
    tmp_file.write_text(content, encoding='UTF-8')
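The fix amounts to passing an explicit encoding on both sides of the round trip. A minimal sketch of that pattern, not pyquotes' actual code: rewrite_utf8 and demo are hypothetical helpers, and the quote-swapping lambda is only a stand-in for the real processing:

```python
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_utf8(path: Path, transform: Callable[[str], str]) -> bool:
    """Hypothetical helper: read, transform, and write back, always as UTF-8."""
    old_code = path.read_text(encoding="utf-8")  # explicit, so the locale code page is never used
    new_code = transform(old_code)
    if new_code == old_code:
        return False
    path.write_text(new_code, encoding="utf-8")
    return True


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.py"
        # UTF-8 with a BOM; no trailing newline, so the bytes don't depend on os.linesep.
        path.write_bytes("\ufeffprint('こんにちは世界')".encode("utf-8"))
        # Stand-in transformation: swap single quotes for double quotes.
        rewrite_utf8(path, lambda code: code.replace("'", '"'))
        return path.read_bytes()


print(demo() == '\ufeffprint("こんにちは世界")'.encode("utf-8"))  # True
```

Reading with "utf-8" rather than "utf-8-sig" keeps a BOM as "\ufeff" in the string and writes it back unchanged, so files with and without a BOM round-trip as they were; "utf-8-sig" would strip the BOM on read and add one to every file on write.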

ghost mentioned this issue Apr 12, 2022

Resolves #3 and #5 (PR #6)

Closed
