Also, when trying to import the package, it throws an encoding error on the documentation:
SyntaxError: Non-ASCII character '\xd1' in file /home/xyz/xyz-service/charset_normalizer/__init__.py on line 12, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Thanks for your report. This comes from the mess detection: given the way Charset-Normalizer is built, this result is expected for those particular bytes.
I will try to improve ASCII support in another PR.
2021-07-15 08:39:36,420 | WARNING | cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : ascii, utf_7, utf_16_le.
2021-07-15 08:39:36,420 | WARNING | override steps (5) and chunk_size (512) as content does not fit (76 byte(s) given) parameters.
2021-07-15 08:39:36,422 | WARNING | ascii was excluded because of initial chaos probing. Gave up 1 time(s). Computed mean chaos is 27.600000 %.
2021-07-15 08:39:36,422 | INFO | Code page utf_16_le is a multi byte encoding table and it appear that at least one character was encoded using n-bytes. Should not be a coincidence. Priority +1 given.
2021-07-15 08:39:36,423 | INFO | utf_16_le passed initial chaos probing. Mean measured chaos is 0.000000 %
2021-07-15 08:39:36,424 | WARNING | utf_7 was excluded because of initial chaos probing. Gave up 1 time(s). Computed mean chaos is 27.600000 %.
My preliminary thought is to bypass the mess detection (MD) for ASCII detection, at least partially.
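To make the idea concrete, here is a minimal sketch of what an ASCII shortcut in front of the detector could look like. It is not Charset-Normalizer's implementation: `is_plain_ascii`, `detect_with_ascii_shortcut`, and the `fallback` callable are hypothetical names used only for illustration, and only the standard library is involved.

```python
from typing import Callable


def is_plain_ascii(payload: bytes) -> bool:
    """Return True when every byte fits in the 7-bit ASCII range."""
    try:
        payload.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def detect_with_ascii_shortcut(payload: bytes, fallback: Callable[[bytes], str]) -> str:
    """Hypothetical wrapper: answer 'ascii' directly, otherwise defer to `fallback`.

    `fallback` stands in for whatever full detection routine would normally run.
    """
    if is_plain_ascii(payload):
        return "ascii"
    return fallback(payload)
```

One caveat that may explain the "at least partially": UTF-7 payloads also consist only of ASCII bytes, so an unconditional shortcut like this one would report them as ascii as well.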
Describe the bug:
It looks like charset_normalizer incorrectly detects the ASCII string below as utf_16_le, while chardet detects it as ascii.
To Reproduce:
Expected behavior:
The string should be detected as ascii.
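For context on why the two candidates can collide at all, the sketch below shows that an ASCII payload of even length is also valid utf_16_le, so decodability alone cannot separate them. The `sample` value is a made-up 76-byte payload (matching only the size mentioned in the log), not the string from this report.

```python
# Hypothetical 76-byte ASCII payload; NOT the string from the report.
sample = b"ab" * 38

as_ascii = sample.decode("ascii")
as_utf16 = sample.decode("utf_16_le")  # succeeds: each byte pair forms one 16-bit code unit

# Every ASCII byte is below 0x80, so each code unit stays below the
# surrogate range and the utf_16_le decode never raises.
print(len(as_ascii), len(as_utf16))
print(hex(ord(as_utf16[0])))
```

The utf_16_le reading is half as long and made of unrelated characters, so telling the two apart falls to the heuristics applied after decoding rather than to the decoding step itself.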
Desktop (please complete the following information):