[BUG] UnicodeDecodeError: 'ascii' codec can't decode byte when using from_path
#136
Describe the bug
I have a file such as this:
and I'm trying to parse it with:
using this version of charset_normalizer:
On the main file, I get this exception:
However, it seems that something "weird" goes on at around the 10000082 character mark:
This crashes (file size: 10000082 chars):
whereas this does not finish within 5 seconds, which may be reasonable for a ~10 MiB file (file size: 9999820 chars):
Now, it would be reasonable to say "okay, but what happens in the one line you've removed?", so we take slightly more head and leave tail alone (file size: 9999847 chars):
To Reproduce
Unfortunately, I am not able to immediately share this file -- I tried to use cvise and halfempty on it to find the smallest file, but hit a roadblock at around the 10000082 character mark.
Expected behavior
I believe that charset_normalizer shouldn't crash with UnicodeDecodeError: 'ascii' codec can't decode byte when using from_path.
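Since the file itself cannot be shared, one way to pin down the size at which the crash starts is to bisect over prefix lengths rather than trimming with head and tail by hand. The following is only a minimal sketch: the names smallest_failing_prefix and crashes_in_charset_normalizer are hypothetical helpers, not part of charset_normalizer, and the search assumes that once a prefix crashes, every longer prefix crashes too, which may not hold for this bug. It also works on bytes, so the result will not line up exactly with the character counts quoted above.

```python
from typing import Callable


def smallest_failing_prefix(data: bytes, fails: Callable[[bytes], bool]) -> int:
    """Return the smallest n for which fails(data[:n]) is True.

    Assumption: failure is monotonic in the prefix length, i.e. if a
    prefix fails, all longer prefixes fail as well.
    """
    if not fails(data):
        raise ValueError("the full input does not trigger the failure")

    lo, hi = 0, len(data)  # data[:hi] is known to fail
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(data[:mid]):
            hi = mid       # failure already present at this length
        else:
            lo = mid + 1   # need a longer prefix to trigger it
    return lo


def crashes_in_charset_normalizer(chunk: bytes) -> bool:
    """Hypothetical probe: True if detection raises UnicodeDecodeError."""
    # Imported lazily so the bisection helper stays usable without the library.
    from charset_normalizer import from_bytes

    try:
        from_bytes(chunk).best()
    except UnicodeDecodeError:
        return True
    return False
```

With the problematic file read via open(path, "rb").read(), calling smallest_failing_prefix(data, crashes_in_charset_normalizer) would need roughly 24 probes for a ~10 MiB input. Each probe may take several seconds, as observed above.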
Desktop (please complete the following information):