Failure when converting unicode data on a non-unicode locale (python3) #122

valholl · 2016-11-05T13:02:25Z

To reproduce:

pypandoc(master)$ export LANG=C
pypandoc(master)$ export LANGUAGE=C
pypandoc(master)$ python3 tests.py

results in:

======================================================================
ERROR: test_unicode_input (__main__.TestPypandoc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 279, in test_unicode_input
    written = pypandoc.convert(u'<h1>\xfc\xe4\xf6\xee\xf4\xfb</h1>', 'md', format='html')
  File "[...]/pypandoc/pypandoc/__init__.py", line 58, in convert
    path = _identify_path(source)
  File "[...]/pypandoc/pypandoc/__init__.py", line 159, in _identify_path
    result = urlparse(source)
  File "/usr/lib/python3.5/urllib/parse.py", line 295, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/lib/python3.5/urllib/parse.py", line 115, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/lib/python3.5/urllib/parse.py", line 99, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/lib/python3.5/urllib/parse.py", line 99, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Apparently the failure is in _identify_path(source): source gets encoded to UTF-8 while looking for a local file, and urlparse then tries to decode the resulting bytes as ASCII.
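The urlparse step from the traceback can be reproduced in isolation. The following is a minimal sketch that only shows how urlparse treats str versus bytes input; it does not reproduce how the locale leads pypandoc to pass bytes, and the variable names are illustrative rather than taken from pypandoc.

```python
from urllib.parse import urlparse

# The same string that test_unicode_input passes to pypandoc.convert.
text = u'<h1>\xfc\xe4\xf6\xee\xf4\xfb</h1>'

# A str argument is parsed directly, with no decoding step.
parsed = urlparse(text)

# A bytes argument is first decoded as ASCII, which fails on the
# UTF-8 bytes of the first non-ASCII character.
encoded = text.encode('utf-8')
try:
    urlparse(encoded)
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(parsed.scheme == '')  # no URL scheme found in the str input
print(error)                # the same kind of error as in the traceback
```

The failing byte is at position 4 because the four ASCII characters of "<h1>" come first and the following character is encoded as two bytes starting with 0xc3.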

The attached patch fixes the issue by not saving the encoded data into source but only encoding it temporarily. It does not break any other test (I've checked with both the C locale and a UTF-8 one).
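The idea of encoding only temporarily can be sketched as follows. This is not the attached patch or pypandoc's actual code: identify_path_sketch, its encoding parameter, and the list of accepted URL schemes are all hypothetical, chosen only to illustrate keeping source as a str.

```python
import os
from urllib.parse import urlparse


def identify_path_sketch(source, encoding='utf-8'):
    """Hypothetical stand-in for _identify_path; not pypandoc's code."""
    try:
        # Encode only for the filesystem check; `source` itself stays a str.
        if os.path.exists(source.encode(encoding)):
            return True
    except (UnicodeEncodeError, ValueError):
        # Text that cannot be encoded, or that contains an embedded NUL,
        # is treated as not naming an existing file.
        pass
    # urlparse receives the original str, so it has nothing to decode.
    # The accepted schemes are an assumption made for this sketch.
    return urlparse(source).scheme in ('http', 'https')
```

With this shape, identify_path_sketch(u'<h1>\xfc\xe4\xf6\xee\xf4\xfb</h1>') reaches urlparse with a str and returns normally instead of raising UnicodeDecodeError.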

0001-Fix-parsing-of-unicode-paths-on-non-unicode-locales.txt

(please let me know if you prefer it as a PR.)

valholl · 2016-11-05T13:07:44Z

Forgot to mention: this happens under Linux. I don't know how to change the locale to reproduce the issue on other OSs.

jankatins · 2016-11-05T19:20:13Z

Yes, a PR would be wonderful!

jankatins · 2018-04-29T21:31:08Z

Closed by #139

valholl mentioned this issue May 19, 2017

Fix parsing of unicode paths on non-unicode locales #139

Merged

jankatins closed this as completed Apr 29, 2018
