Failure when converting unicode data on a non-unicode locale (python3) #122

Closed
valholl opened this issue Nov 5, 2016 · 3 comments


valholl commented Nov 5, 2016

To reproduce:

pypandoc(master)$ export LANG=C
pypandoc(master)$ export LANGUAGE=C
pypandoc(master)$ python3 tests.py

This results in:

======================================================================
ERROR: test_unicode_input (__main__.TestPypandoc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 279, in test_unicode_input
    written = pypandoc.convert(u'<h1>\xfc\xe4\xf6\xee\xf4\xfb</h1>', 'md', format='html')
  File "[...]/pypandoc/pypandoc/__init__.py", line 58, in convert
    path = _identify_path(source)
  File "[...]/pypandoc/pypandoc/__init__.py", line 159, in _identify_path
    result = urlparse(source)
  File "/usr/lib/python3.5/urllib/parse.py", line 295, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/lib/python3.5/urllib/parse.py", line 115, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/lib/python3.5/urllib/parse.py", line 99, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/lib/python3.5/urllib/parse.py", line 99, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Apparently the failure is in _identify_path(source), where source gets encoded to UTF-8 while looking for a local file, but urlparse then tries to decode it as ASCII.

The attached patch fixes the issue by not saving the encoded data into source but only encoding it temporarily, and it does not break any other test (I've checked with both the C locale and a UTF-8 one).

0001-Fix-parsing-of-unicode-paths-on-non-unicode-locales.txt

(Please let me know if you prefer it as a PR.)
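
The failure and the approach described above can be sketched roughly as follows. This is a minimal illustration and not the attached patch: `identify_path_sketch` and its list of accepted URL schemes are hypothetical stand-ins for pypandoc's `_identify_path`, whose actual body is not shown in this issue. It only demonstrates that `urlparse` decodes bytes input as ASCII, and that keeping `source` as text avoids the error.

```python
import os
from urllib.parse import urlparse

source = u'<h1>\xfc\xe4\xf6\xee\xf4\xfb</h1>'

# Text (str) input is parsed directly, with no decoding step.
urlparse(source)

# Bytes input is decoded as ASCII first, which fails on the UTF-8 bytes.
try:
    urlparse(source.encode('utf-8'))
    error = None
except UnicodeDecodeError as exc:
    error = exc


def identify_path_sketch(source):
    """Hypothetical stand-in for _identify_path (assumed logic, not the real code)."""
    try:
        # Encode only into a temporary variable; `source` itself stays str.
        encoded = source.encode('utf-8')
        if os.path.exists(encoded):
            return True
    except (UnicodeError, ValueError, OSError):
        pass
    # urlparse still receives the original text, so nothing is decoded as ASCII.
    result = urlparse(source)
    # The accepted schemes here are an assumption made for illustration.
    return result.scheme in ('http', 'https', 'ftp', 'file')
```

Calling `identify_path_sketch(source)` with the test string from the traceback does not raise, because the UTF-8 bytes never reach `urlparse`.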

Author

valholl commented Nov 5, 2016

Forgot to mention: this happens under Linux. I don't know how to change the locale to reproduce the issue on other OSs.

@jankatins
Contributor

Yes, a PR would be wonderful!

@jankatins
Contributor

Closed by #139
