UnicodeEncodeError when docstring contains non-ASCII characters #91
Comments
Hey @masci, thanks for the bug report. Dang, it seems I didn't test this sufficiently and trusted StackOverflow a bit too much 👀
The encode/decode code here was introduced to convert a Python string literal into the actual string, as the Python interpreter would parse it into memory (so when you write …
Unless there's a better working solution using the encode/decode logic, I suppose we need to parse the string manually and convert special character sequences.
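For illustration, a minimal sketch of what the encode/decode chain does and where it breaks. The helper name `decode_literal` and the sample strings are made up for this example and are not taken from the project's code:

```python
def decode_literal(s: str) -> str:
    """Interpret backslash escapes in s via the encode/decode chain."""
    # Step 1: str -> bytes using Latin-1 (only code points up to U+00FF fit).
    # Step 2: bytes -> str, with escape sequences such as \n interpreted.
    return s.encode("latin1").decode("unicode_escape")


# A backslash followed by "n" in the input becomes a real newline.
converted = decode_literal(r"first\nsecond")

# Characters within Latin-1 survive the round trip.
accented = decode_literal("café")

# Characters outside Latin-1 (here, curly quotes) cannot be encoded,
# so step 1 raises UnicodeEncodeError before any decoding happens.
try:
    decode_literal("curly “quotes”")
    failed = False
except UnicodeEncodeError:
    failed = True
```

Under these assumptions, any docstring containing a character above U+00FF would trigger the error reported in this issue.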
Thanks for following up! I'm not sure I get 100% the logic of the answer in SO but at some point I see ...
s.encode('latin1') # To bytes, required by 'unicode-escape'
... and I wonder: if the goal of that step is just to get bytes out of the original string, can't we just encode using something more flexible than …
The reason is that
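One known property of the `unicode_escape` codec bears on the choice of encoding: it decodes non-escape bytes as Latin-1. The sketch below compares encoding with Latin-1 and with UTF-8 before decoding. The sample string is made up for illustration:

```python
# Raw string: ends with a literal backslash followed by "n".
text = r"café\n"

# Latin-1 maps each character to one byte, which unicode_escape maps back.
via_latin1 = text.encode("latin1").decode("unicode_escape")

# UTF-8 encodes "é" as two bytes; unicode_escape reads each byte as a
# separate Latin-1 character, so the result is mojibake instead of "é".
via_utf8 = text.encode("utf-8").decode("unicode_escape")

print(repr(via_latin1))  # 'café\n'
print(repr(via_utf8))    # 'cafÃ©\n'
```

So swapping in UTF-8 avoids the `UnicodeEncodeError` but silently corrupts every non-ASCII character.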
It seems like you already found the PR and thus the StackOverflow answer I was referring to, but for reference: #83 and https://stackoverflow.com/a/58829514/791713 The best alternative that I can think of without re-implementing the decoding of raw strings is to use
….decode(unicode_escape)` method. (#92)
In 2.1.2
Describe the bug
When a docstring contains non-ASCII characters, the conversion fails.
To Reproduce
Steps to reproduce the behavior:
foo.py containing the following: …
pydoc-markdown -I . -m foo
Expected behavior
No errors, as was the case with versions < 2.1.0.