support funky Microsoft Word XHTML unicode escapes #60

Closed
malsmith wants to merge 1 commit into Alir3z4:master from malsmith:master

Conversation

@malsmith
Author

These escape sequences are common in MS Word XHTML docs.

@coveralls

Coverage Status

Coverage increased (+0.06%) to 76.18% when pulling bfe6164 on malsmith:master into e902a7c on Alir3z4:master.

@Alir3z4
Owner

Alir3z4 commented Apr 22, 2015

@malsmith Could you please provide tests as well?


from html2text.compat import htmlentitydefs

# Based on http://stackoverflow.com/questions/7105874/valueerror-unichr-arg-not-in-range0x10000-narrow-python-build-please-hel
Contributor

Maybe link to the share link for this question instead: http://stackoverflow.com/q/7105874/173630

Owner

👍

@nikolas
Contributor

nikolas commented Jun 4, 2015

Will this fix this error I'm getting?

  File "/home/nnyby/src/dmt/dmt/main/migrations/0018_html2text.py", line 13, in populate_comment_src
    comment.comment.decode('unicode-escape'))
  File "/home/nnyby/src/dmt/ve/local/lib/python2.7/site-packages/html2text/__init__.py", line 794, in html2text
    return h.handle(html)
  File "/home/nnyby/src/dmt/ve/local/lib/python2.7/site-packages/html2text/__init__.py", line 122, in handle
    self.feed(data)
  File "/home/nnyby/src/dmt/ve/local/lib/python2.7/site-packages/html2text/__init__.py", line 119, in feed
    HTMLParser.HTMLParser.feed(self, data)
  File "/usr/lib/python2.7/HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 191, in goahead
    self.handle_charref(name)
  File "/home/nnyby/src/dmt/ve/local/lib/python2.7/site-packages/html2text/__init__.py", line 165, in handle_charref
    charref = self.charref(c)
  File "/home/nnyby/src/dmt/ve/local/lib/python2.7/site-packages/html2text/__init__.py", line 706, in charref
    return unichr(c)
ValueError: unichr() arg not in range(0x110000) (wide Python build)
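
The traceback above ends in `unichr(c)` being called with a value past the top of the Unicode range. A minimal sketch of a guarded conversion follows, written for Python 3 (where `chr` replaces `unichr`). The helper name `safe_charref` and the choice of U+FFFD as the fallback are assumptions made for illustration; this is not html2text's code and not the patch in this pull request.

```python
MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def safe_charref(name, fallback="\ufffd"):
    """Decode the body of a numeric character reference.

    Hypothetical helper: `name` is the text between '&#' and ';',
    for example '8217' or 'x2019'. Anything that cannot be turned
    into a valid character yields `fallback` instead of raising.
    """
    try:
        if name[:1] in ("x", "X"):
            code = int(name[1:], 16)
        else:
            code = int(name, 10)
    except ValueError:
        # Not a number at all, e.g. '&#abc;'
        return fallback
    # Reject negatives, values past U+10FFFF, and lone surrogates.
    if code < 0 or code > MAX_CODE_POINT or 0xD800 <= code <= 0xDFFF:
        return fallback
    return chr(code)
```

With this shape, a reference such as `&#1114112;` (0x110000, one past the limit) produces the replacement character rather than a `ValueError`. Whether dropping or replacing the bad reference is the right behavior for html2text is a design choice the thread does not settle.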

@Alir3z4
Owner

Alir3z4 commented Jun 5, 2015

@nikolas Could you try this branch to see if it solves the issue you're facing?

@theSage21
Collaborator

@Alir3z4 I tried this branch and the issue is not solved. It fails at the struct module decode.
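
The comment does not show the failing input, so the following is only one plausible reading. The reviewed code cites a Stack Overflow question about `unichr` on narrow Python builds, and a common workaround there is to pack the integer with `struct` and decode the bytes as UTF-32. The sketch below (Python 3, with a hypothetical helper name `utf32_char`) shows that this trick covers characters above U+FFFF but cannot help when the number itself is outside the Unicode range, as in the traceback earlier in this thread.

```python
import struct


def utf32_char(code):
    """Build one character by decoding a packed 32-bit integer as UTF-32.

    Hypothetical helper for illustration only.
    """
    return struct.pack("<I", code).decode("utf-32-le")


# A code point above U+FFFF decodes fine this way.
print(utf32_char(0x1F600) == "\U0001F600")

# A value past U+10FFFF is not a code point, so the decode step raises.
try:
    utf32_char(0x110000)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

If that is what happened here, the out-of-range value would need to be rejected or replaced before any conversion is attempted.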

@Alir3z4
Owner

Alir3z4 commented Jun 18, 2015

I'm closing this pull request: the patch author, @malsmith, has not responded, and the branch has conflicts.

5 participants