UnicodeEncodeError when using Stream flavor #183

stpete111 · 2020-08-14T17:54:06Z

Python 3.7 on Windows

Using this pdf: http://tsbde.texas.gov/78i8ljhbj/Fiscal-Year-2014-Disciplinary-Actions.pdf

I am running it through Camelot to convert to html using Stream flavor and I get the following error at execution of the export line, once it reaches page 4 of 8:

"UnicodeEncodeError -'charmap' codec can't encode character '\u2010' in position y: character maps to undefined."

Pages 1 through 3 get converted nicely - it crashes somewhere between page 4 and 5. In debug with the breakpoint after the tables.export line, it also brings me to line 19 of cp1252.py, if that's helpful.

I am on Windows, and this seems not to be an issue on Mac. But Windows is our environment, so I have to figure this out. I have done a ton of research on this error, and everything about it in the Python world points to adding either encoding="utf-8" or errors="ignore", but both of those apply when you open the file yourself and can't be passed to Camelot's export method.

Any thoughts on what I could add to the script to get around this error? We can't avoid using Windows, and this seems to be the final blocker to our making really great use of this tool for our PDFs.
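
For readers hitting the same error, here is a minimal sketch of why it appears, independent of Camelot. It only checks whether U+2010 (the character named in the error message above) can be encoded by cp1252 and by UTF-8; the sample string and the helper name `can_encode` are made up for illustration.

```python
# U+2010 (HYPHEN) is the character named in the error message.
text = "Fiscal\u2010Year"  # illustrative sample string


def can_encode(s: str, codec: str) -> bool:
    """Return True if every character in s can be encoded with codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


cp1252_ok = can_encode(text, "cp1252")  # codec referenced by cp1252.py in the traceback
utf8_ok = can_encode(text, "utf-8")
print(cp1252_ok, utf8_ok)  # prints: False True
```

This would also explain why the same script runs on a Mac: when a text file is opened without an explicit encoding, Python falls back to the platform's preferred encoding, which is usually UTF-8 there and often cp1252 on Windows.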

stpete111 · 2020-08-14T20:54:13Z

At this point I'm willing to put try/except code around the export method (but would need guidance on how to do that). You should see how many Stack Overflow tabs I have open in my browser right now, trying every solution I can find, and still getting the same error no matter what.

anakin87 · 2020-08-17T14:37:02Z

I found this solution (it is a monkey patch): https://stackoverflow.com/questions/63403629/python-camelot-pdf-unicodeencodeerror-when-using-stream-flavor-on-windows/
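
A rough sketch of one possible workaround follows. It is not the code from the linked answer, just the general idea of writing the HTML yourself with an explicit encoding instead of relying on the platform default. The helper `write_html_utf8` and the sample HTML string are hypothetical; with Camelot the string would presumably come from the table's pandas DataFrame (for example `tables[0].df.to_html()`), which is an assumption here and is not exercised by the code.

```python
import os
import tempfile


def write_html_utf8(html: str, path: str) -> None:
    """Write an HTML string to path as UTF-8, regardless of the OS default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Stand-in for the HTML of one extracted table (hypothetical content).
# Assumption: with Camelot this could be produced by tables[0].df.to_html().
html = "<table><tr><td>2013\u20102014</td></tr></table>"

out_path = os.path.join(tempfile.mkdtemp(), "page-4.html")
write_html_utf8(html, out_path)

# Read the file back with the same encoding to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_tripped = f.read()
```

Passing `encoding="utf-8"` to `open()` keeps the character intact, whereas `errors="ignore"` would silently drop it from the output.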

stpete111 · 2020-08-17T15:05:42Z

Thanks @anakin87 this works great.

vinayak-mehta · 2020-08-25T13:13:02Z

@anakin87 Would you like to open a PR to fix this in the library itself? :)

anakin87 · 2020-08-25T13:31:10Z

#188

It is my first PR. If it is incorrect, please provide some help.

vinayak-mehta · 2020-08-25T13:43:50Z

@anakin87 It looks good! I'm waiting for the tests to pass so that I can merge it, even though there isn't a test for the to_html method right now. (You can add one in a new PR if you want to work on it.)

Also, I've noticed that you use a lot of different camelot features, based on your issue tracker replies and SO answers. I would love to chat about how you use camelot if you have some time this / next week!

anakin87 mentioned this issue Aug 25, 2020

[MRG] Add encoding kwarg to camelot.core.Table.to_html method #188

Merged

vinayak-mehta closed this as completed in #188 Aug 25, 2020
