
Error "decode() argument 1 must be string, not None" when running textract.process #135

Closed
frostchick opened this issue Mar 17, 2017 · 4 comments

@frostchick

I am having trouble converting a UTF-8 file to text in my convertToText function:
text = textract.process("1.pdf", method='pdfminer')

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "td-net.py", line 178, in close
    textData = convertToText(path,self.date) # convert pdf to text after download
  File "td-net.py", line 239, in convertToText
    text = textract.process("data/pdf/{1}/{0}.pdf".format(path,sDate), method='pdfminer')
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/__init__.py", line 58, in process
    return parser.process(filename, encoding, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 46, in process
    unicode_string = self.decode(byte_string)
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 65, in decode
    return text.decode(result['encoding'])
TypeError: decode() argument 1 must be string, not None

@deanmalmgren
Owner

That's odd. Looks like chardet could not determine an encoding for your file 1.pdf. Can you try running chardet 1.pdf to see what the output looks like?

I wonder if this is related to #133 somehow...
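
For anyone hitting the same traceback: the failure is that the detected encoding is None and gets passed straight to decode(). Below is a minimal sketch of a guard, not textract's actual code. The function name decode_with_fallback, the detect parameter, and the utf-8 fallback are all assumptions made for illustration; the only real library call is chardet.detect, which returns a dict whose 'encoding' value can be None.

```python
def decode_with_fallback(byte_string, detect=None, fallback="utf-8"):
    """Decode bytes, using `fallback` when detection finds no encoding.

    Hypothetical helper for illustration; not part of textract.
    """
    if detect is None:
        # chardet is a third-party package; it is imported lazily so that
        # a stub detector can be passed in instead.
        import chardet
        detect = chardet.detect
    result = detect(byte_string)
    # A failed detection yields {'encoding': None, ...}. Passing that None
    # to .decode() is what raises the TypeError shown in the traceback.
    encoding = result.get("encoding") or fallback
    # "replace" substitutes undecodable bytes instead of raising.
    return byte_string.decode(encoding, "replace")


# Stub detector that mimics a failed detection (assumption for the demo).
no_guess = lambda data: {"encoding": None, "confidence": 0.0}
print(decode_with_fallback(b"hello", detect=no_guess))
```

Whether utf-8 is a sensible fallback depends on the documents being parsed, so treat that default as a placeholder.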

@JermellB

JermellB commented Mar 21, 2017

This is exactly the problem I was having.

I uninstalled and reinstalled from GitHub and it worked. Really befuddled now... Maybe a failed pip install --upgrade?

@deanmalmgren
Owner

I just pinned chardet to 2.1.1 to address #107. I think this will likely address your issue as well. Try pulling from the latest master on GitHub to see if that fixes it. I'm going to close this, but feel free to reopen if it remains a problem.
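
To confirm which chardet version actually ended up installed after a reinstall or upgrade, something like the following may help. It is only a sketch: installed_version is a hypothetical helper name, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onward (so it will not run on the Python 2.7 setup in the original traceback).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent.

    Hypothetical helper for illustration.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed chardet version, or None if chardet is not installed.
print(installed_version("chardet"))
```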

@marija-stanojevic

Hello,

I have this issue too. I went back to chardet 2.1.1 and now I get another error: ModuleNotFoundError: No module named 'universaldetector', which happens because chardet 2.1.1 is too old. What should I do?
