pdftotext isn't included on non-linux OSes #21
Thanks for posting this problem. I haven't done very thorough testing on non-Ubuntu OSes so I'm sure that things like this will come up. One thing is clear: we should throw a better error if the But I've got two follow-up questions here that are on a related topic:
Thanks again for the comment. I'll post some code to get this fixed (or at least improved) ASAP and I look forward to your thoughts on 1 and 2 above.
Just to confirm that by default OS X doesn't have
…a helpful error message pointing to the documentation
@ojosdegris @fabiantheblind @aphexcx I added some documentation for installing things on OS X that I believe is correct. Hopefully this will make it easier to install on non-Ubuntu systems. @aphexcx I also improved the error messages to be more helpful when an executable is not installed. For PDFs, extraction now falls back to pdfminer, which should always be available because it is a Python package installed as one of textract's requirements. Hopefully this fixes the issue. If there continue to be problems, let me know!
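For readers following along, here is a rough sketch of the pattern described above: check that the executable exists, raise a descriptive error if it does not, and fall back to a pure-Python extractor for PDFs. This is not textract's actual code. The names `MissingExecutableError`, `run_tool`, and `extract_pdf_text` are hypothetical, and the fallback is passed in as a plain callable rather than tied to a specific pdfminer entry point.

```python
import shutil
import subprocess


class MissingExecutableError(RuntimeError):
    """Hypothetical error raised when a required command-line tool is not on PATH."""


def run_tool(command, args):
    """Run an external tool and return its stdout as text."""
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which(command) is None:
        raise MissingExecutableError(
            f"The command-line tool '{command}' was not found on PATH. "
            "See the installation documentation for your operating system."
        )
    result = subprocess.run([command, *args], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def extract_pdf_text(path, command="pdftotext", fallback=None):
    """Try the external tool first; use the fallback callable if it is missing."""
    try:
        # Passing "-" as the output file asks pdftotext to write to stdout.
        return run_tool(command, [path, "-"])
    except MissingExecutableError:
        if fallback is None:
            raise  # no fallback available, so surface the descriptive error
        return fallback(path)
```

If pdfminer.six is installed, something like `extract_pdf_text("paper.pdf", fallback=pdfminer.high_level.extract_text)` should work as the fallback, though that pairing is an assumption and not necessarily how textract wires it up.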
…sier to find them
With these changes, I'm going to close this issue for now, but I wouldn't be surprised if the OS X installation instructions could be improved.
Extracting PDFs doesn't work on Windows, because Windows doesn't come with pdftotext:
Maybe require pdftotext or xpdf support? http://en.wikipedia.org/wiki/Pdftotext