Command failed with exit code 127 #26

Closed
SeanMaday opened this Issue Aug 4, 2014 · 22 comments

SeanMaday commented Aug 4, 2014

I am getting a "Command failed with exit code 127" message when I try to convert a PDF on my Mac OS X machine.

fabianmoronzirfas commented Aug 5, 2014

+1

fabianmoronzirfas commented Aug 5, 2014

pdftotext is missing on OSX. See issue #21
Fixed by installing poppler via homebrew

brew install poppler   

based on this superuser post

SeanMaday commented Aug 5, 2014

Note that XQuartz is a dependency for poppler and can be downloaded here:
https://xquartz.macosforge.org

Owner

deanmalmgren commented Aug 5, 2014

In #21 I ended up adding a section to the documentation for installing the dependencies of textract for different operating systems. I believe I got the OSX dependencies correct although I didn't have a fresh environment to test them on. Can you confirm that these are the correct steps and then we can close this issue out?

fabianmoronzirfas commented Aug 6, 2014

I just installed poppler via homebrew and had XQuartz already installed. I don't have libxml2 libxslt antiword installed and did not have to do any additional linking.

Owner

deanmalmgren commented Aug 6, 2014

huh, ok. Do you have any suggested changes to the OSX installation instructions then?

fabianmoronzirfas commented Aug 6, 2014

As I said, I already had XQuartz installed from the binary. It can also be installed via brew using cask:

brew install caskroom/cask/brew-cask  
brew cask install xquartz  
brew install poppler  
#sudo is optional
sudo easy_install pip  
#sudo is optional
sudo pip install textract  

I don't know if the other dependencies you've listed are needed for other export formats. I only tested PDF export.

Owner

deanmalmgren commented Aug 9, 2014

Thanks for passing along your installation instructions. I just updated the documentation.

fabianmoronzirfas commented Aug 11, 2014

you're welcome

bef55 commented Jan 17, 2018

In Windows 10 I'm still getting the 127 exit code error on PDF files, and it doesn't appear that the built-in fallback to PDFMiner is working:
Traceback (most recent call last):
  File "20180116_test_text_extraction.py", line 38, in <module>
    text = textract.process(fn)
  File "C:\Users\b\Anaconda2\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)
  File "C:\Users\b\Anaconda2\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "C:\Users\b\Anaconda2\lib\site-packages\textract\parsers\pdf_parser.py", line 28, in extract
    raise ex
textract.exceptions.ShellError: The command pdftotext \users\b\Dropbox\fein\ua\ocr\2016 02 02 Dagger OCR information sheet.pdf - failed with exit code 127
------------- stdout -------------
------------- stderr -------------

Any ideas on how to fix this would be greatly appreciated.

fsecada01 commented Jan 27, 2018

I am in the same boat.

text = textract.process(filename)
Traceback (most recent call last):
File "C:\Languages\Python36\lib\site-packages\textract\parsers\utils.py", line 84, in run
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
File "C:\Languages\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Languages\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in <module>
File "C:\Languages\Python36\lib\site-packages\textract\parsers\__init__.py", line 77, in process
return parser.process(filename, encoding, **kwargs)
File "C:\Languages\Python36\lib\site-packages\textract\parsers\utils.py", line 46, in process
byte_string = self.extract(filename, **kwargs)
File "C:\Languages\Python36\lib\site-packages\textract\parsers\pdf_parser.py", line 28, in extract
raise ex
File "C:\Languages\Python36\lib\site-packages\textract\parsers\pdf_parser.py", line 20, in extract
return self.extract_pdftotext(filename, **kwargs)
File "C:\Languages\Python36\lib\site-packages\textract\parsers\pdf_parser.py", line 43, in extract_pdftotext
stdout, _ = self.run(args)
File "C:\Languages\Python36\lib\site-packages\textract\parsers\utils.py", line 91, in run
' '.join(args), 127, '', '',
textract.exceptions.ShellError: The command pdftotext C:\Users\Francis Secada\OneDrive\Documents\Resume + Resume Writing Resources\Resumes\Francis Secada - Resume.pdf - failed with exit code 127
------------- stdout -------------
------------- stderr -------------
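
The first traceback above points at the underlying cause: Python could not find an executable named pdftotext, and the wrapper then reported that as exit code 127, the conventional shell code for "command not found". Below is a minimal sketch of that kind of mapping using only the standard library. The helper name `run_command` and the placeholder tool name are hypothetical, and this is not textract's actual code.

```python
import subprocess


def run_command(args):
    """Run a command and return (exit_code, stdout, stderr).

    Hypothetical helper: a missing executable is reported as exit
    code 127, the conventional shell code for "command not found".
    """
    try:
        proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except FileNotFoundError:
        # The OS could not locate the executable, so no process ever
        # ran: there is nothing to show for stdout or stderr.
        return 127, b"", b""
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr


if __name__ == "__main__":
    # Placeholder name that is assumed not to exist on the machine.
    code, out, err = run_command(["definitely-not-installed-tool", "-v"])
    print(code, out, err)
```

This would also explain why the stdout and stderr sections in the reports here are empty: the command never started, so the fix is to make the executable findable rather than to change the PDF or the arguments.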

Prasengupta commented Apr 23, 2018

Is this problem still open on Windows? Does anyone have a solution?
@deanmalmgren @fsecada01 @bef55

Traceback (most recent call last):
File "C:\Python27\myscripts\SVNFOLDER\textract_examples\main.py", line 2, in <module>
text = textract.process('testpdf.pdf')
File "C:\Python27\lib\site-packages\textract\parsers\__init__.py", line 77, in process
return parser.process(filename, encoding, **kwargs)
File "C:\Python27\lib\site-packages\textract\parsers\utils.py", line 46, in process
byte_string = self.extract(filename, **kwargs)
File "C:\Python27\lib\site-packages\textract\parsers\pdf_parser.py", line 28, in extract
raise ex
textract.exceptions.ShellError: The command pdftotext testpdf.pdf - failed with exit code 127
------------- stdout -------------
------------- stderr -------------

ayrtondenner commented May 4, 2018

Having the same problem as @Prasengupta, after installing textract and trying to run it, got the same problem:

textract.exceptions.ShellError: The command 'pdftotext D:/13574594000196.pdf -' failed with exit code 127
------------- stdout -------------
------------- stderr -------------

grvaggarwal commented Jun 14, 2018

I am facing the same issue. Has this been resolved somewhere?
specs: Windows 10, anaconda 3.6, python 3

JoelSalzesson commented Jun 14, 2018

I'm working on a Mac, and the OSX instructions from http://textract.readthedocs.io/en/stable/installation.html worked for me:

brew cask install xquartz
brew install poppler antiword unrtf tesseract swig
pip install textract

THEdavehogue commented Jun 15, 2018

Same as @Prasengupta. I realize Windows isn't cool but some of us are confined to it at the office. Can someone please give some clear instructions on how to correctly set up poppler on Windows 10?

RAZelzner commented Aug 21, 2018

@Prasengupta and @THEdavehogue, has anyone found a solution?

ZippoML commented Sep 11, 2018

Browse to:
https://www.xpdfreader.com/download.html
Download the Windows Xpdf tools, find pdftoppm.exe in there, and copy this file to a directory on your PATH, or add a new directory to your PATH. That resolved the problem.

totopopov commented Sep 21, 2018

@ZippoML Maybe a silly question, but when you say your path dir, do you mean the python.exe dir or a PATH variable? I have tried both, but neither seems to work. There could be something I'm doing wrong, though.

ZippoML commented Sep 24, 2018

Try running pdftoppm --help in CMD. If it works, then you've successfully added the exe to a directory on your PATH. Textract is only a wrapper.
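
The same check can be done from Python, which may also help with the question of what "path dir" means: it is a directory listed in the PATH environment variable, not necessarily the folder that contains python.exe. Below is a small sketch using the standard library's `shutil.which`. The helper name `find_tools` is made up for illustration, and the tool names are simply the ones mentioned in this thread.

```python
import os
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools(["pdftotext", "pdftoppm"]).items():
        print(name, "->", location or "NOT FOUND on PATH")

    # Directories that are searched for executables.
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        print(directory)
```

Note that the textract error reported in this thread names pdftotext, so that is the entry that has to resolve for `textract.process` to work on PDFs. After changing PATH on Windows, a new CMD window or Python session is typically needed before the change is visible.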

JG145 commented Oct 1, 2018

Hi anything further on this? Also facing the same issue

ZippoML commented Oct 2, 2018

> Hi anything further on this? Also facing the same issue

Actually, in my case I decided not to use textract, because it doesn't offer a way to pass configuration to the OCR engine. For example, I couldn't set the PSM and OEM modes, the data files directory, etc. So I used pytesseract as a Tesseract wrapper and the pdf2image module as a pdftoppm wrapper. I pass images between these two packages using PIL.
P.S. My working environment is Windows.
In the end I downloaded pdftoppm.exe from this URL:
http://blog.alivate.com.au/wp-content/uploads/2018/08/poppler-0.67.0_x86.7z
I copied the following files:

  1. freetype6.dll
  2. jpeg62.dll
  3. libgcc_s_dw2...
  4. libpng16-16.dll
  5. libpoppler-78.dll
  6. libstdc++-6.dll
  7. libtiff3.dll
  8. zlib1.dll
  9. pdfinfo.exe
  10. pdftoppm.exe

The other way is not to use pdftoppm but to use ImageMagick for PDF-to-image conversion. But as I remember, ImageMagick is not free software.
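
Below is a rough sketch of the pytesseract plus pdf2image approach described in this comment. It assumes both packages are installed, along with the Tesseract and poppler binaries. The function names, the default PSM/OEM values, and the example path are illustrative choices, not something taken from the original comment.

```python
def build_tesseract_config(psm=3, oem=3, tessdata_dir=None):
    """Build the config string handed to Tesseract via pytesseract.

    --psm selects the page segmentation mode, --oem the OCR engine
    mode, and --tessdata-dir the directory holding language data.
    """
    parts = ["--psm {}".format(psm), "--oem {}".format(oem)]
    if tessdata_dir:
        parts.append('--tessdata-dir "{}"'.format(tessdata_dir))
    return " ".join(parts)


def ocr_pdf(pdf_path, dpi=300, **config_kwargs):
    """Render each PDF page to a PIL image and OCR it with Tesseract."""
    # Imported lazily so the config helper works without these packages.
    from pdf2image import convert_from_path  # relies on poppler's pdftoppm
    import pytesseract

    config = build_tesseract_config(**config_kwargs)
    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    return "\n".join(
        pytesseract.image_to_string(page, config=config) for page in pages
    )


# Example call with a placeholder path:
#     text = ocr_pdf("scan.pdf", psm=6, oem=1)
```

Keeping the config string in its own helper makes it easy to experiment with different PSM and OEM settings, which is the flexibility the comment says was missing from textract.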
