pip install ocrd_tesserocr fails with tesseract version 4.0.0-beta-26-gfd49 #28

Closed
finkf opened this issue Jan 4, 2019 · 16 comments

finkf commented Jan 4, 2019

I use pip install ocrd_tesserocr to install ocrd_tesserocr into my virtualenv. The installation fails with:

...
  Running setup.py bdist_wheel for tesserocr ... error
  Complete output from command /run/media/flo/a57ed1c0-7fc5-41b1-a6e5-0d43b3ae6a40/data/devel/work/cis-ocrd-py/env/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-k_dgo547/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-q7mozwr8 --python-tag cp37:
  Supporting tesseract v4.0.0
  Configs from pkg-config: {'include_dirs': ['/usr/include'], 'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 67108864}}
  running bdist_wheel
  running build
  running build_ext
  building 'tesserocr' extension
  creating build
  creating build/temp.linux-x86_64-3.7
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -fPIC -I/usr/include -I/usr/include/python3.7m -c tesserocr.cpp -o build/temp.linux-x86_64-3.7/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
  tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_16PyResultIterator_8GetBestLSTMSymbolChoices(__pyx_obj_9tesserocr_PyResultIterator*)':
  tesserocr.cpp:12196:43: error: 'class tesseract::ResultIterator' has no member named 'GetBestLSTMSymbolChoices'
     __pyx_v_output = (__pyx_v_self->_riter->GetBestLSTMSymbolChoices()[0]);
                                             ^~~~~~~~~~~~~~~~~~~~~~~~
  error: command 'gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for tesserocr
  Running setup.py clean for tesserocr
Failed to build tesserocr
Installing collected packages: tesserocr, ocrd-tesserocr
  Running setup.py install for tesserocr ... error
    Complete output from command /run/media/flo/a57ed1c0-7fc5-41b1-a6e5-0d43b3ae6a40/data/devel/work/cis-ocrd-py/env/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-k_dgo547/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-fc87h61b/install-record.txt --single-version-externally-managed --compile --install-headers /run/media/flo/a57ed1c0-7fc5-41b1-a6e5-0d43b3ae6a40/data/devel/work/cis-ocrd-py/env/include/site/python3.7/tesserocr:
    Supporting tesseract v4.0.0
    Configs from pkg-config: {'include_dirs': ['/usr/include'], 'libraries': ['lept', 'tesseract'], 'cython_compile_time_env': {'TESSERACT_VERSION': 67108864}}
    running install
    running build
    running build_ext
    building 'tesserocr' extension
    creating build
    creating build/temp.linux-x86_64-3.7
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -fPIC -I/usr/include -I/usr/include/python3.7m -c tesserocr.cpp -o build/temp.linux-x86_64-3.7/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
    tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_16PyResultIterator_8GetBestLSTMSymbolChoices(__pyx_obj_9tesserocr_PyResultIterator*)':
    tesserocr.cpp:12196:43: error: 'class tesseract::ResultIterator' has no member named 'GetBestLSTMSymbolChoices'
       __pyx_v_output = (__pyx_v_self->_riter->GetBestLSTMSymbolChoices()[0]);
                                               ^~~~~~~~~~~~~~~~~~~~~~~~
    error: command 'gcc' failed with exit status 1
...

tesseract is installed on the system:

tesseract 4.0.0-beta.4-26-gfd49
 leptonica-1.77.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1
 Found AVX
 Found SSE
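
The build log above reports TESSERACT_VERSION as 67108864 even though the installed binary is a beta build. A minimal sketch of how that integer can be read, assuming it packs major, minor and patch into one byte each (consistent with 67108864 = 0x04000000, but not confirmed in this thread); the helper name decode_packed_version is illustrative only:

```python
def decode_packed_version(packed: int) -> tuple:
    """Split a packed version integer into (major, minor, patch).

    Assumption: one byte per component, major in the highest byte.
    """
    major = (packed >> 24) & 0xFF
    minor = (packed >> 16) & 0xFF
    patch = (packed >> 8) & 0xFF
    return major, minor, patch


if __name__ == "__main__":
    # Value taken from the "Configs from pkg-config" line in the log.
    print(decode_packed_version(67108864))  # (4, 0, 0)
```

Under that assumption the build treats the beta as plain 4.0.0, which would explain why it tries to compile a call that the beta headers do not provide.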
@stweil
Contributor

stweil commented Jan 4, 2019

Don't use 4.0.0-beta-26-gfd49. Use 4.0.0.
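
One way to tell a pre-release build from the final release is to look for an alpha, beta or rc tag in the version string that tesseract reports. A minimal sketch; is_prerelease is a hypothetical helper, not part of tesseract or tesserocr:

```python
import re


def is_prerelease(version: str) -> bool:
    """Return True if the version string carries an alpha, beta or rc tag."""
    return re.search(r"alpha|beta|rc", version, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(is_prerelease("4.0.0-beta.4-26-gfd49"))  # True
    print(is_prerelease("4.0.0"))                  # False
```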

@finkf
Author

finkf commented Jan 4, 2019

OK. Thanks

@finkf
Author

finkf commented Jan 4, 2019

It does not work with tesseract v4.0.0-beta.1 either (it's the default tesseract version in Ubuntu on Windows).

@wrznr
Contributor

wrznr commented Jan 4, 2019

I stumbled upon the same issue: tesserocr from GitHub does not work with tesseract from Ubuntu (18.04) because that tesseract build lacks the LSTM choice iterator. I had to install tesseract and ocrd_tesserocr from GitHub to make it work.

@finkf
Author

finkf commented Jan 4, 2019

It works for me now. I just had to use an older version of tesserocr: pip install tesserocr==2.3.1. Thanks.

@stweil
Contributor

stweil commented Jan 4, 2019

Noah had updated the Python code to autodetect the Tesseract version to select the right API. It looks like ocrd_tesserocr still needs that update. Cc @noahmetzger.

@stweil
Contributor

stweil commented Jan 4, 2019

Ubuntu 18.04 should have the right version for Tesseract. Maybe you just have to update / upgrade. See https://packages.ubuntu.com/bionic/tesseract-ocr.

@finkf
Author

finkf commented Jan 4, 2019

The version downgrade of tesserocr worked for me.
I currently have no Ubuntu machines available, only SUSE, Ubuntu on Windows and Arch Linux (the installation did not work on any of them).
Is the newer version of tesserocr strictly needed?

@kba kba added this to Backlog in coordinate Jan 7, 2019
@kba
Member

kba commented Jan 15, 2019

@noahmetzger Can you investigate?

@kba kba moved this from Backlog to Low priority in coordinate Jan 15, 2019
@noahmetzger
Contributor

I'll look into it.

@noahmetzger
Contributor

I've tried to reproduce your problem on a VirtualBox Ubuntu 18.04 machine with Ubuntu's default versions of tesseract, tesserocr and ocrd_tesserocr.
But for me the installation went fine.
Which versions of pip and Python did you use?

@kba
Member

kba commented Jan 15, 2019

@noahmetzger What version of tesserocr are you using?

pip freeze |grep tess
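
The same information can be read from inside the active Python environment. A minimal sketch, assuming Python 3.8 or newer for importlib.metadata; installed_matching is a hypothetical helper name:

```python
from importlib.metadata import distributions


def installed_matching(fragment: str) -> list:
    """Return sorted 'name==version' strings for installed distributions
    whose name contains the given fragment (case-insensitive)."""
    found = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken metadata (no name recorded).
        if name and fragment.lower() in name.lower():
            found.add(f"{name}=={dist.version}")
    return sorted(found)


if __name__ == "__main__":
    for line in installed_matching("tess"):
        print(line)
```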

@noahmetzger
Contributor

2.4

@noahmetzger
Contributor

My steps were:

  • Build a new Ubuntu 18.04 machine with VirtualBox
  • Install pip with sudo apt install python3-pip
  • Install virtualenv with sudo apt install virtualenv
  • Create and activate a virtual environment with:
    mkdir virtualEnvironment
    virtualenv -p python3 virtualEnvironment
    source virtualEnvironment/bin/activate
  • Install tesseract with:
    sudo apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config
  • Finally, run pip3 install ocrd_tesserocr
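
After steps like these, a quick check from inside the virtual environment can show which Tesseract library the tesserocr binding was built against. A minimal sketch, assuming tesserocr is importable; linked_tesseract_version is a hypothetical helper that returns None when the import fails:

```python
def linked_tesseract_version():
    """Return the version text reported by tesserocr, or None if the
    binding is not installed in the current environment."""
    try:
        import tesserocr
    except ImportError:
        return None
    # tesserocr.tesseract_version() reports the Tesseract and Leptonica
    # versions the extension is linked against.
    return tesserocr.tesseract_version()


if __name__ == "__main__":
    version = linked_tesseract_version()
    print(version if version is not None else "tesserocr is not installed")
```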

@wrznr
Contributor

wrznr commented Apr 17, 2019

I can confirm that the instructions by @noahmetzger work. If @finkf has a running system again, I'd propose to close this issue.

@finkf
Author

finkf commented Apr 18, 2019

I have not tried it. But if it works, you can close the issue.

@wrznr wrznr closed this as completed Jun 25, 2019
coordinate automation moved this from Low priority to Done Jun 25, 2019