Empty new crate build fails on Linux #32

Closed
ecashin opened this issue Nov 9, 2022 · 5 comments

@ecashin

ecashin commented Nov 9, 2022

Hello. I'm a tesseract newbie just getting started, and adding tesseract as a dependency seems to cause a mismatch between tesseract's own dependencies: `tesseract-plumbing` fails to resolve several imports from `tesseract_sys`.

Steps to reproduce:

  • rustup update # (optional)
  • cargo new try_tesseract && cd try_tesseract
  • cargo add tesseract
  • cargo build

The build fails with the errors listed below.

```
   Compiling leptonica-plumbing v0.4.1
   Compiling tesseract-plumbing v0.6.1
error[E0432]: unresolved imports `self::tesseract_sys::TessBaseAPIGetAltoText`, `self::tesseract_sys::TessBaseAPIGetLSTMBoxText`, `self::tesseract_sys::TessBaseAPIGetTsvText`, `self::tesseract_sys::TessBaseAPIGetWordStrBoxText`
 --> /home/ecashin/.cargo/registry/src/github.com-1ecc6299db9ec823/tesseract-plumbing-0.6.1/src/tess_base_api.rs:5:74
  |
5 |     TessBaseAPIAllWordConfidences, TessBaseAPICreate, TessBaseAPIDelete, TessBaseAPIGetAltoText,
  |                                                                          ^^^^^^^^^^^^^^^^^^^^^^ no `TessBaseAPIGetAltoText` in the root
6 |     TessBaseAPIGetComponentImages, TessBaseAPIGetHOCRText, TessBaseAPIGetInputImage,
7 |     TessBaseAPIGetLSTMBoxText, TessBaseAPIGetSourceYResolution, TessBaseAPIGetTsvText,
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^                                   ^^^^^^^^^^^^^^^^^^^^^ no `TessBaseAPIGetTsvText` in the root
  |     |
  |     no `TessBaseAPIGetLSTMBoxText` in the root
8 |     TessBaseAPIGetUTF8Text, TessBaseAPIGetWordStrBoxText, TessBaseAPIInit2, TessBaseAPIInit3,
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no `TessBaseAPIGetWordStrBoxText` in the root
  |
help: a similar name exists in the module
  |
5 |     TessBaseAPIAllWordConfidences, TessBaseAPICreate, TessBaseAPIDelete, TessBaseAPIGetUTF8Text,
  |                                                                          ~~~~~~~~~~~~~~~~~~~~~~
help: a similar name exists in the module
  |
7 |     TessBaseAPIGetBoxText, TessBaseAPIGetSourceYResolution, TessBaseAPIGetTsvText,
  |     ~~~~~~~~~~~~~~~~~~~~~
help: a similar name exists in the module
  |
7 |     TessBaseAPIGetLSTMBoxText, TessBaseAPIGetSourceYResolution, TessBaseAPIGetUTF8Text,
  |                                                                 ~~~~~~~~~~~~~~~~~~~~~~
help: a similar name exists in the module
  |
8 |     TessBaseAPIGetUTF8Text, TessBaseAPIGetBoxText, TessBaseAPIInit2, TessBaseAPIInit3,
  |                             ~~~~~~~~~~~~~~~~~~~~~

For more information about this error, try `rustc --explain E0432`.
error: could not compile `tesseract-plumbing` due to previous error
```

@ccouzens
Collaborator

ccouzens commented Nov 9, 2022

Hello, thank you for using tesseract and for reporting this issue.

These functions all appear to come from the C API (`capi.h`), which is used to auto-generate the Rust bindings.

Can you run

```bash
# possibly adjust the path to tesseract/capi.h
cat /usr/include/tesseract/capi.h | grep TessBaseAPIGetAltoText
```

and see if the function is declared on your system?

Can you tell me the version of the tesseract library you have installed, and any other information that might be unusual about it?
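The `grep` above checks one symbol at a time. To check all four functions named in the build error at once, here is a minimal std-only Rust sketch. `missing_declarations` is a hypothetical helper, and the default header path is the Debian/Ubuntu location used in this thread, which may differ on other systems. It only looks for the identifiers in the header text; it does not parse C declarations.

```rust
use std::collections::HashSet;
use std::env;
use std::fs;

/// The C functions that tesseract-plumbing 0.6.1 failed to import in the build log.
const REQUIRED: [&str; 4] = [
    "TessBaseAPIGetAltoText",
    "TessBaseAPIGetLSTMBoxText",
    "TessBaseAPIGetTsvText",
    "TessBaseAPIGetWordStrBoxText",
];

/// Returns the entries of `wanted` that never appear as a whole identifier in `header`.
/// Whole-identifier matching avoids false hits such as `TessBaseAPIInit` inside `TessBaseAPIInit2`.
fn missing_declarations(header: &str, wanted: &[&str]) -> Vec<String> {
    let identifiers: HashSet<&str> = header
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .filter(|s| !s.is_empty())
        .collect();
    wanted
        .iter()
        .filter(|sym| !identifiers.contains(*sym))
        .map(|sym| (*sym).to_string())
        .collect()
}

fn main() {
    // Assumed default path; pass a different header path as the first argument.
    let path = env::args()
        .nth(1)
        .unwrap_or_else(|| "/usr/include/tesseract/capi.h".to_string());
    match fs::read_to_string(&path) {
        Ok(header) => {
            let missing = missing_declarations(&header, &REQUIRED);
            if missing.is_empty() {
                println!("{path}: all required functions are declared");
            } else {
                println!("{path}: missing {}", missing.join(", "));
            }
        }
        Err(err) => eprintln!("could not read {path}: {err}"),
    }
}
```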

@ecashin
Author

ecashin commented Nov 10, 2022


Hi. A colleague points out that tesseract-rs probably depends on a version 5 library, and I'm probably using a version of the library before version 5.

```
bash$ grep TessBaseAPIGetAltoText /usr/include/tesseract/capi.h
bash$ dpkg -S /usr/include/tesseract/capi.h
libtesseract-dev: /usr/include/tesseract/capi.h
bash$ dpkg -l | grep libtesseract-dev
ii  libtesseract-dev                              4.00~git2288-10f4998a-2                 amd64        Development files for the tesseract command line OCR tool
bash$ 
```

... and ...

```
bash$ grep -v JohnTukeyWasSmart /etc/*release*
/etc/lsb-release:DISTRIB_ID=LinuxMint
/etc/lsb-release:DISTRIB_RELEASE=19.3
/etc/lsb-release:DISTRIB_CODENAME=tricia
/etc/lsb-release:DISTRIB_DESCRIPTION="Linux Mint 19.3 Tricia"
/etc/os-release:NAME="Linux Mint"
/etc/os-release:VERSION="19.3 (Tricia)"
/etc/os-release:ID=linuxmint
/etc/os-release:ID_LIKE=ubuntu
/etc/os-release:PRETTY_NAME="Linux Mint 19.3"
/etc/os-release:VERSION_ID="19.3"
/etc/os-release:HOME_URL="https://www.linuxmint.com/"
/etc/os-release:SUPPORT_URL="https://forums.linuxmint.com/"
/etc/os-release:BUG_REPORT_URL="http://linuxmint-troubleshooting-guide.readthedocs.io/en/latest/"
/etc/os-release:PRIVACY_POLICY_URL="https://www.linuxmint.com/"
/etc/os-release:VERSION_CODENAME=tricia
/etc/os-release:UBUNTU_CODENAME=bionic
grep: /etc/upstream-release: Is a directory
bash$ 
```

At a glance it looks like the system doesn't have a version 5 library available from the default APT sources.

@bbenne10

Hi. I'm the colleague mentioned above.
We confirmed (by checking the source code of tesseract itself) that this is indeed a tesseract version problem.

I would recommend advertising the tesseract version necessary for building all of these related crates via documentation and updating this line to require a minimum version of something like 5.0 so that this confusion is avoided in the future. The new version could look something like the following:

    let pk = pkg_config::Config::new().atleast_version("5.0").probe("tesseract").unwrap();

As a bit of a workaround, I have published a working build (using nix flakes) here.
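The proposed `atleast_version` call leaves the actual comparison to the pkg-config tool, so a too-old library should fail in the build script with a version error instead of surfacing later as E0432. As a hedged illustration of the rule being enforced, here is a std-only sketch of a dotted-version check; `meets_minimum` is a hypothetical helper, and pkg-config's own comparison is more elaborate than this simplified numeric one.

```rust
use std::cmp::Ordering;

/// Leading numeric components of a dotted version: "4.1.0" -> [4, 1, 0].
/// Simplified: keeps only the leading digits of each component ("00~git" -> 0)
/// and stops at the first component that does not start with a digit.
fn numeric_parts(version: &str) -> Vec<u64> {
    let mut parts = Vec::new();
    for component in version.split('.') {
        let digits: String = component.chars().take_while(|c| c.is_ascii_digit()).collect();
        if digits.is_empty() {
            break;
        }
        // Only overflow can make this parse fail; saturate in that case.
        parts.push(digits.parse().unwrap_or(u64::MAX));
    }
    parts
}

/// True when `found` is at least `minimum`, comparing component by component
/// and treating missing trailing components as zero ("4.1" == "4.1.0").
fn meets_minimum(found: &str, minimum: &str) -> bool {
    let (a, b) = (numeric_parts(found), numeric_parts(minimum));
    for i in 0..a.len().max(b.len()) {
        let x = a.get(i).copied().unwrap_or(0);
        let y = b.get(i).copied().unwrap_or(0);
        match x.cmp(&y) {
            Ordering::Greater => return true,
            Ordering::Less => return false,
            Ordering::Equal => {}
        }
    }
    true
}

fn main() {
    for found in ["4.0.0", "4.1.0", "5.2.0"] {
        println!("{found} >= 4.1: {}", meets_minimum(found, "4.1"));
    }
}
```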

@ccouzens
Collaborator

ccouzens commented Nov 10, 2022

It looks to me like 4.1.0 should be fine

https://github.com/tesseract-ocr/tesseract/blob/4.1.0/src/api/capi.h#L411

But 4.0.0 is indeed missing the function 😢

> I would recommend advertising the tesseract version necessary for building all of these related crates via documentation and updating this line to require a minimum version of something like 5.0 so that this confusion is avoided in the future. The new version could look something like the following:
>
>     let pk = pkg_config::Config::new().atleast_version("5.0").probe("tesseract").unwrap();

Thank you. Will do.

Another approach would be for me to put the functions that use these missing functions behind a Rust feature flag. Do you think many other people will be on 4.0.0?
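A minimal sketch of what the feature-flag approach could look like, assuming a hypothetical Cargo feature named `tesseract_4_1` declared under `[features]` in `Cargo.toml`. The functions are stand-ins that return the name of the C function they would wrap instead of calling it, so the sketch compiles without tesseract installed. UTF-8 and hOCR output stay unconditional because those functions resolved fine against the 4.0 bindings in the build log.

```rust
// Hypothetical Cargo.toml entry:
// [features]
// tesseract_4_1 = []

/// Always compiled: these C functions exist in the 4.0 bindings.
pub fn utf8_text() -> &'static str {
    "TessBaseAPIGetUTF8Text"
}

pub fn hocr_text() -> &'static str {
    "TessBaseAPIGetHOCRText"
}

/// Only compiled with the hypothetical `tesseract_4_1` feature,
/// because these functions are missing from the 4.0.0 C API.
#[cfg(feature = "tesseract_4_1")]
pub fn alto_text() -> &'static str {
    "TessBaseAPIGetAltoText"
}

#[cfg(feature = "tesseract_4_1")]
pub fn tsv_text() -> &'static str {
    "TessBaseAPIGetTsvText"
}

/// The gated wrappers, or nothing when the feature is off.
#[cfg(feature = "tesseract_4_1")]
fn newer_functions() -> Vec<&'static str> {
    vec![alto_text(), tsv_text()]
}

#[cfg(not(feature = "tesseract_4_1"))]
fn newer_functions() -> Vec<&'static str> {
    Vec::new()
}

/// Every C function this build would wrap.
fn wrapped_functions() -> Vec<&'static str> {
    let mut names = vec![utf8_text(), hocr_text()];
    names.extend(newer_functions());
    names
}

fn main() {
    println!("wrapped: {:?}", wrapped_functions());
}
```

Every gated function doubles the build configurations that need testing, which is the maintenance cost bbenne10 raises in reply.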

> As a bit of a workaround, I have published a working build (using nix flakes) here.

I'm glad to hear you've got a workaround.

@bbenne10

Frankly, I think you have a decision to make. What version do you SUPPORT?

If you cannot support tesseract 4.x by testing against tesseract 4.x (which is fine!), I personally think it may be best to specify the minimum tested version as required in build.rs. I do not know how stable the tesseract API is between major versions, but the existence and continued shipping of versions 3, 4, and 5 suggest to me that providing sane APIs linked against more than one major version may be non-trivial.

Documenting the required minimum version and enforcing it in build.rs is enough, in my opinion. Feature flags potentially ask a lot of you, the developer, in maintenance costs and infrastructure to test and upkeep the promised features. Only you can make a decision about your time in relation to the API(s) in question.

This is a long-winded way of saying that I have NO IDEA how prevalent 4.x versions are.

ccouzens added a commit to ccouzens/tesseract-sys that referenced this issue Nov 10, 2022
antimatter15/tesseract-rs#32 (comment)

4.1 was released in July 2019 (>3 years ago). So most people should have
it. Except for people using their Linux distro's provided version, it is
unlikely anyone is on such an old version.

I've not added the restriction to the windows build, as I can't test it
so easily and because I don't expect it to be a problem for Windows users.

Tested with these commands (more or less)
```bash
podman container run --rm -it -v "$(pwd)/Documents:/var/Documents:Z" linuxmintd/mint19.3-amd64

apt-get update && apt-get dist-upgrade --assume-yes
apt-get install curl libtesseract-dev tesseract-ocr-eng pkg-config clang --assume-yes

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

cd /var/Documents/github.com/ccouzens/tesseract-sys
cargo clean
cargo test
```

It isn't actually true that tesseract-sys requires 4.1.0. But everything
that builds on top of it does. It is simplest to put the restriction
here.