Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflicting and outdated requirements for calamari-ocr #294

Closed
stweil opened this issue Mar 8, 2022 · 12 comments
Closed

Conflicting and outdated requirements for calamari-ocr #294

stweil opened this issue Mar 8, 2022 · 12 comments

Comments

@stweil
Copy link
Collaborator

stweil commented Mar 8, 2022

ocrd_calamari requires calamari-ocr == 1.0.*, while ocrd_cis requires calamari_ocr == 0.3.5. That causes a conflict which is currently only avoided by using a sub venv headless-tf2 for ocrd_calamari. I'd like to get rid of that sub venv as it does not seem to be necessary otherwise.

In addition, both requirements are really outdated. The latest release is 2.1.4. It is desirable to use that.

@mikegerber
Copy link
Contributor

ocrd_calamari will be updated to support Calamari 2.1.x.

It will then still need TensorFlow 2, other processors need TensorFlow 1, so you can't get around it.

@bertsky
Copy link
Collaborator

bertsky commented Mar 17, 2022

ocrd_calamari requires calamari-ocr == 1.0.*, while ocrd_cis requires calamari_ocr == 0.3.5.

The latter is an artefact of the very early days (before ocrd_calamari even existed) and will likely be removed soon. See cisocrgroup/ocrd_cis#88

That causes a conflict which is currently only avoided by using a sub venv headless-tf2 for ocrd_calamari.

That interpretation of the situation is wrong. We need headless-tf2 because you cannot have TF2 and TF1 in the same venv. And besides ocrd_calamari, also ocrd_pc_segmentation and ocrd_anybaseocr depend on TF2.

In addition, both requirements are really outdated. The latest release is 2.1.4. It is desirable to use that.

As Mike already pointed out, this is already planned – see OCR-D/ocrd_calamari#61 – but is not an ocrd_all issue.

Can we close?

@stweil
Copy link
Collaborator Author

stweil commented Mar 17, 2022

That interpretation of the situation is wrong. We need headless-tf2 because you cannot have TF2 and TF1 in the same venv. And besides ocrd_calamari, also ocrd_pc_segmentation and ocrd_anybaseocr depend on TF2.

TF2 can simply be installed in the default venv. There is no need for an extra headless-tf2 venv as soon as the conflict is fixed, for example by removing ocrd_cis.

The issue can be closed after removing that outdated submodule.

@stweil
Copy link
Collaborator Author

stweil commented Mar 17, 2022

Maybe we can agree that submodules which use recent software versions should ideally be installed in the default venv.

Additional venvs should be avoided where possible. They cannot be avoided for submodules which use incompatible old software versions, so for example headless-tf1 is needed as long as submodules depend on Tensorflow 1.x

@mikegerber
Copy link
Contributor

If ocrd_cis requires calamari 0.3.5 then it should probably go to the environment supporting TF 1.x.

@bertsky
Copy link
Collaborator

bertsky commented Mar 17, 2022

for example by removing ocrd_cis.

The issue can be closed after removing that outdated submodule

What are you talking about? ocrd_cis is one of the most important modules we have.

TF2 can simply be installed in the default venv

TF in the main venv has brought us into mutual dependency hell with other modules in the past (numpy IIRC), probably does today, and certainly will in the future. Abandoning this long held precaution makes no sense, because you don't even save up much space (the large packages are TF and its dependents, not the stuff in the main venv).

Same is true for headless-torch14 BTW (which I saw you attempt to remove in some other branch).

@bertsky
Copy link
Collaborator

bertsky commented Mar 17, 2022

@mikegerber

If ocrd_cis requires calamari 0.3.5 then it should probably go to the environment supporting TF 1.x.

Yes, but that old version does not depend on TF though (only protobuf). So my take is still that this should be removed from ocrd_cis soon.

@stweil
Copy link
Collaborator Author

stweil commented Mar 17, 2022

What are you talking about? ocrd_cis is one of the most important modules we have.

I misunderstood your "will likely be removed soon". Then let's wait until for the updates of ocrd_cis and ocrd_calamari which hopefully will fix the conflict and allow closing this issue.

@bertsky
Copy link
Collaborator

bertsky commented Mar 17, 2022

Then let's wait until for the updates of ocrd_cis and ocrd_calamari which hopefully will fix the conflict and allow closing this issue.

I don't see any conflict. They reside in different venvs.

If you hope to achieve all-in-one (fat) builds without sub-venvs, then let me reiterate that this is fruitless: there will always be some conflicts around the corner in this evolving federated system. We'll need to manage sub-venvs as long as we have no other isolation mechanism (think: networking).

@mikegerber
Copy link
Contributor

If you hope to achieve all-in-one (fat) builds without sub-venvs, then let me reiterate that this is fruitless: there will always be some conflicts around the corner in this evolving federated system. We'll need to manage sub-venvs as long as we have no other isolation mechanism (think: networking).

100% agree with this. You'll always have conflicting dependencies and it's best to have a solution that works with this fact - separate venvs or containers.

@stweil
Copy link
Collaborator Author

stweil commented Mar 24, 2022

For the current ocrd_all submodules it is sufficient to have two separate venvs (one main venv, one sub-venv). That works without dependency conflicts for different versions of Ubuntu / Debian and Python3 and minimizes build time and storage requirements: https://github.com/stweil/ocrd_all/actions/runs/2029562843.

@bertsky
Copy link
Collaborator

bertsky commented Mar 18, 2023

fixed long ago

@bertsky bertsky closed this as completed Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants