Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calmari/1.0: Fix 1.0.x models for Python 3.11 #348

Open
mikegerber opened this issue Oct 23, 2023 · 9 comments
Open

calmari/1.0: Fix 1.0.x models for Python 3.11 #348

mikegerber opened this issue Oct 23, 2023 · 9 comments

Comments

@mikegerber
Copy link
Contributor

We have old 1.0.x models that wouldn't run using the Calamari 1.0.x branch on Python 3.11, as the replacements use regexen now considered invalid in Python 3.11:

re.error: global flags not at the start of the expression at position 3

E.g. in our 0.ckpt.json:

            {
              "old": "\\s+(?u)",
              "new": " ",
              "regex": true
            },
            {
              "old": "\\n(?u)",
              "regex": true
            },
            {
              "old": "^\\s+(?u)",
              "regex": true
            },
            {
              "old": "\\s+$(?u)",
              "regex": true
            }

The global (?u) regex flag needs to go in front. This script fixes it:
https://github.com/OCR-D/ocrd_calamari/blob/master/ocrd_calamari/fix_calamari1_model.py

The question is if you want this "upgrading" procedure to go into the 1.0 branch's modeling loading code?

(I haven't checked any other 1.0 models, but I am somewhat sure that these replacements weren't customized by us and came from Calamari itself.)

@mikegerber
Copy link
Contributor Author

Related issue in ocrd_calamari is here: OCR-D/ocrd_calamari#91

@andbue
Copy link
Member

andbue commented Oct 23, 2023

Hi @mikegerber, I've just made two commits to the 1.0 branch: the first is trying to fix the regex problem and the second to make all the tests run without warning. Could you please test if this works with ocrd_calamari?

@mikegerber
Copy link
Contributor Author

Hi @mikegerber, I've just made two commits to the 1.0 branch: the first is trying to fix the regex problem and the second to make all the tests run without warning. Could you please test if this works with ocrd_calamari?

Unfortunately this only got called for the default parameters, not the ones read from the model on disk. I've had another look and opened PR #349. That PR fixes the issue for me!

@mikegerber
Copy link
Contributor Author

(I have 2 other issues with 1.0.x - more NumPy noise and another small issue with noise in the output. If you want to release another 1.0.x version maybe wait a little bit, I still need to investigate if it's Calamari or ocrd_calamari.)

andbue pushed a commit that referenced this issue Oct 24, 2023
* Revert "Move global flags to the start of regular expressions (fix for issue #348)"

This reverts commit 13b544b.

* Move global flags to the start of regular expressions

Fix regex replacements for models where global flags were put at the end
of the pattern strings. These patterns are invalid as of Python 3.11.
@andbue
Copy link
Member

andbue commented Oct 24, 2023

Thanks a lot – I didn't have an old model so I was just guessing where to fix the regexes... If you have any other suggestions, I'll gladly include them in the 1.0.7 release!

@mikegerber
Copy link
Contributor Author

Yeah I should have linked our historic model so you can reproduce :)

If you need a working model for 1.0 in the future: https://qurator-data.de/calamari-models/GT4HistOCR/2019-12-11T11_10+0100/model.tar.xz

(only for old prints/Fraktur)

@mikegerber
Copy link
Contributor Author

If you have any other suggestions, I'll gladly include them in the 1.0.7 release!

I'll try to debug today!

@mikegerber
Copy link
Contributor Author

The other issues:

So I think you could release 1.0.7 when #350 is merged and this issue can be closed too :)

@mikegerber
Copy link
Contributor Author

Nevermind, there's still lots of DeprecationWarnings I'd like to take a look first (other than the ones @andbue thankfully already fixed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants