AttributeError: module 'numpy' has no attribute 'str'. #87

Closed
mikegerber opened this issue Mar 9, 2023 · 10 comments
Labels: blocked, bug (Something isn't working)

Comments

@mikegerber
Collaborator

Using

I get this error:

18:00:28.245 INFO processor.CalamariRecognize - About to recognize 27 lines of region 'r13'
/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py:338: FutureWarning: In the future `np.str` will be defined as the corresponding NumPy scalar.
  [x / 255, len_x, np.zeros((len(x), 1), dtype=np.str)],
Traceback (most recent call last):
  File "/home/b-mg106/.pyenv/versions/ocrd_calamari/bin/ocrd-calamari-recognize", line 33, in <module>
    sys.exit(load_entry_point('ocrd-calamari', 'console_scripts', 'ocrd-calamari-recognize')())
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/b-mg106/devel/ocrd_calamari/ocrd_calamari/cli.py", line 13, in ocrd_calamari_recognize
    return ocrd_cli_wrap_processor(CalamariRecognize, *args, **kwargs)
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/ocrd/processor/helpers.py", line 121, in run_processor
    processor.process()
  File "/home/b-mg106/devel/ocrd_calamari/ocrd_calamari/recognize.py", line 121, in process
    for line, line_coords, raw_results in zip(textlines, line_coordss, raw_results_all):
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/calamari_ocr/ocr/predictor.py", line 250, in predict_raw
    for result in zip(*prediction):
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/calamari_ocr/ocr/predictor.py", line 166, in predict_raw
    for p, ip in zip(self.network.predict_raw(input_images), input_params):
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/calamari_ocr/ocr/backends/model_interface.py", line 62, in predict_raw
    for r in self.predict_raw_batch(*self.zero_padding(x)):
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 338, in predict_raw_batch
    [x / 255, len_x, np.zeros((len(x), 1), dtype=np.str)],
  File "/home/b-mg106/.pyenv/versions/3.9.16/envs/ocrd_calamari/lib/python3.9/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
@mikegerber mikegerber self-assigned this Mar 9, 2023
@mikegerber mikegerber added the bug Something isn't working label Mar 9, 2023
@mikegerber
Collaborator Author

The deprecation of np.str (and similar) has expired in NumPy 1.24.0; i.e. it now throws an error.

Until we can update to Calamari 2, I'll require NumPy < 1.24.0 as a workaround.
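For reference, the replacement that the NumPy error message recommends can be sketched as follows. This is a minimal illustration of the `dtype=str` spelling, not the actual Calamari patch; the helper name `make_string_placeholder` is hypothetical and only mirrors the `np.zeros((len(x), 1), dtype=np.str)` call from the traceback.

```python
import numpy as np


def make_string_placeholder(batch_size: int) -> np.ndarray:
    """Build a (batch_size, 1) array of empty strings.

    Hypothetical helper: it mirrors the call in the traceback but uses the
    builtin `str` instead of the removed `np.str` alias.
    """
    return np.zeros((batch_size, 1), dtype=str)


placeholder = make_string_placeholder(3)
print(placeholder.shape)       # (3, 1)
print(placeholder.dtype.kind)  # U (NumPy's unicode string kind)
```

Because `np.str` was only an alias for the builtin `str`, this spelling behaves the same on NumPy versions before and after 1.24.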

mikegerber added a commit that referenced this issue Mar 9, 2023
NumPy has deprecated np.str (etc.) since NumPy 1.20, and since Numpy
1.24 throws an error. We can't currently update to Calamari 2 (and fix
the problem there if necessary), so stick to NumPy 1.23.x for now.

#87
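The pin described in this commit amounts to a requirement along the lines of `numpy < 1.24.0`. As a rough sketch of the same boundary in code, assuming plain `major.minor.patch` version strings, the check below tells whether a given NumPy version has already removed the aliases. The function names are hypothetical and not part of ocrd_calamari.

```python
def parse_release(version: str) -> tuple:
    """Return the leading (major, minor) integers of a version like '1.23.5'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def numpy_has_removed_aliases(version: str) -> bool:
    """True if this NumPy version raises AttributeError for np.str and similar."""
    # The thread states that the deprecation expired in NumPy 1.24.0.
    return parse_release(version) >= (1, 24)


print(numpy_has_removed_aliases("1.23.5"))  # False
print(numpy_has_removed_aliases("1.24.0"))  # True
```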
@mikegerber
Collaborator Author

Managed to get it to work, using numpy 1.23.5. CircleCI is still failing.

@mikegerber
Collaborator Author

The filename of the created file seems to have changed in OCR-D, so the tests fail. I'm correcting it. (This is unrelated to the NumPy problem, but it illustrates why scheduled tests seem to be useful.)

@bertsky
Contributor

bertsky commented Aug 15, 2023

Note: on calamari/1.0 branch, this had already been fixed – perhaps we just need another release?

@mikegerber
Collaborator Author

> Note: on calamari/1.0 branch, this had already been fixed – perhaps we just need another release?

Thanks for making me aware of this open issue again! I will look into it!

Is there an urgent issue in ocrd_all with no working workaround?

@bertsky
Contributor

bertsky commented Aug 16, 2023

> > Note: on calamari/1.0 branch, this had already been fixed – perhaps we just need another release?
>
> Thanks for making me aware of this open issue again! I will look into it!

Also note Calamari-OCR/calamari#341, which hopefully will be enough to make ocrd_calamari 1.0.5 (before the workarounds) work out of the box.

> Is there an urgent issue in ocrd_all with no working workaround?

I would say that 2.x support is quite urgent (because most/best models are trained on 2.x). Given that Calamari 2.x now has good native PAGE support, this should actually be easy IIUC.

We have 2 workarounds for that:

  1. extracting line pairs via ocrd-segment-extract-lines, running the 2.x calamari-predict on them, and then re-importing with ocrd-segment-replace-text

     ocrd-segment-extract-lines -I $IGRP -O LINES
     calamari-predict --pipeline.num_processes 4 --checkpoint /path/to/\*.json --data.images "LINES/*.png"
     ocrd-segment-replace-text -I $IGRP -O $OGRP -P file_glob "LINES/*.pred.txt"
    
  2. running the 2.x calamari-predict on the PAGE files directly and then reimporting the resulting PAGE files into the METS via bulk-add

     # predict directly on the PAGE files; results are written next to the inputs as *.pred.xml
     calamari-predict --checkpoint /path/to/deep3_lsh4/\*.json --data PageXML \
       --data.xml_files "$IGRP/*.xml" --data.images "$IMGGRP/*.png" \
       --data.output_glyphs True --data.max_glyph_alternatives 5 --data.output_confidences True
     # move each *.pred.xml into $OGRP and register it in the METS
     ocrd workspace find -m application/vnd.prima.page+xml -G $IGRP -k page_id -k file_id -k url |
       while read page_id file_id url; do
         out=${url%.xml}.pred.xml
         file_id=${file_id//$IGRP/$OGRP}
         url=${url//$IGRP/$OGRP}
         url=${url//pred.}
         mv $out $url
         echo $page_id $file_id $url
       done |
       ocrd workspace bulk-add -r '(?P<pageid>.*) (?P<fileid>.*) (?P<url>.*)' -G $OGRP -g '{{ pageid }}' -i '{{ fileid }}' -S '{{ url }}' -
    

But in both cases we lose any information below the line level (including confidence), and we get no model provenance. Also, with these recipes we cannot use the regular specialised workflow formats.
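To make the renaming step of the second recipe easier to follow, here is a small Python sketch of what the shell loop computes for each PAGE file. It assumes the file group names are plain strings without glob characters, so that `str.replace` matches Bash's `${var//pattern/replacement}`. The function name and the example group names are illustrative, not taken from the thread.

```python
def plan_reimport(file_id: str, url: str, igrp: str, ogrp: str) -> tuple:
    """Mirror the shell loop: return (prediction path, new file ID, new URL)."""
    # out=${url%.xml}.pred.xml : path of the prediction file to move
    stem = url[: -len(".xml")] if url.endswith(".xml") else url
    pred_path = stem + ".pred.xml"
    # file_id=${file_id//$IGRP/$OGRP} : rename the METS file ID
    new_file_id = file_id.replace(igrp, ogrp)
    # url=${url//$IGRP/$OGRP}; url=${url//pred.} : target path in the output group
    new_url = url.replace(igrp, ogrp).replace("pred.", "")
    return pred_path, new_file_id, new_url


# Hypothetical file group names, for illustration only.
print(plan_reimport(
    "OCR-D-SEG-LINE_0001",
    "OCR-D-SEG-LINE/OCR-D-SEG-LINE_0001.xml",
    igrp="OCR-D-SEG-LINE",
    ogrp="OCR-D-OCR",
))
```

The first element is the file that `mv` picks up, and the other two are what the loop echoes (after the page ID) for `ocrd workspace bulk-add` to parse.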

@mikegerber
Collaborator Author

> I would say that 2.x support is quite urgent (because most/best models are trained on 2.x). Given that Calamari 2.x now has good native PAGE support, this should actually be easy IIUC.

Moving this to #61.

@bertsky
Contributor

bertsky commented Aug 18, 2023

> > Thanks for making me aware of this open issue again! I will look into it!
>
> Also note Calamari-OCR/calamari#341, which hopefully will be enough to make ocrd_calamari 1.0.5 (before the workarounds) work out of the box.

This has now happened: there is a new calamari-ocr==1.0.6 which already takes care of the NumPy and Protobuf problems, so you could rewrite ocrd_calamari to basically what we had before these workarounds and release an ocrd_calamari==1.0.6.post1 or so.
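One way to confirm which versions an environment actually has before dropping the workarounds is to query the installed distributions. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("calamari-ocr", "numpy"):
    print(name, installed_version(name) or "not installed")
```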

mikegerber added a commit that referenced this issue Oct 13, 2023
Older Calamari version had used deprecated np.str, remove the workaround
for this and require Calamari >= 1.0.6 instead.

See #87.
@mikegerber
Collaborator Author

Removing the numpy workaround now as it seems to be fixed in calamari-ocr indeed. I get another warning though using Python 3.11 and the latest numpy... Need to investigate.

❯ make test
[ ... ]
========================================== warnings summary ==========================================
../../.pyenv/versions/tmp.ocrd_calamari.issue-91/lib/python3.11/site-packages/numpy/core/getlimits.py:542
  /home/b-mg106/.pyenv/versions/tmp.ocrd_calamari.issue-91/lib/python3.11/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
  This warnings indicates broken support for the dtype!
    machar = _get_machar(dtype)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================== 4 passed, 1 warning in 267.15s (0:04:27) ==============================

@mikegerber
Collaborator Author

This issue can be closed. There's
