Prediction API Error #270

Closed
Mageswaran1989 opened this issue Jul 1, 2021 · 12 comments

@Mageswaran1989

I used the CLI to train on the SROIE2019 dataset (the original images were preprocessed into line images) with:

calamari-train \
--device.gpus 0 \
--trainer.gen SplitTrain \
--trainer.gen.validation_split_ratio=0.2  \
--trainer.output_dir /data/model_output \
--trainer.epochs 25 \
--early_stopping.frequency=1 \
--early_stopping.n_to_go=3 \
--train.images /data/*.jpg
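
Before a run like the one above, it can help to confirm that every line image has a transcription next to it. The sketch below assumes calamari's default ground-truth naming, where `/data/foo.jpg` is paired with `/data/foo.gt.txt` in the same directory; `find_missing_gt` is a hypothetical helper, not part of calamari.

```python
from pathlib import Path
from typing import List


def find_missing_gt(image_glob: str, gt_extension: str = ".gt.txt") -> List[Path]:
    """List images matched by `image_glob` that have no transcription beside them.

    Assumes the pairing `name.jpg` -> `name.gt.txt` in the same directory
    (calamari's default gt extension; adjust if your setup differs).
    """
    pattern = Path(image_glob)
    missing: List[Path] = []
    # Only the last path component may contain wildcards, as in "/data/*.jpg".
    for image in sorted(pattern.parent.glob(pattern.name)):
        gt_file = image.parent / (image.stem + gt_extension)
        if not gt_file.is_file():
            missing.append(image)
    return missing


if __name__ == "__main__":
    for path in find_missing_gt("/data/*.jpg"):
        print(f"no ground truth for {path}")
```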

Training went smoothly; the logs are attached:
train.log

After training, I tried to load the model as described here, but I get the following error:

>>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/best.ckpt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 31, in from_checkpoint
    keras.models.load_model(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

I also tried loading the pretrained antiqua_historical model and got the same error:

>>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/antiqua_historical/0.ckpt')
/usr/local/lib/python3.8/dist-packages/paiargparse/dataclass_json_overrides.py:78: RuntimeWarning: `NoneType` object value of non-optional type tfaip_commit_hash detected when decoding CalamariScenarioParams.
  warnings.warn(f"`NoneType` object {warning}.", RuntimeWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 26, in from_checkpoint
    ckpt = SavedCalamariModel(checkpoint, auto_update=auto_update_checkpoints)
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 31, in __init__
    self.update_checkpoint()
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 56, in update_checkpoint
    self._single_upgrade()
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 88, in _single_upgrade
    update_model(self.dict, self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/migrations/version3_4to5.py", line 22, in update_model
    pred_model.load_weights(path + ".h5")
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2234, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 662, in load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'


@ChWick
Member

ChWick commented Jul 1, 2021

This might be related to TensorFlow. Can you post your TensorFlow and h5py versions? (TensorFlow should be >= 2.3.0.) I suspect a version mismatch.
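
One way to collect the requested versions from inside the same environment (or Docker image) that runs calamari is a short standard-library script. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["tensorflow", "h5py", "calamari-ocr", "pip"]).items():
        print(f"{name}: {ver or 'not installed'}")
```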

@Mageswaran1989
Author

tf.__version__
'2.4.2'

I am using Docker to build the model, with only calamari installed.

best.ckpt.zip

@ChWick
Member

ChWick commented Jul 1, 2021

I just set up a fresh venv (current calamari master) with Python 3.8:

virtualenv -p python3.8 venv
git clone https://github.com/Calamari-OCR/calamari.git
source venv/bin/activate
pip install -U pip
pip install -e calamari

I had no problems loading your model with the predict script (calamari-predict), using tensorflow==2.4.2 and h5py==2.10.0. Maybe your h5py version is already 3.x?
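
Background on the traceback: the Keras HDF5 loader in TensorFlow 2.4 calls `.decode('utf8')` on string attributes such as `keras_version` and `model_config`, which works when h5py hands them back as bytes (h5py 2.x) but fails when they arrive as `str`, as the traceback shows happens with h5py 3.x. That is why pinning h5py to 2.10.0 makes the error disappear. The sketch below only illustrates the incompatibility with a version-tolerant decoding helper; `as_text` is a hypothetical name, not a TensorFlow or h5py function.

```python
from typing import Union


def as_text(value: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return an HDF5 string attribute as str, whichever h5py major version read it.

    bytes-like values (h5py 2.x) are decoded; str values (h5py 3.x) pass through.
    Calling .decode() unconditionally is what breaks on str.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Example attribute values, one as bytes and one as str (illustrative only).
h5py2_style = b"2.4.0"
h5py3_style = "2.4.0"

try:
    h5py3_style.decode("utf8")  # what the TF 2.4 loader effectively does
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'decode'

print(as_text(h5py2_style), as_text(h5py3_style))
```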

@Mageswaran1989
Author

Yup... but I am only installing calamari via pip, with Python 3.8.5 (the default installed version).

h5py 3.3.0

@Mageswaran1989
Author

Dockerfile

FROM nvidia/cuda:11.1-cudnn8-runtime-ubuntu20.04 as runtime-image

ARG DEBIAN_FRONTEND=noninteractive
RUN ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZONE /etc/localtime && echo $CONTAINER_TIMEZONE > /etc/timezone
RUN mkdir -p /usr/share/man/man1/

RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get install --no-install-recommends --no-install-suggests -y \
    build-essential \
    curl \
    ca-certificates p11-kit \
    python3-dev \
    python3-distutils \
    python3-venv \
    openjdk-11-jre-headless \
    tesseract-ocr \
    libtesseract-dev \
    libpq-dev \
    python3-pip \
    libgl1-mesa-glx &&\
    apt clean && rm -rf /var/lib/apt/lists/*

RUN pip3 install calamari-ocr
COPY ops/docker/ocr/requirements.txt /
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r /requirements.txt
RUN ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.10
ENV LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64/:$LD_LIBRARY_PATH

@ChWick
Member

ChWick commented Jul 1, 2021

I suspect your pip version is outdated (< 21.x). Older pip versions do not resolve version conflicts automatically, which is why a newer h5py version gets installed. When I run pip install calamari-ocr in a venv, I get h5py==2.10.0.

Please try adding

RUN pip3 install -U pip setuptools

to your Dockerfile before you install calamari-ocr.
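
For reference, pip's newer dependency resolver, which takes every package's version constraints into account together, became the default in pip 20.3. A minimal standard-library sketch to run inside the image and check whether the installed pip is at least that new; `release_tuple` and `pip_resolves_conflicts` are hypothetical helper names, and the 20.3 threshold is the only fact they encode.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Turn '21.1.3' into (21, 1, 3); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pip_resolves_conflicts(pip_version: str) -> bool:
    """True if this pip uses the new dependency resolver by default (pip >= 20.3)."""
    return release_tuple(pip_version) >= (20, 3)


if __name__ == "__main__":
    try:
        current = version("pip")
    except PackageNotFoundError:
        current = None
    if current is None:
        print("pip is not installed in this environment")
    elif not pip_resolves_conflicts(current):
        print(f"pip {current} predates 20.3; run: pip3 install -U pip setuptools")
    else:
        print(f"pip {current} uses the new resolver by default")
```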

@Mageswaran1989
Author

Thanks a lot @ChWick

It's working with h5py==2.10.0, and the predictions are quite good too :)

However, keeping the predictor alive runs into an error:

>>> raw_predictor = predictor.raw().__enter()__  # you can also wrap the following lines in a `with`-block
  File "<stdin>", line 1
    raw_predictor = predictor.raw().__enter()__  # you can also wrap the following lines in a `with`-block
                                             ^
SyntaxError: invalid syntax

If that's OK with you, I can put together a small example that trains on SROIE2019 and predicts on whole images, using CRAFT for text detection and calamari for OCR.
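
A pipeline like that (a text detector such as CRAFT proposing regions, calamari reading each one) needs a step that cuts line images out of the page. A minimal NumPy sketch, assuming the detector output has already been reduced to axis-aligned boxes `(x_min, y_min, x_max, y_max)` in pixel coordinates; `crop_lines` and the box format are hypothetical, not part of CRAFT's or calamari's API. Sorting by top edge, then left edge, is only a simple reading-order heuristic.

```python
from typing import List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed format


def crop_lines(page: np.ndarray, boxes: Sequence[Box], pad: int = 2) -> List[np.ndarray]:
    """Cut one image per box out of `page` (H x W or H x W x C), in rough reading order."""
    height, width = page.shape[:2]
    crops: List[np.ndarray] = []
    # Sort top-to-bottom, then left-to-right.
    for x0, y0, x1, y1 in sorted(boxes, key=lambda b: (b[1], b[0])):
        # Pad slightly so glyph edges are not clipped, then clamp to the page.
        x0, y0 = max(x0 - pad, 0), max(y0 - pad, 0)
        x1, y1 = min(x1 + pad, width), min(y1 + pad, height)
        if x1 > x0 and y1 > y0:  # skip degenerate boxes
            crops.append(page[y0:y1, x0:x1])
    return crops
```

Each returned array is a view into the page, so copy it if the page buffer is reused before prediction.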

@ChWick
Member

ChWick commented Jul 1, 2021

Oops, this is a typo in the docs. It should be:

raw_predictor = predictor.raw().__enter__()
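
`__enter__()` is one half of Python's context-manager protocol: calling it by hand does what the first step of a `with` block does, and the matching `__exit__()` then has to be called once you are done. The sketch below uses a hypothetical stand-in class, `FakeRawPredictor`, because it only demonstrates the protocol; it does not reflect what calamari does inside `raw()`.

```python
from types import TracebackType
from typing import List, Optional, Type


class FakeRawPredictor:
    """Hypothetical stand-in for the object returned by predictor.raw()."""

    def __init__(self) -> None:
        self.alive = False

    def __enter__(self) -> "FakeRawPredictor":
        self.alive = True  # e.g. start expensive resources once
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        self.alive = False  # release resources; exceptions still propagate

    def __call__(self, line: str) -> str:
        if not self.alive:
            raise RuntimeError("enter the context first")
        return line.upper()  # placeholder for a real prediction


# Option 1: a with-block enters and exits automatically.
with FakeRawPredictor() as raw_predictor:
    results: List[str] = [raw_predictor(x) for x in ["a", "b"]]

# Option 2: keep it alive across calls by entering manually ...
raw_predictor = FakeRawPredictor().__enter__()
try:
    first = raw_predictor("c")
    second = raw_predictor("d")
finally:
    # ... and always exit explicitly when done.
    raw_predictor.__exit__(None, None, None)
```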

@Mageswaran1989
Author

>>> raw_predictor = predictor.raw().__enter__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Predictor' object has no attribute 'raw'

@ChWick
Member

ChWick commented Jul 1, 2021

Ah, this particular interface is rather new and not yet part of the current pip release; it is only on master and will be included in the next release.
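
Since `raw()` exists only on master at this point, code meant to run against both the pip release and master can check for it before use instead of failing with the `AttributeError` shown above. A minimal sketch; `has_raw_interface` and both stub classes are hypothetical, and the fallback branch stands for whatever prediction call the installed release provides.

```python
def has_raw_interface(predictor: object) -> bool:
    """True if `predictor` exposes a callable `raw()` method."""
    return callable(getattr(predictor, "raw", None))


class PipReleasePredictor:
    """Hypothetical stand-in for the pip release's Predictor (no raw())."""


class MasterPredictor:
    """Hypothetical stand-in for master's Predictor, which adds raw()."""

    def raw(self) -> "MasterPredictor":
        return self


for candidate in (PipReleasePredictor(), MasterPredictor()):
    if has_raw_interface(candidate):
        print(f"{type(candidate).__name__}: can use raw().__enter__()")
    else:
        print(f"{type(candidate).__name__}: fall back to the release's prediction API")
```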

@ChWick
Member

ChWick commented Jul 1, 2021

I can draft a quick minor release if you rely on this feature, though! Just let me know.

@Mageswaran1989
Author

I hope there is not much difference between the two methods; for now I am fine with the working version.

It would be good to have it in the pip release in general :) if it is not too much work. Thanks again.

@ChWick ChWick closed this as completed Sep 7, 2021