## Introduction
Optical Character Recognition (OCR) is critical in Computer Vision and Artificial Intelligence. Its applications span a variety of sectors, including document digitization, automated data entry, and license plate recognition. Unlike typical image processing tasks, OCR identifies, extracts, and digitizes written or printed characters from images or documents. This technology bridges the gap between the physical text world and the digital realm, allowing computers to understand and utilize written text within images.

This guide shows you how to run PaddleOCR, a powerful OCR model, and deploy it using SnapML. You can read more about PaddleOCR in the [official repository](https://github.com/PaddlePaddle/PaddleOCR).

## Dependencies

### Python Inference
To test the model in our Python framework and convert it into a SnapML-compatible ONNX graph, we've prepared [this](https://github.com/opencv-ai/paddle-ocr) repository. You need to clone it and install dependencies.

In [None]:
!git clone https://github.com/opencv-ai/paddle-ocr
!cd paddle-ocr && pip install -r requirements.txt
!pip install git+https://github.com/daquexian/onnx-simplifier.git@v0.3.6 numpy==1.21.6 paddle2onnx gdown paddlepaddle==2.4.2 Pillow==9.5.0

## Training (optional)

We've prepared a separate notebook with instructions on launching the PaddleOCR training scripts. We didn't train the model ourselves; the notebook only reproduces the authors' instructions, so we can't guarantee that the results will match the authors' pre-trained model that we used. The training notebook can be found [here](https://drive.google.com/file/d/1K_QAvqBF-lzHtDZrHLYwjbbPj0UR0dVn/view?usp=drive_link).

## Export

We need to convert the model weights into ONNX format, both to run our Python pipeline and to use the model with SnapML. Our Python repository already includes the converted models, so you may skip these steps if you want to.

We used the pre-trained checkpoints. You need to either download them or train your own models. To prepare your own weights for conversion, follow [the instructions](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/inference_en.md#1-convert-training-model-to-inference-model) to save an inference checkpoint.

Launch the cell below to download pre-trained weights.

In [None]:
%cd paddle-ocr
!mkdir -p paddle_weights

!wget -P ./paddle_weights/ https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
!cd paddle_weights && tar -xf en_PP-OCRv3_det_infer.tar && rm -rf en_PP-OCRv3_det_infer.tar
!wget -P ./paddle_weights/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar
!cd paddle_weights && tar -xf en_number_mobile_v2.0_rec_infer.tar && rm -rf en_number_mobile_v2.0_rec_infer.tar
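
If you prefer to stay in Python rather than shell out to `wget` and `tar`, the same download step could be sketched with the standard library. The helper names (`archive_name`, `extracted_dir`, `fetch_and_extract`) are ours, not part of the repository, and the sketch assumes each archive unpacks into a folder named after itself, as the `--model_dir` paths used later suggest.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the download cell above.
WEIGHT_URLS = [
    "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar",
    "https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar",
]


def archive_name(url: str) -> str:
    """Return the file name at the end of the URL path."""
    return Path(urlparse(url).path).name


def extracted_dir(url: str, root: str = "paddle_weights") -> Path:
    """Folder we expect after extraction (assumption: named after the archive)."""
    return Path(root) / Path(archive_name(url)).stem


def fetch_and_extract(url: str, root: str = "paddle_weights") -> Path:
    """Download one archive into `root`, unpack it there, and delete the archive."""
    root_path = Path(root)
    root_path.mkdir(parents=True, exist_ok=True)
    archive_path = root_path / archive_name(url)
    urllib.request.urlretrieve(url, str(archive_path))
    with tarfile.open(archive_path) as tar:
        tar.extractall(root_path)
    archive_path.unlink()
    return extracted_dir(url, root)
```

Calling `fetch_and_extract(url)` for each entry in `WEIGHT_URLS` should leave the same two folders in `paddle_weights` as the shell commands do.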

Now you need to convert the weights into ONNX using Paddle2ONNX.

In [None]:
!paddle2onnx --model_dir ./paddle_weights/en_PP-OCRv3_det_infer \
             --model_filename inference.pdmodel \
             --params_filename inference.pdiparams \
             --save_file ./weights/source/en_PP-OCRv3_det_infer_fixed_shape.onnx \
             --enable_dev_version False \
             --input_shape_dict "{'x':[1,3,640,640]}" \
             --opset_version 11 \
             --enable_onnx_checker True

!paddle2onnx --model_dir ./paddle_weights/en_number_mobile_v2.0_rec_infer \
             --model_filename inference.pdmodel \
             --params_filename inference.pdiparams \
             --save_file ./weights/source/en_number_mobile_v2.0_rec_infer_fixed_shape.onnx \
             --enable_dev_version False \
             --input_shape_dict "{'x':[1,3,32,832]}" \
             --opset_version 11 \
             --enable_onnx_checker True
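
The two conversion commands differ only in the model name and the fixed input shape. As a rough sketch, they could be generated from a small table instead of being written out twice. The function and variable names here are hypothetical; the flags and values are copied from the cell above, and nothing beyond them is assumed about Paddle2ONNX.

```python
import subprocess

# Model folder name -> fixed input shape [N, C, H, W], as used in the cell above.
MODELS = {
    "en_PP-OCRv3_det_infer": [1, 3, 640, 640],
    "en_number_mobile_v2.0_rec_infer": [1, 3, 32, 832],
}


def paddle2onnx_command(model_name: str, input_shape: list) -> list:
    """Build the paddle2onnx argument list for one model."""
    shape = ",".join(str(dim) for dim in input_shape)
    return [
        "paddle2onnx",
        "--model_dir", f"./paddle_weights/{model_name}",
        "--model_filename", "inference.pdmodel",
        "--params_filename", "inference.pdiparams",
        "--save_file", f"./weights/source/{model_name}_fixed_shape.onnx",
        "--enable_dev_version", "False",
        "--input_shape_dict", f"{{'x':[{shape}]}}",
        "--opset_version", "11",
        "--enable_onnx_checker", "True",
    ]


def convert_all(models: dict = MODELS) -> None:
    """Run the conversion for every model; raises if any command fails."""
    for name, shape in models.items():
        subprocess.run(paddle2onnx_command(name, shape), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the `--input_shape_dict` value, and `check=True` stops the notebook at the first failed conversion.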

Let's simplify our models using onnxsim.

In [None]:
!python -m onnxsim ./weights/source/en_PP-OCRv3_det_infer_fixed_shape.onnx ./weights/source/en_PP-OCRv3_det_infer_fixed_shape_optimized.onnx
!python -m onnxsim ./weights/source/en_number_mobile_v2.0_rec_infer_fixed_shape.onnx ./weights/source/en_number_mobile_v2.0_rec_infer_fixed_shape_optimized.onnx

As the final step, we need to correct some operations to make the models compatible with SnapML.

In [None]:
!python ./weights/fix_detector.py --input_path ./weights/source/en_PP-OCRv3_det_infer_fixed_shape_optimized.onnx --output_path ./weights/changed/detector_v3.onnx

In [None]:
!python ./weights/fix_recognizer.py --input_path ./weights/source/en_number_mobile_v2.0_rec_infer_fixed_shape_optimized.onnx --output_path ./weights/changed/recognition_v1.onnx
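
We don't list here which operations the fix scripts change. If you are curious, one way to find out is to count the operator types in a graph before and after the fix and compare the counts. The sketch below is only an illustration with helper names of our own; it assumes the `onnx` package is installed and makes no claim about what `fix_detector.py` or `fix_recognizer.py` do internally.

```python
from collections import Counter


def count_op_types(graph) -> Counter:
    """Count operator types in an ONNX graph (anything with a `.node` list of `.op_type` items)."""
    return Counter(node.op_type for node in graph.node)


def diff_op_types(before: Counter, after: Counter) -> dict:
    """Return {op_type: (count_before, count_after)} for operators whose count changed."""
    ops = sorted(set(before) | set(after))
    return {op: (before[op], after[op]) for op in ops if before[op] != after[op]}


def compare_models(before_path: str, after_path: str) -> dict:
    """Load two ONNX files and report which operator counts differ."""
    import onnx  # imported lazily so the helpers above work without onnx installed

    before = count_op_types(onnx.load(before_path).graph)
    after = count_op_types(onnx.load(after_path).graph)
    return diff_op_types(before, after)
```

For example, `compare_models("./weights/source/en_PP-OCRv3_det_infer_fixed_shape_optimized.onnx", "./weights/changed/detector_v3.onnx")` would return the operators that were added, removed, or changed in number for the detector.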

## Testing

Let's run our ONNX pipeline on some images. You can run either text detection only (the `--run_detection` argument) or the entire pipeline with recognition (`--run_pipeline`). These two arguments are mutually exclusive, so pass only one of them.

In [None]:
!python inference.py --det_model_dir ./weights/changed/detector_v3.onnx \
                     --rec_model_dir ./weights/changed/recognition_v1.onnx \
                     -i ./images/test2.png \
                     --run_detection

In [None]:
from PIL import Image
Image.open("./images/results/test2.png")

In [None]:
!python inference.py --det_model_dir ./weights/changed/detector_v3.onnx \
                     --rec_model_dir ./weights/changed/recognition_v1.onnx \
                     -i ./images/test1.png \
                     --run_pipeline \
                     --rec_char_dict_path ./rec_char_dict/en_dict.txt

In [None]:
Image.open("./images/results/test1.png")
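
The two runs above share most of their arguments. If you call `inference.py` from your own scripts, a thin wrapper like the sketch below can pick exactly one mode per call. The wrapper and its `mode` parameter are hypothetical; the flags and paths are copied from the cells above. Passing `--rec_char_dict_path` only in pipeline mode mirrors those cells and is an assumption about what the script needs.

```python
import subprocess

# Paths copied from the inference cells above.
DET_MODEL = "./weights/changed/detector_v3.onnx"
REC_MODEL = "./weights/changed/recognition_v1.onnx"
CHAR_DICT = "./rec_char_dict/en_dict.txt"


def inference_command(image_path: str, mode: str) -> list:
    """Build the inference.py argument list for one image.

    mode is "detection" (boxes only) or "pipeline" (detection plus recognition).
    """
    if mode not in ("detection", "pipeline"):
        raise ValueError(f"mode must be 'detection' or 'pipeline', got {mode!r}")
    cmd = [
        "python", "inference.py",
        "--det_model_dir", DET_MODEL,
        "--rec_model_dir", REC_MODEL,
        "-i", image_path,
    ]
    if mode == "detection":
        cmd.append("--run_detection")
    else:
        cmd += ["--run_pipeline", "--rec_char_dict_path", CHAR_DICT]
    return cmd


def run_inference(image_path: str, mode: str) -> None:
    """Run inference.py and raise if it exits with an error."""
    subprocess.run(inference_command(image_path, mode), check=True)
```

Because the mode is a single value, the wrapper can never pass both flags at once.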

### Expectations

- Works well on frontal views where the text is horizontal and largely unobstructed. License plates, front-facing billboards, and book covers with horizontal text are good examples.
- Doesn't work well when the text is slanted, for example on road signs seen at an angle or on a book cover lying at an angle on a table.
- Doesn't work well on highly stylized or exaggerated text. If there are ornaments near the text, the model tends to capture them as well and tries to recognize them as special characters.
- Sometimes the model fails to recognize spaces, which makes the output confusing.
- Large blocks of small text are hit or miss.
