# DEEPX Tutorial 10 - PaddleOCR

This tutorial introduces the basic concepts of PaddleOCR and shows how to use **PP-OCRv5** with the DEEPX NPU.

**What You Will Learn**:
 - What PaddleOCR is and how a basic AI pipeline for OCR works
 - The overall workflow for using PaddleOCR on the DEEPX NPU
 - How to improve OCR accuracy

>This tutorial is based on dx-all-suite v2.1.0, released in December 2025.

## 1. What is OCR?

**Optical Character Recognition** (OCR) is the technology that converts different types of documents (scanned paper documents, PDF files, or images captured by a digital camera) into editable and searchable data.

Think of it as giving "eyes" to your AI. It generally works in a two-step pipeline:
1. Text Detection: Locating where the text is in an image (drawing a box around it).
2. Text Recognition: Deciphering what the characters inside that box are.
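
The two steps above can be sketched in a few lines of Python. This is only a toy illustration: `TextBox`, `crop`, `run_ocr`, and the two `fake_*` functions are hypothetical names, and the "image" is a plain list of rows so the sketch runs without any model or image library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of pixel values; a stand-in for a real image array


@dataclass(frozen=True)
class TextBox:
    """Axis-aligned text region in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def crop(image: Image, box: TextBox) -> List[List[int]]:
    """Cut the region described by `box` out of `image`."""
    return [list(row[box.left:box.right]) for row in image[box.top:box.bottom]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], List[TextBox]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[TextBox, str]]:
    """Step 1: locate text boxes. Step 2: read the characters inside each box."""
    return [(box, recognize(crop(image, box))) for box in detect(image)]


# Toy stand-ins for the detection and recognition models (assumptions for illustration).
def fake_detect(image: Image) -> List[TextBox]:
    return [TextBox(0, 0, 2, 1), TextBox(1, 1, 3, 2)]


def fake_recognize(patch: List[List[int]]) -> str:
    return "".join(chr(value) for row in patch for value in row)


toy_image = [[72, 105, 0], [0, 79, 75]]
for box, text in run_ocr(toy_image, fake_detect, fake_recognize):
    print(box, text)
```

In a real application, `detect` and `recognize` would wrap the detection and recognition models instead of these stand-ins.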

<img src="https://miro.medium.com/v2/resize:fit:1400/1*2hxwOTzkZQh6EDJDPj4_xg.png" style="max-width: 1000px;">


## 2. PaddleOCR

PaddleOCR is an ultra-lightweight, open-source OCR system developed by Baidu based on the PaddlePaddle framework.
- PP-OCRv5 (The Latest): The v5 release is optimized for even higher accuracy and speed compared to previous versions.
- Key Advantages:
  * Lightweight: It offers "server" models (high accuracy) and "mobile" models (tiny size, perfect for NPUs).
  * Multilingual: Supports over 80 languages.
  * Rich Toolset: Includes high-quality tools for data annotation and model training.

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/Arch.jpg" style="max-width: 1000px;">

## 3. AI workflow to use PaddleOCRv5 for DEEPX NPU

<img src="assets/ocr-workflow.jpg" style="max-width: 1000px;">

To run PaddleOCR on the DX NPU, the following four steps are required:

1. Download PaddleOCR ONNX model

2. Fix the dynamic input shape

3. Compile ONNX to *.dxnn for DX NPU

4. Implement OCR application with DEEPX-SDK

## 4. Step #1 - Download PaddleOCR ONNX models

PaddleOCRv5 uses three AI models internally as shown below:
1. **DET**: PP-OCRv5_det - text detection
2. **CLS**: Text Line Orientation Classification
3. **REC**: PP-OCRv5_rec - text recognition

You can download the following three ONNX models.

<img src="assets/algorithm_ppocrv5.png" style="max-width: 1000px;">

In [None]:
# Move to models path
import os
root_path = os.environ.get('ROOT_PATH')
%cd $root_path/notebooks/T10-PaddleOCR/models

In [None]:
# Reference: https://github.com/jingsongliujing/OnnxOCR

# Download 'det' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/det/det.onnx"

# Download 'cls' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/cls/cls.onnx"

# Download 'rec' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/rec/rec.onnx"

## 5. Step #2 - Fix the Dynamic Input Shape

In [None]:
!pip install -q onnxruntime

### 5.1. Check the input shape of each downloaded ONNX model at [netron.app](https://netron.app/) or with the Python code below:

In [None]:
import onnxruntime as ort

def print_input_shape(model_path):
    session = ort.InferenceSession(model_path)
    print(f"---- Input Shapes of {model_path} ----")
    for input_meta in session.get_inputs():
        print(f"Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
        print(" " * 30)

print_input_shape("det.onnx")
print_input_shape("cls.onnx")
print_input_shape("rec.onnx")

As you can see, the inputs have dynamic shapes for `[Batch, Channel, Height, Width]`. The DEEPX NPU requires static input shapes, so these dynamic shapes must be fixed.
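
To spot dynamic axes programmatically, a small helper can scan the shape list that `input_meta.shape` returns. The sketch below assumes that a dynamic axis appears as anything other than a positive integer (for example a symbolic name or `None`); `dynamic_axes` is a hypothetical helper and the symbolic names in the example are made up.

```python
def dynamic_axes(shape):
    """Return the indices of axes that are not fixed positive integers."""
    return [
        index
        for index, dim in enumerate(shape)
        if not (isinstance(dim, int) and dim > 0)
    ]


# Example shapes: the symbolic axis names are placeholders, not real model output.
print(dynamic_axes(["batch", 3, "height", "width"]))
print(dynamic_axes([1, 3, 480, 480]))
```

An empty list means every axis is already static.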

`ONNX Simplifier` (`onnxsim`) is a highly recommended way to fix a dynamic shape.

In [None]:
# Install ONNX Simplifier
!pip install -q onnxsim

In [None]:
# The '--overwrite-input-shape' option turns a dynamic input shape into a static one
!onnxsim -h

### 5.2. Fix input shape of TEXT Detection Model
The recommended width/height range is 320 to 1280, depending on your use case. This tutorial uses 480x480.
> Note: the width and height must be a **multiple of 32**.
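
If you want a different detection size, the constraint above can be applied with a small helper. This is a minimal sketch: `snap_to_multiple` is a hypothetical name, and the defaults simply restate the 320 to 1280 range and the multiple of 32 mentioned in this section.

```python
def snap_to_multiple(size, multiple=32, lower=320, upper=1280):
    """Round `size` to the nearest multiple (ties round up), then clamp to [lower, upper]."""
    snapped = (size + multiple // 2) // multiple * multiple
    return max(lower, min(upper, snapped))


for requested in (480, 500, 100, 2000):
    print(requested, "->", snap_to_multiple(requested))
```

The resulting value can then be used for both the height and the width in `--overwrite-input-shape`.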

In [None]:
!onnxsim det.onnx det_fixed.onnx --overwrite-input-shape "x:1,3,480,480"

### 5.3. Fix input shape of Classification Model
This tutorial uses a height of 48 and a width of 192. These values are the standard for PaddleOCR CLS.

In [None]:
!onnxsim cls.onnx cls_fixed.onnx --overwrite-input-shape "x:1,3,48,192"

### 5.4. Fix input shape of TEXT Recognition Model

Since the NPU requires fixed input shapes, we use six separate models with different aspect ratios to improve recognition accuracy.

For each case, we select and apply the model that best matches the ratio of the detected text.

<img src="assets/ocr-ratio.png" style="max-width: 1000px;">

In [None]:
!onnxsim rec.onnx rec_fixed_ratio_3.onnx --overwrite-input-shape "x:1,3,48,120"
!onnxsim rec.onnx rec_fixed_ratio_5.onnx --overwrite-input-shape "x:1,3,48,240"
!onnxsim rec.onnx rec_fixed_ratio_10.onnx --overwrite-input-shape "x:1,3,48,480"
!onnxsim rec.onnx rec_fixed_ratio_15.onnx --overwrite-input-shape "x:1,3,48,720"
!onnxsim rec.onnx rec_fixed_ratio_25.onnx --overwrite-input-shape "x:1,3,48,1200"
!onnxsim rec.onnx rec_fixed_ratio_35.onnx --overwrite-input-shape "x:1,3,48,1920"
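
At inference time, each detected text box has to be routed to one of these six models. The sketch below shows one possible selection rule: scale the box to a height of 48 and pick the narrowest model that still fits. The rule itself is an assumption (the application in this tutorial may choose differently), and `select_rec_model` is a hypothetical name; the widths are taken from the commands above.

```python
REC_HEIGHT = 48

# Model file name -> fixed input width, in ascending order of width.
REC_WIDTHS = {
    "rec_fixed_ratio_3.dxnn": 120,
    "rec_fixed_ratio_5.dxnn": 240,
    "rec_fixed_ratio_10.dxnn": 480,
    "rec_fixed_ratio_15.dxnn": 720,
    "rec_fixed_ratio_25.dxnn": 1200,
    "rec_fixed_ratio_35.dxnn": 1920,
}


def select_rec_model(box_width, box_height):
    """Pick the narrowest model whose width fits the box after scaling to REC_HEIGHT."""
    if box_width <= 0 or box_height <= 0:
        raise ValueError("box dimensions must be positive")
    scaled_width = box_width * REC_HEIGHT / box_height
    for name, width in REC_WIDTHS.items():
        if scaled_width <= width:
            return name
    # Wider than every model: fall back to the widest one.
    return list(REC_WIDTHS)[-1]


print(select_rec_model(100, 40))
print(select_rec_model(200, 20))
```

Picking the narrowest fitting model keeps the amount of padding or stretching small, which is the motivation for using several aspect ratios.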

### 5.5. Verify all fixed input shapes

In [None]:
import onnxruntime as ort

def print_input_shape(model_path):
    session = ort.InferenceSession(model_path)
    print(f"---- Input Shapes of {model_path} ----")
    for input_meta in session.get_inputs():
        print(f"Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
        print(" " * 30)

print_input_shape("det_fixed.onnx")
print_input_shape("cls_fixed.onnx")
print_input_shape("rec_fixed_ratio_3.onnx")
print_input_shape("rec_fixed_ratio_5.onnx")
print_input_shape("rec_fixed_ratio_10.onnx")
print_input_shape("rec_fixed_ratio_15.onnx")
print_input_shape("rec_fixed_ratio_25.onnx")
print_input_shape("rec_fixed_ratio_35.onnx")

## 6. Step #3 - Compile to *.dxnn

### 6.1. Compile TEXT Detection Model

In [None]:
%%writefile det.json
{
  "inputs": { "x": [1, 3, 480, 480] },
  "calibration_num": 100,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "det_dataset",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 480, "height": 480 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225] } },
      { "transpose": { "axis": [2, 0, 1] } },
      { "expandDim": { "axis": 0 } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m det_fixed.onnx -c det.json -o ./
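
The `preprocessings` list in `det.json` reads as a pipeline applied in order. As a rough illustration, the NumPy sketch below mirrors those steps for an image that has already been resized to 480x480 (the `resize` step is left out to avoid an image library dependency). It is an interpretation of the config, not the compiler's actual loader code, and `preprocess_det` is a hypothetical name.

```python
import numpy as np

# Values copied from the "normalize" entry in det.json.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_det(bgr_image):
    """Turn an HxWx3 uint8 BGR image into a 1x3xHxW float32 tensor."""
    rgb = bgr_image[:, :, ::-1]                   # convertColor: BGR2RGB
    scaled = rgb.astype(np.float32) / 255.0       # div: x = 255
    normalized = (scaled - MEAN) / STD            # normalize: per-channel mean/std
    chw = normalized.transpose(2, 0, 1)           # transpose: HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])  # expandDim: add batch axis 0


dummy = np.zeros((480, 480, 3), dtype=np.uint8)
print(preprocess_det(dummy).shape)
```

The output shape matches the `inputs` entry of the config, `[1, 3, 480, 480]`.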

### 6.2. Compile Orientation Classification Model

In [None]:
%%writefile cls.json
{
  "inputs": { "x": [1, 3, 48, 192] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 192, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m cls_fixed.onnx -c cls.json -o ./

### 6.3. Compile TEXT Recognition Models (Ratio x2.5 / H:48 / W:120)

In [None]:
%%writefile rec_ratio_3.json
{
  "inputs": { "x": [1, 3, 48, 120] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 120, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_3.onnx -c rec_ratio_3.json -o ./

### 6.4. Compile TEXT Recognition Models (Ratio x5 / H:48 / W:240)

In [None]:
%%writefile rec_ratio_5.json
{
  "inputs": { "x": [1, 3, 48, 240] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 240, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_5.onnx -c rec_ratio_5.json -o ./

### 6.5. Compile TEXT Recognition Models (Ratio x10 / H:48 / W:480)

In [None]:
%%writefile rec_ratio_10.json
{
  "inputs": { "x": [1, 3, 48, 480] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_15",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 480, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_10.onnx -c rec_ratio_10.json -o ./

### 6.6. Compile TEXT Recognition Models (Ratio x15 / H:48 / W:720)

In [None]:
%%writefile rec_ratio_15.json
{
  "inputs": { "x": [1, 3, 48, 720] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_15",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 720, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_15.onnx -c rec_ratio_15.json -o ./

### 6.7. Compile TEXT Recognition Models (Ratio x25 / H:48 / W:1200)

In [None]:
%%writefile rec_ratio_25.json
{
  "inputs": { "x": [1, 3, 48, 1200] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_25",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 1200, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_25.onnx -c rec_ratio_25.json -o ./

### 6.8. Compile TEXT Recognition Models (Ratio x35 / H:48 / W:1920)

In [None]:
%%writefile rec_ratio_35.json
{
  "inputs": { "x": [1, 3, 48, 1920] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_25",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 1920, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

In [None]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_35.onnx -c rec_ratio_35.json -o ./
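
The six recognition configs differ only in the input width and the calibration dataset path. As an alternative to writing each file by hand, a small generator could produce them. This is a sketch under that observation: `REC_VARIANTS`, `build_rec_config`, and `write_rec_configs` are hypothetical names, and every value is copied from the cells above.

```python
import json
import os

# Config name -> (input width, calibration dataset path), as used in this section.
REC_VARIANTS = {
    "rec_ratio_3": (120, "rec_dataset/ratio_5"),
    "rec_ratio_5": (240, "rec_dataset/ratio_5"),
    "rec_ratio_10": (480, "rec_dataset/ratio_15"),
    "rec_ratio_15": (720, "rec_dataset/ratio_15"),
    "rec_ratio_25": (1200, "rec_dataset/ratio_25"),
    "rec_ratio_35": (1920, "rec_dataset/ratio_25"),
}


def build_rec_config(width, dataset_path, height=48):
    """Build one recognition config as a plain dictionary."""
    return {
        "inputs": {"x": [1, 3, height, width]},
        "calibration_num": 80,
        "calibration_method": "ema",
        "default_loader": {
            "dataset_path": dataset_path,
            "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
            "preprocessings": [
                {"resize": {"width": width, "height": height}},
                {"convertColor": {"form": "BGR2RGB"}},
                {"div": {"x": 255}},
                {"normalize": {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}},
                {"transpose": {"axis": [2, 0, 1]}},
            ],
        },
        "enhanced_scheme": {"DXQ-P0": {"alpha": 0.5}},
    }


def write_rec_configs(out_dir="."):
    """Write one JSON file per variant, e.g. rec_ratio_3.json."""
    for name, (width, dataset_path) in REC_VARIANTS.items():
        with open(os.path.join(out_dir, f"{name}.json"), "w") as handle:
            json.dump(build_rec_config(width, dataset_path), handle, indent=2)


print(json.dumps(build_rec_config(120, "rec_dataset/ratio_5")["inputs"]))
```

Calling `write_rec_configs()` would overwrite the JSON files created by the `%%writefile` cells, so it is not called here.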

Verify that all required *.dxnn files have been compiled:

* **Expected results**:
   ```
   cls_fixed.dxnn		 rec_fixed_ratio_15.dxnn  rec_fixed_ratio_3.dxnn
   det_fixed.dxnn		 rec_fixed_ratio_25.dxnn  rec_fixed_ratio_5.dxnn
   rec_fixed_ratio_10.dxnn  rec_fixed_ratio_35.dxnn
   ```

In [None]:
!ls *.dxnn
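
Instead of comparing the listing by eye, the same check can be scripted. This is a minimal sketch: `EXPECTED_MODELS` restates the expected results above, and `missing_outputs` is a hypothetical helper.

```python
import os

EXPECTED_MODELS = ["det_fixed.dxnn", "cls_fixed.dxnn"] + [
    f"rec_fixed_ratio_{ratio}.dxnn" for ratio in (3, 5, 10, 15, 25, 35)
]


def missing_outputs(present_files):
    """Return the expected model files that are absent from `present_files`."""
    present = set(present_files)
    return [name for name in EXPECTED_MODELS if name not in present]


missing = missing_outputs(os.listdir("."))
print("All models compiled." if not missing else f"Missing: {missing}")
```

Run it from the `models` directory, where the compiled files are written.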

## 7. Step #4 - Implement OCR Application

Text images come in many different shapes (dynamic aspect ratios), so a single fixed input size is not enough. Instead, the ratio range is split into segments, and each segment uses a recognition model with a different input shape.

The AI pipeline illustrated below is one possible design for OCR on an NPU.

<img src="assets/ocr-npu-pipeline.jpg" style="max-width: 1000px;">

In [None]:
# Move to T10-PaddleOCR path
import os
root_path = os.environ.get('ROOT_PATH')
%cd $root_path/notebooks/T10-PaddleOCR

In [None]:
!pip install -r requirements.txt

In [None]:
!cd $root_path/dx-all-suite/dx-runtime/dx_rt/python_package && pip install .

In [None]:
# Press 'q' to stop the app
!python3 main.py

<img src="assets/paddleocr-result.png" style="max-width: 1000px;">