# DEEPX Tutorial 10 - PaddleOCR

This tutorial introduces the basic concepts of PaddleOCR and shows how to use **PP-OCRv5** with a DEEPX NPU.

**What You Will Learn**:
 - Understand what PaddleOCR is and the basic AI pipeline for OCR
 - The overall workflow for using PaddleOCR on a DEEPX NPU
 - How to improve OCR accuracy

>This tutorial is based on dx-all-suite v2.1.0, released in December 2025.

## 1. What is OCR?

**Optical Character Recognition** (OCR) is the technology that converts different types of documents (scanned paper documents, PDF files, or images captured by a digital camera) into editable and searchable data.

Think of it as giving "eyes" to your AI. It generally works in a two-step pipeline:
1. Text Detection: Locating where the text is in an image (drawing a box around it).
2. Text Recognition: Deciphering what the characters inside that box are.
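
The two steps above can be sketched as a small Python skeleton. This is only an illustration of the control flow: `run_ocr`, `fake_detect`, and `fake_recognize` are hypothetical names, and the two stand-in functions return hard-coded values instead of running real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class OcrResult:
    box: Box
    text: str


def run_ocr(image,
            detect: Callable[[object], List[Box]],
            recognize: Callable[[object, Box], str]) -> List[OcrResult]:
    """Step 1 locates text boxes; step 2 reads the characters in each box."""
    results = []
    for box in detect(image):
        results.append(OcrResult(box=box, text=recognize(image, box)))
    return results


# Hypothetical stand-ins for real detection and recognition models.
def fake_detect(image) -> List[Box]:
    return [(10, 20, 120, 32), (10, 70, 200, 32)]


def fake_recognize(image, box: Box) -> str:
    return f"text@{box[0]},{box[1]}"


ocr_results = run_ocr(image=None, detect=fake_detect, recognize=fake_recognize)
for result in ocr_results:
    print(result.box, result.text)
```

Keeping detection and recognition behind separate callables mirrors the pipeline: each stage can be swapped for a different model without touching the other.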

<img src="https://miro.medium.com/v2/resize:fit:1400/1*2hxwOTzkZQh6EDJDPj4_xg.png" style="max-width: 1000px;">


## 2. PaddleOCR

PaddleOCR is an ultra-lightweight, open-source OCR system developed by Baidu based on the PaddlePaddle framework.
- PP-OCRv5 (The Latest): The v5 release is optimized for even higher accuracy and speed compared to previous versions.
- Key Advantages:
  * Lightweight: It offers "server" models (high accuracy) and "mobile" models (tiny size, perfect for NPUs).
  * Multilingual: Supports over 80 languages.
  * Rich Toolset: Includes high-quality tools for data annotation and model training.

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/Arch.jpg" style="max-width: 1000px;">

## 3. AI workflow to use PaddleOCRv5 for DEEPX NPU

<img src="assets/ocr-workflow.jpg" style="max-width: 1000px;">

To run PaddleOCR on a DX NPU, the following four steps are required:

1. Download PaddleOCR ONNX model

2. Fix the dynamic input shape

3. Compile ONNX to *.dxnn for DX NPU

4. Implement OCR application with DEEPX-SDK

## 4. Step #1 - Download PaddleOCR ONNX models

PaddleOCRv5 uses three AI models internally as shown below:
1. **DET**: PP-OCRv5_det - text detection
2. **CLS**: Text Line Orientation Classification
3. **REC**: PP-OCRv5_rec - text recognition

You can download the following three ONNX models.

<img src="assets/algorithm_ppocrv5.png" style="max-width: 1000px;">

In [1]:
# Move to models path
import os
root_path = os.environ.get('ROOT_PATH')
%cd $root_path/notebooks/T10-PaddleOCR/models

/home/max/DEEPX-git/dx-tutorials/notebooks/T10-PaddleOCR/models


In [2]:
# Reference: https://github.com/jingsongliujing/OnnxOCR

# Download 'det' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/det/det.onnx"

# Download 'cls' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/cls/cls.onnx"

# Download 'rec' model
!wget --no-check-certificate "https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/rec/rec.onnx"

--2025-11-18 22:54:46--  https://github.com/jingsongliujing/OnnxOCR/raw/refs/heads/main/onnxocr/models/ppocrv5/det/det.onnx
Resolving github.com (github.com)... 20.200.245.247
Connecting to github.com (github.com)|20.200.245.247|:443... connected.
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/jingsongliujing/OnnxOCR/refs/heads/main/onnxocr/models/ppocrv5/det/det.onnx [following]
--2025-11-18 22:54:47--  https://raw.githubusercontent.com/jingsongliujing/OnnxOCR/refs/heads/main/onnxocr/models/ppocrv5/det/det.onnx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4748769 (4.5M) [application/octet-stream]
Saving to: ‘det.onnx’


2025-11-18 22:54:47 (88.5 MB/s) - ‘det.onnx’ saved

## 5. Step #2 - Fix the Dynamic Input Shape

In [3]:
!pip install -q onnxruntime

### 5.1. You can check the input shape of each downloaded ONNX model at [netron.app](https://netron.app/) or with the Python code below:

In [4]:
import onnxruntime as ort

def print_input_shape(model_path):
    session = ort.InferenceSession(model_path)
    print(f"---- Input Shapes of {model_path} ----")
    for input_meta in session.get_inputs():
        print(f"Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
        print(" " * 30)

print_input_shape("det.onnx")
print_input_shape("cls.onnx")
print_input_shape("rec.onnx")

---- Input Shapes of det.onnx ----
Name: x, Shape: ['DynamicDimension.0', 3, 'DynamicDimension.1', 'DynamicDimension.2'], Type: tensor(float)
                              
---- Input Shapes of cls.onnx ----
Name: x, Shape: ['p2o.DynamicDimension.0', 3, 'p2o.DynamicDimension.1', 'p2o.DynamicDimension.2'], Type: tensor(float)
                              
---- Input Shapes of rec.onnx ----
Name: x, Shape: ['DynamicDimension.0', 3, 48, 'DynamicDimension.1'], Type: tensor(float)
                              


As you can see, the `[Batch, Channel, Height, Width]` inputs contain dynamic dimensions, and these must be made static for the DEEPX NPU.
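
As a rough sketch, the dynamic axes can also be spotted programmatically. ONNX Runtime reports fixed dimensions as integers and symbolic ones as strings (or `None`), so anything that is not a positive integer is treated as dynamic here. The helper name `dynamic_axes` is hypothetical, and the shapes are copied from the output above.

```python
def dynamic_axes(shape):
    """Return the indices of dimensions that are not fixed positive integers."""
    return [
        index
        for index, dim in enumerate(shape)
        if not (isinstance(dim, int) and dim > 0)
    ]


# Shapes as printed for det.onnx and rec.onnx, plus one already-fixed shape.
det_shape = ["DynamicDimension.0", 3, "DynamicDimension.1", "DynamicDimension.2"]
rec_shape = ["DynamicDimension.0", 3, 48, "DynamicDimension.1"]
fixed_shape = [1, 3, 480, 480]

for name, shape in [("det", det_shape), ("rec", rec_shape), ("fixed", fixed_shape)]:
    print(name, dynamic_axes(shape))
```

An empty list means the model input is fully static.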

`ONNX Simplifier` (`onnxsim`) is a highly recommended way to fix dynamic shapes.

In [5]:
# Install ONNX Simplifier
!pip install -q onnxsim

In [6]:
# Use the '--overwrite-input-shape' option to turn a dynamic input shape into a static one
!onnxsim -h

usage: onnxsim [-h] [--enable-fuse-bn] [--skip-fuse-bn]
               [--skip-optimization [SKIP_OPTIMIZATION ...]]
               [--skip-constant-folding]
               [--input-shape INPUT_SHAPE [INPUT_SHAPE ...]]
               [--overwrite-input-shape OVERWRITE_INPUT_SHAPE [OVERWRITE_INPUT_SHAPE ...]]
               [--test-input-shape TEST_INPUT_SHAPE [TEST_INPUT_SHAPE ...]]
               [--skip-optimizer SKIP_OPTIMIZER [SKIP_OPTIMIZER ...]]
               [--skip-shape-inference] [--enable-onnxruntime-optimization]
               [--dynamic-input-shape]
               [--input-data-path INPUT_DATA_PATH [INPUT_DATA_PATH ...]]
               [--custom-lib CUSTOM_LIB] [--include-subgraph]
               [--unused-output UNUSED_OUTPUT [UNUSED_OUTPUT ...]]
               [--no-large-tensor [TENSOR_SIZE_THRESHOLD]]
               [--mutable-initializer] [--save-as-external-data] [-v]
               input_model output_model [check_n]

positional arguments:
  input_model           I

### 5.2. Fix input shape of TEXT Detection Model
The recommended width/height range is 320 to 1280, depending on your use case. This tutorial uses 480x480.
> Note: both width and height must be a **multiple of 32**.
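
If you want a different detection size, a small helper can clamp a requested value to the 320 to 1280 range and snap it to a multiple of 32. This is a minimal sketch: `snap_det_size` is a hypothetical name, and rounding to the nearest multiple (ties upward) is an assumption rather than a DEEPX requirement.

```python
def snap_det_size(size, multiple=32, lo=320, hi=1280):
    """Clamp size to [lo, hi], then round to the nearest multiple (ties go up)."""
    clamped = max(lo, min(hi, size))
    snapped = (clamped + multiple // 2) // multiple * multiple
    # lo and hi are themselves multiples of 32, so this final clamp is only a safeguard.
    return max(lo, min(hi, snapped))


for requested in (480, 500, 100, 5000):
    print(requested, "->", snap_det_size(requested))
```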

In [7]:
!onnxsim det.onnx det_fixed.onnx --overwrite-input-shape "x:1,3,480,480"

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃                    ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add                │ 117            │ 65               │
│ BatchNormalization │ 3              │ 1                │
│ Concat             │ 1              │ 1                │
│ Constant           │ 300            │ 232              │
│ Conv               │ 62             │ 62               │
│ ConvTranspose      │ 2              │ 2                │
│ GlobalAveragePool  │ 10             │ 10               │
│ HardSigmoid        │ 10             │ 10               │
│ HardSwish          │ 24             │ 24               │
│ Mul                │ 62             │ 62               │
│ Relu               │ 12             │ 12               

### 5.3. Fix input shape of Classification Model
The recommended input size is 48 (height) x 192 (width). These values are the standard for the PaddleOCR CLS model.

In [8]:
!onnxsim cls.onnx cls_fixed.onnx --overwrite-input-shape "x:1,3,48,192"

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃                    ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add                │ 26             │ 25               │
│ BatchNormalization │ 35             │ 0                │
│ Cast               │ 3              │ 0                │
│ Clip               │ 18             │ 18               │
│ Concat             │ 1              │ 0                │
│ Constant           │ 290            │ 112              │
│ Conv               │ 53             │ 53               │
│ Div                │ 18             │ 18               │
│ Gemm               │ 0              │ 1                │
│ GlobalAveragePool  │ 10             │ 10               │
│ HardSigmoid        │ 9 

### 5.4. Fix input shape of TEXT Recognition Model

Since the NPU requires fixed input shapes, we use six separate models with different aspect ratios to improve recognition accuracy.

For each case, we select and apply the model that best matches the ratio of the detected text.
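
One possible selection rule is sketched below. It scales the detected box to the model height of 48 and picks the narrowest model that is still wide enough, falling back to the widest one for very long text. The widths come from the `onnxsim` commands in this section, but `pick_rec_model` and the "narrowest model that fits" rule are assumptions for illustration; the tutorial's `main.py` may use a different rule.

```python
import bisect

REC_HEIGHT = 48
# (ratio label, input width) of each fixed-shape recognition model, sorted by width.
REC_MODELS = [(3, 120), (5, 240), (10, 480), (15, 720), (25, 1200), (35, 1920)]


def pick_rec_model(box_width, box_height):
    """Return (model file name, model input width) for a detected text box."""
    if box_width <= 0 or box_height <= 0:
        raise ValueError("box dimensions must be positive")
    # Width the crop would have after resizing it to the model height.
    scaled_width = box_width * REC_HEIGHT / box_height
    widths = [width for _, width in REC_MODELS]
    # First model at least as wide as the scaled crop; clamp to the widest model.
    index = min(bisect.bisect_left(widths, scaled_width), len(REC_MODELS) - 1)
    label, width = REC_MODELS[index]
    return f"rec_fixed_ratio_{label}.dxnn", width


for box in [(100, 48), (300, 30), (5000, 48)]:
    print(box, "->", pick_rec_model(*box))
```

Choosing a model that is at least as wide as the scaled crop avoids squeezing characters horizontally; the unused width would then need padding in preprocessing.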

<img src="assets/ocr-ratio.png" style="max-width: 1000px;">

In [9]:
!onnxsim rec.onnx rec_fixed_ratio_3.onnx --overwrite-input-shape "x:1,3,48,120"
!onnxsim rec.onnx rec_fixed_ratio_5.onnx --overwrite-input-shape "x:1,3,48,240"
!onnxsim rec.onnx rec_fixed_ratio_10.onnx --overwrite-input-shape "x:1,3,48,480"
!onnxsim rec.onnx rec_fixed_ratio_15.onnx --overwrite-input-shape "x:1,3,48,720"
!onnxsim rec.onnx rec_fixed_ratio_25.onnx --overwrite-input-shape "x:1,3,48,1200"
!onnxsim rec.onnx rec_fixed_ratio_35.onnx --overwrite-input-shape "x:1,3,48,1920"

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃                    ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add                │ 112            │ 79               │
│ AveragePool        │ 1              │ 1                │
│ BatchNormalization │ 6              │ 0                │
│ Concat             │ 3              │ 1                │
│ Constant           │ 342            │ 228              │
│ Conv               │ 38             │ 38               │
│ Div                │ 5              │ 5                │
│ GlobalAveragePool  │ 2              │ 2                │
│ HardSigmoid        │ 2              │ 2                │
│ HardSwish          │ 28             │ 28               │
│ MatMul             │ 13             │ 1
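
The six `onnxsim` calls for the recognition model differ only in the output name and the input width, so they can be generated from one table. The sketch below only builds and prints the command lines; `onnxsim_command` is a hypothetical helper, and the `subprocess` call is left commented out so that nothing runs unless you enable it.

```python
import subprocess  # only needed if you uncomment the run() call below

REC_HEIGHT = 48
# ratio label -> input width, as used in the cell above.
REC_WIDTHS = {3: 120, 5: 240, 10: 480, 15: 720, 25: 1200, 35: 1920}


def onnxsim_command(label, width, src="rec.onnx", input_name="x"):
    """Build the onnxsim argument list for one fixed-shape recognition model."""
    dst = f"rec_fixed_ratio_{label}.onnx"
    shape = f"{input_name}:1,3,{REC_HEIGHT},{width}"
    return ["onnxsim", src, dst, "--overwrite-input-shape", shape]


commands = [onnxsim_command(label, width) for label, width in REC_WIDTHS.items()]
for command in commands:
    print(" ".join(command))
    # subprocess.run(command, check=True)  # uncomment to actually run onnxsim
```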

### 5.5. Verify all fixed input shapes

In [10]:
import onnxruntime as ort

def print_input_shape(model_path):
    session = ort.InferenceSession(model_path)
    print(f"---- Input Shapes of {model_path} ----")
    for input_meta in session.get_inputs():
        print(f"Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
        print(" " * 30)

print_input_shape("det_fixed.onnx")
print_input_shape("cls_fixed.onnx")
print_input_shape("rec_fixed_ratio_3.onnx")
print_input_shape("rec_fixed_ratio_5.onnx")
print_input_shape("rec_fixed_ratio_10.onnx")
print_input_shape("rec_fixed_ratio_15.onnx")
print_input_shape("rec_fixed_ratio_25.onnx")
print_input_shape("rec_fixed_ratio_35.onnx")

---- Input Shapes of det_fixed.onnx ----
Name: x, Shape: [1, 3, 480, 480], Type: tensor(float)
                              
---- Input Shapes of cls_fixed.onnx ----
Name: x, Shape: [1, 3, 48, 192], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_3.onnx ----
Name: x, Shape: [1, 3, 48, 120], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_5.onnx ----
Name: x, Shape: [1, 3, 48, 240], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_10.onnx ----
Name: x, Shape: [1, 3, 48, 480], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_15.onnx ----
Name: x, Shape: [1, 3, 48, 720], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_25.onnx ----
Name: x, Shape: [1, 3, 48, 1200], Type: tensor(float)
                              
---- Input Shapes of rec_fixed_ratio_35.onnx ----
Name: x, Shape: [1, 3, 48, 192

## 6. Step #3 - Compile to *.dxnn

### 6.1. Compile TEXT Detection Model

In [11]:
%%writefile det.json
{
  "inputs": { "x": [1, 3, 480, 480] },
  "calibration_num": 100,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "det_dataset",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 480, "height": 480 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225] } },
      { "transpose": { "axis": [2, 0, 1] } },
      { "expandDim": { "axis": 0 } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing det.json
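
To make the `div` and `normalize` entries in the config concrete, here is the per-pixel arithmetic they describe, assuming they have their usual meaning: scale to [0, 1], then subtract the mean and divide by the standard deviation for each channel. The helper `normalize_pixel` is illustrative only and is not part of the DEEPX SDK.

```python
# Mean and std copied from det.json (the common ImageNet statistics).
DET_MEAN = (0.485, 0.456, 0.406)
DET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=DET_MEAN, std=DET_STD):
    """Apply (value / 255 - mean) / std to each channel of one RGB pixel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print("white:", [round(v, 3) for v in white])
print("black:", [round(v, 3) for v in black])

# With mean = std = 0.5 (as in the CLS and REC configs), values land in [-1, 1].
half = (0.5, 0.5, 0.5)
print(normalize_pixel((255, 0, 255), mean=half, std=half))
```

Calibration images should go through the same arithmetic as the images fed to the model at inference time, which is why the preprocessing list is part of the compiler config.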


In [12]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m det_fixed.onnx -c det.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: det_fixed.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.28model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.21s/model ]

### 6.2. Compile Orientation Classification Model

In [13]:
%%writefile cls.json
{
  "inputs": { "x": [1, 3, 48, 192] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 192, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing cls.json


In [14]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m cls_fixed.onnx -c cls.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: cls_fixed.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  9.03model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  1.02s/model ]

### 6.3. Compile TEXT Recognition Model (Ratio x2.5 / H:48 / W:120)

In [15]:
%%writefile rec_ratio_3.json
{
  "inputs": { "x": [1, 3, 48, 120] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 120, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_3.json


In [16]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_3.onnx -c rec_ratio_3.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_3.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.22model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.31s/model ]

### 6.4. Compile TEXT Recognition Model (Ratio x5 / H:48 / W:240)

In [17]:
%%writefile rec_ratio_5.json
{
  "inputs": { "x": [1, 3, 48, 240] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_5",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 240, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_5.json


In [18]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_5.onnx -c rec_ratio_5.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_5.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.22model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.29s/model ]

### 6.5. Compile TEXT Recognition Model (Ratio x10 / H:48 / W:480)

In [19]:
%%writefile rec_ratio_10.json
{
  "inputs": { "x": [1, 3, 48, 480] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_15",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 480, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_10.json


In [20]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_10.onnx -c rec_ratio_10.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_10.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.37model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.30s/model ]

### 6.6. Compile TEXT Recognition Model (Ratio x15 / H:48 / W:720)

In [21]:
%%writefile rec_ratio_15.json
{
  "inputs": { "x": [1, 3, 48, 720] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_15",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 720, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_15.json


In [22]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_15.onnx -c rec_ratio_15.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_15.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.31model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.32s/model ]

### 6.7. Compile TEXT Recognition Model (Ratio x25 / H:48 / W:1200)

In [23]:
%%writefile rec_ratio_25.json
{
  "inputs": { "x": [1, 3, 48, 1200] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_25",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 1200, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_25.json


In [24]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_25.onnx -c rec_ratio_25.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_25.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.09model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.29s/model ]

### 6.8. Compile TEXT Recognition Model (Ratio x35 / H:48 / W:1920)

In [25]:
%%writefile rec_ratio_35.json
{
  "inputs": { "x": [1, 3, 48, 1920] },
  "calibration_num": 80,
  "calibration_method": "ema",
  "default_loader": {
    "dataset_path": "rec_dataset/ratio_25",
    "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
    "preprocessings": [
      { "resize": { "width": 1920, "height": 48 } },
      { "convertColor": { "form": "BGR2RGB" } },
      { "div": { "x": 255 } },
      { "normalize": { "mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5] } },
      { "transpose": { "axis": [2, 0, 1] } }
    ]
  },
  "enhanced_scheme": { "DXQ-P0": { "alpha": 0.5 } }
}

Writing rec_ratio_35.json
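
The six recognition configs in this section share everything except the input width and the calibration dataset path, so they could also be generated from a single table instead of six `%%writefile` cells. The sketch below reproduces the same fields; `rec_config` and `write_rec_configs` are hypothetical helper names, and the values are copied from the cells in this section.

```python
import json
from pathlib import Path

# ratio label -> (input width, calibration dataset path), as used in this section.
REC_CONFIGS = {
    3: (120, "rec_dataset/ratio_5"),
    5: (240, "rec_dataset/ratio_5"),
    10: (480, "rec_dataset/ratio_15"),
    15: (720, "rec_dataset/ratio_15"),
    25: (1200, "rec_dataset/ratio_25"),
    35: (1920, "rec_dataset/ratio_25"),
}


def rec_config(width, dataset_path, height=48):
    """Build the compiler config dict for one fixed-shape recognition model."""
    return {
        "inputs": {"x": [1, 3, height, width]},
        "calibration_num": 80,
        "calibration_method": "ema",
        "default_loader": {
            "dataset_path": dataset_path,
            "file_extensions": ["jpeg", "jpg", "png", "JPEG"],
            "preprocessings": [
                {"resize": {"width": width, "height": height}},
                {"convertColor": {"form": "BGR2RGB"}},
                {"div": {"x": 255}},
                {"normalize": {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}},
                {"transpose": {"axis": [2, 0, 1]}},
            ],
        },
        "enhanced_scheme": {"DXQ-P0": {"alpha": 0.5}},
    }


def write_rec_configs(out_dir="."):
    """Write rec_ratio_<label>.json for every table entry and return the paths."""
    paths = []
    for label, (width, dataset_path) in REC_CONFIGS.items():
        path = Path(out_dir) / f"rec_ratio_{label}.json"
        path.write_text(json.dumps(rec_config(width, dataset_path), indent=2))
        paths.append(path)
    return paths


# Preview one config without touching the file system.
print(json.dumps(rec_config(*REC_CONFIGS[3]), indent=2))
```

Calling `write_rec_configs()` in the models directory would produce files with the same names as the `%%writefile` cells.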


In [26]:
!$root_path/dx-all-suite/dx-compiler/dx_com/dx_com/dx_com -m rec_fixed_ratio_35.onnx -c rec_ratio_35.json -o ./

[INFO] - Using optimization level 1. Compilation may take longer.
[INFO] - For faster compilation, consider using --opt_level 0 (may increase NPU latency).
[INFO] - Starting file validation...
[INFO] - File validation completed successfully
Compiling Model :   0%|                           | 0.0/1.0 [00:00<?, ?model/s ]
[INFO] - Starting ONNX import for: rec_fixed_ratio_35.onnx
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:00,  8.10model/s ]
Loading ONNX model:   0%|                               | 0/4 [00:00<?, ?step/s]
Loading ONNX model and Validating:   0%|                | 0/4 [00:00<?, ?step/s]
[INFO] - Converting model opset to target 21...
Compiling Model :   6%|█▏                 | 0.1/1.0 [00:00<00:01,  1.31s/model ]

Verify that all required *.dxnn files were compiled:

* **Expected results**:
   ```
   cls_fixed.dxnn		 rec_fixed_ratio_15.dxnn  rec_fixed_ratio_3.dxnn
   det_fixed.dxnn		 rec_fixed_ratio_25.dxnn  rec_fixed_ratio_5.dxnn
   rec_fixed_ratio_10.dxnn  rec_fixed_ratio_35.dxnn
   ```

In [27]:
!ls *.dxnn

cls_fixed.dxnn		 rec_fixed_ratio_15.dxnn  rec_fixed_ratio_3.dxnn
det_fixed.dxnn		 rec_fixed_ratio_25.dxnn  rec_fixed_ratio_5.dxnn
rec_fixed_ratio_10.dxnn  rec_fixed_ratio_35.dxnn
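
The same check can be done in Python, which makes it easy to report exactly which file is missing. This is a small sketch using only the standard library; `missing_models` is a hypothetical helper, and the file names are the eight listed above.

```python
from pathlib import Path

# The eight compiled models expected after section 6.
EXPECTED_DXNN = ["det_fixed.dxnn", "cls_fixed.dxnn"] + [
    f"rec_fixed_ratio_{label}.dxnn" for label in (3, 5, 10, 15, 25, 35)
]


def missing_models(model_dir="."):
    """Return the expected .dxnn files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_DXNN if not (root / name).is_file()]


missing = missing_models(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required *.dxnn files are present.")
```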


## 7. Step #4 - Implement OCR Application

Detected text regions come in many shapes (dynamic aspect ratios), so a single fixed input size is not enough. Instead, the ratio range is split into segments, and each segment uses a model with its own input shape.

The pipeline illustrated below is one way to structure OCR on an NPU.

<img src="assets/ocr-npu-pipeline.jpg" style="max-width: 1000px;">

In [34]:
# Move to T10-PaddleOCR path
import os
root_path = os.environ.get('ROOT_PATH')
%cd $root_path/notebooks/T10-PaddleOCR

/home/max/DEEPX-git/dx-tutorials/notebooks/T10-PaddleOCR


In [29]:
!pip install -r requirements.txt



In [30]:
!cd $root_path/dx-all-suite/dx-runtime/dx_rt/python_package && pip install .

Processing /home/max/DEEPX-git/dx-tutorials/dx-all-suite/dx-runtime/dx_rt/python_package
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: dx-engine
  Building wheel for dx-engine (pyproject.toml) ... done
  Created wheel for dx-engine: filename=dx_engine-1.1.2-cp312-cp312-linux_x86_64.whl size=191323 sha256=7fabdfe85fe1a8c7606e9160a86d2e082d4b021fcec17619770a828de68f46fe
  Stored in directory: /tmp/pip-ephem-wheel-cache-3vq8isk6/wheels/65/c7/da/46f14cfb17eea19325e5f5bc6ea2ae74b56b1420b6429e850d
Successfully built dx-engine
Installing collected packages: dx-engine
  Attempting uninstall: dx-engine
    Found existing installation: dx-engine 1.1.2
    Uninstalling dx-engine-1.1.2:
      Successfully uninstalled dx-engine-1.1.2
Successfully installed dx-engine-1.1.2


In [35]:
# Press 'q' to stop the app
!python3 main.py

^C


<img src="assets/paddleocr-result.png" style="max-width: 1000px;">