updata doc of infer #42

Merged
merged 6 commits into from
May 15, 2020
3 changes: 3 additions & 0 deletions README.md
@@ -41,6 +41,9 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode

# Predict all images in the directory specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"

# To run prediction on the CPU, use the following command
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/" --use_gpu=False
```
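For more ways to run combined text detection and recognition inference, see [Inference with the prediction engine](./doc/inference.md) in the documentation tutorials.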

Binary file added doc/imgs_en/img_10.jpg
Binary file added doc/imgs_en/img_11.jpg
Binary file added doc/imgs_en/img_195.jpg
Binary file added doc/imgs_results/det_res_2.jpg
Binary file added doc/imgs_results/det_res_img_10_db.jpg
Binary file added doc/imgs_results/det_res_img_10_east.jpg
Binary file added doc/imgs_results/img_10.jpg
Binary file added doc/imgs_words_en/.DS_Store
Binary file not shown.
Binary file added doc/imgs_words_en/word_10.png
Binary file added doc/imgs_words_en/word_116.png
Binary file added doc/imgs_words_en/word_19.png
Binary file added doc/imgs_words_en/word_201.png
Binary file added doc/imgs_words_en/word_308.png
Binary file added doc/imgs_words_en/word_336.png
Binary file added doc/imgs_words_en/word_401.png
Binary file added doc/imgs_words_en/word_461.png
Binary file added doc/imgs_words_en/word_52.png
Binary file added doc/imgs_words_en/word_545.png
138 changes: 121 additions & 17 deletions doc/inference.md
@@ -6,51 +6,155 @@ inference model (a model saved with fluid.io.save_inference_model)
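Models saved during training are checkpoint models; they store only the model parameters and are mostly used for tasks such as resuming training.
Compared with a checkpoint model, an inference model additionally stores the model's structure. It performs well in deployment and accelerated inference, is flexible and convenient, and is well suited to integration with real systems. For a more detailed introduction, see the [classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html) documentation. The sections below cover inference with the prediction engine for text detection, for text recognition, and for the two chained together, and also describe how to convert a checkpoint into an inference model.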


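## Text detection model inference

To convert a model saved during text detection training into an inference model, use the following command:
The following sections describe inference with the ultra-lightweight Chinese detection model, the DB text detection model, and the EAST text detection model. The default configuration is set up for DB text detection inference. Because the EAST and DB algorithms differ considerably, inference with EAST requires passing the corresponding parameters to adapt to the EAST text detection algorithm.

### 1. Ultra-lightweight Chinese detection model inference

To run inference with the ultra-lightweight Chinese detection model, execute the following command: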

```
python tools/export_model.py -c configs/det/det_db_mv3.yml -o Global.checkpoints="./output/best_accuracy" \
Global.save_inference_dir="./inference/det/"
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det/"
```

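The inference model is saved to $./inference/det/model$ and $./inference/det/params$.
Visualized text detection results are saved to the ./inference_results folder by default, with file names prefixed with 'det_res'. An example result is shown below: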

![](imgs_results/det_res_2.jpg)

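Use the saved inference model to predict on a single image:
The det_max_side_len parameter sets the maximum side length used when the detection algorithm normalizes the image size. If both the height and width of the image are smaller than det_max_side_len, the original image is used for prediction; otherwise the image is scaled proportionally so that its longer side equals that maximum before prediction. The default is det_max_side_len=960. If the input image has a high resolution and you want to predict at a higher resolution, execute the following command: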

```
python tools/infer/predict_det.py --image_dir="/demo.jpg" --det_model_dir="./inference/det/"
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det/" --det_max_side_len=1200
```
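
The resizing rule controlled by det_max_side_len can be sketched as follows. This is only a minimal illustration of the rule as documented here, not the code in predict_det.py: the function name `limit_side_len` is hypothetical, and the real preprocessing may apply further adjustments (for example, rounding the sides to a multiple required by the network) that are not shown.

```python
def limit_side_len(height, width, max_side_len=960):
    """Return the (height, width) an image would be resized to.

    Hypothetical helper that mirrors the documented rule only.
    """
    longer = max(height, width)
    if longer <= max_side_len:
        # Both sides already fit, so the original size is kept.
        return height, width
    # Scale both sides by the same ratio so the longer side equals the limit.
    ratio = max_side_len / float(longer)
    return int(round(height * ratio)), int(round(width * ratio))
```

Under this rule, an image that already fits within the limit is left untouched, while raising the limit (as with --det_max_side_len=1200 above) lets a large image keep more of its original resolution.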

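### 2. DB text detection model inference

First, convert the model saved during DB text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), use the following command to convert it: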

```
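# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt, or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.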

python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db"
```

To run DB text detection model inference, execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```

Visualized text detection results are saved to the ./inference_results folder by default, with file names prefixed with 'det_res'. An example result is shown below:

![](imgs_results/det_res_img_10_db.jpg)

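**Note**: The ICDAR2015 dataset contains only 1,000 training images and mainly targets English scenes, so the model above performs very poorly on Chinese text images.

### 3. EAST text detection model inference

First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), use the following command to convert it: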

```
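# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt, or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.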

python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
```

EAST text detection model inference requires setting the det_algorithm parameter to specify EAST as the detection algorithm. Execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
```
Visualized text detection results are saved to the ./inference_results folder by default, with file names prefixed with 'det_res'. An example result is shown below:

![](imgs_results/det_res_img_10_east.jpg)

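**Note**: In this repository, the NMS step in EAST post-processing is implemented in Python, so prediction is relatively slow. A C++ implementation would be noticeably faster.


## Text recognition model inference

To convert a model saved during text recognition training into an inference model, use the following command:
The following sections describe inference with the ultra-lightweight Chinese recognition model and with CTC-loss-based recognition models. **Inference with Attention-loss-based recognition models is still being debugged.** For Chinese text recognition, CTC-loss-based models are the recommended first choice; in practice, Attention-loss-based models have also been found to perform worse than CTC-loss-based ones.


### 1. Ultra-lightweight Chinese recognition model inference

To run inference with the ultra-lightweight Chinese recognition model, execute the following command: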

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/word_4.jpg" --rec_model_dir="./inference/rec/"
```

![](imgs_words/word_4.jpg)

After the command runs, the prediction for the image above (recognized text and score) is printed to the screen, for example:

Predicts of ./doc/imgs_words/word_4.jpg:['实力活力', 0.9504319]


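### 2. CTC-loss-based recognition model inference

We use STAR-Net as an example of CTC-loss-based recognition model inference. CRNN and Rosetta are used in a similar way; the recognition algorithm parameter rec_algorithm does not need to be set.

First, convert the model saved during STAR-Net text recognition training into an inference model. Taking the model based on the Resnet34_vd backbone and trained on the two synthetic English text recognition datasets MJSynth and SynthText
as an example ([model download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)), use the following command to convert it: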

```
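# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt, or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.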

python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet"
```

To run STAR-Net text recognition model inference, execute the following command:

```
python tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints="./output/best_accuracy" \
Global.save_inference_dir="./inference/rec/"
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_401.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
![](imgs_words_en/word_401.png)

After the command runs, the recognition result for the image above is:

Predicts of ./doc/imgs_words_en/word_401.png:['burgen', 0.9008867]

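**Note**: The model above follows the text recognition training and evaluation procedure of [DTRB](https://arxiv.org/abs/1904.01906), which differs from the training of the ultra-lightweight Chinese recognition model in two ways:

The inference model is saved to $./inference/rec/model$ and $./inference/rec/params$.
- The image resolution used for training differs. The model above was trained at a resolution of [3, 32, 100], whereas the Chinese model is trained at [3, 32, 320] to preserve recognition quality on long text. The inference program's default shape parameter is the resolution used for Chinese training, [3, 32, 320]. When running inference with the English model above, the recognition image shape must therefore be set through the rec_image_shape parameter.

Use the saved inference model to predict on a single image:
- The character list differs. The experiments in the DTRB paper cover only the 26 lowercase English letters and 10 digits, 36 characters in total. All uppercase and lowercase characters are converted to lowercase, and characters outside this list are ignored and treated as spaces. No character dictionary file is therefore supplied; the dictionary is instead generated by the code below, so the rec_char_type parameter must be set to "en" (English) at inference time.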

```
python tools/infer/predict_rec.py --image_dir="/demo.jpg" --rec_model_dir="./inference/rec/"
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```
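
How such a 36-character dictionary maps text to label indices can be sketched as follows. The names `build_char_dict` and `encode` are hypothetical and used only for illustration; this sketch simply drops characters outside the list, whereas the actual character operator in the repository may handle them differently.

```python
CHARACTER_STR = "0123456789abcdefghijklmnopqrstuvwxyz"


def build_char_dict(character_str=CHARACTER_STR):
    """Map each character to its index in the character list."""
    return {char: idx for idx, char in enumerate(character_str)}


def encode(text, char_dict):
    """Lowercase the text and keep only characters found in the dictionary."""
    return [char_dict[c] for c in text.lower() if c in char_dict]
```

Because uppercase input is lowercased before lookup, "Burgen" and "burgen" produce the same label sequence, which matches the lowercase-only prediction shown above.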

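## Combined text detection and recognition inference

Run combined text detection and recognition inference on the single image specified by $image_dir$:
### 1. Ultra-lightweight Chinese OCR model inference

When running prediction, use the image_dir parameter to specify the path to a single image or an image directory, det_model_dir to specify the path to the detection inference model, and rec_model_dir to specify the path to the recognition inference model. Visualized recognition results are saved to the ./inference_results folder by default.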

```
python tools/infer/predict_eval.py --image_dir="/Demo.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
```

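Run combined text detection and recognition inference on all images in the folder specified by $image_dir$:
After the command runs, the resulting recognition image is: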

![](imgs_results/2.jpg)

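### 2. Inference with other models

To try other detection or recognition algorithms, refer to the text detection and text recognition inference sections above and update the corresponding configuration and models. The command below uses EAST text detection with STAR-Net text recognition: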

```
python tools/infer/predict_eval.py --image_dir="/test_imgs/" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```

After the command runs, the resulting recognition image is:

![](imgs_results/img_10.jpg)
55 changes: 11 additions & 44 deletions tools/infer/predict_det.py
@@ -15,19 +15,17 @@
import utility
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import get_image_file_list
import cv2
from ppocr.data.det.east_process import EASTProcessTest
from ppocr.data.det.db_process import DBProcessTest
from ppocr.postprocess.db_postprocess import DBPostProcess
from ppocr.postprocess.east_postprocess import EASTPostPocess
from ppocr.utils.utility import get_image_file_list
from tools.infer.utility import draw_ocr
import copy
import numpy as np
import math
import time
import sys
import os


class TextDetector(object):
@@ -79,50 +77,22 @@ def order_points_clockwise(self, pts):
        rect = np.array([tl, tr, br, bl], dtype="float32")
        return rect

    def expand_det_res(self, points, bbox_height, bbox_width, img_height,
                       img_width):
        if bbox_height * 1.0 / bbox_width >= 2.0:
            expand_w = bbox_width * 0.20
            expand_h = bbox_width * 0.20
        elif bbox_width * 1.0 / bbox_height >= 3.0:
            expand_w = bbox_height * 0.20
            expand_h = bbox_height * 0.20
        else:
            expand_w = bbox_height * 0.1
            expand_h = bbox_height * 0.1

        points[0, 0] = int(max((points[0, 0] - expand_w), 0))
        points[1, 0] = int(min((points[1, 0] + expand_w), img_width))
        points[3, 0] = int(max((points[3, 0] - expand_w), 0))
        points[2, 0] = int(min((points[2, 0] + expand_w), img_width))

        points[0, 1] = int(max((points[0, 1] - expand_h), 0))
        points[1, 1] = int(max((points[1, 1] - expand_h), 0))
        points[3, 1] = int(min((points[3, 1] + expand_h), img_height))
        points[2, 1] = int(min((points[2, 1] + expand_h), img_height))
    def clip_det_res(self, points, img_height, img_width):
        for pno in range(4):
            points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
            points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1))
        return points

    def filter_tag_det_res(self, dt_boxes, image_shape):
        img_height, img_width = image_shape[0:2]
        dt_boxes_new = []
        for box in dt_boxes:
            box = self.order_points_clockwise(box)
            left = int(np.min(box[:, 0]))
            right = int(np.max(box[:, 0]))
            top = int(np.min(box[:, 1]))
            bottom = int(np.max(box[:, 1]))
            bbox_height = bottom - top
            bbox_width = right - left
            diffh = math.fabs(box[0, 1] - box[1, 1])
            diffw = math.fabs(box[0, 0] - box[3, 0])
            box = self.clip_det_res(box, img_height, img_width)
            rect_width = int(np.linalg.norm(box[0] - box[1]))
            rect_height = int(np.linalg.norm(box[0] - box[3]))
            if rect_width <= 10 or rect_height <= 10:
                continue
            # if diffh <= 10 and diffw <= 10:
            #     box = self.expand_det_res(
            #         copy.deepcopy(box), bbox_height, bbox_width, img_height,
            #         img_width)
            dt_boxes_new.append(box)
        dt_boxes = np.array(dt_boxes_new)
        return dt_boxes
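
The filtering step in this hunk clips each box to the image and then drops boxes that are too small. A minimal pure-Python sketch of that logic follows, assuming each box is a list of four (x, y) points already ordered top-left, top-right, bottom-right, bottom-left. The function names `clip_box` and `filter_boxes` are hypothetical; the NumPy-based code in the diff remains the actual implementation.

```python
import math


def clip_box(box, img_height, img_width):
    """Clamp every point of a 4-point box to the image bounds."""
    return [(int(min(max(x, 0), img_width - 1)),
             int(min(max(y, 0), img_height - 1))) for x, y in box]


def filter_boxes(boxes, img_height, img_width, min_size=10):
    """Keep only clipped boxes whose width and height both exceed min_size."""
    kept = []
    for box in boxes:
        box = clip_box(box, img_height, img_width)
        # Top edge gives the width, left edge gives the height.
        rect_width = int(math.hypot(box[1][0] - box[0][0],
                                    box[1][1] - box[0][1]))
        rect_height = int(math.hypot(box[3][0] - box[0][0],
                                     box[3][1] - box[0][1]))
        if rect_width <= min_size or rect_height <= min_size:
            continue
        kept.append(box)
    return kept
```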
@@ -153,8 +123,6 @@ def __call__(self, img):
        return dt_boxes, elapse


from tools.infer.utility import draw_text_det_res

if __name__ == "__main__":
    args = utility.parse_args()
    image_file_list = get_image_file_list(args.image_dir)
@@ -171,9 +139,8 @@ def __call__(self, img):
            total_time += elapse
        count += 1
        print("Predict time of %s:" % image_file, elapse)
        img_draw = draw_text_det_res(dt_boxes, image_file, return_img=True)
        save_path = os.path.join("./inference_det/",
                                 os.path.basename(image_file))
        print("The visualized image saved in {}".format(save_path))

    print("Avg Time:", total_time / (count - 1))
        src_im = utility.draw_text_det_res(dt_boxes, image_file)
        img_name_pure = image_file.split("/")[-1]
        cv2.imwrite("./inference_results/det_res_%s" % img_name_pure, src_im)
    if count > 1:
        print("Avg Time:", total_time / (count - 1))
9 changes: 6 additions & 3 deletions tools/infer/predict_rec.py
@@ -15,8 +15,8 @@
import utility
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import get_image_file_list
import cv2

import copy
import numpy as np
import math
@@ -30,6 +30,7 @@ def __init__(self, args):
            utility.create_predictor(args, mode="rec")
        image_shape = [int(v) for v in args.rec_image_shape.split(",")]
        self.rec_image_shape = image_shape
        self.character_type = args.rec_char_type
        char_ops_params = {}
        char_ops_params["character_type"] = args.rec_char_type
        char_ops_params["character_dict_path"] = args.rec_char_dict_path
@@ -38,7 +39,8 @@ def __init__(self, args):

    def resize_norm_img(self, img, max_wh_ratio):
        imgC, imgH, imgW = self.rec_image_shape
        imgW = int(32 * max_wh_ratio)
        if self.character_type == "ch":
            imgW = int(32 * max_wh_ratio)
        h = img.shape[0]
        w = img.shape[1]
        ratio = w / float(h)
@@ -102,7 +104,7 @@ def __call__(self, img_list):

if __name__ == "__main__":
    args = utility.parse_args()
    image_file_list = utility.get_image_file_list(args.image_dir)
    image_file_list = get_image_file_list(args.image_dir)
    text_recognizer = TextRecognizer(args)
    valid_image_file_list = []
    img_list = []
@@ -114,6 +116,7 @@ def __call__(self, img_list):
        valid_image_file_list.append(image_file)
        img_list.append(img)
    rec_res, predict_time = text_recognizer(img_list)
    rec_res, predict_time = text_recognizer(img_list)
    for ino in range(len(img_list)):
        print("Predicts of %s:%s" % (valid_image_file_list[ino], rec_res[ino]))
    print("Total predict time for %d images:%.3f" %
4 changes: 2 additions & 2 deletions tools/program.py
@@ -191,8 +191,8 @@ def build_export(config, main_prog, startup_prog):
func_infor = config['Architecture']['function']
model = create_module(func_infor)(params=config)
image, outputs = model(mode='export')
fetches_var = sorted([outputs[name] for name in outputs])
fetches_var_name = [name for name in fetches_var]
fetches_var_name = sorted([name for name in outputs])
fetches_var = [outputs[name] for name in fetches_var_name]
feeded_var_names = [image.name]
target_vars = fetches_var
return feeded_var_names, target_vars, fetches_var_name
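
The change in this hunk sorts the output names first and then looks up each variable by name, so the name list and the variable list stay aligned. A minimal sketch of that pattern follows, with a plain dict standing in for `outputs`; the helper name `ordered_fetches` and the example keys are made up for illustration.

```python
def ordered_fetches(outputs):
    """Sort output names first, then look up each value in that order."""
    names = sorted(outputs)
    values = [outputs[name] for name in names]
    return names, values


# Hypothetical outputs: strings stand in for the real fetch variables.
names, values = ordered_fetches({"maps": "var_maps", "f_score": "var_score"})
```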