On Ascend 910B, reading tables with PaddleOCR takes about 6 s #1453

@danyXu

Description

On Ascend 910B, reading a table with PaddleOCR takes about 6 s; on an NVIDIA GPU the same task takes only 0.8 s.
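
To make the 6 s vs. 0.8 s comparison easier to reproduce, it may help to time warm-up calls separately from steady-state calls. The following is only a minimal sketch: `time_calls` and `summarize` are hypothetical helper names (not part of PaddleOCR), and the lambda workload is a stand-in that would need to be replaced by a call into the table pipeline.

```python
import statistics
import time


def time_calls(fn, warmup=2, repeats=5):
    """Run fn() `warmup` times untimed, then return seconds for `repeats` timed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return min / median / max of the timed samples, in seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with the real table-recognition call.
    samples = time_calls(lambda: sum(range(100_000)))
    print(summarize(samples))
```

Reporting the median of several timed calls on both devices would show whether the gap persists once any one-time initialization cost is excluded.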

Environment: cann80RC1-ubuntu20-paddleocr2.7.3-paddlepaddle(3.0.0.dev20240527)

Environment variables:

ENV FLAGS_npu_jit_compile=False
ENV FLAGS_npu_scale_aclnn=True
ENV CUSTOM_DEVICE_BLACK_LIST="set_value,set_value_with_tensor"
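
Since these flags are set in the image via `ENV`, it may be worth confirming that the Python process actually sees them before timing anything. A minimal sketch, assuming the values above are the intended ones; `EXPECTED_FLAGS` and `flag_mismatches` are hypothetical names introduced here for illustration:

```python
import os

# Values copied from the ENV lines in this issue.
EXPECTED_FLAGS = {
    "FLAGS_npu_jit_compile": "False",
    "FLAGS_npu_scale_aclnn": "True",
    "CUSTOM_DEVICE_BLACK_LIST": "set_value,set_value_with_tensor",
}


def flag_mismatches(env=None, expected=EXPECTED_FLAGS):
    """Return {name: (expected, actual)} for every flag whose value differs."""
    env = os.environ if env is None else env
    return {
        name: (want, env.get(name))
        for name, want in expected.items()
        if env.get(name) != want
    }


if __name__ == "__main__":
    # Prints nothing when every flag matches.
    for name, (want, got) in flag_mismatches().items():
        print(f"{name}: expected {want!r}, got {got!r}")
```

The check only compares strings in the process environment; it does not verify how the runtime interprets the flags.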

Steps to reproduce:

Command line:

paddleocr --image_dir tts --use_npu true --type=structure
The tts directory contains 2 images, both of which contain tables.


Code:

`
# NewPPStructure and is_use_npu() are not shown in this issue.
table_engine = NewPPStructure(
    show_log=True,
    lang='ch',
    recovery=False,
    det_limit_side_len=1920,
    use_npu=is_use_npu(),
    det_model_dir='ch_PP-OCRv4_det_infer',
    rec_model_dir='ch_PP-OCRv4_rec_infer',
    layout_model_dir='picodet_lcnet_x1_0_fgd_layout_cdla_infer',
    layout_dict_path='/opt/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddleocr/ppocr/utils/dict/layout_dict/layout_cdla_dict.txt',
    # table_char_dict_path='table_structure_dict_ch.txt',
    table_model_dir='ch_ppstructure_mobile_v2.0_SLANet_infer',
    layout_score_threshold=0.25,
    layout_nms_threshold=0.5)

# Extract table information
table_extract_num = 0
for b in final_layouts:
    if str(b.type).lower() == "table":
        x1, y1, x2, y2 = b.block.x_1, b.block.y_1, b.block.x_2, b.block.y_2
        # Run table recognition on a copy of the cropped table region
        res, table_time_dict = self.table_system(
            ori_im[y1:y2, x1:x2].copy(), return_ocr_result_in_table)
        b.text = res['html']
        # Fill the processed table region with white
        ori_im[y1:y2, x1:x2] = np.ones((y2-y1, x2-x1, 3), dtype=np.uint8)*255
        table_extract_num += 1
    else:
        b.text = []

get_logger().info("extract table nums:{}".format(table_extract_num))
`
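
The loop above receives a `table_time_dict` from every `self.table_system(...)` call but never uses it. Summing those dicts across tables could show which stage accounts for most of the 6 s. This is a rough sketch that assumes each dict maps a stage name to seconds; `sum_time_dicts` and `slowest_stage` are hypothetical helpers, and the stage names and numbers in the usage part are made-up placeholders, not measurements from this issue.

```python
from collections import defaultdict


def sum_time_dicts(time_dicts):
    """Add up per-stage seconds across several {stage: seconds} dicts."""
    totals = defaultdict(float)
    for time_dict in time_dicts:
        for stage, seconds in time_dict.items():
            totals[stage] += seconds
    return dict(totals)


def slowest_stage(totals, exclude=("all",)):
    """Return the stage with the largest total, skipping aggregate keys in `exclude`."""
    stages = {k: v for k, v in totals.items() if k not in exclude}
    return max(stages, key=stages.get) if stages else None


if __name__ == "__main__":
    # Placeholder values for illustration only (not real measurements).
    collected = [
        {"stage_a": 0.5, "stage_b": 0.25, "all": 0.75},
        {"stage_a": 0.25, "stage_b": 1.0, "all": 1.25},
    ]
    totals = sum_time_dicts(collected)
    print(totals, slowest_stage(totals))
```

In the loop, each `table_time_dict` would be appended to a list and passed to `sum_time_dicts` after the loop finishes.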

Profiling
I ran profiling once. The log is attached below; the profiling data itself is large, so leave a comment if you need it and I will send it.
paddleocr_6.log
