Description
On an Ascend 910B, reading a table with PaddleOCR takes about 6 s, whereas an NVIDIA GPU needs only 0.8 s.
Environment: cann80RC1-ubuntu20-paddleocr2.7.3-paddlepaddle(3.0.0.dev20240527)
Environment variables:

    ENV FLAGS_npu_jit_compile=False
    ENV FLAGS_npu_scale_aclnn=True
    ENV CUSTOM_DEVICE_BLACK_LIST="set_value,set_value_with_tensor"
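
For anyone reproducing this outside the Docker image, the same settings can be applied from Python. This is only a sketch: the helper name `apply_npu_env` is hypothetical, and it assumes the flags are picked up from the process environment when `paddle` is imported, so it would have to run before that import.

```python
import os

# Values copied from the ENV lines in this report.
NPU_ENV = {
    "FLAGS_npu_jit_compile": "False",
    "FLAGS_npu_scale_aclnn": "True",
    "CUSTOM_DEVICE_BLACK_LIST": "set_value,set_value_with_tensor",
}


def apply_npu_env(env=os.environ):
    """Write the settings into `env` and return what was applied.

    Hypothetical helper; assumes the flags are read from the
    environment at import time of paddle.
    """
    for key, value in NPU_ENV.items():
        env[key] = value
    return {key: env[key] for key in NPU_ENV}


applied = apply_npu_env()
for key, value in applied.items():
    print(f"{key}={value}")

# Import paddle / paddleocr only after this point.
```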
Steps to reproduce:
Command line:
paddleocr --image_dir tts --use_npu true --type=structure
The tts directory contains 2 images, both of which contain tables.
Code:

    table_engine = NewPPStructure(
        show_log=True, lang='ch', recovery=False, det_limit_side_len=1920,
        use_npu=is_use_npu(),
        det_model_dir='ch_PP-OCRv4_det_infer',
        rec_model_dir='ch_PP-OCRv4_rec_infer',
        layout_model_dir='picodet_lcnet_x1_0_fgd_layout_cdla_infer',
        layout_dict_path='/opt/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddleocr/ppocr/utils/dict/layout_dict/layout_cdla_dict.txt',
        # table_char_dict_path='table_structure_dict_ch.txt',
        table_model_dir='ch_ppstructure_mobile_v2.0_SLANet_infer',
        layout_score_threshold=0.25,
        layout_nms_threshold=0.5)

    # Extract table information
    table_extract_num = 0
    for b in final_layouts:
        if str(b.type).lower() == "table":
            x1, y1, x2, y2 = b.block.x_1, b.block.y_1, b.block.x_2, b.block.y_2
            # Run table recognition on the cropped table region
            res, table_time_dict = self.table_system(
                ori_im[y1:y2, x1:x2].copy(), return_ocr_result_in_table)
            b.text = res['html']
            # Blank out the table region in the original image
            ori_im[y1:y2, x1:x2] = np.ones((y2-y1, x2-x1, 3), dtype=np.uint8)*255
            table_extract_num += 1
        else:
            b.text = []
    get_logger().info("extract table nums:{}".format(table_extract_num))
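
When comparing the 6 s and 0.8 s figures, it may help to separate first-call overhead from steady-state latency. The sketch below is not part of the reported code: `time_calls` and `fake_inference` are hypothetical names, and the stand-in workload would be replaced by a call into the real pipeline (for example a lambda wrapping `table_engine` on one image).

```python
import time
from statistics import mean


def time_calls(fn, warmup=1, repeats=5):
    """Call fn() warmup + repeats times and return two lists of
    wall-clock durations in seconds: (warmup_times, steady_times)."""
    warm = []
    for _ in range(warmup):
        start = time.perf_counter()
        fn()
        warm.append(time.perf_counter() - start)

    steady = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        steady.append(time.perf_counter() - start)
    return warm, steady


def fake_inference():
    # Stand-in workload (assumption); swap in the real pipeline call.
    return sum(i * i for i in range(10000))


warm, steady = time_calls(fake_inference, warmup=2, repeats=3)
print("warm-up runs (s):", [round(t, 4) for t in warm])
print("steady mean (s):", round(mean(steady), 4))
```

If the warm-up runs are much slower than the steady-state mean, the gap would point at one-time initialization rather than per-image inference cost.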
Profiling
I ran profiling once. The log is attached; the profiling data itself is large, so leave a comment if you need it and I will send it.
paddleocr_6.log

