CV Suite Development Campaign - Return single-character coordinates from text recognition #10515

ToddBear · 2023-07-31T10:03:38Z

No description provided.

paddle-bot · 2023-07-31T10:03:43Z

Thanks for your contribution!

CLAassistant · 2023-07-31T10:03:44Z

All committers have signed the CLA.

shiyutang

Great work. Could you add the run commands and the output results? I'll take a closer look tomorrow.

shiyutang · 2023-07-31T12:17:06Z

ppocr/postprocess/rec_postprocess.py

@@ -64,10 +64,55 @@ def pred_reverse(self, pred):

        return ''.join(pred_re[::-1])

-    def add_special_char(self, dict_character):
+    def add_special_char(self, text, dict_character):


This extra parameter doesn't seem to be used?

Oh, that was a mistake; this function isn't used.

ToddBear · 2023-07-31T13:38:40Z

For English document recovery:

First, download the inference models:

cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of publaynet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
cd ..

Then run inference from the /ppstructure/ directory with the following command:

python predict_system.py \
    --image_dir=./docs/table/1.png \
    --det_model_dir=inference/en_PP-OCRv3_det_infer \
    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/en_dict.txt \
    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
    --vis_font_path=../doc/fonts/simfang.ttf \
    --recovery=True \
    --output=../output/ \
    --return_word_box=True

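View the visualization of the inference result at ../output/structure/1/show_0.jpg, as shown in the figure below:

For Chinese document recovery:

First, download the inference models: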

cd PaddleOCR/ppstructure

# download model
mkdir -p inference && cd inference
# Download the detection model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight Chinese table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of CDLA dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
cd ..

Upload the test image "2.png" below to the directory ./docs/table/

Then run inference from the /ppstructure/ directory with the following command:

python predict_system.py \
    --image_dir=./docs/table/2.png \
    --det_model_dir=inference/ch_PP-OCRv3_det_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \
    --vis_font_path=../doc/fonts/chinese_cht.ttf \
    --recovery=True \
    --output=../output/ \
    --return_word_box=True

View the visualization of the inference result at ../output/structure/2/show_0.jpg, as shown in the figure below:

shiyutang

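I left some comments; please see whether you want to make these changes to keep the code more concise.
I verified the results and found no problems.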

shiyutang · 2023-08-02T03:57:05Z

tools/infer/predict_rec.py

+            if self.postprocess_params['name'] == 'CTCLabelDecode':
+                rec_result = self.postprocess_op(preds, return_word_box=self.return_word_box)
+                ino_list = list(range(beg_img_no, end_img_no))
+                for rec_idx, rec in enumerate(rec_result):
+                    ino = ino_list[rec_idx]
+                    h, w = img_list[indices[ino]].shape[0:2]
+                    wh_ratio = w * 1.0 / h
+                    rec[2][0] = rec[2][0]*(wh_ratio/max_wh_ratio)
+            else:
+                rec_result = self.postprocess_op(preds)


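How about adding this to the `__call__` function of the CTCLabelDecode postprocess?

It's hard to move this part in there, because the computation depends on the current image's aspect ratio, and that information is only available at the predict_rec.py level.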
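The rescaling in this hunk can be looked at in isolation. The following is a minimal sketch, assuming that `rec[2][0]` holds the number of output columns for the padded batch input and that each crop is padded up to the batch's `max_wh_ratio`; the function name `effective_col_num` is hypothetical and does not appear in the PR.

```python
def effective_col_num(col_num, img_shape, max_wh_ratio):
    """Scale a padded column count down to the columns the crop itself covers.

    col_num      -- column count for the padded input (assumed meaning of rec[2][0])
    img_shape    -- (height, width, ...) of the original text crop
    max_wh_ratio -- largest width/height ratio in the batch
    """
    h, w = img_shape[0:2]
    wh_ratio = w * 1.0 / h
    return col_num * (wh_ratio / max_wh_ratio)


# A 32x160 crop (ratio 5) batched under max_wh_ratio 10 fills half the columns.
print(effective_col_num(80, (32, 160, 3), max_wh_ratio=10.0))  # 40.0
```

The sketch shows why the caller matters: `img_shape` and `max_wh_ratio` are known where the batch is built, not inside the decoder.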

shiyutang · 2023-08-02T04:06:17Z

ppocr/postprocess/rec_postprocess.py

        if isinstance(preds, tuple) or isinstance(preds, list):
            preds = preds[-1]
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
-        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
+        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True, return_word_box=return_word_box)
        if label is None:
            return text
        label = self.decode(label)


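I think the code in the prediction script could go here, since every CTCLabelDecode runs in the rec postprocess.

Adding it here would probably mean passing the image aspect ratio in as a parameter too, haha.

Could this be passed in via kwargs?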
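One way to read the kwargs suggestion is sketched below. This is an illustration under stated assumptions, not the code that was merged: the class `SketchCTCDecode`, its placeholder `decode`, and the keyword names `wh_ratio_list` and `max_wh_ratio` are all hypothetical.

```python
class SketchCTCDecode:
    """Hypothetical stand-in for a CTC postprocess; not the PaddleOCR class."""

    def decode(self, preds_idx, return_word_box=False):
        # Placeholder decode: one [text, score] result per sample. With
        # return_word_box, a mutable [col_num] list is appended as element 2.
        results = []
        for row in preds_idx:
            item = ["text", 1.0]
            if return_word_box:
                item.append([len(row)])
            results.append(item)
        return results

    def __call__(self, preds, label=None, return_word_box=False, **kwargs):
        # Pure-Python argmax over the class axis of (batch, columns, classes).
        preds_idx = [
            [max(range(len(col)), key=col.__getitem__) for col in sample]
            for sample in preds
        ]
        text = self.decode(preds_idx, return_word_box=return_word_box)
        if return_word_box:
            # Caller-side aspect-ratio information arrives through kwargs.
            wh_ratio_list = kwargs["wh_ratio_list"]
            max_wh_ratio = kwargs["max_wh_ratio"]
            for rec, wh_ratio in zip(text, wh_ratio_list):
                rec[2][0] = rec[2][0] * (wh_ratio / max_wh_ratio)
        return text


preds = [[[0.0] * 5 for _ in range(8)] for _ in range(2)]
out = SketchCTCDecode()(
    preds, return_word_box=True, wh_ratio_list=[2.0, 4.0], max_wh_ratio=4.0
)
print([rec[2][0] for rec in out])  # [4.0, 8.0]
```

With this shape the caller only collects the per-image ratios, and the `if`/`else` on the postprocess name in predict_rec.py is no longer needed.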

shiyutang · 2023-08-02T06:54:38Z

ppstructure/predict_system.py

+                                rec_word_info = rec_res[2]
+                                col_num, word_list, word_col_list, state_list = rec_word_info
+                                box = box.tolist()
+                                bbox_x_start = box[0][0]
+                                bbox_x_end = box[1][0]
+                                bbox_y_start = box[0][1]
+                                bbox_y_end = box[2][1]
+
+                                cell_width = (bbox_x_end - bbox_x_start)/col_num
+
+                                word_box_list = []
+                                word_box_content_list = []
+                                cn_width_list = []
+                                cn_col_list = []
+                                for word, word_col, state in zip(word_list, word_col_list, state_list):
+                                    if state == 'cn':
+                                        if len(word_col) != 1:
+                                            char_seq_length = (word_col[-1] - word_col[0] + 1) * cell_width
+                                            char_width = char_seq_length/(len(word_col)-1)
+                                            cn_width_list.append(char_width)
+                                        cn_col_list += word_col
+                                        word_box_content_list += word
+                                    else:
+                                        cell_x_start = bbox_x_start + int(word_col[0] * cell_width)
+                                        cell_x_end = bbox_x_start + int((word_col[-1]+1) * cell_width)
+                                        cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
+                                        word_box_list.append(cell)
+                                        word_box_content_list.append("".join(word))
+                                if len(cn_col_list) != 0:
+                                    if len(cn_width_list) != 0:
+                                        avg_char_width = np.mean(cn_width_list)
+                                    else:
+                                        avg_char_width = (bbox_x_end - bbox_x_start)/len(rec_str)
+                                    for center_idx in cn_col_list:
+                                        center_x = (center_idx+0.5)*cell_width
+                                        cell_x_start = max(int(center_x - avg_char_width/2), 0) + bbox_x_start
+                                        cell_x_end = min(int(center_x + avg_char_width/2), bbox_x_end-bbox_x_start) + bbox_x_start
+                                        cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
+                                        word_box_list.append(cell)
+


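I suggest abstracting this part into a function in the ppstructure utility to improve readability. Also, please add some comments to the code, for example noting that this part converts the recognition result into character-level position and content information.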
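A minimal sketch of what such a helper could look like follows. It mirrors the logic of the hunk above in pure Python; the name `word_boxes_from_rec`, its signature, and the `"en"` state label used in the example are assumptions (the diff only distinguishes `'cn'` from everything else).

```python
def word_boxes_from_rec(rec_str, box, rec_word_info):
    """Convert one recognition result into word/character boxes and contents.

    box           -- four (x, y) corners: top-left, top-right,
                     bottom-right, bottom-left
    rec_word_info -- (col_num, word_list, word_col_list, state_list)
    """
    col_num, word_list, word_col_list, state_list = rec_word_info
    x_start, x_end = box[0][0], box[1][0]
    y_start, y_end = box[0][1], box[2][1]
    cell_width = (x_end - x_start) / col_num  # width of one output column

    def make_cell(x0, x1):
        return ((x0, y_start), (x1, y_start), (x1, y_end), (x0, y_end))

    boxes, contents = [], []
    cn_widths, cn_cols = [], []
    for word, cols, state in zip(word_list, word_col_list, state_list):
        if state == "cn":
            # Chinese run: remember the columns, box each character later.
            if len(cols) != 1:
                span = (cols[-1] - cols[0] + 1) * cell_width
                cn_widths.append(span / (len(cols) - 1))
            cn_cols += cols
            contents += word
        else:
            # Any other run: one box from its first to its last column.
            x0 = x_start + int(cols[0] * cell_width)
            x1 = x_start + int((cols[-1] + 1) * cell_width)
            boxes.append(make_cell(x0, x1))
            contents.append("".join(word))
    if cn_cols:
        if cn_widths:
            avg_width = sum(cn_widths) / len(cn_widths)
        else:
            avg_width = (x_end - x_start) / len(rec_str)
        for col in cn_cols:
            # Center a box of average character width on the column.
            center = (col + 0.5) * cell_width
            x0 = max(int(center - avg_width / 2), 0) + x_start
            x1 = min(int(center + avg_width / 2), x_end - x_start) + x_start
            boxes.append(make_cell(x0, x1))
    return contents, boxes


box = [[0, 0], [100, 0], [100, 20], [0, 20]]
info = (10, [["a", "b"], ["c"]], [[1, 2], [6]], ["en", "en"])
print(word_boxes_from_rec("abc", box, info))
```

As in the hunk, boxes for Chinese characters are appended after all other boxes, so the order of contents and boxes can differ for lines that mix Chinese with other text.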

shiyutang · 2023-08-02T06:55:49Z

ppocr/postprocess/rec_postprocess.py

        if isinstance(preds, tuple) or isinstance(preds, list):
            preds = preds[-1]
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
-        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
+        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True, return_word_box=return_word_box)
        if label is None:
            return text
        label = self.decode(label)


Could this be passed in via kwargs?

shiyutang

LGTM

* modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py

* Update recognition_en.md (#10059) ic15_dict.txt only have 36 digits
* Update ocr_rec.h (#9469) It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.
* Add explanatory comment for num_classes (#10073) The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also set on Loss.num_classes. Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if the dictionary contains n fields (excluding other), the number of classes is 2n+1.
* Update algorithm_overview_en.md (#9747) Fix links to super-resolution algorithm docs
* Improve docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110)
* Update readme.md
* Update readme.md
* Update readme.md
* Update models_list.md
* trim trailling spaces @ `deploy/hubserving/readme_en.md`
* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`
* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`
* using Grammarly to weak `deploy/hubserving/readme_en.md`
* using Grammarly to tweak `doc/doc_en/models_list_en.md`
* `ocr_system` module will return with values of field `confidence`
* Update README_CN.md
* Fix the incorrect reference path for image-to-Base64 conversion in the test service. (#8334)
* Update application.md
* [Doc] Fix 404 link. (#10318)
* Update PP-OCRv3_det_train.md
* Update knowledge_distillation.md
* Update config.md
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file
* refactor get_image_file_list function
* Update customize.md (#10325)
* Update FAQ.md (#10345)
* Update FAQ.md (#10349)
* Don't break overall processing on a bad image (#10216)
* Add preprocessing common to OCR tasks (#10217) Add preprocessing to options
* [MLU] add mlu device for infer (#10249)
* Create newfeature.md
* Update newfeature.md
* remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502)
* CV Suite Development Campaign - Return single-character coordinates from text recognition (#10515)
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
* Update README_ch.md
* revert README_ch.md update
* Fixed Layout recovery README file (#10493) Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
* update_doc
* bugfix

---------

Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>

zelohon · 2023-10-26T17:29:37Z

Is C++ supported?

eternal-mind · 2023-11-08T09:32:16Z

Do these single-character coordinates work with the output of recognition models that use a transformer decoder, such as Parseq?

JerryZhang0111 · 2024-02-19T07:51:52Z

Could C++ be supported? Unlike Python, the C++ output has no position info, so it cannot give each character's position within the detection box.

JerryZhang0111 · 2024-02-29T06:02:24Z

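Could C++ be supported? Unlike Python, the C++ output has no position info, so it cannot give each character's position within the detection box.

I added some code by roughly following the Python version; it just about works, but it isn't very accurate.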

modification of return word box

f3c43b7

ToddBear changed the title ~~modification of return word box~~ CV Suite Development Campaign - Return single-character coordinates from text recognition Jul 31, 2023

shiyutang requested changes Jul 31, 2023

shiyutang mentioned this pull request Aug 1, 2023

🏅️PaddlePaddle Suite Happy Open Source Regular Season #10223

Closed

shiyutang reviewed Aug 2, 2023

shiyutang assigned shiyutang and ToddBear Aug 2, 2023

update_implements

92696e3

ToddBear force-pushed the return_word_box branch from bd477ac to 92696e3 Compare August 2, 2023 09:22

ToddBear added 2 commits August 2, 2023 18:57

Update rec_postprocess.py

8815277

Update utility.py

7a46024

shiyutang approved these changes Aug 2, 2023

shiyutang merged commit 1e11f25 into PaddlePaddle:release/2.6 Aug 2, 2023
1 check passed

ToddBear added a commit to ToddBear/PaddleOCR that referenced this pull request Aug 2, 2023

CV Suite Development Campaign - Return single-character coordinates from text recognition (PaddlePaddle#10515)

4555fd6

* modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py

shiyutang pushed a commit that referenced this pull request Aug 10, 2023

CV Suite Development Campaign - Return single-character coordinates from text recognition (#10515) (#10537)

b3f9f68

* modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py

ToddBear mentioned this pull request Sep 4, 2023

Does paddleocr currently support returning single-character coordinates after text recognition? #10815

Closed

bltcn mentioned this pull request Sep 19, 2023

Returning single-character coordinates from text recognition: this feature exists in version 2.6 but is missing from version 2.7 #10939

Open

shiyutang pushed a commit that referenced this pull request Oct 16, 2023

CV Suite Development Campaign - Return single-character coordinates from text recognition (#10515)

983b3e8

* modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py

shiyutang mentioned this pull request Oct 19, 2023

Collect Feature Request #10334

Open

aijun198600 mentioned this pull request Oct 27, 2023

[Feature] Could returning single-character coordinates after text recognition be supported? RapidAI/RapidOCR#135

Closed
