Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CV套件建设专项活动 - 文字识别返回单字识别坐标 #10515

Merged
merged 4 commits into from
Aug 2, 2023

Conversation

ToddBear
Copy link
Collaborator

No description provided.

@paddle-bot
Copy link

paddle-bot bot commented Jul 31, 2023

Thanks for your contribution!

@CLAassistant
Copy link

CLAassistant commented Jul 31, 2023

CLA assistant check
All committers have signed the CLA.

@ToddBear ToddBear changed the title modification of return word box CV套件建设专项活动 - 文字识别返回单字识别坐标 Jul 31, 2023
Copy link
Collaborator

@shiyutang shiyutang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

很棒,可以补充下运行指令和输出结果。我明天再仔细看看~

@@ -64,10 +64,55 @@ def pred_reverse(self, pred):

return ''.join(pred_re[::-1])

def add_special_char(self, dict_character):
def add_special_char(self, text, dict_character):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个多传入的参数似乎没有使用?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

噢,这个应该是写错了,没有用到这个函数

@ToddBear
Copy link
Collaborator Author

ToddBear commented Jul 31, 2023

针对英文文档恢复:

先下载推理模型:

cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table inch model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of publaynet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
cd ..

然后在/ppstructure/目录下使用下面的指令推理:

python predict_system.py \
    --image_dir=./docs/table/1.png \
    --det_model_dir=inference/en_PP-OCRv3_det_infer \
    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/en_dict.txt \
    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
    --vis_font_path=../doc/fonts/simfang.ttf \
    --recovery=True \
    --output=../output/ \
    --return_word_box=True

在../output/structure/1/show_0.jpg下查看推理结果的可视化,如下图所示:

show_0_mdf_v2

针对中文文档恢复

先下载推理模型

cd PaddleOCR/ppstructure

# download model
cd inference
# Download the detection model of the ultra-lightweight Chinesse PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight Chinese table inch model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of CDLA dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
cd ..

上传下面的测试图片 "2.png" 至目录 ./docs/table/ 中

2

然后在/ppstructure/目录下使用下面的指令推理

python predict_system.py \
    --image_dir=./docs/table/2.png \
    --det_model_dir=inference/ch_PP-OCRv3_det_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \
    --vis_font_path=../doc/fonts/chinese_cht.ttf \
    --recovery=True \
    --output=../output/ \
    --return_word_box=True

在../output/structure/2/show_0.jpg下查看推理结果的可视化,如下图所示:

show_1_mdf_v2

Copy link
Collaborator

@shiyutang shiyutang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 留了一下comment,可以看看要不要这样修改,提升代码简洁程度
  2. 结果验证没有问题。

Comment on lines 621 to 625
if self.postprocess_params['name'] == 'CTCLabelDecode':
rec_result = self.postprocess_op(preds, return_word_box=self.return_word_box)
ino_list = list(range(beg_img_no, end_img_no))
for rec_idx, rec in enumerate(rec_result):
ino = ino_list[rec_idx]
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
rec[2][0] = rec[2][0]*(wh_ratio/max_wh_ratio)
else:
rec_result = self.postprocess_op(preds)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add this in the call func of CTCLabelDecode postprocess?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这块好像不太好放进去,因为计算涉及到当前的图像的宽高比,宽高比的信息只有在predict_rec.py这个层级有

if isinstance(preds, tuple) or isinstance(preds, list):
preds = preds[-1]
if isinstance(preds, paddle.Tensor):
preds = preds.numpy()
preds_idx = preds.argmax(axis=2)
preds_prob = preds.max(axis=2)
text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True, return_word_box=return_word_box)
if label is None:
return text
label = self.decode(label)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉在预测里面的代码可以放在这里,因为所有的CTCLabelDecode都是在rec后处理中。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加这里的话可能要把图像的宽高比也作为参数传进来哈哈

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个能不能利用kwargs传入呢?

Comment on lines 168 to 207
rec_word_info = rec_res[2]
col_num, word_list, word_col_list, state_list = rec_word_info
box = box.tolist()
bbox_x_start = box[0][0]
bbox_x_end = box[1][0]
bbox_y_start = box[0][1]
bbox_y_end = box[2][1]

cell_width = (bbox_x_end - bbox_x_start)/col_num

word_box_list = []
word_box_content_list = []
cn_width_list = []
cn_col_list = []
for word, word_col, state in zip(word_list, word_col_list, state_list):
if state == 'cn':
if len(word_col) != 1:
char_seq_length = (word_col[-1] - word_col[0] + 1) * cell_width
char_width = char_seq_length/(len(word_col)-1)
cn_width_list.append(char_width)
cn_col_list += word_col
word_box_content_list += word
else:
cell_x_start = bbox_x_start + int(word_col[0] * cell_width)
cell_x_end = bbox_x_start + int((word_col[-1]+1) * cell_width)
cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
word_box_list.append(cell)
word_box_content_list.append("".join(word))
if len(cn_col_list) != 0:
if len(cn_width_list) != 0:
avg_char_width = np.mean(cn_width_list)
else:
avg_char_width = (bbox_x_end - bbox_x_start)/len(rec_str)
for center_idx in cn_col_list:
center_x = (center_idx+0.5)*cell_width
cell_x_start = max(int(center_x - avg_char_width/2), 0) + bbox_x_start
cell_x_end = min(int(center_x + avg_char_width/2), bbox_x_end-bbox_x_start) + bbox_x_start
cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
word_box_list.append(cell)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议抽象这部分成函数到ppstructure 到utility,提升代码可读性。对了可以给代码增加一些注释,例如说明这一部分是将识别结果转化为基于字符的位置和内容信息

if isinstance(preds, tuple) or isinstance(preds, list):
preds = preds[-1]
if isinstance(preds, paddle.Tensor):
preds = preds.numpy()
preds_idx = preds.argmax(axis=2)
preds_prob = preds.max(axis=2)
text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True, return_word_box=return_word_box)
if label is None:
return text
label = self.decode(label)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个能不能利用kwargs传入呢?

Copy link
Collaborator

@shiyutang shiyutang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shiyutang shiyutang merged commit 1e11f25 into PaddlePaddle:release/2.6 Aug 2, 2023
1 check passed
ToddBear added a commit to ToddBear/PaddleOCR that referenced this pull request Aug 2, 2023
* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py
shiyutang pushed a commit that referenced this pull request Aug 10, 2023
* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py
shiyutang pushed a commit that referenced this pull request Oct 16, 2023
* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py
shiyutang added a commit that referenced this pull request Oct 18, 2023
* Update recognition_en.md (#10059)

ic15_dict.txt only have 36 digits

* Update ocr_rec.h (#9469)

It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.

* 补充num_classes注释说明 (#10073)

ser_vi_layoutxlm_xfund_zh.yml中的Architecture.Backbone.num_classes所赋值会设置给Loss.num_classes,
由于采用BIO标注,假设字典中包含n个字段(包含other)时,则类别数为2n-1;假设字典中包含n个字段(不含other)时,则类别数为2n+1。

* Update algorithm_overview_en.md (#9747)

Fix links to super-resolution algorithm docs

* 改进文档`deploy/hubserving/readme.md`和`doc/doc_ch/models_list.md` (#9110)

* Update readme.md

* Update readme.md

* Update readme.md

* Update models_list.md

* trim trailling spaces @ `deploy/hubserving/readme_en.md`

* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`

* using Grammarly to weak `deploy/hubserving/readme_en.md`

* using Grammarly to tweak `doc/doc_en/models_list_en.md`

* `ocr_system` module will return with values of field `confidence`

* Update README_CN.md

* 修复测试服务中图片转Base64的引用地址错误。 (#8334)

* Update application.md

* [Doc] Fix 404 link.  (#10318)

* Update PP-OCRv3_det_train.md

* Update knowledge_distillation.md

* Update config.md

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file

* refactor get_image_file_list function

* Update customize.md (#10325)

* Update FAQ.md (#10345)

* Update FAQ.md (#10349)

* Don't break overall processing on a bad image (#10216)

* Add preprocessing common to OCR tasks (#10217)

Add preprocessing to options

* [MLU] add mlu device for infer (#10249)

* Create newfeature.md

* Update newfeature.md

* remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502)

* CV套件建设专项活动 - 文字识别返回单字识别坐标 (#10515)

* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py

* Update README_ch.md

* revert README_ch.md update

* Fixed Layout recovery README file (#10493)

Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>

* update_doc

* bugfix

---------

Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>
@zelohon
Copy link

zelohon commented Oct 26, 2023

cpp的支持吗?

@eternal-mind
Copy link

这个单字坐标支持Parseq这种用了transformer decoder的识别模型的输出吗?

@JerryZhang0111
Copy link

请问可以支持C++吗?与python不同的是,C++的输出没有位置info信息,没法给出每个字符在detection box里面的位置

@JerryZhang0111
Copy link

请问可以支持C++吗?与python不同的是,C++的输出没有位置info信息,没法给出每个字符在detection box里面的位置

简单对照python补充了一下代码,勉强可以,不是很准确

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants