
Enhance the OCR recognition accuracy of PPStructure. #11916

Merged (1 commit) on Apr 16, 2024


paddle-bot (bot) commented Apr 11, 2024

Thanks for your contribution!

Review thread on __init__.py (outdated, resolved).
GreatV (Collaborator) commented Apr 12, 2024

For the image below, this hotfix didn't work.

INPUT: [image: test_1]

OUTPUT: [image: result]

RussellLuo (Contributor, Author) commented Apr 12, 2024

In my opinion, this fix is essentially a patch rather than a complete solution, so there may be edge cases that it does not address.

UPDATE: That said, in my own use cases, this patch typically achieves much higher OCR recognition accuracy than the current PPStructure implementation.

RussellLuo (Contributor, Author) commented:

I took a screenshot of your INPUT image (intentionally leaving extra white space around the last line) and got the following OUTPUT:

[image: output]

(I guess the last line of text is missing because the original image was cropped.)

GreatV (Collaborator) commented Apr 12, 2024

```python
for region in layout_res:
    res = ''
    if region['bbox'] is not None:
        x1, y1, x2, y2 = region['bbox']
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        roi_img = ori_im[y1:y2, x1:x2, :]
    else:
        x1, y1, x2, y2 = 0, 0, w, h
        roi_img = ori_im
    if region['label'] == 'table':
        if self.table_system is not None:
            res, table_time_dict = self.table_system(
                roi_img, return_ocr_result_in_table)
            time_dict['table'] += table_time_dict['table']
            time_dict['table_match'] += table_time_dict['match']
            time_dict['det'] += table_time_dict['det']
            time_dict['rec'] += table_time_dict['rec']
    else:
        if self.text_system is not None:
            if self.recovery:
                wht_im = np.ones(ori_im.shape, dtype=ori_im.dtype)
                wht_im[y1:y2, x1:x2, :] = roi_img
                filter_boxes, filter_rec_res, ocr_time_dict = self.text_system(
                    wht_im)
            else:
                filter_boxes, filter_rec_res, ocr_time_dict = self.text_system(
                    roi_img)
            time_dict['det'] += ocr_time_dict['det']
            time_dict['rec'] += ocr_time_dict['rec']
```

Hi @RussellLuo. Could you modify the code here so that it runs detection and recognition on the original image directly? That way, the other interfaces wouldn't need to change.

RussellLuo (Contributor, Author) commented Apr 12, 2024

I'm not familiar with this piece of code. As I understand it, the core logic first detects the layout regions (using the LayoutPredictor):

```python
layout_res, elapse = self.layout_predictor(img)
```

and then recognizes the text in each layout region (using the TextSystem):

```python
for region in layout_res:
    res = ''
    if region['bbox'] is not None:
        x1, y1, x2, y2 = region['bbox']
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        roi_img = ori_im[y1:y2, x1:x2, :]
```

As shown above, to get the pixels roi_img within a layout region, the original image ori_im is cropped to that region's coordinates. As discussed in #10270, this step is suspected of lowering OCR precision, because the layout-region coordinates are likely to be less accurate than the coordinates of the text regions detected by TextSystem.
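To illustrate the suspected failure mode, here is a minimal sketch that measures how much of a detected text line survives when the image is cropped to a slightly-too-tight layout box. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` pixel coordinates; the helper names and sample coordinates are hypothetical, not taken from PaddleOCR.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def intersection(a, b):
    """Intersection of two axis-aligned boxes (may be degenerate)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def retained_fraction(text_box, layout_box):
    """Fraction of a text box's area that remains inside the layout crop."""
    area = box_area(text_box)
    if area == 0:
        return 0.0
    return box_area(intersection(text_box, layout_box)) / area


# Hypothetical example: the layout box ends 6 px above the bottom of the
# last text line, so cropping to it cuts off part of the glyphs.
last_line = (40, 300, 560, 330)
layout_box = (30, 20, 580, 324)
print(retained_fraction(last_line, layout_box))  # 0.8
```

Any value below 1.0 means the recognizer only sees part of the line. Here the bottom 6 px of a 30 px line are lost, which is one plausible way an imprecise layout box can lower recognition accuracy.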

Based on this analysis, the core idea behind this fix is to:

  1. First detect all possible text regions (using PaddleOCR, which is essentially a TextSystem).
  2. Then detect all layout regions (using a StructureSystem), and for each layout region:
    • Select the text regions that intersect with this layout region.
    • Recognize the texts within these intersecting text regions (using PaddleOCR).
    • Finally, associate these texts with this layout region.
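The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code merged in this PR: boxes are axis-aligned `(x1, y1, x2, y2)` rectangles (PaddleOCR's detector actually returns four-point polygons), layout regions are dicts with `label` and `bbox` keys like the `region['bbox']` and `region['label']` accesses in the excerpt above, and `recognize` is a hypothetical callable standing in for the text recognizer.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in full-image pixels


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap with positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def hybrid_structure(
    text_boxes: List[Box],
    layout_regions: List[Dict],
    recognize: Callable[[Box], str],
) -> List[Dict]:
    """Attach recognized text to each layout region.

    text_boxes: step 1, text regions detected on the whole image.
    layout_regions: step 2, dicts with 'label' and 'bbox' keys.
    recognize: hypothetical stand-in for running the recognizer on one box.
    """
    results = []
    for region in layout_regions:
        # Select the text regions that intersect this layout region.
        hits = [box for box in text_boxes if intersects(box, region["bbox"])]
        # Recognize each one using its own full-image box, not the layout crop.
        texts = [recognize(box) for box in hits]
        # Associate the recognized texts with this layout region.
        results.append({"label": region["label"], "texts": texts})
    return results


# Hypothetical data: two detected text boxes and two layout regions.
boxes = [(10, 10, 200, 30), (10, 50, 300, 70)]
layout = [
    {"label": "title", "bbox": (5, 5, 210, 35)},
    {"label": "text", "bbox": (5, 45, 310, 75)},
]
fake_text = {boxes[0]: "Title", boxes[1]: "First line"}  # fake recognizer output
for r in hybrid_structure(boxes, layout, fake_text.__getitem__):
    print(r["label"], r["texts"])
# title ['Title']
# text ['First line']
```

The key design point is that each text line is recognized from its own detected box, so an imprecise layout box only affects which region a line is grouped into, not the pixels the recognizer sees.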

This fix is therefore a hybrid solution that leverages both PaddleOCR and StructureSystem, but I'm not sure whether it is appropriate to put this hybrid solution into PaddleOCR/ppstructure/predict_system.py.

I'd appreciate your suggestions.

GreatV (Collaborator) commented Apr 12, 2024

> Based on the analysis, the core idea behind this fix is to:
>
> 1. First detect all possible text regions and recognize the texts within them (by only using PaddleOCR, which is essentially a TextSystem).
> 2. Then collect texts from the text regions that intersect with each layout region (detected by using a StructureSystem).

Yes, I think that would be a more appropriate modification, with minimal impact on the overall code structure. I hope you'll give it a try.
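The refined plan (one detection-and-recognition pass over the whole image, then per-region collection) could look roughly like the sketch below. It is not the merged implementation: the `(box, text, score)` tuple layout and the function names are hypothetical, and assigning each line to the single region it overlaps most is an assumed refinement of "intersects" that keeps a line from being collected twice when layout boxes overlap.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
OcrLine = Tuple[Box, str, float]  # (box, text, score) from one full-image OCR pass


def overlap_area(a: Box, b: Box) -> int:
    """Area shared by two axis-aligned boxes (0 if they do not intersect)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def collect_texts(ocr_lines: List[OcrLine], regions: List[Dict]) -> Dict[int, List[str]]:
    """Map each layout region index to the OCR texts that fall inside it.

    Each line goes to the single region it overlaps most (assumed rule);
    lines that touch no region are dropped.
    """
    collected: Dict[int, List[str]] = {i: [] for i in range(len(regions))}
    for box, text, _score in ocr_lines:
        areas = [overlap_area(box, r["bbox"]) for r in regions]
        if areas and max(areas) > 0:
            collected[areas.index(max(areas))].append(text)
    return collected


# Hypothetical data: the two layout boxes overlap by 10 px vertically.
ocr_lines = [((10, 10, 200, 30), "Title", 0.98), ((10, 40, 300, 60), "Body", 0.95)]
regions = [
    {"label": "title", "bbox": (0, 0, 210, 45)},
    {"label": "text", "bbox": (0, 35, 310, 80)},
]
print(collect_texts(ocr_lines, regions))  # {0: ['Title'], 1: ['Body']}
```

With a plain intersection test, "Body" would be collected by both regions, since it also overlaps the bottom 5 px of the title region.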

Review thread on paddleocr.py (outdated, resolved).
RussellLuo (Contributor, Author) commented:

All changes are now centralized in StructureSystem. Some additional notes:

The following line, which caused low-confidence results to be kept, has been deleted:

```python
args.drop_score = 0
```

The following two lines caused coordinate errors and have been commented out. It took me a significant amount of time to track this down. I don't understand the logic behind this code; perhaps these lines should be deleted?

```python
if not self.recovery:
    box += [x1, y1]
```

GreatV (Collaborator) commented Apr 13, 2024

Why do we need to keep low-confidence results?

```python
args.drop_score = 0
```

The code here applies the offset taken from the layout analysis results. Now that OCR runs on the original image, it is no longer needed and can be removed.

```python
if not self.recovery:
    box += [x1, y1]
```
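The coordinate error can be reproduced with plain NumPy. A box detected inside a crop is in crop coordinates, so adding the crop's top-left corner `(x1, y1)` maps it back to the full image; a box detected on the original image is already in full-image coordinates, so adding the offset again shifts it. The region corner and box values below are made up for illustration.

```python
import numpy as np

# Hypothetical layout region whose top-left corner is (x1, y1) = (100, 50).
x1, y1 = 100, 50

# A text box as a 4-point polygon, in full-image coordinates.
box_full = np.array([[120, 60], [300, 60], [300, 90], [120, 90]], dtype=np.float32)

# Old flow: OCR ran on the crop, so the detector returned crop-relative points,
# and adding the offset correctly restores full-image coordinates.
box_in_crop = box_full - [x1, y1]
restored = box_in_crop + [x1, y1]
print(np.array_equal(restored, box_full))  # True

# New flow: OCR runs on the original image, so the box is already absolute.
box = box_full.copy()
box += [x1, y1]  # same in-place statement as in the old code: double offset
print(box[0])  # [220. 110.]
```

This also explains the `if not self.recovery` guard: in recovery mode, OCR already ran on a full-size canvas (`wht_im`), so its boxes never needed the offset.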

RussellLuo (Contributor, Author) commented Apr 13, 2024

> Why do we need to keep low-confidence results?

This line came from the previous code and caused low-confidence results to be returned, because it overrode the default `drop_score` of 0.5 with 0. The latest change removes this line.
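`drop_score` acts as a confidence threshold on recognition results. A minimal sketch of that filtering, assuming results are `(text, score)` pairs and that a result is kept when `score >= drop_score`; `filter_by_score` and the sample data are hypothetical:

```python
from typing import List, Tuple


def filter_by_score(
    rec_res: List[Tuple[str, float]], drop_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep recognition results whose confidence is at least drop_score."""
    return [(text, score) for text, score in rec_res if score >= drop_score]


rec_res = [("Invoice", 0.97), ("l1|", 0.12), ("Total: 42", 0.88)]
print(filter_by_score(rec_res))
# [('Invoice', 0.97), ('Total: 42', 0.88)]
print(filter_by_score(rec_res, drop_score=0))  # effect of args.drop_score = 0
# [('Invoice', 0.97), ('l1|', 0.12), ('Total: 42', 0.88)]
```

With the threshold at 0, nothing is filtered, so low-confidence noise such as `l1|` ends up in the output.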

> The code here is used to calculate the offset from the results of the layout analysis and can be removed.

Got it!

RussellLuo (Contributor, Author) commented Apr 13, 2024

All suggested changes have been made and all commits have been squashed into one.

GreatV (Collaborator) commented Apr 13, 2024

@RussellLuo Thanks for your contribution, I will take some time to check it again.

GreatV (Collaborator) commented Apr 15, 2024

I have run some tests and can confirm that this fix improves the results.

  • old: [image: result_old]

  • new: [image: result_new]

GreatV (Collaborator) left a review comment:


LGTM

RussellLuo (Contributor, Author) commented:

@GreatV Thanks for your patient review and excellent suggestions!

jzhang533 merged commit 667fda8 into PaddlePaddle:main on Apr 16, 2024 (2 checks passed).
jzhang533 (Collaborator) commented:

Thanks, let's merge it.

luotao1 (Collaborator) commented Oct 15, 2024

@RussellLuo Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th.

We look forward to going further together in the world of open source!
Click here: https://paddle.wjx.cn/vm/h4On9gJ.aspx#

luotao1 (Collaborator) commented Nov 6, 2024

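Hi @RussellLuo,

  • Thank you very much for your contributions to PaddlePaddle. We run an organization called PFCC, which continuously contributes to PaddlePaddle by regularly sharing technical knowledge and publishing developer-led tasks. For details, see the profile description at https://github.com/luotao1.
  • If you're interested in PFCC, please email ext_paddle_oss@baidu.com and we'll invite you to join!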

github-actions (bot) locked this conversation as resolved and limited it to collaborators on Nov 11, 2024.