Enhance the OCR recognition accuracy of PPStructure. #11916

RussellLuo · 2024-04-11T07:07:38Z

This PR is likely to close the following issues:

paddle-bot · 2024-04-11T07:07:43Z

Thanks for your contribution!

GreatV · 2024-04-12T06:19:32Z

For the picture below, this hot fix didn't work.

INPUT:

OUTPUT:

RussellLuo · 2024-04-12T06:23:16Z

In my opinion, this fix is essentially a patch rather than a complete solution, so there may be edge cases that it does not address.

UPDATE: That being said, in my own use cases, this patch typically has much higher OCR recognition accuracy when compared to the current implementation of PPStructure.

RussellLuo · 2024-04-12T07:20:48Z

I took a screenshot of your INPUT image (intentionally leaving extra white space around the last line), and got the OUTPUT as below:

(I guess the omission of the last line of text is due to the original image being cropped.)

GreatV · 2024-04-12T08:03:32Z

PaddleOCR/ppstructure/predict_system.py

Lines 120 to 149 in c82dd64

    
    for region in layout_res:
        res = ''
        if region['bbox'] is not None:
            x1, y1, x2, y2 = region['bbox']
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            roi_img = ori_im[y1:y2, x1:x2, :]
        else:
            x1, y1, x2, y2 = 0, 0, w, h
            roi_img = ori_im
        if region['label'] == 'table':
            if self.table_system is not None:
                res, table_time_dict = self.table_system(
                    roi_img, return_ocr_result_in_table)
                time_dict['table'] += table_time_dict['table']
                time_dict['table_match'] += table_time_dict['match']
                time_dict['det'] += table_time_dict['det']
                time_dict['rec'] += table_time_dict['rec']
        else:
            if self.text_system is not None:
                if self.recovery:
                    wht_im = np.ones(ori_im.shape, dtype=ori_im.dtype)
                    wht_im[y1:y2, x1:x2, :] = roi_img
                    filter_boxes, filter_rec_res, ocr_time_dict = self.text_system(
                        wht_im)
                else:
                    filter_boxes, filter_rec_res, ocr_time_dict = self.text_system(
                        roi_img)
                time_dict['det'] += ocr_time_dict['det']
                time_dict['rec'] += ocr_time_dict['rec']

Hi @RussellLuo. Could you modify the code here to make it detect and recognize the original image directly, so that the other interfaces don't need to be changed?

RussellLuo · 2024-04-12T10:29:39Z

I'm not familiar with this piece of code. To my understanding, the core logic is to first detect the layout regions (by using the LayoutPredictor):

PaddleOCR/ppstructure/predict_system.py

Line 114 in c82dd64

layout_res, elapse = self.layout_predictor(img)

and then recognize the corresponding texts from each layout region (by using the TextSystem):

PaddleOCR/ppstructure/predict_system.py

Lines 120 to 125 in c82dd64

    
    for region in layout_res:
        res = ''
        if region['bbox'] is not None:
            x1, y1, x2, y2 = region['bbox']
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            roi_img = ori_im[y1:y2, x1:x2, :]

As shown above, to get the image pixels roi_img within the layout region, the original image ori_im is cropped according to the coordinates of each layout region. As discussed in #10270, this step is suspected to be the cause of decreased OCR precision, as the coordinates of the layout region are likely to be inaccurate (compared to the coordinates of the text region detected by TextSystem).
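
A toy NumPy illustration of the cropping effect described above. The image is synthetic and the layout box is made up (neither comes from the PR or from #10270); it only shows how a box whose bottom edge is slightly too high slices through the last text line before OCR ever sees it.

```python
import numpy as np

# Synthetic 100x200 "page": white background with two dark text rows.
ori_im = np.full((100, 200, 3), 255, dtype=np.uint8)
ori_im[20:30, 10:190, :] = 0   # first text line occupies rows 20..29
ori_im[60:70, 10:190, :] = 0   # last text line occupies rows 60..69


def dark_rows(img):
    """Count image rows that contain at least one dark (text) pixel."""
    return int((img.min(axis=(1, 2)) < 128).sum())


# Hypothetical layout box whose bottom edge (y2) is 5 px too high.
x1, y1, x2, y2 = 5, 15, 195, 65
roi_img = ori_im[y1:y2, x1:x2, :]   # same slicing as in predict_system.py

full_rows = dark_rows(ori_im)   # text rows visible in the full image
roi_rows = dark_rows(roi_img)   # text rows that survive the crop
print(full_rows, roi_rows)
```

Half of the last line is lost in the crop, which is the kind of input on which a recognizer may drop or misread the line.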

Based on the analysis, the core idea behind this fix is to:

First detect all possible text regions (by using PaddleOCR, which is essentially a TextSystem).
Then detect all layout regions (by using a StructureSystem), and for each layout region:
- Filter the text regions that intersect with this layout region.
- For these intersecting text regions, recognize the texts within them (by using PaddleOCR).
- Finally, associate these texts with this layout region.

Therefore, this fix is a hybrid solution that leverages both PaddleOCR and StructureSystem, but I'm not sure whether it is appropriate to place the changes of this hybrid solution into PaddleOCR/ppstructure/predict_system.py.
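
The steps above can be sketched roughly as follows. This is not the code from the PR: `quad_bounds`, `intersects`, and `texts_for_region` are hypothetical helper names, the sample boxes are invented, and the overlap test uses axis-aligned bounds of each text quadrilateral as a simplifying assumption rather than a true polygon intersection.

```python
from typing import List, Sequence, Tuple

Quad = Sequence[Sequence[float]]          # four (x, y) corners of a text box
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2) layout bbox


def quad_bounds(quad: Quad) -> Rect:
    """Axis-aligned bounding rectangle of a text quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def texts_for_region(region_bbox: Rect,
                     text_boxes: List[Quad],
                     texts: List[str]) -> List[str]:
    """Collect texts whose boxes intersect the given layout region."""
    return [t for q, t in zip(text_boxes, texts)
            if intersects(quad_bounds(q), region_bbox)]


# Invented full-image OCR output: three text boxes and their texts.
text_boxes = [
    [[10, 10], [100, 10], [100, 30], [10, 30]],
    [[10, 95], [100, 95], [100, 115], [10, 115]],
    [[10, 300], [100, 300], [100, 320], [10, 320]],
]
texts = ["first line", "last line", "other block"]

# Layout region whose bottom edge cuts through "last line".
region = (5, 5, 120, 100)
result = texts_for_region(region, text_boxes, texts)
print(result)
```

Because the text boxes come from the full image, "last line" is still picked up even though the layout box ends partway through it.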

Hope to hear your suggestions.

GreatV · 2024-04-12T10:58:47Z

Based on the analysis, the core idea behind this fix is to:

First detect all possible text regions and recognize the texts within them (by only using PaddleOCR, which is essentially a TextSystem).

Then collect texts from the text regions that intersect with each layout region (detected by using a StructureSystem).

Yes, I think that would be a more appropriate modification, with minimal impact on the overall code structure. I hope you'll give it a try.

RussellLuo · 2024-04-13T01:50:43Z

All changes have been centralized in StructureSystem, and here are some additional notes.

The following line would keep results with low confidence and has been deleted:

PaddleOCR/ppstructure/predict_system.py

Line 61 in c82dd64

args.drop_score = 0

These two lines cause a coordinate error and have been commented out. It took me a significant amount of time to identify this issue. I don't understand the logic behind this code; perhaps these lines should be deleted?

PaddleOCR/ppstructure/predict_system.py

Lines 165 to 166 in c82dd64

    
    if not self.recovery:
        box += [x1, y1]

GreatV · 2024-04-13T02:32:40Z

Why do we need to keep low-confidence results?

PaddleOCR/ppstructure/predict_system.py

Line 61 in c82dd64

args.drop_score = 0

The code here is used to calculate the offset from the results of the layout analysis and can be removed.

PaddleOCR/ppstructure/predict_system.py

Lines 165 to 166 in c82dd64

    
    if not self.recovery:
        box += [x1, y1]
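
A small sketch of why the offset is needed when OCR runs on a cropped region and why it becomes an error once OCR runs on the full image. All coordinates are made-up values for illustration, and the variable names other than `x1` and `y1` are hypothetical.

```python
import numpy as np

# Top-left corner of a layout region in the full image (made-up values).
x1, y1 = 50, 200

# A text box detected on the cropped ROI is relative to the crop,
# so it must be shifted by (x1, y1) to get page coordinates.
box_in_roi = np.array([[10, 5], [110, 5], [110, 25], [10, 25]])
box_in_page = box_in_roi + [x1, y1]

# A text box detected on the full image is already in page coordinates.
# Adding the offset again moves it away from the actual text.
box_from_full_image = box_in_page.copy()
double_shifted = box_from_full_image + [x1, y1]

print(box_in_page[0], double_shifted[0])
```

This matches the observation in the thread: once detection no longer runs on crops, the shift is no longer correct and the two lines can go.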

RussellLuo · 2024-04-13T02:45:49Z

Why do we need to keep low-confidence results?

This line comes from the previous code and causes low-confidence results to be returned, because it overrides the default drop_score of 0.5. The latest change removes this line.
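
A minimal sketch of the effect, assuming recognition results are filtered with a `score >= drop_score` comparison. `filter_by_score` is a hypothetical helper and the sample results are invented; only the 0.5 default and the `drop_score = 0` override come from the discussion.

```python
from typing import List, Tuple


def filter_by_score(rec_res: List[Tuple[str, float]],
                    drop_score: float) -> List[Tuple[str, float]]:
    """Keep only (text, score) pairs whose score is at least drop_score."""
    return [(text, score) for text, score in rec_res if score >= drop_score]


# Invented recognition results: (text, confidence).
rec_res = [("Invoice", 0.98), ("l|I1", 0.31), ("Total", 0.87)]

kept_default = filter_by_score(rec_res, 0.5)  # default threshold
kept_zero = filter_by_score(rec_res, 0.0)     # effect of drop_score = 0

print(len(kept_default), len(kept_zero))
```

With the threshold forced to 0, the low-confidence entry is returned alongside the reliable ones.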

The code here is used to calculate the offset from the results of the layout analysis and can be removed.

Got it!

Closes PaddlePaddle#10270 and PaddlePaddle#11665.

RussellLuo · 2024-04-13T03:04:48Z

All suggested changes have been made and all commits have been squashed into one.

GreatV · 2024-04-13T03:21:06Z

@RussellLuo Thanks for your contribution, I will take some time to check it again.

GreatV · 2024-04-15T12:18:05Z

I have run some tests and can confirm that the results have improved.

old:
new:

GreatV

LGTM

RussellLuo · 2024-04-15T12:34:52Z

@GreatV Thanks for your patient review and excellent suggestions!

jzhang533 · 2024-04-16T02:08:22Z

Thanks, let's merge it.

luotao1 · 2024-10-15T07:11:36Z

@RussellLuo Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th.

Looking forward to the future, we will walk further together in the world of open source!
Click here: https://paddle.wjx.cn/vm/h4On9gJ.aspx#

luotao1 · 2024-11-06T12:11:10Z

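Hi @RussellLuo,

Thank you very much for your contribution to PaddlePaddle. We run an organization called PFCC, which keeps contributing to PaddlePaddle through regular technical sharing sessions and developer-led tasks. For details, see the profile page at https://github.com/luotao1.
If you are interested in PFCC, please send an email to ext_paddle_oss@baidu.com and we will invite you to join.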

paddle-bot bot added the contributor label Apr 11, 2024

paddle-bot bot assigned cuicheng01 Apr 11, 2024

RussellLuo mentioned this pull request Apr 11, 2024

In the results of PPStructure layout analysis, the OCR output inside a bbox is missing the last line #10270

Closed

jzhang533 requested review from jzhang533 and GreatV April 12, 2024 04:51

GreatV reviewed Apr 12, 2024

__init__.py (outdated, resolved)

GreatV reviewed Apr 13, 2024

ppstructure/predict_system.py (outdated, resolved)

GreatV reviewed Apr 13, 2024

paddleocr.py (outdated, resolved)

GreatV reviewed Apr 13, 2024

tools/infer/predict_system.py (outdated, resolved)

Enhance StructureSystem to achieve higher OCR recognition accuracy

474d9b3

Closes PaddlePaddle#10270 and PaddlePaddle#11665.

RussellLuo force-pushed the fix-ppstructure-ocr branch from ebf52cd to 474d9b3 Compare April 13, 2024 03:01

This was referenced Apr 13, 2024

Image layout detection: the beginning of a line of text is lost #11869

Closed

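After PPStructure obtains the layout analysis results, it runs OCR on the image of each block with an upscaling preprocessing step, which distorts the pixels, and OCR is not robust to such scaling distortion. #11328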

Closed

GreatV approved these changes Apr 15, 2024

jzhang533 merged commit 667fda8 into PaddlePaddle:main Apr 16, 2024
2 checks passed

jzhang533 mentioned this pull request Apr 16, 2024

[Hard issue resolution] Resolve long-standing difficult issues in PaddleOCR #11906

Closed

RussellLuo deleted the fix-ppstructure-ocr branch April 16, 2024 04:17

github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2024
