sanitize: stay on page image/array #22

Merged
merged 1 commit into from Dec 9, 2019

@bertsky bertsky commented Dec 9, 2019

Fixes #21. Using the region image/array here was incorrect, because it depends on the bounding box of the region, which can be too small to contain all of the region's text lines.

This does not cover the case where TextLine coordinates extend even beyond the page Border.
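
To illustrate the problem with the region array, here is a minimal sketch in plain Python. It is not the code from this PR: `rasterize`, `covered_area`, and the toy coordinates are hypothetical, and text lines are simplified to axis-aligned boxes. It only shows that a mask sized to the region's bounding box silently drops the part of a line that sticks out, while a page-sized mask keeps it.

```python
def rasterize(boxes, width, height, offset=(0, 0)):
    """Mark pixels covered by boxes (x0, y0, x1, y1) on a width x height grid.

    Boxes are given in page coordinates; `offset` is the page position of the
    grid's top-left corner. Pixels falling outside the grid are dropped.
    """
    ox, oy = offset
    grid = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0 - oy, 0), min(y1 - oy, height)):
            for x in range(max(x0 - ox, 0), min(x1 - ox, width)):
                grid[y][x] = 1
    return grid


def covered_area(grid):
    """Count the foreground pixels of a binary grid."""
    return sum(map(sum, grid))


# Hypothetical toy layout (all values are assumptions for illustration).
page_width, page_height = 20, 10
region_bbox = (5, 2, 12, 6)                 # x0, y0, x1, y1
lines = [(5, 2, 12, 4), (4, 4, 15, 6)]      # second line exceeds the region

# Mask in the region's own array: cropped to the region bounding box.
region_grid = rasterize(
    lines,
    region_bbox[2] - region_bbox[0],
    region_bbox[3] - region_bbox[1],
    offset=region_bbox[:2],
)
# Mask in the page array: nothing inside the page is lost.
page_grid = rasterize(lines, page_width, page_height)

print(covered_area(region_grid), covered_area(page_grid))
```

In this toy layout the region-sized mask covers fewer pixels than the page-sized one, so any hull computed from it could never grow past the old region outline.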

@wrznr wrznr left a comment

I get it!

@wrznr wrznr merged commit 1ea4305 into OCR-D:master Dec 9, 2019
@bertsky bertsky deleted the sanitize-on-page-array branch December 9, 2019 12:07

bertsky commented Dec 9, 2019

One more thing about this fix. It is not optimal w.r.t. rotation.

If deskewing has happened (only) on the region level, then closing the regions' lines will no longer yield deskewed hulls, only undeskewed ones. That is because the morphological operations are approximated simplistically here (plain rectangular minimum and maximum filters) for efficiency reasons.
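
For readers unfamiliar with the approximation mentioned above, this is a rough sketch of a closing built from a rectangular maximum filter followed by a rectangular minimum filter. It is written in plain Python for illustration only; `rect_filter` and `close_rect` are hypothetical names, not functions from this repository, and the real code presumably uses an image library instead.

```python
def rect_filter(grid, kh, kw, op):
    """Apply `op` (min or max) over a kh x kw window centred on each pixel.

    Windows are clipped at the grid border. kh and kw should be odd.
    """
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                grid[j][i]
                for j in range(max(y - kh // 2, 0), min(y + kh // 2 + 1, h))
                for i in range(max(x - kw // 2, 0), min(x + kw // 2 + 1, w))
            ]
            out[y][x] = op(window)
    return out


def close_rect(grid, kh, kw):
    """Closing = maximum filter (dilation), then minimum filter (erosion)."""
    return rect_filter(rect_filter(grid, kh, kw, max), kh, kw, min)


# Two one-pixel-high "text lines" separated by one empty row.
mask = [[0] * 9 for _ in range(7)]
for x in range(2, 7):
    mask[2][x] = 1
    mask[4][x] = 1

closed = close_rect(mask, 3, 1)   # window 3 pixels high, 1 pixel wide
for row in closed:
    print("".join("#" if v else "." for v in row))
```

The gap between the two lines is filled, which is what merges lines into one region hull. Note that the window is always aligned with the axes of whatever array it is applied to, so the result depends on the orientation of that array.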

wrznr commented Dec 9, 2019

But deskewing on region level would usually happen after sanitizing the regions, right?

bertsky commented Dec 9, 2019

> But deskewing on region level would usually happen after sanitizing the regions, right?

I don't think so. Sanitizing regions is only possible after textline segmentation. But the latter usually relies on deskewing itself.

wrznr commented Dec 9, 2019

Right. In my particular use case (reusing an existing ABBYY segmentation), however, I apply region-level deskewing after sanitization (results look great!).

Successfully merging this pull request may close these issues.

Expand regions via repair/sanitize