Processor segment-repair end with Exception #42

Closed
j-panzer opened this issue Aug 27, 2020 · 7 comments
Comments

@j-panzer

The processor 'segment-repair' ends with the exception "Exception: ocrd-segment-repair exited with non-zero return value 1" if it comes after the processor 'cis-ocropy-segment' in the workflow.

In a modified workflow, where processor 'cis-ocropy-segment' is replaced by processor 'tesserocr-segment-line', the processing runs without this exception.

@j-panzer
Author

log.txt

@krvoigt

krvoigt commented Aug 27, 2020

@kba

@krvoigt

krvoigt commented Aug 27, 2020

@kba
Member

kba commented Aug 27, 2020

The root cause of the error is:

shapely.errors.TopologicalError: The operation 'GEOSWithin_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x2aaae5b3f898>

This is caused by _child_within_parent not getting a valid polygon from the coordinates of the parent region. Most likely, the parent region's @coords contains invalid points. I'll try to reproduce.
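To illustrate what "invalid polygon" can mean here, the following is a minimal pure-Python sketch, not the check that shapely or ocrd-segment-repair actually performs. It parses a PAGE-style points string and reports two common defects: too few distinct points, and non-adjacent edges that cross each other (a "bowtie"). The function names parse_points and polygon_problems are hypothetical, and the test only catches proper crossings, not edges that merely touch or overlap.

```python
from itertools import combinations


def parse_points(points):
    """Parse a PAGE-style string 'x1,y1 x2,y2 ...' into a list of (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (val > 0) - (val < 0)


def _segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 cross at an interior point of both."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def polygon_problems(coords):
    """Return a list of human-readable defects found in a polygon outline.

    Simplified stand-in for a real validity check (assumption): it only
    looks for too few distinct points and properly crossing edges.
    """
    if len(set(coords)) < 3:
        return ["fewer than 3 distinct points"]
    n = len(coords)
    # Edge i runs from point i to point i+1, with the last edge closing the ring.
    edges = [(coords[i], coords[(i + 1) % n]) for i in range(n)]
    problems = []
    for i, j in combinations(range(n), 2):
        # Adjacent edges share an endpoint, so they cannot properly cross.
        if j == i + 1 or (i == 0 and j == n - 1):
            continue
        if _segments_cross(*edges[i], *edges[j]):
            problems.append(f"edges {i} and {j} cross")
    return problems


if __name__ == "__main__":
    print(polygon_problems(parse_points("0,0 10,0 10,10 0,10")))  # square
    print(polygon_problems(parse_points("0,0 10,10 10,0 0,10")))  # bowtie
```

A square outline yields no problems, while the bowtie outline is reported as self-crossing. A polygon with this kind of defect is the sort of input on which a geometric predicate such as "within" may fail.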

@kba
Member

kba commented Aug 27, 2020

input data: https://owncloud.gwdg.de/index.php/s/k96zk4XILHi3let

This was the workflow:

time ocrd process\
    "olena-binarize -I PRESENTATION -O OCR-D-BIN -P impl sauvola"\
    "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"\
    "olena-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P impl kim"\
    "cis-ocropy-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page"\
    "cis-ocropy-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P level-of-operation page"\
    "tesserocr-segment-region -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG"\
    "segment-repair -I OCR-D-SEG-REG -O OCR-D-SEG-REPAIR -P plausibilize true"\
    "cis-ocropy-deskew -I OCR-D-SEG-REPAIR -O OCR-D-SEG-REG-DESKEW -P level-of-operation region"\
    "cis-ocropy-clip -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-REG-DESKEW-CLIP -P level-of-operation region"\
    "cis-ocropy-segment -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE -P level-of-operation region"\
    "segment-repair -I OCR-D-SEG-LINE -O OCR-D-SEG-REPAIR-LINE -P sanitize true"\
    "cis-ocropy-dewarp -I OCR-D-SEG-REPAIR-LINE -O OCR-D-SEG-LINE-RESEG-DEWARP"\
    "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /usr/users/jpanzer/ocrd/models/calamari-models/\*.ckpt.json"

@bertsky
Collaborator

bertsky commented Aug 27, 2020

Thanks everyone for the detailed report! @kba we are looking for a TextLine from the else part at the bottom of ocrd_cis.ocropy.segment._process_element. This should be detectable via PAGE validation, yes. (The polygons themselves are produced in ocrd_cis.ocropy.segment.masks2polygons and polygon_for_parent.)

EDIT: And we know that we are not looking for a bad TextRegion, because the error goes away when replacing the ocrd_cis line segmentation with the one from ocrd_tesserocr.
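As a rough way to narrow down which TextLine is the culprit, one could scan the PAGE-XML output of the line segmentation step for implausible Coords. The sketch below uses only the Python standard library and is not the OCR-D PAGE validator; the function name suspicious_lines and the sample document are made up for illustration, and the two checks (fewer than 3 distinct points, negative coordinates) are assumptions about what might be wrong, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration; real files come from the
# output file group of the segmentation step.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="x.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,100 0,100"/>
      <TextLine id="l1"><Coords points="0,0 100,0 100,20 0,20"/></TextLine>
      <TextLine id="l2"><Coords points="0,30 100,30"/></TextLine>
      <TextLine id="l3"><Coords points="-5,40 100,40 100,60 0,60"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def _local(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def suspicious_lines(page_xml):
    """Yield (line_id, reason) for TextLine elements with implausible Coords."""
    root = ET.fromstring(page_xml)
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        coords = next((c for c in elem if _local(c.tag) == "Coords"), None)
        if coords is None:
            yield elem.get("id"), "no Coords element"
            continue
        points = [tuple(int(v) for v in p.split(","))
                  for p in coords.get("points", "").split()]
        if len(set(points)) < 3:
            yield elem.get("id"), "fewer than 3 distinct points"
        elif any(x < 0 or y < 0 for x, y in points):
            yield elem.get("id"), "negative coordinate"


if __name__ == "__main__":
    for line_id, reason in suspicious_lines(SAMPLE):
        print(line_id, reason)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch usable across PAGE schema versions. Self-intersecting outlines would need an additional geometric check.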

@paulpestov paulpestov added this to Ideas in coordinate_all Aug 30, 2021
@bertsky
Collaborator

bertsky commented Feb 17, 2022

Note: repair has long since included a mechanism for PAGE input validation and automatic fixing – I have not tested this again, but I'm pretty sure it has been solved.

@bertsky bertsky closed this as completed Feb 17, 2022
coordinate_all automation moved this from Ideas to Done Feb 17, 2022