Add the basic project layout and minimal functionality #1

Merged
merged 2 commits into OCR-D:master from add_basic_stuff on Aug 13, 2019

Conversation

Collaborator

wrznr commented Aug 6, 2019

This is meant to become an OCR-D processor that will eventually give plausibility feedback on a page's segmentation. It uses Shapely (https://pypi.org/project/Shapely/), as proposed by @bertsky.
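The thread does not spell out what "plausibility feedback" would look like in practice. As a rough sketch of one possible check, assuming the regions of a page are already available as lists of (x, y) points, Shapely could be used to report region pairs that overlap or where one region covers another. The function name `find_suspicious_pairs`, the input format, and the sample coordinates are hypothetical and not taken from the PR.

```python
from itertools import combinations

from shapely.geometry import Polygon


def find_suspicious_pairs(regions):
    """Report region pairs whose polygons overlap or cover one another.

    `regions` maps a region ID to a list of (x, y) points.
    This input format is an assumption made for illustration only.
    """
    polygons = {rid: Polygon(points) for rid, points in regions.items()}
    findings = []
    for (id_a, poly_a), (id_b, poly_b) in combinations(polygons.items(), 2):
        if poly_a.covers(poly_b):
            findings.append((id_a, id_b, "covers"))
        elif poly_b.covers(poly_a):
            findings.append((id_b, id_a, "covers"))
        elif poly_a.overlaps(poly_b):
            findings.append((id_a, id_b, "overlaps"))
    return findings


# Made-up sample coordinates
regions = {
    "r1": [(0, 0), (100, 0), (100, 50), (0, 50)],
    "r2": [(80, 40), (200, 40), (200, 90), (80, 90)],    # partially overlaps r1
    "r3": [(10, 10), (30, 10), (30, 20), (10, 20)],      # lies inside r1
    "r4": [(0, 100), (100, 100), (100, 150), (0, 150)],  # disjoint from the rest
}

findings = find_suspicious_pairs(regions)
for first, second, relation in findings:
    print(f"{first} {relation} {second}")
```

A check like this only reports findings. Whether the same code should also modify the segmentation is a separate question, which this thread proposes to settle via parameters.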

@wrznr wrznr requested a review from kba August 6, 2019 15:44
Collaborator

bertsky commented Aug 12, 2019

Thanks for breathing life into this!

Does this minimal functionality reflect a common/specific use case you observed on some pipeline, or is it rather exemplary?

BTW, I found a good explanation of intersects vs overlaps vs crosses in the manual, which is much more than in the docstring. It does not say anything on covers though.
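For readers less familiar with these predicates, here is a small illustrative sketch of how they differ on simple shapes. The geometries are made up for the example; `covers` is included on the basis of its DE-9IM definition (every point of the other geometry lies in the interior or on the boundary of this one), which differs from `contains` only when the other geometry lies entirely on the boundary.

```python
from shapely.geometry import LineString, Point, Polygon

a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])    # 4x4 square
b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])    # partially overlaps a
c = Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])    # lies strictly inside a
line = LineString([(-1, 2), (5, 2)])             # runs straight through a
corner = Point(0, 0)                             # lies on the boundary of a

# intersects: the geometries share at least one point
assert a.intersects(b) and a.intersects(c)

# overlaps: same dimension, interiors intersect, neither contains the other
assert a.overlaps(b)
assert not a.overlaps(c)      # c is inside a, so this is containment, not overlap

# crosses: the intersection has a lower dimension than the larger geometry,
# e.g. a line passing through a polygon; never true for two polygons
assert line.crosses(a)
assert not a.crosses(b)

# covers vs. contains: they differ only for points on the boundary
assert a.covers(c) and a.contains(c)
assert a.covers(corner)
assert not a.contains(corner)
```

For two region polygons, `overlaps` and `covers` are therefore the more relevant predicates, while `crosses` only becomes useful when a line (such as a baseline or separator) is compared against a region.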

With the second commit, I can already see this module as going into the more general "layout alignment" (by aligning with itself) – maybe we can ultimately make 2 different processors (sharing the same heuristics), one for evaluation, one for alignment/merging?

Collaborator Author

wrznr commented Aug 13, 2019

@bertsky Could you pls. accept my invitation as a collaborator? Would like to assign you as a reviewer.

> 2 different processors

I do not like the idea. The analytics (and the implementations) would overlap heavily. I'd like to control analyze vs. modify via parameters.

[image]
This is one of the intended use cases. Happens quite often with tesseract and has a very negative influence on resegment/clip (and ultimately text recognition).

Collaborator

bertsky commented Aug 13, 2019

> Could you pls. accept my invitation as a collaborator? Would like to assign you as a reviewer.

Found it.

> > 2 different processors
>
> I do not like the idea. The analytics (and the implementations) would overlap heavily. I'd like to control analyze vs. modify via parameters.

Yes, I guess there will be too much overlap.

> This is one of the intended use cases. Happens quite often with tesseract and has a very negative influence on resegment/clip (and ultimately text recognition).

Thanks, that's very illustrative! True, this would not work at all with resegment/clip (whose rules I optimised mostly with the present GT in mind).

@wrznr wrznr requested a review from bertsky August 13, 2019 07:24
Collaborator

bertsky commented Aug 13, 2019

This could be very useful for GT as well (example from buerger_gedichte_1778 page 0002):

buerger_gedichte_1778 0002 gt-seg-block

@wrznr wrznr merged commit ed0a68c into OCR-D:master Aug 13, 2019
@wrznr wrznr deleted the add_basic_stuff branch August 13, 2019 08:40