Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve coordinate validator #418

Merged
merged 2 commits into from
Jan 22, 2020
Merged

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Jan 21, 2020

By request of @tboenig, this slightly improves useability of the coordinate checks.

Example output (scroll right to see the change):

  <error>INVALIDITY in Word ID 'w46' of 'weigel_gnothi02_1618/page/weigel_gnothi02_1618_0003.xml': coords '1270,526 1270,527 1271,527 1271,529 1272,529 1272,531 1273,531 1273,544 1283,544 1283,545 1284,545 1283,545 1283,560 1282,560 1282,561 1281,561 1281,562 1280,562 1280,563 1279,563 1279,564 1265,564 1265,565 1221,565 1221,566 1196,566 1196,541 1197,541 1197,540 1239,540 1239,532 1240,532 1240,530 1241,530 1241,529 1242,529 1242,527 1256,527 1257,526' - Self-intersection[1283 545]</error>

@bertsky bertsky requested a review from kba January 21, 2020 22:17
@codecov-io
Copy link

codecov-io commented Jan 21, 2020

Codecov Report

Merging #418 into master will decrease coverage by 0.57%.
The diff coverage is 32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #418      +/-   ##
==========================================
- Coverage   82.81%   82.24%   -0.58%     
==========================================
  Files          39       39              
  Lines        2275     2292      +17     
  Branches      406      414       +8     
==========================================
+ Hits         1884     1885       +1     
- Misses        330      338       +8     
- Partials       61       69       +8
Impacted Files Coverage Δ
ocrd_validators/ocrd_validators/page_validator.py 87.73% <32%> (-9.54%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89166a1...8099334. Read the comment docs.

@bertsky
Copy link
Collaborator Author

bertsky commented Jan 21, 2020

Additionally, IMO the coordinate checks should be made a little less strict (and thus more compatible with Aletheia) to avoid crying wolf.

Things I see frequently:

  1. very small (up to 1 pixel) violations of non-containment in parent element
    • Shapely does not have almost_within, but one could try containment within the dilated version:
      if not (child_poly.within(node_poly) or
              child_poly.within(node_poly.buffer(0.5)))
  2. tiny (direct neighbour) self-intersections because of back-and-forth (probably caused by internal rounding)
    • This must be repaired on the spot, otherwise Shapely will not operate on these polygons. Possibly:
      if not node_poly.is_valid:
          if node_poly.simplify(0.8).is_valid:
              node_poly = node_poly.simplify(0.8)

But it could be more prudent to keep a strict validator, and outsource these repairs into a dedicated Aletheia postprocessor (e.g. ocrd-segment-repair with a new correct-coords=true).

@kba @wrznr @tboenig let me know if you want me to implement either option.

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved UX always welcome.

But it could be more prudent to keep a strict validator, and outsource these repairs into a dedicated Aletheia postprocessor

That seems wiser then relaxing the rules for validation of GT 👍

@kba kba requested review from cneud and tboenig January 22, 2020 10:54
@kba kba merged commit 034d538 into OCR-D:master Jan 22, 2020
@bertsky bertsky deleted the improve-coordinate-validator branch October 8, 2020 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants