Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve handling of rotation #311

Merged
merged 9 commits into from
Nov 5, 2019

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Sep 11, 2019

Workspace.image_from_*: when rotation is necessary, do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency
channel if the input mode allows it

@bertsky bertsky force-pushed the rotate-with-background-and-transparency branch from 114e4f0 to 9409694 Compare September 11, 2019 17:29
@codecov-io
Copy link

codecov-io commented Sep 11, 2019

Codecov Report

Merging #311 into master will decrease coverage by 0.97%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
- Coverage   90.82%   89.85%   -0.98%     
==========================================
  Files          30       30              
  Lines        1603     1626      +23     
  Branches      309      317       +8     
==========================================
+ Hits         1456     1461       +5     
- Misses        111      125      +14     
- Partials       36       40       +4
Impacted Files Coverage Δ
ocrd/ocrd/workspace.py 62.27% <31.81%> (-3.27%) ⬇️
ocrd_utils/ocrd_utils/__init__.py 78.17% <35.71%> (-3.75%) ⬇️
ocrd/ocrd/resolver.py 98.79% <0%> (+3.61%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0410153...1d82c42. Read the comment docs.

do not always fill with white; instead, determine the
background color by median, and only use white for
binary images; moreover, add a transparency channel
if the input mode allows it
@bertsky bertsky force-pushed the rotate-with-background-and-transparency branch from 9409694 to c3d217b Compare September 11, 2019 22:22
(image_from_polygon), keep the input image mode;
moreover, add a transparency channel if the input
image allows it
@bertsky bertsky force-pushed the rotate-with-background-and-transparency branch from c3d217b to a10c9f2 Compare September 11, 2019 22:42
@bertsky bertsky requested a review from wrznr September 11, 2019 22:49
@wrznr
Copy link
Contributor

wrznr commented Sep 12, 2019

All CI tests fail!

Copy link
Contributor

@wrznr wrznr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@bertsky
Copy link
Collaborator Author

bertsky commented Sep 20, 2019

Please don't merge, yet! Just as we expected earlier, this does cause problems with some consumers:

OSError: cannot write mode LA as JPEG

This is tesserocr saying it cannot convert the image to libleptonica's Pixa, because JPGs must not have a transparency channel. (The choice JPEG is a fixed default in tesserocr.pyx.) Others like Ocropy might complain they suddenly see too many (2 or 4) dimensions in their Numpy arrays (instead of 1 or 3).

It is too much an imposition to expect the transparency reduction to white from the consumer. But it is also hard to tell from the API which consumers want which behaviour. Maybe we should spend a new option fill like so:

  • white: fill areas between segment polygon and rectangle with white
  • background: fill with the background colour, measured via median
  • transparent: add a transparency channel (current state of the PR)

@wrznr @kba what do you think?

@bertsky
Copy link
Collaborator Author

bertsky commented Sep 20, 2019

Correction: The above error for Tesseract is a shortcoming entirely attributable to tesserocr. And since in Ocropy/ocrolib with pil2image we have full control of what behaviour we want with transparent inputs, I propose to stick with fill as a parameter, but use transparent as default. Thus, we get the current behaviour, but can opt-out. Commit is following shortly. Please re-review!

@bertsky bertsky requested review from kba and wrznr September 20, 2019 13:12
- image_from_page, image_from_segment, image_from_polygon:
  add parameter ``fill``
- possible values white/background/transparent, with
  ``transparent`` (behaviour introduced by this branch)
  as default
@bertsky bertsky force-pushed the rotate-with-background-and-transparency branch from b49648d to d4b46a3 Compare September 20, 2019 19:35
@bertsky
Copy link
Collaborator Author

bertsky commented Sep 20, 2019

A little elaboration is due on why we want transparency. Transparency can preserve the information which image regions are of interest (as a mask), without clipping the rest to some (not so) random colour like white or background. Especially if we have to "invent" the color as in rotation with reshaping.

But this is only helpful for

  1. image-only consumers (like CNN segmentation, dewarping or recognition), and among that only
  2. those that do not already rely on binarized input, and among that only
  3. those that are capable of interpreting a transparency channel.

By image-only I mean processors that cannot mask image rectangles via coordinates in their inner representation. By 2 I exclude approaches like Ocropy which do everything in binary anyway (where we can and must safely clip to white). And 3 could be e.g. a CNN segmentation without alpha channel in the model input.

So this positively includes Tesseract recognition (as it turns out – at least in principle), and upcoming segmentation tools like ocrd_segment and probably the modules from Würzburg and DFKI.

But by making transparent images the default, we put a burden on all consumers to be able to cope with the extra channels (and usually ignore them). As it stands, Ocropy (in ocrd_cis) is already robust. But we should check the other existing processors (calamari, kraken, anyOCR, typegroups classifier, larex?) before merging this.

@bertsky
Copy link
Collaborator Author

bertsky commented Sep 20, 2019

And on the producer side: clipping would be much more useful if it did not have to actually insert any colours, but could export its shrinked mask (which is completely free of the restrictions of coordinate polygons) as transparency.

@bertsky
Copy link
Collaborator Author

bertsky commented Sep 21, 2019

So this positively includes Tesseract recognition (as it turns out – at least in principle)

No, Tesseract can cope with raw colors (by converting them to 8-bit grayscale), but it always converts alpha channels to white background! The model does not have an extra colour for transparency. (See here for relevant links into the code.)

That leaves ocrd_segment et al. as the only use-case. Maybe not enough to swing the fill=transparent default anymore? Rather the contrary: Tesseract with transparent (becoming white) at runtime is worse than filling with background when running on raw images. The network is more perplexed by white/grey mix than by pure grey or pure binarized.

- image_from_page, image_from_segment, image_from_polygon:
  add parameter ``transparency``, independent of ``fill``
- an alpha channel with the mask will be added iff ``transparency``,
  colour in ``fill`` will be used regardless (for consumers which
  cannot handle alpha channels)
…b.com/bertsky/core into rotate-with-background-and-transparency
- image_from_polygon: regardless of the ``transparency`` parameter,
  if the input already has an alpha channel, then shrink its mask
  from the polygon mask
@bertsky
Copy link
Collaborator Author

bertsky commented Sep 23, 2019

The last 3 commits explained:

Having seen how Tesseract can cope with transparency, but does it rather badly (by reducing to white instead of background), and others like Ocropy ignoring transparency completely (thereby reverting to the color information in the other channels outside the mask), I think we need both:

  1. having a way to add/preserve mask information if some workflow configuration demands it, but
  2. not depending on alpha channels at all, but use estimated background colour everywhere outside the mask.

So the idea is to separate the transparency opt-in from the fill decision. And to nevertheless disregard transparency if the input already contains an alpha channel. Thus we get the following behaviour:

  • 1 is satisfied by having some producer in the workflow chain use transparency=True or add images with an alpha mask itself; but by default no alpha channels will be created
  • 2 is satisfied because fill=background is always there as a fallback for dumb consumers

@wrznr
Copy link
Contributor

wrznr commented Sep 24, 2019

That is rather disappointing, right? Is there a way we can improve Tesseract's handling of transparency? Maybe via tesserocr and/or tesstrain?

@bertsky
Copy link
Collaborator Author

bertsky commented Sep 24, 2019

That is rather disappointing, right? Is there a way we can improve Tesseract's handling of transparency? Maybe via tesserocr and/or tesstrain?

Not without changing the internal network topology and retraining. Transparency needs to be an extra input, or at least a "special colour".

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks

ocrd/ocrd/workspace.py Show resolved Hide resolved
Copy link
Member

@cneud cneud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM following #311 (comment)

@kba kba merged commit 1d82c42 into OCR-D:master Nov 5, 2019
@bertsky bertsky deleted the rotate-with-background-and-transparency branch October 8, 2020 06:41
bertsky pushed a commit to bertsky/core that referenced this pull request Apr 26, 2023
Update Dockerfile, Makefile and GitHub action for Docker images
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants