improve handling of rotation #311
114e4f0 to 9409694
Codecov Report
@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
- Coverage   90.82%   89.85%   -0.98%
==========================================
  Files          30       30
  Lines        1603     1626      +23
  Branches      309      317       +8
==========================================
+ Hits         1456     1461       +5
- Misses        111      125      +14
- Partials       36       40       +4
do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency channel if the input mode allows it
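The commit message above can be illustrated with a minimal sketch. It assumes Pillow and image modes `1`, `L`, `LA`, `RGB` or `RGBA`; the helper names `estimate_background` and `rotate_with_background` are hypothetical and not taken from this branch, and the actual implementation may differ.

```python
from PIL import Image, ImageStat

def estimate_background(image):
    """Hypothetical helper: pick a fill colour for the areas exposed by rotation."""
    if image.mode == '1':
        # Binary images: fall back to white (255 in Pillow's mode "1").
        return 255
    if image.mode in ('LA', 'RGBA'):
        # Ignore an existing alpha channel when estimating the colour.
        image = image.convert(image.mode[:-1])
    # Per-band median of the pixel values, as a stand-in for "the background".
    median = ImageStat.Stat(image).median
    return median[0] if len(median) == 1 else tuple(median)

def rotate_with_background(image, angle):
    """Hypothetical helper: rotate with reshaping, filling new corners with the estimate."""
    return image.rotate(angle, expand=True, fillcolor=estimate_background(image))
```

The median is a cheap guess that works when most of the page is background; it is only an assumption about what "determine the background color by median" means here.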
9409694 to c3d217b
(image_from_polygon), keep the input image mode; moreover, add a transparency channel if the input image allows it
c3d217b to a10c9f2
All CI tests fail!
🥇
Please don't merge, yet! Just as we expected earlier, this does cause problems with some consumers:
This is It is too much of an imposition to expect the consumer to reduce transparency to white. But it is also hard to tell from the API which consumers want which behaviour. Maybe we should spend a new option
Correction: The above error for Tesseract is a shortcoming entirely attributable to tesserocr. And since in Ocropy/ocrolib with
- image_from_page, image_from_segment, image_from_polygon: add parameter ``fill`` - possible values white/background/transparent, with ``transparent`` (behaviour introduced by this branch) as default
b49648d to d4b46a3
A little elaboration is due on why we want transparency. Transparency can preserve the information about which image regions are of interest (as a mask), without clipping the rest to some (not so) random colour like white or background, especially if we have to "invent" the color, as in rotation with reshaping. But this is only helpful for
By So this positively includes Tesseract recognition (as it turns out – at least in principle), and upcoming segmentation tools like ocrd_segment and probably the modules from Würzburg and DFKI. But by making transparent images the default, we put a burden on all consumers to be able to cope with the extra channels (and usually ignore them). As it stands, Ocropy (in ocrd_cis) is already robust. But we should check the other existing processors (calamari, kraken, anyOCR, typegroups classifier, larex?) before merging this.
And on the producer side: clipping would be much more useful if it did not have to actually insert any colours, but could instead export its shrunken mask (which is completely free of the restrictions of coordinate polygons) as transparency.
No, Tesseract can cope with raw colors (by converting them to 8-bit grayscale), but it always converts alpha channels to white background! The model does not have an extra colour for transparency. (See here for relevant links into the code.) That leaves ocrd_segment et al. as the only use-case. Maybe not enough to swing the
- image_from_page, image_from_segment, image_from_polygon: add parameter ``transparency``, independent of ``fill`` - an alpha channel with the mask will be added iff ``transparency``, colour in ``fill`` will be used regardless (for consumers which cannot handle alpha channels)
…b.com/bertsky/core into rotate-with-background-and-transparency
- image_from_polygon: regardless of the ``transparency`` parameter, if the input already has an alpha channel, then shrink its mask from the polygon mask
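The behaviour described in these commit messages can be sketched as follows. This is a minimal illustration assuming Pillow; `crop_to_polygon` and its signature are hypothetical and not the `image_from_polygon` API itself, and `fill` is taken here as an explicit colour value rather than a keyword.

```python
from PIL import Image, ImageChops, ImageDraw

def crop_to_polygon(image, polygon, fill, transparency=False):
    """Hypothetical helper: suppress everything outside `polygon`."""
    # Polygon mask: 255 inside the region of interest, 0 outside.
    mask = Image.new('L', image.size, 0)
    ImageDraw.Draw(mask).polygon(polygon, fill=255)
    has_alpha = image.mode in ('LA', 'RGBA')
    if has_alpha:
        # An existing alpha channel is shrunk by the polygon mask (pixelwise minimum).
        alpha = ImageChops.darker(image.getchannel('A'), mask)
        image = image.convert(image.mode[:-1])
    else:
        alpha = mask
    # The colour channels always receive `fill` outside the polygon,
    # for consumers that cannot handle (or simply ignore) alpha channels.
    background = Image.new(image.mode, image.size, fill)
    result = Image.composite(image, background, mask)
    # Add the mask as alpha only if requested and the mode allows it,
    # or if the input already carried an alpha channel.
    if has_alpha or (transparency and result.mode in ('L', 'RGB')):
        result.putalpha(alpha)
    return result
```

With this separation, a consumer that drops the alpha channel (for example via `result.convert('RGB')`) still sees the `fill` colour outside the region, while a mask-aware consumer can read the region of interest from the alpha channel.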
The last 3 commits explained: Having seen how Tesseract can cope with transparency, but does so rather badly (by reducing to white instead of background), and others like Ocropy ignoring transparency completely (thereby reverting to the color information in the other channels outside the mask), I think we need both:
So the idea is to separate the
That is rather disappointing, right? Is there a way we can improve Tesseract's handling of transparency? Maybe via
Not without changing the internal network topology and retraining. Transparency needs to be an extra input, or at least a "special colour".
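Short of retraining, a caller can at least decide which colour the transparent regions collapse to before the image reaches a recognizer. A minimal sketch, assuming Pillow; `flatten_alpha` is a hypothetical helper and not part of OCR-D core, tesserocr or Tesseract:

```python
from PIL import Image

def flatten_alpha(image, colour):
    """Hypothetical helper: replace transparent areas by `colour` and drop the alpha channel."""
    if image.mode not in ('LA', 'RGBA'):
        # Nothing to flatten.
        return image
    # Start from a canvas of the chosen colour (e.g. white, or an estimated background).
    flat = Image.new(image.mode[:-1], image.size, colour)
    # Paste the colour channels, using the alpha channel as the paste mask.
    flat.paste(image.convert(flat.mode), mask=image.getchannel('A'))
    return flat
```

Passing the estimated background colour instead of white would avoid the reduction to white criticised above, at the cost of doing the conversion on the consumer side.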
Looks good, thanks
LGTM following #311 (comment)
Update Dockerfile, Makefile and GitHub action for Docker images
Workspace.image_from_*: when rotation is necessary, do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency channel if the input mode allows it