introducing pixScaleBySamplingTopLeft, because sometimes we need (to visualize) pixel expansion from top/left rather than centre/centre. #677
Conversation
…visualize) pixel expansion from top/left rather than centre/centre. See the 'Dancing Troupe' comparative screenshots reported with issue #xxxxx, for a use case.
A few comments.
(1) I don't see a visual difference in the binarized #6 images for the different cases. And I wouldn't expect it if you're just changing the sampling location within the input pixel array.
(2) pixScaleBySampling() is very fast and crude, with serious aliasing problems when downscaling. Use of low-pass filtered downscaling and interpolated upscaling functions is recommended for many situations.
(3) We wouldn't want to add functions to the library for small changes like yours.
(4) Otsu is a global thresholding function. Adaptive thresholding requires more computation but is often much better, and I use it whenever the output binarization quality is important.
(5) I appreciate that you're reading the code carefully and trying to make sense of it. Leptonica is a big library and it can be quite hard to know how best to use it. Some people have found this to be useful at the very highest level: http://www.leptonica.org/highlevel.html
AFK, apologies.
Re your point (1): what I didn't explain properly in the context section is this:
- Nothing in the actual image-processing pipeline changed, so the result is the same in all scenarios.
- The scaling happens in the "diagnostics output" code: it takes the tiny bitmap produced by Otsu and scales it up as part of the diagnostics-only postprocessing (which also includes other postprocessing: the red-tinged original image is mixed in to help humans compare what has happened with what they might think should have happened).
(I hope this makes sense.)
What I saw, and why I need the new API, is that the main Otsu work path (quoted in the pull-request message) *apparently* does apply that tiny bitmap this way. Because I am producing an "augmented" diagnostic image sequence on the side, I need a scale method with matching behaviour, so I can show the state of the components used in the main process "pixel perfect", and a human can easily look at the output and "replay" or reason about what they see in their head.
Thanks for your other notes; I'll have to chew on them later. The purpose of this whole exercise is trying to make sense of existing code in tesseract (mainline and dev branches/forks) while I'm trying to answer the question "why is tesseract going ape on me?", and finding the current diagnostic assists (some rough-and-ready image-dumping code) inadequate to answer it.
Ignore this next bit if you don't like it, but that last part I wrote is where I noticed, after readying the PR, that the regular old-school pixScale also does this "illogical" scale-up, where top/left edge source pixels effectively weigh in at half a pixel of area-of-influence and bottom/right edge pixels end up weighing in at 1.5; what I call "shifted". The scaling code uses the mathematician's "the center of gravity of a source pixel is at its center" mindset, which works in theory but is one of those dreaded off-by-one traps. Upscaling pixels ABC by, say, 4x gives (pixScale style) AABBBBCCCCCC (current output, smoothed of course), where under a strict centre/centre interpretation it should be AABBBBCC, i.e. output size is (source size - 1) * 4 = 8. The former is what I see pixScale doing (output size = source size * 4); if centre/centre is really what you want, it should be the latter. Or one has to abandon the centre/centre mindset and "move half a pixel".
Moving half a pixel is what every "naive, fast" upsampling code out there does, and it is what that Otsu call observably does. So I needed a scale call matching that behaviour so I can have augmented diagnostics as a side channel. The "center is at the top left of the pixel" thinking turns ABC into (scaled up by 4) AAAABBBBCCCC. That is what I see happening when looking at those Otsu inputs and outputs (mask + result).
OK, sorry for my lack of proper jargon; this should probably be written up separately, because I suspect the issue is present in all(?) scaling paths, as I see you call internals that do the real scaling work.
... wondering now how I can make my point clearer and easier to read/grok (without spending a day on it; I try to be lazy ;-) ).
Your point about the mistake in adding 0.5 for rounding in the scaling functions is valid. I should not have done that. The effect is typically minor, but in your case, where the source image is very small and the scaling factor is large, it can cause problems. I have fixed the problem by modifying some of the sampling functions in scale1.c. You will need to call pixScaleBySamplingWithShift() or pixScaleBinaryWithShift(). Please download from head and see if this fixes your issue.
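For later readers: judging from the function names above, the new variants expose the sub-pixel offset as explicit parameters. A hypothetical usage sketch (the exact signature is my assumption from the names, as are `mask`, `scalex` and `scaley`; a shift of 0.5 should reproduce the old rounding behaviour, 0.0 the top-left alignment discussed in this thread):

```c
/* Sketch only: scale the tiny threshold mask up to original size
 * with no half-pixel shift (top-left alignment). */
PIX *scaled = pixScaleBySamplingWithShift(mask, scalex, scaley, 0.0f, 0.0f);
```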
I just noticed your commit f068b48 when I did a quick pull & check as I got home late. I think this is what I need, but I'd like to check my outputs to make sure. I can do that later tomorrow (it's late and I'm in the Amsterdam timezone: 02:30 right now); I will report back tomorrow or Saturday at the latest. My intuition right now is that this change removes the need for this pull request completely, but I am not 100% sure (late, tired). Thank you for the quick work; I'll try to report back as soon as I can.
…leptonica codebase as per DanBloomberg/leptonica#677 conclusion.
Tested your latest code: PASS! 👍 Code snippet from my tesseract diagnostics output rendering code:
and a screenshot of the output HTML (which is the result of the above) 👍
Which closes/obsoletes this pull request AFAIAC. By the way: thanks for the documentation link! And the quick response + resolution, of course 😄
Great. Thanks for bringing up the issue in detail. Dan
See the 'Dancing Troupe' comparative screenshots below for a use case.
Context / Background info
OK, same custom leptonica + tesseract + others rig as in the previous pull request. Here, tesseract is augmented to produce an HTML report, including leptonica-style PIX images, all kept in a PIXA list.

While producing the report, I inspect each PIX image collected in that list and blend it with the 'original input image' in such a way that the new PIX is the top layer, while the 'original input image' is used as a bottom layer (think Photoshop layers), where that bottom layer is tinged a subdued red. The blend is a custom one (because I was unable to produce the same result using one or only a few standard leptonica API calls; I blame my n00bness re leptonica usage). The visual end result is that wherever the top layer is WHITE (or rather: "pretty bright"), the red-tinted original image "shines through", so you can easily observe processing artifacts which you might not want or expect. (Red was chosen because it is similar to what you see when working in Photoshop, using Quick Masks, etc.)

For this to work with my custom hand-written blender code, both layers must have the same dimensions, hence the top layer is scaled up to match the bottom layer when necessary. So far, so good. There is a catch, however:
Good: the new situation (using the new API pixScaleBySamplingTopLeft)

What we are looking at in the next screenshot is an extract of that custom tesseract diagnostic report, with three images (note their reported width/height in pixels!)
The top image is the greyscaled "input" to a thresholding routine.
The second image is the thresholding mask produced by that routine: it's quite a bit smaller than the top input image, but I scale it to the same size as the original before rendering it as PNG, as part of the HTML output and for the reason described above. For this one, nothing is "shining through", but that's okay. What matters here, and WHY pixScaleBySamplingTopLeft is used and useful here, is: I want (the user) to see how we got from the top + middle (threshold mask) to the bottom one (third PIX) in one easy flow.

When you look at the bottom (third) image in the screenshot, you see "artifacts": the middle-bottom part is red-ish, because it got thresholded to white by way of those mask pixels. That is probably undesirable, as the original was pretty dark there, hence the red-ish tint: the "locality" of the thresholding is failing us here. No problem (not for this pull request anyway), but exactly the kind of thing I want to see: top + middle, mixed, produces bottom. 👍
We see the thresholding algorithm at work:
Bad: the previous situation (using the existing API pixScaleBySampling)

This requires a bit of an explanation, but TL;DR: let me show you what that pixScaleBySampling API call delivered, all else exactly the same. Here the scaled mask image (the middle one) looks "sampled" (good), but it looks like it's oddly shifted by half a pixel left/up: pay particular attention to the reported pixel dimensions and keep in mind that the top image, mixed with this (sampled by the thresholding algorithm) middle image, produces the bottom image. That does not look right! 🤔 ❓ Making this visual a headscratcher. Which is "bad":
Bad (v0.0.1), or how we got to use pixScaleBySampling in the first place: pixScale

Please remember: I'm a leptonica n00b, so I do my doc read, I do some RTFC (because the docs don't always link up properly with my brain), I do some grep, and hope not to embarrass myself too much. 😉

This is what I got when I had "something working for the first time": I had decided pixScale was probably the initially-sanest answer to my quest of scaling up small images to "original image size". Note those three reported images' width/height pixel dimensions in the screenshot again; nothing has changed, only this time I call pixScale as I did initially:
as I did initially:Please remember the goal here is to visualize: top mixed with middle (threshold level mask) produces bottom result somehow. Thanks to
pixScale
, the middle image gets scaled in a rather smooth fashion and to my dotard brain, as a user of those images I can't comprehend how the heck top + middle mixed makes bottom one: 😕Which is the reason why I dug into leptonica some more after this first attempt and dove up
pixScaleBySampling
because I was looking for an upscaling that specifically DID NOT interpolate / smooth / otherwise-mix adjacent source pixels in the scaling up as the artifacts I was wondering about in the bottom image needed some explanation and my guess was that the crazy stuff I saw in the bottom ones (this "Dancing Troupe" is only one example and certainly not the "weirdest" of the bunch!) is possibly due to the sampling method used by the alleged thresholding algorithm.Hence
pixScaleBySampling
popping up as prime candidate. 👍Which got me another WTF as I was now looking at some oddly-overlarge bottom pixel row in the mask and the next think was: why is this middle one suddenly shifted when I call
pixScaleBySampling
? What am I doing wrong?!Which took some time, but landed me at "I don't know, but when I do this (
pixScaleBySamplingTopLeft
), my expectations match application output reality. ... 🤔 and maybe file a pullreq if a second RTFC doesn't tell me I've redone something that's already present anyway.Sorry, couldn't find what I hoped to find, so here is
pixScaleBySampling
, duplicated and then corrected for my use case by dropping the two+ 0.5
pixel position calculus expressions: that is the entire difference betweenpixScaleBySamplingTopLeft
andpixScaleBySampling
The End
I hope I didn't embarrass myself by completely overlooking leptonica API XYZ. 😅

PS: and that "Otsu thresholding algorithm"?! Code or it didn't happen!
... and then, through magic, that tuple lands in a PIXA, which I take to make an HTML report, and we get the story above.

PPS: 🤔 hm, that half-pixel-to-the-top-left SHIFT is everywhere?
Now that I write this pull request, only now do I notice that that half-pixel SHIFT is also already present in the pixScale scaled-up output image. If you know, with 20:20 hindsight, what to look for: see the image dimensions, where the height is 3 pixels, and notice again that the 'linearly smoothed' pixScale-produced image also has a 'thinner' top row vs. an 'over-thick' bottom row, exactly like pixScaleBySampling, where it was so obviously visible. Last screenshot repeated here for convenience; check the middle image (pixScale output):

Is this what we (you?) want, by design? Or is this an artifact that nobody's noticed up to now? Or... (fill in the blanks; the n00b may be completely off his rocker.) ❓ 🤔
Thanks for a very nice library; all misunderstandings/incomprehensions are mine.
Compare this to RTFC-ing OpenCV, for example, and I know why I've been, äh, "ambivalent" about working with & on that one, despite the lure of some desirable magic tech in there.
At least I can grok this leptonica code and get results I want in a couple of weeks and still gaining speed (of coding). 👍