
introducing pixScaleBySamplingTopLeft, because sometimes we need (to visualize) pixel expansion from top/left rather than centre/centre. #677

Closed
wants to merge 1 commit

Conversation

@GerHobbelt (Contributor) commented Mar 9, 2023

See the 'Dancing Troupe' comparative screenshots below for a use case.

Context / Background info

OK, same custom leptonica + tesseract + others rig as in the previous pullreq.

Here, tesseract is augmented to produce an HTML report, including leptonica-style PIX images, all kept in a PIXA list.
While producing the report, I inspect each PIX image collected in that list and blend it with the 'original input image' in such a way that the new PIX is the top layer, while the 'original input image' is used as the bottom layer (think Photoshop layers), with that bottom layer tinted a subdued red. The blend is a custom one (because I was unable to produce the same result using one or only a few standard leptonica API calls; I blame my n00bness re leptonica usage). The visual end result is that wherever the top layer is WHITE (or rather: "pretty bright"), the "original image, but red tinted" shines through, so you can easily observe processing artifacts which you might not want or expect. (Red was chosen because it is similar to what you see when working in Photoshop with Quick Masks, etc.)

For this to work with my custom hand-written blender code, both layers must have the same dimensions, so the top layer is scaled up to match the bottom layer whenever necessary. So far, so good. There is a catch, however, which the sections below illustrate.
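For reference, here is a minimal sketch of that upscale-to-match step, using the existing pixGetDimensions and pixScaleBySampling calls (the pix_orig / pix_mask names are placeholders for this illustration, not the actual code in my rig):

    l_int32 ow, oh, mw, mh;
    pixGetDimensions(pix_orig, &ow, &oh, nullptr);   // bottom layer dimensions
    pixGetDimensions(pix_mask, &mw, &mh, nullptr);   // top layer (mask) dimensions
    if (mw != ow || mh != oh) {
      // plain sampling: no new grey levels are invented while scaling up
      PIX *scaled = pixScaleBySampling(pix_mask,
                                       (l_float32)ow / mw,
                                       (l_float32)oh / mh);
      pixDestroy(&pix_mask);
      pix_mask = scaled;
    }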

Good: the new situation (using the new API pixScaleBySamplingTopLeft)

What we are looking at in the next screenshot is an extract of that custom tesseract diagnostic report, with three images (note their reported width/height in pixels!):

The top image is the greyscaled "input" to a thresholding routine.

The second image is the thresholding mask produced by that routine: it is quite a bit smaller than the top input image, but I scale it to the same size as the original before rendering it as PNG as part of the HTML output, for the reason described above. For this one nothing is "shining through", but that's okay. What matters here, and WHY pixScaleBySamplingTopLeft is used and useful, is that I want the user to see how we got from the top image plus the middle one (the threshold mask) to the bottom one (the third PIX) in one easy flow.

When you look at the bottom (third) image in the screenshot, you see "artifacts": the middle-bottom part is red-ish because it got thresholded to white by way of those mask pixels. That is probably undesirable, as the original was pretty dark there; hence the red-ish tint: the "locality" of the thresholding is failing us here. No problem (not for this pullreq anyway), but this is exactly the kind of thing I want to see: top + middle, mixed, produces bottom. 👍

We see the thresholding algorithm at work:

[screenshot: msedge_good_crop]

Bad: the previous situation (using the existing API pixScaleBySampling)

This requires a bit of an explanation, but TL;DR: let me show you what that pixScaleBySampling API call delivered, all else exactly the same:

Here the scaled mask image (the middle one) looks "sampled" (good), but it also looks oddly shifted by half a pixel to the left/up. Pay particular attention to the reported pixel dimensions and keep in mind: the top image, mixed with this middle image (the one sampled by the thresholding algorithm), produces the bottom image. That does not look right! 🤔 ❓ It makes this visual a headscratcher, which is "bad":

[screenshot: msedge_bad_crop]

Bad (v0.0.1) or how we got to use pixScaleBySampling in the first place: pixScale

Please remember: I'm a leptonica n00b, so I do my doc read, I do some RTFC (because the docs don't always link up properly with my brain), I do some grep, and hope to not embarrass myself too much. 😉

This is what I got when I had "something working for the first time": I had decided pixScale was probably the initially-sanest answer to my quest of scaling small images up to "original image size". Note those three reported images' width/height pixel dimensions in the screenshot again; nothing has changed, except that this time I call pixScale, as I did initially:

Please remember: the goal here is to visualize how top mixed with middle (the threshold level mask) somehow produces the bottom result. Thanks to pixScale, the middle image gets scaled in a rather smooth fashion, and to my dotard brain, as a user of those images, I can't comprehend how the heck top + middle mixed makes the bottom one: 😕

[screenshot: msedge_bad-old_crop]

Which is the reason why I dug into leptonica some more after this first attempt and turned up pixScaleBySampling: I was looking for an upscaling method that specifically did NOT interpolate, smooth, or otherwise mix adjacent source pixels while scaling up. The artifacts I was wondering about in the bottom image needed some explanation, and my guess was that the crazy stuff I saw in the bottom images (this "Dancing Troupe" is only one example and certainly not the "weirdest" of the bunch!) was possibly due to the sampling method used by the alleged thresholding algorithm.

Hence pixScaleBySampling popping up as prime candidate. 👍

Which got me another WTF, as I was now looking at an oddly over-large bottom pixel row in the mask, and the next thought was: why is this middle one suddenly shifted when I call pixScaleBySampling? What am I doing wrong?!

Which took some time, but landed me at: "I don't know, but when I do this (pixScaleBySamplingTopLeft), my expectations match the application's actual output." ... 🤔 And maybe file a pullreq, if a second RTFC doesn't tell me I've redone something that's already present anyway.

Sorry, I couldn't find what I hoped to find, so here is pixScaleBySampling, duplicated and then adjusted for my use case by dropping the two "+ 0.5" pixel-position expressions: that is the entire difference between pixScaleBySamplingTopLeft and pixScaleBySampling.
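To make that concrete, here is the shape of the change as an illustration only (this is not the literal scale1.c code), for a destination column j and horizontal scale factor scalex:

    // existing pixScaleBySampling behaviour (rounded): the "+ 0.5" pulls the
    // sample point towards the pixel centre, which is what shows up as the
    // half-pixel shift to the top/left once the mask is blown up far enough
    l_int32 src_col_rounded = (l_int32)((l_float32)j / scalex + 0.5);

    // pixScaleBySamplingTopLeft behaviour (truncated): destination pixel 0
    // always maps to source pixel 0, so the expansion grows strictly towards
    // the bottom/right
    l_int32 src_col_topleft = (l_int32)((l_float32)j / scalex);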


The End

I hope I didn't embarrass myself by completely overlooking leptonica API XYZ. 😅

PS: and that "Otsu thresholding algorithm"?! Code or it didn't happen!

std::tuple<bool, Image, Image, Image> ImageThresholder::Threshold(
                                                      ThresholdMethod method) {
  Image pix_binary = nullptr;
  Image pix_thresholds = nullptr;

  // ### useless/irrelevant code: *snip*

  auto pix_grey = GetPixRectGrey();

  int r = 0;
  l_int32 threshold_val = 0;

  l_int32 pix_w, pix_h;
  pixGetDimensions(pix_ /* pix_grey */, &pix_w, &pix_h, nullptr);

  if (tesseract_->thresholding_debug) {
    tprintf("\nimage width: {}  height: {}  ppi: {}\n", pix_w, pix_h, yres_);
  }

  if (method == ThresholdMethod::Sauvola) {
  // ### useless/irrelevant code: *snip*
  } else if (method == ThresholdMethod::LeptonicaOtsu) {
    int tile_size;
    double tile_size_factor = tesseract_->thresholding_tile_size;
    tile_size = tile_size_factor * yres_;
    tile_size = std::max(16, tile_size);

    int smooth_size;
    double smooth_size_factor = tesseract_->thresholding_smooth_kernel_size;
    smooth_size_factor = std::max(0.0, smooth_size_factor);
    smooth_size = smooth_size_factor * yres_;
    int half_smooth_size = smooth_size / 2;

    double score_fraction = tesseract_->thresholding_score_fraction;

    if (tesseract_->thresholding_debug) {
      tprintf("tile size: {}  smooth_size: {}  score_fraction: {}\n", tile_size, smooth_size, score_fraction);
    }

    // ### TADA! It wasn't me!   ;-)
    r = pixOtsuAdaptiveThreshold(pix_grey, tile_size, tile_size,
                                 half_smooth_size, half_smooth_size,
                                 score_fraction,
                                 (PIX **)pix_thresholds,
                                 (PIX **)pix_binary);
  } else if (method == ThresholdMethod::Nlbin) {
  // ### useless/irrelevant code: *snip*
  } else {
    // Unsupported threshold method.
    r = 1;
  }

  bool ok = (r == 0) && pix_binary;
  return std::make_tuple(ok, pix_grey, pix_binary, pix_thresholds);
}

... and then, through magic, that tuple lands in a PIXA, which I take to make an HTML, and we get the story above.

PPS: 🤔 hm, that half-pixel-to-the-top-left-SHIFT is everywhere?

Now that I write this pullreq, only now do I notice that the half-pixel SHIFT is also already present in the pixScale scaled-up output image. If you know (with 20:20 hindsight) what to look for: see the image dimensions, where the height is 3 pixels, and notice that the 'linearly smoothed' pixScale-produced image also has a 'thinner' top row vs. an 'over-thick' bottom row, exactly like pixScaleBySampling, where it was so obviously visible. The last screenshot is repeated here for convenience; check the middle image (the pixScale output):

[screenshot: msedge_bad-old_crop]

Is this what we (you?) want, by design? Or is this an artifact that nobody's noticed up to now? Or... (fill in the blanks; n00b may be completely off his rocker.) ❓ 🤔
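For what it is worth, a small worked example with made-up numbers (assuming a plain rounded nearest-neighbour mapping and a 3-row source scaled up by a factor of 16): with src_row = (int)(dest_row / 16.0 + 0.5), destination rows 0..7 map to source row 0, rows 8..23 to source row 1, and rows 24..47 (after clamping) to source row 2, i.e. an 8-row top band, a 16-row middle band and a 24-row bottom band instead of three even 16-row bands. Drop the + 0.5 and you get 16/16/16, which is exactly the 'thinner top row vs. over-thick bottom row' difference visible in the screenshots.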


Thanks for a very nice library; all misunderstandings/incomprehensions are mine.
Compare this to RTFC-ing OpenCV, for example, and I know why I've been, uh, "ambivalent" about working with and on that one, despite the lure of some desirable magic tech in there.
At least I can grok this leptonica code and get the results I want within a couple of weeks, and I am still gaining (coding) speed. 👍

…visualize) pixel expansion from top/left rather than centre/centre. See the 'Dancing Troupe' comparative screenshots reported with issue #xxxxx, for a use case.
@DanBloomberg (Owner)
A few comments.

(1) I don't see a visual difference in the binarized #6 images for the different cases. And I wouldn't expect it if you're just changing the sampling location within the input pixel array.

(2) pixScaleBySampling() is very fast and crude, with serious aliasing problems when downscaling. Use of low-pass filtered downscaling and interpolated upscaling functions is recommended for many situations (see the sketch after this list).

(3) We wouldn't want to add functions to the library for small changes like yours.

(4) Otsu is a global thresholding function. Adaptive thresholding requires more computation but is often much better, and I use it whenever the output binarization quality is important.

(5) I appreciate that you're reading the code carefully and trying to make sense of it. Leptonica is a big library and it can be quite hard to know how best to use it. Some people have found this to be useful at the very highest level:
http://www.leptonica.org/highlevel.html
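As a concrete reading of point (2) above (my interpretation of which existing calls are meant, not Dan's wording), the library already provides low-pass filtered downscaling and interpolated upscaling for grey images, and the general entry point picks a method by itself:

    // area-mapped (low-pass filtered) downscaling of an 8 bpp grey image
    PIX *pix_small = pixScaleAreaMap(pix_grey, 0.25f, 0.25f);

    // linearly interpolated upscaling
    PIX *pix_big = pixScaleGrayLI(pix_grey, 4.0f, 4.0f);

    // or let the general entry point pick a sensible method based on the
    // image depth and the scale factors
    PIX *pix_auto = pixScale(pix_grey, 0.5f, 0.5f);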

@GerHobbelt (Contributor, Author) commented Mar 9, 2023 via email

@DanBloomberg (Owner)
Your point about the mistake in adding 0.5 for rounding in the scaling functions is valid. I should not have done that. The effect is typically minor, but in your case, where the source image is very small and the scaling factor is large, it can cause problems.

I have fixed the problem by modifying some of the sampling functions in scale1.c. You will need to call pixScaleBySamplingWithShift() or pixScaleBinaryWithShift(). Please download from head and see if this fixes your issue.
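For anyone reading along, a minimal sketch of the new call (the 16.0 scale factors are only an example; passing 0.0 for both shift values samples from the top/left corner, which is what this pullreq's pixScaleBySamplingTopLeft did):

    PIX *scaled = pixScaleBySamplingWithShift(pixs, 16.0f, 16.0f,
                                              0.0f /* x shift */,
                                              0.0f /* y shift */);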

@GerHobbelt (Contributor, Author)
I just noticed your commit SHA-1: f068b48 when I did a quick pull & check as I got home late.

I think this is what I need, but I'd like to check my outputs to make sure. I can do that later tomorrow (it's late and I'm in the Amsterdam timezone: 02:30 right now); I will report back tomorrow or Saturday at the latest. My intuition right now is that this change removes the need for this pullreq completely, but I am not 100% sure (late, tired).

Thank you for the quick work; I'll try to report back as soon as I can.

GerHobbelt added a commit to GerHobbelt/tesseract that referenced this pull request Mar 11, 2023
@GerHobbelt (Contributor, Author)
Tested your latest code: PASS! 👍

Code snippet from my tesseract diagnostics output rendering code:

    else {
      int ow, oh, od;
      pixGetDimensions(original_image, &ow, &oh, &od);

      Image toplayer = pixConvertTo32(pix);
      Image botlayer = pixConvertTo32(original_image);

      if (w != ow || h != oh)
      {
        // smaller images are generally masks, etc. and we DO NOT want to be confused by the smoothness
        // introduced by regular scaling, so we apply brutal sampled scale then:
        if (w < ow && h < oh) {
          toplayer = pixScaleBySamplingWithShift(toplayer, ow * 1.0f / w, oh * 1.0f / h, 0.0f, 0.0f);
        }
        else if (w > ow && h > oh) {
          // the new image has been either scaled up vs. the original OR a border was added (TODO)
          //
          // for now, we simply apply regular smooth scaling
          toplayer = pixScale(toplayer, ow * 1.0f / w, oh * 1.0f / h);
        }
        else {
          // non-uniform scaling...
          ASSERT0(!"Should never get here! Non-uniform scaling of images collected in DebugPixa!");
          toplayer = pixScale(toplayer, ow * 1.0f / w, oh * 1.0f / h);
        }
      }

      auto datas = pixGetData(toplayer);
      auto datad = pixGetData(botlayer);
      auto wpls = pixGetWpl(toplayer);
      auto wpld = pixGetWpl(botlayer);
      int i, j;
      for (i = 0; i < oh; i++) {
        auto lines = (datas + i * wpls);
        auto lined = (datad + i * wpld);
        for (j = 0; j < ow; j++) {
          // if top(SRC) is black, use that.
          // if top(SRC) is white, and bot(DST) isn't, color bot(DST) red and use that.
          // if top(SRC) is white, and bot(DST) is white, use white.
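          // (sketch of how that snipped body could look; my reconstruction
          //  for illustration only, not the code from the fork)
          l_uint32 spix = GET_DATA_FOUR_BYTES(lines, j);
          l_uint32 dpix = GET_DATA_FOUR_BYTES(lined, j);
          l_int32 sr, sg, sb, dr, dg, db;
          extractRGBValues(spix, &sr, &sg, &sb);
          extractRGBValues(dpix, &dr, &dg, &db);
          if ((sr + sg + sb) / 3 < 128) {
            // top (SRC) is dark enough: keep it
            SET_DATA_FOUR_BYTES(lined, j, spix);
          } else {
            // top (SRC) is "pretty bright": let the original shine through,
            // tinted red; a white original simply stays white
            l_int32 dlum = (dr + dg + db) / 3;
            l_uint32 out;
            composeRGBPixel(255, dlum, dlum, &out);
            SET_DATA_FOUR_BYTES(lined, j, out);
          }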

And here is a screenshot of the output HTML (which is the result of the above pixScaleBySamplingWithShift code, executed for the Otsu mask PIX in the diagnostics PIXA, plus some scaffolding):

[screenshot: msedge_q6EpBNMvi9]

👍

@GerHobbelt (Contributor, Author)
Which closes/obsoletes this pullreq AFAIAC.

By the way: thanks for the documentation link! And the quick response + resolution, of course 😄

@GerHobbelt closed this Mar 11, 2023
@DanBloomberg (Owner)
Great. Thanks for bringing up the issue in detail.

Dan
