Compatibility issue with read_region #51

nickwww · 2022-12-09T11:22:30Z

First of all, thanks for creating tiffslide!

When using it as a drop-in-replacement, we noticed that read_region returns slightly different regions for levels other than the base level. But this does not apply in general, for example it works for all levels and regions if the region starts with the location (0,0).

When we looked into the code we noticed two differences to openslide, which could be the reason:

Downsample factors

The downsample factors are calculated in different ways, compare
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide.c#L271

l->downsample =
        (((double) blh / (double) l->h) +
         ((double) blw / (double) l->w)) / 2.0;
    }

and
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L228

math.sqrt((w0 * h0) / (w * h))

This results in values that are slightly different, e.g. for CMU-1.svs it's
(1.0, 4.000121534371467, 16.00048613748587)(tiffslide) instead of
(1.0, 4.000121536217793, 16.00048614487117)(openslide).

But the difference is really small and probably has no major impact.

Coordinate conversion in `read_region`

In read_region openslide uses the mentioned downsample factors to get from the location on the base level to the wanted level, especially the same downsample factor is used for x and y direction. Tiffslide calculates the location independently for x and y and uses the ratio of the width or height at the specific level and the base level, compare

https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L354

rx0 = (base_x * level_w) // base_w
ry0 = (base_y * level_h) // base_h

and openslide
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide-vendor-aperio.c#L259

x / l->base.downsample
y / l->base.downsample

This can result in a different region, e.g. getting region starting at base level location(100, 100) for level 1 results in different starting locations at level 1, again for example image CMU-1.svs (with level dimensions: (46000, 32914), (11500, 8228), (2875, 2057) and mentioned downsample factors):

tiffslide: (25, 24)
openslide: (24, 24)

Do you think adjustments in read_region makes sense? I can understand if you are cautious with the change, because tiffslide in itself then loses compatibility, on the other hand the consistency with the openslide would be desirable.

I don't think there is a right solution here unfortunately, because we mostly don't know how the downsampling has been done exactly, but I would assume that in many cases the same factor has been used for x and y direction and that the pixels should remain square and the image has rather been cut off or expanded somewhere.

The text was updated successfully, but these errors were encountered:

ap-- · 2022-12-09T18:07:50Z

Hi @nickwww

Thanks for reporting the issue and thanks for all the detail you've added to the bug report.
I am definitely interested in fixing this.

Would you be interested in creating a pull request, that adds test cases that reproduce your issue? It'll be easier to reason about potential impact of the fix.

Cheers,
Andreas 😃

This adjusts the level downsamples to be identical to openslide's. Partially addresses #51.

ap-- · 2023-01-10T15:11:59Z

Okay, so I think I've reached a conclusion here and will try to close this in the next week or so.

If I'm not mistaken, Openslide's read_region aligns output tiles according to location in level 0 and handles subpixel translation for the intermediate levels, while tiffslide only returns actual pixel data available in the level stored in the file.

When reading tiles, at intermediate levels it's obvious that openslide tiles are slightly blurry compared to tiffslide tiles at subpixel offsets. In the below example the output of tiffslide from level 1 will only change for every 4th increase in loc x (level=0).

I prefer this behavior since it provides the raw data by default without any smoothing applied. I believe this is preferable in a ML context, where further augmentations are commonly applied in addition.

import tiffslide
import openslide
from pathlib import Path
from PIL import Image
import numpy as np

fn = Path("/Users/poehlmann/benchmark/Aperio/CMU-1.svs")

t_slide = tiffslide.open_slide(fn)
o_slide = openslide.open_slide(fn)

dbg = Path("/Users/poehlmann/dbg/_test")
dbg.mkdir(exist_ok=True)

def save(arr, name):
    Image.fromarray(arr).save(dbg.joinpath(name))

def draw(a0s, a1s, border=10, gap_h=20, gap_v=10):
    (height, width, depth), = set(a.shape for a in a0s + a1s)
    assert len(a0s) == len(a1s)
    n_v = len(a0s)
    h = height * n_v + gap_v * n_v + 2 * border
    w = width * 2 + gap_h + 2 * border
    canvas = np.zeros((h, w, depth), dtype=np.uint8)

    for i, (a0, a1) in enumerate(zip(a0s, a1s)):
        y0 = border + i * (height + gap_v)
        x0 = border
        x1 = border + width + gap_h
        canvas[y0:y0+height, x0:x0+width, :] = a0
        canvas[y0:y0+height, x1:x1+width, :] = a1

    return canvas


size = (400, 10)
aaa = [
    (
        t_slide.read_region((x, 22000), 1, size, as_array=True),
        np.array(o_slide.read_region((x, 22000), 1, size))[:, :, :3],
    )
    for x in range(2000, 2016)
]
a0s, a1s = zip(*aaa)

save(draw(a0s, a1s, gap_v=4), f"cmu1-lvl1-difference.png")

Not sure if it's obvious, but on the left are image slices taken with tiffslide and on the right with openslide:

openslide's read_region will move smoothly, while tiffslide's tiles will move every 4th row.
I recommend playing around with the script to get some intuition on the differences.

What I will implement

I will fix the scaling discrepancy between x and y coordinates, but might not implement it the way openslide does.

I will also add more documentation to explain minor differences in output at non-base levels.

If there's interest in full subpixel translation of tiles (which might be of interest in a visualization context) I am happy to accept pull requests.

Cheers,
Andreas 😃

nickwww · 2023-01-12T09:47:42Z

Thanks for looking into this! Interesting that openslide actually interpolates and does not round the location again in the last step. I agree that it makes more sense to return the data directly, as tiffslide does. It's also a pity that read_region rounds the pixel location for level 0 first, so you can't get around the interpolation with a clever choice of location either.

Thanks for the script, I'll have a look and also try to figure out implications for creating annotations on one level and transfering/showing it on another level.

464hee · 2023-01-16T06:43:05Z

This discovery is very interesting. My current deep learning prediction also uses tiffslide's read_ Region, I don't know how much it will affect training

ap-- · 2023-02-09T00:35:25Z

Okay, so v2.0.0 is out with identical downsamples to openslide, which is the breaking change I had to make.

I'll keep this issue open, until I fix scaling to be identical for intermediate levels on both axis. Should be in the next minor release, coming soon.

ap-- added the compatibility OpenSlide compatibility related label Dec 9, 2022

ap-- added a commit that referenced this issue Jan 10, 2023

tiffslide: fix downsamples compatibility

7e95e37

This adjusts the level downsamples to be identical to openslide's. Partially addresses #51.

ap-- mentioned this issue Jan 10, 2023

correct downsample #56

Merged

2 tasks

nickwww closed this as completed Jan 12, 2023

nickwww reopened this Jan 12, 2023

ap-- self-assigned this Jan 16, 2023

ap-- linked a pull request Feb 9, 2023 that will close this issue

Uniform axis scaling on intermediate lvl #62

Merged

ap-- closed this as completed in #62 Feb 13, 2023

ap-- mentioned this issue Feb 13, 2023

round downsamples ?!? #63

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compatibility issue with read_region #51

Compatibility issue with read_region #51

nickwww commented Dec 9, 2022 •

edited

Loading

ap-- commented Dec 9, 2022

ap-- commented Jan 10, 2023 •

edited

Loading

nickwww commented Jan 12, 2023

464hee commented Jan 16, 2023

ap-- commented Feb 9, 2023

Compatibility issue with read_region #51

Compatibility issue with read_region #51

Comments

nickwww commented Dec 9, 2022 • edited Loading

Downsample factors

Coordinate conversion in read_region

ap-- commented Dec 9, 2022

ap-- commented Jan 10, 2023 • edited Loading

What I will implement

nickwww commented Jan 12, 2023

464hee commented Jan 16, 2023

ap-- commented Feb 9, 2023

nickwww commented Dec 9, 2022 •

edited

Loading

Coordinate conversion in `read_region`

ap-- commented Jan 10, 2023 •

edited

Loading