Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility issue with read_region #51

Closed
nickwww opened this issue Dec 9, 2022 · 5 comments · Fixed by #62
Closed

Compatibility issue with read_region #51

nickwww opened this issue Dec 9, 2022 · 5 comments · Fixed by #62
Assignees
Labels
compatibility OpenSlide compatibility related

Comments

@nickwww
Copy link

nickwww commented Dec 9, 2022

First of all, thanks for creating tiffslide!

When using it as a drop-in-replacement, we noticed that read_region returns slightly different regions for levels other than the base level. But this does not apply in general, for example it works for all levels and regions if the region starts with the location (0,0).

When we looked into the code we noticed two differences to openslide, which could be the reason:

Downsample factors

The downsample factors are calculated in different ways, compare
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide.c#L271

l->downsample =
        (((double) blh / (double) l->h) +
         ((double) blw / (double) l->w)) / 2.0;
    }

and
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L228

math.sqrt((w0 * h0) / (w * h))

This results in values that are slightly different, e.g. for CMU-1.svs it's
(1.0, 4.000121534371467, 16.00048613748587)(tiffslide) instead of
(1.0, 4.000121536217793, 16.00048614487117)(openslide).

But the difference is really small and probably has no major impact.

Coordinate conversion in read_region

In read_region openslide uses the mentioned downsample factors to get from the location on the base level to the wanted level, especially the same downsample factor is used for x and y direction. Tiffslide calculates the location independently for x and y and uses the ratio of the width or height at the specific level and the base level, compare

https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L354

rx0 = (base_x * level_w) // base_w
ry0 = (base_y * level_h) // base_h

and openslide
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide-vendor-aperio.c#L259

x / l->base.downsample
y / l->base.downsample

This can result in a different region, e.g. getting region starting at base level location(100, 100) for level 1 results in different starting locations at level 1, again for example image CMU-1.svs (with level dimensions: (46000, 32914), (11500, 8228), (2875, 2057) and mentioned downsample factors):

  • tiffslide: (25, 24)
  • openslide: (24, 24)

Do you think adjustments in read_region makes sense? I can understand if you are cautious with the change, because tiffslide in itself then loses compatibility, on the other hand the consistency with the openslide would be desirable.

I don't think there is a right solution here unfortunately, because we mostly don't know how the downsampling has been done exactly, but I would assume that in many cases the same factor has been used for x and y direction and that the pixels should remain square and the image has rather been cut off or expanded somewhere.

@ap--
Copy link
Collaborator

ap-- commented Dec 9, 2022

Hi @nickwww

Thanks for reporting the issue and thanks for all the detail you've added to the bug report.
I am definitely interested in fixing this.

Would you be interested in creating a pull request, that adds test cases that reproduce your issue? It'll be easier to reason about potential impact of the fix.

Cheers,
Andreas 😃

@ap-- ap-- added the compatibility OpenSlide compatibility related label Dec 9, 2022
ap-- added a commit that referenced this issue Jan 10, 2023
This adjusts the level downsamples to be identical to openslide's.
Partially addresses #51.
@ap-- ap-- mentioned this issue Jan 10, 2023
2 tasks
@ap--
Copy link
Collaborator

ap-- commented Jan 10, 2023

Okay, so I think I've reached a conclusion here and will try to close this in the next week or so.

If I'm not mistaken, Openslide's read_region aligns output tiles according to location in level 0 and handles subpixel translation for the intermediate levels, while tiffslide only returns actual pixel data available in the level stored in the file.

When reading tiles, at intermediate levels it's obvious that openslide tiles are slightly blurry compared to tiffslide tiles at subpixel offsets. In the below example the output of tiffslide from level 1 will only change for every 4th increase in loc x (level=0).

I prefer this behavior since it provides the raw data by default without any smoothing applied. I believe this is preferable in a ML context, where further augmentations are commonly applied in addition.

import tiffslide
import openslide
from pathlib import Path
from PIL import Image
import numpy as np

fn = Path("/Users/poehlmann/benchmark/Aperio/CMU-1.svs")

t_slide = tiffslide.open_slide(fn)
o_slide = openslide.open_slide(fn)

dbg = Path("/Users/poehlmann/dbg/_test")
dbg.mkdir(exist_ok=True)

def save(arr, name):
    Image.fromarray(arr).save(dbg.joinpath(name))

def draw(a0s, a1s, border=10, gap_h=20, gap_v=10):
    (height, width, depth), = set(a.shape for a in a0s + a1s)
    assert len(a0s) == len(a1s)
    n_v = len(a0s)
    h = height * n_v + gap_v * n_v + 2 * border
    w = width * 2 + gap_h + 2 * border
    canvas = np.zeros((h, w, depth), dtype=np.uint8)

    for i, (a0, a1) in enumerate(zip(a0s, a1s)):
        y0 = border + i * (height + gap_v)
        x0 = border
        x1 = border + width + gap_h
        canvas[y0:y0+height, x0:x0+width, :] = a0
        canvas[y0:y0+height, x1:x1+width, :] = a1

    return canvas


size = (400, 10)
aaa = [
    (
        t_slide.read_region((x, 22000), 1, size, as_array=True),
        np.array(o_slide.read_region((x, 22000), 1, size))[:, :, :3],
    )
    for x in range(2000, 2016)
]
a0s, a1s = zip(*aaa)

save(draw(a0s, a1s, gap_v=4), f"cmu1-lvl1-difference.png")

Not sure if it's obvious, but on the left are image slices taken with tiffslide and on the right with openslide:

cmu1-lvl1-difference

openslide's read_region will move smoothly, while tiffslide's tiles will move every 4th row.
I recommend playing around with the script to get some intuition on the differences.



What I will implement

I will fix the scaling discrepancy between x and y coordinates, but might not implement it the way openslide does.

I will also add more documentation to explain minor differences in output at non-base levels.

If there's interest in full subpixel translation of tiles (which might be of interest in a visualization context) I am happy to accept pull requests.

Cheers,
Andreas 😃

@nickwww
Copy link
Author

nickwww commented Jan 12, 2023

Thanks for looking into this! Interesting that openslide actually interpolates and does not round the location again in the last step. I agree that it makes more sense to return the data directly, as tiffslide does. It's also a pity that read_region rounds the pixel location for level 0 first, so you can't get around the interpolation with a clever choice of location either.

Thanks for the script, I'll have a look and also try to figure out implications for creating annotations on one level and transfering/showing it on another level.

@nickwww nickwww closed this as completed Jan 12, 2023
@nickwww nickwww reopened this Jan 12, 2023
@464hee
Copy link

464hee commented Jan 16, 2023

This discovery is very interesting. My current deep learning prediction also uses tiffslide's read_ Region, I don't know how much it will affect training

@ap-- ap-- self-assigned this Jan 16, 2023
@ap--
Copy link
Collaborator

ap-- commented Feb 9, 2023

Okay, so v2.0.0 is out with identical downsamples to openslide, which is the breaking change I had to make.

I'll keep this issue open, until I fix scaling to be identical for intermediate levels on both axis. Should be in the next minor release, coming soon.

@ap-- ap-- linked a pull request Feb 9, 2023 that will close this issue
@ap-- ap-- closed this as completed in #62 Feb 13, 2023
@ap-- ap-- mentioned this issue Feb 13, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility OpenSlide compatibility related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants