Limit bounds #146

Merged
merged 23 commits into from
Mar 20, 2023

jonasteuwen
Contributor

If the image provides bounds for the image region, these are currently not processed. Passing limit_bounds to the constructor of the dataset makes it take them into account.

Contributor

@YoniSchirris YoniSchirris left a comment

Looks fine.

But I'm not seeing any tests. Shouldn't we add tests for new functionality?

There are also many changes to the polygon functionality that seem unrelated to this PR and issue.

dlup/_image.py (outdated, resolved)
dlup/cli/mask.py (outdated, resolved)
dlup/data/dataset.py (resolved)
dlup/data/dataset.py (outdated, resolved)
dlup/data/dataset.py (outdated, resolved)
dlup/experimental_backends/openslide_backend.py (outdated, resolved)
dlup/experimental_backends/pyvips_backend.py (resolved)
dlup/experimental_backends/pyvips_backend.py (resolved)
dlup/utils/mask.py (outdated, resolved)
dlup/utils/mask.py (outdated, resolved)
@YoniSchirris
Contributor

So when the bounds are available (to my knowledge only with the latest software of 3DHISTECH P1000 scanners), we likely want to use these bounds for any further processing.

dlup has multiple use cases:

  • extract tiles and save to disk
  • create TiledROIsSlideImageDataset for either feature extraction, patch-level classification, or WSI-level classification
  • read specific (annotated) regions and prepare those for training/inference

I think these bounds will generally be used when there is no explicit ROI annotation, which happens in the case of

  • inference on an entirely new slide
  • training WSI-level classification methods on unannotated slides

An explicit ROI "annotation" can be generated with e.g. the fesi or improved_fesi tissue segmentation algorithms as implemented in dlup.background, but these and other segmentation methods may fail at the borders of the glass slide. It is then generally still preferable to first use the bounds, and only then do tissue segmentation within these bounds.

The current implementation allows creating a tiled dataset from the bounds. If one already has a tissue mask, the two can be combined (either as a bounding box around the tissue mask, or by just passing the tissue mask), but it is not yet possible to compute a tissue mask within the bounds without additional custom steps.

So we likely want to add this to dlup.background.get_mask, at the line

mask = mask_func(np.asarray(slide.get_thumbnail(size=(size, size))))

such that it computes the mask only within the bounds, but returns an array of the size of the full SlideImage at mpp=32, so that it can be passed into the TiledROIsSlideImageDataset instantiation.
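A minimal sketch of that idea, not the dlup implementation: crop the thumbnail to the bounds, run the mask function on the crop only, and paste the result into an all-False array covering the full thumbnail. The helper name `mask_within_bounds` is hypothetical, the bounds layout `((x, y), (width, height))` in level-0 pixels is inferred from the `slide_bounds` values printed later in this thread, and `mask_func` is assumed to map an (H, W, C) array to an (H, W) boolean mask.

```python
import numpy as np


def mask_within_bounds(thumbnail, bounds, slide_size, mask_func):
    """Run mask_func only inside the bounds; return a mask for the full thumbnail.

    thumbnail:  (H, W, C) array covering the full slide.
    bounds:     ((x, y), (width, height)) in level-0 pixels (assumed layout).
    slide_size: (width, height) of the full slide in level-0 pixels.
    """
    height, width = thumbnail.shape[:2]
    (x, y), (w, h) = bounds
    # Scale level-0 coordinates to thumbnail pixels, per axis.
    sx, sy = width / slide_size[0], height / slide_size[1]
    x0, y0 = int(np.floor(x * sx)), int(np.floor(y * sy))
    x1 = min(width, int(np.ceil((x + w) * sx)))
    y1 = min(height, int(np.ceil((y + h) * sy)))

    # Everything outside the bounds stays background (False).
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = mask_func(thumbnail[y0:y1, x0:x1])
    return mask


# Toy data: a white thumbnail with one dark blob outside and one inside the bounds.
thumb = np.full((100, 200, 3), 255, dtype=np.uint8)
thumb[10:20, 10:20] = 0      # outside the bounds, e.g. a slide-border artifact
thumb[50:60, 120:130] = 0    # inside the bounds, e.g. tissue
mask = mask_within_bounds(
    thumb,
    bounds=((800, 320), (640, 320)),
    slide_size=(1600, 800),
    mask_func=lambda a: a.mean(axis=-1) < 128,  # stand-in for fesi/improved_fesi
)
```

In the toy data only the blob inside the bounds ends up in the mask; the dark region near the border is ignored, which is the failure mode the bounds are meant to avoid.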

I think any other pre- or postprocessing steps with a slide with bounds will need to be implemented by the user.

dlup/data/dataset.py (outdated, resolved)
@YoniSchirris
Contributor

After discussing with @jonasteuwen, we decided that this feature is intended to give the user easy access to the bounds, and will be added automatically to TiledROIsSlideImageDataset.from_standard_tiling, but nowhere else. It is up to the user to use this in other use cases.

For now, due to #151, this feature cannot be used with ImageBackend.AUTODETECT, and it will raise a warning accordingly (though it will still break because the opened file context can't be properly closed).
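As an illustration only, a guard of this kind could look like the sketch below. The `Backend` enum is a hypothetical stand-in for dlup's ImageBackend, and `warn_if_unsupported` is an invented helper; neither reflects how dlup actually emits the warning.

```python
import warnings
from enum import Enum


class Backend(Enum):
    """Hypothetical stand-in for dlup's ImageBackend; not the real enum."""

    AUTODETECT = "autodetect"
    OPENSLIDE = "openslide"
    PYVIPS = "pyvips"


def warn_if_unsupported(backend: Backend, limit_bounds: bool) -> bool:
    """Return True if limit_bounds is expected to work with this backend."""
    if limit_bounds and backend is Backend.AUTODETECT:
        warnings.warn(
            "limit_bounds is not supported with AUTODETECT; "
            "pass an explicit backend such as OPENSLIDE or PYVIPS.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

Until #151 is resolved, the practical workaround is the one used in the script below: pass ImageBackend.OPENSLIDE or ImageBackend.PYVIPS explicitly.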

I ran the following "test":

from functools import partial
from dlup.data.dataset import TiledROIsSlideImageDataset
from dlup.experimental_backends import ImageBackend
from dlup import SlideImage
from pathlib import Path
from PIL import Image
import matplotlib.pyplot as plt 
import numpy as np
from PIL import ImageDraw

def main():
    mrxs_glob = "*/*/*.mrxs"
    MRXS_ROOT_FOLDER = "" # */*/*.mrxs
    TCGA_ROOT_FOLDER = ""  # */*.svs

    # Take at most five slides of each type; the loop below expects `mrxs_paths`.
    mrxs_paths = list(Path(MRXS_ROOT_FOLDER).glob(mrxs_glob))[:5]
    tcga_paths = list(Path(TCGA_ROOT_FOLDER).glob("*/*.svs"))[:5]

    for slide_path in mrxs_paths + tcga_paths:
        print(f"============================\n   Slide: {slide_path.stem}")
        for backend in [ImageBackend.OPENSLIDE, ImageBackend.PYVIPS]:
            # ImageBackend.AUTODETECT fails
            # ImageBackend.TIFFILE fails because we're not reading TIFFILEs.
            print(f"-------\n  Using backend: {backend}")
            slide = SlideImage.from_file_path(slide_path, backend=backend)
            print(f"Slide bounds: {slide.slide_bounds}")
            print(f"Full slide size: {slide.size}")
            ds = partial(TiledROIsSlideImageDataset.from_standard_tiling, path=slide_path, mpp=100, tile_size=(10, 10), tile_overlap=(0, 0), backend=backend)
            ds_limit_bounds = ds(limit_bounds=True)
            ds_no_limit_bounds = ds(limit_bounds=False)
            print(f"Len of tiled dataset without limit bounds: {len(ds_no_limit_bounds)}")
            print(f"Len of tiled dataset with limit bounds: {len(ds_limit_bounds)}")
            scaled_region_view = slide.get_scaled_view(slide.get_scaling(100))

            for limit_bounds in [True, False]:
                ds2 = ds(limit_bounds=limit_bounds)
                background = Image.new("RGBA", tuple(scaled_region_view.size), (255, 255, 255, 255))
                for d in ds2:
                    tile = d["image"]
                    coords = np.array(d["coordinates"])
                    box = tuple(np.array((*coords, *(coords + 10))).astype(int))
                    background.paste(tile, box)
                    draw = ImageDraw.Draw(background)
                    draw.rectangle(box, outline="red")
                background.save(f"{limit_bounds}_{str(backend).replace('.', '')}_{str(slide_path.stem).replace(' ', '')}.png")
          
if __name__ == "__main__":
    main()

This gives the following output. It shows that both the PYVIPS and OPENSLIDE backends work, that slides with and without bounds in their metadata are handled (for the TCGA .svs slides the bounds equal the full slide size, so the tile counts are identical), and that the locations are found correctly.

Images are not uploaded here since we do not have access to open-source mrxs files at the moment.

============================
   Slide: Extern: T12-94006 a1 HENKI
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((11068, 24892), (99871, 199322))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1225
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((11068, 24892), (99871, 199322))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1225
============================
   Slide: Extern: T13-94161 a2 HENKI
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((166140, 25069), (97282, 197106))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1152
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((166140, 25069), (97282, 197106))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1152
============================
   Slide: Extern: T13-94169 a1 HENKI
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((166078, 24910), (97281, 197018))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1152
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((166078, 24910), (97281, 197018))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1152
============================
   Slide: Extern: T13-94035 b1 HENKI
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((10982, 24891), (99972, 201144))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1225
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((10982, 24891), (99972, 201144))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 1225
============================
   Slide: Extern: T09-94048 a1 HENKI
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((10960, 24876), (99927, 134827))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 825
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((10960, 24876), (99927, 134827))
Full slide size: (271950, 294038)
Len of tiled dataset without limit bounds: 4752
Len of tiled dataset with limit bounds: 825
============================
   Slide: TCGA-E2-A15H-01Z-00-DX1.E3A9DFDC-204D-4F03-98D9-97BBBB74E840
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((0, 0), (122808, 83672))
Full slide size: (122808, 83672)
Len of tiled dataset without limit bounds: 651
Len of tiled dataset with limit bounds: 651
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((0, 0), (122808, 83672))
Full slide size: (122808, 83672)
Len of tiled dataset without limit bounds: 651
Len of tiled dataset with limit bounds: 651
============================
   Slide: TCGA-S3-AA14-01Z-00-DX1.000A865F-19E6-4018-9352-BFA54EF0CE31
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((0, 0), (123503, 85526))
Full slide size: (123503, 85526)
Len of tiled dataset without limit bounds: 704
Len of tiled dataset with limit bounds: 704
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((0, 0), (123503, 85526))
Full slide size: (123503, 85526)
Len of tiled dataset without limit bounds: 704
Len of tiled dataset with limit bounds: 704
============================
   Slide: TCGA-A2-A0CT-01Z-00-DX1.A8564130-49CF-4F5B-B5AB-F4D1A10479FF
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((0, 0), (106292, 84971))
Full slide size: (106292, 84971)
Len of tiled dataset without limit bounds: 594
Len of tiled dataset with limit bounds: 594
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((0, 0), (106292, 84971))
Full slide size: (106292, 84971)
Len of tiled dataset without limit bounds: 594
Len of tiled dataset with limit bounds: 594
============================
   Slide: TCGA-OL-A5D8-01Z-00-DX1.C0A75731-1DDC-4FAF-A2C5-2E2ECB23DC13
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((0, 0), (169456, 72385))
Full slide size: (169456, 72385)
Len of tiled dataset without limit bounds: 756
Len of tiled dataset with limit bounds: 756
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((0, 0), (169456, 72385))
Full slide size: (169456, 72385)
Len of tiled dataset without limit bounds: 756
Len of tiled dataset with limit bounds: 756
============================
   Slide: TCGA-A2-A1FZ-01Z-00-DX1.0BAAEF41-DCA4-4677-9A27-09E990033FA6
-------
  Using backend: ImageBackend.OPENSLIDE
Slide bounds: ((0, 0), (114910, 72168))
Full slide size: (114910, 72168)
Len of tiled dataset without limit bounds: 522
Len of tiled dataset with limit bounds: 522
-------
  Using backend: ImageBackend.PYVIPS
Slide bounds: ((0, 0), (114910, 72168))
Full slide size: (114910, 72168)
Len of tiled dataset without limit bounds: 522
Len of tiled dataset with limit bounds: 522
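
The tile counts for the .mrxs slides in the log can be cross-checked by hand. The sketch below rests on two assumptions that are not stated in this thread: a native resolution of 0.2425 µm/px, a value chosen only because it is consistent with the reported counts, and a tiling that keeps partial tiles at the border (a ceiling per axis). `count_tiles` is a hypothetical helper, not part of dlup.

```python
import math


def count_tiles(region_size, native_mpp, target_mpp, tile_size):
    """Count tiles on a regular grid that keeps partial tiles at the border."""
    scaling = native_mpp / target_mpp  # level-0 pixels -> pixels at target_mpp
    count = 1
    for extent, tile in zip(region_size, tile_size):
        count *= math.ceil(extent * scaling / tile)
    return count


NATIVE_MPP = 0.2425  # assumption: back-fitted to the log, not reported in the thread

# First .mrxs slide in the log: full slide versus the size part of slide_bounds.
full = count_tiles((271950, 294038), NATIVE_MPP, 100, (10, 10))
bounded = count_tiles((99871, 199322), NATIVE_MPP, 100, (10, 10))
print(full, bounded)
```

Under those assumptions the full slide gives 66 × 72 = 4752 tiles and the bounded region 25 × 49 = 1225, matching the first slide in the log. The same formula applied to the other bounds sizes gives 1152 and 825.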

YoniSchirris
YoniSchirris previously approved these changes Mar 15, 2023
Contributor

@YoniSchirris YoniSchirris left a comment

With the tests shown above I believe that the current implementation works as expected.

@jonasteuwen
Contributor Author

Thanks for the test. I don't think we can add it as such in dlup as it requires access to data. Ideally we do this with a 'mock'.

@jonasteuwen jonasteuwen merged commit 8d9b603 into main Mar 20, 2023
@jonasteuwen jonasteuwen deleted the limit-bounds branch March 20, 2023 13:34