Experimental implementation of caching mechanism #66

jonasteuwen · 2022-01-18T20:07:48Z

Implemented a preliminary version of a caching algorithm.

Currently you need to run dlup wsi downsample <> with the same settings that you want to downsample with. This will not be the final implementation, but I would like to hear opinions on it.

- Add Writer, Add tests

lromor

Thanks for the PR. I think there's a lot going on (a bit too much), so I would suggest splitting the PR and moving the type-alias changes and other features unrelated to the caching mechanism into separate PRs.

The code is an interesting proposition. I like it, but I would maybe change the user experience, mostly by moving the burden of creating another SlideImage object into the current SlideImage itself, so that the caching is opaque to the user. I would do that by passing a possibly "default" global ImageCacher object. We could even merge the openslide object caching with this, so that we centralize the caching features but keep the granularity at the slide level.

I have a few questions for you. When the cache is created in the filesystem, who owns it? Is it going to be deleted automatically? Can you pre-generate a cache and instruct the SlideImage to point at it? Maybe each of these could be a configuration option for the ImageCacher.
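To make the three questions concrete, here is a minimal sketch of how they could map onto configuration options. Everything in it is hypothetical: `ImageCacherConfig`, its fields, and this stand-in `ImageCacher` are illustrative names and are not the classes in this PR.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ImageCacherConfig:
    """Hypothetical knobs, one per question above."""

    # None: the cacher creates (and therefore owns) a temporary directory.
    cache_dir: Optional[Path] = None
    # Only honoured for directories the cacher created itself.
    delete_on_close: bool = True
    # True: point at a pre-generated cache and never create anything.
    read_only: bool = False


class ImageCacher:
    """Illustrative stand-in, not the ImageCacher from this PR."""

    def __init__(self, config: ImageCacherConfig) -> None:
        self.config = config
        self.owns_directory = config.cache_dir is None
        if self.owns_directory:
            if config.read_only:
                raise ValueError("A read-only cache needs an existing cache_dir.")
            self.cache_dir = Path(tempfile.mkdtemp(prefix="dlup-cache-"))
        else:
            self.cache_dir = Path(config.cache_dir)
            if config.read_only and not self.cache_dir.is_dir():
                raise FileNotFoundError(f"No pre-generated cache at {self.cache_dir}")
            self.cache_dir.mkdir(parents=True, exist_ok=True)

    def close(self) -> None:
        # A directory supplied by the user is never deleted by the cacher.
        if self.owns_directory and self.config.delete_on_close:
            shutil.rmtree(self.cache_dir, ignore_errors=True)
```

With this split, ownership follows from who created the directory: a cacher that made its own temporary directory cleans it up, and one pointed at an existing path leaves it alone.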

What do you think?

lromor · 2022-01-20T13:13:45Z

.gitmodules

-[submodule "openslide-python"]
-	path = openslide-python
-	url = https://github.com/NKI-AI/openslide-python.git
+# [submodule "openslide"]


Maybe just remove it?

lromor · 2022-01-20T13:14:40Z

dlup/__init__.py

-from ._image import SlideImage
+
+
+from ._image import CachedSlideImage, SlideImage


I would probably include the caching as an optional feature of SlideImage. No need to create another class.

Yes, this was to see how to do it. Will do in the next version.

lromor · 2022-01-20T13:15:11Z

dlup/_cache.py

@@ -0,0 +1,219 @@
+# coding=utf-8
+# Copyright (c) dlup contributors
+import abc


consistently group and order the imports

The imports feel a bit broken

lromor · 2022-01-20T13:16:53Z

dlup/_cache.py

+    ) -> PIL.Image.Image:
+        """..."""
+
+    # @abc.abstractmethod


If it's commented remove it

lromor · 2022-01-20T13:18:25Z

dlup/_cache.py

+        )
+
+
+class AbstractImageCache(abc.ABC):


Add some docs documenting the intent of the abstract interface. I like the name, but maybe it's a bit too generic. If I understood the purpose correctly, maybe something like ScaleLevelCache?

lromor · 2022-01-20T13:49:43Z

dlup/_image.py

+from dlup.utils.imports import PYVIPS_AVAILABLE
+from dlup.utils.types import GenericFloatArray, GenericIntArray, GenericNumber, PathLike
+
+if PYVIPS_AVAILABLE:


Maybe we should eventually move away from vips and do our own thing: fewer dependencies and tailored behavior. For now it's fine, but I would keep libvips as a full-fledged dependency, since we really need it for caching purposes.

lromor · 2022-01-20T13:51:34Z

dlup/_image.py

+AbstractSlide = Union[openslide.AbstractSlide]
+
+
+class SlideReaderBackend(Enum):


I don't see it being used anywhere, maybe delete it?

lromor · 2022-01-20T13:52:25Z

dlup/_image.py



 class _SlideImageRegionView(RegionView):
    """Represents an image view tied to a slide image."""

-    def __init__(self, wsi: _TSlideImage, scaling: _GenericNumber, boundary_mode: BoundaryMode = None):
+    def __init__(self, wsi: _TSlideImage, scaling: GenericNumber, boundary_mode: BoundaryMode = None):


These changes should go in a different PR.

lromor · 2022-01-20T13:54:07Z

dlup/_image.py

        """Returns the objective power at which the WSI was sampled."""
-        return int(self._openslide_wsi.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER])
+        try:


Maybe add a test to reflect this behavior so we remain consistent.

lromor · 2022-01-20T13:56:10Z

dlup/_image.py

+        # ImageCacher.__init__(self, original_filename=wsi._filename)
+        # print(ImageCacher)
+        self._cache_directory = None
+        self._cacher = ImageCacher(original_filename=wsi._filename)


I like the idea of the ImageCacher. I think we should add it to the SlideImage so that a user can configure if and how the slide is cached. That makes testing easier and also lets you decide how granularly to set up a filesystem cache.

lromor

I couldn't look at everything. The annotations part seems relatively tidy, but I would prefer to review it on a separate occasion. Regarding the cache implementation, I think it works, but I would make the user interface much simpler. I would remove access to any caching logic from the user entirely and expose only the minimal set of necessary knobs, to minimize confusion and misuse. I would just keep everything behind SlideImage and add options to that.

Maybe someone else could give feedback on how they would like to approach this issue. Do you have any suggestions of what you would like to see?

lromor · 2022-01-25T16:16:11Z

dlup/_cache.py

+        return slide_image.read_region(location, 1.0, size)
+
+    @property
+    def cache_lock(self):


If it's not needed, there is no need to add it.

lromor · 2022-01-25T16:17:56Z

dlup/_cache.py

+        """Cache lock, is not needed for Tiffs (they cannot be written on the fly)."""
+        return None
+
+    def get_cache_for_mpp(self, mpp: float) -> SlideImage:


Retrieving a cache object based on the mpp seems brittle because of floating-point representation. What if the mpp is 0.3213231? What if it's 0.3213232?

Maybe we could either add an absolute error threshold or restrict the cache to mpp values that are powers of 2.
For instance, we could support caching for mpps 1.0, 0.5, 0.25, etc., or even make the grid denser.
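Both options can be sketched in a few lines. The function names and the default tolerance of 1e-4 below are assumptions for illustration; neither function exists in dlup.

```python
import math
from typing import Optional, Sequence


def find_cached_mpp(
    requested: float, available: Sequence[float], abs_tol: float = 1e-4
) -> Optional[float]:
    """Return the first cached mpp within abs_tol of the request, else None."""
    for candidate in available:
        if math.isclose(requested, candidate, rel_tol=0.0, abs_tol=abs_tol):
            return candidate
    return None


def snap_mpp_to_power_of_two(mpp: float) -> float:
    """Round an mpp to the nearest power of two, measured in log2 space."""
    if mpp <= 0:
        raise ValueError("mpp must be positive")
    return 2.0 ** round(math.log2(mpp))


# Two values that differ only in the last digit resolve to the same cache entry.
match = find_cached_mpp(0.3213231, [0.3213232, 0.5])
# A request near 0.32 falls on the 0.25 level of a power-of-two grid.
level = snap_mpp_to_power_of_two(0.3213231)
```

The threshold keeps arbitrary mpp values but needs a tolerance to be chosen; the power-of-two grid needs no tolerance but limits which resolutions can be cached.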

lromor · 2022-01-25T16:25:55Z

dlup/_cache.py

+            image.close()
+
+
+def create_tiff_cache(


Maybe we could encapsulate create_tiff_cache and get_cache_for_mpp behavior inside some other function. I don't see why a user should know about these two functions. Ideally a user should create a caching object, initialize it from a path of pre-stored caching tiffs, and that's it.
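As a rough sketch of that single entry point: the class below is initialized from a directory of pre-stored TIFFs and hides the lookup. The class name, its methods, and the `<name>_mpp-<value>.tiff` filename convention are all invented for this example; the PR does not define them.

```python
import re
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical naming convention: "<anything>_mpp-<value>.tif" or ".tiff".
_MPP_PATTERN = re.compile(r"_mpp-(?P<mpp>\d+(?:\.\d+)?)\.tiff?$")


class TiffCacheSketch:
    """Illustrative cache object that maps an mpp to a pre-stored TIFF path."""

    def __init__(self, levels: Dict[float, Path]) -> None:
        self._levels = dict(levels)

    @classmethod
    def from_directory(cls, directory: Union[str, Path]) -> "TiffCacheSketch":
        levels: Dict[float, Path] = {}
        for path in sorted(Path(directory).iterdir()):
            found = _MPP_PATTERN.search(path.name)
            if found:  # files that do not follow the convention are ignored
                levels[float(found.group("mpp"))] = path
        return cls(levels)

    @property
    def available_mpps(self) -> List[float]:
        return sorted(self._levels)

    def path_for(self, mpp: float) -> Path:
        # Exact lookup; a real version would want a tolerance-based match.
        return self._levels[mpp]
```

A user would only call `TiffCacheSketch.from_directory(...)` and hand the result to the slide object; creating the TIFFs and picking a level would stay internal.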

lromor · 2022-01-25T16:29:52Z

dlup/_cache.py

+    writer.from_iterator(_local_iterator(), filename, total=len(grid))
+
+
+def image_cache(func):


Interesting idea to use a decorator. The issue is that I don't see it being reused very often. I would definitely embed all of this logic inside SlideImage.read_region().
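Roughly what I have in mind, as a sketch only: `SlideImageSketch` is a stand-in and not the real SlideImage, the `reader` callable replaces the openslide call, and a plain dictionary stands in for the TIFF-backed cache of this PR.

```python
from typing import Callable, Dict, Optional, Tuple

# (location, scaling, size) identifies a region request in this sketch.
RegionKey = Tuple[Tuple[int, int], float, Tuple[int, int]]
Reader = Callable[[Tuple[int, int], float, Tuple[int, int]], bytes]


class SlideImageSketch:
    """Stand-in for SlideImage with the cache lookup inside read_region."""

    def __init__(self, reader: Reader, cache: Optional[Dict[RegionKey, bytes]] = None) -> None:
        self._reader = reader
        self._cache = cache  # None disables caching entirely

    def read_region(
        self, location: Tuple[int, int], scaling: float, size: Tuple[int, int]
    ) -> bytes:
        if self._cache is None:
            return self._reader(location, scaling, size)
        key = (tuple(location), float(scaling), tuple(size))
        if key not in self._cache:
            # Cache miss: read once from the slide and remember the result.
            self._cache[key] = self._reader(location, scaling, size)
        return self._cache[key]
```

The caller's code is identical with and without a cache, which is the point: caching becomes an option of the slide object instead of a separate decorator or class.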

lromor · 2022-01-25T16:36:08Z

dlup/_image.py

+                comment = self._openslide_wsi.properties.get(openslide.PROPERTY_NAME_COMMENT, None)
+                mpp_x, mpp_y = _read_dlup_wsi_mpp(comment)
+                # If it is still none you can raise.
+


Remove space and comment?

lromor · 2022-01-25T16:38:55Z

dlup/data/_annotations.py

@@ -0,0 +1,194 @@
+# coding=utf-8


Can you put this in a separate PR?

lromor · 2022-01-25T16:44:06Z

dlup/writers.py

+    """Base writer class"""
+
+
+class TiffImageWriter(ImageWriter):


This feels a bit like re-implementing the underlying pyvips interface without adding anything to it. Is this extra layer necessary?

lromor · 2022-01-25T16:44:39Z

tests/test_cache.py

+from dlup.data.dataset import TiledROIsSlideImageDataset
+from dlup.tiling import Grid, TilingMode
+
+# def test_dataset_equality():


Remove commented code

lromor · 2022-01-25T16:45:39Z

tests/test_cache.py

+    @pytest.mark.parametrize("regions", [(0, 0), (512, 512)])
+    def test_cache_correctness(self, regions):
+
+        INPUT_FILE_PATH = "/processing/j.teuwen/TCGA-5T-A9QA-01Z-00-DX1.B4212117-E0A7-4EF2-B324-8396042ACEC1.svs"


hardcoded paths?
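One way to avoid the hardcoded path, sketched under the assumption that the slide location comes from an environment variable. The variable name `DLUP_TEST_SLIDE` and the helper are made up for this example.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_test_slide(env_var: str = "DLUP_TEST_SLIDE") -> Optional[Path]:
    """Return the slide path named by env_var, or None if unset or missing."""
    value = os.environ.get(env_var)
    if not value:
        return None
    path = Path(value)
    return path if path.is_file() else None
```

The test could then call `pytest.skip(...)` when this returns None, so the suite still passes on machines that do not have the slide.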

jonasteuwen added 19 commits January 3, 2022 22:25

Experimental tiff writer.

2bb932c

Experimental tiff writer.

5cbca39

Remove modules for now

98582f4

Allow to read mpp_x from tiff header

f42d81b

Attempt to fix the transforms

26452b9

Check if generic tiff

834baff

Still need to handle overflows

f19c111

Introduce TiffWriter

034f776

- Add Writer, Add tests

Update writer test

bdd6fd7

Update writer test

50357e6

Add different tilingmode to code

81e04b9

Add ability to not use any tiling function

5d9ac87

Add vips tests and several fixes

0c835bd

Added writers

874d668

Add experimental downsample tool

1ed4092

Converted to pyramid

0b75fab

Preliminary version of CachedImage

2447deb

Preliminary version of CachedImage

ecefc5d

Output size is not required anymore

b413cc4

jonasteuwen requested review from lromor and YoniSchirris January 18, 2022 20:07

github-actions bot added the python label Jan 18, 2022

jonasteuwen added 4 commits January 19, 2022 17:26

fix path

14fde33

Add reading of mpps in different ways

2f6c5fd

Docstring

6db53a4

Update type

f32fb7c

lromor reviewed Jan 20, 2022

lromor marked this pull request as draft January 20, 2022 14:02

jonasteuwen added 2 commits January 21, 2022 09:28

Preliminary attempt to add annotation class

bd9985a

Updated to annotations and cache

0f8e0cb

jonasteuwen added 7 commits January 21, 2022 21:36

Cleanup

e1fa85a

Cleanup

ef33f77

Cleanup cache

b8b27cd

CLeanup

4e780e7

Remove xmltodict parser.

a197455

Add docstring

21ef5f7

Add docstring

2a5584a

lromor requested changes Jan 25, 2022

jonasteuwen closed this Oct 17, 2023
