-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental implementation of caching mechanism #66
Conversation
- Add Writer, Add tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I think there's a lot going on (a bit too much). So I would suggest you to split the PR and add add the changes for the type aliases or other unrelated features to the caching mechanism.
The code is an interesting proposition. I like it but I maybe would change the user-experience. Mostly moving the burden of creating another SlideImage object into embedding this system inside the current SlideImage, so that is opaque. I would do that by passing a possibly "default" global ImageCacher object. We could even merge the openslide object caching with this so we centralize the cachine features but we keep the granularity at the slide level.
I have a few questions for you. When the cache is created in the filesystem who owns it? is it going to be automatically deleted? Can you pre-generate a cache and instruct the SlideImage to point at it? Maybe each of these could be configuration options for the ImageCacher.
What do you think?
.gitmodules
Outdated
[submodule "openslide-python"] | ||
path = openslide-python | ||
url = https://github.com/NKI-AI/openslide-python.git | ||
# [submodule "openslide"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright.
dlup/__init__.py
Outdated
from ._image import SlideImage | ||
|
||
|
||
from ._image import CachedSlideImage, SlideImage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably include the caching as an optional feature of SlideImage. No need to create another class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was to see how to do it. Will do in next version.
@@ -0,0 +1,219 @@ | |||
# coding=utf-8 | |||
# Copyright (c) dlup contributors | |||
import abc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistently group and order the imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imports feel a bit broken
dlup/_cache.py
Outdated
) -> PIL.Image.Image: | ||
"""...""" | ||
|
||
# @abc.abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's commented remove it
dlup/_cache.py
Outdated
) | ||
|
||
|
||
class AbstractImageCache(abc.ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some docs documenting the intent of the abstract interface. I like the name, but maybe it's bit too generic. If I understood correctly the purpose maybe something like ScaleLevelCache?
from dlup.utils.imports import PYVIPS_AVAILABLE | ||
from dlup.utils.types import GenericFloatArray, GenericIntArray, GenericNumber, PathLike | ||
|
||
if PYVIPS_AVAILABLE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move away from vips and do our own thing, less dependencies and tailored behavior for now it's fine, but I would keep libvips as a full-fledged dependency. For now we really need it for caching purposes.
AbstractSlide = Union[openslide.AbstractSlide] | ||
|
||
|
||
class SlideReaderBackend(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it being used anywhere, maybe delete it?
|
||
|
||
class _SlideImageRegionView(RegionView): | ||
"""Represents an image view tied to a slide image.""" | ||
|
||
def __init__(self, wsi: _TSlideImage, scaling: _GenericNumber, boundary_mode: BoundaryMode = None): | ||
def __init__(self, wsi: _TSlideImage, scaling: GenericNumber, boundary_mode: BoundaryMode = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes should stay on a different PR
"""Returns the objective power at which the WSI was sampled.""" | ||
return int(self._openslide_wsi.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER]) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a test to reflect this behavior so we remain consistent.
dlup/_image.py
Outdated
# ImageCacher.__init__(self, original_filename=wsi._filename) | ||
# print(ImageCacher) | ||
self._cache_directory = None | ||
self._cacher = ImageCacher(original_filename=wsi._filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the ImageCacher, I think we should add that to the SlideImage so a user can configure if and how the slide can be cached. It makes it easier for testing and also how granularly you can setup a filesystem cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't look at everything, the annotations part seems relatively tidy but I would prefer to review it on it on a separate occasion. Regarding the cache implementation I think it works but I would make the user-interface much simpler. Somehow I would completely remove the access of any caching-logic from the user and give it only the minimal amount of necessary knobs. This is to minimize confusion and misuse. I would just keep everything behind SlideImage and add options to that.
Maybe someone else could give feedback on how they would like to approach this issue. Do you have any suggestions of what you would like to see?
return slide_image.read_region(location, 1.0, size) | ||
|
||
@property | ||
def cache_lock(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not needed no need to add it.
"""Cache lock, is not needed for Tiffs (they cannot be written on the fly).""" | ||
return None | ||
|
||
def get_cache_for_mpp(self, mpp: float) -> SlideImage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retrieveg a cache object based on the mpp seems to be a brittle implementation due to floating point representation. What if the mpp is 0.3213231, what if it's 0.3213232?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add either an absolute error threshold or define a power of 2 mpp values.
For instance, we could support caching for mpps 1.0, 0.5, 0.25, etc.. or even make it more dense.
image.close() | ||
|
||
|
||
def create_tiff_cache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could encapsulate create_tiff_cache
and get_cache_for_mpp
behavior inside some other function. I don't see why a user should know about these two functions. Ideally a user should create a caching object, initialize it from a path of pre-stored caching tiffs, and that's it.
writer.from_iterator(_local_iterator(), filename, total=len(grid)) | ||
|
||
|
||
def image_cache(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea to use a decorator. The issue is that i don't see it reused very often. I would definitely would embed all this logic just inside SlideImage.read_region().
comment = self._openslide_wsi.properties.get(openslide.PROPERTY_NAME_COMMENT, None) | ||
mpp_x, mpp_y = _read_dlup_wsi_mpp(comment) | ||
# If it is still none you can raise. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove space and comment?
@@ -0,0 +1,194 @@ | |||
# coding=utf-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put in another separate PR?
"""Base writer class""" | ||
|
||
|
||
class TiffImageWriter(ImageWriter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels a bit of re-implementing the underlying pyvips interface but without adding anything to it. Is this extra layer necessary?
from dlup.data.dataset import TiledROIsSlideImageDataset | ||
from dlup.tiling import Grid, TilingMode | ||
|
||
# def test_dataset_equality(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code
@pytest.mark.parametrize("regions", [(0, 0), (512, 512)]) | ||
def test_cache_correctness(self, regions): | ||
|
||
INPUT_FILE_PATH = "/processing/j.teuwen/TCGA-5T-A9QA-01Z-00-DX1.B4212117-E0A7-4EF2-B324-8396042ACEC1.svs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hardcoded paths?
Implemented a preliminary version of a caching algorithm.
Currently you need to do
dlup wsi downsample <>
with the same settings as what you want to downsample with. This will not be the final implementation, but I would like to hear opinions on the implementation.