Tiffslide much slower than openslide reading patches from SVS with JPEG2000 compression #72
Hi @kaczmarj

Happy to hear that tiffslide is useful to you!

Your benchmark is not testing a really useful scenario. When you run timeit with the same region, you hit openslide's and tiffslide's internal cache after the first call, and in this scenario you're effectively measuring (on the tiffslide side) how long PIL takes to convert a numpy array. Benchmarking this stuff is not really simple, since you have to be aware of the internal caches of your tools, and also of other non-obvious caches, like your operating system caching disk access, etc.

As mentioned in the readme, I recommend running the benchmark below, which tries to access multiple different tiles per file, to simulate a more realistic use case. You can easily modify the files used to run the benchmark by changing the `FILES` dict. I'd be interested to see your results on the tcga files!

Cheers,
|
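For illustration, a benchmark that avoids repeated-region cache hits could look roughly like the sketch below (this is not the repository's actual benchmark script; the file name, patch size, and patch count are arbitrary):

```python
# time read_region over many distinct locations so neither tiffslide's nor
# openslide's internal cache can serve a repeated request
import random
import time

import openslide
import tiffslide

path = "CMU-2.svs"   # any local slide; illustrative
size = (512, 512)

def bench(slide, locations):
    t0 = time.perf_counter()
    for loc in locations:
        slide.read_region(loc, 0, size)
    return (time.perf_counter() - t0) / len(locations)

ts = tiffslide.TiffSlide(path)
osl = openslide.OpenSlide(path)
w, h = ts.dimensions

rng = random.Random(0)
locations = [(rng.randrange(0, w - size[0]), rng.randrange(0, h - size[1])) for _ in range(50)]

# note: whichever library runs second still benefits from the OS page cache
print("tiffslide:", bench(ts, locations), "s per patch")
print("openslide:", bench(osl, locations), "s per patch")
```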
let me add the TCGA file to the benchmark and test. thanks for the quick reply @ap-- ! |
hi @ap-- I added an SVS file from TCGA to the pytests and generated the plots. i am seeing a ~4x increase in runtime for tiffslide vs openslide. it's interesting that this does not happen for CMU-2.svs... do you have any thoughts on why this could be? i can test other SVS slides from TCGA as well if you think that would be useful. my only hypothesis at this point is that this is related to the image size: the tcga svs is 1.6 gb whereas the CMU svs is 542 mb. here is the `FILES` dict i used:

```python
FILES = {
    "svs": "Aperio/CMU-2.svs",
    "generic": "Generic-TIFF/CMU-1.tiff",
    "tcga-svs": "TCGA-SVS/TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs",
}
```
|
i tested two different tcga slides of different sizes, but it seems that openslide is much faster than tiffslide for both of these images. my hypothesis that image size is related to the speed does not seem to be correct.
by the way, i am on a debian 12 linux system with python 3.10.12 and glibc version 2.36.
|
Is there a difference in the compression used by these files? |
yes, there is a difference in compression. i used `tiffinfo` to inspect both files. here are the tiff details for CMU-2.svs and TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs (output omitted here):

```sh
tiffinfo TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
tiffinfo CMU-2.svs
```
in the TCGA SVS, TIFF directory 1 uses JPEG compression. perhaps by forcing a read from directory 1 we can test whether difference in compression is the culprit. if we read from directory 1 and tiffslide is still slower than openslide, there could be something in addition to compression differences. but if the speed matches/exceeds openslide, then the compression is the cause. but directory 1 of the TCGA SVS only has size 1024x568 WxH. perhaps that's the thumbnail. it does not come up as an image level in openslide or tiffslide. |
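For reference, the per-directory compression can also be checked directly from Python with tifffile's public API. This is a sketch added for illustration (the file name is the TCGA slide discussed above; `page.shape` and `page.compression` are standard tifffile attributes):

```python
from tifffile import TiffFile

# print shape and compression scheme for every TIFF directory (IFD) in the file
with TiffFile("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs") as tif:
    for i, page in enumerate(tif.pages):
        print(i, page.shape, page.compression)
```

For the TCGA file, directory 0 should report an Aperio JPEG 2000 compression scheme while directory 1 reports JPEG, matching the tiffinfo output mentioned above.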
Hmm, my tests indicate both images seem to store uncompressed tiles...

```python
# pip install pado
# pip install aiohttp requests s3fs
import json
from pprint import pprint

from pado.images.ids import ImageId
from pado.images.providers import ImageProvider
from pado.io.files import urlpathlike_to_fsspec
from tiffslide import TiffSlide
import matplotlib.pyplot as plt

ip = ImageProvider.from_parquet(
    "zip:///tcga.image.parquet::https://github.com/ap--/pado-tcga/releases/download/v0.0.1/pado-tcga-dataset.zip"
)

image_ids = [
    ImageId(
        '2aa283f3-732c-4879-8d37-1fec3ccf5bdc',
        'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs',
        site='tcga',
    ),
    ImageId(
        'd46167af-6c29-49c7-95cf-3a801181aca4',
        'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
        site='tcga',
    ),
]

for iid in image_ids:
    img = ip[iid]
    of = urlpathlike_to_fsspec(img.urlpath)
    # check via tiffslide
    ts = TiffSlide(of)
    print(iid)
    pprint(json.loads(ts.zarr_group.store["0/.zarray"]))
    fig = plt.figure()
    w, h = ts.dimensions
    plt.imshow(ts.read_region((w//2, h//2), 0, (1000, 1000), as_array=True))
    plt.show()
```

output:

```
ImageId('2aa283f3-732c-4879-8d37-1fec3ccf5bdc', 'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}
ImageId('d46167af-6c29-49c7-95cf-3a801181aca4', 'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [74432, 101184, 3],
 'zarr_format': 2}
```

If that turns out to be true, it would mean that there's just too much Python overhead in reading uncompressed tiles from disk via zarr. We'd need some profiling to be sure about that, and a potential solution would be to see if we can shortcut for local uncompressed files. I have a test implementation of a memory-mapped zarr store for local files lying around somewhere. I'll try to find it. Will report back in the coming days.

Cheers,
|
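As a side note on the "shortcut for local uncompressed files" idea mentioned above, its rough shape could look like the sketch below: for a tile that really is stored uncompressed, its bytes can be sliced straight out of a memory map of the file instead of going through zarr. Everything here is illustrative (a hypothetical local file, 8-bit samples, no predictor), not tiffslide's actual implementation.

```python
import numpy as np
from tifffile import TiffFile

path = "example_uncompressed_tiled.tiff"  # hypothetical local, uncompressed, tiled TIFF

# collect the layout of the first tile of directory 0
with TiffFile(path) as tif:
    page = tif.pages[0]
    assert page.compression == 1, "sketch only applies to uncompressed tiles"
    th, tw = page.tilelength, page.tilewidth
    spp = page.samplesperpixel
    offset = page.dataoffsets[0]       # byte offset of the first tile
    nbytes = page.databytecounts[0]    # stored size of the first tile

# slice the raw tile directly out of a memory map of the file
mm = np.memmap(path, dtype=np.uint8, mode="r")
tile = mm[offset:offset + nbytes].reshape(th, tw, spp)
print(tile.shape, tile.dtype)
```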
thanks @ap-- that's very helpful. i also see that tiffslide reports no compression:

code:

```python
import json, tiffslide

tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
json.loads(tslide.zarr_group.store["0/.zarray"])
```

output:

```
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}
```

but `tiffinfo` reports JPEG 2000 compression for this file.
|
That's because tifffile's Zarr store decodes the TIFF tiles itself when a chunk is accessed, so the `.zarray` metadata does not advertise a compressor even though the tiles are JPEG 2000 compressed on disk.
On my aging Windows system, the difference is much less: I suspect the difference could be due to differences in JPEG2000 decoders. For example, imagecodecs does not enable OpenJPEG multi-threading by default. I'll check if that's significant... I am surprised that tiffslide/tifffile/zarr perform competitively. There are many, many layers of pure Python code... |
It turns out that enabling multi-threading makes things significantly worse :( |
Maybe some basic profiling could help discern whether the time spent on other things is dominant, or whether this is really a case of differences between how imagecodecs and openslide wrap OpenJPEG to decode JP2K. |
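One rough way to isolate the decoder is sketched below: pull the raw bytes of a single tile out of the file with tifffile and decode them repeatedly with imagecodecs, bypassing tiffslide and zarr entirely. The file name and tile index are illustrative, and this is an assumption-laden sketch rather than an established benchmark.

```python
import time

import imagecodecs
from tifffile import TiffFile

path = "TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs"  # local copy assumed

# read the raw (still compressed) bytes of one tile from directory 0
with TiffFile(path) as tif:
    page = tif.pages[0]
    fh = tif.filehandle
    fh.seek(page.dataoffsets[100])           # arbitrary tile index
    data = fh.read(page.databytecounts[100])

# time repeated decodes of that tile with imagecodecs' OpenJPEG wrapper
n = 50
t0 = time.perf_counter()
for _ in range(n):
    tile = imagecodecs.jpeg2k_decode(data)
dt = (time.perf_counter() - t0) / n
print(f"{tile.shape}: {dt * 1e3:.2f} ms per tile")
```

If your imagecodecs version exposes a `numthreads` argument on `jpeg2k_decode`, the same loop can be used to test the multi-threading point above.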
i am also surprised and impressed that this implementation performs competitively! i realize my words might have unintentionally come across as negative or offensive towards tiffslide/tifffile and i want to be clear that i do not imply any negativity here. i hold tremendous respect for tiffslide and tifffile (and all of your work @cgohlke !).
that is unfortunate 😢
i ran python's cProfile on the same `read_region` call for both libraries.

tiffslide

code:

```python
import cProfile, pstats, tiffslide

tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()
```

output:

openslide

code:

```python
import cProfile, pstats, openslide

oslide = openslide.OpenSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    oslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()
```

output:
|
Oh no. I did not understand it like that. I am interested in learning about such issues.
That's good to know. The tiles are relatively small (256x256) for JPEG 2000. Compared to an all-C implementation such as openslide, decoding a single tile might incur overhead from 1. calling the C function from Python, 2. creating a new instance of the OpenJPEG decoder in every call, 3. releasing the GIL, and 4. creating and copying image data into a numpy array. I'll try to enable Cython profiling https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html and see... |
None of these seem significant in this case. Almost all the time is spent in OpenJPEG's |
when i profiled openslide, i rebuilt it from source with a print statement added to its JPEG 2000 decoder, to count how many tiles get decoded per `read_region` call:

```sh
git clone https://github.com/openslide/openslide
cd openslide
git checkout v3.4.1

# add print statement to line 59
sed -i '59i printf("Running unpack_argb\\n");' src/openslide-decode-jp2k.c

# build openslide
autoreconf -i
./configure
make
```

after building openslide, i copied the resulting library into place so that openslide-python would load it, and ran:

```python
import openslide

oslide = openslide.OpenSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")

oslide.read_region((0, 0), 0, (128, 128))
# prints:
# Running unpack_argb

oslide.read_region((14_000, 12_000), 0, (512, 512))
# prints:
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
```

interestingly, if i profile tiffslide on the same regions, `jpeg2k_decode` is called twice as many times (2 vs 1 for the 128x128 read, 18 vs 9 for the 512x512 read):

```python
import cProfile, pstats, tiffslide

tslide = tiffslide.TiffSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")

with cProfile.Profile() as pr:
    tslide.read_region(location=(0, 0), level=0, size=(128, 128))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()
#   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#        2    0.005    0.002    0.005    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]

with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()
#   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#       18    0.041    0.002    0.041    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]
```
|
Good catch. This code requests the following keys from the Zarr store:

```python
from tifffile import imread

im = imread(
    'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
    selection=(slice(14_000, 14_512), slice(12_000, 12_512)),
)
```
|
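To make the duplicate requests visible without digging into zarr internals, something like the instrumentation below can count how often each store key is fetched. This is a rough, hypothetical sketch; it assumes chunk access goes through `zarr.storage.KVStore`, which is what the `__contains__` patch further down in this thread suggests.

```python
import collections

import tiffslide
from zarr.storage import KVStore

# count every key fetched through KVStore.__getitem__
counts = collections.Counter()
_original_getitem = KVStore.__getitem__

def _counting_getitem(self, key):
    counts[key] += 1
    return _original_getitem(self, key)

KVStore.__getitem__ = _counting_getitem
try:
    ts = tiffslide.TiffSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")
    ts.read_region((14_000, 12_000), 0, (512, 512))
finally:
    KVStore.__getitem__ = _original_getitem  # undo the patch

# without the __contains__ fix, each chunk key tends to show up twice, because
# the default `key in store` check falls back to __getitem__ before the real read
print(counts.most_common(10))
```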
wow that's fantastic! thanks @cgohlke. should i open an issue in the zarr-python github repo? |
I'm on it. |
Ha! That's great 😃 I guess I'll have to update the benchmarks in the readme once a new version of zarr is released 😄 Thanks everyone! |
The Zarr issue accounts for a ~2x difference. Where does the other 2x come from? I don't see that on Windows, where the OS cache is not reset. Could also be a difference in how OpenJPEG is compiled. What versions of tifffile and imagecodecs were used and how were they installed? |
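A quick way to answer the version question is to print the versions from the environment under test; a trivial sketch:

```python
# report the versions of the packages involved in the benchmark
import imagecodecs, tifffile, tiffslide, zarr

for mod in (tiffslide, tifffile, imagecodecs, zarr):
    print(mod.__name__, mod.__version__)
```

How they were installed (pip wheel vs conda-forge) still has to be stated separately, since the version string alone does not reveal the build.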
i was probably mistaken earlier when i said 4x, though i still do see that tiffslide is a bit slower than openslide when installed via pip. when installed via conda, tiffslide is faster!

i tested installations via pip and via conda/mamba. i include the versions of the packages in each environment below. in both cases, i patched zarr's `KVStore` as follows:

```python
from zarr.storage import KVStore

def _zarr_kvstore___contains__(self, key):
    return key in self._mutable_mapping

KVStore.__contains__ = _zarr_kvstore___contains__
```

i also used test data from openslide test data and TCGA:

pip install

code:

```sh
sudo apt install libopenslide0  # installs libopenslide0/stable,now 3.4.1+dfsg-6+b1 amd64
git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
~/mambaforge/bin/python3.10 -m venv venv
source ./venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .[dev] matplotlib pandas openslide-python pytest-benchmark
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py
```

on my debian bookworm machine, libopenslide is linked to libopenjp2.so.7 (pulled in as a dependency from https://packages.debian.org/bookworm/libopenjp2-7).

Output of pip list:

results:

conda (mamba) install

code:

```sh
git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
mamba env create -f environment.devenv.yml  # from tiffslide's repo
mamba activate tiffslide
mamba install openslide openslide-python matplotlib pandas
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py
```

Output of mamba list:

results:
|
the difference comes down to imagecodecs from conda-forge vs imagecodecs from pypi. using the one from pypi, tiffslide is slower than openslide on the TCGA SVS file i am debugging with. in my previous test, the conda/mamba environment had the best speeds for tiffslide. in that conda environment, i pip installed imagecodecs and re-ran the benchmark.

mamba/conda environment with imagecodecs from conda-forge:

mamba/conda environment with imagecodecs from pypi:
|
aha! the culprit is the different libopenjp2 shared library that ends up in ~/mambaforge/envs/tiffslide/lib/python3.11/site-packages/imagecodecs.libs/. i essentially overwrote the previous version, which had a different file name. i am not sure how openjpeg is pulled into the imagecodecs wheel during a build, but i presume openjpeg is being built differently than the conda-forge version. though looking at https://github.com/conda-forge/openjpeg-feedstock/blob/main/recipe/build.sh, there don't seem to be any special build options enabled for the conda-forge version. |
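For anyone reproducing this, one way to confirm which libopenjp2 a given Python environment actually loads is sketched below (Linux only, added for illustration; `imagecodecs._jpeg2k` is the extension module that shows up in the profiles above):

```python
# list the libopenjp2 shared objects mapped into this process after
# loading imagecodecs' JPEG 2000 codec (reads /proc/self/maps, Linux only)
import imagecodecs._jpeg2k  # noqa: F401  (forces the codec extension and its libopenjp2 to load)

with open("/proc/self/maps") as f:
    libs = sorted({line.split()[-1] for line in f if "openjp2" in line})

for lib in libs:
    print(lib)
```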
building openjpeg with `-DCMAKE_BUILD_TYPE=Release` should fix this. the change should be made in these lines: https://github.com/Czaki/imagecodecs_build/blob/c7abf4b7c91746c30a754e5d3367f6347262e049/build_utils/build_libraries.sh#L361-L364

when openjpeg is not compiled in release mode, it looks like ffast-math is not enabled (see here).
|
This commit adds the option `-DCMAKE_BUILD_TYPE=Release` to the openjpeg build. Without this, ffast-math is not enabled, which results in slower performance. Related to the discussion in Bayer-Group/tiffslide#72 (comment), where we found performance differences between tiffslide and openslide. Dziekuje (thank you) :)
enabling |
Never mind the failures. That repository is out of sync. I build the libraries locally in Docker these days and then build&test the wheels on Azure/GHA... |
New version of tiffslide with a fix is on its way to pypi, and then later today to conda. Thanks again everyone for the fun debugging session 😃 |
Thank you for finding this. Would you mind trying again with imagecodecs 2023.7.10? |
it works! here are the benchmark results on my machine with the most recent tiffslide (8bea5a4). what a triumph!!! |
Thanks again for reporting and investigating ❤️ |
hello, thanks for developing this fantastic package! i am working on porting one of my projects from openslide to tiffslide (very easy thanks to the mirrored API 😄). however, i found that tiffslide is much slower than openslide when reading patches from an SVS file in The Cancer Genome Atlas (TCGA).
i created a jupyter notebook to benchmark this here https://gist.github.com/kaczmarj/41c351be6f52aa6a553cc12ba98a9103. this notebook runs a simple benchmarking function on a TCGA BRCA slide and a TIFF and SVS file from openslide test data.
using the slide TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs (from https://portal.gdc.cancer.gov/files/d46167af-6c29-49c7-95cf-3a801181aca4), i got the following results. tiffslide takes >10x longer to read patches than openslide.
i did not see the same behavior when evaluating CMU-1.tiff and CMU-1.svs from openslide test data, so i don't suspect disk caching to be the culprit.