Tiffslide much slower than openslide reading patches from SVS with JPEG2000 compression #72

Closed
kaczmarj opened this issue Jul 7, 2023 · 31 comments
Labels
performance 🐌 Gotta go fast

Comments

@kaczmarj
Contributor

kaczmarj commented Jul 7, 2023

hello, thanks for developing this fantastic package! i am working on porting one of my projects from openslide to tiffslide (very easy thanks to mirrored API 😄). however i found that tiffslide is much slower than openslide when reading patches from an SVS file in The Cancer Genome Atlas (TCGA).

i created a jupyter notebook to benchmark this here https://gist.github.com/kaczmarj/41c351be6f52aa6a553cc12ba98a9103. this notebook runs a simple benchmarking function on a TCGA BRCA slide and a TIFF and SVS file from openslide test data.

using the slide TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs (from https://portal.gdc.cancer.gov/files/d46167af-6c29-49c7-95cf-3a801181aca4), i got the following results. tiffslide takes >10x longer to read patches than openslide.

i did not see the same behavior when evaluating CMU-1.tiff and CMU-1.svs from openslide test data, so i don't suspect disk caching to be the culprit.

Openslide -- get thumbnail
711 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tiffslide -- get thumbnail
2.27 s ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Openslide -- read region at level 0
1.89 ms ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Tiffslide -- read region at level 0
77.5 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Openslide -- read region at level 2
6.93 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Tiffslide -- read region at level 2
73.5 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
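
For reference, the slowdown factors implied by these timings can be worked out directly. This is only arithmetic on the means reported above; the dictionary and function names are illustrative and not part of either library.

```python
# Mean timings copied from the results above, converted to milliseconds:
# (openslide, tiffslide)
timings_ms = {
    "get thumbnail": (711.0, 2270.0),
    "read region at level 0": (1.89, 77.5),
    "read region at level 2": (6.93, 73.5),
}


def slowdown(openslide_ms, tiffslide_ms):
    """Return how many times longer tiffslide took than openslide."""
    return tiffslide_ms / openslide_ms


ratios = {name: round(slowdown(o, t), 1) for name, (o, t) in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: tiffslide took {ratio}x as long as openslide")
```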
@ap--
Collaborator

ap-- commented Jul 7, 2023

Hi @kaczmarj

Happy to hear that tiffslide is useful to you!

Your benchmark is not testing a really useful scenario. When you run timeit with the same region, you hit openslide's and tiffslide's internal caches after the first call. In this scenario, you're effectively measuring (on the tiffslide side) how long PIL takes to convert a numpy array.

Benchmarking this stuff is not really simple, since you have to be aware of internal caches of your tools, and also of other non-obvious caches, like your operating system caching disk access, etc.

As mentioned in the readme, I recommend running the benchmark below, which tries to test accessing multiple different tiles on files, to simulate a more realistic use case.

OPENSLIDE_TESTDATA_DIR=/path/to/testdata/ python docs/generate_benchmark_plots.py

you can easily modify the files used to run the benchmark by changing:
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/63c86e9d4f168072bb75784e720d0d0acdacee0f/tiffslide/tests/test_benchmark.py#L15-L21
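
As a rough illustration of the caching point above, here is a minimal sketch of timing reads at many different locations instead of repeating one region. It is not the code in test_benchmark.py; `random_locations` and `mean_read_time` are hypothetical helpers, and the only assumption about the slide object is a `read_region(location, level, size)` method as used elsewhere in this thread.

```python
import random
import time


def random_locations(width, height, tile, n, seed=0):
    """Pick n pseudo-random top-left corners so that a tile fits in the image."""
    rng = random.Random(seed)
    return [
        (rng.randint(0, width - tile), rng.randint(0, height - tile))
        for _ in range(n)
    ]


def mean_read_time(read_region, locations, level=0, size=(256, 256)):
    """Return the mean wall-clock seconds per call over all locations."""
    start = time.perf_counter()
    for location in locations:
        read_region(location, level, size)
    return (time.perf_counter() - start) / len(locations)
```

With a real slide this could be called as `mean_read_time(slide.read_region, random_locations(*slide.dimensions, 256, 100))`. It does not control for the operating system's disk cache, so results still need careful interpretation.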

I'd be interested to see your results on the tcga files!

Cheers,
Andreas 😃

@kaczmarj
Contributor Author

kaczmarj commented Jul 7, 2023

let me add the TCGA file to the benchmark and test. thanks for the quick reply @ap-- !

@kaczmarj
Contributor Author

kaczmarj commented Jul 7, 2023

hi @ap-- I added an SVS file from TCGA to the pytests and generated the plots. i am seeing a 4x in runtime for tiffslide vs openslide. it's interesting that this does not happen for CMU-2.svs... do you have any thoughts on why this could be? i can test other SVS slides from TCGA as well if you think that would be useful.

my only hypothesis at this point is that this is related to the image size. the tcga svs is 1.6 gb whereas the CMU SVS is 542 mb.

in test_benchmark.py, i set the FILES dictionary to

FILES = {
    "svs": "Aperio/CMU-2.svs",
    "generic": "Generic-TIFF/CMU-1.tiff",
    "tcga-svs": "TCGA-SVS/TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs",
}

[benchmark plots: benchmark_read_tiles_as_numpy, benchmark_read_tiles_as_pil]

@kaczmarj
Contributor Author

kaczmarj commented Jul 7, 2023

i tested two different tcga slides of different sizes but it seems that openslide is much faster than tifffile for both of these images. my hypothesis of image size being related to the speed does not seem to be correct.

  • 1.6 GB -- TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs
  • 183 MB -- TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs

by the way, i am on a debian 12 linux system with python 3.10.12 and glibc version 2.36.

$ uname -a
Linux dash 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

[benchmark plots: benchmark_read_tiles_as_numpy, benchmark_read_tiles_as_pil]

@sdvillal
Collaborator

sdvillal commented Jul 8, 2023

Is there a difference in the compression used by these files?

@kaczmarj
Contributor Author

kaczmarj commented Jul 8, 2023

yes there is a difference in compression. i used tiffinfo (from libtiff) to get this info. CMU-2.svs uses JPEG compression whereas the TCGA svs file uses compression scheme 33005 (which apparently is a specific type of JPEG 2000). OpenSlide has some notes about this compression scheme (from https://openslide.org/formats/aperio/):

JPEG 2000 (compression types 33003 or 33005)

Some Aperio files use compression type 33003 or 33005. Images using this compression need to be decoded as a JPEG 2000 codestream. For 33003: YCbCr format, possibly with a chroma subsampling of 4:2:2. For 33005: RGB format. Note that the TIFF file may not encode the colorspace or subsampling parameters in the PhotometricInterpretation field, nor the YCbCrSubsampling field, even though the TIFF standard seems to require this. The correct subsampling can be found in the JPEG 2000 codestream.
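
For readers without tiffinfo at hand, the compression scheme of each TIFF directory can also be read with a few lines of standard-library Python. This is a minimal sketch that assumes a classic (non-BigTIFF) file; `compression_per_directory` and `COMPRESSION_NAMES` are hypothetical names, and the table lists only the codes that appear in this thread.

```python
import struct

# Compression tag (259) values mentioned in this thread.
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW",
    7: "JPEG",
    33003: "Aperio JPEG 2000 YCbCr",
    33005: "Aperio JPEG 2000 RGB",
}


def compression_per_directory(fh):
    """Return the Compression tag value of every IFD in a classic TIFF."""
    fh.seek(0)
    header = fh.read(8)
    order = {b"II": "<", b"MM": ">"}[header[:2]]  # byte order mark
    magic, offset = struct.unpack(order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    result = []
    while offset:
        fh.seek(offset)
        (count,) = struct.unpack(order + "H", fh.read(2))
        compression = 1  # TIFF default when the tag is absent
        for _ in range(count):
            tag, _type, _n, value = struct.unpack(order + "HHI4s", fh.read(12))
            if tag == 259:
                # A single SHORT is stored in the first two value bytes.
                (compression,) = struct.unpack(order + "H", value[:2])
        (offset,) = struct.unpack(order + "I", fh.read(4))
        result.append(compression)
    return result
```

Opening a slide with `open(path, "rb")` and mapping the returned codes through `COMPRESSION_NAMES.get(code, code)` should give the same per-directory picture as the tiffinfo output.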

here are the tiff details for CMU-2.svs and TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs.

tiffinfo TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
=== TIFF directory 0 ===
TIFF Directory at offset 0xaa0a9ec (178301420)
  Subfile Type: (0 = 0x0)
  Image Width: 48384 Image Length: 26880 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37
48384x26880 (256x256) J2K/KDU Q=70;Mirax Digital Slide|AppMag = 20|MPP = 0.23250

=== TIFF directory 1 ===
TIFF Directory at offset 0xaa5e634 (178644532)
  Subfile Type: (0 = 0x0)
  Image Width: 1024 Image Length: 568 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37
48384x26880 -> 1024x568 - ;Mirax Digital Slide|AppMag = 20|MPP = 0.23250
  JPEG Tables: (289 bytes)

=== TIFF directory 2 ===
TIFF Directory at offset 0xb596216 (190407190)
  Subfile Type: (0 = 0x0)
  Image Width: 12096 Image Length: 6720 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37
48384x26880 (256x256) -> 12096x6720 J2K/KDU Q=70

=== TIFF directory 3 ===
TIFF Directory at offset 0xb669062 (191271010)
  Subfile Type: (0 = 0x0)
  Image Width: 3024 Image Length: 1680 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37
48384x26880 (256x256) -> 3024x1680 J2K/KDU Q=70
tiffinfo CMU-2.svs
=== TIFF directory 0 ===
TIFF Directory at offset 0x14548b52 (341085010)
  Subfile Type: (0 = 0x0)
  Image Width: 78000 Image Length: 30462 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
79560x30562 [0,100 78000x30462] (256x256) JPEG/RGB Q=30|AppMag = 20|StripeWidth = 2040|ScanScope ID = CPAPERIOCS|Filename = CMU-2|Date = 12/29/09|Time = 10:02:42|User = b414003d-95c6-48b0-9369-8010ed517ba7|Parmset = USM Filter|MPP = 0.4990|Left = 27.409658|Top = 20.522137|LineCameraSkew = -0.000424|LineAreaXOffset = 0.019265|LineAreaYOffset = -0.000313|Focus Offset = 0.000000|ImageID = 1004487|OriginalWidth = 79560|Originalheight = 30562|Filtered = 5|ICC Profile = ScanScope v1
  ICC Profile: <present>, 141992 bytes
  JPEG Tables: (289 bytes)

=== TIFF directory 1 ===
TIFF Directory at offset 0x145cfce2 (341638370)
  Subfile Type: (0 = 0x0)
  Image Width: 1024 Image Length: 399 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
78000x30462 -> 1024x399 - |AppMag = 20|StripeWidth = 2040|ScanScope ID = CPAPERIOCS|Filename = CMU-2|Date = 12/29/09|Time = 10:02:42|User = b414003d-95c6-48b0-9369-8010ed517ba7|Parmset = USM Filter|MPP = 0.4990|Left = 27.409658|Top = 20.522137|LineCameraSkew = -0.000424|LineAreaXOffset = 0.019265|LineAreaYOffset = -0.000313|Focus Offset = 0.000000|ImageID = 1004487|OriginalWidth = 79560|Originalheight = 30562|Filtered = 5|ICC Profile = ScanScope v1
  JPEG Tables: (289 bytes)

=== TIFF directory 2 ===
TIFF Directory at offset 0x16f1c454 (384943188)
  Subfile Type: (0 = 0x0)
  Image Width: 19500 Image Length: 7615 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
79560x30562 [0,100 78000x30462] (256x256) -> 19500x7615 JPEG/RGB Q=65
  JPEG Tables: (289 bytes)

=== TIFF directory 3 ===
TIFF Directory at offset 0x172dfb2e (388889390)
  Subfile Type: (0 = 0x0)
  Image Width: 4875 Image Length: 1903 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
79560x30562 [0,100 78000x30462] (256x256) -> 4875x1903 JPEG/RGB Q=82
  JPEG Tables: (289 bytes)

=== TIFF directory 4 ===
TIFF Directory at offset 0x17431686 (390272646)
  Subfile Type: (0 = 0x0)
  Image Width: 2437 Image Length: 951 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
79560x30562 [0,100 78000x30462] (256x256) -> 2437x951 JPEG/RGB Q=91
  JPEG Tables: (289 bytes)

=== TIFF directory 5 ===
TIFF Directory at offset 0x1748db6c (390650732)
  Subfile Type: reduced-resolution image (1 = 0x1)
  Image Width: 387 Image Length: 463 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 7
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
label 387x463
  Predictor: horizontal differencing 2 (0x2)

=== TIFF directory 6 ===
TIFF Directory at offset 0x174a5ec4 (390749892)
  Subfile Type: reduced-resolution image (9 = 0x9)
  Image Width: 1280 Image Length: 431 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51
macro 1280x431
  JPEG Tables: (289 bytes)

in the TCGA SVS, TIFF directory 1 uses JPEG compression. perhaps by forcing a read from directory 1 we can test whether the difference in compression is the culprit. if we read from directory 1 and tiffslide is still slower than openslide, there could be something in addition to compression differences. but if the speed matches/exceeds openslide, then the compression is the cause.

but directory 1 of the TCGA SVS only has size 1024x568 WxH. perhaps that's the thumbnail. it does not come up as an image level in openslide or tiffslide.

@ap--
Collaborator

ap-- commented Jul 8, 2023

Hmm, my tests indicate both images seem to store uncompressed tiles...

# pip install pado
# pip install aiohttp requests s3fs

import json
from pprint import pprint

from pado.images.ids import ImageId
from pado.images.providers import ImageProvider
from pado.io.files import urlpathlike_to_fsspec
from tiffslide import TiffSlide

import matplotlib.pyplot as plt


ip = ImageProvider.from_parquet(
    "zip:///tcga.image.parquet::https://github.com/ap--/pado-tcga/releases/download/v0.0.1/pado-tcga-dataset.zip"
)

image_ids = [
    ImageId(
        '2aa283f3-732c-4879-8d37-1fec3ccf5bdc',
        'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs',
        site='tcga',
    ),
    ImageId(
        'd46167af-6c29-49c7-95cf-3a801181aca4',
        'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
        site='tcga',
    ),
]

for iid in image_ids:

    img = ip[iid]
    of = urlpathlike_to_fsspec(img.urlpath)

    # check via tiffslide
    ts = TiffSlide(of)
    print(iid)
    pprint(json.loads(ts.zarr_group.store["0/.zarray"]))

    fig = plt.figure()
    w, h = ts.dimensions
    plt.imshow(ts.read_region((w//2, h//2), 0, (1000, 1000), as_array=True))

plt.show()

output:

ImageId('2aa283f3-732c-4879-8d37-1fec3ccf5bdc', 'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}
ImageId('d46167af-6c29-49c7-95cf-3a801181aca4', 'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [74432, 101184, 3],
 'zarr_format': 2}

[figures: f1, f2]

If that turns out to be true, it would mean that there's just too much python overhead in reading uncompressed tiles from disk via zarr. We'd need some profiling to be sure about that, and a potential solution would be to see whether we can take a shortcut for local uncompressed files. I have a test implementation of a memory mapped zarr store for local files lying around somewhere. I'll try to find it. Will report back in the coming days.

Cheers,
Andreas 😃

@kaczmarj
Contributor Author

kaczmarj commented Jul 8, 2023

thanks @ap-- that's very helpful. i also see that tiffslide reports no compression:

code:

import json, tiffslide
tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
json.loads(tslide.zarr_group.store["0/.zarray"])

output:

{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}

but exiftool also shows that JPEG2000 compression is used.

$ git clone https://github.com/exiftool/exiftool.git
$ cd exiftool
$ ./exiftool ../TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
ExifTool Version Number         : 12.64
File Name                       : TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
Directory                       : ..
File Size                       : 191 MB
File Modification Date/Time     : 2023:07:08 09:34:00-04:00
File Access Date/Time           : 2023:07:08 09:37:25-04:00
File Inode Change Date/Time     : 2023:07:08 09:37:08-04:00
File Permissions                : -rw-r--r--
File Type                       : TIFF
File Type Extension             : tif
MIME Type                       : image/tiff
Exif Byte Order                 : Little-endian (Intel, II)
Image Width                     : 48384
Image Height                    : 26880
Bits Per Sample                 : 8 8 8
Compression                     : Aperio JPEG 2000 RGB
Photometric Interpretation      : RGB
Image Description               : Aperio Image Library v11.0.37..48384x26880 (256x256) J2K/KDU Q=70;Mirax Digital Slide|AppMag = 20|MPP = 0.23250
Samples Per Pixel               : 3
Planar Configuration            : Chunky
Strip Offsets                   : (Binary data 359 bytes, use -b option to extract)
Rows Per Strip                  : 16
Strip Byte Counts               : (Binary data 173 bytes, use -b option to extract)
JPEG Tables                     : (Binary data 289 bytes, use -b option to extract)
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Subfile Type                    : Full-resolution image
Tile Width                      : 256
Tile Length                     : 256
Tile Offsets                    : (Binary data 839 bytes, use -b option to extract)
Tile Byte Counts                : (Binary data 464 bytes, use -b option to extract)
Image Depth                     : 1
Page Count                      : 4
Image Size                      : 48384x26880
Megapixels                      : 1300.6

@cgohlke

cgohlke commented Jul 8, 2023

i also see that tiffslide reports no compression

That's because tifffile.ZarrTiffStore is just a thin wrapper around a tifffile.TiffFile instance. The store transparently handles all the file access, decompression, predictors, unpacking, padding, etc. Zarr/numcodecs would not be able to handle all the cases found in TIFF.

it seems that openslide is much faster than tifffile

On my aging Windows system, the difference is much less:

[benchmark plot: benchmark_read_tiles_as_numpy]

I suspect the difference could be due to differences in JPEG2000 decoders. For example, imagecodecs does not enable OpenJPEG multi-threading by default. I'll check if that's significant...

I am surprised that tiffslide/tifffile/zarr perform competitively. There are many, many layers of pure Python code...

@cgohlke

cgohlke commented Jul 8, 2023

imagecodecs does not enable OpenJPEG multi-threading by default. I'll check if that's significant...

It turns out that enabling multi-threading makes things significantly worse :(

@sdvillal
Collaborator

sdvillal commented Jul 9, 2023

Maybe some basic profiling could help discerning if the time spent on other things is dominant or if this is really a case of differences between how imagecodecs and openslide wrap around OpenJPEG to decode JP2K.

@kaczmarj
Contributor Author

kaczmarj commented Jul 9, 2023

I am surprised that tiffslide/tifffile/zarr perform competitively. There are many, many layers of pure Python code...

i am also surprised and impressed that this implementation performs competitively!

i realize my words might have unintentionally come across as negative or offensive towards tiffslide/tifffile and i want to be clear that i do not imply any negativity here. i hold tremendous respect for tiffslide and tifffile (and all of your work @cgohlke !).

It turns out that enabling multi-threading makes things significantly worse :(

that is unfortunate 😢

Maybe some basic profiling could help

i ran python's cProfile on the read_region method in tiffslide and openslide. this doesn't capture the C bits in openslide unfortunately (and i don't know how to do that). when profiling TiffSlide.read_region, most of the time was spent in the function imagecodecs._jpeg2k.jpeg2k_decode. the results are below. i truncated the tiffslide profiling results to ~25 function calls. i also replaced the path to my python installation with 'path/to' to make the lines shorter.

Tiffslide

code:

import cProfile, pstats, tiffslide
tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

output:

         6732 function calls (6304 primitive calls) in 0.044 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18    0.039    0.002    0.039    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
   152/12    0.001    0.000    0.001    0.000 {built-in method _abc._abc_subclasscheck}
        4    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:11745(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method _imp.create_dynamic}
      2/1    0.000    0.000    0.001    0.001 {built-in method _imp.exec_dynamic}
       18    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:12944(_indices)
        9    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/zarr/core.py:1862(_process_chunk)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.fill}
        3    0.000    0.000    0.001    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:7770(__init__)
       43    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:10631(fromfile)
       19    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:12897(_parse_key)
       18    0.000    0.000    0.040    0.002 path/to/python3.11/site-packages/tifffile/tifffile.py:12836(_getitem)
       45    0.000    0.000    0.000    0.000 {method 'read' of '_io.BufferedReader' objects}
  301/230    0.000    0.000    0.000    0.000 path/to/python3.11/json/encoder.py:334(_iterencode_dict)
     24/6    0.000    0.000    0.004    0.001 path/to/python3.11/functools.py:981(__get__)
        1    0.000    0.000    0.000    0.000 {method 'decode' of 'ImagingDecoder' objects}
        1    0.000    0.000    0.040    0.040 path/to/python3.11/site-packages/zarr/core.py:1257(_get_selection)
      748    0.000    0.000    0.001    0.000 {built-in method builtins.isinstance}
      149    0.000    0.000    0.000    0.000 {built-in method _struct.unpack}
        8    0.000    0.000    0.000    0.000 path/to/python3.11/enum.py:241(__set_name__)
       43    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:10793(_process_value)
       18    0.000    0.000    0.039    0.002 path/to/python3.11/site-packages/tifffile/tifffile.py:8574(decode_image)
        1    0.000    0.000    0.001    0.001 path/to/python3.11/site-packages/tifffile/tifffile.py:12332(__init__)
  135/127    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.001    0.001 path/to/python3.11/site-packages/tifffile/tifffile.py:7288(_load)
      230    0.000    0.000    0.000    0.000 path/to/python3.11/json/encoder.py:414(_iterencode)
        5    0.000    0.000    0.000    0.000 path/to/python3.11/typing.py:1896(_get_protocol_attrs)

[truncated]

Openslide

code:

import cProfile, pstats, openslide
oslide = openslide.OpenSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    oslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

output:

         30 function calls in 0.026 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.026    0.026    0.026    0.026 path/to/python3.11/site-packages/openslide/lowlevel.py:300(read_region)
        1    0.000    0.000    0.000    0.000 {built-in method openslide._convert.argb2rgba}
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:186(_load_image)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:222(_check_error)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.fill}
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:505(_new)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2955(frombuffer)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2878(new)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2857(_check_size)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:129(from_param)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.map_buffer}
        3    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:481(__init__)
        1    0.000    0.000    0.026    0.026 path/to/python3.11/site-packages/openslide/__init__.py:226(read_region)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/cProfile.py:118(__exit__)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:214(_check_string)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
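
The ncalls column can also be extracted programmatically rather than read off the printed table. The sketch below relies on the `stats` dictionary of `pstats.Stats`, whose keys are `(filename, lineno, function name)` tuples; `call_counts` is a hypothetical helper, not something provided by tiffslide or openslide.

```python
import cProfile
import pstats


def call_counts(profile, name_fragment):
    """Map function name -> total number of calls for matching functions."""
    stats = pstats.Stats(profile)
    return {
        func: ncalls
        for (_filename, _lineno, func), (_prim, ncalls, _tt, _ct, _callers)
        in stats.stats.items()
        if name_fragment in func
    }
```

Passing the `pr` object from the snippets above together with a fragment such as `"jpeg2k_decode"` should report how often the decoder was entered during a single read_region call.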

@cgohlke

cgohlke commented Jul 9, 2023

i realize my words might have unintentionally come across as negative or offensive towards tiffslide/tifffile

Oh no. I did not understand it like that. I am interested in learning about such issues.

most of the time was spent in the function imagecodecs._jpeg2k.jpeg2k_decode

That's good to know. The tiles are relatively small (256x256) for JPEG 2000. Compared to an implementation in all C, such as openslide, for decoding a single tile there might be overheads from 1. calling the C function from Python, 2. creating a new instance of the OpenJPEG decoder in every call, 3. releasing the GIL, and 4. creating and copying image data into a numpy array. I'll try to enable Cython profiling https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html and see...

@cgohlke

cgohlke commented Jul 9, 2023

there might be overheads from 1. calling the C function from Python, 2. creating a new instance of the OpenJPEG decoder in every call, 3. releasing the GIL, and 4. creating and copying image data into a numpy array.

None of these seem significant in this case. Almost all the time is spent in OpenJPEG's opj_decode function. I rebuilt OpenJPEG with AVX2 extensions, but that made no difference on my system either :(

@kaczmarj
Contributor Author

imagecodecs._jpeg2k.jpeg2k_decode is run twice as many times as openslide's jp2k decoder and this could potentially explain the longer runtime.

when i profiled tiffslide.TiffSlide.read_region, i noticed that imagecodecs._jpeg2k.jpeg2k_decode was being called multiple times. this makes sense as it's decoding multiple tiles. i sought to measure the number of times openslide's jpeg2k decoder was run. to do this, i cloned openslide and added a print statement to line 59 of openslide-decode-jp2k.c. it seems that that function is run with every call to the openjpeg decoder.

git clone https://github.com/openslide/openslide
cd openslide
git checkout v3.4.1
# add print statement to line 59
sed -i '59i   printf("Running unpack_argb\\n");' src/openslide-decode-jp2k.c
# build openslide
autoreconf -i
./configure
make

after building openslide, i copied the resulting library libopenslide.so.0.4.1 into my conda environment containing tiffslide and openslide (replacing the original openslide downloaded from conda-forge).

import openslide
oslide = openslide.OpenSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")
oslide.read_region((0, 0), 0, (128, 128))
# prints:
# Running unpack_argb

oslide.read_region((14_000, 12_000), 0, (512, 512))
# prints:
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb

interestingly, if i profile TiffSlide.read_region to count the number of times imagecodecs._jpeg2k.jpeg2k_decode is called, then it is 2 in the first case and 18 in the second case. openslide called openjpeg decoder 1 time and 9 times for the same regions.

import cProfile, pstats, tiffslide

tslide = tiffslide.TiffSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")

with cProfile.Profile() as pr:
    tslide.read_region(location=(0, 0), level=0, size=(128, 128))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         2    0.005    0.002    0.005    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]

with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#        18    0.041    0.002    0.041    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]
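
The expected number of decoded tiles follows from the 256x256 tile grid reported by tiffinfo. As a small worked example (with `tiles_touched` being a hypothetical helper), the regions used above overlap 1 and 9 tiles respectively, which matches the openslide counts and is half of what tiffslide triggered.

```python
def tiles_touched(location, size, tile=256):
    """List the (row, column) indices of tiles overlapped by a region.

    location is the top-left (x, y) corner and size is (width, height),
    both in pixels at the level being read.
    """
    x, y = location
    w, h = size
    cols = range(x // tile, (x + w - 1) // tile + 1)
    rows = range(y // tile, (y + h - 1) // tile + 1)
    return [(r, c) for r in rows for c in cols]


print(len(tiles_touched((0, 0), (128, 128))))            # 1
print(len(tiles_touched((14_000, 12_000), (512, 512))))  # 9
```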

@cgohlke

cgohlke commented Jul 10, 2023

Good catch. This code requests the following keys from the Zarr store:

from tifffile import imread

im = imread(
    'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
    selection=(slice(14_000, 14_512), slice(12_000, 12_512)),
)
0/54.46.0
0/54.46.0
0/54.47.0
0/54.47.0
0/54.48.0
0/54.48.0
0/55.46.0
0/55.46.0
0/55.47.0
0/55.47.0
0/55.48.0
0/55.48.0
0/56.46.0
0/56.46.0
0/56.47.0
0/56.47.0
0/56.48.0
0/56.48.0

@cgohlke

cgohlke commented Jul 10, 2023

The issue is that Zarr's KVStore, which is used to wrap ZarrTiffStore, does not define a __contains__ method, so a membership test (key in store) is routed through __getitem__, which triggers decoding...

I think it's a bug in Zarr that is easy to fix.
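
The mechanism can be reproduced without Zarr. The sketch below uses toy classes (not Zarr's or tifffile's actual code) and assumes the membership test falls back to the `collections.abc.Mapping` mixin, whose default `__contains__` simply tries `self[key]`. A reader that checks `key in store` before fetching then pays for two decodes per chunk.

```python
from collections.abc import Mapping


class DecodingStore(Mapping):
    """Toy store that 'decodes' a chunk on every __getitem__ call."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.decode_calls = 0

    def __getitem__(self, key):
        raw = self._chunks[key]  # raises KeyError for missing keys
        self.decode_calls += 1   # stands in for an expensive decode
        return raw.upper()

    def __iter__(self):
        return iter(self._chunks)

    def __len__(self):
        return len(self._chunks)


class CheapContainsStore(DecodingStore):
    """Same store, but membership is answered without decoding."""

    def __contains__(self, key):
        return key in self._chunks


def read_chunk(store, key):
    """Mimic a reader that checks membership before fetching."""
    if key in store:
        return store[key]
    return None
```

Calling `read_chunk` on a `DecodingStore` increments `decode_calls` twice for one chunk, while `CheapContainsStore` increments it once, mirroring the doubled key requests listed above.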

With the fix I get this:

[benchmark plot: benchmark_read_tiles_as_numpy]

@kaczmarj
Contributor Author

wow that's fantastic! thanks @cgohlke. should i open an issue in the zarr-python github repo?

@cgohlke

cgohlke commented Jul 10, 2023

should i open an issue in the zarr-python github repo?

I'm on it.

@ap--
Collaborator

ap-- commented Jul 10, 2023

Ha! That's great 😃 I guess I'll have to update the benchmarks in the readme once a new version of zarr is released 😄

Thanks everyone!

@ap-- ap-- added the performance 🐌 Gotta go fast label Jul 10, 2023
@cgohlke

cgohlke commented Jul 10, 2023

i am seeing a 4x in runtime for tiffslide vs openslide

The Zarr issue accounts for a ~2x difference. Where does the other 2x come from? I don't see that on Windows, where the OS cache is not reset. Could also be a difference in how OpenJPEG is compiled. What versions of tifffile and imagecodecs were used and how were they installed?

@kaczmarj
Contributor Author

Where does the other 2x come from?

i was probably mistaken earlier when i said 4x, though i still do see that tiffslide is a bit slower than openslide when installed via pip. when installed via conda, tiffslide is faster!

What versions of tifffile and imagecodecs were used and how were they installed?

i tested installations via pip and via conda/mamba, and i include the package versions for each environment below. in both environments tifffile==2023.7.4, but the pip environment has imagecodecs==2023.7.4 whereas the conda environment has imagecodecs==2023.1.23 (i could not install a newer version). i will re-run this using the same versions in all environments and will update.

i patched zarr.KVStore in tiffslide's __init__.py file as follows:

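from zarr.storage import KVStore

def _zarr_kvstore___contains__(self, key):
    # check membership on the wrapped mapping directly instead of
    # falling back to __getitem__, which would decode the tile
    return key in self._mutable_mapping

KVStore.__contains__ = _zarr_kvstore___contains__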

i also used images from the openslide test data and from TCGA:

images/
├── Aperio
│   └── CMU-2.svs
└── TCGA-SVS
    └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs

pip install

code:

sudo apt install libopenslide0  # installs libopenslide0/stable,now 3.4.1+dfsg-6+b1 amd64
git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
~/mambaforge/bin/python3.10 -m venv venv
source ./venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .[dev] matplotlib pandas openslide-python pytest-benchmark
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py

on my debian bookworm machine, libopenslide is linked to libopenjp2.so.7 (pulled as a dependency from https://packages.debian.org/bookworm/libopenjp2-7).

Output of pip list
Package           Version                        Editable project location
----------------- ------------------------------ -------------------------
asciitree         0.3.3
black             23.3.0
cfgv              3.3.1
click             8.1.4
contourpy         1.1.0
coverage          7.2.7
cycler            0.11.0
distlib           0.3.6
entrypoints       0.4
exceptiongroup    1.1.2
fasteners         0.18
filelock          3.12.2
fonttools         4.40.0
fsspec            2023.6.0
identify          2.5.24
imagecodecs       2023.7.4
iniconfig         2.0.0
kiwisolver        1.4.4
matplotlib        3.7.2
mypy              1.4.1
mypy-extensions   1.0.0
nodeenv           1.8.0
numcodecs         0.11.0
numpy             1.25.1
openslide-python  1.2.0
packaging         23.1
pandas            2.0.3
pathspec          0.11.1
Pillow            10.0.0
pip               23.1.2
platformdirs      3.8.1
pluggy            1.2.0
pre-commit        3.3.3
py-cpuinfo        9.0.0
pyparsing         3.0.9
pytest            7.4.0
pytest-benchmark  4.0.0
pytest-cov        4.1.0
python-dateutil   2.8.2
pytz              2023.3
PyYAML            6.0
setuptools        68.0.0
six               1.16.0
tifffile          2023.7.4
tiffslide         2.1.2.post0+g63c86e9.d20230710 /tmp/build/tiffslide
tomli             2.0.1
typing_extensions 4.7.1
tzdata            2023.3
virtualenv        20.23.1
wheel             0.40.0
zarr              2.15.0

results:

benchmark_read_tiles_as_numpy

conda (mamba) install

code:

git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
mamba env create -f environment.devenv.yml  # from tiffslide's repo
mamba activate tiffslide
mamba install openslide openslide-python matplotlib pandas
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py
Output of mamba list
# packages in environment at /home/jakubk/mambaforge/envs/tiffslide:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alsa-lib                  1.2.9                hd590300_0    conda-forge
aom                       3.5.0                h27087fc_0    conda-forge
asciitree                 0.3.3                      py_2    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
black                     23.3.0          py311h38be061_1    conda-forge
blosc                     1.21.4               h0f2a231_0    conda-forge
brotli                    1.0.9                h166bdaf_9    conda-forge
brotli-bin                1.0.9                h166bdaf_9    conda-forge
brunsli                   0.1                  h9c3ff4c_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
c-blosc2                  2.10.0               hb4ffafa_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py311h409f033_3    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
cfitsio                   4.2.0                hd9d235c_0    conda-forge
charls                    2.4.2                h59595ed_0    conda-forge
click                     8.1.4           unix_pyh707e725_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.0           py311h9547e67_0    conda-forge
coverage                  7.2.7           py311h459d7ec_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dav1d                     1.2.1                hd590300_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
distlib                   0.3.6              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.1.2              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
fasteners                 0.17.3             pyhd8ed1ab_0    conda-forge
filelock                  3.12.2             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.40.0          py311h459d7ec_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
fsspec                    2023.6.0           pyh1a96a4e_0    conda-forge
gdk-pixbuf                2.42.10              h6b639ba_2    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
glib                      2.76.4               hfc55251_0    conda-forge
glib-tools                2.76.4               hfc55251_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gst-plugins-base          1.22.4               hf7dbed1_1    conda-forge
gstreamer                 1.22.4               h98fc4e7_1    conda-forge
harfbuzz                  7.3.0                hdb3a94d_0    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
identify                  2.5.24             pyhd8ed1ab_0    conda-forge
imagecodecs               2023.1.23       py311hd374d05_2    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
jxrlib                    1.1                  h7f98852_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py311h4dd048b_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libavif                   0.11.1               h8182462_2    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_9    conda-forge
libbrotlidec              1.0.9                h166bdaf_9    conda-forge
libbrotlienc              1.0.9                h166bdaf_9    conda-forge
libcap                    2.67                 he9d0100_0    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libclang                  15.0.7          default_h7634d5b_2    conda-forge
libclang13                15.0.7          default_h9986a30_2    conda-forge
libcups                   2.3.3                h36d4200_3    conda-forge
libcurl                   8.1.2                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgcrypt                 1.10.1               h166bdaf_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.76.4               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgpg-error              1.47                 h71f35ed_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libllvm15                 15.0.7               h5cf9203_2    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.3                 hbcd7760_1    conda-forge
libsndfile                1.2.0                hb75c966_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libsystemd0               253                  h8c4010b_1    conda-forge
libtiff                   4.5.1                h8b53f26_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.3.1                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxkbcommon              1.5.0                h5d7e998_3    conda-forge
libxml2                   2.11.4               h0d562d8_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
matplotlib                3.7.2           py311h38be061_0    conda-forge
matplotlib-base           3.7.2           py311h54ef318_0    conda-forge
mpg123                    1.31.3               hcb278e6_0    conda-forge
msgpack-python            1.0.5           py311ha3edf6b_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mypy                      1.4.1           py311h459d7ec_0    conda-forge
mypy_extensions           1.0.0              pyha770c72_0    conda-forge
mysql-common              8.0.33               hf1915f5_1    conda-forge
mysql-libs                8.0.33               hca2cd23_1    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
nodeenv                   1.8.0              pyhd8ed1ab_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.89                 he45b914_0    conda-forge
numcodecs                 0.11.0          py311hcafe171_1    conda-forge
numpy                     1.25.1          py311h64a7726_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openslide                 3.4.1                ha896ae7_9    conda-forge
openslide-python          1.2.0           py311hd4cff14_2    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.3           py311h320fe9a_1    conda-forge
pathspec                  0.11.1             pyhd8ed1ab_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    10.0.0          py311h0b84326_0    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.8.1              pyhd8ed1ab_0    conda-forge
pluggy                    1.2.0              pyhd8ed1ab_0    conda-forge
ply                       3.11                       py_1    conda-forge
pre-commit                3.3.3              pyha770c72_0    conda-forge
psutil                    5.9.5           py311h2582759_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pulseaudio-client         16.1                 hb77b528_4    conda-forge
py-cpuinfo                9.0.0              pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.7          py311ha74522f_3    conda-forge
pyqt5-sip                 12.11.0         py311hcafe171_3    conda-forge
pytest                    7.4.0              pyhd8ed1ab_0    conda-forge
pytest-benchmark          4.0.0              pyhd8ed1ab_0    conda-forge
pytest-cov                4.1.0              pyhd8ed1ab_0    conda-forge
python                    3.11.4          hab00c5b_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.11                    3_cp311    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0             py311hd4cff14_5    conda-forge
qt-main                   5.15.8              hf9e2b05_14    conda-forge
readline                  8.2                  h8228510_1    conda-forge
setuptools                68.0.0             pyhd8ed1ab_0    conda-forge
sip                       6.7.9           py311hb755f60_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
tifffile                  2023.7.4           pyhd8ed1ab_0    conda-forge
tiffslide                 2.1.2.post0+g63c86e9.d20230710          pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tornado                   6.3.2           py311h459d7ec_0    conda-forge
typing-extensions         4.7.1                hd8ed1ab_0    conda-forge
typing_extensions         4.7.1              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
ukkonen                   1.0.1           py311h4dd048b_3    conda-forge
virtualenv                20.23.1            pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                hd590300_1    conda-forge
xcb-util-image            0.4.0                h8ee46fc_1    conda-forge
xcb-util-keysyms          0.4.0                h8ee46fc_1    conda-forge
xcb-util-renderutil       0.3.9                hd590300_1    conda-forge
xcb-util-wm               0.4.1                h8ee46fc_1    conda-forge
xkeyboard-config          2.39                 hd590300_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xf86vidmodeproto     2.3.1             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zarr                      2.15.0             pyhd8ed1ab_0    conda-forge
zfp                       1.0.0                h27087fc_3    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zlib-ng                   2.0.7                h0b41bf4_0    conda-forge
zstd                      1.5.2                hfc55251_7    conda-forge

results:

benchmark_read_tiles_as_numpy

@kaczmarj
Contributor Author

the difference comes down to imagecodecs from conda-forge and imagecodecs from pypi. using the one from pypi, tiffslide is slower than openslide on the TCGA SVS file i am debugging with.

in my previous test, the conda/mamba environment had the best speeds for tiffslide. in that conda environment, i pip installed imagecodecs==2023.1.23 and then tiffslide became almost 2x slower (~2.5 ms to ~4.9 ms).

mamba/conda environment with imagecodecs from conda-forge

benchmark_read_tiles_as_numpy

mamba/conda environment with imagecodecs from pypi

benchmark_read_tiles_as_numpy

@kaczmarj
Contributor Author

kaczmarj commented Jul 10, 2023

aha! the culprit is that pip and conda pull in different builds of libopenjp2.so.2.5.0. to test this, i first installed all tiffslide dependencies with mamba/conda (with imagecodecs==2023.1.23). using that, tiffslide was faster than openslide for tcga-svs. then i pip installed imagecodecs==2023.1.23, and tiffslide became slower than openslide for tcga-svs. finally, i copied the file libopenjp2.so.2.5.0 that was downloaded from conda-forge into the directory

~/mambaforge/envs/tiffslide/lib/python3.11/site-packages/imagecodecs.libs/

and i essentially overwrote the previous version which was named libopenjp2-fc287c52.so.2.5.0. using the openjpeg from conda-forge, tiffslide was faster than openslide.
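
To check which OpenJPEG library a pip-installed wheel bundles, something like the following may help. It is a minimal sketch: bundled_openjpeg_libs is a hypothetical helper, and it assumes the wheel keeps its shared libraries in a sibling directory named imagecodecs.libs, as in the path above. A conda-forge install links against the environment's own libopenjp2 instead, so the function is expected to return an empty list there.

```python
import importlib.util
from pathlib import Path


def bundled_openjpeg_libs(package="imagecodecs"):
    """Return libopenjp2 files shipped next to a pip-installed package.

    Assumes bundled libraries live in '<site-packages>/<package>.libs'.
    Returns an empty list if the package or that directory is missing.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []
    # spec.origin points at <site-packages>/<package>/__init__.py
    site_packages = Path(spec.origin).parent.parent
    libs_dir = site_packages / f"{package}.libs"
    if not libs_dir.is_dir():
        return []
    return sorted(libs_dir.glob("libopenjp2*"))


for lib in bundled_openjpeg_libs():
    print(lib)
```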

benchmark_read_tiles_as_numpy

i am not sure how openjpeg is pulled into the imagecodecs wheel during a build, but i presume openjpeg is being built differently than the conda-forge version. though looking at https://github.com/conda-forge/openjpeg-feedstock/blob/main/recipe/build.sh, there don't seem to be any special build options enabled for the conda-forge version.

@kaczmarj
Contributor Author

building openjpeg with -DCMAKE_BUILD_TYPE=Release solves the problem. i will submit a pull request to https://github.com/Czaki/imagecodecs_build to add this option.

the change should be made in these lines: https://github.com/Czaki/imagecodecs_build/blob/c7abf4b7c91746c30a754e5d3367f6347262e049/build_utils/build_libraries.sh#L361-L364

when openjpeg is not compiled in release mode, it looks like ffast-math is not enabled (see here):

  # Do not use ffast-math for all build, it would produce incorrect results, only set for release:
  set(OPENJPEG_LIBRARY_COMPILE_OPTIONS ${OPENJPEG_LIBRARY_COMPILE_OPTIONS} "$<$<CONFIG:Release>:-ffast-math>")
  set(OPENJP2_COMPILE_OPTIONS ${OPENJP2_COMPILE_OPTIONS} "$<$<CONFIG:Release>:-ffast-math>" -Wall -Wextra -Wconversion -Wunused-parameter -Wdeclaration-after-statement -Werror=declaration-after-statement)
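
A quick way to compare two builds is to time the decoder on the same payload in each environment. The sketch below is hedged accordingly: time_decode is a hypothetical helper, and zlib from the standard library stands in for the JPEG 2000 codec so that the snippet runs anywhere.

```python
import statistics
import time
import zlib


def time_decode(decode, payload, repeats=20):
    """Return the median wall-clock time in seconds of decode(payload)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        decode(payload)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# stand-in codec and payload; swap in the real decoder and a real tile
payload = zlib.compress(bytes(512 * 512 * 3))
median_s = time_decode(zlib.decompress, payload)
print(f"median decode time: {median_s * 1e3:.3f} ms")
```

In an environment with imagecodecs installed, passing imagecodecs.jpeg2k_decode and the bytes of one JPEG 2000 encoded tile should give numbers that can be compared between the pip and conda builds.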

kaczmarj added a commit to kaczmarj/imagecodecs_build that referenced this issue Jul 10, 2023
This commit adds the option `-DCMAKE_BUILD_TYPE=Release` to the openjpeg build. Without this, ffast-math is not enabled and results in slower performance.

Related to discussion in Bayer-Group/tiffslide#72 (comment), where we found performance differences between tiffslide and openslide.

Thank you :)
@kaczmarj
Contributor Author

enabling -DCMAKE_BUILD_TYPE=Release in the openjpeg build causes imagecodecs tests to fail... :(

@cgohlke

cgohlke commented Jul 10, 2023

Never mind the failures. That repository is out of sync. I build the libraries locally in Docker these days and then build&test the wheels on Azure/GHA...

@ap-- ap-- closed this as completed in b8b7393 Jul 10, 2023
@ap--
Collaborator

ap-- commented Jul 10, 2023

New version of tiffslide with a fix is on its way to pypi, and then later today to conda.

Thanks again everyone for the fun debugging session 😃

@cgohlke

cgohlke commented Jul 11, 2023

building openjpeg with -DCMAKE_BUILD_TYPE=Release solves the problem.

Thank you for finding this. Would you mind trying again with imagecodecs 2023.7.10?

@kaczmarj
Contributor Author

Would you mind trying again with imagecodecs 2023.7.10?

it works! here are the benchmark results on my machine with the most recent tiffslide (8bea5a4), tifffile==2023.7.10, and imagecodecs==2023.7.10.

what a triumph!!!

benchmark_read_tiles_as_numpy

@kaczmarj kaczmarj changed the title Tiffslide much slower than openslide reading patches from TCGA SVS file Tiffslide much slower than openslide reading patches from SVS with JPEG2000 compression Jul 11, 2023
@ap--
Collaborator

ap-- commented Jul 11, 2023

tiffslide==2.2.0 has the fix. (I just added two more commits to update the benchmark stuff)

what a triumph!!!

Thanks again for reporting and investigating ❤️
