Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large timing overhead on TiffReader #550

Open
Armavica opened this issue Dec 27, 2023 · 6 comments
Open

Large timing overhead on TiffReader #550

Armavica opened this issue Dec 27, 2023 · 6 comments

Comments

@Armavica
Copy link
Contributor

System and Software

  • aicsimageio Version: 4.14.0
  • Python Version: 3.11.6
  • Operating System: linux

Description

I am comparing the read of a tiff image with OpenCV, tifffile and aicsimageio.

To download the image:

curl https://gitlab.com/dunloplab/delta/-/raw/44256ee77069c308ce478c233cc12d900737b537/tests/data/images/angle_-0.32.tif?inline=false -o angle.tif

Benchmarks with ipython:

In [1]: import aicsimageio, cv2, tifffile

In [2]: %timeit cv2.imread("angle.tif", cv2.IMREAD_ANYDEPTH)
185 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [3]: %timeit tifffile.imread("angle.tif")
237 µs ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [4]: %timeit aicsimageio.AICSImage("angle.tif").data
3.57 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I am struggling to understand where this 15x overhead (compared to tifffile) comes from. Is there something that I am doing wrong?

@toloudis
Copy link
Collaborator

I have not specifically measured this so I'll speculate that it is our code that tries to make sure the dimensions are in a consistent order for the rest of the aicsimageio apis to deal with, as well as wrapping it into xarray.

@Armavica
Copy link
Contributor Author

Armavica commented Jan 31, 2024

Thank you for your answer. I profiled this call with the following IPython command:

%prun -q -D output100.prof [aicsimageio.AICSImage(path).data for _ in range(100)]

and got the following results, which suggest some inefficiencies in the code:

Global profile

Screenshot from 2024-01-31 13-50-01

Zoom on the initialization (right hand side)

Screenshot from 2024-01-31 13-48-48

  • Trying the OME-Tiff reader first takes significant time: would it make sense to switch the order of the Tiff and OME-Tiff readers for plain .tiff images?
  • The function _is_supported_image of the tiff reader is called twice: once to determine which reader to take, and a second time after this reader has been chosen: perhaps we could avoid this second check if the first one succeeded?

Zoom on the .data call (left hand side)

Screenshot from 2024-01-31 13-49-42

  • There is indeed some dimensions-related code that takes significant time but this looks only related to the mosaic mode(?)
  • The Tiff reader is re-initialized three more times during this function call (orange - light orange - green - ochre blocks), in addition to the first two that happened during initialization (previous graph)
  • There is an expensive DataArray initialization that is done twice in the aics_image.py file: once here
    return xr.DataArray(

    and once there
    self._xarray_dask_data = xr.DataArray(

although I don't know if this could be avoided.

I am happy to work on all this if there is interest.

@toloudis
Copy link
Collaborator

toloudis commented Jan 31, 2024

Ok this is wonderful and thank you for the detailed profile. I forgot about the reader check. You can give aicsimageio a hint about which reader to use like this:

aicsimageio.AICSImage("angle.tif", reader=aicsimageio.tiff_reader.TiffReader).data

I think we want to prefer the OmeTiffReader first because it is the more specific one and we don't want to drop ome metadata by accident.

Please note that aicsimageio is currently in maintenance mode. Contributions here are most welcome but for further work on this, please consider contributing to bioio. https://github.com/bioio-devs/bioio and bioio-devs/bioio-tifffile and bioio-devs/bioio-ome-tiff
We'd love to prioritize bioio over aicsimageio at this point in time. It would certainly be interesting if we could optimize the xr.DataArray creations or the reader initializations.
Thanks again for the deeper dive into this!

@toloudis
Copy link
Collaborator

I'll also add that determine_reader should hopefully be a fixed cost that doesn't scale with the data size. We expect the data loading to be the bottleneck. Arguably we could combine OmeTiffReader and TiffReader into one, which might be interesting.

@Armavica
Copy link
Contributor Author

Thank you for your help! I wasn't aware of bioio but will definitely consider switching and contributing to it instead.

@toloudis
Copy link
Collaborator

toloudis commented Feb 1, 2024

We haven't formally announced it but it will happen soon. Thank you again for your interest here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants