How to read TIFF files from MicroManager with memory-mapping #52

Closed
michaelmell opened this issue Jan 11, 2021 · 9 comments
Labels
question Further information is requested


@michaelmell

Hi,
first of all thanks for this great Python package.

A question:
I would like to seek and read within a set of large (4.3 GB) OME-TIFF files, which were stored with MicroManager 2.
Is it possible to memory-map and read these files with tifffile? I could only find functionality for writing to memory-mapped files.

Thanks and best regards,
Michael

@cgohlke cgohlke added the question Further information is requested label Jan 11, 2021
@cgohlke
Owner

cgohlke commented Jan 11, 2021

If you are referring to memory-mapping the image data in the file as a numpy array: it depends on how the file was written. If all the image data was written contiguously, without any compression, tiling, predictors, packing, etc., then tifffile.memmap can return a memory-mapped numpy array:

Memory-map image data of the first page in the TIFF file:
>>> from tifffile import memmap
>>> memmap_image = memmap('temp.tif', page=0)
>>> memmap_image[1, 255, 255]
1.0
>>> del memmap_image

If you are referring to memory mapping a TIFF file with Python's mmap and passing that object to tifffile, that does not currently work.

@michaelmell
Author

I think tifffile.memmap is what I need. To make sure I understand it correctly:

The page parameter specifies the page to be loaded/memory-mapped within a multi-page TIFF file. Furthermore, I should be able to load any of the pages. Is this correct?

If so, I ran into the following issue: some of the pages cannot be loaded, and for those I get the following error message:

Traceback (most recent call last):
  File "/home/micha/Documents/01_work/git/mmpreprocesspy/mmpreprocesspy_conda_environment/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-27-09b4a1534241>", line 1, in <module>
    tff.memmap(image_path, page=ind, mode='r')
  File "/home/micha/Documents/01_work/git/mmpreprocesspy/mmpreprocesspy_conda_environment/lib/python3.7/site-packages/tifffile/tifffile.py", line 852, in memmap
    raise ValueError('image data are not memory-mappable')
ValueError: image data are not memory-mappable

The list of pages that cannot be loaded is seemingly random (see the list below for the first 100 pages). I do not understand why this is the case, because in ImageJ I can open the first TIFF file of the multi-file stack and scroll through its images without issue (it contains 512 images/pages).

I would be very grateful if you could help me figure out what is causing this issue.

Additional information:

My TIFF files contain multiple positions (of the microscope stage), and for each position they have multiple channels for a given capture timepoint. When I open them in ImageJ and step through them with the frame selection slider, they have the following order (assuming two positions and two channels):

[Pos0-time0-channel0],
[Pos0-time0-channel1],
[Pos1-time0-channel0],
[Pos1-time0-channel1],
[Pos0-time1-channel0],
[Pos0-time1-channel1],
[Pos1-time1-channel0],
[Pos1-time1-channel1],
...
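If the interleaving above held throughout the file (channel varying fastest, then position, then time), the page index and its inverse could be computed as below. page_index and page_coords are hypothetical names; the forward formula is the same one used by calculate_page_nr in the class further down, and the sketch is only valid as long as the files really follow this order.

```python
def page_index(time, position, channel, n_positions, n_channels):
    """Page number for (time, position, channel), channel varying fastest."""
    return (time * n_positions + position) * n_channels + channel


def page_coords(page, n_positions, n_channels):
    """Inverse of page_index: page number -> (time, position, channel)."""
    rest, channel = divmod(page, n_channels)
    time, position = divmod(rest, n_positions)
    return time, position, channel


# Reproduce the listing above for two positions and two channels.
for page in range(8):
    t, p, c = page_coords(page, n_positions=2, n_channels=2)
    print(f"page {page}: [Pos{p}-time{t}-channel{c}]")
```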

Here is preliminary code of a class for reading my data, in case it helps you understand what I am trying to achieve:

import tifffile as tff
import numpy as np

class MicroManagerTiffReader(object):

    def __init__(self, image_path):
        self.image_path = image_path
        with tff.TiffFile(self.image_path) as tiff:
            metadata = tiff.micromanager_metadata['Summary']
            self.height = metadata['Height']
            self.width = metadata['Width']
            self.channels = [c.strip(' ') for c in metadata['ChNames']]
            self.number_of_channels = len(self.channels)
            self.number_of_frames = metadata['Frames']
            if 'InitialPositionList' in metadata:
                self.positions = [c['Label'] for c in
                             metadata['InitialPositionList']]  # this is for OME-TIFF format from MicroManager 1
            elif 'StagePositions' in metadata:
                self.positions = [c['Label'] for c in
                             metadata['StagePositions']]  # this is for OME-TIFF format from MicroManager 2
            else:
                raise LookupError(
                    "TIFF metadata contains no entry for either 'InitialPositionList' or 'StagePositions'")
            self.number_of_positions = len(self.positions)

    def get_image(self, frame_index, channel_index, position_index):
        page_nr = self.calculate_page_nr(frame_index, channel_index, position_index)
        return self.get_copy_of_page(page=page_nr)

    def get_channel_stack(self, frame_index, position_index):
        pass

    def calculate_page_nr(self, frame_index, channel_index, position_index):
        page_nr = frame_index * self.number_of_channels * self.number_of_positions \
                  + position_index * self.number_of_channels \
                  + channel_index
        return page_nr

    def get_copy_of_page(self, page):
        img_memmap = tff.memmap(self.image_path, page=page, mode='r')
        img = np.copy(img_memmap)
        del img_memmap
        return img

Here is the list of failed frames (pages), which I generated with this code:

for ind in range(100):
    try:
        tff.memmap(image_path, page=ind, mode='r')
    except ValueError:
        print(f"Failed frame: {ind}")
Failed frame: 1
Failed frame: 2
Failed frame: 3
Failed frame: 6
Failed frame: 7
Failed frame: 10
Failed frame: 14
Failed frame: 15
Failed frame: 18
Failed frame: 22
Failed frame: 25
Failed frame: 26
Failed frame: 29
Failed frame: 30
Failed frame: 31
Failed frame: 33
Failed frame: 34
Failed frame: 35
Failed frame: 38
Failed frame: 42
Failed frame: 46
Failed frame: 47
Failed frame: 49
Failed frame: 50
Failed frame: 51
Failed frame: 53
Failed frame: 54
Failed frame: 55
Failed frame: 58
Failed frame: 61
Failed frame: 62
Failed frame: 63
Failed frame: 66
Failed frame: 69
Failed frame: 71
Failed frame: 72
Failed frame: 74
Failed frame: 75
Failed frame: 76
Failed frame: 78
Failed frame: 79
Failed frame: 80
Failed frame: 81
Failed frame: 82
Failed frame: 84
Failed frame: 85
Failed frame: 86
Failed frame: 88
Failed frame: 90
Failed frame: 91
Failed frame: 93
Failed frame: 94
Failed frame: 95
Failed frame: 96
Failed frame: 99

@cgohlke
Owner

cgohlke commented Jan 12, 2021

It is counterproductive to use memory mapping like that. To get the image data of a specific page in a TIFF file as a numpy array, use, e.g., with tifffile.TiffFile(filename) as tif: data = tif.pages[pageindex].asarray(). It is best to keep the TiffFile instance open, because parsing the TIFF file structure is an expensive operation.
Re "image data are not memory-mappable": tifffile raises this error because it detected that the image data were not stored in a memory-mappable format. If you think this is in error, please share a file.
I fail to see how this is related to OME-TIFF. You are using MicroManager, not OME metadata (?).

@michaelmell
Author

Thank you for pointing out with tifffile.TiffFile(filename) as tif: data = tif.pages[pageindex].asarray(). With that, I can read the pages without issue.
A follow-up question:
So to load pages from somewhere within the whole multi-file stack, I will have to memory map all TIFF-files of my stack and then calculate the correct file and page value in that file for retrieval. Is this correct or is there a better way to do this?

I fail to see how this is related to OME-TIFF. You are using MicroManager, not OME metadata (?).

You are right. The file extension of my files is ome.tiff, which confused me. I changed the title of this ticket accordingly.

@michaelmell michaelmell changed the title Read OME-TIFF files from MicroManager with memory-mapping How to read TIFF files from MicroManager with memory-mapping Jan 12, 2021
@cgohlke
Owner

cgohlke commented Jan 13, 2021

So to load pages from somewhere within the whole multi-file stack, I will have to memory map all TIFF-files of my stack and then calculate the correct file and page value in that file for retrieval. Is this correct or is there a better way to do this?

If your file series is a multi-file OME-TIFF, you can lazily access the image data via tifffile's zarr interface. Otherwise, tifffile does not currently provide high-level access to individual pages of TIFF files in a multi-file series. You might want to check whether other libraries support this, e.g. aicsimageio. Or just calculate the file/page indices yourself (which seems straightforward in your case).
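Calculating the file/page indices yourself comes down to a cumulative page count per file. In the sketch below, build_offsets and locate_page are hypothetical helpers; the per-file counts would come from something like len(tifffile.TiffFile(path).pages) for each file, and the example counts other than the 512 pages of the first file mentioned earlier are made up. It also assumes the global page order runs through the files in sequence.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(pages_per_file):
    """Global index of the first page of each file, plus the total at the end."""
    return [0] + list(accumulate(pages_per_file))


def locate_page(global_index, offsets):
    """Map a page index in the whole series to (file_index, page_in_file)."""
    if not 0 <= global_index < offsets[-1]:
        raise IndexError(f"page {global_index} out of range")
    file_index = bisect_right(offsets, global_index) - 1
    return file_index, global_index - offsets[file_index]


# Hypothetical counts: two full files of 512 pages and a shorter last file.
offsets = build_offsets([512, 512, 76])
print(offsets)
print(locate_page(511, offsets), locate_page(512, offsets), locate_page(1099, offsets))
```

Using a cumulative table rather than a plain divmod keeps the lookup correct when the last file (or any file) holds fewer pages.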

@michaelmell
Author

I tried using the zarr interface like so:

import tifffile as tff
import zarr

image_path = '/path/to/first/file_in_multi_file.ome.tif'
store = tff.imread(image_path, aszarr=True)
z = zarr.open(store, mode='r')

This loads correctly, but the dimensions of z are incorrect. I get:

>>> z.shape
(480, 2, 2048, 2048)

whereas it should be something like (32, 480, 2, 2048, 2048), since I have 32 positions in this case (with 2 channels and 480 frames).

Using AICSImage (thanks for bringing this to my attention), I get the following, but it is extremely slow even on my local SSD:

>>> img = AICSImage(image_path)
>>> img.dims
'STCZYX'
>>> img.shape
(32, 480, 2, 1, 2048, 2048)

Am I doing something wrong or is there a way to specify the axes?

@michaelmell
Author

michaelmell commented Jan 14, 2021

To follow up on this and regarding my previous comment:

So to load pages from somewhere within the whole multi-file stack, I will have to memory map all TIFF-files of my stack and then calculate the correct file and page value in that file for retrieval. Is this correct or is there a better way to do this?

Is there a way to look up the correspondence between page number and [position, time, channel] in the metadata? (I assume there must be, because MicroManager loads the data correctly.)

The reason I ask (and also why I started looking at the zarr interface) is that I realized that the ordering of pages is not as I described it above after all. So it is not:

[Pos0-time0-channel0], [Pos0-time0-channel1], [Pos1-time0-channel0], [Pos1-time0-channel1], [Pos0-time1-channel0], [Pos0-time1-channel1], [Pos1-time1-channel0], [Pos1-time1-channel1], ...

but instead, the pages get mixed up at higher page numbers (e.g., after the first ~200 pages; I am unsure why). I therefore cannot calculate the correct page number as proposed above.

@cgohlke
Owner

cgohlke commented Jan 14, 2021

This loads correctly, but the dimensions of z are incorrect
...
whereas it should be something like (32, 480, 2, 2048, 2048), since I have 32 positions in this case (with 2 channels and 480 frames).

The shape (480, 2, 2048, 2048) is likely correct. OME-TIFF stores positions as separate series. In general, different series can have different shapes, dtypes, and transformations, or can even overlap, so numpy and zarr cannot represent multi-series OME images in general. You can access the individual series in the file via TiffFile().series[index]. It could also be that the file you open is not formally part of a multi-file OME-TIFF.
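To inspect those series, one can iterate over TiffFile.series and read a whole series, or a single page of a series, on demand. The sketch below first writes a hypothetical two-series demo file (series.tif) so that it runs on its own; for the files in this thread one would open image_path instead, and whether each stage position shows up as its own series depends on how tifffile parses those particular files.

```python
import numpy as np
import tifffile

# Hypothetical demo file: two series of shape (T=3, C=2, Y=8, X=8),
# standing in for two stage positions; contiguous=False starts a new series.
data0 = np.arange(3 * 2 * 8 * 8, dtype=np.uint16).reshape(3, 2, 8, 8)
data1 = data0 + 1000
with tifffile.TiffWriter("series.tif") as tw:
    tw.write(data0, metadata={"axes": "TCYX"}, contiguous=False)
    tw.write(data1, metadata={"axes": "TCYX"}, contiguous=False)

with tifffile.TiffFile("series.tif") as tif:
    for index, series in enumerate(tif.series):
        print(index, series.shape, series.axes, series.dtype)
    # Read one whole series (one "position") into memory ...
    position1 = tif.series[1].asarray()
    # ... or only one page of it: key indexes the pages of the series in
    # storage order (T-major, then C), so key=5 is T=2, C=1.
    plane = tif.asarray(series=1, key=5)
```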

Is there a way to look up the correspondence between page-number and [position, time, channel] from the metadata? (I assume there must be because, MicroManager loads the data correctly)

That's a question for MicroManager. Tifffile uses the OME metadata.
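The OME-XML metadata that tifffile uses (available as a string via TiffFile.ome_metadata) contains TiffData elements which, in the OME data model, tie an IFD (page) in a given file to the FirstT/FirstC/FirstZ indices of an Image. This thread does not confirm that MicroManager fills these in as expected for these files, so treat the following as a sketch: it builds a lookup from (file name, IFD) to (image index, T, C, Z), assumes the 2016-06 OME namespace, and only handles TiffData elements that describe a single plane. tiffdata_table and the example XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed OME schema namespace; check the xmlns attribute of your own files.
OME = "{http://www.openmicroscopy.org/Schemas/OME/2016-06}"


def tiffdata_table(ome_xml):
    """Map (file name, IFD) -> (image index, T, C, Z) from OME-XML TiffData."""
    table = {}
    root = ET.fromstring(ome_xml)
    for image_index, image in enumerate(root.findall(OME + "Image")):
        for tiffdata in image.iter(OME + "TiffData"):
            if int(tiffdata.get("PlaneCount", "1")) != 1:
                raise NotImplementedError("multi-plane TiffData is not handled")
            uuid = tiffdata.find(OME + "UUID")
            # No UUID element means the plane is in the same file.
            filename = uuid.get("FileName") if uuid is not None else None
            key = (filename, int(tiffdata.get("IFD", "0")))
            table[key] = (
                image_index,
                int(tiffdata.get("FirstT", "0")),
                int(tiffdata.get("FirstC", "0")),
                int(tiffdata.get("FirstZ", "0")),
            )
    return table


# Hypothetical minimal OME-XML with two images (positions).
example = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
<Image ID="Image:0"><Pixels ID="Pixels:0">
<TiffData IFD="0" FirstT="0" FirstC="0" FirstZ="0" PlaneCount="1"><UUID FileName="a.ome.tif">urn:uuid:1</UUID></TiffData>
<TiffData IFD="1" FirstT="0" FirstC="1" FirstZ="0" PlaneCount="1"><UUID FileName="a.ome.tif">urn:uuid:1</UUID></TiffData>
</Pixels></Image>
<Image ID="Image:1"><Pixels ID="Pixels:1">
<TiffData IFD="2" FirstT="0" FirstC="0" FirstZ="0" PlaneCount="1"><UUID FileName="a.ome.tif">urn:uuid:1</UUID></TiffData>
</Pixels></Image>
</OME>"""
print(tiffdata_table(example))
```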

@evamaxfield

For the sake of cross-posting (and potentially more feedback): I saw this thread, and here is a related aicsimageio issue on "chunked reading is slow": AllenCellModeling/aicsimageio#178
