# data_readers

> This module contains the code responsible for reading microscopy images and region-of-interest (ROI) files.

In [None]:
#| default_exp data_readers

In [None]:
#| hide
from nbdev.showdoc import *

Imaging software, especially proprietary software, produces a vast variety of file types. Likewise, your region-of-interest annotations may come in various formats. This module helps you read all of these data correctly, while the actual interface to the remaining modules, i.e. the `DataLoader`s, can be found in the "data_loaders" module. This separation makes it easy to add further `DataReader` subclasses for additional input formats without having to adjust the `DataLoader` classes. For this to work, each `DataReader` subclass has to declare which filetype extensions it can read. This is done via its `.readable_filetype_extensions` attribute, as described below.
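
To illustrate how declaring `readable_filetype_extensions` lets a matching reader be picked without touching the loading code, here is a minimal, stdlib-only sketch. It assumes that readers are discovered by walking the subclasses of a common base class; `MinimalDataReader`, `HypotheticalPngReader`, `HypotheticalRoiReader`, `all_subclasses` and `find_reader` are hypothetical names for illustration, not part of findmycells.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Type


class MinimalDataReader(ABC):
    """Hypothetical stand-in mirroring the abstract members of DataReader."""

    @property
    @abstractmethod
    def datatype(self) -> str: ...

    @property
    @abstractmethod
    def readable_filetype_extensions(self) -> List[str]: ...

    @abstractmethod
    def read(self, filepath: Path): ...


class HypotheticalImageReaderBase(MinimalDataReader):
    # Still abstract: only the datatype is defined, similar to MicroscopyImageReader.
    @property
    def datatype(self) -> str:
        return 'microscopy_images'


class HypotheticalPngReader(HypotheticalImageReaderBase):
    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.png', '.PNG']

    def read(self, filepath: Path):
        raise NotImplementedError('placeholder: actual reading is out of scope here')


class HypotheticalRoiReader(MinimalDataReader):
    @property
    def datatype(self) -> str:
        return 'roi_files'

    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.zip']

    def read(self, filepath: Path):
        raise NotImplementedError('placeholder: actual reading is out of scope here')


def all_subclasses(cls: Type) -> List[Type]:
    """Recursively collect all direct and indirect subclasses of cls."""
    found = []
    for subclass in cls.__subclasses__():
        found.append(subclass)
        found.extend(all_subclasses(subclass))
    return found


def find_reader(filepath: Path, datatype: str) -> MinimalDataReader:
    """Instantiate the first concrete reader matching both datatype and file extension."""
    for reader_class in all_subclasses(MinimalDataReader):
        if reader_class.__abstractmethods__:
            continue  # skip classes that still have undefined abstract members
        reader = reader_class()
        if reader.datatype == datatype and filepath.suffix in reader.readable_filetype_extensions:
            return reader
    raise ValueError(f'No reader for datatype "{datatype}" and extension "{filepath.suffix}".')
```

For example, `find_reader(Path('plane_01.png'), 'microscopy_images')` returns a `HypotheticalPngReader` instance. Note that `__subclasses__()` only sees classes that have already been defined, so a module containing a new reader has to be imported before such a lookup runs.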

In [None]:
#| export

from abc import ABC, abstractmethod
from typing import List
import os
from pathlib import Path
import numpy as np
import pandas as pd
import czifile
from skimage.io import imread

In [None]:
#| export

class DataReader(ABC):
    
    """
    Abstract base class that defines the general structure of DataReader subclasses.
    It requires each subclass to define the "datatype" and "readable_filetype_extensions"
    properties, as well as the "read()" method.
    """
    
    @property
    @abstractmethod
    def datatype(self) -> str:
        """
        Property that is used to filter the DataReader subclasses by the datatype they can handle,
        e.g. "microscopy_images" or "roi_files".
        """
        pass
    
    
    @property
    @abstractmethod
    def readable_filetype_extensions(self) -> List[str]:
        """
        Property that lists the filetype extensions (e.g. ".czi") the respective DataReader subclass can read.
        """
        pass
    
    @abstractmethod
    def read(self, filepath: Path):
        """
        Abstract method that needs to be defined by the respective subclass:
        reads the file at filepath and returns its content.
        """
        pass
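
Because `DataReader` derives from `ABC`, Python refuses to instantiate any subclass that leaves one of the three abstract members undefined. The following stdlib-only sketch demonstrates this with a hypothetical stand-in (`MinimalDataReader`, `IncompleteReader`, `CompleteReader` and `can_instantiate` are illustrative names, not findmycells code): a reader that forgets `readable_filetype_extensions` raises a `TypeError` on instantiation.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class MinimalDataReader(ABC):
    """Hypothetical stand-in mirroring the abstract members of DataReader."""

    @property
    @abstractmethod
    def datatype(self) -> str: ...

    @property
    @abstractmethod
    def readable_filetype_extensions(self) -> List[str]: ...

    @abstractmethod
    def read(self, filepath: Path): ...


class IncompleteReader(MinimalDataReader):
    # Defines datatype and read(), but not readable_filetype_extensions.
    @property
    def datatype(self) -> str:
        return 'roi_files'

    def read(self, filepath: Path):
        return filepath.name


class CompleteReader(IncompleteReader):
    # Adding the missing property makes the class concrete.
    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.zip']


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if ABCMeta refuses due to abstract members."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `can_instantiate(IncompleteReader)` is `False` and `IncompleteReader.__abstractmethods__` contains only `'readable_filetype_extensions'`, while `CompleteReader()` works as expected.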

In [None]:
#| export

class MicroscopyImageReader(DataReader):
    
    """
    Abstract base class for all readers of microscopy images. The read method of
    MicroscopyImageReader subclasses has to return a numpy array with the following structure:
    [imaging-planes, rows, columns, imaging-channel]
    For instance, the array of an RGB z-stack with 10 image planes of 1024 x 1024 pixels has the shape:
    [10, 1024, 1024, 3]
    To make the same functions reusable for all kinds of input images, this structure is used even if there is just a single plane.
    For instance, the array of a grayscale 2D image with 1024 x 1024 pixels has the shape:
    [1, 1024, 1024, 1]
    """
    
    @property
    def datatype(self) -> str:
        return 'microscopy_images'
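
The shape convention above can be sketched with numpy for a single 2D plane. This is a minimal illustration, assuming the input is either `(rows, columns)` for grayscale or `(rows, columns, channels)` for color images such as RGB; `to_planes_rows_columns_channels` is a hypothetical helper name, not a findmycells function.

```python
import numpy as np


def to_planes_rows_columns_channels(single_plane: np.ndarray) -> np.ndarray:
    """Bring one 2D image into the [imaging-planes, rows, columns, imaging-channel] layout.

    Assumes (rows, columns) for grayscale or (rows, columns, channels) for color input.
    """
    if single_plane.ndim == 2:
        single_plane = single_plane[..., np.newaxis]  # grayscale -> (rows, columns, 1)
    elif single_plane.ndim != 3:
        raise ValueError(f'Expected a 2D or 3D array, got {single_plane.ndim} dimensions.')
    return single_plane[np.newaxis, ...]  # add plane axis -> (1, rows, columns, channels)
```

A grayscale array of shape `(1024, 1024)` becomes `(1, 1024, 1024, 1)`, and an RGB array of shape `(1024, 1024, 3)` becomes `(1, 1024, 1024, 3)`. Checking `ndim` first matters: blindly adding both axes would turn an RGB image into a 5D array.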


The following `MicroscopyImageReader` subclasses are currently implemented:

In [None]:
#| export
class CZIReader(MicroscopyImageReader):
    
    """
    This reader loads images acquired with the ZEN imaging software by Zeiss, using the czifile package.
    """
    
    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.czi']
    
    def read(self,
             filepath: Path # filepath to the microscopy image file
            ) -> np.ndarray: # numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]
        # select the first index along each of the three leading axes of the array returned by czifile
        return czifile.imread(filepath.as_posix())[0, 0, 0]

In [None]:
show_doc(CZIReader.read)

---

[source](https://github.com/Defense-Circuits-Lab/findmycells/blob/main/findmycells/microscopy_images.py#L43){target="_blank" style="float:right; font-size:smaller"}

### CZIReader.read

>      CZIReader.read (filepath:pathlib.Path)

Reads a CZI file and returns the microscopy image as np.ndarray with the structure: [imaging-planes, rows, columns, imaging-channel]
For instance, the array of an RGB z-stack with 10 image planes of 1024 x 1024 pixels has the shape:
[10, 1024, 1024, 3]
To make the same functions reusable for all kinds of input images, this structure is used even if there is just a single plane.
For instance, the array of a grayscale 2D image with 1024 x 1024 pixels has the shape:
[1, 1024, 1024, 1]

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| filepath | Path | filepath to the microscopy image file |
| **Returns** | **ndarray** | **numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]** |

In [None]:
#| export
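class RegularImageFiletypeReader(MicroscopyImageReader):
    
    """
    This reader loads regular image file types that scikit-image can read, using skimage.io.imread.
    """
    
    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.png', '.PNG']
    
    def read(self,
             filepath: Path # filepath to the microscopy image file
            ) -> np.ndarray: # numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]
        single_plane_image = imread(filepath)
        if single_plane_image.ndim == 2:
            # grayscale image: add the channel axis -> [rows, columns, 1]
            single_plane_image = np.expand_dims(single_plane_image, axis=-1)
        # add the plane axis -> [1, rows, columns, imaging-channel]
        return np.expand_dims(single_plane_image, axis=0)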

In [None]:
show_doc(RegularImageFiletypeReader.read)

---

[source](https://github.com/Defense-Circuits-Lab/findmycells/blob/main/findmycells/microscopy_images.py#L54){target="_blank" style="float:right; font-size:smaller"}

### RegularImageFiletypeReader.read

>      RegularImageFiletypeReader.read (filepath:pathlib.Path)

Reads a regular image file with skimage.io.imread and returns it as np.ndarray with the structure: [imaging-planes, rows, columns, imaging-channel] (see `CZIReader.read` above for example shapes).

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| filepath | Path | filepath to the microscopy image file |
| **Returns** | **ndarray** | **numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]** |

In [None]:
#| export
class FromExcel(MicroscopyImageReader):
    
    """
    This reader loads images whose individual planes are stored as separate files.
    It can be used if you stored the filepaths to your individual plane images in an Excel sheet
    (in a column named "plane_filepath", one row per plane), for instance if you were using our
    "prepare my data for findmycells" functions.
    Please be aware that each plane image is currently read with skimage.io.imread, so all plane
    files have to be in a format that scikit-image can read!
    """
    
    @property
    def readable_filetype_extensions(self) -> List[str]:
        return ['.xlsx']
    
    # TODO: determine the applicable MicroscopyImageReader for each plane file instead of always using imread
    def read(self,
             filepath: Path # filepath to the Excel sheet that contains the filepaths to the corresponding image files
            ) -> np.ndarray: # numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]
        df_single_plane_filepaths = pd.read_excel(filepath)
        single_plane_images = []
        for single_plane_image_filepath in df_single_plane_filepaths['plane_filepath']:
            single_plane_image = imread(single_plane_image_filepath)
            if single_plane_image.ndim == 2:
                # grayscale plane: add the channel axis -> [rows, columns, 1]
                single_plane_image = np.expand_dims(single_plane_image, axis=-1)
            single_plane_images.append(single_plane_image)
        # stack along a new first axis -> [imaging-planes, rows, columns, imaging-channel]
        return np.stack(single_plane_images)
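
Stacking the individual planes listed in the Excel sheet into the expected 4D layout can be sketched as follows. This minimal example works on in-memory arrays standing in for the images read from the "plane_filepath" column; `stack_planes` is a hypothetical helper name, and it assumes that all planes share the same shape.

```python
from typing import List

import numpy as np


def stack_planes(single_plane_images: List[np.ndarray]) -> np.ndarray:
    """Stack single-plane images into [imaging-planes, rows, columns, imaging-channel].

    Each input is either (rows, columns) or (rows, columns, channels); all planes
    must share the same shape, otherwise np.stack raises a ValueError.
    """
    planes_with_channel_axis = []
    for image in single_plane_images:
        if image.ndim == 2:
            image = image[..., np.newaxis]  # grayscale -> (rows, columns, 1)
        planes_with_channel_axis.append(image)
    return np.stack(planes_with_channel_axis, axis=0)  # new first axis = imaging planes
```

Three grayscale planes of shape `(8, 8)` yield an array of shape `(3, 8, 8, 1)`, and ten RGB planes of shape `(8, 8, 3)` yield `(10, 8, 8, 3)`. A sheet in the layout `FromExcel` reads could be written with something like `pd.DataFrame({'plane_filepath': ['plane_00.png', 'plane_01.png']}).to_excel('planes.xlsx', index=False)`, assuming an Excel writer engine such as openpyxl is installed (the filenames here are hypothetical).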

In [None]:
show_doc(FromExcel.read)

---

[source](https://github.com/Defense-Circuits-Lab/findmycells/blob/main/findmycells/microscopy_images.py#L69){target="_blank" style="float:right; font-size:smaller"}

### FromExcel.read

>      FromExcel.read (filepath:pathlib.Path)

Reads all single-plane images listed in the "plane_filepath" column of the Excel sheet and returns them stacked as np.ndarray with the structure: [imaging-planes, rows, columns, imaging-channel].

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| filepath | Path | filepath to the excel sheet that contains the filepaths to the corresponding image files |
| **Returns** | **ndarray** | **numpy array with the structure: [imaging-planes, rows, columns, imaging-channel]** |

While the `MicroscopyImageReader` subclasses defined above do the actual reading of your images, the following `MicroscopyImageLoader` provides the interface to the remaining modules:

In [None]:
#| export
class MicroscopyImageLoader:
    
    """
    Interface to the remaining modules: selects the MicroscopyImageReader subclass that matches
    the given filetype extension and returns the image via as_array().
    """
    
    def __init__(self, filepath: Path, filetype: str):
        self.filepath = filepath
        self.reader = self.determine_reader(filetype = filetype)
    
    def determine_reader(self, filetype: str) -> MicroscopyImageReader:
        if filetype == '.czi':
            reader = CZIReader()
        elif filetype in ['.png', '.PNG']: # TODO: add further extensions that skimage.io.imread can read
            reader = RegularImageFiletypeReader()
        elif filetype == '.xlsx':
            reader = FromExcel()
        else:
            message_part1 = f'The microscopy image format you are trying to load ("{filetype}") is not implemented yet. '
            message_part2 = 'Please consider raising an issue in our GitHub repository!'
            raise ValueError(message_part1 + message_part2)
        return reader
    
    def as_array(self) -> np.ndarray:
        return self.reader.read(filepath = self.filepath)
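
As a possible variant of the `if`/`elif` chain in `determine_reader`, the extension-to-reader mapping can also be kept in a lookup table. The stdlib-only sketch below assumes that extensions should be matched case-insensitively (the original only lists `.PNG` explicitly); `READER_NAME_BY_EXTENSION` and `determine_reader_name` are hypothetical names, and readers are represented by their class names to keep the sketch self-contained.

```python
from pathlib import Path

# Hypothetical lookup table mirroring the branches of MicroscopyImageLoader.determine_reader.
READER_NAME_BY_EXTENSION = {
    '.czi': 'CZIReader',
    '.png': 'RegularImageFiletypeReader',
    '.xlsx': 'FromExcel',
}


def determine_reader_name(filepath: Path) -> str:
    """Look up the reader for the file's last suffix, ignoring upper/lower case."""
    filetype = filepath.suffix.lower()
    try:
        return READER_NAME_BY_EXTENSION[filetype]
    except KeyError:
        raise ValueError(
            f'The microscopy image format "{filetype}" is not implemented yet. '
            'Please consider raising an issue in our GitHub repository!'
        ) from None
```

Adding a new format then only requires a new table entry. Keep in mind that `Path.suffix` returns only the last suffix, so a file named `image.ome.tif` is looked up as `.tif`.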

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()