# ND2 to TIF

This notebook uses the [**ND2**](https://github.com/tlambert03/nd2) library to read Nikon ND2 files as NumPy arrays and then save them as TIF files.

With [**ND2**](https://github.com/tlambert03/nd2), you can either read the ND2 file to a numpy array (`nd2.imread`) or use `nd2.ND2File()` as a context manager to get the attributes and metadata.

Additional libraries that you may need to install to run this notebook:
* [skimage](https://scikit-image.org/) (installed as `scikit-image`): This library is used for the downsampling function.
* [tifffile](https://github.com/cgohlke/tifffile/tree/master): To save the ND2 recordings as TIF files.

Other libraries that read Nikon files include: 
* [nd2reader](https://github.com/Open-Science-Tools/nd2reader)
* [AICSImageIO](https://allencellmodeling.github.io/aicsimageio/): `pip install aicsimageio[nd2]`.


# Example data

In vivo recordings of sequential widefield imaging (`NIK01a01_seq`), single-channel widefield imaging (`NIK02a01`), and two-photon microscopy (`NIK01b01`).


# Import the libraries

Versions for this notebook: NumPy 1.23.5, Pandas 1.5.3, scikit-image 0.24.0, Matplotlib 3.6.3, nd2 0.10.1, tifffile 2023.2.28

In [None]:
import numpy as np
import pandas as pd
import os
import glob
import sys
from collections import defaultdict
import matplotlib.pyplot as plt

# Nikon files
import nd2
import tifffile

# Metadata
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

# Downsampling function
from skimage.transform import resize

# Create the paths

In [None]:
notebook_name = 'nd2_to_tif'

# Path to the folder that contains 'Data_examples'. Change it to match your data structure.
data_path = os.path.dirname(os.getcwd())  # Moves one level up from the current directory

# Change the folder names accordingly
paths = {'data': data_path,
         'raw_data':  f'{data_path}/Data_examples/{notebook_name}/',
         'processed_data': f'{data_path}/Processed_data_examples/{notebook_name}/',
         'analysis': f'{data_path}/Analysis_examples/{notebook_name}/',         
         'plots': f'{data_path}/Analysis_examples/{notebook_name}/Plots/'}

# Make folders if they do not exist yet
for path in paths.values():
    os.makedirs(path, exist_ok=True)

# Functions

## Downsampling

The original downsample function was written by Hidde Damen (MSc student, 2023). It downsamples each frame by the factor provided, using the `skimage` transform functions ([Scikit-image](https://scikit-image.org/docs/stable/auto_examples/transform/plot_rescale.html)). I added the following modifications: 
- [Skimage rescale_intensity](https://scikit-image.org/docs/stable/api/skimage.exposure.html#skimage.exposure.rescale_intensity) was removed because it changes the distribution of pixel intensity values.
- Frames are processed in blocks, which is useful when dealing with large stacks or memory limitations. Within each block iteration, the `del` statement removes the reference to frames that are no longer needed, which may help reduce memory usage further.
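
The block-wise iteration described above can be sketched in isolation. This is a minimal illustration, not the function used later in the notebook: `iter_blocks` is a hypothetical helper name, and the frame count is made up.

```python
def iter_blocks(num_frames, block_size):
    """Yield (start, end) frame indices for consecutive blocks of frames."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, num_frames, block_size):
        # The last block is shorter when num_frames is not a multiple of block_size
        yield start, min(start + block_size, num_frames)


# Hypothetical stack of 1200 frames processed 500 frames at a time
blocks = list(iter_blocks(num_frames=1200, block_size=500))
print(blocks)  # [(0, 500), (500, 1000), (1000, 1200)]
```

Setting `block_size` to the number of frames yields a single block, which corresponds to processing the whole stack at once.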

See the scikit-image user guide on [image data types](https://scikit-image.org/docs/dev/user_guide/data_types.html) for the value ranges listed below:


|Data type|Range|
|--|--|
|uint8|0 to 255|
|uint16|0 to 65535|
|uint32|0 to 2^32 - 1|
|float|-1 to 1 or 0 to 1|
|int8|-128 to 127|
|int16|-32768 to 32767|
|int32|-2^31 to 2^31 - 1|
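
The integer ranges in the table can be checked with NumPy's `np.iinfo`. The helper `fits_in_dtype` below is a hypothetical name, sketched to show how one might confirm that pixel values fit the target `dtype` before casting; the sample values are made up.

```python
import numpy as np

# Print the limits of the integer types listed in the table
for dtype in (np.uint8, np.uint16, np.uint32, np.int8, np.int16, np.int32):
    info = np.iinfo(dtype)
    print(f"{info.dtype}: {info.min} to {info.max}")


def fits_in_dtype(array, dtype):
    """Return True if every value of `array` lies within the range of the integer `dtype`."""
    info = np.iinfo(dtype)
    return bool(array.min() >= info.min and array.max() <= info.max)


# Made-up pixel values: 12-bit camera data fits in uint16, 70000 does not
print(fits_in_dtype(np.array([0, 4095]), np.uint16))   # True
print(fits_in_dtype(np.array([0, 70000]), np.uint16))  # False
```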


In [None]:
def downsample_stack(stack, block_size, nd2_downsample_factor, dtype=np.uint16):
    """
    Downsamples a stack in the XY plane, processing the stack in blocks of frames.
    Parameters:
        stack: the input stack
        block_size: Number of frames to process in each block. Note: Use the length of the stack for processing in one block.
        nd2_downsample_factor: the factor with which to downsample in the X and Y direction
        dtype: the dtype of the returned stack, defaults to np.uint16
    Returns the downsampled stack, with the same number of frames as the input.
    """
    
    num_frames = stack.shape[0]  # Number of frames in the stack
    shape = list(stack.shape)  # Shape of the stack
    shape[-2] = int(shape[-2] / nd2_downsample_factor)  # Update the shape to downsample in Y (rows)
    shape[-1] = int(shape[-1] / nd2_downsample_factor)  # Update the shape to downsample in X (columns)
    downsampled_stack = np.empty(shape, dtype=dtype)  # Create an empty array for the new stack

    # Iterate over blocks of frames
    for block_start in range(0, num_frames, block_size):
        block_end = min(block_start + block_size, num_frames)  # Calculate the end index of the current block
        frames_block = stack[block_start:block_end]  # Get the frames within the current block

        # Downsample frames 
        shape = list(np.shape(frames_block))  # Shape of the current block
        shape[-2] = int(shape[-2] / nd2_downsample_factor)  
        shape[-1] = int(shape[-1] / nd2_downsample_factor) 
        
        # Resize the block of frames to the downsampled shape
        # (order=0: nearest-neighbor interpolation; anti_aliasing=True: Gaussian smoothing before resampling)
        downsampled_block = resize(frames_block, shape,
                                   order=0, preserve_range=True,
                                   anti_aliasing=True)

        # Store the downsampled frames in the new stack, cast to the output dtype
        downsampled_stack[block_start:block_end] = downsampled_block.astype(dtype)

        # Delete the processed block from memory
        del frames_block  

    return downsampled_stack

## Metadata

* [ND2 documentation](https://github.com/tlambert03/nd2)
* Nikon file dimensions: {'T', 'Z', 'C', 'Y', 'X'}, where 'C' is the channel axis.
* Be aware that the camera and the Nikon software may not report accurate image timestamps. The timestamp shown in NIS is the moment the image data arrived at the PC, so it includes software jitter.
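
One way to gauge the timestamp jitter is to look at the spread of the intervals between consecutive frames. The sketch below assumes the events are available as a list of records with a `'Time [s]'` key (the key used in `nd2_metadata` below); the function name `frame_interval_stats` and the timestamps are made up for illustration.

```python
from statistics import mean, pstdev


def frame_interval_stats(events, time_key="Time [s]"):
    """Summarize the intervals between consecutive event timestamps."""
    times = [event[time_key] for event in events]
    if len(times) < 2:
        raise ValueError("At least two events are needed to compute an interval")
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_interval_s": mean(intervals),
        "std_interval_s": pstdev(intervals),
        "min_interval_s": min(intervals),
        "max_interval_s": max(intervals),
    }


# Made-up timestamps for a nominal 10 Hz recording with some jitter
events = [{"Time [s]": t} for t in (0.000, 0.101, 0.199, 0.302, 0.400)]
stats = frame_interval_stats(events)
print(stats)
```

A large spread between the minimum and maximum interval suggests that the recorded timestamps should not be used as exact frame times.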

In [None]:
def nd2_metadata(experiment, recording, paths):
    """
    Extracts metadata from ND2 files and returns it as dictionaries.
    Args:
        experiment (str): The name of the experiment.
        recording (str): The name of the recording.
        paths (dict): Dictionary with paths.
    Returns:
        attributes_dict (dict): A dictionary containing metadata attributes extracted from the ND2 file.
        metadata_text (dict): Textual metadata (`text_info`) from the ND2 file.
    """
    # Construct the path to the ND2 file
    nd2_file_path = f"{paths['raw_data']}{experiment}/{recording}.nd2"
    
    # Extract metadata from the ND2 file
    with nd2.ND2File(nd2_file_path) as nd2file:
        # Create a dictionary to store metadata attributes
        attributes_dict = {}
        events = nd2file.events()  # Read the events table once
        attributes_dict['duration_s'] = events[-1]['Time [s]'] - events[0]['Time [s]']
        attributes_dict['sizes'] = nd2file.sizes
        attributes_dict['voxel_size'] = nd2file.voxel_size()
        attributes_dict['dtype'] = nd2file.dtype
        attributes_dict['attributes'] = nd2file.attributes
        
        # Extract metadata text
        metadata_text = nd2file.text_info
    
    # Return the attributes dictionary and metadata text
    return attributes_dict, metadata_text

# Loop through files

**Note**: This is an example of how to loop through the different ND2 example recordings. Adapt the if statements to your data structure and filenames. 

The loop uses the [walk function](https://docs.python.org/3/library/os.html), which traverses a directory tree and yields three values for each directory it visits:
* `dirpath`. String, the path to the directory.
* `dirnames`. List of subdirectories.
* `filenames`. List of file names. 
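
The behavior of `os.walk` can be tried out on a temporary directory tree. This is a small sketch: the helper name `find_files` is hypothetical, and the empty files only borrow the example recording names to stand in for real ND2 files.

```python
import os
import tempfile


def find_files(root, extension=".nd2"):
    """Return the sorted paths of all files under `root` that end with `extension`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extension):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    # Build a small tree with empty placeholder files
    for relative in ("NIK01/NIK01a01_seq.nd2", "NIK01/NIK01b01.nd2",
                     "NIK02/NIK02a01.nd2", "NIK02/notes.txt"):
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    found_names = [os.path.basename(p) for p in find_files(root)]

print(found_names)  # ['NIK01a01_seq.nd2', 'NIK01b01.nd2', 'NIK02a01.nd2']
```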

Other types of recordings can also be processed or filtered by adding more if statements or [regular expressions](https://docs.python.org/3/library/re.html) (see [cheat sheet](https://cheatography.com/davechild/cheat-sheets/regular-expressions/)). For example, you can save an Excel or CSV file with annotations in your raw data folder, containing the acquisition mode of each recording: 'xyt', 'xyt_seq', etc. The loop can then read the annotations file, and the if statements can process each Nikon file according to the information associated with that recording.
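
A minimal sketch of the annotations approach, using only the standard library. The column names (`recording`, `acquisition_mode`), the function names, and the branch labels are assumptions made for illustration; adapt them to your own annotations file.

```python
import csv
import io

# Stand-in for an annotations CSV file saved in the raw data folder (made-up content)
annotations_csv = """recording,acquisition_mode
NIK01a01_seq,xyt_seq
NIK02a01,xyt
"""


def load_acquisition_modes(csv_file):
    """Map each recording name to its acquisition mode."""
    reader = csv.DictReader(csv_file)
    return {row["recording"]: row["acquisition_mode"] for row in reader}


def processing_branch(recording, modes):
    """Choose how to process a recording based on its annotated acquisition mode."""
    mode = modes.get(recording)
    if mode == "xyt":
        return "single_channel"
    if mode == "xyt_seq":
        return "sequential_channels"
    return "skip"  # Recording not annotated or mode not handled


modes = load_acquisition_modes(io.StringIO(annotations_csv))
print(processing_branch("NIK02a01", modes))      # single_channel
print(processing_branch("NIK01a01_seq", modes))  # sequential_channels
print(processing_branch("NIK01b01", modes))      # skip
```

With a real file, `io.StringIO(annotations_csv)` would be replaced by an open file handle, for example `open(path, newline="")`.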


In [None]:
# Downsample parameters (change as needed)
block_size = 500
downsample_factor = 2

In [None]:
# Flag to check if any .nd2 files are found
nd2_files_found = False

for dirpath, dirnames, filenames in os.walk(paths['raw_data']):
    for filename in filenames:
        if filename.endswith('.nd2'):
            nd2_files_found = True  # Update flag if a file is found

            # Paths and names
            experiment = os.path.split(dirpath)[1]
            recording = os.path.splitext(filename)[0]
            file_path = f"{dirpath}/{filename}"
            tif_save_path = f"{paths['processed_data']}{experiment}"

            # Create folders
            os.makedirs(tif_save_path, exist_ok=True)
            os.makedirs(f"{paths['processed_data']}{experiment}/Metadata", exist_ok=True)

            # Load the stack as numpy array
            stack = nd2.imread(file_path)

            # Extract and save ND2 files metadata
            attributes, metadata = nd2_metadata(experiment, recording, paths)
            # Save the attributes dictionary and metadata as text files
            with open(f"{paths['processed_data']}{experiment}/Metadata/{recording}_attributes.txt", 'w') as attributes_file:
                for key, value in attributes.items():
                    attributes_file.write(f"{key}: {value}\n")
            with open(f"{paths['processed_data']}{experiment}/Metadata/{recording}_metadata.txt", 'w') as metadata_file:
                for key, value in metadata.items():
                    metadata_file.write(f"{key}: {value}\n")

            with nd2.ND2File(file_path) as nd2file:
                
                # Time-series with only one channel
                if 'C' not in nd2file.sizes and nd2file.sizes.get('T', 0) > 1 and not filename.endswith('_seq.nd2'):

                    # Downsample the stack
                    stack = downsample_stack(stack, block_size, downsample_factor)
                    
                    # Save the downsampled stack as tif
                    new_height = stack.shape[-2]
                    save_path_file = f"{tif_save_path}/{recording}_{new_height}px.tif"
                    tifffile.imwrite(save_path_file, stack)
                    print(save_path_file)

                # Time-series with sequential channels recorded as one channel
                elif 'C' not in nd2file.sizes and nd2file.sizes.get('T', 0) > 1 and filename.endswith('_seq.nd2'):

                    # Downsample the stack
                    stack = downsample_stack(stack, block_size, downsample_factor)
                    
                    # Separate channels 
                    ch1 = stack[0::2, :, :]
                    ch2 = stack[1::2, :, :]
                    
                    # Save the downsampled stack as separate tif files
                    new_height = stack.shape[-2]
                    save_path_ch1 = f"{tif_save_path}/{recording}_ch1_{new_height}px.tif"
                    save_path_ch2 = f"{tif_save_path}/{recording}_ch2_{new_height}px.tif"
                    tifffile.imwrite(save_path_ch1, ch1)
                    tifffile.imwrite(save_path_ch2, ch2)
                    print(save_path_ch1)
                    print(save_path_ch2)

                # Time-series with two channels
                elif nd2file.sizes.get('C', 0) == 2 and nd2file.sizes.get('T', 0) > 1:

                    # Downsample the stack (if needed)
                    # stack = downsample_stack(stack, block_size, downsample_factor)
                    
                    # Separate channels 
                    ch1 = stack[:, 0, :, :]
                    ch2 = stack[:, 1, :, :]
                    
                    # Save each channel as a separate tif file
                    save_path_ch1 = f"{tif_save_path}/{recording}_ch1.tif"
                    save_path_ch2 = f"{tif_save_path}/{recording}_ch2.tif"
                    tifffile.imwrite(save_path_ch1, ch1)
                    tifffile.imwrite(save_path_ch2, ch2)
                    print(save_path_ch1)
                    print(save_path_ch2)

if not nd2_files_found:
    print("No 'nd2' files were found")
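
The frame de-interleaving used for the `_seq` recordings can be illustrated on a toy array. This sketch assumes, as the loop above does, that the two channels alternate frame by frame starting with channel 1; the array itself is made up.

```python
import numpy as np

# Toy stack: 6 frames of 2x2 pixels, where every pixel holds its frame index
frames = np.arange(6).reshape(6, 1, 1) * np.ones((1, 2, 2), dtype=int)

# Even-indexed frames belong to channel 1, odd-indexed frames to channel 2
ch1 = frames[0::2]
ch2 = frames[1::2]

print(ch1[:, 0, 0])  # [0 2 4]
print(ch2[:, 0, 0])  # [1 3 5]

# With an odd number of frames, channel 1 ends up one frame longer than channel 2
odd_stack = frames[:5]
print(len(odd_stack[0::2]), len(odd_stack[1::2]))  # 3 2
```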

# Histograms

To identify artifacts and possible changes in image intensity values after downsampling, plot cumulative histograms of the pixel values before and after downsampling and compare the two distributions. In this example, I used the first channel of the stack.

See options for plotting [Cumulative histograms](https://matplotlib.org/stable/gallery/statistics/histogram_cumulative.html) in Matplotlib.
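
Besides the visual comparison, the two cumulative histograms can be compared numerically. The sketch below is one possible approach, not part of the original workflow: the function names are hypothetical, the data are random numbers, and strided subsampling stands in for a downsampled stack. Both histograms share the same bin edges so that their values can be subtracted bin by bin.

```python
import numpy as np


def cumulative_histogram(values, bin_edges):
    """Cumulative fraction of values falling at or below each bin."""
    counts, _ = np.histogram(values, bins=bin_edges)
    return np.cumsum(counts) / counts.sum()


def max_cdf_difference(original, downsampled, bins=100):
    """Largest absolute difference between the two cumulative histograms."""
    lowest = min(original.min(), downsampled.min())
    highest = max(original.max(), downsampled.max())
    edges = np.linspace(lowest, highest, bins + 1)  # Shared bin edges
    difference = cumulative_histogram(original, edges) - cumulative_histogram(downsampled, edges)
    return float(np.max(np.abs(difference)))


# Made-up data: random 12-bit pixel values, subsampled by a factor of 2 in Y and X
rng = np.random.default_rng(0)
original = rng.integers(0, 4096, size=(10, 64, 64)).astype(np.uint16)
subsampled = original[:, ::2, ::2]

print(max_cdf_difference(original, original))  # 0.0
print(max_cdf_difference(original, subsampled))
```

A value close to 0 indicates that the pixel value distribution was largely preserved; the value grows toward 1 as the two distributions diverge.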

**Note**: Here I show another way to loop through specific files.

In [None]:
experiments = ['NIK01', 'NIK02']
recordings = ['NIK01b01', 'NIK02a01']

# Create a dictionary with the selected recordings
recordings_dict = defaultdict(list)
for recording in recordings:
    for experiment in experiments:
        if recording.startswith(experiment):
            recordings_dict[experiment].append(recording)

for experiment, recording_list in recordings_dict.items():
    for recording in recording_list: 
        # Load original stacks
        stack_orig_path = glob.glob(f"{paths['raw_data']}{experiment}/{recording}*.nd2")[0]
        
        stack_orig = nd2.imread(stack_orig_path)
        
        # Load downsampled stacks (sorted so that the first match is deterministic)
        stack_down_path = sorted(glob.glob(f"{paths['processed_data']}{experiment}/{recording}*_*.tif"))[0]
        stack_down = tifffile.imread(stack_down_path)
        
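        # Plotting
        # Note: [:, :, 0] selects index 0 along the third axis of each stack.
        # Adapt the indexing to the axis order of your data
        # (e.g. stack[:, 0] is the first channel of a (T, C, Y, X) stack).
        fig, ax = plt.subplots()
        ax.hist(stack_orig[:, :, 0].flatten(), density=True, bins=100, 
                cumulative=True, histtype='step', fill=False, 
                edgecolor='gray', label='original')
        ax.hist(stack_down[:, :, 0].flatten(), density=True, bins=100, 
                cumulative=True, histtype='step', fill=False, 
                edgecolor='magenta', label='downsampled')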
        ax.legend()
        ax.set_title(f"{recording}")
        ax.set_xlabel("Pixel Values")
        ax.set_ylabel("Cumulative Density")
    
        # Save plots
        os.makedirs(f"{paths['plots']}{experiment}", exist_ok=True)
        plot_save_path = f"{paths['plots']}{experiment}/{recording}"
        plt.savefig(f"{plot_save_path}_histogram.png", dpi=300)
        print(f"{plot_save_path}_histogram.png")
        plt.close()