## Tutorial Steps:

1. **Download Example ISS Dataset:** Obtain the provided ISS dataset to work with.

2. **Optional: Deconvolution and Maximum Intensity Projection:** Optionally deconvolve the raw image data and create maximum intensity projections from it.

3. **Stitching Image Data:** Stitch the image tiles from the different stage positions into one image per round and channel.

4. **Decode Image Data:** Decode the stitched image data.

5. **Quality Control and Visualization:** Evaluate the results through quality control measures and visualize them.


### Step 1. Download ISS Data
To begin, download the ISS toy dataset by clicking on the following link: [ISS Toy Dataset](https://drive.google.com/drive/folders/1AmNFyTtnl3i1QOuFRs_4u2StVnj7FkZK?usp=drive_link).


Once the dataset is downloaded, take a moment to examine the file names and familiarize yourself with the naming convention. The files follow the pattern `S{stage}_R{round}_C{channel}_Z{z}.tif`, where the placeholders are the numerical identifiers of the stage position, staining round, channel, and z level.
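To make the role of the placeholders concrete, the following is a minimal sketch of how such a pattern could be matched against a file name. The helpers `pattern_to_regex` and `parse_filename` are hypothetical and written only for illustration; they are not part of `imaging_utils`, and `ISSDataContainer` may parse file names differently.

```python
import re

def pattern_to_regex(pattern):
    """Turn a '{name}' placeholder pattern into a regex with named integer groups."""
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r'\{(\w+)\}', pattern)
    regex = ''
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text between placeholders
        else:
            regex += f'(?P<{part}>\\d+)'      # placeholder: one or more digits
    return re.compile(regex + '$')

def parse_filename(pattern, filename):
    """Return the placeholder values as integers, or None if the name does not match."""
    match = pattern_to_regex(pattern).match(filename)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

fields = parse_filename('S{stage}_R{round}_C{channel}_Z{z}.tif', 'S0_R0_C0_Z17.tif')
print(fields)  # {'stage': 0, 'round': 0, 'channel': 0, 'z': 17}
```

The same helper works for any pattern built from `{name}` placeholders, so it can be used to check quickly whether a pattern matches the downloaded files.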


Next, we load the dataset into an `ISSDataContainer`. This class manages the dataset without loading all of the images into memory at once.
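The idea of deferring image reads can be sketched as follows. `LazyImage` and `fake_reader` are hypothetical names used only for illustration; this is not how `ISSDataContainer` is implemented, just a small example of the load-on-demand pattern.

```python
class LazyImage:
    """Hypothetical stand-in: remembers where an image lives and reads it only on request."""

    def __init__(self, path, reader):
        self.path = path
        self._reader = reader   # any callable that maps a path to pixel data
        self.data = None        # nothing is held in memory yet

    def load(self):
        # Read the pixels only if they are not already in memory
        if self.data is None:
            self.data = self._reader(self.path)
        return self

    def unload(self):
        # Drop the reference so the memory can be reclaimed
        self.data = None
        return self


def fake_reader(path):
    """Fake reader so the sketch runs without any image files on disk."""
    return [[0, 1], [2, 3]]


image = LazyImage('S0_R0_C0_Z17.tif', fake_reader)
print(image.data)            # None: only the path is stored
print(image.load().data)     # [[0, 1], [2, 3]]
print(image.unload().data)   # None: memory released
```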

In [1]:
from os.path import join
from imaging_utils import ISSDataContainer

# Create the container
issdata = ISSDataContainer()

# Add every image whose file name matches the pattern.
# join() keeps the path portable across operating systems.
pattern = join('decoding_tutorial', 'S{stage}_R{round}_C{channel}_Z{z}.tif')
issdata.add_images_from_filepattern(pattern)

Added decoding_tutorial\S0_R0_C0_Z17.tif. Stage: 0, Round: 0, Channel: 0
Added decoding_tutorial\S0_R0_C1_Z17.tif. Stage: 0, Round: 0, Channel: 1
Added decoding_tutorial\S0_R0_C2_Z17.tif. Stage: 0, Round: 0, Channel: 2
Added decoding_tutorial\S0_R0_C3_Z17.tif. Stage: 0, Round: 0, Channel: 3
Added decoding_tutorial\S0_R0_C4_Z17.tif. Stage: 0, Round: 0, Channel: 4
Added decoding_tutorial\S1_R0_C0_Z17.tif. Stage: 1, Round: 0, Channel: 0
Added decoding_tutorial\S1_R0_C1_Z17.tif. Stage: 1, Round: 0, Channel: 1
Added decoding_tutorial\S1_R0_C2_Z17.tif. Stage: 1, Round: 0, Channel: 2
Added decoding_tutorial\S1_R0_C3_Z17.tif. Stage: 1, Round: 0, Channel: 3
Added decoding_tutorial\S1_R0_C4_Z17.tif. Stage: 1, Round: 0, Channel: 4
Added decoding_tutorial\S0_R1_C0_Z17.tif. Stage: 0, Round: 1, Channel: 0
Added decoding_tutorial\S0_R1_C1_Z17.tif. Stage: 0, Round: 1, Channel: 1
Added decoding_tutorial\S0_R1_C2_Z17.tif. Stage: 0, Round: 1, Channel: 2
Added decoding_tutorial\S0_R1_C3_Z17.tif. Stage: 0,

<imaging_utils.ISSDataContainer at 0x20515756fd0>

For verification, you can print out the size of the dataset.


In [2]:
num_stages, num_rounds, num_channels = issdata.get_dataset_shape()
print(f'There are {num_stages} stages')
print(f'There are {num_rounds} rounds')
print(f'There are {num_channels} channels')

There are 2 stages
There are 5 rounds
There are 5 channels


We can also verify that there is an equal number of images for each stage, round, and channel:

In [3]:
issdata.is_dataset_complete()

IncompleteDatasetError: Found different number of channels for a given stage location.
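When a completeness check like this fails, it helps to list which combinations are absent. The sketch below assumes the image indices are available as a list of dictionaries with `stage`, `round`, and `channel` keys; `find_missing` is a hypothetical helper, not a function of `imaging_utils`.

```python
from itertools import product

def find_missing(indices):
    """Return the (stage, round, channel) combinations absent from `indices`."""
    present = {(i['stage'], i['round'], i['channel']) for i in indices}
    stages = sorted({s for s, _, _ in present})
    rounds = sorted({r for _, r, _ in present})
    channels = sorted({c for _, _, c in present})
    # A complete dataset contains every combination of the observed values
    return [combo for combo in product(stages, rounds, channels) if combo not in present]

# Toy index list: stage 1 is missing channel 1 in round 0
indices = [
    {'stage': 0, 'round': 0, 'channel': 0},
    {'stage': 0, 'round': 0, 'channel': 1},
    {'stage': 1, 'round': 0, 'channel': 0},
]
missing = find_missing(indices)
print(missing)  # [(1, 0, 1)]
```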

(Optional) Let's take a look at the data using Napari.

In [3]:
import napari

# Select a small piece of the data
small_data = issdata.select(stage=0, round=0)

# Load images into memory
small_data.load()

# Run Napari
viewer = napari.Viewer()
viewer.add_image(small_data.data.squeeze())
napari.run()

# Free memory
small_data.unload()


### Step 2. 2D Projection

In this step, we project the data to 2D using maximum intensity projection, which keeps the maximum pixel value across the z-planes. Deconvolution can sharpen the 2D images, and it can be applied either before or after the projection. It is computationally intensive, however, and often needs a CUDA-capable GPU to run efficiently, especially on large stacks of 3D multiplexed images. This tutorial omits the deconvolution step, but the necessary functions can be found in the `deconvolution.py` file.
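As a small worked example of maximum intensity projection, the sketch below projects a toy z-stack with NumPy. The array values are made up for illustration, and the `(z, y, x)` axis order is an assumption about how the stack is laid out.

```python
import numpy as np

# Toy z-stack with shape (z, y, x): three 2x2 planes
stack = np.array([
    [[1, 5], [0, 2]],
    [[4, 3], [7, 1]],
    [[2, 9], [6, 8]],
])

# Maximum intensity projection: keep the brightest value along the z-axis
mip = stack.max(axis=0)
print(mip)
# [[4 9]
#  [7 8]]
```

Each output pixel is the brightest value found at that (y, x) position in any plane, so the projected image has one dimension fewer than the stack.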


In [12]:
# iterate_dataset lets us iterate over the dataset by stage, round, and channel.
import numpy as np
from imaging_utils import imwrite
from os.path import join

for index, small_dataset in issdata.iterate_dataset(iter_stages=True, iter_rounds=True, iter_channels=True):
    # Load the small dataset
    small_dataset.load()
    # Get the image data
    data = small_dataset.data
    # MIP the data
    data = np.squeeze(data).max(axis=0)
    # Save the data
    imwrite(join('MIP','S{stage}_R{round}_C{channel}.tif'.format(**index)), data)
    # Finally, unload the images (otherwise we might run out of memory)
    small_dataset.unload()

# Or equivalently ...
# from ISSDataset import mip
# mip(join('mip','stage{stage}_round{round}_channel{channel}.tif'), issdata)


IndexError: list index out of range

### Step 3. Stitching

In this step, we stitch the data using ASHLAR. This is done with the `stitch_ashlar` function.


In [14]:
from imaging_utils import stitch_ashlar

# First, we load the maximum-intensity-projected (MIP) data
iss_data_miped = ISSDataContainer()
iss_data_miped.add_images_from_filepattern(join('MIP','S{stage}_R{round}_C{channel}.tif'))


Added MIP\S0_R0_C0.tif. Stage: 0, Round: 0, Channel: 0
Added MIP\S0_R0_C1.tif. Stage: 0, Round: 0, Channel: 1
Added MIP\S0_R0_C2.tif. Stage: 0, Round: 0, Channel: 2
Added MIP\S0_R0_C3.tif. Stage: 0, Round: 0, Channel: 3
Added MIP\S0_R0_C4.tif. Stage: 0, Round: 0, Channel: 4
Added MIP\S1_R0_C0.tif. Stage: 1, Round: 0, Channel: 0
Added MIP\S1_R0_C1.tif. Stage: 1, Round: 0, Channel: 1
Added MIP\S1_R0_C2.tif. Stage: 1, Round: 0, Channel: 2
Added MIP\S1_R0_C3.tif. Stage: 1, Round: 0, Channel: 3
Added MIP\S1_R0_C4.tif. Stage: 1, Round: 0, Channel: 4
Added MIP\S0_R1_C0.tif. Stage: 0, Round: 1, Channel: 0
Added MIP\S0_R1_C1.tif. Stage: 0, Round: 1, Channel: 1
Added MIP\S0_R1_C2.tif. Stage: 0, Round: 1, Channel: 2
Added MIP\S0_R1_C3.tif. Stage: 0, Round: 1, Channel: 3
Added MIP\S0_R1_C4.tif. Stage: 0, Round: 1, Channel: 4
Added MIP\S1_R1_C0.tif. Stage: 1, Round: 1, Channel: 0
Added MIP\S1_R1_C1.tif. Stage: 1, Round: 1, Channel: 1
Added MIP\S1_R1_C2.tif. Stage: 1, Round: 1, Channel: 2
Added MIP\

<imaging_utils.ISSDataContainer at 0x191393c1bb0>

To register and stitch the image data, we need the initial position of each stage in pixel coordinates. This information can typically be extracted from the microscope software.
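If the microscope software reports only a grid layout and an overlap fraction, nominal pixel positions can be derived from them. The sketch below is an illustration under stated assumptions: the 2048-pixel tile size and the one-row, two-column grid are made up, the 10% overlap is the value that appears in the ASHLAR log of this tutorial, and `tile_origin` is a hypothetical helper.

```python
def tile_origin(row, col, tile_height, tile_width, overlap):
    """Nominal (y, x) pixel origin of the tile at grid position (row, col)."""
    # Neighbouring tiles are shifted by the tile size minus the overlapping part
    step_y = int(tile_height * (1 - overlap))
    step_x = int(tile_width * (1 - overlap))
    return (row * step_y, col * step_x)

# Assumed layout: stage 0 and stage 1 sit side by side in one row
stage_grid = {0: (0, 0), 1: (0, 1)}
locations = {
    stage: tile_origin(row, col, tile_height=2048, tile_width=2048, overlap=0.1)
    for stage, (row, col) in stage_grid.items()
}
print(locations)  # {0: (0, 0), 1: (0, 1843)}
```

These are only starting positions; the registration step is what refines them against the image content.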

In [16]:
from imaging_utils import stitch_ashlar
stage_locations = {
    0: (0, 0),
    1: (0, 1843),
}

# Stitch using ASHLAR
stitch_ashlar(join('stitched','R{round}_C{channel}.tif'), iss_data_miped, stage_locations, reference_channel=4)

Stitching and registering input images
Cycle 0:
    reading filepattern|C:\Users\Axel\AppData\Local\Temp\tmphvfa4q7r\round0|pattern=round0_r{row:1}_c{col:1}_ch{channel:1}.tif|overlap=0.1|pixel_size=1


FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Axel\\AppData\\Local\\Temp\\tmphvfa4q7r\\round0\\MIP\\S0_R0_C0.tif'

### Step 4. Decoding

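In this step, we decode the previously stitched image data. The dictionary printed below is the (truncated) content of the metadata file, which holds the stage locations and the codebook. The codebook assigns each gene one channel index per round.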

{'stage_locations': {0: (0, 0), 1: (0, 1843)},
 'codebook': {'DVL2': {'round_index': [0, 1, 2, 3, 4],
   'channel_index': [0, 3, 2, 1, 0]},
  'MLXIPL': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [3, 0, 2, 1, 0]},
  'CDH1': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [2, 1, 2, 2, 3]},
  'MXD1': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [1, 2, 2, 2, 3]},
  'CDON': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [0, 2, 3, 1, 0]},
  'TIE1': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [3, 1, 3, 1, 0]},
  'RHOA': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [2, 0, 3, 1, 0]},
  'PRKCZ': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [1, 3, 3, 1, 0]},
  'MNT': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [0, 1, 0, 1, 0]},
  'CAMK2A': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [3, 2, 0, 1, 0]},
  'CDC42': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [2, 3, 0, 1, 0]},
  'GLI2': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [1, 0, 0, 1, 0]

In [8]:
from os.path import join
from imaging_utils import ISSDataContainer

# Point the container at the stitched images
issdata = ISSDataContainer().add_images_from_filepattern(join('stitched','R{round}_C{channel}.tif'))
# Load the data into memory
issdata.load()

In [34]:
import pickle
import numpy as np

# Load the combinatorial labels (the codebook).
# The metadata file is available in the Google Drive folder.
with open('metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)
codebook = metadata['codebook']
gene_names = list(codebook.keys())

# Create a 3D NumPy array of shape (num_genes, num_rounds, num_channels)
# that contains the combinatorial labels in one-hot format.
n_codes, n_rounds, n_channels = len(codebook), 5, 5
codebook_numpy = np.zeros((n_codes, n_rounds, n_channels))
for gene_id, (gene, indices) in enumerate(codebook.items()):
    round_id, channel_id = indices['round_index'], indices['channel_index']
    codebook_numpy[gene_id, round_id, channel_id] = 1

print('Example code in the codebook:')
print(codebook_numpy[0])


Example code in the codebook:
[[1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]]
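A quick sanity check on a one-hot codebook is that every gene has exactly one active channel per round, and that the original channel indices can be read back from the array. The sketch below uses two entries copied from the codebook shown earlier in this step; the check itself is an illustrative addition and not part of the tutorial's own code.

```python
import numpy as np

# Two entries copied from the codebook printed at the start of this step
codebook = {
    'DVL2':   {'round_index': [0, 1, 2, 3, 4], 'channel_index': [0, 3, 2, 1, 0]},
    'MLXIPL': {'round_index': [0, 1, 2, 3, 4], 'channel_index': [3, 0, 2, 1, 0]},
}
n_rounds, n_channels = 5, 5

one_hot = np.zeros((len(codebook), n_rounds, n_channels))
for gene_id, entry in enumerate(codebook.values()):
    one_hot[gene_id, entry['round_index'], entry['channel_index']] = 1

# Every round should have exactly one active channel per gene
assert (one_hot.sum(axis=2) == 1).all()

# Reading the code back: the active channel in each round
recovered = one_hot.argmax(axis=2)
print(recovered)
# [[0 3 2 1 0]
#  [3 0 2 1 0]]
```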


In [28]:
from imaging_utils import ISSDataContainer
from decoding import istdeco_decode, estimate_fdr
import pandas as pd
from os.path import join



# Run the decoding
results = []
for tile, origin in issdata.iterate_tiles(tile_height=512, tile_width=512, squeeze=True):
    print('Decoding tile')
    # Decode the data using matrix factorization.
    # Depending on your data, you might want to adjust the parameters
    # min_integrated_intensity or min_correct_spots.
    # Usually a quality threshold between 0.5 and 0.85 works fine.

    # This is really slow unless we can run on the GPU.
    decoded_table = istdeco_decode(tile, codebook_numpy, psf_sigma=(2.0, 2.0), device='cpu')

    # Shift tile-local coordinates into the coordinate frame of the full image
    decoded_table['Y'] = decoded_table['Y'] + origin[0]
    decoded_table['X'] = decoded_table['X'] + origin[1]
    results.append(pd.DataFrame(decoded_table))

    # Remove this to run over everything
    break

# Stack the per-tile tables row-wise into one table
results = pd.concat(results, axis=0, ignore_index=True)

Decoding tile
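The tile loop relies on two ideas: the image is cut into fixed-size tiles, and coordinates found inside a tile are shifted by the tile origin to obtain positions in the full image. A minimal pure-Python sketch of both follows; `iterate_tile_origins` and `to_global` are hypothetical helpers, and the 1024x1024 image size is made up for illustration.

```python
def iterate_tile_origins(height, width, tile_height, tile_width):
    """Yield the (y, x) origin of each tile covering a height x width image."""
    for y in range(0, height, tile_height):
        for x in range(0, width, tile_width):
            yield (y, x)

def to_global(local_spots, origin):
    """Shift tile-local (y, x) coordinates into whole-image coordinates."""
    return [(y + origin[0], x + origin[1]) for y, x in local_spots]

origins = list(iterate_tile_origins(1024, 1024, 512, 512))
print(origins)  # [(0, 0), (0, 512), (512, 0), (512, 512)]

# A spot found at (136, 52) inside the tile whose origin is (512, 0)
global_spots = to_global([(136, 52)], (512, 0))
print(global_spots)  # [(648, 52)]
```

Because every tile reports coordinates starting at zero, skipping the shift would stack the spots of all tiles on top of each other.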


In [31]:
results['Gene'] = [gene_names[target_id] for target_id in results['Target id']]
results


| | Target id | Y | X | Intensity | Num explained spots | Rounds | Channels | Gene |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 136 | 52 | 28665.427734 | 4.317851 | 0;1;2;3;4 | 3;0;2;1;0 | MLXIPL |
| 1 | 3 | 437 | 186 | 9208.605469 | 3.607927 | 0;1;2;3;4 | 1;2;2;2;3 | MXD1 |
| 2 | 5 | 134 | 346 | 10212.162109 | 3.244146 | 0;1;2;3;4 | 3;1;3;1;0 | TIE1 |
| 3 | 16 | 491 | 448 | 16951.195312 | 3.878304 | 0;1;2;3;4 | 2;3;1;0;0 | GGT5 |
| 4 | 45 | 156 | 152 | 11950.286133 | 3.599404 | 0;1;2;3;4 | 0;2;2;0;0 | TBX2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 273 | 146 | 337 | 134 | 8287.593750 | 3.466586 | 0;1;2;3;4 | 0;3;0;3;3 | PDGFRB |
| 274 | 148 | 266 | 477 | 16989.785156 | 3.830489 | 0;1;2;3;4 | 3;3;2;0;2 | TAGLN |
| 275 | 148 | 267 | 58 | 3753.294922 | 3.463863 | 0;1;2;3;4 | 3;3;2;0;2 | TAGLN |
| 276 | 148 | 269 | 62 | 3715.117432 | 3.300487 | 0;1;2;3;4 | 3;3;2;0;2 | TAGLN |


Some of the genes in the codebook are marked as negatives (their names contain `Negative`). These are non-biological labels that we do not expect to find in the data. Treating detections of these negative genes as false positives allows us to estimate a false discovery rate (FDR), which is useful for quality control.

In [36]:
positive_labels = [gene for gene in gene_names if 'Negative' not in gene]
negative_labels = [gene for gene in gene_names if 'Negative' not in gene or True][:0] or [gene for gene in gene_names if 'Negative' in gene]
fdr = estimate_fdr(results['Gene'], negative_labels, positive_labels)
print(f'False discovery rate is: {fdr}')

False discovery rate is: 0.0


In [40]:
# We can also compute the FDR for a different quality threshold
fdr = estimate_fdr(results.query('`Num explained spots` > 3')['Gene'], negative_labels, positive_labels)
print(f'False discovery rate is: {fdr}')

False discovery rate is: 0.0
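One common way to turn negative-control detections into an FDR estimate is sketched below. It assumes that false detections are spread evenly over all codes, so the per-code false detection rate measured on the negative labels can be scaled to the positive labels. `estimate_fdr_sketch` is a hypothetical function written for illustration; the `estimate_fdr` function used in this tutorial may compute the value differently.

```python
def estimate_fdr_sketch(detected, negative_labels, positive_labels):
    """Estimate the FDR from detections of negative-control labels (illustrative only)."""
    negatives = set(negative_labels)
    positives = set(positive_labels)
    n_neg = sum(1 for gene in detected if gene in negatives)
    n_pos = sum(1 for gene in detected if gene in positives)
    if n_pos == 0 or not negatives:
        return 0.0
    # False detections per negative code, scaled to the number of positive codes
    expected_false = n_neg / len(negatives) * len(positives)
    return expected_false / n_pos

positive_labels = ['A', 'B', 'C', 'D']
negative_labels = ['Negative1', 'Negative2']

# 40 detections of real genes and 1 detection of a negative control
detected = ['A'] * 10 + ['B'] * 10 + ['C'] * 10 + ['D'] * 10 + ['Negative1']
fdr_example = estimate_fdr_sketch(detected, negative_labels, positive_labels)
print(fdr_example)  # 0.05
```

With no negative-control detections at all, the estimate is 0.0, which is the situation reported for the single decoded tile in this tutorial.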


An FDR below 1% is generally acceptable.