## Tutorial Steps:

1. **Download Example ISS Dataset:** Obtain the provided ISS dataset to work with.

2. **Optional: Deconvolution and Maximum Intensity Projection:** You have the option to apply deconvolution and create maximum intensity projections from the raw image data.

3. **Stitching Image Data:** Combine the image data using stitching techniques.

4. **Decode Image Data:** Decode the stitched image data.

5. **Quality Control and Visualization:** Evaluate the results through quality control measures and visualize them.


# Installation

We recommend installing all the necessary packages using [miniconda](https://docs.conda.io/en/latest/miniconda.html)
or [Anaconda](https://www.anaconda.com/products/individual).

Begin by creating a named conda environment with Python 3.10:
```bash
conda create -y -n iss_decoding_tutorial python=3.10
```

Activate the conda environment:
```bash
conda activate iss_decoding_tutorial
```

In the activated environment, install the following packages:
```bash
conda install -y -c conda-forge numpy scipy matplotlib networkx scikit-image=0.19 scikit-learn "tifffile>=2023.3.15" zarr pyjnius blessed
pip install ashlar pandas tqdm seaborn torch "napari[all]"
```

We also need to install `libvips` and `pyvips`. On Linux and macOS, this can be done through conda:

```bash
conda install -c conda-forge libvips pyvips
```

On Windows, you can download a pre-compiled binary from the libvips website:

https://libvips.github.io/libvips/install.html


Note: you will also need to add `vips-dev-x.y\bin` to your `PATH` so that pyvips can find all the DLLs it needs. You can do this either in the Advanced System Settings control panel or by modifying `PATH` at the start of your Python program, before importing pyvips.

Next, install `pyvips` with pip:

```bash
pip install pyvips
```
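
If you prefer not to change the system-wide `PATH` on Windows, one option is to extend it from Python before pyvips is imported. The sketch below assumes libvips was unpacked to `C:\vips-dev-x.y`; that location is a placeholder, and the helper name `prepend_to_path` is only illustrative.

```python
import os

# Placeholder location: replace with the folder where you unpacked libvips.
vips_bin = r"C:\vips-dev-x.y\bin"

def prepend_to_path(directory, env=os.environ):
    """Put `directory` first on PATH unless it is already listed."""
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]

if os.name == "nt":  # only relevant on Windows
    prepend_to_path(vips_bin)
    # import pyvips  # import pyvips only after PATH has been updated
```

The change only affects the current Python process, so it has to run at the top of every script or notebook that uses pyvips.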


### Step 1. Download ISS Data
To begin, download the ISS toy dataset by clicking on the following link: [ISS Toy Dataset](https://drive.google.com/file/d/1zYoUHDOCIuvyJBWj-KQnbVMM4THBf7ll/view?usp=drive_link).


Once the dataset is downloaded, take a moment to examine the file names and their naming convention. The files follow the pattern `S{stage}_R{round}_C{channel}_Z{z}.tif`, where the placeholders are the numerical identifiers of the stage position, staining round, channel, and z level.
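
To make the role of the placeholders concrete, here is a minimal sketch of how such a file pattern could be matched against a file name. It is not the implementation used by `ISSDataContainer`; the helper names `pattern_to_regex` and `parse_filename` are hypothetical, and the sketch assumes every placeholder stands for a non-negative integer.

```python
import re

def pattern_to_regex(pattern):
    """Turn 'S{stage}_R{round}.tif' into a regex with one named integer group per placeholder."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])   # literal text between placeholders
        regex += "(?P<" + m.group(1) + r">\d+)"      # the placeholder itself
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)

def parse_filename(filename, pattern):
    """Return the placeholder values as integers, or None if the name does not match."""
    match = pattern_to_regex(pattern).fullmatch(filename)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}

pattern = "S{stage}_R{round}_C{channel}_Z{z}.tif"
print(parse_filename("S0_R1_C4_Z17.tif", pattern))
# {'stage': 0, 'round': 1, 'channel': 4, 'z': 17}
```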


Next, we load the dataset into an `ISSDataContainer`. This class manages the dataset without loading all images into memory at once.

In [1]:
from tools.image_container import ISSDataContainer
from os.path import join
# Create the container
issdata = ISSDataContainer()

# Add images
# join('downloads', 'stage{stage}_rounds{round}_z{z}_channel{channel}.tif')
pattern = join('datasets', 'tutorial', 'decoding_tutorial', 'S{stage}_R{round}_C{channel}_Z{z}.tif') 
issdata.add_images_from_filepattern(pattern)

Added datasets\tutorial\decoding_tutorial\S0_R0_C0_Z17.tif. Stage: 0, Round: 0, Channel: 0
Added datasets\tutorial\decoding_tutorial\S0_R0_C1_Z17.tif. Stage: 0, Round: 0, Channel: 1
Added datasets\tutorial\decoding_tutorial\S0_R0_C2_Z17.tif. Stage: 0, Round: 0, Channel: 2
Added datasets\tutorial\decoding_tutorial\S0_R0_C3_Z17.tif. Stage: 0, Round: 0, Channel: 3
Added datasets\tutorial\decoding_tutorial\S0_R0_C4_Z17.tif. Stage: 0, Round: 0, Channel: 4
Added datasets\tutorial\decoding_tutorial\S1_R0_C0_Z17.tif. Stage: 1, Round: 0, Channel: 0
Added datasets\tutorial\decoding_tutorial\S1_R0_C1_Z17.tif. Stage: 1, Round: 0, Channel: 1
Added datasets\tutorial\decoding_tutorial\S1_R0_C2_Z17.tif. Stage: 1, Round: 0, Channel: 2
Added datasets\tutorial\decoding_tutorial\S1_R0_C3_Z17.tif. Stage: 1, Round: 0, Channel: 3
Added datasets\tutorial\decoding_tutorial\S1_R0_C4_Z17.tif. Stage: 1, Round: 0, Channel: 4
Added datasets\tutorial\decoding_tutorial\S0_R1_C0_Z17.tif. Stage: 0, Round: 1, Channel: 0

<tools.image_container.ISSDataContainer at 0x11a2fe1a440>

For verification, you can print out the size of the dataset.


In [2]:
num_stages, num_rounds, num_channels = issdata.get_dataset_shape()
print(f'There are {num_stages} number of stages')
print(f'There are {num_rounds} number of rounds')
print(f'There are {num_channels} number of channels')

There are 2 number of stages
There are 5 number of rounds
There are 5 number of channels


We can also verify that there is an equal number of images for each stage, round and channel.

In [3]:
issdata.is_dataset_complete()

True
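
As an illustration of what "complete" means here, the sketch below lists the (stage, round, channel) combinations that are missing from a set of image indices. It is a stand-alone toy example, not the code behind `is_dataset_complete`; the function name `missing_combinations` is hypothetical.

```python
from itertools import product

def missing_combinations(indices):
    """Return the (stage, round, channel) combinations absent from `indices`."""
    present = set(indices)
    stages = sorted({s for s, _, _ in present})
    rounds = sorted({r for _, r, _ in present})
    channels = sorted({c for _, _, c in present})
    return [combo for combo in product(stages, rounds, channels) if combo not in present]

# Toy index list: 2 stages x 2 rounds x 2 channels, with one image removed
indices = [(s, r, c) for s in range(2) for r in range(2) for c in range(2)]
indices.remove((1, 0, 1))
print(missing_combinations(indices))  # [(1, 0, 1)]
```

A dataset is complete in this sense when the returned list is empty.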

(Optional) Let's take a look at the data using Napari.

In [4]:
if False:
    import napari

    # Select small piece of the data
    small_data = issdata.select(stage=0, round=0)

    # Load images into memory
    small_data.load()

    # Run Napari
    viewer = napari.Viewer()
    viewer.add_image(small_data.data.squeeze())
    napari.run()

    # Free memory
    small_data.unload()

### Step 2. 2D Projection

In this step, we reduce the data to 2D with a maximum intensity projection (MIP), which keeps the maximum pixel value across the z-planes. Deconvolution can sharpen the resulting 2D images and can be applied either before or after the projection. However, deconvolution is computationally intensive and in practice often needs a CUDA-capable GPU, especially for large stacks of 3D multiplexed images. This tutorial skips deconvolution, but the necessary functions can be found in the `deconvolution.py` file.


In [6]:
# iterate_dataset lets us iterate over the dataset by stage, round and channel.
import numpy as np
from tools.utils import imwrite
from os.path import join

for index, small_dataset in issdata.iterate_dataset(iter_stages=True, iter_rounds=True, iter_channels=True):
    # Load the images of this stage, round and channel into memory
    small_dataset.load()
    # Get the image data
    data = small_dataset.data
    # MIP: drop singleton axes, then take the maximum over the first remaining axis (z)
    data = np.squeeze(data).max(axis=0)
    # Save the projected image
    imwrite(join('datasets', 'tutorial', 'mipped', 'S{stage}_R{round}_C{channel}.tif'.format(**index)), data)
    # Finally, unload the images (otherwise we might run out of memory)
    small_dataset.unload()

# Or equivalently ...
# from ISSDataset import mip
# mip(join('mip','stage{stage}_round{round}_channel{channel}.tif'), issdata)
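
The projection itself is a single NumPy reduction. The sketch below shows it on a small synthetic stack, assuming a `(z, y, x)` axis order; the layout of `small_dataset.data` is not documented here, so treat the axis order as an assumption. It also shows why combining `np.squeeze` with `max(axis=0)` needs care when a stack contains only one z-plane.

```python
import numpy as np

def max_project(stack, z_axis=0):
    """Maximum intensity projection of `stack` along `z_axis`."""
    return np.asarray(stack).max(axis=z_axis)

# Synthetic (z, y, x) stack: 3 planes of 2 x 2 pixels
stack = np.array([
    [[1, 5], [0, 2]],
    [[4, 3], [7, 1]],
    [[2, 6], [3, 9]],
])
print(max_project(stack))
# [[4 6]
#  [7 9]]

# With a single z-plane, np.squeeze also removes the z axis,
# so a following .max(axis=0) collapses y instead of z.
single_plane = stack[:1]                            # shape (1, 2, 2)
print(np.squeeze(single_plane).max(axis=0).shape)   # (2,)
print(max_project(single_plane).shape)              # (2, 2)
```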


### Step 3. Stitching

Next, we stitch the data using ASHLAR. This is done with the `stitch` function from `tools.stitching`.


In [7]:
from os.path import join
# First, we load the maximum-intensity-projected (MIP) data
iss_data_miped = ISSDataContainer()
iss_data_miped.add_images_from_filepattern(join('datasets','tutorial', 'mipped','S{stage}_R{round}_C{channel}.tif'))


Added datasets\tutorial\mipped\S0_R0_C0.tif. Stage: 0, Round: 0, Channel: 0
Added datasets\tutorial\mipped\S0_R0_C1.tif. Stage: 0, Round: 0, Channel: 1
Added datasets\tutorial\mipped\S0_R0_C2.tif. Stage: 0, Round: 0, Channel: 2
Added datasets\tutorial\mipped\S0_R0_C3.tif. Stage: 0, Round: 0, Channel: 3
Added datasets\tutorial\mipped\S0_R0_C4.tif. Stage: 0, Round: 0, Channel: 4
Added datasets\tutorial\mipped\S1_R0_C0.tif. Stage: 1, Round: 0, Channel: 0
Added datasets\tutorial\mipped\S1_R0_C1.tif. Stage: 1, Round: 0, Channel: 1
Added datasets\tutorial\mipped\S1_R0_C2.tif. Stage: 1, Round: 0, Channel: 2
Added datasets\tutorial\mipped\S1_R0_C3.tif. Stage: 1, Round: 0, Channel: 3
Added datasets\tutorial\mipped\S1_R0_C4.tif. Stage: 1, Round: 0, Channel: 4
Added datasets\tutorial\mipped\S0_R1_C0.tif. Stage: 0, Round: 1, Channel: 0
Added datasets\tutorial\mipped\S0_R1_C1.tif. Stage: 0, Round: 1, Channel: 1
Added datasets\tutorial\mipped\S0_R1_C2.tif. Stage: 0, Round: 1, Channel: 2
Added datase

<tools.image_container.ISSDataContainer at 0x11a4f7f2e00>

To successfully register and stitch the image data, it's crucial to have access to the initial position of each stage in pixel coordinates. This information can typically be extracted from the microscope software.
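
Microscope software often reports stage positions in physical units rather than pixels. The sketch below converts such positions to pixel coordinates relative to the top-left tile. The positions, the pixel size, and the `(y, x)` ordering are made-up example values, and the function name is hypothetical; check which units and ordering your acquisition metadata and the `stitch` function actually use.

```python
def stage_positions_to_pixels(positions_um, pixel_size_um):
    """Convert {stage: (y_um, x_um)} to {stage: (y_px, x_px)} relative to the smallest y and x."""
    min_y = min(y for y, _ in positions_um.values())
    min_x = min(x for _, x in positions_um.values())
    return {
        stage: (round((y - min_y) / pixel_size_um), round((x - min_x) / pixel_size_um))
        for stage, (y, x) in positions_um.items()
    }

# Hypothetical stage positions in micrometers and a hypothetical pixel size
positions_um = {0: (1000.0, 2000.0), 1: (1000.0, 2300.0)}
print(stage_positions_to_pixels(positions_um, pixel_size_um=0.5))
# {0: (0, 0), 1: (0, 600)}
```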

FAQ 1: If you get an error saying `Exception: Unable to find JAVA_HOME`, you need to install OpenJDK 11. On Windows, OpenJDK can be downloaded from [here](https://learn.microsoft.com/en-us/java/openjdk/download#openjdk-11). On Linux and macOS, see [the OpenJDK installation instructions](https://openjdk.org/install/). It may also be possible to install it through conda (conda-forge provides an `openjdk` package).

FAQ 2: If you get the error `ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject`, try downgrading numpy to version `1.26.4`.
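
When debugging either of these errors, it can help to print what the current environment looks like. The following is a small diagnostic sketch using only the standard library; the function name `environment_report` is hypothetical.

```python
import os
import shutil
from importlib import metadata

def environment_report():
    """Collect the settings relevant to the two FAQ entries."""
    try:
        numpy_version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        numpy_version = None  # numpy is not installed in this environment
    return {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),   # None if the variable is not set
        "java_on_path": shutil.which("java"),       # None if no java executable is found
        "numpy_version": numpy_version,
    }

for key, value in environment_report().items():
    print(f"{key}: {value}")
```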






In [1]:
from tools.image_container import ISSDataContainer
from tools.stitching import stitch
from os.path import join
# First, we load the maximum-intensity-projected (MIP) data
iss_data_miped = ISSDataContainer()
iss_data_miped.add_images_from_filepattern(join('datasets','tutorial','mipped','S{stage}_R{round}_C{channel}.tif'))

stage_locations = {
    0: (0, 0), 
    1: (0, 1), 
}

# Stitch using ASHLAR
stitch(iss_data_miped, join('datasets','tutorial','stitched','R{round}_C{channel}.tif'), stage_locations, reference_channel=4)

Added datasets\tutorial\mipped\S0_R0_C0.tif. Stage: 0, Round: 0, Channel: 0
Added datasets\tutorial\mipped\S0_R0_C1.tif. Stage: 0, Round: 0, Channel: 1
Added datasets\tutorial\mipped\S0_R0_C2.tif. Stage: 0, Round: 0, Channel: 2
Added datasets\tutorial\mipped\S0_R0_C3.tif. Stage: 0, Round: 0, Channel: 3
Added datasets\tutorial\mipped\S0_R0_C4.tif. Stage: 0, Round: 0, Channel: 4
Added datasets\tutorial\mipped\S1_R0_C0.tif. Stage: 1, Round: 0, Channel: 0
Added datasets\tutorial\mipped\S1_R0_C1.tif. Stage: 1, Round: 0, Channel: 1
Added datasets\tutorial\mipped\S1_R0_C2.tif. Stage: 1, Round: 0, Channel: 2
Added datasets\tutorial\mipped\S1_R0_C3.tif. Stage: 1, Round: 0, Channel: 3
Added datasets\tutorial\mipped\S1_R0_C4.tif. Stage: 1, Round: 0, Channel: 4
Added datasets\tutorial\mipped\S0_R1_C0.tif. Stage: 0, Round: 1, Channel: 0
Added datasets\tutorial\mipped\S0_R1_C1.tif. Stage: 0, Round: 1, Channel: 1
Added datasets\tutorial\mipped\S0_R1_C2.tif. Stage: 0, Round: 1, Channel: 2
Added datase



    aligning tile 2/2
    assembling thumbnail 2/2
    estimated cycle offset [y x] = [ 93.3888   -29.491201]
    aligning tile 2/2
    assembling thumbnail 2/2
    estimated cycle offset [y x] = [ 92.16   -28.2624]
    aligning tile 2/2
    assembling thumbnail 2/2
    estimated cycle offset [y x] = [ 98.304  -22.1184]
    aligning tile 2/2
Cycle 0:
    Channel 0:
    Channel 1:
    Channel 2:
    Channel 3:
    Channel 4:
Cycle 1:
    Channel 0:
    Channel 1:
    Channel 2:
    Channel 3:
    Channel 4:
Cycle 2:
    Channel 0:
    Channel 1:
    Channel 2:
    Channel 3:
    Channel 4:
Cycle 3:
    Channel 0:
    Channel 1:
    Channel 2:
    Channel 3:
    Channel 4:
Cycle 4:
    Channel 0:
    Channel 1:
    Channel 2:
    Channel 3:
    Channel 4:


### Step 4. Decoding

In this step, we will proceed to decode the previously stitched image data. We start by creating a codebook, which can be thought of as a set of expected signal patterns across the rounds and channels.

In [1]:
import pickle
import numpy as np
from os.path import join
from tools.decoding import Codebook

# Load combinatorial labels (the codebook)
# The metadata file is available in the Google Drive
with open(join('datasets', 'tutorial', 'decoding_tutorial', 'metadata.pkl'), 'rb') as f:
    metadata = pickle.load(f)

# Conceptually, the codebook is a 3D numpy array of shape (num_genes, num_rounds, num_channels)
# that contains the combinatorial labels in a one-hot format. We add it one gene at a time.
num_rounds, num_channels = 5, 5
codebook = Codebook(num_rounds, num_channels)
for gene, indices in metadata['codebook'].items():
    r, c = indices['round_index'], indices['channel_index']
    codeword = np.zeros((num_rounds, num_channels))
    codeword[r, c] = 1.0
    # Genes with 'Negative' in their name are marked as unexpected
    codebook.add_code(gene, codeword, is_unexpected='Negative' in gene)
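
To see what a single one-hot codeword looks like, here is a toy example with a made-up gene. It assumes that `round_index` and `channel_index` are equal-length index sequences pairing each round with the channel that is expected to light up; the actual layout of `metadata.pkl` is not shown in this tutorial.

```python
import numpy as np

num_rounds, num_channels = 5, 5

# Hypothetical gene: signal in channel 1 in round 0, channel 3 in round 1, and so on
round_index = [0, 1, 2, 3, 4]
channel_index = [1, 3, 0, 2, 4]

codeword = np.zeros((num_rounds, num_channels))
codeword[round_index, channel_index] = 1.0   # paired (round, channel) indexing
print(codeword)
```

Each row (round) of the printed matrix contains exactly one `1.0`, in the column of the expected channel.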



In [2]:
from tools.decoding import istdeco, calculate_fdr
from tools.image_container import ISSDataContainer
from os.path import join
import pandas as pd
# Load the stitched data
issdata = ISSDataContainer().add_images_from_filepattern(join('datasets','tutorial', 'stitched','R{round}_C{channel}.tif'))

# Run the decoding tile by tile
results = []

tiles = issdata.iterate_tiles(tile_height=512, tile_width=512, squeeze=True, use_vips=True)
for tile_idx, (tile, origin) in enumerate(tiles, start=1):
    print(f'Decoding tile: {tile_idx}')
    # Decode the data using matrix factorization.
    # Depending on your data, you might want to adjust the parameters
    # min_integrated_intensity or min_score.
    # Usually a score threshold between 0.5 and 0.85 works fine.
    # This is really slow unless we can run on the GPU.
    decoded_table = istdeco(tile, codebook, spot_sigma=2, device='cuda')

    # Shift tile-local coordinates to coordinates in the stitched image
    decoded_table['Y'] = decoded_table['Y'] + origin[0]
    decoded_table['X'] = decoded_table['X'] + origin[1]
    results.append(pd.DataFrame(decoded_table))
    # Only the first tile is decoded here; remove this break to run over everything
    break

results = pd.concat(results, axis=0)

Added datasets\tutorial\stitched\R0_C0.tif. Stage: 0, Round: 0, Channel: 0
Added datasets\tutorial\stitched\R0_C1.tif. Stage: 0, Round: 0, Channel: 1
Added datasets\tutorial\stitched\R0_C2.tif. Stage: 0, Round: 0, Channel: 2
Added datasets\tutorial\stitched\R0_C3.tif. Stage: 0, Round: 0, Channel: 3
Added datasets\tutorial\stitched\R0_C4.tif. Stage: 0, Round: 0, Channel: 4
Added datasets\tutorial\stitched\R1_C0.tif. Stage: 0, Round: 1, Channel: 0
Added datasets\tutorial\stitched\R1_C1.tif. Stage: 0, Round: 1, Channel: 1
Added datasets\tutorial\stitched\R1_C2.tif. Stage: 0, Round: 1, Channel: 2
Added datasets\tutorial\stitched\R1_C3.tif. Stage: 0, Round: 1, Channel: 3
Added datasets\tutorial\stitched\R1_C4.tif. Stage: 0, Round: 1, Channel: 4
Added datasets\tutorial\stitched\R2_C0.tif. Stage: 0, Round: 2, Channel: 0
Added datasets\tutorial\stitched\R2_C1.tif. Stage: 0, Round: 2, Channel: 1
Added datasets\tutorial\stitched\R2_C2.tif. Stage: 0, Round: 2, Channel: 2
Added datasets\tutorial\s
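
The decoding cell adds each tile's origin to the decoded `Y` and `X` columns, because spots are located relative to the tile and not to the stitched image. The toy sketch below illustrates this with plain NumPy. The generator `iterate_tiles_2d` is a hypothetical stand-in, not the container's `iterate_tiles` method, and it assumes the origin is given as `(y0, x0)`.

```python
import numpy as np

def iterate_tiles_2d(image, tile_height, tile_width):
    """Yield (tile, (y0, x0)) pairs that cover a 2D image."""
    height, width = image.shape
    for y0 in range(0, height, tile_height):
        for x0 in range(0, width, tile_width):
            yield image[y0:y0 + tile_height, x0:x0 + tile_width], (y0, x0)

image = np.zeros((6, 8))
image[4, 7] = 1.0   # a single bright "spot"

for tile, origin in iterate_tiles_2d(image, tile_height=3, tile_width=4):
    if tile.max() > 0:
        # Position of the spot inside the tile
        local_y, local_x = (int(v) for v in np.unravel_index(tile.argmax(), tile.shape))
        # Position of the spot in the full image
        global_y, global_x = local_y + origin[0], local_x + origin[1]
        print((local_y, local_x), origin, (global_y, global_x))
# (1, 3) (3, 4) (4, 7)
```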

Some of the genes are marked as `Unexpected` in the codebook. These genes correspond to non-biological labels that we do not expect to find in the data. Treating detections of these unexpected genes as false positives allows us to estimate a false discovery rate, which is useful for quality control.

In [3]:
from tools.decoding import calculate_fdr, filter_to_fdr
fdr = calculate_fdr(results['Name'], codebook.get_unexpected())
print(f'False discovery rate is: {fdr}')

False discovery rate is: 0.4565630944803008
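
One simple way to turn detections of unexpected codes into a false discovery rate is to assume that every code, real or not, attracts the same number of false detections. The sketch below implements that estimator on made-up counts. It is an illustration of the idea, not necessarily the formula used by `calculate_fdr`; the function name and the `num_expected_codes` argument are assumptions.

```python
def estimate_fdr(decoded_names, unexpected_codes, num_expected_codes):
    """Estimate the fraction of false detections among the expected genes."""
    unexpected_codes = set(unexpected_codes)
    n_unexpected = sum(name in unexpected_codes for name in decoded_names)
    n_expected = len(decoded_names) - n_unexpected
    if n_expected == 0 or not unexpected_codes:
        return float("nan")
    # Average number of false detections per unexpected code ...
    false_per_code = n_unexpected / len(unexpected_codes)
    # ... extrapolated to all expected codes, relative to the expected detections
    return min(1.0, false_per_code * num_expected_codes / n_expected)

# Made-up detections: 96 of expected genes, 4 of two unexpected codes
decoded = ["GeneA"] * 60 + ["GeneB"] * 36 + ["Negative1"] * 3 + ["Negative2"] * 1
fdr = estimate_fdr(decoded, ["Negative1", "Negative2"], num_expected_codes=8)
print(f"Estimated FDR: {fdr:.3f}")  # Estimated FDR: 0.167
```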


In [4]:
# We can also filter the results with quality and intensity thresholds chosen to lower the false discovery rate
filtered_results, optimal_quality, optimal_intensity_threshold = filter_to_fdr(results, codebook)
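
The general idea behind filtering to a target false discovery rate can be sketched as a threshold sweep: sort detections by quality, track the fraction of unexpected detections among those kept, and pick the lowest threshold that stays within the target. The code below is a simplified illustration on made-up scores and is not the implementation of `filter_to_fdr`. It uses the plain fraction of unexpected detections as the FDR and ignores ties and the intensity threshold; the function name and the target value are assumptions.

```python
import numpy as np

def threshold_for_fdr(scores, is_unexpected, target_fdr=0.05):
    """Return the lowest score threshold whose kept detections satisfy the target FDR."""
    scores = np.asarray(scores, dtype=float)
    is_unexpected = np.asarray(is_unexpected, dtype=bool)
    order = np.argsort(scores)[::-1]                  # best detections first
    sorted_scores = scores[order]
    # Fraction of unexpected detections among the top-k detections, for every k
    running_fdr = np.cumsum(is_unexpected[order]) / np.arange(1, len(scores) + 1)
    ok = np.nonzero(running_fdr <= target_fdr)[0]
    if ok.size == 0:
        return None                                   # no threshold reaches the target
    return float(sorted_scores[ok[-1]])

# Made-up quality scores and unexpected flags
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
is_unexpected = [False, False, False, True, True]
print(threshold_for_fdr(scores, is_unexpected, target_fdr=0.25))  # 0.6
```

Detections with a score at or above the returned threshold are kept; in this toy example that keeps four detections, one of which is unexpected.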

In [5]:
filtered_results.to_csv(join('datasets','tutorial','stitched','results.csv'), index=False)