### Standalone analysis

This notebook walks through a standalone analysis of a single experiment; the same steps extend readily to multiple experiment files. It covers the following common steps in setting up an image processing pipeline:

- Accessing image files loaded onto the imaging server at CZBiohub
- Creating starfish Experiment files from images accessed on the server
- Working with Experiment files and viewing images in napari
- Segmenting cells using the starfish watershed algorithm
- Image pre-processing for detecting fluorescent spots
- Identifying spots and overlaying them with the raw images in napari
- Constructing codebooks that assign targets to spots
- Producing a target by cell matrix and other target statistics

Questions? Contact the author at andrew.cote@czbiohub.org

#### Accessing image files loaded onto the imaging server at CZBiohub

To access images located on the server, you need login credentials for the database, which can be provided by Andrew Cote (andrew.cote@czbiohub.org). You also need to install the requisite libraries (InSitu Toolkit: https://github.com/czbiohub/InSituToolkit).

The login credentials are stored in a simple .json file, which is used to authenticate your requests to the database. This file can be stored locally on your laptop.


In [1]:
# To start, we either need to know the dataset ids ahead of time, have a csv file with them (as might be
# used to originally upload the files from the microscope computer), or know the <id>
# associated with the experiment files, which we can then search for in the database.

# If we know the dataset id directly:
dataset_id = 'GW-2019-12-22-04-45-00-0002'

# Or even better, re-use the csv file we used on the microscope computer to upload the images 
import csv

list_of_datasets = []
with open('files_to_upload_example.csv', newline='') as csvfile:
    read_csv = csv.reader(csvfile, delimiter=',')
    next(read_csv, None)      # the top row of the csv file contains headers, which we want to ignore
    for row in read_csv:
        list_of_datasets.append(row[0])

In [2]:
# OPTIONAL DETOUR (This can be skipped if you use the built in InSituToolkit functions).

# We can access all the metadata associated with the experiment by using database operations.
# DatabaseOperations is a Python class that takes a unique dataset id in its constructor, so it must be re-created
# each time you want to query metadata for a different experiment.

# A full tutorial for querying the database is given at 
# https://github.com/czbiohub/imagingDB/blob/master/notebooks/database_queries.ipynb

from imaging_db.database.db_operations import DatabaseOperations
import imaging_db.database.db_operations as db_ops
import imaging_db.utils.db_utils as db_utils

# Note: refer to your own db_credentials.json location stored locally
db_credentials = '/Users/andrew.cote/Documents/db_credentials.json'  

dbops = DatabaseOperations(dataset_id)
credentials_str = db_utils.get_connection_str(db_credentials)
with db_ops.session_scope(credentials_str) as session:
    global_meta, frames_meta = dbops.get_frames_meta(session)
    
# global_meta and frames_meta now contain all the metadata associated with the whole experiment and with each frame

In [3]:
frames_meta

| | channel_idx | slice_idx | time_idx | pos_idx | channel_name | file_name | sha256 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | DAPI | im_c000_z000_t000_p000.tiff | 128f5f59822b2ffd21bbbc2d2334725cb9bbcf5e397a0e... |
| 1 | 0 | 0 | 0 | 1 | DAPI | im_c000_z000_t000_p001.tiff | 2fd6dd6b7be297e3983e0c6ec4d56593d09053a0180aac... |
| 2 | 0 | 0 | 0 | 2 | DAPI | im_c000_z000_t000_p002.tiff | b209618f901315b1d0c2302d99614a29fd4b3d695cfdeb... |
| 3 | 0 | 0 | 0 | 3 | DAPI | im_c000_z000_t000_p003.tiff | 1ce609447626423306f00bb1a8285576b632a700580c1e... |
| 4 | 0 | 0 | 0 | 4 | DAPI | im_c000_z000_t000_p004.tiff | 2dc5411557f1715a2758b1fe616b98f2d51ef8f0a42fb4... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1645 | 4 | 10 | 0 | 25 | Cy7 | im_c004_z010_t000_p025.tiff | 90087d8e4482bd333af89ccd5e643e31dd65cc093e983b... |
| 1646 | 4 | 10 | 0 | 26 | Cy7 | im_c004_z010_t000_p026.tiff | 560497689112cb4dc895409484909bca04a383391f95d8... |
| 1647 | 4 | 10 | 0 | 27 | Cy7 | im_c004_z010_t000_p027.tiff | e2ecd12486be3b768fb296f4235ab646fe3a28f17b0e68... |
| 1648 | 4 | 10 | 0 | 28 | Cy7 | im_c004_z010_t000_p028.tiff | a0b7ff61bd6310e3d2d7b66032e8e8f985fba2b158a9f0... |
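
In the rows shown, the file names appear to encode the same indices as the `channel_idx`, `slice_idx`, `time_idx`, and `pos_idx` columns. As a minimal sketch, assuming every file follows the `im_c…_z…_t…_p….tiff` pattern seen here, the indices can be recovered from a name alone. The helper `parse_frame_name` is hypothetical and is not part of imagingDB or InSituToolkit.

```python
import re

# Pattern inferred from the file_name column: im_c<channel>_z<slice>_t<time>_p<position>.tiff
# (an assumption based on the rows shown, not a documented naming contract)
FRAME_NAME = re.compile(r"im_c(\d+)_z(\d+)_t(\d+)_p(\d+)\.tiff")


def parse_frame_name(file_name):
    """Return the indices encoded in a frame file name, or None if it does not match."""
    match = FRAME_NAME.fullmatch(file_name)
    if match is None:
        return None
    channel, z_slice, time, position = (int(group) for group in match.groups())
    return {
        "channel_idx": channel,
        "slice_idx": z_slice,
        "time_idx": time,
        "pos_idx": position,
    }


indices = parse_frame_name("im_c004_z010_t000_p025.tiff")
```

A check like this can be useful for confirming that a file on disk corresponds to the row you expect in `frames_meta`.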


### Creating "starfish experiment" files from images accessed on the server

Once we are able to access the raw image files on the database, we'd like to create starfish 'Experiment' objects to simplify the later analysis. Each Experiment is a self-contained module that has all raw image data, as well as metadata. Subsequent analysis in this notebook is restricted to a single experiment, but can be generalized to many experiments as all Experiment objects have the same interface / methods. 

An Experiment object is essentially a series of .json files that contain metadata which reference the raw images on the database. Therefore they are fairly small in size and can be created and stored locally, ideally in a './experiments/' folder for ease of navigation.

In [6]:
# Create the directories to contain experiment files
import os
cwd = os.getcwd()
experiment_path = cwd + '/experiments/' + dataset_id + '/'

# makedirs also creates the parent './experiments/' folder if it does not exist yet
os.makedirs(experiment_path, exist_ok=True)

In [5]:
# To create experiment files we need a few key pieces of metadata: positions and channels.
# These could be retrieved through the above database queries, but InSituToolkit exposes a few useful methods.

from InSituToolkit.imaging_database import write_experiment, get_positions, get_channels, search_ids
from slicedimage import ImageFormat
db_credentials = '/Users/andrew.cote/Documents/db_credentials.json' 


# search the database for dataset ids that contain a certain string
set_of_datasets = search_ids(db_credentials, 'GW')

# find all the microscope positions for a dataset in the database
positions = get_positions(db_credentials, dataset_id)

# find the filters and channels used
# Note: it is good practice to inspect the channels variable manually to double check we are not mis-assigning channels
channels = get_channels(db_credentials, dataset_id)

nuc_channel = [channels[0]]
stain_channel = [channels[1]]
spot_channel = [channels[2], channels[3], channels[4]]

# Note: the dataset_id MUST be contained in a list. Multiple dataset ids could be written to the same experiment
# if the channels are common among them.

write_experiment(db_credentials, experiment_path, [dataset_id],
                 spot_channels=spot_channel,
                 nuc_channels=nuc_channel,
                 stain_channels=stain_channel,
                 positions=positions)   # By default the InSituScope saves as .PNG files
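
Assigning channels by list position, as above, depends on the order in which the channels are returned. One possible safeguard, sketched below, is to assign them by name instead. This assumes `get_channels` returns a list of channel-name strings, which is not confirmed here. The helper `assign_channels` and the names `FITC`, `Cy3`, and `Cy5` are hypothetical; only `DAPI` and `Cy7` appear in the metadata shown earlier.

```python
def assign_channels(channel_names, nuclear_name="DAPI", stain_names=()):
    """Split channel names into nuclear, stain, and spot groups by name rather than by index."""
    nuc = [name for name in channel_names if name == nuclear_name]
    stain = [name for name in channel_names if name in stain_names]
    spots = [
        name for name in channel_names
        if name != nuclear_name and name not in stain_names
    ]
    if len(nuc) != 1:
        # Fail loudly instead of silently mis-assigning the nuclear channel
        raise ValueError(f"expected exactly one '{nuclear_name}' channel, found {len(nuc)}")
    return nuc, stain, spots


# Hypothetical channel list for illustration only
example_channels = ["DAPI", "FITC", "Cy3", "Cy5", "Cy7"]
nuc, stain, spots = assign_channels(example_channels, stain_names=("FITC",))
```

Raising an error when the nuclear channel is missing or duplicated surfaces a mis-labelled dataset before any experiment files are written.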



In [None]:
# OPTIONAL DETOUR: For a larger number of experiments with the same <id>, we could generalize this to:
list_of_datasets = []
list_of_positions = []
list_of_channels = []
for dataset_id in search_ids(db_credentials, 'GW'):
    list_of_datasets.append(dataset_id)
    list_of_positions.append(get_positions(db_credentials, dataset_id))
    list_of_channels.append(get_channels(db_credentials, dataset_id))
    
# ... include the above code for creating directories and experiments (omitted here so that this notebook remains runnable)

### Working with Starfish Experiment files and viewing images in napari

Napari is a viewer built around manipulating high-dimensional images, for example the 5D image from a starfish Experiment, where the dimensions are (Round, Channel, Z, Y, X). It also has convenient options for viewing spots, stains, and segmentation masks on top of raw images.

In [35]:
# note: the below command '%gui qt5' is only required in a jupyter notebook. In a standalone script, starfish.display 
# will open the napari window by default. 

%gui qt5
from starfish import Experiment, FieldOfView, display

exp = Experiment.from_json(experiment_path + 'experiment.json')

In [29]:
# Experiment objects are dict-like and hold all the image data for each microscope location, or field of view (fov).
# A likely use case is to apply the same image processing technique to every fov in an experiment. We can collect
# all keys as:

list_of_keys = list(exp.keys())

# A fov has multiple types of images depending on the data that was uploaded, for example, nuclei or stain.
# By default starfish returns an ImageStack iterator, which requires a call to 'next()' to
# retrieve the actual image stack.

sample_primary = next(exp['fov_002'].get_images('primary'))
sample_nuclei = next(exp['fov_002'].get_images('nuclei'))

In [20]:
# To display multiple images in the same viewer, assign a variable name to the display(ImageStack) command

example_viewer = display(sample_primary)
display(sample_nuclei, viewer = example_viewer)

100%|██████████| 33/33 [00:12<00:00,  2.67it/s]
100%|██████████| 11/11 [00:04<00:00,  2.56it/s]


<napari.viewer.Viewer at 0x1c8e3cbf90>

### Basic Image Processing in Starfish

Starfish has numerous built-in methods for common image processing tasks, including:
- image registration (learning and applying transforms between successive imaging rounds)
- projection (reducing the dimensionality of the images, e.g. flattening the z-stack)
- filtering (high-pass or low-pass filtering to help isolate spots)

A more detailed list can be found here: https://spacetx-starfish.readthedocs.io/en/stable/api/image/index.html
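
As a rough illustration of what projection does, the sketch below flattens a z-stack by taking the per-pixel maximum across planes. It uses plain nested lists rather than starfish, and `max_project` and the toy pixel values are made up for illustration; they are not part of the starfish API.

```python
def max_project(z_stack):
    """Collapse a list of equally sized 2D planes into one 2D image by per-pixel maximum."""
    return [
        # zip(*rows) groups the pixels that share a column across all planes
        [max(pixels) for pixels in zip(*rows)]
        # zip(*z_stack) groups the rows that share a row index across all planes
        for rows in zip(*z_stack)
    ]


# Three 2x2 planes standing in for a small z-stack
toy_stack = [
    [[1, 5], [0, 2]],
    [[3, 4], [7, 1]],
    [[2, 9], [6, 0]],
]
projected = max_project(toy_stack)
```

Each output pixel keeps the brightest value seen at that location in any plane, which is why a maximum projection tends to preserve in-focus spots.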

In [36]:
# A common step is to project all images along the z-dimension by the maximum pixel value, as this captures the 
# information from the best in-focus plane. 

from starfish.types import Axes, Coordinates, Features, FunctionSource, TraceBuildingStrategies
from starfish.image import Filter, LearnTransform, ApplyTransform, Segment

projected_z_stacks = []

for key in list_of_keys[0:4]:
    img_raw = next(exp[key].get_images('primary'))
    img_proj_z = img_raw.reduce({Axes.ZPLANE}, func='max')
    projected_z_stacks.append(img_proj_z)

100%|██████████| 33/33 [00:13<00:00,  2.39it/s]
100%|██████████| 33/33 [00:13<00:00,  2.41it/s]
100%|██████████| 33/33 [00:15<00:00,  2.15it/s]
100%|██████████| 33/33 [00:13<00:00,  2.50it/s]


In [37]:
multi_view = display(projected_z_stacks[0])

for stack in projected_z_stacks[1:]:
    display(stack, viewer=multi_view)

100%|██████████| 3/3 [00:00<00:00, 31.01it/s]
100%|██████████| 3/3 [00:00<00:00, 25.49it/s]
100%|██████████| 3/3 [00:00<00:00, 22.96it/s]
100%|██████████| 3/3 [00:00<00:00, 30.28it/s]


<starfish.ImageStack (r: 1, c: 1, z: 11, y: 2048, x: 2048)>

### Segmenting cells using starfish watershed algorithm

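The watershed algorithm segments cells based on changes in pixel intensity by grouping pixels into 'basins'. Each basin grows outward from a seed (for example, a nucleus) until it meets a neighboring basin, and the boundaries between basins become the cell outlines.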
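
To make the idea of basins concrete, here is a toy marker-based watershed written in plain Python. It is a simplified sketch of the general technique (priority flooding from labeled seeds with 4-connectivity), not starfish's implementation; the function name `watershed` and the example values are assumptions for illustration.

```python
import heapq


def watershed(heights, markers):
    """Marker-based watershed by priority flooding on a 2D grid (4-connectivity).

    heights: 2D list of numbers, where low values are the basins.
    markers: 2D list of ints, where 0 means unlabeled and any other value is a seed label.
    """
    rows, cols = len(heights), len(heights[0])
    labels = [row[:] for row in markers]
    heap = []
    counter = 0  # tie-breaker so pixels of equal height are processed in insertion order
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != 0:
                heapq.heappush(heap, (heights[r][c], counter, r, c))
                counter += 1
    while heap:
        # Always expand from the lowest pixel reached so far
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (heights[nr][nc], counter, nr, nc))
                counter += 1
    return labels


# One row with two low basins separated by a ridge (value 9)
toy_heights = [[0, 1, 2, 9, 1, 0]]
toy_markers = [[1, 0, 0, 0, 0, 2]]
toy_labels = watershed(toy_heights, toy_markers)
```

In cell segmentation the seeds typically come from the nuclei image, and the height map is derived from the stain or intensity image, so that each cell ends up in its own basin.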

### Codebook construction

Codebooks associate the information of different colored spots across rounds to specified target genes. They can be used in two ways:

- RNAscope: associate spots in each channel with a target gene
- InSituSequencing: associate a specific sequence of spots present in different channels with a target gene


In [None]:
from starfish import Codebook
from starfish.types import Axes, Coordinates, Features

# RNAscope codebooks should only have one round value for each target gene, as they are imaged in a single round
codebook_RNAscope = [
      {
          Features.CODEWORD: [
              {Axes.ROUND.value: 0, Axes.CH.value: 0, Features.CODE_VALUE: 1},
          ],
          Features.TARGET: "example_gene1"
      },
      {
          Features.CODEWORD: [
              {Axes.ROUND.value: 0, Axes.CH.value: 1, Features.CODE_VALUE: 1},
          ],
          Features.TARGET: "example_gene2"
      },
      {
          Features.CODEWORD: [
              {Axes.ROUND.value: 0, Axes.CH.value: 2, Features.CODE_VALUE: 1},
          ],
          Features.TARGET: "example_gene3"
      },
  ]

# ISS codebooks will by nature have multiple rounds for each gene, where each round corresponds to reading off a
# single letter of the barcode. Each letter appears in its own channel, so a channel can be repeated within a
# single target. Round number refers to position in the sequence and so must be unique for a given target.
codebook_ISS = [
      {
          Features.CODEWORD: [
              {Axes.ROUND.value: 0, Axes.CH.value: 0, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 2, Axes.CH.value: 0, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 3, Axes.CH.value: 2, Features.CODE_VALUE: 1}
          ],
          Features.TARGET: "example_gene1"
      },
      {
          Features.CODEWORD: [
              # this codeword must differ from example_gene1's so that the two targets can be told apart
              {Axes.ROUND.value: 0, Axes.CH.value: 1, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 1, Axes.CH.value: 2, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 2, Axes.CH.value: 1, Features.CODE_VALUE: 1},
              {Axes.ROUND.value: 3, Axes.CH.value: 0, Features.CODE_VALUE: 1}
          ],
          Features.TARGET: "example_gene2"
      }
  ]

# Since this example experiment uses RNAscope, we finish the codebook construction by calling the constructor for
# the starfish Codebook object. 
codebook = Codebook.from_code_array(codebook_RNAscope)
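
To illustrate how codebook entries of this shape map observed spots to targets, the following sketch builds a lookup from per-round channel sequences to target names and decodes a few example sequences by exact match. It uses plain string keys (`"r"`, `"c"`, `"v"`, `"codeword"`, `"target"`) as hypothetical stand-ins for the starfish constants, and `build_lookup` and `decode` are illustrative helpers. starfish's own decoders work differently and are not reproduced here.

```python
# Hypothetical plain-string keys standing in for the Axes / Features constants used above
codebook_entries = [
    {
        "codeword": [{"r": 0, "c": 0, "v": 1}, {"r": 1, "c": 1, "v": 1}],
        "target": "example_gene1",
    },
    {
        "codeword": [{"r": 0, "c": 1, "v": 1}, {"r": 1, "c": 0, "v": 1}],
        "target": "example_gene2",
    },
]


def build_lookup(entries):
    """Map each codeword (channel per round, ordered by round) to its target name."""
    lookup = {}
    for entry in entries:
        ordered = sorted(entry["codeword"], key=lambda code: code["r"])
        key = tuple(code["c"] for code in ordered)
        if key in lookup:
            # Two targets sharing a codeword could never be distinguished
            raise ValueError(f"duplicate codeword {key} for {entry['target']}")
        lookup[key] = entry["target"]
    return lookup


def decode(channel_sequence, lookup):
    """Return the target whose codeword matches exactly, or None if there is no match."""
    return lookup.get(tuple(channel_sequence))


lookup = build_lookup(codebook_entries)
observed = [(0, 1), (1, 0), (1, 1)]  # channel seen in round 0, then round 1
decoded = [decode(sequence, lookup) for sequence in observed]
```

The duplicate check reflects why each target needs a unique codeword: a sequence that matches two entries cannot be assigned to either one.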