In [1]:
# Imports
import datetime
import time

import numpy as np
import pandas as pd

import img_seg_package as img_seg


This notebook contains the primary code for analyzing image stacks from a fluorescence microscope. Using a nuclear localization signal on your fluorescent proteins is recommended. For the analysis to work, an autofluorescence channel must be included for subtraction. An option to input an already-subtracted image directly (for example, high-quality confocal images) will be developed at a future date. The code stitches images together using the image file metadata, subtracts autofluorescence to isolate signal from background, and then applies intensity thresholding and template-based feature matching to the subtracted image. The result is a mask of detected cells. These cells can then be counted, and the median brightness can be determined for a set of those cells.
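To make the autofluorescence subtraction step concrete, here is a minimal sketch. It is not the implementation inside `img_seg_package`; the function name `subtract_autofluorescence` and the `scale` argument are hypothetical, and the arrays are toy data. The one point it illustrates is that microscope images are often stored as unsigned integers, so the channels should be cast to floating point before subtracting to avoid wrap-around below zero.

```python
import numpy as np


def subtract_autofluorescence(signal, autofluor, scale=1.0):
    """Subtract a (scaled) autofluorescence channel and clip negatives to zero.

    Hypothetical helper for illustration only; `scale` is an assumed parameter.
    """
    # Cast to float first: subtracting unsigned integers wraps around below zero.
    diff = signal.astype(np.float64) - scale * autofluor.astype(np.float64)
    # Pixels where autofluorescence exceeds the signal carry no signal.
    return np.clip(diff, 0, None)


# Toy 2x2 channels stored as 16-bit unsigned integers
signal = np.array([[100, 900], [50, 400]], dtype=np.uint16)
autofluor = np.array([[120, 100], [50, 100]], dtype=np.uint16)

subtracted = subtract_autofluorescence(signal, autofluor)
print(subtracted)
```

Only the two pixels where the signal channel is brighter than the autofluorescence channel remain non-zero after subtraction.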

The primary parameters to alter to modify the results are: <br>
`threshold`: This parameter sets the limit on the number of pixels that the signal must fall below in the second derivative. Select it conservatively, so that background pixels are just starting to appear at the applied threshold. Both the size of the image and the intensity of the cells within the image are important for determining this parameter. 

Some preliminary suggestions organized by tissue type are:
    
    Brain: 50000
    Intestine: 10000
    Kidney: 1000
    Lung: 1000
    Liver: 10000
    Pancreas: 1000
    Spine: -
    Spleen: 2000
    Stomach: -
    Testes: 1000

`size_thresh`: This parameter controls how closely an examined feature must match the provided template to be included. If the cells are very homogeneous, this value can be increased. Because many of the nuclei examined have atypical shapes, a value between 0.4 and 0.65 is recommended. For tissues with a lot of background fluorescence, 0.65 will exclude this signal at the expense of also excluding some more elliptical cell types. For tissues without much background fluorescence, this parameter can be decreased to 0.4.

Some preliminary suggestions organized by tissue type are:
    
    Brain: 0.4
    Intestine: 0.5
    Kidney: 0.55
    Lung: 0.62
    Liver: 0.4
    Pancreas: 0.5
    Spine: -
    Spleen: 0.60
    Stomach: -
    Testes: 0.55
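If you switch between tissues often, the two tables of preliminary suggestions can be kept in one lookup. The sketch below only restates the values listed above; the names `SUGGESTED_PARAMS` and `suggested_params` are hypothetical and are not part of `img_seg_package`. Tissues with no suggestion yet (Spine, Stomach) are stored as `None`.

```python
# Preliminary suggestions copied from the two tables above.
# None marks tissues for which no value has been suggested yet.
SUGGESTED_PARAMS = {
    'Brain':     {'threshold': 50000, 'size_thresh': 0.4},
    'Intestine': {'threshold': 10000, 'size_thresh': 0.5},
    'Kidney':    {'threshold': 1000,  'size_thresh': 0.55},
    'Lung':      {'threshold': 1000,  'size_thresh': 0.62},
    'Liver':     {'threshold': 10000, 'size_thresh': 0.4},
    'Pancreas':  {'threshold': 1000,  'size_thresh': 0.5},
    'Spine':     {'threshold': None,  'size_thresh': None},
    'Spleen':    {'threshold': 2000,  'size_thresh': 0.60},
    'Stomach':   {'threshold': None,  'size_thresh': None},
    'Testes':    {'threshold': 1000,  'size_thresh': 0.55},
}


def suggested_params(tissue):
    """Return the suggested starting parameters for a tissue (hypothetical helper)."""
    params = SUGGESTED_PARAMS[tissue]
    if params['threshold'] is None or params['size_thresh'] is None:
        raise ValueError('No suggested parameters for %s yet.' % tissue)
    return dict(params)


lung = suggested_params('Lung')
print(lung['threshold'], lung['size_thresh'])
```

These are starting points only and should still be checked against individual images.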

`gaussian_size`: This parameter controls the width of the gaussian blur used for background subtraction, which is performed after autofluorescence subtraction. For 10x images, a gaussian size of 5 works well to keep fluorescent nuclei while removing larger features. For images at a different magnification, this parameter may require further optimization. 

`truncation`: This parameter sets the number of standard deviations at which the gaussian is truncated when the blurred version of the image is created. During preliminary work, a truncation of 2 standard deviations was used. The effect of changing this variable should be marginal. Increasing it to 4 would give a standard gaussian blur.
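The interaction between `gaussian_size` and `truncation` can be sketched with `scipy.ndimage.gaussian_filter`, whose `truncate` argument defaults to 4 standard deviations. This is an illustration under assumptions, not the code inside `img_seg_package`: it assumes `gaussian_size` plays the role of `sigma` and that the blurred image is subtracted from the original as a background estimate. The function name `subtract_background` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def subtract_background(image, gaussian_size=5, truncation=2):
    """Estimate background with a truncated gaussian blur and subtract it.

    Hypothetical helper: assumes gaussian_size is the gaussian sigma in pixels.
    """
    image = image.astype(np.float64)
    # The blur smooths away small bright objects, leaving the slowly varying background.
    background = ndimage.gaussian_filter(image, sigma=gaussian_size, truncate=truncation)
    # Removing the background keeps features smaller than the blur width.
    return np.clip(image - background, 0, None)


# Toy image: flat background of 100 with one bright "nucleus" pixel
image = np.full((64, 64), 100.0)
image[32, 32] = 1100.0

result = subtract_background(image, gaussian_size=5, truncation=2)
print(result[32, 32], result[0, 0])
```

In this toy case the flat background is removed almost entirely while the small bright feature keeps most of its intensity.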

`min_size`: This parameter is the minimum number of contiguous pixels an object must contain to be kept. Smaller objects, which typically represent noise, are removed prior to counting.
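A minimal sketch of this kind of size filter, using `scipy.ndimage.label` to find contiguous objects in a binary mask. It is not the package's implementation: the function name `remove_small_objects` is used here for illustration, and whether an object of exactly `min_size` pixels is kept or removed is an assumption (it is kept in this sketch).

```python
import numpy as np
from scipy import ndimage


def remove_small_objects(mask, min_size=10):
    """Drop connected objects with fewer than min_size pixels (illustrative helper)."""
    # Label each contiguous object; 0 is the background label.
    labels, n_objects = ndimage.label(mask)
    # Count the pixels that carry each label.
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    # Map the per-label decision back onto the image.
    return keep[labels]


# Toy mask: one 16-pixel object and one single-pixel speck of noise
mask = np.zeros((10, 10), dtype=bool)
mask[1:5, 1:5] = True
mask[8, 8] = True

cleaned = remove_small_objects(mask, min_size=10)
n_cells = ndimage.label(cleaned)[1]
print(n_cells, cleaned.sum())
```

Only the larger object survives the filter and is counted.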

To determine parameters for a tissue type, start with an estimate based on previous tissues and the size of the image. Run these parameters over a single stack of images; the subtraction and thresholding values derived from your inputs will be stored in the output dataframe. Then, in `single_image_analysis.ipynb`, examine an individual image from the stack using that dataframe. If not enough cells are highlighted, consider relaxing the size threshold or increasing the intensity threshold in this script. If too much background autofluorescence is highlighted, consider increasing the size threshold or decreasing the intensity threshold in this script. Once the image segmentation appears to represent the underlying biology, run these parameters over the whole image set. 

Once this has been completed, run the code in `plotting.ipynb` to generate plots of Cells Transduced per $mm^2$ and Median Cell Brightness.

In this first code cell, set the path to your data directory relative to your current path and set the parameters as discussed above:

In [6]:
# File path to the images
data_dir = '../Lung_Paper_Images/Raw_Images/Brain/'

# This is the intensity threshold that determines how bright cells must be
threshold = 100000

# This size threshold determines how closely the cell must match a provided template
size_thresh = 0.4

# This determines the width of the gaussian applied during background subtraction
gaussian_size = 5

# This determines the number of standard deviations at which the gaussian is truncated
truncation = 2

# This determines the minimum size of objects allowed for the final quantification
min_size = 10

Next, we initialize or load in a dataframe. If you're continuing a previous run, comment out the initial dataframe and uncomment the `pd.read_csv` line instead.

In [7]:
# Initialize a dataframe
df = pd.DataFrame(columns=['Date',
                           'Tissue',
                           'Size Threshold', 
                           'Gaussian Size', 
                           'Truncation', 
                           'Intensity Threshold',
                           'Minimum Size',
                           'Virus', 
                           'Animal', 
                           'Replicate',  
                           'Count',
                           'Cells Quantified',
                           'Brightness List',
                           'Applied Threshold',
                           'Image Multiplication Factor',
                           'Minimum Pixel Value',
                           'Maximum Pixel Value',
                           'Area Threshold',
                           'Total Area'])

# Use this line instead to continue a previous run.
# Remember to adjust the following loop to avoid repeating sections.
#df = pd.read_csv('../Lung_Paper_Images/Quantification/Lung/20200807.csv', comment='#')

Next, we loop through the image file locations and call our function for each set of images. In this code there are three locations requiring your specific edits: 
+ The loop will require editing to your specific file nomenclature. 
+ The tissue type requires input where we write information into the dataframe. 
+ The output filename requires editing where the dataframe is saved.
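When continuing a previous run, one way to avoid repeating sections is to skip every virus, animal, and replicate combination that already has a row in the loaded dataframe. The sketch below is a suggestion, not part of the original workflow, and the helper name `remaining_runs` is hypothetical. Note that `pd.read_csv` will typically read the `Animal` and `Replicate` columns back as integers, while the loop uses strings, so both are converted to strings before comparing.

```python
import itertools

import pandas as pd


def remaining_runs(df, viruses, animals, replicates):
    """List the (virus, animal, replicate) combinations not yet in df (hypothetical helper)."""
    # Compare as strings because a reloaded CSV stores '1' as the integer 1.
    done = set(zip(df['Virus'].astype(str),
                   df['Animal'].astype(str),
                   df['Replicate'].astype(str)))
    return [combo
            for combo in itertools.product(viruses, animals, replicates)
            if combo not in done]


# Toy stand-in for a dataframe reloaded from a previous run
previous = pd.DataFrame({'Virus': ['AAV5', 'AAV5'],
                         'Animal': [1, 1],
                         'Replicate': [1, 2]})

todo = remaining_runs(previous, ['AAV5', 'AAV9'], ['1', '2'], ['1', '2'])
for virus, animal, replicate in todo:
    print(virus, animal, replicate)
```

The three nested loops can then be replaced by a single loop over the remaining combinations.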

In [None]:
# Loop through all of the images 
for virus in ['AAV5', 'AAV9', 'CAPA4']:  
    for animal in ['1', '2', '3', '4', '5', '6']:
        for replicate in ['1', '2']: 
            # Initialize a start time
            start_time = time.time()

            # Build the path to the directory holding this set of images
            fname = data_dir + virus + '/' + animal + '/' + replicate + '/'

            print('Performing analysis of %s for animal %s and replicate %s...' % (virus, animal, replicate))

            (n_cells, cell_list, cell_intensity_list, actual_thresh, mult,
             min_v, max_v, area_thresh, total_area) = img_seg.counter.cell_counter(
                fname,
                size_thresh=size_thresh,
                gaussian_size=gaussian_size,
                truncation=truncation,
                threshold=threshold,
                min_size=min_size)

            # Write all the information into a tidy dataframe
            row = {'Date': datetime.datetime.now(),
                   'Tissue': 'Brain',
                   'Size Threshold': size_thresh,
                   'Gaussian Size': gaussian_size,
                   'Truncation': truncation,
                   'Intensity Threshold': threshold,
                   'Minimum Size': min_size,
                   'Virus': virus,
                   'Animal': animal,
                   'Replicate': replicate,
                   'Count': int(n_cells),
                   'Cells Quantified': cell_list,
                   'Brightness List': cell_intensity_list,
                   'Applied Threshold': actual_thresh,
                   'Image Multiplication Factor': mult,
                   'Minimum Pixel Value': min_v,
                   'Maximum Pixel Value': max_v,
                   'Area Threshold': area_thresh,
                   'Total Area': total_area}

            # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
            df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

            # Save the dataframe
            df.to_csv('../Lung_Paper_Images/Quantification/Brain/20200807.csv', index=False)

            elapsed = time.time() - start_time

            # Print run length
            print('Analysis of %s for animal %s and replicate %s took %0.2f seconds.' % (virus, animal, replicate, elapsed))

Your data will be saved as a CSV file in the location indicated in the previous code block. 

Next, use the `single_image_analysis.ipynb` Jupyter notebook to examine individual images with the parameters obtained from the whole image set, and check whether the parameter choices allowed cells to be identified in the tissue. 

Provided the parameters are correct, use `plotting.ipynb` to create plots from the dataframe generated here.