Fluorescence datasets ideally reflect a relationship between the pixels in an image and the location and local density of your fluorescent molecule in a sample. However, properties of the detectors, optics, or even the samples can confound direct interpretation of this data. Here we will present some operations that can mitigate these effects to achieve robust hypothesis testing. 

Hypothesis: Treatment with drug A will cause a decrease in the total amount of protein Y. You have saved the control dataset as "DMSO.tif" and the drugged cell dataset as "drugA.tif". 

First, some boilerplate code to import useful libraries and to make data easier to visualize in the notebook.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.ndimage

In [2]:
sns.set_style('dark', rc={'image.cmap':'inferno'})

Import an image file and associated metadata as we learnt yesterday.

In [3]:
from skimage.io import imread

data_drug = imread("../data/confocal_drug_panel/drugA.tif")
data_nodrug = imread("../data/confocal_drug_panel/DMSO.tif")

In [15]:
import json
with open('../data/confocal_drug_panel/DMSO_metadata.json', mode='r') as f_nodrug:
    meta_nodrug = json.load(f_nodrug)

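drug_slice = {}
nodrug_slice = {}
# Take one plane (index 3 along the first axis) from each channel; channels sit on the last axis.
# The drug dataset is assumed to share the channel order listed in the DMSO metadata.
for idx, channel in enumerate(meta_nodrug['channels']):
    drug_slice[channel] = data_drug[3, :, :, idx]
    nodrug_slice[channel] = data_nodrug[3, :, :, idx]  # TODO: update the indexing once the full dataset is read in
    print(channel)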

Display the images to make sure everything worked as expected...

In [16]:
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(nodrug_slice["actin"])
ax[1].imshow(nodrug_slice['nucleus'])
ax[2].imshow(nodrug_slice["your_fav_protein"])

fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(drug_slice["actin"])
ax[1].imshow(drug_slice['nucleus'])
ax[2].imshow(drug_slice["your_fav_protein"])

## Image pre-processing: a motivating example

So here we have images of fixed cells in three channels -- cell bodies labeled with an actin stain, nuclei labeled with DAPI, and a third protein "your_fav_protein" that responds to drug treatment. Just by looking at the images it seems like the protein is shifting from the nuclei to the cell body once the drug is applied (always visualize your intermediates!), but it is unclear if the drug treatment changes the total amount of protein per cell as well.

To address these questions, you will need to do the following --

1. Make a mask of the actin channel to identify pixels within the cell bodies
2. Make a mask of the nuclear channel to identify pixels within the nucleus
3. Determine the signal coming from *your favorite protein* within these regions of interest. 
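
The three steps above can be sketched on synthetic data. This is only a minimal illustration, not the pipeline developed in this course: the disc-shaped images, the fixed threshold of 50, and all variable names are assumptions made for the example.

```python
import numpy as np

# Synthetic stand-ins for the three channels (hypothetical data, not the course images).
shape = (64, 64)
yy, xx = np.mgrid[:shape[0], :shape[1]]
r = np.hypot(yy - 32, xx - 32)           # distance of each pixel from the image center

actin = np.where(r < 20, 100.0, 5.0)     # bright disc = cell body
nucleus = np.where(r < 8, 100.0, 5.0)    # smaller bright disc = nucleus
protein = np.where(r < 20, 10.0, 0.0)    # protein spread evenly over the whole cell

# Steps 1 and 2: boolean masks from a hand-picked (hypothetical) threshold.
cell_mask = actin > 50
nuclear_mask = nucleus > 50
cytoplasm_mask = cell_mask & ~nuclear_mask

# Step 3: integrate the protein signal inside each region of interest.
total_in_cell = protein[cell_mask].sum()
total_in_nucleus = protein[nuclear_mask].sum()
total_in_cytoplasm = protein[cytoplasm_mask].sum()
nuclear_fraction = total_in_nucleus / total_in_cell

print("total protein signal in cell:", total_in_cell)
print("fraction of signal in nucleus:", nuclear_fraction)
```

Because the synthetic protein is spread evenly, the nuclear fraction here simply reflects the ratio of the two mask areas. On real data, every number computed this way is only as good as the masks behind it.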

Today, we will define the ROIs in the image. We'll find that mask quality can be improved by preprocessing the images: determining thresholds quantitatively and filtering to remove noise. We will then cover how to clean up the mask and turn it into an accurate ROI using morphological image processing. 

Tomorrow (Day 4), we will cover how to design your image processing pipeline to deal with some trickier problems, such as quantifying fluorescence in the ROIs determined here. 

**Preprocessing missteps are a good way to get a paper retracted. We argue that it's easier to make these missteps when doing things manually, but it's not *impossible* to make them computationally. In either case, if you don't check the intermediate steps of your analysis, the results can't be trusted. Always visualize your intermediates!**

### Making masks to localize cell bodies

In [6]:
data = nodrug_slice['actin']
data.dtype  # last expression in the cell, so the notebook displays it

**Using the slider, find a threshold that defines the cell bodies accurately across the image.**
First, view the image more closely.

In [7]:
#parameters to adjust
minX1 = 400 #crop edges for a cell in the center of field of view
minY1 = 500
minX2 = 650 #crop edges for cell at the edge of the field of view
minY2 = 1
crop_size = 200 #pix
image_view_thresh = 0.1

#run
maxX1 = minX1 + crop_size
maxY1 = minY1 + crop_size
maxX2 = minX2 + crop_size
maxY2 = minY2 + crop_size

top = data.max() * image_view_thresh

fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[1].imshow(data[minY1 : maxY1 , minX1 : maxX1], vmin=0, vmax=top)
ax[0].imshow(data, vmin=0, vmax=top)
ax[2].imshow(data[minY2 : maxY2, minX2: maxX2], vmin=0, vmax=top)

Use the slider to determine which threshold gives the best mask across the image.

In [8]:
from ipywidgets import interactive
@interactive
def show_masks(thresh=(0, data.max() * 0.1, 20)):
    fig, ax = plt.subplots(1, 3, figsize=(16, 4))
    mask = np.zeros(nodrug_slice["actin"].shape)
    mask[nodrug_slice["actin"] >=thresh] = 1
    mask_zoom_center = mask[minY1 : maxY1 , minX1 : maxX1]
    mask_zoom_edge = mask[minY2 : maxY2 , minX2 : maxX2]
    ax[0].imshow(mask, vmin=0, vmax=1)
    ax[1].imshow(mask_zoom_center, vmin=0, vmax=1)
    ax[2].imshow(mask_zoom_edge, vmin=0, vmax=1)
show_masks

### Automated detection of foreground using Otsu's method

Nobuyuki Otsu proposed a now widely used method for choosing a threshold automatically. Simply put, it assumes that background pixels (unwanted) and foreground pixels (your signal) follow a bimodal distribution: the background pixels form one well-defined group on the intensity histogram, and the brighter signal pixels form a second, distinct group. The method picks the threshold that best separates the two groups, i.e. the one that minimizes the intensity variance within each group.
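
To make the idea concrete, here is a NumPy-only sketch of the threshold search. It is written for illustration and is not the scikit-image source; the function name `otsu_threshold_sketch` and the synthetic two-population data are assumptions. It scans every split of the histogram and keeps the one that maximizes the variance *between* the two groups, which is equivalent to minimizing the variance within them.

```python
import numpy as np

def otsu_threshold_sketch(image, nbins=256):
    """Return the histogram split that maximizes between-class variance."""
    counts, bin_edges = np.histogram(np.ravel(image), bins=nbins)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    counts = counts.astype(float)
    weighted = counts * centers
    tiny = np.finfo(float).tiny  # guards against division by zero for empty classes

    # Pixel counts of the two classes for every candidate split.
    w_bg = np.cumsum(counts)               # bins 0..i
    w_fg = np.cumsum(counts[::-1])[::-1]   # bins i..end

    # Mean intensity of the two classes for every candidate split.
    mean_bg = np.cumsum(weighted) / np.maximum(w_bg, tiny)
    mean_fg = np.cumsum(weighted[::-1])[::-1] / np.maximum(w_fg, tiny)

    # Between-class variance when splitting between bin i and bin i + 1.
    between = w_bg[:-1] * w_fg[1:] * (mean_bg[:-1] - mean_fg[1:]) ** 2
    return centers[np.argmax(between)]

# Synthetic bimodal data: a dim "background" group and a bright "signal" group.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(20, 3, 5000), rng.normal(100, 5, 5000)])
t = otsu_threshold_sketch(pixels)
print("sketch threshold:", t)
```

With two well-separated groups like these, the threshold lands in the gap between them. The interesting cases are the ones where the histogram is not cleanly bimodal.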

In [9]:
from skimage import filters

thresh = filters.threshold_otsu(data)
print("the objective masking threshold for this dataset is:", thresh)

In [10]:
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
mask = np.zeros(nodrug_slice["actin"].shape)
mask[nodrug_slice["actin"] >=thresh] = 1
mask_zoom_center = mask[minY1 : maxY1 , minX1 : maxX1]
mask_zoom_edge = mask[minY2 : maxY2 , minX2 : maxX2]
ax[0].imshow(mask, vmin=0, vmax=1)
ax[1].imshow(mask_zoom_center, vmin=0, vmax=1)
ax[2].imshow(mask_zoom_edge, vmin=0, vmax=1)

Note that the global threshold produces masks of different quality at the edges and the center of the image because of the uneven illumination across the sample. Look at the histogram of pixel intensities to see why this might be the case.

In [11]:
sns.distplot(data.flatten(), hist_kws={'log': True}, kde=False)
plt.axvline(thresh, ls='--', lw=2, c='r')
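
To see the effect of uneven illumination in isolation, here is a small synthetic sketch. The two disc-shaped "cells", the Gaussian illumination profile, and the noise level are all assumptions chosen for illustration. The cells are identical, but one sits at the bright center of the field and the other near the dim edge. A single Otsu threshold is then applied to the whole image.

```python
import numpy as np
from skimage import filters

rng = np.random.default_rng(0)
size = 200
yy, xx = np.mgrid[:size, :size]

# Two identical cells: one at the image center, one near the left edge.
center_cell = np.hypot(yy - 100, xx - 100) < 20
edge_cell = np.hypot(yy - 100, xx - 25) < 20
truth = center_cell | edge_cell

# Hypothetical illumination profile: brightest in the center, dim toward the edges.
illumination = np.exp(-((yy - 100) ** 2 + (xx - 100) ** 2) / (2 * 40.0 ** 2))

# Observed image = illumination * (background + signal) + detector noise.
image = illumination * (10 + 90 * truth) + rng.normal(0, 1, truth.shape)

# One global threshold for the whole field of view.
thresh = filters.threshold_otsu(image)
mask = image >= thresh

# Fraction of each cell's true pixels that end up in the mask.
center_recovered = mask[center_cell].mean()
edge_recovered = mask[edge_cell].mean()
print("center cell recovered:", center_recovered)
print("edge cell recovered:", edge_recovered)
```

In this sketch the dim cell at the edge is darker than the threshold chosen for the image as a whole, so the mask keeps the central cell and loses most of the edge one, even though the underlying objects are identical.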

Both manual and Otsu threshold determination fail to produce high-quality masks for this dataset. Noise, uneven illumination, and background, all of which are common in fluorescence microscopy datasets in biology, can be corrected using a set of *rank filters*.

For comparison, the cells below load the final corrected image (after illumination flattening, background subtraction, and denoising), together with its mask and its intensity histogram with the Otsu threshold marked. They motivate the second part of preprocessing: rank filters.

In [14]:
from skimage.io import imread
clean_mask = imread("../fig/processed_sample_mask.tif")
clean_im = imread("../fig/processed_sample_data.tif")

clean_mask = scipy.ndimage.zoom(clean_mask, mask.shape[0]/clean_mask.shape[0], order=0)
clean_im = scipy.ndimage.zoom(clean_im, mask.shape[0]/clean_im.shape[0], order=0)

fig, ax = plt.subplots(1, 3, figsize=(16, 4))
cmask_zoom_center = clean_mask[minY1 : maxY1 , minX1 : maxX1]
cmask_zoom_edge = clean_mask[minY2 : maxY2 , minX2 : maxX2]
ax[0].imshow(clean_mask, vmin=0, vmax=1)
ax[1].imshow(cmask_zoom_center, vmin=0, vmax=1)
ax[2].imshow(cmask_zoom_edge, vmin=0, vmax=1)

top = clean_im.max() * 0.2
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
cim_zoom_center = clean_im[minY1 : maxY1 , minX1 : maxX1]
cim_zoom_edge = clean_im[minY2 : maxY2 , minX2 : maxX2]
ax[0].imshow(clean_im, vmin=0, vmax=top)
ax[1].imshow(cim_zoom_center, vmin=0, vmax=top)
ax[2].imshow(cim_zoom_edge, vmin=0, vmax=top)

In [13]:
from skimage import filters
threshC = filters.threshold_otsu(clean_im)

sns.distplot(clean_im.flatten(), hist_kws={'log': False}, kde=False)
plt.axvline(threshC, ls='--', lw=2, c='r')
plt.gca().set_ylim([0, 500000])

This threshold looks like it is dividing a cluster instead of separating two clusters, even though empirically the results look very good. Some algorithms become staples of image processing not because the underlying model they reflect is correct, but because they're extremely robust.

## Next: Rank filters for locally-informed image manipulations