Fluorescence datasets ideally reflect a relationship between the pixels in an image and the location and local density of your fluorescent molecule in a sample. However, properties of the detectors, optics, or even the samples can confound direct interpretation of this data. Here we will present some operations that can mitigate these effects to achieve robust hypothesis testing. 

Hypothesis: Treatment with drug Y will cause a decrease in the total amount of protein Y. You have saved the control dataset as "no_drug.tif" and the drugged cell dataset as "drug.tif". 

First some boilerplate code to make it easier to access useful libraries, and to make it easier to visualize data in the notebook.

In [2]:
%matplotlib inline

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
sns.set_style('dark', rc={'image.cmap':'inferno'})

Let's import an image file and associated metadata as we learnt yesterday!

In [5]:
from skimage.io import imread
data_drug = imread("../data/drug.tif")
data_nodrug = imread("../data/no_drug.tif")

In [6]:
import json
with open('../data/drug.json', mode='r') as f_drug:
    meta_drug = json.load(f_drug)
with open('../data/no_drug.json', mode='r') as f_nodrug:
    meta_nodrug = json.load(f_nodrug)

drug_stack = {}
nodrug_stack = {}
for idx, channel in enumerate(meta_drug['channels']):
    drug_stack[channel] = data_drug[:,:,idx]
    nodrug_stack[channel] = data_nodrug[:,:,idx]    
    print(channel)

Okay, let's display the images to make sure everything worked as expected...

In [7]:
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(nodrug_stack["actin"])
ax[1].imshow(nodrug_stack['nucleus'])
ax[2].imshow(nodrug_stack["your_fav_protein"])

fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(drug_stack["actin"])
ax[1].imshow(drug_stack['nucleus'])
ax[2].imshow(drug_stack["your_fav_protein"])

## Image processing pipelines: an example

So here we have images of fixed cells in three channels -- cell bodies labeled with an actin stain, nuclei labeled with DAPI, and a third protein "your_fav_protein" that responds to drug treatment. Just by looking at the images it seems like the protein is shifting from the nuclei to the cell body once the drug is applied (always visualize your intermediates!), but it is unclear if the drug treatment changes the total amount of protein per cell as well.

To address these questions, you will need to do the following --

1. Make a mask of the actin channel to identify pixels within the cell bodies
2. Make a mask of the nuclear channel to identify pixels within the nucleus
3. Determine the signal coming from *your favorite protein* within these regions of interest. 
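
The three steps above can be sketched on a tiny synthetic example. This is only an illustration under stated assumptions: the arrays, the square "cell" and "nucleus", and the fixed threshold of 50 are all made up, and the real masks will be built more carefully later in the lesson.

```python
import numpy as np

# Synthetic stand-ins for the three channels (hypothetical data, not the workshop images).
actin = np.zeros((8, 8))
actin[1:7, 1:7] = 100.0          # a square "cell body"
nucleus = np.zeros((8, 8))
nucleus[3:5, 3:5] = 100.0        # a smaller "nucleus" inside the cell
protein = np.full((8, 8), 2.0)   # uniform protein signal everywhere

# Steps 1 and 2: boolean masks from an assumed fixed threshold.
cell_mask = actin > 50
nuclear_mask = nucleus > 50
cytoplasm_mask = cell_mask & ~nuclear_mask

# Step 3: sum the protein signal inside each region of interest.
nuclear_signal = protein[nuclear_mask].sum()
cytoplasm_signal = protein[cytoplasm_mask].sum()
total_signal = protein[cell_mask].sum()
print(nuclear_signal, cytoplasm_signal, total_signal)
```

Comparing the nuclear and cytoplasmic sums addresses the relocalization question, while the total addresses the hypothesis about the overall amount of protein.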

Today, we will define the ROIs in the image. We'll find that the quality of masks can be improved by preprocessing the images, specifically by determining thresholds quantitatively and filtering to remove noise. We will then cover how to clean up the mask and turn it into an accurate ROI using morphological image processing. 

Tomorrow (Day 4), we will cover how to design your image processing pipeline to preserve the quantitative integrity of the image data when we pre-process the your_fav_protein image. 

**Preprocessing missteps are a good way to get a paper retracted. We argue that it's easier to make these missteps when doing things manually, but it's not *impossible* to make them computationally. In fact, if you don't check the intermediate steps of your data in either case, it's no good. Always visualize your intermediates!**

### Manual determination of image background

**In retrospect, it's really hard to do this lesson without a discussion of sources (and distributions) of noise and background in an image.** What looks like background can come from read noise, thermal/dark noise, sample autofluorescence, or out-of-focus light. The point here is that it is added on top of the true signal from your localized fluorophores, and so should be subtracted off for quantitative analysis.

**an approach where we add noise back in to the dataset would be really good here**
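
A minimal sketch of such an approach, using purely synthetic data: a known signal gets a constant background offset, Poisson shot noise, and Gaussian read noise added to it, and the background is then estimated from a signal-free corner and subtracted. Every number here (signal level, background, read-noise width, region coordinates) is an assumption chosen for illustration, not a property of the workshop images.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" signal: expected photon counts per pixel from the fluorophores.
true_signal = np.zeros((200, 200))
true_signal[50:150, 50:150] = 50.0

background = 20.0   # assumed constant offset (autofluorescence, out-of-focus light)
read_sigma = 2.0    # assumed Gaussian read-noise standard deviation

# Shot noise: photon counts are Poisson distributed around signal + background.
photons = rng.poisson(true_signal + background)
# Read noise: added by the detector electronics, independent of the signal.
measured = photons + rng.normal(0.0, read_sigma, size=photons.shape)

# Estimate the background from a region known to contain no signal, then subtract it.
bg_estimate = measured[:40, :40].mean()
corrected = measured - bg_estimate

print("estimated background:", bg_estimate)
print("mean corrected signal in the square:", corrected[50:150, 50:150].mean())
```

Because the noise is random, individual corrected pixels scatter around the true value (and some background pixels become negative), but the mean over a region recovers the underlying signal.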

Background subtraction can be a powerful way to remove unwanted signal from image data. For example, if the laser intensity was set too high (and as long as your image is not saturated, because then it's back to the microscope with you!), you can get rid of the excess signal by subtracting a number (called a "threshold") from every pixel in the image. There are many ways to find this number, but first let's quickly check the data type of our image and make sure we're working with 16-bit unsigned integers as expected.

In [8]:
drug_stack['actin'].dtype

Okay, great! Now let's take a small slice of our image, so it's easier to visualize, and start subtracting a constant threshold from that slice.

In [18]:
from ipywidgets import interactive

original_slice = drug_stack["actin"][500:600,50:150]
top = original_slice.max()

@interactive
def manual_background_subtract(bg=(0, top * 0.3, 10)):
    fig, ax = plt.subplots(1, 3, figsize=(9, 4))
    bgs = original_slice.astype('int32') - bg 
    ax[0].imshow(original_slice, vmin=0, vmax=top * 0.8)
    ax[1].imshow(bgs, vmin=0, vmax=top * 0.8)
    ax[2].imshow(bgs[60:80,60:80], vmin=0, vmax=top * 0.01)
manual_background_subtract

Play around with this slider and see what happens as you subtract larger and larger thresholds from the image. Note that in the first two panels the background already appears to be *0*, but if you adjust the contrast and zoom in (panel 3) the background and noise become readily visible.
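
The cell above casts the slice to a signed integer type before subtracting. One reason to be careful here is that arithmetic between unsigned integers wraps around instead of going negative. The values below are made up purely to illustrate the effect:

```python
import numpy as np

pixels = np.array([5, 100, 40000], dtype=np.uint16)
bg = np.uint16(10)

# Unsigned arithmetic wraps around: 5 - 10 becomes 65531 instead of -5.
wrapped = pixels - bg

# Casting to a signed type first keeps the negative value.
signed = pixels.astype(np.int32) - 10

print(wrapped)  # [65531    90 39990]
print(signed)   # [   -5    90 39990]
```

A dim pixel that wraps around suddenly looks like one of the brightest pixels in the image, which is exactly the kind of silent error that visualizing intermediates helps catch.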

How might you determine the background in the presence of noise? 
**answer: you don't**

Do you think "eyeballing it" is a good way to figure out what the threshold should be?

### Automated detection of background threshold using Otsu's method

Nobuyuki Otsu proposed a method (now very widely used) to detect thresholds. Simply put, the idea is to assume that background pixels (unwanted), and foreground pixels (your signal) will follow a bimodal distribution, i.e. that all the background pixels will be a well defined group on a histogram, which will be different from another well defined group that will be brighter, and is the signal that you want.
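
To make the idea concrete, here is a minimal NumPy sketch of the criterion: try every histogram bin as a split point and keep the one that maximizes the variance between the dark and bright groups. The function name `otsu_threshold` and the synthetic test data are assumptions for illustration; this is not scikit-image's implementation, which is what we will actually use.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the histogram bin center that maximizes between-class variance."""
    counts, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Pixel counts in the darker class (bins 0..i) and brighter class (bins i+1..end).
    weight_dark = np.cumsum(counts)[:-1]
    weight_bright = np.cumsum(counts[::-1])[::-1][1:]

    # Mean intensity of each class for every candidate split.
    # Empty classes are divided by 1 instead of 0; their variance term is zero anyway.
    sum_dark = np.cumsum(counts * centers)[:-1]
    sum_bright = np.cumsum((counts * centers)[::-1])[::-1][1:]
    mean_dark = sum_dark / np.maximum(weight_dark, 1)
    mean_bright = sum_bright / np.maximum(weight_bright, 1)

    between_class_variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
    return centers[np.argmax(between_class_variance)]

# Synthetic bimodal data: a dim "background" group and a bright "foreground" group.
rng = np.random.default_rng(0)
background = rng.normal(100, 10, 5000)
foreground = rng.normal(1000, 50, 5000)
t = otsu_threshold(np.concatenate([background, foreground]))
print("threshold:", t)
```

When the two groups are well separated, as here, the threshold falls in the gap between them. When the histogram is not clearly bimodal, the method still returns a number, which is why checking the histogram matters.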

In [55]:
from skimage import filters

thresh = filters.threshold_otsu(drug_stack["actin"])
#thresh2 = filters.threshold_otsu(nodrug_stack["actin"])
print("drug threshold is:", thresh)
#print("no drug threshold is:", thresh2)

Mr. Otsu thinks this is the magic number for this image. Let's just take a quick look at the histogram to see if the background pixels and foreground pixels are as clearly delineated as Mr. Otsu expects.

In [50]:
# Log-scaled histogram of the same image the threshold was computed from
plt.hist(drug_stack["actin"].ravel(), bins=256, log=True)
plt.axvline(thresh, ls='--', lw=2, c='r')

Note the saturation peak. Do you think this may be influencing the quality of the threshold? Repeat the threshold determination, this time excluding the saturated pixels.

In [77]:
# Otsu threshold computed without the saturated pixels (value 2**16 - 1)
thresh_no_sat = filters.threshold_otsu(drug_stack["actin"][drug_stack["actin"] < 2**16 - 1])
# Background estimate: mean of all pixels below that threshold
bg = drug_stack["actin"][drug_stack["actin"] < thresh_no_sat].mean()
Otsu_bgs_slice = original_slice - bg
#thresh2_no_sat = filters.threshold_otsu(nodrug_stack["actin"][nodrug_stack["actin"] < 2**16 - 1])
print("drug threshold is:", thresh_no_sat)
print("drug background is:", bg)
#print("no drug threshold is:", thresh2_no_sat)

fig, ax = plt.subplots(1, 2, figsize=(9, 4))
ax[1].imshow(Otsu_bgs_slice[60:80,60:80], vmin=0, vmax=2**16 * 0.1)

# Draw the histogram and marker lines explicitly on ax[0]; plt.* calls would draw on
# the current axes, which here is the image panel with its inverted y-axis
ax[0].hist(drug_stack["actin"].ravel(), bins=256, log=True)
ax[0].axvline(thresh_no_sat, ls='--', lw=2, c='b') #marks the saturation-corrected threshold
ax[0].axvline(thresh, ls='--', lw=2, c='r') #marks the original threshold
ax[0].axvline(bg, ls='--', lw=2, c='g') #marks the background determined from the thresholds

### Making Masks for image processing

When you're trying to identify objects, you typically want to make a mask. A mask is exactly what it sounds like: something that allows you to work on only some pixels and not others. Here, we'll essentially take most of the information in our 16-bit image and throw it away (!). We will convert the image to **binary**, essentially reducing the bit depth to 1, and in the process set all the pixels below the threshold to zero and all the pixels above the threshold to 1.

Let's see how this works in practice. We can ask the computer to return every pixel that's above the threshold, and take a look at a small portion of what it spits out.

In [88]:
#make a mask using thresh as the threshold
above_thresh = original_slice > thresh
above_thresh[45:50,45:50]

This means that above_thresh is a *boolean array*. This is a series of "True" or "False" statements, one for every pixel in the image, where True implies the pixel is above the threshold, and False implies it's below the threshold. Doing math with True or False statements is awkward (unless you're a philosopher), so we'll replace these with ones and zeros. We can write a "function" so that this takes just one line of code every time we need it.

In [89]:
def threshold_image(img, thresh):
    output = img.copy()
    output[img < thresh] = 0
    output[img >= thresh] = 1
    return output

Now the function will take two "arguments", the image, and the threshold, and output a mask the same size as the image, but consisting of ones and zeros.

In [90]:
masked_slice = threshold_image(original_slice,thresh)
masked_slice.dtype
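
As a side note, here is a small sketch of how the boolean and integer forms of a mask relate, using a made-up 2x2 array and an arbitrary threshold of 100. The variable names are illustrative only.

```python
import numpy as np

img = np.array([[10, 200], [300, 5]], dtype=np.uint16)
thresh_demo = 100   # arbitrary threshold for this toy example

mask_bool = img >= thresh_demo             # boolean array of True/False
mask_int = mask_bool.astype(img.dtype)     # the same mask as ones and zeros

# A boolean mask can index the image directly to pull out the selected pixels.
selected = img[mask_bool]

# Number of pixels that ended up inside the mask.
n_pixels = np.count_nonzero(mask_bool)

print(mask_int)
print(selected, n_pixels)
```

The boolean form is handy for selecting pixels, while the integer form keeps the same dtype as the image, which is what `threshold_image` returns.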

Great! So now we've converted our image slice into an integer array of ones and zeros. Let's take a quick second to see what this does to the entire image. Notice that a threshold should be calculated from the same image it is applied to: here we compute the Otsu threshold directly from the whole image, which matches the *thresh* variable calculated earlier from the full actin channel.

In [91]:
whole_image = drug_stack["actin"]
masked_whole_image = threshold_image(whole_image, filters.threshold_otsu(whole_image))

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(masked_whole_image)
ax[1].imshow(masked_slice)

Notice how the cell bodies have many "holes" in them? Even isolated pixels just hanging out where they clearly don't belong. The impudence! What can we do to get rid of them?

### Removing Shot Noise from your Image -- Median Filtering

To get better at object detection, we can leverage various properties of the pixels. With time, you will be able to leverage pretty much any property you can articulate, but for now, let's use the idea that the pixels that are hanging out in the wrong places are surrounded by other pixels that are properly classified. Let's make them listen to their neighbours. There are many ways to do this. One useful method to know is called Median filtering. It goes pixel by pixel, and replaces each pixel with the median of its surroundings. Let's load our image slice...

In [92]:
# median_filter lives in scipy.ndimage; the scipy.ndimage.filters namespace is deprecated
from scipy.ndimage import median_filter
plt.imshow(original_slice)

Tech tip: These images were taken with a confocal microscope, which uses a PMT (photomultiplier tube) with high sensitivity. However, because this detector operates in a low-photon regime, shot noise (Poisson distributed) can add substantial deviation of pixel values from the local fluorescence intensities they represent. Shot noise is commonly removed with the median filter, although other rank filters exist.
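
To see why a median is good at removing isolated outliers, here is a hand-rolled 3x3 median filter applied to a tiny synthetic array with one extreme pixel. The function `median_filter_3x3` is a hypothetical teaching aid written for this sketch, not SciPy's implementation, and the pixel values are made up.

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighbourhood (edges padded by reflection)."""
    padded = np.pad(img, 1, mode="symmetric")
    out = np.empty_like(img)
    for row in range(img.shape[0]):
        for col in range(img.shape[1]):
            out[row, col] = np.median(padded[row:row + 3, col:col + 3])
    return out

flat = np.full((5, 5), 10, dtype=np.uint16)
noisy = flat.copy()
noisy[2, 2] = 60000   # one isolated outlier pixel

filtered = median_filter_3x3(noisy)
print(filtered)
```

Every 3x3 window contains at most one outlier among nine values, so the median ignores it completely. A mean over the same window would instead smear the outlier into all of its neighbours.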

In [93]:
from ipywidgets import interactive

@interactive
def apply_filter(size=(1, 21)):
    fig, ax = plt.subplots(1, 3, figsize=(10, 5))
    
    # Here we implement the median filtering
    filtered = median_filter(original_slice, size=size)
                             
    ax[0].imshow(original_slice)
    ax[1].imshow(filtered)
    dif_img = filtered.astype('int') - original_slice.astype('int')
    
    extreme = 10000
    im = ax[2].imshow(dif_img, vmin=-extreme, vmax=extreme, cmap='coolwarm')
    
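    print("mean difference in image = " + str(np.mean(dif_img)) + " arbitrary units")
    # Express the mean change relative to the mean intensity of the original slice
    print("percent change = " + str(100 * np.mean(dif_img) / np.mean(original_slice)) + "%") 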
apply_filter



Note that the size of the filter determines how many neighbouring pixels contribute to the median that becomes each output pixel. That means the larger the filter size, the more neighbours the filter will look at before deciding what the new pixel value should be. A good rule of thumb when determining an appropriate filter size is that it should be the smallest filter that sufficiently flattens the visible noise in the background. Many of these operations do not have well-accepted statistical tests for determining the appropriate parameters, so care needs to be taken to record and reproduce processing steps with the same parameters. 
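
One simple way to record processing parameters is to write them to a small JSON file next to the data, in the same spirit as the metadata files loaded earlier. This is only a sketch: the keys, the values, and the file name are placeholder assumptions, not a convention used elsewhere in this workshop.

```python
import json
import os
import tempfile

# Hypothetical parameter record; keys and values are illustrative placeholders.
processing_params = {
    "channel": "actin",
    "filter": "median",
    "filter_size": 3,
    "threshold_method": "otsu",
}

# Written to a temporary directory here; in practice it would sit next to the images.
params_path = os.path.join(tempfile.gettempdir(), "processing_params.json")
with open(params_path, mode="w") as f_params:
    json.dump(processing_params, f_params, indent=2)

# Reload the record later to rerun the pipeline with identical settings.
with open(params_path, mode="r") as f_params:
    reloaded = json.load(f_params)
print(reloaded)
```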

Let's choose a filter size of 3x3.

In [94]:
filtered_slice = median_filter(original_slice, size=3)
filtered_image = median_filter(whole_image, size=3)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(filtered_image)
ax[1].imshow(filtered_slice)

Now let's see how the filtering affects our mask, and compare to the mask we made earlier.

In [95]:
masked_filtered_slice = threshold_image(filtered_slice, filters.threshold_otsu(filtered_slice))
masked_filtered_image = threshold_image(filtered_image, filters.threshold_otsu(filtered_image))

fig, ax = plt.subplots(1, 3, figsize=(10, 5))
ax[0].imshow(masked_filtered_image)
ax[1].imshow(masked_slice)
ax[2].imshow(masked_filtered_slice)

Great, but still not perfect! What do you think would happen if we tried to apply the mask before the filter? (this could be an exercise)

In [96]:
filtered_masked_slice = median_filter(masked_slice, size=3)
filtered_masked_image = median_filter(masked_whole_image, size=3)

fig, ax = plt.subplots(1, 3, figsize=(10, 5))
ax[0].imshow(masked_filtered_slice)
ax[1].imshow(filtered_masked_slice)
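# Cast to a signed type before subtracting so that 0 - 1 gives -1 instead of wrapping around in uint16
ax[2].imshow(masked_filtered_slice.astype('int') - filtered_masked_slice.astype('int'), cmap='coolwarm')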

Why does doing this not make sense? (this leads to discussion of morphological operations...)