# Day 3

Will be done ASAP:

- Get a noisier actin signal dataset and a stronger nuclear signal dataset
- Re-order processing steps
- Consider a quick discussion of different types of noise
- Consider re-organizing the Day 3/4 and Day 5/6 lesson plans into "fixed images" and "dynamic images", since it's hard to motivate pre-processing without what we wanted to put into "quantification"

Associated learning goal: Preprocess image data without introducing bias in downstream analysis.

- Fix shot noise in the image
    - using a median filter
- Background subtraction
    - manual (as a strawman)
    - Otsu
    - rolling ball (show a failure)
- Discuss the concept of a pre-processing pipeline
- Order the pre-processing steps for the pipeline based on the goals of the project
- Masks and morphological operations
    - Show slicing an image with a mask
    - Erode/dilate
    - Open/close
    
Other themes to keep in mind that can be used as a foundation for this lesson:

- Think about the downstream analysis you want to do on the data when deciding which preprocessing steps to apply.
- There will be many instances when you need to determine the parameters by inspecting intermediate steps.
- It is especially important in these instances to use consistent parameters, loop through the data, and report the parameters used (or better yet, make the code available) so that the analysis can be reproduced.
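As a rough illustration of the last point, here is a minimal sketch of applying one fixed parameter set to every image and writing those parameters out next to the results. The `preprocess` function, the parameter names, and the synthetic images are all hypothetical stand-ins, not part of the lesson data.

```python
import json

import numpy as np
from scipy.ndimage import median_filter

# Hypothetical parameter set: names and values are illustrative only.
params = {"median_size": 3, "background": 5}


def preprocess(image, median_size, background):
    """Smooth with a median filter, then subtract a constant background."""
    smoothed = median_filter(image, size=median_size)
    # Work in a signed type so the subtraction cannot wrap around, then clip at 0.
    return np.clip(smoothed.astype(np.int64) - background, 0, None)


# Synthetic stand-ins for real images (assumption: 2D unsigned-integer arrays).
rng = np.random.default_rng(0)
images = {
    name: rng.poisson(20, size=(64, 64)).astype(np.uint16)
    for name in ("drug", "no_drug")
}

# Loop over the data with the *same* parameters for every image.
processed = {name: preprocess(img, **params) for name, img in images.items()}

# Report the parameters so the analysis can be reproduced.
report = json.dumps(params, sort_keys=True)
print(report)
```

Keeping the parameters in one dictionary means there is a single place to change them and a single thing to save alongside the output.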

First some boilerplate code to make it easier to access useful libraries, and to make it easier to visualize data in the notebook.

In [1]:
%matplotlib inline

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
sns.set_style('dark', rc={'image.cmap':'inferno'})

Let's import an image file and associated metadata as we learnt yesterday!

In [4]:
from skimage.io import imread
data_drug = imread("../data/drug.tif")
data_nodrug = imread("../data/no_drug.tif")

In [5]:
import json
with open('../data/drug.json', mode='r') as f_drug:
    meta_drug = json.load(f_drug)
with open('../data/no_drug.json', mode='r') as f_nodrug:
    meta_nodrug = json.load(f_nodrug)

drug_stack = {}
nodrug_stack = {}
for idx, channel in enumerate(meta_drug['channels']):
    drug_stack[channel] = data_drug[:,:,idx]
    nodrug_stack[channel] = data_nodrug[:,:,idx]    
    print(channel)

Let's display the images to make sure everything worked as expected...

In [7]:
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(drug_stack["actin"])
ax[1].imshow(drug_stack['nucleus'])
ax[2].imshow(drug_stack["your_fav_protein"])

fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[0].imshow(nodrug_stack["actin"])
ax[1].imshow(nodrug_stack['nucleus'])
ax[2].imshow(nodrug_stack["your_fav_protein"])

#here we inject some story line. 

Use the actin channel to segment the cell body
Use the nuclear channel to determine the location of the nucleus.
Determine if the localization of yfp changes with drug treatment, so we need to be careful with background subtraction. Demonstrate that the median filter actually changes the mean values of the image and foreground, so it should be used only to smooth the image for masking.

In [8]:
# sns.distplot is deprecated in recent seaborn releases; histplot draws the histogram
sns.histplot(drug_stack["actin"].flatten())

## Image processing pipelines: an example

Image data can be corrupted by a number of detector, optical, and statistical issues that obscure fluorescence quantification. Preprocessing can help correct for these problems.

We want to view the fluorescence intensity of cells in this image. We will need to reliably find cell bodies and be careful not to adjust their pixel intensities.

**Preprocessing missteps are a good way to get a paper retracted. We argue that these missteps are easier to make when doing things manually, but they are not *impossible* to make computationally. Either way, if you don't check the intermediate steps of your processing, you cannot trust the result. Always visualize your intermediates.**

So, let's get started!

### Removing background from images: manual correction

In [22]:
from ipywidgets import interactive

@interactive
def manual_background_subtract(bg=(1, 10)):
    fig, ax = plt.subplots(1, 2, figsize=(9, 4))
    crop = drug_stack['your_fav_protein'][500:600,50:150]
    # Cast to a signed type so subtracting bg cannot wrap around for unsigned pixels
    bgs = crop.astype('int') - bg
    ax[0].imshow(crop)
    # bgs is already the cropped region, so it must not be sliced again
    ax[1].imshow(bgs)
manual_background_subtract
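
One pitfall worth showing with manual subtraction: microscope images are usually stored as unsigned integers, and subtracting a background larger than a pixel value wraps around to a huge number instead of going negative. The sketch below uses a tiny made-up array (not the lesson data) to show the problem and one possible workaround.

```python
import numpy as np

# Made-up 2x2 "image" stored as unsigned 16-bit integers.
raw = np.array([[3, 12], [7, 200]], dtype=np.uint16)
bg = 10

# Subtracting in uint16 wraps around: 3 - 10 becomes 65529, not -7.
wrapped = raw - np.uint16(bg)
print(wrapped)

# Casting to a signed type first gives the expected arithmetic.
# Clipping at zero is a design choice, not a requirement.
safe = np.clip(raw.astype(np.int32) - bg, 0, None)
print(safe)
```

Note that clipping at zero is itself a processing decision that can shift the mean of dim regions, so it may be preferable to keep the signed values for quantification and clip only for display.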

### Removing Shot Noise from your Image -- Median Filtering

We will start this exercise by determining which pixels in the image correspond to the cytoplasm, and which pixels correspond to the nucleus. 

In [55]:
#the quantitative utility of filters for anything aside from making images look nicer and smoothing for masking is questionable. We need to think about how this fits into the lesson. I argue for background subtraction first.

These images were taken with a confocal microscope, which uses a highly sensitive PMT (photomultiplier tube) as its detector. Because this detector operates in a low-photon regime, shot noise (which is Poisson distributed) can make pixel values deviate substantially from the local fluorescence intensities they represent. Shot noise is commonly removed with a median filter, although other rank filters exist.
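
To make the idea concrete, here is a small sketch on synthetic data (not the lesson images). It assumes a made-up "true" signal of a bright square on a dim background, draws Poisson-distributed photon counts from it, and compares the error before and after a 3x3 median filter.

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)

# Hypothetical "true" signal: dim background (5 photons) with a bright square (50).
truth = np.full((100, 100), 5.0)
truth[30:70, 30:70] = 50.0

# Shot noise: each pixel is a Poisson draw whose mean is the true intensity.
noisy = rng.poisson(truth)

# For Poisson noise the variance equals the mean, so the background
# variance should be close to 5.
background_var = noisy[truth == 5.0].var()

# A 3x3 median filter suppresses the pixel-to-pixel fluctuations.
filtered = median_filter(noisy, size=3)

rmse_noisy = np.sqrt(np.mean((noisy - truth) ** 2))
rmse_filtered = np.sqrt(np.mean((filtered - truth) ** 2))
print("background variance:", background_var)
print("RMSE before filtering:", rmse_noisy)
print("RMSE after filtering:", rmse_filtered)
```

Here the truth is known, so the error can be measured directly. With real images there is no ground truth, which is why the intermediate images need to be inspected by eye.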

In [52]:
# The scipy.ndimage.filters namespace is deprecated; import from scipy.ndimage directly
from scipy.ndimage import median_filter
plt.imshow(drug_stack["actin"][500:600,50:150])

In [78]:
from ipywidgets import interactive

@interactive
def apply_filter(size=(1, 21)):
    fig, ax = plt.subplots(1, 3, figsize=(9, 4))
    crop = drug_stack['actin'][500:600,50:150]
    filtered = median_filter(crop, size=size)
    ax[0].imshow(crop)
    ax[1].imshow(filtered)
    # Signed difference image: cast first so unsigned pixel values cannot wrap around
    dif_img = filtered.astype('int') - crop.astype('int')
    extreme = 10000#abs(dif_img).max()
    im = ax[2].imshow(dif_img, vmin=-extreme, vmax=extreme, cmap='coolwarm')
    #plt.colorbar(im, ax=ax[1])
    print("mean difference per pixel = " + str(np.mean(dif_img)) + " arbitrary units")
    # Percent change is measured relative to the mean of the unfiltered crop
    print("percent change in mean = " + str(100 * np.mean(dif_img) / np.mean(crop)) + "%")
apply_filter



Note that the size of the filter affects the mean pixel value of the output. A good rule of thumb is to choose the smallest filter that sufficiently flattens the visible noise in the background. Many of these operations do not have well-accepted statistical tests for determining the appropriate parameters, so take care to record the processing steps and reproduce them with the same parameters.
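
The dependence of the mean on filter size can be checked without the widget. The following is a minimal sketch on a synthetic, pure-noise image (an assumption for illustration, not the lesson data) that loops over a few filter sizes and records the percent change in the mean.

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(1)

# Synthetic low-photon image: pure Poisson noise with a mean of 3 counts.
image = rng.poisson(3, size=(128, 128))
original_mean = image.mean()

# Percent change in the mean for each (illustrative) filter size.
changes = {}
for size in (1, 3, 5, 9):
    filtered = median_filter(image, size=size)
    changes[size] = 100 * (filtered.mean() - original_mean) / original_mean
    print("size", size, "-> percent change in mean:", changes[size])
```

A size of 1 leaves the image untouched, so its change is zero. Larger windows pull each pixel toward the local median, which for a skewed distribution such as Poisson is not the same as the local mean.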

Let's choose a filter size of 3x3.