# Day 3

Associated learning goal: Preprocess image data without introducing bias in downstream analysis.

- Fix shot noise in the image
    - using a median filter
- Background subtraction
    - manual (as a strawman)
    - Otsu
    - rolling ball (show a failure)
- Discuss the concept of a pre-processing pipeline
- Order the pre-processing steps for the pipeline based on the goals of the project
- Masks and morphological operations
    - Show slicing an image with a mask
    - Erode/dilate
    - Open/close

First, some boilerplate code to load the libraries we need and to make it easier to visualize data in the notebook.

In [6]:
%matplotlib inline

In [34]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [35]:
sns.set_style('dark', rc={'image.cmap':'inferno'})

Let's import an image file and associated metadata as we learnt yesterday!

In [36]:
from skimage.io import imread
data_drug = imread("../data/drug.tif")
data_nodrug = imread("../data/no_drug.tif")

In [42]:
import json
with open('../data/drug.json', mode='r') as f_drug:
    meta_drug = json.load(f_drug)
with open('../data/no_drug.json', mode='r') as f_nodrug:
    meta_nodrug = json.load(f_nodrug)

drug_stack = {}
nodrug_stack = {}
for idx, channel in enumerate(meta_drug['channels']):
    drug_stack[channel] = data_drug[:,:,idx]
    nodrug_stack[channel] = data_nodrug[:,:,idx]    

Let's display the image to make sure everything worked as expected...

In [63]:
plt.imshow(drug_stack["nucleus"][:100,:100])

In [67]:
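# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the closest replacement
sns.histplot(drug_stack["nucleus"][0:100,0:100].flatten(), kde=True, stat="density")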

## Image processing pipelines: an example

Now that we have our data loaded into Python, let's see what pre-processing can do for us.

TODO: before and after of an image with shot noise, background objects, and uneven illumination.

We want to view the fluorescence intensity of cells in this image. We will need to reliably find cell bodies and be careful not to adjust their pixel intensities.

**Preprocessing missteps are a good way to get a paper retracted. We argue that these missteps are easier to make when working manually, but they are certainly not *impossible* to make computationally. Either way, a result cannot be trusted unless you have checked the intermediate steps. Always visualize your intermediates.**

So, let's get started!

### Removing Shot Noise from your Image -- Median Filtering

In [62]:
# import from scipy.ndimage directly; the scipy.ndimage.filters namespace is deprecated
from scipy.ndimage import median_filter
filtered = median_filter(drug_stack["nucleus"],size=5)
plt.imshow(filtered[:100,:100])
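To see why a median filter suits isolated outlier pixels, here is a minimal sketch on synthetic data rather than on the drug image. The array `clean`, the outlier value of 4000, and the regular grid of outliers are all made up for illustration; real noise lands at random positions, so the clean-up will rarely be this perfect.

```python
import numpy as np
from scipy.ndimage import median_filter

# Synthetic flat image (hypothetical values, not the course data).
clean = np.full((50, 50), 100, dtype=np.uint16)

# Plant isolated bright pixels on a sparse grid, so that every 3x3
# neighbourhood contains at most one outlier.
noisy = clean.copy()
noisy[5::10, 5::10] = 4000

# Each output pixel becomes the median of its 3x3 neighbourhood.
denoised = median_filter(noisy, size=3)

n_outliers_before = int((noisy != clean).sum())
n_outliers_after = int((denoised != clean).sum())
print(n_outliers_before, n_outliers_after)
```

Because each window holds at most one outlier among nine values, the median ignores it and returns the background value. A mean filter over the same window would instead smear every outlier across its neighbours.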

In [81]:
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
ax[1].imshow(filtered[:100,:100])
ax[0].imshow(drug_stack['nucleus'][:100,:100])
dif_img = filtered[:100,:100].astype('int') - drug_stack['nucleus'][:100,:100].astype('int')
extreme = abs(dif_img).max()
ax[2].imshow(dif_img, vmin=-extreme, vmax=extreme, cmap='coolwarm')
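The `.astype('int')` casts in the difference image matter. Microscopy TIFFs are often stored as unsigned integers (assumed here to be `uint16`; the notebook does not state the dtype), and subtracting unsigned arrays wraps around instead of going negative. A small sketch with hypothetical pixel values:

```python
import numpy as np

# Two tiny hypothetical uint16 "images" (values invented for illustration).
filtered_px = np.array([10, 200], dtype=np.uint16)
raw_px = np.array([20, 100], dtype=np.uint16)

# Subtracting unsigned arrays keeps the uint16 dtype, so 10 - 20 wraps around.
wrapped = filtered_px - raw_px

# Casting to a signed integer type first gives the true difference.
signed = filtered_px.astype(int) - raw_px.astype(int)

print(wrapped)  # first entry is a large positive number, not -10
print(signed)   # first entry is -10
```

Without the cast, every pixel that the filter made darker would show up in the difference image as a very bright value.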

In [90]:
from ipywidgets import interactive

@interactive
def apply_filter(size=(1, 21)):
    fig, ax = plt.subplots(1, 2, figsize=(9, 4))
    filtered = median_filter(drug_stack['nucleus'][:100,:100], size=size)
    ax[0].imshow(filtered)
    dif_img = filtered.astype('int') - drug_stack['nucleus'][:100,:100].astype('int')
    extreme = 10000  # fixed limit keeps the color scale the same for every size; alternative: abs(dif_img).max()
    im = ax[1].imshow(dif_img, vmin=-extreme, vmax=extreme, cmap='coolwarm')
    #plt.colorbar(im, ax=ax[1])
    
apply_filter
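The slider above trades noise removal against loss of real structure. As a rough illustration of that trade-off, the sketch below filters a synthetic image containing a single 3x3 bright feature; the image, the feature size, and the two window sizes are assumptions chosen to make the effect easy to see.

```python
import numpy as np
from scipy.ndimage import median_filter

# Hypothetical image: dark background with one 3x3 bright feature.
img = np.zeros((21, 21), dtype=np.uint16)
img[9:12, 9:12] = 1000

# A window the same size as the feature, and one much larger than it.
small = median_filter(img, size=3)
large = median_filter(img, size=7)

n_small = int((small > 0).sum())  # feature pixels surviving the 3x3 window
n_large = int((large > 0).sum())  # feature pixels surviving the 7x7 window
print(n_small, n_large)
```

The 3x3 window already rounds off the corners of the feature, and the 7x7 window erases it entirely, because the feature never fills more than half of that window. Any real object smaller than about half the window area is treated as noise.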

In [94]:
global_median = np.median(drug_stack['nucleus'][:100,:100])
"global median = {}, {:.2%} of max".format(global_median, global_median / drug_stack['nucleus'][:100,:100].max())