# Thresholding, masking and preprocessing


Fluorescence datasets ideally reflect a relationship between the pixels in an image and the location and local density of your fluorescent molecule in a sample. However, properties of the detectors, optics, or even the samples can confound direct interpretation of this data. Here we will present some operations that can mitigate these effects to achieve robust hypothesis testing.

Today, we will explore ways to automatically define ROIs (regions of interest) and to correct aberrations in the data that make such analysis difficult on raw images.

Episode 1: Load, understand and visualize the data  
> Mostly a recap of Lesson 1


Episode 2: Thresholding
> a) Global threshold and masking     
b) Automatic threshold detection  

Episode 3: Filtering and background subtraction
> a) The need for preprocessing  
b) Filtering out noise   
c) Background subtraction  
d) Combining preprocessing + thresholding  

***
# Episode 1: Load, understand and visualize the data

First, let's import packages and set some plotting defaults.
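The import cell itself is not reproduced in this text. Below is a minimal sketch of what it might contain, assuming NumPy, Matplotlib and scikit-image are installed. The plotting defaults are illustrative choices, not the lesson's own:

```python
import numpy as np
import matplotlib.pyplot as plt
import skimage

# Illustrative plotting defaults (assumed; adjust to taste)
plt.rcParams['figure.figsize'] = (12, 4)       # wide figures suit side-by-side channel panels
plt.rcParams['image.cmap'] = 'gray'            # show single-channel images in grayscale
plt.rcParams['image.interpolation'] = 'none'   # display raw pixels without smoothing

# Some scikit-image APIs changed between releases, so it helps to know the version
print('scikit-image version:', skimage.__version__)
```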


### Episode 1a) Load the images


#### <font color='red'> Exercise</font>: load lesson 2 data  
Use what you learned in Lesson 1 to load the Lesson 2 image data, 'HAC-Cit-KRAB+dox_s41_t60.tif', and call it `raw_data`.  
Hint:   
Step 1) Find the data directory  
Step 2) Use skimage.io.imread to load the image data  



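```python
import os
from skimage.io import imread

data_fn = 'HAC-Cit-KRAB+dox_s41_t60.tif'
data_dir = ''  # set to the folder containing the Lesson 2 data ('' means the current directory)

raw_data = imread(os.path.join(data_dir, data_fn))
```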


Check the dimensions of the data.
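Loading the image should give a NumPy array. The sketch below summarizes such an array. The helper name `describe_stack` and the stand-in array are hypothetical. The axis order of the real TIFF (channels first or last) is also an assumption, so check it against the printed shape:

```python
import numpy as np

def describe_stack(stack):
    """Summarize the layout and value range of an image array."""
    return {
        'shape': stack.shape,           # often (n_channels, rows, cols) for a multichannel TIFF
        'dtype': str(stack.dtype),      # bit depth, e.g. uint16 for 16-bit camera data
        'min': stack.min().item(),      # .item() converts the NumPy scalar to a Python number
        'max': stack.max().item(),
    }

# Hypothetical stand-in for raw_data so the sketch runs on its own;
# with the lesson data, call describe_stack(raw_data) instead.
rng = np.random.default_rng(0)
stand_in = rng.integers(100, 4000, size=(3, 128, 128), dtype=np.uint16)
print(describe_stack(stand_in))
```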

### Episode 1b) Load the metadata

Load the metadata, which is in JSON format.

Take a look at the metadata dictionary.
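The metadata file's name and keys are not given here. The sketch below therefore writes a small hypothetical JSON file (the `channel_names` key and its values are invented for illustration) and reads it back with the standard-library `json` module:

```python
import json
import os
import tempfile

def load_metadata(path):
    """Read a JSON metadata file into a Python dictionary."""
    with open(path) as f:
        return json.load(f)

# Hypothetical metadata written to a temporary file so the sketch runs on its own.
# The real file's name and keys may differ; inspect metadata.keys() to find them.
example = {'channel_names': ['DIC', 'YFP', 'channel_3']}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump(example, tmp)

metadata = load_metadata(tmp.name)
os.remove(tmp.name)

# Take a look at the top-level keys and values
for key, value in metadata.items():
    print(f'{key}: {value}')
```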

Let's re-label the slices with names, instead of numbers.
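One way to re-label the slices is a dictionary that maps each channel name to its 2D slice. The helper `label_channels` and the names used below are hypothetical. Use the names stored in your metadata and the channel axis of your own array:

```python
import numpy as np

def label_channels(stack, channel_names, axis=0):
    """Map each slice of `stack` along `axis` to a channel name."""
    slices = np.moveaxis(stack, axis, 0)   # put the channel axis first
    if len(channel_names) != slices.shape[0]:
        raise ValueError('need exactly one name per slice')
    return dict(zip(channel_names, slices))

# Hypothetical stand-ins for raw_data and the names read from the metadata
stack = np.arange(3 * 4 * 4).reshape(3, 4, 4)
channels = label_channels(stack, ['DIC', 'YFP', 'channel_3'])
print(channels['YFP'].shape)
```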

#### <font color='red'> Exercise</font>: Visualize the images  
Use what you learned in Lesson 1 to display each channel, and label each image panel with the channel name.
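A possible sketch for this exercise, assuming the channels are stored in a name-to-image dictionary as described above. The helper `show_channels` and the random stand-in images are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_channels(channels):
    """Display each channel in its own panel, titled with the channel name."""
    n = len(channels)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), squeeze=False)
    for ax, (name, img) in zip(axes[0], channels.items()):
        ax.imshow(img, cmap='gray')
        ax.set_title(name)
        ax.axis('off')
    fig.tight_layout()
    return fig

# Hypothetical stand-in channels; with the lesson data, pass the dictionary of named slices
rng = np.random.default_rng(1)
demo = {name: rng.random((64, 64)) for name in ['DIC', 'YFP', 'channel_3']}
fig = show_channels(demo)
plt.show()
```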

***
# Episode 2: Thresholding and masking

Suppose we want to automatically select regions of interest (ROIs) from these images, for example to count cells or to measure the intensity of a fluorescent signal within them. A simple first approach is to threshold the image based on pixel intensity.

To illustrate, we will focus on the YFP channel.   
Let's first split up our channels into descriptive variable names...

### Episode 2a) Global thresholding and masking


Let's create a __mask__: a binary image in which pixels whose intensity exceeds a given threshold are set to 1 and all other pixels are set to 0.  
How should we choose the threshold value? First, let's get a sense of the range of pixel intensities in the image by looking at a histogram.
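Here is a sketch of the histogram step using Matplotlib's `hist`. A log-scaled count axis is a common choice because background pixels often outnumber foreground pixels by orders of magnitude. The helper name and the stand-in image are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_intensity_histogram(img, bins=256):
    """Plot a histogram of pixel intensities with a log-scaled count axis."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(img.ravel(), bins=bins, log=True)  # log=True keeps rare bright pixels visible
    ax.set_xlabel('pixel intensity')
    ax.set_ylabel('number of pixels')
    return counts, edges

# Hypothetical stand-in: dim background plus one bright square
rng = np.random.default_rng(2)
img = rng.normal(200, 20, size=(128, 128))
img[40:80, 40:80] += 800
counts, edges = plot_intensity_histogram(img)
plt.show()
```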

OK, now let's pick a threshold and make a mask...
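Making the mask takes a single comparison. In this sketch, both the stand-in YFP image and the threshold value of 600 are hypothetical. With real data, read a value off the histogram, for example in the valley between the background peak and the brighter signal:

```python
import numpy as np

# Hypothetical stand-in for the YFP channel: dim background plus one bright square "cell"
rng = np.random.default_rng(2)
yfp = rng.normal(200, 20, size=(128, 128))
yfp[40:80, 40:80] += 800

threshold = 600            # hand-picked for this stand-in: between the two intensity peaks
mask = yfp > threshold     # boolean array: True (1) above the threshold, False (0) otherwise
print(f'{mask.mean():.1%} of pixels are above the threshold')
```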

This works, but the goal was to *automate* this process. How do we choose an appropriate threshold value without user input? Fortunately, scikit-image includes several automatic threshold calculation methods. Most of them compute the threshold from the distribution (histogram) of pixel intensity values.
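The following NumPy-only sketch of Otsu's method shows how a threshold can be derived from the intensity distribution alone. It tries every histogram bin as a cut point and keeps the one that maximizes the between-class variance $w_0 w_1 (\mu_0 - \mu_1)^2$. Here $w$ is the pixel count and $\mu$ the mean intensity of each class. The sketch is for illustration only; `skimage.filters.threshold_otsu` implements the same idea:

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Return the intensity that best splits the histogram into two classes (Otsu's method)."""
    counts, edges = np.histogram(img.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    counts = counts.astype(float)

    # Class sizes for every possible cut point
    w0 = np.cumsum(counts)                  # pixels in bins 0..i
    w1 = np.cumsum(counts[::-1])[::-1]      # pixels in bins i..end

    # Class means; np.maximum avoids dividing by zero for empty classes
    mu0 = np.cumsum(counts * centers) / np.maximum(w0, 1)
    mu1 = (np.cumsum((counts * centers)[::-1]) / np.maximum(np.cumsum(counts[::-1]), 1))[::-1]

    # Between-class variance when the cut falls between bin i and bin i + 1
    between = w0[:-1] * w1[1:] * (mu0[:-1] - mu1[1:]) ** 2
    return centers[np.argmax(between)]

# Hypothetical bimodal stand-in: dim background with one bright square
rng = np.random.default_rng(3)
img = rng.normal(200, 20, size=(128, 128))
img[40:80, 40:80] += 800
t = otsu_threshold(img)
print(f'Otsu threshold: {t:.1f}')
```

When the two intensity populations are separated by empty bins, every cut inside the gap scores the same. `np.argmax` then returns the lowest one, so the threshold sits at the upper edge of the background distribution.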

### Episode 2b) Automatic threshold detection

Now let's use some built-in automatic threshold algorithms to define a threshold for our image  
https://scikit-image.org/docs/dev/api/skimage.filters.html  
https://en.wikipedia.org/wiki/Otsu%27s_method  
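This sketch compares a few of the threshold functions in `skimage.filters` on a hypothetical stand-in image. The choice of methods is illustrative; the module documents more:

```python
import numpy as np
from skimage import filters

# Hypothetical stand-in for the YFP channel
rng = np.random.default_rng(4)
yfp = rng.normal(200, 20, size=(128, 128))
yfp[40:80, 40:80] += 800

# A few of the histogram-based methods available in skimage.filters
methods = {
    'Otsu': filters.threshold_otsu,
    'Li': filters.threshold_li,
    'Yen': filters.threshold_yen,
    'Triangle': filters.threshold_triangle,
}
thresholds = {name: func(yfp) for name, func in methods.items()}
for name, value in thresholds.items():
    print(f'{name:>8}: {value:.1f}')
```

`skimage.filters.try_all_threshold(image)` draws a panel of masks for many methods at once, which offers a quick way to compare them on your own image.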


#### <font color='red'> Exercise</font>: make a mask function
First, since we will be doing this a lot, let's define our own function that calculates a mask given 1) the original image and 2) a threshold value. Give the function an informative name.
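Here is a minimal version of such a function. The name `get_mask` is a hypothetical choice. It returns a boolean array in which `True`/`False` play the role of 1/0:

```python
import numpy as np

def get_mask(image, threshold):
    """Return a binary mask: True where image > threshold, False elsewhere."""
    return np.asarray(image) > threshold

# Usage on a tiny example
demo = np.array([[10, 200], [50, 400]])
print(get_mask(demo, 100).astype(int))
# [[0 1]
#  [0 1]]
```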

Now let's look at the masks generated using these automatic thresholds. 
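One way to display the results side by side is sketched below with a hypothetical helper, `show_masks`, and a stand-in image. Pass in whichever thresholds you computed:

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import filters

def show_masks(image, thresholds):
    """Plot the image next to one mask per named threshold."""
    n = len(thresholds) + 1
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    axes = axes[0]
    axes[0].imshow(image, cmap='gray')
    axes[0].set_title('image')
    for ax, (name, value) in zip(axes[1:], thresholds.items()):
        ax.imshow(image > value, cmap='gray')   # mask for this threshold
        ax.set_title(f'{name} ({value:.0f})')
    for ax in axes:
        ax.axis('off')
    return fig

# Hypothetical stand-in for the YFP channel
rng = np.random.default_rng(5)
yfp = rng.normal(200, 20, size=(128, 128))
yfp[40:80, 40:80] += 800
thresholds = {'Otsu': filters.threshold_otsu(yfp), 'Li': filters.threshold_li(yfp)}
fig = show_masks(yfp, thresholds)
plt.show()
```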

***
# Episode 3: Filtering and background subtraction

### Episode 3a) The need for preprocessing

To illustrate the need for preprocessing, let's try to automatically threshold the other fluorescent channels in our data.

Uh oh: our simple automatic thresholding fails for the other two channels. There are at least two reasons for these failures:  
1) Salt-and-pepper noise (isolated very bright or very dark pixels)  
2) An uneven background caused by nonuniform illumination or other optical aberrations

We'll remove the salt-and-pepper noise first, by __filtering__.

We need to import packages and filters:
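The original import cell is not shown. Below is a plausible set for this episode, assuming a recent scikit-image (`rolling_ball` first appeared in version 0.19):

```python
import numpy as np
from skimage.filters import gaussian, median    # smoothing and noise-removal filters
from skimage.morphology import disk             # circular neighborhood ("footprint") for median
from skimage.restoration import rolling_ball    # background estimation
```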

### Episode 3b) Filtering out noise


Now, try using a median filter to remove the salt-and-pepper noise. Can you see differences in the images? Play with the filter size and see how it affects the result.
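The sketch below applies a median filter at several footprint sizes to a hypothetical stand-in: a smooth ramp with isolated hot ("salt") and dead ("pepper") pixels. Because each outlier is isolated, the median of its small neighborhood ignores it. Larger footprints remove larger specks but also blur fine detail:

```python
import numpy as np
from skimage.filters import median
from skimage.morphology import disk

# Hypothetical noisy stand-in: 16-bit horizontal ramp with isolated outliers
clean = np.tile(np.linspace(100, 200, 64), (64, 1)).astype(np.uint16)
noisy = clean.copy()
noisy[7::10, 7::10] = 4000   # "salt": isolated hot pixels
noisy[2::10, 2::10] = 0      # "pepper": isolated dead pixels

# Compare footprint radii; cast to int before subtracting to avoid uint16 wrap-around
filtered = {r: median(noisy, disk(r)) for r in (1, 2, 3)}
for r, img in filtered.items():
    err = np.abs(img.astype(int) - clean.astype(int)).max()
    print(f'radius {r}: max abs difference from the clean ramp = {err}')
```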

### Episode 3c) Background subtraction

We need to remove the uneven illumination. To do this, we will subtract a background image, ideally leaving intact the image structure we care about. The radius used to estimate the background should be at least the size of the largest object that is not part of the background.

Next, let's subtract the background from the dataset.
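The text does not say which background estimator the lesson uses. This sketch assumes scikit-image's `skimage.restoration.rolling_ball`, whose `radius` parameter matches the advice above. The helper name `subtract_background` and the stand-in image are hypothetical:

```python
import numpy as np
from skimage.restoration import rolling_ball

def subtract_background(image, radius=50):
    """Estimate a smooth background with a rolling ball and subtract it.

    radius: ball radius. Following the advice above, it should be at least the size
    of the largest object that is not background; 50 is only a placeholder.
    """
    image = np.asarray(image, dtype=float)           # float avoids wrap-around with unsigned ints
    background = rolling_ball(image, radius=radius)  # estimated background, at or below the image
    return image - background, background

# Hypothetical stand-in: a gentle illumination gradient plus one small bright object
_, xx = np.mgrid[0:64, 0:64]
image = 100 + 0.2 * xx
image[30:33, 30:33] += 1000

corrected, background = subtract_background(image, radius=10)
print(f'corrected background range: {corrected[:20].min():.2f} to {corrected[:20].max():.2f}')
print(f'corrected object intensity: {corrected[31, 31]:.1f}')
```

Converting to float before subtracting matters for unsigned integer images such as 16-bit camera data, where a negative result would otherwise wrap around to a huge value.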

With our preprocessed dataset, let's try thresholding again.
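This sketch runs the whole comparison on a hypothetical stand-in channel with uneven illumination, noise, a few isolated hot pixels and one modestly bright object. Thresholding the raw image lets the hot pixels pull the Otsu threshold up to the top of the object's intensity range. After median filtering and rolling-ball subtraction (the assumed preprocessing from the previous parts), the object should separate from the background:

```python
import numpy as np
from skimage.filters import median, threshold_otsu
from skimage.morphology import disk
from skimage.restoration import rolling_ball

# Hypothetical stand-in for a difficult channel
rng = np.random.default_rng(8)
_, xx = np.mgrid[0:96, 0:96]
raw = 100 + 0.5 * xx + rng.normal(0, 2, size=(96, 96))   # uneven illumination + noise
raw[40:56, 40:56] += 60                                    # one modestly bright object
raw[3::11, 5::11] = 2000                                   # isolated hot pixels

# Thresholding the raw image directly
naive_mask = raw > threshold_otsu(raw)

# Preprocess first (median filter, then rolling-ball background subtraction), then threshold
denoised = median(raw, disk(1))
corrected = denoised - rolling_ball(denoised, radius=20)
clean_mask = corrected > threshold_otsu(corrected)

print('naive mask pixels:', naive_mask.sum())
print('preprocessed mask pixels:', clean_mask.sum())
```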

### Episode 3d) Combining preprocessing + masking
#### <font color='red'> Exercise</font>: Make functions for an image processing pipeline   
We have already defined a simple function that returns a mask given an image and a threshold. Now write a preprocessing function that could run before the masking function in an image processing pipeline. It should remove salt-and-pepper noise and also subtract the background.

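```python
from skimage.filters import median
from skimage.morphology import disk
from skimage.restoration import rolling_ball  # one possible background estimator (scikit-image >= 0.19)

def get_preprocessed_image(raw_data, median_radius=1, ball_radius=50):
    # Illustrative default radii; ball_radius should exceed the largest foreground object
    denoised = median(raw_data, disk(median_radius)).astype(float)            # remove salt & pepper noise
    preprocessed_data = denoised - rolling_ball(denoised, radius=ball_radius)  # remove uneven background
    return preprocessed_data
```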

Let's put these two functions together and make a mask from noisy data in just a couple lines of code
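Here is a sketch of such a pipeline. The function names follow the exercise, but their bodies, the default radii and the wrapper `segment` are hypothetical. Otsu serves as the automatic threshold:

```python
import numpy as np
from skimage.filters import median, threshold_otsu
from skimage.morphology import disk
from skimage.restoration import rolling_ball

def get_preprocessed_image(raw_data, median_radius=1, ball_radius=50):
    denoised = median(raw_data, disk(median_radius)).astype(float)  # remove salt & pepper noise
    return denoised - rolling_ball(denoised, radius=ball_radius)     # remove uneven background

def get_mask(image, threshold):
    return image > threshold

def segment(raw_data, **preprocess_kwargs):
    """Preprocess, pick an Otsu threshold automatically, and return the mask."""
    preprocessed = get_preprocessed_image(raw_data, **preprocess_kwargs)
    return get_mask(preprocessed, threshold_otsu(preprocessed))

# Hypothetical stand-in: uneven, noisy background, one object, isolated hot pixels
rng = np.random.default_rng(9)
_, xx = np.mgrid[0:96, 0:96]
raw = 100 + 0.5 * xx + rng.normal(0, 2, size=(96, 96))
raw[40:56, 40:56] += 60
raw[3::11, 5::11] = 2000
mask = segment(raw, ball_radius=20)
```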

### The limits of global intensity thresholding

We haven't yet tried any processing on our brightfield channel. Let's see how the methods we tried above work on these data.

#### Edge / contrast-enhancing filters   
The methods above rely on the intensity of each pixel, but we see the cells in the DIC channel because of the contrast in the image. To pick out high-contrast regions, we need an edge-detection filter.
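This sketch uses the Sobel filter from `skimage.filters`, one common edge-detection filter, on a hypothetical DIC-like stand-in. The "cell" has the same mean intensity inside and outside, so an intensity threshold cannot find it, but its rims produce strong gradients:

```python
import numpy as np
from skimage.filters import sobel, threshold_otsu

# Hypothetical DIC-like stand-in: a round "cell" with a bright rim on one side and a dark rim on the other
yy, xx = np.mgrid[0:128, 0:128]
cell = ((yy - 64) ** 2 + (xx - 64) ** 2 < 20 ** 2).astype(float)
relief = np.roll(cell, 1, axis=1) - cell    # +1 just outside the right edge, -1 along the inside of the left edge
dic = 500 + 100 * relief + np.random.default_rng(10).normal(0, 2, cell.shape)

edges = sobel(dic)                           # gradient magnitude: large where intensity changes quickly
edge_mask = edges > threshold_otsu(edges)    # keep only the high-contrast pixels
print('edge pixels found:', edge_mask.sum())
```

The result outlines the cell rather than filling it. Turning the outline into a solid region would be a separate step.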