# Preprocessing


Fluorescence datasets ideally reflect a relationship between the pixels in an image and the location and local density of your fluorescent molecule in a sample. However, properties of the detectors, optics, or even the samples can confound direct interpretation of this data. Here we will present some operations that can mitigate these effects to achieve robust hypothesis testing. 

Today, we will define the ROIs (regions of interest) in this set of images. You may have noticed some qualities of these images that make them hard to threshold reliably into masks reflecting the relevant cell organelles.

Part 1: Getting Ready
1. Loading images
2. Metadata
3. Visual inspection


Part 2: Global Threshold
1. Difference in intensity between images.
    - Introduction to automated thresholding methods.

Part 3: Local Threshold
1. Noise corrupting the images
    - Introduction to Rank Filters: median filter
2. Uneven illumination in the nucleus channel makes finding a single threshold across a single image challenging. 
    - Rolling ball background subtraction
3. Uneven paxillin expression, low EGFP signal, and cytoplasmic signal complicate focal adhesion thresholding
    - thresholding within defined ROIs 

First, let's import packages.

Next, check the files in the directory.
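As a minimal sketch of this step, the standard-library `pathlib` module can list a directory. The helper name `list_data_files` and the choice of the current working directory are assumptions for illustration; point it at wherever your dataset lives.

```python
from pathlib import Path


def list_data_files(data_dir=".", pattern="*"):
    """Return the sorted names of regular files in data_dir matching pattern."""
    return sorted(p.name for p in Path(data_dir).glob(pattern) if p.is_file())


# List everything in the current working directory (assumed location).
for name in list_data_files("."):
    print(name)
```

Passing a pattern such as `"*.json"` narrows the listing to one file type.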

Then, read the files of interest: in our case, the datasets with and without drug. Check the dimensions of each dataset.

Let's examine the metadata, which is stored in JSON format for this dataset.

To print the keys more readably, loop over `meta_nodrug.items()` and print each key on its own line.
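A minimal sketch of reading and inspecting JSON metadata with the standard library follows. The helper names and the example keys (`axes`, `pixel_size_um`) are hypothetical; the real keys depend on how the dataset was saved.

```python
import json


def load_metadata(path):
    """Read a JSON metadata file into a Python dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def describe_metadata(meta):
    """Return one 'key: value' line per top-level metadata entry."""
    return [f"{key}: {value}" for key, value in meta.items()]


# Stand-in metadata with hypothetical keys, used here instead of a real file.
example_meta = {"axes": "CYX", "pixel_size_um": 0.1}
for line in describe_metadata(example_meta):
    print(line)
```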

Let's re-label the slices with names, instead of numbers.
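One way to do this is to build a dictionary that maps a name to each slice. The sketch below assumes the slices are stacked along the first axis and that the channel order is `nucleus` then `your_fav_protein`; both are assumptions, so check them against the metadata. The random array only stands in for the real data.

```python
import numpy as np

# Stand-in data: two 2D slices stacked along the first axis (assumed layout).
rng = np.random.default_rng(0)
stack = rng.random((2, 64, 64))

# Assumed channel order; verify it against the dataset's metadata.
channel_names = ["nucleus", "your_fav_protein"]


def label_slices(stack, names):
    """Map each name to the slice at the same position along the first axis."""
    if stack.shape[0] != len(names):
        raise ValueError("need exactly one name per slice")
    return dict(zip(names, stack))


labeled = label_slices(stack, channel_names)
print(labeled["your_fav_protein"].shape)
```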

We are ready to plot and visualize our dataset! We can see that the images change between the no-drug and drug conditions.

Now, let's start pre-processing our dataset. We will focus on the "your_fav_protein" dataset.

First, let's try a global threshold: a single value applied to the entire dataset. The widget lets you change the value manually.
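Without the interactive widget, the effect of the threshold value can still be explored by sweeping a few candidates and reporting how much of the image passes each one. This is a sketch on synthetic data; the helper `foreground_fraction` and the candidate values are illustrative assumptions.

```python
import numpy as np


def foreground_fraction(image, threshold):
    """Return the fraction of pixels strictly brighter than threshold."""
    return float(np.mean(np.asarray(image) > threshold))


# Synthetic stand-in: noisy background near 100 with one bright square near 200.
rng = np.random.default_rng(0)
image = rng.normal(100, 10, size=(64, 64))
image[20:30, 20:30] += 100

for value in (80, 120, 150, 250):
    print(value, round(foreground_fraction(image, value), 3))
```

A useful threshold sits where the fraction stops changing quickly, which here is between the background and the bright square.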

It looks like thresholding fails: a ring-like structure is visible. The dataset may suffer from uneven illumination and other sources of noise. Let's filter this noise out. Here is our raw data: can you see some of the noise? Speckles are visible. Let's remove this 'salt-and-pepper' noise first.

We need to import packages and filters. (Please refer to the background slides for an introduction to filters.)

First, let's examine the effect of filter sizes.

Now, try using a median filter to remove the salt-and-pepper noise. Can you see differences between the images?
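The sketch below applies a median filter at several window sizes to a synthetic image with salt-and-pepper noise. It uses `scipy.ndimage.median_filter`, which is an assumption; the course notebook may use a different package. The flat test image and the 5% corruption rate are also made up for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic stand-in: a flat image with about 5% of pixels set to 0 or 255.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)
noisy = clean.copy()
corrupt = rng.random(clean.shape) < 0.05
noisy[corrupt] = rng.choice([0.0, 255.0], size=int(corrupt.sum()))


def mean_abs_error(a, b):
    """Return the average absolute difference between two images."""
    return float(np.mean(np.abs(a - b)))


errors = {"unfiltered": mean_abs_error(noisy, clean)}
for size in (3, 5, 7):
    # Replace each pixel by the median of its size x size neighborhood.
    filtered = ndi.median_filter(noisy, size=size)
    errors[f"median {size}x{size}"] = mean_abs_error(filtered, clean)

for label, err in errors.items():
    print(f"{label}: {err:.3f}")
```

On a flat image every window size removes the speckles. On real images, larger windows also erase fine detail, so the smallest size that removes the noise is usually preferable.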

Exercise: try the median filter with the nucleus dataset. Also try a minimum filter.

Now, we can take the filtered image and pre-process it further. We'd like to remove the uneven illumination. First, we will extract the background. The radius should be set to at least the size of the largest object that is not part of the background.

Next, let's subtract the background from the dataset.
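Background extraction and subtraction can be sketched with `skimage.restoration.rolling_ball`. Whether the course notebook uses this function is an assumption. The synthetic image (an intensity ramp with one bright 5x5 object) and the radius of 15 are chosen only for illustration, with the radius larger than the object as recommended above.

```python
import numpy as np
from skimage import restoration

# Synthetic stand-in: illumination ramps from 0 to 20, plus one bright object.
ramp = np.tile(np.linspace(0, 20, 64), (64, 1))
image = ramp.copy()
image[30:35, 30:35] += 100.0

# Estimate the background with a ball that is larger than the object.
background = restoration.rolling_ball(image, radius=15)

# Subtract the estimate; the object stays while the ramp is largely removed.
corrected = image - background

print("object pixel before/after:", image[32, 32], corrected[32, 32])
print("median intensity before/after:", np.median(image), np.median(corrected))
```

If the radius is smaller than the object, the ball rolls up into the object and part of its signal is subtracted along with the background.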

With our pre-processed dataset, let's try the thresholding again.

While manual thresholding is convenient, it is not the most reliable way to choose a threshold. How can we define a threshold systematically and reproducibly? Otsu's method looks at the intensity histogram and picks the value that best separates it into two classes, "signal" and "noise". We want a value that includes signal only. (Please see the slides for more information.)

First, we will define a mask function. When a data point is greater than the threshold, we assign 1; otherwise, 0.
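A minimal sketch of such a mask function is shown here; the name `make_mask` and the `uint8` output type are assumptions.

```python
import numpy as np


def make_mask(image, threshold):
    """Return 1 where image is greater than threshold and 0 elsewhere."""
    return (np.asarray(image) > threshold).astype(np.uint8)


example = np.array([[1, 5], [7, 3]])
print(make_mask(example, 4))
```

Pixels exactly equal to the threshold get 0, because the comparison is strictly greater than.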

Let's use the Otsu threshold function from the imported packages.
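The sketch below assumes the imported package is scikit-image and uses `skimage.filters.threshold_otsu`, which returns the threshold value itself. The bimodal test image is synthetic.

```python
import numpy as np
from skimage import filters

# Synthetic stand-in: dim background near 50, bright square near 200.
rng = np.random.default_rng(0)
image = rng.normal(50, 10, size=(64, 64))
image[16:48, 16:48] = rng.normal(200, 10, size=(32, 32))

# Compute the threshold, then keep only the pixels above it.
threshold = filters.threshold_otsu(image)
mask = image > threshold

print("Otsu threshold:", round(float(threshold), 1))
print("fraction of pixels in mask:", round(float(mask.mean()), 3))
```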

Let's examine what Otsu's method did by looking at the histogram. We can see the signal cluster, the noise cluster, and the threshold value between them.
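To make the histogram view concrete, here is a NumPy sketch of Otsu's criterion: for every possible split of the histogram, compute the between-class variance and keep the split that maximizes it. This is an illustrative re-implementation under stated assumptions (256 bins, synthetic bimodal data), not the code the library runs.

```python
import numpy as np


def otsu_from_histogram(image, nbins=256):
    """Return (threshold, counts, bin_centers) using Otsu's criterion."""
    counts, edges = np.histogram(np.ravel(image), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Pixel counts in the low class (bins 0..i) and high class (bins i..end).
    weight_low = np.cumsum(counts)
    weight_high = np.cumsum(counts[::-1])[::-1]

    # Mean intensity of each class for every candidate split.
    mean_low = np.cumsum(counts * centers) / weight_low
    mean_high = (np.cumsum((counts * centers)[::-1]) / weight_high[::-1])[::-1]

    # Between-class variance for a split after bin i; Otsu takes the maximum.
    variance_between = (weight_low[:-1] * weight_high[1:]
                        * (mean_low[:-1] - mean_high[1:]) ** 2)
    best = int(np.argmax(variance_between))
    return centers[best], counts, centers


# Synthetic stand-in: dim background near 50, bright square near 200.
rng = np.random.default_rng(0)
image = rng.normal(50, 10, size=(64, 64))
image[16:48, 16:48] = rng.normal(200, 10, size=(32, 32))

threshold, counts, centers = otsu_from_histogram(image)
print("threshold:", round(float(threshold), 1))
print("pixels below:", int(counts[centers <= threshold].sum()))
print("pixels above:", int(counts[centers > threshold].sum()))
```

The returned `counts` and `centers` can be passed to a plotting library to draw the histogram with the threshold marked on it.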

We can also extract the threshold value.

Exercise: try Otsu's method with the filtered nucleus dataset.

Otsu's method isn't the only way to determine the threshold. Various other methods are listed here: https://imagej.net/Auto_Threshold. Another example is Yen's method, a multi-level thresholding algorithm.

Let's examine the histogram for Yen's method.
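Assuming scikit-image again, `skimage.filters.threshold_yen` can be swapped in for the Otsu function and the two values compared on the same data. The image below is synthetic, so the values it prints say nothing about the course dataset.

```python
import numpy as np
from skimage import filters

# Synthetic stand-in: dim background near 50, bright square near 200.
rng = np.random.default_rng(0)
image = rng.normal(50, 10, size=(64, 64))
image[16:48, 16:48] = rng.normal(200, 10, size=(32, 32))

# Both functions return a single threshold value for the whole image.
yen_threshold = filters.threshold_yen(image)
otsu_threshold = filters.threshold_otsu(image)

print("Yen threshold: ", round(float(yen_threshold), 1))
print("Otsu threshold:", round(float(otsu_threshold), 1))
print("fraction above Yen: ", round(float((image > yen_threshold).mean()), 3))
print("fraction above Otsu:", round(float((image > otsu_threshold).mean()), 3))
```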

It's the end of this course. We only covered global thresholds, but local thresholding, where the threshold varies within a dataset, is also possible.
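As a pointer for further exploration, here is a hedged sketch of local thresholding with `skimage.filters.threshold_local`, which computes one threshold per pixel from that pixel's neighborhood. The synthetic image, `block_size=15`, and `offset=-10` are illustrative assumptions, not recommended settings.

```python
import numpy as np
from skimage import filters

# Synthetic stand-in: brightness ramps from 0 to 100, with two small spots.
ramp = np.tile(np.linspace(0, 100, 64), (64, 1))
image = ramp.copy()
image[31:34, 9:12] += 30    # spot on the dark side
image[31:34, 52:55] += 30   # spot on the bright side

# Per-pixel threshold: local weighted mean minus offset (so mean + 10 here).
local_threshold = filters.threshold_local(image, block_size=15, offset=-10)
local_mask = image > local_threshold

# A single global threshold for comparison.
global_mask = image > filters.threshold_otsu(image)

print("pixels kept by local threshold: ", int(local_mask.sum()))
print("pixels kept by global threshold:", int(global_mask.sum()))
```

On this ramp the global threshold keeps the bright side of the image regardless of the spots, while the local threshold responds to pixels that stand out from their surroundings.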