<h1 style="font-size: 40px; margin-bottom: 0px;">5.2 Python image analysis (II)</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 800px;"></hr>

In notebook 5-1, we started exploring how images are represented on our computers by playing around with them in Python. Today, we'll use what we know about how images are represented to extract quantitative information from our images and use that information to derive biological insights. First, we'll load in the single set of serum-starved and serum-stimulated cells that we worked with in notebook 5-1, then run through an analysis to quantify the nuclear intensity of YAP after serum stimulation. Then, you'll work with a full dataset to determine if there is a significant difference in YAP nuclear localization.

<strong>Learning objectives:</strong>
<ul>
    <li>Continue exploring images in Python</li>
    <li>Extract quantitative information from images</li>
    <li>Analyze a single image</li>
    <li>Run an analysis on a full dataset</li>
</ul>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import os
import skimage as ski
import scipy.ndimage as ndi
import re

<h1 style="font-size: 40px; margin-bottom: 0px;">Extract quantitative information from images</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

To get us started for today's lesson, let's quickly load in our image files from before. For convenience, the code is included below using the setup from our class demo:

In [None]:
file_names = [name for name in os.listdir('./data') if '.tiff' in name]
no_serum_names = [name for name in file_names if 'no-serum' in name]
no_serum_names.sort()
serum_names = [name for name in file_names if 'no-serum' not in name]
serum_names.sort()

no_serum = np.zeros(len(no_serum_names), dtype=object)
serum = np.zeros(len(serum_names), dtype=object)

# Load each condition in its own loop so the two lists can differ in length
for i, name in enumerate(no_serum_names):
    no_serum[i] = plt.imread(f'./data/{name}')
for i, name in enumerate(serum_names):
    serum[i] = plt.imread(f'./data/{name}')

<h2>Plot the intensity profile</h2>

Since our images are 2D arrays, we can pull out rows and columns and then plot the resulting intensities along that axis as a line plot to visualize the intensity profile. Select an image from the ones that we've imported, and then see if you can plot an intensity profile for a single row or column from that image.
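As one possible starting point, the sketch below pulls a single row and a single column out of a 2D array and plots both as line plots. The synthetic `image` and the indices 50 and 40 are placeholders chosen for illustration, not values from our dataset. With real data you would swap in one of the loaded images, for example `no_serum[0]`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for one image: a bright blob on a dark background
yy, xx = np.mgrid[0:100, 0:100]
image = 200 * np.exp(-((yy - 50) ** 2 + (xx - 40) ** 2) / (2 * 10 ** 2))

row_index = 50   # hypothetical row to inspect
col_index = 40   # hypothetical column to inspect
row_profile = image[row_index, :]   # every column along one row
col_profile = image[:, col_index]   # every row along one column

# Plot intensity against pixel position along each axis
fig, ax = plt.subplots()
ax.plot(row_profile, label=f'row {row_index}')
ax.plot(col_profile, label=f'column {col_index}')
ax.set_xlabel('Position (pixels)')
ax.set_ylabel('Intensity')
ax.legend()
```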

<h2>Detect edges</h2>

The quantitative values of our images can be used to determine where edges exist in our images, and we can make use of scikit-image's <code>ski.feature.canny()</code> edge detector function to identify where the edges are. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.feature.html#skimage.feature.canny" rel="noopener noreferrer"><u>Documentation for <code>ski.feature.canny()</code> can be found here.</u></a>

This function allows us to set the thresholds for what is considered an edge. It also lets us smooth our images first, since noise can otherwise result in edges being detected where they don't actually exist.

<h2>Identify particles in an image</h2>

Like with ImageJ, we can also create a binary image to then analyze the properties of the particles selected by our threshold. To do this, let's first process our two DAPI images again.

<h3>Remove particles on the image edge</h3>

Sometimes it can be helpful to also remove particles that are clipped by the boundaries of our image so that we exclude them from our analysis. In one of our images, we can see that there is a nucleus that is right on the edge. This could potentially throw off our results since we're not capturing the full nucleus. We can remove this particle by using <code>ski.segmentation.clear_border()</code>, which will remove particles that touch the edges of our image and is similar to what you can also do in ImageJ. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.segmentation.html#skimage.segmentation.clear_border" rel="noopener noreferrer"><u>Documentation for <code>ski.segmentation.clear_border()</code> can be found here.</u></a>

Once we have our processed binaries, we can use scikit-image's <code>ski.measure.label()</code> function to identify and also label individual particles in our thresholded images. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.measure.html#skimage.measure.label" rel="noopener noreferrer"><u>Documentation for <code>ski.measure.label()</code> can be found here.</u></a>

Digging into the documentation, we can see that this function works by treating any element in our 2D matrix with the value <code>0</code> as a background pixel by default, and then it identifies particles as clusters of connected elements that share the same value, which in our case is <code>1</code>. After it identifies the particles, it will assign each particle its own integer label.

The documentation also shows that the function returns one object by default (<code>labels</code>), which is the labeled array where each particle is assigned an integer value. We can also instruct it to return an additional object (<code>num</code>), which is just the total particle count, if we switch the parameter <code>return_num</code> from <code>False</code> to <code>True</code>.

Now let's take a look at the objects it's returned to us.

<h3>Visualize labeled particles</h3>

We can then take the new array containing our labeled particles and visualize it. First, we can assign a color to each particle based on its label using <code>ski.color.label2rgb()</code>, which will allow us to differentiate each particle based on its assigned color. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.color.html#skimage.color.label2rgb" rel="noopener noreferrer"><u>Documentation for <code>ski.color.label2rgb()</code> can be found here.</u></a>

<h2>Analyze particles</h2>

With our labeled particles, we can continue to make use of scikit-image to now analyze the labeled particles. To do this, we'll use the <code>ski.measure.regionprops()</code> function, which will measure a bunch of properties of each labeled particle. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.measure.html#skimage.measure.regionprops" rel="noopener noreferrer"><u>Documentation for <code>ski.measure.regionprops()</code>, including the full set of properties that it measures, can be found here.</u></a>

Now let's take a look at the resulting object that we have as a result of this function:

We can see that each element is a RegionProperties object corresponding to a labeled particle that was analyzed. If we dig into the documentation for <code>ski.measure.regionprops()</code>, we can see under Notes that there are a bunch of properties that can be accessed as attributes or keys. These include <code>area</code> and <code>label</code>, which are most relevant to our analysis today.

Let's pull out these two attributes for a particle that we analyzed:

<h3>Pull particle areas</h3>

You can see that we're able to access the quantified area of each particle through the <code>area</code> attribute. If we then want to pull together the areas of all our analyzed particles, we can iterate through the list of <code>RegionProperties</code> objects that was output by the <code>ski.measure.regionprops()</code> function.

Let's take a look at the measured areas for our particles.

<h3>Filter out noise</h3>

We can see that there are a lot of particles that have a size of just 1 or are otherwise very small. This tells us that our thresholding potentially picked up some noise as well. Let's take a look at the distribution of our particle areas.

You can see that we have a fair number of tiny particles with an area of less than 100 sq pixels, which probably correspond to noise (since we wouldn't expect our nuclei to be that small), so we don't want to include them in our analysis. What we can do is use a conditional statement to filter out particles based on whether or not their area is large enough to not be considered noise.

Let's take a look at our images after we filtered out the noise.

Now that we've filtered out any noise, we can then relabel our particles again.

<h3>Separate particles using the watershed method</h3>

For one of our images, we can see that we have two nuclei that are close together, and as a result, they end up getting labeled together as well. We can separate out these nuclei so that they are instead understood as two separate particles rather than a single particle. There are a number of different ways to computationally separate out particles, and we'll be using the watershed method, which is available in scikit-image as the function <code>ski.segmentation.watershed()</code>. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.segmentation.html#skimage.segmentation.watershed" rel="noopener noreferrer"><u>Documentation for <code>ski.segmentation.watershed()</code> can be found here.</u></a>

First, we'll prepare our filtered and labeled image for watershed by dilating our particles so that they are smoother, so the watershed function doesn't mistake small irregularities for things to segment. Let's take a look at our nuclei from our serum-starved cells.

Let's apply a dilation to smooth out irregularities so that they don't interfere with our segmentation. We can make use of the <code>ski.morphology.dilation()</code> function that we used in notebook 5-1. There, we used this function as part of a process to fill holes, but here, we'll use it to smooth out the edges of our thresholded nuclei.

Then we can calculate the centers of our individual particles by calculating the distance within the particle from the edges (or background), which will give us an idea of where the center of each "true" particle should be. To do this, we'll make use of the <code>ndi.distance_transform_edt()</code> function, which calculates the Euclidean distance from the background. <a href="https://docs.scipy.org/doc//scipy-1.8.0/reference/generated/scipy.ndimage.distance_transform_edt.html" rel="noopener noreferrer"><u>Documentation for <code>ndi.distance_transform_edt()</code> can be found here.</u></a>
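To make the distance calculation concrete, here is a minimal sketch on a single made-up disk. Background pixels get a distance of 0, and the value grows toward the middle of the particle.

```python
import numpy as np
import scipy.ndimage as ndi

# Hypothetical particle: a disk of radius 15 centered at (30, 30)
yy, xx = np.mgrid[0:61, 0:61]
binary = (yy - 30) ** 2 + (xx - 30) ** 2 <= 15 ** 2

# Euclidean distance from each particle pixel to the nearest background pixel
distance = ndi.distance_transform_edt(binary)

print(distance[0, 0])                       # 0.0, a background pixel
print(distance[30, 30] == distance.max())   # True, the center is farthest from the background
```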

You can see that the regions of greatest distance from the background now have the highest value, which we can visualize with the viridis colormap. We can then identify the coordinates of the centers of each "true" particle by identifying where the local maxima are using the calculated distances. We'll make use of the <code>ski.feature.peak_local_max()</code> function, which will find the spots of highest value (peaks) and return their coordinates to us. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.feature.html#skimage.feature.peak_local_max" rel="noopener noreferrer"><u>Documentation for <code>ski.feature.peak_local_max()</code> can be found here.</u></a>

Then we'll take our coordinates and set them as points in a 2D matrix matching the shape of our labeled nuclei. These coordinates will determine where we want to initiate our watershed method.

We can then add labels to each coordinate point.

With those coordinates set and labeled, we can then initiate the watershed using the <code>ski.segmentation.watershed()</code> function. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.segmentation.html#skimage.segmentation.watershed" rel="noopener noreferrer"><u>Documentation for <code>ski.segmentation.watershed()</code> can be found here.</u></a>

What we're doing is essentially treating each particle center as a local minimum, which we can set up by inverting our distances so that the local maxima become minima, and then "flooding" the region outward from each labeled point. We can also restrict where the "flooding" is allowed to happen by using our original labeled nuclei as a mask.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #1: Pull out a single cell</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

Now that we have our nuclei individually labeled, see if you can pull out a single nucleus and display a binary image of your single nucleus.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #2: Measure mean nuclear fluorescence</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

Now that you're able to pull out a single nucleus at a time, see if you can then quantify the mean nuclear fluorescence for a single nucleus. Recall from 5-1 how you can use binary images as a mask, which will allow you to focus on just a particular region of interest, and you can apply this mask to the YAP channel.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #3: Analyze all nuclei in a single image</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

Instead of manually analyzing one nucleus at a time, see if you can set up a way to analyze all the nuclei in a single image. Specifically for this exercise, see if you can analyze the mean nuclear YAP fluorescence intensity for all the nuclei in a single image, from either the serum-starvation or the serum-stimulation condition.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #4: Perform statistical analysis</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

Let's run a quick statistical analysis on our results from our single set of images.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #5: Run a full analysis</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

For this exercise, let's take what we've learned for image analysis and set up a workflow for us to analyze a full set of fluorescence images to determine whether there is a significant increase in the nuclear intensity of YAP following serum stimulation.