<h1 style="font-size: 40px; margin-bottom: 0px;">4.2 Python Image Analysis</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

In this week's lecture, we went over how images are expressed as either 2D matrices (single channel) or 3D matrices (multi-channel or true-color) with each pixel as an element in the matrix containing a numerical value representing the intensity of light. We're all familiar with using ImageJ for analyzing images, but here, we'll use Python to perform many of the same analyses that we can do in ImageJ. If we have time, we can also make use of both ImageJ and Python to develop a pipeline for analyzing more complex data, such as our beating cardiomyocytes from MCB201A. 

<strong>Learning objectives:</strong>
<ul>
    <li>Understand how images are represented as a matrix</li>
    <li>Learn how to import and display images in a Python notebook</li>
    <li>Learn how to process images for analysis</li>
    <li>Learn how to analyze images to extract quantitative data</li>
    <li>Set up for loops and functions to analyze and plot multiple particles</li>
</ul>

For today's lesson, we'll be making use of the following packages:
<ul>
    <li>numpy</li>
    <li>pandas</li>
    <li>matplotlib.pyplot - we will also use matplotlib to display our images</li>
    <li><a href="https://scikit-image.org/" rel="noopener noreferrer" target="_blank">scikit-image (skimage)</a> - a package with a suite of tools for image analysis in Python. We'll make use of the <mark style="background-color: #EEEEEE;"><strong>skimage.measure</strong></mark> module and the <mark style="background-color: #EEEEEE;"><strong>skimage.color</strong></mark> module, as well as other modules later on if we have time</li>
    <li>seaborn</li>
    <li>scipy.stats</li>
</ul>

<h1 style="font-size: 40px; margin-bottom: 0px;">Images as a matrix</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

As you recall from lecture, an 8-bit grayscale image can be represented as a 2D matrix containing values from 0 (black) to 255 (white).

Let's set up a quick, random 2D array below and use <mark style="background-color: #EEEEEE;"><strong>plt.imshow</strong></mark> to display our grayscale image. <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html" rel="noopener noreferrer" target="_blank"><u>Documentation for <mark style="background-color: #EEEEEE;"><strong>plt.imshow</strong></mark> is here.</u></a>

We can make use of our usual random integer generator <mark style="background-color: #EEEEEE;"><strong>np.random.randint()</strong></mark>, but we will need to specify that we are working with 8-bit integers by setting the <mark style="background-color: #EEEEEE;"><strong>dtype</strong></mark> parameter to <mark style="background-color: #EEEEEE;"><strong>np.uint8</strong></mark>.
```
gray_img = np.random.randint(0, 256, size=[50,50], dtype=np.uint8)
```

Now that we have our 2D array of pixel intensities, we can display our image using <mark style="background-color: #EEEEEE;"><strong>plt.imshow</strong></mark>. However, we'll need to make some adjustments to the default parameters, so that we display our image as a grayscale image.

We'll set the <mark style="background-color: #EEEEEE;"><strong>cmap</strong></mark> parameter equal to <mark style="background-color: #EEEEEE;"><strong>'gray'</strong></mark> to tell the function to display our image in grayscale. We'll also specify the bit depth by providing the minimum and maximum values, corresponding to black and white respectively. To do this, we'll set <mark style="background-color: #EEEEEE;"><strong>vmin</strong></mark> equal to 0 and <mark style="background-color: #EEEEEE;"><strong>vmax</strong></mark> equal to 255. We can also hide our plot axes by passing <mark style="background-color: #EEEEEE;"><strong>False</strong></mark> to <mark style="background-color: #EEEEEE;"><strong>plt.axis()</strong></mark> and suppress further output from that line with a semicolon <mark style="background-color: #EEEEEE;"><strong>;</strong></mark>, so we only display our plot.
```
plt.imshow(gray_img, cmap='gray', vmin=0, vmax=255)
plt.axis(False);
```

We can also do something similar for a true-color image. Let's create a 3-channel image using a setup similar to the one we used for our grayscale image. This time we will have 3 elements in the size object that we pass to our function in order to create a 3D array of random integers.
```
color_img = np.random.randint(0, 256, size=[50, 50, 3], dtype=np.uint8)
```

If we wanted to pull information from a specific channel, we can use slice notation to pull out either the red, green or blue channel.
```
red_color = color_img[:, :, 0]
green_color = color_img[:, :, 1]
blue_color = color_img[:, :, 2]
```

What this means is that if we look along the third axis, we have 3 elements that are 2D arrays: <mark style="background-color: #EEEEEE;"><strong>[red_color, green_color, blue_color]</strong></mark>. You'll want to keep this in mind for later when you are combining channels together or trying to pseudocolor a single channel because the order in which the 2D arrays are arranged along the third axis (channel axis) dictates their color.

Since each pixel is an element within our 2D matrix, we can also pull out sections of our images to perform analyses or to visualize the intensity profiles.
```
red_color_profile = red_color[25,:]
green_color_profile = green_color[25,:]
blue_color_profile = blue_color[25,:]
```
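
To visualize these intensity profiles, one option is a simple line plot with one line per channel. The sketch below is only an illustration: it rebuilds the random image from above so that it runs on its own, and the choice of row 25 and the axis labels are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random 3-channel 8-bit image, as in the lesson
color_img = np.random.randint(0, 256, size=[50, 50, 3], dtype=np.uint8)

# Row 25 of each channel is a 1D array with one intensity value per column
row = 25
fig, ax = plt.subplots()
for channel, name in enumerate(['red', 'green', 'blue']):
    profile = color_img[row, :, channel]
    ax.plot(profile, color=name, label=name)

ax.set_xlabel('Column (pixel)')
ax.set_ylabel('Intensity')
ax.legend();
```

With a real image, the same slice gives you the intensity along a horizontal line through your cells, similar to a line profile in ImageJ.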

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #1: Importing and displaying images</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

Let's make use of our immunofluorescence data from MCB201A to do some image processing and then later today use it for image analysis. Upload three channels corresponding to a representative image of your serum starved cells (either YAP or TAZ) and (if you want) also a set of three channels for your serum stimulated cells. We'll import all of them and use the images to learn how to process images and analyze them to extract quantitative data from our images.

To import images, we will use <mark style="background-color: #EEEEEE;"><strong>plt.imread()</strong></mark>, which will turn our images into an array that we can play with using Python. <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imread.html" rel="noopener noreferrer" target="_blank"><u>Documentation for <mark style="background-color: #EEEEEE;"><strong>plt.imread()</strong></mark> is here.</u></a>
```
variable = plt.imread('File_name.extension')
```

Now let's take a look at our individual images side by side.
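
One way to show the channels side by side is with a row of subplots. This is a minimal sketch: the random arrays are stand-ins for the arrays returned by <mark style="background-color: #EEEEEE;"><strong>plt.imread()</strong></mark>, and the variable names and panel titles are assumptions that you should swap for your own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for arrays returned by plt.imread(); replace with your own images
rng = np.random.default_rng(0)
red_channel = rng.integers(0, 4096, size=(64, 64))
green_channel = rng.integers(0, 4096, size=(64, 64))
blue_channel = rng.integers(0, 4096, size=(64, 64))

channels = [red_channel, green_channel, blue_channel]
titles = ['Red', 'Green', 'Blue']

# One row, three columns: one panel per channel
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, img, title in zip(axes, channels, titles):
    ax.imshow(img, cmap='gray')
    ax.set_title(title)
    ax.axis('off')
```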

<h2>Displaying images in color</h2>

If we want to display our images as a composite, we can construct a 3D array out of our separate 2D arrays by using the <mark style="background-color: #EEEEEE;"><strong>np.dstack()</strong></mark> function, which will stack our arrays along the third axis. 

For <mark style="background-color: #EEEEEE;"><strong>plt.imshow()</strong></mark>, the channels are defined in the order <mark style="background-color: #EEEEEE;"><strong>[red, green, blue]</strong></mark> along the third axis of our 3D matrix. Recall from earlier that the third axis contains the channels, and the order in which they are positioned along the third axis dictates their assigned color. So when we construct the 3D matrix, we'll need to pay attention to the order in which we provide our 2D matrices.
```
composite = np.dstack((red_channel, green_channel, blue_channel))
```

If you're noticing a warning with your RGB image being displayed, you may need to normalize the intensity values by the maximum intensity value for each channel.
```
composite = np.dstack((red_channel/red_channel.max(), green_channel/green_channel.max(), blue_channel/blue_channel.max()))
```
This will scale your values between 0 and 1, allowing <mark style="background-color: #EEEEEE;"><strong>plt.imshow()</strong></mark> to display your color image without the warning.

If the image doesn't appear bright enough, you can also scale it with a value smaller than your max value, and the values that remain above 1 will be clipped to 1.
```
composite = np.dstack((red_channel/1500, green_channel/1500, blue_channel/1500))
```
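
If you would rather do the clipping yourself than rely on the display function, <mark style="background-color: #EEEEEE;"><strong>np.clip()</strong></mark> can limit the scaled values to the range 0 to 1. The sketch below uses random stand-in channels, and the value 1500 is only the example scale from above, not a recommended setting.

```python
import numpy as np

# Stand-ins for three imported channels; replace with your own arrays
rng = np.random.default_rng(0)
red_channel = rng.integers(0, 4096, size=(64, 64))
green_channel = rng.integers(0, 4096, size=(64, 64))
blue_channel = rng.integers(0, 4096, size=(64, 64))

scale = 1500  # example value; pick one that suits your own images

# Stack the channels, scale them, then clip anything above 1 down to 1
composite = np.dstack((red_channel, green_channel, blue_channel)) / scale
composite = np.clip(composite, 0, 1)

print(composite.min(), composite.max())
```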

If instead, you wanted to pseudocolor a single channel a particular color, you'll need to specify which channel(s) you want your image to be displayed in. In this case, you'll need to construct a 3D matrix consisting of your 2D array repeated along all three channels. Then you can clear the channel(s) that you don't want, while leaving the one(s) that you want displayed.

First, let's create a 3D matrix consisting of just our DAPI intensities.
```
Blue = np.dstack((blue_channel, blue_channel, blue_channel))
```

Then, what we can do is to clear the channels that we don't want displayed and keep the blue channel. To do this, we can multiply our 3D matrix with a list containing three elements, with each element being either a 1 or 0 depending on whether or not we want to display a channel, leaving us with a 3D matrix that contains values only in the channel(s) we want displayed.
```
Blue_only = Blue*[0, 0, 1]
```

Remember that screens follow the additive color model, so if we wanted a color like magenta or yellow, we would then make use of 2 channels rather than one.
```
Blue_as_magenta = Blue*[1, 0, 1]
```
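
To see why multiplying by a 3-element list works, it can help to try it on a tiny array. In the sketch below, the 2 x 2 array is a made-up stand-in for a normalized DAPI channel; the list is matched against the last (channel) axis, so each channel is multiplied by its own 1 or 0.

```python
import numpy as np

# Tiny stand-in for a normalized DAPI channel (values between 0 and 1)
blue_channel = np.array([[0.2, 0.4],
                         [0.6, 0.8]])

Blue = np.dstack((blue_channel, blue_channel, blue_channel))
print(Blue.shape)  # (rows, columns, channels)

# The 3-element list lines up with the last (channel) axis
Blue_as_magenta = Blue * [1, 0, 1]

print(Blue_as_magenta[:, :, 0])  # red channel: unchanged
print(Blue_as_magenta[:, :, 1])  # green channel: all zeros
print(Blue_as_magenta[:, :, 2])  # blue channel: unchanged
```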

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #2: Adjust threshold to isolate nuclei</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

To identify a good threshold for us to use, we can take a look at the distribution of intensity values in our two nuclei images in a histogram. We will first need to collapse our 2D array into a single 1D array to make it easier to plot as a histogram by making use of the <mark style="background-color: #EEEEEE;"><strong>numpy.ndarray.flatten()</strong></mark> method. <a href="https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>

We can do this in the same line where we call the histogram function:
```
sns.histplot(image_array.flatten(), bins=50)
```

Once you've identified a good spot to set as a threshold, you can apply a conditional statement to your DAPI images.
```
DAPI_thresh = DAPI_image > threshold
```

Let's take a look at the data contained within our thresholded 2D arrays.
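
A quick way to inspect the thresholded array is to print its data type and a small corner of it. The image and the threshold value in this sketch are made up so that it runs on its own.

```python
import numpy as np

# Stand-in for a DAPI image; replace with your own array
rng = np.random.default_rng(0)
DAPI_image = rng.integers(0, 4096, size=(64, 64))

threshold = 2000  # hypothetical value; choose yours from the histogram
DAPI_thresh = DAPI_image > threshold

print(DAPI_thresh.dtype)     # data type of the thresholded array
print(DAPI_thresh[:3, :3])   # top-left corner of the array
print(DAPI_thresh.mean())    # fraction of pixels above the threshold
```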

You should be able to see now that your thresholded array contains Booleans. This is because we applied our conditional statement to each element in our 2D arrays, resulting in a Boolean output corresponding to whether or not each element in the array met that condition.

There are also many other ways to calculate thresholds. You can find additional information on how to use scikit-image to threshold your images <a href="https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_thresholding.html" rel="noopener noreferrer" target="_blank"><u>here.</u></a> scikit-image can also test multiple thresholding algorithms at once, so you can see which one will best fit your specific processing needs. 
```
from skimage.filters import try_all_threshold

fig, ax = try_all_threshold(your_image)
```
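
Once you have settled on an algorithm, you can call it directly. As one example, the sketch below uses Otsu's method through <mark style="background-color: #EEEEEE;"><strong>skimage.filters.threshold_otsu()</strong></mark> on a synthetic image with a dim background and one bright square. Both the image and the choice of Otsu's method are assumptions for illustration, not the threshold you must use for your data.

```python
import numpy as np
from skimage.filters import threshold_otsu

# Synthetic image: dim, noisy background with one bright square "nucleus"
rng = np.random.default_rng(0)
DAPI_image = rng.normal(100, 10, size=(64, 64))
DAPI_image[20:40, 20:40] += 400

# Otsu's method picks a threshold that separates the two intensity populations
otsu_value = threshold_otsu(DAPI_image)
DAPI_thresh = DAPI_image > otsu_value

print(otsu_value)
```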

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #3: Pull quantitative data</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 650px;"></hr>

We can use the **scikit-image packages** that we imported earlier to pull quantitative data from our images. Specifically, we are going to use them to identify individual particles in our image by using <mark style="background-color: #EEEEEE;"><strong>skimage.measure.label()</strong></mark>. <a href="https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.label" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a> 

The <mark style="background-color: #EEEEEE;"><strong>skimage.measure.label()</strong></mark> function works by assigning any element with the value 0 as background (by default) and then identifying particles as clusters of connected elements that share the same value (in our case the value 1). Then it will assign each particle a label.
```
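# With return_num=True, label() returns both the label image and the number of labels
DAPI_thresh_label, number_nuclei = skimage.measure.label(DAPI_thresh, return_num=True)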
```

Let's take a look at our labeled cells and differentiate them by assigning them each a color that they will display as. To do this, we'll need to make use of the <mark style="background-color: #EEEEEE;"><strong>label2rgb()</strong></mark> function, which returns a colored image where each label has its own color. So in our case, each one of our nuclei will have its own color. <a href="https://scikit-image.org/docs/stable/api/skimage.color.html#skimage.color.label2rgb" rel="noopener noreferrer" target="_blank"><u>Documentation for <mark style="background-color: #EEEEEE;"><strong>label2rgb()</strong></mark> is here.</u></a>
```
from skimage.color import label2rgb

colored_nuclei = label2rgb(DAPI_thresh_label)
plt.imshow(colored_nuclei)
```

With our labeled nuclei, we can make use of the <mark style="background-color: #EEEEEE;"><strong>skimage.measure.regionprops()</strong></mark> function, which will measure a bunch of properties of each particle. <a href="https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops" rel="noopener noreferrer" target="_blank"><u>Documentation for <mark style="background-color: #EEEEEE;"><strong>skimage.measure.regionprops()</strong></mark>, including the full set of properties that it measures, can be found here.</u></a>
```
nuclei_properties = skimage.measure.regionprops(DAPI_thresh_label)
```

Try printing the results.

So what is going on? Let's pull out the first element to try and make sense of the output of this function.
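
To make this concrete, here is a small sketch on a toy mask with two rectangular "particles". The mask is made up so that the code runs on its own; with your data you would use your own thresholded image instead.

```python
import numpy as np
from skimage import measure

# Toy thresholded image with two separate particles
DAPI_thresh = np.zeros((20, 20), dtype=bool)
DAPI_thresh[2:6, 2:6] = True      # 16-pixel square
DAPI_thresh[10:15, 8:18] = True   # 50-pixel rectangle

labels = measure.label(DAPI_thresh)
nuclei_properties = measure.regionprops(labels)

# Pull out the first element and look at what it is
first = nuclei_properties[0]
print(type(first))
print(first.label, first.area, first.centroid)
```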

What you can see is that the first element is some set of <mark style="background-color: #EEEEEE;"><strong>RegionProperties</strong></mark>, and we can see from the documentation that we can access each property as an attribute.

If we wanted to get the area for our first particle:
```
nuclei_properties[0].area
```

And the number of elements in this output corresponds to the number of particles that we labeled.
```
print(len(nuclei_properties))
```

You can see that the output is not something we can make sense of without digging into each element one by one, so that means we will need to initiate a for-loop in order to get all the output data for a particular property that we're interested in.

For example, if we're interested in the area of our particles:
```
areas = []

for i in nuclei_properties:
     areas.append(i.area)
```

In this case, the iterable object that we provide is not just a simple list of <mark style="background-color: #EEEEEE;"><strong>[0,1,2,3,4,5]</strong></mark>, but rather it will go through each element of the list which corresponds to the <mark style="background-color: #EEEEEE;"><strong>RegionProperties</strong></mark> of each particle.

Let's take a look at the areas for our particles:
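
Besides printing the list, scikit-image also offers <mark style="background-color: #EEEEEE;"><strong>skimage.measure.regionprops_table()</strong></mark>, which collects chosen properties into a dictionary that pandas can turn into a table. The sketch below is an optional alternative to the for loop rather than a replacement for it, and it uses a toy mask as a stand-in for your thresholded image.

```python
import numpy as np
import pandas as pd
from skimage import measure

# Toy thresholded image with two separate particles
DAPI_thresh = np.zeros((20, 20), dtype=bool)
DAPI_thresh[2:6, 2:6] = True      # 16-pixel square
DAPI_thresh[10:15, 8:18] = True   # 50-pixel rectangle

labels = measure.label(DAPI_thresh)

# Collect the label and area of every particle into a table
table = measure.regionprops_table(labels, properties=('label', 'area'))
areas_df = pd.DataFrame(table)
print(areas_df)
```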

So by using Python, we are able to extract quantitative properties of our nuclei.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #4: Filter out noise</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 650px;"></hr>

Let's take a look at the distribution of our particle areas. This will help us see if we may have captured a lot of noise, and if we have noise, we can filter those "particles" out.
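
A histogram of the areas is one way to look at this distribution. In the sketch below, the list of areas is hypothetical and the cutoff of 100 is only the example value used in this lesson; with your own data you would plot the areas you collected from your particles.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical particle areas (in pixels); replace with your own list of areas
areas = np.array([3, 5, 8, 12, 40, 850, 920, 990, 1010, 1100])

min_area = 100  # example cutoff for what we consider noise

fig, ax = plt.subplots()
ax.hist(areas, bins=20)
ax.axvline(min_area, color='red', linestyle='--')  # mark the cutoff
ax.set_xlabel('Area (pixels)')
ax.set_ylabel('Number of particles')

# Count how many particles fall below the cutoff
n_noise = int((areas < min_area).sum())
print(n_noise, 'of', len(areas), 'particles fall below the cutoff')
```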

You can see that we have a fair number of tiny particles with an area of less than 100 square pixels. These probably correspond to noise and aren't particularly interesting for us. So what we can do is apply a conditional statement to each of our particles based on whether or not their area is greater than what we consider noise.

Since we're going through each property again, we'll need a for loop to make our way through each particle.

```
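# Start from an all-zero array with the same shape as our label image
nuclei_filtered = np.zeros_like(DAPI_thresh_label)

for i in nuclei_properties:
    if i.area > 100:
        # Add a mask that is 1 wherever this particle's label appears
        nuclei_filtered = nuclei_filtered + (DAPI_thresh_label == i.label)

plt.imshow(nuclei_filtered)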
```

Here, we've made use of the <mark style="background-color: #EEEEEE;"><strong>np.zeros_like()</strong></mark> function, which functions similarly to <mark style="background-color: #EEEEEE;"><strong>np.zeros()</strong></mark>, except that it will create an array of zeros with the shape of the object that we give it. So this is a convenient way of creating arrays of zeros with a desired shape.

Our for loop will again be working its way through each element in our <mark style="background-color: #EEEEEE;"><strong>nuclei_properties</strong></mark> and checking the area to see if it meets our threshold for an actual nucleus, and once it meets that condition, it will add that particle to our array. Here, we're not copying over the specific label value for each particle. Instead, the comparison creates a Boolean mask that is True only where that particle is. Since everything else in the mask is False (which counts as zero), we can add the mask to our array of zeros, and it will mark that particle with a 1 without erasing any particles that were added previously.

Now we'll want to relabel our filtered cells, so that we can ignore the noise when we want to continue with our analysis.
```
nuclei_filtered_labels, number_labels = skimage.measure.label(nuclei_filtered, return_num=True)
```

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #5: Pull out a single cell</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 650px;"></hr>

We can continue to make use of conditional statements to pull out a single cell by its label.
```
nuclei_1 = nuclei_filtered_labels == 1
```
This will create a new 2D array that pulls out the nucleus labeled as 1. 

And if we take a look at the data type contained within the 2D array created:

You can see that it is a 2D array containing Boolean values, and only where the cell label matches what we specified will it contain <mark style="background-color: #EEEEEE;"><strong>True</strong></mark>, which we can then visualize using <mark style="background-color: #EEEEEE;"><strong>plt.imshow()</strong></mark>.

Since we have a handful of nuclei to plot, let's set up two separate for loops to plot out each individual nucleus side by side. The first for loop will be to pull out each nucleus, and the second for loop will be to plot them all.
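
The two loops could look something like the sketch below. The label image here is built by hand as a stand-in for your relabeled nuclei, and the list name <mark style="background-color: #EEEEEE;"><strong>single_nuclei</strong></mark> and the panel titles are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hand-built stand-in for nuclei_filtered_labels: 0 is background, 1 to 3 are nuclei
nuclei_filtered_labels = np.zeros((30, 30), dtype=int)
nuclei_filtered_labels[2:8, 2:8] = 1
nuclei_filtered_labels[12:20, 5:15] = 2
nuclei_filtered_labels[22:28, 18:28] = 3
number_labels = int(nuclei_filtered_labels.max())

# First loop: build one Boolean mask per nucleus
single_nuclei = []
for label_value in range(1, number_labels + 1):
    single_nuclei.append(nuclei_filtered_labels == label_value)

# Second loop: plot each mask in its own panel
fig, axes = plt.subplots(1, number_labels, figsize=(3 * number_labels, 3), squeeze=False)
for label_value, (ax, nucleus) in enumerate(zip(axes[0], single_nuclei), start=1):
    ax.imshow(nucleus, cmap='gray')
    ax.set_title(f'Nucleus {label_value}')
    ax.axis('off')
```

Setting <mark style="background-color: #EEEEEE;"><strong>squeeze=False</strong></mark> keeps <mark style="background-color: #EEEEEE;"><strong>axes</strong></mark> as a 2D array, so the same code still works if you only have a single nucleus.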

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #6: Quantify mean nuclear fluorescence</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 950px;"></hr>

With all of our nuclei filtered down to ones that are actually nuclei, we can then make use of another for loop to go through our labeled nuclei to calculate the mean nuclear fluorescence intensity of our transcription factor. 
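
A for loop for this could be sketched as follows. The label image and the transcription factor image are made-up stand-ins with known intensities, and the name <mark style="background-color: #EEEEEE;"><strong>TF_image</strong></mark> is an assumption; use your own YAP or TAZ channel in its place.

```python
import numpy as np

# Hand-built stand-in for nuclei_filtered_labels: 0 is background, 1 and 2 are nuclei
nuclei_filtered_labels = np.zeros((30, 30), dtype=int)
nuclei_filtered_labels[2:8, 2:8] = 1
nuclei_filtered_labels[12:20, 5:15] = 2
number_labels = int(nuclei_filtered_labels.max())

# Stand-in transcription factor image: dim background, brighter inside each nucleus
TF_image = np.full((30, 30), 10.0)
TF_image[2:8, 2:8] = 100.0
TF_image[12:20, 5:15] = 200.0

mean_intensities = []
for label_value in range(1, number_labels + 1):
    # Boolean mask for this nucleus, used to index the intensity image
    nucleus_mask = nuclei_filtered_labels == label_value
    mean_intensities.append(TF_image[nucleus_mask].mean())

print(mean_intensities)
```

Indexing an image with a Boolean mask returns only the pixels where the mask is True, so the mean is taken over that one nucleus and nothing else.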

<h1 style="font-size: 40px; margin-bottom: 0px;">Challenge: Compare mean nuclear fluorescence</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 950px;"></hr>

Let's do the same for our serum-stimulated cells now, so we can compare the mean nuclear fluorescence intensity between serum-starved and serum-stimulated cells.

For those of you who are more familiar with Python, see if you can define a function that will output the array of mean nuclear intensities for each cell and the mean of all nuclear intensities.
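
If you get stuck, here is one possible shape for such a function. It is a sketch rather than the expected answer: the function name, its arguments, and the toy arrays are all assumptions, and it expects a label image that contains at least one nucleus.

```python
import numpy as np

def mean_nuclear_intensities(label_image, intensity_image):
    """Return the mean intensity of each labeled nucleus and the mean of those means."""
    # Every label except 0, which is the background
    label_values = np.unique(label_image)
    label_values = label_values[label_values != 0]

    per_nucleus = np.array([intensity_image[label_image == value].mean()
                            for value in label_values])
    return per_nucleus, per_nucleus.mean()

# Toy data: two nuclei with known intensities
label_image = np.zeros((30, 30), dtype=int)
label_image[2:8, 2:8] = 1
label_image[12:20, 5:15] = 2

intensity_image = np.full((30, 30), 10.0)
intensity_image[2:8, 2:8] = 100.0
intensity_image[12:20, 5:15] = 200.0

per_nucleus, overall_mean = mean_nuclear_intensities(label_image, intensity_image)
print(per_nucleus, overall_mean)
```

Calling the same function on the label image and transcription factor channel for each condition gives you two sets of values to compare.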

<h1 style="font-size: 40px; margin-bottom: 0px;">Challenge: Watershed (separate overlapping particles)</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 950px;"></hr>

While there isn't a single function that will allow us to perform a watershed operation on overlapping particles, <a href="https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_watershed.html" rel="noopener noreferrer" target="_blank"><u>we can take a look at the documentation in scikit-image to see how we can perform watershed segmentation.</u></a>

Then we can pull out specific segmented regions by their label and use that as a mask for our original threshold to separate the two overlapping nuclei.
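
The sketch below follows the general approach in the scikit-image watershed example, applied to two synthetic overlapping disks rather than real nuclei. The disk positions and the <mark style="background-color: #EEEEEE;"><strong>min_distance</strong></mark> and <mark style="background-color: #EEEEEE;"><strong>num_peaks</strong></mark> settings are assumptions chosen for this toy image, and you would likely need to tune them for your own data.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Synthetic mask: two overlapping disks standing in for touching nuclei
x, y = np.indices((80, 80))
disk1 = (x - 28) ** 2 + (y - 28) ** 2 < 16 ** 2
disk2 = (x - 44) ** 2 + (y - 52) ** 2 < 20 ** 2
image = np.logical_or(disk1, disk2)

# Distance from every foreground pixel to the nearest background pixel
distance = ndi.distance_transform_edt(image)

# The peaks of the distance map serve as one seed (marker) per object
coords = peak_local_max(distance, min_distance=10, num_peaks=2)
seed_mask = np.zeros(distance.shape, dtype=bool)
seed_mask[tuple(coords.T)] = True
markers, number_markers = ndi.label(seed_mask)

# Flood from the seeds over the inverted distance map, staying inside the mask
labels = watershed(-distance, markers, mask=image)

# Pull out one segmented region by its label to use as a mask
nucleus_1 = labels == 1
```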