<h1 style="font-size: 40px; margin-bottom: 0px;">5.1 Python image analysis (I)</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 800px;"></hr>

In today's lecture, we went over how images are expressed as either 2D matrices (single channel) or 3D matrices (multi-channel or true-color) with each pixel as an element in the matrix containing a numerical value representing the intensity of light. We're all familiar with using ImageJ for analyzing images, but here, we'll use Python to perform many of the same analyses that we can do in ImageJ. 

For today's notebook, we'll explore how images are represented on our computers in order to learn how to process our images. To do this, we'll make use of a package called <code>scikit-image</code>, which is an image processing package that uses numpy arrays. <a href="https://scikit-image.org/" rel="noopener noreferrer"><u>Information, including documentation, on <code>scikit-image</code> can be found here.</u></a>

<strong>Learning objectives:</strong>
<ul>
    <li>Learn how to import images in a Python notebook</li>
    <li>Understand how images are represented as a matrix</li>
    <li>Learn how to display images in a Python notebook</li>
    <li>Learn how to process images</li>
</ul>

<h1 style="font-size: 40px; margin-bottom: 0px;">Install <code>scikit-image</code> package</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

Recall from MCB201A notebook 16-2 that we can install packages that we need, and for today's lesson, we will make use of the <code>scikit-image</code> package, which is not currently in our virtual environment. 

To install the <code>scikit-image</code> package, we can make use of Terminal and the <code>pip</code> command to install our package.

<pre style="width: 350px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">pip install scikit-image</pre>

<h1 style="font-size: 40px; margin-bottom: 0px;">Import packages for today</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

After our package is successfully installed, let's import all the packages we'll use for today's notebook. 

We'll make use of our usual packages:

<ul>
    <li><code>numpy</code></li>
    <li><code>pandas</code></li>
    <li><code>matplotlib.pyplot</code></li>
    <li><code>seaborn</code></li>
    <li><code>os</code></li>
</ul>

And we'll make use of two packages that we haven't used before:

<ul>
    <li><code>skimage</code></li>
    <li><code>scipy.ndimage</code></li>
</ul>

These two packages are useful for processing images in Python to get them ready for analysis.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import skimage as ski
import scipy.ndimage as ndi

Let's take a look to see if our <code>ski</code> package is the proper version.
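A minimal way to check the installed version is to print the package's <code>__version__</code> attribute. Which version counts as "proper" isn't stated here; the documentation links later in this notebook point to the 0.25.x release, so treat that as an assumption about the target version.

```python
import skimage as ski

# Print the installed scikit-image version string
print(ski.__version__)
```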

<h1 style="font-size: 40px; margin-bottom: 0px;">Importing images into Python</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

To work with our images in Python, we'll first need to import them like we would with any normal dataset. However, rather than using the pandas package and <code>pd.read_csv()</code> to import our file, we'll be using matplotlib's function <code>plt.imread()</code>. This function will read the data contained within an image file into a multidimensional array that we can then work with in Python.

Let's give it a try with a grayscale image of Liebchen.
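The course image isn't reproduced here, so the sketch below uses a stand-in: it writes a small synthetic grayscale TIFF to a temporary folder (using Pillow, which is an assumption on our part and is only needed to create the file) and then reads it back with <code>plt.imread()</code>. The file name is hypothetical; the real image would be read the same way by passing its path.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # Pillow, used here only to create the stand-in file

# Build a small synthetic 8-bit grayscale image (a horizontal gradient)
synthetic = np.tile(np.arange(0, 256, 4, dtype=np.uint8), (32, 1))

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name standing in for the course image
    path = os.path.join(tmpdir, "stand_in_gray.tif")
    Image.fromarray(synthetic).save(path)

    # Read the image file into a numpy array
    img = plt.imread(path)

# Inspect what we got back: object type, dimensions, and data type
print(type(img), img.shape, img.dtype)
```

One caveat from the matplotlib documentation: PNG files come back as float arrays scaled from 0 to 1, while other formats come back as integer arrays, so it's worth checking <code>img.dtype</code> after importing.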

As you recall from lecture, an 8-bit grayscale image is represented as a 2D matrix containing values from 0 (black) to 255 (white). We can take a look at a slice of our imported image to see how the image file is understood by Python.

What you can see is that each pixel is represented by a value that represents its grayscale value. In this case, our image file is an 8-bit grayscale image, so we have a single 2D array where the values can span 0-255. 
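As a small illustration of the same idea, here is a sketch with a synthetic 8-bit array standing in for the imported image (the array and its values are made up for this example):

```python
import numpy as np

# Synthetic stand-in for an 8-bit grayscale image: 6 rows x 8 columns
img = np.arange(48, dtype=np.uint8).reshape(6, 8) * 5

print(img.dtype)             # data type of each pixel
print(img.shape)             # (rows, columns)
print(img.min(), img.max())  # darkest and brightest pixel values
print(img[:3, :4])           # top-left corner: first 3 rows, first 4 columns
```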

<h1 style="font-size: 40px; margin-bottom: 0px;">Exploring grayscale images in Python</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

To render this image in our Python notebook, we can make use of matplotlib, but this time instead of using this library to display our plots, we'll use it to display our images. To do this, we'll make use of the <code>plt.imshow()</code> function. <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html" rel="noopener noreferrer" target="_blank"><u>Documentation for <code>plt.imshow()</code> can be found here.</u></a> 

If we dig into the documentation, we'll see that we can pass a 2D numpy array to the function along with additional arguments to adjust how the image will be displayed/rendered. First, let's display our image with the default parameters.

You can see that by default, the colormap <code>cmap</code> will be a colormap called <a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html" rel="noopener noreferrer"><u>'viridis', which was developed to improve readability and accessibility of quantitative data visualizations, such as heatmaps</u></a>. To display our image as a grayscale image, we can pass <code>'gray'</code> to the <code>cmap</code> parameter.
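To compare the two renderings directly, here is a sketch that shows a synthetic gradient (a stand-in for our image) with the default colormap and with <code>cmap='gray'</code>. It uses <code>plt.subplots()</code> and each axis's <code>imshow()</code> method, which behaves like <code>plt.imshow()</code> but lets us place the images side by side.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in image: a left-to-right gradient from 0 to 255
img = np.tile(np.arange(256, dtype=np.uint8), (64, 1))

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left: default colormap
default_im = axes[0].imshow(img)
axes[0].set_title("default cmap")

# Right: grayscale colormap
gray_im = axes[1].imshow(img, cmap="gray")
axes[1].set_title("cmap='gray'")
```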

We can also specify what value corresponds to pure black and what value corresponds to pure white using the <code>vmin</code> and <code>vmax</code> parameters, respectively. You'll want to keep in mind that the underlying data isn't changed by adjusting the <code>vmin</code> and the <code>vmax</code>. These parameters change how the image is displayed/rendered.
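A quick sketch of this, again with a synthetic gradient as a stand-in image and arbitrarily chosen limits of 50 and 200. The comparison at the end checks that the array itself is untouched by the display settings.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in image: a left-to-right gradient from 0 to 255
img = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
before = img.copy()  # keep a copy to compare against afterwards

fig, ax = plt.subplots()
# Values <= 50 render as black, values >= 200 render as white
im = ax.imshow(img, cmap="gray", vmin=50, vmax=200)

# The underlying data is unchanged; only the rendering is rescaled
print(np.array_equal(img, before))
```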

Since our images are 2D arrays, we can also use slice notation to pull out portions of our image.
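For example, with a synthetic array standing in for our image, a crop is just a row range followed by a column range:

```python
import numpy as np

# Synthetic stand-in image: 100 rows x 120 columns of 8-bit values
img = (np.arange(100 * 120) % 256).astype(np.uint8).reshape(100, 120)

# Rows 20-59 and columns 30-89: [row_start:row_stop, col_start:col_stop]
crop = img[20:60, 30:90]
print(crop.shape)

# Slicing returns a view of the original array; use .copy() if the crop
# should be modified independently of the full image
crop_copy = img[20:60, 30:90].copy()
```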

<h1 style="font-size: 40px; margin-bottom: 0px;">Exploring color images in Python</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

Recall from lecture that true-color images are represented by three channels that are overlaid one on top of the other to display the final color. Let's import a color image and then explore how it's represented as a 3D numpy array.

Recall from lecture that true-color images are represented by a stack of three 2D arrays, where each 2D array corresponds to an individual channel, either red, green, or blue. Let's take a look at what the array itself looks like.

Looking at the printed 3D array in its entirety, it is difficult to tease out the information for each channel. From our experience with 2D objects, like pandas DataFrames, we know that the dimensions are ordered as <code>&lbrack;a, b&rbrack;</code>, with the first dimension <code>a</code> corresponding to the number of rows (height of our image) and the second dimension <code>b</code> to the number of columns (width of our image). Our 3D image is just our 2D arrays stacked one on top of another along a third dimension <code>c</code>:

<p style="font-size: 16px; text-align: center;"><strong>True-color image as 3D array</strong></p>
<img src="./ref-images/array-explanation.jpg" style="height: 300px; margin: auto"/>

So then, our true-color image can be represented as the 3D array <code>&lbrack;a, b, c&rbrack;</code>.
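We can confirm this ordering from the array's <code>shape</code>. The sketch below uses a small synthetic array as a stand-in for an imported color image:

```python
import numpy as np

# Synthetic stand-in for a true-color image: 4 rows, 6 columns, 3 channels
rgb = np.zeros((4, 6, 3), dtype=np.uint8)

# shape is ordered as (a, b, c) = (rows, columns, channels)
height, width, channels = rgb.shape
print(height, width, channels)
```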

This means that for a 3-channel true-color image, the index <code>c</code> will be either <code>0</code>, <code>1</code>, or <code>2</code>, selecting the grayscale intensity values for the red, green, or blue channel, respectively. Let's pull out a single row from our true-color image to look at the RGB values.

And if we wanted all three channels of a single pixel:

So if we wanted a single channel, we can specify that as the third dimension. For our single pixel:

If we wanted the full 2D array for a single channel, we'll need to specify that we want all values from the first two dimensions, and just a single value for our third dimension.
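The indexing steps above can be sketched on a small synthetic color array (a stand-in with made-up values, where every pixel has red 200, green 100, and blue 50):

```python
import numpy as np

# Synthetic stand-in for a true-color image: 4 rows, 6 columns, 3 channels
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[:, :, 0] = 200  # red channel
rgb[:, :, 1] = 100  # green channel
rgb[:, :, 2] = 50   # blue channel

row = rgb[0]                # first row: one RGB triplet per column
pixel = rgb[0, 0]           # all three channels of the top-left pixel
red_value = rgb[0, 0, 0]    # red channel only for that pixel
red_channel = rgb[:, :, 0]  # full 2D array for the red channel

print(row.shape, pixel, red_value, red_channel.shape)
```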

We can then plot a single channel using the same function we used before.

Let's plot all three channels in grayscale side-by-side using <code>plt.imshow()</code>.
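One way to lay this out is with <code>plt.subplots()</code>, drawing each channel on its own axis. The layout and the synthetic image below are assumptions for illustration; the same loop would work on an imported color image.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in color image: each channel gets a different pattern
ramp = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
rgb = np.dstack((ramp, ramp[:, ::-1], np.full_like(ramp, 128)))

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for idx, (ax, name) in enumerate(zip(axes, ["red", "green", "blue"])):
    # Pull out one channel and render it in grayscale on a fixed 0-255 scale
    ax.imshow(rgb[:, :, idx], cmap="gray", vmin=0, vmax=255)
    ax.set_title(name)
    ax.axis("off")
```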

<h1 style="font-size: 40px; margin-bottom: 0px;">Assembling a composite image</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

More often than not, we're imaging our samples one channel at a time, so if we want to display our images as a composite, we need to construct a 3D array out of our separate 2D arrays. We can do this by using the <code>np.dstack()</code> function, which will stack/concatenate our arrays along the third axis. <a href="https://numpy.org/doc/stable/reference/generated/numpy.dstack.html" rel="noopener noreferrer"><u>Documentation for <code>np.dstack()</code> can be found here.</u></a>

Recall from earlier that the position of our 2D arrays in the third dimension specifies which channel each one is in, so it's important to keep this in mind so that you know what channel(s) you are pulling out or compositing.
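Here is a minimal sketch of stacking three separate 2D arrays into one composite. The arrays are synthetic stand-ins for individually acquired channels:

```python
import numpy as np

# Synthetic stand-ins for three separately acquired channels
red = np.full((4, 6), 200, dtype=np.uint8)
green = np.full((4, 6), 100, dtype=np.uint8)
blue = np.full((4, 6), 50, dtype=np.uint8)

# Stack along the third axis in red, green, blue order
composite = np.dstack((red, green, blue))
print(composite.shape)
```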

For example, if we mix up the channels, our true-color image won't be accurate to what we want it to show.
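To see why the order matters, this sketch stacks the same synthetic channels in two different orders. A signal that belongs in the red channel ends up in the blue position when the order is reversed:

```python
import numpy as np

# Synthetic stand-ins: signal only in the "red" acquisition
red = np.full((4, 6), 255, dtype=np.uint8)
green = np.zeros((4, 6), dtype=np.uint8)
blue = np.zeros((4, 6), dtype=np.uint8)

correct = np.dstack((red, green, blue))  # red signal lands in channel 0
swapped = np.dstack((blue, green, red))  # red signal lands in channel 2

print(correct[0, 0], swapped[0, 0])
```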

<h1 style="font-size: 40px; margin-bottom: 0px;">Pseudocolor an image</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

We can take what we know about how true-color images are built up as a composite of three channels to also pseudocolor our images. If we repeat a single channel in all three channels, we will end up again with a grayscale image:
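A short sketch of this, using a synthetic 2D array as the single channel:

```python
import numpy as np

# Synthetic stand-in for a single grayscale channel (4 rows x 8 columns)
gray = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (4, 1))

# Repeating the same array in all three channels gives equal R, G, and B
# at every pixel, which renders as a shade of gray
gray_rgb = np.dstack((gray, gray, gray))
print(gray_rgb.shape)
```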

We can then take what we know about mathematical operations on arrays to then adjust the values for each channel. If we want to then pseudocolor our grayscale image red, we can zero-out the green and blue channels.
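One way to do this with array math is to multiply the image by a per-channel weight of 1 for red and 0 for green and blue. This is a sketch on synthetic data; numpy broadcasts the three weights across every pixel.

```python
import numpy as np

# Synthetic grayscale image repeated across three channels
gray = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (4, 1))
gray_rgb = np.dstack((gray, gray, gray))

# Keep red (x1) and zero out green and blue (x0)
red_only = gray_rgb * np.array([1, 0, 0], dtype=np.uint8)

print(red_only[0, -1])  # RGB values of the last pixel in the first row
```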

We can also achieve the same result by setting up a 2D array of zeros of the same dimensions as our image, and placing it in the positions corresponding to the channels we don't want to use. To set up an array of zeros that are the same dimensions as another object, we can make use of the <code>np.zeros_like()</code> function. <a href="https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html" rel="noopener noreferrer"><u>Documentation for <code>np.zeros_like()</code> can be found here.</u></a>

This function takes an object as an argument and then will initialize an array of zeros that matches the shape of the object that it was given, so if we provide it with one of our 2D arrays corresponding to our image, it will generate an array of zeros of the same shape as our image.

Let's then assemble the three channels with two of them assigned our array of zeros and generate an image.
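Put together, and again using a synthetic array as a stand-in for our image, this might look like:

```python
import numpy as np

# Synthetic stand-in for a single grayscale channel
gray = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (4, 1))

# Array of zeros with the same shape and data type as our image
zeros = np.zeros_like(gray)

# Signal in the red position, zeros in the green and blue positions
red_img = np.dstack((gray, zeros, zeros))
print(zeros.shape, zeros.dtype, red_img.shape)
```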

We can also mess with the numbers to pseudocolor our image in a color that isn't exactly red, green, or blue.
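For instance, scaling the channels by different weights gives a mixed color. The weights below (full red, half green, no blue, which gives an orange tint) are an arbitrary choice for illustration:

```python
import numpy as np

# Synthetic stand-in for a single grayscale channel
gray = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (4, 1))

# Per-channel weights for red, green, and blue
weights = (1.0, 0.5, 0.0)

# Scale the image by each weight, then convert back to 8-bit integers
channels = [(gray * w).astype(np.uint8) for w in weights]
orange = np.dstack(channels)

print(orange[0, -1])  # RGB values of the brightest pixel in the first row
```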

<h1 style="font-size: 40px; margin-bottom: 0px;">Exporting processed images</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

Since we're using matplotlib to render our images, we can make use of our usual <code>plt.savefig()</code> or <code>fig.savefig()</code> functions to export our processed images. This time, instead of specifying our images with the extension <code>.pdf</code>, we can specify <code>.jpg</code> or another image extension, and our image will be exported as that file type.
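A sketch of exporting a rendered image, written to a temporary folder with a hypothetical file name so the example cleans up after itself. Keep in mind that <code>savefig()</code> saves the rendered figure, so the output size depends on the figure size and <code>dpi</code> rather than matching the array pixel for pixel.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for a processed image
img = np.tile(np.arange(256, dtype=np.uint8), (64, 1))

fig, ax = plt.subplots()
ax.imshow(img, cmap="gray")
ax.axis("off")  # hide the axis ticks in the exported file

with tempfile.TemporaryDirectory() as tmpdir:
    # The file extension determines the exported file type
    out_path = os.path.join(tmpdir, "processed.png")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    saved = os.path.exists(out_path)

print(saved)
```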

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #1: Import fluorescence data</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

For the first set of exercises, let's take what we've learned and then apply it to our fluorescence data from MCB201A. First we'll import our fluorescence data files under the <code>data</code> subdirectory. We'll import all the files and apply what we know about how images are represented to process our fluorescence data and analyze them to extract quantitative data.

You can load the files in one by one, or, if you remember how we worked with multiple files for statistical analysis, see if you can apply that approach here to load in the data for today.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #2: Pseudocolor channels</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

See if you can take what you've learned to then pseudocolor an individual channel from our set of images. Select either the no serum or serum stimulation condition and pseudocolor each channel for that treatment condition.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #3: Render three-channel composite images</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 950px;"></hr>

For this exercise, create three-channel composite images for both of our treatment conditions.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #4: Identify a "good" threshold</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

For this exercise, we'll take a look at the underlying values of our DAPI images to identify what value we can set as a potentially "good" threshold to segment our nuclei from our background. 

We'll need to first take a look at the distribution of all pixel intensities for our DAPI images, and since our DAPI images are 2D arrays, they won't plot neatly as is. So you'll first need to make use of the <code>np.ndarray.flatten()</code> method to flatten your 2D array into a single dimension, so that it can be used to plot a simple histogram.
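To show the mechanics without giving away the exercise, here is a sketch on a synthetic image: a dim, noisy background with one bright square standing in for a nucleus. All values are made up.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for a DAPI image: dim noisy background, one bright square
rng = np.random.default_rng(0)
dapi = rng.integers(0, 40, size=(50, 50)).astype(np.uint8)
dapi[10:20, 10:20] = 200

# Flatten the 2D array into a 1D array of pixel intensities
flat = dapi.flatten()
print(dapi.shape, flat.shape)

# Plot the distribution of pixel intensities
fig, ax = plt.subplots()
ax.hist(flat, bins=64)
ax.set_xlabel("pixel intensity")
ax.set_ylabel("number of pixels")
```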

Plot a histogram displaying the distribution of values of our two DAPI channels, and plot each channel as its own distribution.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #5: Segment nuclei using your threshold</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 900px;"></hr>

Now that you've identified a good spot to set your threshold, see if you can make use of a conditional statement to threshold your DAPI images, thereby converting them into binary images.

Take a look at the array values now:

You should be able to see now that your thresholded array contains booleans. This is because we applied our conditional statement to each element in our 2D arrays, resulting in a boolean output corresponding to whether or not each element in the array met that condition. Now render your binary images side by side to see how the nuclei have been segmented for both images.
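The same behavior can be seen on a small synthetic image (a stand-in with a made-up threshold of 100), where the comparison is applied to every element at once:

```python
import numpy as np

# Synthetic stand-in for a DAPI image: dim noisy background, one bright square
rng = np.random.default_rng(0)
dapi = rng.integers(0, 40, size=(50, 50)).astype(np.uint8)
dapi[10:20, 10:20] = 200

# Element-wise comparison: True where the pixel is brighter than the threshold
binary = dapi > 100

print(binary.dtype)  # data type of the thresholded array
print(binary.sum())  # number of pixels above the threshold
```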

There are also many other ways to calculate thresholds, and when we reconvene here, we'll go over additional ways to threshold our images.

If we then wanted to use one of these algorithms to threshold our nuclei, we can call them up from the <code>scikit-image</code> package:

We can also apply a local threshold, where if our background signal was uneven in our image, we can take into account the local background to segment our images.

<h1 style="font-size: 40px; margin-bottom: 0px;">Fill holes in a thresholded image</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

Sometimes our thresholded image isn't exactly what we want it to be, and in the case of a few of our nuclei, we have some holes that will be helpful to fill so that we capture the entirety of each nucleus. To do this, we can make use of the <code>ndi.binary_fill_holes()</code> function, which is a quick way to fill in holes in our multidimensional binary array. <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_fill_holes.html" rel="noopener noreferrer"><u>Documentation for <code>ndi.binary_fill_holes()</code> can be found here.</u></a>

Let's first take a look at one of our thresholded images:

We can see that there are holes within our segmented nuclei, and we can fill them in using <code>ndi.binary_fill_holes()</code>.
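On a tiny synthetic mask (a filled square with a single missing pixel standing in for a nucleus with a hole), the effect looks like this:

```python
import numpy as np
import scipy.ndimage as ndi

# Synthetic binary mask: a 5x5 square with a one-pixel hole in the middle
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
mask[3, 3] = False

# Fill any background pixels that are fully enclosed by foreground
filled = ndi.binary_fill_holes(mask)

print(mask.sum(), filled.sum())  # foreground pixel counts before and after
```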

Another way would be to do a dilation followed by an erosion, which will help to fill in gaps while limiting how much we expand our region of interest. To do this, we'll make use of some functions within scikit-image:

<ul>
    <li><code>ski.morphology.disk()</code> - this creates a binary circle for us to use as a footprint in our dilation and erosion. <a href="https://scikit-image.org/docs/0.25.x/api/skimage.morphology.html#skimage.morphology.disk" rel="noopener noreferrer"><u>Documentation is here.</u></a></li>
    <li><code>ski.morphology.dilation()</code> - this will increase the size of bright regions (our ROI). <a href="https://scikit-image.org/docs/0.25.x/auto_examples/applications/plot_morphology.html#dilation" rel="noopener noreferrer"><u>Documentation is here.</u></a></li>
    <li><code>ski.morphology.erosion()</code> - this will increase the size of dark regions (our background). <a href="https://scikit-image.org/docs/0.25.x/auto_examples/applications/plot_morphology.html#erosion" rel="noopener noreferrer"><u>Documentation is here.</u></a></li>
    <li><code>ski.morphology.closing()</code> - this performs a dilation followed by an erosion in a single step, and can be another way to achieve a similar result. <a href="https://scikit-image.org/docs/0.25.x/auto_examples/applications/plot_morphology.html#closing" rel="noopener noreferrer"><u>Documentation is here.</u></a></li>
</ul>

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #6: Use binary threshold as a mask</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

For this exercise, see if you can take what you now know about working with images and mathematical operations on arrays to use your segmented nuclei as a mask to look at just our region of interest (ROI) in the green and red channels.

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #7: Segment cells and/or cytoplasm</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 850px;"></hr>

For this exercise, practice applying what you've learned so far about image analysis to segment either whole cells or just the cytoplasm as our ROI.