<h1 style="font-size: 40px; margin-bottom: 0px;">5.1 Image analysis in Python Part II</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 700px;"></hr>

Today, we'll continue with image analysis and update our function from last week so that it can measure nuclear YAP intensities across a whole series of images at once. This requires an additional package that lets us navigate directories from within our Python notebook. Once we load in our images, we'll work through how to turn last week's function, which analyzed a single field of view, into one that can analyze all our images.

<strong>Learning objectives:</strong>
<ul>
    <li>Practice navigating directories while working in a Python notebook</li>
    <li>Learn how to load in multiple files for simultaneous processing</li>
    <li>Set up a pipeline to analyze multiple images at once</li>
    <li>Continue using for loops and functions for image analysis</li>
</ul>

Packages that we'll use today are:
<ul>
    <li>numpy</li>
    <li>pandas</li>
    <li>matplotlib.pyplot</li>
    <li>skimage.measure</li>
    <li>seaborn</li>
    <li>scipy.stats</li>
    <li>scipy.ndimage</li>
    <li>os</li>
</ul>

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skimage.measure
from skimage.color import label2rgb
import seaborn as sns
import scipy.stats as stats
import scipy.ndimage as ndi
import os
```

<h1 style="font-size: 40px; margin-bottom: 0px;">Navigating directories</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

Today, you will learn to use the <mark style="background-color: #EEEEEE;"><strong>os</strong></mark> module, which gives Python access to operating system functionality such as changing directories. This module also lets you find the files you want to pull into Python, so you can load in multiple files at once for analysis. In our case, we will use functions from this module to load in all our image files.

<h2>Identify current working directory</h2>

First, let's take a look at our current working directory by using the <mark style="background-color: #EEEEEE;"><strong>os.getcwd()</strong></mark> function. <a href="https://docs.python.org/3/library/os.html#os.getcwd" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>

You can see that the <mark style="background-color: #EEEEEE;"><strong>os.getcwd()</strong></mark> function returns the path to our current working directory. By default, the working directory is the folder that contains our Python notebook.

But what data type are we getting as an output?
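One way to check is to pass the output of <mark style="background-color: #EEEEEE;"><strong>os.getcwd()</strong></mark> to Python's built-in <mark style="background-color: #EEEEEE;"><strong>type()</strong></mark> function:

```python
import os

cwd = os.getcwd()  # absolute path of the current working directory
print(cwd)
print(type(cwd))   # <class 'str'>
```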

The output is a string containing the file path to our current working directory. Recall that the <mark style="background-color: #EEEEEE;"><strong>&plus;</strong></mark> operator concatenates two strings. This means we can specify a subdirectory, or add onto the file path, by combining the output of <mark style="background-color: #EEEEEE;"><strong>os.getcwd()</strong></mark> with a string naming the subdirectory we want to go into.
```
os.getcwd()+'/images/'
```
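Concatenating with `'/'` is what the rest of this notebook uses. As an alternative, <mark style="background-color: #EEEEEE;"><strong>os.path.join()</strong></mark> inserts the path separator appropriate for the current operating system, so you don't have to manage the slashes yourself. A short sketch (the folder names are the ones used in this lesson):

```python
import os

# Path to the images subfolder, built without typing any slashes
images_dir = os.path.join(os.getcwd(), "images")
print(images_dir)

# A path several folders deep, e.g. the no serum DAPI images
ns_dapi_dir = os.path.join(os.getcwd(), "images", "dapi", "no_serum")
print(ns_dapi_dir)
```

Note that, unlike `os.getcwd()+'/images/'`, this version has no trailing slash, which makes no difference to functions such as `os.listdir()` or `os.chdir()`.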

<h2>Change directories</h2>

We can also change what folder we're working in by using the <mark style="background-color: #EEEEEE;"><strong>os.chdir()</strong></mark> function. <a href="https://docs.python.org/3/library/os.html#os.chdir" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>

To reduce how much we need to type, we can use the <mark style="background-color: #EEEEEE;"><strong>os.getcwd()</strong></mark> function to get the file path to this week's directory and just add on the subdirectory information.
```
os.chdir(os.getcwd()+'/images/')
```
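Because `os.getcwd()+'/images/'` is built from wherever you currently are, running that cell a second time would look for `images/images/` and raise a `FileNotFoundError`. Below is a minimal sketch of one guard, assuming the target folder is named **images** as in this notebook; the temporary project folder at the top is a hypothetical stand-in that only exists so the example runs on its own:

```python
import os
import tempfile

# Hypothetical stand-in for this week's folder: a temp directory with an images/ subfolder
project_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(project_dir, "images"))
os.chdir(project_dir)

# Only move into images/ if we are not already there, so the cell is safe to re-run
for _ in range(2):  # simulate running the cell twice
    if os.path.basename(os.getcwd()) != "images":
        os.chdir(os.path.join(os.getcwd(), "images"))

print(os.getcwd())
```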

Now let's see what our current working directory is:

You should be able to see now that our current working directory has been updated to be the **images** folder contained within this week's directory.

<h2>Get the names of files and folders in a directory</h2>

Now that we're in the directory we want to work in, we can take a look at what's contained within it by using the <mark style="background-color: #EEEEEE;"><strong>os.listdir()</strong></mark> function. <a href="https://docs.python.org/3/library/os.html#os.listdir" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>
```
os.listdir()
```

You should see that the output contains the folders holding this week's images that we'll be analyzing: **bacteria**, **dapi**, and **yap**, as well as a hidden folder, **.ipynb_checkpoints**, which Jupyter creates automatically.

If we want to find the files contained within those subdirectories, we can either change our working directory, or we can add onto our current working directory file path.
```
os.listdir(os.getcwd()+'/dapi/no_serum/')
```

While you can change your current working directory, sometimes it's more convenient to stay in a parent directory, so you can pull all your files without having to keep changing directories beforehand.
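<mark style="background-color: #EEEEEE;"><strong>os.listdir()</strong></mark> returns every entry in a folder, including hidden ones such as **.ipynb_checkpoints** or a macOS `.DS_Store` file, which <mark style="background-color: #EEEEEE;"><strong>plt.imread()</strong></mark> cannot open. A small filter like the sketch below keeps only the file names you want. The function name `list_images` is hypothetical, and the `.tif` extension is an assumption, so change it to match your images:

```python
import os
import tempfile

def list_images(folder, extension=".tif"):
    """Return names in folder that are not hidden and end with extension (case-insensitive)."""
    return [
        name for name in os.listdir(folder)
        if not name.startswith(".") and name.lower().endswith(extension.lower())
    ]

# Throwaway folder with two images, a hidden file, a hidden folder, and a non-image file
demo = tempfile.mkdtemp()
for name in ["field1.tif", "field2.tif", ".DS_Store", "notes.txt"]:
    open(os.path.join(demo, name), "w").close()
os.makedirs(os.path.join(demo, ".ipynb_checkpoints"))

print(sorted(list_images(demo)))  # ['field1.tif', 'field2.tif']
```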

<h3>Sort file names</h3>

You should now see a list containing all the file names for our no serum DAPI images. However, <mark style="background-color: #EEEEEE;"><strong>os.listdir()</strong></mark> returns names in an arbitrary order, so to match up our DAPI and YAP images we first need to sort both lists with the <mark style="background-color: #EEEEEE;"><strong>list.sort()</strong></mark> method, which sorts a list in place. <a href="https://docs.python.org/3/library/stdtypes.html#list.sort" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>
```
ns_dapi_files = os.listdir(os.getcwd()+'/dapi/no_serum/')
ns_dapi_files.sort()
```

Now that our file names are sorted properly, let's do the same for our no serum YAP images to see if they end up in the same order as our no serum DAPI images.
```
ns_yap_files = os.listdir(os.getcwd()+'/yap/no_serum/')
ns_yap_files.sort()
```
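<mark style="background-color: #EEEEEE;"><strong>list.sort()</strong></mark> compares strings character by character, so the DAPI and YAP lists only end up in matching order if both channels follow the same naming scheme. It also means that numbers without leading zeros sort as text, so `field10` comes before `field2`. That does not break the DAPI/YAP pairing, because both lists get the same order, but if you want numeric order, one common approach is a "natural sort" key. A sketch with hypothetical file names (`natural_key` is not a built-in):

```python
import re

def natural_key(name):
    # Split "field10.tif" into ['field', 10, '.tif'] so the numbers compare as integers
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r"(\d+)", name)]

files = ["field10.tif", "field2.tif", "field1.tif"]
print(sorted(files))                   # ['field1.tif', 'field10.tif', 'field2.tif']
print(sorted(files, key=natural_key))  # ['field1.tif', 'field2.tif', 'field10.tif']
```

The same key works with the in-place method, e.g. `ns_dapi_files.sort(key=natural_key)`.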

<h1 style="font-size: 40px; margin-bottom: 0px;">Load multiple files</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

We now have sorted lists of file names, which means we can use for loops to load in all our files.

Let's set up a for loop that pulls in our DAPI and YAP images simultaneously, storing each channel in its own list.

<h2>Iterate through a list of tuples containing paired/grouped files</h2>

We can set up our for loop to iterate through two lists simultaneously by giving it a sequence of tuples that pair up the elements of both lists. To do this, we use the <mark style="background-color: #EEEEEE;"><strong>zip()</strong></mark> function. <a href="https://docs.python.org/3/library/functions.html#zip" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a>

The <mark style="background-color: #EEEEEE;"><strong>zip()</strong></mark> function steps through the lists you provide in parallel and combines their elements into tuples: the first element of each list, then the second, and so on. It returns a zip object that produces these tuples on demand, so evaluating it directly doesn't display the pairs.
```
zip(ns_dapi_files, ns_yap_files)
```
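Wrapping the zip object in <mark style="background-color: #EEEEEE;"><strong>list()</strong></mark> displays the pairs. Keep in mind that <mark style="background-color: #EEEEEE;"><strong>zip()</strong></mark> stops silently at the end of the shortest list, so a missing image would quietly drop a pair. Below is a sketch with hypothetical file names and an explicit length check; on Python 3.10+, `zip(..., strict=True)` raises a `ValueError` for unequal lengths instead:

```python
dapi_files = ["dapi_01.tif", "dapi_02.tif", "dapi_03.tif"]
yap_files = ["yap_01.tif", "yap_02.tif"]  # one YAP image is missing

pairs = list(zip(dapi_files, yap_files))
print(pairs)  # [('dapi_01.tif', 'yap_01.tif'), ('dapi_02.tif', 'yap_02.tif')]

# Catch the mismatch before analysis instead of silently losing dapi_03.tif
if len(dapi_files) != len(yap_files):
    print(f"Warning: {len(dapi_files)} DAPI files but {len(yap_files)} YAP files")
```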

And if you want to take a look at how all the files of both lists are grouped up:
```
for pair in zip(ns_dapi_files, ns_yap_files):
    print(pair)
```

You can see that each DAPI file is paired with its respective YAP file, since we sorted both lists when we pulled the file names. Each pair is a tuple, which means we can unpack it by providing a variable for each element of the tuple.
```
for file_pair in zip(ns_dapi_files, ns_yap_files):
    blue, green = file_pair
    print(blue)
    print(green)
```

Rather than unpacking the tuple inside the body of the for loop, we can provide two variables when we start the for loop, and each tuple will be unpacked as we iterate through the pairs.
```
for blue, green in zip(ns_dapi_files, ns_yap_files):
    print(blue)
    print(green)
```
This gives the same result as the previous example, with more concise code.

<h2>Load in both DAPI and YAP images simultaneously</h2>

We can now use the same setup to load both our DAPI and YAP images in a single for loop.
```
ns_dapi_images = []
ns_yap_images = []
for blue, green in zip(ns_dapi_files, ns_yap_files):
    dapi = plt.imread(f"dapi/no_serum/{blue}")
    yap = plt.imread(f"yap/no_serum/{green}")
    ns_dapi_images.append(dapi)
    ns_yap_images.append(yap)
```

Let's take a look at our images to see if they loaded in properly.
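Besides displaying them (for example with <mark style="background-color: #EEEEEE;"><strong>plt.imshow()</strong></mark>), a quick numeric check can catch problems such as a missing file or a mismatched pair. A minimal sketch; the function name `check_pairs` is hypothetical, and small synthetic arrays stand in for the real images (in the notebook you would pass `ns_dapi_images` and `ns_yap_images`):

```python
import numpy as np

def check_pairs(dapi_images, yap_images):
    """Return (index, shape, dtype, DAPI max, YAP max) for each DAPI/YAP pair."""
    if len(dapi_images) != len(yap_images):
        raise ValueError(f"{len(dapi_images)} DAPI images but {len(yap_images)} YAP images")
    summary = []
    for i, (dapi, yap) in enumerate(zip(dapi_images, yap_images)):
        if dapi.shape != yap.shape:
            raise ValueError(f"pair {i}: DAPI {dapi.shape} vs YAP {yap.shape}")
        summary.append((i, dapi.shape, str(dapi.dtype), float(dapi.max()), float(yap.max())))
    return summary

# Synthetic stand-ins for two fields of view
rng = np.random.default_rng(0)
dapi_demo = [rng.integers(0, 4096, (64, 64), dtype=np.uint16) for _ in range(2)]
yap_demo = [rng.integers(0, 4096, (64, 64), dtype=np.uint16) for _ in range(2)]

for row in check_pairs(dapi_demo, yap_demo):
    print(row)
```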

<h1 style="font-size: 40px; margin-bottom: 0px;">Process multiple images at once</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

Everything looks good: our DAPI and YAP channels loaded properly for each of our no serum fields of view. Next, let's bring our function from last week into this notebook so we can make some slight modifications.

We'll update it in two ways. First, it will generate the binary image itself, so we no longer need to create one beforehand. Second, it will return a single output, the array of per-cell mean nuclear intensities, since we're now most interested in each cell's value rather than an aggregate across the field of view.

```
def mean_fluor(binary_image, yap_channel):
    """Calculating mean nuclear fluorescence"""
    #assign our YAP stain to a variable in the function to use later
    transcription_factor = yap_channel

    #label our thresholded nuclei
    dapi_label_thresh, dapi_numbers = skimage.measure.label(binary_image, return_num=True)

    #get their properties
    nuc_props = skimage.measure.regionprops(dapi_label_thresh)

    #collect the area of every labeled object (useful when choosing the size cutoff below)
    areas = []
    for i in nuc_props:
        areas.append(i.area)

    #keep only objects larger than 100 pixels to remove noise
    nuc_filtered = np.zeros_like(dapi_label_thresh)
    for j in nuc_props:
        if j.area > 100:
            nuc_filtered = nuc_filtered + (dapi_label_thresh == j.label)

    #label our new filtered nuclei
    filtered_labels, filtered_numbers = skimage.measure.label(nuc_filtered, return_num=True)

    #make a boolean mask for each individual nucleus and group them into a single list
    all_nuclei = [np.zeros_like(filtered_labels)]*filtered_numbers
    for k in np.arange(0, filtered_numbers, 1):
        all_nuclei[k] = filtered_labels == k+1

    #measure mean fluorescence intensity within each nucleus
    mean_fluor_array = np.zeros(filtered_numbers)
    for l in np.arange(0, filtered_numbers, 1):
        fluorescence = transcription_factor*all_nuclei[l]
        mean_fluor_array[l] = np.sum(fluorescence)/np.sum(all_nuclei[l])

    #average across all nuclei in the field of view
    result = mean_fluor_array.mean()

    return mean_fluor_array, result
```
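Below is a minimal sketch of the update described above. It assumes the new version takes the raw DAPI image plus a `threshold` and a `min_area`; these parameter names are assumptions, and the defaults come from the 400 threshold and 100-pixel cutoff used in this lesson. It keeps the same steps (threshold, label, regionprops, size filter, per-nucleus mean) but skips the relabeling, which is not needed to compute the means:

```python
import numpy as np
import skimage.measure

def mean_fluor(dapi_channel, yap_channel, threshold=400, min_area=100):
    """Return an array of per-nucleus mean YAP intensities for one field of view."""
    # Threshold the DAPI channel here, so no binary image is needed beforehand
    binary_image = dapi_channel > threshold

    # Label connected nuclei and measure their properties
    labels = skimage.measure.label(binary_image)
    props = skimage.measure.regionprops(labels)

    # Skip small objects (noise) and average the YAP signal inside each remaining nucleus
    means = [yap_channel[labels == p.label].mean() for p in props if p.area > min_area]
    return np.array(means)

# Synthetic check: one 15x15 "nucleus" and one 3x3 speck of noise
dapi = np.zeros((50, 50), dtype=np.uint16)
yap = np.zeros((50, 50), dtype=np.uint16)
dapi[5:20, 5:20] = 1000
yap[5:20, 5:20] = 200
dapi[40:43, 40:43] = 1000
yap[40:43, 40:43] = 50
print(mean_fluor(dapi, yap))  # [200.]
```

On a single field of view from this notebook, the call would look like `mean_fluor(ns_dapi_images[0], ns_yap_images[0], threshold=400)`.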

Now that our function is defined, let's plot the pixel-intensity histograms of all our DAPI images to check whether a single threshold value will work for every image.
```
fig, ax = plt.subplots()

plt.ylim(0, 25000)

for im in ns_dapi_images:
    sns.histplot(im.flatten(), bins=50)
```

It looks like the threshold of 400 that we used last week should still work. Let's give our updated function a try with a single field of view.

Given a single image, our function outputs an array containing each cell's mean nuclear YAP fluorescence. Next, we can make an additional modification so that our function loops through all our files and outputs all the mean nuclear intensities.

Rather than running the function repeatedly for each image, we can call it once to analyze multiple images and get a single list of each cell's mean nuclear YAP fluorescence intensity.
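One way to sketch this is a wrapper that loops over paired images with <mark style="background-color: #EEEEEE;"><strong>zip()</strong></mark> and joins the per-image arrays with `np.concatenate()`. The name `analyze_all` is hypothetical, and it takes the per-image function as an argument (assumed to accept a DAPI image, a YAP image, and a threshold), so it works with whichever version of `mean_fluor` you define. A stand-in function is used below so the example runs on its own:

```python
import numpy as np

def analyze_all(dapi_images, yap_images, per_image_fn, threshold):
    """Apply per_image_fn to each DAPI/YAP pair and pool the per-cell results."""
    per_image = [per_image_fn(dapi, yap, threshold) for dapi, yap in zip(dapi_images, yap_images)]
    return np.concatenate(per_image) if per_image else np.array([])

# Stand-in for mean_fluor: mean YAP intensity of pixels where DAPI exceeds the threshold
def fake_mean_fluor(dapi, yap, threshold):
    return np.array([yap[dapi > threshold].mean()])

dapi_images = [np.array([[500, 0], [0, 0]]), np.array([[0, 600], [0, 0]])]
yap_images = [np.array([[10, 1], [1, 1]]), np.array([[1, 30], [1, 1]])]
print(analyze_all(dapi_images, yap_images, fake_mean_fluor, 400))  # [10. 30.]
```

With a three-argument `mean_fluor`, the notebook call would be `analyze_all(ns_dapi_images, ns_yap_images, mean_fluor, 400)`.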

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #1: Update function to create basic analytical pipeline</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 1000px;"></hr>

Let's use what we've learned today to update our function so that we only need to change to the correct parent directory, then give the function a string naming the experimental condition along with our desired threshold. Python can handle the rest.
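One possible building block for this exercise is a helper that finds and pairs the files for one condition. It assumes the **dapi/&lt;condition&gt;/** and **yap/&lt;condition&gt;/** folder layout used above (for example `'no_serum'`). The function name `paired_paths` is hypothetical, and the temporary folder tree only exists so the example runs on its own:

```python
import os
import tempfile

def paired_paths(condition, parent_dir="."):
    """Return sorted (DAPI path, YAP path) pairs for one experimental condition."""
    dapi_dir = os.path.join(parent_dir, "dapi", condition)
    yap_dir = os.path.join(parent_dir, "yap", condition)
    # Skip hidden entries such as .ipynb_checkpoints, then sort so the channels line up
    dapi_files = sorted(f for f in os.listdir(dapi_dir) if not f.startswith("."))
    yap_files = sorted(f for f in os.listdir(yap_dir) if not f.startswith("."))
    if len(dapi_files) != len(yap_files):
        raise ValueError(f"{condition}: {len(dapi_files)} DAPI vs {len(yap_files)} YAP files")
    return [(os.path.join(dapi_dir, d), os.path.join(yap_dir, y))
            for d, y in zip(dapi_files, yap_files)]

# Throwaway folder tree mimicking images/dapi/no_serum and images/yap/no_serum
root = tempfile.mkdtemp()
for channel in ["dapi", "yap"]:
    os.makedirs(os.path.join(root, channel, "no_serum"))
    for i in (1, 2):
        open(os.path.join(root, channel, "no_serum", f"{channel}_{i}.tif"), "w").close()

for dapi_path, yap_path in paired_paths("no_serum", parent_dir=root):
    print(dapi_path, yap_path)
```

Each returned path can be passed straight to `plt.imread()` inside your updated function.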

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #2: Compare no serum vs serum</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 800px;"></hr>

Now that we have two lists of mean nuclear fluorescence intensities, one for the no serum cells and one for the serum-stimulated cells, let's test whether there is a significant difference in the nuclear localization of YAP between the two conditions.
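Below is a sketch of two common options from `scipy.stats`. Welch's t-test (`ttest_ind` with `equal_var=False`) compares the means without assuming equal variances, and the Mann-Whitney U test is a rank-based alternative if the intensities look skewed. Which one fits your data is a judgment call. The numbers below are placeholders, made up only so the code runs; substitute the arrays from your pipeline:

```python
import numpy as np
import scipy.stats as stats

# Placeholder values only; replace with your per-cell results
no_serum = np.array([210.0, 195.0, 230.0, 205.0, 188.0, 220.0])
serum = np.array([310.0, 290.0, 335.0, 305.0, 280.0, 320.0, 298.0])

t_result = stats.ttest_ind(no_serum, serum, equal_var=False)  # Welch's t-test
u_result = stats.mannwhitneyu(no_serum, serum, alternative="two-sided")

print(f"Welch t = {t_result.statistic:.2f}, p = {t_result.pvalue:.3g}")
print(f"Mann-Whitney U = {u_result.statistic:.1f}, p = {u_result.pvalue:.3g}")
```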

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #3: Plot our results</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 800px;"></hr>

Let's see what our results look like when plotted as a swarmplot, with the mean and standard error overlaid on top.
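The plotting cell expects a wide-form DataFrame named `full_results` with one column per condition. A minimal sketch of building it, assuming your two result arrays are called `no_serum` and `serum` (placeholder values below). Wrapping each array in `pd.Series` lets the columns have different lengths: pandas pads the shorter one with `NaN`, and `DataFrame.max()` ignores `NaN` by default:

```python
import numpy as np
import pandas as pd

# Placeholder arrays; use the per-cell results from your pipeline instead
no_serum = np.array([210.0, 195.0, 230.0])
serum = np.array([310.0, 290.0, 335.0, 305.0])

# No serum first, to match the tick labels set in the plotting cell
full_results = pd.DataFrame({
    "No Serum": pd.Series(no_serum),
    "Serum Stimulated": pd.Series(serum),
})
print(full_results)
print(full_results.max().max())  # 335.0
```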

```
fig, ax = plt.subplots()

Color_code = ['#3C5488', '#4DBBD5']

sns.swarmplot(
    data=full_results,
    s=7,
    palette=Color_code,
    zorder=0
)

sns.barplot(
    data=full_results,
    estimator=np.mean,
    alpha=0,
    errorbar='se',
    capsize=.3,
    err_kws={'linewidth': 1.5, 'color': 'black'}
)

sns.boxplot(
    data=full_results,
    showmeans=True,
    meanline=True,
    width=.5,
    meanprops = {'color': 'black', 'ls': '-', 'lw': 1.5},
    medianprops = {'visible': False},
    whiskerprops = {'visible': False},
    showfliers=False,
    showbox=False,
    showcaps=False
)

ax.set_ylabel('Mean YAP nuclear\nfluorescence intensity', fontsize=18)
ax.set_xticks([0,1], labels=['No Serum', 'Serum Stimulated'])
plt.xticks(rotation=30, ha='right', fontsize=18)
plt.yticks(fontsize=14)

fig.set_dpi(300)
fig.set_size_inches(6, 8)

plt.show()
```

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercise #4: Annotate plot</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

With our data analyzed and plotted, we can annotate our plot to show the results of our statistical analysis. Annotations are additional objects overlaid on top of our plot, such as the significance brackets and asterisks common in published figures.

<h2>Add lines</h2>

For example, let's draw a simple line on our plot by defining a set of x values and a set of y values, and then plotting those two points as a red line.
```
x1, x2 = 0, 1
y1, y2 = 200, 300

plt.plot([x1, x2], [y1, y2], lw=1.5, c='red')
```

Now let's draw a horizontal line 10% above the maximum value in our data.
```
x1, x2 = 0, 1
y = full_results.max().max() + 0.1*full_results.max().max()

plt.plot([x1, x2], [y, y], lw=1.5, c='red')
```

Now let's draw a bracket instead and change the color to black. A bracket is just four points connected by lines, so let's define those four points.
```
x1, x2 = 0, 1
y_max = full_results.max().max()  # largest value across both conditions
y, height = y_max + 0.1*y_max, 0.05*y_max

plt.plot([x1, x1, x2, x2], [y, y+height, y+height, y], lw=1.5, c='k')
```

<h2>Add text</h2>

We can also add text to our plots with the <mark style="background-color: #EEEEEE;"><strong>plt.text()</strong></mark> function. <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html" rel="noopener noreferrer" target="_blank"><u>Documentation is here.</u></a> Using the same x and y values that we defined above, we tell Python where the text should appear on our plot.
```
plt.text((x1+x2)*0.5, y+height, '***', ha='center', va='bottom', color='k', size=12)
```
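Rather than typing the asterisks by hand, you could derive them from the p-value of your statistical test. Below is a small sketch using widely used cutoffs (* p &lt; 0.05, ** p &lt; 0.01, *** p &lt; 0.001, "ns" otherwise). Check which convention your lab or target journal expects; the function name `p_to_stars` is hypothetical:

```python
def p_to_stars(p):
    """Convert a p-value to a significance label using common cutoffs."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, p_to_stars(p))
```

The label can then replace the hard-coded string, e.g. `plt.text((x1+x2)*0.5, y+height, p_to_stars(p_value), ha='center', va='bottom', color='k', size=12)`, where `p_value` is whatever your test returned.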