# Image Analysis with Python - <font color='green'>Tutorial Pipeline Section 2</font>

*originally created in 2016*<br>
*updated and converted to a Jupyter notebook in 2017*<br>
*updated and converted to python 3 in 2018*<br>
*by Jonas Hartmann (Gilmour group, EMBL Heidelberg)*<br>
*updated and modified in 2022 by Cheng-Yu Huang*<br>

##  Table of Contents

1. [About this Tutorial](#about)
2. [Initialization](#initialize)
11. [Postprocessing: Removing Cells at the Image Border](#postpro)
12. [Identifying Cell Edges](#edges)
13. [Extracting Quantitative Measurements](#measure)
14. [Simple Analysis & Visualization](#analysis)


To learn more about <font color='teal'>*Writing Output to Files*</font> and <font color='teal'>*Batch Processing*</font>, please go to the solution file.

##  About this Tutorial <a id=about></a>

*This tutorial covers part 2 of the image analysis tutorial.*


#### Instructions

- In section 1 of the Codelab, you performed adaptive thresholding and connected-component analysis of our raw image.

- Here we continue where we left off. Starting from the segmentation result, we will first remove all cells that touch the border of the image and then detect the edge of each cell. Finally, we will perform a statistical analysis of the results.

## Initialization <a id=initialize></a>

In this section we will import all the necessary modules and packages. Then we will load the raw image data and our segmentation results for further processing.

In [None]:
# (i) Importing all the necessary modules and packages

# The numerical arrays manipulation module numpy as np
# The plotting module matplotlib.pyplot as plt
# The image processing module scipy.ndimage as ndi
### YOUR CODE HERE!

In [None]:
# (ii) Set matplotlib backend

### YOUR CODE HERE! 

In [None]:
# (iii) Specify the directory path and file name

# Create a string variable with the relative (or absolute) path to your raw image
# and segmentation results. 
### YOUR CODE HERE!

In [None]:
# (iv) Load the raw image and the segmentation results

### YOUR CODE HERE!

In [None]:
# (v) Look at the images to confirm that everything worked as intended

### YOUR CODE HERE!

## Postprocessing: Removing Cells at the Image Border <a id=postpro></a>

#### Background

Since segmentation is never perfect, it often makes sense to explicitly remove artifacts afterwards. For example, one could filter out objects that are too small, have a very strange shape, or very strange intensity values. 

**Warning:** Filtering out objects is equivalent to the *removal of outliers* in data analysis and *should only be done for good reason and with caution!*

As an example of postprocessing, we will now filter out a particular group of problematic cells: those that are being cut off at the image border.
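
The idea can be tried out on a small toy array before applying it to real data. The following is only a minimal sketch under stated assumptions: the array `toy_seg` and all variable names are made up for illustration, and the border mask is built by array indexing, which is just one of several possible approaches.

```python
import numpy as np

# Hypothetical toy segmentation (0 = background).
# Labels 1 and 3 touch the image border, label 2 does not.
toy_seg = np.array([[1, 1, 0, 0, 0, 0],
                    [1, 1, 0, 2, 2, 0],
                    [0, 0, 0, 2, 2, 0],
                    [0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 3, 3]])

# Border mask via array indexing: True on the outermost rows and columns
border_mask = np.zeros(toy_seg.shape, dtype=bool)
border_mask[0, :] = border_mask[-1, :] = True
border_mask[:, 0] = border_mask[:, -1] = True

# Work on a copy so the original segmentation stays untouched
cleaned = np.copy(toy_seg)

for cell_id in np.unique(toy_seg):
    if cell_id == 0:
        continue  # skip the background label

    # Boolean mask of the current cell only
    cell_mask = toy_seg == cell_id

    # Delete the cell if any of its pixels lies on the border
    if np.any(np.logical_and(cell_mask, border_mask)):
        cleaned[cell_mask] = 0

# IDs of the cells that survived the filtering
remaining_ids = np.unique(cleaned[cleaned > 0])
print(remaining_ids)
```

On this toy array only the cell with label 2 should remain, since it is the only one without pixels in the outermost rows or columns.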

#### <font color='teal'> Exercise </font>

Iterate through all the cells in your segmentation and remove those touching the image border.

Follow the instructions in the comments below. Note that the instructions will get a little less specific from here on, so you need to figure out how to approach a problem yourself.

In [None]:
# (i) Create an image border mask

# We need some way to check if a cell is at the border. For this, we generate a 'mask' of the image border,
# i.e. a Boolean array of the same size as the image where only the border pixels are set to `1` and all 
# others to `0`, like this:
#   1 1 1 1 1
#   1 0 0 0 1
#   1 0 0 0 1
#   1 0 0 0 1
#   1 1 1 1 1
# There are multiple ways of generating this mask, for example by erosion or by array indexing.
# It is up to you to find a way to do it. (Hint: one of the easiest ways to do this is via
# scipy.ndimage.binary_dilation; check the parameter "border_value".)

### YOUR CODE HERE!

In [None]:
# (ii) 'Delete' the cells at the border

# When modifying a segmentation (in this case by deleting some cells), it makes sense
# to work on a copy of the array, not on the original. This avoids unexpected behaviors,
# especially within jupyter notebooks. Use the function 'np.copy' to copy an array.
### YOUR CODE HERE!

# Iterate over the IDs of all the cells in the segmentation. Use a for-loop and the 
# function 'np.unique' (remember that each cell in our segmentation is labeled with a 
# different integer value).
### YOUR CODE HERE!

    # Create a mask that contains only the 'current' cell of the iteration
    # Hint: Remember that the comparison of an array with some number (array==number)
    #       returns a Boolean mask of the pixels in 'array' whose value is 'number'.
    ### YOUR CODE HERE!

    
    # Using the cell mask and the border mask from above, test if the cell has pixels touching 
    # the image border or not.
    # Hint: 'np.logical_and'
    ### YOUR CODE HERE!
    
    # If a cell touches the image boundary, delete it by setting its pixels in the segmentation to 0.
    ### YOUR CODE HERE!

In [None]:
# OPTIONAL: re-label the remaining cells to keep the numbering consistent from 1 to N (with 0 as background).
# Hint: Use the Python built-in function 'enumerate'.

### YOUR CODE HERE!

In [None]:
# (iii) Visualize the result

# Show the result as transparent overlay over the raw or smoothed image. 
# Here you have to combine alpha (to make cells transparent) and 'np.ma.array'
# (to hide empty space where the border cells were deleted).

### YOUR CODE HERE!

## Identifying Cell Edges <a id=edges></a>

#### Background

With the final segmentation in hand, we can now start to think about measurements and data analysis. However, to extract interesting measurements from our cells, the segmentation on its own is often not enough: additional masks that identify sub-regions for each cell allow more precise and more biologically relevant measurements.

The most useful example of this is an additional mask that identifies only the edge pixels of each cell. This is useful for a number of purposes, including:

- Edge intensity is a good measure of membrane intensity, which is often a desired readout.
- The intensity profile along the edge may contain information on cell polarity.
- The length of the edge (relative to the cell area) is an informative feature about the cell shape. 
- Showing colored edges is a nice way of visualizing cell segmentations.

There are many ways of identifying edge pixels in a fully labeled segmentation. Here, we will use a simple and relatively fast method based on erosion.
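
To make the erosion idea concrete, here is a small sketch on a toy array. It assumes two rectangular "cells" in a made-up array called `toy_seg`. All names are hypothetical, and the default structuring element of `ndi.binary_erosion` is used, which is one choice among several.

```python
import numpy as np
import scipy.ndimage as ndi

# Hypothetical toy segmentation with two rectangular cells (0 = background)
toy_seg = np.zeros((7, 9), dtype=np.uint8)
toy_seg[1:6, 1:5] = 1   # cell 1: 5 x 4 pixels
toy_seg[2:5, 5:8] = 2   # cell 2: 3 x 3 pixels

# Empty array of the same shape and dtype to collect the labeled edges
toy_edges = np.zeros_like(toy_seg)

# np.unique returns sorted values, so [1:] skips the background label 0
for cell_id in np.unique(toy_seg)[1:]:
    cell_mask = toy_seg == cell_id

    # Shrink the cell by one pixel (default cross-shaped structuring element)
    eroded = ndi.binary_erosion(cell_mask)

    # Pixels that are in the cell but not in the eroded cell form the edge
    edge_mask = np.logical_xor(cell_mask, eroded)

    # Write the edge into the output, labeled with the cell's ID
    toy_edges[edge_mask] = cell_id

print(toy_edges)
```

Because each cell is eroded on its own mask, neighboring cells do not influence each other's edges, even where they touch.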

#### <font color='teal'> Exercise </font>

Create a labeled mask of cell edges by following these steps:


- Create an array of the same size and data type as the segmentation but filled with only zeros
    - This will be your final cell edge mask; you gradually add cell edges as you iterate over cells
    

- *For each cell...*
    - Erode the cell's mask by 1 pixel
    - Using the eroded mask and the original mask, create a new mask of only the cell's edge pixels
    - Add the cell's edge pixels into the empty image generated above, labeling them with the cell's original ID number


Follow the instructions in the comments below.

In [None]:
# (i) Create an array of the same size and data type as the segmentation but filled with only zeros

### YOUR CODE HERE!


In [None]:
# (ii) Iterate over the cell IDs
### YOUR CODE HERE!
    
    # (iii) Erode the cell's mask by 1 pixel
    # Hint: 'ndi.binary_erosion'
    ### YOUR CODE HERE!

    # (iv) Create the cell edge mask
    # Hint: 'np.logical_xor'
    ### YOUR CODE HERE!

    
    # (v) Add the cell edge mask to the empty array generated above, labeling it with the cell's ID
    ### YOUR CODE HERE!


In [None]:
# (vi) Visualize the result

# Note: Because the lines are so thin (1pxl wide), they may not be displayed correctly in small figures.
#       You can 'zoom in' by showing a sub-region of the image which is then rendered bigger. You can
#       also go back to the edge identification code and make the edges multiple pixels wide (but keep 
#       in mind that this will have an effect on your quantification results!).

### YOUR CODE HERE!


## Extracting Quantitative Measurements <a id=measure></a>

#### Background

The ultimate goal of image segmentation is of course the extraction of quantitative measurements, in this case on a single-cell level. Measures of interest can be based on intensity (in different channels) or on the size and shape of the cells.

To exemplify how different properties of cells can be measured, we will extract the following:

- Cell ID (so all other measurements can be traced back to the cell that was measured)
- Mean intensity of each cell
- Mean intensity at the membrane of each cell
- The cell area, i.e. the number of pixels that make up the cell
- The cell outline length, i.e. the number of pixels that make up the cell edge

*Note: It makes sense to use smoothed/filtered/background-subtracted images for segmentation. When it comes to measurements, however, it's best to get back to the raw data!*
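
As a rough illustration of how such measurements can be collected, the sketch below uses a made-up 4 x 4 "image" and segmentation. The names `toy_img`, `toy_seg` and `toy_results` are hypothetical, and only three of the measurements are shown; the edge-based ones would work the same way with an edge mask.

```python
import numpy as np

# Hypothetical toy "raw image" with intensity values 0..15
toy_img = np.arange(16, dtype=float).reshape(4, 4)

# Hypothetical toy segmentation with two cells (0 = background)
toy_seg = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 2],
                    [0, 0, 0, 2],
                    [0, 0, 0, 2]])

# One empty list per measurement
toy_results = {"cell_id": [], "intensity_mean": [], "cell_area": []}

# np.unique returns sorted values, so [1:] skips the background label 0
for cell_id in np.unique(toy_seg)[1:]:
    cell_mask = toy_seg == cell_id

    toy_results["cell_id"].append(int(cell_id))
    # Indexing with the mask returns only the pixels of this cell
    toy_results["intensity_mean"].append(float(np.mean(toy_img[cell_mask])))
    # The area is the number of True pixels in the mask
    toy_results["cell_area"].append(int(np.sum(cell_mask)))

print(toy_results)
```

Since every list is appended to exactly once per cell, the entries at the same position in each list belong to the same cell.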

#### <font color='teal'>Exercise</font>

Extract the measurements listed above for each cell and collect them in a dictionary.

Note: The ideal data structure for data like this is the `DataFrame` offered by the module `Pandas`. However, for the sake of simplicity, we will here stick with a dictionary of lists.

Follow the instructions in the comments below.

In [None]:
# (i) Create a dictionary that contains a key-value pairing for each measurement

# The keys should be strings describing the type of measurement (e.g. 'intensity_mean') and 
# the values should be empty lists. These empty lists will be filled with the results of the
# measurements.

### YOUR CODE HERE!

In [None]:
# (ii) Record the measurements for each cell

# Iterate over the segmented cells ('np.unique').
# Inside the loop, create a mask for the current cell and use it to extract the measurements listed above. 
# Add them to the appropriate list in the dictionary using the 'append' method.
# Hint: Remember that you can get out all the values within a masked area by indexing the image 
#       with the mask. For example, 'np.mean(image[cell_mask])' will return the mean of all the 
#       intensity values of 'image' that are masked by 'cell_mask'!

### YOUR CODE HERE!

In [None]:
# (iii) Print some results and check that they make sense

### YOUR CODE HERE!


## Simple Analysis & Visualization <a id=analysis></a>

#### Background

By extracting quantitative measurements from an image we cross over from 'image analysis' to 'data analysis'. 

This section briefly explains how to do basic data analysis and plotting, including boxplots, scatterplots and linear fits. It also showcases how to map data back onto the image, creating an "image-based heatmap".
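
The "image-based heatmap" idea can be sketched on a toy array as follows. This is only an illustration under assumptions: `toy_seg` and all other names are made up, and the areas are rescaled so that the largest cell maps to 255.

```python
import numpy as np

# Hypothetical toy segmentation with two cells (0 = background)
toy_seg = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 2],
                    [0, 0, 0, 2],
                    [0, 0, 0, 2]])

# Cell IDs without the background, and the area of each cell in pixels
toy_ids = np.unique(toy_seg)[1:]
toy_areas = np.array([np.sum(toy_seg == cell_id) for cell_id in toy_ids])

# Rescale so that the largest area corresponds to 255, then convert to uint8
# (the conversion truncates any fractional part)
scaled_areas = (toy_areas * 255 / toy_areas.max()).astype(np.uint8)

# Empty 8-bit image of the same shape as the segmentation
heatmap = np.zeros(toy_seg.shape, dtype=np.uint8)

# The counter from 'enumerate' indexes into the measurement array
for index, cell_id in enumerate(toy_ids):
    heatmap[toy_seg == cell_id] = scaled_areas[index]

print(heatmap)
```

The resulting array can be displayed like any other image, for example as a semi-transparent overlay with a color map.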

#### <font color='teal'>Exercise</font>

Analyze and plot the extracted data in a variety of ways.

Follow the instructions in the comments below.

In [None]:
# (i) Familiarize yourself with the data structure of the results dict and summarize the results

# Recall that a dataset in a dictionary is accessed through its key, not through its position.
# In our case, the datasets inside the dict are lists of values, ordered in the same order
# as the cell IDs. 

# For each dataset in the results dict, print its name (the key) along with its mean, standard 
# deviation, maximum, minimum, and median. The appropriate numpy functions (e.g. 'np.median') work
# with lists just as well as with arrays.

### YOUR CODE HERE!

In [None]:
# (ii)-1 Create a histogram showing the distribution of cell surface area in pixels 

# Use the function 'plt.hist'. Change the "bins" parameter of the function to see the 
# distribution of the data in more detail. What do you observe?

### YOUR CODE HERE!

In [None]:
# (ii)-2 Create a box plot showing the mean cell and mean membrane intensities for both channels. 

# Use the function 'plt.boxplot'. Use the 'labels' keyword of 'plt.boxplot' (renamed to 'tick_labels'
# in Matplotlib 3.9) to label the x axis with the corresponding key names. Feel free to play around 
# with the various options of the boxplot 
# function to make your plot look nicer. Remember that you can first call 'plt.figure' to adjust 
# settings such as the size of the plot.

### YOUR CODE HERE!

In [None]:
# (iii) Create a scatter plot of cell outline length over cell area

# Use the function 'plt.scatter' for this. Be sure to properly label the 
# plot using 'plt.xlabel' and 'plt.ylabel'.
# Note: It is a good idea to make the markers (the data points) semi-transparent so that
# regions where data points overlap show up as less transparent.
### YOUR CODE HERE!

# BONUS: Do you understand why you are seeing the pattern this produces? 
###

# Can you generate a 'null model' curve that assumes all cells to be circular?
### YOUR CODE HERE!


# What is the result? Do you notice something odd about it? What could be the reason for
# this and how could it be fixed?
###

In [None]:
# (iv) Perform a linear fit of membrane intensity over cell area

# Use the function 'linregress' from the module 'scipy.stats'. Be sure to read the docs to
# understand the output of this function. Print the output.

### YOUR CODE HERE!

In [None]:
# (v) Think about the result

# Note that the fit seems to return a highly significant p-value but a very low correlation 
# coefficient (r-value). Based on prior knowledge, we would not expect a linear correlation of 
# this sort to be present in our data. 
#
# This should prompt several questions:
#   1) What does this p-value actually mean? Check the docs of 'linregress'!
###
#
#   2) Could there be artifacts in our segmentation that bias this analysis?
###
#
# In general, it's always good to be very careful when doing any kind of data analysis. Make sure you 
# understand the functions you are using and always check for possible errors or sources of bias!

In [None]:
# (vi) Overlay the linear fit onto a scatter plot

# Recall that a linear function is defined by `y = slope * x + intercept`.

# To define the line you'd like to plot, you need two values of x (the starting point and
# the end point of the line). What values of x make sense? Can you get them automatically?
### YOUR CODE HERE!


# When you have the x-values for the starting point and end point, get the corresponding y 
# values from the fit through the equation above.
### YOUR CODE HERE!


# Plot the line with 'plt.plot'. Adjust the line's properties so it is clearly visible.
# Note: Remember that you have to create the scatterplot before plotting the line so that
#       the line will be placed on top of the scatterplot.
### YOUR CODE HERE!


# Use 'plt.legend' to add information about the line to the plot.
### YOUR CODE HERE!


# Label the plot and finally show it with 'plt.show'.
### YOUR CODE HERE!


In [None]:
# (vii) Map the cell area back onto the image as a 'heatmap'

# Scale the cell area data to 8bit so that it can be used as pixel intensity values.
# Hint: if the largest cell area should correspond to the value 255 in uint8, then 
#       the other cell areas correspond to 'cell_area * 255 / largest_cell_area'.
# Hint: To perform an operation on all cell area values at once, convert the list 
#       of cell areas to a numpy array.
### YOUR CODE HERE!


# Initialize a new image array; all values should be zeros, the shape should be identical 
# to the images we worked with before and the dtype should be uint8.
### YOUR CODE HERE!


# Iterate over the segmented cells. In addition to the cell IDs, the for-loop should
# also include a simple counter (starting from 0) with which the area measurement can be 
# accessed by indexing.
### YOUR CODE HERE!


    # Mask the current cell and assign the cell's (re-scaled) area value to the cell's pixels.
    ### YOUR CODE HERE!

    
# Visualize the result as a colored semi-transparent overlay over the raw/smoothed original input image.
# BONUS: See if you can exclude outliers to make the color mapping more informative!
### YOUR CODE HERE!


## <font color='teal'>*Congratulations! You have completed the tutorial!*</font>

We hope you enjoyed the ride and learned a lot!

Note: **Please go to the solution file to learn more about**
- **Writing output to files**
and
- **Batch Processing**

### Concluding Remarks

It's important to remember that the phrase ***"Use it or lose it!"*** fully applies to the skills taught in this tutorial.

If you now just go back to the lab and don't touch python or image analysis for the next half year, most of the things you have learned here will be lost.

So, what can you do?


- If possible, start applying what you have learned to your own work right away


- Even if your current work doesn't absolutely *need* coding / image analysis (which to be honest is hard to believe! ;p), you can still use it at least to make some nice plots!


- Another very good approach is to find yourself an interesting little side project you can play around with

***We wish you the best of luck for all your coding endeavors!***