
# Lesson 6: Quantifying Properties of Identified Regions or Cells

We have learned to detect cells in digital images by making masks, and then applying those masks to measure properties of the entire image such as mean fluorescence intensity. Sometimes, it might also be useful to have an idea of how these properties vary from cell to cell. Today, we'll learn how to identify (and measure attributes of) several different regions of interest at once.

We will:
- Review how to detect cells by developing a pipeline to do so
- Access properties of cells that have been detected, such as
    - Area
    - Intensity
    - Image vs mask properties
    - Measures of roundness
        - Aspect ratio
        - Convexity
- View the statistics of properties of detected cells
- Filter out unwanted cells based on their properties

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from skimage.io import imread

sns.set_style('dark')

## Review Segmentation

We will begin with a new dataset: a single still frame from a movie of neurons expressing a calcium indicating fluorophore, GCaMP6f. The cells are on a coverslip and do not move, so it is sufficient to locate them in just one frame. Although neurons have long thin dendrites, in our case we are interested in the activity at the soma. The goal, then, is to come up with a list of masks that cover the cell bodies.

In [None]:
raw = imread('../data/neurons10x.tif')
plt.imshow(raw, cmap='inferno')

**Exercise** Thinking of the tools we already have at our disposal, what are some features of this image that we need to act on before attempting to find cells?

The cells are brighter than their background, so we can use thresholding to find them. However, the background is uneven: with a single threshold, masks would be enlarged in the middle of the image and shrunk at the periphery. Let's estimate the background with a minimum filter (a simple stand-in for rolling-ball background removal) and then subtract it.

In [None]:
from skimage.morphology import disk
from skimage.filters.rank import minimum as min_filter

structuring_element = disk(51)
background = min_filter(raw, structuring_element)

View the background. It should look fairly smooth.

In [None]:
plt.imshow(background, cmap='inferno')

Now we subtract the background.

In [None]:
bg_subtracted = raw - background
plt.imshow(bg_subtracted, cmap='inferno')
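
To see why subtracting a min-filtered background helps, here is a minimal sketch on a synthetic image. It assumes a square window instead of the `disk` used above, and the helper `min_filter_square` and the `toy` arrays are made up for illustration; they are not part of scikit-image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def min_filter_square(image, radius):
    """Minimum over a (2*radius+1)-wide square window, padding by edge values."""
    padded = np.pad(image, radius, mode='edge')
    windows = sliding_window_view(padded, (2 * radius + 1, 2 * radius + 1))
    return windows.min(axis=(-2, -1))

# Synthetic image: a left-to-right background ramp plus two equally bright "cells".
toy = np.tile(np.arange(40, dtype=np.uint16), (40, 1))
toy[10:13, 5:8] += 100     # cell sitting on a dim part of the background
toy[25:28, 30:33] += 100   # cell sitting on a bright part of the background

# The window is wider than a cell, so the minimum only ever sees background.
toy_background = min_filter_square(toy, radius=5)
toy_subtracted = toy - toy_background

print(toy[11, 6], toy[26, 31])                        # raw values differ
print(toy_subtracted[11, 6], toy_subtracted[26, 31])  # values match after subtraction
```

The window has to be larger than the objects of interest, otherwise the filter would treat the cells themselves as background. That is presumably why the structuring element above is so large.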

And finally apply a threshold. We will use Otsu's method to set the threshold automatically.

In [None]:
from skimage.filters import threshold_otsu

threshold = threshold_otsu(bg_subtracted)
foreground = bg_subtracted > threshold

plt.imshow(foreground, cmap='gray')
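
Otsu's method picks the threshold that best separates the pixel values into two classes, by maximizing the variance between the classes. The sketch below illustrates that idea on a handful of made-up pixel values. It is a simplified version that tries every observed value as a candidate; `threshold_otsu` works on a histogram, so its result will generally not be identical. The name `otsu_sketch` is hypothetical.

```python
import numpy as np

def otsu_sketch(values):
    """Return the candidate threshold that maximizes between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    # Skip the largest value: thresholding there would leave the upper class empty.
    candidates = np.unique(values)[:-1]
    best_threshold, best_variance = None, -1.0
    for t in candidates:
        low = values[values <= t]
        high = values[values > t]
        weight_low = low.size / values.size
        weight_high = high.size / values.size
        variance = weight_low * weight_high * (low.mean() - high.mean()) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold

# Six dim "background" pixels and four bright "cell" pixels.
pixels = np.array([1, 2, 2, 3, 2, 1, 50, 52, 51, 49])
toy_threshold = otsu_sketch(pixels)
toy_foreground = pixels > toy_threshold
print(toy_threshold, toy_foreground.sum())
```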

## Properties of Regions

The image segments into many small, well-separated cells.

Now we need a way of isolating each cell. Recall that the image above is a binary mask:

- 1 = foreground
- 0 = background

Our goal is to make a new image where each pixel gets a number corresponding to the cell it belongs to, or 0 if it's background. How might we do this?

One way would be to look for a pixel of value 1 and assign it a number. Then visit its neighbors and, if their value is 1, assign the neighbor the same number and repeat on _its_ neighbors. Rather than code this up ourselves, scikit-image supplies us with the function `label`.
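
For intuition, the neighbor-visiting procedure described above can be sketched in a few lines. This is an illustrative toy, not how scikit-image implements `label`: the function name `label_sketch` is made up, and it assumes that only the four edge-sharing neighbors count as connected (`label` exposes this choice through its `connectivity` argument).

```python
import numpy as np

def label_sketch(mask):
    """Label 4-connected foreground regions with the integers 1, 2, 3, ..."""
    labels = np.zeros(mask.shape, dtype=int)
    n_rows, n_cols = mask.shape
    next_label = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Skip background pixels and pixels that already have a label.
            if not mask[r, c] or labels[r, c]:
                continue
            next_label += 1
            labels[r, c] = next_label
            stack = [(r, c)]
            # Spread the new label to every connected foreground pixel.
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_label
                        stack.append((ny, nx))
    return labels

toy_mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
], dtype=bool)
toy_labels = label_sketch(toy_mask)
print(toy_labels)
```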

In [None]:
from skimage.measure import label

In [None]:
cell_labels = label(foreground)
plt.imshow(cell_labels, cmap='Set1',vmin=0,vmax=cell_labels.max())

If we zoom in on a subset of the cells and look directly at the array, we see that the different color labels correspond to connected values within the array.

In [None]:
#fig, ax = plt.subplots(1, 2, figsize=(16, 4))
cropXmin = 105
cropYmin = 260
crop_size = 15
cropXmax = cropXmin + crop_size
cropYmax = cropYmin + crop_size

plt.imshow(cell_labels[cropYmin:cropYmax, cropXmin:cropXmax],cmap='Set1',vmin=0,vmax=cell_labels.max(),interpolation='nearest')

cell_labels[cropYmin:cropYmax, cropXmin:cropXmax]

Each cell is now shown in a slightly different color on a red-colored background. Now we would like to make a list of each of these cells with their masks and some information about them. Again, scikit-image helps us.

In [None]:
from skimage.measure import regionprops

#look at regionprops to see what the inputs are
regionprops?

In [None]:
props = regionprops(cell_labels, bg_subtracted)

We give `regionprops` two arguments: the image of numerical labels and the original image with the background subtracted off. We need the second because some properties that `regionprops` gives us, like mean intensity, depend on both the mask shape and the pixel values of the image. Let's first take a look at which properties `regionprops` reports. We'll return to this important mask vs. original image distinction later.

**Exercise** Determine how the output `props` is organized.

In [None]:
props?

In [None]:
props.shape
# This raises AttributeError: props is a list, not an array, so use len(props)

In [None]:
len(props)

What might the length of props mean? 

In [None]:
cell_labels.max()

Each element of `props` corresponds to one labeled region.

Let's visualize each labeled region to get a sense of its location on the mask and its shape.

In [None]:
from ipywidgets import interactive

@interactive
def show_labeled_cells(label_index = list(range(cell_labels.max()))):
    current_cell_labels = cell_labels == props[label_index].label
    plt.imshow(current_cell_labels.astype(int), cmap='gray',vmin=0,vmax=1)
    plt.title("Cell #{:d}".format(label_index + 1))
show_labeled_cells

Let's look at the area of the first labeled region.

In [None]:
props[0].area

What about its mean pixel value?

In [None]:
props[0].mean_intensity

## Image vs. Mask Properties
The area is a property of the mask: it's the total number of pixels in the mask (i.e. that have value 1). The mean intensity is a property of the image. If it were a property of the mask, it would always be between 0 and 1, since those are the only two values in the mask. This is fairly clear for these two examples, but sometimes the difference is subtle.
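
To make the distinction concrete, here is a small sketch that computes both quantities by hand with NumPy. The arrays `toy_labels` and `toy_image` are made-up stand-ins for `cell_labels` and `bg_subtracted`.

```python
import numpy as np

toy_labels = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 2],
])
toy_image = np.array([
    [10., 20., 1., 1.],
    [30.,  1., 1., 5.],
    [ 1.,  1., 5., 8.],
])

region_mask = toy_labels == 1

# Mask property: only the shape of the region matters.
toy_area = region_mask.sum()

# Image property: the mask selects pixels, the image supplies their values.
toy_mean_intensity = toy_image[region_mask].mean()

print(toy_area, toy_mean_intensity)
```

Changing the values in `toy_image` changes the mean intensity but never the area.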

In [None]:
props[12].centroid

In [None]:
props[12].weighted_centroid

The centroid is just the center-of-mass of the mask, but the weighted centroid is the center-of-mass of the image. They are similar here because cell brightness values have good radial symmetry, but for more complicated objects these numbers differ meaningfully.
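
A case where the two differ is easy to construct. The sketch below uses a made-up square mask with one bright corner and computes both centers by hand with NumPy; the variable names are illustrative only.

```python
import numpy as np

# A 3x3 square mask inside a 5x5 image.
toy_mask = np.zeros((5, 5), dtype=bool)
toy_mask[1:4, 1:4] = True

# Uniform brightness inside the mask, except for one bright corner pixel.
toy_image = np.zeros((5, 5))
toy_image[toy_mask] = 1.0
toy_image[1, 3] = 7.0

rows, cols = np.nonzero(toy_mask)

# Centroid: every mask pixel counts equally.
toy_centroid = (rows.mean(), cols.mean())

# Weighted centroid: each mask pixel counts in proportion to its brightness.
weights = toy_image[rows, cols]
toy_weighted_centroid = (
    np.average(rows, weights=weights),
    np.average(cols, weights=weights),
)

print(toy_centroid, toy_weighted_centroid)
```

The weighted centroid is pulled toward the bright corner, while the centroid stays in the middle of the square.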

## Measures of Roundness

We can use some of these properties to quality control our cell masks. One important quality is roundness - we want our cells to look round because we don't want to include dendrites in our analysis. There are a few ways to do this.

- Look at the aspect ratio i.e. width:height. If this is nearly 1:1, then the mask should be round.

In [None]:
bounding_box = props[0].bbox
aspect_ratio = 1. * (bounding_box[3] - bounding_box[1]) / (bounding_box[2] - bounding_box[0])
print(aspect_ratio)

**Exercise** Can you describe a situation where the aspect ratio is a bad metric?

What about an "X"?
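
A quick sketch confirms this. It builds a filled disk and an "X" of the same overall size (both synthetic) and computes the bounding-box aspect ratio by hand; `bbox_aspect_ratio` is a made-up helper, not a scikit-image function.

```python
import numpy as np

def bbox_aspect_ratio(mask):
    """Width divided by height of the tightest box around the mask."""
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return width / height

size = 21
center = size // 2
yy, xx = np.mgrid[:size, :size]

disk_mask = (yy - center) ** 2 + (xx - center) ** 2 <= center ** 2
x_mask = (yy == xx) | (yy + xx == size - 1)

disk_ratio = bbox_aspect_ratio(disk_mask)
x_ratio = bbox_aspect_ratio(x_mask)

print(disk_ratio, x_ratio)              # identical aspect ratios
print(disk_mask.sum(), x_mask.sum())    # very different shapes
```

Both masks have a perfectly square bounding box, so the aspect ratio alone cannot tell the round one from the thin, branching one.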

**A better metric might be to check if the mask is mostly convex.** We can do this by taking the convex hull of the mask, which is the mask with pixels added until the mask is convex. If we didn't need to add many pixels, then the starting mask was fairly convex.

- Solidity: ratio of area of mask to area of the mask's convex hull.

In [None]:
props[0].solidity
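
To make the definition concrete, here is a rough sketch of solidity computed by hand. It is an approximation under stated assumptions: the hull is taken as a polygon around the pixel corners and its area comes from the shoelace formula, whereas scikit-image counts pixels in a convex hull image, so the numbers will not match `regionprops` exactly. All function names here are made up.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in order, collinear points dropped."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half_hull(pts) + half_hull(reversed(pts))

def polygon_area(polygon):
    """Shoelace formula for the area of a simple polygon."""
    total = 0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def solidity_sketch(mask):
    """Mask area divided by the area of the hull around all pixel corners."""
    rows, cols = np.nonzero(mask)
    corners = set()
    for r, c in zip(rows.tolist(), cols.tolist()):
        corners.update([(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)])
    return len(rows) / polygon_area(convex_hull(corners))

square_mask = np.ones((5, 5), dtype=bool)
yy, xx = np.mgrid[:9, :9]
x_mask = (yy == xx) | (yy + xx == 8)

square_solidity = solidity_sketch(square_mask)
x_solidity = solidity_sketch(x_mask)
print(square_solidity, x_solidity)
```

The filled square is already convex, so its solidity is 1. The "X" needs most of its bounding square filled in to become convex, so its solidity is low even though its aspect ratio is 1.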

However, a mask can be convex but elongated, which makes it a poor circle. Taken together, though, the two metrics may work nicely!

In [None]:
def circleness(properties):
    bounding_box = properties.bbox
    aspect_ratio = 1. * (bounding_box[3] - bounding_box[1]) / (bounding_box[2] - bounding_box[0])
    
    # Aspect can be > 1. We want a value that provides a penalty when aspect is not near 1.
    if aspect_ratio > 1:
        aspect_penalty = 1./aspect_ratio
    else:
        aspect_penalty = aspect_ratio
        
    return properties.solidity * aspect_penalty

In [None]:
circleness(props[0])
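
To get a feel for how the two factors combine, the sketch below applies the same formula to three hypothetical regions. The `SimpleNamespace` objects are stand-ins that only mimic the `bbox` and `solidity` attributes of a real `regionprops` entry, and their numbers are invented. `circleness` is restated compactly so the example runs on its own.

```python
from types import SimpleNamespace

def circleness(properties):
    """Solidity times an aspect-ratio penalty that is 1 for a square bounding box."""
    min_row, min_col, max_row, max_col = properties.bbox
    aspect_ratio = (max_col - min_col) / (max_row - min_row)
    # Same penalty as above: use the ratio or its inverse, whichever is <= 1.
    aspect_penalty = min(aspect_ratio, 1 / aspect_ratio)
    return properties.solidity * aspect_penalty

# Hypothetical regions (values made up for illustration).
round_blob = SimpleNamespace(bbox=(0, 0, 10, 10), solidity=0.95)
elongated = SimpleNamespace(bbox=(0, 0, 4, 20), solidity=0.95)
branching = SimpleNamespace(bbox=(0, 0, 10, 10), solidity=0.40)

scores = {
    'round_blob': circleness(round_blob),
    'elongated': circleness(elongated),
    'branching': circleness(branching),
}
print(scores)
```

The elongated region is penalized through its aspect ratio and the branching one through its solidity, so only the round blob scores close to 1.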

## Viewing Statistics of Region Properties

But is this a _good_ value of circleness? To find out, we'll need to know how to look at distributions of the properties. Let's start with `area` since that's a built-in property, not one we invented.

In [None]:
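# Use `prop` as the loop variable to avoid shadowing Python's built-in `property`
all_areas = [prop.area for prop in props]
# Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot is its replacement
sns.distplot(all_areas, bins=10)
plt.xlim([min(all_areas), max(all_areas)])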

### Filtering regions by their properties

We can see that we have a lot of very small regions. Can we filter these out and make a new list of only large regions?

In [None]:
big_region_properties = [prop for prop in props if prop.area > 20]

In [None]:
print("{} regions out of {} are big enough".format(len(big_region_properties), len(props)))
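
Once you have a filtered list, you may also want a label image that contains only the regions you kept. One possible way to build it is sketched below with `np.isin`. The toy label image and the `SimpleNamespace` stand-ins (which mimic only the `label` and `area` attributes of `regionprops` entries) are made up for illustration.

```python
import numpy as np
from types import SimpleNamespace

toy_labels = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
])

# Stand-ins for regionprops output: one entry per label, with its pixel count.
toy_props = [
    SimpleNamespace(label=label_id, area=int((toy_labels == label_id).sum()))
    for label_id in (1, 2, 3)
]

# Same filtering pattern as above, with a threshold suited to the toy data.
kept_props = [prop for prop in toy_props if prop.area > 1]
kept_ids = [prop.label for prop in kept_props]

# Keep the label where it belongs to a kept region, otherwise set it to background.
filtered_labels = np.where(np.isin(toy_labels, kept_ids), toy_labels, 0)
print(filtered_labels)
```

The single-pixel region 2 disappears from `filtered_labels`, while regions 1 and 3 keep their original label numbers.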

**Exercise** View the histogram of circleness values in our large cells only.

In [None]:
circleness_values = [circleness(prop) for prop in big_region_properties]
sns.distplot(circleness_values, bins=20)
plt.xlim(min(circleness_values), max(circleness_values))

Let's look at one example with low circleness and one with high circleness. To do this, we will sort the big region properties by circleness.

In [None]:
props_by_circleness = sorted(big_region_properties, key=circleness)

low_circ = props_by_circleness[0]
high_circ = props_by_circleness[-1]

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(low_circ.image, cmap='inferno', interpolation='nearest')
ax[0].set_title("Lowest circleness")
ax[1].imshow(high_circ.image, cmap='inferno', interpolation='nearest')
ax[1].set_title("Highest circleness")

Notice how the low-circleness region looks like it might be a dendritic process rather than a cell body. Our circleness metric is working as expected.

Remember that you can use your intuition to design exotic region properties and use them to remove unwanted regions, _but_ see if the existing properties in `regionprops` can give you what you want first, or if there are established methods in the literature. If not, always invest time visualizing results from your metric to make sure it's working as you anticipate. For example, as I was putting this together I noticed that circleness works poorly on very small regions. Good thing we already know how to filter regions out by their area!