In [None]:
import glob
import os

import numpy as np
import pandas as pd

# A whole bunch of skimage stuff
import skimage
import skimage.exposure
import skimage.feature
import skimage.filters
import skimage.filters.rank
import skimage.io
import skimage.morphology
import skimage.segmentation

# And some useful scipy.ndimage stuff
import scipy.ndimage

import bebi103

import bokeh.io
import bokeh.layouts
import bokeh.plotting
bokeh.io.output_notebook()

In [None]:
cd ~/git/BE-Bi-103/Codes

To start, we find all the sequential images in the directory and check how many there are, to make sure we are in the right directory.

In [None]:
# The directory containing sequential images of bacteria
data_dir = '../data/iyer_biswas_et_al/sequential_full_field'

# Glob string for images
im_glob = os.path.join(data_dir, '*.tif')

# Get list of files in directory
im_list = sorted(glob.glob(im_glob))

# Let's look at the entries
im_list[:]

We reuse the code from the last homework to load the images as a collection, and check the length of the collection to make sure that all images are loaded.

In [None]:
# Load in TIFF stack
ic = skimage.io.ImageCollection(im_glob,
                                conserve_memory=False)

# How long is it?
print('There are {0:d} frames.'.format(len(ic)))

Take a look at one of the images to see what it looks like:

In [None]:
ic[0]

First, we want to look at all the images.

In [None]:
# Interpixel distance given by the dataset (µm per pixel, matching length_units below)
ip = 0.052
# Use the frame index as the plot title (one character per frame, so up to ten frames)
im_names = '0123456789'

plots = [bebi103.viz.imshow(im,
                            plot_height=500,
                            title=name,
                            interpixel_distance=ip,
                            length_units='µm')
             for name, im in zip(im_names, ic)]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))


They don't look that different, so we take one of them to try different image segmentation methods, and then apply the best one to all of them as a test. We arbitrarily choose the fifth image. To take a close look, and to make it easier to find good zoom areas (potential ROIs), we plot it in pixel units instead of physical units.

In [None]:
# Take a copy of the fifth image in the collection.
im = np.copy(ic[4])

plots = [[bebi103.viz.imshow(im,
                            plot_height=500,
                            title='Test copy',
                            #interpixel_distance=ip,
                            length_units='pixels')]]

bokeh.io.show(bokeh.layouts.gridplot(plots))

Just by looking at this image, we can see some parts that would probably not be easy to segment. For instance, the bottom part has a lot of string-like artifacts, and so does the upper bottom part. There are also a couple of regions where bacteria cluster together. We can therefore set some conditions for choosing bacteria for long time course imaging:

1. Well separated from other bacteria;
2. Away from any type of artifact;
3. Uniform lighting across the single bacterium;
4. The fixed bacteria do not move between frames.

But we will not zoom in on specific regions yet; we plan to do some basic segmentation first to make sure our conditions work as expected.

As we proceed, we need to show the differences before and after segmentation, so we borrow this nice function from Justin's notes and modify it so that we can switch easily between pixel numbers and physical scale.

In [None]:
def show_two_ims(im_1, im_2, titles=[None, None], interpixel_distances=[],
                 color_mapper=None):
    """Convenient function for showing two images side by side."""
    if len(interpixel_distances) == 2:
        p_1 = bebi103.viz.imshow(im_1,
                                 plot_height=300,
                                 title=titles[0],
                                 color_mapper=color_mapper,
                                 interpixel_distance=interpixel_distances[0],
                                 length_units='µm')
        p_2 = bebi103.viz.imshow(im_2,
                                 plot_height=300,
                                 title=titles[1],
                                 color_mapper=color_mapper,
                                 interpixel_distance=interpixel_distances[1],
                                 length_units='µm')
        p_2.x_range = p_1.x_range
        p_2.y_range = p_1.y_range
    else:
        p_1 = bebi103.viz.imshow(im_1,
                                 plot_height=300,
                                 title=titles[0],
                                 color_mapper=color_mapper,
                                 length_units='pixels')
        p_2 = bebi103.viz.imshow(im_2,
                                 plot_height=300,
                                 title=titles[1],
                                 color_mapper=color_mapper,
                                 length_units='pixels')
        p_2.x_range = p_1.x_range
        p_2.y_range = p_1.y_range
    
    return bokeh.layouts.gridplot([p_1, p_2], ncols=2)

We do a Gaussian blur first, just to check whether the background lighting is uniform.

In [None]:
# Convert image to float
im_float = skimage.img_as_float(im)

# Filter the image with a strong Gaussian blur
im_bg = skimage.filters.gaussian(im_float, 50.0)


bokeh.io.show(show_two_ims(im_float, im_bg, titles=['original', 'background']))

The background is not very uniform, but our purpose is just to segment out the bacteria, and we are going to threshold anyway, so we choose not to do background subtraction for now. We also find that once the image is converted to float, it cannot go through functions that require a uint8 or uint16 data type, like skimage.filters.rank.mean.

Now, we want to start with the simplest thresholding method, manually defining the threshold by looking at the histogram of the image, so we plot the histogram first.


In [None]:
# Compute histograms
hist_bins = skimage.exposure.histogram(im)

def plot_hist(hist_bin, title, y_axis_type='linear'):
    """Make plot of image histogram."""
    p = bokeh.plotting.figure(plot_height=300,
                              plot_width=400,
                              y_axis_type=y_axis_type,
                              x_axis_label='intensity',
                              y_axis_label='count',
                              title=title)
    hist, bins = hist_bin
    p.line(bins, hist, line_width=2)

    return p

# Display histograms
plots = [[plot_hist(hist_bins, 'Histogram','log')]]
bokeh.io.show(bokeh.layouts.gridplot(plots))

It looks like the turnover point is around 4000, above which we suppose most of the background pixels start to be included, so we choose 4000 as the manual threshold.

In [None]:
# Threshold image (bacteria are darker than the background)
thresh = 4000
im_bw = im < thresh

# Take a look
bokeh.io.show(show_two_ims(im,
                           im_bw,
                           titles=['original', 'thresholded']))

It looks okay. Some 'bright' bacteria are lost in the processing, but we do not need all of them to show up properly. There are also some artifacts around the edge and some small dots, which we can clean up with the clear border and remove small objects functions.

In [None]:
# Clear the border
im_bw = skimage.segmentation.clear_border(im_bw)
# Remove all the small objects
im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=400)

# Take a look
bokeh.io.show(show_two_ims(im,
                           im_bw,
                           titles=['original', 'thresholded']))

It is actually not bad, though we lose a significant number of bacteria. As expected, regions with clustered bacteria or artifacts do not show up well in the processed image. However, it is tricky to pick the threshold just by looking at the histogram, so we try the function Justin gives in the notes, which compares each pixel value with its local mean.

In [None]:
def bebi103_thresh(im, selem, white_true=True, k_range=(0.5, 1.5), min_size=100):
    """
    Threshold image as described above.  Morphological mean filter is 
    applied using selem.
    """    
    # Determine comparison operator
    if white_true:
        compare = np.greater
        sign = -1
    else:
        compare = np.less
        sign = 1
    
    # Do the mean filter
    im_mean = skimage.filters.rank.mean(im, selem)

    # Compute number of pixels in binary image as a function of k
    k = np.linspace(k_range[0], k_range[1], 100)
    n_pix = np.empty_like(k)
    for i in range(len(k)):
        n_pix[i] = compare(im, k[i] * im_mean).sum() 

    # Compute rough second derivative
    dn_pix_dk2 = np.diff(np.diff(n_pix))

    # Find index of maximal second derivative
    max_ind = np.argmax(sign * dn_pix_dk2)

    # Use this index to set k
    k_opt = k[max_ind - sign * 2]

    # Threshold with this k
    im_bw = compare(im, k_opt * im_mean)

    # Remove all the small objects
    im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)

    return im_bw, k_opt

# Make the structuring element 50 pixel radius disk
selem = skimage.morphology.disk(50)

# Threshold based on mean filter
im_bw_1, k = bebi103_thresh(im, selem, white_true=False,
                            k_range=(0.5, 0.8), min_size=400)

# Clear the border
im_bw_1 = skimage.segmentation.clear_border(im_bw_1)

bokeh.io.show(show_two_ims(im_float,
                           im_bw_1,
                           titles=['original', 'Auto_thres']))
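
The core idea of the function above, comparing each pixel with $k$ times its local mean, can be checked on a small synthetic image. This is only a rough sketch under stated assumptions: it swaps in a square box mean computed with NumPy for the disk-shaped `skimage.filters.rank.mean`, fixes $k$ by hand instead of searching for it, and the helper name `local_mean` and all pixel values are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(im, half_width):
    """Mean over a square (2 * half_width + 1) window, using edge padding.

    Hypothetical stand-in for a disk-shaped rank mean filter.
    """
    padded = np.pad(im.astype(float), half_width, mode="edge")
    size = 2 * half_width + 1
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


# Synthetic image: background with a left-to-right illumination gradient
rows, cols = 40, 60
toy = np.tile(np.linspace(3000.0, 5000.0, cols), (rows, 1))

# Two dark "cells", one on the dim side and one on the bright side
toy[15:25, 10:20] *= 0.5
toy[15:25, 40:50] *= 0.5

# Keep pixels that are clearly darker than their neighborhood
k = 0.7
mask = toy < k * local_mean(toy, half_width=7)

print("foreground pixels:", mask.sum())
```

Because each pixel is compared with its own neighborhood, both cells are picked up even though the cell on the bright side has higher absolute intensity than the one on the dim side.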

More bacteria are gone, but so are the artifacts, and the bacteria that meet all of our conditions stay intact. We played with two parameters a little: the size of the structuring element and the range of the threshold ratio $k$. The biggest problem we encounter is that the tails of the bacteria sometimes show up. Decreasing the size of the structuring element helps, but also leads to missing pixels inside the bacteria. So we keep the 50 pixel radius disk as our structuring element and lower the ratio, so that only pixels with high enough contrast (here, those whose intensity is significantly lower than the local mean) are kept, which eliminates the tails in the segmented images. Because of the tail problem, we also try a similar way to find a good threshold automatically: instead of comparing the pixel intensity with its local mean, we compare it with a fixed value.

In [None]:
def bebi103_thresh_new(im, white_true=True, k_range=[], 
                   min_size=100):
    """
    Absolute threshold finder
    """    
    # Determine comparison operator
    if white_true:
        compare = np.greater
        sign = -1
    else:
        compare = np.less
        sign = 1
    
    if len(k_range) == 2:
        k = np.linspace(k_range[0], k_range[1], 1000)
    else:
        k = np.linspace(im.min(),im.max(),1000)
        
    n_pix = np.empty_like(k)
    for i in range(len(k)):
        n_pix[i] = compare(im, k[i]).sum()
        
    # Compute rough second derivative
    dn_pix_dk2 = np.diff(np.diff(n_pix))
        
    # Find index of maximal second derivative
    max_ind = np.argmax(sign * dn_pix_dk2)
        
    # Use this index to set k
    k_opt = k[max_ind - sign * 2]
        
    # Threshold with this k
    im_bw = compare(im, k_opt)
    
    # Clear the border
    im_bw = skimage.segmentation.clear_border(im_bw)

    # Remove all the small objects
    im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)

    return im_bw, k_opt


# Threshold based on a fixed (absolute) intensity value
im_bw_2, k = bebi103_thresh_new(im, white_true=False,
                                k_range=(3600, 4000), min_size=400)

bokeh.io.show(show_two_ims(im_float,
                           im_bw_2,
                           titles=['original', 'New_auto_thres']))

This function still requires us to specify a range in which to search for the optimal threshold, and it works quite similarly to manual thresholding. Still, we think it is good enough for selecting bacteria for further analysis. We did try other, more advanced edge-finding functions, but none of them worked well, probably because of the tail problem. We suspect the reason is that, no matter how we tune the parameters, those methods are mostly based on gradients, and the beginnings of the tails usually have gradients similar to those of the bacterial bodies. Hence, we decide to go with the basic threshold methods and choose the best bacteria based on those images.
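
Both thresholding functions pick their threshold the same way: count how many pixels would be labeled foreground at each candidate threshold, then look for the kink in that curve with a rough second derivative. Here is a minimal sketch of that selection step on made-up, deterministic data. The intensity values and the candidate range are assumptions chosen only so the kink is easy to see; they are not taken from the real images.

```python
import numpy as np

# Toy "image" flattened to 1D: 200 dark cell pixels and 2001 brighter
# background pixels (all values are invented for illustration)
dark = np.linspace(2900.0, 3100.0, 200)
background = np.linspace(4000.0, 6000.0, 2001)
toy = np.concatenate([dark, background])

# Candidate thresholds, restricted to a range between the two populations,
# in the same spirit as the k_range argument
candidates = np.linspace(3200.0, 4800.0, 161)

# Number of pixels labeled foreground (darker than threshold) per candidate
n_pix = np.array([(toy < t).sum() for t in candidates])

# Rough second derivative; its maximum marks where background starts leaking in
second_diff = np.diff(np.diff(n_pix))
max_ind = np.argmax(second_diff)

# Step back two candidates from the kink, mirroring the index shift used
# in the functions above for dark objects
t_opt = candidates[max_ind - 2]
mask = toy < t_opt

print("chosen threshold:", t_opt)
print("foreground pixels:", mask.sum())
```

On real images the count curve is noisy, which is one reason the search range has to be narrowed by hand before the second derivative gives a sensible answer.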

Now, we think it is better to choose bacteria that are in the process of dividing, because division is what we really care about, and choosing bacteria that are not dividing carries the risk that something weird happens when they actually do divide. Combining this with the aforementioned conditions, we choose several 'ROIs'. For convenience, we write a function that generates a zoom slice from a start point and a size, since the image is displayed upside down relative to the array row order.

In [None]:
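def zoom_finder(im, x_start, y_start, size=120):
    """
    Generate a square zoom slice from a start point and size.

    x_start and y_start are read off the plot axes (in pixels), where y
    increases upward, so the row indices are flipped.
    """
    # Rows are the first axis of the image
    n_rows = im.shape[0]
    return np.s_[(n_rows - size - y_start):(n_rows - y_start),
                 x_start:(x_start + size)]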
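
The coordinate flip behind this kind of slice can be checked on a tiny array. The sketch below assumes the plot puts array row 0 at the top while the plot's y axis increases upward; `flipped_roi` is a hypothetical helper written for this example, not part of `bebi103`.

```python
import numpy as np


def flipped_roi(n_rows, x_start, y_start, size):
    """Slice for a square ROI whose lower-left corner (x_start, y_start)
    is given in plot coordinates, with y measured up from the bottom."""
    row_stop = n_rows - y_start
    return np.s_[row_stop - size:row_stop, x_start:x_start + size]


# 6 rows by 8 columns, each entry encodes its own position as 8 * row + col
toy = np.arange(6 * 8).reshape(6, 8)

roi = flipped_roi(toy.shape[0], x_start=2, y_start=1, size=3)
patch = toy[roi]

print(patch)
```

With `y_start=1`, the bottom row of the patch is array row 4, which is one row up from the bottom of the image, as read off a plot whose y axis points upward.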

We then locate three regions that we think contain the most promising bacteria. The first two each contain one bacterium of interest, while the last one contains several. We would like to select further from the last one after some test segmentation.

In [None]:
# Zoom start points and sizes
zoom_start = [[590, 260, 120], [715, 375, 120],[125, 250, 400]]

#Generate zoom slices
zooms = []
for i in range(len(zoom_start)):
    zooms.append(zoom_finder(im, zoom_start[i][0], zoom_start[i][1], zoom_start[i][2]))

#Plot out 3 ROIs in actual scale
names = '123'
plots = [bebi103.viz.imshow(im[zoom],
                            plot_height=300,
                            title=name,
                            interpixel_distance=ip,
                            length_units='µm')
             for zoom, name in zip(zooms, names)]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=3))

Let's look at the segmented versions of those. To do so, we write a modified version of show_two_ims that handles four images.

In [None]:
def show_4_ims(im_1, im_2, im_3, im_4, titles=[None, None, None, None], interpixel_distances=[ip, ip, ip, ip],
                 color_mapper=None):
    """Build four linked image plots and return them as a list."""
    p_1 = bebi103.viz.imshow(im_1,
                             plot_height=250,
                             title=titles[0],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[0],
                             length_units='µm')
    p_2 = bebi103.viz.imshow(im_2,
                             plot_height=250,
                             title=titles[1],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[1],
                             length_units='µm')
    p_3 = bebi103.viz.imshow(im_3,
                             plot_height=250,
                             title=titles[2],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[2],
                             length_units='µm')
    p_4 = bebi103.viz.imshow(im_4,
                             plot_height=250,
                             title=titles[3],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[3],
                             length_units='µm')
    p_2.x_range = p_1.x_range
    p_2.y_range = p_1.y_range

    p_3.x_range = p_1.x_range
    p_3.y_range = p_1.y_range
    
    p_4.x_range = p_1.x_range
    p_4.y_range = p_1.y_range
    
    #return bokeh.layouts.gridplot([p_1, p_2, p_3, p_4], ncols=4)
    return [p_1, p_2, p_3, p_4]

In [None]:
title = ["original", "Manual_thres","Auto_thres","New_auto_thres"]

p = [show_4_ims(im_float[zoom],
                           im_bw[zoom],
                           im_bw_1[zoom],
                           im_bw_2[zoom],
                           titles=[t + "_" + name for t in title])
     for zoom, name in zip(zooms, names)]

bokeh.io.show(bokeh.layouts.gridplot(p, ncols=1))

They are quite similar, and the thresholding function from the notes actually works slightly better for all of those ROIs. Next, we see what happens if we apply an opening to separate the dividing cells.

In [None]:
selem = skimage.morphology.disk(5)

im_open = skimage.morphology.opening(im_bw, selem)
im_open_1 = skimage.morphology.opening(im_bw_1, selem)
im_open_2 = skimage.morphology.opening(im_bw_2, selem)

title = ["original", "Manual_thres_open","Auto_thres_open","New_auto_thres_open"]

p = [show_4_ims(im_float[zoom],
                           im_open[zoom],
                           im_open_1[zoom],
                           im_open_2[zoom],
                           titles=[t + "_" + name for t in title])
     for zoom, name in zip(zooms, names)]

bokeh.io.show(bokeh.layouts.gridplot(p, ncols=1))
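
To make concrete what an opening does to a thin connection between two cells, here is a minimal NumPy-only sketch. It is an illustration under assumptions, not the `skimage.morphology.opening` implementation: it uses a 3 by 3 square structuring element instead of `disk(5)`, treats everything outside the image as background, and the helper names `erode`, `dilate`, and `opening` are defined here only for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(mask, size=3):
    """Keep a pixel only if its whole size x size neighborhood is True."""
    padded = np.pad(mask, size // 2, mode="constant", constant_values=False)
    return sliding_window_view(padded, (size, size)).all(axis=(-2, -1))


def dilate(mask, size=3):
    """Set a pixel if any pixel in its size x size neighborhood is True."""
    padded = np.pad(mask, size // 2, mode="constant", constant_values=False)
    return sliding_window_view(padded, (size, size)).any(axis=(-2, -1))


def opening(mask, size=3):
    """Erosion followed by dilation with the same square element."""
    return dilate(erode(mask, size), size)


# Two 5 x 5 "cells" joined by a bridge that is one pixel wide
toy = np.zeros((9, 15), dtype=bool)
toy[2:7, 1:6] = True
toy[2:7, 9:14] = True
toy[4, 6:9] = True

opened = opening(toy)

print("bridge pixels before:", toy[4, 6:9].sum())
print("bridge pixels after: ", opened[4, 6:9].sum())
```

The bridge is narrower than the structuring element, so the erosion removes it and the dilation cannot bring it back, while the two cell bodies are restored to their original extent. A larger element separates more aggressively, which is the trade-off behind over-separated dividing cells.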

In this case, the auto_thres version separates the bacteria too much, which might produce a false positive division. We therefore use the modified thresholding function to explore the rest of the frames. Now, we convert all the frames into binary images with our modified thresholding function.

In [None]:
def collection_thres(ic, 
                     white_true=True, 
                     k_range=[], 
                     min_size=100):
    #Initialize a empty list
    ic_seg = []

    # Segment each frame
    for img in ic:
        temp,_ = bebi103_thresh_new(img, white_true=white_true, k_range=k_range,min_size=min_size)
        ic_seg.append(temp)
    
    return ic_seg

ic_seg = collection_thres(ic,
                         white_true=False,
                         k_range=(3600, 4000),
                         min_size=400)

# Take a look    
ic_seg[0]

It looks like a correctly segmented image, so we proceed and write a function to convert binary images to single-channel images.

In [None]:
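def bin2gray(im):
    """Convert a 2D binary image to a float array of shape (rows, cols, 1)."""
    # True becomes 1.0 and False becomes 0.0; the new axis is the single channel
    return np.asarray(im, dtype=bool).astype(float)[:, :, np.newaxis]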

# Test a little bit
bokeh.io.show(show_two_ims(ic_seg[4],
                           bin2gray(ic_seg[4]),
                           titles=['Binary', 'Single-channel']))

They are the same, so we can integrate it into our conversion function for the image collection and reconvert the collection.

In [None]:
def collection_thres_gray(ic,selem,
                     white_true=True, 
                     k_range=[], 
                     min_size=100):
    #Initialize a empty list
    ic_seg = []

    # Segment each frame
    for img in ic:
        temp,_ = bebi103_thresh_new(img, white_true=white_true, k_range=k_range,min_size=min_size)
        # Open to separate dividing bacteria
        temp = skimage.morphology.opening(temp, selem)
        ic_seg.append(bin2gray(temp))
    
    return ic_seg

selem = skimage.morphology.disk(5)

ic_seg = collection_thres_gray(ic,selem,
                         white_true=False,
                         k_range=(3600, 4000),
                         min_size=400)

# Take a look    
ic_seg[0].shape

It looks good, so we can proceed and add all the converted images together, assuming the scale and layout don't change between frames, which should be true. This gives us, for each pixel, the number of frames in which it is foreground after thresholding. We had some trouble tuning the color mapper and colorbar, but with the default color mapper, yellow indicates that a pixel is foreground in many frames and blue indicates few. We can then take a look at the zoom slices we chose.

In [None]:
ic_sum = np.array(ic_seg).sum(axis=0)


#Plot out 3 ROIs in actual scale
names = '123'
plots = [bebi103.viz.imshow(ic_sum[zoom],
                            plot_height=300,
                            title=name,
                            interpixel_distance=ip,
                            length_units='µm')
             for zoom, name in zip(zooms, names)]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=3))
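
The idea of summing the binary frames can be illustrated with a few made-up frames. In this sketch, which uses invented shapes and positions rather than the real data, one "cell" stays anchored and another drifts one pixel per frame; only the anchored one reaches the maximum count in the summed image.

```python
import numpy as np

# Three made-up binary frames of a 6 x 10 field of view
frames = np.zeros((3, 6, 10), dtype=bool)

# Anchored cell: same 3 x 3 block in every frame
frames[:, 1:4, 1:4] = True

# Drifting cell: a 2 x 2 block that moves one column to the right per frame
for t in range(len(frames)):
    frames[t, 4:6, 5 + t:7 + t] = True

# Per-pixel count of frames in which the pixel is foreground
overlap = frames.sum(axis=0)

# Pixels that are foreground in every frame
stable = overlap == len(frames)

print("max overlap:", overlap.max())
print("stable pixels:", stable.sum())
```

Pixels belonging to the drifting cell never reach the full count, so a well-anchored bacterium shows up as a compact region at the top of the color scale.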

The first two look fine. From the third one, we can still pick out a couple of bacteria that fulfill our conditions, but they are mostly close to one or more other bacteria. This may or may not lead to errors in the long time course analysis, because the neighbors could probably be eliminated by placing them close to the edge of the ROI and then clearing the border. However, being too close to other bacteria might also cause problems, such as divided cells failing to flow away. Thus, we choose the best one from the third zoom slice and remake the slices for the final selection.

In [None]:
zoom_start_final = [[590, 260, 120], [715, 375, 120],[200, 500, 100]]

#Generate zoom slices
zooms = []
for x_start, y_start, size in zoom_start_final:
    zooms.append(zoom_finder(im, x_start, y_start, size))
                    
plots = [bebi103.viz.imshow(ic_sum[zoom],
                            plot_height=300,
                            title=name,
                            interpixel_distance=ip,
                            length_units='µm')
             for zoom, name in zip(zooms, names)]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=3))                    

In all, our criteria for choosing bacteria for long time course imaging can be summarized as:

1. Well separated from other bacteria;
2. Away from any type of artifact;
3. Uniform lighting across the single bacterium;
4. The fixed bacteria do not move between frames.

The first is to avoid mislabelling and other problems that can happen when multiple cells are imaged together; the second and third are to make sure our segmentation is not significantly affected during the imaging process. We use a modified thresholding function to select bacteria that fulfill the first three conditions, and then use the overlaid segmented images of all the frames to find bacteria that do not move across frames, since we do not want part of an imaged cell to move out of our ROI at some point because it is not well anchored.