# CellProfiling in Python

A high-throughput screening analysis pipeline, similar to what you would do in CellProfiler, implemented in Python.

Aims: 

* create an image analysis pipeline for batch processing, somewhat similar to running CellProfiler
* do some statistical analysis and create interactive visualizations using holoviews
* introduce pandas data frames

# High Throughput Screening Workflow

<img src="./Illustrations/HTMPipeline_alt.png" height=800>

# Sample images
Images are a subset of dataset BBBC022 from the [Broad Bioimage Benchmark Collection](https://data.broadinstitute.org/bbbc/)

The following description of the dataset is from their website (https://data.broadinstitute.org/bbbc/BBBC022/):

In [None]:
# Import modules
import pathlib
import re
import numpy as np
import mahotas # see other Image Analysis Packages !
import scipy.ndimage # the scipy.ndimage.morphology namespace is deprecated in recent SciPy versions
import pandas as pd
from skimage.io import imread
# Alternative
#from tifffile import imread

## Find and load the images

This is the same code as last week, with some comments removed.

## Don't forget to change the base folder  in the next cell!

In [None]:
# Set our base folder (adjust this to the path where your images are)
folder = pathlib.Path("/Users/volker/Downloads/BroadData/for_course/")

# Extracting Metadata from Path/File names with regular expressions

In many cases, you will find some Metadata that is related to your screen embedded in the file names of the images.
Therefore we need to extract this information from the images we analyze.

_Regular expressions_ are a flexible tool for analyzing/splitting strings that are built according to some regular pattern. If you google for "python regular expressions" you will find plenty of documentation and examples on how to use them. 

As regular expressions have a large number of building blocks that may be difficult to remember, it is nice to have a cheat sheet. There is a great online tool for creating and debugging regular expressions with a built-in cheat sheet, namely [regex101](http://regex101.com). Make sure you select Python on the left.

<img src="./Illustrations/regex101.png" width=800>

In [None]:
# this regular expression should work, give it a try in regex101 
# you can also try and modify it so you extract well column and well row separately
regex = r"(?P<basepath>.*)[/\\].*images_(?P<Plate>.*)w\d[/\\](?P<Prefix>.*)_(?P<well>[A-Z]\d\d)_s(?P<subpos>\d)_w(?P<Channel>\d)(?P<ID>.*)\.tif$"
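To see what the named groups capture, here is a small sketch that applies the pattern to a made-up path. The path is hypothetical and was constructed only to fit the pattern; your real paths will differ. The second pattern shows one possible way to capture well row and well column separately (the group names `wellrow` and `wellcol` are our own choice).

```python
import re

regex = r"(?P<basepath>.*)[/\\].*images_(?P<Plate>.*)w\d[/\\](?P<Prefix>.*)_(?P<well>[A-Z]\d\d)_s(?P<subpos>\d)_w(?P<Channel>\d)(?P<ID>.*)\.tif$"

# hypothetical example path, made up for illustration
example_path = "/data/BBBC022_v1_images_20585w1/IXMtest_A01_s1_w1ABCDEF.tif"

m = re.match(regex, example_path)
meta = m.groupdict()   # dictionary: group name -> matched text
print(meta)

# variant: split the well group into a row letter and a two-digit column
regex_rowcol = regex.replace(r"(?P<well>[A-Z]\d\d)",
                             r"(?P<wellrow>[A-Z])(?P<wellcol>\d\d)")
meta_rowcol = re.match(regex_rowcol, example_path).groupdict()
print(meta_rowcol["wellrow"], meta_rowcol["wellcol"])
```

Note that all captured values are strings, including the numeric ones such as `subpos` and `Channel`.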


In [None]:
def get_metadata_as_series(filepath, regex, filename_key="filepath"):
    ''' 
    provided with a filepath (can be a string or a pathlib.PosixPath object),
    tries to match the path against the regular expression regex.
    The extracted keys, plus the filepath are returned as a pandas Series object
    '''
    filepath = str(filepath)
    m = re.match(regex, filepath)
    if m is not None:
        tmp = m.groupdict()
        tmp[filename_key] = filepath
        return pd.Series(tmp)
    else:
        print(f"Extracting metadata for {filepath} failed.")
        return None
    
#get_metadata_as_series(firstimage, regex)

# Find all files and extract metadata

Now let's create a data frame by analyzing all the filenames. 
There are several ways to do this: 
* You could use a for loop 
* You can use a `list comprehension`
* You can use `map`
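As a quick reminder, the three approaches give the same result. The sketch below uses a toy function `parse` (a stand-in we made up for illustration, not the metadata function from this notebook).

```python
def parse(name):
    # toy stand-in: returns the text before the first dot
    return name.split(".")[0]

names = ["a.tif", "b.tif", "c.tif"]

# 1. for loop
result_loop = []
for n in names:
    result_loop.append(parse(n))

# 2. list comprehension
result_comprehension = [parse(n) for n in names]

# 3. map (returns a lazy iterator in Python 3, so we wrap it in list)
result_map = list(map(parse, names))

print(result_loop == result_comprehension == result_map)
```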


In [None]:
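# find files (sorted, so the order is reproducible)
files = sorted(folder.rglob("*.tif"))
# for each file, extract the metadata ... using a list comprehension
metadata_series_list = [get_metadata_as_series(f, regex) for f in files]
# get_metadata_as_series returns None if a path does not match, so drop those entries
metadata_series_list = [s for s in metadata_series_list if s is not None]
metadata_series_list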

Now combine all metadata series objects into a **pandas** `DataFrame`

In [None]:
# there are many ways to create a DataFrame. Here we pass a list of pd.Series objects
df_meta = pd.DataFrame(metadata_series_list)

In [None]:
df_meta.describe()

If you just want to get a quick feel for what kind of data is in a data frame, but you don't want to output a long frame use `.head()`

In [None]:
df_meta.head()

# Learning Pandas
There is not enough time to cover pandas in depth during this course. For an introduction from the ground up,
check out Jake VanderPlas's [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook).
[Direct link to the pandas chapter](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.00-Introduction-to-Pandas.ipynb).

Alternatively there are also notebooks available for Wes McKinney's book [Python for Data Analysis](https://github.com/wesm/pydata-book). 
[Chapter 5 introduces Pandas](http://nbviewer.jupyter.org/github/pydata/pydata-book/blob/2nd-edition/ch05.ipynb)

In [None]:
# note: attribute access is a different way of referring to the column, an alternative to df_meta["subpos"]
df_meta.subpos.unique()

**Refresher**

Exporting/Importing a data frame

* `.csv` files `df.to_csv`, `pd.read_csv`
* `.json` files `df.to_json`, `pd.read_json`
* `pickle`, `hdf5`, `sql` using similar function/method names `...`
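A minimal round trip through the CSV format might look like the sketch below. It writes to an in-memory buffer instead of a file so that it runs anywhere; the toy data frame is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"well": ["A01", "A02"], "subpos": [1, 2]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False avoids writing the row index as an extra column
buffer.seek(0)                   # rewind so that read_csv starts at the beginning
df_roundtrip = pd.read_csv(buffer)

print(df_roundtrip)
```

With a real file you would pass a file name instead of the buffer to both functions.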


You can also work with the system clipboard. 
You can try 
```
df_meta.to_clipboard()
```

and using paste in Excel. 
Or try copying something in Excel and running

```
pd.read_clipboard()
```

In [None]:
# We don't really need the columns ID, Prefix and basepath for our further analysis, so let's get rid of them
df_meta.drop(columns=['ID', 'Prefix', 'basepath'], inplace=True)

In [None]:
df_meta.head()

# DataFrame `groupby` method 

Our data frame has one row for each image file. However, some of these images clearly belong together, that is the images of the different fluorescence channels taken in the same _subposition_ of the same _well_ on the same _plate_.
Therefore we want to group these images together. This can be done naturally using the `groupby()` method of the pandas DataFrame. 

In [None]:
groupby = df_meta.groupby(["well", "subpos", "Plate"])
# one can get the group keys
keys = groupby.groups.keys()
# show only the first few
list(keys)[:10]

In [None]:
# access an individual group by key
groupby.get_group(list(keys)[0])
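The same grouping idea on a tiny, made-up table (the values below are invented for illustration): two wells with two channels each give two groups, and each group holds the rows that belong together.

```python
import pandas as pd

toy = pd.DataFrame({
    "Plate":   ["P1", "P1", "P1", "P1"],
    "well":    ["A01", "A01", "A02", "A02"],
    "subpos":  ["1", "1", "1", "1"],
    "Channel": ["1", "2", "1", "2"],
})

grouped = toy.groupby(["well", "subpos", "Plate"])
print(grouped.ngroups)   # number of distinct (well, subpos, Plate) combinations

# when grouping by several columns, the group key is a tuple in the same order
group_a01 = grouped.get_group(("A01", "1", "P1"))
print(group_a01)
```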

In [None]:
# pick a group as an example, here I randomly chose the group with index 8
example_group = groupby.get_group(list(keys)[8])
example_group

In [None]:
# create a file list from the group
filelist = list(example_group.sort_values("Channel", ascending=True)["filepath"])
# read all files using a list comprehension ... reminder, you could also use the `map` function
images = [imread(f) for f in filelist] 
images

In [None]:

type(images[0])

Let's convert the list of numpy arrays into an _n_-dimensional array (n=3)

In [None]:
all_in_one = np.array(images)
all_in_one.shape


## Interactive plotting and image viewing using holoviews/bokeh

**Bokeh** is a plotting library similar to `matplotlib`. However, instead of generating static plots it generates plots as HTML with JavaScript, which can be embedded in the notebook and allow user interaction.

**plot.ly** is another library for interactive, browser-based plots.

**Seaborn** is a statistical plotting library built on top of `matplotlib`.

**Holoviews** is a higher-level plotting library that can use **bokeh** and **matplotlib** as backend.

In [None]:
import holoviews as hv
hv.extension('bokeh')  # without specifying the extension you won't see a plot

In [None]:
imview = hv.Image(all_in_one[3,:,:]).options(tools=['hover'], cmap="gray",width=700, height=500, colorbar=False)
imview

With holoviews you can create an image viewer with a channel slider using their  `DynamicMap`.

Here, we create a convenience function for it:

(Note that we define a function inside a function; Python allows this.)
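A function defined inside another function can use the variables of the enclosing function (this is called a closure). A minimal sketch without any plotting, where `make_channel_selector` is a hypothetical name chosen for illustration:

```python
import numpy as np

def make_channel_selector(stack):
    # the inner function "remembers" stack from the enclosing scope
    def select(c):
        return stack[c, :, :]
    return select

stack = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # toy [channel, y, x] array
select = make_channel_selector(stack)
print(select(1).shape)
```

This is the pattern a `DynamicMap` callback relies on: the callback only receives the slider value, while the image stack comes from the enclosing scope.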

In [None]:
def viewer_with_channel(image_ch):
    def select_ch(c):
        tmp = image_ch[c,: , :]
        size = tmp.shape
        return  hv.Image(tmp).options(tools=['hover'], cmap="gray", width=size[1], height=size[0])
    
    return(hv.DynamicMap(select_ch, kdims=['c',]).redim.values(c=range(image_ch.shape[0])))

In [None]:
viewer_with_channel(all_in_one)

# Some more holoviews tricks: Add plots/images to layout  with +

In [None]:
# try it out

More information about holoviews layouting at http://holoviews.org/user_guide/Composing_Elements.html

If you are keen, you can also try to add additional sliders for adjusting the range, the colormap etc.
Here is a rough cut piece of code that demonstrates this functionality:
https://github.com/VolkerH/my_hv_gallery/blob/master/Images_with_interactors/Dynamic_map_interactor.ipynb

# Build a simple image analysis pipeline

* missing: Preprocessing (noise removal, illumination correction, background subtraction)
* Segment nuclei using Otsu thresholding
* Split and label nuclei
* Expand to find cytoplasm
* missing: remove touching objects


In [None]:
import skimage.filters # this module provides the otsu algorithm

Segment Nuclei using thresholding.
Determine the threshold value using Otsu's method of maximizing the inter-class variance. 
(https://en.wikipedia.org/wiki/Otsu's_method)

In [None]:
# put the nuclear channel in variable im
im = all_in_one[0, : , :]
threshval = skimage.filters.threshold_otsu(im)
# only the last expression of a cell is displayed automatically, so print the threshold
print(threshval)
hv.Image(im > threshval).options(cmap="gray",width=700, height=500, tools=['hover'])
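To make the criterion concrete, here is a minimal NumPy sketch of the idea behind Otsu's method: try every histogram bin as a threshold and keep the one that maximizes the between-class variance. This is an illustration under simplifying assumptions, not the scikit-image implementation, and the function name `otsu_threshold` and the synthetic intensities are our own.

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Illustrative Otsu threshold: bin center that maximizes the between-class variance."""
    counts, edges = np.histogram(image.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                 # number of pixels up to and including each bin
    w1 = w0[-1] - w0                       # number of pixels in the bins above
    cum_sum = np.cumsum(counts * centers)
    valid = (w0 > 0) & (w1 > 0)            # both classes must be non-empty
    mu0 = cum_sum[valid] / w0[valid]       # mean of the lower class
    mu1 = (cum_sum[-1] - cum_sum[valid]) / w1[valid]   # mean of the upper class
    between = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return centers[valid][np.argmax(between)]

# synthetic intensities: dim background plus a smaller population of bright nuclei
rng = np.random.default_rng(0)
background = rng.normal(20, 3, size=5000)
nuclei = rng.normal(100, 5, size=1000)
toy = np.concatenate([background, nuclei])

t = otsu_threshold(toy)
print(t)
```

For two well-separated populations like these, the threshold falls between the two intensity clusters.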

# Object splitting and connected component labelling

The following function takes a binary image and tries to split adjacent nuclei using the distance transform and finding local maxima.

In [None]:
def split_and_label(thresholded_image, bc_size = (9,9)):
    
    '''split objects using distance transform and watershed
    this implementation uses functions from the mahotas package
    
    You could also try to implement this using scikit-image and scipy.ndimage functions
    such as scipy.ndimage.morphology.distance_transform_edt for the distance transform
    and peak_local_max to find the regional maxima of the seed points
    see for example here: scipy.ndimage.morphology.distance_transform_edt
    ''' 
    distances = mahotas.stretch(mahotas.distance(thresholded_image)) # you could try using 
    Bc = np.ones(bc_size) 
    maxima = mahotas.morph.regmax(distances, Bc=Bc) # you could try adapting this to s
    spots, n_spots = mahotas.label(maxima, Bc=Bc)
    surface = (distances.max() - distances)
    areas = mahotas.cwatershed(surface, spots)
    areas *= thresholded_image
    return(areas)
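As suggested in the docstring, the same idea can be written with `scipy.ndimage` and scikit-image. The following is a sketch under our own assumptions, not a drop-in replacement for the mahotas version: the function name `split_and_label_scipy` and the test image of two overlapping discs are made up for illustration, and local maxima are found with a maximum filter rather than `peak_local_max`.

```python
import numpy as np
import scipy.ndimage as ndi
from skimage.segmentation import watershed

def split_and_label_scipy(binary, footprint_size=9):
    """Distance transform, local maxima as seeds, then watershed within the mask."""
    distances = ndi.distance_transform_edt(binary)
    # a pixel is a seed candidate if it equals the maximum of its neighbourhood
    local_max = ndi.maximum_filter(distances, size=footprint_size) == distances
    local_max &= binary
    # touching maxima (plateaus) are merged into a single seed
    markers, n_seeds = ndi.label(local_max, structure=np.ones((3, 3)))
    # flood the inverted distance map from the seeds, restricted to the foreground
    return watershed(-distances, markers, mask=binary)

# toy test image: two overlapping discs
yy, xx = np.mgrid[:80, :90]
disc1 = (yy - 40) ** 2 + (xx - 30) ** 2 <= 20 ** 2
disc2 = (yy - 40) ** 2 + (xx - 60) ** 2 <= 20 ** 2
toy_labels = split_and_label_scipy(disc1 | disc2)
print(np.unique(toy_labels))
```

The size of the neighbourhood plays the same role as `bc_size` above: a larger neighbourhood yields fewer seeds and therefore fewer splits.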

In [None]:
from scipy.ndimage import binary_fill_holes
from skimage.morphology import opening, disk

def find_nuclei(im, bc_size=(20,20), opening_disk_radius=4):
    '''Segment nuclei by using otsu thresholding. Fills holes, splits and labels.'''
    threshval = skimage.filters.threshold_otsu(im)
    tmp = binary_fill_holes(im > threshval)
    labelled = split_and_label(tmp, bc_size)
    labelled = opening(labelled,disk(opening_disk_radius)) # remove small isolated bits
    return(labelled)

labelled = find_nuclei(im)
hv.Image(labelled).options(cmap="flag", tools=['hover'], width=700, height=500)

# Expanding to the cytoplasm

Once you have the nuclei as seed points, you can use several methods to grow these seed regions to find the surrounding cytoplasm. There are a number of commonly used techniques. For example, CellProfiler provides the following:

* watershed
* seeded region growing
* distance-N

Unless you have a marker that clearly delineates the cell boundary or marks the whole cytoplasm, you should use distance-N, otherwise you might bias your results (interactive whiteboard: explain why).

In [None]:
def distanceN(labels_in, distance):
    '''
    Distance-N implementation 
    Taken/adapted from the CellProfiler source code for their IdentifySecondaryObjects
    module.
    
    The basic idea is that you have some seed labels (in the context 
    of cell profiling these will typically be cell nuclei) that you want 
    to grow by n pixels to give a mask for a larger object (the cytoplasm).
    
    If you were only dealing with a single seed object, you could simply dilate with 
    a suitably sized structuring element. However, in general you have multiple seed 
    points and you don't want to merge those. Distance N will grow up to N pixels without
    merging objects that are closer together than 2N. 
    ''' 
    
    tmp = scipy.ndimage.distance_transform_edt(labels_in == 0, return_indices = True)
    distances, (i,j) = tmp
    labels_out = np.zeros(labels_in.shape, int)
    dilate_mask = distances <= distance
    labels_out[dilate_mask] = labels_in[i[dilate_mask],j[dilate_mask]]
    return labels_out    
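A toy example makes the behaviour easy to check. The sketch below repeats the same logic in a self-contained function (named `grow_labels` here only to keep the example independent) and grows two single-pixel seeds by 3 pixels. Each pixel within the distance receives the label of its nearest seed, so the two regions meet without merging.

```python
import numpy as np
import scipy.ndimage as ndi

def grow_labels(labels_in, distance):
    # distance of every background pixel to the nearest seed pixel,
    # plus the (row, column) indices of that nearest seed pixel
    distances, (i, j) = ndi.distance_transform_edt(labels_in == 0, return_indices=True)
    labels_out = np.zeros(labels_in.shape, int)
    grow = distances <= distance
    labels_out[grow] = labels_in[i[grow], j[grow]]
    return labels_out

seeds = np.zeros((5, 12), int)
seeds[2, 2] = 1
seeds[2, 9] = 2
grown = grow_labels(seeds, 3)
print(grown)
```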

In [None]:
cytolabel = distanceN(labelled, 30)
hv.Image(cytolabel).options(cmap="flag", tools=['hover'], width=700,height=500)

# Remove Cells touching the image boundary

These cells are not fully in the image, so if we calculate properties we may get misleading results.

`skimage` provides the `clear_border` function for this.

In [None]:
from skimage.segmentation import clear_border

cyto_no_border = clear_border(cytolabel)
hv.Image(cyto_no_border).options(cmap="flag", tools=['hover'], width=700,height=500)

## What about the nuclei ?

We should remove the nuclei corresponding to the cells we removed as well.

In [None]:
# find all the region labels remaining after clear_border 
remaining_labels = np.unique(cyto_no_border) 
remaining_labels

In [None]:
# Check which of the original nuclei labels are in remaining_labels.
# np.isin returns a boolean mask with the same shape as labelled, so no reshape is needed
# (np.in1d is deprecated in recent NumPy versions).
remaining_mask = np.isin(labelled, remaining_labels)
remaining_nuclei = labelled * remaining_mask
hv.Image(remaining_nuclei).options(cmap="flag", tools=['hover'], width=700,height=500)
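The two steps, removing border cells and then removing the matching nuclei, can be checked on small hand-made label images. The arrays below are invented for illustration.

```python
import numpy as np
from skimage.segmentation import clear_border

# toy label images: cell 1 touches the left border, cell 2 does not
cells = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
])
nuclei = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

cells_kept = clear_border(cells)                 # removes label 1
keep = np.isin(nuclei, np.unique(cells_kept))    # boolean mask, same shape as nuclei
nuclei_kept = nuclei * keep
print(np.unique(cells_kept), np.unique(nuclei_kept))
```

Note that the nucleus of cell 1 does not touch the border itself; it is removed only because its cell was removed.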

In [None]:
# check cytoplasm
hv.Image(cyto_no_border-remaining_nuclei).options(cmap="flag", tools=['hover'], width=700,height=500)

# Practice session (about 10 minutes):

Combine the above code cells into a single function
`segment_image(im)` that
takes an image of fluorescently labelled nuclei as input
and performs a segmentation of the nuclei and the cells.

It should return a Python dictionary of the form

```
{ "nuclei" : label_image_nuclei,
  "cells" : cells
}
```




<p>
.
<p>
.
<p>
.
<p>
.
<p>
.
<p>
.
<p>



# Solution: 

You can find a possible solution in the next cell (that one contains additional error checking), but try not to skip ahead. 

In [None]:
def segment_image(im):
    '''
    Takes an image of fluorescently labelled nuclei as input im
    and performs the following segmentation steps
    * Threshold using Otsu
    * Fill small holes
    * Split clumps using distance transform and watershed seeded from local maxima
    * Label image
    * Remove small objects with an opening 
    * Use distance - N to expand to cell region
    * Remove cell region label components that touch the image boundary
    * Create a new label image for the nuclei without the nuclei corresponding to cells that touched the boundary
    * returns the masks as a dictionary
    '''
    initial_nuclei = find_nuclei(im)
    initial_cells = distanceN(initial_nuclei, 30)
    cells = clear_border(initial_cells)
    remaining_labels = np.unique(cells)
    # np.isin keeps the shape of initial_nuclei, so no reshape is needed
    remaining_mask = np.isin(initial_nuclei, remaining_labels)
    nuclei = initial_nuclei * remaining_mask
    
    # sanity check:
    # make sure we only retain labels that are in both masks
    labels_n = set(np.unique(nuclei))
    labels_c = set(np.unique(cells))
    
    difference = labels_n.symmetric_difference(labels_c)
    if bool(difference):
        print("Warning, sets differ by labels ", sorted(list(difference)))
        print("Nuclei", sorted(list(labels_n)))
        print("cells", sorted(list(labels_c)))
    
    return({"nuclei": nuclei, 
            "cells": cells,})

In [None]:
masks = segment_image(im)
hv.Image(masks["cells"])

# How would you improve the `segment_image` function ?


* suggestions

# Feature extraction for the cell regions in each channel

**TODO:**

* Read the documentation of [`skimage.measure`](http://scikit-image.org/docs/dev/api/skimage.measure.html), in particular `regionprops`.
* Apply `regionprops` using a label image and a greyvalue image
* Try and make sense of the output
* assemble into a data frame
* save a crop or thumbnail for each segmented cell
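As a possible starting point for this TODO, the sketch below runs `regionprops` on a tiny made-up label image with a matching intensity image and collects a few properties into a data frame. The arrays and the choice of properties are ours. `mean_intensity` is the property name used by the scikit-image versions this notebook was written for (newer releases also expose it as `intensity_mean`).

```python
import numpy as np
import pandas as pd
from skimage.measure import regionprops

labels = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
])
intensity = np.array([
    [10, 10, 0, 0],
    [10, 10, 0, 4],
    [0,  0,  0, 6],
], dtype=float)

rows = []
for region in regionprops(labels, intensity_image=intensity):
    rows.append({
        "label": region.label,
        "area": region.area,                      # number of pixels in the region
        "mean_intensity": region.mean_intensity,  # mean grey value inside the region
    })

toy_features = pd.DataFrame(rows).set_index("label")
print(toy_features)
```

Each region becomes one row, which is the layout we want for the per-cell data frame later on.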

In [None]:
# Interactive analysis of skimage.regionprops output


#  DataFrame reformatting and combining with Broad Data 

## Rearrange datatable 

We rearrange the table such that filenames referring to different channels of the same field of view appear in a single row.
As a result, each row will refer to a single position. 

We are going to use `groupby` and `unstack`.


In [None]:
# remind ourselves of the structure of the data frame
df_meta.head()

* Group by all dimensions other than filepath using `groupby`.
* Use `apply` to apply a helper function to each group.
* The helper function simply returns the value of the column filepath using `item()` (note that you sometimes get back an object with an index).
* Finally, `unstack()` takes the values of the innermost index level (here `Channel`) and turns them into columns.

Needing to call a helper function to simply return a column value seems unnecessarily complicated. There must be an easier way (I haven't found it :-( ).

In [None]:
def retfp(d):
    """
    helper function that returns the value of the filepath column for dataframe d.
    """
    return d["filepath"].item()
wide = df_meta.groupby(["Plate", "well", "subpos", "Channel"]).apply(retfp).unstack()
wide.head()
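One possible alternative that avoids the helper function is to move the grouping columns into the index and unstack the `Channel` level directly. This is a sketch on a made-up table, not necessarily the best way. It assumes that each combination of plate, well, subposition and channel occurs only once, otherwise `unstack` raises an error.

```python
import pandas as pd

toy_meta = pd.DataFrame({
    "Plate":    ["P1", "P1", "P1", "P1"],
    "well":     ["A01", "A01", "A02", "A02"],
    "subpos":   ["1", "1", "1", "1"],
    "Channel":  ["1", "2", "1", "2"],
    "filepath": ["a_w1.tif", "a_w2.tif", "b_w1.tif", "b_w2.tif"],
})

toy_wide = (
    toy_meta
    .set_index(["Plate", "well", "subpos", "Channel"])["filepath"]  # Series with a MultiIndex
    .unstack("Channel")                                             # Channel values become columns
)
print(toy_wide)
```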

The column names are not that nice, so we create new ones and assign them to the data frame

In [None]:
wide.columns

In [None]:
new_column_names  = ["Channel_"+cn for cn in wide.columns]
new_column_names

In [None]:
wide.columns = new_column_names

In [None]:
wide.head()

We still need to remove the multi index:

In [None]:
wide = wide.reset_index(level=["Plate","well", "subpos"])

In [None]:
wide.head()

# Changing data types from object to integer
During the last session we noticed that some columns had the generic data type `object`, although they were essentially numeric. 
One way to fix this is with `.astype`

In [None]:
wide["subpos"].mean()

In [None]:
wide = wide.astype({"subpos": int})  # np.int was removed from NumPy, the built-in int works everywhere
wide.head()

In [None]:
# This didn't work last time
wide.subpos.mean()
# now it does (at least in my notebook)

# Merging our data frame with the compound information from the Broad Institute 

* The Broad Institute provides a `.csv` file with information about the compounds, which we want for our downstream analysis and plotting.
* The csv file can be downloaded from https://data.broadinstitute.org/bbbc/BBBC022/BBBC022_v1_image.csv

* We want to keep all rows from the data frame we created based on the metadata extracted from the filenames, but add a few meaningful columns from the Broad data frame.
* We want to find a unique identifier for each field, based on which we can merge. For example, we can use the filename for channel 1 (this should be unique).



# Load and explore the data frame provided by the Broad Institute



Read it with pandas `pd.read_csv`.

Some lines in the `.csv` file appear to be malformed (there seems to be a problem with delimiters in the file; I haven't had time to look into the exact problem). Therefore we tell pandas to skip over bad lines and warn about them. In pandas 1.3 and later this is done with `on_bad_lines="warn"`; older versions used `error_bad_lines=False` together with `warn_bad_lines=True`.



In [None]:
# You can download the file in the notebook (makes it more reproducible for others)

#from urllib.request import urlretrieve
#urlretrieve("https://data.broadinstitute.org/bbbc/BBBC022/BBBC022_v1_image.csv", "BBBC022_v1_image.csv")



In [None]:
# on_bad_lines="warn" skips malformed lines and emits a warning for each (requires pandas >= 1.3)
broaddf = pd.read_csv("BBBC022_v1_image.csv", on_bad_lines="warn")
broaddf.head()

Look at the column names

In [None]:
broaddf.columns

Create a new data frame with just the columns that are of interest to us:



In [None]:
broad_interesting_cols_df = broaddf[["Image_FileName_OrigHoechst","Image_Metadata_CPD_MMOL_CONC", "Image_Metadata_ASSAY_WELL_ROLE", "Image_Metadata_SOURCE_COMPOUND_NAME", 'Image_Metadata_SOURCE_NAME']]
broad_interesting_cols_df.head()

The `wide` data frame we created from the list of files and their metadata has the full path name.
If we want to merge the two data frames based on the file name for the Hoechst image, we need to add a file name column to the `wide` data frame.

To extract the filename from a path, one approach would be to use `os.path.split(filename)`.
With `pathlib` we first convert the string into a `Path` object and then use the `.name` attribute.

In [None]:
# just to see how this works (.iloc[0] picks the first row regardless of the index labels)
ptmp = pathlib.Path(wide.Channel_1.iloc[0])
ptmp.name

Using `apply` we can apply a function to extract the filename from the full path to a full column. 

We assign the result to a new column that has the same name as the corresponding column in the Broad data frame.

In [None]:
wide["Image_FileName_OrigHoechst"] = wide.Channel_1.apply(lambda x: pathlib.Path(x).name)
wide["Image_FileName_OrigHoechst"].head()

Merge the two data frames based on the `Image_FileName_OrigHoechst` column.

In [None]:
# how="left" keeps all rows of wide, even if the Broad table has no matching entry
merged_df = pd.merge(wide, broad_interesting_cols_df, on="Image_FileName_OrigHoechst", how="left")
merged_df.head()
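The effect of the `how` argument is easiest to see on two small, made-up tables. By default `pd.merge` performs an inner join and silently drops rows without a partner. With `how="left"` all rows of the left table are kept, and `indicator=True` adds a `_merge` column that shows where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"filename": ["a.tif", "b.tif", "c.tif"],
                     "well": ["A01", "A02", "A03"]})
right = pd.DataFrame({"filename": ["a.tif", "b.tif"],
                      "compound": ["DMSO", "drugX"]})

inner = pd.merge(left, right, on="filename")   # default: how="inner"
left_join = pd.merge(left, right, on="filename", how="left", indicator=True)

print(len(inner), len(left_join))
print(left_join)
```

Rows marked `left_only` had no match in the second table, so their added columns contain missing values.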

# Processing rows in the image data frame

For each row in the data table, we want to do the following

* read all channels
* segment to find nuclear and cytoplasm regions
* calculate region properties for each combination of region and channel
* create a new data frame where each row represents one cell and the columns represent existing metadata and the numerical features we extract

Read images for one row

In [None]:
def read_files(files):
    '''Given a list of filenames, reads them using the imread function imported above and returns a numpy array.'''
    images = [imread(f) for f in files]
    cyx_im = np.array(images) # channel, y, x ordered numpy array
    return(cyx_im)
    
def read_all_channels(row):
    ''' Given a row of a data frame (as a Series object), 
    reads all images referred to by filenames in columns whose
    name starts with "Channel_" '''
    # sort the column names so that the channel order is well defined
    channel_columns = sorted(c for c in row.index if str(c).startswith("Channel_"))
    return read_files(row[channel_columns])

In [None]:
viewer_with_channel(read_all_channels(merged_df.iloc[10]))



In [None]:
# Dictionary to translate integer channel index of numpy array into a meaningful_name
channel_dict = {
                0: "Hoechst",
                1: "conA",
                2: "Syto",
                3: "PhaWGA",
                4: "Mito",
}

In [None]:
import skimage.measure # this module provides regionprops

def calc_features(masks, im, channel_dict, verbose=False):
    '''
    Calculates region properties for each mask (dictionary of label images) and channel in image im [ch,y,x].
    channel_dict is a dictionary that assigns a name to each channel number.
    Returns a nested dictionary props[maskname][channelname] holding the list of region properties.
    '''
    nc, ny, nx = im.shape
    props = {}
    
    for maskname, mask in masks.items():
        if verbose:
            print("Processing region mask ", maskname)
        ch_props = {}
        for ch in range(nc): 
            ch_name = channel_dict[ch]
            if verbose:
                print("    calculating region props for channel", ch_name)           
            ch_props[ch_name] = skimage.measure.regionprops(mask, im[ch,:,:])        
        props[maskname] = ch_props
    return(props)

In [None]:
ignore_props= ['convex_image', 'coords', 'filled_image', 'image']
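def unravel_features(features, ignore = ignore_props):
    '''
    Unravels features calculated with calc_features into a flat dictionary.
    
    Each key has the form <label>_<channel>_<mask>_<property>, with a zero-padded
    index appended (e.g. _000) for properties that hold several values. The dictionary
    can then be turned into a data frame with rows representing cells
    and columns representing features.
    '''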
    label_features = {}
    for mask in features.keys():
        for channel in features[mask].keys():
            for region in features[mask][channel]:
                prefix = f"{region.label}_{channel}_{mask}_"
                # print(prefix)
                for prop in region:
                    if prop not in ignore:
                        tmp = region[prop]
                        #print(prop, type(tmp))
                        
                        # Commenting code
                        #
                        # bad comment: "if it is an array make it flat" 
                        # this comment duplicates the code in words, avoid such comments
                        #
                        # better comment: "flatten so we can later iterate over all elements easily"
                        # this comment documents the intent behind the code
                        if type(tmp) is np.ndarray:
                            tmp = tmp.flatten()
                        
                        # I don't know of an easy way to test whether we can iterate over
                        # an unknown object. With the try / except we simply try to do it 
                        # and catch the error if isn't possible.
                        
                        try: 
                            for i, value in enumerate(tmp):
                                num = str(i).zfill(3) 
                                label_features[prefix+prop+"_"+num] = value
                                #print(f"{prop}_{num}: {value}")
                        except TypeError:
                            label_features[prefix+prop] = tmp
    return(label_features)
    
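def generate_thumbnails(features, im, channel_dict, prefix=None, path=None):
    '''
    Placeholder: thumbnail generation is not implemented yet.
    Given 
    features = output of calc_features
    im = ndarray of shape [n_channels,ny,nx]
    prefix, path = optional for now, so the function can be called without them
    '''
    return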
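Regarding the question of how to tell whether a property value can be iterated over: one possible alternative to try/except is `np.ndim`, which returns 0 for scalars and a positive number for tuples, lists and arrays. The helper below is a hypothetical sketch for illustration (the name `flatten_property` is ours), not part of the pipeline above.

```python
import numpy as np

def flatten_property(name, value):
    """Hypothetical helper: map one property value to {column_name: scalar} pairs."""
    if np.ndim(value) == 0:
        # scalars (int, float, numpy scalar) have zero dimensions
        return {name: value}
    # tuples, lists and arrays are flattened and numbered with zero-padded indices
    flat = np.asarray(value).ravel()
    return {f"{name}_{i:03d}": v for i, v in enumerate(flat)}

print(flatten_property("area", 5))
print(flatten_property("centroid", (1.5, 2.5)))
print(flatten_property("moments", np.arange(4).reshape(2, 2)))
```

The try/except version above has the advantage that it also works for iterable objects that NumPy cannot convert into a regular array.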

In [None]:
def df_from_feature_dict(fd, col_prefix="num_"):
    ''' 
    given a feature dict with keys of the form label_featurename
    creates a DataFrame with a row for each label and a column for each feature.
    The feature name columns will be prepended with the prefix col_prefix
    '''
    keys = fd.keys()
    # find all region labels (alternatively we could pass the known labels to the function)
    tmp = [int(k.split("_",1)[0]) for k in keys]
    labels = sorted(np.unique(tmp))
    rows = {}
    for l in labels:
        # only the feature keys for the region l
        featurekeys = list(filter(lambda x: x.startswith(str(l)+"_"),keys))
        values = [fd[f] for f in featurekeys]
        shortened_featurekeys = [col_prefix + f.split("_",1)[1] for f in featurekeys]
        row = pd.Series(data=values, index=shortened_featurekeys)
        rows[l] = row
    df = pd.DataFrame(rows)
    return(df)

def process_image_table_row(row, channel_dict):
    '''
    Processes a single row of an image table:
    
    1. Reads all the .tif files for the individual channels
    2. Segments nuclei, cells
    3. Calculates regionprops for each combination of channel and object mask (nuclei, cells)
    4. Unravels the regionprops and turns it into a tidy DataFrame with one row per cell
    5. (not yet implemented: generates thumbnails)
    '''
    im = read_all_channels(row)
    masks = segment_image(im[0]) # Hoechst (nuclear) channel is channel 0
    features = calc_features(masks, im, channel_dict)
    feature_dict = unravel_features(features)
    df = df_from_feature_dict(feature_dict)
    df = df.T
    # create a unique index for each cell across the screen
    # build this as platename_well_subpos_label
    # (str() is needed because subpos may have been converted to an integer)
    index_prefix = "_".join(str(row[k]) for k in ("Plate", "well", "subpos"))
    df.index = [f"{index_prefix}_{i}" for i in df.index]
    
    # add columns from row so we have all the interesting metadata for each cell as well
    for name in row.index:
        if not name.startswith("Channel_"):
            df[name] = row[name]
    
    # thumbnails are not implemented yet; generate_thumbnails is a placeholder
    generate_thumbnails(features, im, channel_dict, prefix=index_prefix, path=None)
    return({"masks": masks, "features":features, "df" : df})
    
    
    

In [None]:
r = merged_df.iloc[:5]
r

In [None]:
all_props = process_image_table_row(merged_df.iloc[3], channel_dict)

In [None]:
import tqdm
def process_image_table(df, channel_dict):
    '''Processes every row of the image table and returns a list with one per-cell DataFrame per row.'''
    cell_dfs = []
    for index, row in tqdm.tqdm(df.iterrows(), total=len(df)):
        tmp = process_image_table_row(row, channel_dict)
        # many of the features describing the shape are identical 
        # for all channels. Therefore we end up with duplicate
        # columns which we can safely eliminate.
        # Note that drop_duplicates removes duplicate rows, therefore
        # we need to transpose twice
        cell_dfs.append(tmp["df"].T.drop_duplicates().T)
    return cell_dfs
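The double transpose is easier to follow on a small example. In the made-up table below, two columns hold identical values. After transposing they are identical rows, `drop_duplicates` keeps the first of them, and the second transpose restores the original orientation.

```python
import pandas as pd

toy = pd.DataFrame({
    "num_Hoechst_cells_area": [10, 20],
    "num_Mito_cells_area":    [10, 20],    # identical to the column above
    "num_Mito_cells_mean":    [3.5, 4.5],
})

deduplicated = toy.T.drop_duplicates().T
print(list(deduplicated.columns))
```

Be aware that transposing a frame with mixed column types can change the data types of the columns, so it may be worth checking `dtypes` afterwards.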

In [None]:
ptb = process_image_table(merged_df, channel_dict)

In [None]:
final = pd.concat(ptb)
final