In [1]:
# Basic imports
import numpy as np
import pandas as pd

import cellseg

# High-throughput in vitro image segmentation workflow

The purpose of this code is to analyze a group of in vitro images for their overall transduction frequency and brightness, in order to compare different viruses based on their affinity to a receptor.

This notebook takes in a look up table containing experimental data, uses that information to locate images in a data directory, and runs an image analysis workflow over those images to determine important characteristics about them. To test the same workflow on single images, please use the `analysis_pipeline.ipynb` notebook.

---

For the pipeline, files are organized in folders by Round (generally a group of experiments), Plate (which contains at most 96 conditions), and Well (a specific condition). To analyze this data, we first use a look up table to obtain the `Round`, `Plate`, and `Well` data.

In [2]:
# We can start by indicating the filepath for the data look up table:
lut_filepath = 'data/quantification/20220927_data_lookup_table.csv'

# Read the CSV file into a pandas dataframe
df_lut = pd.read_csv(lut_filepath, comment='#')
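
Before going further, it can be worth checking that the look up table has the columns used later in this notebook and that no Round/Plate/Well combination appears twice. The snippet below is a minimal sketch on a toy dataframe (`toy_lut` is a hypothetical stand-in for `df_lut`); it is not part of `cellseg`.

```python
import pandas as pd

# Toy stand-in for the look up table; the real one comes from pd.read_csv above.
toy_lut = pd.DataFrame({
    'Round': [1, 1, 1, 1],
    'Plate': [1, 1, 1, 1],
    'Well': [1, 2, 2, 3],
    'Include': [True, True, True, False],
})

# Columns that the rest of this notebook relies on.
required = ['Round', 'Plate', 'Well', 'Include']
missing = [col for col in required if col not in toy_lut.columns]

# Rows whose Round/Plate/Well combination appears more than once.
dupes = toy_lut[toy_lut.duplicated(subset=['Round', 'Plate', 'Well'], keep=False)]

print('Missing columns:', missing)
print('Rows with a duplicated Round/Plate/Well:', len(dupes))
```

The same two checks can be applied to `df_lut` directly by swapping it in for `toy_lut`.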

If you are running your own images, or want to run only a subset of the images, you can modify the following code blocks to do so. Otherwise, all the images will be analyzed each time. To run a specific round/plate/well, build a boolean index from that information. For example, to run a single well (e.g. Round 18, Plate 7, Well 59), the index would be written as follows (commented out so that it only applies when desired):

In [3]:
# # Set an index of a subset of the data.
# inds = (df_lut['Round'] == 18) & (df_lut['Plate'] == 7) & (df_lut['Well'] == 59)

# # Create a dataframe with those indices:
# df_lut = df_lut.loc[inds]

If we instead wanted to run all the rows up to a certain point in the table, we could slice the dataframe as follows:

In [4]:
# # To run all rows up to a point in the table, use the following:
# df_lut = df_lut[:8023]
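
The same boolean-mask approach extends to larger subsets, such as several plates within one round. The sketch below uses pandas `isin` on a toy dataframe (`toy_lut` is a hypothetical stand-in for `df_lut`, and the round and plate numbers are only illustrative).

```python
import pandas as pd

# Toy stand-in for the look up table.
toy_lut = pd.DataFrame({
    'Round': [17, 18, 18, 18, 18],
    'Plate': [7, 6, 7, 7, 8],
    'Well': [59, 59, 58, 59, 59],
})

# Select every well on Plates 7 and 8 of Round 18.
inds = (toy_lut['Round'] == 18) & (toy_lut['Plate'].isin([7, 8]))
subset = toy_lut.loc[inds]

print(subset)
```

Each comparison needs its own parentheses, because `&` binds more tightly than `==` in Python.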

Not all of the wells are filled, so we want to keep only the rows whose `Include` flag is set. If an imaging error occurred, you can exclude those wells in the look up table.

In [5]:
# Only consider the wells marked to include
inds = (df_lut['Include'] == True) 
df_lut = df_lut.loc[inds]

# Display the table
df_lut

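| | Round | Plate | Well | Virus | Receptor | Include | Dose | DMSO (%) | Brinzolamide (nM) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | PHP.eB | Ly6a | True | 1.000000e+09 | 0.0 | 0.0 |
| 1 | 1 | 1 | 2 | PHP.eB | Ly6a | True | 1.000000e+09 | 0.0 | 0.0 |
| 2 | 1 | 1 | 3 | PHP.eB | Ly6a | True | 1.000000e+09 | 0.0 | 0.0 |
| 3 | 1 | 1 | 4 | PHP.eB | Ly6a | True | 5.000000e+08 | 0.0 | 0.0 |
| 4 | 1 | 1 | 5 | PHP.eB | Ly6a | True | 5.000000e+08 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12491 | 18 | 1 | 65 | 9P36 | | True | 1.000000e+09 | 0.0 | 0.0 |
| 12492 | 18 | 1 | 66 | 9P36 | | True | 1.000000e+09 | 0.0 | 0.0 |
| 12493 | 18 | 1 | 67 | 9P36 | | True | 5.000000e+08 | 0.0 | 0.0 |
| 12494 | 18 | 1 | 68 | 9P36 | | True | 5.000000e+08 | 0.0 | 0.0 |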
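
After filtering on `Include`, it can help to confirm how many wells remain on each plate before starting a long run. This is a small sketch using a pandas `groupby` on a toy dataframe (`toy_lut` is a hypothetical stand-in for `df_lut`).

```python
import pandas as pd

# Toy stand-in for the look up table, with one excluded well.
toy_lut = pd.DataFrame({
    'Round': [1, 1, 1, 18, 18],
    'Plate': [1, 1, 2, 1, 1],
    'Well': [1, 2, 1, 65, 66],
    'Include': [True, False, True, True, True],
})

# Keep only the wells marked to include, as in the cell above.
included = toy_lut.loc[toy_lut['Include'] == True]

# Count the remaining wells for each (Round, Plate) pair.
wells_per_plate = included.groupby(['Round', 'Plate']).size()

print(wells_per_plate)
```

Since a plate holds at most 96 conditions, any count above 96 would point to a problem in the look up table.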


Next, we run the image analysis workflow over our images. To do this, we need to specify four things: the look up table `df_lut`, the data directory containing the images to analyze, the path of the output file, and whether there is an input file (`input_file`) or not.

Typically, you will have data from a previous run that you want to update. To do so, first ensure that the supplied look up table doesn't overlap with the data you have already analyzed. Second, set `input_file` to the path of the previously saved data you want to update.

The cell below then runs the workflow over the images in the data directory for every row of the look up table and saves the results as a .csv file at the location you specify. A progress bar indicates the progress (expect a few seconds per well).

In [None]:
cellseg.quant.workflow(df_lut, 'data/images', 'data/quantification/20230105-data.csv', input_file = None)

The output, besides the saved data csv, is a dataframe that you can work with further if necessary.
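
When updating previously saved data, one way to check that the look up table does not overlap with it is to compare the Round/Plate/Well keys of both tables. The sketch below assumes the saved data carries `Round`, `Plate`, and `Well` columns, which this notebook does not confirm; `toy_lut` and `toy_previous` are hypothetical stand-ins for `df_lut` and the previously saved data.

```python
import pandas as pd

key_cols = ['Round', 'Plate', 'Well']

# Toy stand-in for the look up table of wells about to be analyzed.
toy_lut = pd.DataFrame({
    'Round': [18, 18, 18],
    'Plate': [1, 1, 1],
    'Well': [65, 66, 67],
})

# Toy stand-in for previously saved results.
# ASSUMPTION: the saved data has the same three key columns.
toy_previous = pd.DataFrame({
    'Round': [17, 18],
    'Plate': [3, 1],
    'Well': [12, 65],
})
previous_keys = toy_previous[key_cols].drop_duplicates()

# Wells present in both tables (these would be analyzed twice).
overlap = toy_lut.merge(previous_keys, on=key_cols, how='inner')

# Wells in the look up table that have not been analyzed yet.
merged = toy_lut.merge(previous_keys, on=key_cols, how='left', indicator=True)
new_wells = merged.loc[merged['_merge'] == 'left_only', key_cols]

print('Overlapping wells:', len(overlap))
print('New wells:', len(new_wells))
```

If `overlap` is not empty, restricting the look up table to `new_wells` before running the workflow would avoid re-analyzing those wells.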