# Quantifying cell counts from Cytation
Applying `py-seg` to Cytation 5 data generated in the HTS (VAPR) core by Clayton Wandishin. A single 384-well plate was imaged multiple times in 2 channels: red for nuclei and green (Sytox) for dead cells. Still needed from Clayton: a plate map of cell line(s), drugs, and drug concentrations.

Steps needed to perform processing and assemble data:

* Identify all image files (saved on vu1file quaranta2 share)
* Parse file names to determine time point, channel, well, and position
* Assemble task arguments for `py-seg` processing
* Send jobs to RabbitMQ/Celery for processing
* Collect cell counts per time point (similar to `plate.id` from ImageXpress HTS core output)

In [1]:
import os
import re
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
TOPDIR = '/mnt/darren/quaranta2/Cytation/2020-10-02'

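def parseFileName(filename):
    # split e.g. 'B10_02_1_1_RFP_001.tif' into its underscore-separated fields
    fields = os.path.basename(filename).split(".")[0].split("_")
    well = fields[0]      # well, e.g. 'B10'
    ch = fields[4]        # channel name, e.g. 'RFP'
    time_i = fields[5]    # time point index in the file name, e.g. '001'
    return [well, ch, time_i]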

# [x+1 if x >= 45 else x+5 for x in l]

def fixWellName(well_name):
    # function to fix well names by ensuring 3-character length
    # (i.e., include preceding 0 in single-digit column numbers)

    if isinstance(well_name, list):
        return [fixWellName(wn) for wn in well_name]
    elif isinstance(well_name, str):
        if len(well_name) < 3:
            return f'{well_name[0]}0{well_name[1]}'
        return well_name
    else:
        return well_name

def getDateTime(filepath):
    # first YYMMDD_HHMMSS stamp found in each path
    pat = r"\d{6}_\d{6}"

    d = [re.search(pat, x) for x in filepath]
    d = [x[0] for x in d]
    # the stamp has whole seconds only, so the format ends at %S
    d = [datetime.strptime(x, '%y%m%d_%H%M%S') for x in d]
    o = [x.strftime("%Y-%m-%d %H:%M:%S") for x in d]
    return o

def getTimeIdx(filepath):
    # time point index from the enclosing 'Experiment<N>' directory name
    pat = r"Experiment(\d{1,2})"
    i = [re.search(pat, x) for x in filepath]
    i = [int(x[1]) for x in i]
    return i

#### Find all image files

In [3]:
os.chdir(TOPDIR)
fn = []
dn = []

for (dirpath, dirnames, filenames) in os.walk(TOPDIR):
    fn += [os.path.join(dirpath, f) for f in filenames]
    dn += [os.path.join(dirpath, d) for d in dirnames]

# remove .DS_Store (hidden macOS Finder metadata) files, if present
fn = [f for f in fn if ".DS_Store" not in f]

print(f"{len(fn)} files were found.")
print(f"{len(dn)} directories were found")

if(os.path.isfile(fn[0])):
    print(f"The file {os.path.basename(fn[0])} has a complete path.")
else:
    print(f"The file {os.path.basename(fn[0])} does NOT have a complete path.")

20160 files were found.
42 directories were found
The file E7_02_1_1_RFP_001.tif has a complete path.
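As an alternative to walking the tree and then excluding `.DS_Store`, the file search could select image files by extension. The following is a minimal sketch, not part of the original notebook; the function name `find_tif_files` is hypothetical, and it assumes all images use the lowercase `.tif` extension (the pattern is case-sensitive on Linux).

```python
from pathlib import Path

def find_tif_files(topdir):
    """Return the sorted absolute paths of all .tif files below topdir."""
    # rglob walks the directory tree recursively; matching on the extension
    # skips .DS_Store and any other non-image files in one step.
    return sorted(str(p.resolve()) for p in Path(topdir).rglob("*.tif"))
```

Calling `find_tif_files(TOPDIR)` would give a list comparable to `fn` after sorting, provided the directory contains no other file types that should be processed.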


In [4]:
fn.sort()
fn[:6]
# fn[20150:]

['/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B10_02_1_1_RFP_001.tif',
 '/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B10_02_1_2_RFP_001.tif',
 '/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B10_02_2_1_GFP_001.tif',
 '/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B10_02_2_2_GFP_001.tif',
 '/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B11_02_1_1_RFP_001.tif',
 '/mnt/darren/quaranta2/Cytation/2020-10-02/201002_175058_Experiment1/201002_175058_Plate 1/B11_02_1_2_RFP_001.tif']

#### Filename structure
Example filename: `B10_04_1_1_RFP_001.tif`  

* `B10` = well  
* `04` = unknown  
* `1` = channel number (`1` or `2` in these data)  
* `1` = position number (`1` or `2` in these data)  
* `RFP` = channel name (`RFP` or `GFP` in these data)  
* `001` = time point index (only `001` in these data; actual time point index in enclosing directory (2 up) `Experiment[0-9]{1,2}`)  
* `tif` = image file format (only `tif` in these data)  
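The field layout above can also be written as a single regular expression with named groups, which rejects unexpected file names instead of silently mis-indexing them. This is a sketch under the assumption that every file follows the layout listed above; the group names and the function `parse_cytation_name` are illustrative and not part of `py-seg`.

```python
import os
import re

# One named group per field in the list above.
FILENAME_RE = re.compile(
    r"^(?P<well>[A-P]\d{1,2})_"    # well, e.g. B10 (rows A-P on a 384-well plate)
    r"(?P<unknown>\d+)_"           # meaning not yet determined
    r"(?P<ch_num>\d+)_"            # channel number
    r"(?P<pos>\d+)_"               # position number
    r"(?P<ch_name>[A-Za-z0-9]+)_"  # channel name, e.g. RFP or GFP
    r"(?P<time_i>\d+)"             # time point index within the file name
    r"\.(?P<ext>\w+)$"             # file extension
)

def parse_cytation_name(path):
    """Return the file name fields as a dict of strings."""
    name = os.path.basename(path)
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"Unexpected file name: {name}")
    return m.groupdict()
```

For example, `parse_cytation_name("B10_04_1_1_RFP_001.tif")["ch_name"]` gives `'RFP'`. All values are returned as strings, so `time_i` keeps its leading zeros.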




In [5]:
file_info = pd.DataFrame([parseFileName(x) for x in fn])
file_info.columns = ['well','ch','time_i']
file_info['file_name'] = fn

In [6]:
file_info.head()

| | well | ch | time_i | file_name |
|---|---|---|---|---|
| 0 | B10 | RFP | 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... |
| 1 | B10 | RFP | 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... |
| 2 | B10 | GFP | 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... |
| 3 | B10 | GFP | 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... |
| 4 | B11 | RFP | 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... |


In [7]:
red = file_info.loc[file_info['ch']=='RFP','file_name']
red = red.reset_index(drop=True)
green = file_info.loc[file_info['ch']=='GFP','file_name']
green = green.reset_index(drop=True)

In [8]:
wells = file_info.loc[file_info['ch']=='RFP','well']
wells = wells.reset_index(drop=True)
wells = fixWellName(wells.tolist())
wells = pd.Series(wells)

In [9]:
temp = pd.DataFrame({'image_time': getDateTime(file_info.loc[file_info['ch']=='RFP','file_name']),
                     'time_i': getTimeIdx(file_info.loc[file_info['ch']=='RFP','file_name'])})

In [10]:
taskargs = pd.DataFrame({
                        'ch2_im_path': green,
                        'nuc_im_path': red,
                        'overwrite': 'TRUE',
                        'plate_id': temp['time_i'],
                        'regprops': 'FALSE',
                        'save_path': os.path.join(TOPDIR,'Segmentation'),
                        'well': wells
})

In [11]:
taskargs.head()

| | ch2_im_path | nuc_im_path | overwrite | plate_id | regprops | save_path | well |
|---|---|---|---|---|---|---|---|
| 0 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | True | 1 | False | /mnt/darren/quaranta2/Cytation/2020-10-02/Segm... | B10 |
| 1 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | True | 1 | False | /mnt/darren/quaranta2/Cytation/2020-10-02/Segm... | B10 |
| 2 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | True | 1 | False | /mnt/darren/quaranta2/Cytation/2020-10-02/Segm... | B11 |
| 3 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | True | 1 | False | /mnt/darren/quaranta2/Cytation/2020-10-02/Segm... | B11 |
| 4 | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | /mnt/darren/quaranta2/Cytation/2020-10-02/2010... | True | 1 | False | /mnt/darren/quaranta2/Cytation/2020-10-02/Segm... | B12 |
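The task table pairs each red image with a green image purely by row order after sorting and resetting the index, which holds only if every well and position has exactly one file per channel. One way to check this is to pair the channels explicitly on a key. The sketch below is an assumption-laden illustration rather than the notebook's method: the function `pair_channels` is hypothetical, and it assumes that folder, well, and position together identify one image per channel.

```python
import os
import pandas as pd

def pair_channels(paths, nuc_name="RFP", ch2_name="GFP"):
    """Pair nuclear and second-channel images by folder, well, and position."""
    rows = []
    for p in paths:
        parts = os.path.basename(p).split(".")[0].split("_")
        rows.append({
            "folder": os.path.dirname(p),  # one folder per time point
            "well": parts[0],
            "pos": parts[3],               # position number
            "ch": parts[4],                # channel name
            "path": p,
        })
    df = pd.DataFrame(rows, columns=["folder", "well", "pos", "ch", "path"])
    keys = ["folder", "well", "pos"]
    nuc = df.loc[df["ch"] == nuc_name, keys + ["path"]].rename(
        columns={"path": "nuc_im_path"})
    ch2 = df.loc[df["ch"] == ch2_name, keys + ["path"]].rename(
        columns={"path": "ch2_im_path"})
    # validate raises if a key is duplicated on either side;
    # indicator adds a _merge column that flags unmatched images.
    return nuc.merge(ch2, on=keys, how="outer",
                     validate="one_to_one", indicator=True)
```

Rows where `_merge` is not `'both'` mark images without a partner in the other channel; if there are none, pairing by row order and pairing by key should agree.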


#### Save Task Arguments to file
(Will not overwrite if file exists; must delete previous to write new file.)

In [12]:
argfilepath = os.path.join(TOPDIR,'TaskArgs_20201002.csv')
if not os.path.isfile(argfilepath):
    taskargs.to_csv(argfilepath, index=False)

#### Examine some processing output

In [13]:
import sys
sys.path.append(r'/home/darren/git-repos/Segmentation-other/py-seg')

In [14]:
from MXtasksTempo import processIm
import cv2
import numpy as np
from pylab import imshow, gray

In [17]:
processIm(taskargs.loc[1].to_list())

Processing ran without errors, although most objects were identified as Ch2-positive. It is unclear whether this reflects actual dead cells or is an artifact.

### Set up Celery workers and send jobs to RabbitMQ
This is done over `ssh` on `tempo`, in the `improc` Conda environment, with `~/git-repos/Segmentation-other/py-seg/` as the working directory.  

The maximum concurrency must be specified when starting the Celery worker.  

Start the worker inside a `screen` session, then detach:  
`screen`  
`celery -A MXtasksTempo worker --concurrency=120`  
`Ctrl-A, D`  

Then send the jobs:  
`python sendMXtempoJobs.py /mnt/darren/quaranta2/Cytation/2020-10-02/TaskArgs_20201002.csv`  
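The contents of `sendMXtempoJobs.py` are not shown here. As a rough illustration of what such a script might do, the sketch below reads the task-argument CSV and hands each row, as a list in column order, to a submit function. Everything in it is an assumption: `read_task_args` and `send_all` are hypothetical names, and `submit` stands in for whatever actually enqueues a job (in a Celery setup this could be a task's `.delay` method).

```python
import csv

def read_task_args(csv_path):
    """Yield one argument list per CSV row, in the file's column order."""
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        for row in reader:
            yield row

def send_all(csv_path, submit):
    """Pass every row to submit() and return the number of jobs sent."""
    n = 0
    for args in read_task_args(csv_path):
        submit(args)  # hypothetical stand-in for the real queueing call
        n += 1
    return n
```

Passing `print` as `submit` gives a dry run that lists the jobs without sending anything. Note that the `csv` module returns every value as a string, so `plate_id` would arrive as `'1'` rather than `1`.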

