# Color Analysis - Nucleus rollup
Roll up the nucleus statistics per patch for class 3.

## Overall Plan
* Run CellProfiler on 80K patches. Make CSV files.
* Record bounding box of every nucleus of every patch.
* Run CNN on 80K patches. 
* For each class c, label correctly classified patches c_Cor.
* For each class c, label incorrectly classified patches c_Inc.
* Run CNN attention on 80K patches. Make heatmaps.
* Compute average heatmap color per nucleus bounding box.
* Set aside test set: 20% of images (and all their patch data) per class.
* Possibly set aside patches with too little tissue, too many red blood cells (RBC), or too few nuclei.
* Remove useless columns such as XY locations.
* Add dispersion columns such as deciles.
* Train a Cor/Inc binary classifier for each class.
* Evaluate the model by cross-validation over training data.
* If the model is accurate, extract important features.
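
The last three steps of the plan (train a Cor/Inc classifier, cross-validate it, extract important features) can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual model: the random forest, the `feature_i` column names, the 0/1 encoding of c_Inc/c_Cor, and the five-fold setting are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the per-patch rollup: 200 patches, 5 features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=[f'feature_{i}' for i in range(5)])
# Hypothetical labels: 1 = c_Cor, 0 = c_Inc. Here feature_0 drives the label.
y = (X['feature_0'] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Cross-validate over the training data only; the test set stays untouched.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
print('mean CV accuracy', scores.mean())

# If the accuracy is acceptable, refit on all training data and
# rank the features by impurity-based importance.
model.fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

Impurity-based importances are only one option; they can favor high-cardinality features, so a permutation-based ranking may be worth comparing if the model proves accurate.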

In [1]:
import datetime
print(datetime.datetime.now())
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import sklearn
print('scikit-learn version',sklearn.__version__)

2022-06-02 16:07:24.412406
scikit-learn version 1.0.2


In [2]:
THIS_CLASS=3   
NUM_CLASSES=6
FILEPATHS=['path']*NUM_CLASSES  # index 0 keeps the placeholder 'path'; entries 1-5 are set below
FILEPATHS[1]='/home/jrm/Adjeroh/Naved/CP_80K/Output1/' #'/Users/jasonmiller/WVU/Output1/'
FILEPATHS[2]='/home/jrm/Adjeroh/Naved/CP_80K/Output2/' #'/Users/jasonmiller/WVU/Output2/'
FILEPATHS[3]='/home/jrm/Adjeroh/Naved/CP_80K/Output3/' #'/Users/jasonmiller/WVU/Output3/'
FILEPATHS[4]='/home/jrm/Adjeroh/Naved/CP_80K/Output4/' #'/Users/jasonmiller/WVU/Output4/'
FILEPATHS[5]='/home/jrm/Adjeroh/Naved/CP_80K/Output5/' #'/Users/jasonmiller/WVU/Output5/'

In [3]:
from CellProfiler_Util import CP_Util
cputil = CP_Util(FILEPATHS[THIS_CLASS])
cputil.train_test_split() 
cputil.validate_split()
train_set=cputil.get_train_patches()
nuc = cputil.get_nuclei()
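
The internals of `CP_Util` are not shown in this notebook. As a rough sketch of what an image-level split with validation might look like (20% of images held out, with all of their patches), the example below uses scikit-learn's `GroupShuffleSplit`. The `ImageID` column and the toy patch table are hypothetical; this is not the `CP_Util` implementation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical patch table: 10 images with 8 patches each.
patches = pd.DataFrame({
    'PatchNumber': np.arange(80),
    'ImageID': np.repeat(np.arange(10), 8),  # assumed column name
})

# Hold out 20% of the images; every patch follows its image.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(patches, groups=patches['ImageID']))
train_patches = patches.iloc[train_idx]
test_patches = patches.iloc[test_idx]

# Validate the split: no image may contribute patches to both sides.
shared = set(train_patches['ImageID']) & set(test_patches['ImageID'])
print(len(train_patches), len(test_patches), len(shared))
```

Splitting by image rather than by patch keeps patches from the same slide out of both sets, which would otherwise inflate the test accuracy.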

In [4]:
print(datetime.datetime.now())
rollup = nuc.groupby(['PatchNumber']).describe()  # slow: about 42 minutes in this run
print(datetime.datetime.now())
rollup.columns=rollup.columns.map('_'.join)  ## helps random forest code
print(datetime.datetime.now())
rollup

2022-06-02 16:07:36.546087
2022-06-02 16:49:31.449349
2022-06-02 16:49:31.450531


PatchNumber,ObjectNumber_count,ObjectNumber_mean,ObjectNumber_std,ObjectNumber_min,ObjectNumber_25%,ObjectNumber_50%,ObjectNumber_75%,ObjectNumber_max,AreaShape_Area_count,AreaShape_Area_mean,...,Texture_Variance_Hematoxylin_7_02_256_75%,Texture_Variance_Hematoxylin_7_02_256_max,Texture_Variance_Hematoxylin_7_03_256_count,Texture_Variance_Hematoxylin_7_03_256_mean,Texture_Variance_Hematoxylin_7_03_256_std,Texture_Variance_Hematoxylin_7_03_256_min,Texture_Variance_Hematoxylin_7_03_256_25%,Texture_Variance_Hematoxylin_7_03_256_50%,Texture_Variance_Hematoxylin_7_03_256_75%,Texture_Variance_Hematoxylin_7_03_256_max
402,29.0,15.0,8.514693,1.0,8.00,15.0,22.00,29.0,29.0,408.862069,...,1034.840164,2223.482133,29.0,832.172700,374.250719,133.966102,536.857960,846.262670,1029.301694,1786.946670
403,16.0,8.5,4.760952,1.0,4.75,8.5,12.25,16.0,16.0,471.312500,...,840.107822,1137.918893,16.0,592.498148,247.751722,186.036309,436.455293,586.107486,749.786480,1101.636327
404,29.0,15.0,8.514693,1.0,8.00,15.0,22.00,29.0,29.0,473.344828,...,734.794679,1117.163589,29.0,639.419600,217.790823,260.929209,476.479968,645.236379,781.532072,1096.836567
405,26.0,13.5,7.648529,1.0,7.25,13.5,19.75,26.0,26.0,511.384615,...,457.564510,829.702614,26.0,348.066347,217.628688,60.165134,200.977920,350.102507,462.710822,926.929579
406,21.0,11.0,6.204837,1.0,6.00,11.0,16.00,21.0,21.0,686.000000,...,1011.244657,2117.001654,21.0,878.551724,456.586825,419.280769,641.301914,792.249560,917.822548,2212.220309
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6360,7.0,4.0,2.160247,1.0,2.50,4.0,5.50,7.0,7.0,381.857143,...,1943.731248,2228.198139,7.0,1584.045169,729.537680,668.169660,1019.002820,1512.234615,2163.954133,2541.998005
6361,12.0,6.5,3.605551,1.0,3.75,6.5,9.25,12.0,12.0,372.250000,...,1784.521311,4060.307291,12.0,1680.224384,975.619303,675.781592,1220.790146,1437.883542,1734.427390,4410.270399
6362,6.0,3.5,1.870829,1.0,2.25,3.5,4.75,6.0,6.0,327.000000,...,2402.803686,2669.753875,6.0,1736.132655,755.871116,994.227234,1128.678603,1539.020975,2343.561553,2740.887188
6363,9.0,5.0,2.738613,1.0,3.00,5.0,7.00,9.0,9.0,440.333333,...,1444.226354,2576.707208,9.0,1291.330794,719.739195,320.444444,912.198371,1298.008076,1493.769989,2742.139349
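
`describe()` reports quartiles only, while the plan calls for dispersion columns such as deciles. One possible way to get both is to combine built-in groupby aggregations with `quantile`, sketched below on a small synthetic table. The `nuc_demo` frame is made up for illustration, and whether this runs faster than `describe()` on the real 80K-patch data has not been measured here.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the per-nucleus table `nuc`.
rng = np.random.default_rng(0)
nuc_demo = pd.DataFrame({
    'PatchNumber': np.repeat([402, 403, 404], 20),
    'AreaShape_Area': rng.integers(100, 900, size=60).astype(float),
})
grouped = nuc_demo.groupby('PatchNumber')

# Basic statistics via built-in aggregations, flattened to single-level names.
basic = grouped.agg(['count', 'mean', 'std', 'min', 'median', 'max'])
basic.columns = basic.columns.map('_'.join)

# Deciles: one row per (patch, quantile), then pivot quantiles into columns.
levels = [i / 10 for i in range(1, 10)]
deciles = grouped.quantile(levels).unstack()
deciles.columns = [f'{col}_{int(round(q * 100))}%' for col, q in deciles.columns]

# One row per patch with basic statistics plus the 10%..90% columns.
rollup_demo = basic.join(deciles)
print(rollup_demo.shape)
print(list(rollup_demo.columns))
```

The flattened names follow the same `column_statistic` pattern as the rollup above, so downstream code that expects single-level column names should not need changes.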


In [5]:
# Alien
rollup.to_csv('Nucleus_Rollup_3.csv')
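
A later step in the plan removes useless columns. A minimal sketch of reloading a rollup file and pruning it is shown below, using an in-memory CSV with made-up values in place of `Nucleus_Rollup_3.csv`. The `Location_` prefix and the choice of which columns count as useless are assumptions, and treating every `_count` column as redundant assumes the nucleus table has no missing values.

```python
import io
import pandas as pd

# Made-up miniature of the saved rollup. In practice this would be
# pd.read_csv('Nucleus_Rollup_3.csv', index_col='PatchNumber').
csv_text = """PatchNumber,ObjectNumber_count,ObjectNumber_mean,AreaShape_Area_count,AreaShape_Area_mean,Location_Center_X_mean
1,10.0,5.5,10.0,400.0,100.0
2,20.0,10.5,20.0,500.0,150.0
"""
rollup = pd.read_csv(io.StringIO(csv_text), index_col='PatchNumber')

# Keep a single nuclei-per-patch column before dropping the repeated counts.
nucleus_count = rollup['ObjectNumber_count'].rename('Nucleus_count')

# Drop repeated counts, object numbering, and XY location statistics.
useless = [c for c in rollup.columns
           if c.endswith('_count')
           or c.startswith('ObjectNumber_')
           or c.startswith('Location_')]
features = rollup.drop(columns=useless).join(nucleus_count)
print(list(features.columns))
```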