# Color Analysis - Nucleus rollup
Roll up the per-nucleus statistics to one row per patch. This notebook covers class 4.

## Overall Plan
* Run CellProfiler on 80K patches. Make CSV files.
* Record bounding box of every nucleus of every patch.
* Run CNN on 80K patches. 
* For each class c, label correctly classified patches c_Cor.
* For each class c, label incorrectly classified patches c_Inc.
* Run CNN attention on 80K patches. Make heatmaps.
* Compute average heatmap color per nucleus bounding box.
* Set aside test set: 20% of images (and all their patch data) per class.
* Possibly set aside patches with too little tissue, too many RBC, or too few nuclei.
* Remove useless columns such as XY locations.
* Add dispersion columns such as deciles.
* Train a Cor/Inc binary classifier for each class.
* Evaluate the model by cross-validation over training data.
* If the model is accurate, extract important features.
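
The last three steps of the plan (train a Cor/Inc classifier, cross-validate it, extract important features) could look roughly like the sketch below. It is an illustration under assumptions, not this project's code: the random forest is assumed because a later cell mentions one, the data are synthetic, and the feature names only imitate the rollup columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the per-patch rollup (hypothetical values).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    'AreaShape_Area_mean': rng.normal(400, 80, n),
    'AreaShape_Area_std': rng.normal(100, 20, n),
    'ObjectNumber_count': rng.integers(1, 30, n).astype(float),
})
# Synthetic label: 1 = Cor, 0 = Inc. Real labels would come from the CNN results.
y = (X['AreaShape_Area_mean'] > 400).astype(int)

# Evaluate by 5-fold cross-validation over the training data.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)
print('mean CV accuracy', scores.mean())

# Only if the model is accurate: fit on all training data and rank features.
clf.fit(X, y)
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances are one option among several; permutation importance is a common alternative when features are correlated.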

In [1]:
import datetime
print(datetime.datetime.now())
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import sklearn
print('scikit-learn version',sklearn.__version__)

2022-06-02 13:49:14.202902
scikit-learn version 1.1.1


In [2]:
THIS_CLASS=4   # use a small class for process development
NUM_CLASSES=6
FILEPATHS=['path']*NUM_CLASSES
FILEPATHS[THIS_CLASS]='/Users/jasonmiller/WVU/Output4/'

In [3]:
from CellProfiler_Util import CP_Util
cputil = CP_Util(FILEPATHS[THIS_CLASS])
cputil.train_test_split() 
cputil.validate_split()
train_set=cputil.get_train_patches()
nuc = cputil.get_nuclei()
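
The internals of `CP_Util.train_test_split()` are not shown here. As a rough sketch of the idea stated in the plan (hold out 20% of images together with all of their patches), a group-wise split can be done with scikit-learn's `GroupShuffleSplit`. The `ImageName` column and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy patch table: 10 hypothetical images with 10 patches each.
patches = pd.DataFrame({
    'PatchNumber': np.arange(100),
    'ImageName': np.repeat([f'img{i:02d}' for i in range(10)], 10),
})

# Split by image so that no image contributes patches to both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(patches, groups=patches['ImageName']))

train_images = set(patches.iloc[train_idx]['ImageName'])
test_images = set(patches.iloc[test_idx]['ImageName'])
print(len(train_images), 'train images,', len(test_images), 'test images')
```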

In [4]:
print(datetime.datetime.now())
rollup = nuc.groupby(['PatchNumber']).describe()  # slow: about 51 minutes in the run below
print(datetime.datetime.now())
rollup.columns=rollup.columns.map('_'.join)  # flatten (feature, stat) to feature_stat; helps random forest code
print(datetime.datetime.now())
rollup

2022-06-02 13:49:26.319563
2022-06-02 14:40:07.240298
2022-06-02 14:40:07.245715


PatchNumber,ObjectNumber_count,ObjectNumber_mean,ObjectNumber_std,ObjectNumber_min,ObjectNumber_25%,ObjectNumber_50%,ObjectNumber_75%,ObjectNumber_max,AreaShape_Area_count,AreaShape_Area_mean,...,Texture_Variance_Hematoxylin_7_02_256_75%,Texture_Variance_Hematoxylin_7_02_256_max,Texture_Variance_Hematoxylin_7_03_256_count,Texture_Variance_Hematoxylin_7_03_256_mean,Texture_Variance_Hematoxylin_7_03_256_std,Texture_Variance_Hematoxylin_7_03_256_min,Texture_Variance_Hematoxylin_7_03_256_25%,Texture_Variance_Hematoxylin_7_03_256_50%,Texture_Variance_Hematoxylin_7_03_256_75%,Texture_Variance_Hematoxylin_7_03_256_max
404,19.0,10.0,5.627314,1.0,5.50,10.0,14.50,19.0,19.0,424.684211,...,1144.132739,1478.077515,19.0,966.724256,362.321824,390.335351,742.188304,975.444261,1220.595523,1783.006359
405,17.0,9.0,5.049752,1.0,5.00,9.0,13.00,17.0,17.0,535.764706,...,760.967400,980.534043,17.0,604.258528,178.547488,205.978733,525.962433,605.744924,655.724148,973.894819
406,19.0,10.0,5.627314,1.0,5.50,10.0,14.50,19.0,19.0,481.157895,...,1109.622853,1710.046251,19.0,956.605344,369.223619,216.555801,757.328160,980.648035,1123.184088,1611.128641
407,12.0,6.5,3.605551,1.0,3.75,6.5,9.25,12.0,12.0,453.666667,...,1513.105372,1724.063900,12.0,1237.469664,474.639615,360.595372,1009.561389,1262.841822,1550.750741,1961.868103
408,3.0,2.0,1.000000,1.0,1.50,2.0,2.50,3.0,3.0,506.666667,...,893.088973,902.385591,3.0,796.103551,197.290069,608.127305,693.382321,778.637336,890.091675,1001.546013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3193,10.0,5.5,3.027650,1.0,3.25,5.5,7.75,10.0,10.0,246.300000,...,867.389534,1408.820416,10.0,1013.784549,242.712387,612.377347,911.755203,997.626186,1036.249485,1485.589286
3194,4.0,2.5,1.290994,1.0,1.75,2.5,3.25,4.0,4.0,253.000000,...,1387.826614,1533.312697,4.0,776.585274,483.701500,225.000000,446.297845,824.530748,1154.818177,1232.279600
3195,8.0,4.5,2.449490,1.0,2.75,4.5,6.25,8.0,8.0,238.000000,...,1570.067549,1758.346939,8.0,1131.140847,280.249141,738.115136,935.266859,1179.695510,1329.423353,1472.774203
3196,9.0,5.0,2.738613,1.0,3.00,5.0,7.00,9.0,9.0,307.000000,...,1667.108287,4118.329557,9.0,1928.474576,1000.825058,921.543388,1310.263711,1757.844843,2022.937578,4133.010975
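
The `describe()` rollup above took about 51 minutes. One possible alternative, sketched here on toy data, builds the same eight statistics from `agg` with built-in reducers plus a single grouped `quantile` call. Whether it is faster on the real nucleus table has not been measured here; the sketch only shows that the values match `describe()`.

```python
import numpy as np
import pandas as pd

# Toy nucleus table (hypothetical values), 3 patches with 20 nuclei each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'PatchNumber': np.repeat([404, 405, 406], 20),
    'AreaShape_Area': rng.uniform(200, 600, 60),
    'AreaShape_Perimeter': rng.uniform(50, 120, 60),
})
g = demo.groupby('PatchNumber')

# Reference: the describe() rollup with flattened column names.
slow = g.describe()
slow.columns = slow.columns.map('_'.join)

# Alternative: built-in reducers plus one quantile call for the quartiles.
stats = g.agg(['count', 'mean', 'std', 'min', 'max'])
quart = g.quantile([0.25, 0.5, 0.75]).unstack()
quart = quart.rename(columns={0.25: '25%', 0.5: '50%', 0.75: '75%'}, level=1)
fast = pd.concat([stats, quart], axis=1)
fast.columns = fast.columns.map('_'.join)
fast = fast[slow.columns]  # same column order as describe()

print(np.allclose(fast.to_numpy(dtype=float), slow.to_numpy(dtype=float)))
```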


In [6]:
# Mac Air
rollup.to_csv('Nucleus_Rollup_4.csv')
# 2385 rows including header
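
The plan lists decile dispersion columns, while the rollup saved above holds only quartiles. A minimal sketch of one way to get deciles is to pass `percentiles` to `describe()`. The toy data and column name are assumptions, and this call was not run on the real nucleus table.

```python
import numpy as np
import pandas as pd

# Toy nucleus table (hypothetical values), 3 patches with 20 nuclei each.
rng = np.random.default_rng(0)
nuc_demo = pd.DataFrame({
    'PatchNumber': np.repeat([404, 405, 406], 20),
    'AreaShape_Area': rng.uniform(200, 600, 60),
})

# Request the 10th through 90th percentiles instead of the default quartiles.
deciles = np.arange(1, 10) / 10
rollup_demo = nuc_demo.groupby('PatchNumber').describe(percentiles=deciles)
rollup_demo.columns = rollup_demo.columns.map('_'.join)
print(list(rollup_demo.columns))
```

Each feature then contributes count, mean, std, min, nine deciles, and max, so the rollup grows from 8 to 14 columns per feature.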