# Extracting nuclear morphology and genome organization features 

## Background and aim

The packing of the genome within the nucleus informs transcription and thereby reflects cell state. Changes in DNA packing could therefore serve as a powerful barometer of cell-state transitions in processes such as cancer progression, where there is an unmet need for biomarkers. The current gold standard for assessing cellular abnormalities in cancer is nuclear grading, performed by a pathologist at the microscope. High-resolution fluorescence images of DNA offer a convenient way to measure DNA organization and to build a quantitative framework for tracing cell-state transitions.

Our samples come from tissue microarray (TMA) slides containing annotated clinical biopsies from cancer patients and healthy individuals, with DNA stained using Hoechst. In this notebook, we demonstrate how we perform nuclear segmentation and feature extraction.

## Nuclear Segmentation

We segment nuclei using a custom StarDist model trained on a combination of the 2018 Data Science Bowl (DSB) challenge dataset and a few regions from our own dataset that were manually annotated in house.
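
StarDist models are usually applied to intensity-normalized images; the StarDist example notebooks scale each image with percentile-based normalization via `csbdeep.utils.normalize(img, 1, 99.8)`. Whether `segment_objects_stardist2d` normalizes in exactly this way is an assumption here. The sketch below is a numpy-only version of that scaling; `percentile_normalize` is a hypothetical helper, not part of the project code.

```python
import numpy as np

def percentile_normalize(img, pmin=1.0, pmax=99.8, eps=1e-20):
    """Map the pmin-th percentile to 0 and the pmax-th percentile to 1.

    Values outside that range are not clipped, mirroring the default
    clip=False behaviour of csbdeep.utils.normalize. eps avoids division
    by zero on constant images.
    """
    img = np.asarray(img, dtype=np.float32)
    lo, hi = np.percentile(img, [pmin, pmax])
    return ((img - lo) / (hi - lo + eps)).astype(np.float32)

# Synthetic 16-bit image standing in for a Hoechst-stained TMA core
rng = np.random.default_rng(0)
img = rng.integers(100, 4000, size=(64, 64)).astype(np.uint16)
norm = percentile_normalize(img)  # 1st percentile -> ~0, 99.8th -> ~1
```

Percentile scaling rather than min/max scaling keeps a few saturated or hot pixels from compressing the intensity range of the nuclei.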

```python
# import libraries
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("..")
import os

from pathlib import Path
from glob import glob
import pandas as pd
from tqdm import tqdm

from src.utlis.segmentation_stardist_model import segment_objects_stardist2d
from src.utlis.Run_nuclear_feature_extraction import run_nuclear_chromatin_feat_ext
from src.utlis.summarising_features import summarise_feature_table
```




```python
# define paths
path_to_rawimages = '/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Raw_Images/all_cores/'
path_to_output_segmented_images = "/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Segmented_labels/"
path_to_output_ij_rois = "/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Segmented_ij_roi/"

path_to_model = os.path.join(os.path.dirname(os.getcwd()), 'models/')
# create output directories if they do not exist
Path(path_to_output_segmented_images).mkdir(parents=True, exist_ok=True)
Path(path_to_output_ij_rois).mkdir(parents=True, exist_ok=True)
```

```python
# segmentation
segment_objects_stardist2d(image_dir=path_to_rawimages,
                           output_dir_labels=path_to_output_segmented_images,
                           output_dir_ijroi=path_to_output_ij_rois,
                           use_pretrained=False,
                           model_name='tissue_nuclear_segmentation',
                           model_dir=path_to_model)
```

```text
Loading network weights from 'weights_best.h5'.
Loading thresholds from 'thresholds.json'.
Using default values: prob_thresh=0.507725, nms_thresh=0.3.
```
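
Before moving on to feature extraction, it can help to sanity-check each label image written to `Segmented_labels/`. The sketch below assumes each label image is a 2D integer array in which 0 is background and every nucleus carries a unique positive ID (the usual StarDist label convention). Reading the TIFF files (for example with `tifffile.imread`) is left out so the example runs on a synthetic array; `label_summary` and its `min_area` cutoff are hypothetical.

```python
import numpy as np

def label_summary(labels, min_area=0):
    """Return (number of nuclei, per-nucleus areas in pixels).

    Assumes 0 is background and positive integers are nucleus IDs.
    Objects smaller than `min_area` pixels are excluded from the summary.
    """
    labels = np.asarray(labels)
    _, counts = np.unique(labels[labels > 0], return_counts=True)
    keep = counts >= min_area
    return int(keep.sum()), counts[keep]

# Synthetic 10x10 label image with three objects
labels = np.zeros((10, 10), dtype=np.int32)
labels[1:4, 1:4] = 1   # 9 px
labels[5:9, 5:9] = 2   # 16 px
labels[0, 8] = 3       # 1 px, likely debris
n, areas = label_summary(labels, min_area=2)
print(n, areas.tolist())  # → 2 [9, 16]
```

Comparing these counts and area distributions across cores is a quick way to spot images where segmentation failed or picked up debris.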


## Compute descriptors of spatial chromatin organization 

Following segmentation, we measure features that characterize the chromatin organization of each nucleus, combining standard features from the scikit-image package with custom features such as local curvature and heterochromatin organization.
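
Standard morphology features (area, perimeter, eccentricity and so on) can be read from scikit-image's `skimage.measure.regionprops_table`. To illustrate the intensity and heterochromatin side, the numpy/pandas sketch below computes a few per-nucleus measures from a label image and the raw DNA image. The heterochromatin definition used here (pixels brighter than the nucleus mean plus `alpha` standard deviations) is an illustrative assumption and not necessarily the one used by `run_nuclear_chromatin_feat_ext`; `nuclear_intensity_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def nuclear_intensity_features(labels, image, alpha=1.0):
    """Per-nucleus intensity features from a label image and a DNA image.

    Heterochromatin is approximated (assumption) as pixels brighter than
    mean + alpha * std of that nucleus; the rest is treated as euchromatin.
    """
    labels = np.asarray(labels)
    image = np.asarray(image, dtype=np.float64)
    rows = []
    for lab in np.unique(labels[labels > 0]):
        pix = image[labels == lab]               # intensities inside this nucleus
        hc = pix > pix.mean() + alpha * pix.std()  # candidate heterochromatin pixels
        rows.append({
            "label": int(lab),
            "area": pix.size,
            "mean_int": pix.mean(),
            "integrated_int": pix.sum(),
            "hc_fraction": hc.mean(),
            # bright-to-dim intensity ratio; undefined if no pixel passes the threshold
            "hc_to_ec_ratio": pix[hc].mean() / pix[~hc].mean() if hc.any() else np.nan,
        })
    return pd.DataFrame(rows)

# Toy example: nucleus 1 is uniform, nucleus 2 has one bright chromocenter-like pixel
labels = np.zeros((6, 6), dtype=np.int32)
labels[0:3, 0:3] = 1
labels[3:6, 3:6] = 2
image = np.full((6, 6), 10.0)
image[4, 4] = 100.0
df = nuclear_intensity_features(labels, image)
```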

```python
## setup paths
path_to_rawimages = '/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Raw_Images/all_cores/'
path_to_segmented_images = "/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Segmented_labels/"

path_to_output_nuclear_features = "/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/NMCO_features/"
path_to_output_tissue_features = "/home/pathy_s/Documents/Mechanogenomic_Score_Breast_Cancer/Tissue_Summary/"
# create output directories if they do not exist
Path(path_to_output_nuclear_features).mkdir(parents=True, exist_ok=True)
Path(path_to_output_tissue_features).mkdir(parents=True, exist_ok=True)
```


```python
raw_image_dirs = sorted(glob(path_to_rawimages + "*.tif"))
seg_image_dirs = sorted(glob(path_to_segmented_images + "*.tif"))
# raw and segmented images are paired by sorted position, so both folders
# must contain matching sets of files
assert len(raw_image_dirs) == len(seg_image_dirs), "raw and segmented image counts differ"

### Extract single nuclear features for all images and summarise at tissue level
summaries = []

for raw_path, seg_path in tqdm(zip(raw_image_dirs, seg_image_dirs), total=len(raw_image_dirs)):

    feature = run_nuclear_chromatin_feat_ext(raw_path, seg_path, path_to_output_nuclear_features)

    # drop identifier columns; replace cells that are exactly 'NA' or 'NaN' with 0
    feature = feature.drop(['Image', 'label'], axis=1)
    feature = feature.replace(['NA', 'NaN'], 0)

    temp = summarise_feature_table(feature)
    temp['Image'] = Path(raw_path).stem  # file name without the .tif extension
    summaries.append(temp)

# concatenate once instead of growing the table inside the loop
Tissue_summary = pd.concat(summaries, ignore_index=True, axis=0)
Tissue_summary.to_csv(path_to_output_tissue_features + "NMCO_tissue_summary.csv")
```
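
`summarise_feature_table` lives in the project's `src` package and the statistics it reports are not shown here. As a hedged illustration of the general idea, collapsing a per-nucleus table into one row per image, the pandas sketch below reports the median and standard deviation of every numeric column. `summarise_sketch`, the choice of statistics, and the toy core names are all assumptions.

```python
import pandas as pd

def summarise_sketch(features: pd.DataFrame, stats=("median", "std")) -> pd.DataFrame:
    """Collapse a per-nucleus feature table into a single summary row.

    Hypothetical stand-in for summarise_feature_table: each numeric column
    `col` becomes `<col>_<stat>` for every statistic in `stats`.
    """
    numeric = features.select_dtypes("number")
    agg = numeric.agg(list(stats))   # rows = statistics, columns = features
    row = agg.unstack()              # Series indexed by (feature, statistic)
    row.index = [f"{feat}_{stat}" for feat, stat in row.index]
    return row.to_frame().T.reset_index(drop=True)

# Toy per-nucleus tables for two cores
img_a = pd.DataFrame({"area": [100, 120, 140], "mean_int": [1.0, 2.0, 3.0]})
img_b = pd.DataFrame({"area": [80, 90], "mean_int": [4.0, 6.0]})

rows = []
for name, feats in [("core_A", img_a), ("core_B", img_b)]:
    summary = summarise_sketch(feats)
    summary["Image"] = name
    rows.append(summary)
tissue_summary = pd.concat(rows, ignore_index=True)
print(tissue_summary[["Image", "area_median", "mean_int_median"]])
```

Medians are less sensitive than means to the occasional mis-segmented nucleus, which is one reason to prefer them for tissue-level summaries.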
