# Combining the ground truth channels into one.

From online community discussions about segmentations whose labels are allowed to overlap, I found and considered the following nnUNet GitHub issues while trying to make this work:

https://github.com/MIC-DKFZ/nnUNet/issues/653

https://github.com/MIC-DKFZ/nnUNet/issues/1823

https://github.com/MIC-DKFZ/nnUNet/issues/1952

## Rules

- Let the Anorectum be denoted by $A$
- Let the Bladder be denoted by $B$
- Let the Cervix be denoted by $C$
- Let the CTVn be denoted by $C_n$
- Let the CTVp be denoted by $C_p$
- Let the GTVp be denoted by $G_p$
- Let the GTVn be denoted by $G_n$
- Let the Pelvic Lymph Node be denoted by $L_p$
- Let the Common Iliac Lymph Node be denoted by $L_i$
- Let the Para-aortic Lymph Node be denoted by $L_{pa}$
- Let the Parametrium be denoted by $P$
- Let the Uterus be denoted by $U$
- Let the Vagina be denoted by $V$
- Let $O$ denote the set $O = \{B, A, C_n, C_p, P \}$ for a particular patient. To refer to a specific patient $i$, we add a superscript, e.g., $O^i = \{B^i, A^i, C_n^i, C_p^i, P^i\}$.
- Let the overlap of two structures be denoted by the set intersection symbol $\cap$.
- Let the combined region of two structures be denoted by the set union symbol $\cup$.

The rules using the following notation are:

1. There should be no overlap between the CTVn, CTVp or Anorectum.

    $\forall{i,j \in \{C_n, C_p, A\}}\text{ with } i \neq j, i \cap j = \emptyset$

2. The Parametrium may overlap with all of the other structures.

    $\forall X \in \{A, B, C_n, C_p, U, V\}: \quad P \cap X \neq \emptyset \text{ is permitted}$

3. The Bladder may overlap with the CTVn.

    $B \cap C_n \neq \emptyset \text{ is permitted}$

4. The CTVp is defined as a compound structure containing:

    $C_p = \overbrace{C \cup G_p}^{\text{High Risk CTV}} \quad \cup \quad U \cup V$

5. The CTVn is defined as a compound structure containing:

    $C_n = G_n \cup L_i \cup L_p \cup L_{pa}$
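The rules map directly onto boolean-mask operations, with $\cap$ as `&` and $\cup$ as `|`. The following is a minimal sketch, assuming each structure is available as a same-shaped NumPy boolean mask; the helper `overlapping_pairs` and the toy 1-D masks are hypothetical and only illustrate how rule 1 could be checked:

```python
import numpy as np
from itertools import combinations


def overlapping_pairs(masks: dict[str, np.ndarray]) -> list[tuple[str, str]]:
    """Return every pair of structures whose masks share at least one voxel (i ∩ j ≠ ∅)."""
    return [(a, b) for a, b in combinations(masks, 2) if np.any(masks[a] & masks[b])]


# Hypothetical toy 1-D "patient": True marks voxels inside each structure
A = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
Cn = np.array([0, 0, 1, 1, 0, 0], dtype=bool)
Cp = np.array([0, 0, 0, 1, 1, 0], dtype=bool)
P = np.array([0, 1, 0, 0, 1, 1], dtype=bool)

# Rule 1: C_n, C_p and A must be pairwise disjoint, so any pair returned is a violation
print(overlapping_pairs({'A': A, 'C_n': Cn, 'C_p': Cp}))  # [('C_n', 'C_p')]

# Rule 2: P may overlap anything, so these overlaps are allowed rather than errors
print(overlapping_pairs({'P': P, 'A': A, 'C_p': Cp}))  # [('P', 'A'), ('P', 'C_p')]
```

In this toy data voxel 3 lies in both $C_n$ and $C_p$, which rule 1 forbids, while the overlaps involving $P$ are permitted by rule 2.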


We only have contours for the classes $A, B, C_n, C_p, P, U, V$, which are initially mapped as follows:

- $A \mapsto 1$
- $B \mapsto 2$
- $C_n \mapsto 3$
- $C_p \mapsto 4$
- $P \mapsto 5$
- $U \mapsto 6$
- $V \mapsto 7$

- From `1.`, we can mark $C_n, C_p, A$ with labels $3, 4, 1$ respectively, because they are not meant to overlap.

- From `2.`, the parametrium _may_ overlap with _all_ other structures, so every such overlap needs its own label:

    - $P \cap A \mapsto 8$
    - $P \cap B \mapsto 9$
    - $P \cap C_n \mapsto 10$
    - $P \cap C_p \mapsto 11$
    - $P \cap U \mapsto 12$
    - $P \cap V \mapsto 13$

- From `3.`, the bladder may overlap with $C_n$. Anecdotally, inspecting the scans shows that the bladder may also overlap with $C_p$.

    - $B \cap C_n \mapsto 14$

- Combining `2.` and `3.`: $B \cap C_n$, $P \cap C_n$ and $P \cap B$ can all be non-empty, so the triple overlap $B \cap C_n \cap P$ is possible as well.

- From `4.`, $C_p$ is a compound structure made up of several substructures, including the high-risk CTV, for which we have no segmentation. However, we can nest the structures using nnUNet's `regions_class_order` so that the CTVp is drawn first, then the uterus, then the vagina.

    - $U \subseteq C_p$
    - $V \subseteq C_p$
    - however, not necessarily $U \subseteq V$ or $V \subseteq U$

- From `5.`, $C_n$ is composed of substructures we have no segmentations for, so it keeps a single segmentation id.
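Nesting by drawing order can be pictured in plain NumPy: each structure is painted into one label map in turn, later structures overwriting earlier ones, and each region is then read back as a set of label ids. This is only a sketch of the idea, not nnUNet's own conversion code; the toy masks are hypothetical and the ids are taken from the Id mappings section (ctvp 4, vagina 5, uterus 6):

```python
import numpy as np

# Hypothetical 1-D masks: U and V both lie inside C_p and overlap each other at voxel 3
Cp = np.array([0, 1, 1, 1, 1, 1, 0], dtype=bool)
U = np.array([0, 0, 1, 1, 0, 0, 0], dtype=bool)
V = np.array([0, 0, 0, 1, 1, 0, 0], dtype=bool)

label = np.zeros(Cp.shape, dtype=np.uint8)
# Draw the CTVp, then the uterus, then the vagina; later structures overwrite earlier ones
for mask, label_id in [(Cp, 4), (U, 6), (V, 5)]:
    label[mask] = label_id

print(label.tolist())  # [0, 4, 6, 5, 5, 4, 0]

# Each region is recovered as the set of ids drawn inside it
ctvp_region = np.isin(label, [4, 5, 6])  # identical to Cp
uterus_region = label == 6               # loses voxel 3, which was overwritten by V
vagina_region = label == 5
```

The CTVp is recovered exactly, but a voxel lying in both $U$ and $V$ keeps only the label of whichever structure is drawn last, so the drawing order decides who wins where the uterus and vagina overlap.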

## Id mappings

 - $background \mapsto 0$
 - $anorectum \mapsto 1$
 - $bladder \mapsto 2$
 - $ctvn \mapsto 3$
 - $ctvp \mapsto 4$
 - $parametrium \mapsto 7$ (for later convenience)
 - $uterus \mapsto 6$
 - $vagina \mapsto 5$ (for later convenience)
 - $pararect \mapsto 8$
 - $parablad \mapsto 9$
 - $paractvn \mapsto 10$
 - $paractvp \mapsto 11$
 - $parauter \mapsto 12$
 - $paravagi \mapsto 13$
 - $bladctvn \mapsto 14$
 - $bladctvnpara \mapsto 15$
 - $ctvputerpara \mapsto 16$
 - $ctvpvagipara \mapsto 17$
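The mapping above can be applied to a whole volume without a per-voxel Python function (as with `np.apply_along_axis` in the commented-out `reduce_fn` further down) by giving each base structure one bit, so that every voxel's combination becomes a small integer code that indexes a lookup table. A minimal sketch, assuming binary masks of equal shape; the bit assignment and the names `BITS`, `ID_FOR_COMBO`, `LUT` and `combine` are hypothetical. The table copies the Id mappings list literally, and combinations missing from it (for example a voxel in both $C_p$ and $U$ but not $P$) are reported rather than guessed, just as `reduce_fn` raises `NotImplementedError`:

```python
import numpy as np

# Hypothetical bit assigned to each base structure
BITS = {'A': 0, 'B': 1, 'C_n': 2, 'C_p': 3, 'P': 4, 'U': 5, 'V': 6}

# Combined ids copied from the Id mappings list, keyed by the set of structures present
ID_FOR_COMBO = {
    frozenset(): 0,
    frozenset({'A'}): 1, frozenset({'B'}): 2, frozenset({'C_n'}): 3, frozenset({'C_p'}): 4,
    frozenset({'P'}): 7, frozenset({'U'}): 6, frozenset({'V'}): 5,
    frozenset({'P', 'A'}): 8, frozenset({'P', 'B'}): 9, frozenset({'P', 'C_n'}): 10,
    frozenset({'P', 'C_p'}): 11, frozenset({'P', 'U'}): 12, frozenset({'P', 'V'}): 13,
    frozenset({'B', 'C_n'}): 14, frozenset({'B', 'C_n', 'P'}): 15,
    frozenset({'C_p', 'U', 'P'}): 16, frozenset({'C_p', 'V', 'P'}): 17,
}


def code_of(names) -> int:
    """Bit code of a set of structure names."""
    return sum(1 << BITS[n] for n in names)


# Lookup table indexed by bit code; -1 marks combinations the mapping does not cover
LUT = np.full(1 << len(BITS), -1, dtype=np.int16)
for combo, label_id in ID_FOR_COMBO.items():
    LUT[code_of(combo)] = label_id


def combine(masks: dict) -> np.ndarray:
    """Collapse same-shaped binary masks into one label map via the lookup table."""
    shape = next(iter(masks.values())).shape
    code = np.zeros(shape, dtype=np.int64)
    for name, mask in masks.items():
        code[mask.astype(bool)] |= 1 << BITS[name]  # set this structure's bit where present
    label = LUT[code]
    if (label < 0).any():
        unhandled = np.unique(code[label < 0]).tolist()
        raise NotImplementedError(f'Unhandled combination codes: {unhandled}')
    return label.astype(np.uint8)


masks = {
    'A': np.array([1, 1, 0, 0]),
    'B': np.array([0, 0, 1, 0]),
    'C_n': np.array([0, 0, 1, 0]),
    'P': np.array([0, 1, 1, 0]),
}
print(combine(masks).tolist())  # [1, 8, 15, 0]
```

Structures absent from `masks` simply contribute no bits, so the same table works whether or not every channel is passed in.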

# Setting up Dataset class for nnUNet

```python
import os, sys

# make the directory two levels up importable so that `utils` can be found
dir1 = os.path.abspath(os.path.join(os.path.abspath(''), '..', '..'))
if dir1 not in sys.path:
    sys.path.append(dir1)

from utils.environment import setup_data_vars
setup_data_vars()
```

```python
destination = os.path.join(os.environ.get('nnUNet_raw'), os.environ.get('TotalBinary'))
assert os.path.exists(destination), f"Destination folder {destination} does not exist"
```

```python
def gt_path_for_anatomy(x):
    # training-label folder of the single-anatomy dataset whose name is stored in env var `x`
    return os.path.join(os.environ.get('nnUNet_raw'), os.environ.get(x), os.environ.get('data_trainingLabels'))

gt_path_for_each_anatomy = {os.environ.get(x): gt_path_for_anatomy(x) for x in ['Anorectum', 'Bladder', 'CTVn', 'CTVp', 'Parametrium', 'Uterus', 'Vagina']}
assert all(os.path.exists(x) for x in gt_path_for_each_anatomy.values())
```

```python
from nnunetv2.dataset_conversion.generate_dataset_json import generate_dataset_json

from tqdm import tqdm
from multiprocessing import Pool
import multiprocessing
import shutil
import re
import numpy as np
from skimage import io

import SimpleITK as sitk
```

```python
# gt_per_anatomy = {
#     os.environ.get('Parametrium') : np.array([[0,0,1],
#                                               [1,1,0],
#                                               [1,0,0]]),
#     os.environ.get('Bladder') : np.array([[1,0,1],
#                                           [0,1,0],
#                                           [0,0,1]]),
#     os.environ.get('CTVn') : np.array([[0,1,0],
#                                        [0,1,0],
#                                        [1,1,0]]),
# }

# for k, v in gt_per_anatomy.items():
#     dataset_id = int(re.findall(r'\d+', k)[0])
#     dataset_id = 5 if dataset_id == 7 else 7 if dataset_id == 5 else dataset_id
#     gt_per_anatomy[k] = v * dataset_id

# # stack all the ground truths
# gt = np.stack([v for _,v in gt_per_anatomy.items()]) # (7, D, H, W)

# # reduce the stack along axis 0 by defining a custom reducing function
# def reduce_fn(x):
#     values = np.unique(x) # returns a sorted array
#     """
#     - $background \mapsto 0$
#     - $anorectum \mapsto 1$
#     - $bladder \mapsto 2$
#     - $ctvn \mapsto 3$
#     - $ctvp \mapsto 4$
#     - $parametrium \mapsto 7$ (for later convenience)
#     - $uterus \mapsto 6$
#     - $vagina \mapsto 5$ (for later convenience)
#     - $pararect \mapsto 8$
#     - $parablad \mapsto 9$
#     - $paractvn \mapsto 10$
#     - $paractvp \mapsto 11$
#     - $parauter \mapsto 12$
#     - $paravagi \mapsto 13$
#     - $bladctvn \mapsto 14$
#     - $bladctvp \mapsto 15$
#     - $bladctvnpara \mapsto 16$
#     - $bladctvputer \mapsto 17$
#     - $ctvputerpara \mapsto 18$
#     - $ctvpvagipara \mapsto 19$
#     """
#     # handles cases: [0]
#     if np.array_equal(values, [0]):
#         return 0

#     if 0 in values:
#         # remove the background id
#         if len(values) == 1:
#             return 0
#         values = values[1:]

#     # handles cases: [1], [2], [3], [4], [5], [6], [7]
#     if len(values) == 1:
#         # no contention
#         return values[0]

#     # handles case: [7, 1], [7, 2], [7, 3], [7, 4], [7, 5], [7, 6]
#     if len(values) == 2 and 7 in values:
#         # return the other number in the array
#         idx = np.where(values == 7)[0][0]
#         # idx = 0 -> 1, idx = 1 -> 0
#         return values[abs(idx-1)] + 7

#     if np.array_equal(values, [2,3]):
#         return 14

#     if np.array_equal(values, [2,4]):
#         return 15

#     if np.array_equal(values, [2,3,7]):
#         return 16

#     if np.array_equal(values, [2,4,6]):
#         return 17

#     if np.array_equal(values, [4,6,7]):
#         return 18

#     if np.array_equal(values, [4,5,7]):
#         return 19

#     raise NotImplementedError(f'Unhandled case: {values}')

# gt = np.apply_along_axis(reduce_fn, 0, gt) # (D, H, W)

# print(gt)
```

```python
import numpy as np

# toy stack of two label channels, shape (C, H, W) = (2, 3, 5)
gt = np.array(
    [
        [
            [1,1,1,1,1],
            [1,2,1,1,1],
            [1,2,1,1,1]
        ],
        [
            [1,1,1,1,1],
            [1,2,1,1,1],
            [1,2,1,1,1]
        ]
    ]
)

# Flatten the spatial axes so that each column holds the channel values of one voxel.
# (Indexing with gt[:, tuple(indices)] treats the whole tuple as one integer-array index
# into axis 1, which raised "IndexError: index 3 is out of bounds for axis 1 with size 3".)
per_voxel = gt.reshape(gt.shape[0], -1)  # (C, H*W)

# unique columns = every distinct combination of channel values that occurs
unique_pairs = np.unique(per_voxel, axis=1)

print(unique_pairs)  # two columns: (1, 1) and (2, 2)
```

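```python
import os
import re
import numpy as np
import SimpleITK as sitk

combos = set()

def combine_gt(sample_id: int):
    assert 0 <= sample_id <= 100, 'assumed that there are only 100 ids'

    # read in each anatomy ground truth
    sample_name = f'zzAMLART_{sample_id:03d}.nii.gz'

    gt_per_anatomy = {k: sitk.GetArrayFromImage(sitk.ReadImage(os.path.join(v, sample_name)))
                      for k, v in gt_path_for_each_anatomy.items()}
    ref_shape = gt_per_anatomy[os.environ.get('Anorectum')].shape
    assert all(x.shape == ref_shape for x in gt_per_anatomy.values()), \
        'ground truths contain at least one element that isn\'t the same size!'

    for k, v in gt_per_anatomy.items():
        # the first number in the dataset name is used as the label id; swap 5 <-> 7
        # so that parametrium -> 7 and vagina -> 5, as in the Id mappings
        dataset_id = int(re.findall(r'\d+', k)[0])
        dataset_id = 5 if dataset_id == 7 else 7 if dataset_id == 5 else dataset_id
        gt_per_anatomy[k] = v * dataset_id

    # stack all the ground truths; ids <= 7 fit in uint8, which keeps memory low
    gt = np.stack(list(gt_per_anatomy.values())).astype(np.uint8, copy=False)  # (7, D, H, W)

    # np.unique(gt, axis=0) compares the 7 whole (D, H, W) volumes with each other;
    # the per-voxel combinations are the unique *columns* once the spatial axes are flattened
    unique_combos = np.unique(gt.reshape(gt.shape[0], -1), axis=1)  # (7, K)
    combos.update(tuple(col) for col in unique_combos.T.tolist())
    print(unique_combos)
    return unique_combos

combine_gt(1)
```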
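Another way to list the per-voxel label combinations in `combine_gt`, with memory close to that of a single channel, is to pack each voxel's combination into one `uint8` bit pattern and count the distinct patterns. A sketch, assuming at most 8 channels that are non-zero exactly where their structure is present; the function name `label_combinations` is hypothetical:

```python
import numpy as np


def label_combinations(stacked: np.ndarray, names: list[str]) -> dict:
    """Count how many voxels contain each distinct set of structures.

    stacked: (C, ...) array, non-zero where structure `names[c]` is present.
    Assumes C <= 8 so that every voxel's combination fits in one uint8.
    """
    assert stacked.shape[0] == len(names) <= 8
    code = np.zeros(stacked.shape[1:], dtype=np.uint8)
    for bit, channel in enumerate(stacked):
        code[channel != 0] |= np.uint8(1 << bit)  # set this channel's bit where present
    values, counts = np.unique(code, return_counts=True)
    # decode each bit pattern back into the tuple of structure names it stands for
    return {
        tuple(n for bit, n in enumerate(names) if (int(v) >> bit) & 1): int(c)
        for v, c in zip(values, counts)
    }


toy = np.array([
    [[1, 1, 0, 0]],  # A
    [[0, 0, 1, 0]],  # B
    [[0, 1, 1, 0]],  # P
])  # shape (3, 1, 4)
print(label_combinations(toy, ['A', 'B', 'P']))
# {(): 1, ('A',): 1, ('A', 'P'): 1, ('B', 'P'): 1}
```

Taking the union of the returned keys over all patients gives the set of combinations that the Id mappings have to cover.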

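```python
from gt_processing import combine_gt

if __name__ == "__main__":
    with multiprocessing.get_context("spawn").Pool(8) as p:
        # queue one task per sample: combine_gt(i, gt_path_for_each_anatomy)
        pending = [p.apply_async(combine_gt, (i, gt_path_for_each_anatomy)) for i in range(1, 101)]
        # wrap the results, not the submissions, so that tqdm tracks completed samples
        results = [r.get() for r in tqdm(pending)]

    # each spawned worker imports its own copy of gt_processing, so additions to a
    # module-level `combos` set never reach this process; collect return values instead
    # (assumes combine_gt returns the combinations it finds)
    print(results)
```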
