# Example Code for Cropping a Tissue Bounding Box and Generating a Binary Segmentation Mask for Tubular Lumen + Tubular Epithelium

This example demonstrates how to crop a tissue bounding box image from a Whole Slide Image (WSI) and generate a binary segmentation mask (the ground-truth image) from a JSON annotation file. This is a common task in digital pathology analysis, especially when preparing ground truth for training a deep-learning model.

## Steps Involved:

1. **Load the Whole Slide Image (WSI):**
    - Use an appropriate library to read the WSI file.
    - Ensure the image is loaded correctly for further processing.

2. **Parse the JSON File:**
    - Load the JSON file containing the annotations.
    - Extract the coordinates of the bounding boxes and segmentation masks.

3. **Crop the Tissue Bounding Box:**
    - Use the coordinates from the JSON file to crop the tissue region from the WSI.
    - Save or display the cropped image for verification.

4. **Generate Binary Mask Segmentation:**
    - Create a blank mask image with the same dimensions as the cropped (downsampled) tissue image.
    - Draw the segmentation masks on the blank image using the coordinates from the JSON file.
    - Save or display the binary mask image for verification.



## Example Code:

The following code snippets illustrate each step in the process. Make sure to install the necessary libraries before running the code.
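
As a quick pre-flight check, the sketch below reports which of the third-party modules imported in step 0 are not yet installed. The module-to-package mapping and the helper name `missing_packages` are assumptions for illustration, not part of the original notebook. Note that `openslide-python` also depends on the native OpenSlide library, which this check does not verify.

```python
import importlib.util

# Assumed mapping from import name to the usual pip package name.
REQUIRED = {
    "cv2": "opencv-python",
    "shapely": "shapely",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "geopandas": "geopandas",
    "openslide": "openslide-python",
    "tqdm": "tqdm",
    "skimage": "scikit-image",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules that cannot be found in this environment."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages were found.")
```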



### 0. Import packages and define functions

In [42]:
import cv2
from shapely.strtree import STRtree
import matplotlib.pyplot as plt
import csv
import numpy as np
import geopandas
from shapely import affinity
import os
from shapely.ops import unary_union
from shapely.geometry import Polygon, MultiPolygon
import glob
import json
import openslide
from tqdm import tqdm

from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from scipy import ndimage as ndi
from skimage import io, color, filters, morphology, segmentation,exposure
from skimage.color import label2rgb

In [28]:
def get_structure(load_dict, classname):
    """
    Extracts structures from the loaded JSON dictionary based on the given class names.

    Parameters:
    - load_dict (list): List of GeoJSON-style feature dictionaries containing the annotations.
    - classname (list): List of class names to filter the annotations.

    Returns:
    - structure (list): List of shapely Polygon objects representing the extracted structures.
    """
    structure = []
    for load_dict_i in tqdm(load_dict, desc=f"Extracting {classname}"):
        # Skip annotations without a classification property
        if 'classification' not in load_dict_i['properties']:
            continue
        prop_name = load_dict_i['properties']['classification']['name']
        # Keep the annotation only if its class is one of the requested names
        if prop_name not in classname:
            continue
        geometry = load_dict_i['geometry']
        # Normalize both geometry types to a list of polygons, each a list of rings
        if geometry['type'] == 'MultiPolygon':
            polygons = geometry['coordinates']
        elif geometry['type'] == 'Polygon':
            polygons = [geometry['coordinates']]
        else:
            continue
        for rings in polygons:
            # rings[0] is the exterior ring; interior rings (holes) are ignored
            structure.append(Polygon(rings[0]))
    return structure
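
To make the expected input concrete, here is a dependency-free sketch of the feature layout that `get_structure` reads, together with the same class filtering applied to plain coordinate lists. The sample features, their coordinates, and the helper name `exterior_rings` are hypothetical; a real QuPath export will usually carry additional fields.

```python
import json

# Hypothetical annotation export: four features in the GeoJSON-style layout
# that get_structure reads (properties.classification.name, geometry.type,
# geometry.coordinates). All coordinates are made up for illustration.
SAMPLE_JSON = """
[
  {"type": "Feature",
   "properties": {"classification": {"name": "te+lumen"}},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]}},
  {"type": "Feature",
   "properties": {"classification": {"name": "te+lumen"}},
   "geometry": {"type": "MultiPolygon",
                "coordinates": [[[[20, 20], [30, 20], [30, 30], [20, 20]]],
                                [[[40, 40], [50, 40], [50, 50], [40, 40]]]]}},
  {"type": "Feature",
   "properties": {},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [5, 0], [5, 5], [0, 0]]]}},
  {"type": "Feature",
   "properties": {"classification": {"name": "Cortex_QCed"}},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [100, 0], [100, 100], [0, 100], [0, 0]]]}}
]
"""

features = json.loads(SAMPLE_JSON)


def exterior_rings(features, class_names):
    """Collect the exterior ring of every polygon whose class is in class_names."""
    rings = []
    for feature in features:
        classification = feature.get("properties", {}).get("classification")
        if not classification or classification.get("name") not in class_names:
            continue  # unclassified or not a requested class
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            # Polygon: coordinates is a list of rings, exterior ring first
            rings.append(geometry["coordinates"][0])
        elif geometry["type"] == "MultiPolygon":
            # MultiPolygon: coordinates is a list of polygons
            rings.extend(polygon[0] for polygon in geometry["coordinates"])
    return rings


if __name__ == "__main__":
    print(len(exterior_rings(features, ["te+lumen"])), "te+lumen rings found")
```

Each returned ring is a list of `[x, y]` points, which is the form that `Polygon(...)` accepts in `get_structure`.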

### 1. Load the Whole Slide Image (WSI)

In [29]:
wsi_filename = '/Users/fanfan/Library/CloudStorage/OneDrive-Emory/AI_tubule/for_raymond/WSI/13_26609_022_023 L11 PAS.ndpi'
wsiid = os.path.basename(wsi_filename).split('.ndpi')[0]
print(wsi_filename)
slide = openslide.OpenSlide(wsi_filename)

/Users/fanfan/Library/CloudStorage/OneDrive-Emory/AI_tubule/for_raymond/WSI/13_26609_022_023 L11 PAS.ndpi


### 2. Parse the JSON File
There are two different annotation files involved in this process:

1. **Cortex Annotation:** This file contains the bounding box coordinates for the cortex region. We will use this to crop the tissue bounding box from the cortex area of the WSI.
2. **Tubule Segmentation Annotation:** This file includes segmentation results for various structures such as tubular epithelium, tubular lumen, tubular nuclei, and tubular basement membrane.

#### Objective:
- **2.0** Parse the JSON files.
- **2.1** Crop the cortex bounding box from the WSI at 10X magnification (0.25 from the base magnification of 40X).
- **2.2** Assign the contours of tubular epithelium and tubular lumen to the cortex contour.
- **2.3** Draw these contours on the cortex tissue area to generate the ground truth for tubular epithelium and tubular lumen (TE+TL).
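
The coordinate arithmetic behind objectives 2.1 to 2.3 can be sketched as follows. Annotation points are assumed to be in base-level (40X) pixel coordinates, so mapping them into the 10X crop means subtracting the crop's top-left corner and multiplying by 10/40 = 0.25. The function name `to_roi_coords` and the sample points are illustrative assumptions.

```python
def to_roi_coords(points, origin, scale=0.25):
    """Map base-level (x, y) points into pixel coordinates of the downsampled crop.

    points: iterable of (x, y) in base-level pixels (assumed 40X).
    origin: (x, y) of the crop's top-left corner in base-level pixels.
    scale:  target magnification / base magnification (10X / 40X = 0.25).
    """
    origin_x, origin_y = origin
    return [
        (round((x - origin_x) * scale), round((y - origin_y) * scale))
        for x, y in points
    ]


if __name__ == "__main__":
    # Hypothetical contour: a 400 x 400 px square at base level whose corner
    # coincides with the crop origin.
    square = [(5423, 7103), (5823, 7103), (5823, 7503), (5423, 7503)]
    print(to_roi_coords(square, origin=(5423, 7103)))
```

A 400-pixel edge at base level becomes a 100-pixel edge in the 10X crop, which is a convenient sanity check when masks look shifted or mis-scaled.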

#### 2.0 Parse the JSON files and store the annotations as lists of shapely Polygons

In [30]:
cortex_anno_filename = glob.glob('/Users/fanfan/Library/CloudStorage/OneDrive-Emory/AI_tubule/for_raymond/annotation/13_26609_022_023 L11 PAS.ndpi/CORTEX_IFTA_PREIFTA/*.json')[0]
tubule_anno_filename = glob.glob('/Users/fanfan/Library/CloudStorage/OneDrive-Emory/AI_tubule/for_raymond/annotation/13_26609_022_023 L11 PAS.ndpi/TUBULE_Qupath/*.json')[0]

with open(cortex_anno_filename, 'r') as load_f:
    cortex = json.load(load_f)
with open(tubule_anno_filename, 'r') as load_f:
    tubu = json.load(load_f)

In [31]:
# Print a message indicating the start of cortex data extraction
print('Get cortex data...')

# Extract structures from the cortex annotation JSON data
# The function get_structure is used to filter and convert the JSON annotations into shapely Polygon objects
# 'Cortex_QCed' is the class name used to filter the annotations
cortexs = get_structure(cortex, classname=['Cortex_QCed'])

# Print a message indicating the start of TE+LUMEN annotation extraction
print('Get TE+LUMEN annotation')

# Extract structures from the tubule annotation JSON data
# The function get_structure is used to filter and convert the JSON annotations into shapely Polygon objects
# 'te+lumen' is the class name used to filter the annotations
te_lumens = get_structure(tubu, classname=['te+lumen'])

Get cortex data...


Extracting ['Cortex_QCed']: 100%|██████████| 2/2 [00:00<00:00, 3367.57it/s]


Get TE+LUMEN annotation


Extracting ['te+lumen']: 100%|██████████| 7899/7899 [00:00<00:00, 60566.32it/s]


#### **2.1** Crop the cortex bounding box from the WSI at 10X magnification (0.25 from the base magnification of 40X).
#### **2.2** Assign the contours of tubular epithelium and tubular lumen to the cortex contour.
#### **2.3** Draw these contours on the cortex tissue area to generate the ground truth for tubular epithelium and tubular lumen (TE+TL).

In [44]:
for cor in cortexs:
    minx, miny, maxx, maxy = cor.bounds
    width = maxx - minx
    height = maxy - miny
    print(f"Left top coordinates: ({minx}, {miny}), Width: {width}, Height: {height}")
    # Read the bounding box at level 0 (base 40X) and drop the alpha channel;
    # the 0.25 resize below brings the crop to 10X magnification
    roi = np.asarray(slide.read_region((int(minx), int(miny)), 0, (int(width), int(height))))[:, :, 0:3]
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2BGR)
    roi = cv2.resize(roi, (0, 0), fx=.25, fy=.25, interpolation=cv2.INTER_LINEAR)
    # Please remember to store the x y coordinates of the cortex tissue bounding box
    cv2.imwrite(f'{wsiid}_{minx}_{miny}.png', roi)

    # Create an empty black image with the same height and width as the crop
    mask = np.zeros((roi.shape[0], roi.shape[1]), dtype=np.uint8)

    for te_lumen in te_lumens:
        if cor.intersects(te_lumen):
            # Get the boundary coordinates of the te_lumen polygon
            coords = np.array(te_lumen.exterior.coords)
            # Shift to the crop origin and scale the coordinates to match the 10X magnification
            coords = (coords - [int(minx), int(miny)]) * 0.25
            # Draw the filled contour on the mask image (255 = TE+TL; use 1 if {0, 1} labels are needed)
            cv2.fillPoly(mask, [np.round(coords).astype(np.int32)], 255)

    # Save the mask image once, after all contours for this cortex region are drawn
    cv2.imwrite(f'{wsiid}_{minx}_{miny}_mask.png', mask)


Left top coordinates: (5423.58, 7103.72), Width: 4627.74, Height: 16211.8
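
The code above notes that the x/y coordinates of each cortex bounding box should be stored. One possible way to do that, sketched here with only the standard library, is a small CSV index with one row per crop, plus a helper that maps a pixel in the 10X crop back to base-level WSI coordinates. The column names, the file name in the row, and the function names are assumptions, not part of the original notebook.

```python
import csv
import io

FIELDNAMES = ["image", "minx", "miny", "width", "height", "scale"]


def write_bbox_index(rows, fileobj):
    """Write one CSV row per cropped image so its crop origin can be recovered later."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def roi_to_wsi(px, py, minx, miny, scale=0.25):
    """Map a pixel (px, py) in the downsampled crop back to base-level coordinates."""
    return (minx + px / scale, miny + py / scale)


if __name__ == "__main__":
    # Hypothetical row; the numbers mirror the bounding box printed above,
    # with the origin truncated to integers as done for read_region.
    rows = [{"image": "example_crop.png", "minx": 5423, "miny": 7103,
             "width": 4627, "height": 16211, "scale": 0.25}]
    buffer = io.StringIO()
    write_bbox_index(rows, buffer)
    print(buffer.getvalue())
    print(roi_to_wsi(100, 100, minx=5423, miny=7103))
```

Keeping the offset and scale next to each image name avoids having to parse coordinates back out of file names when predictions are later mapped onto the slide.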
