 Tutorial - Recommended Histopathology WSI Processing
            Overlaying Tissue Compartment Annotations on a Slide Image of Patient 15

1) Setting up the environment and downloading the example data

Libraries:
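The code in this notebook uses OpenSlide to read the whole-slide image and Pillow (PIL) to draw the annotation overlay. Spectral Python (SPy) is installed alongside them for the hyperspectral data in the dataset. References for these and related scientific Python libraries are listed below.

OpenSlide. Python module for reading whole-slide image formats. https://openslide.org/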

Spectral Python (SPy). Python module for hyperspectral image processing. https://www.spectralpython.net

Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2

J. D. Hunter, "Matplotlib: A 2D Graphics Environment," in Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95, May-June 2007, doi: 10.1109/MCSE.2007.55.

Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2

Data:
The data used in this notebook are a folder of histological and hyperspectral data from a breast histological slide in the HistologyHSI-BRCA-Recurrence dataset. The demographic and clinical data are stored as an Excel file, the histological slide is stored in MRXS format, and the hyperspectral cubes from the slide, together with the white and dark references, are stored as ENVI files. The ENVI format is the convention for HSI data; it consists of a flat-binary raster file with an accompanying ASCII header file.
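
To make the ENVI layout concrete, the following is a minimal sketch that writes a tiny cube as a flat binary file plus an ASCII header and reads it back with the standard library only. The file names, the cube size, and the `parse_envi_header` helper are hypothetical; the helper handles only single-line `key = value` fields, whereas real headers may also contain multi-line values in braces. In practice a library such as Spectral Python would normally do this reading.

```python
import os
import struct
import tempfile


def parse_envi_header(text):
    """Parse single-line `key = value` fields of an ENVI header into a dict."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header")
    fields = {}
    for line in lines[1:]:
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical 2 x 2 pixel cube with 3 bands, stored as little-endian float32
# (ENVI "data type = 4", "byte order = 0"), band-sequential interleave.
header_text = """ENVI
samples = 2
lines = 2
bands = 3
data type = 4
interleave = bsq
byte order = 0
"""
values = [float(i) for i in range(12)]

with tempfile.TemporaryDirectory() as tmp:
    hdr_path = os.path.join(tmp, "cube.hdr")  # ASCII header (made-up name)
    raw_path = os.path.join(tmp, "cube.raw")  # flat binary raster (made-up name)
    with open(hdr_path, "w") as f:
        f.write(header_text)
    with open(raw_path, "wb") as f:
        f.write(struct.pack("<12f", *values))

    # Read the header first: it tells us how to interpret the raw bytes.
    with open(hdr_path) as f:
        header = parse_envi_header(f.read())
    n_values = int(header["samples"]) * int(header["lines"]) * int(header["bands"])
    with open(raw_path, "rb") as f:
        cube = struct.unpack(f"<{n_values}f", f.read())

print(header["interleave"], n_values, cube[:4])
```

The binary file carries no metadata of its own, which is why the header must always travel with it.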

In [3]:
# Install the Spectral Python (SPy) library
!pip install spectral
!pip install openslide-python
!pip install openslide-bin

Collecting spectral
  Downloading spectral-0.23.1-py3-none-any.whl.metadata (1.3 kB)
Downloading spectral-0.23.1-py3-none-any.whl (212 kB)
Installing collected packages: spectral
Successfully installed spectral-0.23.1
Collecting openslide-python
  Downloading openslide_python-1.4.1-cp311-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (4.3 kB)
Downloading openslide_python-1.4.1-cp311-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.whl (33 kB)
Installing collected packages: openslide-python
Successfully installed openslide-python-1.4.1
Collecting openslide-bin
  Downloading openslide_bin-4.0.0.6-py3-none-manylin

In [2]:
!wget -O downloaded_file.zip 'https://nimbus.iuma.ulpgc.es/s/JaxL6Rb4G74Wz6x/download'
!unzip downloaded_file.zip && rm downloaded_file.zip

--2025-03-24 17:17:54--  https://nimbus.iuma.ulpgc.es/s/JaxL6Rb4G74Wz6x/download
Resolving nimbus.iuma.ulpgc.es (nimbus.iuma.ulpgc.es)... 193.145.147.66
Connecting to nimbus.iuma.ulpgc.es (nimbus.iuma.ulpgc.es)|193.145.147.66|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘downloaded_file.zip’

downloaded_file.zip     [               <=>  ]   2.80G  11.9MB/s    in 4m 4s   

2025-03-24 17:21:59 (11.7 MB/s) - ‘downloaded_file.zip’ saved [3007677160]

Archive:  downloaded_file.zip
   creating: Scientific Data/
 extracting: Scientific Data/00_01_Clinical_Demographic_Data.xlsx  
   creating: Scientific Data/01_01_Histological_Images/
   creating: Scientific Data/01_01_Histological_Images/15_07B3634/
 extracting: Scientific Data/01_01_Histological_Images/15_07B3634/Data0000.dat  
 extracting: Scientific Data/01_01_Histological_Images/15_07B3634/Data0001.dat  
 extracting: Scientific Data/01_01_Histological_Images/15_07B3634/

In [4]:
import os
import json
import openslide
from PIL import Image, ImageDraw

In [None]:
FILE_PATH = '/content/Scientific Data/01_01_Histological_Images/15_07B3634.mrxs'
FILE_ANNOTATION = '/content/Scientific Data/01_02_Tissue_Annotations/15.geojson' # GeoJSON file with the tissue annotations
LEVEL = 7 # Pyramid level of the slide to read
CASES = ['in situ carcinoma', 'infiltrant carcinoma', 'normal tissue'] # Annotation class names

In [None]:
selected_image = openslide.OpenSlide(FILE_PATH) # Load the slide using OpenSlide

LEVEL_DOWNSAMPLE = int(selected_image.level_downsamples[LEVEL]) # Downsample factor of the selected level

# The X coordinate of the rectangle bounding the non-empty region of the slide
shift_x = selected_image.properties['openslide.bounds-x']
shift_x = int(shift_x)
# The Y coordinate of the rectangle bounding the non-empty region of the slide
shift_y = selected_image.properties['openslide.bounds-y']
shift_y = int(shift_y)
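
As an illustration of how these properties are typically used, here is a small sketch that turns the bounds properties into the arguments of a region read. OpenSlide exposes property values as strings, and `read_region` takes its `location` in level-0 pixels while `size` is given in pixels of the requested level. The `bounds_at_level` helper and the property values below are made up for the example, and the downsample of 128 assumes that each pyramid level halves the previous one, which should be checked against `level_downsamples` for a real slide.

```python
def bounds_at_level(properties, downsample):
    """Return (location, size) for reading the non-empty region at one level.

    `location` stays in level-0 coordinates; `size` is scaled to the level.
    """
    x = int(properties["openslide.bounds-x"])
    y = int(properties["openslide.bounds-y"])
    width = int(properties["openslide.bounds-width"]) // downsample
    height = int(properties["openslide.bounds-height"]) // downsample
    return (x, y), (width, height)


# Made-up property values; a real slide provides them via `slide.properties`.
example_properties = {
    "openslide.bounds-x": "2048",
    "openslide.bounds-y": "4096",
    "openslide.bounds-width": "51200",
    "openslide.bounds-height": "102400",
}

location, size = bounds_at_level(example_properties, downsample=128)
print(location, size)
```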

In [None]:
# Load the GeoJSON file
with open(FILE_ANNOTATION, "r") as file:
    geojson_data = json.load(file)

# Extract the annotations
mfeatures = geojson_data['features']

# Collect, for each annotation, its outline coordinates and its class name.
# For a GeoJSON Polygon, coordinates[0] is the exterior ring.
list_annotations = []
for mannotation in mfeatures:
    list_annotations.append([mannotation['geometry']['coordinates'][0], mannotation['properties']['classification']['name']])
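
The structure the loop above relies on can be seen in a small stand-alone sketch. The GeoJSON string below is a made-up example with a single polygon, not data from the dataset; it only mirrors the keys accessed in the notebook. In a GeoJSON `Polygon`, `coordinates` is a list of rings and the first ring is the exterior outline, which is why index `[0]` is taken. Other geometry types nest differently, so this sketch skips them.

```python
import json

# Made-up annotation file content with one polygon feature.
sample_geojson = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [256, 0], [256, 128], [0, 0]]]},
     "properties": {"classification": {"name": "normal tissue"}}}
  ]
}"""

data = json.loads(sample_geojson)

annotations = []
for feature in data["features"]:
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        continue  # e.g. MultiPolygon nests one level deeper
    exterior_ring = geometry["coordinates"][0]
    label = feature["properties"]["classification"]["name"]
    annotations.append((exterior_ring, label))

print(annotations)
```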

In [None]:
# The height of the rectangle bounding the non-empty region of the slide at level LEVEL
bh_LEVEL_DOWNSAMPLE = int(selected_image.properties['openslide.bounds-height'])//LEVEL_DOWNSAMPLE
# The width of the rectangle bounding the non-empty region of the slide at level LEVEL
bw_LEVEL_DOWNSAMPLE = int(selected_image.properties['openslide.bounds-width'])//LEVEL_DOWNSAMPLE

# Compute the annotations at level LEVEL
list_annotations_downsample = []
for mannotation in list_annotations:
    mannotation_coordinates = [[x/LEVEL_DOWNSAMPLE, y/LEVEL_DOWNSAMPLE] for [x, y] in mannotation[0]]
    list_annotations_downsample.append([mannotation_coordinates, mannotation[1]])
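
The coordinate scaling can be written as a small reusable function, sketched below with made-up polygon values. The notebook only divides by the downsample factor, which presumes the annotation coordinates are already relative to the top-left corner of the region being read. The `origin` argument is a hypothetical extension for annotations expressed in absolute level-0 slide coordinates, where the corner would have to be subtracted first.

```python
def scale_polygon(points, downsample, origin=(0, 0)):
    """Map level-0 (x, y) points to pixel coordinates at a downsampled level."""
    ox, oy = origin
    return [((x - ox) / downsample, (y - oy) / downsample) for x, y in points]


# Made-up rectangle in level-0 pixels, scaled by an assumed downsample of 128.
polygon_level0 = [(0, 0), (1280, 0), (1280, 640), (0, 640)]
polygon_level = scale_polygon(polygon_level0, 128)
print(polygon_level)

# Same idea with a hypothetical origin offset.
shifted = scale_polygon([(256, 384)], 128, origin=(128, 256))
print(shifted)
```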

In [None]:
# Extract the rectangle bounding the non-empty region of the slide at level LEVEL
bounding_box = selected_image.read_region(location=(shift_x, shift_y), level=LEVEL, size=(bw_LEVEL_DOWNSAMPLE, bh_LEVEL_DOWNSAMPLE))
bounding_box = bounding_box.convert("RGBA")

In [None]:
# Create a transparent overlay image
overlay = Image.new("RGBA", bounding_box.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)

# Define annotation colors based on CASES
annotation_colors = {
    CASES[0]: (255, 0, 0, 100),    # Red with transparency
    CASES[1]: (0, 0, 255, 100),    # Blue with transparency
    CASES[2]: (0, 255, 0, 100)     # Green with transparency
}

# Draw semi-transparent filled polygons on the overlay, reusing the coordinates already scaled to level LEVEL
for annotation in list_annotations_downsample:
    coordinates = [(x, y) for x, y in annotation[0]]  # Pillow accepts a sequence of (x, y) tuples
    annotation_type = annotation[1]  # This will match CASES[0], CASES[1], or CASES[2]

    # Get the corresponding color
    color = annotation_colors.get(annotation_type, (255, 255, 255, 100))  # Default: white with transparency

    # Draw the contour as a semi-transparent polygon
    draw.polygon(coordinates, outline=color, fill=color)

# Blend the overlay with the slide at level LEVEL
final_image = Image.alpha_composite(bounding_box, overlay)
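
To show what the blending step does to a single pixel, here is a pure-Python sketch of the standard "over" compositing rule for straight (non-premultiplied) alpha. The `over` helper and the pixel values are illustrative only. Pillow carries out the operation with integer arithmetic, so its result for a given pixel may differ from this floating-point version by one unit.

```python
def over(src, dst):
    """Composite RGBA pixel `src` over `dst` (channels 0-255, straight alpha)."""
    sa = src[3] / 255
    da = dst[3] / 255
    out_a = sa + da * (1 - sa)
    if out_a == 0:
        return (0, 0, 0, 0)  # both pixels fully transparent
    rgb = tuple(
        round((s * sa + d * da * (1 - sa)) / out_a)
        for s, d in zip(src[:3], dst[:3])
    )
    return rgb + (round(out_a * 255),)


slide_pixel = (200, 200, 200, 255)  # opaque light-grey background (made up)
red_overlay = (255, 0, 0, 100)      # same red as the first annotation color
blended = over(red_overlay, slide_pixel)
print(blended)

# A fully transparent overlay pixel leaves the slide pixel unchanged.
untouched = over((0, 0, 0, 0), slide_pixel)
print(untouched)
```

An alpha of 100 out of 255 therefore tints the tissue while leaving the underlying staining visible, and areas without annotations are not modified.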

In [None]:
# Save the result
final_image.save("overlay_tissues_15.png")