# The .h5path File Format

In this notebook we will explore the .h5path file format. Let's begin by loading and processing an H&E whole slide image.

In [1]:
# load slide
from pathml.core.slide_classes import HESlide
from pathml.core.h5path import read
wsi = HESlide("../tests/testdata/small_HE.svs", name="example")

Our H&E slide has been loaded into a PathML SlideData object of type HESlide. SlideData objects hold a reference to the original slide through an appropriate backend (in this case OpenSlide, which reads RGB .svs files), along with masks, tiles, and labels.

In [2]:
wsi

HESlide(name=example, slide = <pathml.core.slide_backends.OpenSlideBackend object at 0x7f8c1149c850>, masks=None, tiles=None, labels=None, history=None)

In [3]:
# declare and run pipeline
from pathml.preprocessing.pipeline import Pipeline
from pathml.preprocessing.transforms import BoxBlur, TissueDetectionHE

pipeline = Pipeline([
    BoxBlur(kernel_size=15),
    TissueDetectionHE(mask_name="tissue", min_region_size=500,
                      threshold=30, outer_contours_only=True)
])

wsi.run(pipeline, tile_size=256)

Running the pipeline with tile_size=256 split the slide into 256 x 256 pixel tiles and added them to wsi.tiles.
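The tiling step can be pictured with a small plain-Python sketch. It assumes non-overlapping tiles, no padding at the slide edges (partial edge tiles are simply dropped), and keys formed from the (row, column) pixel offset of each tile's top-left corner. These are illustrative assumptions, not a description of PathML's internals, and `tile_keys` is a hypothetical helper.

```python
from typing import List


def tile_keys(height: int, width: int, tile_size: int) -> List[str]:
    """Return keys like '(256, 0)' for every full tile in a height x width image.

    Assumption: tiles do not overlap and incomplete edge tiles are skipped.
    """
    keys = []
    for i in range(0, height - tile_size + 1, tile_size):      # row offsets
        for j in range(0, width - tile_size + 1, tile_size):   # column offsets
            keys.append(str((i, j)))
    return keys


# Slide dimensions taken from the 'array' shape printed later in this notebook.
keys = tile_keys(2048, 2816, tile_size=256)
print(len(keys))   # 88
print(keys[:3])    # ['(0, 0)', '(0, 256)', '(0, 512)']
```

With that slide shape the sketch yields 8 rows of 11 tiles, the last one starting at (1792, 2560).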

In [4]:
wsi.tiles

Tiles(keys=OrderedDict([('(256, 0)', {'name': None, 'labels': None, 'coords': '(256, 0)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(512, 0)', {'name': None, 'labels': None, 'coords': '(512, 0)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(512, 256)', {'name': None, 'labels': None, 'coords': '(512, 256)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(768, 512)', {'name': None, 'labels': None, 'coords': '(768, 512)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 512)', {'name': None, 'labels': None, 'coords': '(0, 512)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(1536, 512)', {'name': None, 'labels': None, 'coords': '(1536, 512)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(256, 256)', {'name': None, 'labels': None, 'coords': '(256, 256)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(768, 256)', {'name': None, 'labels': None, 'coords': '(768, 256)', 'slide

Behind the scenes, these tiles are stored in a temporary HDF5 file managed with h5py.

In [5]:
wsi.tiles.h5manager.h5.keys()

<KeysViewHDF5 ['array', 'masks', 'tiles']>

When we save to .h5path, we persist this temporary file, along with all other data attached to wsi (i.e., the entire SlideData object), to disk.

In [6]:
wsi.write('slidedataobject.h5path')

For developers: let's inspect slidedataobject.h5path on disk using h5py directly.

In [7]:
import h5py
f = h5py.File('slidedataobject.h5path', 'r')
f.keys()

<KeysViewHDF5 ['array', 'fields', 'masks', 'tiles']>

The .h5path file stores the image in the 'array' h5py.Dataset and each mask as an h5py.Dataset inside the 'masks' group; in both cases, all tiles are aggregated into slide-sized arrays.

In [8]:
print(f['array'].shape)
print(f['masks']['tissue'].shape)

(2048, 2816, 3)
(2048, 2816)
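To make "aggregated into slide-sized arrays" concrete, here is a NumPy sketch that cuts a stand-in image into tiles and writes them back into one preallocated array. It assumes tile keys are the (row, column) offsets of each tile's top-left corner; the 512 x 768 random image is made up, and this is not PathML's own code. A mask such as 'tissue' would be handled the same way with a 2D array.

```python
import numpy as np

tile_size = 256
height, width = 512, 768  # small stand-in for the real (2048, 2816) slide

# Random RGB stand-in slide (uint8, like an H&E image).
rng = np.random.default_rng(0)
slide = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

# Cut the slide into tiles keyed by their top-left (row, col) offset.
tiles = {
    (i, j): slide[i:i + tile_size, j:j + tile_size]
    for i in range(0, height, tile_size)
    for j in range(0, width, tile_size)
}

# Write every tile back into a preallocated slide-sized array.
aggregated = np.zeros((height, width, 3), dtype=np.uint8)
for (i, j), tile in tiles.items():
    aggregated[i:i + tile_size, j:j + tile_size] = tile

print(aggregated.shape)                    # (512, 768, 3)
print(np.array_equal(aggregated, slide))   # True
```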


'fields' is an h5py.Group whose attributes (accessed through .attrs) store metadata about the SlideData object.

In [9]:
print(f['fields'].attrs.keys())

<KeysViewHDF5 ['name', 'slide_backend']>
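Developers who want to experiment with this layout without a slide can mimic it directly with h5py. The sketch below writes a tiny file with the same top-level names ('array', 'masks/tissue', 'fields' with attributes, and 'tiles'). The shapes and the 'slide_backend' value are invented placeholders, and the resulting file is not guaranteed to be readable by PathML's read.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mock.h5path")

with h5py.File(path, "w") as f:
    # Slide-sized image and mask datasets (made-up small shapes).
    f.create_dataset("array", data=np.zeros((512, 768, 3), dtype=np.uint8))
    masks = f.create_group("masks")
    masks.create_dataset("tissue", data=np.zeros((512, 768), dtype=np.uint8))
    # Metadata lives in attributes on the 'fields' group.
    fields = f.create_group("fields")
    fields.attrs["name"] = "example"
    fields.attrs["slide_backend"] = "placeholder"  # placeholder value, not PathML's
    f.create_group("tiles")

with h5py.File(path, "r") as f:
    top_level = list(f.keys())  # h5py lists members in name order by default
    shapes = (f["array"].shape, f["masks"]["tissue"].shape)
    metadata = dict(f["fields"].attrs)

print(top_level)   # ['array', 'fields', 'masks', 'tiles']
print(shapes)      # ((512, 768, 3), (512, 768))
print(metadata)
```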


'tiles' is an h5py.Group with one subgroup per tile, recording where that tile is located in 'array' and 'masks' along with metadata describing it.

In [10]:
print(f['tiles'].keys())
print(f['tiles']['(0, 0)'].keys())

<KeysViewHDF5 ['(0, 0)', '(0, 1024)', '(0, 1280)', '(0, 1536)', '(0, 1792)', '(0, 2048)', '(0, 2304)', '(0, 256)', '(0, 2560)', '(0, 512)', '(0, 768)', '(1024, 0)', '(1024, 1024)', '(1024, 1280)', '(1024, 1536)', '(1024, 1792)', '(1024, 2048)', '(1024, 2304)', '(1024, 256)', '(1024, 2560)', '(1024, 512)', '(1024, 768)', '(1280, 0)', '(1280, 1024)', '(1280, 1280)', '(1280, 1536)', '(1280, 1792)', '(1280, 2048)', '(1280, 2304)', '(1280, 256)', '(1280, 2560)', '(1280, 512)', '(1280, 768)', '(1536, 0)', '(1536, 1024)', '(1536, 1280)', '(1536, 1536)', '(1536, 1792)', '(1536, 2048)', '(1536, 2304)', '(1536, 256)', '(1536, 2560)', '(1536, 512)', '(1536, 768)', '(1792, 0)', '(1792, 1024)', '(1792, 1280)', '(1792, 1536)', '(1792, 1792)', '(1792, 2048)', '(1792, 2304)', '(1792, 256)', '(1792, 2560)', '(1792, 512)', '(1792, 768)', '(256, 0)', '(256, 1024)', '(256, 1280)', '(256, 1536)', '(256, 1792)', '(256, 2048)', '(256, 2304)', '(256, 256)', '(256, 2560)', '(256, 512)', '(256, 768)', '(512, 0)
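The listing above is in string order rather than numeric order (for example, '(0, 2304)' comes before '(0, 256)'), because tile group names are strings. This sketch reproduces that ordering and recovers numeric (row, column) order by parsing each name with ast.literal_eval. It assumes each name is the text form of a (row, col) tuple, which matches the names shown above; the sample list is hand-picked.

```python
import ast

# Sample tile group names in arbitrary order.
names = ["(0, 256)", "(0, 2560)", "(0, 0)", "(1024, 0)", "(256, 0)", "(0, 1024)"]

# Plain string sorting: characters are compared one by one, so '1024' < '256'.
print(sorted(names))
# ['(0, 0)', '(0, 1024)', '(0, 256)', '(0, 2560)', '(1024, 0)', '(256, 0)']

# Parse each name back into an integer tuple to get numeric (row, col) order.
coords = sorted(ast.literal_eval(name) for name in names)
print(coords)
# [(0, 0), (0, 256), (0, 1024), (0, 2560), (256, 0), (1024, 0)]

# Given a (row, col) offset, a tile's pixels would be the slice
# array[row:row + tile_size, col:col + tile_size] of the slide-sized array.
```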

PathML's read function recognizes the .h5path format and loads the file back into a SlideData object.

In [11]:
loadedwsi = read('slidedataobject.h5path')
loadedwsi

SlideData(name=example, slide = None, masks=None, tiles=Tiles(keys=OrderedDict([('(0, 0)', {'name': None, 'labels': None, 'coords': '(0, 0)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 1024)', {'name': None, 'labels': None, 'coords': '(0, 1024)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 1280)', {'name': None, 'labels': None, 'coords': '(0, 1280)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 1536)', {'name': None, 'labels': None, 'coords': '(0, 1536)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 1792)', {'name': None, 'labels': None, 'coords': '(0, 1792)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 2048)', {'name': None, 'labels': None, 'coords': '(0, 2048)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 2304)', {'name': None, 'labels': None, 'coords': '(0, 2304)', 'slidetype': <class 'pathml.core.slide_classes.HESlide'>}), ('(0, 256)', {'name': None, 'l