# Demonstration of histomics_stream

Click to open in [[GitHub](https://github.com/DigitalSlideArchive/HistomicsStream/tree/master/example/tensorflow_stream.ipynb)] [[Google Colab](https://colab.research.google.com/github/DigitalSlideArchive/HistomicsStream/blob/master/example/tensorflow_stream.ipynb)]

The `histomics_stream` Python package sits at the start of any machine learning workflow built on the TensorFlow machine learning library.  The package is responsible for efficient access to the input image data that will be used either to fit a new machine learning model or to predict regions of interest in novel inputs using a previously trained model.

## Installation

If you are running this notebook on Google Colab or another system where `histomics_stream` and its dependencies are not yet installed, they can be installed with the following commands.  Note that image readers other than openslide are also supported; to install them, use, e.g., `large_image[openslide,ometiff,openjpeg,bioformats]` in the `pip install` command below.

In [None]:
# Get histomics_stream and its dependencies
!apt update
!apt install -y python3-openslide openslide-tools
!pip install 'large_image[openslide]' --find-links https://girder.github.io/large_image_wheels
!pip install histomics_stream

# Get other packages used in this notebook
# N.B. itkwidgets works with jupyter<=3.0.0
!apt install -y libcudnn8 libcudnn8-dev
!pip install histomics_detect pooch itkwidgets
!jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-matplotlib jupyterlab-datawidgets itkwidgets

print(
    "\nNOTE!: On Google Colab you may need to choose 'Runtime->Restart runtime' for these updates to take effect."
)
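
Before running the install commands above, it can be useful to check which packages are already importable.  The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the module names listed are assumed to match the pip packages installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed to correspond to the packages installed above.
needed = ["histomics_stream", "large_image", "pooch"]
print(f"Missing packages: {missing_packages(needed)}")
```

If the printed list is empty, the installation cell can likely be skipped.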

## Fetching and creating the test data
This notebook has demonstrations that use the files `TCGA-AN-A0G0-01Z-00-DX1.svs` (365 MB) and `TCGA-AN-A0G0-01Z-00-DX1.mask.png` (4 kB).  The pooch commands will fetch them if they are not already available.

In [2]:
import os
import pooch

# download whole slide image
wsi_path = pooch.retrieve(
    fname="TCGA-AN-A0G0-01Z-00-DX1.svs",
    url="https://northwestern.box.com/shared/static/qelyzb45bigg6sqyumtj8kt2vwxztpzm",
    known_hash="d046f952759ff6987374786768fc588740eef1e54e4e295a684f3bd356c8528f",
    path=str(pooch.os_cache("pooch")) + os.sep + "wsi",
)
print(f"Have {wsi_path}")

# download binary mask image
mask_path = pooch.retrieve(
    fname="TCGA-AN-A0G0-01Z-00-DX1.mask.png",
    url="https://northwestern.box.com/shared/static/2q13q2r83avqjz9glrpt3s3nop6uhi2i",
    known_hash="bb657ead9fd3b8284db6ecc1ca8a1efa57a0e9fd73d2ea63ce6053fbd3d65171",
    path=str(pooch.os_cache("pooch")) + os.sep + "wsi",
)
print(f"Have {mask_path}")

Downloading data from 'https://northwestern.box.com/shared/static/qelyzb45bigg6sqyumtj8kt2vwxztpzm' to file '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs'.
Downloading data from 'https://northwestern.box.com/shared/static/2q13q2r83avqjz9glrpt3s3nop6uhi2i' to file '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.mask.png'.


Have /root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs
Have /root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.mask.png
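
The `known_hash` values passed to `pooch.retrieve` above have no algorithm prefix, which pooch treats as SHA-256.  As a rough sketch of what such a check amounts to, the hypothetical helper `sha256_of_file` below computes the digest of a file in chunks; it is demonstrated on a small temporary file rather than on the 365 MB slide.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    demo_path = tmp.name
demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_hash == hashlib.sha256(b"example bytes").hexdigest())
```

Calling `sha256_of_file(wsi_path)` and comparing the result with the `known_hash` string would be one way to re-verify a cached download by hand.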


## Creating a study for use with histomics_stream

We describe the input and desired parameters using standard Python lists and dictionaries.  Here we give a high-level configuration; selection of tiles is done subsequently.

N.B.: __*all*__ values that are numbers of pixels are based upon the `target_magnification` that is supplied to `FindResolutionForSlide`.  This includes the pixel sizes of a slide, chunk, or tile, and it includes the pixel coordinates for a chunk or tile.  It applies whether the numbers are supplied to histomics_stream or returned by histomics_stream.  However, if the `magnification_source` is not `exact`, the `returned_magnification` may not equal the `target_magnification`; to get the number of pixels that is relevant for the `returned_magnification`, these numbers of pixels are typically multiplied by the ratio `returned_magnification / target_magnification`.  In particular, the *pixel size of the returned tiles* will be the requested size times this ratio.

In [3]:
import histomics_stream as hs
import histomics_stream.tensorflow
import tensorflow as tf

# Create a study and insert study-wide information
my_study0 = {"version": "version-1"}
my_study0["number_pixel_rows_for_tile"] = 256
my_study0["number_pixel_columns_for_tile"] = 256
my_slides = my_study0["slides"] = {}

# Add a slide to the study, including slide-wide information with it.
my_slide0 = my_slides["Slide_0"] = {}
my_slide0["filename"] = wsi_path
my_slide0["slide_name"] = "TCGA-AN-A0G0-01Z-00-DX1"
my_slide0["slide_group"] = "Group 3"
my_slide0["number_pixel_rows_for_chunk"] = 2048
my_slide0["number_pixel_columns_for_chunk"] = 2048

# For each slide, find the appropriate resolution given the target_magnification and
# magnification_tolerance.  In this example, we use the same parameters for each slide,
# but this is not required generally.
find_resolution_for_slide = hs.configure.FindResolutionForSlide(
    my_study0, target_magnification=20, magnification_source="native"
)
for slide in my_study0["slides"].values():
    find_resolution_for_slide(slide)
print(f"my_study0 = {my_study0}")

Using python for large_image caching


my_study0 = {'version': 'version-1', 'number_pixel_rows_for_tile': 256, 'number_pixel_columns_for_tile': 256, 'slides': {'Slide_0': {'filename': '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs', 'slide_name': 'TCGA-AN-A0G0-01Z-00-DX1', 'slide_group': 'Group 3', 'number_pixel_rows_for_chunk': 2048, 'number_pixel_columns_for_chunk': 2048, 'target_magnification': 20.0, 'scan_magnification': 40.0, 'read_magnification': 40.0, 'returned_magnification': 40.0, 'level': 8, 'number_pixel_rows_for_slide': 20572, 'number_pixel_columns_for_slide': 27607}}}
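
In the output above, `target_magnification` is 20.0 but `returned_magnification` is 40.0, so pixel counts expressed at the target magnification need to be scaled by a factor of 2 to describe the data that is actually returned.  The sketch below works through that arithmetic; the helper name is hypothetical and the rounding to an integer is an assumption.

```python
def pixels_at_returned_magnification(
    number_of_pixels, target_magnification, returned_magnification
):
    """Scale a pixel count from the target to the returned magnification."""
    ratio = returned_magnification / target_magnification
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(number_of_pixels * ratio)


# Values taken from the study printed above.
returned_tile_rows = pixels_at_returned_magnification(256, 20.0, 40.0)
print(returned_tile_rows)  # 512
```

This is consistent with the `(512, 512, 3)` tile shape reported later in this notebook for a requested tile size of 256.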


## Tile selection

We are going to demonstrate several approaches to choosing tiles.  Each approach will start with its own copy of the `my_study0` that we have built so far.

In [4]:
import copy

In [5]:
# Demonstrate TilesByGridAndMask without a mask
my_study_tiles_by_grid = copy.deepcopy(my_study0)
tiles_by_grid = hs.configure.TilesByGridAndMask(
    my_study_tiles_by_grid,
    number_pixel_overlap_rows_for_tile=32,
    number_pixel_overlap_columns_for_tile=32,
    randomly_select=5,
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_grid["slides"].values():
    tiles_by_grid(slide)
print(f"my_study_tiles_by_grid = {my_study_tiles_by_grid}")

my_study_tiles_by_grid = {'version': 'version-1', 'number_pixel_rows_for_tile': 256, 'number_pixel_columns_for_tile': 256, 'slides': {'Slide_0': {'filename': '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs', 'slide_name': 'TCGA-AN-A0G0-01Z-00-DX1', 'slide_group': 'Group 3', 'number_pixel_rows_for_chunk': 2048, 'number_pixel_columns_for_chunk': 2048, 'target_magnification': 20.0, 'scan_magnification': 40.0, 'read_magnification': 40.0, 'returned_magnification': 40.0, 'level': 8, 'number_pixel_rows_for_slide': 20572, 'number_pixel_columns_for_slide': 27607, 'number_tile_rows_for_slide': 91, 'number_tile_columns_for_slide': 123, 'tiles': {'tile_2912': {'tile_top': 5152, 'tile_left': 18592}, 'tile_6396': {'tile_top': 11648, 'tile_left': 0}, 'tile_6517': {'tile_top': 11648, 'tile_left': 27104}, 'tile_10782': {'tile_top': 19488, 'tile_left': 18144}, 'tile_10876': {'tile_top': 19712, 'tile_left': 11648}}}}}
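
The numbers in the output above can be reproduced with simple grid arithmetic.  The following sketch is inferred from the printed values rather than taken from the library source, so treat it as an illustration: it assumes the grid stride is the tile size minus the overlap, that only tiles lying fully inside the slide are counted, and that tile names are row-major indices.  The helper names are hypothetical.

```python
def grid_layout(slide_pixels, tile_pixels, overlap_pixels):
    """Return (stride, number of tiles) along one axis of the slide."""
    stride = tile_pixels - overlap_pixels
    count = (slide_pixels - tile_pixels) // stride + 1
    return stride, count


def tile_from_index(index, row_stride, column_stride, number_tile_columns):
    """Recover a tile's top-left corner from its row-major grid index."""
    row, column = divmod(index, number_tile_columns)
    return {"tile_top": row * row_stride, "tile_left": column * column_stride}


# Slide size, tile size, and overlap taken from the cell above.
row_stride, number_tile_rows = grid_layout(20572, 256, 32)
column_stride, number_tile_columns = grid_layout(27607, 256, 32)
print(number_tile_rows, number_tile_columns)  # 91 123
print(tile_from_index(2912, row_stride, column_stride, number_tile_columns))
# {'tile_top': 5152, 'tile_left': 18592}
```

The same arithmetic with an overlap of 0 gives an 80 by 107 grid for this slide.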


In [6]:
# Demonstrate TilesByGridAndMask with a mask
my_study_tiles_by_grid_and_mask = copy.deepcopy(my_study0)
tiles_by_grid_and_mask = hs.configure.TilesByGridAndMask(
    my_study_tiles_by_grid_and_mask,
    number_pixel_overlap_rows_for_tile=0,
    number_pixel_overlap_columns_for_tile=0,
    mask_filename=mask_path,
    randomly_select=10,
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_grid_and_mask["slides"].values():
    tiles_by_grid_and_mask(slide)
print(f"my_study_tiles_by_grid_and_mask = {my_study_tiles_by_grid_and_mask}")

my_study_tiles_by_grid_and_mask = {'version': 'version-1', 'number_pixel_rows_for_tile': 256, 'number_pixel_columns_for_tile': 256, 'slides': {'Slide_0': {'filename': '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs', 'slide_name': 'TCGA-AN-A0G0-01Z-00-DX1', 'slide_group': 'Group 3', 'number_pixel_rows_for_chunk': 2048, 'number_pixel_columns_for_chunk': 2048, 'target_magnification': 20.0, 'scan_magnification': 40.0, 'read_magnification': 40.0, 'returned_magnification': 40.0, 'level': 8, 'number_pixel_rows_for_slide': 20572, 'number_pixel_columns_for_slide': 27607, 'number_tile_rows_for_slide': 80, 'number_tile_columns_for_slide': 107, 'number_pixel_rows_for_mask': 642, 'number_pixel_columns_for_mask': 862, 'tiles': {'tile_298': {'tile_top': 512, 'tile_left': 21504}, 'tile_2758': {'tile_top': 6400, 'tile_left': 21248}, 'tile_3693': {'tile_top': 8704, 'tile_left': 14080}, 'tile_4256': {'tile_top': 9984, 'tile_left': 21248}, 'tile_4431': {'tile_top': 10496, 'tile_left': 11264}, 'tile_4
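
The mask reported above (642 by 862 pixels) is much smaller than the slide (20572 by 27607 pixels), so each mask pixel stands for roughly a 32 by 32 block of slide pixels.  The sketch below shows one plausible way to relate a tile to the mask: map the tile's bounds proportionally into mask coordinates and compute the fraction of nonzero mask pixels underneath it.  How `histomics_stream` actually evaluates the mask is not described in this notebook, so the helper `mask_fraction_for_tile` and its logic are assumptions for illustration only.

```python
def mask_fraction_for_tile(
    mask, tile_top, tile_left, tile_rows, tile_columns, slide_rows, slide_columns
):
    """Fraction of nonzero mask pixels under a tile (hypothetical helper)."""
    mask_rows, mask_columns = len(mask), len(mask[0])
    # Map slide-pixel bounds to mask-pixel bounds by proportional scaling,
    # always covering at least one mask pixel.
    r0 = tile_top * mask_rows // slide_rows
    r1 = max(r0 + 1, (tile_top + tile_rows) * mask_rows // slide_rows)
    c0 = tile_left * mask_columns // slide_columns
    c1 = max(c0 + 1, (tile_left + tile_columns) * mask_columns // slide_columns)
    covered = [
        mask[r][c] != 0
        for r in range(r0, min(r1, mask_rows))
        for c in range(c0, min(c1, mask_columns))
    ]
    return sum(covered) / len(covered)


# Toy 4x4 mask whose left half is foreground, for a toy 1024x1024 slide.
toy_mask = [[1, 1, 0, 0] for _ in range(4)]
inside = mask_fraction_for_tile(toy_mask, 0, 0, 256, 256, 1024, 1024)
outside = mask_fraction_for_tile(toy_mask, 0, 768, 256, 256, 1024, 1024)
straddling = mask_fraction_for_tile(toy_mask, 0, 256, 512, 512, 1024, 1024)
print(inside, outside, straddling)  # 1.0 0.0 0.5
```

Under this reading, a tile would be kept when its fraction meets a chosen threshold; whether that matches the library's `mask_threshold` semantics is not confirmed here.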

In [7]:
# Demonstrate TilesByList
my_study_tiles_by_list = copy.deepcopy(my_study0)
tiles_by_list = hs.configure.TilesByList(
    my_study_tiles_by_list,
    randomly_select=5,
    tiles_dictionary=my_study_tiles_by_grid["slides"]["Slide_0"]["tiles"],
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_list["slides"].values():
    tiles_by_list(slide)
print(f"my_study_tiles_by_list = {my_study_tiles_by_list}")

my_study_tiles_by_list = {'version': 'version-1', 'number_pixel_rows_for_tile': 256, 'number_pixel_columns_for_tile': 256, 'slides': {'Slide_0': {'filename': '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs', 'slide_name': 'TCGA-AN-A0G0-01Z-00-DX1', 'slide_group': 'Group 3', 'number_pixel_rows_for_chunk': 2048, 'number_pixel_columns_for_chunk': 2048, 'target_magnification': 20.0, 'scan_magnification': 40.0, 'read_magnification': 40.0, 'returned_magnification': 40.0, 'level': 8, 'number_pixel_rows_for_slide': 20572, 'number_pixel_columns_for_slide': 27607, 'tiles': {'tile_2912': {'tile_top': 5152, 'tile_left': 18592}, 'tile_6396': {'tile_top': 11648, 'tile_left': 0}, 'tile_6517': {'tile_top': 11648, 'tile_left': 27104}, 'tile_10782': {'tile_top': 19488, 'tile_left': 18144}, 'tile_10876': {'tile_top': 19712, 'tile_left': 11648}}}}}
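
The `tiles_dictionary` does not have to come from another tile selector.  Judging by the printed output above, it maps tile names to dictionaries with `tile_top` and `tile_left` keys, so it can be built by hand.  The helper below is a hypothetical sketch; the bounds check reflects an assumption that tiles should lie fully within the slide.

```python
def make_tiles_dictionary(
    top_left_pairs, tile_rows, tile_columns, slide_rows, slide_columns
):
    """Build a tiles dictionary from (top, left) pairs, checking bounds."""
    tiles = {}
    for index, (top, left) in enumerate(top_left_pairs):
        fits = (
            0 <= top
            and 0 <= left
            and top + tile_rows <= slide_rows
            and left + tile_columns <= slide_columns
        )
        if not fits:
            raise ValueError(f"Tile at ({top}, {left}) does not fit in the slide")
        tiles[f"tile_{index}"] = {"tile_top": top, "tile_left": left}
    return tiles


# Slide and tile sizes taken from the study printed above.
hand_made_tiles = make_tiles_dictionary(
    [(0, 0), (5152, 18592)], 256, 256, 20572, 27607
)
print(hand_made_tiles)
```

A dictionary built this way could then be supplied as the `tiles_dictionary` argument in the cell above.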


In [8]:
# Demonstrate TilesRandomly
my_study_tiles_randomly = copy.deepcopy(my_study0)
tiles_randomly = hs.configure.TilesRandomly(my_study_tiles_randomly, randomly_select=10)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_randomly["slides"].values():
    tiles_randomly(slide)
print(f"my_study_tiles_randomly = {my_study_tiles_randomly}")

my_study_tiles_randomly = {'version': 'version-1', 'number_pixel_rows_for_tile': 256, 'number_pixel_columns_for_tile': 256, 'slides': {'Slide_0': {'filename': '/root/.cache/pooch/wsi/TCGA-AN-A0G0-01Z-00-DX1.svs', 'slide_name': 'TCGA-AN-A0G0-01Z-00-DX1', 'slide_group': 'Group 3', 'number_pixel_rows_for_chunk': 2048, 'number_pixel_columns_for_chunk': 2048, 'target_magnification': 20.0, 'scan_magnification': 40.0, 'read_magnification': 40.0, 'returned_magnification': 40.0, 'level': 8, 'number_pixel_rows_for_slide': 20572, 'number_pixel_columns_for_slide': 27607, 'tiles': {'tile_0': {'tile_top': 4641, 'tile_left': 5282}, 'tile_1': {'tile_top': 598, 'tile_left': 12915}, 'tile_2': {'tile_top': 18388, 'tile_left': 21249}, 'tile_3': {'tile_top': 13223, 'tile_left': 19118}, 'tile_4': {'tile_top': 13723, 'tile_left': 16106}, 'tile_5': {'tile_top': 5966, 'tile_left': 9924}, 'tile_6': {'tile_top': 6321, 'tile_left': 18795}, 'tile_7': {'tile_top': 14803, 'tile_left': 13870}, 'tile_8': {'tile_top': 
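
Unlike the grid-based selections, the tile coordinates above are not multiples of a stride.  A minimal sketch of this kind of selection follows; it assumes each top-left corner is drawn uniformly from the positions where the tile fits entirely inside the slide, which is a guess at the behavior rather than a description of the library's implementation.  The helper name and the `seed` parameter are hypothetical.

```python
import random


def random_tiles(
    number_of_tiles, tile_rows, tile_columns, slide_rows, slide_columns, seed=None
):
    """Pick tile corners uniformly at random so each tile fits in the slide."""
    rng = random.Random(seed)
    return {
        f"tile_{i}": {
            # randint is inclusive at both ends.
            "tile_top": rng.randint(0, slide_rows - tile_rows),
            "tile_left": rng.randint(0, slide_columns - tile_columns),
        }
        for i in range(number_of_tiles)
    }


# Slide and tile sizes taken from the study printed above.
sampled_tiles = random_tiles(10, 256, 256, 20572, 27607, seed=0)
print(len(sampled_tiles))  # 10
```

Fixing the seed makes the selection reproducible, which can help when comparing runs.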

## Creating a TensorFlow Dataset

We request tiles indicated by the mask and create a TensorFlow Dataset that has the image data for these tiles as well as associated parameters for each tile, such as its location.

In [9]:
# Demonstrate TilesByGridAndMask with a mask
my_study_of_tiles = copy.deepcopy(my_study0)
tiles_by_grid_and_mask = hs.configure.TilesByGridAndMask(
    my_study_of_tiles,
    number_pixel_overlap_rows_for_tile=0,
    number_pixel_overlap_columns_for_tile=0,
    mask_filename=mask_path,
    mask_threshold=0.5,
    randomly_select=100,
)
for slide in my_study_of_tiles["slides"].values():
    tiles_by_grid_and_mask(slide)
print("Finished selecting tiles.")

create_tensorflow_dataset = hs.tensorflow.CreateTensorFlowDataset()
tiles = create_tensorflow_dataset(my_study_of_tiles)
print("Finished with CreateTensorFlowDataset")
print(f"... with tile shape = {tiles.take(1).get_single_element()[0][0].shape}")

Finished selecting tiles.


2022-04-26 18:24:41.817040: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-26 18:24:42.505833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9650 MB memory:  -> device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:db:00.0, compute capability: 7.5


Finished with CreateTensorFlowDataset
... with tile shape = (512, 512, 3)


## Fetch a model for prediction

We fetch a model (840 MB compressed, 1.3 GB decompressed) that we will use to make predictions.

Because each element of our Dataset is a tuple `(rgb_image_data, dictionary_of_annotation)`, a typical model that accepts only the former as its input needs to be wrapped.

Note that this model assumes that the tiles/images are not batched, with the understanding that if there is enough memory to do batching then one should instead choose a larger tile size. 

In [10]:
# download trained model.
model_path = pooch.retrieve(
    fname="tcga_brca_model",
    url="https://northwestern.box.com/shared/static/4g6idrqlpvgxnsktz8pym5386njyvyb6",
    known_hash="b5b5444cc8874d17811a89261abeafd9b9603e7891a8b2a98d8f13e2846a6689",
    path=str(pooch.os_cache("pooch")) + os.sep + "model",
    processor=pooch.Unzip(),
)
model_path = os.path.split(model_path[0])[0]
print(f"Have {model_path}.")

# restore keras model
from histomics_detect.models import FasterRCNN

model = tf.keras.models.load_model(
    model_path, custom_objects={"FasterRCNN": FasterRCNN}
)

# Each element of the `tiles` tensorflow Dataset is a (rgb_image_data, dictionary_of_annotation) pair.
# Wrap the unwrapped_model so that it knows to use the image.
class WrappedModel(tf.keras.Model):
    def __init__(self, model, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = model

    def call(self, element):
        return (self.model(element[0]), element[1])


unwrapped_model = model
model = WrappedModel(unwrapped_model)
print("Model built and wrapped.")

Downloading data from 'https://northwestern.box.com/shared/static/4g6idrqlpvgxnsktz8pym5386njyvyb6' to file '/root/.cache/pooch/model/tcga_brca_model'.
Unzipping contents of '/root/.cache/pooch/model/tcga_brca_model' to '/root/.cache/pooch/model/tcga_brca_model.unzip'


Have /root/.cache/pooch/model/tcga_brca_model.unzip/tcga_brca_model.
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50v2_weights_tf_dim_ordering_tf_kernels_notop.h5


2022-04-26 18:25:24.536937: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400
2022-04-26 18:25:36.339473: W tensorflow/core/common_runtime/graph_constructor.cc:803] Node 'gradients/PartitionedCall_6_grad/PartitionedCall' has 3 outputs but the _output_shapes attribute specifies shapes for 33 outputs. Output shapes may be inaccurate.
2022-04-26 18:25:36.339574: W tensorflow/core/common_runtime/graph_constructor.cc:803] Node 'gradients/PartitionedCall_4_grad/PartitionedCall' has 3 outputs but the _output_shapes attribute specifies shapes for 33 outputs. Output shapes may be inaccurate.
2022-04-26 18:25:36.339629: W tensorflow/core/common_runtime/graph_constructor.cc:803] Node 'gradients/PartitionedCall_5_grad/PartitionedCall' has 3 outputs but the _output_shapes attribute specifies shapes for 33 outputs. Output shapes may be inaccurate.
2022-04-26 18:25:36.339665: W tensorflow/core/common_runtime/graph_constructor.cc:803] Node 'gradients/PartitionedCall_2_grad/P

Model built and wrapped.


## Make predictions

In [11]:
import time

print("Starting predictions")
start_time = time.time()
# This model assumes that the tiles are not batched.  Do not use, e.g., tiles.batch(32).
predictions = model.predict(tiles)
end_time = time.time()
number_of_inputs = sum(1 for _ in tiles)
number_of_predictions = predictions[0].shape[0]
print(
    f"Made {number_of_predictions} predictions for {number_of_inputs} tiles in {end_time - start_time} s."
)
print(f"Average of {(end_time - start_time) / number_of_inputs} s per tile.")

Starting predictions
Made 8168 predictions for 100 tiles in 12.364816904067993 s.
Average of 0.12364816904067993 s per tile.


## Look at internals

In [12]:
my_element = tiles.take(1).get_single_element()
my_pair = my_element[0]
my_target = my_element[1]
my_weight = my_element[2]
my_image = my_pair[0]
my_annotation = my_pair[1]

print(f"   type(my_element) = {type(my_element)}")
print(f"    len(my_element) = {len(my_element)}")
print(f"      type(my_pair) = {type(my_pair)}")
print(f"       len(my_pair) = {len(my_pair)}")
print(f"    type(my_target) = {type(my_target)}")
print(f"    type(my_weight) = {type(my_weight)}")
print(f"     type(my_image) = {type(my_image)}")
print(f"     my_image.shape = {my_image.shape}")
print(f"type(my_annotation) = {type(my_annotation)}")

   type(my_element) = <class 'tuple'>
    len(my_element) = 3
      type(my_pair) = <class 'tuple'>
       len(my_pair) = 2
    type(my_target) = <class 'NoneType'>
    type(my_weight) = <class 'NoneType'>
     type(my_image) = <class 'tensorflow.python.framework.ops.EagerTensor'>
     my_image.shape = (512, 512, 3)
type(my_annotation) = <class 'dict'>
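
The structure printed above explains why the wrapped model indexes its input with `[0]` and `[1]`: each Dataset element is a 3-tuple in the `(x, y, sample_weight)` layout that Keras expects, and `x` is itself the `(image, annotation)` pair.  The plain-Python sketch below mimics that flow with stand-in objects; all names in it are hypothetical and no TensorFlow is involved.

```python
def keras_style_unpack(dataset_element):
    """Split a dataset element into (x, y, sample_weight)."""
    x, y, sample_weight = dataset_element
    return x, y, sample_weight


class WrappedCallable:
    """Plain-Python analogue of a wrapper that applies a model to the image only."""

    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        image, annotation = x
        return self.model(image), annotation


# Stand-ins for a real tile, its annotation, and a real model.
fake_image = [[0, 0, 0]]
fake_annotation = {"tile_top": 512, "tile_left": 21504}
fake_element = ((fake_image, fake_annotation), None, None)

x, y, sample_weight = keras_style_unpack(fake_element)
prediction, annotation = WrappedCallable(len)(x)
print(prediction, annotation)
```

Here `len` plays the role of the model; the annotation dictionary passes through untouched, so each prediction stays paired with the tile it came from.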


## Display a tile

In [13]:
import itk, itkwidgets

itkwidgets.view(itk.image_from_array(my_image.numpy(), is_vector=True))

Viewer(geometries=[], gradient_opacity=0.22, point_sets=[], rendered_image=<itk.itkImagePython.itkImageRGBUC2;…