# Demonstration of histomics_stream

Click to open in [[GitHub](https://github.com/DigitalSlideArchive/histomics_stream/tree/master/example/tensorflow_stream.ipynb)] [[Google Colab](https://colab.research.google.com/github/DigitalSlideArchive/histomics_stream/blob/master/example/tensorflow_stream.ipynb)]

The `histomics_stream` Python package sits at the start of a machine learning workflow built on the TensorFlow library.  It provides efficient access to the input image data, whether that data is used to fit a new model or to predict regions of interest in novel inputs with an already trained model.

## Installation

If you are running this notebook on Google Colab or another system where `histomics_stream` and its dependencies are not yet installed, install them with the following commands.

In [None]:
!apt update
!apt install -y python3-openslide openslide-tools
!pip uninstall -y histomics_stream large_image tensorflow
!pip install histomics_stream 'large_image[all]' --find-links https://girder.github.io/large_image_wheels
print(
    "NOTE!: On Google Colab you may need to choose 'Runtime->Restart runtime' for these updates to take effect."
)
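
Before moving on, it can help to confirm that the packages are importable in the current runtime. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is ours, and the module names are the ones imported later in this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules that the later cells of this notebook import.
required = ["histomics_stream", "large_image", "tensorflow"]
missing = missing_packages(required)
if missing:
    print(f"Not installed: {missing}. Run the installation cell above.")
else:
    print("All required packages are importable.")
```

If a package is still reported as missing right after installation on Google Colab, restarting the runtime as noted above is the first thing to try.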

## Fetching and creating the test data
This notebook has demonstrations that use the files `TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.svs` (1.2 GB) and `TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F-mask.png` (28 kB).  If we don't already have them, the first is downloaded and the second is created with random contents.

In [None]:
import os

remote_filename = "https://tiatoolbox.dcs.warwick.ac.uk/sample_wsis/TCGA-3L-AA1B-01Z-00-DX1.8923A151-A690-40B7-9E5A-FCBEDFC2394F.svs"
local_filename = remote_filename.split("/")[-1]
if not os.path.exists(local_filename):
    print(f"Downloading {remote_filename} ...")
    import requests
    import shutil

    # This does not decompress zip files or anything like that.
    with requests.get(remote_filename, stream=True) as r:
        # Fail early rather than saving an HTTP error page as the slide file.
        r.raise_for_status()
        with open(local_filename, "wb") as f:
            shutil.copyfileobj(r.raw, f)
print(f"Have {local_filename}.")

# Create a random mask for the image if necessary
mask_filename = os.path.splitext(local_filename)[0] + "-mask.png"
if not os.path.exists(mask_filename):
    import large_image

    print(f"Creating {mask_filename} ...")
    ts = large_image.open(local_filename)
    import numpy as np
    from PIL import Image

    arr = np.random.randint(0, 2, (ts.sizeY // 256, ts.sizeX // 256), dtype=np.int8)
    im = Image.fromarray(arr)
    im.save(mask_filename)
print(f"Have {mask_filename}.")
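
The mask created above has one pixel for each 256 × 256 block of the slide's full-resolution pixels, because its shape is `(ts.sizeY // 256, ts.sizeX // 256)`. The sketch below spells out that arithmetic; the function names and the slide size are hypothetical and chosen only for illustration.

```python
def mask_shape(slide_height, slide_width, block=256):
    """Rows and columns of a mask with one pixel per block x block region."""
    return (slide_height // block, slide_width // block)


def mask_pixel_for(row, column, block=256):
    """Mask pixel that covers the full-resolution pixel at (row, column)."""
    return (row // block, column // block)


# Hypothetical slide size, for illustration only.
print(mask_shape(50_000, 80_000))  # (195, 312)
print(mask_pixel_for(1000, 3000))  # (3, 11)
```

Integer division discards any partial block at the bottom and right edges, so those edge pixels have no corresponding mask pixel.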

## Creating a study for use with histomics_stream

We describe the input and the desired parameters using standard Python lists and dictionaries.  Here we give a high-level configuration; tiles are selected in a later step.

In [None]:
import copy
import histomics_stream as hs
import tensorflow

# Create a study and insert study-wide information
my_study0 = {"version": "version-1"}
my_study0["number_pixel_rows_for_tile"] = 256
my_study0["number_pixel_columns_for_tile"] = 256
my_slides = my_study0["slides"] = {}

# Add a slide to the study, including slide-wide information with it.
my_slide0 = my_slides["Slide_0"] = {}
my_slide0["filename"] = local_filename
my_slide0["slide_name"] = "local_filename_0"
my_slide0["slide_group"] = "control"
my_slide0["number_pixel_rows_for_chunk"] = 2048
my_slide0["number_pixel_columns_for_chunk"] = 2048

# For each slide, find the appropriate resolution given the desired_magnification and
# magnification_tolerance.  In this example, we use the same parameters for each slide,
# but this is not required generally.
find_resolution_for_slide = hs.configure.FindResolutionForSlide(
    my_study0, desired_magnification=20, magnification_tolerance=0.02
)
for slide in my_study0["slides"].values():
    find_resolution_for_slide(slide)
print(f"my_study0 = {my_study0}")
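
Because a study is an ordinary dictionary, adding more slides is a matter of inserting more entries under `"slides"`. The helper below is a sketch of that pattern using only the keys shown above; `add_slide` is our own name, not part of `histomics_stream`, and the file names are placeholders.

```python
def add_slide(study, key, filename, slide_name, slide_group,
              chunk_rows=2048, chunk_columns=2048):
    """Insert one slide description into study["slides"] and return it."""
    slide = study["slides"][key] = {
        "filename": filename,
        "slide_name": slide_name,
        "slide_group": slide_group,
        "number_pixel_rows_for_chunk": chunk_rows,
        "number_pixel_columns_for_chunk": chunk_columns,
    }
    return slide


study = {
    "version": "version-1",
    "number_pixel_rows_for_tile": 256,
    "number_pixel_columns_for_tile": 256,
    "slides": {},
}
# Placeholder file names; substitute real slide files.
add_slide(study, "Slide_0", "first.svs", "first", "control")
add_slide(study, "Slide_1", "second.svs", "second", "treated",
          chunk_rows=1024, chunk_columns=1024)
print(sorted(study["slides"]))  # ['Slide_0', 'Slide_1']
```

Each slide carries its own chunk size here, which reflects the point that per-slide parameters need not be identical.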

## Tile selection

We are going to demonstrate several approaches to choosing tiles.  Each approach will start with its own copy of the `my_study0` that we have built so far.

In [None]:
# Demonstrate TilesByGridAndMask without a mask
my_study_tiles_by_grid = copy.deepcopy(my_study0)
tiles_by_grid = hs.configure.TilesByGridAndMask(
    my_study_tiles_by_grid,
    number_pixel_overlap_rows_for_tile=32,
    number_pixel_overlap_columns_for_tile=32,
    randomly_select=5,
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_grid["slides"].values():
    tiles_by_grid(slide)
print(f"my_study_tiles_by_grid = {my_study_tiles_by_grid}")
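
To make the overlap parameters concrete: with 256-pixel tiles and a 32-pixel overlap, consecutive tiles start 256 − 32 = 224 pixels apart. The sketch below lays out such a grid and then picks five tiles at random. It is an illustration, not the library's implementation; in particular, keeping only tiles that fit entirely inside the image is an assumption.

```python
import random


def grid_origins(image_size, tile_size, overlap):
    """Offsets along one axis for tiles that fit entirely within the image."""
    stride = tile_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile_size")
    return list(range(0, image_size - tile_size + 1, stride))


def grid_tiles(height, width, tile_rows, tile_columns, overlap_rows, overlap_columns):
    """All (top, left) tile corners of the overlapping grid."""
    tops = grid_origins(height, tile_rows, overlap_rows)
    lefts = grid_origins(width, tile_columns, overlap_columns)
    return [(top, left) for top in tops for left in lefts]


# Hypothetical 1024 x 1024 image, for illustration only.
all_tiles = grid_tiles(1024, 1024, 256, 256, 32, 32)
print(grid_origins(1024, 256, 32))  # [0, 224, 448, 672]
print(len(all_tiles))  # 16
chosen = random.sample(all_tiles, 5)  # analogous to randomly_select=5
```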

In [None]:
# Demonstrate TilesByGridAndMask with a mask
my_study_tiles_by_grid_and_mask = copy.deepcopy(my_study0)
tiles_by_grid_and_mask = hs.configure.TilesByGridAndMask(
    my_study_tiles_by_grid_and_mask,
    number_pixel_overlap_rows_for_tile=0,
    number_pixel_overlap_columns_for_tile=0,
    mask_filename=mask_filename,
    randomly_select=10,
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_grid_and_mask["slides"].values():
    tiles_by_grid_and_mask(slide)
print(f"my_study_tiles_by_grid_and_mask = {my_study_tiles_by_grid_and_mask}")
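
Conceptually, a mask restricts the grid to the tiles whose mask pixels are nonzero. The following sketch shows that idea for the simplest case, where tiles do not overlap and each mask pixel lines up with exactly one tile. Those alignment conditions are assumptions made for illustration; how the library treats masks of other resolutions or partially covered tiles is not shown here.

```python
def tiles_allowed_by_mask(mask, tile_size=256):
    """Return (top, left) for every tile whose mask pixel is nonzero."""
    return [
        (row * tile_size, column * tile_size)
        for row, mask_row in enumerate(mask)
        for column, value in enumerate(mask_row)
        if value
    ]


# A tiny hand-written mask: 2 rows by 3 columns of tiles.
mask = [
    [0, 1, 0],
    [1, 1, 0],
]
print(tiles_allowed_by_mask(mask))  # [(0, 256), (256, 0), (256, 256)]
```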

In [None]:
# Demonstrate TilesByList
my_study_tiles_by_list = copy.deepcopy(my_study0)
tiles_by_list = hs.configure.TilesByList(
    my_study_tiles_by_list,
    randomly_select=5,
    tiles_dictionary=my_study_tiles_by_grid["slides"]["Slide_0"]["tiles"],
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_by_list["slides"].values():
    tiles_by_list(slide)
print(f"my_study_tiles_by_list = {my_study_tiles_by_list}")
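
The cell above reuses the tiles chosen by an earlier step and selects a random subset of them. As a rough picture of that subsetting, here is a sketch that samples entries from a dictionary of tiles. The key names `tile_0`, `tile_top`, and `tile_left` are hypothetical stand-ins; print `my_study_tiles_by_grid` to see the real structure.

```python
import random


def sample_tiles(tiles_dictionary, count, seed=None):
    """Return a new dictionary holding at most `count` randomly chosen tiles."""
    rng = random.Random(seed)
    count = min(count, len(tiles_dictionary))
    keys = rng.sample(sorted(tiles_dictionary), count)
    return {key: tiles_dictionary[key] for key in keys}


# Hypothetical tile entries, for illustration only.
example_tiles = {
    "tile_0": {"tile_top": 0, "tile_left": 0},
    "tile_1": {"tile_top": 0, "tile_left": 224},
    "tile_2": {"tile_top": 224, "tile_left": 0},
}
subset = sample_tiles(example_tiles, 2, seed=0)
print(len(subset))  # 2
```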

In [None]:
# Demonstrate TilesRandomly
my_study_tiles_randomly = copy.deepcopy(my_study0)
tiles_randomly = hs.configure.TilesRandomly(
    my_study_tiles_randomly,
    randomly_select=10,
)
# We could apply this to a subset of the slides, but we will apply it to all slides in
# this example.
for slide in my_study_tiles_randomly["slides"].values():
    tiles_randomly(slide)
print(f"my_study_tiles_randomly = {my_study_tiles_randomly}")
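
Unlike the grid-based approaches, random selection does not need tile corners to fall on a regular lattice. A minimal sketch of the idea, assuming each tile must fit entirely inside the image and that corners are drawn uniformly (both are assumptions, not confirmed library behavior):

```python
import random


def random_tiles(height, width, tile_rows, tile_columns, count, seed=None):
    """Draw `count` (top, left) corners for tiles that fit inside the image."""
    if height < tile_rows or width < tile_columns:
        raise ValueError("image is smaller than one tile")
    rng = random.Random(seed)
    return [
        # randint is inclusive at both ends, so the last valid corner can occur.
        (rng.randint(0, height - tile_rows), rng.randint(0, width - tile_columns))
        for _ in range(count)
    ]


# Hypothetical 4096 x 4096 image, for illustration only.
corners = random_tiles(4096, 4096, 256, 256, count=10, seed=42)
print(len(corners))  # 10
```

Tiles drawn this way may overlap each other, which a grid with zero overlap never produces.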

## Creating a TensorFlow Dataset

We request the tiles indicated by the mask and create a TensorFlow Dataset that holds the image data for these tiles along with associated parameters for each tile, such as its location.

In [None]:
# Demonstrate TilesByGridAndMask with a mask
my_study_of_tiles = copy.deepcopy(my_study0)
tiles_by_grid_and_mask = hs.configure.TilesByGridAndMask(
    my_study_of_tiles,
    number_pixel_overlap_rows_for_tile=0,
    number_pixel_overlap_columns_for_tile=0,
    mask_filename=mask_filename,
    randomly_select=1000,
)
for slide in my_study_of_tiles["slides"].values():
    tiles_by_grid_and_mask(slide)
print("Finished selecting tiles.")

create_tensorflow_dataset = hs.tensorflow.CreateTensorFlowDataset()
tiles = create_tensorflow_dataset(my_study_of_tiles)
print("Finished with CreateTensorFlowDataset")

## Create a model for prediction

We create a nonsense model to demonstrate how the TensorFlow Dataset we just created is used with models.  Because each element of our Dataset is a tuple `(rgb_image_data, dictionary_of_annotation)`, a typical model that accepts only the former as its input needs to be wrapped.

In [None]:
num_classes = 3
unwrapped_model = tensorflow.keras.models.Sequential(
    [
        tensorflow.keras.layers.Rescaling(
            1.0 / 255,
            input_shape=(
                my_study_of_tiles["number_pixel_rows_for_tile"],
                my_study_of_tiles["number_pixel_columns_for_tile"],
                3,
            ),
        ),
        tensorflow.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tensorflow.keras.layers.MaxPooling2D(),
        tensorflow.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tensorflow.keras.layers.MaxPooling2D(),
        tensorflow.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tensorflow.keras.layers.MaxPooling2D(),
        tensorflow.keras.layers.Flatten(),
        tensorflow.keras.layers.Dense(128, activation="relu"),
        tensorflow.keras.layers.Dense(num_classes),
    ]
)

# For this demonstration, we will use the model even though we have not trained it!

# Each element of the `tiles` TensorFlow Dataset is a (rgb_image_data, dictionary_of_annotation) pair.
# Wrap the unwrapped_model so that it applies itself to the image and passes the annotation through.
class WrappedModel(tensorflow.keras.Model):
    def __init__(self, model, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = model

    def call(self, element):
        return (self.model(element[0]), element[1])


model = WrappedModel(unwrapped_model)
print("Model built and wrapped.")
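
Stripped of Keras, the wrapping pattern is a small adapter: apply the inner model to the first item of the pair and pass the second item through unchanged. The sketch below shows the same idea with plain functions; `mean_intensity` is a stand-in "model" and the annotation keys are hypothetical.

```python
def wrap(image_model):
    """Adapt a model of images into a model of (image, annotation) pairs."""

    def wrapped(element):
        image, annotation = element
        # Only the image reaches the inner model; the annotation rides along.
        return (image_model(image), annotation)

    return wrapped


def mean_intensity(image):
    """Stand-in for a real model: the mean of all pixel values."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


wrapped_mean = wrap(mean_intensity)
element = ([[0, 255], [255, 0]], {"tile_top": 0, "tile_left": 256})
print(wrapped_mean(element))  # (127.5, {'tile_top': 0, 'tile_left': 256})
```

Passing the annotation through keeps each prediction paired with the tile it came from.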

## Make predictions

In [None]:
import time

print("Starting predictions")
start_time = time.time()
predictions = model.predict(tiles.batch(256))
end_time = time.time()
print(f"Completed {predictions[0].shape[0]} predictions in {end_time - start_time} s.")
print(
    f"Average of {(end_time - start_time) / (predictions[0].shape[0])} s per prediction."
)
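
The final `Dense(num_classes)` layer has no activation, so the model's first output holds raw scores (logits) rather than probabilities. The notebook stops at timing, but one common way to read such scores is to apply a softmax and take the index of the largest value. The sketch below does this for a single row of made-up scores in plain Python; the numbers are illustrative and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predicted_class(logits):
    """Index of the largest score."""
    return max(range(len(logits)), key=lambda index: logits[index])


# Made-up scores for one tile and three classes.
scores = [0.1, 2.0, -1.0]
probabilities = softmax(scores)
print(predicted_class(scores))  # 1
```

Since the model here is untrained, any class labels derived this way are as meaningless as the model itself.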