<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.sandbox.google.com/github/Google-Health/imaging-research/blob/master/path-foundation/linear-classifier-demo.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/Google-Health/imaging-research/tree/master/path-foundation"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


~~~
Copyright 2024 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
~~~
# Getting Started Guide For Generating Pathology Embeddings

This notebook contains a series of examples covering a range of situations for generating Google [Digital Pathology embeddings](https://research.google/blog/health-specific-embedding-tools-for-dermatology-and-pathology/). Each example is standalone, and you are free to jump around.

## Use of EZ-WSI

This notebook leverages [EZ-WSI DICOMweb](https://github.com/GoogleCloudPlatform/EZ-WSI-DICOMweb), which makes it easier to work with DICOM data and generate embeddings from a variety of data sources: [DICOM store](https://cloud.google.com/healthcare-api/docs/how-tos/dicom), [Google Cloud Storage](https://cloud.google.com/storage?hl=en), and local files or in-memory representations.

## Use of Google Research Digital Pathology Embedding API

To make these examples simpler and more performant, they commonly use the Google Research Digital Pathology Embedding API.

Using this API streamlines the examples and spares you from spinning up your own serving environment. You are always free to use EZ-WSI with local files or your own serving container. Access to the Embedding API is granted via an [allow list](https://github.com/Google-Health/imaging-research/tree/master/path-foundation#access-options).

On performance: using the Embedding API (or your own deployed instance) accelerates embedding generation for data stored in the cloud, because the embedding endpoint connects directly to the source and retrieves only the imaging data it needs. The source imaging is then retrieved far more efficiently over the Google datacenter network, and API users avoid the cost and time of retrieving gigapixel imaging themselves and then transmitting it to the embedding endpoints. Data retrieved by the endpoints is used only for embedding generation; <u>data is not retained</u>. Source code and the Terraform infrastructure as code (IaC) required to deploy an embedding endpoint will be made available as open source.

The Embedding API generates pathology embeddings from image patches, which are cropped sub-regions of a digital pathology image. Higher-level ML typically combines embeddings from many patches sampled across a region of interest, or across the entire slide, to produce a prediction. EZ-WSI provides interfaces to easily define patches and retrieve embeddings. This Colab steps through a number of self-contained examples; each example stands alone.
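As an illustration of how higher-level ML might combine patch embeddings, the sketch below mean-pools an (N, D) array of patch embeddings into a single slide-level vector and scores it with a logistic (linear) classifier. Mean pooling is only one common aggregation choice, not necessarily what any particular downstream model uses; the embeddings are random placeholders, the 384-dimensional size is an assumption, and `slide_level_feature` and `linear_score` are hypothetical helpers.

```python
import numpy as np


def slide_level_feature(patch_embeddings: np.ndarray) -> np.ndarray:
  """Mean-pools an (N, D) array of patch embeddings into one D-vector."""
  if patch_embeddings.ndim != 2 or len(patch_embeddings) == 0:
    raise ValueError('Expected a non-empty (N, D) array of patch embeddings.')
  return patch_embeddings.mean(axis=0)


def linear_score(feature: np.ndarray, weights: np.ndarray, bias: float) -> float:
  """Probability from a logistic (linear) classifier on the pooled feature."""
  return float(1.0 / (1.0 + np.exp(-(feature @ weights + bias))))


rng = np.random.default_rng(0)
# Placeholder embeddings for 500 patches; 384 is an assumed embedding size.
placeholder_embeddings = rng.normal(size=(500, 384))
feature = slide_level_feature(placeholder_embeddings)
# With all-zero weights and bias the classifier is uninformative: score 0.5.
print(feature.shape, linear_score(feature, np.zeros(384), 0.0))
```

In practice the classifier weights would be learned from labeled slides; the zero weights here only demonstrate the shapes involved.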

# EZ-WSI Patches

The embedding generation interfaces built into EZ-WSI are designed to make it easy to transform image patches into embeddings. EZ-WSI provides mechanisms to define image patches for imaging stored in a DICOM store, on Google Cloud Storage, or in local data sources. At their core, EZ-WSI patches are lightweight and describe only the parameters necessary to access the pixels they cover. When referencing data stored in the cloud, pixel data is retrieved only when necessary, e.g., when the image bytes are accessed. The cloud Embedding API (e.g., V2PatchEmbeddingEndpoint) is built around the same concept: it accepts requests that describe data stored in the cloud, and to fulfill a request the endpoint retrieves the necessary data, generates the embedding, and returns the result. Together, this enables clients using EZ-WSI to efficiently issue large and complex requests for patch embeddings. The example cells access the image bytes associated with patches for demonstration purposes only, to visualize the imaging from which the embeddings are generated; this is not required for patch embedding generation.

In [None]:
# @title Pip install EZ-WSI DICOMweb
%%capture
!pip install --upgrade "ez_wsi_dicomweb>=6.0.8"

**Colab-specific code to authenticate the Colab session and display patch imaging.**

In [None]:
from typing import Union

from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import gcs_image
from google.colab import auth
import matplotlib.pyplot as plt
import numpy as np

# Authenticate the user for access. A popup will ask you to sign in with your
# user account and approve access.
auth.authenticate_user()


def render_patch(patch: Union[dicom_slide.DicomPatch, gcs_image.GcsPatch], plot_name: str = '') -> None:
  """Displays a patch's pixels; not required for embedding generation."""
  patch_bytes = patch.image_bytes()
  # Expand grayscale (H, W) or (H, W, 1) imaging to (H, W, 3) for display.
  if patch_bytes.ndim == 2:
    patch_bytes = patch_bytes[..., np.newaxis]
  if patch_bytes.shape[-1] == 1:
    patch_bytes = np.repeat(patch_bytes, 3, axis=-1)
  print(patch_bytes.shape)
  plt.figure(figsize=(2, 2))
  plt.imshow(patch_bytes)
  plt.title(plot_name)
  plt.axis('off')
  plt.show()

# Computing Embeddings For a Single Patch

The examples that follow illustrate how to use EZ-WSI to generate embeddings with the Pathology Embedding API for a single patch from a DICOM image, from an image stored on Google Cloud Storage, and from an in-memory representation. All examples accomplish this using the <u>get_patch_embedding</u> function.

***Performance-Tip:***

The get_patch_embedding function is the simplest patch-to-embedding interface. When embeddings are needed for multiple patches, embedding generation efficiency can be <u>greatly increased</u> by generating the embeddings in batch using either the generate_patch_embeddings function or the PatchEmbeddingSequence class.

```
def get_patch_embedding(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    patch: patch_embedding_types.EmbeddingPatch,
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> np.ndarray:
 ```

The get_patch_embedding function accepts the following parameters:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the various embedding model Vertex AI endpoints and/or local execution. See [endpoints](endpoints) for more information.

- <u>patch</u>: A cropped sub-region of a DICOM WSI pyramid layer, a slide microscopy image, or a traditional image stored on Google Cloud Storage or in a local data source.

- <u>ensemble_method</u>: Optional. Ensemble methods enable EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](ensemble_method) for more information.

**Returns:**

A NumPy array containing the patch embedding.
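To make the role of an ensemble method concrete, the sketch below shows one plausible strategy: split a patch larger than 224 x 224 into non-overlapping 224 x 224 tiles, embed each tile, and average the tile embeddings. This is an assumption for illustration; EZ-WSI's actual ensemble methods may tile, overlap, or combine differently. `mean_tile_embedding` is hypothetical, and `fake_embed` (per-channel mean intensity) stands in for a real endpoint.

```python
from typing import Callable

import numpy as np

ENDPOINT_SIZE = 224  # Input width/height of the pathology embedding endpoints.


def mean_tile_embedding(
    image: np.ndarray, embed: Callable[[np.ndarray], np.ndarray]
) -> np.ndarray:
  """Embeds each non-overlapping 224 x 224 tile and averages the results.

  Assumes the image height and width are multiples of ENDPOINT_SIZE.
  """
  height, width = image.shape[:2]
  if height % ENDPOINT_SIZE or width % ENDPOINT_SIZE:
    raise ValueError('Image dimensions must be multiples of 224 in this sketch.')
  tile_embeddings = [
      embed(image[y:y + ENDPOINT_SIZE, x:x + ENDPOINT_SIZE])
      for y in range(0, height, ENDPOINT_SIZE)
      for x in range(0, width, ENDPOINT_SIZE)
  ]
  return np.mean(np.stack(tile_embeddings), axis=0)


def fake_embed(tile: np.ndarray) -> np.ndarray:
  """Hypothetical stand-in for an endpoint: per-channel mean intensity."""
  assert tile.shape[:2] == (ENDPOINT_SIZE, ENDPOINT_SIZE)
  return tile.reshape(-1, tile.shape[-1]).mean(axis=0)


large_patch = np.zeros((448, 448, 3), dtype=np.float32)
large_patch[:224, :224] = 1.0  # One of the four tiles is white.
print(mean_tile_embedding(large_patch, fake_embed))  # [0.25 0.25 0.25]
```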


## Example: Generating a Single Patch Embedding from a DICOM Image Stored in the Google DICOM Store

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Request a single patch of imaging from the highest magnification.
patch = ds.get_patch(level=ds.native_level, x=43000, y=10000, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

## Example: Generating a Single Patch Embedding from a Traditional Image Stored in Google Cloud Storage

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless told otherwise; this
# example image is read without authentication (NoAuthCredentialsFactory).
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Define coordinates of image patch
patch = image.get_patch(x=10, y=10, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

## Example: Generating a Single Patch Embedding from an In-Memory NumPy Array

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import local_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np


# Create an in-memory, uncompressed, all-black 224 x 224 RGB image.
memory = np.zeros((224, 224, 3), dtype=np.uint8)

# Construct an image from the in-memory array.
image = local_image.LocalImage(memory)

# Define coordinates of image patch
patch = image.get_patch(x=0, y=0, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch, 'Image is expected to be entirely black')

# Display the first 12 values of the embedding.
# Embeddings generated from identical image bytes are expected to match,
# whether the bytes come from a local source or a cloud source.
print(embedding[:12])

# Resizing Embedding Source Imaging Dimensions

It is not uncommon for digital pathology ML to perform best on imaging encoded at an optimal pixel spacing / magnification (e.g., 20x). However, for a variety of reasons, imaging stored in cloud sources (e.g., [DICOM Store](https://cloud.google.com/healthcare-api/docs/how-tos/dicom), [Google Cloud Storage](https://cloud.google.com/storage?hl=en)) or locally may not be encoded at these optimal magnifications. The Pathology Embedding API enables source imaging to be retrieved and resampled on the embedding endpoint to match a desired target magnification prior to embedding generation.
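The resampling factor follows from the ratio of the target pixel spacing to the native pixel spacing. The sketch below works through that arithmetic; the spacings used (about 0.25 µm/pixel at 40x and 0.5 µm/pixel at 20x) are common, scanner-dependent approximations assumed for illustration, and `downsample_factor` is a hypothetical helper.

```python
def downsample_factor(native_spacing_um: float, target_spacing_um: float) -> float:
  """Ratio by which native imaging must be shrunk to reach the target spacing."""
  if native_spacing_um <= 0 or target_spacing_um <= 0:
    raise ValueError('Pixel spacings must be positive.')
  return target_spacing_um / native_spacing_um


# Assumed, scanner-dependent values: ~0.25 um/pixel at 40x, ~0.5 um/pixel at 20x.
factor = downsample_factor(native_spacing_um=0.25, target_spacing_um=0.5)
print(factor)  # 2.0

# A 224 x 224 patch at the target spacing covers this many native pixels per side.
print(224 * factor)  # 448.0
```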

The Pathology Embedding API requires that patch inputs match the input dimensions of the embedding endpoints, which generate embeddings for patches that are 224 x 224 pixels. EZ-WSI provides patch embedding ensemble methods that work around this limitation and enable embeddings to be generated for patches of nearly any dimension. Alternatively, image resizing can be used to generate embeddings for image patches that would otherwise not match the endpoint's input requirements.

The mechanisms that EZ-WSI uses to define and rescale patch source imaging dimensions are slightly different for images stored in a DICOM store and for images stored on Google Cloud Storage.
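When a level is resized, patch coordinates are expressed in the resized level's coordinate system, so native coordinates must be scaled by the same factor. The sketch below shows that bookkeeping for an integer downsampling factor using floor division. The helper names are hypothetical, the 100000 x 80000 level size is made up, and the (43000, 10000) origin with a 4x factor mirrors the DICOM example that follows.

```python
from typing import Tuple


def resized_dimensions(width: int, height: int, factor: int) -> Tuple[int, int]:
  """Dimensions of a level downsampled by an integer factor (floor division)."""
  return width // factor, height // factor


def to_resized_coordinates(x: int, y: int, factor: int) -> Tuple[int, int]:
  """Maps native-level pixel coordinates into the resized level's coordinates."""
  return x // factor, y // factor


def native_region(x: int, y: int, size: int, factor: int) -> Tuple[int, int, int]:
  """Native-level origin and side length covered by a resized-level patch."""
  return x * factor, y * factor, size * factor


print(resized_dimensions(100000, 80000, 4))  # (25000, 20000), hypothetical level.
print(to_resized_coordinates(43000, 10000, 4))  # (10750, 2500)
print(native_region(10750, 2500, 224, 4))  # (43000, 10000, 896)
```

The last line shows why resizing is costly server-side: a 224 x 224 patch at 4x downsampling is derived from an 896 x 896 native region.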

## Example: Resizing DICOM Imaging at the Embedding Endpoint Prior to Embedding Generation


In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Resizes the source imaging native_level to target dimensions that are 4x
# smaller.
resized_level = ds.native_level.resize(dicom_slide.ImageDimensions(ds.native_level.width // 4, ds.native_level.height // 4))

# Request a single patch of imaging from the resized level.
# Generating the resized patch will require the backend server to fetch 16
# frames (4^2) when generating the embedding. The V1 and V2 endpoints support up
# to 8x server resampling.
#
# Important: Patch coordinates are defined in the coordinate system of the
# resized image. In this example, the image is downsampled 4x, so the source
# imaging patch coordinates are also scaled down 4x.
patch = ds.get_patch(level=resized_level, x=43000//4, y=10000//4, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

## Example: Resizing Google Cloud Storage Imaging at the Embedding Endpoint Prior to Embedding Generation

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints


# Defines the target image dimensions for the GCS image.
image_dimensions = gcs_image.ImageDimensions(224, 224)

# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless told otherwise; this
# example image is read without authentication (NoAuthCredentialsFactory).
#
# Passing the optional image_dimensions parameter to the GcsImage constructor
# instructs GcsImage to resize the image to the desired target dimensions.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           image_dimensions=image_dimensions,
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Define coordinates of image patch
patch = image.get_patch(x=0, y=0, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

## Example: Resizing a Local Image Prior to Embedding Generation

Imaging defined from a local data source (memory or file) can be resized, similar to imaging stored on Google Cloud Storage. In contrast to cloud-stored data, local data is rescaled on the client before it is sent to the cloud embedding endpoint.
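For intuition about client-side rescaling, the sketch below downsamples an image by averaging non-overlapping pixel blocks, shrinking a 2240 x 2240 array (the size used in the example below) 10x to the 224 x 224 endpoint input size. Block averaging is an illustrative choice; the resampling method EZ-WSI actually uses is not stated here, and `block_mean_downsample` is a hypothetical helper.

```python
import numpy as np


def block_mean_downsample(image: np.ndarray, factor: int) -> np.ndarray:
  """Downsamples an (H, W, C) image by averaging factor x factor pixel blocks.

  Assumes H and W are divisible by factor.
  """
  height, width, channels = image.shape
  if height % factor or width % factor:
    raise ValueError('Image dimensions must be divisible by the factor.')
  blocks = image.reshape(height // factor, factor, width // factor, factor, channels)
  return blocks.mean(axis=(1, 3)).astype(image.dtype)


demo = np.zeros((224 * 10, 224 * 10, 3), dtype=np.uint8)
demo[:1120] = 255  # Top half white, bottom half black.
small = block_mean_downsample(demo, 10)
print(small.shape)  # (224, 224, 3)
print(small[0, 0], small[-1, -1])  # [255 255 255] [0 0 0]
```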

In [None]:
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import local_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np


# Create an in-memory, uncompressed image: a white X on a black background.
memory = np.zeros((224*10, 224*10, 3), dtype=np.uint8)
for y, x in zip(range(memory.shape[0]), range(memory.shape[1])):
  # Draw a 7-pixel-wide stroke; clamp the slice start at 0 near the edges.
  memory[y, max(x-3, 0):x+4, :] = 255
  flipped_x = memory.shape[1] - x - 1
  memory[y, max(flipped_x-3, 0):flipped_x+4, :] = 255

# Defines the target image dimensions (gcs_image.ImageDimensions is also used
# for local images).
image_dimensions = gcs_image.ImageDimensions(224, 224)

# Create a reference to the in-memory image.
#
# Passing the optional image_dimensions parameter to the LocalImage constructor
# instructs LocalImage to resize the image to the desired target dimensions.
image = local_image.LocalImage(memory, image_dimensions=image_dimensions)

# Define coordinates of image patch
patch = image.get_patch(x=0, y=0, width=224, height=224)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch)

# Display the patch (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

# Generating Embeddings In Batch

When embeddings are required for multiple patches, the time required for embedding generation can be greatly reduced by requesting the embeddings in batch from cloud data sources. EZ-WSI provides general-purpose interfaces for batch embedding generation. These methods accept an iterator or sequence of patches. Patches sampled from the same source are grouped, and embedding requests are parallelized both during request generation and on the backend endpoints.
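A rough model of the grouping step: consecutive patches that share a source are collected into a group, and each group is split into requests of bounded size. The sketch below uses `itertools.groupby` for this. The `(source_id, x, y)` tuples, `batch_requests`, and the `max_batch` limit are hypothetical; EZ-WSI's real grouping and request sizes may differ.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

Patch = Tuple[str, int, int]  # Hypothetical (source_id, x, y) patch description.


def batch_requests(patches: Iterable[Patch], max_batch: int) -> Iterator[List[Patch]]:
  """Groups consecutive same-source patches, then splits groups into batches."""
  for _, group in itertools.groupby(patches, key=lambda patch: patch[0]):
    group = list(group)
    for start in range(0, len(group), max_batch):
      yield group[start:start + max_batch]


clustered = [('level_1', x, 0) for x in range(5)] + [('level_2', x, 0) for x in range(3)]
print([len(batch) for batch in batch_requests(clustered, max_batch=4)])  # [4, 1, 3]
```

Note that `groupby` only merges *adjacent* items with the same key, which is why the ordering of the input matters.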

***Performance-Tip:***

The batching performance optimizations require that patches sampled from the same source (e.g., a WSI pyramid layer, a DICOM image, a natural-world image, or a local data source) be clustered by data source in the input iterator or sequence; each pyramid layer of a DICOM slide is considered a unique data source. If patches from different sources are interleaved at random, the performance of batch requests degrades to that of individual patch requests.
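One simple way to cluster patches before passing them to a batch method is a stable sort by source. The sketch below counts runs of consecutive same-source patches as a proxy for how many groups a batcher could form. The `(source_id, x, y)` tuples and `count_source_runs` are hypothetical; real EZ-WSI patches would need a source key derived from their own attributes.

```python
import itertools
from typing import List, Tuple

Patch = Tuple[str, int, int]  # Hypothetical (source_id, x, y) patch description.


def count_source_runs(patches: List[Patch]) -> int:
  """Number of runs of consecutive patches sharing a source."""
  return sum(1 for _ in itertools.groupby(patches, key=lambda patch: patch[0]))


interleaved = [('slide_a/level_1', 0, 0), ('slide_b/level_1', 0, 0),
               ('slide_a/level_1', 224, 0), ('slide_b/level_1', 224, 0)]
# sorted() is stable, so patch order within each source is preserved.
clustered = sorted(interleaved, key=lambda patch: patch[0])
print(count_source_runs(interleaved), count_source_runs(clustered))  # 4 2
```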

***Embedding Generator***

The patch embedding generator is the most flexible batching mechanism. The generator accepts either an iterator or sequence of patches and yields an EmbeddingResult for each input patch.

```
def generate_patch_embeddings(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    patches: Union[
        Sequence[patch_embedding_types.EmbeddingPatch],
        Iterator[patch_embedding_types.EmbeddingPatch],
    ],
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> Iterator[patch_embedding_types.EmbeddingResult]:
```

The generate_patch_embeddings function accepts the following parameters:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the various embedding model Vertex AI endpoints and/or local execution. See [endpoints](endpoints) for more information.

- <u>patches</u>: Either an iterator or a sequence of patches. Patches should be clustered so that patches from the same data source appear consecutively in the input iterator or sequence.

- <u>ensemble_method</u>: Optional. Ensemble methods enable EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](ensemble_method) for more information.

***Returns:***

<u>patch_embedding_types.EmbeddingResult</u>

EmbeddingResult is a common type returned by all batch methods. It is a data class that encapsulates both the source patch for the embedding and the generated embedding.

```
@dataclasses.dataclass(frozen=True)
class EmbeddingResult:
  """Embedding result for a patch."""
  patch: EmbeddingPatch
  embedding: np.ndarray
```
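Because each EmbeddingResult carries its source patch, downstream code can keep embeddings aligned with patch locations, for example by stacking them into an (N, D) feature matrix alongside each patch's origin. The sketch below uses hypothetical stand-in dataclasses (`StandInPatch`, `StandInEmbeddingResult`) that mirror only the `patch`/`embedding` fields and the `x`, `y`, `width`, `height` attributes used in these examples, plus made-up 4-value embeddings, so it runs without an endpoint.

```python
import dataclasses
from typing import Iterable, List, Tuple

import numpy as np


@dataclasses.dataclass(frozen=True)
class StandInPatch:
  """Hypothetical stand-in for an EZ-WSI patch (only the fields used here)."""
  x: int
  y: int
  width: int
  height: int


@dataclasses.dataclass(frozen=True)
class StandInEmbeddingResult:
  """Mirrors the patch/embedding fields of EmbeddingResult."""
  patch: StandInPatch
  embedding: np.ndarray


def to_feature_matrix(
    results: Iterable[StandInEmbeddingResult],
) -> Tuple[np.ndarray, List[Tuple[int, int]]]:
  """Stacks embeddings into an (N, D) matrix and keeps each patch's origin."""
  results = list(results)
  features = np.stack([result.embedding for result in results])
  origins = [(result.patch.x, result.patch.y) for result in results]
  return features, origins


fake_results = (
    StandInEmbeddingResult(StandInPatch(x, 0, 224, 224), np.full(4, x, dtype=np.float32))
    for x in (0, 224, 448)
)
features, origins = to_feature_matrix(fake_results)
print(features.shape, origins)  # (3, 4) [(0, 0), (224, 0), (448, 0)]
```

Accepting any iterable lets the same helper consume either a list or the generator returned by a batch method.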



## The generate_patch_embeddings Embedding Generator

### Example: Generating Patch Embeddings in Batch from a WSI DICOM Image


In [None]:
from typing import Iterator

from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


def patch_generator(ds: dicom_slide.DicomSlide, level: dicom_slide.Level, x: int, y: int, step: int, num_patches: int) -> Iterator[dicom_slide.DicomPatch]:
  """Generates patches sequential patches the a pyramid level of a dicom slide."""
  for _ in range(num_patches):
    yield ds.get_patch(level, x, y, 224, 224)
    x += step
    if x+step >= level.width:
      x = 0
      y += step

# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# The generator yields 500 patches sampled across a single pyramid level.
# Note: the generator could return patches sampled across multiple pyramid
# layers or images; the only requirement is that patches from the same source
# image or pyramid layer are clustered.
patches = patch_generator(ds, ds.native_level, 43000, 10000, 224, 500)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the patches (DicomPatch or GcsPatch) and yields an EmbeddingResult for each.
embeddings = patch_embedding.generate_patch_embeddings(endpoint, patches)

# Convert the embedding generator into a list of values.
embeddings = list(embeddings)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])


### Example: Generating Patch Embeddings in Batch from an Image Stored on Google Cloud Storage

In [None]:
from typing import Iterator

from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints

def patch_generator(image: gcs_image.GcsImage, x: int, y: int, step: int, num_patches: int) -> Iterator[gcs_image.GcsPatch]:
  """Generates sequential 224 x 224 patches from an image stored on GCS."""
  for _ in range(num_patches):
    yield image.get_patch(x, y, 224, 224)
    x += step
    if x+224 >= image.width:
      x = 0
      y += step


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless told otherwise; this
# example image is read without authentication (NoAuthCredentialsFactory).
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# The generator yields 500 patches sampled across the GCS image.
# The image is relatively small, so we're generating embeddings for overlapping
# patches, stepping the patches 10 pixels at a time.
patches = patch_generator(image, 0, 0, 10, 500)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the patches (DicomPatch or GcsPatch) and yields an EmbeddingResult for each.
embeddings = patch_embedding.generate_patch_embeddings(endpoint, patches)

# Convert the embedding generator into a list of values.
embeddings = list(embeddings)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for two embeddings returned
print('Embeddings results')
for result in (embeddings[0], embeddings[20]):
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Example: Generating Embeddings In Batch From Local File and in Memory Data Sources

For in-memory data, batch requests to cloud embedding endpoints yield smaller performance gains, because the imaging itself must be transferred from the client to the cloud endpoints. Performance is highly dependent on the network bandwidth between the machine running EZ-WSI and the embedding endpoint. To keep response times reasonable, the number of embeddings retrieved in batch for the in-memory example has been reduced by a factor of 10, from 500 to 50.
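A back-of-the-envelope calculation shows why bandwidth dominates for in-memory data. The sketch below computes the raw 8-bit RGB pixel bytes for 50 and 500 patches of 224 x 224 and divides by a hypothetical 10 MB/s uplink. Both the uncompressed-payload assumption and the uplink speed are illustrative: EZ-WSI may encode imaging before transmission, so treat the results as rough upper bounds rather than measured timings. `uncompressed_payload_bytes` is a hypothetical helper.

```python
def uncompressed_payload_bytes(num_patches: int, width: int = 224, height: int = 224,
                               channels: int = 3) -> int:
  """Raw 8-bit pixel bytes for num_patches patches (ignores any encoding)."""
  return num_patches * width * height * channels


UPLINK_BYTES_PER_SECOND = 10e6  # Hypothetical 10 MB/s connection.

for n in (50, 500):
  size = uncompressed_payload_bytes(n)
  seconds = size / UPLINK_BYTES_PER_SECOND
  print(f'{n} patches: {size} bytes, ~{seconds:.2f} s to upload')
```

Each patch is 150,528 bytes raw, so the 10x reduction in patch count translates directly into a 10x reduction in data that must leave the client.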

In [None]:
from typing import Iterator

from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import local_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np


def patch_generator(image: local_image.LocalImage, x: int, y: int, step: int, num_patches: int) -> Iterator[gcs_image.GcsPatch]:
  """Generates patches sequential patches the a pyramid level of a dicom slide."""
  for _ in range(num_patches):
    yield image.get_patch(x, y, 224, 224)
    x += step
    if x+224 >= image.width:
      x = 0
      y += step


# Create an in-memory, uncompressed image filled with random noise.
# (randint's upper bound is exclusive, so high=256 covers all uint8 values.)
memory = np.random.randint(0, high=256, size=(224*50, 224, 3), dtype=np.uint8)

# Construct an image from the in-memory array.
image = local_image.LocalImage(memory)

# The generator yields 50 non-overlapping patches, stepping 224 pixels at a
# time down the tall, 224-pixel-wide in-memory image.
patches = patch_generator(image, 0, 0, 224, 50)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the patches and yields an EmbeddingResult for each.
embeddings = patch_embedding.generate_patch_embeddings(endpoint, patches)

# Convert the embedding generator into a list of values.
embeddings = list(embeddings)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for two embeddings returned
print('Embeddings results')
for result in (embeddings[0], embeddings[20]):
  # render the source embedding patch
  render_patch(result.patch, 'Random Colors Expected')

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

## PatchEmbeddingSequence Class

PatchEmbeddingSequence is a Python class that provides enhanced support for generating patch embeddings from sequences of patches, e.g., lists or tuples. PatchEmbeddingSequence enables high-performance iterative access and random access to the patch embeddings contained within the sequence.


```
import collections.abc
from typing import Any, Iterator, Optional, Sequence, Union

from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods
from ez_wsi_dicomweb import patch_embedding_types
import numpy as np


class PatchEmbeddingSequence(
    collections.abc.Sequence[patch_embedding_types.EmbeddingResult]
):

  def __init__(
      self,
      endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
      patches: Sequence[patch_embedding_types.EmbeddingPatch],
      ensemble_method: Optional[
          patch_embedding_ensemble_methods.PatchEnsembleMethod
      ] = None,
  ):
    """Constructor for PatchEmbeddingSequence.

    Args:
      endpoint: The abstraction interface through which EZ-WSI communicates
        with the embedding model's Vertex AI endpoints or with local execution.
      patches: A sequence of patches. Patches should be clustered in the input
        sequence so that patches from the same data source fall sequentially
        in the sequence.
      ensemble_method: Optional. Enables EZ-WSI to generate embeddings for
        patches that exceed the input dimensions of the endpoint. If not
        provided, input patches must match the input width and height of the
        endpoint.
    """

  def __eq__(self, value: Any) -> bool:
    """Tests if two patch embedding sequences are composed of the same patches."""

  def __contains__(self, value: Any) -> bool:
    """Tests if a patch is in the patch embedding sequence."""

  def __getitem__(self, index: Union[int, slice]):
    """Index-based access to embedding results.

    Functionally equivalent to calling get_patch_embedding if called with an
    int, and to calling generate_patch_embeddings if called with a slice.
    """

  def get_patch(self, index: int) -> patch_embedding_types.EmbeddingPatch:
    """Index-based access to the source patch for an embedding."""

  def get_embedding(self, index: int) -> np.ndarray:
    """Index-based access to patch embedding results."""

  def __iter__(self) -> Iterator[patch_embedding_types.EmbeddingResult]:
    """High-performance batch access to embedding results for all patches.

    Functionally equivalent to calling generate_patch_embeddings.
    """

  def __len__(self) -> int:
    """Returns the number of input patches in the sequence.

    The number of patches equals the number of embedding results returned.
    """
```
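The clustering requirement on `patches` can be met with a stable group-by on each patch's source. The sketch below is a hypothetical helper and is not part of EZ-WSI. `cluster_by_source` and the `(source_name, x, y)` tuples stand in for real patches. For EZ-WSI patches, you would supply a `source_key` that identifies each patch's source image or pyramid level, which is an assumption about your data.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_by_source(items: Sequence[T], source_key: Callable[[T], Hashable]) -> List[T]:
  """Reorders items so that items sharing a source key are contiguous.

  Sources keep the order in which they first appear, and items keep their
  original relative order within each source (a stable grouping).
  """
  groups: Dict[Hashable, List[T]] = {}  # dicts preserve insertion order.
  for item in items:
    groups.setdefault(source_key(item), []).append(item)
  clustered: List[T] = []
  for group in groups.values():
    clustered.extend(group)
  return clustered


# Hypothetical patches described as (source_name, x, y) tuples.
toy_patches = [("slide_a", 0, 0), ("slide_b", 0, 0), ("slide_a", 224, 0), ("slide_b", 224, 0)]
print(cluster_by_source(toy_patches, source_key=lambda p: p[0]))
# [('slide_a', 0, 0), ('slide_a', 224, 0), ('slide_b', 0, 0), ('slide_b', 224, 0)]
```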

**Performance Tip:**

If embeddings will be requested for most or all of the patches in a PatchEmbeddingSequence, access them through the PatchEmbeddingSequence iterator or with an index slice rather than indexing patches one at a time.

- <u>Recommended</u><p>Access all embeddings in the sequence using the iterator, or access a subset of them using an index slice.</p>

```
from ez_wsi_dicomweb import patch_embedding

# Assumes endpoint and list_of_patches are already defined.

# Accessing all embeddings
seq = patch_embedding.PatchEmbeddingSequence(endpoint, list_of_patches)
for patch_embedding_result in seq:
  print(patch_embedding_result.embedding)

# Accessing embeddings for patches at indices 0, 2, 4, 6, and 8.
seq = patch_embedding.PatchEmbeddingSequence(endpoint, list_of_patches)
for patch_embedding_result in seq[:10:2]:
  print(patch_embedding_result.embedding)
```

- <u>Not Recommended</u><p>
Calling the index accessor repeatedly in a loop.</p>

```
seq = patch_embedding.PatchEmbeddingSequence(endpoint, list_of_patches)
for index in range(len(seq)):
  patch_embedding_result = seq[index]
  print(patch_embedding_result.embedding)
```
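The following toy model illustrates why the access pattern matters. It assumes that each endpoint call is one network round trip. `CountingEndpoint`, `ToyEmbeddingSequence`, and the batch size of 100 are hypothetical and do not describe EZ-WSI's internal batching. The sketch only mirrors the int-versus-slice behavior described in the interface above.

```python
import collections.abc
from typing import Iterable, Iterator, List, Sequence, Union


class CountingEndpoint:
  """Toy stand-in for an embedding endpoint; counts how many requests it serves."""

  def __init__(self) -> None:
    self.requests = 0

  def embed(self, patches: Sequence[int]) -> List[float]:
    self.requests += 1  # One round trip per call, whatever the batch size.
    return [float(p) for p in patches]  # Fake "embeddings".


class ToyEmbeddingSequence(collections.abc.Sequence):
  """Toy lazy sequence: an int index makes one single-patch request, while
  slices and iteration request embeddings in batches."""

  def __init__(self, endpoint: CountingEndpoint, patches: Iterable[int], batch_size: int = 100):
    self._endpoint = endpoint
    self._patches = list(patches)
    self._batch_size = batch_size

  def __len__(self) -> int:
    return len(self._patches)

  def __getitem__(self, index: Union[int, slice]):
    if isinstance(index, slice):
      return self._endpoint.embed(self._patches[index])  # One batched request.
    return self._endpoint.embed([self._patches[index]])[0]  # One request per item.

  def __iter__(self) -> Iterator[float]:
    for start in range(0, len(self._patches), self._batch_size):
      yield from self._endpoint.embed(self._patches[start:start + self._batch_size])


batched_endpoint = CountingEndpoint()
all_embeddings = list(ToyEmbeddingSequence(batched_endpoint, range(500)))
print(batched_endpoint.requests)  # 5

indexed_endpoint = CountingEndpoint()
seq = ToyEmbeddingSequence(indexed_endpoint, range(500))
one_by_one = [seq[i] for i in range(len(seq))]
print(indexed_endpoint.requests)  # 500
```

Both loops produce the same embeddings. In this model, only the number of round trips differs.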

### Example: Generating Patch Embeddings in Batch from a WSI DICOM Image Using PatchEmbeddingSequence

In [None]:
from typing import Sequence

from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


def patch_sequence(ds: dicom_slide.DicomSlide, level: dicom_slide.Level, x: int, y: int, step: int, num_patches: int) -> Sequence[dicom_slide.DicomPatch]:
  """Generates a sequence of patches sampled sequentially across a pyramid level of a DICOM slide."""
  patches = []
  for _ in range(num_patches):
    patches.append(ds.get_patch(level, x, y, 224, 224))
    x += step
    if x + 224 >= level.width:
      x = 0
      y += step
  return patches

# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# The generator returns 500 patches sampled across a single pyramid level.
# Note: the generator could return patches sampled across multiple pyramid
# levels or images; the only requirement is that patches from the same source
# image or pyramid level are clustered.
patches = patch_sequence(ds, ds.native_level, 43000, 10000, 224, 500)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Creates a PatchEmbeddingSequence that returns an embedding for each patch.
embeddings = patch_embedding.PatchEmbeddingSequence(endpoint, patches)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Example: Generating Patch Embeddings in Batch from a Google Cloud Storage Image Using PatchEmbeddingSequence


In [None]:
from typing import Sequence

from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints

def patch_sequence(image: gcs_image.GcsImage, x: int, y: int, step: int, num_patches: int) -> Sequence[gcs_image.GcsPatch]:
  """Generates a sequence of patches sampled sequentially across a Google Cloud Storage image."""
  patches = []
  for _ in range(num_patches):
    patches.append(image.get_patch(x, y, 224, 224))
    x += step
    if x+224 >= image.width:
      x = 0
      y += step
  return patches


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless a credential factory
# is provided; this public example image is read without authentication.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# The generator returns 500 patches sampled across the GCS image.
# The image is relatively small, so the patches overlap: each patch is
# offset 10 pixels from the previous one.
patches = patch_sequence(image, 0, 0, 10, 500)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Creates a PatchEmbeddingSequence that returns an embedding for each patch.
embeddings = patch_embedding.PatchEmbeddingSequence(endpoint, patches)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Example: Generating Embeddings in Batch from Local or In-Memory Data Sources Using PatchEmbeddingSequence

The performance gains from batching in-memory requests to cloud embedding endpoints are smaller because transferring large amounts of data from the client to the cloud endpoints is inefficient. Performance depends heavily on the network bandwidth between the machine running EZ-WSI and the embedding endpoint. To keep Colab response times reasonable, the number of embeddings retrieved in batch for the in-memory example has been reduced by a factor of 10, from 500 to 50.
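A back-of-the-envelope estimate shows why in-memory requests are limited by bandwidth. This sketch assumes raw 8-bit RGB pixels with no compression and a hypothetical 10 Mbit/s uplink. The request encoding EZ-WSI actually uses may be smaller or larger.

```python
def uncompressed_payload_bytes(num_patches: int, width: int = 224, height: int = 224, channels: int = 3) -> int:
  """Bytes of raw 8-bit pixel data for num_patches patches."""
  return num_patches * width * height * channels


def transfer_seconds(num_bytes: int, bandwidth_mbps: float) -> float:
  """Idealized transfer time at a bandwidth given in megabits per second."""
  return num_bytes * 8 / (bandwidth_mbps * 1_000_000)


for n in (50, 500):
  size = uncompressed_payload_bytes(n)
  print(f"{n} patches: {size / 1e6:.1f} MB, ~{transfer_seconds(size, 10):.0f} s at 10 Mbit/s")
# 50 patches: 7.5 MB, ~6 s at 10 Mbit/s
# 500 patches: 75.3 MB, ~60 s at 10 Mbit/s
```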

In [None]:
from typing import Sequence

from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import local_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np


def patch_sequence(image: gcs_image.GcsImage, x: int, y: int, step: int, num_patches: int) -> Sequence[gcs_image.GcsPatch]:
  """Generates a sequence of patches sampled sequentially across an image."""
  patches = []
  for _ in range(num_patches):
    patches.append(image.get_patch(x, y, 224, 224))
    x += step
    if x+224 >= image.width:
      x = 0
      y += step
  return patches


# Create an in-memory uncompressed image filled with random noise.
memory = np.random.randint(0, high=255, size=(224*50, 224, 3), dtype=np.uint8)

# Construct an image from the in-memory array.
image = local_image.LocalImage(memory)

# The generator returns 50 non-overlapping patches that tile the in-memory
# image from top to bottom (the image is a single patch wide).
patches = patch_sequence(image, 0, 0, 224, 50)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Creates a PatchEmbeddingSequence that returns an embedding for each patch.
embeddings = patch_embedding.PatchEmbeddingSequence(endpoint, patches)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

# Higher Level Embedding Generation Functions

## Selectively Generating Embeddings From Sub-Regions of a Whole Image

EZ-WSI contains high-level functions that selectively generate patch embeddings from regions of interest (e.g., areas containing tissue) within a DICOM pyramid layer, a Google Cloud Storage image, or a local data source.

### Selectively Generating Embeddings from a DICOM WSI Pyramid Layer or DICOM Image

The function get_dicom_image_embeddings returns a PatchEmbeddingSequence of embeddings that are selectively sampled across a DICOM WSI pyramid layer or DICOM image. If a mask is not provided and mask generation parameters are not defined, a default mask is generated using the DicomPatchGenerator defaults.

```
def get_dicom_image_embeddings(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    slide: dicom_slide.DicomSlide,
    ps: Union[
        slide_level_map.Level,
        slide_level_map.ResizedLevel,
        pixel_spacing.PixelSpacing,
    ],
    patch_size: Optional[int] = None,
    mask: Optional[np.ndarray] = None,
    stride_size: Optional[int] = None,
    min_luminance: Optional[float] = None,
    max_luminance: Optional[float] = None,
    mask_level: Union[
        slide_level_map.Level,
        slide_level_map.ResizedLevel,
        pixel_spacing.PixelSpacing,
        None,
    ] = None,
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> PatchEmbeddingSequence:
```

The function takes the parameters listed:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the embedding model's Vertex AI endpoints or with local execution. See [endpoints](#endpoints) for more information.

- <u>slide</u>: The DICOM slide that will be used to generate patch embeddings.

- <u>ps</u>: The DICOM image level (a WSI pyramid level or an untiled DICOM image) that is the source for patch embeddings. If imaging is generated from a WSI pyramid, ps can instead define the pixel spacing of the desired level.

- <u>patch_size</u>: Source embedding patches are square; patch_size defines the length in pixels of a patch edge. If undefined, defaults to the patch dimensions of the endpoint.

- <u>mask</u>: A user-defined mask (NumPy array, dtype=np.bool_) that defines which regions of the image contain regions of interest (e.g., tissue). If not defined, a mask is generated using the mask generation parameters.

- <u>stride_size</u>: The spacing in pixels between the upper-left coordinates of adjacent patches. Patches sampled with stride_size < patch_size overlap. If undefined, defaults to patch_size.

- <u>min_luminance</u>: Used only when a mask is not provided. Defines the minimum luminance value (range 0 to 1.0) for pixels to be considered regions of interest (e.g., tissue). If undefined, defaults to 1.0 / 255.0.

- <u>max_luminance</u>: Used only when a mask is not provided. Defines the maximum luminance value (range 0 to 1.0) for pixels to be considered regions of interest (e.g., tissue). If undefined, defaults to 204.0 / 255.0.

- <u>mask_level</u>: Defines the level of the DICOM WSI pyramid that is thresholded to generate the sampling mask. All imaging from this level is retrieved to generate the mask, so a relatively low-magnification (large pixel spacing) level is highly recommended. Not applicable if a user-defined mask is provided or if sampling from an untiled DICOM image. If undefined and applicable, defaults to imaging at approximately 1.25X magnification.

- <u>ensemble_method</u>: Optional. Enables EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](#ensemble_method) for more information.

**Returns:**

  A PatchEmbeddingSequence that will generate and return embeddings sampled across the DICOM imaging.
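The default luminance thresholds act as a band-pass filter on pixel brightness. The sketch below is an illustration built on several assumptions and is not EZ-WSI's implementation. It approximates luminance with ITU-R BT.601 weights, treats both bounds as inclusive, and uses the documented defaults of 1.0 / 255.0 and 204.0 / 255.0. Near-white background pixels and pure black pixels fall outside the band.

```python
import numpy as np


def luminance_mask(
    rgb: np.ndarray,
    min_luminance: float = 1.0 / 255.0,
    max_luminance: float = 204.0 / 255.0,
) -> np.ndarray:
  """Boolean (H, W) mask of pixels whose luminance lies in [min, max].

  Assumes 8-bit RGB input of shape (H, W, 3). The BT.601 luminance weights are
  an assumption, not necessarily the formula EZ-WSI uses.
  """
  weights = np.array([0.299, 0.587, 0.114])
  luminance = (rgb.astype(np.float64) @ weights) / 255.0  # Scaled to 0..1.
  return (luminance >= min_luminance) & (luminance <= max_luminance)


image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = 255              # White background: too bright, excluded.
image[0, 1] = (200, 100, 150)  # Tissue-like color: kept.
image[1, 0] = 0                # Black: too dark, excluded.
image[1, 1] = 100              # Mid gray: kept.
print(luminance_mask(image))
```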

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Optional but highly recommended: enables ds to retrieve patch imaging more
# efficiently when generating the tissue mask.
ds.init_slide_frame_cache()

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

embeddings = patch_embedding.get_dicom_image_embeddings(endpoint, ds, ds.native_level)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Selectively Generating Embeddings from an Image Stored in Google Cloud Storage, a Local File, or In-Memory Data

The function get_gcs_image_embeddings returns a PatchEmbeddingSequence of embeddings that are selectively sampled from an image stored on Google Cloud Storage or from a local image (a file or in-memory data). If a mask is not provided and mask generation parameters are not defined, a default mask is generated using the GcsImagePatchGenerator defaults.

```
def get_gcs_image_embeddings(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    image: Union[gcs_image.GcsImage, local_image.LocalImage],
    patch_size: Optional[int] = None,
    mask: Optional[np.ndarray] = None,
    stride_size: Optional[int] = None,
    min_luminance: Optional[float] = None,
    max_luminance: Optional[float] = None,
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> PatchEmbeddingSequence:
```

The function takes the parameters listed:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the embedding model's Vertex AI endpoints or with local execution. See [endpoints](#endpoints) for more information.

- <u>image</u>: The source image that patches and embeddings will be generated from. The image can be stored on Google Cloud Storage or locally.

- <u>patch_size</u>: Source embedding patches are square; patch_size defines the length in pixels of a patch edge. If undefined, defaults to the patch dimensions of the endpoint.

- <u>mask</u>: A user-defined mask (NumPy array, dtype=np.bool_) that defines which regions of the image contain regions of interest (e.g., tissue). If not defined, a mask is generated using the tissue mask generation parameters.

- <u>stride_size</u>: The spacing in pixels between the upper-left coordinates of adjacent patches. Patches sampled with stride_size < patch_size overlap. If undefined, defaults to patch_size.

- <u>min_luminance</u>: Used only when a tissue mask is not provided. Defines the minimum luminance value (range 0 to 1.0) for pixels to be considered regions of interest (e.g., tissue). If undefined, defaults to 1.0 / 255.0.

- <u>max_luminance</u>: Used only when a tissue mask is not provided. Defines the maximum luminance value (range 0 to 1.0) for pixels to be considered regions of interest (e.g., tissue). If undefined, defaults to 204.0 / 255.0.

- <u>ensemble_method</u>: Optional. Enables EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](#ensemble_method) for more information.

**Returns:**

A PatchEmbeddingSequence that will generate and return embeddings sampled across the image.
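The interaction between `patch_size`, `stride_size`, and a mask can be sketched as a grid scan. The hypothetical `patch_origins` function below assumes the mask has the same resolution as the image. It keeps any full patch that overlaps at least one masked pixel. EZ-WSI's actual selection rule and mask rescaling may differ.

```python
from typing import List, Optional, Tuple

import numpy as np


def patch_origins(mask: np.ndarray, patch_size: int, stride_size: Optional[int] = None) -> List[Tuple[int, int]]:
  """Upper-left (x, y) of every full patch on the stride grid that overlaps the mask."""
  stride = patch_size if stride_size is None else stride_size  # Default mirrors the docs.
  height, width = mask.shape
  origins = []
  for y in range(0, height - patch_size + 1, stride):
    for x in range(0, width - patch_size + 1, stride):
      if mask[y:y + patch_size, x:x + patch_size].any():
        origins.append((x, y))
  return origins


roi = np.zeros((448, 448), dtype=bool)
roi[100, 300] = True  # A single region-of-interest pixel at x=300, y=100.
print(patch_origins(roi, 224))                   # [(224, 0)]
print(patch_origins(roi, 224, stride_size=112))  # [(112, 0), (224, 0)]
```

With a stride smaller than the patch size, overlapping patches can cover the same region more than once. This is the overlap described for `stride_size`.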

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_generator
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path
import numpy as np

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create interface to slide, retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Optional but highly recommended: enables ds to retrieve patch imaging more
# efficiently when generating the tissue mask.
ds.init_slide_frame_cache()

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

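# Define a user mask at 1/10 the resolution of the native level with a single
# True element at its center, restricting sampling to the corresponding region.
mask = np.zeros((ds.native_level.height // 10, ds.native_level.width // 10), dtype=np.bool_)
mask[mask.shape[0] // 2, mask.shape[1] // 2] = True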
embeddings = patch_embedding.get_dicom_image_embeddings(endpoint, ds, ds.native_level, mask=mask)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless a credential factory
# is provided; this public example image is read without authentication.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

embeddings = patch_embedding.get_gcs_image_embeddings(endpoint, image)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Example: Selectively Generating Embeddings from a Local File or in Memory Data Source

The performance gains from batching in-memory requests to cloud embedding endpoints are smaller because transferring large amounts of data from the client to the cloud endpoints is inefficient. Performance depends heavily on the network bandwidth between the machine running EZ-WSI and the embedding endpoint.

In [None]:
from ez_wsi_dicomweb import local_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np


# Create an in-memory uncompressed image filled with random noise.
memory = np.random.randint(0, high=255, size=(224*50, 224, 3), dtype=np.uint8)

# Construct an image from the in-memory array.
image = local_image.LocalImage(memory)

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

embeddings = patch_embedding.get_gcs_image_embeddings(endpoint, image)

# Display total number of embeddings generated
print(f'Embeddings returned: {len(embeddings)}')

# Display results for first two embeddings returned
print('First two embeddings results')
for result in embeddings[:2]:
  # render the source embedding patch
  render_patch(result.patch)

  # print the location and dimensions of the patch
  print(f'Patch Location x: {result.patch.x} y: {result.patch.y} width: {result.patch.width} height: {result.patch.height}')

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])

### Reducing Embedding Result Sequences or Iterators of Embedding Results to a Single Embedding
The embeddings returned by the high-level functions get_dicom_image_embeddings and get_gcs_image_embeddings, which selectively generate embeddings for a DICOM image, a Google Cloud Storage image, or a local data source, can be reduced to a single embedding using mean_patch_embedding, a utility function in the patch_embedding_ensemble_methods module. This function takes an iterator or sequence of embedding results and returns the mean of all the embeddings it contains.


```
def mean_patch_embedding(
    embeddings: Union[
        Iterator[patch_embedding_types.EmbeddingResult],
        Sequence[patch_embedding_types.EmbeddingResult],
    ],
) -> np.ndarray:
```
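The reduction is an element-wise mean over the embedding vectors. The following minimal sketch uses plain NumPy arrays. The helper name `mean_embedding` is hypothetical; with EZ-WSI results you would pass each `result.embedding`.

```python
from typing import Iterable

import numpy as np


def mean_embedding(embeddings: Iterable[np.ndarray]) -> np.ndarray:
  """Element-wise arithmetic mean of equally sized embedding vectors."""
  stacked = np.stack(list(embeddings))  # Shape: (num_embeddings, embedding_dim).
  return stacked.mean(axis=0)


vectors = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 4.0, 5.0])]
print(mean_embedding(vectors))  # [2. 3. 4.]
```

Because the input is materialized with `list(...)`, the helper accepts a one-shot iterator as well as a sequence. This matches the two input types in the signature above.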

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials unless a credential factory
# is provided; this public example image is read without authentication.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Selectively generates embeddings across a GCS image.
embeddings = patch_embedding.get_gcs_image_embeddings(endpoint, image)

print(f'Reducing {len(embeddings)} embeddings to a single embedding.')
# Reduces the embeddings returned by get_gcs_image_embeddings to a single
# mean embedding.
embedding = patch_embedding_ensemble_methods.mean_patch_embedding(embeddings)

print('First 12 values of the mean embedding.')
print(embedding[:12])

## Generating Embeddings for Collections of Images Stored on Google Cloud Storage or Locally

For ML training pipelines, it is often necessary to generate embeddings for collections of images. EZ-WSI provides functions that convert sequences or iterators of references to images stored on Google Cloud Storage or locally into batch requests for pathology embeddings. These functions support automatic rescaling of images to common dimensions, as well as more advanced embedding ensembling to generate embeddings from images of different sizes. They do not support selective sampling of sub-regions of the images.

### Generating Embeddings for Lists of Images stored on Google Cloud Storage

The gcs_images_to_embeddings function transforms lists of references to images stored on Google Cloud Storage into embeddings. The [Google Cloud Storage Python Client Library](https://pypi.org/project/google-cloud-storage/) provides high-level methods that return iterators listing data (e.g., images) stored on the service, e.g., [google.cloud.storage.Client.list_blobs](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.client.Client#google_cloud_storage_client_Client_list_blobs) and [google.cloud.storage.Bucket.list_blobs](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.bucket.Bucket#google_cloud_storage_bucket_Bucket_list_blobs). The output of these methods can be passed directly to gcs_images_to_embeddings to generate embeddings for the images in batch. Alternatively, a sequence or iterator of strings that reference images with gs-style paths (e.g., 'gs://bucket/image.png') can be passed as input to gcs_images_to_embeddings.
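A `list_blobs` prefix can also match objects that are not images, so it can help to filter the listing before requesting embeddings. The helpers below are a hypothetical sketch that works on gs-style path strings. The extension list is an assumption to adjust for your data. This section does not say how gcs_images_to_embeddings handles non-image objects.

```python
from typing import Iterable, List, Tuple

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")  # Assumed list; adjust to your data.


def parse_gs_uri(uri: str) -> Tuple[str, str]:
  """Splits 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
  if not uri.startswith("gs://"):
    raise ValueError(f"Not a gs:// URI: {uri}")
  bucket, _, blob_name = uri[len("gs://"):].partition("/")
  if not bucket or not blob_name:
    raise ValueError(f"URI must name a bucket and an object: {uri}")
  return bucket, blob_name


def image_uris(uris: Iterable[str]) -> List[str]:
  """Keeps only URIs whose object names end with a known image extension."""
  return [u for u in uris if parse_gs_uri(u)[1].lower().endswith(IMAGE_EXTENSIONS)]


listing = ["gs://bucket/image.png", "gs://bucket/notes.txt", "gs://bucket/scans/slide.JPEG"]
print(image_uris(listing))  # ['gs://bucket/image.png', 'gs://bucket/scans/slide.JPEG']
```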

```
def gcs_images_to_embeddings(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    images: patch_generator.GcsImagesToPatchesInputTypes,
    credential_factory: Optional[
        credential_factory_module.AbstractCredentialFactory
    ] = None,
    image_dimensions: Optional[gcs_image.ImageDimensions] = None,
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> Iterator[patch_embedding_types.EmbeddingResult]:
```

The function takes the parameters listed:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the embedding model's Vertex AI endpoints or with local execution. See [endpoints](#endpoints) for more information.

- <u>images</u>: A sequence or iterator of references to images stored on Google Cloud Storage (e.g., gs-style paths or the blobs returned by list_blobs).

- <u>credential_factory</u>: An EZ-WSI credential factory that provides the credentials used to access the images on Google Cloud Storage. If undefined, the Application Default Credentials of the user or service account running EZ-WSI are used.

- <u>image_dimensions</u>: Optional dimensions that all images are rescaled to prior to embedding generation.

- <u>ensemble_method</u>: Optional. Enables EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](#ensemble_method) for more information.

**Returns:**

An iterator that yields embedding results; each result contains the embedding and a reference to the source image it was generated from.
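Because the function returns an iterator, results are produced lazily and can be consumed only once. A common pattern in training pipelines is to take a bounded number of results and stack them into a matrix. In this sketch, `FakeResult` is a hypothetical stand-in for an embedding result. With real results, you would read `result.embedding` and, for Google Cloud Storage sources, `result.patch.source.uri`, as in the example below. Here `itertools.islice` replaces the enumerate-and-break pattern.

```python
import itertools
from typing import Iterable, List, NamedTuple, Tuple

import numpy as np


class FakeResult(NamedTuple):
  """Hypothetical stand-in for an embedding result: a source label and a vector."""
  source: str
  embedding: np.ndarray


def collect(results: Iterable[FakeResult], limit: int) -> Tuple[List[str], np.ndarray]:
  """Consumes at most `limit` results into source labels and an (n, d) matrix."""
  sources, vectors = [], []
  for result in itertools.islice(results, limit):
    sources.append(result.source)
    vectors.append(result.embedding)
  return sources, np.stack(vectors)


results = (FakeResult(f"gs://bucket/image_{i}.png", np.full(4, float(i))) for i in range(10))
sources, matrix = collect(results, limit=3)
print(sources)       # ['gs://bucket/image_0.png', 'gs://bucket/image_1.png', 'gs://bucket/image_2.png']
print(matrix.shape)  # (3, 4)
```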

In [None]:
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import google.cloud.storage


# Create an anonymous Google Cloud Storage client; the example bucket is
# publicly readable, so no credentials are needed to list it.
cl = google.cloud.storage.Client.create_anonymous_client()

# Reference the bucket using the anonymous client.
bucket = google.cloud.storage.bucket.Bucket(name='healthai-us', client=cl)

# Create an iterator that returns all blobs stored on the bucket at gs://healthai-us/pathology/training/cancer
images = bucket.list_blobs(prefix='pathology/training/cancer')

# Defines the endpoint that will be called to generate the embeddings.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Pass the iterator to gcs_images_to_embeddings to generate embeddings for all of the listed images.
# Because credential_factory is not set, Application Default Credentials are used to read the images.
# Resize all images to 224 x 224 pixels prior to embedding generation.
embeddings = patch_embedding.gcs_images_to_embeddings(endpoint, images, image_dimensions=gcs_image.ImageDimensions(224, 224))

# iterate over the returned embedding results
for index, result in enumerate(embeddings):
  if index == 3:
    break
  # render the source embedding patch and list the uri of the
  # source data.
  render_patch(result.patch, result.patch.source.uri)

  # print the first 12 values of the returned embedding
  print('First 12 values of patch image embedding.')
  print(result.embedding[:12])
  print()

### Generating Embeddings for Lists of Images Stored Locally

The local_images_to_embeddings function transforms lists of in-memory image data or locally stored image files into embeddings.

```
def local_images_to_embeddings(
    endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
    images: patch_generator.GcsImagesToPatchesInputTypes,
    image_dimensions: Optional[gcs_image.ImageDimensions] = None,
    ensemble_method: Optional[
        patch_embedding_ensemble_methods.PatchEnsembleMethod
    ] = None,
) -> Iterator[patch_embedding_types.EmbeddingResult]:
```

The function takes the parameters listed:

- <u>endpoint</u>: The abstraction interface through which EZ-WSI communicates with the embedding model's Vertex AI endpoints or with local execution. See [endpoints](#endpoints) for more information.

- <u>images</u>: A sequence or iterator of paths to locally stored images, or of image data loaded into memory, either uncompressed as NumPy arrays or compressed as bytes.

- <u>image_dimensions</u>: Optional dimensions that all images are rescaled to prior to embedding generation.

- <u>ensemble_method</u>: Optional. Enables EZ-WSI to generate embeddings for patches that exceed the input dimensions of the endpoint. If not provided, input patches must match the input width and height of the endpoint. See [ensemble methods](#ensemble_method) for more information.

**Returns:**

The function returns an iterator that yields the embeddings together with references to the source imaging each embedding was generated from.
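Because the result is an iterator, it can be consumed incrementally. The minimal sketch below uses a hypothetical stand-in generator (`fake_embedding_results`) in place of a real endpoint call and shows how `itertools.islice` can cap the number of results processed, an alternative to an explicit `break` inside the loop. With a real endpoint, how much work happens before iteration starts depends on EZ-WSI's implementation.

```python
import itertools
from typing import Iterator, List, NamedTuple


class FakeEmbeddingResult(NamedTuple):
  """Hypothetical stand-in for an EZ-WSI embedding result."""
  source: str
  embedding: List[float]


def fake_embedding_results(paths: List[str]) -> Iterator[FakeEmbeddingResult]:
  """Hypothetical generator that yields one result per input path."""
  for i, path in enumerate(paths):
    yield FakeEmbeddingResult(source=path, embedding=[float(i)] * 4)


paths = [f'image_{n}.png' for n in range(10)]
# Take only the first three results; this generator never produces the rest.
first_three = list(itertools.islice(fake_embedding_results(paths), 3))
for result in first_three:
  print(result.source, result.embedding[:2])
```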

In [None]:
import os
import tempfile

from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
import numpy as np
import PIL.Image


# Create a temporary directory in the Colab runtime and write three example images to it.
# The directory and images are removed when the context block exits.
with tempfile.TemporaryDirectory() as temp_dir:
  # Allocate an in-memory buffer to hold the temporary images.
  image = np.zeros((224, 224), dtype=np.uint8)
  # Write three images that are just a solid monochrome color
  for color in (0, 128, 180):
    image[...] = color
    # write the image to the temporary directory
    with PIL.Image.fromarray(image) as img:
      img.save(os.path.join(temp_dir, f'image_{color}.png'))

  # Colab Setup Complete

  # Create a list of images that we wrote to the temporary directory
  images = [os.path.join(temp_dir, fname) for fname in os.listdir(temp_dir)]

  # Defines the endpoint that will be called to generate the embeddings.
  endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

  # Request embeddings for the list of images.
  embeddings = patch_embedding.local_images_to_embeddings(endpoint, images)
  # iterate over the returned embedding results
  for result in embeddings:
    # render the source embedding patch and list the filename of the
    # source data.
    render_patch(result.patch, result.patch.source.filename)

    # print the first 12 values of the returned embedding
    print('First 12 values of patch image embedding.')
    print(result.embedding[:12])
    print()

## <a name="ensemble_method">Embedding Ensemble Methods</a>
The pathology embeddings REST API <u>requires</u> that the size of each patch in an embedding request match the endpoint's input requirements. The V1 and V2 endpoints generate embeddings for 224 x 224 pixel images and require all input patch imaging to be 224 x 224 pixels.

EZ-WSI loosens this requirement and enables <u>embeddings to be generated for patches of arbitrary dimensions</u>. It does so by giving all embedding generation functions and classes an optional PatchEnsembleMethod parameter. This parameter defines methods that: 1) break imaging of arbitrary size into one or more endpoint-sized patches, and 2) reduce the resulting collection of embeddings into a single embedding.

EZ-WSI defines four patch embedding ensemble methods and an interface through which custom embedding ensemble methods can be easily integrated. The methods provided by EZ-WSI are:

- <b>DefaultSinglePatchEnsemble</b>: The default ensemble method used by all embedding functions and classes. It enforces the requirement that the dimensions of input imaging match the endpoint's input dimensions.

```
class DefaultSinglePatchEnsemble(SinglePatchEnsemble):
  """Returns single embedding for patch, validates patch dim = embedding dim."""

  def __init__(self):

```

- <b>SinglePatchEnsemble</b>: This ensemble method accepts a patch of arbitrary input dimensions and replaces it with a <u>single patch embedding request</u> that matches the endpoint's input requirements. The newly created patch is placed at one of five positions relative to the source patch.


```
class SinglePatchEnsemblePosition(enum.Enum):
  UPPER_LEFT = 'UPPER_LEFT'
  UPPER_RIGHT = 'UPPER_RIGHT'
  CENTER = 'CENTER'
  LOWER_LEFT = 'LOWER_LEFT'
  LOWER_RIGHT = 'LOWER_RIGHT'


class SinglePatchEnsemble(PatchEnsembleMethod):
  """Returns embedding generated from a single patch."""

  def __init__(self, position: SinglePatchEnsemblePosition):
    """SinglePatchEnsemble Constructor.

    Args:
      position: Position of patch to generate embedding.

    Raises:
      ez_wsi_errors.SinglePatchEmbeddingEnsemblePositionError: Invalid
        SinglePatchEnsemblePosition.
    """
```
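To illustrate how the five SinglePatchEnsemblePosition values might map to pixel coordinates, the sketch below computes the upper-left corner of a 224 x 224 sub-patch for each position within a larger patch. The helper `sub_patch_origin`, the corner-flush placement, and the integer-division centering are assumptions made for illustration; EZ-WSI's actual placement logic may differ.

```python
from typing import Dict, Tuple

ENDPOINT_WIDTH = 224   # V1/V2 endpoint input width in pixels.
ENDPOINT_HEIGHT = 224  # V1/V2 endpoint input height in pixels.


def sub_patch_origin(
    position: str, width: int, height: int,
    patch_width: int = ENDPOINT_WIDTH, patch_height: int = ENDPOINT_HEIGHT,
) -> Tuple[int, int]:
  """Hypothetical helper: (x, y) offset of a sub-patch within the source patch.

  Assumes the source patch is at least as large as the endpoint input.
  """
  if width < patch_width or height < patch_height:
    raise ValueError('Source patch is smaller than the endpoint input.')
  right = width - patch_width    # Largest x offset that keeps the sub-patch inside.
  bottom = height - patch_height  # Largest y offset that keeps the sub-patch inside.
  origins: Dict[str, Tuple[int, int]] = {
      'UPPER_LEFT': (0, 0),
      'UPPER_RIGHT': (right, 0),
      'CENTER': (right // 2, bottom // 2),
      'LOWER_LEFT': (0, bottom),
      'LOWER_RIGHT': (right, bottom),
  }
  return origins[position]


for pos in ('UPPER_LEFT', 'UPPER_RIGHT', 'CENTER', 'LOWER_LEFT', 'LOWER_RIGHT'):
  print(pos, sub_patch_origin(pos, 512, 512))
```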

- <b>FivePatchMeanEnsemble</b>: This method returns the mean embedding of five patches sampled at the upper left, upper right, center, lower left, and lower right positions.


```
class FivePatchMeanEnsemble(PatchEnsembleMethod):
  """Returns mean embedding from five patches sampled across the patch."""

  def __init__(self):
```
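The reduction step of a mean ensemble is an element-wise average. A minimal pure-Python sketch follows, assuming each embedding is a list of floats of equal length; the real ensemble operates on EZ-WSI embedding result objects, and `elementwise_mean` is a hypothetical helper.

```python
from typing import List, Sequence


def elementwise_mean(embeddings: Sequence[Sequence[float]]) -> List[float]:
  """Averages equal-length embeddings position by position."""
  if not embeddings:
    raise ValueError('At least one embedding is required.')
  length = len(embeddings[0])
  if any(len(e) != length for e in embeddings):
    raise ValueError('All embeddings must have the same length.')
  count = len(embeddings)
  # zip(*embeddings) groups the i-th value of every embedding together.
  return [sum(values) / count for values in zip(*embeddings)]


# Five toy 2-value embeddings, one per sampled position.
five = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(elementwise_mean(five))  # [5.0, 6.0]
```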

- <b>MeanPatchEmbeddingEnsemble</b>: This ensemble method accepts a patch of arbitrary input dimensions and replaces it with multiple patch requests sampled across the source patch's dimensions. The resulting patch embeddings are reduced to a single embedding by computing their element-wise mean.


```
class MeanPatchEmbeddingEnsemble(PatchEnsembleMethod):
  """Returns mean embedding from set of embeddings sampled across the patch."""

  def __init__(self, step_x_px: int, step_y_px: int):
    """MeanPatchEmbeddingEnsemble Constructor.

    Args:
      step_x_px: Step size in x direction to sample patch for embedding.
      step_y_px: Step size in y direction to sample patch for embedding.

    Raises:
      ez_wsi_errors.SinglePatchEmbeddingEnsemblePositionError: Invalid
        SinglePatchEnsemblePosition.
    """
```
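One plausible reading of step_x_px and step_y_px is a regular grid of sub-patch origins spaced by the step sizes. The hypothetical helper `grid_origins` below assumes that only sub-patches fitting entirely inside the source patch are kept; how EZ-WSI handles the right and bottom edges is not described here, so treat the counts as illustrative. Under this assumption, a 512 x 512 patch sampled with a 250 pixel step yields four 224 x 224 sub-patches.

```python
from typing import List, Tuple


def grid_origins(
    width: int, height: int, step_x_px: int, step_y_px: int,
    patch_width: int = 224, patch_height: int = 224,
) -> List[Tuple[int, int]]:
  """Hypothetical helper: (x, y) origins of sub-patches sampled on a grid."""
  if step_x_px <= 0 or step_y_px <= 0:
    raise ValueError('Step sizes must be positive.')
  origins = []
  # The "+ 1" upper bound keeps only origins whose sub-patch fits inside.
  for y in range(0, height - patch_height + 1, step_y_px):
    for x in range(0, width - patch_width + 1, step_x_px):
      origins.append((x, y))
  return origins


print(grid_origins(512, 512, 250, 250))
```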

**Custom Ensemble Methods**

Patch embedding ensemble methods are extensible: a custom method is written by creating a class that inherits from PatchEnsembleMethod. Ensemble methods must implement two abstract methods: generate_ensemble, which transforms an arbitrary patch embedding request into one or more requests the endpoint can process, and reduce_ensemble, which reduces the generated embeddings to a single embedding. The methods implemented in EZ-WSI are relatively simple. However, ensemble methods need not be limited to simple summary statistics (e.g., mean); conceptually, they could trigger additional, more complex ML that identifies the relevant regions of a patch and then performs a more intelligent embedding reduction.

```
  @abc.abstractmethod
  def generate_ensemble(
      self,
      endpoint: patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint,
      patch: patch_embedding_types.EmbeddingPatch,
  ) -> Iterator[patch_embedding_types.PatchEmbeddingSource]:
    """Yields iterator of patches of embedding dim to gen embedding for patch.

    Args:
      endpoint: Embedding endpoint used to generate patch embeddings.
      patch: Input pixel region to generate an embedding.

    Yields:
      PatchEmbeddingSource that define one or more sub patches that are
      required to generate an embedding for the patch.
    """

  @abc.abstractmethod
  def reduce_ensemble(
      self,
      patch: patch_embedding_types.EmbeddingPatch,
      ensemble_list: _ReducedType,
  ) -> patch_embedding_types.EmbeddingResult:
    """Returns single embedding result from ensemble of patch embeddings.

    Args:
      patch: Input pixel region embedding was generated from
      ensemble_list: List of embedding results generated within patch

    Returns:
      Single embedding result for patch.
    """
```
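The generate/reduce split can be sketched outside EZ-WSI. The class below is hypothetical and does not inherit from PatchEnsembleMethod; it uses plain tuples and lists in place of EmbeddingPatch, PatchEmbeddingSource, and EmbeddingResult, and it swaps the mean for an element-wise median to show a reduction other than the built-in ones.

```python
import statistics
from typing import Iterator, List, Sequence, Tuple


class MedianGridEnsembleSketch:
  """Hypothetical ensemble: tile a patch, then take the element-wise median."""

  def __init__(self, tile: int = 224, step: int = 224):
    self._tile = tile
    self._step = step

  def generate_ensemble(
      self, width: int, height: int
  ) -> Iterator[Tuple[int, int, int, int]]:
    """Yields (x, y, width, height) sub-patches that fit inside the patch."""
    for y in range(0, height - self._tile + 1, self._step):
      for x in range(0, width - self._tile + 1, self._step):
        yield (x, y, self._tile, self._tile)

  def reduce_ensemble(
      self, embeddings: Sequence[Sequence[float]]
  ) -> List[float]:
    """Reduces sub-patch embeddings to one embedding via element-wise median."""
    return [statistics.median(values) for values in zip(*embeddings)]


ensemble = MedianGridEnsembleSketch()
print(list(ensemble.generate_ensemble(448, 448)))
print(ensemble.reduce_ensemble([[1.0, 5.0], [2.0, 6.0], [10.0, 7.0]]))
```

A median is less sensitive than a mean to a single outlying sub-patch (for example, one that lands mostly on background), which is the kind of design choice a custom ensemble method makes possible.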

### Example: Generating a Five-Patch Mean Ensemble Embedding for a 512 x 512 Pixel DICOM Patch

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create an interface to the slide; this retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Request a single patch of imaging from the highest magnification.
# The dimensions of the patch exceed the endpoint input requirements.
patch = ds.get_patch(level=ds.native_level, x=43000, y=10000, width=512, height=512)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the input patch of arbitrary size and breaks it into five 224 x 224 pixel
# patches sampled at the corners and center of the patch.
ensemble_method = patch_embedding_ensemble_methods.FivePatchMeanEnsemble()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch, ensemble_method)

# Display the image (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

### Example: Generating a Five-Patch Mean Ensemble Embedding for a 512 x 512 Pixel Google Cloud Storage Image Patch

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials by default; this example
# passes NoAuthCredentialsFactory to read the image without authentication.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Define the coordinates of the image patch.
patch = image.get_patch(x=10, y=10, width=512, height=512)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the input patch of arbitrary size and breaks it into five 224 x 224 pixel
# patches sampled at the corners and center of the patch.
ensemble_method = patch_embedding_ensemble_methods.FivePatchMeanEnsemble()

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch, ensemble_method)

# Display the image (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

### Example: Generating a Mean Ensemble Embedding for a 512 x 512 Pixel DICOM Patch

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import dicom_web_interface
from ez_wsi_dicomweb import dicom_slide
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods
import ez_wsi_dicomweb.ml_toolkit.dicom_path as dicom_path

# Defines DICOM image stored within a Google DICOM store.
DATASET_PROJECT_ID = 'hai-cd3-foundations'
DATASET_LOCATION = 'us-west1'
DATASET_ID = 'pathology'
STORE_ID  = 'camelyon'
STUDY_INSTANCE_UID  = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344594461'
SERIES_INSTANCE_UID = '1.3.6.1.4.1.11129.5.7.999.186491099540.79362771.1709051344626463'


# Full path to DICOM store and DICOM series containing whole slide imaging.
series_path = dicom_path.FromString(f'https://healthcare.googleapis.com/v1/projects/{DATASET_PROJECT_ID}/locations/{DATASET_LOCATION}/datasets/{DATASET_ID}/dicomStores/{STORE_ID}/dicomWeb/studies/{STUDY_INSTANCE_UID}/series/{SERIES_INSTANCE_UID}')

# Credential factory that provides EZ-WSI with credentials to access DICOM imaging metadata.
dcf = credential_factory.DefaultCredentialFactory()

# Create an interface to the slide; this retrieves slide metadata but not slide imaging.
ds = dicom_slide.DicomSlide(path=series_path, dwi=dicom_web_interface.DicomWebInterface(dcf))

# Request a single patch of imaging from the highest magnification.
# The dimensions of the patch exceed the endpoint input requirements.
patch = ds.get_patch(level=ds.native_level, x=43000, y=10000, width=512, height=512)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the input patch of arbitrary size and breaks it into 224 x 224 pixel
# patches sampled regularly across the patch with a horizontal and vertical
# spacing of 250 pixels (upper-left corner to adjacent patch upper-left corner).
ensemble_method = patch_embedding_ensemble_methods.MeanPatchEmbeddingEnsemble(250, 250)

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch, ensemble_method)

# Display the image (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

### Example: Generating a Mean Ensemble Embedding for a 512 x 512 Pixel Google Cloud Storage Image Patch

In [None]:
from ez_wsi_dicomweb import credential_factory
from ez_wsi_dicomweb import gcs_image
from ez_wsi_dicomweb import patch_embedding
from ez_wsi_dicomweb import patch_embedding_endpoints
from ez_wsi_dicomweb import patch_embedding_ensemble_methods


# Create a reference to an image stored on Google Cloud Storage.
# GcsImage authenticates with default credentials by default; this example
# passes NoAuthCredentialsFactory to read the image without authentication.
image = gcs_image.GcsImage('gs://healthai-us/pathology/example_large_patch.jpeg',
                           credential_factory=credential_factory.NoAuthCredentialsFactory())

# Define the coordinates of the image patch.
patch = image.get_patch(x=10, y=10, width=512, height=512)

# Defines the endpoint that will be called to generate the embedding.
endpoint = patch_embedding_endpoints.V2PatchEmbeddingEndpoint()

# Takes the input patch of arbitrary size and breaks it into 224 x 224 pixel
# patches sampled regularly across the patch with a horizontal and vertical
# spacing of 250 pixels (upper-left corner to adjacent patch upper-left corner).
ensemble_method = patch_embedding_ensemble_methods.MeanPatchEmbeddingEnsemble(250, 250)

# Takes a patch (DicomPatch or GcsPatch) and returns an embedding.
embedding = patch_embedding.get_patch_embedding(endpoint, patch, ensemble_method)

# Display the image (not required; purely illustrates the source imaging for the embedding).
render_patch(patch)

# Display the first 12 values of the embedding.
print(embedding[:12])

# <a name="endpoints">Embedding Endpoints</a>

All embedding generation functions and classes require the endpoint parameter. The endpoint is a high-level abstraction that defines the underlying interface through which EZ-WSI requests and receives embeddings. EZ-WSI provides three endpoint implementations: V2PatchEmbeddingEndpoint, LocalEndpoint, and V1PatchEmbeddingEndpoint. Additional endpoints can be implemented by writing a custom class derived from patch_embedding_endpoints.AbstractPatchEmbeddingEndpoint.

Descriptions of the three endpoints provided by EZ-WSI follow:


## V2PatchEmbeddingEndpoint

The V2PatchEmbeddingEndpoint defines an endpoint connection to the V2 Pathology Embeddings API. The V2 endpoint supports embedding generation for patch requests from DICOM imaging, imaging stored on Google Cloud Storage, and imaging defined from local data sources, and it can combine requests from multiple sources within a single batch embedding request. The V2 endpoint also supports server-side resizing and ICC color profile normalization.

```
class V2PatchEmbeddingEndpoint(_PatchEmbeddingEndpointBase):
  """Implements Patch embedding V2 API."""

  def __init__(
      self,
      endpoint_api_version: str = 'v1',  # Vertex API version
      project_id: str = 'hai-cd3-foundations',
      endpoint_location: str = 'us-central1',
      endpoint_id: str = '162',
      max_threads: int = _DEFAULT_ENDPOINT_THREADS,
      max_patches_per_request: int = _DEFAULT_MAX_PATCHES_PER_REQUEST,
      retry_count: int = _DEFAULT_RETRY_COUNT,
      icc_profile_normalization: IccProfileNormalization = (
          IccProfileNormalization.NONE
      ),
      send_gcs_patch_bytes_from_client_to_server: bool = False,
      require_fully_in_source_image: bool = True,
      credential_factory: Optional[
          credential_factory_module.AbstractCredentialFactory
      ] = None,
  ):
```

Constructor parameters:

- <u>endpoint_api_version</u>: The API version of the Vertex AI endpoint hosting the V2 endpoint. Defaults to v1. (**Recommended:** leave set to the default value.)

- <u>project_id</u>: The Google Cloud project ID of the project hosting the Vertex AI endpoint.

- <u>endpoint_location</u>: The Google Cloud location (region) of the Vertex AI endpoint. Defaults to us-central1.

- <u>endpoint_id</u>: The endpoint ID of the Vertex AI endpoint hosting the Pathology 2.0 endpoint.

- <u>max_threads</u>: The maximum number of threads to launch when processing embedding requests; defaults to 5. To accelerate embedding generation for large batch jobs, EZ-WSI splits embedding requests across multiple threads, requests embeddings from the endpoint in parallel, and then recombines the results. Parallelization happens transparently when using EZ-WSI's embedding interfaces.

- <u>max_patches_per_request</u>: The maximum number of patches to send in one endpoint request. Defaults to 100. Requests that exceed this size are transparently split and processed across multiple requests. The V2 endpoint has an absolute maximum of 3,000 patches per request.

- <u>retry_count</u>: Number of times the endpoint will retry failed requests.

- <u>send_gcs_patch_bytes_from_client_to_server</u>: An optimization flag; defaults to False (recommended setting). If set to True, image patches that reference imaging stored on Google Cloud Storage, and whose pixel data has already been retrieved before the embedding request, may include the retrieved image bytes within the embedding request. This enables the endpoint to process the imaging without additional Google Cloud Storage transactions. In most cases enabling this <u>will not improve performance</u>; it may be advantageous if EZ-WSI is run from an environment with exceptionally high upload bandwidth.

- <b><u>icc_profile_normalization</u></b>: The V2 endpoint can perform ICC color profile normalization to transform patch imaging into a common color space prior to embedding generation. This parameter accepts an enum value that defines the target ICC color profile to which imaging containing an ICC color profile is transformed before embedding generation. The V2 endpoint supports transforming imaging to the ICC color profiles defined as rendering parameters in the DICOM Standard (sRGB, Adobe RGB, and ROMM RGB). The parameter defaults to IccProfileNormalization.NONE, indicating that no ICC color profile normalization is performed.

```
class IccProfileNormalization(enum.Enum):
  """ICC Profile To Normalize Embedding Patches To."""

  NONE = 'NONE'
  SRGB = 'SRGB'
  ADOBERGB = 'ADOBERGB'
  ROMMRGB = 'ROMMRGB'
```

<b>Why is this important?</b>

Whole slide imaging is commonly acquired within the color space of the slide scanner. Scanner color spaces vary within and across vendors. As a result, the identical slide scanned on two different systems may appear visually different on a monitor, or be quantitatively different when processed by an ML model. ICC profile normalization transforms the color values within the imaging into a common color space. This transformation should make imaging captured by calibrated systems with different optical characteristics more similar, and performing it may help make ML models more generalizable across scanners.

- <u>require_fully_in_source_image</u>: Requires that patch coordinates fall within the source image coordinates. Defaults to True. If True (default), an error is returned for embedding requests that define invalid patches. If False (not recommended), the non-overlapping portion of a patch is set to black RGB (0, 0, 0).

- <u>credential_factory</u>: The credential factory defines how the OAuth credentials used to connect to the embedding endpoint are obtained. By default, the endpoint uses the application default credentials associated with the user or service account running EZ-WSI to connect to the Vertex AI endpoint. In most cases this parameter should be left at its default setting. Note that these credentials <u>do not define the credentials used by the endpoint to retrieve imaging</u>. Image retrieval credentials are defined by the credential factories set on the interfaces used to generate patches from cloud imaging (the GcsImage or DicomSlide classes).
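To make the require_fully_in_source_image=False behavior concrete, the NumPy sketch below crops a patch that may extend past the image border and fills the non-overlapping pixels with black RGB (0, 0, 0). The function `crop_with_black_padding` is hypothetical and only mirrors the behavior described above; it is not EZ-WSI code.

```python
import numpy as np


def crop_with_black_padding(
    image: np.ndarray, x: int, y: int, width: int, height: int,
    require_fully_in_source_image: bool = True,
) -> np.ndarray:
  """Hypothetical helper: returns a (height, width, 3) patch of `image`.

  Pixels of the requested patch that fall outside the image are black.
  """
  img_h, img_w = image.shape[:2]
  inside = x >= 0 and y >= 0 and x + width <= img_w and y + height <= img_h
  if require_fully_in_source_image and not inside:
    raise ValueError('Patch coordinates fall outside the source image.')
  patch = np.zeros((height, width, 3), dtype=image.dtype)
  # Intersection of the requested patch with the image, in image coordinates.
  x0, y0 = max(x, 0), max(y, 0)
  x1, y1 = min(x + width, img_w), min(y + height, img_h)
  if x0 < x1 and y0 < y1:
    patch[y0 - y:y1 - y, x0 - x:x1 - x] = image[y0:y1, x0:x1]
  return patch


image = np.full((100, 100, 3), 200, dtype=np.uint8)
patch = crop_with_black_padding(
    image, 90, 90, 20, 20, require_fully_in_source_image=False)
print(patch.shape, patch[0, 0], patch[19, 19])
```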

## LocalEndpoint

The LocalEndpoint defines an endpoint that runs locally on the machine executing EZ-WSI. When processing data stored in the cloud, the local endpoint retrieves the imaging data itself. The local endpoint supports embedding generation for patch requests from DICOM imaging, imaging stored on Google Cloud Storage, and imaging defined from local data sources, and it can combine requests from multiple sources within a single batch embedding request. The local endpoint supports image resizing and ICC color profile normalization.

**Performance Tip**

The local endpoint will typically have significantly lower performance than the V2PatchEmbeddingEndpoint when working from cloud data sources. It may offer better performance than the cloud endpoints when processing locally stored data.

```
class LocalEndpoint(AbstractPatchEmbeddingEndpoint[np.ndarray]):
  """Endpoint for generating embeddings with a locally loaded model."""

  def __init__(
      self,
      model: Callable[[np.ndarray], np.ndarray],
      icc_profile_normalization: IccProfileNormalization = (
          IccProfileNormalization.NONE
      ),
      patch_width: int = 224,
      patch_height: int = 224,
      require_fully_in_source_image: bool = True,
      max_threads: int = _DEFAULT_ENDPOINT_THREADS,
      retry_count: int = _DEFAULT_RETRY_COUNT,
      max_patches_per_request: int = _DEFAULT_MAX_PATCHES_PER_REQUEST,
      dicom_instance_icc_profile_cache_count: int = _DEFAULT_DICOM_INSTANCE_ICC_PROFILE_CACHE_COUNT,
  ):
```

Constructor parameters:

- <u>model</u>: A Python function or other callable that, when called with a batch of patch imaging (a NumPy array), returns image embeddings. The shape of the input array is (batch, height, width, 3).

- <u>icc_profile_normalization</u>: The local endpoint can perform ICC color profile normalization. See the V2PatchEmbeddingEndpoint documentation for more information on this parameter. Defaults to IccProfileNormalization.NONE.

- <u>patch_width</u>: The patch image width that the endpoint accepts. Defaults to 224. Parameter should be set to the value required by the model.

- <u>patch_height</u>: The patch image height that the endpoint accepts. Defaults to 224. Parameter should be set to the value required by the model.

- <u>require_fully_in_source_image</u>: Requires that patch coordinates fall within the source image coordinates. Defaults to True. If True (default), an error is returned for embedding requests that define invalid patches. If False (not recommended), the non-overlapping portion of a patch is set to black RGB (0, 0, 0).

- <u>max_threads</u>: The maximum number of threads to use when retrieving data from the cloud; defaults to 5.

- <u>retry_count</u>: Number of times the endpoint will retry failed requests.

- <u>max_patches_per_request</u>: The maximum number of patches which will be processed at once. Batch requests which exceed this size will be split, executed in chunks, and then recombined. These operations occur transparently inside EZ-WSI.

- <u>dicom_instance_icc_profile_cache_count</u>: The number of ICC color profiles to cache within the endpoint. When generating ICC color profile corrected embeddings from DICOM imaging stored in the cloud, the local endpoint must retrieve both the imaging (pixel frame data) and the ICC color profile. Some WSI scanners (e.g., Leica) produce images with very large ICC color profiles (~12 MB). The ICC profile cache temporarily stores the DICOM ICC profiles in use to avoid repeated profile retrieval. This parameter has no effect if the imaging does not contain an ICC color profile or if the patches are defined from a source other than DICOM.
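The model argument only needs to be a callable that maps a (batch, height, width, 3) NumPy array to a batch of embeddings. As a hypothetical stand-in for a real pathology model, `toy_model` below returns the per-channel mean intensity of each patch; in practice the callable would wrap a loaded model. The commented-out constructor call follows the LocalEndpoint signature shown above and is not executed here.

```python
import numpy as np


def toy_model(batch: np.ndarray) -> np.ndarray:
  """Hypothetical embedding model: per-channel mean of each patch.

  Args:
    batch: Patch imaging with shape (batch, height, width, 3).

  Returns:
    Array with shape (batch, 3): one 3-value "embedding" per patch.
  """
  if batch.ndim != 4 or batch.shape[-1] != 3:
    raise ValueError(f'Expected (batch, height, width, 3), got {batch.shape}.')
  return batch.astype(np.float32).mean(axis=(1, 2))


patches = np.zeros((2, 224, 224, 3), dtype=np.uint8)
patches[1] = (10, 20, 30)  # Second patch is a solid RGB color.
print(toy_model(patches))

# With EZ-WSI installed, the callable would be passed to the endpoint, e.g.:
# endpoint = patch_embedding_endpoints.LocalEndpoint(model=toy_model)
```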

## V1PatchEmbeddingEndpoint (Not Recommended)

The V1PatchEmbeddingEndpoint defines an endpoint connection to the V1 Pathology Embeddings API. The V1 endpoint is not recommended for general use. It supports processing embedding requests on imaging stored within a Google DICOM store or on Google Cloud Storage, but these two request types are processed by different V1 Google Research endpoints. As a result, a single instance of the EZ-WSI V1 endpoint can be configured to communicate with either a DICOM store endpoint or a Google Cloud Storage endpoint, but not both. The V1 endpoints do not support image resizing, ICC color profile correction, or processing locally stored data.

```
class V1PatchEmbeddingEndpoint(_PatchEmbeddingEndpointBase):
  """Implements Patch embedding V1 API."""

  def __init__(
      self,
      endpoint_api_version: str = 'v1',  # Vertex API version
      project_id: str = 'hai-cd3-foundations',
      endpoint_location: str = 'us-central1',
      endpoint_id: str = '160',
      max_threads: int = _DEFAULT_ENDPOINT_THREADS,
      max_patches_per_request: int = _DEFAULT_MAX_PATCHES_PER_REQUEST,
      retry_count: int = _DEFAULT_RETRY_COUNT,
      send_gcs_patch_bytes_from_client_to_server: bool = False,
      credential_factory: Optional[
          credential_factory_module.AbstractCredentialFactory
      ] = None,
  ):
```

Constructor parameters:

- <u>endpoint_api_version</u>: The API version of the Vertex AI endpoint hosting the V1 endpoint. Defaults to v1. (**Recommended:** leave set to the default value.)

- <u>project_id</u>: The Google Cloud project ID of the project hosting the Vertex AI endpoint.

- <u>endpoint_location</u>: The Google Cloud location (region) of the Vertex AI endpoint. Defaults to us-central1.

- <u>endpoint_id</u>: The endpoint ID of the Vertex AI endpoint hosting the Pathology 1.0 endpoint.

- <u>max_threads</u>: The maximum number of threads to launch when processing embedding requests; defaults to 5. To accelerate embedding generation for large batch jobs, EZ-WSI splits embedding requests across multiple threads, requests embeddings from the endpoint in parallel, and then recombines the results. Parallelization happens transparently when using EZ-WSI's embedding interfaces.

- <u>max_patches_per_request</u>: The maximum number of patches to send in one endpoint request. Defaults to 100. Requests that exceed this size are transparently split and processed across multiple requests. The V1 endpoint has an absolute maximum of 3,000 patches per request.

- <u>retry_count</u>: Number of times the endpoint will retry failed requests.

- <u>send_gcs_patch_bytes_from_client_to_server</u>: An optimization flag; defaults to False (recommended setting). If set to True, image patches that reference imaging stored on Google Cloud Storage, and whose pixel data has already been retrieved before the embedding request, may include the retrieved image bytes within the embedding request. This enables the endpoint to process the imaging without additional Google Cloud Storage transactions. In most cases enabling this <u>will not improve performance</u>; it may be advantageous if EZ-WSI is run from an environment with exceptionally high upload bandwidth.

- <u>credential_factory</u>: The credential factory defines how the OAuth credentials used to connect to the embedding endpoint are obtained. By default, the endpoint uses the application default credentials associated with the user or service account running EZ-WSI to connect to the Vertex AI endpoint. In most cases this parameter should be left at its default setting. Note that these credentials <u>do not define the credentials used by the endpoint to retrieve imaging</u>. Image retrieval credentials are defined by the credential factories set on the interfaces used to generate patches from cloud imaging (the GcsImage or DicomSlide classes).
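The endpoints described above transparently split large requests according to max_patches_per_request and parallelize work across up to max_threads threads, retrying failures up to retry_count times. A minimal standard-library sketch of that pattern follows; `fake_endpoint_call` is a hypothetical stand-in for a real Vertex AI request, the retry default of 3 is an assumption, and EZ-WSI's internal batching and retry policy may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunk(items: Sequence, size: int) -> List[Sequence]:
  """Splits items into consecutive chunks of at most `size` elements."""
  if size <= 0:
    raise ValueError('size must be positive.')
  return [items[i:i + size] for i in range(0, len(items), size)]


def call_with_retry(fn: Callable[[Sequence], List], batch: Sequence,
                    retry_count: int) -> List:
  """Calls fn(batch), retrying up to retry_count additional times on error."""
  for attempt in range(retry_count + 1):
    try:
      return fn(batch)
    except Exception:
      if attempt == retry_count:
        raise
  raise ValueError('retry_count must be non-negative.')


def batched_embeddings(fn, items, max_patches_per_request=100, max_threads=5,
                       retry_count=3):
  """Runs fn over chunks in parallel and recombines results in input order."""
  batches = chunk(items, max_patches_per_request)
  with ThreadPoolExecutor(max_workers=max_threads) as pool:
    # pool.map yields results in submission order, preserving input order.
    per_batch = pool.map(lambda b: call_with_retry(fn, b, retry_count), batches)
    return [result for batch_result in per_batch for result in batch_result]


def fake_endpoint_call(batch):
  """Hypothetical endpoint: 'embeds' each patch id as a one-value list."""
  return [[float(patch_id)] for patch_id in batch]


patch_ids = list(range(250))
print([len(b) for b in chunk(patch_ids, 100)])  # [100, 100, 50]
embeddings = batched_embeddings(fake_endpoint_call, patch_ids)
print(len(embeddings), embeddings[:3])
```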