~~~
Copyright 2025 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
~~~

# Prompt MedGemma 1.5 with DICOM Computed Tomography (CT) Imaging

<table><tbody><tr>
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/rl_with_trl.ipynb">
      <img alt="Google Colab logo" src="https://www.tensorflow.org/images/colab_logo_32px.png" width="32px"><br> Run in Google Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogle-Health%2Fmedgemma%2Fmain%2Fnotebooks%2Frl_with_trl.ipynb">
      <img alt="Google Cloud Colab Enterprise logo" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" width="32px"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/google-health/medgemma/blob/main/notebooks/rl_with_trl.ipynb">
      <img alt="GitHub logo" src="https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png" width="32px"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4">
      <img alt="Hugging Face logo" src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" width="32px"><br> View on Hugging Face
    </a>
  </td>
</tr></tbody></table>

This notebook demonstrates how to use a 3D representation of [computed tomography (CT)](https://www.nibib.nih.gov/science-education/science-topics/computed-tomography-ct) imaging to prompt MedGemma 1.5 running on Vertex AI.

Vertex AI makes it easy to serve your model and make it accessible to the world. Learn more about [Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform).

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing); use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

In [None]:
# @title Install pydicom Python library
%%capture
! pip install pydicom

In [None]:
# @title Authenticate Colab User to connect to DICOM store.
from google.colab import auth


# There will be a popup asking you to sign in with your user account and approve
# access.
auth.authenticate_user()

# Retrieve CT Imaging from Imaging Data Commons (IDC)

[Imaging Data Commons (IDC)](https://datacommons.cancer.gov/repository/imaging-data-commons#) is one of the largest publicly available, de-identified repositories for cancer imaging. The repository is funded by the [National Cancer Institute (NCI)](https://www.cancer.gov/), an institute of the [National Institutes of Health (NIH)](https://www.nih.gov/), which is part of the [U.S. Department of Health and Human Services](https://www.hhs.gov/). IDC contains imaging for all major medical imaging modalities. Imaging is stored within the archive as [DICOM](https://www.dicomstandard.org/). Imaging and its associated metadata can be searched and visualized through the [IDC website](https://portal.imaging.datacommons.cancer.gov/explore/) and [BigQuery](https://cloud.google.com/healthcare-api/docs/resources/public-datasets/idc), and can be accessed using DICOMweb ([IDC tutorial](https://learn.canceridc.dev/data/downloading-data/dicomweb-access)).

DICOM is the medical imaging format generated by CT scanners.
[DICOM](https://www.dicomstandard.org/) images are uniquely identified by three UIDs: a Study Instance UID, a Series Instance UID, and a SOP Instance UID. This notebook retrieves CT imaging from IDC by downloading all of the images associated with a CT scan. Conceptually, a Study Instance UID identifies all imaging acquired or generated as a result of a patient exam. Each acquisition performed as part of the exam (e.g., a CT acquisition) is identified by a unique Series Instance UID. Each image acquired or generated as part of an acquisition is, in turn, identified by a unique SOP Instance UID.

CT imaging is commonly represented as an ordered collection of 2D images (slices). Each slice is a 2D image that describes an axial cross section (a thin volume) of the imaged body.
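
The Study, Series, and SOP Instance UID hierarchy maps directly onto DICOMweb resource paths, which is how the cells below address a series and its instances. As a minimal sketch, the hypothetical helper `dicomweb_path` (not part of pydicom or any IDC tooling) builds the path for each level; the base URL and UIDs shown are placeholders, not real identifiers.

```python
from typing import Optional


def dicomweb_path(
    base_url: str,
    study_uid: str,
    series_uid: Optional[str] = None,
    sop_uid: Optional[str] = None,
) -> str:
  """Hypothetical helper: builds a DICOMweb resource path from DICOM UIDs."""
  if sop_uid is not None and series_uid is None:
    raise ValueError('A SOP Instance UID requires a Series Instance UID.')
  base = base_url.rstrip('/')
  path = f'{base}/studies/{study_uid}'  # All imaging for one patient exam.
  if series_uid is not None:
    path += f'/series/{series_uid}'  # One acquisition within the exam.
  if sop_uid is not None:
    path += f'/instances/{sop_uid}'  # One image (CT slice) in the series.
  return path


# Placeholder values for illustration only.
print(dicomweb_path('https://example.org/dicomWeb', '1.2.3'))
print(dicomweb_path('https://example.org/dicomWeb', '1.2.3', '4.5.6'))
print(dicomweb_path('https://example.org/dicomWeb', '1.2.3', '4.5.6', '7.8.9'))
```

Appending `/instances` to a series path, as the next cell does, requests the metadata for every instance in that series.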



In [None]:
# @title Read metadata for the DICOM instances in a series.

import pydicom
import requests

# This notebook uses imaging hosted by: Imaging Data Commons (IDC)
# This notebook utilizes data from The Cancer Imaging Archive (TCIA).
# Collection: 	C4KC-KiTS   Case: KiTS-00004
study_instance_uid = '1.3.6.1.4.1.14519.5.2.1.6919.4624.108154519927657031748507898966'
series_instance_uid = '1.3.6.1.4.1.14519.5.2.1.6919.4624.203718711189211521545414320103'

# Read DICOM instance metadata for imaging from IDC
series = f'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/{study_instance_uid}/series/{series_instance_uid}'
metadata = requests.get(f'{series}/instances').json()
dicom_instances = [pydicom.Dataset.from_json(i) for i in metadata]
# Order instance metadata by instance number
dicom_instances = sorted(dicom_instances, key=lambda i: int(i.InstanceNumber))


In [None]:
# @title Load slice imaging for all DICOM instances in the series.

# @markdown Background: Each slice of a CT scan is typically represented in DICOM as its own instance.
import io

ct_volume_slices = []
headers = {'Accept': 'application/dicom; transfer-syntax=*'}
for i in dicom_instances:
  # Download DICOM instance
  pixel_data = requests.get(f'{series}/instances/{i.SOPInstanceUID}', headers=headers).content
  with io.BytesIO(pixel_data) as pd:
    # Read DICOM instance from in memory buffer
    with pydicom.dcmread(pd) as dcm:
      # Extract CT slice pixel data and rescale it to Hounsfield units.
      ct_volume_slices.append(pydicom.pixels.apply_rescale(dcm.pixel_array, dcm))


# Define windowing for CT imaging

The voxels encoded within CT imaging are typically expressed as signed 16-bit [Hounsfield units (HU)](https://en.wikipedia.org/wiki/Hounsfield_scale), one value per voxel. CT images are commonly visualized as grayscale images. The imaging is typically [windowed](https://radiopaedia.org/articles/windowing-ct) for human reading tasks to increase the contrast across a task-specific diagnostic range.
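
The slices loaded earlier were converted to HU with `pydicom.pixels.apply_rescale`, which applies the linear mapping defined by each instance's `RescaleSlope` and `RescaleIntercept` attributes. A minimal sketch of that mapping follows; the function name `to_hounsfield` is hypothetical, and the slope and intercept are illustrative values rather than values read from this series.

```python
def to_hounsfield(
    stored_value: float, rescale_slope: float, rescale_intercept: float
) -> float:
  """Applies the DICOM linear rescale: HU = stored_value * slope + intercept."""
  return stored_value * rescale_slope + rescale_intercept


# Illustrative values only. Real values come from the RescaleSlope and
# RescaleIntercept attributes of each DICOM instance.
slope, intercept = 1.0, -1024.0
print(to_hounsfield(0, slope, intercept))     # -1024.0, the HU value of air
print(to_hounsfield(1024, slope, intercept))  # 0.0, the HU value of water
```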

The MedGemma image encoder accepts RGB imaging, 8 bits per channel, as input. MedGemma 1.5 has also been trained to interpret CT imaging in which each RGB channel carries a different window of the same slice, which lets the model interpret multiple representations of the CT imaging simultaneously. Specifically, MedGemma 1.5 has been trained with the components defined as follows:
  * Red (component 0):  Wide window; range: -1024 HU ([air](https://en.wikipedia.org/wiki/Hounsfield_scale)) to 1024 HU ([above bone](https://en.wikipedia.org/wiki/Hounsfield_scale))
  * Green (component 1):  Soft tissue window; range: -135 HU ([fat](https://en.wikipedia.org/wiki/Hounsfield_scale)) to 215 HU ([start of bone](https://en.wikipedia.org/wiki/Hounsfield_scale))
  * Blue (component 2): Brain window; range: 0 HU ([water](https://en.wikipedia.org/wiki/Hounsfield_scale)) to 80 HU  ([brain](https://en.wikipedia.org/wiki/Hounsfield_scale))

  Because each of the RGB channels in the prompt imaging corresponds to a different window, prompt imaging prepared using this method will appear colored when visualized.
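
To make the channel mapping concrete, the sketch below windows a few single HU values in plain Python. The window bounds mirror the `window_clips` list used in the next code cell; the helper names `window_value` and `to_rgb` are hypothetical and exist only for this illustration.

```python
# Window bounds (low HU, high HU) for the red, green, and blue channels.
WINDOWS = [(-1024, 1024), (-135, 215), (0, 80)]


def window_value(hu: float, low: float, high: float) -> int:
  """Clips one HU value to [low, high] and scales it to the range 0 - 255."""
  clipped = min(max(hu, low), high)
  return round((clipped - low) / (high - low) * 255)


def to_rgb(hu: float) -> tuple:
  """Returns the (red, green, blue) channel values for a single HU value."""
  return tuple(window_value(hu, low, high) for low, high in WINDOWS)


print(to_rgb(-1024))  # (0, 0, 0): air is at or below the floor of every window
print(to_rgb(40))     # (132, 128, 128): mid-range in all three windows
print(to_rgb(1000))   # (252, 255, 255): saturates the two narrow windows
```

A value of 40 HU lands near the middle of all three channels, while 1000 HU saturates the green and blue channels and is distinguished only by the wide red channel.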



In [None]:
import io

import numpy as np
import IPython.display
import PIL.Image

def norm(ct_vol: np.ndarray, min_hu: float, max_hu: float) -> np.ndarray:
  """Window and normalize CT imaging Hounsfield values to values 0 - 255."""
  ct_vol = np.clip(ct_vol, min_hu, max_hu)  # Clip the imaging value range
  ct_vol = ct_vol.astype(np.float32)
  ct_vol -= min_hu
  ct_vol /= (max_hu - min_hu)  # Norm to values between 0 - 1.0
  ct_vol *= 255.0  # Norm to values between 0 - 255.0
  return ct_vol

def window(ct_vol: np.ndarray) -> np.ndarray:
  # Window CT slice imaging with three windows (wide, soft tissue, brain).
  # Imaging will appear colored when visualized because the RGB channels
  # contain different representations of the data.
  window_clips = [(-1024, 1024), (-135, 215), (0, 80)]
  return np.stack([norm(ct_vol, low, high) for low, high in window_clips], axis=-1)

# Window CT Slice Data.
normalized_ct_volume_slices = []
for ct_slice in ct_volume_slices:
  windowed_slice = window(ct_slice)
  # Round slice voxels to nearest integer number.
  windowed_slice = np.round(windowed_slice, 0).astype(np.uint8)
  normalized_ct_volume_slices.append(windowed_slice)

# @markdown **Visualize CT Slice Windowing**

# Visualize windowed CT Slices.
ct_slice_images = [PIL.Image.fromarray(ct_slice) for ct_slice in normalized_ct_volume_slices]

# Save slice images as animated GIF
with io.BytesIO() as gif_bytes:
  ct_slice_images[0].save(gif_bytes, format='GIF', loop=0, save_all=True, append_images=ct_slice_images[1:], optimize=False, duration=len(normalized_ct_volume_slices)*3)
  ct_slice_animation = gif_bytes.getvalue()
# Display animated gif in colab
IPython.display.display(IPython.display.Image(data=ct_slice_animation, format='GIF'))


In [None]:
# @title Construct MedGemma 1.5 prompt formatted as Chat Completion.


# @markdown This section shows how to construct [chat completions](https://platform.openai.com/docs/api-reference/chat) requests to the endpoint using Vertex AI [prediction](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions).

import base64
import PIL.Image

def _encode(data: np.ndarray) -> str:
  """Encode CT slice imaging inline in prompt."""
  # Image format to encode ct slice images as.
  # options: 'jpeg' or 'png'
  format = 'jpeg'
  with io.BytesIO() as img_bytes:
    with PIL.Image.fromarray(data) as img:
      img.save(img_bytes, format=format)
    img_bytes.seek(0)
    encoded_string = base64.b64encode(img_bytes.getbuffer()).decode("utf-8")
  return f"data:image/{format};base64,{encoded_string}"


prompt = 'Is there evidence of renal carcinoma in this CT scan volume?' # @param ['Is there evidence of renal carcinoma in this CT scan volume?', 'Is there evidence of arterial calcification in this CT volume?']

# Generate chat completion formatted prompt.
content = []
for slice_number, ct_slice in enumerate(normalized_ct_volume_slices, 1):
  content.append({"type": "image_url", "image_url": {"url": _encode(ct_slice)}})
  content.append({"type": "text", "text": f'SLICE {slice_number}'})
content.append({"type": "text", "text": prompt})

messages = [
    {
        "role": "user",
        "content": content
    }
]

instance = {
    "@requestFormat": "chatCompletions",
    "messages": messages,
    "max_tokens": 500,
    "temperature": 0
}
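
Because every slice is inlined as a base64 data URL, the request body grows with the number of slices. The sketch below is one rough way to estimate that size before sending the request. The helpers `base64_length` and `request_size_bytes` are hypothetical, and the 30-byte payload stands in for an encoded slice image.

```python
import base64
import json
import math


def base64_length(num_bytes: int) -> int:
  """Returns the length of the padded base64 encoding of num_bytes bytes."""
  return 4 * math.ceil(num_bytes / 3)


def request_size_bytes(request: dict) -> int:
  """Returns the size of a request dictionary once serialized as UTF-8 JSON."""
  return len(json.dumps(request).encode('utf-8'))


# Stand-in for one encoded slice image (illustrative bytes, not a real JPEG).
fake_image_bytes = bytes(30)
encoded = base64.b64encode(fake_image_bytes).decode('utf-8')
print(len(encoded) == base64_length(len(fake_image_bytes)))

example_request = {
    'messages': [{
        'role': 'user',
        'content': [{
            'type': 'image_url',
            'image_url': {'url': f'data:image/jpeg;base64,{encoded}'},
        }],
    }],
}
print(request_size_bytes(example_request))
```

Calling `request_size_bytes(instance)` on the dictionary built above gives the number of bytes that will be posted to the endpoint.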

In [None]:
# @title Display MedGemma 1.5 prompt.
import json
from IPython.display import display, Markdown


def truncate_prompt(obj, max_len):
  # Clip strings in prompt to avoid displaying excessively large content in colab notebook.
  if isinstance(obj, dict):
    return {k: truncate_prompt(v, max_len) for k, v in obj.items()}
  elif isinstance(obj, list):
    return [truncate_prompt(elem, max_len) for elem in obj]
  elif isinstance(obj, str) and len(obj) > max_len:
    return obj[:max_len] + "..."  # Add ellipsis for truncated strings
  return obj


txt = json.dumps(truncate_prompt(instance, 100), indent=4, sort_keys=True)
display(Markdown(f'```json\n{txt}\n```'))

In [None]:
# @title Configure Colab to call MedGemma 1.5 running in Vertex AI

# @markdown #### Prerequisites

# @markdown 1. Make sure that [billing is enabled](https://cloud.google.com/billing/docs/how-to/modify-project) for your project.

# @markdown 2. Make sure that either the Compute Engine API is enabled or that you have the [Service Usage Admin](https://cloud.google.com/iam/docs/understanding-roles#serviceusage.serviceUsageAdmin) (`roles/serviceusage.serviceUsageAdmin`) role to enable the API.

# @markdown This section sets the default Google Cloud project and enables the Compute Engine API (if not already enabled), and initializes the Vertex AI API.

import os
from google.cloud import aiplatform

Google_Cloud_Project = ""  # @param {type: "string", placeholder:"e.g. MyProject"}

# @markdown To get [online predictions](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions), you will need a MedGemma [Vertex AI Endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment) that has been deployed from Model Garden. If you have not already done so, go to the [MedGemma model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/medgemma) and click "Deploy options > Vertex AI" to deploy the model.

# @markdown **Note:** The examples in this notebook are intended to be used with instruction-tuned variants. Make sure to use an instruction-tuned model variant to run this notebook.

# @markdown This section gets the Vertex AI Endpoint resource that you deployed from Model Garden to use for online predictions.

# @markdown Fill in the endpoint ID and region below. You can find your deployed endpoint on the [Vertex AI online prediction page](https://console.cloud.google.com/vertex-ai/online-prediction/endpoints).


ENDPOINT_ID = ""  # @param {type: "string", placeholder:"e.g. 123456789"}
ENDPOINT_REGION = ""  # @param {type: "string", placeholder:"e.g. us-central1"}

# @markdown **Note:** This notebook requires a dedicated [Vertex AI endpoint](https://cloud.google.com/blog/products/ai-machine-learning/reliable-ai-with-vertex-ai-prediction-dedicated-endpoints?e=48754805).

os.environ["CLOUDSDK_CORE_PROJECT"] = Google_Cloud_Project
os.environ["GOOGLE_CLOUD_PROJECT"] = Google_Cloud_Project
os.environ["GOOGLE_CLOUD_REGION"] = ENDPOINT_REGION

# Enable the Compute Engine API, if not already.
print("Enabling Compute Engine API.")
! gcloud services enable compute.googleapis.com

# Initialize Vertex AI API.
print("Initializing Vertex AI API.")
aiplatform.init(project=os.environ["GOOGLE_CLOUD_PROJECT"],
                location=os.environ["GOOGLE_CLOUD_REGION"])

endpoint = aiplatform.Endpoint(
    endpoint_name=ENDPOINT_ID,
    project=Google_Cloud_Project,
    location=ENDPOINT_REGION,
)

# Use the endpoint name to check that you are using an appropriate model variant.
# These checks are based on the default endpoint name from the Model Garden
# deployment settings.
ENDPOINT_NAME = endpoint.display_name
if "pt" in ENDPOINT_NAME:
    raise ValueError(
        "The examples in this notebook are intended to be used with "
        "instruction-tuned variants. Please use an instruction-tuned model."
    )
if "text" in ENDPOINT_NAME:
    raise ValueError(
        "You are using a text-only variant which does not support multimodal"
        " inputs. Please deploy a multimodal MedGemma variant to run this notebook."
    )

In [None]:
# @title Call MedGemma 1.5 and return prediction

import json
from IPython.display import display, Markdown

response = endpoint.raw_predict(
    body=json.dumps(instance).encode('utf-8'), use_dedicated_endpoint=True,
    headers={'Content-Type': 'application/json'}
)
response.raise_for_status()
medgemma_response = response.json()["choices"][0]["message"]["content"]

display(Markdown(f"---\n\n**[ MedGemma ]**\n\n{medgemma_response}\n\n---"))
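
The cell above indexes directly into `choices[0]["message"]["content"]`, which raises a bare `KeyError` or `IndexError` if the endpoint returns an unexpected body. As a sketch, the hypothetical helper `extract_message_text` below performs the same lookup with clearer errors. It assumes the response follows the chat completions layout that the cell above relies on, and the mock response is made up for illustration rather than real model output.

```python
def extract_message_text(response_json: dict) -> str:
  """Hypothetical helper: returns the text of the first chat completion choice."""
  choices = response_json.get('choices') or []
  if not choices:
    raise ValueError(f'Response contains no choices: {response_json}')
  content = choices[0].get('message', {}).get('content')
  if content is None:
    raise ValueError(f'First choice has no message content: {choices[0]}')
  return content


# Made-up response used only to illustrate the assumed layout.
mock_response = {
    'choices': [{
        'index': 0,
        'message': {'role': 'assistant', 'content': 'Example model output.'},
        'finish_reason': 'stop',
    }]
}
print(extract_message_text(mock_response))
```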