# MedImageInsight for Image and Text Embeddings Deployment and Inference using Batch Endpoints

This sample shows how to deploy MedImageInsight embedding models to a batch endpoint for image and text embedding inference. This notebook uses Python 3.10 and SDK v2.

### Task
MedImageInsight takes in images and/or text samples. For each image and text sample, feature embeddings are returned from the model.
 
### Model
The models that can perform the `embeddings` task are tagged with `embeddings`. We will use the `MedImageInsight` model in this notebook. 

### Inference data
We will use a chest X-ray image and text as a sample input. 

### Outline
1. Setup pre-requisites
2. Pick a model to deploy
3. Deploy the model to a batch endpoint
4. Test the endpoint
5. Clean up resources - delete the endpoint

## 1. Setup pre-requisites
* Install [Azure ML Client library for Python](https://learn.microsoft.com/en-us/python/api/overview/azure/ai-ml-readme?view=azure-python)
* Connect to AzureML Workspace and authenticate.

In [None]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.identity import DefaultAzureCredential
import pandas as pd

credential = DefaultAzureCredential()
ml_workspace = MLClient.from_config(credential)
print("Workspace:", ml_workspace)
ml_registry = MLClient(credential, registry_name="azureml")
print("Registry:", ml_registry)

## 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `embeddings` task. In this example, we use the `MedImageInsight` model. If you have opened this notebook for a different model, replace the model name accordingly.

In [None]:
model = ml_registry.models.get(name="MedImageInsight", label="latest")
model

## Create compute cluster


In [None]:
compute_name = "mii-batch-cluster"
if not any(c.name == compute_name for c in ml_workspace.compute.list()):
    compute_cluster = AmlCompute(
        name=compute_name,
        description="GPU cluster compute for MedImageInsight inference",
        min_instances=0,
        max_instances=1,
        size="Standard_NC6s_v3",
    )
    ml_workspace.compute.begin_create_or_update(compute_cluster).result()

## 3. Deploy the model to a batch endpoint for inference
Batch endpoints give a durable REST API that can be used to integrate with applications that need to use the model.

### Create batch endpoint

In [None]:
import random
import string

endpoint_prefix = "mii-batch"
endpoint_list = list(
    filter(
        lambda m: m.name.startswith(endpoint_prefix),
        ml_workspace.batch_endpoints.list(),
    )
)

if endpoint_list:
    endpoint = endpoint_list[0]
    print("Found existing endpoint:", endpoint.name)
else:
    # Creating a unique endpoint name by including a random suffix
    allowed_chars = string.ascii_lowercase + string.digits
    endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
    endpoint_name = f"{endpoint_prefix}-{endpoint_suffix}"
    endpoint = BatchEndpoint(
        name=endpoint_name,
        description="A batch endpoint for scoring images from MedImageInsight.",
        tags={"type": "medimageinsight"},
    )
    ml_workspace.begin_create_or_update(endpoint).result()
    print(f"Created new endpoint: {endpoint_name}")

### Deploy MedImageInsight to batch endpoint

- **max_concurrency_per_instance**: Determines the number of worker processes to spawn per instance. Each worker process loads its own copy of the model onto the GPU. Use multiple worker processes to maximize GPU utilization, but do not exceed the available GPU memory.
- **retry_settings**: The timeout may need to be adjusted based on the batch size. A larger batch size requires a longer timeout; otherwise, a worker process may be terminated before it finishes its mini-batch.
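One rough way to choose `max_concurrency_per_instance` is to divide the usable GPU memory by the memory a single worker needs. The sketch below is illustrative only: `max_workers_for_gpu` is a hypothetical helper, and the 3.5 GB per-worker footprint is a placeholder, not a measured value for MedImageInsight. Measure the real footprint on your own compute (for example with `nvidia-smi`) before relying on the result. The 16 GB figure assumes the single V100 GPU of the `Standard_NC6s_v3` size used in this notebook.

```python
import math


def max_workers_for_gpu(gpu_memory_gb, per_worker_gb, headroom_fraction=0.1):
    """Estimate how many worker processes fit in GPU memory.

    gpu_memory_gb: total GPU memory on one instance.
    per_worker_gb: memory one worker needs (model plus activations); measure this.
    headroom_fraction: share of GPU memory kept free as a safety margin.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable_gb = gpu_memory_gb * (1 - headroom_fraction)
    # Always return at least one worker; if even one does not fit,
    # the job needs a larger GPU rather than a smaller concurrency.
    return max(1, math.floor(usable_gb / per_worker_gb))


# Placeholder numbers: 16 GB GPU, assumed 3.5 GB per worker.
suggested_concurrency = max_workers_for_gpu(gpu_memory_gb=16, per_worker_gb=3.5)
print(f"Suggested max_concurrency_per_instance: {suggested_concurrency}")
```

If scoring fails with out-of-memory errors, lower the concurrency or the mini-batch size first, then revisit the timeout.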

In [None]:
deployment = ModelBatchDeployment(
    name="mii-dpl",
    description="A deployment for model MedImageInsight",
    endpoint_name=endpoint.name,
    model=model,
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        max_concurrency_per_instance=4,
        mini_batch_size=1,
        instance_count=1,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
        logging_level="info",
    ),
)
ml_workspace.begin_create_or_update(deployment).result()

In [None]:
endpoint = ml_workspace.batch_endpoints.get(endpoint.name)
endpoint.defaults.deployment_name = deployment.name
ml_workspace.batch_endpoints.begin_create_or_update(endpoint).result()
print(f"The default deployment is {endpoint.defaults.deployment_name}")

## 4. Test the endpoint - base64 encoded image and text

We will test the batch endpoint using the sample dataset.

### Load sample dataset

Download the sample dataset using the command `azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`


In [None]:
import glob

root_dir = "/home/azureuser/data/healthcare-ai/medimageinsight-examparameter/pngs"

png_files = glob.glob(f"{root_dir}/**/*.png", recursive=True)
print(f"Found {len(png_files)} PNG files")

### Create the input CSV file


#### Why Zero-Padding Batch Filenames Matters

The `write_to_csv()` function below names each batch file with a **zero-padded numeric suffix**. With the settings in this notebook (10,000 rows, 100 rows per file, pad width 5), the files are `batch_input_00000.csv`, `batch_input_00001.csv`, ..., `batch_input_00099.csv`.
Use the same padded index whenever you enumerate your batches.

Zero-padding makes **lexicographic (string) sort order match numerical order**. Without padding, `batch10` sorts **before** `batch2` and `batch3`, which can lead to confusing or incorrect alignment between batch input files and batch output results. Padding keeps the ordering predictable and avoids mismatches during downstream processing or aggregation.
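The ordering problem is easy to reproduce with plain Python string sorting. The following sketch uses a hypothetical `batch_filename` helper (not part of this notebook) to compare unpadded and padded names for a handful of batch indices.

```python
def batch_filename(index, pad_width=0):
    """Build a batch file name; pad_width=0 means no zero-padding."""
    return f"batch_input_{str(index).zfill(pad_width)}.csv"


indices = [0, 1, 2, 3, 10, 11]

# Names in the intended (numerical) order.
numeric_unpadded = [batch_filename(i) for i in indices]
numeric_padded = [batch_filename(i, pad_width=3) for i in indices]

# Names in the order a plain string sort produces.
sorted_unpadded = sorted(numeric_unpadded)
sorted_padded = sorted(numeric_padded)

print("Unpadded, sorted as strings:", sorted_unpadded)
print("Padded, sorted as strings:  ", sorted_padded)
print("Unpadded order matches numeric order:", sorted_unpadded == numeric_unpadded)
print("Padded order matches numeric order:  ", sorted_padded == numeric_padded)
```

Only the padded names keep their numerical order after sorting, provided the pad width is at least as wide as the largest index.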


In [None]:
import base64
import os
import shutil

csv_folder = "batch_inputs"
sample_dataset_size = len(png_files)
target_dataset_size = 10000
batch_max_size = 100
batch_count = 0
batch = []

# Pad width for filename indexing based on dataset size
index_pad_width = len(str(target_dataset_size))

# read and encode image to base64
def read_base64_image(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# write batch to a CSV file
def write_to_csv(batch_data, batch_index):
    """
    Write the current batch to a CSV file with zero-padded index.
    """
    filename = f"batch_input_{str(batch_index).zfill(index_pad_width)}.csv"
    csv_path = os.path.join(csv_folder, filename)
    df_input = pd.DataFrame(batch_data, columns=["image", "text"])
    df_input.to_csv(csv_path, index=False)


# remove and create folder for CSV files
if os.path.exists(csv_folder):
    print(f"Removing existing folder: {csv_folder}")
    shutil.rmtree(csv_folder)

print(f"Creating folder: {csv_folder}")
os.makedirs(csv_folder)

# create test dataset by repeating images from the sample dataset
for i in range(target_dataset_size):
    png_index = i % sample_dataset_size
    png_file = png_files[png_index]
    base64_image = read_base64_image(png_file)

    # Append one row to the batch
    batch.append([base64_image, "x-ray chest anteroposterior Pneumonia"])

    # If batch is full, write it out
    if len(batch) >= batch_max_size:
        write_to_csv(batch, batch_count)
        batch_count += 1
        batch = []

# Write any remaining data in the final batch
if batch:
    write_to_csv(batch, batch_count)
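Before uploading, it can be worth confirming that a base64 image survives the trip through a CSV file unchanged. The sketch below is self-contained and uses only the standard library: the bytes in `fake_png` are a stand-in that merely starts with the PNG signature, and the helper names are hypothetical. To check a real file, read a row from one of the generated CSV files instead of the in-memory buffer.

```python
import base64
import csv
import io

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Base64 images can exceed the csv module's default field size limit.
csv.field_size_limit(10**9)


def encode_image_bytes(raw):
    """Encode raw image bytes the same way read_base64_image() does."""
    return base64.b64encode(raw).decode("utf-8")


def decode_image_field(field):
    """Decode a base64 CSV field back to bytes, rejecting invalid characters."""
    return base64.b64decode(field, validate=True)


# Stand-in payload, not a real image.
fake_png = PNG_SIGNATURE + b"placeholder-pixel-data"

# Write one row with the same columns as the batch input files.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["image", "text"])
writer.writerow([encode_image_bytes(fake_png), "x-ray chest anteroposterior Pneumonia"])

# Read the row back and decode the image column.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
decoded = decode_image_field(rows[0]["image"])

round_trip_ok = decoded == fake_png and decoded.startswith(PNG_SIGNATURE)
print("Round trip preserved the image bytes:", round_trip_ok)
```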

### Load the test dataset into AzureML


In [None]:
dataset_name = "mi2-png-dataset"

png_dataset = Data(
    path=csv_folder,
    type=AssetTypes.URI_FOLDER,
    description=f"Sample dataset consisting of {target_dataset_size} PNG images with a batch size of {batch_max_size}",
    name=dataset_name,
)

ml_workspace.data.create_or_update(png_dataset)

### Verify the test dataset is uploaded successfully

In [None]:
ml_workspace.data.get(name=dataset_name, label="latest")

### Submit a job to the batch endpoint

In [None]:
# The dataset is a folder of CSV files, so the input type is URI_FOLDER
input = Input(type=AssetTypes.URI_FOLDER, path=png_dataset.path)
input

In [None]:
job = ml_workspace.batch_endpoints.invoke(endpoint_name=endpoint.name, input=input)

In [None]:
# Monitor job progress
ml_workspace.jobs.stream(job.name)

### Download the job output

The MedImageInsight embeddings are written to the file `named-outputs/score/predictions.csv`.


In [None]:
scoring_job = list(ml_workspace.jobs.list(parent_job_name=job.name))[0]
scoring_job

In [None]:
ml_workspace.jobs.download(
    name=scoring_job.name, download_path=".", output_name="score"
)

### Load the job result


In [None]:
pred_csv_path = os.path.join(os.getcwd(), "named-outputs", "score", "predictions.csv")
df_result = pd.read_csv(pred_csv_path, header=None)
df_result.iloc[0]  # print first row

## 5. Clean up resources - delete the batch endpoint

In [None]:
# Use endpoint.name: endpoint_name is only defined when a new endpoint was created
ml_workspace.batch_endpoints.begin_delete(endpoint.name).result()