## Image-text to Text Inference using Online Endpoints

This sample notebook shows how to deploy `image-text-to-text` type models to an online endpoint for inference.

### Task
`image-text-to-text` takes as input image-prompt pairs and generates an answer for each pair. The prompt can consist of a single question or a dialog between the user and the model that ends with a question from the user.

### Model
Models suitable for the `image-text-to-text` task are tagged with `image-text-to-text`. We will use the `llava-7b` model in this notebook. If you opened this notebook from a specific model card, replace the model name accordingly. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference.

### Inference data
We will use images from the [fridgeObjects](https://cvbp-secondary.z19.web.core.windows.net/datasets/image_classification/fridgeObjects.zip) dataset.


### Outline
1. Set up prerequisites
2. Pick a model to deploy
3. Prepare data for inference
4. Deploy the model to an online endpoint for real time inference
5. Test the endpoint
6. Clean up resources - delete the online endpoint

### 1. Set up prerequisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<AML_WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry

In [None]:
%pip install llava-torch==1.0.2 --no-deps

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)


try:
    credential = DefaultAzureCredential()
    # Check that the default credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser login
    credential = InteractiveBrowserCredential()

try:
    workspace_ml_client = MLClient.from_config(credential)
    subscription_id = workspace_ml_client.subscription_id
    resource_group = workspace_ml_client.resource_group_name
    workspace_name = workspace_ml_client.workspace_name
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace_name = "<AML_WORKSPACE_NAME>"
workspace_ml_client = MLClient(
    credential, subscription_id, resource_group, workspace_name
)

# The models are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(
    credential,
    subscription_id,
    resource_group,
    registry_name="azureml",
)

### 2. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `image-text-to-text` task. In this example, we use the `llava-7b` model. If you have opened this notebook for a different model, replace the model name accordingly.

In [None]:
model_name = "llava-7b"

foundation_model = registry_ml_client.models.get(name=model_name, label="latest")
print(
    f"\n\nUsing model name: {foundation_model.name}, version: {foundation_model.version}, id: {foundation_model.id} for inferencing"
)

### 3. Prepare data for inference

We will use images in the [fridgeObjects](https://cvbp-secondary.z19.web.core.windows.net/datasets/image_classification/fridgeObjects.zip) dataset to construct example image-prompt pairs for the model.


In [None]:
import os
import urllib.request

from zipfile import ZipFile


# Change to a different location if you prefer
dataset_parent_dir = "./data"

# Create data folder if it doesn't exist.
os.makedirs(dataset_parent_dir, exist_ok=True)

# Download data
download_url = "https://cvbp-secondary.z19.web.core.windows.net/datasets/image_classification/fridgeObjects.zip"

# Extract current dataset name from dataset url
dataset_name = os.path.split(download_url)[-1].split(".")[0]
# Get dataset path for later use
dataset_dir = os.path.join(dataset_parent_dir, dataset_name)

# Get the data zip file path
data_file = os.path.join(dataset_parent_dir, f"{dataset_name}.zip")

# Download the dataset
urllib.request.urlretrieve(download_url, filename=data_file)

# Extract files
with ZipFile(data_file, "r") as zip_file:
    print("extracting files...")
    zip_file.extractall(path=dataset_parent_dir)
    print("done")
# Delete zip file
os.remove(data_file)
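
To confirm that the archive unpacked as expected, it can help to count the images in each class folder. The following is a minimal sketch that assumes the extracted dataset contains one sub-folder per class; the helper name `count_images_per_class` and the list of image extensions are illustrative assumptions, not part of the dataset or the SDK.

```python
import os

# Assumed set of image extensions; adjust if the dataset uses others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_class(root_dir):
    """Return {class_folder_name: number_of_image_files} for root_dir."""
    counts = {}
    for entry in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, entry)
        # Skip plain files that sit next to the class folders
        if not os.path.isdir(class_dir):
            continue
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling `count_images_per_class(dataset_dir)` after the extraction step should list each class folder, including `milk_bottle`, with a non-zero count.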

In [None]:
from IPython.display import Image


sample_image = os.path.join(dataset_dir, "milk_bottle", "99.jpg")
Image(filename=sample_image)

### 4. Deploy the model to an online endpoint for real time inference
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model.

In [None]:
import time

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
)


# Endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name
timestamp = int(time.time())
online_endpoint_name = "image-text-to-text-" + str(timestamp)
# Create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for image-text-to-text task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()
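
If you change the endpoint name prefix, it may be worth validating the generated name before calling the service. The sketch below is an illustration only: the helper `make_endpoint_name` is hypothetical, and the naming rule it checks (3 to 32 characters, starting with a letter, then letters, digits, or hyphens) is an assumption that should be verified against the current Azure ML endpoint limits.

```python
import re
import time

# Assumed naming rule (verify against the Azure ML documentation):
# 3-32 characters, a leading letter, then letters, digits, or hyphens.
ENDPOINT_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix, now=None):
    """Append a Unix timestamp to prefix and validate the result."""
    timestamp = int(time.time() if now is None else now)
    name = f"{prefix}-{timestamp}"
    if not ENDPOINT_NAME_PATTERN.match(name):
        raise ValueError(f"Endpoint name {name!r} violates the assumed naming rule")
    return name
```

With the prefix used in this notebook, `make_endpoint_name("image-text-to-text")` yields a 29-character name, which leaves little room for a longer prefix under the assumed 32-character limit.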

In [None]:
from azure.ai.ml.entities import OnlineRequestSettings, ProbeSettings


deployment_name = "image-text-to-text-mlflow-deploy"

# Create a deployment
demo_deployment = ManagedOnlineDeployment(
    name=deployment_name,
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_NC6s_v3",
    instance_count=1,
    request_settings=OnlineRequestSettings(
        max_concurrent_requests_per_instance=1,
        request_timeout_ms=90000,
        max_queue_wait_ms=500,
    ),
    liveness_probe=ProbeSettings(
        failure_threshold=49,
        success_threshold=1,
        timeout=299,
        period=180,
        initial_delay=180,
    ),
    readiness_probe=ProbeSettings(
        failure_threshold=10,
        success_threshold=1,
        timeout=10,
        period=10,
        initial_delay=10,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {deployment_name: 100}
workspace_ml_client.begin_create_or_update(endpoint).result()
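
Because the endpoint exposes a REST API, applications that do not use the SDK can call it over HTTPS. The sketch below only builds the HTTP request with the standard library; it does not send it. The helper name `build_scoring_request` is hypothetical. It assumes key-based authentication as configured above (`auth_mode="key"`), with the key passed as a bearer token and the optional `azureml-model-deployment` header used to target a specific deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name:
        # Route the request to one deployment instead of using traffic rules
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Possible usage (assumes the endpoint and deployment above exist):
#   uri = workspace_ml_client.online_endpoints.get(name=online_endpoint_name).scoring_uri
#   key = workspace_ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key
#   request = build_scoring_request(uri, key, payload, deployment_name)
#   with urllib.request.urlopen(request) as http_response:
#       result = json.loads(http_response.read())
```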

### 5. Test the endpoint

We will use an image from the dataset and your questions about it to form example inputs for the model. When needed, the questions will be converted to the right format for the model version and submitted to the online endpoint for inference.

#### 5.1. Direct question

We provide functionality to directly submit a question to the model about an image, internally converting the question to the prompt format required by the model version. To use it, set the "prompt" column of the input to the empty string and the "direct_question" column to the question; see the `make_request_direct_question()` function below.


In [None]:
import base64
import json


REQUEST_FILE_NAME = "request.json"


def make_request_direct_question(image_bytes, direct_question):
    request_json = {
        "input_data": {
            "columns": ["image", "prompt", "direct_question"],
            "data": [
                [base64.encodebytes(image_bytes).decode("utf-8"), "", direct_question],
            ],
        }
    }
    with open(REQUEST_FILE_NAME, "wt") as f:
        json.dump(request_json, f)


def get_response():
    response = workspace_ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name=demo_deployment.name,
        request_file=REQUEST_FILE_NAME,
    )

    return json.loads(response)[0]["response"]
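
The request format above is a table: `columns` names the fields and each entry in `data` is one image-prompt pair. The sketch below builds such a payload in memory for several pairs at once. The helper name `build_request_payload` is hypothetical, and whether the deployed model accepts more than one row per request on a given machine is an assumption to verify; note that `get_response()` above returns only the answer for the first row.

```python
import base64


def build_request_payload(rows):
    """Build a request dict from (image_bytes, prompt, direct_question) tuples."""
    data = [
        # Same base64 encoding as the request helpers in this notebook
        [base64.encodebytes(image).decode("utf-8"), prompt, question]
        for image, prompt, question in rows
    ]
    return {
        "input_data": {
            "columns": ["image", "prompt", "direct_question"],
            "data": data,
        }
    }
```

Keeping the payload as a dictionary makes it easy to inspect or to serialize with `json.dump` into the request file.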

In [None]:
# Read the sample image; the context manager closes the file afterwards
with open(os.path.join(dataset_dir, "milk_bottle", "99.jpg"), "rb") as f:
    image_bytes = f.read()
question = "What is in this image?"

make_request_direct_question(image_bytes, question)
print(get_response())
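
The deployment above allows one concurrent request per instance and a queue wait of 500 ms, so a request that arrives while another is being processed may be rejected. A simple mitigation is to retry with exponential backoff. The following is a generic sketch, not part of the SDK; the helper name `call_with_retries` and its default values are assumptions.

```python
import time


def call_with_retries(func, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay_s * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the last attempt has failed
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2**attempt)
```

For example, `call_with_retries(get_response)` would retry the endpoint call up to four times. Catching every exception keeps the sketch short; in practice you would likely retry only on errors that indicate a busy endpoint.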

#### 5.2. Dialog

We provide functionality to submit a prompt to the model, which can encode either a single question or a dialog ending with a question for the model. To use it, set the "prompt" column of the input; see the `make_request_prompt()` function below. Note that the "direct_question" column is ignored in this case. We rely on the `llava-torch` package to manage the dialog and convert it into a prompt for the model; see the `run_dialog_loop()` function below.

Due to memory limitations on V100 machines, a dialog with the `llava-7b` model can only contain a small number of questions, e.g. 7 (the value of `NUM_QUESTIONS` below). To ask additional questions, start a new dialog; if you need dialogs longer than 10 questions, we suggest using an A100 machine (or better) and the `llava-13b` model.


In [None]:
from llava.constants import DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates


NUM_QUESTIONS = 7


def make_request_prompt(image_bytes, prompt):
    request_json = {
        "input_data": {
            "columns": ["image", "prompt", "direct_question"],
            "data": [
                [base64.encodebytes(image_bytes).decode("utf-8"), prompt, ""],
            ],
        }
    }
    with open(REQUEST_FILE_NAME, "wt") as f:
        json.dump(request_json, f)


def run_dialog_loop(image_bytes):
    conv_mode = "llava_llama_2"

    conv = conv_templates[conv_mode].copy()
    roles = conv.roles

    first_time = True
    for _ in range(NUM_QUESTIONS):
        try:
            inp = input(f"{roles[0]}: ")
        except EOFError:
            inp = ""
        if not inp:
            print("exit...")
            break

        print(f"{roles[1]}: ", end="")

        if first_time:
            # The first message carries the image token
            first_time = False
            inp = DEFAULT_IMAGE_TOKEN + "\n" + inp
        conv.append_message(conv.roles[0], inp)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        make_request_prompt(image_bytes, prompt)
        response = get_response()

        conv.messages[-1][-1] = response

        printed_response = response
        if printed_response.endswith("<|im_end|>"):
            printed_response = printed_response[: -len("<|im_end|>")]
        if printed_response.endswith("</s>"):
            printed_response = printed_response[: -len("</s>")]
        print(printed_response)

In [None]:
# Read the sample image; the context manager closes the file afterwards
with open(os.path.join(dataset_dir, "milk_bottle", "99.jpg"), "rb") as f:
    image_bytes = f.read()

run_dialog_loop(image_bytes)
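
The dialog loop above removes the end-of-sequence markers `<|im_end|>` and `</s>` from a response before printing it. The same clean-up can be factored into a small helper, sketched below. The name `strip_stop_tokens` is hypothetical, and unlike the loop above, which checks each marker once, this variant keeps stripping until no marker remains at the end of the text.

```python
# End-of-sequence markers handled in the dialog loop of this notebook
STOP_TOKENS = ("<|im_end|>", "</s>")


def strip_stop_tokens(text, stop_tokens=STOP_TOKENS):
    """Remove any trailing stop tokens from text, repeating until none is left."""
    stripped = True
    while stripped:
        stripped = False
        for token in stop_tokens:
            if token and text.endswith(token):
                text = text[: -len(token)]
                stripped = True
    return text
```

The helper leaves text without trailing markers unchanged, so it is safe to apply to every response.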

### 6. Clean up resources (delete the online endpoint)
Do not forget to delete the online endpoint. You will be billed for the compute used by the endpoint if it is not deleted.

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()
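
When the notebook steps are run as a script, an error during testing can skip the clean-up cell and leave the endpoint running. One way to guard against this is a context manager that always runs the clean-up. This is a generic sketch; the name `cleanup_on_exit` is hypothetical and not part of the SDK.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_on_exit(cleanup):
    """Run cleanup() when the with-block exits, even if it raised an error."""
    try:
        yield
    finally:
        cleanup()


# Possible usage (assumes the endpoint created earlier in this notebook):
#   delete = lambda: workspace_ml_client.online_endpoints.begin_delete(
#       name=online_endpoint_name
#   ).wait()
#   with cleanup_on_exit(delete):
#       ...  # test the endpoint here
```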