## Text to Image Retrieval using Online Endpoints

This sample shows how to deploy an `embeddings` type model to an online endpoint and evaluate the resulting embeddings for text-to-image retrieval.
 
### Model
Models that can perform the `embeddings` task are tagged with `embeddings`. We will use the `OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32` model in this notebook. If you opened this notebook from a different model card, remember to replace the model name with that model's name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference.

### Outline
1. Deploy model to online endpoint
2. Parse the data in the expected format
3. Extract embeddings using the endpoint
4. Evaluate the embeddings for image retrieval

# 1. Deploy model to online endpoint

### 1a. Setup pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<AML_WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    ClientSecretCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

try:
    workspace_ml_client = MLClient.from_config(credential)
    subscription_id = workspace_ml_client.subscription_id
    resource_group = workspace_ml_client.resource_group_name
    workspace_name = workspace_ml_client.workspace_name
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace_name = "<AML_WORKSPACE_NAME>"
workspace_ml_client = MLClient(
    credential, subscription_id, resource_group, workspace_name
)

# The models are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(
    credential,
    subscription_id,
    resource_group,
    registry_name="azureml",
)

### 1b. Pick a model to deploy

Browse models in the Model Catalog in the AzureML Studio, filtering by the `embeddings` task. In this example, we use the `OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32` model. If you have opened this notebook for a different model, replace the model name accordingly.

In [None]:
model_name = "OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32"
foundation_model = registry_ml_client.models.get(name=model_name, label="latest")
print(
    f"\n\nUsing model name: {foundation_model.name}, version: {foundation_model.version}, id: {foundation_model.id} for inferencing"
)

### 1c. Deploy the model to an online endpoint for real time inference

Online endpoints expose a durable REST API that applications can call to use the model.

In [None]:
import time, sys
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
)

# Endpoint names must be unique within a region, so append a timestamp to the name
timestamp = int(time.time())
online_endpoint_name = "clip-embeddings-" + str(timestamp)
# Create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for image-text-embeddings task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

In [None]:
from azure.ai.ml.entities import OnlineRequestSettings, ProbeSettings

deployment_name = "embeddings-mlflow-deploy"

# Create a deployment
demo_deployment = ManagedOnlineDeployment(
    name=deployment_name,
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_DS3_V2",  # Use GPU instance type like Standard_NC6s_v3 for faster inference
    instance_count=1,
    request_settings=OnlineRequestSettings(
        max_concurrent_requests_per_instance=1,
        request_timeout_ms=90000,
        max_queue_wait_ms=500,
    ),
    liveness_probe=ProbeSettings(
        failure_threshold=49,
        success_threshold=1,
        timeout=299,
        period=180,
        initial_delay=180,
    ),
    readiness_probe=ProbeSettings(
        failure_threshold=10,
        success_threshold=1,
        timeout=10,
        period=10,
        initial_delay=10,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {deployment_name: 100}
workspace_ml_client.begin_create_or_update(endpoint).result()
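
Once the deployment is serving traffic, the endpoint can also be called over plain REST instead of through the SDK. The sketch below only builds the HTTP request and does not send it. `build_scoring_request` is a hypothetical helper, and the URL and key are placeholders. It assumes that key-based auth passes the key as a bearer token and that the optional `azureml-model-deployment` header routes the call to a named deployment; check both against the endpoint's consume details before relying on them.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint.

    Hypothetical helper: the header layout is an assumption for key-based auth.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    if deployment_name is not None:
        # Assumed header for targeting one deployment behind the endpoint.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Placeholder values. In this notebook they would come from
# workspace_ml_client.online_endpoints.get(...).scoring_uri and
# workspace_ml_client.online_endpoints.get_keys(...).primary_key.
example_payload = {
    "input_data": {
        "columns": ["image", "text"],
        "data": [["", "a long black evening gown"]],
    }
}
example_request = build_scoring_request(
    "https://example.invalid/score",
    "<API_KEY>",
    example_payload,
    deployment_name="embeddings-mlflow-deploy",
)
print(example_request.get_method(), example_request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Building and sending are kept separate here so the request can be inspected without a live endpoint.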

# 2. Parse the data in the expected format

### 2a. Load data in the expected format

The image+text dataset should have a corresponding JSON file with the following format:
```json
{
    "cloud_storage_directory_name": "<url to data storage>",
    "annotated_images": [
        {
            "image_id": "<unique image id>",
            "image_path_relative": "<image path relative to cloud_storage_directory_name>",
            "texts": [
                {"text_id": "<unique text id>", "text_content": "<sample text>"},
                {"text_id": "<unique text id>", "text_content": "<sample text>"},
                ...
            ]
        },
        ...
    ]
}
```

### Example:
```json
{
    "cloud_storage_directory_name": "https://customerblobstore.blob.core.windows.net/datasets/fashiondata",
    "annotated_images": [
        {
            "image_id": "i0",
            "image_path_relative": "images/1007129816.jpg",
            "texts": [
                {"text_id": "t0", "text_content": "a long black evening gown"},
                {"text_id": "t1", "text_content": "a formal black strapless dress"}
            ]
        },
        {
            "image_id": "i1",
            "image_path_relative": "images/1009434119.jpg",
            "texts": [
                {"text_id": "t2", "text_content": "a short-sleeve shirt with floral print"},
                {"text_id": "t3", "text_content": "a t-shirt with a floral pattern"}
            ]
        }
    ]
}
```
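
Because the evaluation later maps each `text_id` back to an `image_id`, it can help to check the file before calling the endpoint. The following is a minimal sketch of such a check, assuming the format described above. `validate_dataset` is a hypothetical helper and not part of the original notebook, and the sample data deliberately repeats ids to trigger the checks.

```python
def validate_dataset(dataset):
    """Return a list of problems found in a dataset dict (empty if none)."""
    problems = []
    if not isinstance(dataset.get("cloud_storage_directory_name"), str):
        problems.append("missing or non-string 'cloud_storage_directory_name'")

    seen_image_ids, seen_text_ids = set(), set()
    for n, record in enumerate(dataset.get("annotated_images", [])):
        for key in ("image_id", "image_path_relative"):
            if key not in record:
                problems.append(f"annotated_images[{n}] is missing '{key}'")

        image_id = record.get("image_id")
        if image_id is not None:
            if image_id in seen_image_ids:
                problems.append(f"duplicate image_id '{image_id}'")
            seen_image_ids.add(image_id)

        # "texts" is treated as optional, matching the extraction code.
        for text in record.get("texts", []):
            text_id = text.get("text_id")
            if text_id is None or "text_content" not in text:
                problems.append(f"annotated_images[{n}] has an incomplete text entry")
                continue
            if text_id in seen_text_ids:
                problems.append(f"duplicate text_id '{text_id}'")
            seen_text_ids.add(text_id)
    return problems


# Made-up sample with a repeated image id and a repeated text id.
sample = {
    "cloud_storage_directory_name": "https://customerblobstore.blob.core.windows.net/datasets/fashiondata",
    "annotated_images": [
        {
            "image_id": "i0",
            "image_path_relative": "images/1007129816.jpg",
            "texts": [{"text_id": "t0", "text_content": "a long black evening gown"}],
        },
        {
            "image_id": "i0",
            "image_path_relative": "images/1009434119.jpg",
            "texts": [{"text_id": "t0", "text_content": "a t-shirt with a floral pattern"}],
        },
    ],
}
for problem in validate_dataset(sample):
    print(problem)
```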

In [None]:
import json

DATASET_FILE_NAME = "<DATASET JSON FILE>"
with open(DATASET_FILE_NAME, "rt") as f:
    dataset = json.load(f)

# 3. Extract embeddings using the endpoint

In [None]:
BATCH_SIZE = 256
IMAGE_EMBEDDINGS_FILE_NAME = "./embeddings_image.json"
TEXT_EMBEDDINGS_FILE_NAME = "./embeddings_text.json"

### 3a. Define helper methods for extracting embeddings

In [None]:
_REQUEST_FILE_NAME = "request.json"

def make_request_images(image_urls):
    request_json = {
        "input_data": {
            "columns": ["image", "text"],
            "data": [
                [image_url, ""] for image_url in image_urls
            ],
        }
    }

    with open(_REQUEST_FILE_NAME, "wt") as f:
        json.dump(request_json, f)


def make_request_texts(texts):
    request_json = {
        "input_data": {
            "columns": ["image", "text"],
            "data": [
                ["", text] for text in texts
            ],
        }
    }

    with open(_REQUEST_FILE_NAME, "wt") as f:
        json.dump(request_json, f)

In [None]:
from tqdm import tqdm


def compute_and_save_embeddings(embeddings_file_name, images=True):
    embedding_records = []

    image_records = dataset["annotated_images"]
    for i in tqdm(range(0, len(image_records), BATCH_SIZE)):
        # Get the current batch of image records.
        j = min(len(image_records), i + BATCH_SIZE)
        batch_image_records = image_records[i:j]

        if images:
            # Make the list of image urls and their ids in the current batch.
            batch_image_urls = [
                dataset["cloud_storage_directory_name"] + "/" + r["image_path_relative"].replace("\\", "/")
                for r in batch_image_records
            ]
            make_request_images(batch_image_urls)
            batch_ids = [r["image_id"] for r in batch_image_records]
        else:
            # Make the list of texts and their ids in the current batch.
            # [FIXME] All texts of every image in the batch are sent together, so
            # one request may contain more than BATCH_SIZE texts.
            batch_texts = [
                t["text_content"]
                for r in batch_image_records if "texts" in r for t in r["texts"]
            ]
            make_request_texts(batch_texts)
            batch_ids = [t["text_id"] for r in batch_image_records if "texts" in r for t in r["texts"]]

        # Call the endpoint and get the embeddings for the current batch.
        response = workspace_ml_client.online_endpoints.invoke(
            endpoint_name=online_endpoint_name,
            deployment_name=deployment_name,
            request_file=_REQUEST_FILE_NAME,
        )
        try:
            response = json.loads(response)
        except (TypeError, ValueError):
            # The endpoint did not return valid JSON; report it and skip this batch.
            print(f"did not get embeddings for batch {i}-{j}")
            print(response)
            continue
        output_field_name = "image_features" if images else "text_features"
        batch_embeddings = [r[output_field_name] for r in response]

        # Store the embeddings and the corresponding ids for the current batch.
        id_field_name = "image_id" if images else "text_id"
        for id_, e in zip(batch_ids, batch_embeddings):
            embedding_records.append({id_field_name: id_, "embedding": e})

    # Save embeddings to file.
    with open(embeddings_file_name, "wt") as f:
        json.dump(embedding_records, f)
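
The text branch above gathers every caption of up to `BATCH_SIZE` images into one request, so a single request can carry more than `BATCH_SIZE` texts. One possible way to cap the request size is to flatten the captions first and slice the flat list. The sketch below uses a hypothetical helper, `iter_text_batches`, and made-up records; it is not part of the original notebook.

```python
def iter_text_batches(image_records, batch_size):
    """Yield (text_ids, text_contents) pairs with at most batch_size texts each."""
    # Flatten all captions first so batching is by text count, not image count.
    flat_texts = [
        (t["text_id"], t["text_content"])
        for r in image_records
        for t in r.get("texts", [])
    ]
    for start in range(0, len(flat_texts), batch_size):
        chunk = flat_texts[start : start + batch_size]
        yield [text_id for text_id, _ in chunk], [content for _, content in chunk]


toy_records = [
    {
        "image_id": "i0",
        "texts": [
            {"text_id": "t0", "text_content": "a"},
            {"text_id": "t1", "text_content": "b"},
        ],
    },
    {"image_id": "i1"},  # an image without captions contributes nothing
    {"image_id": "i2", "texts": [{"text_id": "t2", "text_content": "c"}]},
]
toy_batches = list(iter_text_batches(toy_records, batch_size=2))
print(toy_batches)
```

Each yielded pair keeps ids and contents in the same order, so the ids can still be zipped with the embeddings returned for that request.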


### 3b. Call methods for invoking endpoint and extracting embeddings

In [None]:
# compute image embeddings
compute_and_save_embeddings(IMAGE_EMBEDDINGS_FILE_NAME, True)
# compute text embeddings
compute_and_save_embeddings(TEXT_EMBEDDINGS_FILE_NAME, False)

# 4. Evaluate the embeddings for image retrieval

In [None]:
K = 1

### 4a. Build index with image embeddings and query with text embeddings

In [None]:
from sklearn.neighbors import NearestNeighbors
import json

# read image embeddings file into sklearn index with brute force algorithm
with open(IMAGE_EMBEDDINGS_FILE_NAME, "r") as f:
    index_file_data = json.load(f)
index_embeddings = [sample['embedding'] for sample in index_file_data]
index = NearestNeighbors(algorithm='brute', metric='cosine')
index.fit(index_embeddings)

In [None]:
# read query embeddings file into list of vectors
with open(TEXT_EMBEDDINGS_FILE_NAME, "r") as f:
    query_file_data = json.load(f)
query_embeddings = [sample['embedding'] for sample in query_file_data]

### 4b. Query the index and evaluate results

In [None]:
distance, result_ids = index.kneighbors(query_embeddings, n_neighbors=K)

In [None]:
import numpy as np

def evaluate_text_to_image_embeddings(dataset, result_ids, index_file_data, query_file_data):
    # build text to image table
    dataset_file_data = dataset["annotated_images"]
    text_to_image = {}
    for record in dataset_file_data:
        image_id = record["image_id"]
        # "texts" may be absent for an image, as in the extraction step
        for text in record.get("texts", []):
            text_to_image[text["text_id"]] = image_id

    # evaluate recall
    recall_array = np.zeros(len(result_ids))
    for idx, result in enumerate(result_ids):
        index_records = [index_file_data[i] for i in result]
        for index_record in index_records:
            if text_to_image[query_file_data[idx]["text_id"]] == index_record["image_id"]:
                recall_array[idx] = 1
                break
        
    return recall_array.sum() / recall_array.shape[0]

In [None]:
recall = evaluate_text_to_image_embeddings(dataset, result_ids, index_file_data, query_file_data)
print("R@{}={:.1f}".format(K, 100.0 * recall))
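
For reference, R@K as computed above is the fraction of text queries whose ground-truth image appears among the K nearest images by cosine distance. The toy example below reproduces that computation in pure Python on made-up 2-D vectors. The helper names and data are illustrative only and are not part of the notebook's pipeline.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def recall_at_k(image_embeddings, text_embeddings, text_to_image, k):
    """Fraction of texts whose paired image is among the k closest images."""
    hits = 0
    for text_id, text_vec in text_embeddings.items():
        # Rank all image ids by distance to this text, closest first.
        ranked = sorted(
            image_embeddings,
            key=lambda image_id: cosine_distance(text_vec, image_embeddings[image_id]),
        )
        if text_to_image[text_id] in ranked[:k]:
            hits += 1
    return hits / len(text_embeddings)


# Made-up embeddings: t2 is paired with i1 but lies closer to i0.
toy_images = {"i0": [1.0, 0.0], "i1": [0.0, 1.0]}
toy_texts = {"t0": [0.9, 0.1], "t1": [0.2, 0.8], "t2": [0.6, 0.4]}
toy_pairs = {"t0": "i0", "t1": "i1", "t2": "i1"}

print("R@1 =", recall_at_k(toy_images, toy_texts, toy_pairs, k=1))
print("R@2 =", recall_at_k(toy_images, toy_texts, toy_pairs, k=2))
```

With K=1 only two of the three queries retrieve their paired image; raising K to 2 covers both images, so every query counts as a hit.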