# Azure AI Search Multimodal Retrieval


This notebook demonstrates an approach for text-to-image and image-to-image vector queries. The multimodal embeddings used in this sample are provided by [Azure AI Vision 4.0](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval) and the [Image Retrieval REST API](https://learn.microsoft.com/rest/api/computervision/image-retrieval), which supports built-in vectorization of images.

For indexing, the pattern uses the built-in Vision Vectorizer skill to call the Image Retrieval API. Provisioning of the search service and the AI Services account, and setup of the indexer, are fully automated and included as a step in this notebook.

The AI Services account is also used during queries, as the vectorizer. A vectorizer specifies which embedding model to use for vectorizing a text query string or an image. As always, we strongly recommend vectorizing queries with the same embedding model that was used to vectorize documents during indexing.
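
To illustrate why queries and documents should share one embedding model, here is a toy sketch. The two "models" below are hypothetical stand-ins (they are not Azure AI Vision): both encode the same concepts, but on differently ordered axes, so a query vector produced by one model no longer lines up with document vectors produced by the other.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "models": each one assigns concepts to axes in its own order.
MODEL_A_AXES = ["red", "dress", "shoe"]
MODEL_B_AXES = ["shoe", "red", "dress"]


def toy_embed(words, axes):
    """Return a one-hot style vector: 1.0 on every axis whose concept is present."""
    return [1.0 if axis in words else 0.0 for axis in axes]


# The document was "indexed" with model A.
doc_vector = toy_embed({"red", "dress"}, MODEL_A_AXES)

# The same query text, vectorized with model A and with model B.
query_same_model = toy_embed({"red", "dress"}, MODEL_A_AXES)
query_other_model = toy_embed({"red", "dress"}, MODEL_B_AXES)

same_model_score = cosine_similarity(doc_vector, query_same_model)
mixed_model_score = cosine_similarity(doc_vector, query_other_model)

print(f"Same model:  {same_model_score:.2f}")
print(f"Mixed model: {mixed_model_score:.2f}")
```

Although the query text is identical in both cases, the mixed-model score is lower because the two vectors live in different coordinate systems.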



We use the [Azure Python SDK](https://learn.microsoft.com/en-us/python/api/azure-search-documents/?view=azure-python-preview) for indexer-driven indexing and vector query operations.

In [None]:
#! pip install azure-search-documents --pre --quiet
#! pip install openai python-dotenv azure-identity cohere azure-ai-vision-imageanalysis --quiet

Import libraries

In [None]:
import os

from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient

from azure.search.documents.models import (
    HybridCountAndFacetMode,
    HybridSearch,
    SearchScoreThreshold,
    VectorizableTextQuery,
    VectorizableImageBinaryQuery,
    VectorizableImageUrlQuery,
    VectorSimilarityThreshold,
)
from azure.storage.blob import BlobServiceClient
from dotenv import load_dotenv
from IPython.display import Image, display, HTML
from openai import AzureOpenAI

Load environment variables

In [None]:
# Load environment variables
load_dotenv(override=True)

# Configuration
AZURE_AI_VISION_API_KEY = os.getenv("AZURE_AI_SERVICES_API_KEY")
AZURE_AI_VISION_ENDPOINT = os.getenv("AZURE_AI_SERVICES_ENDPOINT")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
BLOB_CONNECTION_STRING = os.getenv("BLOB_CONNECTION_STRING")
BLOB_CONTAINER_NAME = os.getenv("AZURE_STORAGE_CONTAINER")
INDEX_NAME = os.getenv("AZURE_SEARCH_INDEX")
AZURE_SEARCH_API_KEY = os.getenv("AZURE_SEARCH_API_KEY")
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT")
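
Because `os.getenv` silently returns `None` for a missing variable, a typo in the `.env` file only surfaces later as a confusing client error. The sketch below checks the settings up front. The helper `find_missing_settings` is hypothetical (it is not part of any SDK), and the variable names are the ones read in the cell above.

```python
import os

# Environment variable names read by this notebook.
REQUIRED_SETTINGS = [
    "AZURE_AI_SERVICES_API_KEY",
    "AZURE_AI_SERVICES_ENDPOINT",
    "AZURE_OPENAI_ENDPOINT",
    "BLOB_CONNECTION_STRING",
    "AZURE_STORAGE_CONTAINER",
    "AZURE_SEARCH_INDEX",
    "AZURE_SEARCH_API_KEY",
    "AZURE_SEARCH_ENDPOINT",
]


def find_missing_settings(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = find_missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All settings are present.")
```

If you authenticate with AAD instead of API keys, you would drop the key variables from the list.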

### Connect to **Azure AI Search**

In [None]:
# User-specified parameter
USE_AAD_FOR_SEARCH = False  # Set this to False to use API key for authentication

def authenticate_azure_search(api_key=None, use_aad_for_search=False):
    if use_aad_for_search:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        credential = AzureKeyCredential(api_key)
    return credential

azure_search_credential = authenticate_azure_search(api_key=AZURE_SEARCH_API_KEY, use_aad_for_search=USE_AAD_FOR_SEARCH)


### Check status of Indexer execution
Let's check how the indexing run is progressing.

The result should be: Status: success, Items Processed: 0, Items Failed: 0


In [None]:
from azure.search.documents.indexes import SearchIndexerClient
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential

indexer_name = f"{INDEX_NAME}-indexer"

search_indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_ENDPOINT, credential=azure_search_credential)
status = search_indexer_client.get_indexer_status(name=indexer_name)
print(f"Status: {status.last_result.status}, Items Processed: {status.last_result.item_count}, Items Failed: {status.last_result.failed_item_count}")
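
The cell above reads `status.last_result` directly. As a defensive sketch, and assuming `last_result` can be absent when the indexer has not completed a run yet, the hypothetical helper below formats the same three fields without raising. It is exercised here with `SimpleNamespace` stand-ins rather than a real service response.

```python
from types import SimpleNamespace


def summarize_indexer_status(status):
    """Format the last indexer run, tolerating a status without a last result."""
    last_result = getattr(status, "last_result", None)
    if last_result is None:
        return "No indexer run has completed yet."
    return (
        f"Status: {last_result.status}, "
        f"Items Processed: {last_result.item_count}, "
        f"Items Failed: {last_result.failed_item_count}"
    )


# Stand-in objects that mimic the attributes used in the cell above.
finished = SimpleNamespace(
    last_result=SimpleNamespace(status="success", item_count=12, failed_item_count=0)
)
not_started = SimpleNamespace(last_result=None)

print(summarize_indexer_status(finished))
print(summarize_indexer_status(not_started))
```

In the notebook you would pass the object returned by `search_indexer_client.get_indexer_status(name=indexer_name)`.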

## Simple vector search (text to text)

Here we run a text query against the vector field that holds the image caption embeddings (`captionVector`).

In [None]:
# Initialize the SearchClient
search_client = SearchClient(
    AZURE_SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=azure_search_credential,
)

# Define the query
# query = "sunglasses for holiday"
# query = "休日のサングラス" # Japanese query
query = "female red dresses" # English query


vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=3,
    fields="captionVector",
    #fields="imageVector",
)

# Perform the search
results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    top=3
)

blob_service_client = BlobServiceClient.from_connection_string(BLOB_CONNECTION_STRING)

# Print the results
for result in results:
    print(f"Caption: {result['caption']}")
    print(f"Score: {result['@search.score']}")
    print(f"File: {result['metadata_storage_path']}")
    blob_name = os.path.basename(result['metadata_storage_path'])
    blob_client = blob_service_client.get_blob_client(container=BLOB_CONTAINER_NAME, blob=blob_name)
    display(Image(data=blob_client.download_blob().readall(),  width=200,))
    print("-" * 50) 
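
The loop above derives the blob name with `os.path.basename`, which keeps only the last path segment. As a sketch for containers that use virtual folders or URL-encoded names, and assuming `metadata_storage_path` is a plain blob URL, the hypothetical helper below returns the full blob name relative to the container. The example URL is made up for illustration.

```python
from urllib.parse import unquote, urlparse


def blob_name_from_url(blob_url, container_name):
    """Return the blob name (including virtual folders) from a plain blob URL."""
    # Drop scheme, host, and query string, then decode escapes such as %20.
    path = unquote(urlparse(blob_url).path).lstrip("/")
    prefix = container_name + "/"
    if not path.startswith(prefix):
        raise ValueError(f"URL does not point into container '{container_name}'")
    return path[len(prefix):]


# Made-up URL for illustration only.
example_url = "https://example.blob.core.windows.net/images/dresses/red%20dress.jpg"
print(blob_name_from_url(example_url, "images"))
```

For blobs stored at the container root with simple names, this returns the same value as `os.path.basename`.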

## Simple Vector Search (text to image)

Here we run a text query against the image vector field (`imageVector`).

In [None]:
# Initialize the SearchClient
search_client = SearchClient(
    AZURE_SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=azure_search_credential,
)

# Define the query
# query = "sunglasses for holiday"
# query = "休日のサングラス" # Japanese query
query = "female red dresses" # English query
#query = "Lady in red" # English query

vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=3,
    #fields="captionVector",
    fields="imageVector",
)

# Perform the search
results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    top=3
)

blob_service_client = BlobServiceClient.from_connection_string(BLOB_CONNECTION_STRING)

# Print the results
for result in results:
    print(f"Caption: {result['caption']}")
    print(f"Score: {result['@search.score']}")
    print(f"File: {result['metadata_storage_path']}")
    blob_name = os.path.basename(result['metadata_storage_path'])
    blob_client = blob_service_client.get_blob_client(container=BLOB_CONTAINER_NAME, blob=blob_name)
    display(Image(data=blob_client.download_blob().readall(),  width=200,))
    print("-" * 50) 
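
To make the roles of `k_nearest_neighbors` and `top` more concrete, here is a toy sketch of an exhaustive nearest-neighbor lookup. It is an illustration under assumptions, not the service's implementation: the vectors are invented three-dimensional values, and a real index typically uses an approximate algorithm rather than scoring every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def toy_vector_search(query_vector, documents, k_nearest_neighbors, top):
    """Score every document, keep the k nearest, then return at most `top` of them."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[:k_nearest_neighbors]  # candidates from the vector query
    return candidates[:top]                    # final page of results


# Invented toy embeddings, for illustration only.
documents = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_jeans.jpg": [0.1, 0.9, 0.1],
    "red_scarf.jpg": [0.7, 0.0, 0.3],
    "sunglasses.jpg": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.1]

hits = toy_vector_search(query_vector, documents, k_nearest_neighbors=3, top=2)
for doc_id, score in hits:
    print(f"{doc_id}: {score:.3f}")
```

With a single vector query, setting `top` equal to `k_nearest_neighbors`, as the cell above does, returns every candidate.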

## Combined Vector Search (text to text, image to image)

Here we run a multimodal query: the text query targets the caption vectors and the image query targets the image vectors.

In [None]:
# Define the text query
query = "shoes for running"
url="https://images.unsplash.com/photo-1542291026-7eec264c27ff?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" # Image of a Red Nike Running Shoe

text_vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=10,
    fields="captionVector",
)
# Define the image query

image_vector_query = VectorizableImageUrlQuery(  # Alternatively, use VectorizableImageBinaryQuery
    url=url,    
    k_nearest_neighbors=10,
    fields="imageVector",
)

# Perform the search
results = search_client.search(
    search_text=None, vector_queries=[text_vector_query, image_vector_query], top=3, query_type="semantic", semantic_configuration_name="default"
)

print("Image input to search:")
display(Image(url=url, width=200))
print("Search results:")

# Print the results
for result in results:
    print(f"Caption: {result['caption']}")
    print(f"Score: {result['@search.score']}")
    print(f"File: {result['metadata_storage_path']}")
    blob_name = os.path.basename(result['metadata_storage_path'])
    blob_client = blob_service_client.get_blob_client(container=BLOB_CONTAINER_NAME, blob=blob_name)
    display(Image(data=blob_client.download_blob().readall(),  width=200,))
    print("-" * 50) 
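
When a request carries several vector queries, Azure AI Search merges the per-query rankings with Reciprocal Rank Fusion (RRF). The sketch below shows the idea on two invented rankings. The file names are made up, and the constant `k=60` is the value commonly used for RRF, taken here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented rankings: one from the caption vector query, one from the image vector query.
caption_ranking = ["shoe_a.jpg", "shoe_b.jpg", "boot_c.jpg"]
image_ranking = ["shoe_b.jpg", "sandal_d.jpg", "shoe_a.jpg"]

fused = reciprocal_rank_fusion([caption_ranking, image_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists rise to the top, which is why the fused `@search.score` values are small numbers rather than similarity scores.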

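## Multimodal Vector Search (text to image, image to text)

You can also run the queries crosswise: the text query against the image vectors and the image query against the caption vectors. This relies on both fields being embedded in the same multimodal vector space.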

In [None]:
# Define the text query
query = "shoes for running"
url="https://images.unsplash.com/photo-1542291026-7eec264c27ff?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" # Image of a Red Nike Running Shoe

text_vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=10,
    #fields="captionVector",
    fields="imageVector",
)
# Define the image query
image_vector_query = VectorizableImageUrlQuery(  # Alternatively, use VectorizableImageBinaryQuery
    url=url,
    k_nearest_neighbors=10,
    #fields="imageVector",
    fields="captionVector",
)

# Perform the search
results = search_client.search(
    search_text=None, vector_queries=[text_vector_query, image_vector_query], top=3
)

print("Image input to search:")
display(Image(url=url, width=200))
print("Search results:")

# Print the results
for result in results:
    print(f"Caption: {result['caption']}")
    print(f"Score: {result['@search.score']}")
    print(f"File: {result['metadata_storage_path']}")
    blob_name = os.path.basename(result['metadata_storage_path'])
    blob_client = blob_service_client.get_blob_client(container=BLOB_CONTAINER_NAME, blob=blob_name)
    display(Image(data=blob_client.download_blob().readall(),  width=200,))
    print("-" * 50) 
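
The comments above mention `VectorizableImageBinaryQuery` as an alternative to a URL. That class expects the image content as a base64 string, so a local file has to be encoded first. The helper below is a hypothetical sketch of that step. The parameter name `base64_image` mentioned afterwards is an assumption about the preview SDK and should be checked against the installed version.

```python
import base64


def encode_image_bytes(image_bytes: bytes) -> str:
    """Return the base64 text representation of raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


# The first four bytes of the PNG signature stand in for a real image file here.
sample_bytes = b"\x89PNG"
encoded = encode_image_bytes(sample_bytes)
print(encoded)
```

For a real image you would read the file with `open(path, "rb").read()` and pass the encoded string to the query, for example as `base64_image=encoded`.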

## Multimodal vector search with images weighted 100x more than captions

You can weight the vector queries differently, for example so that the image query counts 100 times more than the text query.

In [None]:
# Define the text query
query = "shoes for running"
url="https://images.unsplash.com/photo-1542291026-7eec264c27ff?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"  # Image of a Red Nike Running Shoe

text_vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=5,
    fields="captionVector",
)
# Define the image query
image_vector_query = VectorizableImageUrlQuery(  # Alternatively, use VectorizableImageBinaryQuery
    url=url,
    k_nearest_neighbors=5,
    fields="imageVector",
    weight=100,
)

# Perform the search
results = search_client.search(
    search_text=None, vector_queries=[text_vector_query, image_vector_query], top=3
)

print("Image input to search:")
display(Image(url=url, width=200))
print("Search results:")

# Print the results
for result in results:
    print(f"Caption: {result['caption']}")
    print(f"Score: {result['@search.score']}")
    print(f"File: {result['metadata_storage_path']}")
    blob_name = os.path.basename(result['metadata_storage_path'])
    blob_client = blob_service_client.get_blob_client(container=BLOB_CONTAINER_NAME, blob=blob_name)
    display(Image(data=blob_client.download_blob().readall(),  width=200,))
    print("-" * 50) 
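
To build intuition for what `weight=100` does, here is a toy sketch of weighted rank fusion. It assumes the weight acts as a multiplier on each query's contribution to the fused score. The rankings and file names are invented, and `k=60` is an assumed fusion constant.

```python
def weighted_rank_fusion(weighted_rankings, k=60):
    """Fuse (weight, ranking) pairs: each hit contributes weight / (k + rank)."""
    scores = {}
    for weight, ranking in weighted_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Invented rankings for the caption (text) query and the image query.
caption_ranking = ["sneaker.jpg", "trail_shoe.jpg", "red_heel.jpg"]
image_ranking = ["red_heel.jpg", "sneaker.jpg", "trail_shoe.jpg"]

equal_weights = weighted_rank_fusion([(1.0, caption_ranking), (1.0, image_ranking)])
image_heavy = weighted_rank_fusion([(1.0, caption_ranking), (100.0, image_ranking)])

print("Equal weights:", equal_weights)
print("Image x100:   ", image_heavy)
```

With equal weights the two rankings compromise. With the image query weighted 100 times higher, the fused order follows the image ranking.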

# Challenge

Try different combinations of retrieval modes with your own images and search queries.