# Using Image Data

This workbook acts as a spike for how to ingest images and ask questions about the data within them using Azure AI 
Vision and GPT-4-Vision.

## How to use

To use this you need the following resources provisioned:
- Azure AI Search
- Azure Storage Account
- Azure OpenAI - Check the supported regions here https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#standard-deployment-model-availability
    - A `text-embedding-ada-002` model deployment
    - A `gpt-4-vision` model deployment
- Azure Computer Vision

You can reuse some of the components deployed as part of the Chat With Your Data project. However, at the time of 
writing you will additionally need to deploy Azure Computer Vision and a `gpt-4-vision` model.

Once these are provisioned, update the configuration below to match your setup.
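
As an alternative to typing keys directly into the config cell, the values could be read from environment variables. The following is a minimal sketch only: the variable names in `ENV_VARS` and the `load_settings` helper are hypothetical and are not part of this notebook or of any Azure SDK.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
ENV_VARS = {
    "search_service": "AZURE_SEARCH_SERVICE",
    "search_key": "AZURE_SEARCH_KEY",
    "storage_account": "AZURE_STORAGE_ACCOUNT",
    "storage_key": "AZURE_STORAGE_KEY",
    "computer_vision": "AZURE_COMPUTER_VISION",
    "computer_vision_key": "AZURE_COMPUTER_VISION_KEY",
    "openai_service": "AZURE_OPENAI_SERVICE",
    "openai_key": "AZURE_OPENAI_KEY",
}


def load_settings(environ=None):
    """Read every setting from the environment, failing fast if any is missing."""
    environ = os.environ if environ is None else environ
    missing = sorted(var for var in ENV_VARS.values() if not environ.get(var))
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[var] for name, var in ENV_VARS.items()}
```

The returned values could then replace the empty strings in the config cell, for example `search_service = settings["search_service"]`.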

## Data

This spike only uploads one image, which is a flow chart detailing when to use various Azure services. This image was 
sourced from https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/compute-decision-tree.

Feel free to use a different image and question.

## How it works

### Ingestion

To ingest an image into Azure AI Search, the following steps are performed:
1. Create search index
2. Upload image to blob storage
3. Generate embeddings using the Computer Vision `vectorizeImage` API
4. Generate a caption of the image using `gpt-4-vision`
5. Generate embeddings of the caption using `text-embedding-ada-002`
6. Store data in search index
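
The result of these steps is a single search document per image. The sketch below shows the shape of that document, assuming the field names and vector sizes used by the index in this notebook (1536 for the caption embedding, 1024 for the image embedding). The `build_index_document` helper is hypothetical and exists only to illustrate the structure.

```python
import re

CAPTION_DIMENSIONS = 1536  # size configured for the "embedding" field in this notebook
IMAGE_DIMENSIONS = 1024  # size configured for the "imageEmbedding" field in this notebook


def build_index_document(file_name, caption, caption_embedding, image_embedding):
    """Assemble the document that is uploaded to the search index."""
    if len(caption_embedding) != CAPTION_DIMENSIONS:
        raise ValueError(f"Expected {CAPTION_DIMENSIONS} caption dimensions")
    if len(image_embedding) != IMAGE_DIMENSIONS:
        raise ValueError(f"Expected {IMAGE_DIMENSIONS} image dimensions")
    return {
        # Replace characters outside 0-9, a-z, A-Z, "_" and "-", as the notebook does for the key
        "id": re.sub(r"[^0-9a-zA-Z_-]", "_", file_name),
        "content": caption,
        "embedding": caption_embedding,
        "imageEmbedding": image_embedding,
        "sourcefile": file_name,
    }
```

Checking the vector lengths before upload is a design choice of this sketch; it turns a dimension mismatch into an immediate, readable error.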


### Question

To ask a question using this data, the following steps are performed:
1. Generate embeddings for the question using the Computer Vision `vectorizeText` API
2. Generate embeddings for the question using `text-embedding-ada-002`
3. Search the index using both embeddings
4. Generate a blob SAS URL from the returned search results
5. Pass the question, along with the blob SAS URL, to the `gpt-4-vision` chat completions endpoint
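
The final step sends one user message that contains both the question text and the image URL. A minimal sketch of that payload is shown here; `build_question_messages` is a hypothetical helper and the URL in the usage note is a placeholder, not a real SAS URL.

```python
def build_question_messages(system_prompt, question, image_url):
    """Build chat messages that pair a text question with an image URL."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # The image is referenced by URL; the model fetches it using the SAS token
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

For example, `build_question_messages("You are a helpful assistant.", "Which service should I use?", "https://example.invalid/image.png?sig=placeholder")` returns a list that can be passed as the `messages` argument of a chat completions call.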


## Why do we need two different embedding models?

Using two embedding models is not strictly required. However, embedding the image with Azure Computer Vision and also
embedding a `gpt-4-vision` generated description with `text-embedding-ada-002` gives the index richer data to match
against and tends to give better search results. This is particularly useful for diagrams and flow charts, which show
relationships and decision points.

The question must be embedded with both models so that it can be searched against both sets of stored vectors. Each
model produces vectors of a different size (1536 dimensions for `text-embedding-ada-002` and 1024 for Azure Computer
Vision in the index defined below) in its own vector space, so a query vector is only comparable with vectors generated
by the same model.
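
To illustrate why the question needs one embedding per model, the sketch below implements cosine similarity, the metric configured for the index in this notebook. It is an illustration only, not how Azure AI Search computes scores internally, and it shows that vectors of different sizes cannot be compared at all.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equally sized, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity` with a 1536-dimension vector and a 1024-dimension vector raises `ValueError`, which is why the text query vector targets the `embedding` field and the Computer Vision query vector targets the `imageEmbedding` field.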

In [1]:
# Common imports
import re
import datetime
import base64
import json
from urllib.parse import urljoin
from azure.core.credentials import AzureKeyCredential


In [3]:

# Config
## Credentials
search_credential = AzureKeyCredential("")
storage_credential = ""
computer_vision_credential = ""
openai_credential = ""

## Image
image_file_dir = "./"
image_file = "azure-services.png"

## Question
question = "Which service should I use if I'm building a new application and require full control?"

## Search
search_service = ""
search_endpoint = f"https://{search_service}.search.windows.net/"
index_name = "image-index"

## Storage
storage_account = ""
storage_endpoint = f"https://{storage_account}.blob.core.windows.net"
storage_container = "imagecontent"

## Vision
computer_vision = ""
computer_vision_endpoint = f"https://{computer_vision}.cognitiveservices.azure.com/"
computer_vision_vectorize_image_url = urljoin(computer_vision_endpoint, "computervision/retrieval:vectorizeImage")

## OpenAI
openai_service = ""
openai_endpoint = f"https://{openai_service}.openai.azure.com/openai/"
gpt4v_deployment_name = "gpt-4v"
embeddings_deployment_name = "text-embedding-ada-002"


In [None]:
# Create Search Clients
from azure.search.documents.indexes.aio import SearchIndexClient, SearchIndexerClient
from azure.search.documents.aio import SearchClient

search_client = SearchClient(endpoint=search_endpoint, index_name=index_name, credential=search_credential)
search_index_client = SearchIndexClient(endpoint=search_endpoint, credential=search_credential)
search_indexer_client = SearchIndexerClient(endpoint=search_endpoint, credential=search_credential)

In [None]:
# Create Index
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

fields = [
    SimpleField(name="id", type="Edm.String", key=True),
    SearchableField(
        name="content",
        type="Edm.String",
        analyzer_name="en.microsoft",
    ),
    SearchField(
        name="embedding",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        hidden=False,
        searchable=True,
        filterable=False,
        sortable=False,
        facetable=False,
        vector_search_dimensions=1536,
        vector_search_profile_name="embedding_config",
    ),
    SimpleField(name="category", type="Edm.String", filterable=True, facetable=True),
    SimpleField(
        name="sourcepage",
        type="Edm.String",
        filterable=True,
        facetable=True,
    ),
    SimpleField(
        name="sourcefile",
        type="Edm.String",
        filterable=True,
        facetable=True,
    ),
]

# NEW ADDITIONAL FIELD
fields.append(
    SearchField(
        name="imageEmbedding",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        hidden=False,
        searchable=True,
        filterable=False,
        sortable=False,
        facetable=False,
        vector_search_dimensions=1024,
        vector_search_profile_name="embedding_config",
    ),
)

index = SearchIndex(
    name=index_name,
    fields=fields,
    semantic_search=SemanticSearch(
        configurations=[
            SemanticConfiguration(
                name="default",
                prioritized_fields=SemanticPrioritizedFields(
                    title_field=None, content_fields=[SemanticField(field_name="content")]
                ),
            )
        ]
    ),
    vector_search=VectorSearch(
        algorithms=[
            HnswAlgorithmConfiguration(
                name="hnsw_config",
                parameters=HnswParameters(metric="cosine"),
            )
        ],
        profiles=[
            VectorSearchProfile(
                name="embedding_config",
                algorithm_configuration_name="hnsw_config",
            ),
        ],
    ),
)

if index_name not in [name async for name in search_index_client.list_index_names()]:
    print(f"Creating {index_name} search index")
    await search_index_client.create_index(index)
else:
    print(f"Search index {index_name} already exists")


In [None]:
# Create Blob Storage Clients
from azure.storage.blob.aio import BlobServiceClient

blob_client = BlobServiceClient(account_url=storage_endpoint, credential=storage_credential)
container_client = blob_client.get_container_client(storage_container)
if not await container_client.exists():
    await container_client.create_container()

In [None]:
# Upload file
from azure.storage.blob import generate_blob_sas

file_path = f"{image_file_dir}{image_file}"
# Use a context manager so the file handle is closed once the upload finishes
with open(file_path, mode="rb") as file_contents:
    blob_client = await container_client.upload_blob(name=image_file, data=file_contents, overwrite=True)
blob_sas = generate_blob_sas(
    container_client.account_name,
    container_client.container_name,
    image_file,
    account_key=storage_credential,
    permission="r",
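    # Use an explicit UTC timestamp; a naive local time can produce an already-expired token
    expiry=datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),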
)
blob_url = f"{blob_client.url}?{blob_sas}"
print(blob_url)

In [None]:
# Generate image embeddings
import aiohttp

headers = {"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": computer_vision_credential}
params = {"api-version": "2023-02-01-preview", "modelVersion": "latest"}

async with aiohttp.ClientSession(headers=headers) as session:
    body = {"url": blob_url}
    async with session.post(url=computer_vision_vectorize_image_url, params=params, json=body) as resp:
        resp.raise_for_status()  # Surface HTTP errors instead of failing later on a missing "vector" key
        resp_json = await resp.json()
        image_embeddings = resp_json["vector"]

print(image_embeddings)


In [None]:
# OpenAI client
from openai import AzureOpenAI

openai_client = AzureOpenAI(base_url=openai_endpoint, api_key=openai_credential, api_version="2024-02-01")

In [None]:
# Generate caption and embeddings for it
caption_system_message = """You are an assistant that generates rich descriptions of images.
You need to be accurate in the information you extract and detailed in the descriptions you generate.
Do not abbreviate anything and do not shorten sentences. Explain the image completely.
If you are provided with an image of a flow chart, describe the flow chart in detail.
If the image is mostly text, use OCR to extract the text as it is displayed in the image.
"""

messages = [
    {"role": "system", "content": caption_system_message},
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this image in detail"},
        {"type": "image_url", "image_url": {"url": blob_url}},  # image_url is an object with a "url" key
    ]},
]

print(json.dumps(messages, indent=4))

chat_completion = openai_client.chat.completions.create(
    model=gpt4v_deployment_name,
    messages=messages,
    temperature=0.0,
    max_tokens=1024,
    n=1
)

print(chat_completion.choices[0].message.content)
caption = chat_completion.choices[0].message.content

# Generate embeddings for caption
caption_embeddings = openai_client.embeddings.create(
    model=embeddings_deployment_name,
    input=caption,
).data[0].embedding

print(caption_embeddings)

In [None]:
# Store embeddings in search index
index_doc = {
    "id": re.sub("[^0-9a-zA-Z_-]", "_", image_file),
    "content": caption,
    "embedding": caption_embeddings,
    "sourcefile": image_file,  # Field names are case-sensitive and must match the index definition
    "imageEmbedding": image_embeddings,
}
await search_client.upload_documents([index_doc])

In [None]:
# Generate embeddings for question
computer_vision_vectorize_text_url = urljoin(computer_vision_endpoint, "computervision/retrieval:vectorizeText")
headers = {"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": computer_vision_credential}
params = {"api-version": "2023-02-01-preview", "modelVersion": "latest"}

async with aiohttp.ClientSession(headers=headers) as session:
    body = {"text": question}
    async with session.post(url=computer_vision_vectorize_text_url, params=params, json=body) as resp:
        resp.raise_for_status()  # Surface HTTP errors instead of failing later on a missing "vector" key
        resp_json = await resp.json()
        question_image_embeddings = resp_json["vector"]

print(question_image_embeddings)

question_embeddings = openai_client.embeddings.create(
    model=embeddings_deployment_name,
    input=question,
).data[0].embedding

print(question_embeddings)

In [None]:
# Search Index
from azure.search.documents.models import VectorizedQuery

vector_image_query = VectorizedQuery(vector=question_image_embeddings, k_nearest_neighbors=50, fields="imageEmbedding")
vector_query = VectorizedQuery(vector=question_embeddings, k_nearest_neighbors=50, fields="embedding")
search_results = await search_client.search(search_text=question, top=1, vector_queries=[vector_query, vector_image_query])


async for page in search_results.by_page():
    async for document in page:
        print(document.get("@search.score"))
        print(json.dumps(document, indent=2))
        found_document = document

In [None]:
# Download source image
source_file = found_document.get("sourcefile")
print(source_file)
blob_client = container_client.get_blob_client(source_file)

# Download the image from blob storage
# blob = await container_client.get_blob_client(source_file).download_blob()
# base64_image = base64.b64encode(await blob.readall()).decode("utf-8")
# formatted_image = f"data:image/png;base64,{base64_image}"
# print(formatted_image)

# image_url = {"url": formatted_image, "detail": "auto"}

# Alternatively provide a URL
blob_sas = generate_blob_sas(
    container_client.account_name,
    container_client.container_name,
    source_file,
    account_key=storage_credential,
    permission="r",
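    # Use an explicit UTC timestamp; a naive local time can produce an already-expired token
    expiry=datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),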
)
blob_url = f"{blob_client.url}?{blob_sas}"
image_url = {"url": blob_url}
print(image_url)


In [None]:
# Chat
system_chat_template_gpt4v = """You are an intelligent assistant helping software engineers use Microsoft Azure.
Answer the following question using only the data provided in the sources.
If you cannot answer using the sources, say you don't know. Return just the answer without any input texts

When you give your answer, you ALWAYS MUST include the image url to one or more of the sources in your response in the following format: <answer> [urlX]
Always use square brackets to reference the document source url. When you create the answer from multiple sources, list each source separately, e.g. <answer> [urlX][urlY] and so on.
Always reply in English.
"""

messages = [
    {"role": "system", "content": system_chat_template_gpt4v},
    {"role": "user", "content": [{"text": question, "type": "text"},{ "image_url": image_url, "type": "image_url" }]},
]

print(json.dumps(messages, indent=4))

chat_completion = openai_client.chat.completions.create(
    model=gpt4v_deployment_name,
    messages=messages,
    temperature=0.0,
    max_tokens=1024,
    n=1
)

print(chat_completion)
print(chat_completion.choices[0].message)
print(chat_completion.choices[0].message.content)
