# Query Rewriting (Azure AI Search)

This code demonstrates how to use Azure AI Search with advanced query rewriting to improve the relevance of your search results. The code performs the following tasks:

+ Create an index schema
+ Load the sample data from a local folder
+ Embed the documents in-memory using Azure OpenAI's text-embedding-3-large model
+ Index the vector and nonvector fields on Azure AI Search
+ Rewrite a sample question to improve the relevance of the result documents
+ Manually combine the results of multiple rewritten queries using [Reciprocal Rank Fusion (RRF)](https://learn.microsoft.com/azure/search/hybrid-search-ranking).
+ Use [simple query syntax](https://learn.microsoft.com/azure/search/query-simple-syntax) and [multi-vector queries](https://learn.microsoft.com/azure/search/vector-search-how-to-query?tabs=query-2023-11-01%2Cfilter-2023-11-01#multiple-vector-queries) to automatically combine multiple rewritten queries using built-in RRF

The code uses Azure OpenAI to generate embeddings for title and content fields. You'll need access to Azure OpenAI to run this demo.

The code reads the `text-sample.json` file, which contains the input data for which embeddings need to be generated.

The output is a combination of human-readable text and embeddings that can be pushed into a search index.

## Prerequisites

- An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access). This sample uses two models.

  - Specify [2023-12-01-preview REST API](https://learn.microsoft.com/azure/ai-services/openai/reference) or later when providing an Azure OpenAI endpoint.

  - Specify a deployment of the `text-embedding-3-large` embedding model. As a naming convention, we name deployments after the model name: "text-embedding-3-large".
  
  - Specify a deployment of a chat model, such as gpt-4o or gpt-4o-mini. This example uses structured outputs to return a valid JSON object, which requires a specific version of a chat model.
  
    - [Review supported models](https://learn.microsoft.com/azure/ai-services/openai/how-to/json-mode?tabs=python#supported-models) for chat models supporting JSON mode. Note the model version number. If you already have a deployment, verify the model version is listed as a supported model.
  
    - [Check regional availability](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#standard-deployment-model-availability) of the chat models. Make sure your Azure OpenAI resource is in a region that supports the model.

- Azure AI Search, any tier and region, but you must have Basic or higher to try the semantic ranker. This example creates an index. Check your index quota to make sure you have room. [Enable semantic ranking](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) before running the hybrid query with semantic ranking.

We used Python 3.11, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [None]:
! pip install -r query-rewrite-requirements.txt --quiet

### Set up your environment variables

The demo-python folder contains a `.env-sample` file that you can modify for your environment variables.

Remember to omit API keys if you're using Azure role-based permissions. On Azure AI Search, you should have Search Service Contributor, Search Index Data Contributor, and Search Index Data Reader permissions. On Azure OpenAI, you should have Cognitive Services Contributor permissions.

For this notebook, provide the following variables. 

Save the `.env` file to the `demo-python/code` folder.

```
AZURE_SEARCH_SERVICE_ENDPOINT=<PLACEHOLDER FOR YOUR SEARCH SERVICE ENDPOINT>
AZURE_SEARCH_INDEX=<PLACEHOLDER FOR AN INDEX NAME>
# Optional, do not provide if using RBAC authentication
AZURE_SEARCH_ADMIN_KEY=

AZURE_OPENAI_ENDPOINT=<PLACEHOLDER FOR YOUR AZURE OPENAI ENDPOINT>
# Optional, do not provide if using RBAC authentication and Cognitive Search
AZURE_OPENAI_KEY=
# 2024-07-18 and later is required for JSON mode.
AZURE_OPENAI_API_VERSION=2024-10-21

# Use any embedding model on Azure OpenAI
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-large
# Use any chat model on Azure OpenAI. Remember to check model version and regional availability.
AZURE_OPENAI_CHATGPT_DEPLOYMENT=gpt-4o-mini
```
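
Before running the cells, it can help to confirm that the required variables are actually set. The following is a minimal sketch, not part of the original notebook: the helper name `missing_settings` is hypothetical, and the required list assumes only the two variables that the notebook reads with `os.environ[...]` (the others have defaults or are optional).

```python
import os

# Variables the notebook reads with os.environ[...] (no default value).
# Assumption: every other variable is optional or has a default.
REQUIRED_SETTINGS = ["AZURE_SEARCH_SERVICE_ENDPOINT", "AZURE_OPENAI_ENDPOINT"]

def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]

# Usage: call after load_dotenv() so that values from .env are visible
missing = missing_settings(os.environ)
if missing:
    print("Missing settings: " + ", ".join(missing))
```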

## Import required libraries and environment variables

In [1]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# The following variables from your .env file are used in this notebook
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = DefaultAzureCredential() if not admin_key else AzureKeyCredential(admin_key)
index_name = os.getenv("AZURE_SEARCH_INDEX", "qr-example")
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
aoai_key = os.getenv("AZURE_OPENAI_KEY")
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
azure_openai_api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-10-21")
azure_openai_chatgpt_deployment = os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "gpt-4o")

## Create embeddings
Read your data, generate embeddings with Azure OpenAI, and export the result in a format that can be uploaded to your Azure AI Search index:

In [2]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
import json

openai_credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(openai_credential, "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    api_version=azure_openai_api_version,
    azure_endpoint=azure_openai_endpoint,
    api_key=aoai_key,
    azure_ad_token_provider=token_provider if not aoai_key else None
)

output_path = os.path.join('..', '..', '..', 'output', 'docVectors.json')

if not os.path.exists(output_path):
    # Generate document embeddings using the text-embedding-3-large deployment
    # Read the text-sample.json
    path = os.path.join('..', '..', '..', 'data', 'text-sample.json')
    with open(path, 'r', encoding='utf-8') as file:
        input_data = json.load(file)

    titles = [item['title'] for item in input_data]
    content = [item['content'] for item in input_data]
    title_response = client.embeddings.create(input=titles, model=azure_openai_embedding_deployment, dimensions=1024)
    title_embeddings = [item.embedding for item in title_response.data]
    content_response = client.embeddings.create(input=content, model=azure_openai_embedding_deployment, dimensions=1024)
    content_embeddings = [item.embedding for item in content_response.data]

    # Attach the title and content embeddings to each document
    for i, item in enumerate(input_data):
        item['titleVector'] = title_embeddings[i]
        item['contentVector'] = content_embeddings[i]

    # Output embeddings to docVectors.json file
    output_directory = os.path.dirname(output_path)
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    with open(output_path, "w") as f:
        json.dump(input_data, f)
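
The cell above sends all titles, and then all content strings, in a single embeddings request each. For larger datasets you may need to split the input into smaller batches. The following is a minimal sketch under stated assumptions: `chunked` and `embed_in_batches` are hypothetical helpers, `embed_batch` stands in for any callable that returns one embedding per input string, and the default batch size of 100 is an arbitrary illustration rather than a service limit.

```python
from typing import Iterator, Sequence

def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(embed_batch, texts: Sequence[str], size: int = 100) -> list:
    """Call `embed_batch` on each chunk and concatenate results in input order.

    `embed_batch` is a hypothetical callable that takes a list of strings and
    returns one embedding per string, in the same order.
    """
    embeddings = []
    for batch in chunked(texts, size):
        embeddings.extend(embed_batch(list(batch)))
    return embeddings
```

In this notebook, `embed_batch` could be a small wrapper that calls `client.embeddings.create(input=batch, model=azure_openai_embedding_deployment, dimensions=1024)` and returns `[item.embedding for item in response.data]`.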

## Create your search index

Create your search index schema and vector search configuration. If you get an error, check the search service for available quota and check the .env file to make sure you're using a unique search index name.

In [3]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchFieldDataType,
    SearchableField,
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    SearchIndex,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters
)


# Create a search index
index_client = SearchIndexClient(
    endpoint=endpoint, credential=credential)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=False),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchableField(name="category", type=SearchFieldDataType.String,
                    filterable=True),
    SearchField(name="titleVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, stored=False, vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile"),
    SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, stored=False, vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile"),
]

# Configure the vector search configuration  
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
            vectorizer_name="myVectorizer"
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            vectorizer_name="myVectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=azure_openai_endpoint,
                deployment_name=azure_openai_embedding_deployment,
                api_key=aoai_key,
                model_name=azure_openai_embedding_deployment
            )
        )
    ]
)



semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        content_fields=[SemanticField(field_name="content")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])

# Create the search index with the semantic settings
index = SearchIndex(name=index_name, fields=fields,
                    vector_search=vector_search, semantic_search=semantic_search)
result = index_client.create_or_update_index(index)
print(f'{result.name} created')


qr-example created


## Insert text and embeddings into vector store
Add texts and metadata from the JSON data to the vector store:

In [4]:
from azure.search.documents import SearchClient

search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

In [5]:
# Upload the documents and their embeddings to the index
output_path = os.path.join('..', '..', '..', 'output', 'docVectors.json')
with open(output_path, 'r') as file:
    documents = json.load(file)
result = search_client.upload_documents(documents)
print(f"Uploaded {len(documents)} documents")

Uploaded 108 documents
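
If an upload is rejected, one thing worth ruling out is a mismatch between the stored embeddings and the `vector_search_dimensions=1024` setting in the index schema. The following is a minimal sketch of such a check; `find_dimension_mismatches` is a hypothetical helper, not part of the SDK.

```python
def find_dimension_mismatches(documents, vector_fields=("titleVector", "contentVector"), expected=1024):
    """Return (id, field, actual_length) for every vector that does not have `expected` dimensions."""
    problems = []
    for doc in documents:
        for field in vector_fields:
            vector = doc.get(field)
            # A missing vector is reported with length 0
            actual = len(vector) if vector is not None else 0
            if actual != expected:
                problems.append((doc.get("id"), field, actual))
    return problems
```

Calling `find_dimension_mismatches(documents)` on the list loaded from `docVectors.json` should return an empty list when every embedding was generated with `dimensions=1024`.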


## Retrieve chunks using hybrid search

Before evaluating the effects of query rewriting, it's useful to establish a baseline for what hybrid search returns without any query rewriting.

In [7]:
import pandas as pd
from azure.search.documents.models import VectorizableTextQuery

def hybrid_search(search_client: SearchClient, query: str) -> pd.DataFrame:
    results = search_client.search(
        search_text=query,
        vector_queries=[
            # Set k_nearest_neighbors to 50 to boost the relevance of hybrid search.
            # A larger vector recall set gives RRF a more diverse set of vector matches to fuse with the text results,
            # which makes the ranking more robust when similarity scores vary or are closely clustered.
            VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="contentVector")
        ],
        top=3,
        select="id, title, content",
        search_fields=["content"]
    )
    data = [[result["id"], result["title"], result["content"], result["@search.score"]] for result in results]
    return pd.DataFrame(data, columns=["id", "title", "content", "@search.score"])



The following cell demonstrates the results of hybrid search using a sample query.

In [8]:
hybrid_search(search_client, "scalable storage solution")

Unnamed: 0,id,title,content,@search.score
0,4,Azure Storage,"Azure Storage is a scalable, durable, and high...",0.033333
1,36,Azure Data Lake Storage,"Azure Data Lake Storage is a scalable, secure,...",0.032266
2,52,Azure Table Storage,"Azure Table Storage is a fully managed, NoSQL ...",0.03125
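
The top score above is consistent with RRF using `k = 60` and zero-based ranks, the same convention used by the manual calculation later in this notebook. This is an inference from the output, not a statement about the service's internals. A minimal sketch of the arithmetic, with `rrf_score` as a hypothetical helper:

```python
def rrf_score(ranks, k=60):
    """Sum 1 / (k + rank) over the zero-based ranks a document received."""
    return sum(1.0 / (k + rank) for rank in ranks)

# A document ranked first (rank 0) by both the text query and the vector query
print(round(rrf_score([0, 0]), 6))  # 0.033333
```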


## Use built-in query rewriting

Azure AI Search offers built-in [query rewriting](https://learn.microsoft.com/azure/search/semantic-how-to-query-rewrite) when you use the [semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-query-request). Evaluate this option first before trying other solutions.

In [11]:
from typing import Optional

# Workaround to read query rewrite debug information from the Python SDK
import azure.search.documents._generated.models
azure.search.documents._generated.models.SearchDocumentsResult._attribute_map["debug_info"]["key"] = "@search\\.debug"
from azure.search.documents._generated.models import DebugInfo
import azure.search.documents._paging
def get_debug_info(self) -> Optional[DebugInfo]:
    self.continuation_token = None
    return self._response.debug_info
azure.search.documents._paging.SearchPageIterator.get_debug_info = azure.search.documents._paging._ensure_response(get_debug_info)
azure.search.documents._paging.SearchItemPaged.get_debug_info = lambda self: self._first_iterator_instance().get_debug_info()

search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

results = search_client.search(
    search_text="search service",
    # Issue a vector query alongside the text query
    vector_queries=[VectorizableTextQuery(text="srch service", k_nearest_neighbors=50, fields="contentVector")],
    query_type="semantic",
    semantic_configuration_name='my-semantic-config',
    query_rewrites="generative|count-3",
    query_language="en",
    debug="queryRewrites",
    search_fields=["content"],
    top=3,
    include_total_count=True
)

data = [[result["id"], result["title"], result["content"], result["@search.score"]] for result in results]
df = pd.DataFrame(data, columns=["id", "title", "content", "@search.score"])
query_rewrites = results.get_debug_info().query_rewrites.text.rewrites

display(df)
print(query_rewrites)

Unnamed: 0,id,title,content,@search.score
0,40,Azure Cognitive Search,Azure Cognitive Search is a fully managed sear...,0.033333
1,3,Azure Cognitive Services,Azure Cognitive Services are a set of AI servi...,0.032002
2,90,Azure Cognitive Services,Azure Cognitive Services is a collection of AI...,0.031545


['search engine services', 'online search engine services', 'online search services']
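
The `query_rewrites` value in the request above follows the pattern `generative|count-N`. The following is a minimal sketch of a helper that builds this string; `build_query_rewrites` is a hypothetical name, and the 1 to 10 range is an assumption that you should verify against the current service documentation.

```python
def build_query_rewrites(count: int) -> str:
    """Format the query_rewrites value used above, for example 'generative|count-3'."""
    if not 1 <= count <= 10:
        # Assumed limit; check the service documentation for the supported range
        raise ValueError("count must be between 1 and 10")
    return f"generative|count-{count}"
```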


## Customize rewriting queries for improved relevance of results

Users often enter terse queries such as "scalable storage solution". These terms may match the contents of documents in the search index, but an LLM can often rewrite the query to improve the results.

In [12]:
import json
import openai
from pydantic import BaseModel

class QueryRewrites(BaseModel):
    queries: list[str]

tools = [openai.pydantic_function_tool(QueryRewrites)]

# This prompt can be customized to write the rewrites in a specific format or use specific words
REWRITE_PROMPT = """You are a helpful assistant. You help users search for the answers to their questions.
You have access to Azure AI Search index with 100's of documents. Rewrite the following question into useful search queries to find the most relevant documents.
The number of rewrites should be 3
"""

# If you are not using a supported model or region, you may not be able to use structured outputs
# https://learn.microsoft.com/azure/ai-services/openai/how-to/structured-outputs
def rewrite_query(openai_client: AzureOpenAI, query: str):
    response = openai_client.chat.completions.create(
        model=azure_openai_chatgpt_deployment,
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": query}
        ],
        tools=tools
    )
    
    # The JSON is always valid because the function tool is set to use strict=True
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)["queries"]

The following cell demonstrates how an LLM can rewrite queries to improve their clarity.

In [13]:
rewrite_query(client, "what is azure sarch?")

['What is Azure Search?',
 'Overview of Azure Search',
 'Features of Azure Search']
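
Because the request does not force a tool call, the model may occasionally reply with plain text, in which case `message.tool_calls` is `None` and indexing it fails. The following is a minimal sketch of a more defensive parsing step; `parse_rewrites` is a hypothetical helper, and falling back to the original query is one possible design choice among several.

```python
import json

def parse_rewrites(arguments_json, fallback_query):
    """Extract the 'queries' list from tool-call arguments, or fall back to the original query."""
    if not arguments_json:
        return [fallback_query]
    try:
        queries = json.loads(arguments_json).get("queries")
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object
        return [fallback_query]
    if not isinstance(queries, list):
        return [fallback_query]
    # Drop blanks and case-insensitive duplicates while preserving order
    seen = set()
    cleaned = []
    for query in queries:
        text = query.strip() if isinstance(query, str) else ""
        if text and text.lower() not in seen:
            seen.add(text.lower())
            cleaned.append(text)
    return cleaned or [fallback_query]
```

Inside `rewrite_query`, you could pass `tool_calls[0].function.arguments` when `tool_calls` is present and `None` otherwise.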

## Combining the rewritten queries manually using RRF

Now that we can use an LLM to rewrite the query, we need to issue our queries and combine the results. We'll start by doing this manually to demonstrate how the RRF calculation works.

In [14]:
def query_rewrite_manual_rrf(search_client: SearchClient, openai_client: AzureOpenAI, query: str) -> tuple[list[str], pd.DataFrame]:
    rewritten_queries = rewrite_query(openai_client, query)
    # pd.concat preserves the original index by default when concatenating tables
    # This is important for the RRF calculation below
    results = pd.concat([hybrid_search(search_client, rewritten_query) for rewritten_query in rewritten_queries], axis=0)
    def rrf_score(row: pd.Series) -> float:
        score = 0.0
        k = 60
        # rank = the original position in the results list the document was located at
        for rank, df_row in results.iterrows():
            # The RRF score is the sum of 1.0 / (k + document rank) in every result set the document shows up in
            if df_row["id"] == row["id"]:
                score += 1.0 / (k + rank)
        return score
    # Apply the RRF scoring function to every row in the data frame
    results["rrf_score"] = results.apply(rrf_score, axis=1)
    # Return the deduplicated result set sorted by the most relevant RRF score
    return rewritten_queries, results.drop_duplicates(subset=["id"]).sort_values(by="rrf_score", ascending=False)
    

The following cell demonstrates how an unclear query ("srch service") is automatically rewritten and made clearer by an LLM. Because it sums contributions from every result set in which a document appears, the RRF score of the most relevant document is higher than its original search score.

In [15]:
from IPython.display import display

rewritten_queries, results = query_rewrite_manual_rrf(search_client, client, "srch service")
display(results)
print(rewritten_queries)

Unnamed: 0,id,title,content,@search.score,rrf_score
0,40,Azure Cognitive Search,Azure Cognitive Search is a fully managed sear...,0.033333,0.049727
1,3,Azure Cognitive Services,Azure Cognitive Services are a set of AI servi...,0.032522,0.049189
2,90,Azure Cognitive Services,Azure Cognitive Services is a collection of AI...,0.032522,0.048652


['search service details', 'information on search services', 'what are search services']
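
The manual calculation rescans every row for each document. The same fusion can be written as a single pass that accumulates scores per document ID. The following is a minimal sketch on plain lists of IDs; `fuse_rankings` is a hypothetical helper, the document IDs are placeholders rather than results from the index, and ranks are zero-based to match `query_rewrite_manual_rrf`.

```python
from collections import defaultdict

def fuse_rankings(result_sets, k=60):
    """Combine ranked lists of document IDs into one list ordered by RRF score."""
    scores = defaultdict(float)
    for ranked_ids in result_sets:
        # Zero-based rank, matching the data frame index used in the manual calculation
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Placeholder rankings from three hypothetical rewritten queries
fused = fuse_rankings([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "d"]])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents that appear near the top of several result sets accumulate the largest scores, which is why fusing rewritten queries tends to favor consistently relevant documents.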


## Combining the rewritten queries automatically using RRF

We can use the built-in RRF instead of performing the calculation ourselves. To do so, we combine the rewritten queries with boolean operators and issue one vector query per rewrite in a single request. Note that the RRF scores will not exactly match the manual calculation, because the text index can be queried more efficiently with this approach and less relevant documents are automatically filtered out.

In [16]:
def query_rewrite_automatic_rrf(search_client: SearchClient, openai_client: AzureOpenAI, query: str) -> tuple[list[str], pd.DataFrame]:
    rewritten_queries = rewrite_query(openai_client, query)
    # Quote the rewritten queries before joining them in the query syntax
    formatted_queries = [f'"{rewritten_query}"' for rewritten_query in rewritten_queries]
    # Use the OR operator (|) from the simple query syntax to join rewritten queries together
    # https://learn.microsoft.com/azure/search/query-simple-syntax
    search_text = " | ".join(formatted_queries)
    results = search_client.search(
        search_text=search_text,
        # Issue a vector query for every single rewritten query
        vector_queries=[VectorizableTextQuery(text=rewritten_query, k_nearest_neighbors=50, fields="contentVector") for rewritten_query in rewritten_queries],
        query_type="simple",
        # Any rewritten query from the joined query could match
        search_mode="any",
        search_fields=["content"],
        top=3
    )
    # @search.score is the built-in RRF score; it plays the same role as the manually computed RRF score above
    data = [[result["id"], result["title"], result["content"], result["@search.score"]] for result in results]
    return rewritten_queries, pd.DataFrame(data, columns=["id", "title", "content", "@search.score"])

The following cell demonstrates how the automatic approach has similar results to the manual one, even though the scores are not exactly equal.

In [17]:
rewritten_queries, results = query_rewrite_automatic_rrf(search_client, client, "srch service")
display(results)
print(rewritten_queries)

Unnamed: 0,id,title,content,@search.score
0,40,Azure Cognitive Search,Azure Cognitive Search is a fully managed sear...,0.05
1,3,Azure Cognitive Services,Azure Cognitive Services are a set of AI servi...,0.048916
2,90,Azure Cognitive Services,Azure Cognitive Services is a collection of AI...,0.048131


['search service', 'service search', 'service for search']
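
The function above wraps each rewritten query in double quotes without escaping. If a rewrite itself contains a double quote, the phrase would end early. The following is a minimal sketch of one way to guard against that, assuming the backslash escaping described for the simple query syntax; `to_phrase` and `build_or_query` are hypothetical helpers.

```python
def to_phrase(query: str) -> str:
    """Wrap a query in double quotes, escaping backslashes and embedded quotes."""
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def build_or_query(queries) -> str:
    """Join phrase queries with the simple-syntax OR operator (|), skipping blank entries."""
    return " | ".join(to_phrase(q) for q in queries if q.strip())

print(build_or_query(["search service", 'what is "search"?']))
```

The result can be passed as `search_text` in place of the `" | ".join(formatted_queries)` expression.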


## Continue to improve relevance using hybrid search and semantic ranking

Once you are using the automatic RRF combination method, you can add semantic ranking to improve relevance further.

In [18]:
def query_rewrite_automatic_rrf_semantic(search_client: SearchClient, openai_client: AzureOpenAI, query: str) -> tuple[list[str], pd.DataFrame]:
    rewritten_queries = rewrite_query(openai_client, query)
    # Quote the rewritten queries before joining them together using the query syntax
    formatted_queries = [f'"{rewritten_query}"' for rewritten_query in rewritten_queries]
    # Use the OR operator (|) from the simple query syntax to join rewritten queries together
    # https://learn.microsoft.com/azure/search/query-simple-syntax
    search_text = " | ".join(formatted_queries)
    # The semantic ranker expects plain text queries with no search operators
    semantic_query = " ".join(rewritten_queries)
    results = search_client.search(
        search_text=search_text,
        # Issue a vector query for every single rewritten query
        vector_queries=[VectorizableTextQuery(text=rewritten_query, k_nearest_neighbors=50, fields="contentVector") for rewritten_query in rewritten_queries],
        # Any rewritten query from the joined query could match
        search_mode="any",
        search_fields=["content"],
        query_type="simple",
        # Pass in the plain text concatenation of the rewritten queries for semantic ranking
        semantic_query=semantic_query,
        semantic_configuration_name='my-semantic-config',
        top=3
    )
    # @search.score is the built-in RRF score; it plays the same role as the manually computed RRF score above
    # @search.reranker_score is the semantic reranking score of the combined results
    data = [[result["id"], result["title"], result["content"], result["@search.score"], result["@search.reranker_score"]] for result in results]
    return rewritten_queries, pd.DataFrame(data, columns=["id", "title", "content", "@search.score", "@search.reranker_score"])

The following cell demonstrates how the semantic score compares to the RRF score. The semantic score ranges from 0 to 4, where a higher score indicates higher relevance.

In [19]:
rewritten_queries, results = query_rewrite_automatic_rrf_semantic(search_client, client, "srch service")
display(results)
print(rewritten_queries)

Unnamed: 0,id,title,content,@search.score,@search.reranker_score
0,40,Azure Cognitive Search,Azure Cognitive Search is a fully managed sear...,0.05,2.457265
1,90,Azure Cognitive Services,Azure Cognitive Services is a collection of AI...,0.04918,1.995392
2,3,Azure Cognitive Services,Azure Cognitive Services are a set of AI servi...,0.047619,1.867251


['What is Azure Search service?', 'How to use Azure Search service?', 'Features of Azure Search service']
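
Because the reranker score uses a fixed 0 to 4 scale, it can also be used to drop weak matches before passing results downstream. The following is a minimal sketch on plain dictionaries; `filter_by_reranker_score` is a hypothetical helper, and the threshold of 2.0 is an arbitrary assumption that you would tune for your own data.

```python
def filter_by_reranker_score(rows, threshold=2.0, key="@search.reranker_score"):
    """Keep rows whose reranker score is at least `threshold`; rows without a score are dropped."""
    return [row for row in rows if row.get(key) is not None and row[key] >= threshold]

# Scores copied from the output above
rows = [
    {"id": "40", "@search.reranker_score": 2.457265},
    {"id": "90", "@search.reranker_score": 1.995392},
    {"id": "3", "@search.reranker_score": 1.867251},
]
kept = filter_by_reranker_score(rows)
print([row["id"] for row in kept])
```

With the data frame returned by `query_rewrite_automatic_rrf_semantic`, the equivalent filter is `results[results["@search.reranker_score"] >= 2.0]`.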
