# Azure AI Search: Relevance

## Set up API clients

In [1]:
import os

import azure.identity
import dotenv
import openai
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

dotenv.load_dotenv()

azure_credential = azure.identity.AzureDeveloperCliCredential(tenant_id=os.getenv("AZURE_TENANT_ID"))

# Initialize Azure OpenAI client
AZURE_OPENAI_SERVICE = os.getenv("AZURE_OPENAI_SERVICE")
AZURE_OPENAI_ADA_DEPLOYMENT = os.getenv("AZURE_OPENAI_ADA_DEPLOYMENT")

token_provider = azure.identity.get_bearer_token_provider(azure_credential, "https://cognitiveservices.azure.com/.default")
openai_client = openai.AzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint=f"https://{AZURE_OPENAI_SERVICE}.openai.azure.com",
    azure_ad_token_provider=token_provider)

def get_embedding(text):
    """Return the embedding vector for `text` from the Azure OpenAI embeddings deployment."""
    get_embeddings_response = openai_client.embeddings.create(model=AZURE_OPENAI_ADA_DEPLOYMENT, input=text)
    return get_embeddings_response.data[0].embedding

# Initialize Azure search client
AZURE_SEARCH_SERVICE = os.getenv("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_ENDPOINT = f"https://{AZURE_SEARCH_SERVICE}.search.windows.net"

AZURE_SEARCH_FULL_INDEX = "gptkbindex"
search_client = SearchClient(AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_FULL_INDEX, credential=azure_credential)


#### Vector search is not always optimal

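For example, consider searches for exact strings, such as a price. Embeddings capture semantic similarity rather than literal character matches, so the top vector results may not contain the string at all.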

In [2]:
search_query = "$45.00"
search_vector = get_embedding(search_query)
r = search_client.search(None, top=3, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query.lower() in doc['content'].lower():
        found_content = doc['content'][doc['content'].lower().find(search_query.lower()):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.81581	Matching text: Not found
Score: 0.80998	Matching text: Not found
Score: 0.80965	Matching text: Not found


Compare to a text search for the same query:

In [3]:
r = search_client.search(search_query, top=3)
for doc in r:
    found_content = "Not found"
    if search_query.lower() in doc['content'].lower():
        found_content = doc['content'][doc['content'].lower().find(search_query.lower()):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 4.14542	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences between Northwind Health Plus and Northwind Standard. We are confident that you will find the right plan for you and your family. Thank you for choosing Contoso Electronics!
Score: 3.37283	Matching text: Not found


#### Hybrid retrieval

Hybrid retrieval uses Reciprocal Rank Fusion (RRF) to combine the vector and text results.
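
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF) in plain Python. It is not the service's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` are assumptions chosen for illustration. Each document receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical document IDs, for illustration only
text_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_d", "doc_a", "doc_e"]

fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

In this sketch `doc_a` comes out on top because it is the only document present in both lists, so it collects two contributions. The hybrid scores printed in this notebook appear consistent with that shape: a couple of results near 0.03 and the rest near 0.016.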

In [4]:
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=15, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query in doc['content']:
        found_content = doc['content'][doc['content'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.03254	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences between Northwind Health Plus and Northwind Standard. We are confident that you will find the right plan for you and your family. Thank you for choosing Contoso Electronics!
Score: 0.03110	Matching text: Not found
Score: 0.01667	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01613	Matching text: Not found
Score: 0.01562	Matching text: Not found
Score: 0.01538	Matching text: Not found
Score: 0.01515	Matching text: Not found
Score: 0.01493	Matching text: Not found
Score: 0.01449	Matching text: Not found


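#### Hybrid ranking is not always optimal

For this query, the most relevant chunk (the PerksPlus lessons list that includes scuba diving) is only ranked fourth.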

In [5]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=5, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\t\tContent: {content}")

Score: 0.03252		Content:  PerksPlus is not only designed to support employees' physical health, but also their mental health. Regular exercise has been shown to reduce stress,
Score: 0.03105		Content:  Under the Northwind Health Plus plan, habilitation services are covered up to a certain dollar amount and number of visits. This amount and the numbe
Score: 0.02797		Content:  Occupational Therapy Occupational therapy helps individuals develop, maintain, or restore skills for daily living and work. It can help individuals w
Score: 0.02766		Content:  Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding le
Score: 0.02290		Content:  Talk to your doctor or health care provider about the trial and ask any questions you may have. · Ask about the potential risks and benefits of parti


#### Hybrid + semantic reranking 🎉

In [6]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")],
    query_type="semantic",
    semantic_configuration_name="default")

for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\tReranker: {doc['@search.reranker_score']:.5f}\tContent: {content}")

Score: 0.02766	Reranker: 1.94936	Content:  Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding le
Score: 0.03252	Reranker: 1.77669	Content:  PerksPlus is not only designed to support employees' physical health, but also their mental health. Regular exercise has been shown to reduce stress,
Score: 0.02455	Reranker: 0.59237	Content:  By taking the time to research providers in-network with Northwind Health Plus and keeping track of your medical records and tests, you can make sure
Score: 0.03105	Reranker: 0.58931	Content:  Under the Northwind Health Plus plan, habilitation services are covered up to a certain dollar amount and number of visits. This amount and the numbe
Score: 0.02153	Reranker: 0.49740	Content:  This position will be responsible for designing and implementing innovative solutions to maximize product performance and optimize customer satisfact
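
The reranker's effect is easiest to see by comparing the two orderings of the same five results. The sketch below copies the `@search.score` and `@search.reranker_score` values from the output above and sorts them both ways. The short labels are shorthand invented here for readability, not fields in the index.

```python
# (fused @search.score, @search.reranker_score, label) copied from the output above.
# The labels are made-up shorthand for each chunk, not index fields.
results = [
    (0.02766, 1.94936, "PerksPlus lessons (scuba, surfing)"),
    (0.03252, 1.77669, "PerksPlus mental health"),
    (0.02455, 0.59237, "Northwind in-network providers"),
    (0.03105, 0.58931, "Habilitation services"),
    (0.02153, 0.49740, "Position description"),
]

# Order by the fused hybrid score versus the semantic reranker score
by_fused = sorted(results, key=lambda r: r[0], reverse=True)
by_reranker = sorted(results, key=lambda r: r[1], reverse=True)

for fused_pos, row in enumerate(by_fused, start=1):
    reranked_pos = by_reranker.index(row) + 1
    print(f"{row[2]}: fused rank {fused_pos} -> reranked rank {reranked_pos}")
```

Among these five results, the scuba and surfing lessons chunk sits third by fused score but first by reranker score, which is the order the semantic query returned.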
