# Vector search in Python (Azure AI Search)

https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python

This code demonstrates how to use Azure AI Search by using the push API to insert vectors into your search index:

+ Create an index schema
+ Load the sample data from a local folder
+ Embed the documents in memory using Azure OpenAI's text-embedding-ada-002 model
+ Index the vector and nonvector fields on Azure AI Search
+ Run a series of vector and hybrid (text + vector) queries, including queries with metadata filters

The code uses Azure OpenAI to generate embeddings for title and content fields. You'll need access to Azure OpenAI to run this demo.

The code reads the `text-sample.json` file, which contains the input data for which embeddings need to be generated.

The output is a combination of human-readable text and embeddings that can be pushed into a search index.

## Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access). You need the Azure OpenAI endpoint and either an API key or Microsoft Entra ID access; the code falls back to `DefaultAzureCredential` when no key is set.

+ A deployment of the text-embedding-ada-002 embedding model.

+ Azure AI Search, any tier, but choose a service that has sufficient capacity for your vector index. We recommend Basic or higher. [Enable semantic ranking](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run the hybrid query with semantic ranking.

We used Python 3.11, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

```python
%pip install -r azure-search-vector-python-sample-requirements.txt --quiet
```

## Import required libraries and environment variables

```python
import os

from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv(override=True)  # Take environment variables from .env

# Azure AI Search: use the admin key if one is set, otherwise fall back to Microsoft Entra ID
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
search_admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(search_admin_key) if search_admin_key else DefaultAzureCredential()

# Index name from .env; the default is the name used when this notebook was run
index_name = os.getenv("AZURE_SEARCH_INDEX", "5360-ivan-basic-index")

# Azure OpenAI: a missing or empty key means "authenticate with Microsoft Entra ID"
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_key = os.getenv("AZURE_OPENAI_KEY") or None
azure_openai_embedding_deployment = os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"]
embedding_model_name = os.environ["AZURE_OPENAI_EMBEDDING_MODEL_NAME"]
azure_openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"]
```

## Create embeddings
Read your data, generate OpenAI embeddings, and export them in a format you can insert into your Azure AI Search index:

```python
import json

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Use the API key if one is configured; otherwise request Microsoft Entra ID tokens
openai_credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(openai_credential, "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_deployment=azure_openai_embedding_deployment,
    api_version=azure_openai_api_version,
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_key,
    azure_ad_token_provider=token_provider if not azure_openai_key else None
)

# Read text-sample.json
path = os.path.join('..', 'data', 'text-sample.json')
with open(path, 'r', encoding='utf-8') as file:
    input_data = json.load(file)

# Generate embeddings for the title and content fields (one request per field)
titles = [item['title'] for item in input_data]
content = [item['content'] for item in input_data]
title_response = client.embeddings.create(input=titles, model=embedding_model_name)
title_embeddings = [item.embedding for item in title_response.data]
content_response = client.embeddings.create(input=content, model=embedding_model_name)
content_embeddings = [item.embedding for item in content_response.data]

# Attach the embeddings to each document
for item, title_vector, content_vector in zip(input_data, title_embeddings, content_embeddings):
    item['titleVector'] = title_vector
    item['contentVector'] = content_vector

# Write the documents, including their embeddings, to ./output/docVectors.json
output_path = os.path.join('./output', 'docVectors.json')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(input_data, f)
```

This cell uses the Azure OpenAI API to generate embeddings for the sample data:

1. It creates the Azure OpenAI client, authenticating with the API key if one is set and with Microsoft Entra ID otherwise.

2. It reads `text-sample.json` from the `../data` directory. This file contains the text to embed.

3. It extracts the titles and the content of all documents.

4. It sends the titles and the content to the Azure OpenAI embeddings API, one request per field. Because the client is created with `azure_deployment`, the requests go to the embedding deployment named in `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`.

5. It adds the embeddings to the original data as the `titleVector` and `contentVector` fields.

6. It writes the updated data to `docVectors.json` in the `./output` directory, creating the directory if it does not exist.

Embeddings convert text into vectors of numbers in which texts with similar meanings lie close together. That is what makes vector similarity search, as well as tasks such as text clustering, possible.
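The idea of texts lying "close together" can be made concrete with cosine similarity, a common way to compare embeddings. The following is a minimal plain-Python sketch, not code from this sample; the function name `cosine_similarity` is our own. OpenAI states that its embeddings are normalized to length 1, in which case cosine similarity reduces to a dot product, but the sketch does not rely on that.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# With the embeddings computed above (requires the previous cell to have run):
# print(cosine_similarity(title_embeddings[0], content_embeddings[0]))

print(cosine_similarity([1.0, 0.0], [0.5, 0.0]))  # 1.0: same direction, different length
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0: orthogonal vectors
```

Cosine similarity ignores vector length and compares only direction, which is why two texts of very different lengths can still score as highly similar.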

```python
# Print the first 3 items, including their full embedding vectors
with open(output_path, 'r', encoding='utf-8') as file:
    input_data = json.load(file)
    for item in input_data[:3]:
        print(item['title'])
        print(item['content'])
        print(item['titleVector'])
        print(item['contentVector'])
        print('------------------------\n')
```

```
Azure App Service
Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful APIs. It supports a variety of programming languages and frameworks, such as .NET, Java, Node.js, Python, and PHP. The service offers built-in auto-scaling and load balancing capabilities. It also provides integration with other Azure services, such as Azure DevOps, GitHub, and Bitbucket.
[-0.010541444644331932, -0.02157960645854473, 0.0017954260110855103, -0.014349610544741154, -0.02186935767531395, 0.011900518089532852, -0.026477791368961334, -0.00870634987950325, 0.013204401358962059, -0.025456760078668594, 0.0145427780225873, -0.01378390472382307, 0.005463890265673399, 0.0015280955703929067, -0.023939013481140137, 0.014570373110473156, 0.014052960090339184, 0.004311781842261553, 0.004084120038896799, -0.003459773724898696, -0.023042162880301476, 0.008057857863605022, 0.014432396739721298, 0.014391003176569939, -0.0013
```

## Create your search index

Create your search index schema and vector search configuration. If you get an error, check the search service for available quota and check the .env file to make sure you're using a unique search index name.

```python
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchFieldDataType,
    SearchableField,
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    SearchIndex
)

VECTOR_SEARCH_PROFILE_NAME = "5360-ivan-profile"
ALGORITHM_NAME = "5360-HNSW-algorithm"
VECTORIZER_NAME = "5360-vectorizer"
SEMANTIC_NAME = "5360-semantic"

# Create a search index
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchableField(name="category", type=SearchFieldDataType.String,
                    filterable=True),
    SearchField(name="titleVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_profile_name=VECTOR_SEARCH_PROFILE_NAME),
    SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_profile_name=VECTOR_SEARCH_PROFILE_NAME),
]

# Configure vector search: an HNSW algorithm (default parameters) and a profile that uses it
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name=ALGORITHM_NAME
        )
    ],
    profiles=[
        VectorSearchProfile(
            name=VECTOR_SEARCH_PROFILE_NAME,
            algorithm_configuration_name=ALGORITHM_NAME,
        )
    ]
)

# Configure semantic ranking: which fields hold the title, keywords, and content
semantic_config = SemanticConfiguration(
    name=SEMANTIC_NAME,
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="category")],
        content_fields=[SemanticField(field_name="content")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])

# Create (or update) the search index with the vector and semantic settings
index = SearchIndex(name=index_name, fields=fields,
                    vector_search=vector_search, semantic_search=semantic_search)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')
```

```
 5360-ivan-basic-index created
```


This cell uses the Azure AI Search Python SDK (`azure-search-documents`) to create or update the search index:

1. It creates a `SearchIndexClient` with the service endpoint and credential.

2. It defines the index fields: `id` (the document key), `title`, `content`, `category` (also filterable), and the vector fields `titleVector` and `contentVector`. Each vector field has 1,536 dimensions, the output size of text-embedding-ada-002, and references the vector search profile.

3. It configures vector search with the `VectorSearch` class: an HNSW algorithm configuration (`HnswAlgorithmConfiguration`, with default parameters) and a profile (`VectorSearchProfile`) that points to it.

4. It defines a semantic configuration (`SemanticConfiguration`) that tells semantic ranking to treat `title` as the title, `category` as keywords, and `content` as the content.

5. It wraps that configuration in the `SemanticSearch` settings.

6. It creates the index, or updates it if it already exists, with the fields, vector search settings, and semantic settings.

7. It prints the name of the index.

The result is an index that supports vector queries over `titleVector` and `contentVector` and, if semantic ranking is enabled on your service, semantic reranking of results.
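The vector fields are declared with `vector_search_dimensions=1536`, so every `titleVector` and `contentVector` you upload must have exactly 1,536 values; a document with a different length is expected to fail indexing. A minimal pre-upload check in plain Python might look like this. `check_vector_dimensions`, `EXPECTED_DIMENSIONS`, and `VECTOR_FIELDS` are hypothetical helpers for illustration, not part of the Azure SDK.

```python
from typing import Any

EXPECTED_DIMENSIONS = 1536  # Must equal vector_search_dimensions in the index schema
VECTOR_FIELDS = ("titleVector", "contentVector")


def check_vector_dimensions(docs: list[dict[str, Any]],
                            expected: int = EXPECTED_DIMENSIONS) -> list[str]:
    """Return a list of problems; an empty list means every vector has the expected length."""
    problems = []
    for doc in docs:
        for field in VECTOR_FIELDS:
            vector = doc.get(field)
            if vector is None:
                problems.append(f"{doc.get('id')}: missing {field}")
            elif len(vector) != expected:
                problems.append(
                    f"{doc.get('id')}: {field} has {len(vector)} dimensions, expected {expected}"
                )
    return problems


# Usage after docVectors.json has been loaded into `documents`:
# problems = check_vector_dimensions(documents)
# assert not problems, problems

sample = [{"id": "1", "titleVector": [0.0] * 1536, "contentVector": [0.0] * 3}]
print(check_vector_dimensions(sample))
```

Catching a mismatch locally gives a clearer error than a failed indexing request, for example after switching to an embedding model with a different output size.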

## Insert text and embeddings into vector store
Add texts and metadata from the JSON data to the vector store:

```python
from azure.search.documents import SearchClient

# Load the documents and embeddings written in the "Create embeddings" step
output_path = os.path.join('./output', 'docVectors.json')
with open(output_path, 'r', encoding='utf-8') as file:
    documents = json.load(file)

# Upload all documents in a single request
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents)

# Report any documents that the service did not index
failed = [r.key for r in result if not r.succeeded]
if failed:
    print(f"{len(failed)} documents failed to index: {failed}")
print(f"Uploaded {len(documents)} documents")
```

```
Uploaded 108 documents
```

This cell uses the Azure AI Search Python SDK to upload documents to the index:

1. It loads the documents, including their embeddings, from `docVectors.json`.

2. It creates a `SearchClient` for the index, using the endpoint, index name, and credential from the earlier cells.

3. It uploads the documents with the `upload_documents` method, which sends them in a single request.

4. It reports any documents that failed to index and prints the number of documents sent.

Once uploaded, the documents can be queried with the keyword, vector, and hybrid search capabilities of Azure AI Search.

If you are indexing a very large number of documents, use `SearchIndexingBufferedSender` instead. It is optimized for bulk indexing and handles the batching for you:

```python
from azure.search.documents import SearchIndexingBufferedSender

# Load the documents and embeddings
with open(output_path, 'r', encoding='utf-8') as file:
    documents = json.load(file)

# SearchIndexingBufferedSender groups upload actions into batches and sends them
# automatically; leaving the with block flushes any remaining actions
with SearchIndexingBufferedSender(
    endpoint=endpoint,
    index_name=index_name,
    credential=credential,
) as batch_client:
    # Add upload actions for all documents
    batch_client.upload_documents(documents=documents)
print(f"Uploaded {len(documents)} documents in total")
```
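If you call `upload_documents` on a `SearchClient` directly instead, large uploads have to be split by hand. Azure AI Search documents a limit of 1,000 documents per indexing request (plus a payload size limit), so the sketch below defaults to batches of 1,000; treat that number as an assumption and check the service limits for your tier. `iter_batches` is a hypothetical helper, and the commented-out usage relies on `search_client` and `documents` from the cells above.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int = 1000) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Usage with the SearchClient created earlier:
# for batch in iter_batches(documents, batch_size=1000):
#     results = search_client.upload_documents(documents=batch)
#     failed = [r.key for r in results if not r.succeeded]
#     if failed:
#         print(f"{len(failed)} documents failed: {failed[:5]}")

print([len(batch) for batch in iter_batches(range(2500), batch_size=1000)])  # [1000, 1000, 500]
```

On Python 3.12 and later, `itertools.batched` provides the same chunking, yielding tuples instead of lists.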


## Perform a vector similarity search

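This example shows a pure vector search. The query text is converted to an embedding with the same Azure OpenAI deployment used for the documents, and the resulting vector is passed to the search service in a `VectorizedQuery`. Because the index defines no vectorizer, the query must be vectorized on the client.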

```python
from azure.search.documents.models import VectorizedQuery

# Pure vector search
query = "tools for software development"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.8289802
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.81774753
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, such
```

This cell uses the Azure AI Search Python SDK to run a pure vector search:

1. It sets the query to "tools for software development".

2. It generates an embedding for the query with the Azure OpenAI client created earlier, so the query and the documents are embedded by the same model.

3. It creates a `VectorizedQuery` with that embedding, asking for the 3 nearest neighbors in the `contentVector` field.

4. It calls the `search` method of the `SearchClient` with `search_text=None` (no keyword component) and returns only the `title`, `content`, and `category` fields.

5. It prints the `title`, `@search.score`, `content`, and `category` of each result.

Because the comparison happens between embeddings, this finds documents that are semantically similar to the query even when they don't contain its exact keywords.

This example runs a pure vector search with a Dutch query to demonstrate the multilingual capabilities of OpenAI's text-embedding-ada-002 model.

```python
# Multilingual pure vector search ('tools for software development' in Dutch)
query = "tools voor softwareontwikkeling"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.8034561
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.796056
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, such a
```

This cell is identical to the previous one except for the query, which is the Dutch phrase "tools voor softwareontwikkeling":

1. It generates an embedding for the Dutch query with the same Azure OpenAI client.

2. It creates a `VectorizedQuery` for the 3 nearest neighbors in the `contentVector` field.

3. It runs the search and prints the `title`, `@search.score`, `content`, and `category` of each result.

The top results are the same as for the English query, with slightly lower scores: ada-002 maps texts with similar meanings to nearby vectors even when they are written in different languages.

## Perform an Exhaustive KNN exact nearest neighbor search

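This example shows how to search your vector index exhaustively, regardless of which algorithm the vector field is configured with (HNSW or exhaustive KNN). Because exhaustive search compares the query with every vector, you can use it to calculate ground-truth results, for example to evaluate the recall of HNSW queries.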

```python
# Exhaustive (exact) KNN vector search
query = "tools for software development"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector", exhaustive=True)

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.82898057
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.8177473
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, such
```

This cell repeats the first vector search with one change: the `VectorizedQuery` sets `exhaustive=True`.

1. It generates an embedding for "tools for software development".

2. It creates a `VectorizedQuery` for the 3 nearest neighbors in the `contentVector` field with `exhaustive=True`, so the service compares the query with every vector in the field instead of traversing the HNSW graph.

3. It runs the search and prints the `title`, `@search.score`, `content`, and `category` of each result.

Exhaustive search guarantees the exact nearest neighbors, but its cost grows with the number of vectors, so it can be much slower on large indexes. On this small index, the results and scores are essentially the same as those of the HNSW query.
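To use exhaustive results as ground truth, you compare them with the approximate results and measure recall@k: the fraction of the true top-k neighbors that the approximate search also returned. The plain-Python sketch below shows both halves on toy data: a brute-force top-k by cosine similarity (what exhaustive KNN computes) and the recall calculation. The helper names and the 2-D vectors are hypothetical; with the real service you would instead collect the document keys or titles returned by the HNSW query and by the `exhaustive=True` query above.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: Sequence[float], vectors: dict[str, Sequence[float]], k: int) -> list[str]:
    """Exhaustive KNN: score every vector and return the keys of the k most similar."""
    scored = ((cosine(query, vector), key) for key, vector in vectors.items())
    return [key for _, key in heapq.nlargest(k, scored)]


def recall_at_k(approximate: list[str], exact: list[str]) -> float:
    """Fraction of the exact top-k keys that the approximate search also returned."""
    if not exact:
        return 1.0
    return len(set(approximate) & set(exact)) / len(exact)


# Hypothetical 2-D vectors standing in for contentVector values
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
truth = exact_top_k([1.0, 0.0], vectors, k=2)
print(truth)                           # ['a', 'b']
print(recall_at_k(["a", "c"], truth))  # 0.5
```

Averaging recall@k over a set of representative queries gives a simple way to judge whether the default HNSW parameters are good enough for your data.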

## Perform a Cross-Field Vector Search

This example shows a cross-field vector search, which queries multiple vector fields at the same time with a single query vector. Because the same vector is compared with every listed field, make sure those fields were populated with the same embedding model.

```python
# Cross-field vector search
query = "tools for software development"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector, titleVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.03333333507180214
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.032786883413791656
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports var
```

This cell runs a single vector query against two fields at once:

1. It generates an embedding for "tools for software development".

2. It creates one `VectorizedQuery` whose `fields` parameter is the comma-separated string `"contentVector, titleVector"`, so the same query vector is compared with both fields, asking for the 3 nearest neighbors.

3. It runs the search and prints the `title`, `@search.score`, `content`, and `category` of each result.

The search considers both the title and the content of each document. Notice that the scores are much smaller than in the single-field queries: when results from several fields are merged, `@search.score` is a rank-based Reciprocal Rank Fusion (RRF) score rather than a similarity score.
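The RRF scores above can be reproduced by hand. In RRF, each ranked list contributes 1 / (k + rank) for every document it contains, and the contributions are summed. The sketch below is plain Python, not the service implementation; the constant k = 60 and the 0-based rank are assumptions, chosen because they match the printed scores: a document ranked first in both fields gets 2/60 ≈ 0.03333, and one ranked second in both gets 2/61 ≈ 0.03279. The per-field rankings and the placeholder names "Doc X" and "Doc Y" are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document keys; each list adds 1 / (k + rank), rank starting at 0."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking):
            scores[key] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-field rankings in which both fields agree on the top two documents
title_ranking = ["Azure DevOps", "Azure DevTest Labs", "Doc X"]
content_ranking = ["Azure DevOps", "Azure DevTest Labs", "Doc Y"]

fused = reciprocal_rank_fusion([title_ranking, content_ranking])
for key, score in fused:
    print(f"{key}: {score:.6f}")
```

Because RRF uses only ranks, it can merge result lists whose raw scores are on different scales, at the cost of discarding how large the similarity gaps between documents were.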

## Perform a Multi-Vector Search

This example queries multiple vector fields at the same time by passing multiple query vectors, one per field. Because each vector query targets its own field, you can pass query vectors from two different embedding models to the corresponding vector fields in your index. Here, both queries use the same ada-002 embedding.

```python
# Multi-vector search
query = "tools for software development"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query_1 = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="titleVector")
vector_query_2 = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query_1, vector_query_2],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.03333333507180214
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.032786883413791656
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports var
```

This cell passes two vector queries in one request:

1. It generates an embedding for "tools for software development".

2. It creates two `VectorizedQuery` objects from that embedding: one for the 3 nearest neighbors in `titleVector` and one for the 3 nearest neighbors in `contentVector`.

3. It runs both queries in a single `search` call and prints the `title`, `@search.score`, `content`, and `category` of each result.

The service merges the results of the two queries, so the scores are again rank-fused and, in this case, identical to those of the cross-field example. The difference is flexibility: each query carries its own vector and its own `k_nearest_neighbors`, so the fields can be populated by different embedding models.

## Perform a Pure Vector Search with a filter
This example shows how to apply a filter to a vector query. With the `vector_filter_mode` parameter, you can choose between pre-filtering (the default) and post-filtering.

```python
from azure.search.documents.models import VectorFilterMode

# Pure vector search with a category filter
query = "tools for software development"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    vector_filter_mode=VectorFilterMode.PRE_FILTER,
    filter="category eq 'Developer Tools'",
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```

```
Title: Azure DevOps
Score: 0.82898057
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.8177473
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, such
```

This code runs a vector search for the query "tools for software development" and restricts it with a filter. Here's a step-by-step breakdown:

1. It sets the query to "tools for software development".

2. It generates an embedding for the query using the Azure OpenAI API. The model used for generating the embedding is specified by the `embedding_model_name` variable.

3. It creates a `VectorizedQuery` with the generated embedding. This query will find the 3 nearest neighbors in the `contentVector` field.

4. It runs the query with the `search` method of the `SearchClient`. Setting `search_text=None` makes this a pure vector query with no keyword component. The results include the `title`, `content`, and `category` fields.

5. It restricts the search to documents whose `category` field equals 'Developer Tools'. Because `vector_filter_mode` is `PRE_FILTER`, the filter is applied before the nearest-neighbor search, so the 3 nearest neighbors are chosen only from documents that match the filter.

6. Finally, it prints the `title`, `@search.score`, `content`, and `category` of each search result.

This approach is useful for finding documents that are semantically similar to a query and also satisfy a filter condition. Similarity is computed from the embeddings of the documents and the query, and the OData filter expression limits the candidates to documents with matching field values.
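To see why the filter mode matters, here is a toy, in-memory sketch of the two strategies. It is not how Azure AI Search implements filtering internally; `pre_filter_search`, `post_filter_search`, and the sample documents are hypothetical. What it shows is that post-filtering can return fewer than `k` results (here, none at all) when the nearest neighbors don't match the filter, while pre-filtering always searches within the matching documents.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: list[dict], k: int) -> list[dict]:
    """Brute-force k-nearest neighbors by cosine similarity."""
    return sorted(docs, key=lambda d: cosine(query, d["vector"]), reverse=True)[:k]


def pre_filter_search(query, docs, predicate, k):
    """Filter first, then search only the documents that passed: up to k matches."""
    return top_k(query, [d for d in docs if predicate(d)], k)


def post_filter_search(query, docs, predicate, k):
    """Search everything first, then drop non-matching hits: may return fewer than k."""
    return [d for d in top_k(query, docs, k) if predicate(d)]


def is_dev_tools(doc: dict) -> bool:
    """Plays the role of the OData filter "category eq 'Developer Tools'"."""
    return doc["category"] == "Developer Tools"


if __name__ == "__main__":
    docs = [
        {"id": "storage-1", "category": "Storage", "vector": [1.0, 0.0]},
        {"id": "storage-2", "category": "Storage", "vector": [0.9, 0.1]},
        {"id": "devtools-1", "category": "Developer Tools", "vector": [0.8, 0.2]},
        {"id": "devtools-2", "category": "Developer Tools", "vector": [0.1, 0.9]},
    ]
    query = [1.0, 0.0]
    print([d["id"] for d in pre_filter_search(query, docs, is_dev_tools, k=2)])
    print([d["id"] for d in post_filter_search(query, docs, is_dev_tools, k=2)])
```

In the SDK, the other option is `VectorFilterMode.POST_FILTER`. Post-filtering can be cheaper on very selective filters over large indexes, at the cost of possibly short result lists.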

## Perform a Hybrid Search

```python
from azure.search.documents.models import VectorizedQuery

# Hybrid search: a full-text query and a vector query in one request
query = "scalable storage solution"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=query,  # drives the full-text (keyword) part of the query
    vector_queries=[vector_query],  # drives the vector part of the query
    select=["title", "content", "category"],
    top=3,
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
```


```
Title: Azure Storage
Score: 0.03306011110544205
Content: Azure Storage is a scalable, durable, and highly available cloud storage service that supports a variety of data types, including blobs, files, queues, and tables. It provides a massively scalable object store for unstructured data. Storage supports data redundancy and geo-replication, ensuring high durability and availability. It offers a variety of data access and management options, including REST APIs, SDKs, and Azure Portal. You can secure your data using encryption at rest and in transit.
Category: Storage

Title: Azure Table Storage
Score: 0.032258063554763794
Content: Azure Table Storage is a fully managed, NoSQL datastore that enables you to store and query large amounts of structured, non-relational data. It provides features like automatic scaling, schema-less design, and a RESTful API. Table Storage supports various data types, such as strings, numbers, and booleans. You can use Azure Table Storage to store and manage
```

This code runs a hybrid search for the query "scalable storage solution". Here's a step-by-step breakdown:

1. It sets the query to "scalable storage solution".

2. It generates an embedding for the query using the Azure OpenAI API. The model used for generating the embedding is specified by the `embedding_model_name` variable.

3. It creates a `VectorizedQuery` with the generated embedding. This query will find the 3 nearest neighbors in the `contentVector` field.

4. It performs the hybrid search on the search index using the `search` method of the `SearchClient`. The search results will include the `title`, `content`, and `category` fields. The `search_text` parameter is set to the query, which means the search will also consider the text of the documents, not just their embeddings.

5. It limits the number of search results to 3 using the `top` parameter.

6. Finally, it prints the `title`, `@search.score`, `content`, and `category` of each search result.

This approach is useful when relevant documents might match the query either by meaning (vector search) or by keywords (full-text search); a document doesn't need to contain the query text to be returned. The full-text query runs over the index's searchable fields, the vector query runs over `contentVector`, and Azure AI Search merges the two ranked lists with Reciprocal Rank Fusion (RRF). That is why the hybrid `@search.score` values are small (about 0.03 in the output above) and aren't comparable to the pure vector scores in the previous example.
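The fusion step can be sketched with the standard RRF formula, using the constant k=60 from the original RRF paper. This is a minimal illustration, not the service's code: Azure AI Search's exact rank offsets may differ, so don't expect it to reproduce the scores above digit for digit. The function name `reciprocal_rank_fusion` and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of document IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1. Documents near the top of several
    lists therefore outrank documents that top only one list.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    keyword_hits = ["doc-1", "doc-2", "doc-3"]  # hypothetical full-text ranking
    vector_hits = ["doc-1", "doc-4", "doc-2"]   # hypothetical vector ranking
    for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, it sidesteps the problem that BM25 keyword scores and cosine similarities live on different scales. It also caps every fused score at a small value, which matches the magnitude you see in hybrid results.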

## Perform a Semantic Hybrid Search

```python
from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizedQuery,
)

# Semantic hybrid search
query = "what is azure sarch?"

embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
# exhaustive=True compares the query against every vector instead of using approximate search
vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector", exhaustive=True)

results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name=SEMANTIC_NAME,
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=3,
)

# Semantic answers belong to the whole query; get_answers() can return None
semantic_answers = results.get_answers() or []
for answer in semantic_answers:
    # Prefer the highlighted version of the answer when one is returned
    print(f"Semantic Answer: {answer.highlights or answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(f"Title: {result['title']}")
    print(f"Reranker Score: {result['@search.reranker_score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}")

    # Captions are per document; print the first one if present
    captions = result.get("@search.captions")
    if captions:
        caption = captions[0]
        print(f"Caption: {caption.highlights or caption.text}")
    print()
```


```
Semantic Answer: Azure Cognitive Search is<em> a fully managed search-as-a-service that enables you to build rich search experiences for your applications.</em> It provides features like full-text search, faceted navigation, and filters. Azure Cognitive Search supports various data sources, such as Azure SQL Database, Azure Blob Storage, and Azure Cosmos DB.
Semantic Answer Score: 0.9462890625

Title: Azure Stack Edge
Reranker Score: 2.075716972351074
Content: Azure Stack Edge is a managed, edge computing appliance that enables you to run Azure services and AI workloads on-premises or at the edge. It provides features like hardware-accelerated machine learning, local caching, and integration with Azure IoT Hub. Azure Stack Edge supports various Azure services, such as Azure Functions, Azure Machine Learning, and Azure Kubernetes Service. You can use Azure Stack Edge to build edge computing applications, optimize your data processing, and ensure the security and compliance of your workl
```

This code runs a semantic hybrid search for the query "what is azure sarch?" (note the misspelling of "search"). Here's a step-by-step breakdown:

1. It sets the query to "what is azure sarch?".

2. It generates an embedding for the query using the Azure OpenAI API. The model used for generating the embedding is specified by the `embedding_model_name` variable.

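3. It creates a `VectorizedQuery` with the generated embedding. This query will find the 3 nearest neighbors in the `contentVector` field. Setting `exhaustive=True` makes the service compare the query against every vector in the field instead of running an approximate nearest-neighbor search, which is slower but returns the true nearest neighbors.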

4. It performs the hybrid search on the search index using the `search` method of the `SearchClient`. The search results will include the `title`, `content`, and `category` fields. The `search_text` parameter is set to the query, which means the search will also consider the text of the documents, not just their embeddings.

5. It sets `query_type` to `SEMANTIC` and names the index's semantic configuration with `semantic_configuration_name`. Semantic ranking uses a machine learning model to re-rank the top search results by their semantic relevance to the query.

6. It sets the `query_caption` and `query_answer` to `EXTRACTIVE`, which means the search will extract captions and answers from the content of the documents.

7. It limits the number of search results to 3 using the `top` parameter.

8. It gets the semantic answers from the search results and prints them along with their scores.

9. Finally, it prints the `title`, `@search.reranker_score`, `content`, and `category` of each search result, as well as the extracted captions.

This approach is useful when you want the most relevant documents for a natural-language question, plus direct answers and captions extracted from them. Semantic hybrid search first runs full-text and vector retrieval together and merges the two result lists, then applies semantic ranking to the merged results. The results are ordered by `@search.reranker_score`, which ranges from 0 to 4, with higher values indicating greater relevance.
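The retrieve-then-rerank shape of semantic ranking can be sketched in a few lines. In this minimal sketch, `rerank`, `overlap_scorer`, and the `window` parameter are hypothetical names, and the word-overlap scorer is a toy stand-in for the semantic model, not how Azure's ranker scores text. The point is structural: the reranker only re-scores candidates that first-stage retrieval already returned, and the final order comes from the new score.

```python
from typing import Callable


def overlap_scorer(query: str, text: str) -> float:
    """Toy stand-in for a semantic model: share of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    candidates: list[dict],
    scorer: Callable[[str, str], float],
    window: int,
    top: int,
) -> list[dict]:
    """Re-score the first `window` candidates from first-stage retrieval and keep the best `top`.

    Candidates outside the window are never re-scored, so a reranker can only
    reorder what retrieval already found; it cannot recover missed documents.
    """
    rescored = [dict(doc, reranker_score=scorer(query, doc["content"])) for doc in candidates[:window]]
    rescored.sort(key=lambda d: d["reranker_score"], reverse=True)
    return rescored[:top]


if __name__ == "__main__":
    # Hypothetical candidates, as if returned by the hybrid stage in RRF order.
    candidates = [
        {"id": "edge", "content": "run azure services at the edge"},
        {"id": "search", "content": "azure search is a managed search service"},
        {"id": "storage", "content": "scalable cloud storage"},
    ]
    for doc in rerank("what is azure search", candidates, overlap_scorer, window=3, top=2):
        print(doc["id"], doc["reranker_score"])
```

This also hints at why a semantic answer can be strong while a lower-relevance document still appears in the list: the reranker reorders the candidates it is given, but `top` still returns that many results.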