# Build a RAG solution in Azure AI Search

This notebook provides a sample script for the indexing pipeline in [Build a RAG solution in Azure AI Search](https://learn.microsoft.com/azure/search/tutorial-rag-build-solution). If you need more information than the readme provides, you can refer to that article.

Steps in this notebook include:

- Set up the environment
- Set up the Azure resources used in the pipeline
- Create an index, data source, skillset, and indexer on Azure AI Search
- Send a query to the search engine to check results
- Send a query to an LLM to chat with your data
- Revisit the index schema and query logic to improve relevance

Sample data is a collection of PDF pages from NASA's Earth Book that you load into Azure Storage and retrieve during indexing.

This tutorial assumes embedding and chat models on Azure OpenAI so that you can use the integrated vectorization capabilities of Azure AI Search. You can use a different provider but you might need custom skills or a different approach for indexing and embedding your content.

## Prerequisites

You need the following Azure resources to run all of the script in this notebook.

- [Azure Storage](https://learn.microsoft.com/azure/storage/common/storage-account-create), general purpose account, used for providing the PDFs.

- [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource) provides the embedding and chat models.

- [Azure AI Services multiservice account](https://learn.microsoft.com/azure/ai-services/multi-service-resource), in the same region as Azure AI Search, used for recognizing location entities in the Earth Book.

- [Azure AI Search](https://learn.microsoft.com/azure/search/search-create-service-portal), basic tier or higher is recommended. Choose the same region as Azure OpenAI and Azure AI multiservice.

To meet the same-region requirement, start by reviewing the [regions for the embedding and chat models](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) you want to use. Once you identify a region, confirm that Azure AI Search with AI services integration is available in the [same region](https://learn.microsoft.com/azure/search/search-region-support#azure-public-regions).

## Sign in to Azure

You might not need this step, but if downstream connections fail with a 401 during indexer pipeline execution, it could be because you're using the wrong tenant or subscription. You can avoid this issue by signing in from the command line, explicitly setting the tenant ID and choosing the right subscription.

This section assumes you have the [Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively).

1. Open a command line prompt.

1. Run this command to get the current tenant and subscription information: `az account show`

1. If you have multiple subscriptions, specify the one that has Azure AI Search and Azure OpenAI: `az account set --subscription <PUT YOUR SUBSCRIPTION ID HERE>`

1. If you have multiple tenants, you can list them: `az account tenant list`

1. Sign in to Azure, specifying the tenant used for Azure AI Search and Azure OpenAI: `az login --tenant <PUT YOUR TENANT ID HERE>`

You should now be logged in to Azure from your local device.

## Set up Azure resources using the Azure portal

We recommend using the Azure portal for setting up resources.

You must be a subscription **Owner** or **User Access Administrator** to create roles. If you don't have permission to create roles, you can use API keys instead. If you're using keys, you can skip the steps that enable system assigned managed identities.

1. Download the sample PDF files from [nasa-e-book/earth_book_2019_text_pages](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/nasa-e-book/earth_book_2019_text_pages).

1. Sign in to the [Azure portal](https://portal.azure.com).

1. Make sure Azure AI Search, Azure OpenAI, and Azure AI multiservice resources are in the same region.

### Configure Azure Storage

1. On the Azure Storage left menu, select **Storage browser** > **Blob containers**, and then **Add container**.

1. Name the container *nasa-ebooks-pdfs-all*.

1. Upload the PDFs to the container.

1. On the left menu, select **Settings** > **Identity** and turn on system assigned managed identity.

### Configure Azure AI Search

1. On the Azure AI Search left menu, select **Settings** > **Semantic ranker** and enable the free plan that authorizes 1,000 requests at no charge.

1. On the left menu, select **Settings** > **Keys** and turn on role-based access control or "both".

1. On the left menu, select **Settings** > **Identity** and turn on system assigned managed identity.

### Configure Azure OpenAI

Deploy the following models on Azure OpenAI:

- text-embedding-3-large on Azure OpenAI for embeddings
- gpt-4o on Azure OpenAI for chat completion

You must have [**Cognitive Services OpenAI Contributor**](https://learn.microsoft.com/azure/ai-services/openai/how-to/role-based-access-control#cognitive-services-openai-contributor) or higher to deploy models in Azure OpenAI.

1. Go to [Azure OpenAI Studio](https://oai.azure.com/).

1. Select **Deployments** on the left menu.

1. Select **Deploy model** > **Deploy base model**.

1. Select **text-embedding-3-large** from the dropdown list and confirm the selection.

1. Specify a deployment name. We recommend "text-embedding-3-large".

1. Accept the defaults.

1. Select **Deploy**.

1. Repeat the previous steps for **gpt-4o**.

Make a note of the model names and endpoint. Embedding skills and vectorizers assemble the full endpoint internally, so you only need the resource URI. For example, given `https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2024-06-01`, the endpoint you should provide in skill and vectorizer definitions is `https://MY-FAKE-ACCOUNT.openai.azure.com`.
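
If you copied a full deployment URL, the resource URI is just its scheme and host. The following is a minimal sketch of that trimming step using only the Python standard library. The helper name `resource_endpoint` is hypothetical and not part of any Azure SDK.

```python
from urllib.parse import urlsplit


def resource_endpoint(full_url: str) -> str:
    """Return only the scheme and host of a URL, dropping the path and query string."""
    parts = urlsplit(full_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {full_url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Example URL taken from the paragraph above.
full = (
    "https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/"
    "text-embedding-3-large/embeddings?api-version=2024-06-01"
)
print(resource_endpoint(full))  # prints https://MY-FAKE-ACCOUNT.openai.azure.com
```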

### Configure search engine role-based access to Azure Storage

1. Sign in to the [Azure portal](https://portal.azure.com) and find your storage account.

1. On the left menu, select **Access control (IAM)**.

1. Add a role for **Storage Blob Data Reader**, assigned to the search service system-managed identity.

### Configure search engine role-based access to Azure models

Assign yourself *and* the search service identity permissions on Azure OpenAI. The code for this tutorial runs locally. Requests to Azure OpenAI originate from your system. Also, embedding requests and query responses from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI.

1. Sign in to the [Azure portal](https://portal.azure.com) and find your Azure OpenAI resource.

1. On the left menu, select **Access control (IAM)**.

1. Add a role for [**Cognitive Services OpenAI User**](https://learn.microsoft.com/azure/ai-services/openai/how-to/role-based-access-control#cognitive-services-openai-user).

1. Select **Managed identity** and then select **Members**. Find the system-managed identity for your search service in the dropdown list.

1. Next, select **User, group, or service principal** and then select **Members**. Search for your user account and then select it from the dropdown list.

1. Select **Review and Assign** to create the role assignments.

This step concludes provisioning services in the Azure portal. In the next section, you switch to Visual Studio Code and a local environment.

## Create a virtual environment in Visual Studio Code

Create a virtual environment so that you can install the dependencies in isolation.

1. In Visual Studio Code, open the folder containing tutorial-rag.ipynb.

1. Press Ctrl+Shift+P to open the command palette, search for "Python: Create Environment", and then select `Venv` to create a virtual environment in the current workspace.

1. Select Tutorial-RAG\tutorial-rag-requirements.txt for the dependencies.

It takes several minutes to create the environment. When the environment is ready, continue to the next step.

## Install packages

In [4]:
! pip install -r tutorial-rag-requirements.txt --quiet

## Set endpoints

Provide the endpoints you collected in a previous step. You can leave the API keys empty if you enabled role-based authentication. Otherwise, if you can't use roles, provide API keys for each resource.

The Azure AI multiservice account is used for skills processing. The multiservice account key must be provided, even if you're using role-based access control. The key isn't used on the connection, but it's currently used for billing purposes.

In [1]:
# Set endpoints and API keys for Azure services
AZURE_SEARCH_SERVICE: str = "PUT YOUR SEARCH SERVICE URL HERE"
# AZURE_SEARCH_KEY: str = "DELETE IF USING ROLES, OTHERWISE PUT YOUR SEARCH SERVICE ADMIN KEY HERE"
AZURE_OPENAI_ACCOUNT: str = "PUT YOUR AZURE OPENAI ACCOUNT URL HERE"
# AZURE_OPENAI_KEY: str = "DELETE IF USING ROLES, OTHERWISE PUT YOUR AZURE OPENAI KEY HERE"
AZURE_AI_MULTISERVICE_ACCOUNT: str = "PUT YOUR AZURE AI MULTISERVICE ACCOUNT URL HERE"
AZURE_AI_MULTISERVICE_KEY: str = "PUT YOUR AZURE AI MULTISERVICE KEY HERE. ROLES ARE USED TO CONNECT. KEY IS USED FOR BILLING."
AZURE_STORAGE_CONNECTION: str = "PUT YOUR AZURE STORAGE CONNECTION STRING HERE (see example below for syntax)"

# Example connection string for a search service managed identity connection:
# "ResourceId=/subscriptions/FAKE-SUBSCRIPTION-ID/resourceGroups/FAKE-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/FAKE-ACCOUNT;"
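
The managed identity connection string in the comment above has a fixed shape, so it can be assembled from its three parts. This is a minimal sketch. The function name and the placeholder values are hypothetical, and the output only mirrors the example syntax shown above.

```python
def storage_resource_id_connection(
    subscription_id: str, resource_group: str, account_name: str
) -> str:
    """Build a ResourceId-style connection string for a storage account."""
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name};"
    )


# Placeholder values only; substitute your own subscription, group, and account.
print(
    storage_resource_id_connection(
        "FAKE-SUBSCRIPTION-ID", "FAKE-RESOURCE-GROUP", "FAKE-ACCOUNT"
    )
)
```

You could assign the result to `AZURE_STORAGE_CONNECTION` instead of typing the string by hand.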

## Create an index

This is the index schema used for [Build a RAG solution in Azure AI Search](https://learn.microsoft.com/azure/search/tutorial-rag-build-solution).

In [2]:
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SearchIndex
)

credential = DefaultAzureCredential()

# Create a search index  
index_name = "py-rag-tutorial-idx"
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
fields = [
    SearchField(name="parent_id", type=SearchFieldDataType.String),  
    SearchField(name="title", type=SearchFieldDataType.String),
    SearchField(name="locations", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
    SearchField(name="text_vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile")
    ]  
  
# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",  
            vectorizer_name="myOpenAI",  
        )
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            kind="azureOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=AZURE_OPENAI_ACCOUNT,  
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large"
            ),
        ),  
    ], 
)  
  
# Create the search index
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")  

py-rag-tutorial-idx created


## Create a data source

Create a data source connection to PDFs in Azure Blob Storage.

In [3]:
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)

# Create a data source 
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
container = SearchIndexerDataContainer(name="nasa-ebooks-pdfs-all")
data_source_connection = SearchIndexerDataSourceConnection(
    name="py-rag-tutorial-ds",
    type="azureblob",
    connection_string=AZURE_STORAGE_CONNECTION,
    container=container
)
data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

Data source 'py-rag-tutorial-ds' created or updated


## Create a skillset

This skillset chunks and embeds data. It also uses entity recognition to detect location entities.

In [4]:
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    EntityRecognitionSkill,
    SearchIndexerIndexProjection,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset,
    CognitiveServicesAccountKey
)

# Create a skillset  
skillset_name = "py-rag-tutorial-ss"

split_skill = SplitSkill(  
    description="Split skill to chunk documents",  
    text_split_mode="pages",  
    context="/document",  
    maximum_page_length=2000,  
    page_overlap_length=500,  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/content"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="textItems", target_name="pages")  
    ],  
)  
  
embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document/pages/*",  
    resource_url=AZURE_OPENAI_ACCOUNT,  
    deployment_name="text-embedding-3-large",  
    model_name="text-embedding-3-large",
    dimensions=1024,
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/pages/*"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="text_vector")  
    ],  
)

entity_skill = EntityRecognitionSkill(
    description="Skill to recognize entities in text",
    context="/document/pages/*",
    categories=["Location"],
    default_language_code="en",
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/pages/*")
    ],
    outputs=[
        OutputFieldMappingEntry(name="locations", target_name="locations")
    ]
)
  
index_projections = SearchIndexerIndexProjection(  
    selectors=[  
        SearchIndexerIndexProjectionSelector(  
            target_index_name=index_name,  
            parent_key_field_name="parent_id",  
            source_context="/document/pages/*",  
            mappings=[  
                InputFieldMappingEntry(name="chunk", source="/document/pages/*"),  
                InputFieldMappingEntry(name="text_vector", source="/document/pages/*/text_vector"),
                InputFieldMappingEntry(name="locations", source="/document/pages/*/locations"),  
                InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),  
            ],  
        ),  
    ],  
    parameters=SearchIndexerIndexProjectionsParameters(  
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
    ),  
) 

cognitive_services_account = CognitiveServicesAccountKey(key=AZURE_AI_MULTISERVICE_KEY)

skills = [split_skill, embedding_skill, entity_skill]

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to chunk documents and generate embeddings",  
    skills=skills,  
    index_projection=index_projections,
    cognitive_services_account=cognitive_services_account
)
  
client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
client.create_or_update_skillset(skillset)  
print(f"{skillset.name} created")  

py-rag-tutorial-ss created
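
To build intuition for `maximum_page_length=2000` and `page_overlap_length=500`, here is a simplified sliding-window sketch in plain Python. It is an illustration only: it cuts on raw character positions, and the built-in Split skill's actual boundary handling is not reproduced here. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, max_len: int = 2000, overlap: int = 500) -> list[str]:
    """Split text into windows of at most max_len characters.

    Each window repeats the last `overlap` characters of the previous one.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Synthetic 5,000-character document, for illustration only.
sample = "".join(chr(97 + i % 26) for i in range(5000))
pages = chunk_text(sample)
print(len(pages), [len(p) for p in pages])
```

With these settings, each chunk starts 1,500 characters after the previous one, so neighboring chunks share 500 characters of context.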


## Create an indexer

The indexer drives the pipeline. You can create an indexer in a disabled state, but the default is for the indexer to run as soon as you send the request.

In [5]:
from azure.search.documents.indexes.models import (
    SearchIndexer
)

# Create an indexer  
indexer_name = "py-rag-tutorial-idxr" 

indexer_parameters = None

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,  
    data_source_name=data_source.name,
    parameters=indexer_parameters
)  

# Create and run the indexer  
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
indexer_result = indexer_client.create_or_update_indexer(indexer)  

print(f' {indexer_name} is created and running. Give the indexer a few minutes before running a query.')  

 py-rag-tutorial-idxr is created and running. Give the indexer a few minutes before running a query.
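
Instead of waiting a fixed number of minutes, you could poll until the indexer run finishes. The sketch below is a generic polling loop, not SDK code. The `get_status` callable is an assumption: in this notebook it might wrap a call such as `indexer_client.get_indexer_status(indexer_name)`, and how you read a status string from that result is left to you. The status strings used as defaults are also assumptions.

```python
import time


def wait_until_done(
    get_status,
    done=("success",),
    failed=("transientFailure",),
    interval=10.0,
    timeout=600.0,
    sleep=time.sleep,
):
    """Call get_status() repeatedly until it returns a done or failed status."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"Run ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        sleep(interval)


# Stand-in status source so the sketch runs offline.
fake_statuses = iter(["inProgress", "inProgress", "success"])
print(wait_until_done(lambda: next(fake_statuses), interval=0.0, sleep=lambda _: None))
```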


## Check results

After waiting several minutes, send a request to the search engine. There is no chat or generative AI at this point. The results are verbatim content from your search index.

In [6]:
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Vector Search using text-to-vector conversion of the query string
query = "what's NASA's website?"  

search_client = SearchClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential, index_name=index_name)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")
  
results = search_client.search(  
    search_text=query,  
    vector_queries= [vector_query],
    select=["chunk"],
    top=1
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")
    print(f"Chunk: {result['chunk']}")

Score: 0.01666666753590107
Chunk: national Aeronautics and Space Administration

earth Science

NASA Headquarters 

300 E Street SW 

Washington, DC 20546

www.nasa.gov

np-2018-05-2546-hQ


## Search using a chat model

This script sends a query, the query response, and a prompt to an LLM for chat completion. This time, the response is created using generative AI. We broke this task out into three separate tasks: set up the clients, set up the search query, call the LLM and get the response. We also give it a more interesting query. 

To learn more about this step, revisit [Search and generate answers](https://learn.microsoft.com/azure/search/tutorial-rag-build-solution-query) in the tutorial.

In [7]:
# Import libraries
from azure.search.documents import SearchClient
from openai import AzureOpenAI

# Set up the Azure OpenAI client
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
     api_version="2024-06-01",
     azure_endpoint=AZURE_OPENAI_ACCOUNT,
     azure_ad_token_provider=token_provider
 )

deployment_name = "gpt-4o"

# Set up the Azure AI Search client
search_client = SearchClient(
     endpoint=AZURE_SEARCH_SERVICE,
     index_name=index_name,
     credential=credential
 )

# Provide instructions to the model
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

In [8]:
# Provide the search query. 
# It's hybrid: a keyword search on "query", with text-to-vector conversion for "vector_query".
# The vector query finds 50 nearest neighbor matches in the search index
query="What's the NASA earth book about?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Set up the search results and the chat thread.
# Retrieve the selected fields from the search index related to the question.
# Search results are limited to the top 5 matches. Limiting top can help you stay under LLM quotas.
search_results = search_client.search(
    search_text=query,
    vector_queries= [vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)

# Newlines could be in the OCR'd content or in PDFs, as is the case for the sample PDFs used for this tutorial.
# Use a unique separator to make the sources distinct. 
# We chose repeated equal signs (=) followed by a newline because it's unlikely the source documents contain this sequence.
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])
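
To see what the model actually receives, the following sketch applies the same join expression to two made-up documents. The sample documents are hypothetical and stand in for real search results. The separator and the f-string mirror the cell above.

```python
SEPARATOR = "=================\n"


def format_sources(documents) -> str:
    """Join search results into one string, one TITLE/CONTENT/LOCATIONS entry per result."""
    return SEPARATOR.join(
        f'TITLE: {doc["title"]}, CONTENT: {doc["chunk"]}, LOCATIONS: {doc["locations"]}'
        for doc in documents
    )


# Made-up results shaped like the selected fields (title, chunk, locations).
sample_documents = [
    {"title": "page-1.pdf", "chunk": "First chunk.\nSecond line.", "locations": ["Pacific Ocean"]},
    {"title": "page-2.pdf", "chunk": "Another chunk.", "locations": []},
]
formatted = format_sources(sample_documents)
print(formatted)
```

Because the separator is only placed between entries, it appears one time fewer than the number of results.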


In [9]:
response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

The NASA Earth book is about the intricate and captivating science of our planet, studied through NASA's unique perspective and tools. It presents Earth as a dynamic and complex system, observed through various cycles and processes such as the water cycle and ocean circulation. The book combines stunning satellite images with detailed scientific insights, portraying Earth’s beauty and the continuous interaction of land, wind, water, ice, and air seen from above. It aims to inspire and demonstrate that the truth of our planet is as compelling as any fiction.

Source: page-8.pdf


## Try another query

The first query is very broad. Let's ask another question that requires the search engine and the LLM to find more granular information.

In [10]:
# Focused query on cloud formations and bodies of water
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

search_results = search_client.search(
    search_text=query,
    vector_queries= [vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)

sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Yes, there are cloud formations specific to oceans and large bodies of water. These include:

- **Cloud Streets**: Formed when wind blows from a cold surface like sea ice over the warmer, moister air near the open ocean. The winds create cylinders of spinning air, with clouds forming along the upward cycle of these cylinders. This phenomenon was observed over the Bering Strait in January 2010 (Source: page-21.pdf).
- **Dense Marine Clouds**: Commonly form over the ocean due to cooler, moist marine air. For example, along the coast of China, onshore winds carry these clouds toward the land, but they tend to evaporate as they move onshore due to the warmer, drier landmass (Source: page-33.pdf).

Summary: Specific cloud formations, such as cloud streets and dense marine clouds, occur over oceans and large bodies of water.


## Update the schema for semantic ranking and scoring profile

Azure AI Search has multiple features and capabilities that improve relevance. In this step, we add two of them: semantic ranking and scoring profiles. 

Semantic ranking and scoring profile configurations exist in the index schema. You can update an existing index to use both without incurring a [rebuild requirement](https://learn.microsoft.com/azure/search/search-howto-reindex). An update request should include all of the existing schema definitions that you want to keep, plus the new or changed elements. It's a best practice to issue a GET INDEX request to retrieve the current index before adding new elements.

To learn more about this step, see [Maximize relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series.
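
The "keep everything, then add" pattern can be pictured with plain dictionaries. This sketch is conceptual and does not call Azure AI Search. The dictionaries are abbreviated stand-ins for an index definition, and the property names are illustrative assumptions.

```python
import copy


def with_additions(current_index: dict, additions: dict) -> dict:
    """Return a new definition that keeps every existing element and adds new ones."""
    updated = copy.deepcopy(current_index)  # leave the retrieved definition untouched
    updated.update(copy.deepcopy(additions))
    return updated


# Abbreviated stand-in for a definition retrieved from the service.
current = {
    "name": "py-rag-tutorial-idx",
    "fields": [{"name": "chunk_id", "type": "Edm.String", "key": True}],
    "vectorSearch": {"profiles": [{"name": "myHnswProfile"}]},
}
# New elements to add in the update request.
additions = {
    "semantic": {"configurations": [{"name": "my-semantic-config"}]},
    "scoringProfiles": [{"name": "my-scoring-profile"}],
}
payload = with_additions(current, additions)
print(sorted(payload))
```

The point of the sketch is that the update payload starts from the complete current definition, so nothing you want to keep is left out.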

In [11]:
# Update the classes to include the new fields
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

credential = DefaultAzureCredential()

# Existing index name, client and fields
index_name = "py-rag-tutorial-idx"
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
fields = [
    SearchField(name="parent_id", type=SearchFieldDataType.String),  
    SearchField(name="title", type=SearchFieldDataType.String),
    SearchField(name="locations", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
    SearchField(name="text_vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile")
    ]  
  
# Existing vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",  
            vectorizer_name="myOpenAI",  
        )
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            kind="azureOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=AZURE_OPENAI_ACCOUNT,  
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large"
            ),
        ),  
    ], 
)

# New semantic configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])

# New scoring profile
scoring_profiles = [  
    ScoringProfile(  
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(  
                field_name="locations",  
                boost=5.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            ) 
        ]
    )
]


# Update the search index with the semantic configuration and scoring profile
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} updated")  

py-rag-tutorial-idx updated


## Rerun the query using semantic configuration and scoring profile

This example updates the query request. It's using the same query as before, but with the addition of semantic ranking and a scoring profile that boosts any matching search documents that mention water-related terms. 

Compared to the "before" query that gave us a reasonable response to the question about cloud formations and water, this query should provide a better answer based on the extra relevance tuning capabilities.

An explanation for this script can be found in [Maximize relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series.
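
The scoring parameter in the query is a single string: the parameter name declared in the scoring profile (`tags`), a hyphen, and a comma-separated list of values. The helper below is a hypothetical sketch that assembles such a string. The quoting rule it applies (single quotes around values that contain a space, comma, or quote, with embedded quotes doubled) is an assumption modeled on the `'sea surface'` value in this notebook.

```python
def build_scoring_parameter(name: str, values: list[str]) -> str:
    """Assemble a 'name-value1, value2' scoring parameter string."""

    def quote(value: str) -> str:
        # Assumed rule: quote values that contain a space, comma, or single quote.
        if any(ch in value for ch in (",", " ", "'")):
            return "'" + value.replace("'", "''") + "'"
        return value

    return f"{name}-" + ", ".join(quote(v) for v in values)


print(build_scoring_parameter("tags", ["ocean", "sea surface", "seas", "surface"]))
# prints tags-ocean, 'sea surface', seas, surface
```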

In [13]:
# Import libraries
from azure.search.documents import SearchClient
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
     api_version="2024-06-01",
     azure_endpoint=AZURE_OPENAI_ACCOUNT,
     azure_ad_token_provider=token_provider
 )

deployment_name = "gpt-4o"

search_client = SearchClient(
     endpoint=AZURE_SEARCH_SERVICE,
     index_name=index_name,
     credential=credential
 )

# Prompt is unchanged in this update
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Queries are unchanged in this update
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Add query_type semantic and semantic_configuration_name
# Add scoring_profile and scoring_parameters
search_results = search_client.search(
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    scoring_profile="my-scoring-profile",
    scoring_parameters=["tags-ocean, 'sea surface', seas, surface"],
    search_text=query,
    vector_queries= [vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Yes, there are specific cloud formations associated with oceans and large bodies of water:

- **Low Stratus Clouds**: Observed framing a hole over iceberg A-56 in the South Atlantic Ocean. These clouds can be influenced by thermal instabilities created by large obstacles like icebergs (page-39.pdf).
- **Undular Bore/Solitary Wave**: Created by the interaction between cool, dry air from Africa and warm, moist air over the Atlantic Ocean off the coast of Mauritania. This results in a wave structure in the atmosphere that influences cloud formation (page-23.pdf).
- **Ship Tracks**: Narrow clouds formed by water vapor condensing around pollution particles from ship exhaust, observed over the Pacific Ocean. These clouds can stretch for many hundreds of kilometers (page-31.pdf).
- **Volcanic Eruption Plumes**: Ash and volcanic particles from eruptions in the South Sandwich Islands act as seeds for cloud formation. These plumes were observed in the South Atlantic Ocean (page-13.pdf).


## Create a second index with reduced vector size

Azure AI Search has multiple approaches for reducing vector size, which lowers the cost of vector workloads. In this step, create a new index that uses the following capabilities:

- Smaller vector indexes by compressing the vectors used during query execution. Scalar quantization provides this capability.
- Smaller vector indexes by opting out of vector storage for search results. If you only need vectors for queries and not in the response payload, you can drop the vector copy used for search results.
- Smaller vector fields through narrow data types. You can specify Collection(Edm.Half) on the text_vector field to store incoming float32 dimensions as float16.

All of these capabilities are specified in a search index. After you load the index, compare the difference between the original index and the new one.
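
To get a feel for the effect of narrower data types before you build the index, the following minimal sketch estimates the raw size of the vector data at float32, float16, and int8. It's a back-of-the-envelope calculation only: it ignores HNSW graph overhead and any extra copies the service keeps, so the numbers won't match what the portal reports. The helper name `raw_vector_bytes` and the chunk count of 1,000 are hypothetical and used only for illustration.

```python
# Bytes needed to store one dimension in each data type.
BYTES_PER_DIMENSION = {"float32": 4, "float16": 2, "int8": 1}


def raw_vector_bytes(num_vectors: int, dimensions: int, data_type: str) -> int:
    """Return the raw size of the vector data in bytes, ignoring index overhead."""
    return num_vectors * dimensions * BYTES_PER_DIMENSION[data_type]


dimensions = 1024     # matches vector_search_dimensions on the text_vector field
num_vectors = 1_000   # hypothetical chunk count, for illustration only

for data_type in ("float32", "float16", "int8"):
    size_kib = raw_vector_bytes(num_vectors, dimensions, data_type) / 1024
    print(f"{data_type}: {size_kib:,.0f} KiB")
```

Under these assumptions, float16 halves the raw vector data relative to float32, and int8 quantization reduces it to a quarter.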


In [None]:
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    ScalarQuantizationCompression,
    ScalarQuantizationParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

credential = DefaultAzureCredential()

index_name = "py-rag-tutorial-small-vectors-idx"
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
fields = [
    SearchField(name="parent_id", type=SearchFieldDataType.String),  
    SearchField(name="title", type=SearchFieldDataType.String),
    SearchField(name="locations", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
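    # Narrow data type (float16) plus stored=False, which drops the vector copy kept only for search results
    SearchField(name="text_vector", type="Collection(Edm.Half)", vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile", stored=False)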
    ]  

# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",
            compression_name="myScalarQuantization",
            vectorizer_name="myOpenAI",  
        )
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            kind="azureOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=AZURE_OPENAI_ACCOUNT,  
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large"
            ),
        ),  
    ],
    compressions=[
        ScalarQuantizationCompression(
            compression_name="myScalarQuantization",
            rerank_with_original_vectors=True,
            default_oversampling=10,
            parameters=ScalarQuantizationParameters(quantized_data_type="int8"),
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

scoring_profiles = [  
    ScoringProfile(  
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(  
                field_name="locations",  
                boost=5.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            ) 
        ]
    )
]

index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")


## Run the indexer to create the new index

Creating and populating the new index depends on data structures created earlier in this tutorial. If you ran every cell in this notebook, you already have a data source and a skillset. This step creates a new indexer so that no execution history or caching from the earlier indexer gets in the way. The indexer takes its skillset and data source from the notebook's current values of skillset_name and data_source. Because neither has changed, those references are still valid. Its target_index_name is set to index_name, which the previous cell set to "py-rag-tutorial-small-vectors-idx".

In [None]:
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer
)

# Create an indexer  
indexer_name = "py-rag-tutorial-small-vectors-idxr" 

indexer_parameters = None

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,  
    data_source_name=data_source.name,
    parameters=indexer_parameters
)  

# Create and run the indexer  
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
indexer_result = indexer_client.create_or_update_indexer(indexer)  

print(f'{indexer_name} is created and running. Give the indexer a few minutes before running a query.')
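
Rather than waiting a fixed number of minutes, you can poll until the indexer finishes. The following is a minimal sketch of a polling loop, not part of the original tutorial. The helper `wait_for_indexer` is hypothetical, and it takes a callable so that it can run without a live service. The commented usage assumes that `indexer_client.get_indexer_status(indexer_name)` returns an object whose `last_result.status` reads as "inProgress" while the indexer is running, and that `last_result` can be empty right after creation.

```python
import time
from typing import Callable


def wait_for_indexer(get_status: Callable[[], str],
                     poll_seconds: float = 10.0,
                     max_polls: int = 60) -> str:
    """Call get_status() until it stops returning 'inProgress' or polls run out."""
    status = get_status()
    for _ in range(max_polls - 1):
        if status != "inProgress":
            break
        time.sleep(poll_seconds)
        status = get_status()
    return status


# Hypothetical usage with the indexer_client created above (an assumption,
# shown as comments so that this block runs without an Azure connection):
#
# def current_status() -> str:
#     last = indexer_client.get_indexer_status(indexer_name).last_result
#     return str(last.status) if last else "inProgress"
#
# print(wait_for_indexer(current_status))
```

The function returns the last status it saw, so a return value of "inProgress" means the polling budget ran out before the indexer finished.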


As a final step, switch to the Azure portal to compare the vector storage requirements for the two indexes. The index created in the last step uses half-precision floating-point numbers (float16) for the text vectors. This reduces the storage requirements for the vectors by half compared to the previous index, which used single-precision floating-point numbers (float32). Scalar quantization and the omission of one set of the vectors account for the remaining storage savings. For more information about reducing vector size, see [Choose an approach for optimizing vector storage and processing](https://learn.microsoft.com/azure/search/vector-search-how-to-configure-compression-storage).

Consider revisiting the queries from the previous lessons so that you can compare query speed and utility. You should expect some variation in LLM output whenever you repeat a query, but in general the storage-saving techniques you implemented should not cause significant degradation of search result quality.
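
If you prefer to compare the two indexes in code instead of the portal, the following sketch computes the percentage reduction between two sets of index statistics. The helpers `percent_reduction` and `compare_index_stats` are hypothetical. The sketch assumes that `index_client.get_index_statistics(<index name>)` returns a dictionary-like object with `storage_size` and `vector_index_size` keys; keys that are missing from either result are skipped. The numbers in the example are placeholders, not measurements from this tutorial.

```python
def percent_reduction(original: int, reduced: int) -> float:
    """Return how much smaller `reduced` is than `original`, as a percentage."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original - reduced) / original


def compare_index_stats(original_stats: dict, small_stats: dict) -> dict:
    """Compare the size-related statistics that both indexes report."""
    keys = ("storage_size", "vector_index_size")
    return {
        key: percent_reduction(original_stats[key], small_stats[key])
        for key in keys
        if key in original_stats and key in small_stats
    }


# Placeholder values for illustration only. With a live service you would
# pass the results of index_client.get_index_statistics(...) for each index.
original_stats = {"storage_size": 1000, "vector_index_size": 400}
small_stats = {"storage_size": 600, "vector_index_size": 100}

for key, pct in compare_index_stats(original_stats, small_stats).items():
    print(f"{key}: {pct:.1f}% smaller")
```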