# Quickstart: Agentic retrieval in Azure AI Search using Python

This notebook demonstrates the basics of agentic retrieval in Azure AI Search. You create and load a search index, set up a knowledge source and knowledge base, and run queries that use an LLM for query planning and answer synthesis. You also run an optional evaluation to assess the groundedness and relevance of the pipeline.

For prerequisites and setup instructions, see [Quickstart: Agentic retrieval using Python](https://learn.microsoft.com/azure/search/search-get-started-agentic-retrieval?pivots=programming-language-python).

## Load connections

Before you run this cell, create a virtual environment and install the dependencies listed in `Quickstart-Agentic-Retrieval/requirements.txt`.

In [None]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
import os

# Take environment variables from .env
load_dotenv(override=True)

# This notebook uses the following variables from your .env file
search_endpoint = os.environ["SEARCH_ENDPOINT"]
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://search.azure.com/.default")
aoai_endpoint = os.environ["AOAI_ENDPOINT"]
aoai_embedding_model = os.environ.get("AOAI_EMBEDDING_MODEL", "text-embedding-3-large")
aoai_embedding_deployment = os.environ.get("AOAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
aoai_gpt_model = os.environ.get("AOAI_GPT_MODEL", "gpt-5-mini")
aoai_gpt_deployment = os.environ.get("AOAI_GPT_DEPLOYMENT", "gpt-5-mini")
index_name = os.environ.get("INDEX_NAME", "earth-at-night")
knowledge_source_name = os.environ.get("KNOWLEDGE_SOURCE_NAME", "earth-knowledge-source")
knowledge_base_name = os.environ.get("KNOWLEDGE_BASE_NAME", "earth-knowledge-base")
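
The cell above reads `SEARCH_ENDPOINT` and `AOAI_ENDPOINT` with `os.environ[...]`, so a missing entry surfaces as a `KeyError`. The following is a minimal sketch of a friendlier pre-check. The helper name `missing_settings` and the example endpoint URL are assumptions for illustration and aren't part of the quickstart.

```python
import os

# Variables that the cell above reads without a default value.
REQUIRED_SETTINGS = ("SEARCH_ENDPOINT", "AOAI_ENDPOINT")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or blank in env (hypothetical helper)."""
    return [name for name in required if not str(env.get(name, "")).strip()]


# Stand-in mapping with a placeholder URL; pass os.environ to check the real environment.
example_env = {
    "SEARCH_ENDPOINT": "https://example.search.windows.net",
    "AOAI_ENDPOINT": "",
}
print(missing_settings(example_env))
```

Calling `missing_settings(os.environ)` before the rest of the notebook runs lets you report every missing variable at once instead of failing on the first one.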

## Create a search index

This step creates an index that contains plain text and vector content. You can use an existing index, but it must meet the criteria for [agentic retrieval workloads](https://learn.microsoft.com/azure/search/search-agentic-retrieval-how-to-index). The primary schema requirement is a semantic configuration with a `default_configuration_name`.

In [None]:
from azure.search.documents.indexes.models import SearchIndex, SearchField, VectorSearch, VectorSearchProfile, HnswAlgorithmConfiguration, AzureOpenAIVectorizer, AzureOpenAIVectorizerParameters, SemanticSearch, SemanticConfiguration, SemanticPrioritizedFields, SemanticField
from azure.search.documents.indexes import SearchIndexClient
from azure.identity import get_bearer_token_provider

azure_openai_token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
index = SearchIndex(
    name=index_name,
    fields=[
        SearchField(name="id", type="Edm.String", key=True, filterable=True, sortable=True, facetable=True),
        SearchField(name="page_chunk", type="Edm.String", filterable=False, sortable=False, facetable=False),
        SearchField(name="page_embedding_text_3_large", type="Collection(Edm.Single)", stored=False, vector_search_dimensions=3072, vector_search_profile_name="hnsw_text_3_large"),
        SearchField(name="page_number", type="Edm.Int32", filterable=True, sortable=True, facetable=True)
    ],
    vector_search=VectorSearch(
        profiles=[VectorSearchProfile(name="hnsw_text_3_large", algorithm_configuration_name="alg", vectorizer_name="azure_openai_text_3_large")],
        algorithms=[HnswAlgorithmConfiguration(name="alg")],
        vectorizers=[
            AzureOpenAIVectorizer(
                vectorizer_name="azure_openai_text_3_large",
                parameters=AzureOpenAIVectorizerParameters(
                    resource_url=aoai_endpoint,
                    deployment_name=aoai_embedding_deployment,
                    model_name=aoai_embedding_model
                )
            )
        ]
    ),
    semantic_search=SemanticSearch(
        default_configuration_name="semantic_config",
        configurations=[
            SemanticConfiguration(
                name="semantic_config",
                prioritized_fields=SemanticPrioritizedFields(
                    content_fields=[
                        SemanticField(field_name="page_chunk")
                    ]
                )
            )
        ]
    )
)

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.create_or_update_index(index)
print(f"Index '{index_name}' created or updated successfully.")
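
The index definition above ties several objects together by name: the vector field points to a profile, the profile points to an algorithm and a vectorizer, and `default_configuration_name` points to a semantic configuration. The following sketch checks those name references on a simplified dictionary. The dictionary layout and the helper `find_dangling_references` are invented for illustration and don't mirror the SDK models or the REST payload.

```python
def find_dangling_references(schema):
    """Return descriptions of names that are referenced but never defined."""
    problems = []
    vector = schema["vector_search"]
    profiles = {profile["name"]: profile for profile in vector["profiles"]}

    # Each vector field must name a defined profile.
    for field, profile_name in schema["vector_fields"].items():
        if profile_name not in profiles:
            problems.append(f"field '{field}' uses unknown profile '{profile_name}'")

    # Each profile must name a defined algorithm and vectorizer.
    for name, profile in profiles.items():
        if profile["algorithm"] not in vector["algorithms"]:
            problems.append(f"profile '{name}' uses unknown algorithm '{profile['algorithm']}'")
        if profile["vectorizer"] not in vector["vectorizers"]:
            problems.append(f"profile '{name}' uses unknown vectorizer '{profile['vectorizer']}'")

    # The default semantic configuration must exist.
    semantic = schema["semantic_search"]
    default_name = semantic["default_configuration_name"]
    if default_name not in semantic["configurations"]:
        problems.append(f"default semantic configuration '{default_name}' is not defined")
    return problems


# Names copied from the index definition above; the structure is a simplification.
schema = {
    "vector_fields": {"page_embedding_text_3_large": "hnsw_text_3_large"},
    "vector_search": {
        "profiles": [
            {
                "name": "hnsw_text_3_large",
                "algorithm": "alg",
                "vectorizer": "azure_openai_text_3_large",
            }
        ],
        "algorithms": ["alg"],
        "vectorizers": ["azure_openai_text_3_large"],
    },
    "semantic_search": {
        "default_configuration_name": "semantic_config",
        "configurations": ["semantic_config"],
    },
}
print(find_dangling_references(schema))
```

An empty list means every name resolves. Renaming one side of a pair without the other, such as the profile name on the field, would show up as a single entry in the list.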

## Upload sample documents

This notebook uses data from NASA's Earth at Night e-book. The data is retrieved from the [azure-search-sample-data](https://github.com/Azure-Samples/azure-search-sample-data) repository on GitHub and passed to the search client for indexing.

In [None]:
import requests
from azure.search.documents import SearchIndexingBufferedSender

url = "https://raw.githubusercontent.com/Azure-Samples/azure-search-sample-data/refs/heads/main/nasa-e-book/earth-at-night-json/documents.json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop early if the download failed
documents = response.json()

with SearchIndexingBufferedSender(endpoint=search_endpoint, index_name=index_name, credential=credential) as client:
    client.upload_documents(documents=documents)

print(f"Documents uploaded to index '{index_name}' successfully.")
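
Uploaded documents need to match the fields defined in the index. The following is a minimal sketch of a local check you could run on each document before uploading. The helper `check_document` is hypothetical, and the example documents use a three-dimensional embedding only to keep the sample short, whereas the index above expects 3072 dimensions.

```python
# Field names taken from the index definition earlier in this notebook.
EXPECTED_FIELDS = {"id", "page_chunk", "page_embedding_text_3_large", "page_number"}


def check_document(doc, dimensions=3072):
    """Return a list of problems found in one document (hypothetical helper)."""
    problems = []
    unknown = set(doc) - EXPECTED_FIELDS
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")
    if not isinstance(doc.get("id"), str):
        problems.append("id must be a string")
    embedding = doc.get("page_embedding_text_3_large")
    if embedding is not None and len(embedding) != dimensions:
        problems.append(f"embedding has {len(embedding)} dimensions, expected {dimensions}")
    return problems


# Invented sample documents, not taken from the NASA data set.
good = {"id": "1", "page_chunk": "text", "page_embedding_text_3_large": [0.1, 0.2, 0.3], "page_number": 1}
bad = {"id": 2, "page_chunk": "text", "page_embedding_text_3_large": [0.1], "title": "extra"}

print(check_document(good, dimensions=3))
print(check_document(bad, dimensions=3))
```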

## Create a knowledge source

This step creates a knowledge source that targets the index you previously created. In the next step, you create a knowledge base that uses the knowledge source to orchestrate agentic retrieval.

In [None]:
from azure.search.documents.indexes.models import SearchIndexKnowledgeSource, SearchIndexKnowledgeSourceParameters, SearchIndexFieldReference
from azure.search.documents.indexes import SearchIndexClient

ks = SearchIndexKnowledgeSource(
    name=knowledge_source_name,
    description="Knowledge source for Earth at night data",
    search_index_parameters=SearchIndexKnowledgeSourceParameters(
        search_index_name=index_name,
        source_data_fields=[SearchIndexFieldReference(name="id"), SearchIndexFieldReference(name="page_number")]
    ),
)

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.create_or_update_knowledge_source(knowledge_source=ks)
print(f"Knowledge source '{knowledge_source_name}' created or updated successfully.")

## Create a knowledge base

This step creates a knowledge base, which acts as a wrapper for your knowledge source and LLM deployment.

`EXTRACTIVE_DATA` is the default modality and returns content from your knowledge sources without generative alteration. However, this quickstart uses the `ANSWER_SYNTHESIS` modality for LLM-generated answers that cite the retrieved content.

In [None]:
from azure.search.documents.indexes.models import KnowledgeBase, KnowledgeBaseAzureOpenAIModel, KnowledgeSourceReference, AzureOpenAIVectorizerParameters, KnowledgeRetrievalOutputMode, KnowledgeRetrievalLowReasoningEffort
from azure.search.documents.indexes import SearchIndexClient

aoai_params = AzureOpenAIVectorizerParameters(
    resource_url=aoai_endpoint,
    deployment_name=aoai_gpt_deployment,
    model_name=aoai_gpt_model,
)

knowledge_base = KnowledgeBase(
    name=knowledge_base_name,
    models=[KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_params)],
    knowledge_sources=[
        KnowledgeSourceReference(
            name=knowledge_source_name
        )
    ],
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS,
    answer_instructions="Provide a 2 sentence concise and informative answer based on the retrieved documents."
)

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")

## Set up messages

Messages are the input for the retrieval route and contain the conversation history. Each message includes a `role` that indicates its origin, such as `system` or `user`, and `content` in natural language. The LLM you use determines which roles are valid.

In [None]:
instructions = """
A Q&A agent that can answer questions about the Earth at night.
If you don't have the answer, respond with "I don't know".
"""

messages = [
    {
        "role": "system",
        "content": instructions
    }
]
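
The request cells later in this notebook keep the system message in the local `messages` list but leave it out of the retrieval request, and they wrap each message's text in a one-item content list. The following sketch shows that transformation with plain dictionaries. The helper `to_retrieval_messages` and the dictionary shape are simplified stand-ins for `KnowledgeBaseMessage` and `KnowledgeBaseMessageTextContent`, not the SDK types themselves.

```python
def to_retrieval_messages(history):
    """Drop system messages and wrap each text in a one-item content list."""
    return [
        {"role": message["role"], "content": [{"text": message["content"]}]}
        for message in history
        if message["role"] != "system"
    ]


# Invented conversation history for illustration.
history = [
    {"role": "system", "content": "A Q&A agent that answers questions about the Earth at night."},
    {"role": "user", "content": "Why are cities bright at night?"},
]
print(to_retrieval_messages(history))
```

Only the user message survives the filter, so the knowledge base plans its queries from the conversation turns rather than from the system instructions.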

## Use agentic retrieval to fetch results

This step runs the agentic retrieval pipeline to produce a grounded, citation-backed answer. Given the conversation history and retrieval parameters, your knowledge base:

1. Analyzes the entire conversation to infer the user's information need.

1. Decomposes the compound query into focused subqueries.

1. Runs the subqueries concurrently against your knowledge source.

1. Uses semantic ranker to rerank and filter the results.

1. Synthesizes the top results into a natural-language answer.
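
The steps above can be pictured with a toy, local-only sketch. It isn't how the service is implemented: the corpus, the word-overlap scoring, and every function name are invented for illustration, and it starts from a single query instead of a full conversation. An LLM would normally plan the subqueries and write the answer, whereas this sketch only splits on question marks and returns the ranked document IDs.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for an indexed knowledge source (invented for this sketch).
CORPUS = {
    "doc1": "suburban holiday lighting increases brightness in december",
    "doc2": "phoenix street grid is lit by regular street lighting",
    "doc3": "rural interstate segments have little lighting",
}


def decompose(query):
    """Split a compound question into one subquery per question mark."""
    return [part.strip() + "?" for part in query.split("?") if part.strip()]


def search(subquery):
    """Score each document by the number of words it shares with the subquery."""
    words = set(subquery.lower().rstrip("?").split())
    return [(doc_id, len(words & set(text.split()))) for doc_id, text in CORPUS.items()]


def rerank(hits, threshold=1):
    """Keep the best score per document, drop weak matches, and sort by score."""
    best = {}
    for doc_id, score in hits:
        best[doc_id] = max(score, best.get(doc_id, 0))
    kept = [(doc_id, score) for doc_id, score in best.items() if score >= threshold]
    return sorted(kept, key=lambda hit: (-hit[1], hit[0]))


def retrieve(query):
    """Decompose, search concurrently, rerank, and return subqueries with ranked IDs."""
    subqueries = decompose(query)
    with ThreadPoolExecutor() as pool:  # Run the subqueries concurrently
        per_query_hits = list(pool.map(search, subqueries))
    merged = [hit for hits in per_query_hits for hit in hits]
    ranked = rerank(merged)
    # A real pipeline would pass the ranked content to an LLM for answer synthesis.
    return subqueries, [doc_id for doc_id, _ in ranked]


subqueries, cited = retrieve("Why is december brightness higher? Why is the phoenix grid visible?")
print(subqueries)
print(cited)
```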

In [None]:
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import KnowledgeBaseRetrievalRequest, KnowledgeBaseMessage, KnowledgeBaseMessageTextContent, SearchIndexKnowledgeSourceParams

agent_client = KnowledgeBaseRetrievalClient(endpoint=search_endpoint, knowledge_base_name=knowledge_base_name, credential=credential)
query_1 = """
    Why do suburban belts display larger December brightening than urban cores even though absolute light levels are higher downtown?
    Why is the Phoenix nighttime street grid so sharply visible from space, whereas large stretches of the interstate between midwestern cities remain comparatively dim?
    """

messages.append({
    "role": "user",
    "content": query_1
})

req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role=m["role"],
            content=[KnowledgeBaseMessageTextContent(text=m["content"])]
        ) for m in messages if m["role"] != "system"
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name=knowledge_source_name,
            include_references=True,
            include_reference_source_data=True,
            always_query_source=True
        )
    ],
    include_activity=True,
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort
)

result = agent_client.retrieve(retrieval_request=req)
print(f"Retrieved content from '{knowledge_base_name}' successfully.")

### Review the retrieval response, activity, and results

Because your knowledge base is configured for answer synthesis, the retrieval response contains the following values, which the cells below collect into lists of the same names:

+ `response_contents`: An LLM-generated answer to the query that cites the retrieved documents.

+ `activity_contents`: Detailed planning and execution information, including subqueries, reranking decisions, and intermediate steps.

+ `references_contents`: Source documents and chunks that contributed to the answer.

**Tip:** Retrieval parameters, such as reranker thresholds and knowledge source parameters, influence how aggressively your agent reranks and which sources it queries. Inspect the activity and references to validate grounding and build traceable citations.

In [None]:
response_contents = []
activity_contents = []
references_contents = []

In [None]:
import json

# Build a single string from the response; later cells handle activity and references

# Responses -> Concatenate text/value fields from all response contents
response_parts = []
for resp in result.response:
    for content in resp.content:
        response_parts.append(content.text)
response_content = "\n\n".join(response_parts) if response_parts else "No response found on 'result'"

response_contents.append(response_content)

# Print the response string
print("response_content:\n", response_content, "\n")

In [None]:
messages.append({
    "role": "assistant",
    "content": response_content
})

In [None]:
# Activity -> JSON string of activity as list of dicts
if result.activity:
    activity_content = json.dumps([a.as_dict() for a in result.activity], indent=2)
else:
    activity_content = "No activity found on 'result'"
    
activity_contents.append(activity_content)
print("activity_content:\n", activity_content, "\n")

In [None]:
# References -> JSON string of references as list of dicts
if result.references:
    references_content = json.dumps([r.as_dict() for r in result.references], indent=2)
else:
    references_content = "No references found on 'result'"
    
references_contents.append(references_content)
print("references_content:\n", references_content)
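
To connect an answer to its sources, you can pull the citation markers out of the response text and compare them with the references. The following sketch assumes that synthesized answers mark citations as `[ref_id:N]`. Check that assumption against your own output before relying on it. The helper `cited_ref_ids` and the sample sentence are invented for illustration.

```python
import re

# Assumed citation format: [ref_id:N], where N is a non-negative integer.
CITATION_PATTERN = re.compile(r"\[ref_id:(\d+)\]")


def cited_ref_ids(answer_text):
    """Return the distinct reference ids cited in the text, in first-seen order."""
    seen = []
    for match in CITATION_PATTERN.finditer(answer_text):
        ref_id = int(match.group(1))
        if ref_id not in seen:
            seen.append(ref_id)
    return seen


# Invented sample text, not real output from the knowledge base.
sample = "Suburbs brighten more in December [ref_id:4]. Street grids stay lit [ref_id:0][ref_id:4]."
print(cited_ref_ids(sample))
```

Any reference that never appears in the answer, or any cited ID that has no matching reference, is worth a closer look when you validate grounding.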

## Continue the conversation

This step continues the conversation with your knowledge base, building upon the previous messages and queries to retrieve relevant information from your knowledge source.

In [None]:
query_2 = "How do I find lava at night?"
messages.append({
    "role": "user",
    "content": query_2
})

req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role=m["role"],
            content=[KnowledgeBaseMessageTextContent(text=m["content"])]
        ) for m in messages if m["role"] != "system"
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name=knowledge_source_name,
            include_references=True,
            include_reference_source_data=True,
            always_query_source=True
        )
    ],
    include_activity=True,
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort
)

result = agent_client.retrieve(retrieval_request=req)
print(f"Retrieved content from '{knowledge_base_name}' successfully.")

### Review the new retrieval response, activity, and results

In [None]:
import json

# Build a single string from the response; later cells handle activity and references

# Responses -> Concatenate text/value fields from all response contents
response_parts = []
for resp in result.response:
    for content in resp.content:
        response_parts.append(content.text)
response_content = "\n\n".join(response_parts) if response_parts else "No response found on 'result'"

response_contents.append(response_content)

# Print the response string
print("response_content:\n", response_content, "\n")

In [None]:
# Activity -> JSON string of activity as list of dicts
if result.activity:
    activity_content = json.dumps([a.as_dict() for a in result.activity], indent=2)
else:
    activity_content = "No activity found on 'result'"
    
activity_contents.append(activity_content)
print("activity_content:\n", activity_content, "\n")

In [None]:
# References -> JSON string of references as list of dicts
if result.references:
    references_content = json.dumps([r.as_dict() for r in result.references], indent=2)
else:
    references_content = "No references found on 'result'"
    
references_contents.append(references_content)
print("references_content:\n", references_content)

## Run an evaluation with Microsoft Foundry

To evaluate the groundedness and relevance of the pipeline, run an evaluation with Microsoft Foundry. For more detailed guidance, see [Evaluate your generative AI application locally with the Azure AI Evaluation SDK (preview)](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/evaluate-sdk).

### Prerequisites

+ The same [Microsoft Foundry project](https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects) you used for agentic retrieval. Set `FOUNDRY_ENDPOINT` to your project endpoint in the `.env` file. You can find this endpoint in the [Microsoft Foundry portal](https://ai.azure.com/).

+ The `azure-ai-evaluation` package, which is already installed as part of the `requirements.txt` file.

In [None]:
# Load connections
from dotenv import load_dotenv
import os

load_dotenv(override=True)

foundry_endpoint = os.environ["FOUNDRY_ENDPOINT"]
aoai_api_version = os.environ["AOAI_API_VERSION"]

# Run the evaluation
from azure.ai.evaluation import AzureOpenAIModelConfiguration, GroundednessEvaluator, RelevanceEvaluator, evaluate
import json

evaluation_data = []
print("Preparing evaluation data...")
for q, r, g in zip([query_1, query_2], references_contents, response_contents):
    evaluation_data.append({
        "query": q,
        "response": g,
        "context": r,
    })

filename = "evaluation_data.jsonl"

with open(filename, "w") as f:
    for item in evaluation_data:
        f.write(json.dumps(item) + "\n")

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
    azure_deployment=aoai_gpt_deployment
)

# Groundedness and relevance evaluators
groundedness = GroundednessEvaluator(model_config=model_config)
relevance = RelevanceEvaluator(model_config=model_config)

print("Starting evaluation...")
result = evaluate(
    data=filename,
    evaluators={
        "groundedness": groundedness,
        "relevance": relevance,
    },
    azure_ai_project=foundry_endpoint,
)

print("Evaluation complete.")
studio_url = result.get("studio_url")
if studio_url:
    print(f"For more information, go to the Foundry portal: {studio_url}")
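
The evaluation reads its input from a JSON Lines file in which each line holds one `query`, `response`, and `context` record. The following is a minimal sketch of reading such a file back and checking each row before you submit it. The helper `load_evaluation_rows` is hypothetical, and the demonstration writes a throwaway file so that it doesn't touch `evaluation_data.jsonl`.

```python
import json
import os
import tempfile

# Keys that the evaluation cell above writes for every row.
REQUIRED_KEYS = {"query", "response", "context"}


def load_evaluation_rows(path):
    """Parse a JSON Lines file and verify that each row has the required keys."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Skip blank lines
            row = json.loads(line)
            missing = REQUIRED_KEYS - row.keys()
            if missing:
                raise ValueError(f"line {line_number} is missing {sorted(missing)}")
            rows.append(row)
    return rows


# Demonstration with a temporary file and an invented record.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"query": "q", "response": "a", "context": "c"}) + "\n")
    rows = load_evaluation_rows(sample_path)

print(len(rows))
```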

## Clean up objects and resources

If you no longer need Azure AI Search or Microsoft Foundry, delete the resources from your Azure subscription. You can also start over by deleting individual objects.

### Delete the knowledge base

In [None]:
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.delete_knowledge_base(knowledge_base_name)
print(f"Knowledge base '{knowledge_base_name}' deleted successfully.")

### Delete the knowledge source

In [None]:
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.delete_knowledge_source(knowledge_source=knowledge_source_name)
print(f"Knowledge source '{knowledge_source_name}' deleted successfully.")

### Delete the search index

In [None]:
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient(endpoint=search_endpoint, credential=credential)
index_client.delete_index(index_name)
print(f"Index '{index_name}' deleted successfully.")
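
The cells above delete objects in the reverse order of their creation: knowledge base, then knowledge source, then index. This order presumably matters because each object references the one deleted after it. The following sketch runs a list of deletions in order and stops at the first failure. The helper `delete_in_order` is hypothetical, and the example uses stand-in callables rather than real `SearchIndexClient` calls.

```python
def delete_in_order(steps):
    """Run (label, callable) pairs in order and report the outcome of each."""
    outcomes = []
    for label, delete in steps:
        try:
            delete()
            outcomes.append((label, "deleted"))
        except Exception as error:
            outcomes.append((label, f"failed: {error}"))
            break  # Later objects may still be referenced, so stop here
    return outcomes


# Stand-in callables that only record the order in which they run.
log = []
steps = [
    ("knowledge base", lambda: log.append("knowledge base")),
    ("knowledge source", lambda: log.append("knowledge source")),
    ("index", lambda: log.append("index")),
]
outcomes = delete_in_order(steps)
print(outcomes)
```

With the real client, each callable could wrap one of the delete calls shown above, for example `lambda: index_client.delete_index(index_name)`.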