# Basic RAG (Retrieval-Augmented Generation) example with AI Foundry Project

This notebook demonstrates a basic RAG implementation using:
- Access to the AI Foundry project's connections and inference clients through `azure-ai-projects`
- Text embedding and chat completion with `azure-ai-inference`
- Vector search in Azure AI Search with `azure-search-documents`

Our theme is space, so we'll create a sample set of space facts, embed them,
and store them in a search index. We then run a query that retrieves the
relevant facts and pass them to an LLM to produce a final answer.

## Imports and setup

First, import the necessary libraries and do some setup.

In [1]:
import os
import logging
import pathlib
from dotenv import load_dotenv

# Azure Identity
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential

# Azure AI Foundry Project SDK
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import ConnectionType

# Azure AI Inference
from azure.ai.inference.models import UserMessage, SystemMessage

# Azure AI Search SDK
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswParameters,
    HnswAlgorithmConfiguration,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

In [2]:
# Suppress Azure SDK HTTP logging, which clutters the output and hides the notebook's own results
logging.getLogger("azure").setLevel(logging.ERROR)
logging.getLogger("azure.core.pipeline.policies.http_logging_policy").setLevel(logging.ERROR)

## Get our AI Foundry project connection string

In [3]:
# Load environment variables from the .env file in the notebook's parent directory
current_dir = pathlib.Path().absolute()
root_dir = current_dir.parent
load_dotenv(dotenv_path=root_dir / ".env")

# Get the project connection string from environment variables
project_connection_string = os.getenv("AZURE_AI_FOUNDRY_PROJECT_CONNECTION_STRING")
if not project_connection_string:
    raise ValueError("Please set AZURE_AI_FOUNDRY_PROJECT_CONNECTION_STRING in your .env file")
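
If you later add more settings (for example a chat model deployment name), the check above can be factored into a small helper. This is a minimal sketch; `require_env` is a hypothetical name, not part of any Azure SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for this notebook; not an Azure SDK function.
    """
    value = os.getenv(name)
    if not value:  # treat unset and empty-string values the same way
        raise ValueError(f"Please set {name} in your .env file")
    return value


# Usage (assumes load_dotenv has already run):
# project_connection_string = require_env("AZURE_AI_FOUNDRY_PROJECT_CONNECTION_STRING")
```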

## Connect to the AI Foundry project
Now create a credential to authenticate, and get a handle to the AI Foundry project.

In [4]:
try:
    credential = DefaultAzureCredential()
    print("✓ Successfully initialized DefaultAzureCredential")
except Exception as e:
    print(f"× Error initializing credentials: {str(e)}")
    raise

✓ Successfully initialized DefaultAzureCredential


In [5]:
try:
    project_client = AIProjectClient.from_connection_string(
        conn_str=project_connection_string,
        credential=credential
    )
    print("✓ Successfully initialized AIProjectClient")
except Exception as e:
    print(f"× Error initializing project client: {str(e)}")
    raise

✓ Successfully initialized AIProjectClient


## Get a handle to the embeddings client and test it

In [6]:
embedding_client = project_client.inference.get_embeddings_client()
print("✅ Created embeddings client.")

# Test the embedding client
embedding = embedding_client.embed(model="text-embedding-3-large", input=["this is a test"])
vector = embedding.data[0].embedding

# Save the dimensions size for later reference when configuring the index
embedding_dimensions = len(vector)

print(f"Vector dimension: {embedding_dimensions}")
print(f"First 10 elements: {vector[:10]}")

✅ Created embeddings client.
Vector dimension: 3072
First 10 elements: [-0.015253292, 0.0033125114, -0.007039981, 0.04884106, -0.014547386, -0.017113449, -0.017666727, 0.07356685, -0.02886583, 0.032433517]


## Get a handle to the project's default AI Search connection
This returns the default Azure AI Search connection from your AI Foundry project's connected resources; in this example it is named `AzureAISearch`.

In [7]:
search_connection = project_client.connections.get_default(
    connection_type=ConnectionType.AZURE_AI_SEARCH,
    include_credentials=True,
)

print(f"Connected to AI Search service: {search_connection.name}")
print(f"Endpoint: {search_connection.endpoint_url}")

Connected to AI Search service: AzureAISearch
Endpoint: https://srch-jpb-ai-core.search.windows.net


## Create AI Search clients for managing indexing and searching

In [16]:
# Define the index name
index_name = "space-facts"

# Create a client to manage search indexes
index_client = SearchIndexClient(
    endpoint=search_connection.endpoint_url,
    credential=AzureKeyCredential(key=search_connection.key)
)

# Create a client to run search queries
search_client = SearchClient(
    index_name=index_name,
    endpoint=search_connection.endpoint_url,
    credential=AzureKeyCredential(key=search_connection.key)
)

## Delete the index if enabled
Deleting and recreating the index gives each run a clean slate, which helps with iterative testing.

In [17]:
delete_index_if_exists = True

if delete_index_if_exists:
    try:
        index_client.delete_index(index_name)
        print(f"Index '{index_name}' deleted successfully.")
    except Exception as e:
        print(f"Index '{index_name}' does not exist or could not be deleted: {str(e)}")

Index 'space-facts' deleted successfully.


## Check if the index exists and create it if needed

Check whether the `space-facts` index already exists, and create it if not, using the schema and vector/semantic search configuration below.

In [18]:
# Check if the index exists
index_exists = index_name in [index.name for index in index_client.list_indexes()]

if index_exists:
    print(f"Index '{index_name}' already exists.")
else:
    print(f"Index '{index_name}' does not exist. Creating...")
    
    # index schema definition
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True, sortable=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SimpleField(name="category", type=SearchFieldDataType.String, filterable=True, sortable=True, facetable=True),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            hidden=False,
            searchable=True,
            filterable=False,
            sortable=False,
            facetable=False,
            vector_search_dimensions=embedding_dimensions,
            vector_search_profile_name="embedding_config",
        )
    ]
    
    # Define vectorizers (empty for now as not using integrated vectorization)
    vectorizers = []
    use_int_vectorization = False
    
    # Create the index with vector search and semantic search
    index = SearchIndex(
        name=index_name,
        fields=fields,
        semantic_search=SemanticSearch(
            configurations=[
                SemanticConfiguration(
                    name="default",
                    prioritized_fields=SemanticPrioritizedFields(
                        title_field=None, 
                        content_fields=[SemanticField(field_name="content")]
                    ),
                )
            ]
        ),
        vector_search=VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="hnsw_config",
                    parameters=HnswParameters(metric="cosine"),
                )
            ],
            profiles=[
                VectorSearchProfile(
                    name="embedding_config",
                    algorithm_configuration_name="hnsw_config",
                    vectorizer_name=(
                        f"{index_name}-vectorizer" if use_int_vectorization else None
                    ),
                ),
            ],
            vectorizers=vectorizers,
        ),
    )
    
    result = index_client.create_index(index)
    print(f"Index '{index_name}' created successfully.")

Index 'space-facts' does not exist. Creating...
Index 'space-facts' created successfully.
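
The HNSW configuration above uses `metric="cosine"`, so neighbors are ranked by the cosine similarity between the query vector and each stored `content_vector`. The service computes this internally; the pure-Python sketch below only illustrates the measure, including why every vector must have the same dimension as `vector_search_dimensions`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Assumes neither vector is all zeros (that would divide by zero).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

HNSW is an approximate nearest-neighbor algorithm: it avoids scoring every stored vector, but the similarity it optimizes for is this one.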


## Create sample space facts data

Define a small dataset of space facts to use in our RAG system.

In [19]:
space_facts = [
    {
        "id": "doc1",
        "content": "The Great Red Spot on Jupiter is a storm that has been raging for at least 400 years and is large enough to fit three Earths inside it.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc2",
        "content": "Saturn's rings are made primarily of ice particles, with a small amount of rocky debris and dust. Despite their massive appearance, they're only about 10 meters thick in most places.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc3",
        "content": "An event horizon is the boundary around a black hole beyond which nothing, not even light, can escape its gravitational pull.",
        "category": "Astrophysics Concepts"
    },
    {
        "id": "doc4",
        "content": "The universe is expanding at an accelerating rate due to a mysterious force scientists call dark energy, which makes up roughly 68% of the universe.",
        "category": "Cosmology"
    },
    {
        "id": "doc5",
        "content": "Neutron stars are so dense that a teaspoon of their material would weigh about a billion tons on Earth, equivalent to the weight of a mountain.",
        "category": "Stellar Objects"
    },
    {
        "id": "doc6",
        "content": "Time dilation, predicted by Einstein's theory of relativity, means that time actually passes slower for objects in stronger gravitational fields or moving at high speeds.",
        "category": "Physics Concepts"
    },
    {
        "id": "doc7",
        "content": "Venus rotates on its axis in the opposite direction to most planets, meaning the sun rises in the west and sets in the east. Its day is also longer than its year.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc8",
        "content": "Quantum entanglement allows particles to be connected in such a way that the quantum state of each particle cannot be described independently of the others, regardless of distance.",
        "category": "Quantum Physics"
    },
    {
        "id": "doc9",
        "content": "The Andromeda Galaxy is on a collision course with our Milky Way. The galaxies will begin to merge in about 4.5 billion years, forming a new galaxy astronomers have nicknamed 'Milkomeda'.",
        "category": "Galactic Facts"
    },
    {
        "id": "doc10",
        "content": "A white dwarf star is what remains after a sun-like star has exhausted its nuclear fuel. These stellar remnants are incredibly dense, packing a sun's worth of material into an Earth-sized sphere.",
        "category": "Stellar Evolution"
    },
    {
        "id": "doc11",
        "content": "The Oort Cloud is a theoretical cloud of predominantly icy objects that surrounds the Sun at distances ranging from 2,000 to 100,000 astronomical units. It's believed to be the source of long-period comets.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc12",
        "content": "Pulsars are rapidly rotating neutron stars that emit beams of electromagnetic radiation from their poles. As they spin, these beams sweep across Earth like cosmic lighthouses, appearing to pulse.",
        "category": "Stellar Objects"
    },
    {
        "id": "doc13",
        "content": "The cosmic microwave background radiation is the afterglow of the Big Bang, filling all space. It is the oldest light in the universe, dating to about 380,000 years after the Big Bang.",
        "category": "Cosmology"
    },
    {
        "id": "doc14",
        "content": "Enceladus, a moon of Saturn, has geysers that shoot plumes of water ice into space. This suggests the presence of a subsurface ocean that could potentially harbor life.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc15",
        "content": "Gravitational lensing occurs when a massive object bends the light from objects behind it, acting like a lens. This effect helps astronomers see distant galaxies that would otherwise be too faint to detect.",
        "category": "Astrophysics Concepts"
    },
    {
        "id": "doc16",
        "content": "The quantum tunneling phenomenon allows particles to pass through barriers that should be impenetrable according to classical physics. This process is essential for nuclear fusion in stars.",
        "category": "Quantum Physics"
    },
    {
        "id": "doc17",
        "content": "Europa, one of Jupiter's moons, has more liquid water than all of Earth's oceans combined, hidden beneath its icy surface. This makes it one of the most promising places to look for extraterrestrial life.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc18",
        "content": "A supernova can briefly outshine an entire galaxy and radiate more energy than our sun will emit over its entire lifespan. These stellar explosions are responsible for creating many of the elements in the universe.",
        "category": "Stellar Evolution"
    },
    {
        "id": "doc19",
        "content": "The multiverse theory suggests that our universe is just one of many universes, each with potentially different laws of physics. These parallel universes may be impossible to detect directly.",
        "category": "Theoretical Physics"
    },
    {
        "id": "doc20",
        "content": "The Chandrasekhar limit (about 1.4 solar masses) determines the fate of a white dwarf star. Exceeding this mass triggers a supernova explosion, as electron degeneracy pressure can no longer support the star.",
        "category": "Stellar Physics"
    },
    {
        "id": "doc21",
        "content": "Mars' Olympus Mons is the tallest mountain in the solar system, standing at approximately 22 km (13.6 miles) high. That's nearly three times the height of Mount Everest.",
        "category": "Solar System Facts"
    },
    {
        "id": "doc22",
        "content": "Hawking radiation is a theoretical process by which black holes slowly lose mass and eventually evaporate. This contradicts the classical view that nothing, including energy, can escape a black hole.",
        "category": "Astrophysics Concepts"
    },
    {
        "id": "doc23",
        "content": "The observable universe contains an estimated 2 trillion galaxies, each containing billions to trillions of stars. This is only the portion of the universe we can theoretically observe from Earth.",
        "category": "Cosmology"
    },
    {
        "id": "doc24",
        "content": "The James Webb Space Telescope can observe objects formed just 100-250 million years after the Big Bang, revealing the earliest galaxies and potentially observing the formation of the first stars.",
        "category": "Astronomy Technology"
    },
    {
        "id": "doc25",
        "content": "Baryonic matter—the ordinary matter that makes up stars, planets, and people—accounts for only about 5% of the universe's total mass-energy content. The rest is dark matter and dark energy.",
        "category": "Cosmology"
    },
    {
        "id": "doc26",
        "content": "Quasars are among the brightest objects in the universe, powered by supermassive black holes actively feeding on surrounding matter. They can outshine entire galaxies despite being relatively compact.",
        "category": "Galactic Objects"
    },
    {
        "id": "doc27",
        "content": "The concept of spacetime, introduced by Hermann Minkowski and incorporated into Einstein's relativity, unifies three-dimensional space and time into a single four-dimensional continuum.",
        "category": "Physics Concepts"
    },
    {
        "id": "doc28",
        "content": "Spaghettification is the theoretical stretching and compressing of objects into long thin shapes when caught in extremely strong gravitational fields, such as those found near black holes.",
        "category": "Astrophysics Concepts"
    },
    {
        "id": "doc29",
        "content": "Gamma-ray bursts are the most energetic explosions observed in the universe, releasing as much energy in seconds as the Sun will in its entire 10-billion-year lifetime.",
        "category": "Cosmic Phenomena"
    },
    {
        "id": "doc30",
        "content": "Planetesimals are solid objects thought to exist in protoplanetary disks and in debris disks. These rocky or icy bodies, ranging from several meters to hundreds of kilometers in size, are building blocks of planets.",
        "category": "Planetary Formation"
    }
]
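
Because `id` is the index's key field, every fact needs a unique, valid key before upload. A minimal sketch of a pre-upload check follows; the allowed-character rule (letters, digits, underscore, dash, equal sign) is our reading of the Azure AI Search document-key rules and is an assumption to verify against the current docs. `check_documents` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: document keys may contain only letters, digits, '_', '-' and '='.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_\-=]+$")


def check_documents(docs, key_field="id"):
    """Return a list of problems with the documents' key field (empty if none)."""
    problems = []
    counts = Counter(doc.get(key_field) for doc in docs)
    for key, n in counts.items():
        if key is None:
            problems.append(f"{n} document(s) missing '{key_field}'")
        elif n > 1:
            problems.append(f"duplicate key {key!r} appears {n} times")
        elif not isinstance(key, str) or not KEY_PATTERN.match(key):
            problems.append(f"invalid characters in key {key!r}")
    return problems


sample = [{"id": "doc1"}, {"id": "doc1"}, {"id": "doc 2"}, {"content": "no id"}]
print(check_documents(sample))
```

In the notebook, `assert not check_documents(space_facts)` would stop the run before a bad key reaches the service.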

## Generate embeddings for the space facts

In [20]:
# Generate an embedding for each space fact's content
for fact in space_facts:
    content = fact['content']
    embedding_response = embedding_client.embed(model="text-embedding-3-large", input=[content])
    fact['content_vector'] = embedding_response.data[0].embedding

print("Added vector embeddings to space facts.")
print(f"Vector dimension: {len(space_facts[0]['content_vector'])}")

Added vector embeddings to space facts.
Vector dimension: 3072
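
The loop above sends one embedding request per fact. Since `embed` accepts a list of inputs, the facts could instead be embedded in batches. The sketch below is written against a generic `embed_batch` callable so it runs offline; `embed_in_batches` and `fake_embed` are hypothetical, and the default batch size of 16 is an arbitrary assumption (check your deployment's limits).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 16,
) -> List[Vector]:
    """Embed `texts` in chunks of `batch_size`, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedding service returned the wrong number of vectors")
        vectors.extend(batch_vectors)
    return vectors


# Offline stand-in for the real client: a fake 2-dimensional "embedding".
calls = []


def fake_embed(batch: List[str]) -> List[Vector]:
    calls.append(len(batch))
    return [[float(len(t)), 1.0] for t in batch]


vecs = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
print(calls)    # [2, 2, 1]
print(vecs[2])  # [3.0, 1.0]

# With the real client, embed_batch might look like this (sorting by .index
# keeps vectors aligned with their inputs):
# def embed_batch(batch):
#     response = embedding_client.embed(model="text-embedding-3-large", input=batch)
#     return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]
```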


## Upload documents to the index

In [21]:
# Upload documents to the index
result = search_client.upload_documents(documents=space_facts)
print(f"Uploaded {len(result)} documents to the index.")

# Check each IndexingResult: the result objects are always truthy, so test .succeeded
for i, r in enumerate(result):
    print(f"Document {i+1} {'succeeded' if r.succeeded else 'failed'}")

Uploaded 30 documents to the index.
Document 1 succeeded
Document 2 succeeded
Document 3 succeeded
Document 4 succeeded
Document 5 succeeded
Document 6 succeeded
Document 7 succeeded
Document 8 succeeded
Document 9 succeeded
Document 10 succeeded
Document 11 succeeded
Document 12 succeeded
Document 13 succeeded
Document 14 succeeded
Document 15 succeeded
Document 16 succeeded
Document 17 succeeded
Document 18 succeeded
Document 19 succeeded
Document 20 succeeded
Document 21 succeeded
Document 22 succeeded
Document 23 succeeded
Document 24 succeeded
Document 25 succeeded
Document 26 succeeded
Document 27 succeeded
Document 28 succeeded
Document 29 succeeded
Document 30 succeeded
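
`upload_documents` returns one `IndexingResult` per document, with `key`, `succeeded`, `status_code` and `error_message` attributes. For larger uploads it is easier to count successes and print only the failures. In this sketch `FakeIndexingResult` is a hypothetical stand-in so the code runs without a search service, and the status codes are illustrative; with the real service you would pass `result` from the cell above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FakeIndexingResult:
    """Offline stand-in with the IndexingResult fields this sketch reads."""
    key: str
    succeeded: bool
    status_code: int
    error_message: Optional[str] = None


def summarize_results(results: Iterable) -> Tuple[int, List[str]]:
    """Count successes and collect a readable line for each failed document."""
    succeeded = 0
    failures: List[str] = []
    for r in results:
        if r.succeeded:
            succeeded += 1
        else:
            failures.append(f"{r.key}: HTTP {r.status_code} {r.error_message or ''}".strip())
    return succeeded, failures


demo = [
    FakeIndexingResult("doc1", True, 201),
    FakeIndexingResult("doc2", False, 400, "Invalid document key"),
]
ok, failed = summarize_results(demo)
print(f"{ok} succeeded, {len(failed)} failed")  # 1 succeeded, 1 failed
print(failed)  # ['doc2: HTTP 400 Invalid document key']
```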


## Implement the RAG chat function

Create a function that implements the RAG pattern:
1. Embed the user query
2. Search the vector index with that embedding to get top docs
3. Generate an answer using the retrieved docs
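
The vector index performs steps 1 and 2 for us. To make the mechanics concrete, here is an offline sketch that replaces the embedding model with a hypothetical bag-of-words `toy_embed` and does exact (brute-force) nearest-neighbor search. This is a stand-in for illustration only; the real pipeline uses `text-embedding-3-large` vectors and Azure AI Search's approximate HNSW index.

```python
import heapq
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for an embedding model: an L2-normalized bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of sparse vectors; equals cosine similarity for unit-length vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def retrieve(query: str, docs: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Steps 1 and 2: embed the query, then return the top_k most similar docs."""
    q = toy_embed(query)
    scored = [(dot(q, toy_embed(d)), d) for d in docs]
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


docs = [
    "Pulsars are rapidly rotating neutron stars.",
    "Saturn's rings are made primarily of ice particles.",
    "Neutron stars are extremely dense.",
]
for score, doc in retrieve("How dense are neutron stars?", docs, top_k=2):
    print(f"{score:.3f}  {doc}")
```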

In [22]:
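from azure.search.documents.models import VectorizedQuery


def rag_chat(query, embeddings_client, search_client, project_client, chat_model, top_k=3):
    """
    Implement the RAG pattern:
    1. Embed user query
    2. Search vector index with that embedding to get the top_k docs
    3. Generate answer using retrieved docs
    """

    # 1. Embed the query with the same model used to embed the documents
    user_query_vector = embeddings_client.embed(model="text-embedding-3-large", input=[query]).data[0].embedding

    # 2. Pure vector search: no keyword text, top_k nearest neighbors on content_vector
    results = search_client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(vector=user_query_vector, k_nearest_neighbors=top_k, fields="content_vector")
        ],
        select=["category", "content"],
        top=top_k,
    )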

    top_docs_content = []
    for res in results:
        top_docs_content.append(f"Source: {res['category']} => {res['content']}")

    system_message = (
        "You are a space facts assistant.\n"
        "Answer user questions using ONLY the text from these docs.\n"
        "Docs:\n"
        + "\n".join(top_docs_content)
        + "\nIf unsure, say 'I'm not sure'.\n"
    )

    with project_client.inference.get_chat_completions_client() as chat_client:
        response = chat_client.complete(
            model=chat_model,
            messages=[
                SystemMessage(content=system_message),
                UserMessage(content=query)
            ]
        )
    return response.choices[0].message.content
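
Factoring the prompt construction out of `rag_chat` makes it testable without calling the search service or the model. This sketch reproduces the same system-message format; `build_system_message` is a hypothetical helper name, and each doc is assumed to be a dict with `category` and `content` keys, like the search results above.

```python
from typing import Iterable, Mapping


def build_system_message(docs: Iterable[Mapping[str, str]]) -> str:
    """Format retrieved docs into the grounding prompt used by rag_chat."""
    lines = [f"Source: {d['category']} => {d['content']}" for d in docs]
    return (
        "You are a space facts assistant.\n"
        "Answer user questions using ONLY the text from these docs.\n"
        "Docs:\n"
        + "\n".join(lines)
        + "\nIf unsure, say 'I'm not sure'.\n"
    )


print(build_system_message([{"category": "Cosmology", "content": "Dark energy is about 68% of the universe."}]))
```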

## Test

Let's try our RAG system with an example query.

In [23]:
# Example query
user_query = "What's the multiverse theory?"
chat_model = "gpt-4o-mini"

# Get the RAG answer
answer = rag_chat(
    query=user_query,
    embeddings_client=embedding_client,
    search_client=search_client,
    project_client=project_client,
    chat_model=chat_model
)

print("🗣️ User Query:", user_query)
print("🤖 RAG Answer:", answer)

🗣️ User Query: What's the multiverse theory?
🤖 RAG Answer: The multiverse theory suggests that our universe is just one of many universes, each with potentially different laws of physics. These parallel universes may be impossible to detect directly.


In [24]:
# Example query
user_query = "Explain Gamma-ray bursts"
chat_model = "gpt-4o-mini"

# Get the RAG answer
answer = rag_chat(
    query=user_query,
    embeddings_client=embedding_client,
    search_client=search_client,
    project_client=project_client,
    chat_model=chat_model
)

print("🗣️ User Query:", user_query)
print("🤖 RAG Answer:", answer)

🗣️ User Query: Explain Gamma-ray bursts
🤖 RAG Answer: Gamma-ray bursts are the most energetic explosions observed in the universe, releasing as much energy in seconds as the Sun will in its entire 10-billion-year lifetime.


In [25]:
# Example query
user_query = "How do you make bread and butter pudding?"
chat_model = "gpt-4o-mini"

# Get the RAG answer
answer = rag_chat(
    query=user_query,
    embeddings_client=embedding_client,
    search_client=search_client,
    project_client=project_client,
    chat_model=chat_model
)

print("🗣️ User Query:", user_query)
print("🤖 RAG Answer:", answer)

🗣️ User Query: How do you make bread and butter pudding?
🤖 RAG Answer: I'm not sure.
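
The pudding question still retrieves the nearest space facts, because a vector query always returns its nearest neighbors; the "I'm not sure" answer comes from the system prompt. One optional extra guard is to drop weak matches before prompting, using the `@search.score` value returned with each result. In this sketch the 0.5 threshold and the scores in `fake_results` are made-up assumptions; calibrate against scores you actually observe for your index.

```python
from typing import Dict, List

# Assumption: arbitrary cut-off; calibrate against real @search.score values.
MIN_SCORE = 0.5


def filter_by_score(results: List[Dict], min_score: float = MIN_SCORE) -> List[Dict]:
    """Keep only search results whose '@search.score' is at least min_score."""
    return [r for r in results if r.get("@search.score", 0.0) >= min_score]


# Offline stand-ins shaped like search results (scores are invented).
fake_results = [
    {"content": "Pulsars are rapidly rotating neutron stars.", "@search.score": 0.82},
    {"content": "Saturn's rings are made of ice.", "@search.score": 0.31},
]
kept = filter_by_score(fake_results)
if not kept:
    print("No relevant facts found; skip the LLM call and answer \"I'm not sure\".")
else:
    print([r["content"] for r in kept])
```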
