<a href="https://colab.research.google.com/github/Pezzan/AI_workshop/blob/main/02-RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workshop 2: Embeddings + Azure AI Search + RAG (keyless auth with `DefaultAzureCredential`)

## What you will build
You will build a minimal Retrieval-Augmented Generation (RAG) pipeline:

1. **Embed** a user question into a vector using an embedding model deployed in Azure AI Foundry.
2. **Retrieve** relevant passages from **Azure AI Search** using vector search (and optionally hybrid search).
3. **Generate** an answer with an LLM deployment using the retrieved passages as context.

---

## Why keyless auth?
Instead of putting API keys in notebooks, we authenticate with **Microsoft Entra ID** using `DefaultAzureCredential`:

- `DefaultAzureCredential` automatically tries multiple credential sources (environment variables, Azure CLI login, managed identity, etc.).
- For Azure OpenAI-style endpoints, the OpenAI Python SDK supports Entra auth via `azure_ad_token_provider`, created with `get_bearer_token_provider`.
- For Azure AI Search, Microsoft docs show how to build keyless apps using `DefaultAzureCredential` once RBAC is enabled.

---

## RBAC prerequisites (important)
Your identity (user/service principal/managed identity) needs roles:

### Azure OpenAI / Foundry deployment
Assign **Cognitive Services OpenAI User** on the Azure OpenAI resource (or appropriate scope).

### Azure AI Search
Enable role-based access on the search service and assign roles such as **Search Index Data Reader** (for querying).


In [None]:
%pip -q install --upgrade openai azure-search-documents azure-core azure-identity azure-identity-broker

## Setup: configuration variables (no API keys)

You need **two** services configured:

### A) Azure AI Foundry / Azure OpenAI-style endpoint (for embeddings + chat)
You need:
- `AZURE_OPENAI_ENDPOINT`  
  Example: `https://YOUR_RESOURCE.openai.azure.com/`
- `AZURE_OPENAI_API_VERSION`
- Two deployment names:
  - `AZURE_OPENAI_DEPLOYMENT` (your LLM chat deployment name)
  - `AZURE_OPENAI_EMBED_DEPLOYMENT` (your embedding deployment name)

Reminder: in Azure OpenAI-style APIs, the parameter is called `model`, but you pass your **deployment name**.

### B) Azure AI Search (for retrieval)
You need:
- `AZURE_SEARCH_ENDPOINT`  
  Example: `https://YOUR-SEARCH-SERVICE.search.windows.net`
- `AZURE_SEARCH_INDEX_NAME`
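
Before creating any clients, it can help to confirm that every setting is present. The sketch below is a hypothetical helper (`missing_settings` is not part of any SDK). It assumes the variable names read by this notebook's code cells and only checks for presence, not validity.

```python
import os
from typing import List, Mapping

# Names as read by this notebook's code cells; adjust if you rename any of them.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_EMBED_DEPLOYMENT",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_INDEX_NAME",
)

def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required setting names that are absent or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]

# Check the real process environment.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Run it after the configuration cell; an empty result means all six names are set.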

---

## Authentication options in Colab with `DefaultAzureCredential`

### Option 1 (recommended for workshops): Azure CLI device login
If you can run `az login` in the notebook, `DefaultAzureCredential` can pick up your Azure CLI session.

### Option 2: Service principal via environment variables
Set these environment variables (we can prompt for them in a secure way):
- `AZURE_TENANT_ID`
- `AZURE_CLIENT_ID`
- `AZURE_CLIENT_SECRET`

Then `DefaultAzureCredential` will use `EnvironmentCredential`.
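
To see which of the two options the current session looks set up for, a rough check like the one below may help. It is a hypothetical helper, not a reproduction of the logic inside `DefaultAzureCredential`: it only looks for the three service principal variables and for an `az` executable on the `PATH`.

```python
import os
import shutil
from typing import List, Mapping, Optional

SP_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

def likely_auth_options(env: Mapping[str, str], az_path: Optional[str]) -> List[str]:
    """Guess which workshop auth options look usable (heuristic only)."""
    options = []
    if all(env.get(name) for name in SP_VARS):
        options.append("service principal (environment variables)")
    if az_path:
        # Having the CLI installed does not prove that `az login` was run.
        options.append("Azure CLI (run `az login` first)")
    return options

print(likely_auth_options(os.environ, shutil.which("az")) or "No option detected")
```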

> In production on Azure (Functions/VM/AKS), `DefaultAzureCredential` often uses Managed Identity automatically.


In [None]:
import os
from getpass import getpass
from azure.identity import DefaultAzureCredential

if not os.getenv("AZURE_CLIENT_SECRET"):
    os.environ["AZURE_CLIENT_SECRET"] = getpass("AZURE_CLIENT_SECRET: ")

os.environ["AZURE_TENANT_ID"] = "0f9e35db-544f-4f60-bdcc-5ea416e6dc70"
os.environ["AZURE_CLIENT_ID"] = "c473f6c5-2b24-44e5-b44f-b8897e21cbe5"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://unepazcdoopenaiprod01.openai.azure.com/"
os.environ["AZURE_OPENAI_DEPLOYMENT"] = "gpt-4o-mini"
os.environ["AZURE_OPENAI_API_VERSION"] = "2024-12-01-preview"
os.environ["AZURE_SEARCH_ENDPOINT"] = "https://unepazcdoaisearchprod01.search.windows.net"
os.environ["AZURE_SEARCH_INDEX_NAME"] = "environmentgpt-v2"
os.environ["AZURE_OPENAI_EMBED_DEPLOYMENT"] = "text-embedding-3-large"


credential = DefaultAzureCredential()


## Create SDK clients (keyless)

We will use:
- `AzureOpenAI` (OpenAI Python SDK) for:
  - embeddings
  - chat completions
  authenticated via Entra ID tokens using `get_bearer_token_provider`.

- `SearchClient` (Azure AI Search Python SDK) for retrieval,
  authenticated via `DefaultAzureCredential` after enabling RBAC on the search service.


In [None]:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential  # not used (kept for reference)

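# Pass a token provider (a callable), not a single token string, so the client
# can request a fresh Entra ID token whenever the current one expires.
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_ad_token_provider=token_provider,
)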

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=credential
)

CHAT_DEPLOYMENT  = os.environ["AZURE_OPENAI_DEPLOYMENT"]
EMBED_DEPLOYMENT = os.environ["AZURE_OPENAI_EMBED_DEPLOYMENT"]


## Step 1: Turn text into a vector (embedding)

We embed the **query** (user question) so we can compare it to vectors stored in the search index.

### What comes back?
An embedding request returns a list of floats (a vector).
- The vector length (dimensions) depends on the embedding model.
- We store document chunk vectors at indexing time.
- At query time, we compute the query vector and ask Azure AI Search for nearest neighbors.
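
The nearest-neighbor idea in the last bullet can be illustrated in plain Python. This is a toy sketch, assuming cosine similarity as the metric (a common choice; the metric your index actually uses is set in its vector configuration) and tiny hand-written 3-dimensional vectors in place of real embeddings. It scans every chunk by brute force, whereas a search service can use approximate algorithms to avoid that.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot / (norm_a * norm_b)

def top_k(query_vec: List[float], chunks: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force scan: score every chunk, return the k most similar."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up vectors standing in for stored chunk embeddings.
toy_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
for chunk_id, score in top_k([1.0, 0.0, 0.0], toy_chunks, k=2):
    print(chunk_id, round(score, 3))
```

The dimension check mirrors a real constraint: the query vector and the stored vectors must come from the same embedding model, otherwise their lengths (and meanings) will not match.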

### Why keyless auth matters here
Because we used `DefaultAzureCredential` + `get_bearer_token_provider`, the OpenAI client will acquire and refresh Entra ID tokens automatically.
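
To make "acquire and refresh automatically" concrete, here is a plain-Python sketch of the general idea behind a token provider: a callable that returns a cached token and fetches a new one only when the old one is close to expiry. Everything in it (`make_cached_token_provider`, `fake_fetch`, the 300-second margin) is hypothetical and for illustration only; it is not how `azure-identity` is implemented.

```python
import time
from typing import Callable, Tuple

def make_cached_token_provider(
    fetch: Callable[[], Tuple[str, float]],
    refresh_margin_s: float = 300.0,
    clock: Callable[[], float] = time.time,
) -> Callable[[], str]:
    """Wrap fetch() -> (token, expires_at) so it is only called when needed."""
    cached = {"token": None, "expires_at": 0.0}

    def provider() -> str:
        # Fetch on first use, or when the cached token is about to expire.
        if cached["token"] is None or clock() >= cached["expires_at"] - refresh_margin_s:
            cached["token"], cached["expires_at"] = fetch()
        return cached["token"]

    return provider

# Fake token source standing in for a real identity service.
fetch_count = []

def fake_fetch() -> Tuple[str, float]:
    fetch_count.append(1)
    return f"token-{len(fetch_count)}", time.time() + 3600

provider = make_cached_token_provider(fake_fetch)
provider()
provider()
print("fetches:", len(fetch_count))
```

A client that holds a callable like this can keep working across token expiry, which a single token string captured once cannot do.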


In [None]:
from typing import List

def embed_text(text: str) -> List[float]:
    """Create an embedding vector for a single text input."""
    if not text or not text.strip():
        raise ValueError("Text must be a non-empty string.")

    resp = aoai.embeddings.create(
        model=EMBED_DEPLOYMENT,  # Azure deployment name
        input=text
    )
    return resp.data[0].embedding

# Quick sanity test:
vec = embed_text("Hello embeddings!")
print("Vector length:", len(vec))
print("First 5 values:", vec[:5])


## Step 2: Retrieve relevant chunks from Azure AI Search (vector search)

We’ll do **vector search**:
1) compute the query embedding
2) ask Azure AI Search for the K nearest chunks

### Azure AI Search vector querying
Azure AI Search supports vector indexes and vector queries.
The query itself must be a vector, which is why we embed the question first.
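
If you later try the optional hybrid mode (keyword plus vector), the two ranked result lists have to be merged into one. Microsoft's documentation describes Reciprocal Rank Fusion (RRF) for this. The sketch below shows the general RRF formula in plain Python on made-up document ids; the constant `c = 60` is a value commonly used with RRF and is an assumption here, not a statement about the service's internals.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], c: int = 60) -> List[str]:
    """Merge ranked id lists: each appearance adds 1 / (c + rank), rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["doc-2", "doc-1", "doc-4"]
vector_hits = ["doc-1", "doc-3", "doc-2"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents found by only one method.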

### Keyless auth for Search
Microsoft docs describe using `DefaultAzureCredential` for keyless (RBAC-based) connections to Azure AI Search.

### You must match your index schema
The code in this notebook assumes your index has:
- `contentVector` as the vector field
- `chunk` as the chunk text field
- optional metadata fields like `id`, `title`, `filename`

If your fields differ, update `VECTOR_FIELD`, `TEXT_FIELD`, and the field names read from each hit (`h.get(...)`).


In [None]:
from azure.search.documents.models import VectorizedQuery

VECTOR_FIELD = "contentVector"   # <-- change to match your index
TEXT_FIELD   = "chunk"           # <-- change to match your index

def vector_search(query: str, k: int = 5):
    """Vector-only search: returns top-k docs by vector similarity."""
    qv = embed_text(query)

    vq = VectorizedQuery(
        vector=qv,
        k_nearest_neighbors=k,
        fields=VECTOR_FIELD
    )

    # search_text=None keeps this a pure vector query; passing the question
    # as search_text as well would turn it into a hybrid (keyword + vector) query.
    results = search_client.search(
        search_text=None,
        vector_queries=[vq],
        top=k
    )

    return list(results)

hits = vector_search("What is the impact of CO2 on the planet?", k=5)
for i, h in enumerate(hits, 1):
    print(f"\n--- Hit #{i} ---")
    print("id    :", h.get("id"))
    print("title :", h.get("title"))
    print("filename:", h.get("filename"))
    print("text  :", (h.get(TEXT_FIELD) or "")[:300], "...")


## Step 3: Generate an answer grounded in retrieved chunks

We will:
- keep a clean helper function for chat calls (`ask_llm`)
- build a `rag_answer` function that:
  1) retrieves top chunks from Azure AI Search
  2) formats them into a context block
  3) asks the LLM to answer using ONLY that context

### Why do we force “ONLY the provided context”?
Because we want the assistant to behave like a tool:
- grounded in enterprise documents
- transparent about sources (citations)
- willing to say “I don’t know” if retrieval didn’t find enough information
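
One way to check the "transparent about sources" goal is to verify that every citation in an answer points at a chunk that was actually retrieved. The sketch below is a hypothetical post-processing helper, assuming the `[doc_id: ...]` citation format requested in the system prompt and ids without whitespace; the sample answer and ids are made up.

```python
import re
from typing import Iterable, Set

# Matches citations such as [doc_id: 123] and captures the id.
CITATION_PATTERN = re.compile(r"\[doc_id:\s*([^\]\s]+)\s*\]")

def cited_ids(answer: str) -> Set[str]:
    """Return every doc_id cited in the answer text."""
    return set(CITATION_PATTERN.findall(answer))

def unsupported_citations(answer: str, retrieved_ids: Iterable[str]) -> Set[str]:
    """Citations in the answer that do not match any retrieved doc_id."""
    return cited_ids(answer) - set(retrieved_ids)

sample_answer = "CO2 traps heat [doc_id: 12]. Oceans acidify [doc_id: 99]."
print(unsupported_citations(sample_answer, ["12", "34"]))
```

A non-empty result suggests the model cited something outside the provided context, which is worth surfacing to the user or logging.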


In [None]:
def ask_llm(messages, temperature: float = 0.2, max_tokens: int = 400) -> str:
    resp = aoai.chat.completions.create(
        model=CHAT_DEPLOYMENT,  # Azure deployment name
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # message.content can be None; return an empty string to honor the -> str annotation.
    return resp.choices[0].message.content or ""


def format_context(hits, max_chars_per_hit: int = 900) -> str:
    blocks = []
    for h in hits:
        doc_id = h.get("id", "unknown-id")
        title = h.get("title", "")
        source = h.get("filename", "")
        text = (h.get("chunk") or "")[:max_chars_per_hit]
        blocks.append(f"[doc_id: {doc_id} | title: {title} | source: {source}]\n{text}")
    return "\n\n---\n\n".join(blocks)


def rag_answer(question: str, k: int = 5) -> str:
    hits = vector_search(question, k=k)
    context = format_context(hits)

    system = (
        "You are a helpful assistant. Use ONLY the provided context to answer. "
        "If the context is insufficient, say you don't have enough information. "
        "Cite sources by doc_id in square brackets, e.g. [doc_id: 123]."
    )

    user = f"""QUESTION:
{question}

CONTEXT:
{context}

INSTRUCTIONS:
- Answer based only on CONTEXT.
- Add citations in the form [doc_id: ...].
"""

    return ask_llm(
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.2,
        max_tokens=450,
    )

print(rag_answer("What is the impact of CO2 on the planet?", k=5))
