# Agentic RAG (Retrieval-Augmented Generation)

This notebook demonstrates how to build an **Agentic RAG system** that combines retrieval-augmented generation with autonomous agent capabilities. Unlike traditional RAG systems that follow a simple retrieve-then-generate pattern, agentic RAG systems can dynamically decide when and how to retrieve information, reformulate queries, and iteratively refine their responses.

## Table of Contents
1. [Introduction to Agentic RAG](#introduction)
2. [Environment Setup](#setup)
3. [Traditional RAG vs. Agentic RAG](#comparison)
4. [Building an Agentic RAG System](#building)
5. [Advanced Agentic RAG Patterns](#advanced)
6. [Summary and Best Practices](#summary)

## 1. Introduction to Agentic RAG <a id="introduction"></a>

### What is Agentic RAG?

**Agentic RAG** extends traditional Retrieval-Augmented Generation by giving the system autonomous decision-making capabilities:

- **Dynamic Query Planning**: The agent decides when and what to retrieve
- **Self-Reflection**: The agent evaluates whether retrieved information is sufficient
- **Query Refinement**: The agent reformulates queries to get better results
- **Multi-Step Reasoning**: The agent breaks down complex questions into sub-queries
- **Adaptive Retrieval**: The agent adjusts retrieval strategies based on context
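
The capabilities listed above can be pictured as one control loop. The sketch below is a minimal illustration, not the implementation used later in this notebook: `agentic_loop`, `retrieve`, `is_sufficient`, and `refine` are hypothetical names, and the toy lambdas stand in for a search index and a model.

```python
from typing import Callable, List

def agentic_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> List[str]:
    """Retrieve, reflect, and refine until the evidence is judged sufficient."""
    evidence: List[str] = []
    query = question
    for _ in range(max_steps):
        # Adaptive retrieval: add only passages not seen before
        for passage in retrieve(query):
            if passage not in evidence:
                evidence.append(passage)
        # Self-reflection: stop as soon as the evidence is judged sufficient
        if is_sufficient(question, evidence):
            break
        # Query refinement: rewrite the query before the next attempt
        query = refine(question, evidence)
    return evidence

# Toy stand-ins so the loop runs without a model or a search index
corpus = {
    "rag": ["RAG retrieves documents before generating."],
    "rag agents": ["Agents decide when to retrieve."],
}
evidence = agentic_loop(
    "rag",
    retrieve=lambda q: corpus.get(q, []),
    is_sufficient=lambda q, ev: len(ev) >= 2,
    refine=lambda q, ev: q + " agents",
)
print(evidence)
```

The `max_steps` cap is what keeps the loop from running indefinitely when the sufficiency check never passes.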

### Key Differences from Traditional RAG

| Traditional RAG | Agentic RAG |
|----------------|-------------|
| Fixed retrieve-then-generate flow | Dynamic decision-making about retrieval |
| Single retrieval step | Multiple adaptive retrieval steps |
| No query reformulation | Intelligent query refinement |
| Limited error handling | Self-correction and re-retrieval |
| Static context usage | Context-aware strategy selection |

### When to Use Agentic RAG

Agentic RAG is ideal for:
- Complex multi-step questions requiring information synthesis
- Scenarios where initial retrieval may be insufficient
- Cases requiring query disambiguation or refinement
- Applications needing high-quality, verified responses
- Systems that benefit from explanatory reasoning traces

## 2. Environment Setup <a id="setup"></a>

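First, let's connect to Azure AI Foundry and create a small in-memory knowledge base for the demonstration. The setup cell reads the project endpoint and the model name from the `PROJECT_ENDPOINT` and `MODEL` environment variables, which `load_dotenv()` can load from a local `.env` file.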

In [None]:
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
import os
import json
from typing import List, Dict, Any

# Load environment variables
load_dotenv()

# Initialize the AI Project client
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential()
)

# Get the OpenAI client
chat = project_client.get_openai_client()
model = os.environ["MODEL"]

print("Connected to Azure AI Foundry")
print(f"Using model: {model}")

In [None]:
# Create a simple in-memory knowledge base for demonstration
# In production, this would be Azure AI Search, vector database, etc.

knowledge_base = [
    {
        "id": "doc1",
        "title": "Azure AI Foundry Overview",
        "content": "Azure AI Foundry is a comprehensive platform for building, deploying, and managing AI applications. It provides tools for prompt engineering, model deployment, evaluation, and monitoring.",
        "category": "platform"
    },
    {
        "id": "doc2",
        "title": "RAG Systems Explained",
        "content": "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. It retrieves relevant documents from a knowledge base and uses them to generate accurate, grounded responses.",
        "category": "rag"
    },
    {
        "id": "doc3",
        "title": "Agent Frameworks",
        "content": "Microsoft Agent Framework enables building autonomous agents that can use tools, make decisions, and execute complex workflows. Agents can plan, reflect, and adapt their behavior based on context.",
        "category": "agents"
    },
    {
        "id": "doc4",
        "title": "Vector Embeddings",
        "content": "Vector embeddings are numerical representations of text that capture semantic meaning. They enable similarity search by finding documents with similar meaning rather than just matching keywords.",
        "category": "embeddings"
    },
    {
        "id": "doc5",
        "title": "Prompt Engineering Best Practices",
        "content": "Effective prompt engineering involves being specific, providing context, using examples (few-shot learning), and structuring prompts clearly. Chain-of-thought prompting helps with complex reasoning tasks.",
        "category": "prompts"
    },
    {
        "id": "doc6",
        "title": "Azure AI Search Integration",
        "content": "Azure AI Search provides powerful indexing and retrieval capabilities. It supports vector search, hybrid search, and semantic ranking for improved relevance in RAG applications.",
        "category": "search"
    },
    {
        "id": "doc7",
        "title": "Model Evaluation Techniques",
        "content": "Evaluating AI models involves metrics like relevance, groundedness, coherence, and fluency. Azure AI Foundry provides built-in evaluation tools to measure model performance systematically.",
        "category": "evaluation"
    },
    {
        "id": "doc8",
        "title": "Agent Tool Usage",
        "content": "Agents can use various tools including search APIs, calculators, code interpreters, and custom functions. Tool selection and usage is a key capability that distinguishes agents from simple chatbots.",
        "category": "agents"
    }
]

print(f"Loaded {len(knowledge_base)} documents into knowledge base")
print("\nSample documents:")
for doc in knowledge_base[:3]:
    print(f"- {doc['title']} ({doc['category']})")

In [None]:
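# Helper function for simple keyword-based retrieval (simplified for demo)
# In production, use semantic search with embeddings
import re

# Short, common words that would otherwise match almost every document
STOPWORDS = {"how", "what", "does", "the", "and", "for", "are", "can", "that", "with"}

def simple_retrieve(query: str, top_k: int = 3) -> List[Dict[str, Any]]:
    """Simple keyword-based retrieval for demonstration"""
    # Tokenize on word characters so trailing punctuation ("work?") cannot block a match,
    # and keep three-letter terms such as "RAG" that a length > 3 filter would drop
    query_words = [
        w for w in re.findall(r"\w+", query.lower())
        if len(w) >= 3 and w not in STOPWORDS
    ]
    scored_docs = []

    for doc in knowledge_base:
        # Score = total occurrences of the query words in title + content
        content_lower = (doc['title'] + ' ' + doc['content']).lower()
        score = sum(content_lower.count(word) for word in query_words)
        scored_docs.append((score, doc))

    # Sort by score and return top_k, dropping documents with no match
    scored_docs.sort(reverse=True, key=lambda x: x[0])
    return [doc for score, doc in scored_docs[:top_k] if score > 0]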

# Test retrieval
test_query = "How does RAG work?"
retrieved = simple_retrieve(test_query)
print(f"Retrieved {len(retrieved)} documents for query: '{test_query}'")
for doc in retrieved:
    print(f"- {doc['title']}")
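
The comment in the cell above recommends semantic search with embeddings for production. As a rough sketch of that ranking step, the code below scores documents by cosine similarity. It assumes nothing about a real embedding service: `embed` is a hypothetical stand-in that builds a bag-of-words count vector, and `vector_retrieve` and `sample_docs` are illustrative names.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vector_retrieve(query: str, docs: List[Dict[str, str]], top_k: int = 3) -> List[Dict[str, str]]:
    """Rank documents by similarity to the query; keep the top_k with a non-zero score."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(d["title"] + " " + d["content"])), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]

sample_docs = [
    {"id": "a", "title": "RAG Systems", "content": "RAG combines retrieval with generation."},
    {"id": "b", "title": "Vector Embeddings", "content": "Embeddings capture semantic meaning."},
]
hits = vector_retrieve("How does RAG work?", sample_docs, top_k=1)
print([d["id"] for d in hits])
```

Swapping `embed` for a call to a real embedding model would leave the ranking code unchanged, which is the main reason to keep the two functions separate.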

## 3. Traditional RAG vs. Agentic RAG <a id="comparison"></a>

Let's compare how traditional RAG and agentic RAG handle the same question.

In [None]:
# Traditional RAG: Simple retrieve-then-generate

def traditional_rag(query: str) -> str:
    """Traditional RAG: retrieve once, then generate"""
    # Step 1: Retrieve relevant documents
    retrieved_docs = simple_retrieve(query, top_k=2)
    
    # Step 2: Build context from retrieved documents
    context = "\n\n".join([f"{doc['title']}:\n{doc['content']}" for doc in retrieved_docs])
    
    # Step 3: Generate response using context
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Use the provided context to answer questions accurately."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context provided:"}
    ]
    
    response = chat.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
        max_tokens=300
    )
    
    return response.choices[0].message.content

# Test traditional RAG
question = "How can I build an intelligent agent that uses search capabilities?"
print("Traditional RAG Response:")
print("=" * 80)
response = traditional_rag(question)
print(response)
print("\n" + "=" * 80)

In [None]:
# Agentic RAG: Intelligent multi-step retrieval with reasoning

def agentic_rag(query: str, max_iterations: int = 3) -> Dict[str, Any]:
    """
    Agentic RAG with self-reflection and iterative refinement.
    Returns both the answer and the reasoning trace.
    """
    reasoning_trace = []
    all_retrieved_docs = []
    
    # Step 1: Agent analyzes the query and plans retrieval strategy
    planning_prompt = f"""Analyze this question and break it down into key information needs:
Question: {query}

Provide:
1. Main topic/concept
2. Specific information required
3. Suggested search queries (2-3 focused queries)

Format your response as JSON with keys: main_topic, info_needed, search_queries"""
    
    planning_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": planning_prompt}],
        temperature=0.3,
        max_tokens=300
    )
    
    plan_text = planning_response.choices[0].message.content
    reasoning_trace.append({"step": "planning", "content": plan_text})
    
    # Extract search queries (simplified - in production use JSON parsing)
    # For demo, we'll use the original query and a refined version
    search_queries = [query, query.replace("?", "")]
    
    # Step 2: Iterative retrieval with self-reflection
    for iteration in range(max_iterations):
        # Retrieve documents, advancing to the next candidate query on each iteration
        sq = search_queries[min(iteration, len(search_queries) - 1)]
        docs = simple_retrieve(sq, top_k=2)
        all_retrieved_docs.extend(docs)
        
        # Remove duplicates
        seen_ids = set()
        unique_docs = []
        for doc in all_retrieved_docs:
            if doc['id'] not in seen_ids:
                unique_docs.append(doc)
                seen_ids.add(doc['id'])
        all_retrieved_docs = unique_docs
        
        # Build context
        context = "\n\n".join([f"{doc['title']}:\n{doc['content']}" for doc in all_retrieved_docs])
        
        # Step 3: Self-reflection - is the information sufficient?
        reflection_prompt = f"""Context:
{context}

Question: {query}

Evaluate if the context provides sufficient information to answer the question comprehensively.
Respond with ONLY 'SUFFICIENT' or 'INSUFFICIENT: <reason>'"""
        
        reflection_response = chat.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": reflection_prompt}],
            temperature=0.2,
            max_tokens=100
        )
        
        reflection = reflection_response.choices[0].message.content
        reasoning_trace.append({"step": f"reflection_{iteration+1}", "content": reflection})
        
        # Match the leading verdict: a substring test would also accept "INSUFFICIENT"
        if reflection.strip().upper().startswith("SUFFICIENT"):
            break
    
    # Step 4: Generate final answer with reasoning
    final_prompt = f"""Context:
{context}

Question: {query}

Provide a comprehensive answer based on the context. Be specific and cite relevant information."""
    
    final_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": final_prompt}],
        temperature=0.3,
        max_tokens=400
    )
    
    answer = final_response.choices[0].message.content
    
    return {
        "answer": answer,
        "reasoning_trace": reasoning_trace,
        "retrieved_docs": [doc['title'] for doc in all_retrieved_docs],
        "num_iterations": len([t for t in reasoning_trace if 'reflection' in t['step']])
    }

# Test agentic RAG
print("Agentic RAG Response:")
print("=" * 80)
result = agentic_rag(question)
print(f"Answer:\n{result['answer']}")
print(f"\nRetrieved Documents: {', '.join(result['retrieved_docs'])}")
print(f"Iterations: {result['num_iterations']}")
print("\n" + "=" * 80)

**Key Observations:**

- **Traditional RAG**: Single retrieval, direct answer generation
- **Agentic RAG**: Query planning, iterative retrieval, self-reflection, comprehensive answer

The agentic approach provides:
1. Better query understanding and decomposition
2. Self-verification of information sufficiency
3. Ability to retrieve additional information if needed
4. Transparent reasoning trace for debugging and trust
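
The planning prompt in `agentic_rag` asks for JSON with a `search_queries` key, but the demo stores only the raw text. One way to use the plan is sketched below. `extract_search_queries` and `is_sufficient` are hypothetical helper names, and the sample reply is invented for illustration. Note that a plain substring test for `SUFFICIENT` would also match `INSUFFICIENT`, so the verdict check looks at the start of the reply.

```python
import json
import re
from typing import List

def extract_search_queries(plan_text: str, fallback: str) -> List[str]:
    """Pull `search_queries` out of a model reply that should contain a JSON object."""
    # Models often wrap JSON in prose, so grab the outermost {...} span
    match = re.search(r"\{.*\}", plan_text, flags=re.DOTALL)
    if match:
        try:
            queries = json.loads(match.group(0)).get("search_queries", [])
        except json.JSONDecodeError:
            queries = []
        if isinstance(queries, list):
            cleaned = [q.strip() for q in queries if isinstance(q, str) and q.strip()]
            if cleaned:
                return cleaned
    # No usable plan: fall back to the caller's query
    return [fallback]

def is_sufficient(reflection: str) -> bool:
    """True only when the verdict starts with SUFFICIENT."""
    return reflection.strip().upper().startswith("SUFFICIENT")

reply = (
    'Here is the plan: {"main_topic": "agents", "info_needed": "tool usage", '
    '"search_queries": ["agent tools", "search integration"]} Hope this helps.'
)
print(extract_search_queries(reply, fallback="original question"))
print(is_sufficient("INSUFFICIENT: no details on evaluation"))
```

Falling back to the original query keeps the pipeline running when the model ignores the requested format.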

## 4. Building an Agentic RAG System <a id="building"></a>

Let's build a more sophisticated agentic RAG system with additional capabilities.

### 4.1 Query Decomposition

For complex questions, the agent breaks them into sub-queries.

In [None]:
def decompose_query(complex_query: str) -> List[str]:
    """Break down complex queries into simpler sub-queries"""
    decomposition_prompt = f"""Break down this complex question into 2-3 simpler sub-questions that, when answered, would provide a complete answer to the original question.

Complex Question: {complex_query}

Provide sub-questions as a numbered list."""
    
    response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": decomposition_prompt}],
        temperature=0.3,
        max_tokens=200
    )
    
    decomposition = response.choices[0].message.content
    
    # Parse sub-questions (simplified)
    sub_queries = []
    for line in decomposition.split('\n'):
        line = line.strip()
        if line and (line[0].isdigit() or line.startswith('-')):
            # Remove numbering and clean up
            clean_line = line.lstrip('0123456789.-) ').strip()
            if clean_line:
                sub_queries.append(clean_line)
    
    return sub_queries if sub_queries else [complex_query]

# Test query decomposition
complex_q = "What are the key differences between traditional RAG and agentic RAG, and how do vector embeddings improve retrieval quality?"
print(f"Original Question:\n{complex_q}\n")
print("=" * 80)
print("Decomposed Sub-Questions:")
sub_qs = decompose_query(complex_q)
for i, sq in enumerate(sub_qs, 1):
    print(f"{i}. {sq}")
print("=" * 80)
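
The list parsing in `decompose_query` is deliberately simplified. A stricter variant is sketched below, assuming the model replies with one item per line. `parse_list_items` is a hypothetical helper: it requires an explicit list marker, so a sentence that merely starts with a digit is skipped and digits at the start of an item's text are preserved.

```python
import re
from typing import List

# Matches a leading marker such as "1.", "2)", "-", "*", or "•" followed by whitespace
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+?)\s*$")

def parse_list_items(text: str) -> List[str]:
    """Return the text of every numbered or bulleted line, in order."""
    items = []
    for line in text.splitlines():
        match = _ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items

reply = """Here are the sub-questions:
1. What is traditional RAG?
2) What is agentic RAG?
* How do embeddings help?
2024 was a busy year."""
print(parse_list_items(reply))
```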

### 4.2 Query Reformulation

The agent can reformulate queries to improve retrieval quality.

In [None]:
def reformulate_query(original_query: str, context: str = "") -> List[str]:
    """Generate alternative formulations of a query for better retrieval"""
    reformulation_prompt = f"""Generate 2-3 alternative phrasings of this question that might retrieve different relevant information.

Original Question: {original_query}
{f'Previous Context: {context}' if context else ''}

Provide alternatives as a numbered list. Make them specific and diverse."""
    
    response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": reformulation_prompt}],
        temperature=0.5,
        max_tokens=200
    )
    
    reformulations_text = response.choices[0].message.content
    
    # Parse reformulations
    reformulations = [original_query]  # Include original
    for line in reformulations_text.split('\n'):
        line = line.strip()
        if line and (line[0].isdigit() or line.startswith('-')):
            clean_line = line.lstrip('0123456789.-) ').strip()
            if clean_line:
                reformulations.append(clean_line)
    
    return reformulations[:4]  # Original + 3 alternatives

# Test query reformulation
test_q = "How do agents use tools?"
print(f"Original Query: {test_q}\n")
print("=" * 80)
print("Reformulated Queries:")
reformulated = reformulate_query(test_q)
for i, rq in enumerate(reformulated, 1):
    print(f"{i}. {rq}")
print("=" * 80)

### 4.3 Complete Agentic RAG with All Features

Now let's put it all together into a comprehensive agentic RAG system.

In [None]:
def advanced_agentic_rag(query: str, verbose: bool = True) -> Dict[str, Any]:
    """
    Advanced Agentic RAG with:
    - Query decomposition for complex questions
    - Query reformulation for better retrieval
    - Iterative retrieval with self-reflection
    - Reasoning trace for transparency
    """
    trace = []
    all_docs = []
    
    # Step 1: Decompose complex query
    if verbose:
        print("Step 1: Query Decomposition")
        print("-" * 80)
    
    sub_queries = decompose_query(query)
    trace.append({"step": "decomposition", "sub_queries": sub_queries})
    
    if verbose:
        for i, sq in enumerate(sub_queries, 1):
            print(f"  {i}. {sq}")
        print()
    
    # Step 2: For each sub-query, reformulate and retrieve
    if verbose:
        print("Step 2: Query Reformulation & Retrieval")
        print("-" * 80)
    
    for i, sub_q in enumerate(sub_queries[:2]):  # Limit to 2 sub-queries for demo
        # Reformulate
        reformulated = reformulate_query(sub_q)
        if verbose:
            print(f"  Sub-query {i+1}: {sub_q}")
            print(f"  Reformulations: {len(reformulated)}")
        
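        # Retrieve for each reformulation
        sub_query_docs = []
        for ref_q in reformulated[:2]:  # Use first 2 reformulations
            sub_query_docs.extend(simple_retrieve(ref_q, top_k=2))
        all_docs.extend(sub_query_docs)

        if verbose:
            # Count every document found for this sub-query, before deduplication
            print(f"  Retrieved {len(sub_query_docs)} documents")
            print()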
    
    # Remove duplicates
    seen_ids = set()
    unique_docs = []
    for doc in all_docs:
        if doc['id'] not in seen_ids:
            unique_docs.append(doc)
            seen_ids.add(doc['id'])
    
    trace.append({"step": "retrieval", "num_docs": len(unique_docs)})
    
    # Step 3: Self-reflection on information quality
    if verbose:
        print("Step 3: Self-Reflection")
        print("-" * 80)
    
    context = "\n\n".join([f"{doc['title']}:\n{doc['content']}" for doc in unique_docs])
    
    reflection_prompt = f"""Evaluate if the following context provides sufficient information to answer this question comprehensively.

Question: {query}

Context:
{context}

Provide:
1. Assessment (SUFFICIENT or INSUFFICIENT)
2. Brief reasoning
3. If insufficient, what information is missing?"""
    
    reflection_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": reflection_prompt}],
        temperature=0.2,
        max_tokens=200
    )
    
    reflection = reflection_response.choices[0].message.content
    trace.append({"step": "reflection", "assessment": reflection})
    
    if verbose:
        print(f"  {reflection}")
        print()
    
    # Step 4: Generate comprehensive answer
    if verbose:
        print("Step 4: Answer Generation")
        print("-" * 80)
    
    answer_prompt = f"""Based on the following context, provide a comprehensive answer to the question.

Question: {query}

Context:
{context}

Provide a detailed, well-structured answer. Cite specific information from the context."""
    
    answer_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": answer_prompt}],
        temperature=0.3,
        max_tokens=500
    )
    
    answer = answer_response.choices[0].message.content
    
    if verbose:
        print(f"  {answer}")
        print()
    
    return {
        "answer": answer,
        "reasoning_trace": trace,
        "retrieved_docs": [doc['title'] for doc in unique_docs],
        "num_docs_retrieved": len(unique_docs)
    }

# Test the complete agentic RAG system
complex_question = "How can I build an AI agent that uses search and evaluation tools effectively?"
print("=" * 80)
print(f"Question: {complex_question}")
print("=" * 80)
print()

result = advanced_agentic_rag(complex_question, verbose=True)

print("=" * 80)
print("\nSummary:")
print(f"- Retrieved {result['num_docs_retrieved']} unique documents")
print(f"- Documents: {', '.join(result['retrieved_docs'])}")
print("=" * 80)
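
When several reformulations return the same document, that overlap is itself a signal. The sketch below shows one possible way to merge result lists: it deduplicates by `id` and ranks documents by how many queries returned them. `merge_results` is a hypothetical helper, and the frequency-based ranking is an assumption rather than something the function above does.

```python
from collections import Counter
from typing import Any, Dict, List

def merge_results(result_lists: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Deduplicate by id and rank documents by how many queries returned them."""
    hits: Counter = Counter()
    first_seen: Dict[str, Dict[str, Any]] = {}
    for results in result_lists:
        for doc in results:
            hits[doc["id"]] += 1
            first_seen.setdefault(doc["id"], doc)
    # sorted() is stable, so documents with equal counts keep first-seen order
    ordered_ids = sorted(first_seen, key=lambda doc_id: -hits[doc_id])
    return [first_seen[doc_id] for doc_id in ordered_ids]

lists = [
    [{"id": "doc3"}, {"id": "doc8"}],
    [{"id": "doc8"}, {"id": "doc6"}],
]
print([d["id"] for d in merge_results(lists)])
```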

## 5. Advanced Agentic RAG Patterns <a id="advanced"></a>

### 5.1 Corrective RAG (Self-Correction)

The agent can detect when retrieved information is not relevant and take corrective action.

In [None]:
def corrective_rag(query: str) -> Dict[str, Any]:
    """
    Corrective RAG: Agent evaluates retrieval quality and re-retrieves if needed
    """
    print(f"Query: {query}\n")
    max_attempts = 2
    search_query = query  # Reformulated between attempts; the user's question stays unchanged

    for attempt in range(max_attempts):
        print(f"Attempt {attempt + 1}:")
        print("-" * 80)

        # Retrieve documents
        docs = simple_retrieve(search_query, top_k=3)
        print(f"Retrieved: {[doc['title'] for doc in docs]}")

        # Evaluate relevance against the original question
        context = "\n".join([doc['title'] for doc in docs])
        relevance_check = f"""Are these document titles relevant to the query: '{query}'?

Documents:
{context}

Answer with 'RELEVANT' or 'NOT_RELEVANT: <reason>'"""

        relevance_response = chat.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": relevance_check}],
            temperature=0.2,
            max_tokens=100
        )

        relevance = relevance_response.choices[0].message.content
        print(f"Relevance Check: {relevance}")

        # Check the leading verdict so "NOT_RELEVANT" or "NOT RELEVANT" never counts as a pass
        if relevance.strip().upper().startswith("RELEVANT"):
            print("✓ Documents are relevant, proceeding with answer generation\n")
            break
        elif attempt < max_attempts - 1:
            print("✗ Documents not relevant, reformulating query...\n")
            # Reformulate only when another retrieval attempt remains
            reformulated = reformulate_query(query)
            if len(reformulated) > 1:
                search_query = reformulated[1]  # Try first alternative
                print(f"New query: {search_query}\n")
        else:
            print("✗ Documents still not relevant, answering with the best available context\n")

    # Generate final answer to the user's original question
    context_full = "\n\n".join([f"{doc['title']}:\n{doc['content']}" for doc in docs])
    answer_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Context:\n{context_full}\n\nQuestion: {query}\n\nAnswer:"}],
        temperature=0.3,
        max_tokens=300
    )
    
    return {
        "answer": answer_response.choices[0].message.content,
        "attempts": attempt + 1,
        "final_docs": [doc['title'] for doc in docs]
    }

# Test corrective RAG
print("=" * 80)
print("Corrective RAG Example")
print("=" * 80)
print()

result = corrective_rag("What are the best practices for prompt engineering?")
print("=" * 80)
print("Final Answer:")
print(result['answer'])
print(f"\nAttempts: {result['attempts']}")
print(f"Final Documents: {', '.join(result['final_docs'])}")
print("=" * 80)
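
The example above judges the retrieved titles as a batch. A common refinement is to grade each document separately and drop the weak ones before generation. The sketch below assumes a pluggable grader: `filter_relevant` and `keyword_overlap_grade` are hypothetical names, and the keyword grader is only a toy stand-in for what would normally be a model call.

```python
from typing import Any, Callable, Dict, List

def filter_relevant(
    query: str,
    docs: List[Dict[str, Any]],
    grade: Callable[[str, Dict[str, Any]], float],
    threshold: float = 0.5,
) -> List[Dict[str, Any]]:
    """Keep only documents whose grade for this query meets the threshold."""
    return [doc for doc in docs if grade(query, doc) >= threshold]

def keyword_overlap_grade(query: str, doc: Dict[str, Any]) -> float:
    """Toy grader: fraction of query words found in the document text."""
    words = query.lower().split()
    text = (doc["title"] + " " + doc["content"]).lower()
    return sum(w in text for w in words) / len(words) if words else 0.0

docs = [
    {"title": "Prompt Engineering", "content": "Be specific and provide context."},
    {"title": "Vector Embeddings", "content": "Numerical representations of text."},
]
kept = filter_relevant("prompt context", docs, keyword_overlap_grade)
print([d["title"] for d in kept])
```

If the filter leaves no documents, that is a natural trigger for the reformulate-and-retry step.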

### 5.2 Adaptive Retrieval

The agent adapts its retrieval strategy based on the query type and context.

In [None]:
def adaptive_retrieval(query: str) -> Dict[str, Any]:
    """
    Adaptive RAG: Agent chooses retrieval strategy based on query analysis
    """
    # Analyze query to determine strategy
    analysis_prompt = f"""Analyze this query and classify it:

Query: {query}

Classification:
- FACTUAL: Simple factual question (retrieve 1-2 docs)
- COMPARATIVE: Comparing multiple concepts (retrieve 3-4 docs)
- COMPLEX: Multi-faceted question (retrieve 4+ docs, use decomposition)

Respond with ONLY the classification (FACTUAL, COMPARATIVE, or COMPLEX)"""
    
    analysis_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": analysis_prompt}],
        temperature=0.2,
        max_tokens=50
    )
    
    query_type = analysis_response.choices[0].message.content.strip().upper()
    print(f"Query Type: {query_type}")
    print("-" * 80)
    
    # Adapt retrieval based on type
    if "FACTUAL" in query_type:
        print("Strategy: Simple retrieval with 2 documents")
        docs = simple_retrieve(query, top_k=2)
    elif "COMPARATIVE" in query_type:
        print("Strategy: Enhanced retrieval with query reformulation")
        reformulated = reformulate_query(query)
        all_docs = []
        for rq in reformulated[:2]:
            all_docs.extend(simple_retrieve(rq, top_k=2))
        # Remove duplicates
        seen = set()
        docs = []
        for doc in all_docs:
            if doc['id'] not in seen:
                docs.append(doc)
                seen.add(doc['id'])
    else:  # COMPLEX
        print("Strategy: Query decomposition + multi-step retrieval")
        sub_queries = decompose_query(query)
        all_docs = []
        for sq in sub_queries[:2]:
            all_docs.extend(simple_retrieve(sq, top_k=2))
        # Remove duplicates
        seen = set()
        docs = []
        for doc in all_docs:
            if doc['id'] not in seen:
                docs.append(doc)
                seen.add(doc['id'])
    
    print(f"Retrieved {len(docs)} documents: {[doc['title'] for doc in docs]}")
    print()
    
    # Generate answer
    context = "\n\n".join([f"{doc['title']}:\n{doc['content']}" for doc in docs])
    answer_response = chat.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"}],
        temperature=0.3,
        max_tokens=400
    )
    
    return {
        "answer": answer_response.choices[0].message.content,
        "query_type": query_type,
        "docs_retrieved": len(docs),
        "doc_titles": [doc['title'] for doc in docs]
    }

# Test with different query types
queries = [
    "What is RAG?",
    "What are the differences between Azure AI Search and vector embeddings?",
    "How can I build a comprehensive AI agent system with evaluation and search capabilities?"
]

for q in queries:
    print("=" * 80)
    print(f"Query: {q}")
    print("=" * 80)
    result = adaptive_retrieval(q)
    print(f"Answer: {result['answer'][:200]}...")
    print("=" * 80)
    print()
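
The branching in `adaptive_retrieval` can also be written as a lookup table, which keeps each strategy's settings in one place. The sketch below mirrors the three branches above. `Strategy`, `STRATEGIES`, and `pick_strategy` are hypothetical names, and unrecognized replies fall back to the COMPLEX strategy, as the `else` branch does.

```python
from typing import Dict, NamedTuple

class Strategy(NamedTuple):
    top_k: int
    reformulate: bool
    decompose: bool

# Settings mirror the three branches of adaptive_retrieval
STRATEGIES: Dict[str, Strategy] = {
    "FACTUAL": Strategy(top_k=2, reformulate=False, decompose=False),
    "COMPARATIVE": Strategy(top_k=2, reformulate=True, decompose=False),
    "COMPLEX": Strategy(top_k=2, reformulate=False, decompose=True),
}

def pick_strategy(model_reply: str) -> Strategy:
    """Map a free-text classification reply to a strategy, defaulting to COMPLEX."""
    reply = model_reply.upper()
    for label, strategy in STRATEGIES.items():
        if label in reply:
            return strategy
    return STRATEGIES["COMPLEX"]

print(pick_strategy("Classification: comparative"))
print(pick_strategy("not sure"))
```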

## 6. Summary and Best Practices <a id="summary"></a>

### Key Takeaways

**Agentic RAG Capabilities:**
1. **Query Planning**: Break down complex questions into sub-queries
2. **Query Reformulation**: Generate alternative phrasings for better retrieval
3. **Self-Reflection**: Evaluate if retrieved information is sufficient
4. **Corrective Retrieval**: Re-retrieve if initial results are poor
5. **Adaptive Strategy**: Adjust approach based on query type
6. **Transparent Reasoning**: Provide trace of decision-making process

### Implementation Best Practices

1. **Start Simple**: Begin with basic RAG, add agentic features incrementally
2. **Use Proper Retrieval**: In production, use vector search (Azure AI Search, embeddings)
3. **Set Iteration Limits**: Prevent infinite loops in self-reflection cycles
4. **Monitor Costs**: Agentic RAG uses more API calls; optimize accordingly
5. **Log Everything**: Keep detailed traces for debugging and improvement
6. **Evaluate Performance**: Measure quality improvements vs. cost increases
7. **Handle Failures**: Implement fallbacks when retrieval or generation fails
8. **Optimize Prompts**: Fine-tune prompts for each agentic step
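
Practices 3, 4, and 7 can share one small mechanism: a budget that caps model calls, plus a wrapper that returns a fallback when the budget is spent or a step fails. The sketch below is one possible shape. `CallBudget` and `call_with_fallback` are hypothetical names, and `flaky` simulates a failing model call.

```python
from typing import Callable

class CallBudget:
    """Counts model calls and refuses new ones once the limit is reached."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def try_spend(self) -> bool:
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True

def call_with_fallback(step: Callable[[], str], budget: CallBudget, fallback: str) -> str:
    """Run one agentic step; return `fallback` if the budget is spent or the step raises."""
    if not budget.try_spend():
        return fallback
    try:
        return step()
    except Exception as exc:  # Broad on purpose for this sketch; narrow it in real code
        print(f"Step failed, using fallback: {exc}")
        return fallback

def flaky() -> str:
    raise TimeoutError("model timed out")

budget = CallBudget(max_calls=2)
results = [
    call_with_fallback(lambda: "ok", budget, "fallback"),
    call_with_fallback(flaky, budget, "fallback"),
    call_with_fallback(lambda: "never runs", budget, "fallback"),
]
print(results, budget.used)
```

A failed call still counts against the budget here, which is the conservative choice when each attempt costs money.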

### When to Use Agentic RAG

**Good Use Cases:**
- Complex multi-step questions
- High-stakes applications requiring accuracy
- Scenarios with ambiguous queries
- Applications benefiting from explainability

**Maybe Not Needed:**
- Simple factual lookups
- Cost-sensitive applications
- Real-time, low-latency requirements
- Well-defined, straightforward queries

### Production Considerations

1. **Use Vector Search**: Implement proper semantic search with embeddings
2. **Caching**: Cache retrieved documents and intermediate results
3. **Parallel Processing**: Run sub-queries in parallel when possible
4. **Streaming**: Stream responses for better user experience
5. **Monitoring**: Track success rates, iteration counts, costs
6. **A/B Testing**: Compare agentic vs. traditional RAG performance
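
Caching and parallel processing combine naturally, as the sketch below suggests. It uses `functools.lru_cache` for repeated queries and a thread pool for concurrent sub-queries. `cached_retrieve` and `retrieve_all` are hypothetical names, and the function body stands in for a network call to a search service.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple

@lru_cache(maxsize=256)
def cached_retrieve(query: str) -> Tuple[str, ...]:
    """Hypothetical retrieval call; returns a tuple so the cached value is immutable."""
    # Stand-in for a network call to a search service
    return (f"result for {query}",)

def retrieve_all(sub_queries: List[str]) -> List[Tuple[str, ...]]:
    """Run sub-queries concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(cached_retrieve, sub_queries))

results = retrieve_all(["what is rag", "what are agents", "what is rag"])
print(results)
print(cached_retrieve.cache_info().currsize)
```

Threads suit this workload because retrieval is I/O-bound. An in-process cache like this one is lost on restart, so a shared cache would be needed across workers.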

### Next Steps

- Implement with real Azure AI Search and vector embeddings
- Add tool usage (calculator, code execution, etc.)
- Integrate with Microsoft Agent Framework
- Build evaluation pipelines to measure improvements
- Explore multi-agent RAG systems

### Additional Resources

- [Agentic Retrieval in Azure AI Search](https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-agentic-retrieval-in-azure-ai-search/4414677)
- [Azure AI Search Documentation](https://learn.microsoft.com/azure/search/)
- [Microsoft Agent Framework](https://learn.microsoft.com/azure/ai-studio/)
- [RAG Patterns and Best Practices](https://learn.microsoft.com/azure/ai-studio/concepts/retrieval-augmented-generation)

## Practice Exercises

Try implementing these enhancements:

1. **Hybrid Search**: Combine keyword and semantic search results
2. **Re-ranking**: Add a re-ranking step after initial retrieval
3. **Citation Generation**: Make the agent cite specific documents in its answer
4. **Confidence Scoring**: Have the agent rate its confidence in the answer
5. **Multi-turn RAG**: Extend to handle follow-up questions with conversation history

In [None]:
# Your practice code here
# Try building one of the enhancements above!