# 🎯 RAG Demo: Retrieval-Augmented Generation

This notebook demonstrates a complete RAG (Retrieval-Augmented Generation) pipeline that combines:
- **Internal Knowledge**: ChromaDB with product catalog
- **External Knowledge**: Web search via SerpAPI Bing
- **AI Generation**: Azure OpenAI GPT-4
- **Quality Evaluation**: Automated response accuracy assessment

## 🏗️ Component Overview

The RAG system combines internal company data with external web information, and it searches the web only when the internal product catalog is not enough to answer the question.

```mermaid
graph LR
    A[User Query] --> B[ChromaDB Search]
    B --> C[Generate Web Queries]
    A --> C[Generate Web Queries]
    C --> D[SerpAPI Bing Search]
    B --> E[Combine Context]
    D --> E
    E --> F["RAG:<br/>Query + Context<br/>-> LLM Response"]
    F --> G[LLM Evaluation]
```
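
The flow in the diagram can be sketched as a small async orchestration function. This is a minimal, hypothetical sketch: every function below is a stand-in stub (not the project's `ChromaDBSearcher`, `WebSearcher`, or `RAGResponseGenerator`), and the rule that decides whether to search the web is a toy substitute for the LLM call. The evaluation step is left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PipelineResult:
    """Hypothetical container for what each stage of the diagram produces."""
    query: str
    internal_context: list[str]
    web_queries: list[str]
    external_context: list[str]
    response: str = ""


async def search_internal(query: str) -> list[str]:
    # Stand-in for the ChromaDB search step.
    return [f"catalog entry matching '{query}'"]


async def generate_web_queries(query: str, internal: list[str]) -> list[str]:
    # Stand-in for the LLM call; this toy rule skips web search for "our ..." questions.
    return [] if " our " in f" {query.lower()} " else [f"{query} reviews"]


async def search_web(web_queries: list[str]) -> list[str]:
    # Stand-in for the SerpAPI Bing step; one fake result per query.
    return [f"web result for '{q}'" for q in web_queries]


async def generate_response(query: str, context: list[str]) -> str:
    # Stand-in for the LLM generation step.
    return f"Answer to '{query}' grounded in {len(context)} context item(s)."


async def run_pipeline(query: str) -> PipelineResult:
    internal = await search_internal(query)
    web_queries = await generate_web_queries(query, internal)
    # Only call the web search when at least one query was generated.
    external = await search_web(web_queries) if web_queries else []
    response = await generate_response(query, internal + external)
    return PipelineResult(query, internal, web_queries, external, response)


if __name__ == "__main__":
    result = asyncio.run(run_pipeline("Tell me about our RainGuard Hiking Jacket product."))
    print(result.web_queries, result.response)
```

Inside a notebook cell, where an event loop is already running, `await run_pipeline(...)` would replace `asyncio.run(...)`.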

In [1]:
import sys
import os
import asyncio
from pathlib import Path

# Add src directory to path
project_root = os.path.join(str(Path.cwd().parent), 'src')
sys.path.insert(0, project_root)

# Import RAG components
from utils.mcp_config import Config
from tools.chroma_search import ChromaDBSearcher
from tools.web_search import WebSearcher
from tools.rag_generator import RAGResponseGenerator
from tools.rag_evaluator import RAGEvaluator

# Initialize configuration
config = Config(environment="local")
print(f"✅ Configuration loaded: {config.environment}")

2025-08-02 18:58:31,463 - root - INFO - ✅ Loaded environment file for 'local' environment


✅ Configuration loaded: local


## 🧪 Demo Queries

We'll test the RAG system with two different scenarios:
1. **External Knowledge Needed**: "What is the best footwear for hiking?" - Requires web search
2. **Internal Knowledge Sufficient**: "Tell me about our RainGuard Hiking Jacket product." - Uses product catalog

In [2]:
# Initialize RAG components
chroma_searcher = ChromaDBSearcher(config)
web_searcher = WebSearcher(config)
rag_generator = RAGResponseGenerator(config)
rag_evaluator = RAGEvaluator(config)

# Define test queries
test_queries = [
    "What is the best footwear for hiking?",  # External knowledge needed
    "Tell me about our RainGuard Hiking Jacket product."  # Internal knowledge sufficient
]

print("🎯 Demo Queries:")
for i, query in enumerate(test_queries, 1):
    print(f"  {i}. {query}")
    
print(f"\n🔧 RAG Components initialized successfully!")

🎯 Demo Queries:
  1. What is the best footwear for hiking?
  2. Tell me about our RainGuard Hiking Jacket product.

🔧 RAG Components initialized successfully!


## Scenario Discussion

- **Scenario 1** involves the user asking a generic question that **goes beyond the scope of the internal data** (a product catalog). In this scenario, external web context may add value, so relevant web queries are generated and executed, and their results are included in the context used to answer the user's question.

- **Scenario 2** involves the user asking a question about a **specific product in the internal catalog**, the RainGuard Hiking Jacket. In this scenario, the system has been instructed to answer questions about internal products from the internal product catalog only. No web queries are generated, and the RAG response is limited to internal data sources.
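
One way the "no web queries for internal products" rule could be expressed is to have the LLM return a JSON object whose `queries` list may be empty. The sketch below is an assumption, not the project's code: the instruction text and the JSON shape are hypothetical (the real prompt lives in the project's `web_search` module), while the field names `priority_rank`, `search_query`, and `purpose` mirror the attributes printed later in this notebook.

```python
import json
from dataclasses import dataclass


@dataclass
class WebQuery:
    priority_rank: int
    search_query: str
    purpose: str


# Hypothetical instruction text, written only for this illustration.
SYSTEM_INSTRUCTION = (
    "Given the user's question and the internal catalog context, return JSON of the form "
    '{"queries": [{"priority_rank": 1, "search_query": "...", "purpose": "..."}]}. '
    'If the question is about a specific internal product, return {"queries": []}.'
)


def parse_web_queries(llm_json: str) -> list[WebQuery]:
    """Parse the model's JSON reply; an empty list means 'skip web search'."""
    payload = json.loads(llm_json)
    queries = [WebQuery(**item) for item in payload.get("queries", [])]
    # Return the queries in priority order so callers can run the most important first.
    return sorted(queries, key=lambda q: q.priority_rank)


if __name__ == "__main__":
    print(parse_web_queries('{"queries": []}'))  # [] means scenario 2: internal only
```

Keeping the decision in the reply's structure (an empty list) rather than in free text makes the "internal only" branch easy to test without calling the model.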

## 🔍 Internal Search - ChromaDB - Local File in MCP Server

The cell below shows how each scenario performs when searching the local Chroma database. The Chroma database is local to the MCP server.

In [3]:
# Demo 1: ChromaDB Internal Search
collection_info = await chroma_searcher.get_collection_info()
print(f"🔍 Searching INTERNAL DATA (ChromaDB). {collection_info['document_count']} documents ingested from https://github.com/Azure-Samples/contoso-chat/blob/main/data/product_info/products.csv)")

internal_context0 = await chroma_searcher.search_chroma(
    query=test_queries[0],
    n_results=2
)
internal_context1 = await chroma_searcher.search_chroma(
    query=test_queries[1],
    n_results=2
)

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
for i, result in enumerate(internal_context0.queries[:3], 1):
    print(f"\n  {i}. Content: {result.content[:100]}...")
    print(f"     Citation: {result.citation}")
    print(f"     Similarity: {result.metadata.get('similarity_score', 'N/A')}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)

for i, result in enumerate(internal_context1.queries[:3], 1):
    print(f"\n  {i}. Content: {result.content[:100]}...")
    print(f"     Citation: {result.citation}")
    print(f"     Similarity: {result.metadata.get('similarity_score', 'N/A')}")



🔍 Searching INTERNAL DATA (ChromaDB). 20 documents ingested from https://github.com/Azure-Samples/contoso-chat/blob/main/data/product_info/products.csv)

  1. Content: ID: 11, Name: TrailWalker Hiking Shoes, Price: $110.0, Category: Hiking Footwear, Brand: TrekReady, ...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 11 | Similarity: 0.180]
     Similarity: 0.18037372827529907

  2. Content: ID: 4, Name: TrekReady Hiking Boots, Price: $140.0, Category: Hiking Footwear, Brand: TrekReady, Des...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 4 | Similarity: 0.117]
     Similarity: 0.11699265241622925

  1. Content: ID: 17, Name: RainGuard Hiking Jacket, Price: $110.0, Category: Hiking Clothing, Brand: MountainStyl...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 17 | Similarity: 0.456]
     Similarity: 0.4556225538253784

  2. Content: ID: 3, Name: Summit Breeze Jacket, Price: $120.0, Category: Hiking Clothing, B
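
The citation strings in the output above follow a fixed bracketed pattern, with the similarity rounded to three decimals. A minimal sketch of a formatter that reproduces that pattern is shown here; the function name and signature are hypothetical and are not taken from the project's `chroma_search` module.

```python
def format_chroma_citation(collection: str, doc_id: str, similarity: float) -> str:
    """Build a citation string in the bracketed style printed by the search cell."""
    return (
        f"[Source: ChromaDB | Collection: {collection} "
        f"| ID: {doc_id} | Similarity: {similarity:.3f}]"
    )


if __name__ == "__main__":
    print(format_chroma_citation("product_collection", "11", 0.18037372827529907))
    # [Source: ChromaDB | Collection: product_collection | ID: 11 | Similarity: 0.180]
```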

## 🌐 External Search - Bing via SerpAPI

Run the cell below to see how each scenario is handled for external search.

- Scenario 1 has web search queries generated.

- Scenario 2, which is specific to an internal product, has no queries generated. This is enforced by the instructions given to the LLM call that generates web queries from the user's question and the internal context provided.

In [4]:
# Demo 2: Intelligent Web Search
print("🌐 Searching EXTERNAL DATA (Bing) with Intelligent Web Query Generation...")

generated_queries0 = await web_searcher.get_web_search_queries(
    user_query=test_queries[0],
    internal_context=internal_context0.to_list()
)
generated_queries1 = await web_searcher.get_web_search_queries(
    user_query=test_queries[1],
    internal_context=internal_context1.to_list()
)

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
print(f"Generated {len(generated_queries0.queries)} web queries:")
for query in generated_queries0.queries:
    print(f"  Priority {query.priority_rank}: {query.search_query}")
    print(f"  Purpose: {query.purpose}\n")
print("="*20)
# Execute web search if queries were generated
print(f"Executing generated queries asynchronously: {len(generated_queries0.queries)} queries")
if generated_queries0.queries:
    print("🔍 Executing Web Search...")
    web_results = await web_searcher.search_serpapi_bing_with_generated_queries(
        generated_queries=generated_queries0,
        n_results=2
    )
    print(f"Returned web results (external context): {len(web_results)} results:")
    for i, result in enumerate(web_results.queries, 1):
        print(f"\n  {i}. Content: {result.content[:100]}...")
        print(f"     Citation: {result.citation}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)
print(f"Generated {len(generated_queries1.queries)} web queries:")
for query in generated_queries1.queries:
    print(f"  Priority {query.priority_rank}: {query.search_query}")
    print(f"  Purpose: {query.purpose}\n")
print("⏭️ !!!! No search queries should be generated - internal context is sufficient !!!")
print("See the prompt in the web_search module for instructions that enforce this.")

🌐 Searching EXTERNAL DATA (Bing) with Intelligent Web Query Generation...
Generated 2 web queries:
  Priority 1: best hiking footwear for different terrains and weather conditions
  Purpose: To compare various hiking footwear options and their suitability for diverse hiking environments.

  Priority 2: reviews of top hiking shoes and boots in 2025
  Purpose: To gather up-to-date reviews and recommendations for hiking footwear.

Executing generated queries asynchronously: 2 queries
🔍 Executing Web Search...
Returned web results (external context): 4 results:

  1. Content: Top performers include the Theora Pro HF Ergonomic for wide feet, Merrell Moab 2 Vent for comfort, K...
     Citation: [Source: Bing Search | Link: https://backpackingguys.com/best-shoes-for-hike/ | Position: 1]

  2. Content: Ultimately, the best hiking footwear is the one that fits your body, trail conditions, and experienc...
     Citation: [Source: Bing Search | Link: https://hikingequipped.com/choosing-the-right-
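
The output above reports that two generated queries were executed asynchronously and returned four results (`n_results=2` per query). A common way to get that behavior is `asyncio.gather`. The sketch below assumes a hypothetical `fake_bing_search` stub in place of a real SerpAPI request, with placeholder `example.com` links; it is not the project's `search_serpapi_bing_with_generated_queries` implementation.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class WebResult:
    content: str
    citation: str


async def fake_bing_search(search_query: str, n_results: int) -> list[WebResult]:
    # Stand-in for a SerpAPI Bing request; sleeps briefly to mimic network latency.
    await asyncio.sleep(0.01)
    return [
        WebResult(
            content=f"snippet {pos} for '{search_query}'",
            citation=f"[Source: Bing Search | Link: https://example.com/{pos} | Position: {pos}]",
        )
        for pos in range(1, n_results + 1)
    ]


async def search_all(search_queries: list[str], n_results: int = 2) -> list[WebResult]:
    # Launch every search at once, then flatten the per-query lists in query order.
    per_query = await asyncio.gather(
        *(fake_bing_search(q, n_results) for q in search_queries)
    )
    return [result for results in per_query for result in results]


if __name__ == "__main__":
    results = asyncio.run(search_all(["hiking footwear terrains", "hiking shoe reviews"]))
    print(len(results))  # 4
```

`asyncio.gather` returns results in the order the awaitables were passed, so the flattened list keeps the priority order of the generated queries.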

## 🤖 Explore the RAG Response

- Run the cell below, then explore each RAG response as you like.
- Consider how applications might display the response and its citations.

**🚨 Important:** The cell below does not reuse the context found in the earlier cells. It runs the entire RAG response function end to end and may retrieve different context.
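
To make the "combine context" step concrete, here is a minimal sketch of how a question and the retrieved context could be joined into one prompt with citations attached. The `ContextItem` class, the `build_rag_prompt` function, and the instruction wording are hypothetical; only the `content` and `citation` attribute names come from the results printed earlier in this notebook.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    content: str
    citation: str


def build_rag_prompt(
    user_query: str, internal: list[ContextItem], external: list[ContextItem]
) -> str:
    """Join the question and all retrieved context into one prompt string."""
    lines = ["Answer the question using only the context below. Cite sources inline.", ""]
    for label, items in (("Internal context", internal), ("External context", external)):
        lines.append(f"{label}:")
        if not items:
            lines.append("  (none)")  # makes an internal-only prompt explicit
        for item in items:
            lines.append(f"  - {item.content} {item.citation}")
        lines.append("")
    lines.append(f"Question: {user_query}")
    return "\n".join(lines)


if __name__ == "__main__":
    jacket = ContextItem("RainGuard Hiking Jacket, $110.0", "[Source: ChromaDB | ID: 17]")
    print(build_rag_prompt("Tell me about our RainGuard Hiking Jacket product.", [jacket], []))
```

Keeping each citation next to its content gives the model what it needs to cite sources inline, which is what the responses below appear to do.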

In [8]:
print("🤖 Generating an LLM response to the test user queries ...\n")

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
rag_response0 = await rag_generator.generate_chat_response(
            user_query=test_queries[0],
            n_chroma_results=3,
            n_web_results=3
        )
print("🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response0` variable")
print(rag_response0["response"])

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)
rag_response1 = await rag_generator.generate_chat_response(
            user_query=test_queries[1],
            n_chroma_results=3,
            n_web_results=3
        )
print("🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response1` variable")
print(rag_response1["response"])

🤖 Generating an LLM response to the test user queries ...

🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response0` variable
The best footwear for hiking depends on the type of terrain, weather conditions, and the level of support and comfort you need. Here's a breakdown of the options and their suitability:

---

### **1. Hiking Shoes**
Hiking shoes, like the **TrailWalker Hiking Shoes** by TrekReady, are a versatile choice for most hikers. They are lightweight, breathable, and provide excellent traction and support for moderate trails. Key features include:
- **Waterproofing**: Ideal for wet conditions.
- **Comfort**: Cushioned insoles and breathable mesh materials.
- **Durability**: Reinforced toe caps and mudguards for rugged terrain.
- **Best For**: Day hikes, well-maintained trails, and hikers who prefer lighter footwear.

**Internal Knowledge Example**: The TrailWalker Hiking Shoes are designed for all-weather durability and comfort, making them a grea

Explore rag_response0 - INTERNAL + EXTERNAL What is the best footwear for hiking?

In [9]:
rag_response0

{'user_query': 'What is the best footwear for hiking?',
 'response': "The best footwear for hiking depends on the type of terrain, weather conditions, and the level of support and comfort you need. Here's a breakdown of the options and their suitability:\n\n---\n\n### **1. Hiking Shoes**\nHiking shoes, like the **TrailWalker Hiking Shoes** by TrekReady, are a versatile choice for most hikers. They are lightweight, breathable, and provide excellent traction and support for moderate trails. Key features include:\n- **Waterproofing**: Ideal for wet conditions.\n- **Comfort**: Cushioned insoles and breathable mesh materials.\n- **Durability**: Reinforced toe caps and mudguards for rugged terrain.\n- **Best For**: Day hikes, well-maintained trails, and hikers who prefer lighter footwear.\n\n**Internal Knowledge Example**: The TrailWalker Hiking Shoes are designed for all-weather durability and comfort, making them a great all-around option for hiking ([Source: Internal Knowledge Base, ID: 1

Explore rag_response1 - INTERNAL ONLY Tell me about our RainGuard Hiking Jacket product.

In [11]:
rag_response1

{'user_query': 'Tell me about our RainGuard Hiking Jacket product.',
 'response': "The **RainGuard Hiking Jacket** is a premium outdoor gear product designed by **MountainStyle**, specifically tailored for hiking, camping, trekking, and other outdoor adventures. Here’s a detailed overview of its features and benefits:\n\n### Key Features:\n1. **Weatherproof Comfort**:\n   - Made with **waterproof and breathable fabric**, the RainGuard Hiking Jacket ensures you stay dry and comfortable in wet conditions.\n   - It includes an **adjustable hood** to provide a customizable fit against wind and rain.\n\n2. **Durability**:\n   - The jacket boasts **rugged construction**, making it highly durable and capable of withstanding tough outdoor conditions.\n\n3. **Convenience**:\n   - Equipped with **multiple pockets**, it offers safe and convenient storage for your essentials.\n   - **Adjustable cuffs and hem** allow you to tailor the fit to your needs while on the move.\n\n4. **Ventilation**:\n   

## 👩‍🔬 Add Evaluations Logic

- To right-size user expectations about accuracy, consider returning an accuracy score with results.
- Consider whether every response should be evaluated for accuracy, or whether users should opt in via the MCP dynamic tools.

**🚨 Important:** The evaluation logic embedded in this solution is for demonstration purposes only. It is recommended to use a content moderation service provider, such as Azure AI Content Safety, that specializes in evaluations.

In [14]:
print("👩‍🔬 Evaluating the LLM responses to the test user queries ...\n")

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
evaluation0 = await rag_evaluator.evaluate_rag_response(rag_response0)
print(f"\n📈 Evaluation Results:")
print(f"   Accuracy Score: {evaluation0.accuracy_score:.2f}")
print(f"   Hallucination Risk: {'High' if evaluation0.is_hallucination else 'Low'}")
print(f"   Confidence: {evaluation0.confidence_level}")
print(f"   Supported Claims: {len(evaluation0.supported_claims)}")
print(f"   Unsupported Claims: {len(evaluation0.unsupported_claims)}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)
evaluation1 = await rag_evaluator.evaluate_rag_response(rag_response1)
print(f"\n📈 Evaluation Results:")
print(f"   Accuracy Score: {evaluation1.accuracy_score:.2f}")
print(f"   Hallucination Risk: {'High' if evaluation1.is_hallucination else 'Low'}")
print(f"   Confidence: {evaluation1.confidence_level}")
print(f"   Supported Claims: {len(evaluation1.supported_claims)}")
print(f"   Unsupported Claims: {len(evaluation1.unsupported_claims)}")

👩‍🔬 Evaluating the LLM responses to the test user queries ...


📈 Evaluation Results:
   Accuracy Score: 0.85
   Hallucination Risk: High
   Confidence: High
   Supported Claims: 5
   Unsupported Claims: 3

📈 Evaluation Results:
   Accuracy Score: 0.95
   Hallucination Risk: Low
   Confidence: High
   Supported Claims: 11
   Unsupported Claims: 1
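
The claim counts printed above allow a simple cross-check alongside the evaluator's own score. The sketch below computes the fraction of claims marked as supported. It is an illustrative, hypothetical helper and is not how the notebook's `accuracy_score` is produced (that value comes from the LLM-based evaluator), so the two numbers can differ: scenario 1 reports an accuracy of 0.85 while 5 of its 8 claims are supported.

```python
def claim_support_ratio(supported: int, unsupported: int) -> float:
    """Fraction of extracted claims that the evaluator marked as supported."""
    total = supported + unsupported
    if total == 0:
        return 0.0  # no claims extracted; avoid dividing by zero
    return supported / total


if __name__ == "__main__":
    print(round(claim_support_ratio(5, 3), 3))   # 0.625 (scenario 1 counts)
    print(round(claim_support_ratio(11, 1), 3))  # 0.917 (scenario 2 counts)
```

A large gap between this ratio and the reported accuracy score may be a useful signal that a response deserves a closer look.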


## 🎮 Interactive Demo

Try your own queries with the RAG system!

In [25]:
my_question = ""

if my_question == "":
    print("❓ Please enter your test question above")
else:
    print("🎮 Testing the end-to-end flow with my_question")
    print("="*100)
    print(f"{'='*20} My question: {my_question} {'='*20}")
    print("="*100)

    # Generate a RAG response and evaluate it in a single call
    my_test_response = await rag_evaluator.generate_chat_response_and_evaluate(
        user_query=my_question,
        n_chroma_results=10,
        n_web_results=10
    )

    # And, now, return the results
    print(f"""The LLM's response to my question:
            {my_test_response['rag_response']}""")
    # The value printed here is the evaluator's confidence level
    print(f"""The confidence level of the evaluation:
            {my_test_response['evaluation']['confidence_level']}""")

❓ Please enter your test question above


Or inspect the entire contents of the output:

In [None]:
my_test_response