# 🎯 RAG Demo: Retrieval-Augmented Generation

This notebook demonstrates a complete RAG (Retrieval-Augmented Generation) pipeline that combines:
- **Internal Knowledge**: ChromaDB with product catalog
- **External Knowledge**: Web search via SerpAPI Bing
- **AI Generation**: Azure OpenAI GPT-4
- **Quality Evaluation**: Automated response accuracy assessment

## 🏗️ Component Overview

The RAG system intelligently combines internal company data with external web information to provide comprehensive, accurate responses.

```mermaid
graph LR
    A[User Query] --> B[ChromaDB Search]
    B --> C[Generate Web Queries]
    A --> C[Generate Web Queries]
    C --> D[SerpAPI Bing Search]
    B --> E[Combine Context]
    D --> E
    E --> F["RAG: Query + Context -> LLM Response"]
    F --> G[LLM Evaluation]
```
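
The flow in the diagram can be sketched in a few lines of Python. This is a minimal sketch, not the project's implementation: `search_internal`, `generate_web_queries`, `search_web`, and `generate_answer` are hypothetical stand-ins for the ChromaDB, SerpAPI, and Azure OpenAI calls, and the rule used to skip web queries is an assumption made only for illustration. The evaluation step is omitted.

```python
import asyncio

# Hypothetical stand-ins for the real components (ChromaDB, SerpAPI, Azure OpenAI).
async def search_internal(query: str) -> list[dict]:
    return [{"content": f"catalog entry matching '{query}'"}]

async def generate_web_queries(query: str, internal: list[dict]) -> list[str]:
    # Assumption for this sketch: questions about "our" products stay internal.
    return [] if "our" in query.lower().split() else [f"{query} reviews"]

async def search_web(web_query: str) -> list[dict]:
    return [{"content": f"web result for '{web_query}'"}]

async def generate_answer(query: str, context: list[dict]) -> str:
    return f"Answer to '{query}' grounded in {len(context)} context item(s)."

async def run_pipeline(query: str) -> dict:
    internal = await search_internal(query)                     # A -> B
    web_queries = await generate_web_queries(query, internal)   # A, B -> C
    # C -> D: run all generated web queries concurrently
    web_batches = await asyncio.gather(*(search_web(q) for q in web_queries))
    # B, D -> E: merge internal and external context
    context = internal + [r for batch in web_batches for r in batch]
    response = await generate_answer(query, context)            # E -> F
    return {"user_query": query, "response": response, "context": context}

result = asyncio.run(run_pipeline("What is the best footwear for hiking?"))
print(result["response"])
```

Inside a notebook, where an event loop is already running, use `await run_pipeline(...)` instead of `asyncio.run(...)`.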

Set the Python path so the notebook works when run from either `./sk_mcp_demo` or `./sk_mcp_demo/mcp_rag`.

In [3]:
import sys
import os
import asyncio
import time
from pathlib import Path
# Detect and set up proper path to src directory
current_dir = Path.cwd()
# Check if we're in the root folder
if 'mcp_rag' in os.listdir(current_dir):
    # Running from root folder (sk_mcp_demo)
    project_root = os.path.join(str(current_dir), 'mcp_rag', 'src')
else:
    # Running from mcp_rag folder or subfolder
    mcp_rag_dir = current_dir
    while mcp_rag_dir.name != 'mcp_rag' and mcp_rag_dir != mcp_rag_dir.parent:
        mcp_rag_dir = mcp_rag_dir.parent
    project_root = os.path.join(str(mcp_rag_dir), 'src')

# Add src directory to path if it's not already there
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"Added {project_root} to Python path")

In [None]:
# Import RAG components
from utils import McpConfig
from tools.chroma_search import ChromaDBSearcher
from tools.web_search import WebSearcher
from tools.rag_generator import RAGResponseGenerator
from tools.rag_evaluator import RAGEvaluator

# Initialize configuration
config = McpConfig(environment="local")
print(f"✅ Configuration loaded: {config.environment}")
print(config)

2025-08-08 08:22:27,388 - root - INFO - ✅ Loaded environment file for 'local': c:\Users\aprilhazel\Source\sk_mcp_demo\mcp_rag\.env.local


✅ Configuration loaded: local


## 🧪 Demo Queries

We'll test the RAG system with two different scenarios:
1. **External Knowledge Needed**: "What is the best footwear for hiking?" - Requires web search
2. **Internal Knowledge Sufficient**: "Tell me about our RainGuard Hiking Jacket" - Uses product catalog

In [5]:
# Initialize RAG components
chroma_searcher = ChromaDBSearcher(config)
web_searcher = WebSearcher(config)
rag_generator = RAGResponseGenerator(config)
rag_evaluator = RAGEvaluator(config)

# Define test queries
test_queries = [
    "What is the best footwear for hiking?",  # External knowledge needed
    "Tell me about our RainGuard Hiking Jacket product."  # Internal knowledge sufficient
]

print("🎯 Demo Queries:")
for i, query in enumerate(test_queries, 1):
    print(f"  {i}. {query}")
    
print(f"\n🔧 RAG Components initialized successfully!")

🎯 Demo Queries:
  1. What is the best footwear for hiking?
  2. Tell me about our RainGuard Hiking Jacket product.

🔧 RAG Components initialized successfully!


## Scenario Discussion

- **Scenario 1**: the user asks a general question that *goes beyond the scope of the internal data* (a product catalog). External web context may add value here, so relevant web queries are generated and executed, and their results are included in the context used to answer the user's question.

- **Scenario 2**: the user asks about a *specific product in the internal catalog*, the RainGuard Hiking Jacket. The system is instructed to answer questions about internal products from the internal product catalog only. No web queries are generated, so the RAG response is limited to internal data sources.

## 🔍 Internal Search - Chroma DB - Local File in MCP Server

The cell below shows how each scenario performs when searching the Chroma database, which is local to the MCP server.
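
To build intuition for what a similarity search returns, here is a toy, self-contained sketch. It is not how `ChromaDBSearcher` or Chroma works internally: Chroma ranks documents using learned embedding vectors, whereas this example uses plain word counts and cosine similarity. The function names (`embed`, `cosine`, `search`) are illustrative only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (real systems use learned vectors).
    cleaned = text.lower().replace(",", " ").replace("?", " ")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, documents: list[str], n_results: int = 2) -> list[tuple[float, str]]:
    # Score every document against the query and keep the best n_results.
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    return sorted(scored, reverse=True)[:n_results]

docs = [
    "TrailWalker Hiking Shoes, Category: Hiking Footwear",
    "RainGuard Hiking Jacket, Category: Hiking Clothing",
    "Summit Breeze Jacket, Category: Hiking Clothing",
]
top = search("best footwear for hiking", docs, n_results=2)
for score, doc in top:
    print(f"{score:.3f}  {doc}")
```

The footwear entry ranks first because it shares the most terms with the query, which mirrors the idea behind the similarity values printed by the cell below.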

In [6]:
# Demo 1: ChromaDB Internal Search
collection_info = await chroma_searcher.get_collection_info()
print(f"🔍 Searching INTERNAL DATA (ChromaDB). {collection_info['document_count']} documents ingested from https://github.com/Azure-Samples/contoso-chat/blob/main/data/product_info/products.csv")

internal_context0 = await chroma_searcher.search_chroma(
    query=test_queries[0],
    n_results=2
)
internal_context1 = await chroma_searcher.search_chroma(
    query=test_queries[1],
    n_results=2
)

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)

for i, result in enumerate(internal_context0[:3], 1):
    print(f"\n  {i}. Content: {result['content'][:100]}...")
    print(f"     Citation: {result['citation']}")
    print(f"     Similarity: {result['metadata'].get('similarity_score', 'N/A')}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)

for i, result in enumerate(internal_context1[:3], 1):
    print(f"\n  {i}. Content: {result['content'][:100]}...")
    print(f"     Citation: {result['citation']}")
    print(f"     Similarity: {result['metadata'].get('similarity_score', 'N/A')}")

🔍 Searching INTERNAL DATA (ChromaDB). 20 documents ingested from https://github.com/Azure-Samples/contoso-chat/blob/main/data/product_info/products.csv

  1. Content: ID: 11, Name: TrailWalker Hiking Shoes, Price: $110.0, Category: Hiking Footwear, Brand: TrekReady, ...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 11 | Similarity: 0.180]
     Similarity: 0.18037378787994385

  2. Content: ID: 4, Name: TrekReady Hiking Boots, Price: $140.0, Category: Hiking Footwear, Brand: TrekReady, Des...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 4 | Similarity: 0.117]
     Similarity: 0.11699259281158447

  1. Content: ID: 17, Name: RainGuard Hiking Jacket, Price: $110.0, Category: Hiking Clothing, Brand: MountainStyl...
     Citation: [Source: ChromaDB | Collection: product_collection | ID: 17 | Similarity: 0.456]
     Similarity: 0.4556226134300232

  2. Content: ID: 3, Name: Summit Breeze Jacket, Price: $120.0, Category: Hiking Clothing, B

In [7]:
internal_context0

[{'query': 'What is the best footwear for hiking?',
  'content': "ID: 11, Name: TrailWalker Hiking Shoes, Price: $110.0, Category: Hiking Footwear, Brand: TrekReady, Description: Meet the TrekReady TrailWalker Hiking Shoes, the ideal companion for all your outdoor adventures. Constructed with synthetic leather and breathable mesh, these shoes are tough as nails yet surprisingly airy. Their cushioned insoles offer fabulous comfort for long hikes, while the supportive midsoles and traction outsoles with multidirectional lugs ensure stability and excellent grip. A quick-lace system, padded collar and tongue, and reflective accents make these shoes a dream to wear. From combating rough terrain with the reinforced toe cap and heel, to keeping off trail debris with the protective mudguard, the TrailWalker Hiking Shoes have you covered. These waterproof warriors are made to endure all weather conditions. But they're not just about being rugged, they're light as a feather too, minimizing fatig

## 🌐 External Search - Bing via Serp API

Run the cell below to see how each scenario is handled for external search.

- Scenario 1: web search queries are generated.

- Scenario 2: no queries are generated, because the question is specific to an internal product. This is enforced by the instructions in the LLM call that generates web queries from the user's question and the internal context provided.
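
The cell output below reports that the generated queries are executed asynchronously. One way this could look is sketched here, assuming query dicts with the `priority_rank`, `search_query`, and `purpose` keys printed by the notebook. `fake_search` and `run_queries` are hypothetical names; the real `WebSearcher` implementation is not shown in this notebook and may differ.

```python
import asyncio

async def fake_search(search_query: str) -> list[dict]:
    # Hypothetical stand-in for a SerpAPI call; returns one canned result.
    await asyncio.sleep(0)
    return [{"content": f"result for {search_query}"}]

async def run_queries(queries: list[dict]) -> list[dict]:
    # No generated queries means internal context was judged sufficient.
    if not queries:
        return []
    # Run in priority order; gather() preserves the order of its arguments.
    ordered = sorted(queries, key=lambda q: q["priority_rank"])
    batches = await asyncio.gather(*(fake_search(q["search_query"]) for q in ordered))
    return [result for batch in batches for result in batch]

queries = [
    {"priority_rank": 2,
     "search_query": "comparison of hiking boots and hiking shoes for trails",
     "purpose": "Compare boots and shoes for various terrains."},
    {"priority_rank": 1,
     "search_query": "best hiking footwear options 2025",
     "purpose": "Identify recommended hiking footwear."},
]
results = asyncio.run(run_queries(queries))
for r in results:
    print(r["content"])
```

In a notebook cell, replace `asyncio.run(run_queries(queries))` with `await run_queries(queries)`.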

In [8]:
# Demo 2: Intelligent Web Search
print("🌐 Searching EXTERNAL DATA (Bing) with Intelligent Web Query Generation...")

generated_queries0 = await web_searcher._get_web_search_queries(
    user_query=test_queries[0],
    internal_context=internal_context0  # already a list of result dicts
)
generated_queries1 = await web_searcher._get_web_search_queries(
    user_query=test_queries[1],
    internal_context=internal_context1  # already a list of result dicts
)

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
print(f"Generated {len(generated_queries0)} web queries:")  # generated_queries0 is a list
for query in generated_queries0:  # Iterate directly over the list
    print(f"  Priority {query['priority_rank']}: {query['search_query']}")
    print(f"  Purpose: {query['purpose']}\n")
print("="*20)
# Execute web search if queries were generated
print(f"Executing generated queries asynchronously: {len(generated_queries0)} queries")
if generated_queries0:  # Check if list is not empty
    print("🔍 Executing Web Search...")
    web_results = await web_searcher.search_bing_with_chat_and_context(
        user_query=test_queries[0],
        internal_context=internal_context0 
    )
    print(f"Returned web results (external context): {len(web_results)} results:")
    for i, result in enumerate(web_results, 1):  # web_results is a list, not SearchResults object
        print(f"\n  {i}. Content: {result['content'][:100]}...")
        print(f"     Citation: {result['citation']}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)
print(f"Generated {len(generated_queries1)} web queries:")  # generated_queries1 is a list
for query in generated_queries1:  # Iterate directly over the list
    print(f"  Priority {query['priority_rank']}: {query['search_query']}")
    print(f"  Purpose: {query['purpose']}\n")
print("⏭️ !!!! No search queries should be generated - internal context is sufficient !!!")
print("See the prompt in the web_search module for instructions that enforce this.")

🌐 Searching EXTERNAL DATA (Bing) with Intelligent Web Query Generation...
Generated 3 web queries:
  Priority 1: best hiking footwear options 2025
  Purpose: To identify the latest and most recommended hiking footwear available in 2025.

  Priority 2: comparison of hiking boots and hiking shoes for trails
  Purpose: To explore the differences and suitability of hiking boots versus hiking shoes for various terrains.

  Priority 3: top-rated waterproof hiking footwear
  Purpose: To find highly-rated waterproof hiking footwear options for challenging weather conditions.

Executing generated queries asynchronously: 3 queries
🔍 Executing Web Search...
Returned web results (external context): 15 results:

  1. Content: In this field‑tested roundup, you’ll discover top picks that excel in trail running scenarios, deliv...
     Citation: [Source: Bing Search | Link: https://outdoortrekker.com/footwear/hiking-footwear/best-hiking-shoes/ | Position: 1]

  2. Content: In this guide, we explain ho

## 🤖 Explore the RAG Response

- Run the cell below, then explore each RAG response as you like.
- Consider how applications might display the response and its citations.

**🚨 Important:** The cell below does not reuse the context found above. Instead, it runs the entire RAG response function end to end and may retrieve different context.
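
As one idea for displaying citations, an application could replace inline source markers with numbered footnotes and list the sources separately. The sketch below assumes markers of the form `[Source: ...]`, as seen in the truncated responses in this notebook; the helper name `split_citations` and the sample text are illustrative, not part of the project.

```python
import re

# Assumption: citations appear inline as "[Source: ...]".
CITATION_PATTERN = re.compile(r"\[Source:\s*([^\]]+)\]")

def split_citations(response_text: str) -> tuple[str, list[str]]:
    """Replace inline citations with [n] markers and return the unique sources."""
    citations: list[str] = []

    def to_footnote(match: re.Match) -> str:
        source = match.group(1).strip()
        if source not in citations:
            citations.append(source)
        return f"[{citations.index(source) + 1}]"

    body = CITATION_PATTERN.sub(to_footnote, response_text)
    return body, citations

# Sample text for illustration only.
text = (
    "Hiking shoes suit moderate trails. [Source: Internal Knowledge Base, ID: 11] "
    "Boots suit rugged terrain. [Source: Internal Knowledge Base, ID: 4] "
    "The shoes are also lightweight. [Source: Internal Knowledge Base, ID: 11]"
)
body, citations = split_citations(text)
print(body)
for number, source in enumerate(citations, 1):
    print(f"[{number}] {source}")
```

Repeated sources share one footnote number, which keeps the rendered answer short when the same product is cited several times.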

In [9]:
print("🤖 Generating an LLM response to the test user queries ...\n")

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)
rag_response0 = await rag_generator.generate_chat_response(
            user_query=test_queries[0],
            n_chroma_results=3,
            n_web_results=3
        )
print("🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response0` variable")
print(rag_response0["response"])

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)
rag_response1 = await rag_generator.generate_chat_response(
            user_query=test_queries[1],
            n_chroma_results=3,
            n_web_results=3
        )
print("🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response1` variable")
print(rag_response1["response"])

🤖 Generating an LLM response to the test user queries ...

🤖 Generated RAG response, use one of the below cells to evaluate the `rag_response0` variable
The best footwear for hiking depends on several factors, including the type of terrain, weather conditions, and your personal comfort preferences. Here’s a breakdown of the options based on internal knowledge and external sources:

### **1. Hiking Shoes**
Hiking shoes, like the **TrailWalker Hiking Shoes** ($110.00), are ideal for moderate trails and day hikes. They offer:
- **Lightweight design**: Reduces fatigue during long hikes.
- **Breathability**: Keeps feet cool and dry.
- **Traction and stability**: Multidirectional lugs on outsoles ensure grip on uneven terrain.
- **Waterproofing**: Suitable for wet conditions.
- **Customization**: Removable insoles and multiple size options for a perfect fit.
These shoes are a great choice for hikers who want a balance of comfort, durability, and versatility. [Source: Internal Knowledge Base, 

Explore rag_response0 - INTERNAL + EXTERNAL What is the best footwear for hiking?

In [10]:
rag_response0

{'user_query': 'What is the best footwear for hiking?',
 'response': 'The best footwear for hiking depends on several factors, including the type of terrain, weather conditions, and your personal comfort preferences. Here’s a breakdown of the options based on internal knowledge and external sources:\n\n### **1. Hiking Shoes**\nHiking shoes, like the **TrailWalker Hiking Shoes** ($110.00), are ideal for moderate trails and day hikes. They offer:\n- **Lightweight design**: Reduces fatigue during long hikes.\n- **Breathability**: Keeps feet cool and dry.\n- **Traction and stability**: Multidirectional lugs on outsoles ensure grip on uneven terrain.\n- **Waterproofing**: Suitable for wet conditions.\n- **Customization**: Removable insoles and multiple size options for a perfect fit.\nThese shoes are a great choice for hikers who want a balance of comfort, durability, and versatility. [Source: Internal Knowledge Base, ID: 11]\n\n### **2. Hiking Boots**\nFor more rugged terrain or multi-day 

Explore rag_response1 - INTERNAL ONLY Tell me about our RainGuard Hiking Jacket product.

In [11]:
rag_response1

{'user_query': 'Tell me about our RainGuard Hiking Jacket product.',
 'response': "The **RainGuard Hiking Jacket** is a premium outdoor gear designed by **MountainStyle** to provide comfort and protection during various outdoor activities such as hiking, camping, and trekking. Priced at **$110**, this jacket offers a combination of functionality, durability, and convenience.\n\n### Key Features:\n1. **Weatherproof Design**:\n   - Made with **waterproof and breathable fabric**, the RainGuard Hiking Jacket ensures you stay dry and comfortable even in rainy conditions.\n   - The adjustable hood provides a customizable fit to shield you from wind and rain.\n\n2. **Durability**:\n   - Its rugged construction guarantees long-lasting use, making it ideal for challenging environments.\n\n3. **Convenience**:\n   - Equipped with **multiple pockets**, it offers safe and convenient storage for your essentials.\n   - Adjustable cuffs and hem allow you to tailor the fit to your preference.\n\n4. **V

## 👩‍🔬 Add Evaluations Logic

- To set realistic user expectations about accuracy, consider returning an accuracy score with each result.
- Consider whether every response should be evaluated for accuracy, or whether users should opt in via the MCP dynamic tools.

**🚨 Important:** The evaluation logic embedded in this solution is for demonstration purposes only. For production use, a specialized content moderation service, such as Azure AI Content Safety, is recommended.
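
If an accuracy score is returned with results, an application needs a compact way to show it. The sketch below formats an evaluation dict into a single line, assuming the keys that appear in this notebook's output (`accuracy_score`, `confidence_level`, `supported_claims`, `unsupported_claims`). The helper `format_evaluation` and the `warn_below` threshold are hypothetical choices for illustration.

```python
def format_evaluation(evaluation: dict, warn_below: float = 0.8) -> str:
    """Summarize an evaluation result in one user-facing line."""
    score = evaluation["accuracy_score"]
    supported = len(evaluation.get("supported_claims", []))
    unsupported = len(evaluation.get("unsupported_claims", []))
    line = (
        f"Accuracy {score:.2f} ({evaluation['confidence_level']} confidence; "
        f"{supported} supported, {unsupported} unsupported claims)"
    )
    # Flag answers with a low score or any unsupported claim.
    if score < warn_below or unsupported:
        line += " - verify before relying on this answer"
    return line

# Sample values shaped like the evaluation output in this notebook.
sample = {
    "accuracy_score": 0.9,
    "confidence_level": "High",
    "supported_claims": ["a", "b", "c", "d"],
    "unsupported_claims": ["e"],
}
summary = format_evaluation(sample)
print(summary)
```

Whether to warn on any unsupported claim or only below a score threshold is a product decision; both conditions are combined here only to show the shape of the check.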

In [12]:
print("👩‍🔬 Generating RAG responses with integrated evaluation ...\n")

print("="*100)
print(f"{'='*20} INTERNAL + EXTERNAL {test_queries[0]} {'='*20}")
print("="*100)

try: 
    start = time.time()
    evaluation0 = await rag_evaluator.evaluate_rag_response(
            rag_response = rag_response0
        )
    elapsed = time.time() - start
    print(f"\n📈 Evaluation Results:")
    print(f"   Accuracy Score: {evaluation0['accuracy_score']:.2f}")
    print(f"   Confidence: {evaluation0['confidence_level']}")
    print(f"   Supported Claims: {len(evaluation0['supported_claims'])}")
    print(f"   Unsupported Claims: {len(evaluation0['unsupported_claims'])}")
    
except Exception as e:
    print(f"👩‍🔬❌ Error in RAG response with evaluation test: {e}")

print("="*100)
print(f"{'='*20} INTERNAL ONLY {test_queries[1]} {'='*20}")
print("="*100)

try: 
    start = time.time()
    evaluation1 = await rag_evaluator.evaluate_rag_response(
        rag_response = rag_response1
    )
    elapsed = time.time() - start
    print(f"\n📈 Evaluation Results:")
    print(f"   Accuracy Score: {evaluation1['accuracy_score']:.2f}")
    print(f"   Confidence: {evaluation1['confidence_level']}")
    print(f"   Supported Claims: {len(evaluation1['supported_claims'])}")
    print(f"   Unsupported Claims: {len(evaluation1['unsupported_claims'])}")
    
except Exception as e:
    print(f"👩‍🔬❌ Error in RAG response with evaluation test: {e}")

👩‍🔬 Generating RAG responses with integrated evaluation ...


📈 Evaluation Results:
   Accuracy Score: 0.90
   Confidence: High
   Supported Claims: 4
   Unsupported Claims: 1

📈 Evaluation Results:
   Accuracy Score: 1.00
   Confidence: High
   Supported Claims: 12
   Unsupported Claims: 0


In [13]:
evaluation0

{'accuracy_score': 0.9,
 'evaluation_reasoning': 'The AI-generated answer is mostly supported by the provided context. It accurately describes the features and suitability of the TrailWalker Hiking Shoes, TrekReady Hiking Boots, and TrekStar Hiking Sandals based on the internal knowledge base. The external recommendations are general and align with the context provided, though they are not directly cited in detail.',
 'supported_claims': ['TrailWalker Hiking Shoes are lightweight, breathable, have traction and stability, are waterproof, and customizable.',
  'TrekReady Hiking Boots are durable, provide ankle support, have traction, and are comfortable.',
  'TrekStar Hiking Sandals are breathable, have adjustable straps, toe protection, and are lightweight.',
  'External sources emphasize choosing footwear based on trail conditions and personal fit.'],
 'unsupported_claims': ['External sources highlight durability and water resistance for creek crossings and loose terrain (not directly 

## 🎮 Interactive Demo

Try your own queries with the RAG system!

In [14]:
my_question = ""

my_rag_response = None
my_eval_outcome = None
if my_question == "":
    print("❓ Please enter your test question above")

else: 
    print(f"🎮 Testing the end to end flow with my_question")
    print("="*100)
    print(f"{'='*20} My question: {my_question} {'='*20}")
    print("="*100)

    my_rag_response = await rag_generator.generate_chat_response(
            user_query=my_question,
            n_chroma_results=3,
            n_web_results=3
        )
    my_eval_outcome = await rag_evaluator.evaluate_rag_response(
        rag_response = my_rag_response
    )

    # And, now, return the results
    print(f"""The LLM's response to my question: 
{my_rag_response['response']}""")

    print(f"""!!!!! The accuracy score of the LLM's response: {my_eval_outcome['accuracy_score']:.2f} and the confidence level is {my_eval_outcome['confidence_level']}. """)

❓ Please enter your test question above


Or, look at the entire contents of the output

In [15]:
my_rag_response

In [16]:
my_eval_outcome