# Employee Handbook RAG Implementation with Multiple AI Platforms

This notebook demonstrates how to implement Retrieval-Augmented Generation (RAG) for an employee handbook using different AI platforms:
- OpenAI GPT models
- Azure AI Foundry (Azure OpenAI)
- Google AI Studio (Gemini)
- Ollama (Local models)

We'll build a knowledge base from the employee handbook and show how each platform can be used to answer HR-related questions.

In [1]:
# Install required packages
!pip install openai chromadb sentence-transformers google-generativeai google-genai ollama python-dotenv tiktoken PyPDF2



## 1. Import Required Libraries and Setup

In [2]:
import os
import json
from typing import List, Dict, Any
from dotenv import load_dotenv
import chromadb
from sentence_transformers import SentenceTransformer
import tiktoken
import PyPDF2

# AI Platform imports
import openai
from openai import AzureOpenAI
import google.generativeai as genai
import ollama

# Load environment variables
load_dotenv()

# Initialize embedding model for vector storage
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

print("✅ Libraries imported successfully")



✅ Libraries imported successfully


## 2. Configuration and Environment Setup

Create a `.env` file in your project directory with the following variables:
```
OPENAI_API_KEY=your_openai_api_key
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_api_endpoint
AZURE_OPENAI_API_VERSION=your_azure_openai_api_version
GOOGLE_API_KEY=your_google_api_key
```
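A missing key otherwise tends to surface only later, as an authentication error from one of the platforms, so it can help to check the environment up front. The following is a minimal sketch: the helper name `missing_env_vars` and the decision to treat every variable in the template as required are assumptions, not part of the original notebook.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env template above.
EXPECTED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "GOOGLE_API_KEY",
]

def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]

missing = missing_env_vars(EXPECTED_VARS)
if missing:
    print(f"⚠️ Missing environment variables: {', '.join(missing)}")
```

Call it after `load_dotenv()` so that values from the `.env` file are already in the environment.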

In [3]:
# Configuration for different AI platforms
class RAGConfig:
    def __init__(self):
        # OpenAI Configuration
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        
        # Azure OpenAI Configuration
        self.azure_api_key = os.getenv("AZURE_OPENAI_API_KEY")
        self.azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        self.azure_api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2025-01-01-preview")
        
        # Google AI Configuration
        self.google_api_key = os.getenv("GOOGLE_API_KEY")
        
        # Ollama Configuration (assumes local installation)
        self.ollama_host = "http://localhost:11434"
        
        # Vector DB Configuration
        self.collection_name = "employee_handbook_documents"
        
        # Employee Handbook PDF path
        self.handbook_pdf_path = "employee_handbook (2).pdf"

config = RAGConfig()
print("✅ Configuration loaded")

✅ Configuration loaded


In [4]:
# Initialize clients
def initialize_clients():
    clients = {}
    
    # OpenAI Client
    if config.openai_api_key:
        # Use an explicit client object rather than module-level state
        clients['openai'] = openai.OpenAI(api_key=config.openai_api_key)
        print("✅ OpenAI client initialized")
    
    # Azure OpenAI Client
    if config.azure_api_key and config.azure_endpoint:
        clients['azure'] = AzureOpenAI(
            api_key=config.azure_api_key,
            api_version=config.azure_api_version,
            azure_endpoint=config.azure_endpoint
        )
        print("✅ Azure OpenAI client initialized")
    
    # Google AI Client
    if config.google_api_key:
        genai.configure(api_key=config.google_api_key)
        clients['google'] = genai
        print("✅ Google AI client initialized")
    
    # Ollama Client (check if running)
    try:
        ollama.list()  # Raises if the local Ollama server is unreachable
        clients['ollama'] = ollama
        print("✅ Ollama client initialized")
    except Exception as e:
        print(f"⚠️ Ollama not available: {e}")
    
    return clients

clients = initialize_clients()

✅ OpenAI client initialized
✅ Azure OpenAI client initialized
✅ Google AI client initialized
✅ Ollama client initialized


## 3. Employee Handbook Processing and Vector Store Setup

In [5]:
def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text content from PDF file."""
    try:
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            text = ""
            
            for page in pdf_reader.pages:
                # Guard against pages that yield no text (e.g. image-only pages)
                text += (page.extract_text() or "") + "\n"
            
            return text
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return ""

def create_handbook_sections(text: str) -> List[Dict]:
    """Split handbook text into logical sections."""
    # Split by common section markers or use paragraph breaks
    sections = []
    
    # Simple approach: split by double newlines and filter out short sections
    raw_sections = text.split('\n\n')
    
    section_id = 1
    for section in raw_sections:
        section = section.strip()
        
        # Skip very short sections (likely headers or artifacts)
        if len(section) < 100:
            continue
            
        # Try to identify section titles (usually shorter lines at the start)
        lines = section.split('\n')
        if lines:
            title = lines[0].strip()
            # If first line is very long, use a generic title
            if len(title) > 100:
                title = f"Employee Handbook Section {section_id}"
            
            sections.append({
                "id": f"handbook_section_{section_id}",
                "title": title,
                "content": section
            })
            section_id += 1
    
    return sections

# Load and process employee handbook
handbook_text = extract_text_from_pdf(config.handbook_pdf_path)

if handbook_text:
    handbook_sections = create_handbook_sections(handbook_text)
    print(f"✅ Extracted {len(handbook_sections)} sections from employee handbook")
    
    # Display first few section titles
    print("\nHandbook sections found:")
    for i, section in enumerate(handbook_sections[:5]):
        print(f"{i+1}. {section['title']}")
    if len(handbook_sections) > 5:
        print(f"... and {len(handbook_sections) - 5} more sections")
else:
    # Fallback: Create sample employee handbook data if PDF reading fails
    print("⚠️ Could not read PDF file. Using sample employee handbook data...")
    handbook_sections = [
        {
            "id": "handbook_section_1",
            "title": "Company Overview and Mission",
            "content": """Our company is committed to fostering an inclusive, innovative workplace where every employee can thrive. Founded in 2010, we have grown from a small startup to a leading technology company with over 500 employees worldwide. Our mission is to create cutting-edge solutions that improve people's lives while maintaining the highest standards of ethics and integrity. We value collaboration, innovation, continuous learning, and work-life balance."""
        },
        {
            "id": "handbook_section_2",
            "title": "Employment Policies and Procedures",
            "content": """All employees are expected to maintain professional conduct and adhere to company policies. Working hours are typically 9 AM to 5 PM, Monday through Friday, with flexibility for remote work arrangements. Employees are entitled to 20 days of paid vacation annually, plus standard holidays. Performance reviews are conducted annually with opportunities for career advancement and professional development."""
        },
        {
            "id": "handbook_section_3",
            "title": "Benefits and Compensation",
            "content": """We offer comprehensive benefits including health insurance, dental and vision coverage, retirement savings plan with company matching, life insurance, and professional development allowances. Employees receive competitive salaries reviewed annually, performance-based bonuses, stock options for eligible positions, and reimbursement for job-related training and certifications."""
        },
        {
            "id": "handbook_section_4",
            "title": "Code of Conduct and Ethics",
            "content": """All employees must maintain the highest standards of professional and ethical conduct. This includes respecting colleagues regardless of background, maintaining confidentiality of proprietary information, avoiding conflicts of interest, and reporting any violations of company policy. Discrimination, harassment, or retaliation of any kind will not be tolerated and will result in disciplinary action up to and including termination."""
        },
        {
            "id": "handbook_section_5",
            "title": "Remote Work and Technology Policies",
            "content": """Employees may work remotely up to 3 days per week with manager approval. Remote workers must maintain secure internet connections, use company-provided VPN for accessing internal systems, and participate in all required meetings via video conference. Company equipment must be used responsibly and returned upon termination. Personal use of company technology should be minimal and appropriate."""
        },
        {
            "id": "handbook_section_6",
            "title": "Leave Policies and Time Off",
            "content": """In addition to vacation time, employees are entitled to sick leave, personal days, bereavement leave, and parental leave. Sick leave accrues at 1 day per month worked. Parental leave includes 12 weeks paid leave for primary caregivers and 6 weeks for secondary caregivers. All leave requests must be submitted through the HR system with appropriate advance notice when possible."""
        }
    ]
    print(f"✅ Using {len(handbook_sections)} sample employee handbook sections")

# Store the handbook sections for later use
sample_documents = handbook_sections

✅ Extracted 1 sections from employee handbook

Handbook sections found:
1. This document contains information generated using a language model (Azure OpenAI). The
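The run above produced a single section, which suggests the extracted text contained few blank-line breaks for `create_handbook_sections` to split on. One possible alternative, sketched here under the assumption that one section per PDF page is acceptable, builds sections from a list of per-page strings. The function name `sections_from_pages`, the `handbook_page_N` IDs, and the `min_chars` cutoff are all hypothetical.

```python
from typing import Dict, List

def sections_from_pages(page_texts: List[str], min_chars: int = 100) -> List[Dict]:
    """Build one section per page, skipping near-empty pages (hypothetical helper)."""
    sections = []
    for page_number, text in enumerate(page_texts, start=1):
        text = (text or "").strip()
        if len(text) < min_chars:
            continue
        first_line = text.split("\n", 1)[0].strip()
        # Fall back to a generic title when the first line is too long to be a heading.
        title = first_line if len(first_line) <= 100 else f"Employee Handbook Page {page_number}"
        sections.append({
            "id": f"handbook_page_{page_number}",
            "title": title,
            "content": text,
        })
    return sections
```

The per-page strings could be collected with `[page.extract_text() for page in pdf_reader.pages]` inside `extract_text_from_pdf`.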


In [6]:
def chunk_text(text: str, max_tokens: int = 500) -> List[str]:
    """Split text into smaller chunks for better retrieval."""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    
    chunks = []
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        # Avoid a local variable that shadows the function name
        chunks.append(encoding.decode(chunk_tokens))
    
    return chunks

# Process handbook sections into chunks
processed_chunks = []
for doc in sample_documents:
    chunks = chunk_text(doc["content"])
    for i, chunk in enumerate(chunks):
        processed_chunks.append({
            "id": f"{doc['id']}_chunk_{i}",
            "doc_id": doc["id"],
            "title": doc["title"],
            "content": chunk,
            "metadata": {"source": doc["title"], "type": "employee_handbook"}
        })

print(f"✅ Processed {len(processed_chunks)} chunks from {len(sample_documents)} handbook sections")

✅ Processed 7 chunks from 1 handbook sections
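`chunk_text` cuts at fixed token boundaries, so a sentence that straddles a boundary is split between two chunks. A common mitigation is to let consecutive chunks overlap. The sketch below works on a plain list of token IDs so that it stays independent of the tokenizer; `chunk_with_overlap` and its `overlap` parameter are assumptions, not part of the original notebook.

```python
from typing import List

def chunk_with_overlap(tokens: List[int], max_tokens: int = 500, overlap: int = 50) -> List[List[int]]:
    """Split tokens into windows of max_tokens that share `overlap` tokens (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end, to avoid a trailing chunk of pure overlap.
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

With tiktoken, the token list would come from `encoding.encode(text)` and each window would be passed to `encoding.decode(...)`, as in `chunk_text`.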


In [7]:
# Initialize ChromaDB vector store
chroma_client = chromadb.Client()

# Start from a clean collection: delete any existing one with the same name
try:
    chroma_client.delete_collection(name=config.collection_name)
except Exception:
    # The collection did not exist yet, so there is nothing to delete
    pass

collection = chroma_client.create_collection(
    name=config.collection_name,
    metadata={"description": "RAG document collection"}
)

print("✅ ChromaDB collection created")

✅ ChromaDB collection created


In [8]:
# Generate embeddings and add to collection
def add_documents_to_vectorstore(chunks: List[Dict]):
    """Add document chunks to the vector store."""
    documents = [chunk["content"] for chunk in chunks]
    metadatas = [{"doc_id": chunk["doc_id"], "title": chunk["title"]} for chunk in chunks]
    ids = [chunk["id"] for chunk in chunks]
    
    # Generate embeddings
    embeddings = embedding_model.encode(documents).tolist()
    
    # Add to collection
    collection.add(
        documents=documents,
        embeddings=embeddings,
        metadatas=metadatas,
        ids=ids
    )
    
    print(f"✅ Added {len(chunks)} documents to vector store")

add_documents_to_vectorstore(processed_chunks)

✅ Added 7 documents to vector store


## 4. RAG Utility Functions

In [9]:
def retrieve_relevant_docs(query: str, n_results: int = 3) -> List[Dict]:
    """Retrieve relevant documents for a given query."""
    query_embedding = embedding_model.encode([query]).tolist()
    
    results = collection.query(
        query_embeddings=query_embedding,
        n_results=n_results
    )
    
    relevant_docs = []
    for i in range(len(results['documents'][0])):
        relevant_docs.append({
            "content": results['documents'][0][i],
            "metadata": results['metadatas'][0][i],
            "distance": results['distances'][0][i]
        })
    
    return relevant_docs

def create_rag_prompt(query: str, context_docs: List[Dict]) -> str:
    """Create a prompt with context for RAG."""
    context = "\n\n".join([f"Source: {doc['metadata']['title']}\nContent: {doc['content']}" 
                          for doc in context_docs])
    
    prompt = f"""You are a helpful HR assistant that answers questions about company policies and procedures based on the employee handbook. Use the following context to answer the user's question. If the answer cannot be found in the context, say so clearly.

Context from Employee Handbook:
{context}

Question: {query}

Answer:"""
    
    return prompt

print("✅ RAG utility functions defined")

✅ RAG utility functions defined
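`retrieve_relevant_docs` always returns `n_results` chunks, even when none of them is a close match. Since each result carries a `distance` (lower means closer), weak matches can be dropped before the prompt is built. This is a minimal sketch: the helper name and the threshold value are hypothetical, and a usable threshold depends on the embedding model and the collection's distance metric.

```python
from typing import Dict, List

def filter_by_distance(docs: List[Dict], max_distance: float) -> List[Dict]:
    """Keep only docs whose distance is at most max_distance (hypothetical helper)."""
    return [doc for doc in docs if doc["distance"] <= max_distance]

# Illustrative input shaped like the output of retrieve_relevant_docs.
example_docs = [
    {"content": "a", "metadata": {"title": "A"}, "distance": 0.4},
    {"content": "b", "metadata": {"title": "B"}, "distance": 1.7},
]
close_docs = filter_by_distance(example_docs, max_distance=1.0)
```

If the filtered list is empty, the prompt's instruction to say that the answer cannot be found in the context still applies.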


## 5. OpenAI RAG Implementation

In [10]:
def openai_rag_query(query: str) -> Dict[str, Any]:
    """Perform RAG query using OpenAI."""
    # Retrieve relevant documents
    relevant_docs = retrieve_relevant_docs(query)
    
    # Create prompt with context
    prompt = create_rag_prompt(query, relevant_docs)
    
    # Generate response
    response = clients['openai'].chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful HR assistant that answers questions about company policies based on the employee handbook."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.1,
        max_tokens=500
    )
    
    return {
        "platform": "OpenAI",
        "model": "gpt-3.5-turbo",
        "answer": response.choices[0].message.content,
        "sources": [doc['metadata']['title'] for doc in relevant_docs],
        "retrieved_docs": relevant_docs
    }

# Test OpenAI RAG
test_query = "What is the company's mission?"
openai_result = openai_rag_query(test_query)
print("OpenAI RAG Result:")
print(f"Answer: {openai_result.get('answer')}")
print(f"Sources: {openai_result.get('sources')}")

OpenAI RAG Result:
Answer: The company's mission is to provide the highest quality aircraft components to customers while maintaining a commitment to safety and excellence. They strive to continually improve their products and services, aiming to remain a leader in the aerospace industry for years to come.
Sources: ['This document contains information generated using a language model (Azure OpenAI). The', 'This document contains information generated using a language model (Azure OpenAI). The', 'This document contains information generated using a language model (Azure OpenAI). The']
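Because every chunk in this run comes from the same section, the `sources` list repeats one title three times. Where that is unwanted, the titles can be de-duplicated while preserving retrieval order. A small sketch follows; `unique_sources` is a hypothetical helper, not part of the original notebook.

```python
from typing import Dict, List

def unique_sources(docs: List[Dict]) -> List[str]:
    """Return source titles in first-seen order without duplicates (hypothetical helper)."""
    # dict.fromkeys keeps insertion order, so the first-retrieved title stays first.
    return list(dict.fromkeys(doc["metadata"]["title"] for doc in docs))
```

It could replace the list comprehension used for `"sources"` in each platform's return value.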


## 6. Azure OpenAI RAG Implementation

In [11]:
def azure_rag_query(query: str) -> Dict[str, Any]:
    """Perform RAG query using Azure OpenAI."""
    # Reuse the client created in initialize_clients()
    client = clients['azure']

    # Retrieve relevant documents
    relevant_docs = retrieve_relevant_docs(query)

    # Create prompt with context
    prompt = create_rag_prompt(query, relevant_docs)

    # Generate response
    response = client.chat.completions.create(
        model="gpt-4o",  # Use your Azure deployment name
        messages=[
            {"role": "system", "content": "You are a helpful HR assistant that answers questions about company policies based on the employee handbook."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.1,
        max_tokens=500
    )

    return {
        "platform": "Azure OpenAI",
        "model": "gpt-4o",
        "answer": response.choices[0].message.content,
        "sources": [doc['metadata']['title'] for doc in relevant_docs],
        "retrieved_docs": relevant_docs
    }

# Test Azure OpenAI RAG
query = "What are the performance review metrics?"
azure_result = azure_rag_query(query)
print("AzureOpenAI RAG Result:")
print(f"Answer: {azure_result.get('answer')}")
print(f"Sources: {azure_result.get('sources')}")

AzureOpenAI RAG Result:
Answer: The context provided does not specify the exact metrics used for performance reviews at Contoso Electronics. It mentions that performance reviews include a discussion of the employee's performance over the past year, feedback on areas for improvement, and goals and objectives for the upcoming year. Employees also receive a written summary that includes a rating of their performance, feedback, and goals. However, the specific metrics or criteria used to evaluate performance are not detailed in the employee handbook.
Sources: ['This document contains information generated using a language model (Azure OpenAI). The', 'This document contains information generated using a language model (Azure OpenAI). The', 'This document contains information generated using a language model (Azure OpenAI). The']


## 7. Google AI Studio (Gemini) RAG Implementation

In [12]:
def google_rag_query(query: str) -> Dict[str, Any]:
    """Perform RAG query using Google AI Studio."""
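    # Google Gen AI SDK (package: google-genai); inside this function it
    # shadows the module-level `genai` imported from google.generativeai
    from google import genai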
    
    # Initialize client
    client = genai.Client(api_key=config.google_api_key)
    
    # Retrieve relevant documents
    relevant_docs = retrieve_relevant_docs(query)
    
    # Create prompt with context
    prompt = create_rag_prompt(query, relevant_docs)
    
    # Generate response
    response = client.models.generate_content(
        model="gemini-2.5-flash", 
        contents=prompt
    )
    
    return {
        "platform": "Google AI Studio",
        "model": "gemini-2.5-flash",
        "answer": response.text,
        "sources": [doc['metadata']['title'] for doc in relevant_docs],
        "retrieved_docs": relevant_docs
    }

# Test Google AI RAG
test_query = "What whistleblower details do I need to know?"
google_result = google_rag_query(test_query)
print("Google AI RAG Result:")
print(f"Answer: {google_result.get('answer')}")
print(f"Sources: {google_result.get('sources')}")

Google AI RAG Result:
Answer: The Whistleblower Policy at Contoso Electronics is established to encourage employees to report any unethical or illegal activities they may witness.

Here are the key details you need to know:

*   **Applicability:** This policy applies to all Contoso Electronics employees, contractors, and other third parties.
*   **Definition of a Whistleblower:** An individual who reports activities that are illegal, unethical, or otherwise not in accordance with company policy.
*   **Reporting Procedures:** If you witness any activity you believe to be illegal, unethical, or not in accordance with company policy, you should report it immediately by:
    1.  Contacting the Human Resources Department.
    2.  Emailing the Compliance Officer at compliance@contoso.com.
    3.  Calling the Compliance Hotline at 1-800-555-1212.
*   **Information to Provide When Reporting:** Please include as much detail as possible, such as:
    1.  The time and date of the incident.
    2.

## 8. Ollama RAG Implementation

In [13]:
def ollama_rag_query(query: str, model_name: str = "gemma3n:latest") -> Dict[str, Any]:
    """Perform RAG query using Ollama."""
    # Retrieve relevant documents
    relevant_docs = retrieve_relevant_docs(query)
    
    # Create prompt with context
    prompt = create_rag_prompt(query, relevant_docs)
    
    # Generate response using chat API format
    response = ollama.chat(
        model=model_name,
        messages=[
            {"role": "user", "content": prompt}
        ],
        options={
            "temperature": 0.1
        }
    )
    
    return {
        "platform": "Ollama",
        "model": model_name,
        "answer": response['message']['content'],
        "sources": [doc['metadata']['title'] for doc in relevant_docs],
        "retrieved_docs": relevant_docs
    }

# Test Ollama RAG
test_query = "What is the company's privacy note?"
print("Testing Ollama RAG with model: gemma3n:latest")
ollama_result = ollama_rag_query(test_query)
print("Ollama RAG Result:")
print(f"Answer: {ollama_result.get('answer')}")
print(f"Sources: {ollama_result.get('sources')}")

Testing Ollama RAG with model: gemma3n:latest
Ollama RAG Result:
Answer: The company's privacy note outlines your rights regarding your personal information, how to access or correct it, and how the company handles changes to its privacy policies. 

Here's a summary:

*   **Right to Access and Correct:** You have the right to access, review, and request a copy of your personal information. You can also request that inaccurate information be corrected.
*   **Contact Information:** To access or make changes to your information, contact the Privacy Officer at privacy@contoso.com.
*   **Policy Changes:** The company may update its privacy policy and will notify you of changes by posting a revised policy on its website.
*   **Questions or Concerns:** If you have questions or concerns about the privacy policies or practices, contact the Privacy Officer at privacy@contoso.com.




Sources: ['This document contains information generated using a language model (Azure OpenAI). The', 'This docume

## 9. Cross-Platform Comparison


In [14]:
def compare_rag_platforms(query: str) -> Dict[str, Any]:
    """Compare RAG implementation across all available platforms."""
    print(f"🔍 Query: {query}\n")
    print("⚠️ Before running this comparison, make sure you've fixed all model/deployment issues mentioned above")
    
    results = {}
    
    # Test each platform
    platforms = [
        ("OpenAI", openai_rag_query),
        ("Azure", azure_rag_query),
        ("Google", google_rag_query),
        ("Ollama", lambda q: ollama_rag_query(q, model_name="gemma3n:latest"))
    ]
    
    for platform_name, rag_function in platforms:
        print(f"Testing {platform_name}...")
        try:
            result = rag_function(query)
            results[platform_name] = result
            
            if "error" in result:
                print(f"❌ {platform_name}: {result['error']}")
            else:
                print(f"✅ {platform_name}: Response generated")
                print(f"   Sources: {', '.join(result.get('sources', []))}")
                print(f"   Answer length: {len(result.get('answer', ''))} characters")
        except Exception as e:
            print(f"❌ {platform_name}: Error - {str(e)}")
            print("Check the platform-specific instructions above to resolve the issue")
            results[platform_name] = {"error": str(e)}
        print()
    
    return results

# Run comparison test
test_query = "What is the company's roles ava"
comparison_results = compare_rag_platforms(test_query)

🔍 Query: What is the company's roles ava

⚠️ Before running this comparison, make sure you've fixed all model/deployment issues mentioned above
Testing OpenAI...
✅ OpenAI: Response generated
   Sources: This document contains information generated using a language model (Azure OpenAI). The, This document contains information generated using a language model (Azure OpenAI). The, This document contains information generated using a language model (Azure OpenAI). The
   Answer length: 1193 characters

Testing Azure...
✅ Azure: Response generated
   Sources: This document contains information generated using a language model (Azure OpenAI). The, This document contains information generated using a language model (Azure OpenAI). The, This document contains information generated using a language model (Azure OpenAI). The
   Answer length: 1190 characters

Testing Google...
✅ Google: Response generated
   Sources: This document contains information generated using a language model (Azure Open

In [15]:
# Display detailed comparison results
print("DETAILED COMPARISON RESULTS")
print("=" * 60)

for platform, result in comparison_results.items():
    if "error" not in result:
        print(f"\n{platform} Response:")
        print("-" * 40)
        print(result.get('answer', 'No answer generated'))
        print(f"\nSources used: {', '.join(result.get('sources', []))}")
    else:
        print(f"\n{platform}: {result['error']}")

print("\n" + "="*60)

DETAILED COMPARISON RESULTS

OpenAI Response:
----------------------------------------
Based on the provided context from the Employee Handbook, here is a list of the company roles available at Contoso Electronics:

1. Chief Executive Officer
2. Chief Operating Officer
3. Chief Financial Officer
4. Chief Technology Officer
5. Vice President of Sales
6. Vice President of Marketing
7. Vice President of Operations
8. Vice President of Human Resources
9. Vice President of Research and Development
10. Vice President of Product Management
11. Director of Sales
12. Director of Marketing
13. Director of Operations
14. Director of Human Resources
15. Director of Research and Development
16. Director of Product Management
17. Senior Manager of Sales
18. Senior Manager of Marketing
19. Senior Manager of Operations
20. Senior Manager of Human Resources
21. Senior Manager of Research and Development
22. Senior Manager of Product Management
23. Manager of Sales
24. Manager of Marketing
25. Manager o

## 10. Advanced RAG with Document Re-ranking

In [16]:
def advanced_rag_with_reranking(query: str, platform: str = "openai") -> Dict[str, Any]:
    """Implement RAG with document re-ranking for better results.

    Note: generation always uses the OpenAI client; `platform` only labels the result.
    """
    
    # Get more documents initially
    relevant_docs = retrieve_relevant_docs(query, n_results=6)
    
    # Simple re-ranking: fraction of query words that also appear in each chunk
    query_words = set(query.lower().split())
    
    for doc in relevant_docs:
        doc_words = set(doc['content'].lower().split())
        word_overlap = len(query_words.intersection(doc_words))
        doc['relevance_score'] = word_overlap / len(query_words) if query_words else 0
    
    # Sort by relevance and take top 3
    relevant_docs.sort(key=lambda x: x['relevance_score'], reverse=True)
    top_docs = relevant_docs[:3]
    
    # Create enhanced prompt
    context = "\n\n".join([
        f"Source: {doc['metadata']['title']} (Relevance: {doc['relevance_score']:.2f})\nContent: {doc['content']}" 
        for doc in top_docs
    ])
    
    prompt = f"""You are an expert AI assistant. Use the following ranked context to provide a comprehensive answer. Pay special attention to the most relevant sources.

Ranked Context:
{context}

Question: {query}

Provide a detailed answer with specific references to the sources:"""
    
    # Use OpenAI for this demo if available
    if 'openai' in clients:
        try:
            response = clients['openai'].chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "You are an expert HR assistant that provides detailed answers about company policies with source references from the employee handbook."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.1,
                max_tokens=700
            )
            
            return {
                "platform": f"{platform.upper()} (Advanced RAG)",
                "answer": response.choices[0].message.content,
                "sources": [doc['metadata']['title'] for doc in top_docs],
                "relevance_scores": [doc['relevance_score'] for doc in top_docs],
                "method": "Re-ranked retrieval"
            }
        except Exception as e:
            return {"error": f"Advanced RAG failed: {str(e)}"}
    
    return {"error": "Platform not available for advanced RAG"}

# Test advanced RAG
advanced_query = "How does the company handle parental leave and what benefits are available to new parents?"
advanced_result = advanced_rag_with_reranking(advanced_query)

print("Advanced RAG Result:")
print("=" * 50)
if "error" not in advanced_result:
    print(f"Platform: {advanced_result['platform']}")
    print(f"Method: {advanced_result['method']}")
    print(f"\nAnswer:\n{advanced_result['answer']}")
    print(f"\nSources with relevance scores:")
    for source, score in zip(advanced_result['sources'], advanced_result['relevance_scores']):
        print(f"  - {source}: {score:.2f}")
else:
    print(f"Error: {advanced_result['error']}")

Advanced RAG Result:
Platform: OPENAI (Advanced RAG)
Method: Re-ranked retrieval

Answer:
Unfortunately, the provided context does not include specific information about parental leave policies or benefits available to new parents at Contoso Electronics. To obtain accurate and detailed information regarding parental leave, I recommend consulting the company's employee handbook or contacting the Human Resources Department directly. They should be able to provide comprehensive details about the parental leave policy, including eligibility, duration, and any associated benefits. If you have access to the employee handbook, it would be beneficial to review the sections related to leave policies for more precise information.

Sources with relevance scores:
  - This document contains information generated using a language model (Azure OpenAI). The: 0.40
  - This document contains information generated using a language model (Azure OpenAI). The: 0.33
  - This document contains information gen
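The overlap score above splits on whitespace only, so a query token such as `parents?` never matches `parents` in a chunk. One possible refinement, shown as a sketch, is to tokenize with a regular expression before computing the overlap. The function names are hypothetical, and this remains a lexical heuristic rather than a learned re-ranker.

```python
import re
from typing import Set

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens (letters, digits, apostrophes)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def overlap_score(query: str, content: str) -> float:
    """Fraction of distinct query tokens that also appear in the content."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(content)) / len(query_tokens)
```

Scores computed this way will differ from the whitespace-based values printed above, because trailing punctuation no longer blocks a match.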

## 11. Performance Benchmarking

In [17]:
import time
from typing import Tuple

def measure_rag_performance(query: str, platform_function, platform_name: str) -> Tuple[Dict, float]:
    """Measure response time and estimate token usage for a RAG query."""
    # perf_counter is a monotonic clock intended for measuring elapsed time
    start_time = time.perf_counter()
    
    result = platform_function(query)
    
    end_time = time.perf_counter()
    response_time = end_time - start_time
    
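    # Estimate token usage (simplified): about 1.3 tokens per word.
    # The input estimate covers only the query, not the retrieved context in the prompt.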
    if "error" not in result:
        input_tokens = len(query.split()) * 1.3  # Rough estimate
        output_tokens = len(result.get('answer', '').split()) * 1.3
        total_tokens = input_tokens + output_tokens
        
        result['performance'] = {
            'response_time': response_time,
            'estimated_input_tokens': int(input_tokens),
            'estimated_output_tokens': int(output_tokens),
            'estimated_total_tokens': int(total_tokens)
        }
    
    return result, response_time

def performance_benchmark():
    """Run performance benchmark across platforms."""
    test_query = "What are the company's policies on professional development and training?"
    
    print("Performance Benchmark Results")
    print("=" * 50)
    
    platforms_to_test = []
    if 'openai' in clients:
        platforms_to_test.append(("OpenAI", openai_rag_query))
    if 'azure' in clients:
        platforms_to_test.append(("Azure", azure_rag_query))
    if 'google' in clients:
        platforms_to_test.append(("Google", google_rag_query))
    if 'ollama' in clients:
        platforms_to_test.append(("Ollama", ollama_rag_query))
    
    benchmark_results = []
    
    for platform_name, platform_function in platforms_to_test:
        print(f"\nTesting {platform_name}...")
        
        result, response_time = measure_rag_performance(test_query, platform_function, platform_name)
        
        if "error" not in result:
            perf = result['performance']
            print(f"  Response time: {response_time:.2f}s")
            print(f"  Estimated tokens: {perf['estimated_total_tokens']}")
            # Character count is only a proxy for answer length; it does not measure quality
            print(f"  Answer quality: {len(result['answer'])} chars")
            
            benchmark_results.append({
                'platform': platform_name,
                'response_time': response_time,
                'tokens': perf['estimated_total_tokens'],
                'answer_length': len(result['answer']),
                'sources_used': len(result.get('sources', []))
            })
        else:
            print(f"  Error: {result['error']}")
    
    # Summary
    if benchmark_results:
        print(f"\n{'Platform':<12} {'Time (s)':<10} {'Tokens':<8} {'Length':<8} {'Sources':<8}")
        print("-" * 50)
        for result in benchmark_results:
            print(f"{result['platform']:<12} {result['response_time']:<10.2f} {result['tokens']:<8} {result['answer_length']:<8} {result['sources_used']:<8}")

# Run benchmark
performance_benchmark()

Performance Benchmark Results
==================================================

Testing OpenAI...
  Response time: 2.19s
  Estimated tokens: 198
  Answer quality: 1016 chars

Testing Azure...
  Response time: 1.76s
  Estimated tokens: 76
  Answer quality: 378 chars

Testing Google...
  Response time: 2.97s
  Estimated tokens: 57
  Answer quality: 250 chars

Testing Ollama...
  Response time: 14.93s
  Estimated tokens: 119
  Answer quality: 617 chars

Platform     Time (s)   Tokens   Length   Sources 
--------------------------------------------------
OpenAI       2.19       198      1016     3       
Azure        1.76       76       378      3       
Google       2.97       57       250      3       
Ollama       14.93      119      617      3       

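The figures above come from a single call per platform, so they can be skewed by network jitter or a cold model load. A minimal sketch of a repeated-run measurement follows, assuming each platform function takes a query string and returns a dict as in this notebook. `time_repeated` and `fake_rag_query` are hypothetical names introduced for illustration; the stub stands in for a real function such as `openai_rag_query`.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_repeated(fn: Callable[[str], dict], query: str, runs: int = 5) -> Dict[str, float]:
    """Call fn(query) `runs` times and summarize wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(query)
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


# Hypothetical stub standing in for a real platform function
def fake_rag_query(query: str) -> dict:
    time.sleep(0.01)  # simulate a short network round trip
    return {"answer": "stub answer", "sources": []}


stats = time_repeated(fake_rag_query, "What is the vacation policy?", runs=3)
print(stats)
```

The median is less sensitive to one slow outlier than the mean, which matters when only a handful of runs are affordable against a paid API.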

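The token counts in the benchmark are derived from word counts. Since `tiktoken` is already installed for this notebook, a tokenizer-based count is one possible refinement. The sketch below assumes the `cl100k_base` encoding, which matches some OpenAI models but not Gemini or local Ollama models, so the result is still an approximation for those platforms. `count_tokens` is a hypothetical helper; it falls back to the notebook's 1.3 tokens-per-word heuristic if `tiktoken` cannot be loaded.

```python
def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken when available, else use a word-based estimate."""
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is missing or its encoding files could not be loaded:
        # fall back to the rough 1.3 tokens-per-word heuristic
        return int(len(text.split()) * 1.3)


print(count_tokens("What is the vacation policy?"))
```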
## 12. Best Practices and Platform Comparison

### Platform Comparison Summary:

**OpenAI GPT Models:**
- ✅ High-quality responses
- ✅ Well-documented API
- ✅ Reliable performance
- ❌ Requires an API key and is billed by usage
- ❌ Data is sent to an external service

**Azure OpenAI:**
- ✅ Enterprise-grade security
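- ✅ Integration with the Azure ecosystem
- ✅ Same underlying OpenAI models, so answer quality is comparable when the same model is deployed
- ❌ More complex setup (an endpoint and an API version are needed in addition to the key)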
- ❌ Requires Azure subscription

**Google AI Studio (Gemini):**
- ✅ Competitive performance
- ✅ Good integration with Google services
- ✅ Multimodal capabilities
- ❌ Newer platform with a less mature ecosystem
- ❌ Fewer model options

**Ollama (Local):**
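- ✅ Data stays on your own machine, giving full privacy and control
- ✅ No ongoing API costs
- ✅ Works offline once the models are downloaded
- ❌ Requires powerful hardware
- ❌ Model quality may be lower
- ❌ Slower response times (14.93 s versus 1.76-2.97 s for the hosted platforms in the single-run benchmark above)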

### RAG Implementation Best Practices:

1. **Document Chunking:** Keep chunks between 200 and 500 tokens so that each one stays focused but still carries enough context to be understood on its own
2. **Embedding Models:** Use domain-specific models when available
3. **Vector Store:** Choose based on scale (ChromaDB for small or local projects, Pinecone or Weaviate for production)
4. **Retrieval:** Experiment with different similarity metrics and with the number of chunks retrieved per query
5. **Prompt Engineering:** Include clear instructions telling the model to answer from the retrieved context
6. **Evaluation:** Implement metrics for answer quality and retrieval relevance
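
The chunking guideline above can be sketched as a sliding window with overlap. This is a minimal illustration and not the chunking code used earlier in this notebook. It counts whitespace-separated words as a stand-in for tokens, so with the 1.3 tokens-per-word estimate used in the benchmark, 300 words is roughly 390 tokens. `chunk_words`, `chunk_size`, and `overlap` are hypothetical names.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 700-word document used only to demonstrate the windowing
text = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(text, chunk_size=300, overlap=50)
print(len(chunks))  # → 3
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.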

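For the evaluation point, one simple retrieval metric is the hit rate at k: the share of test queries for which at least one known-relevant chunk appears among the top k retrieved chunks. The sketch below assumes you have hand-labeled relevant chunk IDs for a small set of queries. The function name, the chunk IDs, and the sample data are all hypothetical, and k=3 mirrors the three sources used per answer in the benchmark.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Fraction of labeled queries with at least one relevant chunk in the top-k results."""
    if not relevant:
        return 0.0
    hits = 0
    for query, relevant_ids in relevant.items():
        top_k = retrieved.get(query, [])[:k]
        if any(chunk_id in relevant_ids for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)


# Hypothetical retrieval results (ranked chunk IDs) and hand-made relevance labels
retrieved = {
    "vacation policy": ["c4", "c9", "c1"],
    "remote work": ["c7", "c2", "c5"],
}
relevant = {
    "vacation policy": {"c9"},
    "remote work": {"c3"},
}
print(hit_rate_at_k(retrieved, relevant, k=3))  # → 0.5
```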
In [18]:
# Summary and next steps
print("🎉 RAG Implementation Complete!")
print("\nNext Steps:")
print("1. Configure your API keys in a .env file")
print("2. Install Ollama locally if you want to test local models")
print("3. Experiment with different embedding models")
print("4. Try your own documents and queries")
print("5. Implement evaluation metrics for your use case")

print(f"\nAvailable platforms in this session: {list(clients.keys())}")

# Display sample queries for testing
sample_queries = [
    "What are the company's vacation and time off policies?",
    "How does the company handle performance reviews?",
    "What is the policy on remote work and flexible hours?",
    "Explain the company's benefits package and compensation structure"
]

print("\nSample queries to try:")
for i, query in enumerate(sample_queries, 1):
    print(f"{i}. {query}")
    
print("\nUse the compare_rag_platforms() function to test any of these queries across all available platforms!")

🎉 RAG Implementation Complete!

Next Steps:
1. Configure your API keys in a .env file
2. Install Ollama locally if you want to test local models
3. Experiment with different embedding models
4. Try your own documents and queries
5. Implement evaluation metrics for your use case

Available platforms in this session: ['openai', 'azure', 'google', 'ollama']

Sample queries to try:
1. What are the company's vacation and time off policies?
2. How does the company handle performance reviews?
3. What is the policy on remote work and flexible hours?
4. Explain the company's benefits package and compensation structure

Use the compare_rag_platforms() function to test any of these queries across all available platforms!
