# Part C: Mini-RAG for Knowledge Base Answering

## Objective
- Build an embedding-based retrieval system
- Answer queries using KB articles
- Provide confidence scores
- Test with two specific queries

## 1. Setup and Imports

In [1]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
from groq import Groq
import os
from dotenv import load_dotenv
import json
from typing import List, Dict, Tuple

load_dotenv()
client = Groq(api_key=os.getenv('GROQ_API_KEY'))

print("Libraries loaded successfully!")

Libraries loaded successfully!
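
`os.getenv('GROQ_API_KEY')` returns `None` when the key is missing, so a misconfigured `.env` file only surfaces later as an API error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string.
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Hypothetical usage, mirroring the setup cell above:
# client = Groq(api_key=require_env("GROQ_API_KEY"))
```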


## 2. Create Knowledge Base Articles

Since KB articles aren't provided, we'll create sample articles based on common Hiver topics.

In [2]:
# Sample Knowledge Base Articles
kb_articles = [
    {
        "id": 1,
        "title": "How to Configure Automations in Hiver",
        "content": """Automations in Hiver help streamline your email workflows. To configure automations:
        1. Go to Settings > Automations
        2. Click 'Create New Rule'
        3. Set your trigger conditions (e.g., subject contains, from specific sender)
        4. Define actions (auto-assign, add tags, set status)
        5. Test the automation with sample emails
        6. Save and activate
        
        Common automation uses: Auto-assigning emails based on keywords, auto-tagging by department,
        setting SLA timers, sending auto-responses."""
    },
    {
        "id": 2,
        "title": "CSAT Survey Setup and Troubleshooting",
        "content": """CSAT (Customer Satisfaction) surveys in Hiver measure customer happiness. 
        
        Setup:
        1. Navigate to Analytics > CSAT Settings
        2. Enable CSAT surveys
        3. Customize survey questions and timing
        4. Set which email closures trigger surveys
        
        Common issues why CSAT doesn't appear:
        - CSAT is not enabled in settings
        - Email status is not 'Closed'
        - Customer email is in blocklist
        - Survey delay timer not elapsed
        - Shared mailbox permissions issue
        
        To fix: Check Analytics > CSAT Settings and verify all configurations."""
    },
    {
        "id": 3,
        "title": "Email Tagging Best Practices",
        "content": """Tags help organize and categorize emails in Hiver.
        
        Creating tags:
        - Keep tags simple and clear
        - Use consistent naming conventions
        - Limit to 10-15 main tags per team
        - Use colors for visual distinction
        
        Auto-tagging:
        - Set up automation rules for common patterns
        - Use LLM-based tag suggestions
        - Review and refine tag suggestions regularly
        
        Common tagging issues:
        - Tags not saving: Check permissions
        - Tags disappearing: Verify automation rules aren't removing them
        - Wrong auto-tags: Review and adjust automation conditions"""
    },
    {
        "id": 4,
        "title": "Setting Up SLA (Service Level Agreements)",
        "content": """SLAs in Hiver help you meet response time commitments.
        
        Configuration:
        1. Go to Settings > SLA
        2. Define SLA tiers (e.g., VIP, Standard, Low Priority)
        3. Set response time targets
        4. Configure business hours
        5. Set escalation rules
        
        SLA Types:
        - First Response Time: Time to first reply
        - Resolution Time: Time to close ticket
        
        Assigning SLAs:
        - Manual assignment per email
        - Automation based on sender/subject/tags
        - Customer tier-based rules
        
        Troubleshooting:
        - SLA not applying: Check automation rules and conditions
        - Wrong SLA assigned: Review customer tier settings
        - SLA timer incorrect: Verify business hours configuration"""
    },
    {
        "id": 5,
        "title": "Shared Mailbox Access and Permissions",
        "content": """Shared mailboxes allow team collaboration on emails.
        
        Granting Access:
        1. Admin goes to Settings > Users
        2. Select user to grant access
        3. Choose mailboxes to share
        4. Set permission level (Admin, Member, Viewer)
        
        Permission Levels:
        - Admin: Full control including settings
        - Member: Send, reply, manage emails
        - Viewer: Read-only access
        
        Common Access Issues:
        - 'Permission denied' error: User doesn't have required access level
        - Can't see mailbox: Not added to shared mailbox list
        - Can't send from mailbox: Need Member or Admin role
        
        Solutions:
        - Contact admin to adjust permissions
        - Verify email address has been added
        - Check if mailbox was recently archived"""
    },
    {
        "id": 6,
        "title": "Analytics and Reporting in Hiver",
        "content": """Hiver Analytics provide insights into team performance.
        
        Available Metrics:
        - Response times (first response, average)
        - Resolution times
        - Email volume trends
        - CSAT scores
        - Agent performance
        - Tag distribution
        
        Accessing Reports:
        1. Navigate to Analytics tab
        2. Select date range
        3. Filter by team, agent, or mailbox
        4. Export as CSV/PDF
        
        Common Issues:
        - Data not showing: Check date range and filters
        - Incorrect metrics: Verify timezone settings
        - Missing CSAT data: Ensure CSAT is enabled and surveys sent
        - Export failing: Try smaller date range
        
        Best Practices:
        - Review weekly performance trends
        - Set team KPI targets
        - Use data for coaching opportunities"""
    },
    {
        "id": 7,
        "title": "Email Assignment and Workflow Management",
        "content": """Proper email assignment ensures accountability and efficiency.
        
        Assignment Methods:
        1. Manual: Click assign button, select team member
        2. Round-robin: Automatic equal distribution
        3. Load-based: Assign to least busy agent
        4. Rule-based: Automation by keywords/sender
        
        Workflow Statuses:
        - New: Unassigned, needs attention
        - Pending: Assigned, in progress
        - Closed: Resolved
        
        Common Issues:
        - Auto-assignment not working: Check automation rules
        - Emails going to wrong person: Review assignment criteria
        - Can't reassign: Verify permissions
        
        Tips:
        - Set up out-of-office reassignment
        - Use tags with assignment rules
        - Regular balance checks on team workload"""
    },
    {
        "id": 8,
        "title": "Troubleshooting Common Hiver Issues",
        "content": """Quick solutions to frequent Hiver problems.
        
        Email Sync Issues:
        - Problem: Emails not appearing
        - Solution: Check IMAP settings, re-authenticate Gmail, verify filters
        
        Performance Issues:
        - Problem: Slow loading times
        - Solution: Clear cache, check internet connection, reduce concurrent sessions
        
        Notification Problems:
        - Problem: Not receiving notifications
        - Solution: Check notification settings, browser permissions, email filters
        
        Login Issues:
        - Problem: Can't log in
        - Solution: Reset password, clear cookies, try incognito mode, verify account active
        
        Integration Issues:
        - Problem: Third-party apps not syncing
        - Solution: Re-authorize app, check API limits, verify permissions
        
        When to Contact Support:
        - Data loss or corruption
        - Billing discrepancies
        - System-wide outages
        - Security concerns"""
    }
]

print(f"Created {len(kb_articles)} knowledge base articles")
for article in kb_articles:
    print(f"  - {article['title']}")

Created 8 knowledge base articles
  - How to Configure Automations in Hiver
  - CSAT Survey Setup and Troubleshooting
  - Email Tagging Best Practices
  - Setting Up SLA (Service Level Agreements)
  - Shared Mailbox Access and Permissions
  - Analytics and Reporting in Hiver
  - Email Assignment and Workflow Management
  - Troubleshooting Common Hiver Issues


## 3. Initialize Embedding Model

In [3]:
# Load sentence transformer model
# Using all-MiniLM-L6-v2: Fast, lightweight, good quality
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

print("Embedding model loaded: all-MiniLM-L6-v2")
print(f"Embedding dimension: {embedding_model.get_sentence_embedding_dimension()}")

Embedding model loaded: all-MiniLM-L6-v2
Embedding dimension: 384


## 4. Create Embeddings for KB Articles

In [4]:
def create_article_embeddings(articles, model):
    """
    Create embeddings for KB articles.
    Combine title + content for better context.
    """
    texts = []
    for article in articles:
        # Combine title and content for embedding
        combined_text = f"{article['title']}\n\n{article['content']}"
        texts.append(combined_text)
    
    # Generate embeddings
    embeddings = model.encode(texts, show_progress_bar=True)
    
    return embeddings

# Create embeddings
print("Creating embeddings for KB articles...")
article_embeddings = create_article_embeddings(kb_articles, embedding_model)

print(f"\nCreated {len(article_embeddings)} embeddings")
print(f"Embedding shape: {article_embeddings.shape}")

Creating embeddings for KB articles...


Created 8 embeddings
Embedding shape: (8, 384)
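
Each article above is embedded as one vector, so a query about a single subsection (for example, CSAT troubleshooting) competes with the rest of the article text. One possible refinement, sketched here as an assumption rather than something the notebook does, is to split each article on blank lines and embed the sections separately. The function `chunk_article` and the shortened `sample` article are hypothetical.

```python
import re
from typing import Dict, List


def chunk_article(article: Dict) -> List[Dict]:
    """Split one article into sections separated by blank lines.

    Each chunk keeps the article id and title so a hit can be traced back.
    """
    # Blank lines in the sample articles may contain indentation,
    # so treat whitespace-only lines as separators.
    sections = re.split(r"\n\s*\n", article["content"])
    chunks = []
    for position, section in enumerate(sections):
        # Remove the indentation left over from the triple-quoted strings.
        text = "\n".join(line.strip() for line in section.strip().splitlines())
        if text:
            chunks.append({
                "article_id": article["id"],
                "title": article["title"],
                "chunk_index": position,
                # Prefix the title so each chunk carries its own context.
                "text": f"{article['title']}\n\n{text}",
            })
    return chunks


# Shortened, illustrative article in the same shape as kb_articles entries.
sample = {
    "id": 2,
    "title": "CSAT Survey Setup and Troubleshooting",
    "content": """CSAT surveys measure customer happiness.

        Setup:
        1. Navigate to Analytics > CSAT Settings
        2. Enable CSAT surveys

        Common issues why CSAT doesn't appear:
        - CSAT is not enabled in settings""",
}

for chunk in chunk_article(sample):
    print(chunk["chunk_index"], repr(chunk["text"].splitlines()[-1]))
```

The `text` field of each chunk could then be passed to `model.encode` in place of the combined article text, with `article_id` used to map hits back to the full article.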


## 5. Build FAISS Index for Fast Retrieval

In [5]:
# Create FAISS index
dimension = article_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # Exact (brute-force) search; reports squared L2 distances

# Add embeddings to index
index.add(article_embeddings.astype('float32'))

print(f"FAISS index created with {index.ntotal} vectors")

FAISS index created with 8 vectors


## 6. Retrieval Function

In [6]:
def retrieve_articles(query: str, model, index, articles, top_k=3):
    """
    Retrieve the most relevant KB articles for a query.
    
    Returns:
        List of dicts with keys 'article', 'similarity', and 'distance',
        ordered from most to least similar.
    """
    # Create query embedding
    query_embedding = model.encode([query])
    
    # Search in FAISS index
    distances, indices = index.search(query_embedding.astype('float32'), top_k)
    
    # Pair each hit with its own distance and convert it to a similarity score.
    # IndexFlatL2 reports squared L2 distances; lower distance = higher similarity.
    results = []
    for idx, distance in zip(indices[0], distances[0]):
        if idx < 0:
            # FAISS pads with -1 when fewer than top_k vectors are available
            continue
        results.append({
            'article': articles[idx],
            'similarity': float(1 / (1 + distance)),
            'distance': float(distance)
        })
    
    return results

# Test retrieval
test_query = "How do I set up automations?"
test_results = retrieve_articles(test_query, embedding_model, index, kb_articles, top_k=3)

print(f"Query: {test_query}\n")
for i, result in enumerate(test_results, 1):
    print(f"{i}. {result['article']['title']}")
    print(f"   Similarity: {result['similarity']:.4f}")
    print()

Query: How do I set up automations?

1. How to Configure Automations in Hiver
   Similarity: 0.4738

2. Email Assignment and Workflow Management
   Similarity: 0.4357

3. Email Tagging Best Practices
   Similarity: 0.3966
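
The similarity values printed above come from the mapping `1 / (1 + distance)`, where `distance` is the squared L2 distance that `IndexFlatL2` reports. They are therefore not cosine similarities. The sketch below shows how the two relate, under the assumption that the embeddings have unit length (in which case squared L2 distance equals `2 - 2 * cosine`). All function names are illustrative.

```python
import math
from typing import Sequence


def normalize(v: Sequence[float]) -> list:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def notebook_score(d2: float) -> float:
    """The score used in retrieve_articles: 1 / (1 + squared distance)."""
    return 1.0 / (1.0 + d2)


def squared_l2_from_score(score: float) -> float:
    """Invert notebook_score to recover the squared distance."""
    return 1.0 / score - 1.0


def cosine_from_squared_l2(d2: float) -> float:
    """Valid only for unit-length vectors: d^2 = 2 - 2 * cos."""
    return 1.0 - d2 / 2.0


# Toy vectors, not real embeddings.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.0])
d2 = squared_l2(a, b)
print("direct cosine:      ", cosine(a, b))
print("cosine from d^2:    ", cosine_from_squared_l2(d2))
print("notebook-style score:", notebook_score(d2))
```

If the embeddings are not unit length, the conversion does not hold, and normalizing them before indexing would be needed for the scores to be read as cosine-based.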



## 7. Answer Generation with RAG

In [7]:
def generate_answer(query: str, retrieved_articles: List[Dict], client) -> Dict:
    """
    Generate answer using retrieved articles and LLM.
    """
    # Build context from retrieved articles
    context = ""
    for i, result in enumerate(retrieved_articles, 1):
        article = result['article']
        context += f"\n\n--- Article {i}: {article['title']} ---\n"
        context += article['content']
    
    # Create RAG prompt
    prompt = f"""You are a helpful Hiver customer support assistant.

Use the following knowledge base articles to answer the user's question.

Knowledge Base Articles:
{context}

User Question: {query}

Instructions:
1. Answer based ONLY on the provided articles
2. If the answer isn't in the articles, say "I don't have enough information"
3. Be concise but complete
4. Reference which article(s) you used
5. Provide step-by-step instructions if applicable

Respond in JSON format:
{{
    "answer": "your detailed answer here",
    "confidence": 0.0-1.0,
    "sources": ["article titles used"],
    "reasoning": "why this confidence score"
}}
"""
    
    try:
        response = client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
            max_tokens=500
        )
        
        result = json.loads(response.choices[0].message.content)
        
        # Add retrieval scores to result
        result['retrieved_articles'] = [
            {
                'title': r['article']['title'],
                'similarity': r['similarity']
            }
            for r in retrieved_articles
        ]
        
        return result
    
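    except Exception as e:
        # Return the same keys as the success path so later cells
        # (e.g. saving results) don't fail with a KeyError.
        return {
            "answer": f"Error generating answer: {str(e)}",
            "confidence": 0.0,
            "sources": [],
            "reasoning": "Error occurred",
            "retrieved_articles": [
                {'title': r['article']['title'], 'similarity': r['similarity']}
                for r in retrieved_articles
            ]
        }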

# Test answer generation
test_answer = generate_answer(test_query, test_results, client)
print("Generated Answer:")
print(json.dumps(test_answer, indent=2))

Generated Answer:
{
  "answer": "To set up automations in Hiver, follow these steps: Go to Settings > Automations, click 'Create New Rule', set your trigger conditions, define actions, test the automation with sample emails, and save and activate. Common automation uses include auto-assigning emails based on keywords, auto-tagging by department, setting SLA timers, and sending auto-responses.",
  "confidence": 1.0,
  "sources": [
    "Article 1: How to Configure Automations in Hiver"
  ],
  "reasoning": "The answer is directly based on the step-by-step instructions provided in Article 1, which explicitly explains how to configure automations in Hiver. The confidence score is 1.0 because the information is directly available in the provided knowledge base article.",
  "retrieved_articles": [
    {
      "title": "How to Configure Automations in Hiver",
      "similarity": 0.4737522006034851
    },
    {
      "title": "Email Assignment and Workflow Management",
      "similarity": 0.435
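
`generate_answer` passes the raw model reply straight to `json.loads`, which raises if the model wraps the JSON in a Markdown code fence or adds surrounding prose. A best-effort parser could look like the sketch below; `parse_llm_json` is a hypothetical helper, and it assumes the reply contains a single top-level JSON object.

```python
import json
import re
from typing import Any, Dict

# Three backticks, built programmatically so this example can itself
# sit inside a fenced code block.
FENCE = "`" * 3
FENCED_REPLY = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Best-effort extraction of a JSON object from an LLM reply."""
    text = raw.strip()

    # Case 1: the whole reply is wrapped in a Markdown code fence.
    fenced = FENCED_REPLY.match(text)
    if fenced:
        text = fenced.group(1)

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: prose around the object; try the outermost {...} span.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])


print(parse_llm_json('Sure! {"answer": "ok", "confidence": 0.9}'))
```

In the notebook, this could replace the direct `json.loads(response.choices[0].message.content)` call; a reply that still fails to parse would fall through to the existing `except` branch.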

## 8. Test Query 1: "How do I configure automations in Hiver?"

In [8]:
query1 = "How do I configure automations in Hiver?"

print(f"Query 1: {query1}")
print("="*80)

# Retrieve relevant articles
retrieved1 = retrieve_articles(query1, embedding_model, index, kb_articles, top_k=3)

print("\nRetrieved Articles:")
for i, result in enumerate(retrieved1, 1):
    print(f"{i}. {result['article']['title']}")
    print(f"   Similarity: {result['similarity']:.4f}")

# Generate answer
answer1 = generate_answer(query1, retrieved1, client)

print("\n" + "="*80)
print("Generated Answer:")
print(answer1['answer'])
print(f"\nConfidence: {answer1['confidence']}")
print(f"Sources: {', '.join(answer1['sources'])}")
print(f"Reasoning: {answer1['reasoning']}")

Query 1: How do I configure automations in Hiver?

Retrieved Articles:
1. How to Configure Automations in Hiver
   Similarity: 0.7186
2. Setting Up SLA (Service Level Agreements)
   Similarity: 0.5066
3. Email Tagging Best Practices
   Similarity: 0.4825

Generated Answer:
To configure automations in Hiver, follow these steps: Go to Settings > Automations, click 'Create New Rule', set your trigger conditions, define actions, test the automation with sample emails, and save and activate. Common automation uses include auto-assigning emails based on keywords, auto-tagging by department, setting SLA timers, and sending auto-responses.

Confidence: 1.0
Sources: Article 1: How to Configure Automations in Hiver
Reasoning: The answer is directly based on the step-by-step instructions provided in Article 1, which explicitly explains how to configure automations in Hiver.
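
The confidence above is self-reported by the LLM (1.0), while the best retrieval similarity for the same query is 0.7186. One way to keep the reported confidence tied to retrieval quality is to blend the two signals. This is a sketch only: the function name, the equal weighting, and the choice of the top similarity as the retrieval signal are all assumptions, not part of the original pipeline.

```python
from typing import Dict, List


def blended_confidence(
    llm_confidence: float,
    retrieved: List[Dict],
    llm_weight: float = 0.5,  # illustrative weight, would need tuning
) -> float:
    """Weighted average of LLM self-reported confidence and top retrieval similarity."""
    # Best similarity among retrieved articles; 0.0 if nothing was retrieved.
    top_similarity = max((r["similarity"] for r in retrieved), default=0.0)
    # Clamp the LLM value in case the model returns something outside [0, 1].
    llm_confidence = min(max(float(llm_confidence), 0.0), 1.0)
    score = llm_weight * llm_confidence + (1 - llm_weight) * top_similarity
    return round(score, 4)


# Values taken from the Query 1 output above.
query1_retrieved = [
    {"similarity": 0.7186},
    {"similarity": 0.5066},
    {"similarity": 0.4825},
]
print(blended_confidence(1.0, query1_retrieved))
```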


## 9. Test Query 2: "Why is CSAT not appearing?"

In [9]:
query2 = "Why is CSAT not appearing?"

print(f"Query 2: {query2}")
print("="*80)

# Retrieve relevant articles
retrieved2 = retrieve_articles(query2, embedding_model, index, kb_articles, top_k=3)

print("\nRetrieved Articles:")
for i, result in enumerate(retrieved2, 1):
    print(f"{i}. {result['article']['title']}")
    print(f"   Similarity: {result['similarity']:.4f}")

# Generate answer
answer2 = generate_answer(query2, retrieved2, client)

print("\n" + "="*80)
print("Generated Answer:")
print(answer2['answer'])
print(f"\nConfidence: {answer2['confidence']}")
print(f"Sources: {', '.join(answer2['sources'])}")
print(f"Reasoning: {answer2['reasoning']}")

Query 2: Why is CSAT not appearing?

Retrieved Articles:
1. CSAT Survey Setup and Troubleshooting
   Similarity: 0.5255
2. Analytics and Reporting in Hiver
   Similarity: 0.3869
3. Setting Up SLA (Service Level Agreements)
   Similarity: 0.3804

Generated Answer:
To troubleshoot why CSAT is not appearing, check if CSAT is enabled in settings (Analytics > CSAT Settings), verify that the email status is 'Closed', ensure the customer's email is not in the blocklist, check if the survey delay timer has elapsed, and confirm there are no shared mailbox permissions issues. Refer to the CSAT settings to verify all configurations.

Confidence: 0.9
Sources: CSAT Survey Setup and Troubleshooting
Reasoning: The confidence score is high because the provided article 'CSAT Survey Setup and Troubleshooting' directly addresses common issues with CSAT not appearing and provides a clear step-by-step guide to troubleshoot the problem.
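
Two hand-checked queries give limited evidence about retrieval quality. A small labeled set and a hit-rate metric make regressions easier to spot. The sketch below is self-contained: `hit_rate_at_k` is a hypothetical helper, and `toy_retrieve` is a word-overlap stand-in for the embedding-based `retrieve_articles`, used only so the example runs without the model.

```python
import re
from typing import Callable, Dict, List


def hit_rate_at_k(
    labeled_queries: Dict[str, str],
    retrieve: Callable[[str, int], List[str]],
    k: int = 3,
) -> float:
    """Fraction of queries whose expected article title is in the top-k titles."""
    if not labeled_queries:
        return 0.0
    hits = sum(
        1
        for query, expected_title in labeled_queries.items()
        if expected_title in retrieve(query, k)
    )
    return hits / len(labeled_queries)


# Stand-in retriever: ranks titles by number of words shared with the query.
TITLES = [
    "How to Configure Automations in Hiver",
    "CSAT Survey Setup and Troubleshooting",
    "Analytics and Reporting in Hiver",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    ranked = sorted(
        TITLES,
        key=lambda title: -len(query_words & set(re.findall(r"[a-z]+", title.lower()))),
    )
    return ranked[:k]


labeled = {
    "How do I configure automations in Hiver?": "How to Configure Automations in Hiver",
    "Why is CSAT not appearing?": "CSAT Survey Setup and Troubleshooting",
}
print(hit_rate_at_k(labeled, toy_retrieve, k=1))
```

To evaluate the real retriever, `retrieve` could be a small adapter such as `lambda q, k: [r['article']['title'] for r in retrieve_articles(q, embedding_model, index, kb_articles, top_k=k)]`.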


## 10. Failure Case Analysis

In [10]:
# Probe queries the KB is unlikely to answer well.
# Other categories worth testing later:
# - Ambiguous queries
# - Multi-intent queries
# - Queries requiring multiple articles

failure_queries = [
    "How do I delete my account?",  # Not in KB
    "Integration with Slack",  # Minimal info in KB
    "What is Hiver?",  # Too broad
]

print("Testing potential failure cases...\n")
for query in failure_queries:
    print(f"Query: {query}")
    retrieved = retrieve_articles(query, embedding_model, index, kb_articles, top_k=3)
    answer = generate_answer(query, retrieved, client)
    
    print(f"Top match: {retrieved[0]['article']['title']} (similarity: {retrieved[0]['similarity']:.4f})")
    print(f"Confidence: {answer['confidence']}")
    print(f"Answer: {answer['answer'][:100]}...")
    print("-" * 80)
    print()

Testing potential failure cases...

Query: How do I delete my account?
Top match: Troubleshooting Common Hiver Issues (similarity: 0.3761)
Confidence: 0.0
Answer: I don't have enough information...
--------------------------------------------------------------------------------

Query: Integration with Slack
Top match: Setting Up SLA (Service Level Agreements) (similarity: 0.3927)
Confidence: 0.0
Answer: I don't have enough information...
--------------------------------------------------------------------------------

Query: What is Hiver?
Top match: How to Configure Automations in Hiver (similarity: 0.4822)
Confidence: 0.0
Answer: I don't have enough information...
--------------------------------------------------------------------------------
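
In all three cases the LLM declined to answer, but only after a full generation call. The retrieval scores themselves could be used to skip generation when no article is a clear match. The sketch below gates on the top similarity and on its margin over the runner-up. The function name and both thresholds are assumptions and would need tuning on real data: for instance, the earlier in-domain query "How do I set up automations?" scored only 0.4738, below the illustrative 0.5 cutoff used here, while "What is Hiver?" scored 0.4822.

```python
from typing import Dict, List


def should_answer(
    results: List[Dict],
    min_similarity: float = 0.5,  # illustrative cutoff, not tuned
    min_margin: float = 0.05,     # illustrative gap between top-1 and top-2
) -> bool:
    """Decide whether retrieval looks strong enough to call the LLM."""
    if not results:
        return False
    ranked = sorted(results, key=lambda r: r["similarity"], reverse=True)
    top = ranked[0]["similarity"]
    if top < min_similarity:
        return False
    # A near-tie suggests no single article clearly matches the query.
    if len(ranked) > 1 and top - ranked[1]["similarity"] < min_margin:
        return False
    return True


# Similarities from the Query 2 output (in-domain).
csat_results = [{"similarity": 0.5255}, {"similarity": 0.3869}, {"similarity": 0.3804}]
# Top similarity from "How do I delete my account?" (not in the KB).
delete_results = [{"similarity": 0.3761}]

print(should_answer(csat_results))
print(should_answer(delete_results))
```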



## 11. Save Results

In [11]:
# Save results for documentation
results = {
    "query1": {
        "query": query1,
        "retrieved_articles": answer1['retrieved_articles'],
        "answer": answer1['answer'],
        "confidence": answer1['confidence'],
        "sources": answer1['sources']
    },
    "query2": {
        "query": query2,
        "retrieved_articles": answer2['retrieved_articles'],
        "answer": answer2['answer'],
        "confidence": answer2['confidence'],
        "sources": answer2['sources']
    }
}

with open('rag_results.json', 'w') as f:
    json.dump(results, f, indent=2)

print("Results saved to rag_results.json")
print("\nNext steps:")
print("1. Update README.md with results")
print("2. Document 5 retrieval improvements")
print("3. Analyze and document failure case")
print("4. Add debugging steps to README")

Results saved to rag_results.json

Next steps:
1. Update README.md with results
2. Document 5 retrieval improvements
3. Analyze and document failure case
4. Add debugging steps to README
