# Enterprise-RAG: Production-Grade Hybrid RAG System

## Overview
This notebook demonstrates the Enterprise-RAG system with:
- Hybrid retrieval (dense vector + sparse BM25)
- Cross-encoder reranking
- Multi-format document ingestion
- RAGAS evaluation metrics

---

## 1. Installation & Setup

In [None]:
# Install required packages
%pip install -q llama-index chromadb sentence-transformers rank-bm25 python-docx pypdf

In [None]:
import os
import sys
from pathlib import Path

# Add project to path
sys.path.insert(0, '../../projects/rag/Enterprise-RAG')

# Set up environment
# Keep a key that is already set; otherwise fall back to the placeholder
os.environ.setdefault('OPENAI_API_KEY', 'your-api-key-here')  # Replace with actual key

print("✅ Setup complete!")

## 2. Initialize the RAG System

In [None]:
from src.ingestion import create_processor_from_settings
from src.retrieval import create_vector_store, create_embedding_service, create_hybrid_retriever
from src.retrieval.sparse_retriever import create_bm25_retriever
from src.retrieval.reranker import CrossEncoderReranker
from src.generation import create_rag_chain

print("📚 Initializing components...")

# 1. Create embedding service
embedding_service = create_embedding_service()
print("✅ Embedding service initialized")

# 2. Create vector store
vector_store = create_vector_store()
print("✅ Vector store created")

# 3. Create BM25 sparse retriever
bm25_retriever = create_bm25_retriever()
print("✅ BM25 retriever created")

# 4. Create hybrid retriever
hybrid_retriever = create_hybrid_retriever(
    vector_store=vector_store,
    embedding_service=embedding_service,
    bm25_retriever=bm25_retriever
)
print("✅ Hybrid retriever created")

# 5. Create reranker
reranker = CrossEncoderReranker()
print("✅ Cross-encoder reranker initialized")

# 6. Create document processor
document_processor = create_processor_from_settings()
print("✅ Document processor initialized")

# 7. Create RAG chain
rag_chain = create_rag_chain(
    retriever=hybrid_retriever,
    reranker=reranker
)
print("✅ RAG chain created")

print("\n🎉 All components initialized successfully!")
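
The hybrid retriever above is built from a dense vector store and a BM25 retriever, but the notebook does not show how their results are merged. One common way to fuse two ranked lists is reciprocal rank fusion (RRF). The sketch below is illustrative only: `reciprocal_rank_fusion` and the sample rankings are hypothetical, and `create_hybrid_retriever` may use a different fusion strategy (for example, weighted score averaging).

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a dense and a sparse retriever
dense_ranking = ["rag_guide.pdf", "reranking_doc.pdf", "evaluation_metrics.pdf"]
sparse_ranking = ["evaluation_metrics.pdf", "rag_guide.pdf"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Because RRF uses only rank positions, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.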

## 3. Document Ingestion

In [None]:
# Create sample documents
sample_docs = [
    {
        "text": """Enterprise RAG systems combine dense and sparse retrieval for optimal results.
        Dense retrieval uses vector embeddings for semantic understanding, while sparse retrieval
        like BM25 provides exact keyword matching. The hybrid approach combines both methods.""",
        "metadata": {"source": "rag_guide.pdf", "page": 1}
    },
    {
        "text": """Cross-encoder reranking significantly improves retrieval accuracy by re-scoring
        the top-k results from the initial retrieval phase. This two-stage approach balances
        efficiency with accuracy.""",
        "metadata": {"source": "reranking_doc.pdf", "page": 3}
    },
    {
        "text": """RAGAS (Retrieval Augmented Generation Assessment) provides comprehensive metrics
        including faithfulness, answer relevancy, context precision, and context recall to evaluate
        RAG systems.""",
        "metadata": {"source": "evaluation_metrics.pdf", "page": 5}
    }
]

print(f"📄 Preparing to ingest {len(sample_docs)} documents...")

In [None]:
# Process and add documents
from llama_index.core import Document  # llama-index >= 0.10 exposes core classes under llama_index.core

for i, doc_data in enumerate(sample_docs, 1):
    # Create Document object
    doc = Document(
        text=doc_data["text"],
        metadata=doc_data["metadata"]
    )
    
    # Add to retriever
    hybrid_retriever.add_documents([doc])
    
    print(f"✅ Ingested document {i}: {doc_data['metadata']['source']}")

print(f"\n🎉 Successfully ingested {len(sample_docs)} documents!")
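
The cells above ingest in-memory strings only. To feed real files into the same `{"text": ..., "metadata": ...}` shape, text has to be extracted per format first. The following is a minimal sketch under assumptions: `load_text` is a hypothetical helper, not part of the project, and the processor returned by `create_processor_from_settings` may handle formats differently. It relies on `pypdf` and `python-docx`, both installed in the setup cell.

```python
import tempfile
from pathlib import Path


def load_text(path):
    """Return the plain text of a .txt, .md, .pdf, or .docx file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        from pypdf import PdfReader  # imported lazily so text files need no extra packages
        reader = PdfReader(str(path))
        # extract_text() can yield nothing for image-only pages, hence the `or ""`
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        from docx import Document as DocxDocument  # provided by python-docx
        return "\n".join(p.text for p in DocxDocument(str(path)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or path.name}")


# Usage: build a record shaped like the entries in sample_docs
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_text("Hybrid retrieval combines dense and sparse search.", encoding="utf-8")
    loaded = {"text": load_text(sample), "metadata": {"source": sample.name}}

print(loaded["metadata"]["source"], len(loaded["text"]))
```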

## 4. Query the RAG System

In [None]:
# Query 1: Hybrid retrieval explanation
query1 = "What is hybrid retrieval in RAG systems?"

print(f"❓ Query: {query1}\n")
print("🔍 Searching...\n")

response1 = rag_chain.query(
    query1,
    top_k=3,
    use_reranking=True
)

print(f"📝 Answer:\n{response1.answer}\n")
print("📚 Sources:")
for i, source in enumerate(response1.sources, 1):
    # Attribute access matches how sources are read elsewhere in this notebook
    print(f"  {i}. {source.metadata['source']} (relevance: {getattr(source, 'score', 'N/A')})")

In [None]:
# Query 2: Reranking benefits
query2 = "How does cross-encoder reranking improve RAG systems?"

print(f"❓ Query: {query2}\n")
print("🔍 Searching...\n")

response2 = rag_chain.query(
    query2,
    top_k=2,
    use_reranking=True
)

print(f"📝 Answer:\n{response2.answer}\n")
print(f"⏱️ Processing time: {response2.metadata.get('processing_time', 'N/A')}s")

In [None]:
# Query 3: Evaluation metrics
query3 = "What metrics does RAGAS provide for evaluation?"

print(f"❓ Query: {query3}\n")
print("🔍 Searching...\n")

response3 = rag_chain.query(
    query3,
    top_k=3,
    use_reranking=False  # Test without reranking
)

print(f"📝 Answer:\n{response3.answer}\n")
print(f"🔄 Reranking used: {response3.metadata.get('use_reranking', False)}")

## 5. Compare Retrieval Methods

In [None]:
test_query = "How does RAGAS evaluate RAG systems?"

print(f"🔍 Test Query: {test_query}\n")
print("="*50)

# Dense only
print("\n1️⃣ Dense Retrieval Only:")
dense_results = vector_store.query(test_query, top_k=3)
for i, result in enumerate(dense_results, 1):
    print(f"  {i}. {result.metadata['source']} - score: {result.score:.3f}")

# Sparse only
print("\n2️⃣ Sparse (BM25) Retrieval Only:")
sparse_results = bm25_retriever.retrieve(test_query, top_k=3)
for i, result in enumerate(sparse_results, 1):
    print(f"  {i}. {result.metadata['source']} - score: {result.score:.3f}")

# Hybrid
print("\n3️⃣ Hybrid Retrieval (Dense + Sparse):")
hybrid_results = hybrid_retriever.retrieve(test_query, top_k=3)
for i, result in enumerate(hybrid_results, 1):
    print(f"  {i}. {result.metadata['source']} - score: {result.score:.3f}")

# Hybrid with reranking
print("\n4️⃣ Hybrid + Reranking:")
reranked_results = reranker.rerank(hybrid_results, query=test_query)
for i, result in enumerate(reranked_results[:3], 1):
    print(f"  {i}. {result.metadata['source']} - rerank score: {result.score:.3f}")
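
The last step of the comparison re-scores the hybrid candidates against the query. The two-stage pattern can be sketched independently of any model: score every (query, passage) pair, sort, and keep the best few. In the sketch below, `rerank` and `token_overlap` are hypothetical names, and the token-overlap scorer is only a toy stand-in so the example runs without downloading a model. A real cross-encoder would score each pair jointly with a transformer, and the project's `CrossEncoderReranker` may expose a different interface.

```python
def rerank(query, passages, score_pair, top_k=3):
    """Re-score (query, passage) pairs and return the top_k passages with scores."""
    scored = [(passage, score_pair(query, passage)) for passage in passages]
    # Python's sort is stable, so ties keep their first-stage order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def token_overlap(query, passage):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the passage."""
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    return len(query_tokens & passage_tokens) / len(query_tokens) if query_tokens else 0.0


# Hypothetical first-stage candidates
candidates = [
    "BM25 provides exact keyword matching.",
    "RAGAS metrics evaluate RAG systems.",
    "Cross-encoder reranking re-scores the top-k results.",
]

top = rerank("How does RAGAS evaluate RAG systems?", candidates, token_overlap, top_k=2)
for passage, score in top:
    print(f"{score:.2f}  {passage}")
```

Passing the scorer as a parameter keeps the selection logic separate from the model, so a real cross-encoder can be swapped in without touching `rerank`.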

## 6. Evaluation with RAGAS

In [None]:
from src.evaluation import create_evaluator

# Create evaluator
evaluator = create_evaluator()
print("📊 RAGAS Evaluator initialized\n")

# Define test cases
test_cases = [
    {
        "question": "What is hybrid retrieval?",
        "expected_answer": "Hybrid retrieval combines dense vector embeddings and sparse keyword matching for optimal search results."
    },
    {
        "question": "How does reranking help?",
        "expected_answer": "Reranking improves accuracy by re-scoring top results using a cross-encoder model."
    }
]

print(f"🧪 Running {len(test_cases)} evaluation test cases...\n")

# Run evaluation
results = []
for i, test in enumerate(test_cases, 1):
    response = rag_chain.query(test["question"], top_k=3)
    
    metrics = evaluator.evaluate_single(
        question=test["question"],
        answer=response.answer,
        expected=test["expected_answer"],
        contexts=[doc.text for doc in response.sources]
    )
    
    results.append({
        "question": test["question"],
        **metrics
    })
    
    print(f"Test {i}: {test['question'][:40]}...")
    print(f"  Faithfulness: {metrics['faithfulness']:.2f}")
    print(f"  Answer Relevancy: {metrics['answer_relevancy']:.2f}")
    print(f"  Context Precision: {metrics['context_precision']:.2f}")
    print()
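
The loop above prints per-question scores but never aggregates the collected `results` list. A small helper can average each metric across test cases. This is a sketch with made-up numbers: `summarize_metrics` is a hypothetical helper, and the scores in `example_results` are placeholders shaped like the entries of `results`, not measured output.

```python
from statistics import mean


def summarize_metrics(results, metric_names):
    """Average each metric over all test cases, skipping missing or None values."""
    summary = {}
    for name in metric_names:
        values = [r[name] for r in results if r.get(name) is not None]
        summary[name] = mean(values) if values else None
    return summary


# Placeholder scores for illustration only; these are not real evaluation results
example_results = [
    {"question": "What is hybrid retrieval?", "faithfulness": 0.9,
     "answer_relevancy": 0.8, "context_precision": 1.0},
    {"question": "How does reranking help?", "faithfulness": 0.7,
     "answer_relevancy": 0.6, "context_precision": 0.5},
]

summary = summarize_metrics(
    example_results, ["faithfulness", "answer_relevancy", "context_precision"]
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```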

## 7. Performance Comparison

In [None]:
import time

test_queries = [
    "What is hybrid retrieval?",
    "How does cross-encoder reranking work?",
    "What are RAGAS metrics?"
]

configs = [
    ("Dense Only", {"use_reranking": False, "use_sparse": False}),
    ("Sparse Only", {"use_reranking": False, "use_dense": False}),
    ("Hybrid (No Rerank)", {"use_reranking": False}),
    ("Hybrid + Rerank", {"use_reranking": True})
]

print("⚡ Performance Comparison\n")
print("-" * 70)

for config_name, config in configs:
    times = []
    
    for query in test_queries:
        start = time.perf_counter()  # monotonic, high-resolution timer
        _ = rag_chain.query(query, **config)
        times.append(time.perf_counter() - start)
    
    avg_time = sum(times) / len(times)
    print(f"{config_name:25} | Avg: {avg_time:.3f}s | Min: {min(times):.3f}s | Max: {max(times):.3f}s")
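
Single-shot timings like those above are sensitive to cold caches and lazy model loading on the first call. A slightly more robust measurement runs one unmeasured warm-up call and then repeats each query several times. The sketch below assumes nothing about the project: `time_call` is a hypothetical helper and `fake_query` stands in for `rag_chain.query`.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, warmup=1, **kwargs):
    """Time fn(*args, **kwargs) and return per-call durations in seconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm-up runs are not measured
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


def fake_query(query, delay=0.01):
    """Hypothetical stand-in for rag_chain.query that just sleeps."""
    time.sleep(delay)
    return query


durations = time_call(fake_query, "What is hybrid retrieval?", repeats=3)
print(f"median: {statistics.median(durations):.3f}s over {len(durations)} runs")
```

Reporting the median rather than the mean reduces the influence of a single slow outlier.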

## 8. Interactive Chat Interface

In [None]:
def interactive_chat():
    """Interactive chat with the RAG system."""
    print("💬 Enterprise-RAG Chat Interface")
    print("Type 'quit' to exit\n")
    
    while True:
        query = input("\nYou: ")
        
        if query.lower() in ['quit', 'exit', 'q']:
            print("Goodbye! 👋")
            break
        
        if not query.strip():
            continue
        
        response = rag_chain.query(query, top_k=3, use_reranking=True)
        
        print(f"\nAssistant: {response.answer}")
        
        if response.sources:
            print("\nSources:")
            for source in response.sources:
                print(f"  - {source.metadata['source']}")

# Uncomment to run interactive chat
# interactive_chat()

## Summary

### ✅ What We Demonstrated:

1. **System Setup** - Initialized all RAG components
2. **Document Ingestion** - Ingested sample text documents with source metadata
3. **Hybrid Retrieval** - Combined dense + sparse search
4. **Cross-Encoder Reranking** - Improved result accuracy
5. **RAGAS Evaluation** - Measured faithfulness, relevancy, precision
6. **Performance Analysis** - Compared different retrieval strategies

### 🎯 Key Features:

- ✅ Hybrid retrieval (dense + sparse)
- ✅ Cross-encoder reranking
- ✅ Multi-format document support
- ✅ RAGAS evaluation metrics
- ✅ Streaming responses
- ✅ Production-ready API

### 📚 Next Steps:

- Try with your own documents
- Experiment with different embedding models
- Tune reranking thresholds
- Deploy with FastAPI: `uvicorn src.api.main:app --reload`
- Explore the Streamlit UI: `streamlit run src/ui/app.py`

---

**📖 Documentation:** [Enterprise-RAG README](../../projects/rag/Enterprise-RAG/README.md)