# Enterprise RAG System - Retrieval Demo

This notebook demonstrates the retrieval pipeline:
1. Query processing (normalization, expansion, entity extraction)
2. Hybrid retrieval (Vector + BM25 + Knowledge Graph)
3. Cross-encoder reranking (MANDATORY)
4. Grounded generation with citations

In [None]:
# Add project root to path
import sys
sys.path.insert(0, '..')

from config.settings import settings
settings.initialize()

## 1. Query Processing

Before retrieval, each query is:
1. Normalized (lowercased, punctuation cleaned)
2. Classified by intent (factual, comparison, trend, etc.)
3. Expanded with synonyms and domain terms
4. Mined for entities used in KG traversal
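The four steps can be illustrated with a small, self-contained sketch. This is not the project's `QueryProcessor`. The intent rules, the synonym table, the `"breakdown"` intent, the table cues, and the `ProcessedQuerySketch` class are all hypothetical. Entity extraction is approximated by matching a toy domain vocabulary, where a real system would more likely use NER or a KG entity index.

```python
import re
from dataclasses import dataclass, field

# Hypothetical intent rules; checked in order, first match wins.
INTENT_KEYWORDS = {
    "comparison": ("compared", "versus", "vs", "difference"),
    "trend": ("trend", "over time", "growth", "change"),
    "breakdown": ("breakdown", "by segment", "split"),
}
# Hypothetical domain vocabulary with synonyms used for expansion.
SYNONYMS = {
    "revenue": ["sales", "turnover"],
    "debt": ["borrowings", "liabilities"],
    "risk": ["exposure", "uncertainty"],
}
TABLE_CUES = ("breakdown", "table", "by segment")


@dataclass
class ProcessedQuerySketch:
    normalized: str
    intent: str
    entities: list[str] = field(default_factory=list)
    expanded_queries: list[str] = field(default_factory=list)
    wants_table: bool = False


def has_phrase(text: str, phrase: str) -> bool:
    # Whole-word match so "change" does not fire on "exchange".
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def process(query: str) -> ProcessedQuerySketch:
    # 1. Normalize: lowercase, drop punctuation (keep apostrophes/hyphens), squeeze spaces.
    normalized = " ".join(re.sub(r"[^\w\s'-]", " ", query.lower()).split())
    # 2. Intent: first rule whose keyword appears, else "factual".
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if any(has_phrase(normalized, k) for k in kws)),
        "factual",
    )
    tokens = normalized.split()
    # 3. Expansion: original query first, then one variant per synonym substitution.
    expanded = [normalized]
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            expanded.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    # 4. Entities: approximated here by matching the toy domain vocabulary.
    entities = [tok for tok in tokens if tok in SYNONYMS]
    wants_table = any(has_phrase(normalized, cue) for cue in TABLE_CUES)
    return ProcessedQuerySketch(normalized, intent, entities, expanded, wants_table)
```

For example, `process("How did revenue change compared to last year?")` yields intent `"comparison"` (the comparison rule is checked before the trend rule) and three expanded queries, with `"revenue"` swapped for `"sales"` and `"turnover"`.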

In [None]:
from src.retrieval.query_processor import QueryProcessor

processor = QueryProcessor()

# Example queries
queries = [
    "What are the main risk factors?",
    "How did revenue change compared to last year?",
    "Show me the breakdown of sales by segment",
    "What is the company's total debt?"
]

for query in queries:
    processed = processor.process(query)
    print(f"\nQuery: {query}")
    print(f"  Intent: {processed.intent.value}")
    print(f"  Entities: {processed.entities}")
    print(f"  Domain terms: {processed.domain_terms}")
    print(f"  Wants table: {processed.wants_table}")
    print(f"  Expanded queries: {len(processed.expanded_queries)}")

## 2. Hybrid Retrieval

Three parallel retrieval paths:
1. **Vector Search** (semantic similarity) - weight 0.5
2. **BM25 Search** (keyword matching) - weight 0.3
3. **KG Traversal** (entity relationships) - weight 0.2

Results are merged using Reciprocal Rank Fusion (RRF), with each path's contribution scaled by its weight.
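A minimal sketch of weighted RRF, assuming each path returns an ordered list of chunk IDs. Every document receives `weight / (k + rank)` from each list it appears in, with ranks starting at 1. The constant `k = 60` is the value commonly used with RRF; whether `HybridRetriever` uses the same constant, or fuses exactly this way, is an assumption.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of weight / (k + rank)."""
    scores = defaultdict(float)
    for path, ranking in ranked_lists.items():
        w = weights.get(path, 0.0)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = weighted_rrf(
    {"vector": ["a", "b", "c"], "bm25": ["c", "a"], "kg": ["d"]},
    {"vector": 0.5, "bm25": 0.3, "kg": 0.2},
)
print([doc for doc, _ in fused])  # → ['a', 'c', 'b', 'd']
```

Because RRF only uses ranks, the raw vector, BM25, and KG scores never need to be calibrated against each other, which is the main reason to prefer it over summing raw scores.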

In [None]:
from src.retrieval.hybrid_retriever import HybridRetriever

retriever = HybridRetriever()

print("Retriever configuration:")
print(f"  Vector weight: {retriever.vector_weight}")
print(f"  BM25 weight: {retriever.bm25_weight}")
print(f"  KG weight: {retriever.kg_weight}")

In [None]:
# Example retrieval (requires ingested documents)
# query = "What are the risk factors?"
# results = retriever.retrieve(query, top_k=10)
# 
# print(f"Retrieved {len(results)} candidates")
# for i, r in enumerate(results[:5]):
#     print(f"\n[{i+1}] Score: {r.combined_score:.4f}")
#     print(f"    Sources: {r.sources}")
#     print(f"    Section: {r.section_title}")
#     print(f"    Text: {r.text[:100]}...")

## 3. Cross-Encoder Reranking (MANDATORY)

**This is a REQUIRED component**. The system is invalid without reranking.

Reranking uses a cross-encoder model that:
1. Jointly encodes each query-document pair
2. Produces relevance scores that are more accurate than bi-encoder similarity
3. Applies a confidence threshold (default 0.3)
4. Filters out results that score below the threshold
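The score, sort, and threshold logic can be sketched independently of any model. Here `score_fn` is a pluggable scorer that takes `(query, passage)` pairs; the `overlap_scorer` used below is a toy word-overlap stand-in, not a cross-encoder. In practice something like sentence-transformers' `CrossEncoder(...).predict(pairs)` could be passed instead, but its output scale depends on the model, so the 0.3 threshold would need calibrating. `RankedChunk` mirrors the `rerank_score` and `passes_threshold` fields printed above, but is a hypothetical class, not the project's.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A scorer takes (query, passage) pairs and returns one relevance score per pair.
Scorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


@dataclass
class RankedChunk:
    text: str
    rerank_score: float
    passes_threshold: bool


def rerank(query: str, texts: list[str], score_fn: Scorer,
           top_k: int = 5, min_score: float = 0.3) -> list[RankedChunk]:
    # Score every (query, passage) pair jointly, as a cross-encoder would.
    scores = score_fn([(query, t) for t in texts])
    ranked = sorted(
        (RankedChunk(t, float(s), float(s) >= min_score) for t, s in zip(texts, scores)),
        key=lambda c: c.rerank_score,
        reverse=True,
    )
    # Drop low-confidence chunks, then keep the best top_k.
    return [c for c in ranked if c.passes_threshold][:top_k]


def overlap_scorer(pairs):
    # Toy stand-in for a cross-encoder: fraction of query words found in the passage.
    out = []
    for q, d in pairs:
        q_words, d_words = set(q.lower().split()), set(d.lower().split())
        out.append(len(q_words & d_words) / max(len(q_words), 1))
    return out


ranked = rerank(
    "main risk factors",
    ["quarterly dividend announced", "risk factors include currency exposure", "main office location"],
    overlap_scorer,
)
for c in ranked:
    print(f"{c.rerank_score:.3f}  {c.text}")
```

The dividend passage scores 0 and is dropped, so only two chunks survive even though `top_k=5`. Returning fewer results than requested is the intended behavior of a threshold.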

In [None]:
from src.retrieval.reranker import CrossEncoderReranker

reranker = CrossEncoderReranker()

print("Reranker configuration:")
print(f"  Model: {reranker.model_name}")
print(f"  Device: {reranker.device}")
print(f"  Min score threshold: {reranker.min_score}")

In [None]:
# Example reranking (requires retrieval results)
# ranked = reranker.rerank(query, results, top_k=5)
# 
# print(f"{len(ranked)} chunks passed threshold")
# for r in ranked:
#     print(f"\nScore: {r.rerank_score:.4f} (passes: {r.passes_threshold})")
#     print(f"Section: {r.section_title}")
#     print(f"Text: {r.text[:100]}...")

## 4. Full Query Pipeline

The complete workflow:
1. Process query
2. Hybrid retrieval
3. Reranking
4. Confidence check
5. Context assembly
6. Grounded generation
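Steps 5 and 6 can be sketched as numbering the reranked chunks so the generator can cite them as `[n]`, then wrapping them in a prompt that restricts the answer to those sources. The chunk dictionaries (`section`, `text` keys), the 2000-character budget, and the prompt wording are all hypothetical; `QueryPipeline` may assemble context differently.

```python
def assemble_context(chunks, max_chars=2000):
    """Number each chunk as [n] so the generator can cite it; stop at a character budget."""
    parts, citations, used = [], [], 0
    for n, chunk in enumerate(chunks, start=1):
        block = f"[{n}] ({chunk['section']}) {chunk['text']}"
        if used + len(block) > max_chars:
            break  # budget exhausted; remaining lower-ranked chunks are dropped
        parts.append(block)
        citations.append({"id": n, "section": chunk["section"]})
        used += len(block)
    return "\n\n".join(parts), citations


def build_grounded_prompt(query, context):
    # Restrict the model to the numbered sources and require [n] citations.
    return (
        "Answer using ONLY the numbered sources below and cite them as [n]. "
        "If the sources do not contain the answer, say the information is not present.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    {"section": "Risk Factors", "text": "Currency exposure may affect margins."},
    {"section": "Liquidity", "text": "Total debt decreased."},
]
context, citations = assemble_context(chunks)
print(build_grounded_prompt("What are the main risk factors?", context))
```

Because chunks arrive in reranked order, truncating at the budget drops the least relevant material first.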

In [None]:
from src.pipeline.query import QueryPipeline

pipeline = QueryPipeline()

# Example query (requires ingested documents)
# response = pipeline.query(
#     query="What are the main risk factors?",
#     verbose=True
# )
# 
# print("\n=== RESPONSE ===")
# print(f"Answer: {response.answer}")
# print(f"Has answer: {response.has_answer}")
# print(f"Confidence: {response.confidence:.2%}")
# print(f"Citations: {len(response.citations)}")
# print(f"Tables: {len(response.tables)}")
# print(f"Processing time: {response.processing_time_ms:.0f}ms")

## 5. Failure Cases

The system handles these cases explicitly instead of guessing:

1. **No matching chunks**: Returns "information not present"
2. **Low confidence**: Explicitly states uncertainty in the answer
3. **Unreferenced tables**: Tables are included in the response only if they are referenced
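The first two cases amount to a small decision on the reranker scores. The sketch below assumes that confidence is the best score that passes the 0.3 threshold, and introduces a hypothetical second cutoff (`uncertain_below=0.5`) under which the answer carries an uncertainty caveat. Neither the aggregation rule nor the second cutoff comes from `QueryPipeline`.

```python
from dataclasses import dataclass


@dataclass
class AnswerDecision:
    has_answer: bool
    mode: str          # "not_present", "uncertain", or "confident"
    confidence: float


def decide(rerank_scores, min_score=0.3, uncertain_below=0.5):
    """Map reranker scores to an answer mode (hypothetical thresholds)."""
    passing = [s for s in rerank_scores if s >= min_score]
    if not passing:
        # Nothing survived reranking: report "not present" rather than guess.
        return AnswerDecision(False, "not_present", 0.0)
    confidence = max(passing)  # assumption: confidence = best passing score
    if confidence < uncertain_below:
        return AnswerDecision(True, "uncertain", confidence)
    return AnswerDecision(True, "confident", confidence)


print(decide([]))           # out-of-scope query: has_answer=False
print(decide([0.35, 0.1]))  # answer with an explicit uncertainty caveat
print(decide([0.9, 0.4]))   # normal grounded answer
```

Raising `min_score` trades coverage for precision: more queries end in "not present", but fewer answers rest on weak evidence.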

In [None]:
# Test with out-of-scope query
# response = pipeline.query(
#     query="What is the weather today?",
#     verbose=True
# )
# 
# print(f"Answer: {response.answer}")
# print(f"Has answer: {response.has_answer}")  # Should be False

## Key Design Decisions

### Why Hybrid Retrieval?
- Vector: Semantic similarity (concept matching)
- BM25: Exact keywords (rare terms, numbers)
- KG: Entity relationships (connected concepts)
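The BM25 point about rare terms and numbers follows from its IDF weighting: a token like `2023` that appears in few chunks gets a large weight, and it matches only exactly. Below is a minimal sketch of Okapi BM25 with the common defaults `k1=1.5`, `b=0.75` and the non-negative IDF variant used by Lucene. The project's BM25 backend and tokenization (plain whitespace splitting here) may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc for the query (whitespace tokenization)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


docs = ["revenue grew in 2023", "revenue declined in 2022", "revenue was stable"]
print(bm25_scores("revenue 2023", docs))
```

`revenue` appears in every document and contributes little, while the rare token `2023` separates the first document from the rest. A dense embedding may place `2022` and `2023` very close together, which is why the keyword path is kept alongside vector search.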

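### Why Is Reranking MANDATORY?
- Bi-encoders embed the query and each document independently, so their similarity scores are approximate
- Cross-encoders score the query and document jointly, giving more accurate relevance estimates
- Confidence thresholds filter out weak evidence, reducing the risk of hallucinated answers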

### Why Confidence Thresholds?
- Zero hallucination tolerance
- Better to say "not found" than guess
- Configurable based on use case