# ⚖️ Week 07-08 · Notebook 10 · RAG Fusion & Reranking

Blend multiple retrieval signals and apply rerankers to deliver precise, auditable answers.

## 🎯 Learning Objectives
- Implement reciprocal rank fusion across vector stores.
- Integrate cross-encoder rerankers (e.g., Cohere, HuggingFace).
- Evaluate precision@k and response quality improvements.
- Capture reranker explainability metadata.

## 🧩 Scenario
Retrieving from both the SOP store and the incident store gives answers fuller context than either store alone. The QA team wants data on how much rerankers improve top-3 accuracy for safety-critical questions.

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain_community.vectorstores import FAISS  # Chroma was imported but never used
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document

# --- 1. Setup: Create multiple, distinct knowledge sources ---
emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

# SOPs (Standard Operating Procedures) - The "how-to" guides
sop_docs = [
    Document(page_content="SOP-122: For bearing lubrication, use Grade 2 lithium grease only."),
    Document(page_content="SOP-900: Lockout/Tagout procedure must be followed before any maintenance."),
    Document(page_content="SOP-123: If a bearing is overheating, shut down the machine immediately and inspect for damage.")
]
sop_store = FAISS.from_documents(sop_docs, emb)

# Incident Logs - Historical data of what went wrong
incident_docs = [
    Document(page_content="Incident 2023-55: Bearing failure on Press-07 led to 8 hours of downtime. Root cause: improper lubrication."),
    Document(page_content="Incident 2024-12: Overheating bearing on CNC-02 was caught during a routine check. Re-lubricated per SOP-122."),
    Document(page_content="Incident 2024-21: Coolant shortage caused multiple machines to overheat.")
]
incident_store = FAISS.from_documents(incident_docs, emb)


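# --- 2. Fusion using a simple combined retriever ---
# The simplest fusion approach is to query both stores and concatenate the results.
# This is plain concatenation, not reciprocal rank fusion; LangChain's `MergerRetriever`
# is a ready-made alternative. The class subclasses BaseRetriever so that it can be
# passed to ContextualCompressionRetriever, which expects a BaseRetriever instance.
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.retrievers import BaseRetriever

class CombinedRetriever(BaseRetriever):
    """Query every wrapped retriever and concatenate the results in order."""
    retrievers: list[BaseRetriever]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> list[Document]:
        docs = []
        for retriever in self.retrievers:
            docs.extend(retriever.invoke(query))
        return docs

# Create a retriever that searches both SOPs and Incidents.
# The result count is set through `search_kwargs`, the documented way to limit results.
fusion_retriever = CombinedRetriever(retrievers=[
    sop_store.as_retriever(search_kwargs={"k": 3}),
    incident_store.as_retriever(search_kwargs={"k": 3}),
])

query = 'Bearing is overheating after maintenance shift'
fused_docs = fusion_retriever.invoke(query)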

print("--- Fused Documents (Before Reranking) ---")
for doc in fused_docs:
    print(f"- {doc.page_content}")
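
The combined retriever above concatenates results without merging their rankings. Reciprocal rank fusion (RRF), named in the learning objectives, instead scores each document by summing `1 / (k + rank)` over every ranking in which it appears. The following is a minimal sketch in plain Python, not the notebook's official implementation: the two rankings are hypothetical (for instance, results for two phrasings of the same question), and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical rankings for illustration only.
ranking_a = ["SOP-123", "Incident 2024-12", "SOP-122"]
ranking_b = ["SOP-123", "Incident 2024-21", "Incident 2024-12"]

fused = reciprocal_rank_fusion([ranking_a, ranking_b])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

When the input stores are disjoint, as the SOP and incident stores are, each document appears in only one ranking and RRF reduces to interleaving by rank. It adds the most value when the same document can surface in several rankings.
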


### 🔁 Cross-Encoder Reranking
Score fused results using a transformer reranker for improved precision.

In [None]:

# --- 3. Cross-Encoder Reranking ---
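# A cross-encoder reads the query and one document together and outputs a relevance score for that pair.
# It is typically more accurate than embedding similarity but needs one forward pass per pair,
# so it is applied only to the small candidate set returned by the first-stage retriever.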
model = HuggingFaceCrossEncoder(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
compressor = CrossEncoderReranker(model=model, top_n=3) # Keep the top 3 most relevant docs

# The compression retriever wraps the base retriever and the reranker
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, 
    base_retriever=fusion_retriever
)

# --- 4. Get Reranked Documents ---
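reranked_docs = reranking_retriever.invoke(query)

# CrossEncoderReranker returns the documents in ranked order but does not populate a
# `relevance_score` metadata key, so the pairs are scored explicitly for display.
# The ms-marco cross-encoder emits raw scores, not probabilities.
scores = model.score([(query, doc.page_content) for doc in reranked_docs])

print("\n--- Reranked & Compressed Documents (Top 3) ---")
for doc, score in zip(reranked_docs, scores):
    print(f"- {doc.page_content} (Relevance: {score:.2f})")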
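
To capture reranker explainability metadata, one option is to record, for each reranked document, its position before and after reranking, its score, and a content hash. The sketch below uses only the standard library. The function name `build_rerank_audit`, the record fields, and the example scores are illustrative assumptions, not part of LangChain or of the course materials.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_rerank_audit(query, scored_docs, model_name):
    """Build an audit record from (text, score) pairs given in first-stage retrieval order."""
    # Sort by score, highest first, while remembering each document's original position.
    ranked = sorted(enumerate(scored_docs), key=lambda item: item[1][1], reverse=True)
    return {
        "query": query,
        "reranker_model": model_name,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "results": [
            {
                "rerank_position": new_pos,
                "retrieval_position": old_pos + 1,
                "score": round(float(score), 4),
                "content_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            }
            for new_pos, (old_pos, (text, score)) in enumerate(ranked, start=1)
        ],
    }

# Hypothetical scores for illustration; real values would come from the cross-encoder.
scored = [
    ("SOP-122: For bearing lubrication, use Grade 2 lithium grease only.", 1.2),
    ("SOP-123: If a bearing is overheating, shut down the machine immediately and inspect for damage.", 7.9),
    ("Incident 2024-12: Overheating bearing on CNC-02 was caught during a routine check. Re-lubricated per SOP-122.", 5.4),
]
audit = build_rerank_audit(
    "Bearing is overheating after maintenance shift",
    scored,
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
)
print(json.dumps(audit, indent=2))
```

Storing the hash rather than the full text keeps the record compact while still letting an auditor confirm which document version was scored.
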

## 📊 Evaluation
1. Use labeled QA pairs (Week 09 dataset) to compute precision@3, recall@5, and reranker uplift.
2. Log metrics to MLflow: `baseline_precision`, `reranked_precision`, `uplift_pct`.
3. Record reranker version, model checksum, and calibration notes for audits.
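
The metrics in steps 1 and 2 can be sketched in plain Python as follows. The labeled runs are made-up placeholders standing in for the evaluation dataset, and the helper names are assumptions; only the metric names `baseline_precision`, `reranked_precision`, and `uplift_pct` come from the list above.

```python
from statistics import mean

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are labeled relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of the labeled-relevant IDs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

# Placeholder labeled runs: relevant IDs plus rankings before and after reranking.
labeled_runs = [
    {
        "relevant": {"SOP-123", "Incident 2024-12"},
        "baseline": ["SOP-122", "SOP-900", "SOP-123", "Incident 2024-12", "Incident 2023-55"],
        "reranked": ["SOP-123", "Incident 2024-12", "SOP-122", "Incident 2023-55", "SOP-900"],
    },
    {
        "relevant": {"SOP-900"},
        "baseline": ["Incident 2024-21", "SOP-900", "SOP-122"],
        "reranked": ["SOP-900", "SOP-122", "Incident 2024-21"],
    },
]

baseline_precision = mean(precision_at_k(r["baseline"], r["relevant"], 3) for r in labeled_runs)
reranked_precision = mean(precision_at_k(r["reranked"], r["relevant"], 3) for r in labeled_runs)
recall_at_5 = mean(recall_at_k(r["reranked"], r["relevant"], 5) for r in labeled_runs)

# Uplift is the relative change in precision@3; it is undefined when the baseline is zero.
uplift_pct = (
    (reranked_precision - baseline_precision) / baseline_precision * 100
    if baseline_precision > 0 else float("nan")
)

metrics = {
    "baseline_precision": baseline_precision,
    "reranked_precision": reranked_precision,
    "recall_at_5": recall_at_5,
    "uplift_pct": uplift_pct,
}
print(metrics)
# With MLflow installed and a run active, the dict could be logged via mlflow.log_metrics(metrics).
```

Precision@3 ignores ordering inside the top 3, so the second run shows no change even though the relevant document moved from second to first place. A rank-aware metric would be needed to capture that kind of improvement.
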

## 🧪 Lab Assignment
1. Evaluate multiple weight combinations in RRF and identify optimal settings per plant.
2. Benchmark rerankers (Cohere vs. HuggingFace) for latency and accuracy.
3. Produce a dashboard (Looker or Superset) tracking retrieval KPIs.
4. Share findings with governance board.
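
For the latency half of assignment 2, a small timing harness is enough to compare rerankers on equal terms. The sketch below assumes each reranker is wrapped as a function `rerank_fn(query, candidates)` that returns a reordered list; that signature, the `benchmark_reranker` helper, and the keyword-overlap stand-in are all hypothetical and exist only so the example runs without a model download or an API key.

```python
import math
import statistics
import time

def benchmark_reranker(rerank_fn, query, candidates, repeats=20):
    """Time repeated calls to rerank_fn and report median and p95 latency in milliseconds."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        rerank_fn(query, candidates)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    latencies_ms.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "runs": repeats,
    }

def keyword_overlap_reranker(query, candidates):
    """Stand-in reranker: orders candidates by how many lowercase tokens they share with the query."""
    query_tokens = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda text: len(query_tokens & set(text.lower().split())),
        reverse=True,
    )

candidates = [
    "Coolant shortage caused multiple machines to overheat.",
    "Overheating bearing on CNC-02 was caught during a routine check.",
    "Lockout/Tagout procedure must be followed before any maintenance.",
]
stats = benchmark_reranker(keyword_overlap_reranker, "bearing overheating after maintenance", candidates)
print(stats)
```

Running every reranker against the same queries and candidate sets, and reporting p95 alongside the median, keeps one slow outlier call from hiding behind an average.
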

## ✅ Checklist
- [ ] Fusion retriever implemented
- [ ] Reranker evaluated
- [ ] Metrics logged
- [ ] Governance artefacts stored

## 📚 References
- Reciprocal Rank Fusion Research
- Cross-Encoder Reranking Guides
- Week 09 Evaluation Harness