# 🔍 Week 07-08 · Notebook 07: Advanced RAG - Retrievers

**Objective:** Move beyond basic similarity search and build sophisticated, multi-stage retrieval pipelines that deliver more accurate and context-aware results.

In the previous notebook, we built a basic retrieval system: we indexed documents in a vector store and used similarity search to find relevant chunks. While this is a great start, it often falls short in real-world scenarios. User questions can be ambiguous, and simple vector similarity doesn't always capture the true relevance of a document.

**Advanced RAG** introduces more sophisticated techniques to improve the "Retrieval" step. A better retriever leads to a better final answer from the LLM. In this notebook, we will focus on the **Retriever** component itself and explore several powerful patterns:

1.  **Multi-Query Retriever:** Tackles ambiguity by generating multiple variations of a user's question from different perspectives.
2.  **Parent Document Retriever:** Resolves the chunk-size trade-off by matching the query against small, precise chunks while passing the larger parent document to the LLM for fuller context.
3.  **Re-ranking:** Adds a secondary sorting step to the retrieved results, applying custom business logic or a more powerful model to improve the final ranking.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

1.  **Implement a Multi-Query Retriever:** Use an LLM to automatically generate and run multiple search queries to improve recall.
2.  **Use the Parent Document Retriever:** Structure your data to retrieve small chunks but return their parent documents for improved LLM context.
3.  **Apply a Re-ranking Step:** Implement a custom re-ranking function to sort retrieved documents based on specific business heuristics (e.g., prioritizing safety alerts).
4.  **Understand the Trade-offs:** Compare and contrast these advanced techniques with basic similarity search.

## 🧩 Scenario: Improving the Maintenance Chatbot's Retrieval Accuracy

The maintenance chatbot is live, but technicians are reporting some issues. A vague query like *"spindle vibrating"* sometimes misses critical safety alerts or fails to find the root cause from past incidents. A simple similarity search isn't enough.

Your task is to upgrade the retrieval system to be more robust and intelligent. You will implement and test three advanced patterns:

1.  When a technician asks, *"Why is the spindle vibrating?"*, you'll use a **Multi-Query Retriever** to also search for related concepts like *"causes of spindle vibration"* and *"troubleshooting high spindle vibration"*.
2.  You'll use a **Parent Document Retriever** to ensure that when a small, relevant sentence from a long SOP is found, the LLM receives the full section for complete context.
3.  Finally, you'll add a **re-ranking** step that explicitly boosts the score of any retrieved document containing "Safety-Alert" or "Incident", ensuring that critical information always appears first.
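The parent-document idea from item 2 can be sketched without any framework. This is a minimal, dependency-free illustration, not LangChain's `ParentDocumentRetriever`: word overlap (Jaccard) stands in for embedding similarity, and the multi-sentence `parents` texts, along with every function name, are hypothetical and invented for the example.

```python
import re

# Hypothetical parent documents: multi-sentence SOP sections invented for this sketch.
parents = {
    "SOP-101": (
        "Inspect the tool holder before each shift. "
        "For high spindle vibration, check for loose tool holders. "
        "Torque to 50 Nm."
    ),
    "SOP-102": (
        "Spindle bearing replacement must be done in a clean environment. "
        "Use only approved lubricants."
    ),
}

def tokenize(text):
    """Lowercased word set; a crude stand-in for an embedding vector."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Set overlap in [0, 1]; stands in for vector similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def split_into_children(parent_id, text):
    """Split a parent into sentence-sized chunks that remember their parent ID."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [{"parent_id": parent_id, "text": s} for s in sentences]

def retrieve_parents(query, parent_docs, k=1):
    """Match the query against child chunks, then return the parents of the best chunks."""
    children = [
        child
        for parent_id, text in parent_docs.items()
        for child in split_into_children(parent_id, text)
    ]
    query_tokens = tokenize(query)
    ranked = sorted(
        children,
        key=lambda child: jaccard(query_tokens, tokenize(child["text"])),
        reverse=True,
    )
    parent_ids = []
    for child in ranked:
        if child["parent_id"] not in parent_ids:
            parent_ids.append(child["parent_id"])
        if len(parent_ids) == k:
            break
    return [(parent_id, parent_docs[parent_id]) for parent_id in parent_ids]

result = retrieve_parents("loose tool holders", parents, k=1)
print(result[0][0])  # ID of the parent whose child chunk matched best
```

The query matches a single sentence, yet the caller receives the whole section, including the torque instruction that the matching sentence alone would have omitted.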

In [None]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# --- 1. Setup: Create a sample vector store ---
# This represents our knowledge base of SOPs, maintenance logs, etc.
sample_texts = [
    "SOP-101: For high spindle vibration, first check for loose tool holders. Torque to 50 Nm.",
    "Incident-552: High vibration on Press-04 was caused by bearing failure. Downtime: 8 hours. Cost: $50,000.",
    "SOP-102: Spindle bearing replacement must be done in a clean environment. Use only approved lubricants.",
    "Safety-Alert-7: Immediate shutdown is required if spindle vibration exceeds 10 mm/s. This is a critical safety issue.",
    "Troubleshooting-Guide: Common causes of vibration include tool imbalance, bearing wear, and incorrect speed settings."
]
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vectorstore = Chroma.from_texts(sample_texts, embeddings)

# --- 2. Multi-Query Retriever ---
# This retriever generates multiple variations of the user's question to improve recall.
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), 
    llm=llm
)

# --- 3. Build the RAG Chain using LCEL ---
# This is the modern way to build RAG chains in LangChain.
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful manufacturing assistant. Answer the user's question based on the following context. Prioritize safety alerts."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])

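def format_docs(docs):
    # Join the chunk texts so the prompt receives plain text, not Document reprs.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)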

# --- 4. Run the chain ---
question = 'Why is spindle vibration high after maintenance? Provide troubleshooting steps.'
response = rag_chain.invoke(question)

print("--- RAG Response ---")
print(response)

# --- Bonus: See the generated queries ---
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

# `invoke` replaces the deprecated `get_relevant_documents`; the generated queries appear in the INFO log.
retriever.invoke(question)
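Conceptually, a multi-query retriever runs every generated query variant through the base retriever and keeps the unique union of the hits. The sketch below illustrates only that merge step. The query variants come from the scenario above, while `toy_index`, `toy_search`, and the hits they return are hypothetical stand-ins for a real vector store.

```python
def unique_union(result_lists):
    """Merge several ranked result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

def multi_query_retrieve(queries, search):
    """Run each query variant through `search` and merge the hits."""
    return unique_union([search(query) for query in queries])

# Hypothetical search results per query variant (invented for illustration).
toy_index = {
    "Why is the spindle vibrating?": ["SOP-101", "Troubleshooting-Guide"],
    "causes of spindle vibration": ["Troubleshooting-Guide", "Incident-552"],
    "troubleshooting high spindle vibration": ["SOP-101", "Safety-Alert-7"],
}

def toy_search(query):
    """Stand-in for a vector-store similarity search."""
    return toy_index.get(query, [])

merged = multi_query_retrieve(list(toy_index), toy_search)
print(merged)
```

In this toy data no single variant surfaces all four documents, which is the recall gain the pattern aims for. The cost is one extra LLM call to generate the variants plus one search per variant.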

### 🧭 Hierarchical Retrieval
1. Retrieve relevant SOP sections via embeddings.
2. Drill down to paragraphs referencing maintenance step IDs.
3. Apply heuristic scoring: safety incidents > downtime costs > general tips.
4. Log retrieval path for explainability.
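The four steps above can be sketched in plain Python. This is one possible reading of the pattern under stated assumptions: word overlap replaces embedding similarity, and the `sop_tree` contents, the `STEP-n` ID format, and all function names are hypothetical.

```python
import json
import re

# Hypothetical SOP tree (section title -> paragraphs); the STEP-n IDs are invented.
sop_tree = {
    "Spindle vibration": [
        "STEP-1: Check for loose tool holders and torque to 50 Nm.",
        "Safety-Alert: shut down immediately if vibration exceeds 10 mm/s (STEP-2).",
        "General tip: keep the work area clean.",
    ],
    "Bearing replacement": [
        "STEP-7: Replace spindle bearings in a clean environment.",
        "Incident note: bearing failure caused 8 hours of downtime.",
    ],
}

def words(text):
    """Lowercased word set used for the overlap measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query, text):
    """Count shared words; a stand-in for embedding similarity."""
    return len(words(query) & words(text))

def heuristic_score(paragraph):
    """Safety incidents (3) > downtime costs (2) > general tips (1)."""
    text = paragraph.lower()
    if "safety" in text:
        return 3
    if "downtime" in text or "cost" in text:
        return 2
    return 1

def hierarchical_retrieve(query, tree):
    """Return ranked paragraphs plus a trace of each retrieval stage."""
    trace = {"query": query, "steps": []}

    # 1. Pick the section whose title best matches the query.
    section = max(tree, key=lambda title: overlap(query, title))
    trace["steps"].append({"stage": "section", "selected": section})

    # 2. Drill down to paragraphs that reference a maintenance step ID.
    paragraphs = [p for p in tree[section] if re.search(r"STEP-\d+", p)]
    trace["steps"].append({"stage": "paragraphs", "kept": len(paragraphs)})

    # 3. Order the survivors by the heuristic.
    ranked = sorted(paragraphs, key=heuristic_score, reverse=True)
    trace["steps"].append(
        {"stage": "ranking", "scores": [heuristic_score(p) for p in ranked]}
    )

    # 4. Hand the trace back so the caller can log or export it.
    return ranked, trace

ranked, trace = hierarchical_retrieve("spindle vibration after maintenance", sop_tree)
print(json.dumps(trace, indent=2))
```

Keeping the trace as a plain dictionary means it serializes directly with `json.dumps`, which is convenient when the retrieval path has to be reviewed later.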

In [None]:
def rank_documents(docs):
    """
    Applies a simple heuristic to rank documents based on keywords.
    In a real system, this would be more sophisticated, using metadata, etc.
    """
    ranked_docs = []
    for doc in docs:
        content = doc.page_content.lower()
        score = 0
        if "safety-alert" in content or "critical safety" in content:
            score = 100
        elif "incident" in content or "downtime" in content:
            score = 50
        elif "sop" in content:
            score = 20
        else:
            score = 10
        
        ranked_docs.append({"doc": doc, "score": score})
        
    # Sort documents by score in descending order
    return sorted(ranked_docs, key=lambda x: x['score'], reverse=True)

# --- Example Usage ---
# First, retrieve the documents (`invoke` replaces the deprecated `get_relevant_documents`)
retrieved_docs = retriever.invoke(question)

# Then, rank them
ranked_results = rank_documents(retrieved_docs)

print("\n--- Heuristically Ranked Documents ---")
for result in ranked_results:
    print(f"Score: {result['score']}, Content: {result['doc'].page_content[:80]}...")
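A keyword-only score discards how well each document matched the query in the first place. One alternative, sketched here under assumptions, adds a keyword boost to the retriever's similarity score so both signals count. The document texts come from the sample knowledge base; the similarity values and boost sizes are made up for illustration, and `boost_for` and `rerank` are hypothetical names.

```python
def boost_for(text):
    """Keyword boost: safety alerts outrank incidents, which outrank everything else."""
    lowered = text.lower()
    if "safety-alert" in lowered:
        return 1.0
    if "incident" in lowered:
        return 0.5
    return 0.0

def rerank(candidates):
    """Sort (text, similarity) pairs by similarity plus keyword boost, highest first."""
    scored = [(text, similarity + boost_for(text)) for text, similarity in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Similarity scores below are invented; a real retriever would supply them.
candidates = [
    ("SOP-101: For high spindle vibration, first check for loose tool holders.", 0.82),
    ("Troubleshooting-Guide: Common causes of vibration include tool imbalance.", 0.78),
    ("Incident-552: High vibration on Press-04 was caused by bearing failure.", 0.61),
    ("Safety-Alert-7: Immediate shutdown is required if spindle vibration exceeds 10 mm/s.", 0.55),
]

reranked = rerank(candidates)
for text, score in reranked:
    print(f"{score:.2f}  {text[:60]}")
```

Because the boosts here are at least as large as the typical gap between similarity scores, boosted documents move to the top, while documents with no boost keep their similarity order.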

## 🧪 Lab Assignment
1. Implement a hierarchical retriever for the SOP tree using LangChain's `ParentDocumentRetriever`.
2. Add weight adjustments based on downtime impact, using figures from the cost database.
3. Export retrieval traces to JSON for auditors.
4. Present the RAG improvements over the baseline to stakeholders.

## ✅ Checklist
- [ ] Multi-query retriever configured
- [ ] Hierarchical retrieval implemented
- [ ] Heuristic ranking documented
- [ ] Lab deliverables shared

## 📚 References
- LangChain Multi-Query Retriever
- Hierarchical Retrieval Patterns
- Week 09 Evaluation Harness