# 1. Multi-Query Generation

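**What:** Generate 3-5 rewordings of the original question and retrieve with each of them

**Why:** Different wordings match different relevant documents, so the combined result set has better recall than any single query

**When:** The query is ambiguous, or a single phrasing retrieves too few relevant documents

**Stage:** Pre-Retrieval

**LLM Calls:** 2 (one to generate the variations, one for the final answer)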

---

### Flow

```
Original Query → LLM → [Query 1, Query 2, Query 3, ...]
                              ↓
                     Retrieve for each query
                              ↓
                     Combine unique documents
                              ↓
                        Generate Answer
```
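
The flow above can be sketched offline with a toy keyword retriever standing in for the vector store and hard-coded variations standing in for the LLM. Everything in this block (`CORPUS`, `STOPWORDS`, `tokenize`, `toy_retrieve`, `multi_query_union`) is hypothetical and exists only to illustrate why the union over variations can recall more than the original query alone.

```python
# Hypothetical toy corpus; a real pipeline would query a vector store instead.
CORPUS = {
    "doc1": "Otabek attended the national university for his bachelor's degree",
    "doc2": "Otabek received his education in computer science",
    "doc3": "The project uses a graph neural network",
}

STOPWORDS = {"the", "did", "what", "where", "his", "is", "a", "for", "in"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip("?.,'\"").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def toy_retrieve(query):
    """Return ids of documents sharing at least two tokens with the query."""
    query_tokens = tokenize(query)
    return [
        doc_id
        for doc_id, text in CORPUS.items()
        if len(query_tokens & tokenize(text)) >= 2
    ]


def multi_query_union(queries):
    """Retrieve for each query and combine the unique hits in first-seen order."""
    seen, combined = set(), []
    for query in queries:
        for doc_id in toy_retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                combined.append(doc_id)
    return combined


queries = [
    "Where did Otabek study?",                  # original
    "What university did Otabek attend?",       # variation (hard-coded here)
    "Where did Otabek receive his education?",  # variation (hard-coded here)
]

print("original only:", toy_retrieve(queries[0]))
print("all queries:  ", multi_query_union(queries))
```

In this toy setup the original wording shares only one token with each document and retrieves nothing, while each variation matches a different document.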

### Example

**Original:** "Where did Otabek study?"

**Variations:**
- "What university did Otabek attend?"
- "Where did Otabek receive his education?"
- "What is Otabek's educational institution?"

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

from config import model, setup_vectorstore, get_retriever, format_docs

## Step 1: Generate Multiple Query Variations

Generate different versions of the question to improve retrieval coverage.

**Args:**
- `question`: Original user question
- `num_variations`: Number of variations to generate (default: 3)

**Returns:**
- List of queries, starting with the original question (at most `num_variations + 1` items)

In [None]:
import re


def generate_multi_queries(question, num_variations=3):
    """Return the original question followed by up to `num_variations` rewrites."""

    template = """You are an AI assistant that generates alternative versions of questions
to improve document retrieval in a RAG system.

Generate {num_variations} different versions of the given question.
Each version should ask the same thing but with different words/perspectives.

Original question: {question}

Provide numbered alternatives:"""

    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | model | StrOutputParser()

    response = chain.invoke({
        "question": question,
        "num_variations": num_variations
    })

    # Parse the response. Keep only numbered ("1." / "1)") or bulleted lines so
    # that preambles such as "Here are the alternatives:" are not used as queries.
    queries = [question]  # Always include the original first

    for line in response.split("\n"):
        # Strip only the leading marker; punctuation inside the question is kept
        match = re.match(r"^\s*(?:\d+\s*[.):]|[-*•])\s*(.+)$", line)
        if match:
            candidate = match.group(1).strip()
            if candidate not in queries:  # Skip exact repeats
                queries.append(candidate)

    return queries[:num_variations + 1]


# Test
question = "What is task decomposition for LLM agents?"
queries = generate_multi_queries(question, 5)

print("Generated Queries:")
for i, q in enumerate(queries):
    prefix = "(original)" if i == 0 else f"{i}."
    print(f"  {prefix} {q}")
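
Numbered-list parsing can be checked offline, without calling the model. The block below is a minimal sketch under stated assumptions: `parse_numbered_queries` and the `sample` response are hypothetical, and the sample imitates a reply with a preamble, mixed `1.` / `2)` numbering, punctuation inside a question, and one alternative that simply repeats the original.

```python
import re

# Matches "1. text", "2) text", or "3: text" and captures the text.
NUMBERED = re.compile(r"^\s*\d+\s*[.):]\s*(.+)$")


def parse_numbered_queries(response, original, limit):
    """Return [original] plus up to `limit` distinct numbered alternatives."""
    queries = [original]
    seen = {original.strip().lower()}
    for line in response.splitlines():
        match = NUMBERED.match(line)
        if not match:
            continue  # skip preambles, blank lines, and closing remarks
        candidate = match.group(1).strip()
        key = candidate.lower()
        if key not in seen:  # case-insensitive duplicate check
            seen.add(key)
            queries.append(candidate)
    return queries[: limit + 1]


# Hypothetical model output used only for this check.
sample = """Here are three alternatives:
1. What university did Otabek attend?
2) Where did Otabek receive his education (e.g. degree)?
3. Where did Otabek study?
"""

parsed = parse_numbered_queries(sample, "Where did Otabek study?", 3)
for q in parsed:
    print(q)
```

Only the leading marker is stripped, so the `(e.g. degree)` inside the second alternative survives intact, and the third alternative is dropped because it duplicates the original.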

## Step 2: Multi-Query Retrieval

Retrieve documents for each query variation and combine unique results.

**Process:**
1. Generate query variations
2. Retrieve for each query
3. Deduplicate by a content fingerprint (the first 100 characters of each document)

**Args:**
- `question`: Original user question
- `retriever`: LangChain retriever
- `num_variations`: Number of variations

**Returns:**
- List of unique documents

In [None]:
def multi_query_retrieve(question, retriever, num_variations=3):
    """Retrieve using multiple query variations"""
    
    # Generate variations
    queries = generate_multi_queries(question, num_variations)
    
    print("Generated Queries:")
    for i, q in enumerate(queries):
        prefix = "(original)" if i == 0 else f"{i}."
        print(f"  {prefix} {q}")
    
    # Retrieve for each query
    all_docs = []
    seen_content = set()
    
    for query in queries:
        docs = retriever.invoke(query)
        for doc in docs:
            # Deduplicate by the first 100 characters; chunks that share
            # this prefix (e.g. a common header) are treated as duplicates
            fingerprint = doc.page_content[:100]
            if fingerprint not in seen_content:
                seen_content.add(fingerprint)
                all_docs.append(doc)
    
    print(f"\nRetrieved {len(all_docs)} unique documents")
    for i, doc in enumerate(all_docs[:5]):
        source = doc.metadata.get('filename', 'unknown')
        print(f"  {i+1}. [{source}] {doc.page_content[:60]}...")
    
    return all_docs
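
A prefix fingerprint is cheap, but two different chunks that begin with the same 100 characters collapse into one. As a hedged alternative, the sketch below hashes the whitespace-normalized full text. The `Doc` dataclass is a hypothetical stand-in that only mimics the `page_content` and `metadata` attributes used above; `content_key` and `dedupe` are illustrative names, not part of the notebook's `config` module.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing `page_content` and `metadata`."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def content_key(doc):
    """Hash the whitespace-normalized full text of a document."""
    normalized = " ".join(doc.page_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(docs):
    """Keep the first occurrence of each distinct document, preserving order."""
    seen, unique = set(), []
    for doc in docs:
        key = content_key(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Three chunks that share a header longer than 100 characters.
header = "Shared section header. " * 5
docs = [
    Doc(header + "Laptops are issued on day one."),
    Doc(header + "Badge requests take three days."),
    Doc(header + "Laptops are issued on day one."),  # exact duplicate of the first
]

unique = dedupe(docs)
prefix_keys = {d.page_content[:100] for d in docs}

print("distinct by full-text hash:", len(unique))
print("distinct by 100-char prefix:", len(prefix_keys))
```

The trade-off is that a full-text hash only removes exact (whitespace-insensitive) duplicates, whereas the prefix approach also merges near-duplicates that happen to start identically.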

## Step 3: Complete Multi-Query RAG Pipeline

Full pipeline: Generate variations → Retrieve → Generate answer

**Args:**
- `question`: User question
- `retriever`: LangChain retriever
- `num_variations`: Number of variations

**Returns:**
- Final answer string

In [None]:
def multi_query_rag(question, retriever, num_variations=3):
    """Complete Multi-Query RAG pipeline"""
    
    # Retrieve, then keep only the first 5 unique documents as context
    docs = multi_query_retrieve(question, retriever, num_variations)
    context = format_docs(docs[:5])
    
    # Generate answer
    answer_template = """Answer the question based ONLY on the provided context.
If the context doesn't contain the answer, say "Information not found in documents."

Context:
{context}

Question: {question}

Answer:"""
    
    prompt = ChatPromptTemplate.from_template(answer_template)
    chain = prompt | model | StrOutputParser()
    
    answer = chain.invoke({"context": context, "question": question})
    
    print(f"\n{'='*70}")
    print("ANSWER:")
    print("="*70)
    print(answer)
    
    return answer
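
Because the combined list is ordered query by query, taking its first five entries favors documents retrieved for the original question. One possible alternative, shown here only as a sketch, is to interleave the per-query result lists round-robin before truncating. The helper `interleave_unique` is hypothetical, and it assumes each result is represented by a hashable key (a document id or a content hash), which the pipeline above does not currently expose.

```python
from itertools import zip_longest


def interleave_unique(result_lists, limit):
    """Round-robin over per-query result lists, skipping repeats, up to `limit` items."""
    seen, picked = set(), []
    # zip_longest groups the rank-0 hits of every query, then the rank-1 hits, etc.
    for rank_group in zip_longest(*result_lists):
        for item in rank_group:
            if item is None or item in seen:
                continue  # None pads shorter lists; repeats are skipped
            seen.add(item)
            picked.append(item)
            if len(picked) == limit:
                return picked
    return picked


# Hypothetical document keys returned for each query, best match first.
per_query = [
    ["a", "b", "c"],  # original question
    ["b", "d", "e"],  # variation 1
    ["f", "a", "g"],  # variation 2
]

# Query-by-query order with duplicates removed, then truncated.
concatenated = list(dict.fromkeys(x for hits in per_query for x in hits))[:5]
interleaved = interleave_unique(per_query, 5)

print("concatenated:", concatenated)
print("interleaved: ", interleaved)
```

With these inputs the concatenated selection never reaches the second variation's top hit, while the interleaved selection includes it.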

## Test

In [None]:
# Setup
vectorstore = setup_vectorstore()
retriever = get_retriever(vectorstore, k=3)

# Test questions
test_questions = [
    "Where did Otabek study?",
    "How to use DeMask?",
    "What is DMS?",
    "What is the difference between Graph DTA and Graph DF?"
]

for question in test_questions:
    print(f"\n{'='*70}")
    print(f"Question: {question}")
    print("="*70)
    answer = multi_query_rag(question, retriever)
    print("\n" + "-"*70)