# 3. HyDE (Hypothetical Document Embeddings)

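**What:** Generate a hypothetical answer to the question, embed that answer, and use its embedding to retrieve real documents.

**Why:** A hypothetical answer usually sits closer in embedding space to real answer passages than the question itself does, because it shares their vocabulary and phrasing.

**When:** Technical or semantic queries where the question shares few terms with the documents, so keyword search fails.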

**Key Insight** (approximate similarity scores; exact values depend on the embedding model and the data):
- Question → Real Answer: ~0.65 similarity
- Hypothetical → Real Answer: ~0.89 similarity
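
The gap above can be reproduced in miniature. This is a minimal sketch, assuming a bag-of-words counter as a stand-in for a real embedding model; `toy_embed`, `cosine`, and the three example sentences are hypothetical and not part of this notebook's `config`. The absolute scores will differ from those of a real model, but the ordering illustrates the idea.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up example texts, used only to illustrate the comparison.
question = "How does a vector index speed up search?"
hypothetical = "A vector index groups similar embeddings so search only scans nearby clusters."
real_answer = "The index groups embeddings into clusters, so a search scans only the nearest clusters."

q_sim = cosine(toy_embed(question), toy_embed(real_answer))
h_sim = cosine(toy_embed(hypothetical), toy_embed(real_answer))

print(f"Question -> Real Answer:     {q_sim:.2f}")
print(f"Hypothetical -> Real Answer: {h_sim:.2f}")
```

The hypothetical scores higher because it is phrased like an answer, which is the property HyDE relies on.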

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from config import model, embeddings, setup_vectorstore, format_docs
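
The `config` module is not shown in this notebook. As a rough guide to what `format_docs` is expected to do, here is a minimal sketch, assuming it joins the text of each retrieved chunk into one context string. The `Doc` class is a hypothetical stand-in for a LangChain document (only `page_content` is used), and the function is named `format_docs_sketch` so it does not shadow the real import.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is used."""
    page_content: str


def format_docs_sketch(docs):
    # Join chunk texts with blank lines so the prompt receives one context string.
    return "\n\n".join(doc.page_content for doc in docs)


context = format_docs_sketch([Doc("First chunk."), Doc("Second chunk.")])
print(context)
```

The actual helper in `config` may differ, for example by adding source metadata to each chunk.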

## Generate Hypothetical Answer

In [None]:
def generate_hypothetical(question):
    """Ask the LLM for a plausible answer to use as the retrieval query."""
    # The answer may be factually wrong; it only needs to resemble a real answer.
    template = """Write a short paragraph answering this question.
Even if unsure, write what a good answer might look like.

Question: {question}

Answer:"""

    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | model | StrOutputParser()
    return chain.invoke({"question": question})

## HyDE Retrieval

In [None]:
def hyde_retrieve(question, vectorstore, k=5):
    """Retrieve the top-k documents using a hypothetical answer as the query."""
    print(f"Question: {question}\n")

    hypothetical = generate_hypothetical(question)
    print(f"Hypothetical: {hypothetical[:150]}...\n")

    # Search with the hypothetical answer, not the original question.
    docs = vectorstore.similarity_search(hypothetical, k=k)

    print(f"Retrieved {len(docs)} documents")
    return docs, hypothetical
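
To check the retrieval flow without calling an LLM or a real vector store, the steps can be dry-run with stubs. This is a minimal sketch under stated assumptions: `ToyVectorStore` is a hypothetical in-memory class that ranks plain strings by word overlap (a real store returns document objects and compares embeddings), and `fake_generate_hypothetical` returns canned text in place of the model. The documents and question are made up for illustration.

```python
import re


def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyVectorStore:
    """Hypothetical in-memory stand-in exposing similarity_search(query, k)."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=5):
        q = tokens(query)

        def score(text):
            # Jaccard word overlap, used here in place of embedding similarity.
            t = tokens(text)
            return len(q & t) / len(q | t) if q | t else 0.0

        return sorted(self.texts, key=score, reverse=True)[:k]


def fake_generate_hypothetical(question):
    # Canned text in place of an LLM call (assumption for the dry run).
    return "Caching stores computed results in memory so repeated requests return faster."


store = ToyVectorStore([
    "Results are stored in memory by the cache so repeated requests return faster.",
    "Why is the sky blue? A question about light scattering.",
    "Why is my build slow? Check the compiler flags.",
])

question = "Why is my API slow on repeat calls?"
direct_top = store.similarity_search(question, k=1)[0]
hyde_top = store.similarity_search(fake_generate_hypothetical(question), k=1)[0]

print(f"Direct: {direct_top}")
print(f"HyDE:   {hyde_top}")
```

In this toy setup the question matches a document that merely looks like a question, while the hypothetical answer matches the document that reads like an answer.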

## Complete HyDE RAG

In [None]:
def hyde_rag(question, vectorstore, k=5):
    """Run HyDE retrieval, then answer the question from the retrieved context."""
    docs, hypothetical = hyde_retrieve(question, vectorstore, k)
    context = format_docs(docs)

    # The final prompt uses the original question; the hypothetical
    # answer is only used for retrieval.
    template = """Answer based on context.

Context: {context}

Question: {question}

Answer:"""

    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | model | StrOutputParser()
    answer = chain.invoke({"context": context, "question": question})

    print(f"\nAnswer: {answer}")
    return answer

## Test

In [None]:
vectorstore = setup_vectorstore()

test_questions = [
    "Where did Otabek study?",
    "What is DMS?"
]

for q in test_questions:
    print("="*60)
    hyde_rag(q, vectorstore)
    print()