# HyDE

HyDE (Hypothetical Document Embeddings) is a retrieval technique in which, instead of embedding the user's query directly, you first use an LLM to generate a hypothetical answer (a document) to the query, and then embed that hypothetical document to search your vector store.

### HyDE bridges the gap between user intent and relevant content, especially when:
1. Queries are short.
2. The wording of the query differs from the language used in the documents.
3. You want to retrieve based on the content of the answer, not the words of the question.
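To make the idea concrete, here is a toy, dependency-free sketch. It assumes a bag-of-words "embedding" (`embed`) and a hypothetical `fake_llm` that returns a canned, answer-shaped passage in place of a real LLM call; neither is part of LangChain. The question shares surface words ("was", "Steve", "Jobs") with the wrong passage, while the hypothetical answer shares vocabulary ("resigned", "boardroom") with the right one.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for the LLM: returns a canned, answer-shaped passage."""
    return "Jobs resigned from Apple in 1985 after a boardroom power struggle with John Sculley."


corpus = [
    "In 1985 Jobs resigned from Apple after a boardroom struggle with CEO John Sculley.",
    "Steve Jobs was born in 1955 and was adopted; Jobs co-founded Apple.",
]


def retrieve(text: str) -> str:
    """Return the corpus passage most similar to `text` (top-1 nearest neighbour)."""
    vec = embed(text)
    return max(corpus, key=lambda doc: cosine(vec, embed(doc)))


question = "When was Steve Jobs fired from Apple?"
direct = retrieve(question)            # embeds the question itself
hyde = retrieve(fake_llm(question))    # HyDE: embeds the hypothetical answer instead
print("direct:", direct)
print("hyde:  ", hyde)
```

With these toy inputs, `direct` picks the birth/adoption passage and `hyde` picks the 1985 resignation passage. Real dense embeddings are far less literal than word counts, but the intuition carries over: the generated text mainly needs to resemble relevant passages, so it can still help retrieval even if some of its details are wrong.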

In [2]:
from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama

In [5]:
chunk_size = 300
chunk_overlap = 100

loader = WikipediaLoader(query="Steve Jobs", load_max_docs=5)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
docs = text_splitter.split_documents(documents=documents)
docs

[Document(metadata={'title': 'Steve Jobs', 'summary': 'Steven Paul Jobs (February 24, 1955 – October 5, 2011) was an American businessman, inventor, and investor best known for co-founding the technology company Apple Inc. Jobs was also the founder of NeXT and chairman and majority shareholder of Pixar. He was a pioneer of the personal computer revolution of the 1970s and 1980s, along with his early business partner and fellow Apple co-founder Steve Wozniak.\nJobs was born in San Francisco in 1955 and adopted shortly afterwards. He attended Reed College in 1972 before withdrawing that same year. In 1974, he traveled through India, seeking enlightenment before later studying Zen Buddhism. He and Wozniak co-founded Apple in 1976 to further develop and sell Wozniak\'s Apple I personal computer. Together, the duo gained fame and wealth a year later with production and sale of the Apple II, one of the first highly successful mass-produced microcomputers. \nJobs saw the commercial potential 
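The `chunk_size`/`chunk_overlap` arithmetic can be sketched with a plain fixed-width sliding window. This is a simplification: `RecursiveCharacterTextSplitter` first tries to break on separators such as blank lines, newlines and spaces, so its real chunks vary in length. The helper `sliding_window_chunks` is hypothetical and only shows that consecutive windows start `chunk_size - chunk_overlap` characters apart.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical fixed-width splitter: windows start every chunk_size - chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


sample_text = "".join(str(i % 10) for i in range(700))  # 700 characters of dummy text
sample_chunks = sliding_window_chunks(sample_text, chunk_size=300, chunk_overlap=100)
print([len(c) for c in sample_chunks])  # → [300, 300, 300]
print(sample_chunks[0][-100:] == sample_chunks[1][:100])  # → True
```

With 300/100, windows start every 200 characters, so a 700-character text yields three chunks and each adjacent pair shares 100 characters.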

In [47]:
from langchain_chroma import Chroma

# Sentence-transformers model used to embed both the chunks and the queries
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

vectorestore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory="output/steve_jobs_hyde.db"
)

In [27]:
llm = ChatOllama(
    model="gemma2:9b-instruct-q4_K_M",
    num_ctx=32768,
    reasoning=False
)

In [48]:
base_retriever = vectorestore.as_retriever(search_kwargs={'k':5})

In [32]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

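# Prompt the LLM to write a hypothetical document (the HyDE document) for a query
def get_hyde_doc(query):
    """Return an LLM-generated hypothetical answer to `query`, used only for retrieval."""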
    template = """
        Imagine you are an expert writing a detailed explanation on the topic: '{query}'
        Your response should be comprehensive and include all key points that could be found in the top search results.
    """
    
    system_message_prompt = SystemMessagePromptTemplate.from_template(template = template)
    chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt])
    messages = chat_prompt.format_prompt(query=query).to_messages()
    print(messages)
    response = llm.invoke(messages)
    hypo_doc = response.content
    return hypo_doc

In [44]:
query = 'When was Steve Jobs fired from Apple?'
print(get_hyde_doc(query))

[SystemMessage(content="\n        Imagine you are an expert writing a detailed explanation on the topic: 'When was Steve Jobs fired from Apple?'\n        Your response should be comprehensive and include all key points that could be found in the top search results.\n    ", additional_kwargs={}, response_metadata={})]
##  The Ouster of Steve Jobs: A Defining Moment for Apple

Steve Jobs' departure from Apple, a company he co-founded, wasn't a simple resignation. It was a dramatic event that shook the tech world and ultimately paved the way for both his return and Apple's resurgence. 

**The Date:** **September 12, 1985**, marks the day Steve Jobs was officially ousted from Apple Computer Inc., the company he had led since its inception in 1976.

**The Context:** The seeds of Jobs' departure were sown long before this date. By the early 1980s, Apple faced several internal challenges:

* **Power Struggles:**  Jobs, known for his demanding personality and ambitious vision, clashed frequent

In [49]:
matched_doc = base_retriever.invoke(get_hyde_doc(query))
print(matched_doc)

[SystemMessage(content="\n        Imagine you are an expert writing a detailed explanation on the topic: 'When was Steve Jobs fired from Apple?'\n        Your response should be comprehensive and include all key points that could be found in the top search results.\n    ", additional_kwargs={}, response_metadata={})]


## LangChain's `HypotheticalDocumentEmbedder`

In [50]:
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
# `llm` (ChatOllama) and `embeddings` (HuggingFaceEmbeddings) are reused from the cells above

## Step 1: Load and Split
loader = TextLoader("langchain_crewai_dataset.txt")
text = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(text)

In [52]:
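# Set up the HyDE embedder: it wraps the base embeddings and asks the LLM for a
# hypothetical passage whenever a query is embedded. "web_search" selects one of
# LangChain's built-in generation prompts.
hyde_embedding_function = HypotheticalDocumentEmbedder.from_llm(
    llm=llm,
    base_embeddings=embeddings,
    prompt_key="web_search"
)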

In [58]:
vectorestore = Chroma.from_documents(
    documents=chunks,
    embedding=hyde_embedding_function,
    persist_directory="output/langchain.db"
)
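The cell above passes the HyDE embedder straight into `Chroma.from_documents`. To my understanding of LangChain's implementation, this does not call the LLM once per chunk: `embed_documents` delegates to the base embeddings, and only `embed_query` generates hypothetical text, embeds it, and averages the resulting vectors. The class below is a hypothetical, dependency-free imitation of that split (`ToyHydeEmbedder`, `fake_generate` and `toy_embed` are illustrative names), not LangChain's code.

```python
from typing import Callable, List

Vector = List[float]


class ToyHydeEmbedder:
    """Hypothetical imitation of a HyDE embedder's query/document asymmetry."""

    def __init__(self, generate: Callable[[str], List[str]], base_embed: Callable[[str], Vector]):
        self.generate = generate      # query -> one or more hypothetical documents
        self.base_embed = base_embed  # text -> vector (the ordinary embedding model)

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        # Indexing path: corpus chunks go straight to the base model, no LLM call.
        return [self.base_embed(t) for t in texts]

    def embed_query(self, query: str) -> Vector:
        # Query path: generate hypothetical docs, embed them, average element-wise.
        vectors = self.embed_documents(self.generate(query))
        return [sum(column) / len(vectors) for column in zip(*vectors)]


generator_calls = []


def fake_generate(query: str) -> List[str]:
    generator_calls.append(query)   # record each (simulated) LLM call
    return ["aa", "aaaa"]           # two canned hypothetical documents


def toy_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count("a"))]


embedder = ToyHydeEmbedder(fake_generate, toy_embed)
doc_vectors = embedder.embed_documents(["abc"])   # no generator call here
query_vector = embedder.embed_query("memory?")    # exactly one generator call
print(doc_vectors, query_vector, generator_calls)  # → [[3.0, 1.0]] [3.0, 3.0] ['memory?']
```

Averaging matters when the generator returns several hypothetical documents; with a single document the mean is just that document's vector.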

In [67]:
rag_prompt = PromptTemplate.from_template("""
Use the context below to answer the question.

Context:
{context}

Question: {input}
""")

rag_chain = create_stuff_documents_chain(llm=llm, prompt=rag_prompt)
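`create_stuff_documents_chain` "stuffs" the retrieved documents into the `{context}` variable. As far as I know, by default it renders each document's `page_content` and joins them with a blank line. Below is a rough plain-Python sketch of the resulting prompt string, using the same template text as `rag_prompt`, a hypothetical `stuff_documents` helper, and placeholder chunks. Note that HyDE only changes retrieval: the original question still fills `{input}`.

```python
def stuff_documents(template: str, docs: list[str], question: str, separator: str = "\n\n") -> str:
    """Hypothetical helper: join retrieved texts into {context} and fill the template."""
    context = separator.join(docs)
    return template.format(context=context, input=question)


rag_template = """
Use the context below to answer the question.

Context:
{context}

Question: {input}
"""

example_docs = ["First retrieved chunk.", "Second retrieved chunk."]  # placeholder text
prompt_text = stuff_documents(rag_template, example_docs, "What memory modules does LangChain provide?")
print(prompt_text)
```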

In [69]:
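# Final RAG pipeline: similarity_search embeds `query` through the HyDE embedder,
# then the LLM answers from the retrieved chunks
def hyde_rag_pipeline(query):
    matched_docs = vectorestore.similarity_search(query, k=4)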
    response = rag_chain.invoke({
        "input": query,
        "context": matched_docs
    })
    
    return response

In [70]:
from IPython.display import Markdown, display

query = "What memory modules does LangChain provide?"
answer = hyde_rag_pipeline(query)
print(answer)

According to the context, LangChain provides these memory modules:

* **ConversationBufferMemory**
* **ConversationSummaryMemory** 


Let me know if you have any other questions! 😊



### Custom prompt

In [None]:
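from langchain_core.prompts import PromptTemplate

# A custom generation prompt; its single input variable receives the user's query
custom = PromptTemplate.from_template(
    "Generate a concise hypothetical answer for this topic: {query}"
)

hyde_embedding_function = HypotheticalDocumentEmbedder.from_llm(
    llm=llm,
    base_embeddings=embeddings,
    custom_prompt=custom
)

# Reopen the persisted store with the new embedder so that queries use the custom prompt
vectorestore = Chroma(
    persist_directory="output/langchain.db",
    embedding_function=hyde_embedding_function
)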
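`from_llm` needs to know which generation prompt to use. To my understanding, a `custom_prompt` takes precedence; otherwise `prompt_key` must name one of the built-in templates (such as `"web_search"`), and passing neither raises a `ValueError`. The sketch below imitates that selection logic; `PROMPTS` and `select_prompt` are hypothetical, and the template wording is illustrative rather than LangChain's built-in text.

```python
from typing import Optional

# Hypothetical templates; the wording is illustrative, not LangChain's built-in text.
PROMPTS = {
    "web_search": "Please write a passage to answer the question.\nQuestion: {query}\nPassage:",
    "sci_fact": "Please write a scientific passage that supports or refutes the claim.\nClaim: {query}\nPassage:",
}


def select_prompt(prompt_key: Optional[str] = None, custom_prompt: Optional[str] = None) -> str:
    """Pick the generation template: custom prompt first, then a known key, else fail."""
    if custom_prompt is not None:
        return custom_prompt
    if prompt_key is not None and prompt_key in PROMPTS:
        return PROMPTS[prompt_key]
    raise ValueError(f"Specify custom_prompt or one of the prompt keys: {list(PROMPTS)}")


chosen_template = select_prompt(custom_prompt="Generate a concise hypothetical answer for this topic: {query}")
print(chosen_template.format(query="What memory modules does LangChain provide?"))
```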