# Query Translation: HyDE (Hypothetical Document Embeddings)

![hyDE](../images/images-hyde.png)

**HyDE** (Hypothetical Document Embeddings) is a technique that improves retrieval by first having an LLM generate a hypothetical document that answers the user query, and then using the embedding of that document, rather than the embedding of the query itself, to find real documents in the knowledge base. It helps when retrieving directly with the query fails or returns irrelevant results.

### The Problem HyDE Solves
Sometimes, user queries:
- Are vague or lack enough information for effective retrieval.
- Don’t have direct matches in the knowledge base, leading to poor or no results.

**HyDE** addresses this by creating a hypothetical document (an imagined answer or context) that guides the retrieval system toward the most relevant documents.

### How HyDE Works
1. **Generate a Hypothetical Document**:
   - Use an LLM to generate a plausible response or context based on the user query.
   - This document doesn't need to be factually accurate; it only needs to resemble the kind of passage that would answer the query.
   - Example:
     - Query: "How does token expiry work in OAuth 2.0?"
     - Hypothetical Document:
       - "Token expiry in OAuth 2.0 ensures that access tokens are valid only for a short duration, reducing the risk of unauthorized access."

2. **Create Embeddings**:
   - Convert the hypothetical document into a vector representation (embedding) using a dense embedding model.
   - This vector captures the semantic meaning of the hypothetical document.

3. **Retrieve Real Documents**:
   - Use the embedding of the hypothetical document to search the knowledge base for semantically similar real documents.
   - Example:
     - Retrieved documents might include OAuth 2.0 specifications, API security guidelines, or examples of token expiry implementation.

4. **Generate the Final Response**:
   - Combine the retrieved real documents with the user query and pass them to the LLM to generate a final response grounded in those documents.
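
The four steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the notebook's implementation: `fake_llm`, `embed`, `cosine`, `hyde_retrieve`, and `KNOWLEDGE_BASE` are hypothetical names, the "embedding" is a simple bag-of-words count standing in for a dense embedding model, and the LLM call is replaced by a canned passage.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a dense model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fake_llm(question: str) -> str:
    """Hypothetical stand-in for step 1; a real system would call an LLM."""
    return ("Token expiry in OAuth 2.0 ensures that access tokens are valid "
            "only for a short duration, reducing the risk of unauthorized access.")

# Made-up knowledge base for illustration only.
KNOWLEDGE_BASE = [
    "Access tokens are valid for a short duration; refresh tokens renew "
    "access once a token has expired.",
    "Bread dough needs time to rise before baking in a hot oven.",
    "Database indexes speed up reads at the cost of slower writes.",
]

def hyde_retrieve(question: str, k: int = 1) -> list[str]:
    hypothetical = fake_llm(question)      # step 1: hypothetical document
    query_vec = embed(hypothetical)        # step 2: embed it
    ranked = sorted(                       # step 3: rank real documents
        KNOWLEDGE_BASE,
        key=lambda doc: cosine(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]

question = "How does token expiry work in OAuth 2.0?"
top_docs = hyde_retrieve(question)
# Step 4 would send this prompt to the LLM for the final answer.
final_prompt = f"Context:\n{top_docs[0]}\n\nQuestion: {question}"
print(final_prompt)
```

Note that the search vector comes from the hypothetical passage, while the original question is only reused in the final prompt.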

### Why HyDE Works
- **Fills Knowledge Gaps**: By hypothesizing an answer, HyDE helps retrieve relevant information even when the query itself lacks enough detail.
- **Improves Retrieval Quality**: The hypothetical document is written like an answer passage, so its embedding tends to sit closer to relevant documents in the knowledge base than the embedding of a short question does.
- **Handles Vague Queries**: Works well for open-ended or poorly defined questions.

### Simple Example
#### Query:
*"What is the role of token expiry in OAuth 2.0 security?"*

#### Without HyDE:
- The short question is embedded and matched directly. Questions and answer passages are phrased differently, so relevant documents can rank poorly, especially when they never use the query's wording (e.g., "token expiry").

#### With HyDE:
1. **Generate Hypothetical Document**:
   - "Token expiry in OAuth 2.0 ensures short-lived access, preventing misuse and enabling safer API interactions."
2. **Create Embedding**:
   - Generate a semantic embedding for the hypothetical document.
3. **Retrieve Real Documents**:
   - Use the embedding to find documents about OAuth 2.0 and token-based security.
   - Retrieved documents might include detailed OAuth workflows or articles on API security.
4. **Generate the Final Response**:
   - Combine the retrieved documents and user query to produce a response:
     - "Token expiry in OAuth 2.0 limits the duration an access token is valid, reducing the risk of token misuse. Refresh tokens are used to renew access without compromising security."
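
The intuition behind this example can be checked with a crude proxy. The sketch below compares word overlap (Jaccard similarity) instead of dense embeddings, so it only hints at the effect; `tokens`, `jaccard`, and the `real_doc` text are hypothetical and made up for illustration.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by total distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

query = "What is the role of token expiry in OAuth 2.0 security?"
hypothetical = ("Token expiry in OAuth 2.0 ensures short-lived access, preventing "
                "misuse and enabling safer API interactions.")
# Made-up passage standing in for a document in the knowledge base.
real_doc = ("Short-lived access tokens limit misuse: once a token has expired, "
            "the client must use a refresh token to regain API access.")

query_score = jaccard(query, real_doc)
hyde_score = jaccard(hypothetical, real_doc)
print(f"query vs. document:        {query_score:.2f}")
print(f"hypothetical vs. document: {hyde_score:.2f}")
```

In this toy setup the hypothetical passage shares more vocabulary with the stored document than the question does. A dense embedding model compares meaning rather than raw tokens, but the same answer-like phrasing is what HyDE relies on.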

### Key Benefits of HyDE
- **Resilient to Query Limitations**: Can surface relevant results even if the query lacks specificity or direct matches.
- **Semantic Accuracy**: Retrieval compares the embedding of the hypothetical document with the document embeddings, so matching is based on meaning rather than exact keywords.
- **Flexible Application**: Can handle complex, open-ended, or poorly phrased queries.

In short, **HyDE** is like imagining a possible answer to your question first, and then using that imagined answer to find the best real-world information to back it up!

![hyde](../images/hyde.png)

arXiv paper:

- [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/pdf/2212.10496)

## Setup

In [1]:
%run "../Z - Common/setup.ipynb"

Stored 'enable_langsmith' (bool)


In [2]:
docs = load_sample_data()
split_docs = split_sample_data(docs)
retriever = seed_sample_data(split_docs)

Let's start by building the prompts and chain to retrieve the documents:


In [3]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Prompt that asks the LLM for a hypothetical passage answering the question.
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""

prompt_hyde = ChatPromptTemplate.from_template(template)

# `llm` is assumed to be defined by the setup notebook run above.
chain_generate_docs_for_retrieval = (
    prompt_hyde | llm | StrOutputParser()
)
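
To see the text the LLM receives, the template can be rendered by hand. This is a minimal sketch that assumes the template's `{question}` placeholder behaves like standard Python format fields; `hyde_template`, `example_question`, and `rendered` are illustrative names so nothing in the notebook is overwritten.

```python
hyde_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""

example_question = "What are the main components of an LLM-powered autonomous agent system?"

# Fill the placeholder the same way the prompt template is expected to.
rendered = hyde_template.format(question=example_question)
print(rendered)
```

Ending the prompt with `Passage:` nudges the model to continue with a document-style passage instead of a conversational reply.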

In [4]:
question = "What are the main components of an LLM-powered autonomous agent system?"
chain_generate_docs_for_retrieval.invoke({"question": question})

"Here's a scientific paper passage addressing the main components of an LLM-powered autonomous agent system:\n\nThe Architecture of LLM-Powered Autonomous Agent Systems: A Component Analysis\n\nThe fundamental architecture of Large Language Model (LLM)-powered autonomous agent systems typically comprises several essential components that work in concert to enable intelligent, goal-directed behavior. These core components can be categorized into five primary subsystems:\n\n1. Language Model Core\nThe foundation of the system is the Large Language Model itself, which serves as the cognitive engine. This component handles natural language understanding, reasoning, and generation capabilities. Modern implementations typically utilize transformer-based architectures with parameters ranging from billions to trillions, enabling complex reasoning and decision-making processes.\n\n2. Memory Management System\nThe memory architecture consists of multiple layers:\n- Short-term working memory for 

Next, build the retriever chain:

In [5]:
# The generated passage (a string), not the original question, becomes the retriever's query.
chain_retriever = chain_generate_docs_for_retrieval | retriever

retrieved_docs = chain_retriever.invoke({"question": question})
retrieved_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi

Finally, build the RAG chain:

In [6]:
from langchain import hub

# Pull a shared RAG prompt; it is invoked below with `context` and `question` inputs.
prompt = hub.pull("rlm/rag-prompt")

chain_rag = (
    prompt
    | llm
    | StrOutputParser()
)



In [7]:
result = chain_rag.invoke({"context": retrieved_docs, "question": question})
result

'Based on the context, the main components of an LLM-powered autonomous agent system include the LLM itself functioning as the brain, a planning component (which handles subgoal decomposition and task breakdown), and a memory system. The planning component also includes reflection and refinement capabilities, allowing the agent to learn from past actions and improve future performance.'
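
In the cell above, `context` is the raw list of `Document` objects, so the prompt receives their string representation, metadata included. A common alternative is to join only the page text first. The sketch below assumes each retrieved item exposes a `page_content` attribute; `Doc` is a hypothetical stand-in for that class and `format_docs` is an illustrative helper, not part of the notebook's setup.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs) -> str:
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

# Made-up documents for illustration only.
sample_docs = [
    Doc("LLM functions as the agent's brain.", {"source": "example"}),
    Doc("Planning breaks large tasks into subgoals.", {"source": "example"}),
]

context_text = format_docs(sample_docs)
print(context_text)
```

With such a helper, the call would become `chain_rag.invoke({"context": format_docs(retrieved_docs), "question": question})`, which keeps metadata and object syntax out of the prompt.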

Save the results so we can compare later.

In [8]:
write_results("hyde.txt", result)