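HyDE stands for Hypothetical Document Embeddings.

It is a retrieval technique in which an LLM first generates a hypothetical answer (a synthetic document) for the user's query. That synthetic text is then embedded and used for the vector search instead of the raw query, because an answer-shaped passage usually sits closer in embedding space to the real documents than a short question does.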

How does it work?

User query →
LLM generates a hypothetical answer →
Embed the hypothetical answer →
Search the vector DB using that embedding →
Retrieve real documents that resemble the hypothetical answer
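
The flow above can be sketched without any external services. This is only a toy illustration: `embed` is a bag-of-words stand-in for a real embedding model, `fake_llm` is a hypothetical placeholder that returns a canned paragraph instead of calling an LLM, and the two-sentence `corpus` is invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_llm(query: str) -> str:
    """Hypothetical stand-in for the LLM call; returns a canned guess."""
    return "The chief executive officer leads the company and its executive team."

# Invented toy corpus, not taken from the knowledge base used in this notebook.
corpus = [
    "Our chief executive officer leads the executive team and sets company strategy.",
    "The cafeteria serves lunch from noon until two.",
]

def top_match(vector: Counter) -> str:
    """Return the corpus document most similar to the given vector."""
    return max(corpus, key=lambda doc: cosine(vector, embed(doc)))

query = "who is the ceo?"
print("raw query ->", top_match(embed(query)))            # embeds the question itself
print("HyDE      ->", top_match(embed(fake_llm(query))))  # embeds the hypothetical answer
```

In this toy corpus the raw query shares only the word "the" with either document, so its ranking is essentially arbitrary, while the hypothetical answer shares most of its vocabulary with the relevant document. Real embedding models are far less literal than word counts, so the gap is usually smaller in practice.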

When to use it?
-> When user queries are often short, vague, or worded differently from the real documents in your vector DB

-> When retrieval recall is low

-> In multi-hop reasoning agents

-> When the domain is dense and technical

-> When response accuracy matters enough to justify an extra LLM call per query


In [40]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_groq import ChatGroq
import gradio as gr

In [41]:
root_path =r"C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base"

In [42]:
# Load every Markdown file under root_path as a UTF-8 text document
loader = DirectoryLoader(path=root_path,
                        glob="**/*.md",
                        loader_cls=TextLoader,
                        loader_kwargs={"encoding": "utf-8"})

try:
    docs = loader.load()
    print(f"Documents loaded with {len(docs)} from {root_path}")

except Exception as e:
    print(f"Error occurred: {e}")

Documents loaded with 76 from C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base


In [43]:
# Chunking: 1000-character chunks, with 200 characters shared between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents=docs)
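
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch using plain fixed-size character windows. It is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break at separators such as paragraphs, lines and spaces, so its chunks will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
# The last 4 characters of one chunk are the first 4 of the next one.
print(pieces[0][-4:], pieces[1][:4])
```

The overlap gives a sentence that straddles a chunk boundary a chance to appear intact in at least one chunk, at the cost of storing some text twice.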

In [44]:
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    collection_name="my_rag_collection"
)

# Retriever: MMR search returning 20 chunks, balancing relevance and diversity
retreiver = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 20, "lambda_mult": 0.5}
)
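
The `lambda_mult` setting controls the trade-off in maximal marginal relevance (MMR). The sketch below shows the commonly used greedy MMR rule on toy vectors; it is an illustration under assumptions, not LangChain's own code (which, for example, first fetches a larger candidate pool). The names `mmr_select`, `query_vec` and `doc_vecs` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

query_vec = [1.0, 1.0, 0.0]
doc_vecs = [
    [1.0, 0.9, 0.0],   # 0: very relevant
    [1.0, 0.9, 0.05],  # 1: near-duplicate of document 0
    [0.0, 1.0, 0.0],   # 2: less relevant, but different from document 0
]
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # balanced
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=1.0` the selection is driven by relevance alone, so the near-duplicate is picked second; lowering it penalises candidates that resemble what has already been chosen.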

In [45]:
# Import Model 
import os
from dotenv import load_dotenv

load_dotenv(override=True)
groq_api_key = os.getenv('GROQ_API_KEY')

llm = ChatGroq(model="llama-3.1-8b-instant")

In [46]:
# llm.invoke('what is the time difference between u.k and melbourne and what is the time right now ')

In [75]:
MIN_QUERY_WORDS = 4
RETRIEVAL_SCORE_THRESHOLD = 0.55


In [76]:
def is_short_query(query: str) -> bool:
    # True when the query has fewer than MIN_QUERY_WORDS whitespace-separated words
    return len(query.split()) < MIN_QUERY_WORDS

In [78]:
docs = retreiver.invoke('who is the ceo of this company?')
print(docs)
print(type(docs[0]))
print(docs[0].metadata)


[Document(id='fbe68176-3ab5-4725-8e5d-06b3ccdf4a08', metadata={'source': 'C:\\Users\\Mohamed Arshad\\Downloads\\My_RAG_Lab\\llm_engineering\\RAG\\knowledge-base\\employees\\James Wilson.md'}, page_content='# HR Record\n\n# James Wilson\n\n## Summary\n- **Date of Birth:** April 5, 1978\n- **Job Title:** Chief Technology Officer (CTO)\n- **Location:** San Francisco, California\n- **Current Salary:** $285,000\n\n## Insurellm Career Progression\n- **January 2017 - Present:** Chief Technology Officer\n  - Reports directly to CEO, member of executive leadership team\n  - Oversees all technology strategy and engineering operations\n  - Manages 85-person engineering, product, and data organization\n  - Led company through major platform modernization and cloud migration\n  - Drove adoption of AI/ML capabilities across product suite\n\n- **March 2012 - December 2016:** VP of Engineering at TechScale Inc.\n  - Built and scaled engineering organization from 15 to 60 engineers\n  - Led product dev

In [77]:
is_low_score('what is the name of this company?')

In [69]:


# ------------------------------------
# 2. Low Retrieval Score Check
# ------------------------------------
def is_low_score(query: str) -> bool:
    # Documents returned by retreiver.invoke() carry no "score" in their
    # metadata, so ask the vector store for the best match and its score.
    results = vector_store.similarity_search_with_score(query, k=1)

    if not results:
        return True

    # Chroma reports a distance here (lower means closer), so a best distance
    # above the threshold counts as weak retrieval. The right threshold depends
    # on the collection's distance metric and should be tuned on real queries.
    _, best_distance = results[0]
    return best_distance > RETRIEVAL_SCORE_THRESHOLD

# ------------------------------------
# HyDE Generate & Retrieve
# ------------------------------------
def hyde_retrieve(query):
    synthetic_doc = llm.invoke(
        f"Generate a factual paragraph that could answer the question:\n\n{query}"
    )

    synthetic_text = (
        synthetic_doc.content
        if hasattr(synthetic_doc, "content")
        else str(synthetic_doc)
    )

    synthetic_embeddings = embedding_model.embed_query(synthetic_text)

    docs = vector_store.max_marginal_relevance_search_by_vector(
        synthetic_embeddings, k=5
    )
    return docs

# ------------------------------------
# MAIN CHAT LOGIC
# ------------------------------------
def chat(query, history):

    # Use HyDE for short queries or when normal retrieval looks weak;
    # otherwise use the plain MMR retriever.
    if is_short_query(query) or is_low_score(query):
        docs = hyde_retrieve(query)
        hyde_used = True
    else:
        docs = retreiver.invoke(query)
        hyde_used = False

    # Number of documents retrieved
    num_docs = len(docs)

    # Prepare context
    context = "\n\n".join(d.page_content for d in docs)

    # Ask LLM
    final_answer = llm.invoke(
        f"Use the context below to answer the question. If not found, say 'Not found'.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

    # Return simple info + answer
    info_text = f"Documents retrieved: {num_docs}\nHyDE used: {hyde_used}\n"
    return info_text + "\nAnswer:\n" + final_answer.content
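
The routing decision inside `chat` can be isolated as a small pure function, which makes it easy to check without a vector store or an LLM. This is a sketch under assumptions: `choose_path` is a hypothetical helper, and `best_distance` is assumed to be a distance (lower means closer), matching the way the threshold is compared above.

```python
from typing import Optional

MIN_QUERY_WORDS = 4
RETRIEVAL_SCORE_THRESHOLD = 0.55

def choose_path(query: str, best_distance: Optional[float]) -> str:
    """Return 'hyde' or 'normal' for a query and its best retrieval distance."""
    if len(query.split()) < MIN_QUERY_WORDS:
        return "hyde"    # short query: too little text to embed reliably
    if best_distance is None or best_distance > RETRIEVAL_SCORE_THRESHOLD:
        return "hyde"    # nothing retrieved, or the best match is too far away
    return "normal"      # long enough query with a close match

print(choose_path("ceo name?", 0.10))
print(choose_path("who is the ceo of this company?", 0.30))
print(choose_path("who is the ceo of this company?", 0.90))
```

Keeping the decision separate from retrieval also makes it simple to log which path was taken and to tune the two constants on real queries.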


In [70]:
gr.ChatInterface(chat).launch()




* Running on local URL:  http://127.0.0.1:7872
* To create a public link, set `share=True` in `launch()`.


