HyDE = Hypothetical Document Embeddings.

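It is a retrieval technique in which the LLM first generates a hypothetical answer (or document) for the user's query, and that synthetic text is embedded and used for the search instead of the raw query. Because the hypothetical answer is written in the style of a document rather than a question, its embedding tends to sit closer to the relevant passages.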

How does it work?

User query →
LLM generates a hypothetical answer →
Embed the hypothetical answer →
Search vector DB using that embedding →
Retrieve chunks that tend to match the intent of the query more closely
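
The flow above can be sketched without any external service. This is a toy illustration only: `embed` is a bag-of-words stand-in for a real embedding model, `fake_llm` is a hypothetical placeholder for the LLM call, and the three documents are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_llm(query):
    """Hypothetical stand-in for the LLM that drafts an answer to the query."""
    return "Employees receive 25 days of annual leave per year plus public holidays"

def search(query_vec, docs, k=1):
    """Return the k documents most similar to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)[:k]

def hyde_search(query, docs, generate=fake_llm, k=1):
    """HyDE: embed the generated hypothetical answer instead of the raw query."""
    hypothetical = generate(query)
    return search(embed(hypothetical), docs, k)

docs = [
    "Employees receive 25 days of annual leave per year.",
    "The office is closed on public holidays.",
    "How to request time off in the HR portal.",
]
query = "how much time off do I get"

print("raw query :", search(embed(query), docs)[0])
print("with HyDE :", hyde_search(query, docs)[0])
```

In this toy setup the raw query shares words only with the "how to request time off" document, while the hypothetical answer shares most of its words with the document that actually states the leave allowance.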

When to use it?
-> User queries are often short, vague, or phrased differently from the documents in your vector DB

-> Retrieval recall is low

-> Multi-hop reasoning agents

-> The domain is dense and technical

-> Response accuracy matters enough to justify one extra LLM call per query
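
One way to check the "retrieval recall is low" condition is to measure recall@k on a few labelled queries, with and without HyDE. The helper below is a minimal sketch; the chunk IDs and the two ranked lists are made-up placeholders, not results from this notebook.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)

# Hypothetical data for illustration only: ranked chunk IDs from two retrieval runs
relevant = {"c1", "c3"}
baseline_ranking = ["c7", "c2", "c9", "c4"]
hyde_ranking = ["c1", "c3", "c7", "c2"]

print(recall_at_k(baseline_ranking, relevant, k=3))
print(recall_at_k(hyde_ranking, relevant, k=3))
```

If recall@k is already high with the raw query, the extra LLM call HyDE needs may not be worth it.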


In [19]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_groq import ChatGroq
import gradio as gr

In [2]:
root_path = r"C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base"

In [4]:
# Load every Markdown file under root_path as UTF-8 text
loader = DirectoryLoader(
    path=root_path,
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
)

try:
    docs = loader.load()
    print(f"Documents loaded with {len(docs)} from {root_path}")

except Exception as e:
    print(f"Error occurred: {e}")

Documents loaded with 76 from C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base


In [20]:
# Chunking: split documents into chunks of up to 1000 characters with a 200-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents=docs)
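
To see what `chunk_size=1000` and `chunk_overlap=200` mean, here is a simplified fixed-window sketch. It is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs and newlines, so its chunks are usually shorter and not cut at exact offsets. `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far the window advances each time
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample = "".join(str(i % 10) for i in range(2600))  # 2600 characters of dummy text
pieces = sliding_chunks(sample, chunk_size=1000, chunk_overlap=200)

print(len(pieces))                           # number of chunks
print(pieces[0][-200:] == pieces[1][:200])   # the overlap is shared between neighbours
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.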

In [18]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    collection_name="my_rag_collection"
)

# Retriever: MMR (maximal marginal relevance) trades relevance against diversity among the k results
retreiver = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.5}
)
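
The effect of `lambda_mult` in an MMR search can be illustrated with a small pure-Python sketch. This is a simplified version of the usual MMR scoring rule, written under the assumption that it mirrors what the vector store does; the vectors are made up, and `mmr` is a hypothetical helper, not a library function.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest document already selected
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.10], [1.0, 0.12], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates

print(mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5))
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` only relevance counts, so the two near-duplicates are returned; with `0.5` the second pick is the less similar but more diverse document.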

In [16]:
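# Load environment variables and create the chat model
import os
from dotenv import load_dotenv

load_dotenv(override=True)
groq_api_key = os.getenv("GROQ_API_KEY")

# ChatGroq reads GROQ_API_KEY from the environment when no key is passed explicitly
llm = ChatGroq(model="llama-3.1-8b-instant")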

In [None]:
# llm.invoke('what is the time difference between u.k and melbourne and what is the time right now ')

AIMessage(content="The time difference between the UK and Melbourne depends on the time of year due to daylight saving time (DST) in both countries. \n\nIn the UK:\n- Standard time: GMT (Greenwich Mean Time)\n- DST: BST (British Summer Time) = GMT+1\n\nIn Melbourne:\n- Standard time: AEDT (Australian Eastern Daylight Time) - UTC+11\n- DST: AEST (Australian Eastern Standard Time) - UTC+10\n\nAssuming it's not DST time in Melbourne, the UK is 9 hours behind Melbourne. However, if Melbourne is on DST and the UK is not, the time difference would be 10 hours. \n\nFor current time, I can suggest a website or give you general knowledge:\n\nGeneral Time in Melbourne (AEDT):\n\n- Melbourne is 11:00 pm - 2:00 am UTC (depending on DST)\n- If you want to find the exact time, I recommend checking a current time website like worldtimebuddy.com.\n\nAs I'm a large language model, my access to real-time data is limited, but you can use online resources to get the most up-to-date time.\n\nCurrent Time i

In [None]:
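def chat(message, history):
    # HyDE: draft a hypothetical answer, retrieve with it, then answer from the retrieved chunks
    hypothetical = llm.invoke(f"Write a short passage that answers: {message}").content
    context = "\n\n".join(doc.page_content for doc in retreiver.invoke(hypothetical))
    return llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {message}").content

gr.ChatInterface(chat).launch()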
