__Methods for Query Translation__

1. Multi Query (Query Translation)
2. RAG-Fusion (Query Translation)
3. Decomposition
4. Step Back Prompting
5. HyDE (Hypothetical Document Embeddings)


Reference notebook: https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb

In [None]:
import bs4 
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.load import dumps, loads
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings, ChatHuggingFace

__Reading the Document and Indexing It__

In [4]:
# Reading the document from the web

loader = WebBaseLoader(
    web_path=('https://lilianweng.github.io/posts/2024-07-07-hallucination/'),
    # Keep only the post title, header, and body when parsing the HTML
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=('post-content', 'post-title', 'post-header'))),
)
blog_doc = loader.load()


# Splitting the document into chunks
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size = 1500,
    chunk_overlap = 150)

splits = text_splitter.split_documents(blog_doc)


# Indexing the chunks into a vector database
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())


retriever = vectorstore.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x16690cb30>, search_kwargs={})
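
The `chunk_size` and `chunk_overlap` settings above are measured in tokens because the splitter is built with `from_tiktoken_encoder`. The sketch below illustrates only the overlap idea with a plain sliding window over a token list. It is a simplified illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which splits on separators recursively), and `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share `chunk_overlap` tokens.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy example: 10 "tokens", windows of 4 sharing 1 token with the next window
chunks = sliding_chunks(list(range(10)), chunk_size=4, chunk_overlap=1)
print(chunks)
```

With the notebook's settings (1500 and 150), each chunk would share roughly its last 150 tokens with the start of the next one under this simplified model.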

### Multi Query

In [5]:
# Multi Query - Prompt 

template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector database. 
By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perpesctive = ChatPromptTemplate.from_template(template)
prompt_perpesctive

ChatPromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='You are an AI language model assistant. Your task is to generate five \ndifferent versions of the given user question to retrieve relevant documents from a vector database. \nBy generating multiple perspectives on the user question, your goal is to help\nthe user overcome some of the limitations of the distance-based similarity search. \nProvide these alternative questions separated by newlines. Original question: {question}'), additional_kwargs={})])

In [6]:
chat = ChatHuggingFace(
    llm=HuggingFaceEndpoint(
        repo_id="deepseek-ai/DeepSeek-V3.2",
        task="conversational",
        temperature=0.7,
        max_new_tokens=256,
        streaming=False,
    )
)

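# Split the model output into one query per line, dropping blank lines
genreate_quries = (
    prompt_perpesctive
    | chat
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split('\n') if q.strip()])
)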

genreate_quries

ChatPromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='You are an AI language model assistant. Your task is to generate five \ndifferent versions of the given user question to retrieve relevant documents from a vector database. \nBy generating multiple perspectives on the user question, your goal is to help\nthe user overcome some of the limitations of the distance-based similarity search. \nProvide these alternative questions separated by newlines. Original question: {question}'), additional_kwargs={})])
| ChatHuggingFace(llm=HuggingFaceEndpoint(repo_id='deepseek-ai/DeepSeek-V3.2', max_new_tokens=256, temperature=0.7, stop_sequences=[], server_kwargs={}, model_kwargs={}, model='deepseek-ai/DeepSeek-V3.2', client=<InferenceClient(model='deepseek-ai/DeepSeek-V3.2', timeout=120)>, async_client=<InferenceClient(model='deep
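
Model output often carries blank lines or list markers such as `1.` or `-` in front of each generated question, and these would otherwise be sent to the retriever as part of the query. A slightly more defensive parser might look like the sketch below. `parse_queries` is a hypothetical helper and the sample text is made up for illustration; it is not real model output.

```python
import re


def parse_queries(llm_output: str) -> list[str]:
    """Turn raw model text into a clean list of queries (hypothetical helper)."""
    queries = []
    for line in llm_output.splitlines():
        # Remove leading list markers such as "1.", "2)", "-", "*" or "•"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries


# Made-up sample output, for illustration only
sample = (
    "1. What causes LLM hallucination?\n"
    "\n"
    "2. Why do language models make things up?\n"
    "- How does training data lead to hallucination?\n"
)
queries = parse_queries(sample)
print(queries)
```

A function like this could stand in for the final lambda in the chain, since LangChain coerces plain callables into runnables when they are piped.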

In [10]:
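# Union of retrieved docs

def get_unique_union(documents: list[list]):
    """Flatten the per-query result lists and drop duplicate documents."""
    # Serialize each Document to a string so it can be hashed
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # set() removes duplicates but does not preserve retrieval order
    unique_docs = list(set(flattened_docs))
    # Deserialize the strings back into Document objects
    return [loads(doc) for doc in unique_docs]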
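
Because `set()` does not keep insertion order, the union above can return documents in a different order from run to run. If the order in which documents were first retrieved matters, an order-preserving variant is one option. The sketch below shows the idea on plain strings standing in for serialized documents; `unique_in_order` is a hypothetical helper, not part of the original notebook.

```python
def unique_in_order(nested):
    """Flatten a list of lists, keeping only the first occurrence of each item."""
    seen = set()
    unique = []
    for sublist in nested:
        for item in sublist:
            if item not in seen:
                seen.add(item)
                unique.append(item)
    return unique


# Strings stand in for the serialized documents produced by dumps()
results_per_query = [["a", "b"], ["b", "c"], ["a", "d"]]
merged = unique_in_order(results_per_query)
print(merged)  # ['a', 'b', 'c', 'd']
```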

In [20]:
question = "why does llm hallucinate ?"
retrieval_chain = genreate_quries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})

docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-07-07-hallucination/'}, page_content='Non-context LLM: Prompt LLM directly with <atomic-fact> True or False? without additional context.\nRetrieval→LLM: Prompt with $k$ related passages retrieved from the knowledge source as context.\nNonparametric probability (NP)): Compute the average likelihood of tokens in the atomic fact by a masked LM and use that to make a prediction.\nRetrieval→LLM + NP: Ensemble of two methods.\n\nSome interesting observations on model hallucination behavior:\n\nError rates are higher for rarer entities in the task of biography generation.\nError rates are higher for facts mentioned later in the generation.\nUsing retrieval to ground the model generation significantly helps reduce hallucination.\n\nWei et al. (2024) proposed an evaluation method for checking long-form factuality in LLMs, named SAFE (Search-Augmented Factuality Evaluator; code). The main difference compared to FActScore is t

In [24]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (

    {'context': retrieval_chain,
     'question': itemgetter("question")}
     | prompt
     | chat 
     | StrOutputParser()
)

answer0 = final_rag_chain.invoke({"question":question})
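
In the chain above, `retrieval_chain` hands a list of `Document` objects straight to the prompt, so the `{context}` slot is filled with the list's string form, which likely includes metadata as well as text. One common alternative is to join only the page text before prompting. The sketch below assumes documents expose a `page_content` attribute, as LangChain's `Document` does; `Doc` is a hypothetical stand-in so the example runs without LangChain, and `format_docs` is an illustrative helper name.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text only)."""
    page_content: str


def format_docs(docs):
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


context = format_docs([Doc("First chunk."), Doc("Second chunk.")])
print(context)
```

In the notebook this could be piped after the retrieval step, for example `retrieval_chain | format_docs`, under the assumption that the extra metadata is not wanted in the prompt.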

In [26]:
print(answer0)

Based on the context provided, LLMs hallucinate primarily due to issues with their **pre-training data** and challenges in **learning new knowledge during fine-tuning**.

The key causes outlined are:

1.  **Pre-training Data Issues**: The massive datasets used for pre-training (often crawled from the public internet) inherently contain **out-of-date, missing, or incorrect information**. Since the model learns by maximizing the likelihood of this data, it can incorrectly memorize and reproduce these inaccuracies, leading to fabricated or unfaithful outputs.

2.  **Fine-Tuning on New Knowledge**: Introducing new information via supervised fine-tuning can be problematic. Research (Gekhman et al., 2024) found that:
    *   Models learn new knowledge that contradicts their pre-existing knowledge much **slower** than they learn consistent information.
    *   Once the model eventually learns these new, contradictory examples, it **increases its tendency to hallucinate**.
    *   The best mod