# RAG from scratch: Query Transformations

Query Transformations mean that our approach focuses on re-writing and/or modifying questions for retrieval.

In [1]:
!pip install -q langchain_community tiktoken langchainhub chromadb langchain


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import os
from access import Access

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = Access.LANGCHAIN_API_KEY

## Method 1: Multi Query

In multi query, we are going to have 1 question, based on that, we will feed it to an ai model to make multiple queries to gather data from the vectorstore and retrieve the correct documents, by feeding these documents into the LLM, we can get the most accurate answer.

### Retriever

In [3]:
import bs4
from langchain_classic import hub
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import ChatOllama, OllamaEmbeddings

# === Indexing ===
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=OllamaEmbeddings(model="mxbai-embed-large"))

retriever = vectorstore.as_retriever()

  from .autonotebook import tqdm as notebook_tqdm
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
USER_AGENT environment variable not set, consider setting it to identify your requests.


### Prompt

In [4]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

# Prompt
template = """
You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}
"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives
    | ChatOllama(model="llama3.1", temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [None]:
# def get_unique_union(documents: list[list]):
#     """ Unique union of retrieved docs """
#     seen = set()
#     uniq_docs = []
    
#     for sublist in documents:
#         for doc in sublist:
#             key = (doc.page_content, tuple(sorted(doc.metadata.items())))
#             if key not in seen:
#                 seen.add(key)
#                 uniq_docs.append(doc)
                
#     return uniq_docs
#     # # Flatten list of lists, and convert each Document to string
#     # flattened_docs = [Serializable(doc) for sublist in documents for doc in sublist]
    
#     # # Get unique documents
#     # unique_docs = list(set(flattened_docs))
    
#     # # Return
#     # return [loads(doc) for doc in unique_docs]

# question = "What is the definition of reward hacking?"
# retrieval_chain = generate_queries | retriever.map() | get_unique_union
# docs = retrieval_chain.invoke({"question":question})
# len(docs)


16

In [19]:
from langchain_core.load.dump import dumps
from langchain_core.load import load

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [load(doc) for doc in unique_docs]

# Retrieve
question = "What is the definition of reward hacking?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
len(docs)

13

In [17]:
docs

['{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "document", "Document"], "kwargs": {"metadata": {"source": "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/"}, "page_content": "$$\\nF(s, a, s\') = \\\\gamma \\\\Phi(s\') - \\\\Phi(s)\\n$$\\n\\nThis would guarantee that the sum of discounted $F$, $F(s_1, a_1, s_2) + \\\\gamma F(s_2, a_2, s_3) + \\\\dots$, ends up being 0. If $F$ is such a potential-based shaping function, it is both sufficient and necessary to ensure $M$ and $M\\u2019$ share the same optimal policies.\\nWhen $F(s, a, s\\u2019) = \\\\gamma \\\\Phi(s\\u2019) - \\\\Phi(s)$, and if we further assume that $\\\\Phi(s_0) = 0$, where $s_0$ is absorbing state, and $\\\\gamma=1$, and then for all $s \\\\in S, a \\\\in A$:\\n\\n$$\\n\\\\begin{aligned}\\nQ^*_{M\'} (s,a) &= Q^*_M(s, a) - \\\\Phi(s) \\\\\\\\\\nV^*_{M\'} (s,a) &= V^*_M(s, a) - \\\\Phi(s)\\n\\\\end{aligned}\\n$$", "type": "Document"}}',
 '{"lc": 1, "type": "constructor", "id": ["langchain",

In [7]:
from operator import itemgetter
from langchain_ollama import ChatOllama
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOllama(model="llama3.1", temperature=0)

final_rag_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question": question})

'Reward hacking refers to the possibility of an agent gaming the reward function to achieve high rewards through undesired behavior, as proposed by Amodei et al. in their seminal paper "Concrete Problems in AI Safety" (2016). It involves exploiting the task specification or finding "holes" in the design of the reward function to achieve higher proxy rewards but lower true rewards.'

## Method 2: RAG-Fusion

RAG-Fusion has a similar approach to multi query. Multi Query can be simply explained as taking the original query, pass it to LLM to rephrase the question producing even more question so that we can have a wider access to the retrieved documents by mitigating mistake caused by missed words from the original query.

RAG-Fusion also works the same way. passing the question to LLM to generate more questions. We then use these independent questions to retrieve the documents. The Novelty of RAG-Fusion lies in the way it retrieves documents. Instead of retrieving list of documents, we expect the model to return list of ranked documents. This way we can rank the documents and find out which documents consistently appear in different search attempts indicating that it's the most related documents.

### Prompt

In [20]:
from langchain_core.prompts import ChatPromptTemplate

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [21]:
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

generate_queries = (
    prompt_rag_fusion
    | ChatOllama(model="llama3.1", temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [22]:
from langchain_core.load import load
from langchain_core.load.dump import dumps

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Retrieve the current score of the document, if any
            previous_score = fused_scores[doc_str]
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (load(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

13

In [23]:
docs

[('{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "document", "Document"], "kwargs": {"metadata": {"source": "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/"}, "page_content": "In-Context Reward Hacking#", "type": "Document"}}',
  0.11585580821434865),
 ('{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "document", "Document"], "kwargs": {"metadata": {"source": "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/"}, "page_content": "Reward Tampering (Everitt et al. 2019) is a form of reward hacking behavior where the agent interferes with the reward function itself, causing the observed reward to no longer accurately represent the intended goal. In reward tampering, the model modifies its reward mechanism either by directly manipulating the implementation of the reward function or by indirectly altering the environmental information used as input for the reward function.\\n(Note: Some work defines reward tampering as a distinct cat

In [24]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'Reward hacking refers to the possibility of an agent gaming the reward function to achieve high reward through undesired behavior. It was first proposed by Amodei et al. (2016) in their seminal paper "Concrete Problems in AI Safety". Reward hacking can be seen as a broader concept that includes specification gaming, where the agent satisfies the literal specification of an objective but not achieving the desired results due to a gap between the literal description of the task goal and the intended goal.'

## Method 3: Decomposition

Decomposition is a technique to break a complex questions into smaller sub-questions where the LLM can answer each part and combine the results into the final answer. Decomposition is helpful when the questions scope are too big to be done over one retrieval call. So we solve this by breaking it into smaller quesitons to give a better and more complete context before merging the information to produce the best answer.

In [35]:
from langchain_core.prompts import ChatPromptTemplate

# Decomposition: Ask the LLM to create sub-questions
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answers in isolation. \n
Generate multiple search queries related to: {question} \n
Rules:\n
- Each sub-questions must be self-contained.\n
- Do NOT add blank lines.\n
- Do NOT number or bullet the questions.\n
- Only return the 3 sub-questions. one per line.\n
Output (3 queries):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

In [36]:
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

# LLM
llm = ChatOllama(model="llama3.1", temperature=0)

generate_queries_decomposition = (
    prompt_decomposition
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

In [37]:
questions

['What are the key components of a Large Language Model (LLM) in an autonomous agent system?',
 'What are the primary functions and responsibilities of a reasoning module in an LLM-powered autonomous agent system?',
 'How do natural language processing (NLP) and machine learning algorithms contribute to the decision-making capabilities of an LLM-powered autonomous agent system?']

In [38]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question:

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [44]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def debug_print_prompt(prompt_dict):
    print("\n========== PROMPT ==========")
    print(prompt_dict)
    print("============================\n")
    return prompt_dict

def format_qa_pair(question, answer):
    """Format Q and A pair"""
    
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()
    
# llm
llm = ChatOllama(model="llama3.1", temperature=0)

q_a_pairs = ""
for q in questions:
    rag_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question"),
         "q_a_pairs": itemgetter("q_a_pairs")}
        | decomposition_prompt
        | debug_print_prompt
        | llm
        | StrOutputParser()
    )
    
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair


messages=[HumanMessage(content="Here is the question you need to answer:\n\n\n --- \n What are the key components of a Large Language Model (LLM) in an autonomous agent system? \n --- \n\n\nHere is any available background question + answer pairs:\n\n\n --- \n  \n --- \n\n\nHere is additional context relevant to the question:\n\n\n --- \n [Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-11-28-reward-hacking/'}, page_content='Reward hacking examples in LLM tasks#\\n\\nA language model for generating summarization is able to explore flaws in the ROUGE metric such that it obtains high score but the generated summaries are barely readable. (Link)\\nA coding model learns to change unit test in order to pass coding questions. (Link)\\nA coding model may learn to directly modify the code used for calculating the reward. (Link)\\n\\nReward hacking examples in real life#'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-11-28-reward-hacking/'}, page

In [45]:
answer

'Based on the provided context and background information, I will outline how natural language processing (NLP) and machine learning algorithms contribute to the decision-making capabilities of an LLM-powered autonomous agent system.\n\n**Contribution of NLP**\n\nNatural Language Processing (NLP) plays a crucial role in enabling an LLM-powered autonomous agent system to understand and interact with its environment through natural language. The key components of an LLM, including the Language Understanding Module, Knowledge Retrieval System, and Generative Model, rely heavily on NLP algorithms to process and generate human-like language.\n\n1. **Language Understanding**: NLP algorithms enable the LLM to comprehend user input, including text or voice commands, and extract relevant information.\n2. **Knowledge Retrieval**: NLP is used to access a vast knowledge base, which is then used to generate responses to user queries or requests.\n3. **Generative Model**: NLP algorithms are employed

This approach is effective because it adds additional context in a form of !/A pairs. It does supports deep reasoning and it wors well for technical, multi-layered, or causally connected problems. but The weakness is that the prompt will grow larger and larger after every iteration leading to usage of more tokens. Using more tokens will also increase the cost and make the speed slower. As in longer dependecies, it has the risk of error propagation where a wrong early answer can affect later one.

For this reason, we can alternate the approach to have a more stable compute cost that is fast and scalable and is not prone to error accumulation. Although the trade-poff is that there will be no cross-pollination of knowledge between sub-qiestions. So in a case where the sub-question depend on each other, this approach will be suboptimal.

In [54]:
from langchain_classic import hub


# Prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """RAG on each sub-question"""
    
    # Use our decomposition
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs,
                                                                "question": sub_question})
        rag_results.append(answer)
        
    return rag_results, sub_questions

# Wrap the retrieval and RAG process in a RunnableLambda for integration into a chain
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

In [55]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

'Based on the provided Q&A pairs, here is a synthesized answer:\n\nAn LLM-powered autonomous agent system consists of several key components. At its core, it includes a Large Language Model (LLM) that has been optimized based on feedback and can interact with APIs through user-specific goals and tasks. The LLM also has access to error feedback, which enables it to recover from errors but may lead to increased constraint violations.\n\nIn addition to the LLM, the system incorporates a reasoning module that plays a crucial role in decision-making. This module analyzes feedback, identifies errors, and adjusts the policy of the LLM accordingly to achieve desired goals while minimizing undesired behavior. The reasoning module also helps the agent to recover from errors with minimal constraint violations.\n\nFurthermore, the system relies on natural language processing (NLP) and machine learning algorithms to enable it to analyze user feedback, optimize its policy based on that feedback, and

## Method 4: Step Back

Step Back is a method used to modify the question when it is too specific to can make the model retrieve irrelevant context. It rephrases the question into a more general version so that the retrieval result will become better.

Step-by-step:
1. Use asks a question.
2. LLM generates a more general question
3. RAG retrieves context using this general question
4. LLM uses the retrieved context to answer the original question accurately.

In [58]:
# Few Shot Examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Example for the LLM
examples = [
    {
        "input": "Could the members of the Police perform lawful arrests?",
        "output": "What can the members of the Police do?"
    },
    {
        "input": "Jan Sindel's was born in what country?",
        "output": "What is Jan Sindel's personal history?"
        
    }
]

# We now transform these to example messages
# We use this to tell LangChain how to format each example
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

# Wrap all examples into few-shot template
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# Build the final prompt that will be sent to the LLM
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few Shot Examples
        few_shot_prompt,
        
        # New Question
        ("user", "{question}"),
    ]
)

the result of the prompt will look pretty much like this:
```
System: (instruction)
Human: Example 1 input
AI: Example 1 output
Human: Example 2 input
AI: Example 2 output
Human: {the real question from end user}
```

In [59]:
generate_queries_step_back = prompt | ChatOllama(model="llama3.1", temperature=0) | StrOutputParser()
question = "What is Reward Hacking for Reinforcement Learning?"
generate_queries_step_back.invoke({"question": question})

'What are some techniques used to improve reinforcement learning algorithms?'

In [61]:
from langchain_core.runnables import RunnableLambda

# Response prompt
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        
        # Pass on the question
        "question": lambda x: x["question"]
    }
    | response_prompt
    | ChatOllama(model="llama3.1", temperature=0)
    | StrOutputParser()
)

chain.invoke({"question": question})


'Reward hacking in reinforcement learning (RL) refers to a phenomenon where an RL agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task. This occurs when the model and algorithm become increasingly sophisticated, allowing the agent to find "holes" in the design of the reward function and exploit the task specification.\n\nReward hacking can manifest in two primary ways:\n\n1. **Environment or goal misspecified**: The model learns undesired behavior to achieve high rewards by hacking the environment or optimizing a reward function not aligned with the true reward objective, such as when the reward is misspecified or lacks key requirements.\n2. **Reward tampering**: The model learns to interfere with the reward mechanism itself.\n\nThis issue exists because RL environments are often imperfect, and it is fundamentally challenging to accurately specify a reward function. As a result, reward hacking has

## Method 5: HyDE

HyDE stands for Hypothetical Document Embeddings. In HyDE, llm will create a fake answer in the beginning to improve retrieval.

Step will go by:
1. User asks a question
2. LLM Writes a hypothetical answer (just like random guess)
3. The hypothetical text is embedded and used as the search query to the vector database.
4. The retrieved documents is likely contain more relevant and detailed information.
5. LLM answers the original question using retrieved docs.

SyntaxError: invalid syntax (3410470723.py, line 1)