<a href="https://colab.research.google.com/github/balamurugan-shanmuganathan/RAG_Tutorial-Public-/blob/main/Query_Transformation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query Transformation

Query transformation, in the context of Large Language Models (LLMs), is the process of rewriting a user's query before it is used, in order to improve the quality of responses, improve information retrieval, or better align the query with a specific goal. It is commonly used in search, question answering, and retrieval-augmented generation (RAG).

Techniques covered in this notebook:

1. Multiple Query Generation
2. Decomposition
3. Reciprocal Rank Fusion (RRF)

Reference:

Decomposition + RRF: [RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation](https://arxiv.org/pdf/2406.12566)

Multi-query + RRF: [RAG-Fusion: A New Take on Retrieval-Augmented Generation](https://arxiv.org/pdf/2402.03367)

In [1]:
!pip install -q langchain_google_genai
!pip install -q langchain_community
!pip install -q langchain
!pip install -q langchain_huggingface
!pip install -q chromadb
!pip install -q tiktoken
!pip install -q bs4
!pip install -q python-dotenv
!pip install -q langchainhub
!pip install -q langsmith


In [2]:
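## Langsmith
import os
from dotenv import load_dotenv

load_dotenv()  # picks up LANGCHAIN_API_KEY from a local .env file, if one exists
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
# LANGCHAIN_API_KEY must already be in the environment (or .env); fail early if it is missing
if not os.getenv('LANGCHAIN_API_KEY'):
    raise ValueError("Set LANGCHAIN_API_KEY to enable LangSmith tracing")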

### Setup Environment

In [3]:
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings  # replaces the deprecated langchain.embeddings import
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI
import bs4
from tqdm import tqdm

In [4]:
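## Model
import os
from dotenv import load_dotenv
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")  # ChatGoogleGenerativeAI reads this variable
if not GOOGLE_API_KEY:
    raise ValueError("Set GOOGLE_API_KEY in the environment or a .env file")
# gemini-1.0-pro may no longer be served; substitute a current Gemini model name if this fails
google_llm = ChatGoogleGenerativeAI(model='gemini-1.0-pro')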

## Huggingface Embeddings
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [5]:
## Load Documents
loader = WebBaseLoader(
    web_paths=[
        "https://lilianweng.github.io/posts/2023-06-23-agent/"
    ],
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title","post-header")
        )
    )
)
document = loader.load()

In [6]:
## Splits
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size = 300,
    chunk_overlap = 50
)

split_doc = text_splitter.split_documents(document)
print(len(split_doc))
split_doc[0]

52


Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refin
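Because the splitter is built with `from_tiktoken_encoder`, `chunk_size=300` and `chunk_overlap=50` are measured in tokens. The sketch below is a deliberately simplified fixed-window splitter over a token list, not LangChain's recursive splitter (which also tries to break on separators such as paragraphs and sentences); it only illustrates how consecutive chunks share `chunk_overlap` tokens. The function name `window_chunks` is hypothetical.

```python
def window_chunks(tokens: list, chunk_size: int = 300, chunk_overlap: int = 50) -> list:
    """Split a token list into windows of chunk_size that overlap by chunk_overlap tokens.

    Simplified illustration only: it ignores the separator logic of
    RecursiveCharacterTextSplitter.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reaches the end
            break
    return chunks


# Toy example: 10 "tokens", windows of 4 with an overlap of 1
print(window_chunks(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the last token of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.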

In [7]:
## Embeddings
vectorstore = Chroma.from_documents(split_doc, hf_embeddings)
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x79d1e41d8820>

In [8]:
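## Retriever
# k has to go through search_kwargs; a bare k=2 argument is silently ignored
# (note search_kwargs={} in the output below), so the outputs in this notebook
# were produced with the default of 4 documents per query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
retriever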

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x79d1e41d8820>, search_kwargs={})
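Conceptually, the retriever embeds the query and returns the `k` stored chunks whose embeddings are closest to it. The sketch below shows that idea with cosine similarity over hand-made toy vectors; it is not Chroma's implementation (Chroma uses its own index and distance function), and `cosine`, `top_k` and the vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" for three chunks
store = {
    "planning": [0.9, 0.1, 0.0],
    "memory": [0.1, 0.9, 0.0],
    "tool_use": [0.0, 0.2, 0.9],
}
print(top_k([0.8, 0.3, 0.0], store, k=2))  # ['planning', 'memory']
```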

In [179]:
retriever.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Planning & Reacting: translate the reflections and the environment information into actions\n\nPlanning is essentially in order to optimize believability at the moment vs in time.\nPro

## Multiple Query Generation

**Goal:** Generate multiple related queries that cover different aspects of the original question to improve retrieval from diverse sources.


**Example:**

Original Query: "Latest advancements in AI research"

Generated Queries:

* "What are the current trends in AI research?"

* "What are the latest breakthroughs in artificial intelligence?"

* "Which areas of AI are progressing the fastest?"

In [19]:
template = """You are an AI language model assistant. Your task is to generate five different versions of the given user question
to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question,
your goal is to help the user overcome some of the limitations of distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question} """

prompt = ChatPromptTemplate.from_template(template)

generate_queries = (
    {"question" : RunnablePassthrough()}
    | prompt | google_llm | StrOutputParser()
    # one query per line; blank lines are dropped so they are not sent to the retriever
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)
generate_queries.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")

['1. Overview and types of agent systems and their role in task decomposition.',
 '2. Explain the concept of agent systems, their types, and how they facilitate task decomposition.',
 '3. Describe the process of task decomposition in agent systems, including its benefits and limitations.',
 '4. How do agent systems contribute to the effective decomposition and execution of complex tasks?',
 '5. Provide a comprehensive analysis of agent systems, their types, and the process of task decomposition within these systems.']
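The generated lines above start with a number (`1.`), and later cells show `- ` bullets instead. Splitting on newlines alone keeps those markers in every query. A small cleanup step such as the hypothetical `parse_queries` below could replace the bare `lambda` at the end of the chain; the regular expression is an assumption tuned to the formats seen in this notebook.

```python
import re


def parse_queries(text: str) -> list[str]:
    """Split LLM output into one query per line, dropping blank lines and list markers."""
    queries = []
    for line in text.split("\n"):
        # strip a leading "1.", "2)", "-", "*" or "•" marker plus surrounding whitespace
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


raw = "1. What is task decomposition?\n\n- How do agents plan?\n"
print(parse_queries(raw))  # ['What is task decomposition?', 'How do agents plan?']
```

In the chain it would sit after the parser, e.g. `... | StrOutputParser() | RunnableLambda(parse_queries)`.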

In [20]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
  """ Unique union of retrieved docs """
  ## Flatten list of list, and convert each document to string
  flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
  print(len(flattened_docs))

  ## Get unique documents
  unique_docs = list(set(flattened_docs))

  ## Return
  return [loads(doc) for doc in unique_docs]

In [23]:
## Multi-query retrieval: generate queries, retrieve for each, keep the unique union
retriever_chain = generate_queries | retriever.map() | RunnableLambda(get_unique_union)
unique_results = retriever_chain.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")
print(len(unique_results))

20
6
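`get_unique_union` removes duplicates by serializing each `Document` with `dumps` and passing the strings through a `set`, which collapses the 20 retrieved chunks to 6 but also discards the retrieval order. If order matters, a dictionary keyed on the documents keeps the first occurrence of each. The sketch below uses plain `(page_content, source)` tuples as hypothetical stand-ins for LangChain `Document` objects.

```python
def unique_union(results: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Merge per-query result lists, keeping the first occurrence of each document in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(doc for docs in results for doc in docs))


# Two queries that retrieve overlapping chunks
results = [
    [("Task decomposition breaks tasks into steps.", "agent-post"),
     ("Planning needs subgoals.", "agent-post")],
    [("Planning needs subgoals.", "agent-post"),
     ("Memory stores past observations.", "agent-post")],
]
print(len(unique_union(results)))  # 3
```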


In [22]:
## RAG
template = """Answer the question with a 500-word summary based only on the following context:
{context}

Question: {question}

Format the output as JSON.
"""
final_prompt = ChatPromptTemplate.from_template(template)

multiple_query_chain = (
    {"context": retriever_chain, "question": RunnablePassthrough()}
    | final_prompt
    | google_llm
    | JsonOutputParser()
)

multiple_query_chain.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")

20


{'Agent System Overview': {'Definition': 'A LLM-powered autonomous agent system is a system that uses a large language model (LLM) as its core controller. LLMs are powerful AI models that can understand and generate human language, and they have been shown to be effective at a wide range of tasks, including question answering, translation, and summarization.',
  'Components': [{'Planning': {'Subgoal and decomposition': 'The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.',
     'Reflection and refinement': 'The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.'}},
   {'Memory': {'Storage': 'The agent stores memories of past events and experiences.',
     'Retrieval': 'The agent can retrieve memories from storage to inform its current behavior.'}},
   {'Communication': 'The agent can communicate with other a

## Decomposition

**Goal:** Break down a complex query into simpler sub-queries that are easier to answer individually.


**Example:**

Original Query: "How do LLM agents work with task decomposition?"


Decomposed Queries:
* "What is task decomposition in AI?"
* "How do large language models use task decomposition?"
* "What are the benefits of task decomposition for LLM agents?"

### Difference Between Multi-Query and Decomposition

**Multi Query:**
* Multi Query generates multiple variations of the original query, each covering a different aspect or related angle of the same topic. The goal is to retrieve a broader set of relevant documents or information.

* It focuses on expanding the scope by producing alternative phrasing or related questions, often overlapping in content.

* It typically retrieves answers from diverse perspectives, giving a broader understanding.

* It may generate redundant or overlapping results due to similar queries.

* It is beneficial when the goal is to gather as much information as possible.


**Decomposition:**
* Decomposition, on the other hand, breaks a complex query into smaller, more manageable sub-queries, each addressing a specific part of the original question.

* Decomposition simplifies and divides a complex question, isolating distinct sub-problems to be solved independently.

* Decomposition narrows the focus by tackling individual parts of a complex query step by step.

* Decomposition is more structured and goal-oriented, leading to distinct answers for each sub-question.

* Decomposition is useful when a query is complex, requiring detailed exploration of each component.
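In practice the difference shows up in what reaches the final prompt. The sketch below contrasts the two flows used in this notebook: multi-query pools the documents from every rephrasing into one context, while decomposition answers each sub-question on its own documents and passes Q+A pairs forward. `retrieve` and `answer` are hypothetical stand-ins for the retriever and the LLM, and the toy index is made up.

```python
from typing import Callable

Retrieve = Callable[[str], list[str]]


def multi_query_context(queries: list[str], retrieve: Retrieve) -> list[str]:
    """Multi-query: one pooled, de-duplicated context for a single final answer."""
    return list(dict.fromkeys(doc for q in queries for doc in retrieve(q)))


def decomposition_context(sub_questions: list[str], retrieve: Retrieve,
                          answer: Callable[[str, list[str]], str]) -> list[tuple[str, str]]:
    """Decomposition: each sub-question is answered separately on its own documents."""
    return [(q, answer(q, retrieve(q))) for q in sub_questions]


# Toy index and a stub "LLM" that only reports how many documents it saw
index = {
    "what is task decomposition?": ["doc_cot", "doc_tot"],
    "explain task decomposition": ["doc_cot", "doc_planning"],
    "what is agent memory?": ["doc_memory"],
}


def retrieve(q: str) -> list[str]:
    return index.get(q, [])


def answer(q: str, docs: list[str]) -> str:
    return f"answered with {len(docs)} doc(s)"


print(multi_query_context(["what is task decomposition?", "explain task decomposition"], retrieve))
# ['doc_cot', 'doc_tot', 'doc_planning']
print(decomposition_context(["what is task decomposition?", "what is agent memory?"], retrieve, answer))
# [('what is task decomposition?', 'answered with 2 doc(s)'), ('what is agent memory?', 'answered with 1 doc(s)')]
```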

In [24]:
template = """You are a helpful assistant that generates sub-questions related
to an input question.
The goal is to break down the input into a set of sub-questions that can be
answered in isolation. Generate exactly 4 sub-questions related to: {question}
Output only the sub-questions, each on a new line, with no additional text or empty lines:
"""

prompt = ChatPromptTemplate.from_template(template)

sub_question_generator = (
    {"question": RunnablePassthrough()}
    | prompt
    | google_llm
    | StrOutputParser()
    # one sub-question per line; blank lines are dropped
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)

sub_questions = sub_question_generator.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")
sub_questions

['- What is an agent system overview?',
 '- What are the different types of agent systems?',
 '- What is task decomposition?',
 '- What is the process of task decomposition?']

In [25]:
generate_queries.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")

['- What is an overview of agent systems?',
 '- Describe the different types of agent systems.',
 '- Explain the concept of task decomposition.',
 '- Describe the process of task decomposition.',
 '- How are agent systems used in task decomposition?']

In [26]:
def get_qa_pairs(sub_questions):
  """Answer each sub-question with its own retrieval and format the results as Q+A pairs."""
  prompt = ChatPromptTemplate.from_template(
      """Answer the question in about 50 words based on the context below:
{context}

Question: {question}
"""
  )
  answer_chain = prompt | google_llm | StrOutputParser()  # built once, reused for every sub-question

  formatted_string = ""
  for i, ques in enumerate(tqdm(sub_questions), start=1):
    retrieved_docs = retriever.invoke(ques)  # documents for this sub-question only
    answer = answer_chain.invoke({"context": retrieved_docs, "question": ques})
    formatted_string += f"Question {i} : {ques}\nAnswer {i} : {answer}\n\n"

  return formatted_string

print(get_qa_pairs(sub_questions))


100%|██████████| 4/4 [00:05<00:00,  1.35s/it]

Question 1 : - What is an agent system overview?
Answer 1 : An agent system overview is a high-level description of the components and functionality of an agent system. It typically includes a description of the agent's goals, its environment, and the mechanisms it uses to interact with its environment and achieve its goals.

Question 2 : - What are the different types of agent systems?
Answer 2 : In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

- Planning
- Subgoal and decomposition
- Reflection and refinement
- Memory

Question 3 : - What is task decomposition?
Answer 3 : Task decomposition is a technique used in planning to break down complex tasks into smaller, more manageable steps. This can be done by using a large language model (LLM) with simple prompting, by using task-specific instructions, or with human inputs.

Question 4 : - What is the process of task decomposition?
Answer 4 : Task decomposition is the 




In [27]:
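template = """Here is a set of Q+A pairs:
{context}

Use these Q+A pairs to answer the input question in a 50-word summary: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

decomp_rag = (
    # Sub-questions feed only the Q+A step; the original question is passed through unchanged.
    # (Putting sub_question_generator in front of the whole map would make "question" the
    # list of sub-questions instead of the user's question.)
    RunnableParallel({"context": sub_question_generator | RunnableLambda(get_qa_pairs),
                      "question": RunnablePassthrough()})
    | prompt
    | google_llm
    | StrOutputParser()
)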

print(decomp_rag.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process"))

100%|██████████| 4/4 [00:07<00:00,  1.92s/it]


**Summary:**

Agent systems consist of planning, memory, and reflection mechanisms. They can be reactive, goal-based, utility-based, or learning. Task decomposition breaks down complex tasks into manageable steps, aiding planning and understanding. Steps involve identifying the task, breaking it down, determining dependencies, assigning steps, executing them, and logging results. Decomposition enhances agent efficiency and performance.


## Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) is a simple and effective method for combining the ranked results from multiple retrieval models or sources into a single, unified ranking.


Here we use RRF to re-rank the documents retrieved for each of the queries produced by multi-query generation (the RAG-Fusion approach).
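RRF gives each document the sum, over every ranked list it appears in, of $\frac{1}{k + r_i(d)}$, where $r_i(d)$ is the document's rank in list $i$ and $k$ (60 here) damps the influence of any single top rank. The notebook's `reciprocal_rank_fusion` takes ranks from `enumerate`, so they start at 0 and a top-ranked document contributes $1/60$; with 1-based ranks it would contribute $1/61$, which is equivalent to using $k = 61$. The stdlib sketch below, over hypothetical document ids, mirrors that 0-based convention.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion (0-based ranks)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1 / (k + rank)
    # highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Three queries retrieve overlapping documents in different orders
rankings = [
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["B", "C", "A"],
]
print([doc_id for doc_id, _ in rrf(rankings)])  # ['B', 'A', 'C', 'D']
```

"B" wins because it is first in two lists (2/60 + 1/61), narrowly beating "A", which appears in all three lists but lower down (1/60 + 1/61 + 1/62). Documents that appear in only one list, like "D", sink to the bottom.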

In [17]:
template = """You are a helpful assistant that generates multiple search queries based on a single input query.
Return only the 5 search queries, one per line, without any additional text or numbering.
Generate exactly 5 search queries related to: {question} """

prompt = ChatPromptTemplate.from_template(template)

# redefines generate_queries for RAG-Fusion
generate_queries = (
    {"question" : RunnablePassthrough()}
    | prompt | google_llm | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])  # one query per line, no blanks
)
generate_queries.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")


['1. Agent System Overview and Types',
 '2. Task Decomposition Definition',
 '3. Task Decomposition Process',
 '4. Role of Task Decomposition in Agent Systems',
 '5. Applications of Task Decomposition in Agent Systems']

In [36]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
  """Fuse several ranked lists of documents into one list sorted by RRF score."""
  fused_scores = {}
  for docs in results:
    # rank is 0-based (from enumerate), so the top document in each list contributes 1/k
    for rank, doc in enumerate(docs):
      doc_str = dumps(doc)  # serialize so identical documents share one key
      fused_scores[doc_str] = fused_scores.get(doc_str, 0) + 1 / (rank + k)

  reranked_results = [
      (loads(doc), score)
      for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
  ]

  return reranked_results

In [37]:
## RAG-Fusion retrieval: multi-query generation + RRF re-ranking

retrieval_chain_rag_fusion = generate_queries | retriever.map() | RunnableLambda(reciprocal_rank_fusion)
fused_scores = retrieval_chain_rag_fusion.invoke("what is Agent System Overview and it types. And describe Task Decomposition and it process")
len(fused_scores)


7

In [39]:
import pandas as pd
# fused_scores is a list of (Document, score) tuples: column 0 = document, column 1 = RRF score
pd.DataFrame(fused_scores)

| | 0 (document) | 1 (RRF score) |
|---|---|---|
| 0 | `page_content='LLM Powered Autonomous Agents\n ...` | 0.081976 |
| 1 | `page_content='Each element is an observation, ...` | 0.080662 |
| 2 | `page_content='Fig. 1. Overview of a LLM-powere...` | 0.066667 |
| 3 | `page_content='Fig. 2. Examples of reasoning t...` | 0.032002 |
| 4 | `page_content='Finite context length: The restr...` | 0.032002 |
| 5 | `page_content='Fig. 13. The generative agent ar...` | 0.016129 |
| 6 | `page_content='}\n]\nChallenges#\nAfter going t...` | 0.015873 |
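The scores in the table can be read back as sums of $\frac{1}{60 + r}$ terms. They add up to $5 \times (1/60 + 1/61 + 1/62 + 1/63) \approx 0.325311$, which is consistent with five queries each returning four chunks at 0-based ranks 0 to 3. The snippet below just redoes that arithmetic for a few rows; the row labels refer to the index column of the table, and the interpretations in the comments follow from the possible term combinations, not from any logged ranks.

```python
from fractions import Fraction

# Contribution of a chunk at 0-based rank r (k = 60, four chunks per query)
term = {rank: Fraction(1, 60 + rank) for rank in range(4)}

checks = {
    "row 2": 4 * term[0],           # ranked first by four of the five queries
    "row 3/4": term[2] + term[3],   # third in one list and fourth in another
    "row 5": term[2],               # appeared once, in third place
    "row 6": term[3],               # appeared once, in fourth place
    "sum": 5 * sum(term.values()),  # five queries x four chunks each
}
for name, value in checks.items():
    print(name, round(float(value), 6))
```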


In [18]:
from operator import itemgetter
template = """Answer the following question based on this context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag = (
    # pass only the question string into the RAG-Fusion retriever, not the whole input dict
    {"context": itemgetter("question") | retrieval_chain_rag_fusion, "question": itemgetter("question")}
    | prompt
    | google_llm
    | StrOutputParser()
)

question = "what is Agent System Overview and it types. And describe Task Decomposition and it process"

print(final_rag.invoke({"question": question}))
