<a href="https://colab.research.google.com/github/Msaleemakhtar/GenerativeAI/blob/main/Query_translation(RAG)_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🌟 Query Translation 🌟**

# 1. Multi Query Translation

In [28]:
! pip install langchain_community tiktoken  langchainhub langchain langchain-google-genai



In [29]:
!pip install --quiet chromadb

In [30]:
# import os
# os.environ['LANGCHAIN_TRACING_V2'] = 'true'
# os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
# os.environ['LANGCHAIN_API_KEY'] = "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"
# os.environ["LANGCHAIN_PROJECT"]="rag-prompt"



LANGCHAIN_TRACING_V2 = 'true'
LANGCHAIN_ENDPOINT = 'https://api.smith.langchain.com'
LANGCHAIN_API_KEY= "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"
LANGCHAIN_PROJECT="rag-prompt"


In [31]:
import os
import getpass
#GOOGLE_API_KEY=""



os.environ['GOOGLE_API_KEY'] = getpass.getpass('Gemini API Key:')

Gemini API Key:··········


In [32]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI



#### INDEXING ####

# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)

# Embed
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

vectorstore = Chroma.from_documents(
                     documents=splits,                 # Data
                     embedding=gemini_embeddings,    # Embedding model
                     #persist_directory="./chroma_db" # Directory to save data
                     )


retriever = vectorstore.as_retriever()


In [33]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question}"""


prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives
    | ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [34]:
docs = generate_queries.invoke({"question": "What is Task Decomposition?"})
docs

['Here are five different versions of the user question "What is Task Decomposition?" to retrieve relevant documents from a vector database:',
 '',
 '1. **Focusing on the process:** How do you break down a complex task into smaller, manageable subtasks?',
 '2. **Highlighting the purpose:** What are the benefits of dividing a large task into smaller components?',
 '3. **Emphasizing the application:**  How is task decomposition used in project management or software development?',
 '4. **Using synonyms:** What is the process of dividing a large job into smaller, more manageable units?',
 '5. **Adding context:**  Explain the concept of task decomposition in the context of [specific field or domain, e.g., artificial intelligence, psychology]. ',
 '']

In [35]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]


# Retrieve
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
len(docs)


8

In [36]:
from operator import itemgetter
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)

final_rag_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)
question = "What is task decomposition for LLM agents?"
final_rag_chain.invoke({"question":question})

'Task decomposition is the process of breaking down large, complex tasks into smaller, more manageable subgoals. This allows LLM agents to handle complex tasks more efficiently. \n\nThe context mentions several methods for task decomposition:\n\n* **Chain of Thought (CoT):**  The model is prompted to "think step by step" to decompose tasks into smaller steps.\n* **Tree of Thoughts (ToT):**  This extends CoT by exploring multiple reasoning possibilities at each step, creating a tree structure of potential solutions.\n* **LLM prompting:**  Simple prompts like "Steps for XYZ... 1." or "What are the subgoals for achieving XYZ?" can guide the LLM to decompose tasks.\n* **Task-specific instructions:**  Providing instructions tailored to the task, like "Write a story outline," can help the LLM understand how to break down the task.\n* **Human inputs:**  Humans can provide guidance on how to decompose tasks, especially for complex or unfamiliar tasks. \n'

# **2. Rag_Fusion**

In [37]:
from langchain.prompts import ChatPromptTemplate

# Rag fusion
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""


rag_fusion_queries= ChatPromptTemplate.from_template(template)

generate_queries = (
    rag_fusion_queries
    | ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [38]:
queries=generate_queries.invoke({"question": "What is task decomposition for LLM agents?"})
queries

['Here are 4 search queries related to "What is task decomposition for LLM agents?":',
 '',
 '1. **Task decomposition techniques for large language model agents**',
 '2. **How to decompose complex tasks for LLM-powered agents**',
 '3. **Benefits of task decomposition in LLM agent development**',
 '4. **Examples of task decomposition in LLM agent applications** ',
 '']

In [39]:
print(queries[0])
print(queries[1])
print(queries[2])
print(queries[3])
print(queries[4])
print(queries[5])
print(queries[6])




Here are 4 search queries related to "What is task decomposition for LLM agents?":

1. **Task decomposition techniques for large language model agents**
2. **How to decompose complex tasks for LLM-powered agents**
3. **Benefits of task decomposition in LLM agent development**
4. **Examples of task decomposition in LLM agent applications** 



In [40]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents
        and an optional parameter k used in the RRF formula """

    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Retrieve the current score of the document, if any
            previous_score = fused_scores[doc_str]
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
# question= "What is task decomposition for LLM agents?"
# docs = retrieval_chain_rag_fusion.invoke({"question": question})
# len(docs)

In [41]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)
question= "What is task decomposition for LLM agents?"
final_rag_chain.invoke({"question":question})

'Task decomposition is the process of breaking down large, complex tasks into smaller, more manageable subgoals. This allows LLM agents to handle complex tasks more efficiently. \n\nThe document mentions several methods for task decomposition:\n\n* **Chain of Thought (CoT):**  The model is prompted to "think step by step" to break down tasks into smaller steps.\n* **Tree of Thoughts:** This extends CoT by exploring multiple reasoning possibilities at each step, creating a tree structure of potential solutions.\n* **LLM prompting:** Simple prompts like "Steps for XYZ..." or "What are the subgoals for achieving XYZ?" can guide the LLM to decompose tasks.\n* **Task-specific instructions:**  Providing instructions tailored to the task, like "Write a story outline," can help the LLM understand how to break it down.\n* **Human input:** Humans can also provide guidance on task decomposition. \n'

# **Decomposition**

**Part_1 recursively**

In [42]:

from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answers in isolation. \n
Generate multiple search queries related to: {question} \n
only give the questions in output \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [43]:
# LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)
# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

In [44]:
questions

['1. What are the key components of an LLM-based autonomous agent?',
 '2. How do LLMs interact with other components in an autonomous agent system?',
 '3. What are the different types of LLM-powered autonomous agent architectures? ',
 '']

In [45]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question:

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [46]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format Q and A pair"""

    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

# llm
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)

q_a_pairs = ""
for q in questions:

    rag_chain = (
    {"context": itemgetter("question") | retriever,
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")}
    | decomposition_prompt
    | llm
    | StrOutputParser())

    answer = rag_chain.invoke({"question":q,"q_a_pairs":q_a_pairs})

    q_a_pair = format_qa_pair(q,answer)

    q_a_pairs = q_a_pairs + "\n---\n"+  q_a_pair




In [47]:
answer

'The provided context describes a specific type of LLM-based autonomous agent architecture, which can be categorized as a **goal-oriented architecture with a strong emphasis on self-reflection and learning**. \n\nHere\'s a breakdown of the key characteristics based on the context:\n\n**1. Goal-Oriented:** The agent is explicitly designed to achieve user-provided goals. This is evident from the "GOALS" section, which outlines the user-defined objectives the agent needs to accomplish.\n\n**2. Self-Reflection and Learning:** The agent is equipped with mechanisms for self-evaluation and improvement. The "Performance Evaluation" section highlights this aspect, emphasizing the need for continuous review, self-criticism, and reflection on past decisions to refine its approach.\n\n**3. Resource Management:** The agent is aware of its limitations, particularly its short-term memory capacity. It is instructed to save important information to files and use subprocesses for long-running tasks. Thi

**Part_2 Answer individually**

In [48]:
# Answer each sub-question individually

from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI


# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """RAG on each sub-question"""

    # Use our decomposition /
    sub_questions = sub_question_generator_chain.invoke({"question":question})

    # Initialize a list to hold RAG chain results
    rag_results = []

    for sub_question in sub_questions:

        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)

        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs,
                                                                "question": sub_question})
        rag_results.append(answer)

    return rag_results,sub_questions

# Wrap the retrieval and RAG process in a RunnableLambda for integration into a chain
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)



In [50]:
print(retrieve_and_rag(question, prompt_rag, generate_queries_decomposition))

(['Key components of an LLM-based autonomous agent include planning, memory, and an LLM as the brain. Planning involves breaking down tasks into smaller subgoals and reflecting on past actions to improve future performance. Memory allows the agent to store and retrieve information for decision-making. \n', "LLMs in autonomous agent systems function as the agent's brain, interacting with other components like planning, memory, and execution. These components work together to break down complex tasks into smaller subgoals, learn from past actions, and execute the necessary steps to achieve the desired outcome. \n", 'The provided context does not mention different types of LLM-powered autonomous agent architectures. It focuses on the general concept of using LLMs as the core controller for autonomous agents and highlights key components like planning and memory. \n', "The provided context describes the constraints and resources available to an agent. It mentions a limited short-term memor

In [51]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""

    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

'An LLM-powered autonomous agent system is comprised of several key components working together to enable intelligent and independent action. These components include:\n\n* **LLM as the Brain:** The core of the system is a large language model (LLM) that acts as the agent\'s "brain." It processes information, makes decisions, and generates responses.\n* **Planning:** The agent needs to be able to break down complex tasks into smaller, manageable subgoals. This planning component helps the agent strategize and prioritize actions.\n* **Memory:**  The agent requires a memory system to store and retrieve information for decision-making. This could include past experiences, learned knowledge, and relevant data. \n* **Execution:**  The agent needs to be able to execute the actions it plans. This could involve interacting with the environment, manipulating objects, or communicating with other agents.\n\nThese components work together to enable the agent to learn from its experiences, adapt to

# **Step back**

In [52]:
# Few Shot Examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)

In [53]:
generate_queries_step_back = prompt | llm | StrOutputParser()
question = "What is the capital of France?"
generate_queries_step_back.invoke({"question":question})

'What is the most important city in France? \n'

In [54]:
# Response prompt
response_prompt_template = """You are an expert of world knowledge.
I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant.
Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""


response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

# llm
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

chain.invoke({"question": question})

'The capital of France is **Paris**. \n'

# **HyDE**

In [55]:
from langchain.prompts import ChatPromptTemplate

# HyDE document genration
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)


# llm
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0)

generate_docs_for_retrieval = (
    prompt_hyde | llm | StrOutputParser()
)

# Run
question = "What is task decomposition for LLM agents?"
generate_docs_for_retrieval.invoke({"question":question})

'## Task Decomposition for Large Language Model Agents\n\nTask decomposition is a crucial aspect of enabling Large Language Models (LLMs) to effectively act as agents in complex environments. It involves breaking down a high-level, often ambiguous task into a sequence of smaller, more manageable subtasks that the LLM can execute individually. This process is essential for several reasons:\n\n**1. Bridging the Gap between Language and Action:** LLMs excel at understanding and generating natural language, but they lack the ability to directly interact with the real world. Task decomposition allows us to translate complex, human-understandable instructions into a series of actions that the LLM can perform through APIs or other interfaces.\n\n**2. Handling Complexity and Uncertainty:** Real-world tasks are often multifaceted and involve uncertainty. Decomposing tasks into smaller units allows the LLM to focus on individual steps, reducing the cognitive burden and enabling more robust plann

In [56]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search

In [57]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})


'Task decomposition is the process of breaking down large, complex tasks into smaller, more manageable subgoals. This allows LLM agents to efficiently handle complex tasks by tackling them in smaller, more manageable steps. \n'