### Decomposition

Splitting an input question into several distinct sub-queries is called query decomposition, also known as sub-query generation. Each sub-query is simpler to prompt with, and retrieving documents for each one separately broadens the context available for the final answer.

In [1]:
from dotenv import load_dotenv, dotenv_values
import google.generativeai as genai
from IPython.display import Markdown, display
import os 


# Load GOOGLE_API_KEY from a local .env file and configure the Gemini client
load_dotenv()
my_api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=my_api_key)

In [2]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

## Call Embedding Model
embedding = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")


vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=embedding)

retriever = vectorstore.as_retriever()

In [4]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [5]:
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI

# LLM
llm = ChatGoogleGenerativeAI(model= "gemini-1.5-flash", temperature = 0)

# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What is task decomposition for LLM agents?"
questions = generate_queries_decomposition.invoke({"question":question})

In [6]:
questions

['Here are three search queries related to "What is task decomposition for LLM agents?":',
 '',
 '1. **"Task decomposition in large language model agents"** - This query focuses on the general concept of task decomposition within the context of LLM agents.',
 '2. **"How do LLMs decompose complex tasks into subtasks"** - This query delves into the specific mechanisms LLMs use to break down complex tasks.',
 '3. **"Benefits of task decomposition for LLM agent performance"** - This query explores the advantages of using task decomposition for improving the effectiveness of LLM agents. ',
 '']
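
The list above still contains the model's preamble line and empty strings, because the chain only splits on newlines. Below is a minimal sketch of one way to keep just the query text before retrieval. The helper name `clean_sub_queries` is hypothetical, and the regular expressions assume the numbered, bolded format shown in this particular output; a different model or prompt may need a different pattern.

```python
import re

def clean_sub_queries(lines):
    """Keep only numbered lines and return the bare query text (hypothetical helper)."""
    queries = []
    for line in lines:
        # Match lines such as '1. **"query"** - explanation'
        match = re.match(r'^\s*\d+\.\s*(.+)$', line)
        if not match:
            continue  # skip the preamble and blank lines
        text = match.group(1)
        # Prefer the bolded (optionally quoted) query; otherwise keep the whole line
        bolded = re.search(r'\*\*"?(.+?)"?\*\*', text)
        queries.append(bolded.group(1) if bolded else text.strip())
    return queries

# Sample input shaped like the output above (shortened for illustration)
sample_lines = [
    'Here are three search queries:',
    '',
    '1. **"Task decomposition in large language model agents"** - General concept.',
    '2. **"How do LLMs decompose complex tasks into subtasks"** - Mechanisms.',
    '',
]
print(clean_sub_queries(sample_lines))
```

Appending a step like this to the chain in place of the plain `split("\n")` would avoid sending the preamble and empty strings to the retriever.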

##### Answer recursively

Here we pass the sub-questions to the model one at a time. Each call receives the Q+A pairs produced so far along with the context retrieved for the current sub-question, so every answer builds on the earlier ones. This makes the final answer more nuanced and tends to help with complex queries whose parts depend on each other.
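
To make the accumulation pattern concrete, here is a minimal sketch with the retrieval and LLM call replaced by a stub. The names `answer_recursively` and `stub_answer` are hypothetical and exist only for illustration; the cells that follow show the actual chain.

```python
def answer_recursively(sub_questions, answer_fn):
    """Answer sub-questions in order, feeding earlier Q+A pairs into later calls."""
    q_a_pairs = ""
    answer = ""
    for q in sub_questions:
        # answer_fn stands in for retrieval + LLM; it sees all earlier pairs
        answer = answer_fn(q, q_a_pairs)
        q_a_pairs += f"\n---\nQuestion: {q}\nAnswer: {answer}"
    return answer, q_a_pairs

def stub_answer(question, background):
    # Hypothetical stub: reports how many earlier pairs it was given
    return f"answer using {background.count('Question:')} earlier pair(s)"

demo_final, demo_history = answer_recursively(["q1", "q2", "q3"], stub_answer)
print(demo_final)
```

The answer to the last sub-question is the one that has seen the most background, which is why it is treated as the final answer.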


In [7]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [8]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format Q and A pair"""
    
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

q_a_pairs = ""
for q in questions:
    
    rag_chain = (
    {"context": itemgetter("question") | retriever, 
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")} 
    | decomposition_prompt
    | llm
    | StrOutputParser())

    answer = rag_chain.invoke({"question":q,"q_a_pairs":q_a_pairs})
    q_a_pair = format_qa_pair(q,answer)
    q_a_pairs = q_a_pairs + "\n---\n"+  q_a_pair

In [9]:
answer

'## What is task decomposition for LLM agents?\n\nTask decomposition is a crucial technique for building effective and safe LLM agents. It involves breaking down complex tasks into smaller, manageable steps that the LLM can understand and execute. This process is essential for several reasons:\n\n**1. Handling Complexity:** LLMs are powerful language processors, but they struggle with complex tasks requiring multiple steps and reasoning. Task decomposition breaks down these challenges into smaller, manageable chunks, allowing LLMs to tackle them more effectively.\n\n**2. Improved Efficiency:** By dividing a task into smaller steps, LLMs can focus their processing power on each individual step, leading to more efficient execution. This is particularly important for tasks that require a lot of computational resources.\n\n**3. Enhanced Safety and Control:** Task decomposition allows for better control over the LLM\'s actions. By breaking down a task into smaller steps, it becomes easier t

##### Answer individually 
This is also known as the parallel answering approach. We decompose the user question into sub-questions as before, but each sub-question is answered on its own, without seeing the other answers. The resulting Q+A pairs are then combined into a single context that is used to answer the original question. Because the sub-questions are independent, they can be answered in parallel. When the sub-queries are good, this approach works well for most use cases.

In [11]:
# Answer each sub-question individually 

from langchain import hub
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser


# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """RAG on each sub-question"""
    
    # Generate sub-questions with the decomposition chain
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs, 
                                                                "question": sub_question})
        rag_results.append(answer)
    
    return rag_results,sub_questions

# Run retrieval and RAG on each sub-question (called directly here, not wrapped in a RunnableLambda)
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)
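
The loop in `retrieve_and_rag` answers the sub-questions one after another. Since they do not depend on each other, they could also be answered concurrently. The following is a minimal sketch using a thread pool from the standard library; `answer_in_parallel` and `stub_rag_answer` are hypothetical names, and the stub stands in for the retriever plus RAG chain call.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_in_parallel(sub_questions, answer_fn, max_workers=4):
    """Answer each sub-question independently and return answers in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of sub_questions in its results
        return list(pool.map(answer_fn, sub_questions))

def stub_rag_answer(sub_question):
    # Hypothetical stand-in for retrieval plus the RAG chain
    return f"answer to: {sub_question}"

demo_answers = answer_in_parallel(["q1", "q2", "q3"], stub_rag_answer)
print(demo_answers)
```

Threads suit this case because each call mostly waits on network I/O. Whether this is faster in practice depends on the provider's rate limits.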

In [12]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})


"## What is task decomposition for LLM agents?\n\nTask decomposition for LLM agents is the process of breaking down large, complex tasks into smaller, more manageable subgoals. This allows the agent to efficiently handle complex tasks and improve the quality of its final results. \n\nHere's how it works:\n\n* **LLMs can be prompted to decompose tasks using:**\n    * **Simple instructions:**  The prompt can directly instruct the LLM to break down the task into steps.\n    * **Task-specific instructions:**  The prompt can provide specific instructions tailored to the task, guiding the LLM on how to decompose it.\n    * **Human input:**  Humans can provide explicit instructions on how to decompose the task, giving the LLM a clear roadmap.\n\n* **Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are used to decompose tasks into smaller steps:**\n    * **CoT:**  Prompts the model to break down tasks into smaller steps, allowing it to reason and solve problems more effectivel