In [1]:
# Query transformation is the process of modifying or enhancing a user's original query before it's used to retrieve documents from a knowledge base.

In [2]:
from dotenv import load_dotenv
load_dotenv() # load environment variables

True

In [3]:
# Do all the setup here

import bs4
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaLLM

# Fetch blog content
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split content into chunks; chunk_size and chunk_overlap are counted in tiktoken tokens, not characters
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(blog_docs)

# Index (embed and store) the chunks into a vector db
vectorstore = Chroma.from_documents(documents=splits, embedding=OllamaEmbeddings(model="nomic-embed-text"))
retriever = vectorstore.as_retriever()

# Initialize the llama3.2 (1B) model served by Ollama
llm = OllamaLLM(model="llama3.2:1b")

In [4]:
# Define some reusable prompts & queries

# User query
query = "What is task decomposition for LLM agents?"
query_cot = "What are the main components of an LLM-powered autonomous agent system?"

# RAG template
rag_template = PromptTemplate.from_template('''
    You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
    Context: {context}
    Question: {question}
    Answer:
''')
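
When a list of retrieved documents is dropped straight into `{context}`, the prompt receives the list's Python representation, metadata included. A minimal sketch of a formatting step that keeps only the text is shown here. `Doc` and `format_docs` are hypothetical names used for illustration, and `Doc` is a stand-in for the retriever's document type, not the real class.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is used."""
    page_content: str


def format_docs(docs: list) -> str:
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


toy_context = format_docs([Doc("Chunk one."), Doc("Chunk two.")])
print(toy_context)
```

A function like this could be piped after a retriever (for example `retriever | format_docs`) so that the template sees plain text only.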

### Problem Statement

User queries are a challenge: _If a user provides an ambiguous query, they'll get ambiguous matches._

The LLM answers from whatever lands in its context, so mixed or irrelevant matches often lead to hallucinated answers.

#### Example

User query:
- "Jaguar speed"

Problem:
- "Jaguar" could refer to the animal or the car.
- "Speed" could mean top speed, acceleration, or agility.

Potential bad outcome:
- Retrieval returns mixed results (e.g., car specs and animal facts).
- LLM hallucinates by combining facts:
    - “The Jaguar can reach 150 mph in the wild.”
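
The effect can be reproduced with a toy example. The sketch below uses word overlap as a crude stand-in for embedding similarity (an assumption made purely for illustration; the corpus texts and function names are invented). The ambiguous query scores the animal and the car equally, while a slightly more specific query separates them.

```python
# Toy corpus: invented texts, used only to illustrate ambiguous matching
toy_corpus = {
    "animal": "the jaguar is a big cat known for its speed in short bursts",
    "car": "the jaguar is a luxury car with a high top speed",
    "unrelated": "chroma is an open source vector database",
}


def overlap_score(query_text: str, text: str) -> int:
    """Count how many distinct query words also appear in the text."""
    return len(set(query_text.lower().split()) & set(text.lower().split()))


def toy_search(query_text: str) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs with a non-zero score, best first."""
    scored = [(doc_id, overlap_score(query_text, text)) for doc_id, text in toy_corpus.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)


ambiguous = toy_search("jaguar speed")
specific = toy_search("jaguar cat speed")
print(ambiguous)  # animal and car tie
print(specific)   # animal ranks first
```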

### Strategy #1: Multi-Query

![Image](rsc/jupyter/multi-query.png)

**Idea**: Instead of relying on just one query, let's generate _multiple semantically different queries_ that represent various plausible interpretations or reformulations of the original user question.

In [5]:
from langchain_core.output_parsers import StrOutputParser
# PromptTemplate is already imported from langchain_core.prompts in the setup cell

# Multi Query: Different Perspectives
template_multi = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions as a numbered list. Don't include text before or after
this list. Original question: {question}"""
prompt_multi_query = PromptTemplate.from_template(template_multi)

generate_queries_multi = (
    prompt_multi_query
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

multi_queries = generate_queries_multi.invoke(query)
print(len(multi_queries))
print(multi_queries)

5
['1. What are tasks decomposed into in large language models (LLMs) like myself?', '2. How do I identify the individual components of a task in LLMs?', "3. What's the process behind splitting complex tasks into smaller sub-tasks within LLMs?", '4. In what ways can I understand and decompose tasks given their representation in vector space?', '5. Can you explain how to decompose a given task from its input representations or embeddings?']
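
The generated queries still carry their list numbering, and splitting on `"\n"` would also keep any blank lines the model emits. A minimal sketch of a stricter parser follows; `parse_numbered_list` is a hypothetical helper, not part of LangChain.

```python
import re


def parse_numbered_list(text: str) -> list[str]:
    """Split LLM output into queries, dropping blank lines and list numbering."""
    queries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines between items
        # Remove a leading "1." or "2)" style marker if present
        queries.append(re.sub(r"^\d+[.)]\s*", "", line))
    return queries


raw_output = "1. What is task decomposition?\n\n2) How do agents split tasks?\n"
parsed_queries = parse_numbered_list(raw_output)
print(parsed_queries)
```

Such a function could take the place of the `lambda x: x.split("\n")` step at the end of the chain.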


In [6]:
retrieval_chain_multi = generate_queries_multi | retriever.map()

multi_retrievals = retrieval_chain_multi.invoke(query)
print(len(multi_retrievals))  # 5 questions
print(len(multi_retrievals[0]))  # 4 documents per question

5
4


In [7]:
from operator import itemgetter

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten the list of lists, keeping only each Document's text
    flattened_docs = [doc.page_content for sublist in documents for doc in sublist]
    # Drop duplicates while preserving first-seen order (a plain set() would shuffle the chunks)
    unique_docs = list(dict.fromkeys(flattened_docs))
    return unique_docs

rag_chain_multi = (
    {
        "context": retrieval_chain_multi | get_unique_union,
        "question": itemgetter("question")
    }
    | rag_template
    | llm
    | StrOutputParser()
)

rag_chain_multi.invoke({"question": query})

'Task decomposition for LLM (Large Language Model) agents involves breaking down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. This process typically involves decomposing the problem into multiple thought steps and generating multiple thoughts per step, creating a tree structure to explore possible solutions. The goal is to reduce the complexity of the task by breaking it down into smaller, more manageable parts, and then providing multiple options or paths to reach the desired outcome.'

### Strategy #2: RAG-Fusion

![Image](rsc/jupyter/rag-fusion.png)

**Idea**: An extension of Multi-Query: we not only generate multiple semantically different queries, but also combine (_fuse_) the retrieved results into a single list by _ranking and removing duplicates_ (here with reciprocal rank fusion) before passing them to the language model.

In [8]:
# RAG-Fusion: Related
template_fusion = """You are a helpful assistant that generates multiple search queries based on a single input query. Provide these queries as a numbered list. Don't include text before or after this list. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_fusion = PromptTemplate.from_template(template_fusion)

generate_queries_fusion = (
    prompt_fusion
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

fusion_queries = generate_queries_fusion.invoke(query)
print(len(fusion_queries))
print(fusion_queries)

4
['1. what is task decomposition in language model learning models', '2. task decomposition definition llm algorithmic approach', '3. how does task decomposition work for llm models', '4. what tasks can be decomposed by llm models']


In [9]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents
        and an optional parameter k used in the RRF formula """

    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_fusion = generate_queries_fusion | retriever.map() | reciprocal_rank_fusion

docs = retrieval_chain_fusion.invoke({"question": query})
len(docs)


8
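
To make the scoring concrete, here is a toy run of the same formula on plain string IDs instead of serialized documents. The function name `rrf_scores` and the rankings are invented for illustration. As in the cell above, ranks start at 0.

```python
def rrf_scores(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Sum 1 / (rank + k) over every ranked list a document appears in (rank starts at 0)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    return scores


toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],  # results for query 1
    ["doc_b", "doc_d"],           # results for query 2
    ["doc_b", "doc_a"],           # results for query 3
]
toy_scores = rrf_scores(toy_rankings)
fused_order = sorted(toy_scores, key=toy_scores.get, reverse=True)
print(fused_order)
```

`doc_b` appears in all three lists and ends up first, even though it is ranked second for query 1. Documents that show up for only one query sink to the bottom.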

In [10]:
rag_chain_fusion = (
    {
        "context": retrieval_chain_fusion,
        "question": itemgetter("question")
    }
    | rag_template
    | llm
    | StrOutputParser()
)

rag_chain_fusion.invoke({"question": query})

"Task decomposition for Large Language Model (LLM) agents involves breaking down complex tasks into smaller, manageable subgoals to enhance efficiency and improve model performance. This process typically requires a chain of thought or tree of thoughts to explore multiple possible solutions and generate multiple thoughts per step. The goal is to transform big tasks into multiple manageable tasks that can be solved more effectively using the LLM's capabilities."

### Strategy #3: Decomposition

![Image](rsc/jupyter/decomposition.png)

**Idea**: Break down a complex query into simpler, more specific sub-queries that each target a different facet of the information need.

_Note: This is especially helpful in multi-hop questions or those requiring reasoning over multiple pieces of information._

In [11]:
template_decomposition_generate = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Provide these sub-questions as a numbered list. Don't include text before or after this list. \n

Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition_generate = PromptTemplate.from_template(template_decomposition_generate)

generate_queries_decomposition = (
    prompt_decomposition_generate
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

decomposition_queries = generate_queries_decomposition.invoke(query_cot)
print(len(decomposition_queries))
print(decomposition_queries)

3
['1. What are the primary components required for a large language model (LLM) to function effectively as a self-driving car?', '2. How do different architectures contribute to the overall performance and flexibility of an LLM-based autonomous driving system?', '3. What role does pre-trained knowledge retrieval in LLMs play when it comes to integrating with external sensor data in a self-driving vehicle?']


In [12]:
template_decomposition_cot = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {qa_pairs} \n --- \n

Here is additional context relevant to the question:

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

prompt_decomposition_cot = PromptTemplate.from_template(template_decomposition_cot)

rag_chain_decomposition = (
    {
        # Retrieve once per question; .map() expects a list and would iterate over the string's characters
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
        "qa_pairs": itemgetter("qa_pairs")
    }
    | prompt_decomposition_cot
    | llm
    | StrOutputParser()
)

def format_qa_pair(question, answer):
    """Format a question/answer pair as a two-line string"""
    return f"Question: {question}\nAnswer: {answer}".strip()

qa_pairs = ""
for question in decomposition_queries:
    answer = rag_chain_decomposition.invoke({"question": question, "qa_pairs": qa_pairs})  # sub-problem answers
    qa_pair = format_qa_pair(question, answer)
    qa_pairs = qa_pairs + "\n---\n" + qa_pair

answer = rag_chain_decomposition.invoke({"question": query_cot, "qa_pairs": qa_pairs})  # final answer to original question

print(answer)

Based on the provided context, I'll break down the main components of an LLM (Large Language Model)-powered autonomous agent system:

1. **Language Model (LLM):** This is a crucial component that allows the agent to understand and process human language. In this case, it's mentioned as being fine-tuned for specific tasks like verbal math problems.

2. **Memory:** As discussed in the context of Long-Term Memory (LTM), LLMs are equipped with two types of memory:
   - Explicit / declarative memory: Stores facts and events that can be consciously recalled.
   - Implicit / procedural memory: Enables automatic skills and routines, such as riding a bike or typing on a keyboard.

3. **Maximum Inner Product Search (MIPS):** This is an algorithm used for fast maximum inner-product search, which is crucial for tasks like visual recognition in images. MIPS typically employs approximation nearest neighbors algorithms to find nearby items quickly, while sacrificing some accuracy.

4. **External APIs
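
The key mechanism in the loop is that each sub-question sees the question/answer pairs produced before it. The sketch below isolates that accumulation with a stub in place of the RAG chain. `answer_with_background` and the sub-questions are hypothetical and exist only to show how the background grows.

```python
def answer_with_background(question: str, background: str) -> str:
    """Hypothetical stand-in for the RAG chain: reports how many earlier pairs it saw."""
    seen = background.count("Question:")
    return f"answer to '{question}' using {seen} earlier pair(s)"


toy_sub_questions = ["What is planning?", "What is memory?", "What is tool use?"]

toy_qa_pairs = ""
for sub_question in toy_sub_questions:
    sub_answer = answer_with_background(sub_question, toy_qa_pairs)
    # Append the new pair so later sub-questions can build on it
    toy_qa_pairs += f"\n---\nQuestion: {sub_question}\nAnswer: {sub_answer}"

print(toy_qa_pairs)
```

The first sub-question is answered with no background and the third with two earlier pairs, which is why the order of the sub-questions can matter.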

### Strategy #4: Step-Back

![Image](rsc/jupyter/step-back.png)

**Idea**: Instead of directly answering a complex question with a single retrieval call, "step back" and ask: What do I need to know to answer this? Then use that simpler, more generic question alongside the original one to drive retrieval and reasoning.

In [13]:
prompt_step_back = PromptTemplate.from_template("""You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer.

Here are a few examples:
    1. input: Could the members of the police perform lawful arrests?
       output: What can the members of the police do?
    2. input: Jan Sindel’s was born in what country?
       output: What is Jan Sindel’s personal history?

Rules:
- Don't include the examples above in your output
- Only include the output.
    - Don't include text before or after.
    - Don't put quotes around the output.

Here is your input:
"{question}"

output:
""")

generate_queries_step_back = prompt_step_back | llm | StrOutputParser()

generate_queries_step_back.invoke({"question": query})

'What are task decomposition for Large Language Model (LLM) agents?'

In [14]:
# Response prompt from paper
rag_template_step_back = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer: """

rag_prompt_step_back = PromptTemplate.from_template(rag_template_step_back)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": itemgetter("question") | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": itemgetter("question"),
    }
    | rag_prompt_step_back
    | llm
    | StrOutputParser()
)

chain.invoke({"question": query})

'Task decomposition is a critical component of building autonomous agents using large language models (LLMs). Task decomposition refers to the process of breaking down complex tasks into smaller, manageable subtasks that can be executed by the agent.\n\nIn the context provided, task decomposition involves several key steps:\n\n1. **Subgoal and decomposition**: The agent breaks down large tasks into smaller, more specific subgoals.\n2. **Reflection and refinement**: The agent reflects on past actions, learns from mistakes, and refines them for future steps to improve the quality of final results.\n3. **Task decomposition with human inputs**: In some cases, task decomposition can be done using human-provided instructions or prompts.\n\nFinite context length limitations are a challenge in task decomposition, as they restrict the inclusion of historical information, detailed instructions, API call context, and responses. To mitigate this issue, mechanisms like self-reflection can learn fro

### Strategy #5: HyDE

![Image](rsc/jupyter/hyde.png)

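**Idea**: Instead of retrieving documents with the query itself, have the LLM write a hypothetical answer to the query and retrieve with that (HyDE stands for Hypothetical Document Embeddings). The hypothetical answer is phrased like a document, so it tends to sit closer to the relevant chunks than a short question does.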
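
A toy illustration of why this can help: the sketch below uses bag-of-words vectors as a crude stand-in for real embeddings (an assumption made for illustration only), and the hypothetical answer is hard-coded rather than generated by a model.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector, not a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chunk = "the agent breaks large tasks into smaller manageable subgoals"
toy_query = "what is task decomposition"
# Hypothetical answer an LLM might write; hard-coded here for illustration
hypothetical_answer = "task decomposition breaks large tasks into smaller subgoals"

query_similarity = cosine(toy_embed(toy_query), toy_embed(chunk))
hyde_similarity = cosine(toy_embed(hypothetical_answer), toy_embed(chunk))
print(query_similarity, hyde_similarity)
```

The short question shares no words with the chunk, while the answer-shaped text overlaps heavily. Real embeddings are far less brittle than word counts, but the same gap between question phrasing and document phrasing is what HyDE tries to close.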

In [15]:
# HyDE document generation
template_hyde = """Please write a scientific paper passage to answer the question
Question: {question}
Passage: """
prompt_hyde = PromptTemplate.from_template(template_hyde)

generate_hyde_docs = prompt_hyde | llm | StrOutputParser()

generate_hyde_docs.invoke({"question": query})

'Task Decomposition in Large Language Models (LLMs): A Framework for Multitasking and Autonomy\n\nTask decomposition, a fundamental concept in cognitive psychology and artificial intelligence, refers to the process of breaking down complex tasks into smaller, manageable subtasks. In the context of Large Language Models (LLMs), task decomposition has emerged as a critical component for achieving multitask learning and autonomy. By decomposing a given task or domain into constituent subtasks, LLMs can exploit their parallel processing capabilities, reducing computational overhead and improving overall performance.\n\nAt its core, task decomposition involves identifying the relevant subtasks that comprise the original task, typically represented through a hierarchical representation of the task space (i.e., a set of high-level concepts and fine-grained details). This identification is followed by a process of refinement, where each subtask is assigned to a specific LLM agent or component,

In [16]:
# Retrieve using HyDE document
retrieval_chain_hyde = generate_hyde_docs | retriever

retrieved_docs = retrieval_chain_hyde.invoke({"question": query})
print(retrieved_docs)

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi

In [17]:
rag_chain_hyde = (
    {
        "context": retrieval_chain_hyde,
        "question": itemgetter("question")
    }
    | rag_template
    | llm
    | StrOutputParser()
)

rag_chain_hyde.invoke({"question": query})

'Task decomposition for LLM (Large Language Model) agents involves breaking down large tasks into smaller, manageable subgoals to improve efficiency and quality of results. This process typically involves several steps such as planning, reflection, and refinement, using techniques like Chain of Thought (CoT), Tree of Thoughts (Yao et al.), and model-specific instructions.'