In [1]:
import os
import bs4
from langchain_community.document_loaders import WebBaseLoader
from dotenv import load_dotenv
load_dotenv()

USER_AGENT environment variable not set, consider setting it to identify your requests.


True

In [2]:
import warnings
warnings.filterwarnings(action="ignore") # for langsmith uuid7 warning

### Let's dive into Lilian Weng's great post on LLM-powered agents

1. Get the blog post.

In [3]:
loader = WebBaseLoader(
    web_path="https://lilianweng.github.io/posts/2023-06-23-agent/",
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header") # CSS classes of the relevant HTML elements
        )
    )
)

docs = loader.load()

In [4]:
docs[0].page_content

'\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nS

2. Chunk the documents.
The complete document is too long to put into the LLM context window, so we split it into smaller, overlapping chunks.

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

splits = text_splitter.split_documents(docs)
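
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is only an illustration: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to split on natural separators (paragraphs, lines, words) before falling back to raw characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

Each chunk repeats the last two characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.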

3. Convert the text to embeddings and store them in a vector db.

In [6]:
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="Qwen/Qwen3-Embedding-8B",
    task="feature-extraction",
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

vector_store = Chroma.from_documents(
    documents=splits,
    embedding=hf_embeddings,
)

4. Retrieval.

The vector store is our library and the retriever is our smart librarian.
It takes the user query, embeds it, and finds the most similar chunks in the vector store.

In [7]:
retriever = vector_store.as_retriever()
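
What the librarian does can be sketched with toy vectors. This is an illustration only: the three-dimensional vectors and the names `toy_store` and `top_k` are made up, and cosine similarity is an assumption here (the metric a real vector store uses depends on its configuration).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical "embeddings" of three chunks
toy_store = {
    "planning": [0.9, 0.1, 0.0],
    "memory": [0.1, 0.9, 0.0],
    "tools": [0.0, 0.2, 0.9],
}

print(top_k([0.8, 0.2, 0.1], toy_store, k=2))
```

The query vector points mostly in the same direction as the "planning" chunk, so that chunk comes back first.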

5. Let's test it.

In [8]:
query = "What is task decomposition?"

docs = retriever.invoke(input=query)

print(docs[0].page_content)

Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.
Self-Reflection#


In [9]:
len(docs)

4

6. Add context to the LLM to get an accurate answer.

In [62]:
from langsmith import Client

# pull rag prompt
client = Client()
prompt = client.pull_prompt("rlm/rag-prompt")
print(prompt.messages)

[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})]


In [11]:
# initialize llm to use
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

model = HuggingFaceEndpoint(
    model="openai/gpt-oss-20b",
    max_new_tokens=1024,
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

llm = ChatHuggingFace(
    llm=model
)

In [60]:
# chaining everything together

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
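
To see how data flows through the chain above, here is a plain-Python sketch of the same steps. Every `fake_*` function is a hypothetical stand-in (no model or vector store is called); the point is only that both entries of the first dict receive the same input, the question string.

```python
def fake_retriever(question: str) -> list[str]:
    """Stand-in for the retriever: returns canned chunks regardless of the question."""
    return ["chunk about planning", "chunk about memory"]


def join_chunks(chunks: list[str]) -> str:
    """Same idea as format_docs, but for plain strings."""
    return "\n\n".join(chunks)


def fake_prompt(inputs: dict) -> str:
    """Stand-in for the prompt template: fills in question and context."""
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm(prompt_text: str) -> str:
    """Stand-in for the model: just reports the prompt size."""
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def rag_pipeline(question: str) -> str:
    # Dict step: "context" runs retrieval on the question, "question" passes it through
    inputs = {"context": join_chunks(fake_retriever(question)), "question": question}
    # Remaining steps: each output feeds the next stage
    return fake_llm(fake_prompt(inputs))


print(rag_pipeline("What is task decomposition?"))
```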

In [13]:
response = rag_chain.invoke(query)
print(response)

Task decomposition is the process of breaking a complex task into smaller, manageable sub‑tasks or steps. It can be achieved by prompting LLMs to list steps, using task‑specific instructions, or incorporating human input, and sometimes by outsourcing planning to external tools like PDDL planners. This approach enables clearer planning, execution, and interpretation of how the overall goal is achieved.


## Multi-query

In [17]:
from langchain_core.prompts import ChatPromptTemplate

# Prompt for generating multiple queries
template = """You are an AI language model assistant. Your task is to generate five \
different versions of the given user question to retrieve relevant documents from a vector \
database. By generating multiple perspectives on the user question, your goal is to help \
the user overcome some of the limitations of the distance-based similarity search. \
Provide these alternative questions separated by newlines.

Original question: {question}"""

prompt_perspectives = ChatPromptTemplate.from_template(template)
prompt_perspectives

ChatPromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search. Provide these alternative questions separated by newlines.\n\nOriginal question: {question}'), additional_kwargs={})])

In [None]:
# create the chain
generate_queries = (
    prompt_perspectives
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)
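
Splitting on `"\n"` keeps whatever the model emits, including blank lines or list numbering, and each of those would be sent to the retriever as a query. A slightly more defensive parser could look like the sketch below; `parse_queries` is a hypothetical helper, not part of LangChain.

```python
import re


def parse_queries(raw: str) -> list[str]:
    """Split LLM output into queries, dropping blank lines and leading list markers."""
    queries = []
    for line in raw.splitlines():
        # Remove prefixes such as "1. ", "2) ", "- " or "* ", then surrounding whitespace
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


print(parse_queries("1. How?  \n\n2) Why?\n- What?\n"))
```

It could replace the `lambda` at the end of the chain if the model's output turns out to be noisy.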

In [19]:
question = "What is task decomposition for LLM agents?"
generated_queries_list = generate_queries.invoke({"question": question})

# Print the generated queries
for i, q in enumerate(generated_queries_list):
    print(f"{i+1}. {q}")

1. How do LLM agents carry out task decomposition?  
2. What methods do large language model agents use to break a task into subtasks?  
3. Can you explain the concept of task decomposition as applied to LLM agents?  
4. In what ways do LLM agents decompose complex tasks, and why is this important?  
5. What techniques are employed by LLM agents for effective task decomposition?


<span style="color:red; font-style:italic">A simple way to combine them is to take the unique set of all retrieved documents.</span>

In [21]:
from langchain_core.load import dumps, loads

def get_unique_union(documents: list[list]):
    """
    Simple function that returns the unique union of the retrieved docs
    """
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]

# build the chain
retrieval_chain = generate_queries | retriever.map() | get_unique_union

# invoke chain
docs = retrieval_chain.invoke({"question": question})
print(len(docs))

5
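
One detail worth knowing: going through `set()` removes duplicates but does not keep the retrieval order. If order matters, a sketch like the following preserves the first-seen order. Plain strings stand in for the serialized documents, and `unique_union_ordered` is a hypothetical name.

```python
def unique_union_ordered(result_lists: list[list[str]]) -> list[str]:
    """Flatten several result lists and drop duplicates, keeping first-seen order."""
    flattened = [doc for sublist in result_lists for doc in sublist]
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(flattened))


print(unique_union_ordered([["a", "b"], ["b", "c"], ["a", "d"]]))
```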


In [23]:
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\n\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach util

In [22]:
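from operator import itemgetter

rag_multi_query_chain = (
    {
        # Join the retrieved chunks into one string for the prompt
        "context": retrieval_chain | format_docs,
        # The chain input is a dict, so pass only the question string to the prompt
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)

rag_multi_query_chain.invoke({"question": question})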

'Task decomposition for LLM agents is the process by which a large language model splits a complex user request into a set of smaller, manageable sub‑goals or steps. This can be done through simple prompts (“Steps for XYZ”), task‑specific instructions, or by outsourcing planning to an external planner (e.g., LLM+P using PDDL). The resulting sub‑tasks are then executed sequentially or in parallel, allowing the agent to handle long‑horizon or multi‑step tasks efficiently.'

## RAG-Fusion

Multi-Query is a great start, but simply taking a union of documents treats them all equally. What if one document was ranked highly by three of our queries, while another was a low-ranked result from only one?

The first is clearly more important. RAG-Fusion improves on Multi-Query by not just fetching documents, but also re-ranking them using a technique called <span style="color:red; font-style:italic">Reciprocal Rank Fusion (RRF).</span>


RRF intelligently combines results from multiple searches. It boosts the score of documents that appear consistently high across different result lists, pushing the most relevant content to the top.

In [24]:
def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal Rank Fusion that intelligently combines multiple ranked lists """
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # The core of RRF: documents ranked higher (lower rank value) get a larger score
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort documents by their new fused scores in descending order
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results
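
A small worked example helps to see the scoring. The sketch below applies the same formula to plain string IDs instead of documents; `rrf_scores` and `toy_results` are made-up names, and ranks start at 0 as in the function above.

```python
def rrf_scores(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Sum 1 / (rank + k) for every appearance of an ID, then sort by total score."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):  # rank starts at 0
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Three hypothetical result lists, one per generated query
toy_results = [["A", "B", "C"], ["B", "A", "D"], ["B", "C"]]

for doc_id, score in rrf_scores(toy_results):
    print(f"{doc_id}: {score:.4f}")
```

`B` appears in all three lists and tops two of them, so it scores 1/61 + 1/60 + 1/60 ≈ 0.0497 and ends up first; `D` appears once at the bottom of one list and ends up last with 1/62 ≈ 0.0161.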

In [25]:

template = """You are a helpful assistant that generates multiple search queries based on a single input query.

Generate multiple search queries related to: {question}
Output (4 queries):
"""

prompt_rag_fusion = ChatPromptTemplate.from_template(template=template)

generate_queries = (
    prompt_rag_fusion
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})

print(f"total re-ranked documents retrieved: {len(docs)}")

total re-ranked documents retrieved: 6


In [26]:
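from operator import itemgetter

rag_fusion_chain = (
    {
        # Drop the RRF scores and join the re-ranked chunks into one string
        "context": retrieval_chain_rag_fusion
        | (lambda results: format_docs(doc for doc, _ in results)),
        # The chain input is a dict, so pass only the question string
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)

rag_fusion_chain.invoke({"question": question})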

'Task decomposition for LLM agents is the process of breaking a complex user goal into a series of smaller, manageable sub‑tasks or subgoals that the agent can plan and execute step by step.  \nIt can be done by prompting the LLM to list steps, giving task‑specific instructions, or asking for human input, and more advanced methods may involve translating the problem into PDDL for an external planner.  \nThis decomposition enables the agent to plan ahead, reason about dependencies, and perform each sub‑task with greater precision.'

## Decomposition

### Decompose a complex query into simpler sub-queries and answer them:

- one by one
- recursively

In [38]:
# Decomposition prompt
template = """You are a helpful assistant that generates multiple sub-questions related to an input question.
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. 

Generate three questions that simplify the original query given below and help in answering it: 
{question} 

Please keep your questions concise. 

Output (3 questions):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

decomposition_chain = (
    prompt_decomposition
    | llm 
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# Generate and print the sub-questions
question = "What are the main components of an LLM-powered autonomous agent system?"
sub_questions = decomposition_chain.invoke({"question": question})

for q in sub_questions:
    print(q)

1. What functions does the large language model (LLM) perform within an autonomous agent?  
2. Besides the LLM, which core modules (e.g., perception, planning, action, memory) are required for the agent to operate autonomously?  
3. How do these modules communicate and coordinate to produce the agent’s end‑to‑end behavior?


#### Answer each sub-question and use the resulting Q&A pairs as context to synthesize a final, comprehensive answer.

In [39]:
prompt_rag = client.pull_prompt("rlm/rag-prompt")

# Build the standard RAG chain once, outside the loop
sub_question_chain = prompt_rag | llm | StrOutputParser()

# A list to hold the answers to our sub-questions
rag_results = []
for sub_question in sub_questions:
    # Retrieve documents for each sub-question
    retrieved_docs = retriever.invoke(sub_question)

    # Answer the sub-question, passing the chunks as plain text instead of Document objects
    answer = sub_question_chain.invoke(
        {"context": format_docs(retrieved_docs), "question": sub_question}
    )
    rag_results.append(answer)
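
The loop above answers each sub-question on its own ("one by one"). The recursive variant mentioned at the start of this section is not implemented in this notebook; a minimal sketch of the idea, with a hypothetical `fake_answer` standing in for the retriever and LLM, could look like this. Each sub-question is answered with all earlier Q&A pairs passed along as background.

```python
from typing import Callable


def answer_recursively(
    sub_questions: list[str],
    answer_fn: Callable[[str, str], str],
) -> list[tuple[str, str]]:
    """Answer sub-questions in order, feeding earlier Q&A pairs into later ones."""
    qa_pairs: list[tuple[str, str]] = []
    for sub_question in sub_questions:
        # Everything answered so far becomes background for the next question
        background = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in qa_pairs)
        qa_pairs.append((sub_question, answer_fn(sub_question, background)))
    return qa_pairs


def fake_answer(question: str, background: str) -> str:
    """Stand-in for retrieval + LLM: reports how much background it received."""
    seen = background.count("Question:")
    return f"answer to '{question}' using {seen} earlier pair(s)"


for q, a in answer_recursively(["q1", "q2", "q3"], fake_answer):
    print(q, "->", a)
```

In a real chain, `answer_fn` would retrieve chunks for the sub-question and prompt the model with both the chunks and the background.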

In [40]:
print(len(rag_results))
print("---")
for q in rag_results:
    print(q)

3
---
The LLM serves as the agent’s brain, orchestrating the overall decision‑making process.  
It handles planning tasks such as breaking a goal into sub‑goals and refining actions through reflection and self‑criticism.  
Additionally, it generates natural‑language instructions to external tools and manages memory interactions to guide future behavior.
Besides the LLM, an autonomous agent needs modules that handle the outside world and internal bookkeeping.  
These include a perception stack to sense or ingest sensor data, a planning engine to break tasks into sub‑goals and refine them, an action layer that calls APIs or actuators to affect the environment, and a memory system (short‑term in‑context and long‑term vector store) to retain and retrieve past observations.  
Together these components let the LLM “brain” make decisions, learn from experience, and act in a dynamic setting.
The agent’s pipeline begins with a memory stream that records observations, which a retrieval model sur

In [41]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

# Format the Q&A pairs into a single context string
context = format_qa_pairs(sub_questions, rag_results)

# Final synthesis prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the original question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_response = final_rag_chain.invoke({"context": context, "question": question})

from pprint import pprint
pprint(final_response)

('**Main components of an LLM‑powered autonomous agent**\n'
 '\n'
 '| Component | Core role | Key interactions |\n'
 '|-----------|-----------|-------------------|\n'
 '| **LLM “brain”** | Central decision‑maker. Orchestrates planning, generates '
 'natural‑language instructions, and manages internal reasoning. | Receives '
 'prompts that include memory excerpts, perception data, and instructions; '
 'outputs actions, sub‑goals, reflections, and memory updates. |\n'
 '| **Perception stack** | Ingests raw sensory data (vision, audio, text, '
 'sensor streams) and converts it into structured observations. | Passes '
 'processed observations to the LLM and stores them in memory. |\n'
 '| **Planning engine** | Breaks high‑level goals into actionable sub‑goals, '
 'refines plans, and adjusts them on the fly. | Feeds the LLM with plans or '
 'sub‑goal lists; receives updated plans from the LLM after reflection. |\n'
 '| **Action layer** | Executes commands (API calls, actuator controls, UI '

## Step-Back Prompting

Sometimes, a user’s query is too specific, while our documents contain the more general, underlying information needed to answer it.

For example, a user might ask, “Could the members of The Police perform lawful arrests?”

A direct search for this might fail. The Step-Back technique uses an LLM to take a “step back” and form a more general question, like “What are the powers and duties of the band The Police?”

We then retrieve context for both the specific and general questions, providing a richer context for the final answer.
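
In plain Python, the flow described above can be sketched as follows. This is only an illustration: `step_back_contexts`, `toy_index`, and the two lambdas are hypothetical stand-ins for the LLM and the retriever.

```python
from typing import Callable


def step_back_contexts(
    question: str,
    make_step_back: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> dict:
    """Retrieve context for the original question and for its step-back version."""
    general_question = make_step_back(question)
    return {
        "normal_context": retrieve(question),
        "step_back_context": retrieve(general_question),
        "question": question,
    }


# Toy data so the sketch runs without a model or a vector store
specific = "Could the members of The Police perform lawful arrests?"
general = "What are the powers and duties of the band The Police?"
toy_index = {specific: [], general: ["(chunk with general background on The Police)"]}

contexts = step_back_contexts(
    specific,
    make_step_back=lambda q: general,
    retrieve=lambda q: toy_index.get(q, []),
)
print(contexts["normal_context"])
print(contexts["step_back_context"])
```

Here the specific question finds nothing, while the step-back question brings back the general chunk, so the final prompt still has something to work with.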

In [43]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Few-shot examples to teach the model how to generate step-back (more generic) questions
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
    {
        "input": "Jan Sindel's was born in what country?",
        "output": "What is Jan Sindel's personal history?",
    },
]

# Define how each example is formatted in the prompt
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),  # User input
    ("ai", "{output}")     # Model's response
])

# Wrap the few-shot examples into a reusable prompt template
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# Full prompt includes system instruction, few-shot examples, and the user question
prompt = ChatPromptTemplate.from_messages([
    ("system", 
     "You are an expert at world knowledge. Your task is to step back and paraphrase a question "
     "to a more generic step-back question, which is easier to answer. Here are a few examples:"),
    few_shot_prompt,
    ("user", "{question}"),
])

pprint(prompt.messages)

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:'), additional_kwargs={}),
 FewShotChatMessagePromptTemplate(examples=[{'input': 'Could the members of The Police perform lawful arrests?', 'output': 'What can the members of The Police do?'}, {'input': "Jan Sindel's was born in what country?", 'output': "What is Jan Sindel's personal history?"}], input_variables=[], input_types={}, partial_variables={}, example_prompt=ChatPromptTemplate(input_variables=['input', 'output'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={}), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=['outp

In [48]:
prompt.format_messages(question=question)

[SystemMessage(content='You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Could the members of The Police perform lawful arrests?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='What can the members of The Police do?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="Jan Sindel's was born in what country?", additional_kwargs={}, response_metadata={}),
 AIMessage(content="What is Jan Sindel's personal history?", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='What are the main components of an LLM-powered autonomous agent system?', additional_kwargs={}, response_metadata={})]

In [50]:
step_back_query_chain = (
    prompt
    | llm
    | StrOutputParser()
)

question = "What is task decomposition for LLM agents?"
step_back_question = step_back_query_chain.invoke({"question": question})
step_back_question

'How can a task be broken down into smaller, manageable parts?'

In [52]:
from langchain_core.runnables import RunnableLambda

# Prompt for the final response
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# Normal Context
{normal_context}

# Step-Back Context
{step_back_context}

# Original Question: {question}
# Answer:"""

response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

# The full chain
chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": step_back_query_chain | retriever,
        # Pass on the original question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

step_back_response = chain.invoke({"question": question})
step_back_response


'**Task decomposition for an LLM‑powered agent** is the procedure by which the model’s “brain” turns a **large, multi‑step problem** into a set of smaller, manageable subtasks (or “sub‑goals”) that can be planned, executed, monitored\u202fand refined one after another.  In other words, it is the *planning* step that turns the user’s request into an actionable sequence of actions for the agent.\n\n---\n\n### Why it matters\n\n1. **Complex tasks do not fit in a single LLM prompt** – most large‑language‑models struggle with very long‑horizon reasoning or with keeping many intermediate facts in mind.  \n2. **Enables modular execution** – once sub‑tasks are explicit, each can be passed to a specialized tool or to a lower‑level routine.  \n3. **Facilitates reflection & learning** – if the agent can see each individual step, it can evaluate successes/failures and correct mistakes in later steps.\n\n---\n\n### Core ingredient of an LLM agent architecture  \n\nLilian\u202fWeng’s overview of LLM

### Test chain for understanding

In [54]:
# Test chain: stop after the prompt to inspect exactly what the LLM receives

test_chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": step_back_query_chain | retriever,
        # Pass on the original question
        "question": lambda x: x["question"],
    }
    | response_prompt
)

response = test_chain.invoke({"question": question})
response

ChatPromptValue(messages=[HumanMessage(content='You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.\n\n# Normal Context\n[Document(metadata={\'source\': \'https://lilianweng.github.io/posts/2023-06-23-agent/\'}, page_content=\'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”,

In [56]:
pprint(response.messages)

[HumanMessage(content='You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.\n\n# Normal Context\n[Document(metadata={\'source\': \'https://lilianweng.github.io/posts/2023-06-23-agent/\'}, page_content=\'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a clas

## HyDE (Hypothetical Document Embeddings)

A core problem of retrieval is that a user’s query might use different words than the documents do (the “vocabulary mismatch” problem).

First, have an LLM generate a hypothetical answer to the question. This fake document, while not necessarily factually correct, will be semantically rich and use the kind of language we expect to find in a real answer.

We then embed this hypothetical document and use its embedding to perform the retrieval. The result is that we find real documents that are semantically very similar to an ideal answer.
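
The effect can be illustrated with a deliberately crude setup. In the sketch below, a bag-of-words count stands in for the embedding model (a loud assumption: real dense embeddings capture far more than word overlap), and the query, hypothetical answer, and two chunks are all made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(text: str, documents: list[str]) -> str:
    """Return the document most similar to the given text."""
    text_vec = embed(text)
    return max(documents, key=lambda doc: cosine(text_vec, embed(doc)))


relevant = "the agent breaks large jobs into smaller subgoals and steps"
unrelated = "memory stores past observations in a vector database"
documents = [unrelated, relevant]

query = "what is task decomposition"
hypothetical = "task decomposition breaks a large task into smaller subgoals and steps"

print(f"query vs relevant chunk:        {cosine(embed(query), embed(relevant)):.2f}")
print(f"hypothetical vs relevant chunk: {cosine(embed(hypothetical), embed(relevant)):.2f}")
print(best_match(hypothetical, documents))
```

The short query shares no words with the relevant chunk, so its similarity is zero, while the hypothetical answer overlaps heavily with it and retrieves it.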

In [58]:
# HyDE prompt

template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""

prompt_hyde = ChatPromptTemplate.from_template(template=template)

generate_docs_for_retrieval = (
    prompt_hyde
    | llm
    | StrOutputParser()
)

hypothetical_document = generate_docs_for_retrieval.invoke(
    {"question": question}
)

print(hypothetical_document)

**Task Decomposition for Large Language Model Agents**

Large language model (LLM) agents are increasingly being deployed to solve complex, multi‑step problems that require planning, reasoning, and interaction with external tools. A core capability that underlies their effectiveness is **task decomposition** – the systematic breakdown of a high‑level objective into a set of manageable sub‑tasks or atomic actions. In this section we formalize the notion, review algorithmic approaches, and discuss its practical implications for autonomous reasoning systems.

---

### 1. Formal Definition

Let \( \mathcal{T} \) denote a high‑level task specified by a natural‑language instruction \( \mathcal{I} \). Task decomposition is a function
\[
\Delta : \mathcal{T} \rightarrow \{ \tau_1, \tau_2, \dots, \tau_k \}
\]
where each \( \tau_i \) is a sub‑task characterized by:
- **Inputs** \( \mathbf{x}_i \) (environment state, intermediate results, or context),
- **Outputs** \( \mathbf{y}_i \) (desired sub

Now let's use its embedding to find the real documents for answering the question.

In [64]:
retrieval_chain = generate_docs_for_retrieval | retriever
retrieved_docs = retrieval_chain.invoke(
    {"question": question}
)

# use basic rag chain
prompt = client.pull_prompt("rlm/rag-prompt")
rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)
rag_chain.invoke(
    {"question": question, "context": format_docs(retrieved_docs)}
)

'Task decomposition for LLM agents is the process of breaking a large goal into smaller, manageable sub‑goals so the model can plan and execute step by step. It can be done by prompting the LLM directly (“What are the steps for XYZ?”), by giving task‑specific templates (“Write a story outline”), or by incorporating human guidance. Advanced approaches like LLM+P outsource the planning step to an external classical planner using PDDL, translating the problem into PDDL, running the planner, and converting the plan back into natural language.'

By using a hypothetical document as a lure, HyDE helped us zero in on the most relevant chunks in our knowledge base, demonstrating another powerful tool in our RAG toolkit.