#### LLM Compiler with LangGraph

In this notebook, we will be implementing the LLM compiler algorithm with langgraph. LLM Compiler is an agent architecture designed to speed up the eectuion of agentic tasks by eagerly-execting tasks within a DAG. It also saves costs on redundant token usage by reducing the number of calls to the LLM.

The system is composed fo 3 main components:
- Planner: stream a dag of tasks
- Task fetching unit: schedules and executes the task as soon as they are executed
- Joiner: Responds to the usrer or triggers a second plan (I have no idea what this is)

In [1]:
import os
from dotenv import load_dotenv


load_dotenv()

True

#### Defining our tools
The first stage is to define the tools, that would be used by our application. In this project, we will be making use of math tools.


In [2]:
import math
import re 
from typing import List, Optional

import numexpr 
from langchain.chains.openai_functions import create_structured_output_runnable
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import StructuredTool
from textwrap import dedent

In [3]:
_MATH_DESCRIPTION = dedent("""
 "math(problem: str, context: Optional[list[str]]) -> float:\n"
" - Solves the provided math problem.\n"
' - `problem` can be either a simple math problem (e.g. "1 + 3") or a word problem (e.g. "how many apples are there if there are 3 apples and 2 apples").\n'
" - You cannot calculate multiple expressions in one call. For instance, `math('1 + 3, 2 + 4')` does not work. "
"If you need to calculate multiple expressions, you need to call them separately like `math('1 + 3')` and then `math('2 + 4')`\n"
" - Minimize the number of `math` actions as much as possible. For instance, instead of calling "
'2. math("what is the 10% of $1") and then call 3. math("$1 + $2"), '
'you MUST call 2. math("what is the 110% of $1") instead, which will reduce the number of math actions.\n'
# Context specific rules below
" - You can optionally provide a list of strings as `context` to help the agent solve the problem. "
"If there are multiple contexts you need to answer the question, you can provide them as a list of strings.\n"
" - `math` action will not see the output of the previous actions unless you provide it as `context`. "
"You MUST provide the output of the previous actions as `context` if you need to do math on it.\n"
" - You MUST NEVER provide `search` type action's outputs as a variable in the `problem` argument. "
"This is because `search` returns a text blob that contains the information about the entity, not a number or value. "
"Therefore, when you need to provide an output of `search` action, you MUST provide it as a `context` argument to `math` action. "
'For example, 1. search("Barack Obama") and then 2. math("age of $1") is NEVER allowed. '
'Use 2. math("age of Barack Obama", context=["$1"]) instead.\n'
" - When you ask a question about `context`, specify the units. "
'For instance, "what is xx in height?" or "what is xx in millions?" instead of "what is xx?"\n'
""")


_SYSTEM_PROMPT = dedent("""
Translate a math problem into a expression that can be executed using Python's numexpr library. Use the output of running this code to answer the question.

Question: ${{Question with math problem.}}
```text
${{single line mathematical expression that solves the problem}}
```
...numexpr.evaluate(text)...
```output
${{Output of running the code}}
```
Answer: ${{Answer}}

Begin.

Question: What is 37593 * 67?
ExecuteCode({{code: "37593 * 67"}})
...numexpr.evaluate("37593 * 67")...
```output
2518731
```
Answer: 2518731

Question: 37593^(1/5)
ExecuteCode({{code: "37593**(1/5)"}})
...numexpr.evaluate("37593**(1/5)")...
```output
8.222831614237718
```
Answer: 8.222831614237718
""")


_ADDITIONAL_CONTEXT_PROMPT = """The following additional context is provided from other functions.\
    Use it to substitute into any ${{#}} variables or other words in the problem.\
    \n\n${context}\n\nNote that context varibles are not defined in code yet.\
You must extract the relevant numbers and directly put them in code."""


In [9]:
class ExecutedCode(BaseModel):
    """The input to the numexpr.evaluate() function"""
    
    reasoning: str = Field(
        description="The reasoning behind the code expression, including how context is included, if applicable"
    )
    code: str = Field(
        description="The simple code expression to execute by numexpr.evaluate()"
    )
    
def _evaluate_expression(expression: str) -> str:
    try:
        local_dict = {"pi": math.pi, "e": math.e}
        output = str(
            numexpr.evaluate(
                expression.strip(),
                global_dict={},
                local_dict=local_dict
            )
        )
    except Exception as e:
        raise ValueError("Failed to evaluate '{expression}'. Raised error; {repr(e)}. \n Please try again with a valid numerical expression")
    return re.sub(r"^\[|\]$", "", output)

def get_math_tool(llm: ChatOpenAI):
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", _SYSTEM_PROMPT),
            ("user", "{problem}"),
            MessagesPlaceholder(variable_name="context", optional=True)
        ]
    )
    extractor = create_structured_output_runnable(ExecutedCode, llm, prompt)
    
    def calculate_expression(problem: str, context: Optional[List[str]] = None, config: Optional[RunnableConfig] = None):
        chain_input = {"problem": problem}
        if context:
            context_str = "\n".join(context)
            if context_str.strip():
                context_str = _ADDITIONAL_CONTEXT_PROMPT.format(
                    context=context_str.strip()
                )
                chain_input["context"] = [SystemMessage(content=context_str)]
        code_model = extractor.invoke(chain_input, config)
        try:
            return _evaluate_expression(code_model.code)
        except Exception as e:
            return repr(e)
        
    return StructuredTool.from_function(
        name="math",
        func=calculate_expression,
        description=_MATH_DESCRIPTION
    )

#### Defining our Chat model
In this section, we will be defining the chat model that would be used to develop our LLM Compiler applicaton. We will be making use of Mixtral 8x7b instruct model

In [5]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai.chat_models import ChatOpenAI


calculate = get_math_tool(ChatOpenAI(model="mistralai/Mixtral-8x7B-Instruct-v0.1"))
search = TavilySearchResults(
    max_results=1,
    description="tavily_search_results_json(query='the search query') - a search engine."
)

tools = [calculate, search]

In [8]:
calculate.invoke({
    "problem": "What's the temperature of sf + 5?",
    "context": ["The temperature of sf is 32 degrees"]
})

OutputParserException: Could not parse function call: 'tool_calls'

#### Planner
Largely adapted from the original source code, the planner accepts the input question and generates a task list to execute. 
It is provided with a previous plan, it is instructed to re-plan, which is useful if, upon completion of the first batch of tasks, the agent must take more actions. 
The planner uses an output parser that takes in a list of taks and breaks then down, extracting each task and its dependencies

In [10]:
import ast
import re
from typing import (
    Any, 
    Dict,
    Iterator, 
    List, 
    Optional, 
    Sequence, 
    Tuple,
    Union
)
from langchain_core.exceptions import OutputParserException
from langchain_core.messages import BaseMessage
from langchain_core.output_parsers.transform import BaseTransformOutputParser
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import BaseTool
from typing_extensions import TypedDict

In [11]:
THOUGHT_PATTERN = r"Thought: ([^\n]*)"
ACTION_PATTERN = r"\n*(\d+)\. (\w+)\((.*)\)(\s*#\w+\n)?"
# $1 or ${1} -> 1
ID_PATTERN = r"\$\{?(\d+)\}?"
END_OF_PLAN = "<END_OF_PLAN>"



In [13]:
# define helper functions
def _ast_parse(arg: str) -> Any:
    try:
        return ast.literal_eval(arg)
    except:
        return arg
    
    
def _parse_llm_compiler_action_args(args: str, tool:Union[str, BaseTool]) ->list[Any]:
    """
        Parse arguments form a string
    """
    if args == "":
        return ()
    if isinstance(tool, str):
        return ()
    
    extracted_args = {}
    tool_key = None
    prev_idx = None
    
    for key in tool.args.keys():
        if f"{key}=" in args:
            idx = args.index(f"{key}=")
            if prev_idx is not None:
                extracted_args[tool_key] = _ast_parse(
                    args[prev_idx:idx].strip().rstrip(",")
                )
            args = args.split(f"{key}=", 1)[1]
            tool_key = key
            prev_idx = 0
            
    if prev_idx is not None:
        extracted_args[tool_key] = _ast_parse((
            args[prev_idx:].strip().rstrip(",").rstrip(")")
        ))
    return extracted_args


def default_dependency_rule(idx, args: str):
    matches = re.findall(ID_PATTERN, args)
    numbers = [int(match) for match in matches]
    return idx in numbers

def _get_dependencies_from_graph(
    idx: int, tool_name: str, args: Dict[str, Any]
) -> dict[str, list[str]]:
    """Get dependencies from a grap"""
    if tool_name == "join":
        return list(range(1, idx))
    
    return [i for i in range(1, idx) if default_dependency_rule(i, str(args))]

class Task(TypedDict):
    idx: int
    tool: BaseTool
    args: list
    dependencies: Dict[str, list]
    thought: Optional[str]
    
    
def instantiate_task(
    tools: Sequence[BaseTool],
    idx: int, 
    tool_name: str,
    args: Union[str, Any],
    thought: Optional[str] = None 
) -> Task:
    if tool_name == "join":
        tool = "join"
    else:
        try:
            tool = tools[[tool.name for tool in tools].index(tool_name)]
        except Exception as e:
            raise OutputParserException(f"Tool {tool_name} not found") from e
    tool_args = _parse_llm_compiler_action_args(args, tool)
    dependencies = _get_dependencies_from_graph(idx, tool_name, tool_args)
    
    return Task(
        idx=idx,
        tool=tool,
        args=tool_args,
        dependencies=dependencies,
        thought=thought
    )
    
    
class LLMCompilerPlanParser(BaseTransformOutputParser[dict], extra="allow"):
    """Planning output parser"""
    tools: List[BaseTool]
    
    def _transform(self, input: Iterator[Union[str, BaseMessage]]) -> Iterator[Task]:
        texts = []
        thought = None
        for chunk in input:
            text = chunk if isinstance(chunk, str) else str(chunk.content)
            for task, thought in self.ingest_token(text, texts, thought):
                yield task
        if texts:
            task, _ = self._parse_task("".join(texts), thought)
            if task:
                yield task
                
    def parse(self, text: str) -> List[Task]:
        return list(self._transform([text]))
    
    def stream(
        self, input: str| BaseMessage, config: RunnableConfig | None = None, **kwargs: Any |None) -> Iterator[Task]:
        yield from self.transform([input], config, **kwargs)
        
    def ingest_token(self, token:str, buffer: List[str], thought: Optional[str]=None):
        buffer.append(token)
        if "\n" in token:
            buffer_ =  "".join(buffer).split("\n")
            suffix =buffer_[-1]
            for line in buffer_[:-1]:
                task, thought = self._parse_task(line,thought)
                if task:
                    yield task, thought
                    
            buffer.clear()
            buffer.append(suffix)
            
    
    def _parse_task(self, line: str, thought: Optional[str] = None):
        task = None
        if match := re.match(THOUGHT_PATTERN, line):
            thought = match.group(1)
        elif match := re.match(ACTION_PATTERN, line):
            idx, tool_name, args, _ = match.groups()
            idx = int(idx)
            
            task = instantiate_task(
                tools=self.tools,
                idx=idx,
                tool_name=tool_name,
                args=args,
                thought=thought
            )
            thought = None
        return task, thought

In [15]:
from typing import Sequence
from langchain_core.language_models import BaseChatModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch
from langchain_core.tools import BaseTool

from langchain_core.messages import (
    BaseMessage, 
    FunctionMessage,
    HumanMessage,
    SystemMessage
)

from langchain import hub
from langchain_openai import ChatOpenAI

In [16]:
prompt = hub.pull("wfh/llm-compiler")
print(prompt.pretty_print())


Given a user query, create a plan to solve it with the utmost parallelizability. Each plan should comprise an action from the following [33;1m[1;3m{num_tools}[0m types:
[33;1m[1;3m{tool_descriptions}[0m
[33;1m[1;3m{num_tools}[0m. join(): Collects and combines results from prior actions.

 - An LLM agent is called upon invoking join() to either finalize the user query or wait until the plans are executed.
 - join should always be the last action in the plan, and will be called in two scenarios:
   (a) if the answer can be determined by gathering the outputs from tasks to generate the final response.
   (b) if the answer cannot be determined in the planning phase before you execute the plans. Guidelines:
 - Each action described above contains input/output types and description.
    - You must strictly adhere to the input and output types for each action.
    - The action descriptions contain the guidelines. You MUST strictly follow those guidelines when you use the actions.
 -

In [18]:
def create_planner(llm: BaseChatModel, tools: Sequence[BaseTool], base_prompt: ChatPromptTemplate):
    tool_descriptions = "\n".join(
        f"{i}.{tool.description}\n" for i, tool in enumerate(tools)
    )
    planner_prompt = base_prompt.partial(
        replan="",
        num_tools=len(tools),
        tool_descriptions=tool_descriptions
    )
    
    replanner_prompt =  base_prompt.partial(
        replan=dedent(' - You are given "Previous Plan" which is the plan that the previous agent created along with the execution results '
        "(given as Observation) of each plan and a general thought (given as Thought) about the executed results."
        'You MUST use these information to create the next plan under "Current Plan".\n'
        ' - When starting the Current Plan, you should start with "Thought" that outlines the strategy for the next plan.\n'
        " - In the Current Plan, you should NEVER repeat the actions that are already executed in the Previous Plan.\n"
        " - You must continue the task index from the end of the previous one. Do not repeat task indices."),
        num_tools=len(tools),
        tool_descriptions=tool_descriptions
    )
    
    def should_replan(state: list):
        return isinstance(state[-1], SystemMessage)
    
    def wrap_messages(state: list):
        return {"messages": state}
    
    def wrap_and_get_last_index(state: list):
        next_task = 0
        for message in state[::-1]:
            if isinstance(message, FunctionMessage):
                next_task = message.additional_kwargs["idx"] + 1
                break
        state[-1].content = state[-1].content + f" - Begin couting at : {next_task}"
        return {"messages": state}
    
    return (
        RunnableBranch((should_replan, wrap_and_get_last_index | replanner_prompt), wrap_messages | planner_prompt,)  | llm | LLMCompilerPlanParser(tools=tools)
    )

In [29]:
llm = ChatOpenAI(model="mistralai/Mixtral-8x7B-Instruct-v0.1", temperature=0.1)
planner = create_planner(llm, tools, prompt)

In [30]:
example_question = "What's the temperature in SF raised to the 3rd power?"

for task in planner.stream([HumanMessage(content=example_question)]):
    print(task["tool"], "The arguments are: ",task["args"])
    print("---")

name='math' description='math(problem: str, context: Optional[List[str]] = None, config: Optional[langchain_core.runnables.config.RunnableConfig] = None) - "math(problem: str, context: Optional[list[str]]) -> float:\n"\n" - Solves the provided math problem.\n"\n\' - `problem` can be either a simple math problem (e.g. "1 + 3") or a word problem (e.g. "how many apples are there if there are 3 apples and 2 apples").\n\'\n" - You cannot calculate multiple expressions in one call. For instance, `math(\'1 + 3, 2 + 4\')` does not work. "\n"If you need to calculate multiple expressions, you need to call them separately like `math(\'1 + 3\')` and then `math(\'2 + 4\')`\n"\n" - Minimize the number of `math` actions as much as possible. For instance, instead of calling "\n\'2. math("what is the 10% of $1") and then call 3. math("$1 + $2"), \'\n\'you MUST call 2. math("what is the 110% of $1") instead, which will reduce the number of math actions.\n\'\n# Context specific rules below\n" - You can o

##### Task Fetchin unit
This component shedules the tasks. It receives a stream of tools fo the following format
{
    tool: BaseTool,
    dependencies: number[]
}

The basic idea is to begin executing tools as soon as their dependencies ae met. This is done through multi-threading. We wil combine the task fetching unit and exectutor below


In [31]:
from typing import Any, Union, Iterable, List, Tuple, Dict
from typing_extensions import TypedDict

from langchain_core.runnables import (
    chain as as_runnable
)

from concurrent.futures import ThreadPoolExecutor, wait
import time


def _get_obsservations(messages: List[BaseMessage]) -> Dict[int, Any]:
    results = {}
    for message in messages[::-1]:
        if isinstance(message, FunctionMessage):
            results[int(message.additional_kwargs["idx"])] = message.content
    return results


class SchedulerInput(TypedDict):
    messages: List[BaseMessage]
    tasks: Iterable[Task]
    
    
def _execute_task(task, observations, config):
    tool_to_use = task["tool"]
    if isinstance(tool_to_use, str):
        return tool_to_use
    args = task["args"]
    
    try:
        if isinstance(args, str):
            resolved_args = _resolve_arg(args, observations)
        elif isinstance(args, dict):
            resolved_args = {
                key: _resolve_arg(val, observations) for key, val in args.items()
            }
        else:
            resolved_arg = args
            
    except Exception as e:
        return (
            f"ERROR(Failed to call {tool_to_use.name} with args {args})"
            f"Args cound not be resolved. Error: {repr(e)}"
        )
    try:
        return tool_to_use.invoke(resolved_arg, config)
    except Exception as e:
        return (
            f"ERROR(Failed to call {tool_to_use.name} with args {args})."
            + f" Args could not be resolved. Error: {repr(e)}"
        )

def _resolve_arg(arg: Union[str, Any], observations: Dict[int, Any]):
    if isinstance(arg, str) and arg.startswith("$"):
        try:
            stripped = arg[1:].replace(".output", "").strip("{}")
            idx = int(stripped)
        except Exception:
            return str(arg)
        return str(observations[idx])
    
    elif isinstance(arg,list):
        return [_resolve_arg(a, observations) for a in arg]
    else:
        return str(arg)
    
    
@as_runnable
def schedule_task(task_inputs, config):
    task: Task = task_inputs["task"]
    observations: Dict[int, Any] = task_inputs["observations"]
    
    try:
        observation = _execute_task(task, observations, config)
    except Exception:
        import traceback
        
        observation = traceback.format_exception()
    observations[task["idx"]] = observation
    
    
def schedule_pending_task(
    task: Task, observations: Dict[int, Any], retry_after: float = 0.2
):
    while True:
        deps = task["dependencies"]
        if deps and (any([dep not in observations for dep in deps])):
            time.sleep(retry_after)
            continue
        
        schedule_task.invoke({"task": task, "observations": observations})
        break
    
@as_runnable
def schedule_tasks(scheduler_input: SchedulerInput) -> List[FunctionMessage]:
    """Group the tasks into a DAG schedule"""
    
    tasks = scheduler_input["tasks"]
    messages = scheduler_input["messages"]
    observations = _get_obsservations(messages)
    task_names = {}
    originals = set(observations)
    
    futures = []
    retry_after = 0.25
    with ThreadPoolExecutor() as executor:
        for task in tasks:
            deps = task["dependencies"]
            task_names[task["idx"]] = (
                task["tool"] if isinstance(task["tool"], str) else task["tool"].name
            )
            if (
                deps and (any([dep not in observations for dep in deps]))
            ):
                futures.append(
                    executor.submit(
                        schedule_pending_task, task, observations, retry_after
                    )
                )
            else:
                schedule_task.invoke(dict(task=task, observations=observations))
                
        wait(futures)
    new_observations = {
        k: (task_names[k], observations[k]) for k in sorted(observations.keys() - originals)
    }
    tool_messages = [
        FunctionMessage(name=name, content=str(obs), additional_kwargs={"idx":k }) for k, (name, obs) in new_observations.items()
        
    ]

    return tool_messages
    


In [32]:
import itertools

@as_runnable
def plan_and_shedule(messages: List[BaseMessage], config):
    tasks = planner.stream(messages, config)
    tasks = itertools.chain([next(tasks)], tasks)
    scheduled_tasks = schedule_tasks.invoke(
        {
            "messages": messages,
            "tasks": tasks,
        },
        config,
    )
    return scheduled_tasks

##### Joiner
So now we have the planning and initial execution done. We need a component to process these outptus and either:
1. Respond with the correct answer
2. Loop with a new plan

In [35]:
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.chains.openai_functions import create_structured_output_runnable
from langchain_core.messages import AIMessage

class FinalResponse(BaseModel):
    """ The final response/answer. """
    response: str
    
    
class Replan(BaseModel):
    feedback: str = Field(
        description="Analysis of the previous attempts and recommendations on what needs to be fixed"
    )
    
    
class Replan(BaseModel):
    feedback: str = Field(
        description="Analysis of the previous attempts and recommendations on what needs to be fixed"
    )
    
class JoinOutputs(BaseModel):
    """ Decide whether to replan or whether you can return the final response"""
    
    thought: str = Field(
        description="The cahin of thought reasoning for the selected action"
    )
    action: Union[FinalResponse, Replan]
    
joiner_prompt = hub.pull("wfh/llm-compiler-joiner").partial(examples="")

llm = ChatOpenAI(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
runnable = create_structured_output_runnable(JoinOutputs, llm, joiner_prompt)

In [36]:
def _parse_joiner_output(decision: JoinOutputs) -> List[BaseMessage]:
    response = [AIMessage(content=f"Thought: {decision.thought}")]
    if isinstance(decision.action, Replan):
        return response + [
            SystemMessage(content=f"Context from last attempt: {decision.action.feedback}")
        ]
    else:
        return response + [AIMessage(content=decision.action.response)]
    
def select_recent_messages(messages: list) -> dict:
    selected = []
    for msg in messages[::-1]:
        selected.append(msg)
        if isinstance(msg, HumanMessage):
            break
    return {"messages": selected[::-1]}


joiner = select_recent_messages | runnable | _parse_joiner_output

##### Compose using LangGraph

Now we have defined all individual components of the LLM compiler, we can now stitch everything together into a langgraph to orchestrate the LLM Compiler execution. We'll define the agent as a stateful grpah, with the main nodes being:
1. Plan and executre (the DAG from the first step above)
2. Join: determine if we should finish or replan
3. Recontextualize: update the graph state based on the output from the joiner

In [37]:
from langgraph.graph import MessageGraph, END
from typing import Dict

graph_builder = MessageGraph()

graph_builder.add_node("plan_and_shedule", plan_and_shedule)
graph_builder.add_node("join", joiner)

graph_builder.add_edge("plan_and_shedule", "join")


def should_contiue(state: List[BaseMessage]):
    if isinstance(state[-1], AIMessage):
        return END
    return "plan_and_shedule"

graph_builder.add_conditional_edges(
    start_key="join",
    condition=should_contiue
)

graph_builder.set_entry_point("plan_and_shedule")
chain = graph_builder.compile()

In [None]:
for step in chain.stream([HumanMessage(content="What's the GDP of New York?")]):
    print(step)
    print("---")

Multi-hop question
This question requires that the agent perform multiple searches

In [38]:
steps = chain.stream(
    [
        HumanMessage(
            content="What's the oldest parrot alive, and how much longer is that than the average?"
        )
    ],
    {
        "recursion_limit": 100
    }
)

for step in steps:
    print(step)
    print("---")

{'plan_and_shedule': [FunctionMessage(content='ERROR(Failed to call tavily_search_results_json with args {\'query\': \'oldest parrot alive\'}). Args could not be resolved. Error: UnboundLocalError("cannot access local variable \'resolved_arg\' where it is not associated with a value")', additional_kwargs={'idx': 1}, name='tavily_search_results_json'), FunctionMessage(content='ERROR(Failed to call math with args {\'problem\': \'age of the parrot from search results\', \'context\': [0]}). Args could not be resolved. Error: UnboundLocalError("cannot access local variable \'resolved_arg\' where it is not associated with a value")', additional_kwargs={'idx': 2}, name='math'), FunctionMessage(content='ERROR(Failed to call tavily_search_results_json with args {\'query\': \'average lifespan of parrots\'}). Args could not be resolved. Error: UnboundLocalError("cannot access local variable \'resolved_arg\' where it is not associated with a value")', additional_kwargs={'idx': 3}, name='tavily_sea

OutputParserException: Could not parse function call: 'tool_calls'

Multistep math, this quesiton requires that the agent perform multiple math operations

In [None]:
for step in chain.stream():
    [
        HumanMessage(
            content="What's ((3*(4+5)/0.5) + 3245) + 8? What's 32/4.23? What's the sum of the two values?"
        )
    ]
    print(step)
    print("---")

##### Conclusion
Here are some limitations of the implementation above:
- The planner output parsing format is fragile if your function requires more than 1 or 2 arguments. We could make it more robust by streaming tool callng. 
- Variable substition is fragile in the example above. It coudl be made more robust by using fine-tuned model and a more robust syntax, e.g lark syntax
- The state can grow quite long if you require mulitple re-planning runs. To handle, you could add a message compressor once you go above a certain token limit.