# LangChain Agents and LangGraph

LLMs become really interesting when we use them as autonomous agents, giving them tools and letting them reason through a problem step by step.

In this tutorial, I will focus on a relatively simple agent: ReAct.

### What is ReAct?
ReAct was first introduced in [this paper](https://arxiv.org/abs/2210.03629). It is a simple autonomous agent that answers questions using pre-built function calls.

Most agents follow a thought, action, observation pattern: the agent comes up with a thought, decides on an action to execute, and then receives an observation once the action has run.

The ReAct setup is as follows:
- Prompt the agent to come up with an action that it can take
- Take the action on behalf of the agent, and append the output of the action to the original prompt
- Repeat this process until the agent generates an answer.
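
The steps above can be sketched in plain Python. This is only a toy illustration, not LangChain code: `fake_llm`, `TOOLS`, and `react_loop` are hypothetical names I made up, and the "LLM" is a hard-coded function so that the loop runs without a model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an LLM: maps the prompt so far to (action, argument).
def fake_llm(prompt: str) -> Tuple[str, str]:
    if "Observation:" not in prompt:
        return "eval", "2 + 3"  # first turn: ask for a tool call
    # later turns: answer with the most recent observation
    return "answer", prompt.rsplit("Observation: ", 1)[-1].strip()

# Hypothetical tool registry: action name -> function taking a string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "eval": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}

def react_loop(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        action, arg = llm(prompt)           # 1. the agent proposes an action
        if action == "answer":              # 3. stop once it produces an answer
            return arg
        observation = TOOLS[action](arg)    # 2. run the action on the agent's behalf
        prompt += f"Action: {action}[{arg}]\nObservation: {observation}\n"
    return "no answer within max_steps"

print(react_loop("what is 2 + 3?"))
```

The important part is that the prompt keeps growing: every action and its observation are appended, so the next call to the model sees the full history.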

### Where can I go from here?
There are many extensions to ReAct:
- [LLMCompiler](https://arxiv.org/abs/2312.04511) is a system for generating parallel function calls, which yields speedups over ReAct prompting, with better accuracy on some benchmarks!
- [Voyager](https://arxiv.org/abs/2305.16291) is an LLM lifelong learning agent that is capable of discovering new items in Minecraft!
- [WebVoyager](https://arxiv.org/abs/2401.13919) is an autonomous agent that is able to browse the web based on a user's request! Methods-wise, this is closer to ReAct than Voyager or LLMCompiler are. It just uses a vision model and some JS functions on top of the basic ReAct idea.

# Creating a ReAct agent using pre-made functions

`create_react_agent` is set up to let us do exactly this. It works pretty poorly with LLaMa3.2, because the model has trouble producing output in the exact format the prompt asks for. For this reason, I don't go in-depth into this agent.

[create_react_agent docs](https://api.python.langchain.com/en/latest/agents/langchain.agents.react.agent.create_react_agent.html)

In [33]:
from langchain import hub
from langchain.agents import load_tools
from langchain.agents import AgentExecutor, create_react_agent
from langchain_ollama import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun

In [34]:
llm = ChatOllama(model='llama3.2')

In [35]:
prompt = hub.pull("hwchase17/react")
print(prompt.template)



Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}


In [42]:
tools = load_tools(['llm-math'], llm=llm)
agent = create_react_agent(llm, 
                           tools=tools, 
                           prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [43]:
# Kinda hard to tell what's going on here, but yeah. The executor should take care of parsing the output, etc.
print(agent.get_graph().draw_ascii())

  +---------------------------------+  
  | Parallel<agent_scratchpad>Input |  
  +---------------------------------+  
             **         **             
           **             **           
          *                 *          
   +--------+          +-------------+ 
   | Lambda |          | Passthrough | 
   +--------+          +-------------+ 
             **         **             
               **     **               
                 *   *                 
 +----------------------------------+  
 | Parallel<agent_scratchpad>Output |  
 +----------------------------------+  
                   *                   
                   *                   
                   *                   
          +----------------+           
          | PromptTemplate |           
          +----------------+           
                   *                   
                   *                   
                   *                   
            +------------+             


In [47]:
# this works, but the LLM is pretty bad at following instructions so this often errors.
# agent_executor.invoke({"input": "what is 365842068 + 3409568092?"})

We should probably figure out how AgentExecutor works, and what format the chain needs to be in to work with it. For now, let's move on to LangGraph so we can make custom agents.
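
To get a feel for why a strict output format is hard on a small model, here is a rough sketch of a parser that only accepts the `Action:` / `Action Input:` layout from the prompt above. This is my own illustration, not LangChain's actual parser: `STRICT_PATTERN` and `strict_parse` are hypothetical names, and the real implementation may behave differently.

```python
import re
from typing import Optional, Tuple

# Assumed pattern for illustration only: "Action: <tool>" followed by "Action Input: <input>".
STRICT_PATTERN = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)

def strict_parse(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool, tool_input), or None when the layout is not followed."""
    if "Final Answer:" in text:
        return ("final_answer", text.split("Final Answer:", 1)[1].strip())
    match = STRICT_PATTERN.search(text)
    if match is None:
        return None
    return (match.group(1).strip(), match.group(2).strip())

well_formed = "Thought: I should add these.\nAction: Calculator\nAction Input: 365842068 + 3409568092"
sloppy = "I will use the Calculator on 365842068 + 3409568092."

print(strict_parse(well_formed))
print(strict_parse(sloppy))
```

The second output is perfectly understandable to a human, but a parser like this has nothing to hold on to, so the run fails.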

# Remaking the agent in LangGraph

I will remake the ReAct agent, adding fixes to help it work with a smaller model like LLaMa3.2.

**Some Background**
LangGraph is a library for building stateful, multi-actor applications with LLMs. It enables us to create things such as the ReAct prompting system, which loops back on itself many times.

Let's recreate a ReAct agent that can search the web and do math, using custom tools so that it works more smoothly with a smaller language model like LLaMa3.2. We will make these modifications:
- We will update a scratchpad containing the agent's previous observations.
- We will be more lenient when parsing the agent's output.

Resources:
- [MessagesPlaceholder](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.MessagesPlaceholder.html)
- Most of this implementation is based on the [WebVoyager implementation by LangChain](https://www.youtube.com/watch?v=ylrew7qb8sQ)

In [98]:
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_ollama import ChatOllama
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.prompts.prompt import PromptTemplate
from langchain.prompts.chat import ChatPromptTemplate, MessagesPlaceholder, HumanMessagePromptTemplate
from langchain_core.messages.system import SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import END, START, StateGraph

# state
from typing import List, Optional
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage

# misc
import re
import json

## Making State Types

These types define the state that the graph will be using.

The prediction contains the agent's output, detailing the next action the agent should take. After we act on this prediction, we put the tool's response into the 'observation' key.

In [142]:
# For searching the web/doing calculations this should be good.
class Prediction(TypedDict):
    action: str
    args: Optional[str]

class AgentState(TypedDict):
    str_output: str # raw LLM output, only populated when the chain is built with debug=True
    input: str # user request
    prediction: Prediction # agent's output
    scratchpad: List[BaseMessage]
    observation: str # most recent response from a tool
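
Here is a plain-Python sketch of how I picture a state like this evolving. My understanding is that when a node returns only some keys, LangGraph overwrites those keys and keeps the rest (as long as no custom reducer is declared). The code below only mimics that behavior with dictionaries; `run_node` and `fake_tool` are hypothetical helpers, not LangGraph functions.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

def run_node(state: State, node: Callable[[State], State]) -> State:
    """Apply a node and overlay its partial update on the previous state."""
    update = node(state)
    return {**state, **update}  # keys in `update` win, everything else is kept

# Hypothetical tool node: it only reports an observation.
def fake_tool(state: State) -> State:
    return {"observation": f"looked up: {state['prediction']['args']}"}

state = {
    "input": "what is kanye west's net worth?",
    "prediction": {"action": "search", "args": "kanye net worth"},
    "scratchpad": [],
}
new_state = run_node(state, fake_tool)
print(new_state["observation"])
print(new_state["input"])
```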

## Defining Tools

These tools will be used by the LLM. I made some modifications to the tools, to work better with LLaMa3.2:
- The search tool has an LLM summarization step built in to shorten the search results. My reasoning is that distilling the results helps the model focus on the relevant parts, and keeps one search from drowning out other observations.
- The eval tool cleans up the agent's input a bit before evaluating it.

In [134]:
ddg_search = DuckDuckGoSearchRun()
llm = ChatOllama(model="llama3.2", max_tokens=4096)
search_prompt_str = """
Please summarize these search results in the context of the following question. 
Be AS CONCISE AS POSSIBLE, and only include information that is relevant to answering the question.

QUESTION: {question}

RESULTS: {search_results}
"""
search_prompt = PromptTemplate.from_template(search_prompt_str)
summarization_chain = search_prompt | llm | StrOutputParser()

# modified search with summarization chain
def search(state: AgentState):
    args = state['prediction']['args']
    question = state['input']
    if args is None:
        return "No arguments found for search tool. Please provide a search query!"
    search_results = ddg_search.invoke(args)
    # Summarize the raw results with respect to the user's original question
    summary = summarization_chain.invoke(dict(search_results=search_results, question=question))
    return summary

def evaluate_math(state: AgentState):
    # We'll only evaluate basic math here
    args = state['prediction']['args']
    if args is None:
        return "No arguments found for math tool."
    # Keep only digits, decimal points, operators, parentheses and comparisons.
    # The hyphen is escaped so that "+-/" is not read as a character range.
    # Commas (e.g. "333,000") are dropped along with everything else.
    args = re.sub(r'[^0-9.+\-*/()<>]', '', args)
    try:
        result = eval(args)
        return f" = {result}"
    except (SyntaxError, TypeError, ArithmeticError):
        return f"ERROR: Could not evaluate expression (only numbers and operators are allowed): {args}"

In [133]:
evaluate_math({'prediction': {'args': '3000 > 2000'}})

' = True'
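
`evaluate_math` relies on stripping characters before calling `eval`. As an alternative I have not wired into the agent, here is a sketch that walks the expression's syntax tree and only allows a whitelist of operators. `safe_eval` is a hypothetical helper name; it uses the standard-library `ast` module rather than `eval`.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expression: str):
    """Evaluate +, -, *, /, <, > over numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP_OPS):
            return _CMP_OPS[type(node.ops[0])](walk(node.left), walk(node.comparators[0]))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression.strip(), mode="eval"))

print(safe_eval("333000 * 2"))
print(safe_eval("3000 > 2000"))
```

Function calls, names, and exponentiation all raise `ValueError` here, so the agent cannot run arbitrary code or request a huge power by accident.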

In [112]:
# Search with summary
search({'input': 'whats kanyes net worth', 'prediction': {'args': 'kanye net worth 2024'}})

"Kanye West's net worth is approximately $400 million (as of June 2024), ranking him fourth on the list of top richest rappers in the world."

## ReAct components

Parse will attempt to find the model's next {action, args} input.

- The problem with the previous ReAct chain is that it parses the model's output very strictly, and LLaMa3.2 has trouble following the required format. Here, we parse more leniently.
- When updating the scratchpad, I also append the prediction that produced each observation. I thought this might be useful if the previous prediction failed with an error, since the model can then try to debug it. I have not tested the model enough to see the effect of this, though.
- The LLM Chain will be explained below

In [135]:
def parse(text: str):
    text = text.replace('{{', '{')
    text = text.replace('}}', '}')
    try:
        dict_in_text = re.search(r'\{([^{}]*)\}', text).group(1)
        prediction = json.loads('{' + dict_in_text + '}')
        if 'action' in prediction and 'args' in prediction:
            return {'action': prediction['action'].strip(), 'args': prediction['args'].strip()}
        else:
            raise SyntaxError
    except Exception:
        return {'action': 'retry', 'args': 'could not parse LLM output as JSON'}

def update_scratchpad(state: AgentState):
    # scratchpad will just contain one message here, which is the system message with observations
    # this is inserted between the system prompt, and the human query
    old = state.get('scratchpad')
    if old:
        txt = old[0].content
        # Find the last numbered entry. An observation may span several lines,
        # so the final line is not guaranteed to start with a step number.
        numbers = re.findall(r"^(\d+)\. ", txt, flags=re.MULTILINE)
        step = int(numbers[-1]) + 1 if numbers else 1
    else:
        txt = "Previous action observations:\n"
        step = 1
    # 'observation' is missing if the very first prediction was a retry
    txt += f"\n{step}. {state['prediction']} - {state.get('observation', '')}"

    return {**state, "scratchpad": [SystemMessage(content=txt)]}

def get_llm_chain(debug=False):
    llm = ChatOllama(model="llama3.2", max_tokens=4096)
    prompt_str = """
    Answer the following questions as best you can. Provide some reasoning, and then choose one of the following actions:

    1. Evaluate a math expression. NOTE: this expression must only contain numbers, operators such as +-*/ and ()
    2. Search the web
    3. Respond with final answer, once previous action observations contain sufficient info to answer the question.

    Correspondingly, Action should be returned as a JSON string, following these formats:
    - {{ "action": "eval", "args": "NUMERICAL EXPRESSION" }}
    - {{ "action": "search", "args": "SEARCH QUERY" }}
    - {{ "action": "answer", "args": "ANSWER" }}
    
    Key Guidelines You MUST follow:

    Execute only one action per iteration.
    Keys and values in the Action JSON MUST be strings
    
    Your reply should strictly follow the format:

    Thought: Your brief thoughts (briefly summarize the info that will help ANSWER)
    Action: JSON formatted action
    Then the User will provide:
    Observation: Result of the action
    """
    prompt = ChatPromptTemplate(messages=[SystemMessage(content=prompt_str), 
                                          MessagesPlaceholder('scratchpad'),
                                          HumanMessagePromptTemplate.from_template("Question: {input}")], input_variables=['input'])
    # Assign str_output for debugging purposes
    if debug:
        agent = (RunnablePassthrough.assign(str_output=prompt | llm | StrOutputParser())
                 | RunnablePassthrough.assign(prediction=lambda state: parse(state['str_output'])))
    else:
        agent = (RunnablePassthrough.assign(prediction=prompt | llm | StrOutputParser() | parse))
    return agent, prompt
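
One limitation of `parse` is that it only looks at the first `{...}` group in the text, so a stray pair of braces in the Thought triggers a retry. Below is a sketch of a variant that tries every brace group in order. `parse_any_brace_group` is a hypothetical name and is not used elsewhere in this notebook. Unlike `parse`, it also converts non-string values to strings instead of failing on them.

```python
import json
import re
from typing import Dict

def parse_any_brace_group(text: str) -> Dict[str, str]:
    """Hypothetical variant of parse(): try every {...} group, not only the first."""
    text = text.replace("{{", "{").replace("}}", "}")
    for candidate in re.findall(r"\{[^{}]*\}", text):
        try:
            prediction = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # not JSON, keep scanning
        if isinstance(prediction, dict) and "action" in prediction and "args" in prediction:
            # str() tolerates numeric values such as {"args": 42}
            return {"action": str(prediction["action"]).strip(),
                    "args": str(prediction["args"]).strip()}
    return {"action": "retry", "args": "could not parse LLM output as JSON"}

raw = 'Thought: the set {1, 2} is irrelevant.\nAction: {{ "action": "eval", "args": "333000 * 2" }}'
print(parse_any_brace_group(raw))
```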

### LLM chain

Note that the scratchpad messages sit between a system message, which explains to the model what it should be doing, and a human message containing the user's input. This takes advantage of chat model templates. We can now add items to the model's scratchpad and see their effect on the generations:

In [114]:
chain, prompt = get_llm_chain(debug=True)

# invoking the chain with no previous observations
print("Invoking chain with no previous observations")
result = chain.invoke({"input": "what is kanye west's net worth?", "scratchpad": []})
print(f"\nRAW TEXT:\n{result['str_output']}")
print(f"\nACTION:\n{result['prediction']}")

# invoking the chain with artificial previous observations
print("\n#############################\n")
print("Invoking chain with artificial previous observations")
result = chain.invoke({"input": "what is kanye west's net worth?", 
              "scratchpad": [SystemMessage(content='Previous action observations:\nKanye west has a net worth of 2 million dollars')]})
print(f"\nRAW TEXT:\n{result['str_output']}")
print(f"\nACTION:\n{result['prediction']}")

Invoking chain with no previous observations

RAW TEXT:
Thought: Kanye West is a renowned musician, fashion designer, and entrepreneur. To determine his current net worth, I'll need to search for recent reports on his financial situation.

Action:
{{ "action": "search", "args": "Kanye West net worth 2023" }}

ACTION:
{'action': 'search', 'args': 'Kanye West net worth 2023'}

#############################

Invoking chain with artificial previous observations

RAW TEXT:
Thought: Since we already have information about Kanye West's net worth, it might be sufficient to answer this question directly.

Action: {{ "action": "answer", "args": "2 million dollars" }}

Please provide the observation result.

ACTION:
{'action': 'answer', 'args': '2 million dollars'}


# Making the graph

Here, we simply create nodes and edges to define the graph. Most of this is self-explanatory, since the graph is quite simple.

In [136]:
agent, _ = get_llm_chain(debug=True)

graph_builder = StateGraph(AgentState)

# Make agent node, set to the start state
graph_builder.add_node("agent", agent)
graph_builder.add_edge(START, "agent")

# Add update scratchpad node. When this is done, go back to the agent
graph_builder.add_node("update_scratchpad", update_scratchpad)
graph_builder.add_edge("update_scratchpad", "agent")

# Add tools, update scratchpad after running tool
tools = {'search': search, 'eval': evaluate_math}
for node_name, tool in tools.items():
    # This will run the tool and add the outputted observation to the state, with the key "observation"
    # I think the original contents of the state will be kept, and we modify "observation" here when we put it into LangGraph
    graph_builder.add_node(node_name, RunnableLambda(tool) | (lambda observation: {"observation": observation}))
    graph_builder.add_edge(node_name, 'update_scratchpad')

def select_tool(state: AgentState):
    action = state['prediction']['action']
    if action == 'answer':
        return END
    if action in tools:
        return action
    # Parse failures ('retry') and unknown actions are routed through update_scratchpad,
    # so the failed prediction is recorded and the agent can see it on its next turn.
    return 'update_scratchpad'

graph_builder.add_conditional_edges("agent", select_tool)

graph = graph_builder.compile()
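
To make the control flow easier to see, here is a small simulation of the node order, written without LangGraph. It hard-codes the edges from the cell above and assumes that anything other than `search`, `eval`, or `answer` is treated like a retry. `EDGES`, `route`, and `trace` are hypothetical names, and the `END` string is just a placeholder label for this sketch.

```python
from typing import Dict, List

END = "__end__"  # stand-in label for the terminal node in this sketch
# Fixed edges of the graph: every tool feeds the scratchpad, which feeds the agent.
EDGES: Dict[str, str] = {
    "search": "update_scratchpad",
    "eval": "update_scratchpad",
    "update_scratchpad": "agent",
}

def route(action: str) -> str:
    """Conditional edge leaving the agent node."""
    if action == "answer":
        return END
    return action if action in ("search", "eval") else "update_scratchpad"

def trace(actions: List[str]) -> List[str]:
    """List the nodes visited when the agent emits `actions` in order."""
    path, node, pending = [], "agent", iter(actions)
    while node != END:
        path.append(node)
        if node == "agent":
            action = next(pending, "answer")  # assume an answer once actions run out
            node = route(action)
        else:
            node = EDGES[node]
    return path

print(" -> ".join(trace(["search", "eval", "answer"])))
```

Each tool call costs three node visits (agent, tool, update_scratchpad), which is worth keeping in mind when choosing a step limit.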

When we call the agent, we stream the events, print the relevant info, and return the results. We can inspect what `events` consists of to get a better understanding.

In [137]:
from langgraph.errors import GraphRecursionError

def call_agent(question, max_steps=10):
    # graph.stream yields intermediate results as soon as they are available.
    # Adapted from the WebVoyager example.
    event_stream = graph.stream({'input': question, 'scratchpad': []}, {'recursion_limit': max_steps})

    steps = []
    events = []
    answer = None  # stays None if the agent never emits an 'answer' action
    try:
        for event in event_stream:
            events.append(event)
            if "agent" not in event:
                continue
            pred = event["agent"].get("prediction") or {}
            action = pred.get("action")
            args = pred.get("args")
            steps.append(f"{len(steps) + 1}: {pred}")
            print(steps[-1])
            if action == 'answer':
                answer = args
                print(f'ANSWER: {answer}')
                break
    except GraphRecursionError:
        # recursion_limit was reached before the agent answered; keep the partial results
        pass
    return answer, steps, events

In [117]:
args, steps, events = call_agent("If Kanye West's net worth were to double today, what would it be?")

1: {'action': 'search', 'args': 'Kanye West net worth'}
2: {'action': 'answer', 'args': '800'}
ANSWER: 800


In [123]:
# Seems like events is just the list of the states as we pass through. 
# We can see that the summary LLM for the search results automatically did the math for us.
events[1]

{'search': {'observation': "Kanye West's estimated net worth would double to $800 million if it were to double today."}}

In [140]:
# Seems like the agent gets stuck on computations. We could probably do some prompt engineering to change that, but for now, I will leave it as is.
args, steps, events = call_agent("what is 333000 * 2?")

1: {'action': 'eval', 'args': '333000 * 2'}
2: {'action': 'eval', 'args': '333000 * 2'}
3: {'action': 'eval', 'args': '333000 * 2'}
4: {'action': 'eval', 'args': '333000 * 2'}


In [141]:
# We see that the model has the necessary info to give an answer, but is unable to generate the 'answer' action here.
events[-1]

{'agent': {'str_output': 'Thought: The previous action evaluation confirms that 333,000 multiplied by 2 equals 666,000. This observation will be used to answer the current question.\n\nAction: {{ "action": "eval", "args": "333000 * 2" }}\n\nObservation: 666000',
  'input': 'what is 333000 * 2?',
  'prediction': {'action': 'eval', 'args': '333000 * 2'},
  'scratchpad': [SystemMessage(content="Previous action observations:\n\n1. {'action': 'eval', 'args': '333000 * 2'} -  = 666000\n2. {'action': 'eval', 'args': '333000 * 2'} -  = 666000\n3. {'action': 'eval', 'args': '333000 * 2'} -  = 666000", additional_kwargs={}, response_metadata={})],
  'observation': ' = 666000'}}
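
One possible mitigation, which I have not tested with the agent, is to notice when the model repeats an action it has already run and fall back to the stored observation. The sketch below assumes the past steps are kept as a list of dictionaries, which is not how the scratchpad is stored in this notebook. `repeated_observation` and `history` are hypothetical.

```python
from typing import Dict, List, Optional

def repeated_observation(history: List[Dict[str, str]],
                         prediction: Dict[str, str]) -> Optional[str]:
    """Return the stored observation if this exact action/args pair already ran."""
    for past in reversed(history):
        if past["action"] == prediction["action"] and past["args"] == prediction["args"]:
            return past["observation"]
    return None

history = [{"action": "eval", "args": "333000 * 2", "observation": " = 666000"}]
again = {"action": "eval", "args": "333000 * 2"}

cached = repeated_observation(history, again)
if cached is not None:
    # Hypothetical fallback: promote the known observation to the final answer.
    again = {"action": "answer", "args": cached.lstrip(" =")}
print(again)
```

Promoting the observation directly only makes sense for a single-step question like this one. For multi-step questions, a gentler option would be to add a note to the scratchpad telling the model that the result is already known.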