# Intro

It's time for me to start learning about agents.


## What are agents?

![[image source: Tweet from Abhishek Thakur](https://x.com/abhi1thakur/status/1873697074405122144)](imgs/say_agentic_one_more_time.jpeg)

Let's start with some definitions of agents from different sources.


::: {.callout-note}
## Agent Definition from LangChain Blog Post - [source](https://blog.langchain.dev/what-is-an-agent/)
An AI agent is a system that uses an LLM to decide the control flow of an application.
:::

::: {.callout-note}
## Agent Definition from AWS - [source](https://aws.amazon.com/what-is/ai-agents/)
An artificial intelligence (AI) agent is a software program that can interact with its environment, collect data, and use the data to perform self-determined tasks to meet predetermined goals. Humans set goals, but an AI agent independently chooses the best actions it needs to perform to achieve those goals.
:::

::: {.callout-note}
## Agent Definition from Chip Huyen's Book "AI Engineering" - [source](https://learning.oreilly.com/library/view/ai-engineering/9781098166298/ch06.html#ch06_agents_1730157386572111)

An **agent** is anything that can perceive its environment and act upon that environment. This means that an **agent** is characterized by the environment it operates in and the set of actions it can perform.
:::



::: {.callout-note}
## Agent Definition from MongoDB Blog Post - [source](https://www.mongodb.com/resources/basics/artificial-intelligence/ai-agents#what-is-an-ai-agent)

An AI **agent** is a computational entity with an awareness of its environment that’s equipped with faculties that enable perception through input, action through tool use, and cognitive abilities through foundation models backed by long-term and short-term memory.
:::

::: {.callout-note}
## Agent Definition from Anthropic - [source](https://www.anthropic.com/research/building-effective-agents)

**"Agent"** can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as **agentic** systems, but draw an important architectural distinction between **workflows** and **agents**:

**Workflows** are systems where LLMs and tools are orchestrated through predefined code paths.

**Agents**, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
:::


::: {.callout-note}
## Agent Definition from Hugging Face Blog Post on `smolagents` - [source](https://huggingface.co/blog/smolagents)

Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have agency. Agentic programs are the gateway to the outside world for LLMs.


**Agents** are programs where LLM outputs control the workflow. Note that with this definition, **"agent"** is not a discrete, 0 or 1 definition: instead, "agency" evolves on a continuous spectrum, as you give more or less power to the LLM on your workflow.
:::


## Is it an Agent? Is it Agentic? It's more like a spectrum with a lot of gray area!

![[image source: Tweet from Andrew Ng](https://x.com/AndrewYNg/status/1801295202788983136)](imgs/its_agentic.jpg)

There is a lot of debate about what exactly is and is not an agent. I think there is a lot of gray area here, and that is something we have to accept, at least for now. Andrew Ng makes some really good points in this [tweet](https://x.com/AndrewYNg/status/1801295202788983136). As Andrew points out, rather than engaging in binary debates about whether something qualifies as a "true agent," we should think about systems as existing on a spectrum of agent-like qualities. The adjective "agentic" is particularly useful here, because it lets us describe systems that incorporate agent-like patterns to different degrees without getting caught up in restrictive definitions.

This spectrum-based view is reinforced by Anthropic's recent [blog post](https://www.anthropic.com/research/building-effective-agents) on agents. They acknowledge that while they draw an architectural distinction between workflows (systems with predefined code paths) and agents (systems with dynamic control), they categorize both under the broader umbrella of "agentic systems." Similarly, we saw from one of our definitions above that "agent" isn't a discrete, 0 or 1 definition, but rather evolves on a continuous spectrum as you give more or less power to the LLM in your system. This aligns with Andrew Ng's observation that there's a gray zone between what clearly is not an agent (prompting a model once) and what clearly is (an autonomous system that plans, uses tools, and executes multiple steps independently).

![[image source: Blog post from Nathan Lambert on the AI Agent Spectrum](https://www.interconnects.ai/p/the-ai-agent-spectrum)](imgs/llm_loop_is_agent.webp)

Nathan Lambert also writes about the AI agent spectrum in this [blog post](https://www.interconnects.ai/p/the-ai-agent-spectrum). Nathan argues that the simplest system on this spectrum is any tool-use language model, and that systems grow in complexity from there. I like his point that the spectrum, and with it the definition of an agent, will keep shifting as the field evolves. Over time, certain technologies will reach milestones where they become definitive examples of AI agents. Therefore, at some point, basic tool use with an LLM may no longer be considered an agent, even though it is the starting point on the agentic spectrum.

![[image source: Tweet from Hamel Husain](https://x.com/HamelHusain/status/1867289397898752222)](imgs/llm_workflow_vs_loop.png)

Personally, agents and agentic workflows are still very new to me, and I have a lot to learn on this topic. I have deployed LLMs in production and built some applications where LLMs use function calling (tools) within a conversational chat interface. So I think some of my previous work falls somewhere on this agentic spectrum, even if it's at the simple end. I'm going to keep an open mind and avoid getting caught up in debates about categorical definitions. I'll try to avoid the hype and marketing fluff while staying on the lookout for innovation and practical applications.



# The Tool Calling Loop: A Building Block for Agentic Systems {#sec-tool_calling_loop}


![[image source: Tweet from Abhishek Thakur](https://x.com/abhi1thakur/status/1875159964785987904)](imgs/just_loops_meme.jpeg)

So where do we even start on this spectrum of AI agents? Practically, I think the first step is an LLM equipped with tools.
I think this is what Anthropic refers to as ["the augmented LLM"](https://www.anthropic.com/research/building-effective-agents).

![[image source: Blog post from Anthropic on Building effective agents](https://www.anthropic.com/research/building-effective-agents)](imgs/the_augmented_llm.png)

This is the building block: an LLM equipped with tools. I think we need to take it slightly further and make explicit that we need a tool calling loop. The entire process is kicked off by sending a user request to the LLM. The LLM then decides which tool calls to make in the first step. These tool calls can be executed in parallel if they are independent of one another. After receiving the results, the LLM can choose to make follow-up tool calls that depend on those results, or to answer the user directly. Implementing this logic within a loop is what I refer to as the "tool calling loop".
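To make the control flow concrete without needing an API key, here is a minimal sketch of the same loop. A hypothetical `scripted_llm` function stands in for the model. The toy tools `get_points` and `power`, their hard-coded values, and the simplified message dicts (no `tool_call_id`, no nested `function` key) are all assumptions for illustration. This is not the author's implementation or the exact OpenAI message format.

```python
import json
from concurrent.futures import ThreadPoolExecutor


# Toy tools (hypothetical) standing in for real ones like web_search.
def get_points(player: str) -> int:
    # Hard-coded lookup table used only for this illustration.
    return {"curry": 30, "lebron": 38}[player]


def power(base: int, exponent: int) -> int:
    return base**exponent


TOOLS_BY_NAME = {"get_points": get_points, "power": power}


def scripted_llm(messages):
    """Hypothetical stand-in for an LLM: picks the next tool calls from the history."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        # Step 1: two independent calls, which can run in parallel.
        return {"tool_calls": [
            {"name": "get_points", "arguments": json.dumps({"player": "curry"})},
            {"name": "get_points", "arguments": json.dumps({"player": "lebron"})},
        ]}
    if len(tool_results) == 2:
        # Step 2: a follow-up call that depends on an earlier result.
        lebron_points = int(tool_results[1]["content"])
        return {"tool_calls": [
            {"name": "power", "arguments": json.dumps({"base": lebron_points, "exponent": 5})},
        ]}
    # Step 3: no more tool calls, so answer with plain text.
    return {"content": f"LeBron's points to the 5th power: {tool_results[-1]['content']}"}


def tool_calling_loop(messages, max_steps=10):
    for _ in range(max_steps):
        reply = scripted_llm(messages)
        messages.append({"role": "assistant", **reply})
        calls = reply.get("tool_calls")
        if not calls:
            return messages  # the model answered without tools, so we're done
        # Independent calls from one step run concurrently; map keeps their order.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda c: TOOLS_BY_NAME[c["name"]](**json.loads(c["arguments"])), calls))
        for call, result in zip(calls, results):
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return messages


history = tool_calling_loop([{"role": "user", "content": "What is LeBron's point total to the 5th power?"}])
print(history[-1]["content"])  # LeBron's points to the 5th power: 79235168
```

The first step issues two independent calls that run concurrently. The second step issues a call that depends on a previous result. The loop ends when the model replies without requesting any tools.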

I wrote about this tool calling loop a while ago in a previous [blog post](https://drchrislevy.github.io/posts/anthropic/anthropic.html#my-own-wrapper-with-an-emphasis-on-the-tool-calling-loop). Here is an image I created at the time to illustrate the concept.

![[image source: previous blog post](https://drchrislevy.github.io/posts/anthropic/anthropic.html#my-own-wrapper-with-an-emphasis-on-the-tool-calling-loop)](imgs/llm_tool_call_loop.png){height=900px}

One could call this tool calling loop "agentic" since the LLM decides which tool calls to make. Or maybe we just call it an "augmented LLM". It does not really matter. What does matter is that it's simple to implement, does not require any frameworks, and can handle quite a few scenarios. It's plain old LLM function calling.

Here is one implementation of the tool calling loop. It assumes the typical JSON format for tool calls and uses the OpenAI chat completions API format.
I'm using the [`litellm`](https://github.com/BerriAI/litellm) library to call the OpenAI API because I can easily switch to another model provider (such as Anthropic)
and still use the same OpenAI API format. If you have never used `litellm` before, that is fine! This is my first time using it.
I first heard about it while reading about [smolagents](https://github.com/huggingface/smolagents) and how it uses `litellm`.
All you need to know is that calling `completion(...)` (imported with `from litellm import completion`) works like calling `client.chat.completions.create(...)` from the `openai` library.

The loop below also has some print-to-console functionality, which uses [`rich`](https://rich.readthedocs.io/en/stable/introduction.html) under the hood.
I borrowed this idea from the source code of the [smolagents library](https://github.com/huggingface/smolagents) from Hugging Face. I will talk more about it later in this post.


```python
# | echo: false
# | warning: false
import dotenv

dotenv.load_dotenv()


from IPython.display import Markdown, display


def import_python_as_markdown(file_path):
    with open(file_path, "r") as file:
        content = file.read()
    return f"```python\n{content}\n```"


from tool_calling_loop import llm_with_tools, run_step
```



```python
# | echo: false
file_path = "tool_calling_loop.py"
markdown_content = import_python_as_markdown(file_path)
display(Markdown(markdown_content))
```

```python
import json
from concurrent import futures
from typing import Any, Callable, Dict

from litellm import completion
from utils import (
    console_print_llm_output,
    console_print_step,
    console_print_tool_call_inputs,
    console_print_tool_call_outputs,
    console_print_user_request,
)


def call_tool(tool: Callable, tool_args: Dict) -> Any:
    return tool(**tool_args)


def run_step(messages, tools=None, tools_lookup=None, model="gpt-4o-mini", **kwargs):
    # One LLM call, plus execution of any tool calls the model requests.
    messages = messages.copy()  # avoid mutating the caller's list
    response = completion(model=model, messages=messages, tools=tools, **kwargs)
    response_message = response.choices[0].message.model_dump()
    response_message.pop("function_call", None)  # deprecated field in OpenAI API
    tool_calls = response_message.get("tool_calls", [])
    assistant_content = response_message.get("content", "")
    messages.append(response_message)

    if not tool_calls:
        # Plain text answer: drop the null tool_calls field and return.
        response_message.pop("tool_calls", None)
        return messages

    # Decode each call's JSON-string arguments and look up the matching Python function.
    tools_args_list = [json.loads(t["function"]["arguments"]) for t in tool_calls]
    tools_callables = [tools_lookup[t["function"]["name"]] for t in tool_calls]
    tasks = [(tools_callables[i], tools_args_list[i]) for i in range(len(tool_calls))]
    console_print_tool_call_inputs(assistant_content, tool_calls)
    # All tool calls from this step run concurrently; executor.map preserves their order.
    with futures.ThreadPoolExecutor(max_workers=10) as executor:
        tool_results = list(executor.map(lambda p: call_tool(p[0], p[1]), tasks))
    console_print_tool_call_outputs(tool_calls, tool_results)
    # Append one "tool" message per call, linked back to the request via tool_call_id.
    for tool_call, tool_result in zip(tool_calls, tool_results):
        messages.append(
            {
                "tool_call_id": tool_call["id"],
                "role": "tool",
                "content": str(tool_result),
                "name": tool_call["function"]["name"],
            }
        )
    return messages


def llm_with_tools(messages, tools=None, tools_lookup=None, model="gpt-4o-mini", max_steps=10, **kwargs):
    console_print_user_request(messages, model)
    done_calling_tools = False
    for counter in range(max_steps):
        console_print_step(counter)
        messages = run_step(messages, tools, tools_lookup, model=model, **kwargs)
        # Stop once the model replies with text and requests no further tool calls.
        done_calling_tools = messages[-1]["role"] == "assistant" and messages[-1].get("content") and not messages[-1].get("tool_calls")
        if done_calling_tools:
            break
    console_print_llm_output(messages[-1]["content"])
    return messages

```

First we will run a single step, without any tools, which is a single LLM call.
Note that I return the entire message history in the output.

```python
messages = [{"role": "user", "content": "Hello friend!"}]
run_step(messages)
```

```
[{'role': 'user', 'content': 'Hello friend!'},
 {'content': 'Hello! How can I assist you today?', 'role': 'assistant'}]
```

## Some Tools

Before going through an example task, let's look at some initial tools. The tools are plain Python functions that we can call.
We also have a lookup dictionary that maps each tool name to its function.
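One design choice is worth calling out. In the `run_step` function shown above, an exception raised by a tool propagates out of the loop. A common alternative is to catch the error and send it back to the model as the tool message, so the model can retry or change course. Here is a minimal sketch of that idea. `add`, `TOOL_LOOKUP`, and `dispatch_tool_call` are hypothetical names, and the tool call dict mimics the OpenAI format, where `arguments` arrives as a JSON string.

```python
import json
from typing import Any, Callable, Dict


def add(a: float, b: float) -> float:
    # Hypothetical toy tool.
    return a + b


TOOL_LOOKUP: Dict[str, Callable[..., Any]] = {"add": add}


def dispatch_tool_call(tool_call: dict, lookup: Dict[str, Callable[..., Any]]) -> dict:
    """Run one OpenAI-style tool call and build the matching "tool" message."""
    name = tool_call["function"]["name"]
    try:
        # The API returns arguments as a JSON *string*, so decode it first.
        args = json.loads(tool_call["function"]["arguments"])
        content = str(lookup[name](**args))
    except Exception as exc:  # unknown tool, bad JSON, wrong arguments, or a tool crash
        content = f"Error calling {name}: {exc!r}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "name": name, "content": content}


call = {"id": "call_1", "type": "function",
        "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}}
print(dispatch_tool_call(call, TOOL_LOOKUP)["content"])  # 5

bad = {"id": "call_2", "type": "function",
       "function": {"name": "divide", "arguments": "{}"}}
print(dispatch_tool_call(bad, TOOL_LOOKUP)["content"])  # Error calling divide: KeyError('divide')
```

Returning the error as text keeps the conversation valid, because every tool call still gets a matching tool message. The trade-off is that a buggy tool can fail silently unless you also log these errors.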


```python
from tools import TOOL_LKP, TOOLS

TOOL_LKP
```

```
{'web_search': <function tools.web_search(query: str) -> str>,
 'execute_python_code': <function python_sandbox.execute_python_code(code: str, sandbox=None) -> dict>,
 'visit_web_page': <function tools.visit_web_page(url)>}
```

Let's see how each tool works first.

This first tool executes Python code. The code actually runs in a [Modal Sandbox](https://modal.com/docs/guide/sandbox), a secure cloud container. Sandboxes are an awesome Modal feature for executing arbitrary code. Let's skip the details for now and come back to them later. For now, just think of it as a way to execute Python code and get back the results.

```python
# This tool is a Python code execution tool.
# The code is executed in a secure cloud container/environment using Modal.
# The results are returned locally as an object.
TOOL_LKP["execute_python_code"](code="print('Hello World!')")
```

```
{'stdout': 'Hello World!\n',
 'stderr': '',
 'success': True,
 'result': 'None',
 'error': None}
```

```python
# We even get the last expression evaluated as a result, just like in the IPython REPL
TOOL_LKP["execute_python_code"](code="import math; x = math.sqrt(4); print(x); y=2; x-y")
```

```
{'stdout': '2.0\nOut[1]: 0.0\n',
 'stderr': '',
 'success': True,
 'result': '0.0',
 'error': None}
```
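The Modal-based implementation isn't shown here, so here is a hypothetical local stand-in showing how a tool like this could capture the value of a trailing expression. It parses the code with `ast`, splits off the last statement if it is an expression, runs the rest with `exec`, and then runs the final expression with `eval`. `run_code_locally` is a made-up name whose return keys mirror the output above. It is NOT a sandbox: it runs the code in the current process with no isolation. That lack of isolation is exactly why a sandbox is used for LLM-generated code.

```python
import ast
import contextlib
import io
import traceback


def run_code_locally(code: str) -> dict:
    """Hypothetical helper: exec code in-process and return the trailing expression's value.

    NOT a sandbox: untrusted code runs directly in the current Python process.
    """
    stdout, stderr = io.StringIO(), io.StringIO()
    namespace: dict = {}
    result = None
    try:
        tree = ast.parse(code)
        last_expr = None
        # If the final statement is an expression, split it off so it can be eval'd.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(body=tree.body.pop().value)
        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
            exec(compile(tree, "<tool>", "exec"), namespace)
            if last_expr is not None:
                result = eval(compile(last_expr, "<tool>", "eval"), namespace)
        return {"stdout": stdout.getvalue(), "stderr": stderr.getvalue(),
                "success": True, "result": repr(result), "error": None}
    except Exception:
        return {"stdout": stdout.getvalue(), "stderr": stderr.getvalue(),
                "success": False, "result": None, "error": traceback.format_exc()}


print(run_code_locally("import math; x = math.sqrt(4); print(x); y=2; x-y"))
# {'stdout': '2.0\n', 'stderr': '', 'success': True, 'result': '0.0', 'error': None}
```

Unlike the sandbox output above, this sketch does not echo `Out[1]: ...` into `stdout`. It only returns the value in `result`.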

The next tool uses `duckduckgo-search` to search the web.

```python
TOOL_LKP["web_search"](query="What sporting events are happening today?")
```

```
[{'title': 'Live Sports On TV Today - TV Guide',
  'href': 'https://www.tvguide.com/sports/live-today/',
  'body': "Live Sports on TV Today. Here's sports to watch today, Saturday, Jan 11, 2025. ... Dayna Yastremska is the No. 1 seed at the WTA 250 event in Hobart. ... coaches and celebrities are interviewed ..."},
 {'title': "Today's Top Sports Scores and Games (All Sports) | FOX Sports",
  'href': 'https://www.foxsports.com/scores',
  'body': "Visit FOXSports.com for today's top sports scores and games. Explore real-time game scores across MLB, NBA, NFL, Soccer, NHL and more."},
 {'title': 'Sports on TV Today - Sports Media Watch',
  'href': 'https://www.sportsmediawatch.com/sports-on-tv-today-games-time-channel/',
  'body': 'See where to watch sports on TV today with this daily, updated guide of games and events on TV and streaming. This site may earn commission on subscriptions purchased via this page. For a full list of sports TV schedules, see this page. Games on TV Today (Monday
```

And the next tool visits a web page and converts it to markdown.

```python
print(TOOL_LKP["visit_web_page"](url="https://drchrislevy.github.io/"))
```

```
Chris Levy

[Chris Levy](./index.html)

* [About](./index.html)
* [Blog](./blog.html)

## On this page

* [About Me](#about-me)

# Chris Levy

[twitter](https://twitter.com/cleavey1985)
[Github](https://github.com/DrChrisLevy)
[linkedIn](https://www.linkedin.com/in/chris-levy-255210a4/)

**Hello!** I’m Chris Levy. I work in ML/AI and backend Python development.

## About Me

I spent a good amount of time in school where I completed a PhD in applied math back in 2015. After graduating I shifted away from academia and started working in industry. I mostly do backend python development these days, and build ML/AI applications/services. I work across the entire stack from research, to training and evaluating models, to deploying models, and getting in the weeds of the infrastructure and devops pipelines.

Outside of AI/ML stuff, I enjoy spending time with my family and three kids, working out, swimming, cycling, and playing guitar.

![](pic_me.jpeg)
```


To pass these tools to the LLM, we describe each one in the JSON schema format that the OpenAI chat completions API expects.

```python
TOOLS
```

```
[{'type': 'function',
  'function': {'name': 'execute_python_code',
   'description': 'Run and execute the python code and return the results.',
   'parameters': {'type': 'object',
    'properties': {'code': {'type': 'string',
      'description': 'The python code to execute.'}},
    'required': ['code']}}},
 {'type': 'function',
  'function': {'name': 'web_search',
   'description': 'Search the web for the query and return the results.',
   'parameters': {'type': 'object',
    'properties': {'query': {'type': 'string',
      'description': 'The query to search for.'}},
    'required': ['query']}}},
 {'type': 'function',
  'function': {'name': 'visit_web_page',
   'description': 'Visit the web page and return the results.',
   'parameters': {'type': 'object',
    'properties': {'url': {'type': 'string',
      'description': 'The URL to visit.'}},
    'required': ['url']}}}]
```
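Writing these schemas by hand gets repetitive. Here is a minimal sketch of deriving a schema from a function's signature and docstring with `inspect`, assuming simple functions annotated with `str`/`int`/`float`/`bool`. `function_to_tool_schema` and the `web_search` stub (including its `max_results` parameter) are hypothetical. Unlike the hand-written `TOOLS` above, this sketch does not generate per-parameter descriptions.

```python
import inspect
import json

# Map a few Python annotations to JSON Schema types (an illustrative subset).
PY_TO_JSON_TYPE = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(fn) -> dict:
    """Build an OpenAI-style tool definition from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": PY_TO_JSON_TYPE.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def web_search(query: str, max_results: int = 5) -> str:
    """Search the web for the query and return the results."""
    ...  # hypothetical stub; only the signature and docstring matter here


print(json.dumps(function_to_tool_schema(web_search), indent=2))
```

Libraries such as `smolagents` do something similar with their tool decorators. Generating the schema keeps the description and the code from drifting apart, at the cost of less control over the exact wording the model sees.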

## Example Task 1

Okay, so let's run the tool calling loop now with the tools defined above to illustrate how it works.
Here is a task where we ask some questions about recent NBA events.



```python
messages = [
    {
        "role": "system",
        "content": """You are a helpful assistant. Use the supplied tools to assist the user. 
        Always use python to do math. After getting web search results be sure to visit the web page and convert it to markdown. 
        Todays date is 2025-01-03.""",
    },
    {
        "role": "user",
        "content": """
        Recently on Jan 2 2025, Steph Curry made a series of 3 pointers in one game without missing. 
        How many three pointers did he make in total that game?
        How many points did he score in total that game?
        Calculate the total points scored by both teams and then the percentage of those points that were made by Steph Curry alone.

        Also, how old is Lebron James and how many points did he score in his game on Jan 2 2025?
        Take his total points scored that game and raise it to the power of 5. What is the result?
        """,
    },
]
```

This question has a verifiable answer. Here is the ground truth.

```python
example_one_answer = """
Game stats from January 2, 2025:

Steph Curry:
- Made 8/8 three-pointers (perfect from three)
- Total points: 30
- Game final score: Warriors 139, 76ers 105
- Total combined points in game: 244
- Curry's percentage of total points: 30/244 = 12.3%

LeBron James:
- Age: 40 (turned 40 on December 30, 2024)
- Points scored: 38 (vs Portland Trail Blazers)
- Points scored raised to the power of 5: 38^5 = 79,235,168
"""
```

Let's also write a simple LLM-as-a-judge call to evaluate whether a response is correct.



```python
import json


def eval_example_one(input_answer):
    input_msgs = [
        {
            "role": "user",
            "content": f"""
         
Original question:
{messages[-1]["content"]}

Here is the ground truth answer:
{example_one_answer}

Here is the predicted answer from an LLM.
{input_answer}

Given the context of the correct answer and question, did the LLM get everything correct in its predicted answer?
Return True or False. Only return True if the LLM got everything correct
and answered each part of the question correctly. Also give an explanation of why you returned True or False.
Output JSON.

{{
    "correct": True or False,
    "explanation": "explanation of why you returned True or False"
}}
""",
        },
    ]

    return json.loads(run_step(input_msgs, model="gpt-4o", response_format={"type": "json_object"})[-1]["content"])


# Example of incorrect answer
print(eval_example_one("Lebron James is 40 years old and scored 38 points in his game on Jan 2 2025."))

# Example of correct answer
print(
    eval_example_one(
        "Lebron James is 40 years old and scored 38 points in his game on Jan 2 2025. 38 to the power of 5 is 79,235,168.  Steph scored 30, made 8 three pointers, and scored 12.3 percent of the total points."
    )
)
```

```
{'correct': False, 'explanation': "The LLM prediction only partially answered the question correctly. It confirmed LeBron James' age and points scored, which match the ground truth. However, the LLM response omitted calculations related to Steph Curry's performance—specifically, the number of three-pointers made, total points scored, and his percentage of the total points. Additionally, the LLM did not include the calculation of LeBron's points raised to the power of 5. Therefore, the LLM response did not address all parts of the original question completely."}
{'correct': True, 'explanation': 'The LLM correctly answered all parts of the question. It accurately stated that LeBron James is 40 years old and scored 38 points. It also correctly calculated 38 to the power of 5 as 79,235,168. For Steph Curry, it mentioned he scored 30 points, made 8 three-pointers, and that these points constituted 12.3 percent of the total points scored in the game. These details match the ground truth answ
```

### gpt-4o-mini

Okay, let's send this same task to `gpt-4o-mini` and see how it does.

```python
messages_final = llm_with_tools(messages, model="gpt-4o-mini", tools=TOOLS, tools_lookup=TOOL_LKP)
```

The returned list contains the full conversation: the original messages, every assistant message and tool call, and every tool result.



```python
# Commenting out since the output is long from the webpages visited.
# But has all the messages chat history and tool calls in the OpenAI API format.

# messages_final
```
Let's use our LLM judge to evaluate the final output.

```python
eval_example_one(messages_final[-1]["content"])
```

```
{'correct': False,
 'explanation': "The LLM did not get everything correct. First, for LeBron James' age, the LLM incorrectly stated he was 41 years old on January 2, 2025. LeBron was born on December 30, 1984, which would make him 40 years old at that time, not 41. Second, for LeBron's points scored in his game, the ground truth states 38 points, whereas the LLM provides 30 points. Lastly, raising the points LeBron scored to the power of 5, the calculation results in 30^5 = 24,300,000, which was incorrectly calculated since the correct points to consider are 38, leading to 38^5 = 79,235,168 as per the ground truth. Due to these discrepancies, the LLM's answer is not entirely correct."}
```

### claude-3-5-sonnet {#sec-claude-3-5-sonnet-ex1}

Let's send this same task to Anthropic's `claude-3-5-sonnet` model.
That's the beauty of `litellm`! We can easily switch between models and still use the same familiar OpenAI API format.

```python
messages_final = llm_with_tools(messages, model="claude-3-5-sonnet-20240620", tools=TOOLS, tools_lookup=TOOL_LKP)
```

```python
eval_example_one(messages_final[-1]["content"])
```

```
{'correct': False,
 'explanation': "The LLM did not provide a complete and correct answer for the given questions. It correctly identified LeBron James's age as 40, but could not find specific data on his points scored on January 2, 2025, nor could it calculate the result of raising that score to the power of 5. The correct answer includes Steph Curry's performance details, game statistics, and specific calculations which the LLM did not mention. Therefore, the LLM did not answer each part of the question correctly and completely."}
```

### deepseek/deepseek-chat

We can also try the same task with `"deepseek/deepseek-chat"`.

```python
messages_final = llm_with_tools(messages, model="deepseek/deepseek-chat", tools=TOOLS, tools_lookup=TOOL_LKP)
```

```python
eval_example_one(messages_final[-1]["content"])
```

```
{'correct': False,
 'explanation': "The LLM's predicted answer contains two inaccuracies. First, it miscalculates the total points scored by both teams as 228 instead of the correct 244. Second, it subsequently miscalculates Steph Curry's percentage of total points contributed as 13.16% instead of the correct 12.3%. All other parts of the answer, including Steph Curry's three-pointers, total points, LeBron James's age, and his points raised to the 5th power, are correctly handled. However, these inaccuracies in the calculation prevent the entire answer from being correct."}
```

# ReAct

One of the main prompting techniques for building agents comes from the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/pdf/2210.03629).
It is also the approach `smolagents` uses, as discussed in its conceptual guide [here](https://github.com/huggingface/smolagents/blob/main/docs/source/en/conceptual_guides/react.md).
I'm sure many other frameworks use this approach, or modified versions of it, as well. You should check out the `smolagents` library, documentation, and code for more [details](https://huggingface.co/blog/smolagents).

The ReAct prompting framework (short for Reasoning and Acting) is a technique designed to enhance the capabilities of large language model (LLM) agents by enabling them to reason and act iteratively when solving complex tasks. ReAct combines chain-of-thought reasoning with decision-making actions, allowing the model to think step by step while interacting with the environment to gather the information it needs.

The key elements of ReAct are:

**Reasoning**: The model generates intermediate steps to explain its thought process while solving a problem or addressing a task.

**Acting**: The model performs actions based on its reasoning, i.e., it calls tools.

**Observation**: The outputs of actions (tool calls) provide feedback or data to guide the next reasoning step.

**Iterative Process**: ReAct operates in a loop, where the outputs of reasoning and acting are used to refine the approach, gather additional information, or confirm conclusions until the task is resolved.
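The `react` module used below (`REACT_SYSTEM_PROMPT`, `react_loop`) isn't shown, so here is a separate minimal sketch of a text-based Thought/Action/Observation loop. It is similar in spirit to Simon Willison's simple Python implementation listed in the resources. The `Action: tool: input` line format, the toy `calculate` tool, and the `scripted_react_model` stand-in for an LLM are all assumptions for illustration.

```python
import re

# Matches lines like "Action: calculate: 38**5" (a hypothetical action format).
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)


def calculate(expression: str) -> str:
    # Toy tool. eval is unsafe for untrusted input and is used here only for illustration.
    return str(eval(expression, {"__builtins__": {}}))


REACT_TOOLS = {"calculate": calculate}


def scripted_react_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM prompted to follow the ReAct format."""
    if "Observation:" not in transcript:
        return "Thought: I should compute 38 to the power of 5.\nAction: calculate: 38**5"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the result.\nAnswer: {observation}"


def react_loop_sketch(question: str, max_turns: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        step = scripted_react_model(transcript)  # Reasoning, possibly with an Action
        transcript += "\n" + step
        match = ACTION_RE.search(step)
        if match is None:  # no Action line means the model gave its final answer
            return step.partition("Answer:")[2].strip() or step
        tool_name, tool_input = match.groups()
        observation = REACT_TOOLS[tool_name](tool_input)  # Acting
        transcript += f"\nObservation: {observation}"  # Observation guides the next step
    return "No answer within max_turns"


print(react_loop_sketch("What is 38 to the power of 5?"))  # 79235168
```

Each element in the list above maps to a line of the loop. The model's `Thought:` is the reasoning, and the parsed `Action:` is executed as the action. The tool output is appended as an `Observation:` that the next iteration reasons over.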

It's somewhat similar to what we saw above in the *Tool Calling Loop* @sec-tool_calling_loop. In fact, if you compare the outputs from the first example task in the tool calling loop, you can see that `claude-3-5-sonnet` @sec-claude-3-5-sonnet-ex1 is quite verbose in explaining its reasoning while making tool calls. It's already using some form of chain-of-thought reasoning. However, the OpenAI `gpt-4o-mini` model does not output much in the way of reasoning.

Let's see if we can implement a simple version of ReAct prompting. The goal here is not to be robust as a framework, but rather to illustrate some of the concepts for educational purposes.

```python
from react import REACT_SYSTEM_PROMPT, react_loop
messages = [
    {"role": "system", "content": REACT_SYSTEM_PROMPT},
    {
        "role": "user",
        "content": """
    Recently on Jan 2 2025, Steph Curry made a series of 3 pointers in one game without missing. 
    How many three pointers did he make in total that game?
    How many points did he score in total that game?
    Calculate the total points scored by both teams and then the percentage of those points that were made by Steph Curry alone.

    Also, how old is Lebron James and how many points did he score in his game on Jan 2 2025?
    Take his total points scored that game and raise it to the power of 5. What is the result?
    """,
    },
]
messages_final = react_loop(messages)
```

```
0
1
2
3
4
5
6
7
```

```python
eval_example_one(messages_final[-1]["content"])
```

```
{'correct': True,
 'explanation': "The LLM correctly answered all parts of the question based on the ground truth provided. It accurately reported that Steph Curry made 8 three-pointers and scored 30 points, and that the Warriors scored 139 points against the 76ers' 105 points, making the total points 244 with Curry scoring 12.30% of them. Additionally, it correctly identified LeBron James as being 40 years old, scored 38 points, and calculated 38 raised to the power of 5 as 79,235,168. Thus, all details match the ground truth, and the LLM's response is correct."}
```

# TODO Thoughts

- the idea of agents writing code instead of just emitting JSON, as discussed in the [Hugging Face Blog Post on `smolagents`](https://huggingface.co/blog/smolagents) as well as the paper [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030)

- one cool thing about multi-agent systems is that each agent is modular and can be good at one thing. These agents can also use different models/prompts/providers, etc.
- from CrewAI - role playing, focus, tools, cooperation, guardrails, memory
    - role playing - give the agent a role/persona with system prompt, careful consideration of role and backstory for the agent
    - focus - Too many tools, too much context, etc. Make the agent focus on one thing. Don't let an agent do too much.
    - tools - Too many tools can lead to issues. Choose the tools for each agent carefully and don't give it too many.
    - cooperation - multi-agent cooperation. Just like a human can interact with ChatGPT iteratively to get better outputs, LLM agents can interact with each other, accept feedback, and delegate tasks.
    - guardrails - agents can get stuck in loops, derail, etc. Many frameworks have guardrails built in.
    - memory - the ability for an agent to remember what it has done, what it has learned, etc. Short-term memory is memory during the execution of a task: agents share knowledge, activities, and learnings, including intermediate information before the task is complete. Long-term memory persists after the crew is finished and is stored in a DB. Long-term memory can be used in future tasks, which leads to self-improving agents.
    - process, e.g. `Process.hierarchical`
- CrewAI advice:
    - think like a manager 
    - Cache tool responses across all agents 
    - Tools should be versatile and fault tolerant 
    - I feel like this is simply good SWE practice.

- Do a section on Modal Sandbox
    - but also let's reuse the same sandbox across multiple tool calls. Right now we are starting a new sandbox for each tool call.

- What are the other main prompting techniques for building agents aside from ReAct?


# Resources

[OpenAI Notebook - Prerequisite to Swarm - Orchestrating Agents: Routines and Handoffs](https://github.com/openai/openai-cookbook/blob/main/examples/Orchestrating_agents.ipynb)

[Anthropic Blog - Building effective agents](https://www.anthropic.com/research/building-effective-agents)

[AI Engineering Book - By Chip Huyen](https://learning.oreilly.com/library/view/ai-engineering/9781098166298/)

[Hugging Face Blog Post - Introducing smolagents, a simple library to build agents](https://huggingface.co/blog/smolagents)

[litellm - GitHub](https://github.com/BerriAI/litellm)

[Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030)

[Hugging Face Collection of Papers on Agents](https://huggingface.co/collections/m-ric/agents-65ba776fbd9e29f771c07d4e)

[Deep Learning AI Course - Multi AI Agent Systems with crewAI](https://www.deeplearning.ai/short-courses/multi-ai-agent-systems-with-crewai/)

[Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet](https://www.anthropic.com/research/swe-bench-sonnet)

[Building Effective Agents Cookbook](https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents)

[deeplearning.ai - The Batch - Issue 281 - Recap of 2024 - with some agentic thoughts](https://www.deeplearning.ai/the-batch/issue-281/)

[LangChain Blog Post - What is an Agent?](https://blog.langchain.dev/what-is-an-agent/)

[Lindy](https://www.lindy.ai/)


[AWS: What is an Agent?](https://aws.amazon.com/what-is/ai-agents/)

[Chapter 6 Agents from AI Engineering Book by Chip Huyen](https://learning.oreilly.com/library/view/ai-engineering/9781098166298/ch06.html#ch06_agents_1730157386572111)

[tweet from Andrew Ng on AI Agent Spectrum](https://x.com/AndrewYNg/status/1801295202788983136)

[Nathan Lambert Blog Post on the AI Agent Spectrum](https://www.interconnects.ai/p/the-ai-agent-spectrum)

[LangChain Academy - AI Agents with LangGraph](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/56295530-getting-set-up-video-guide)

[Deep Learning AI Course - LangGraph AI Agents](https://www.deeplearning.ai/short-courses/ai-agents-in-langgraph/)

[A simple Python implementation of the ReAct pattern for LLMs](https://til.simonwillison.net/llms/python-react-pattern)

[AI Agents That Matter](https://agents.cs.princeton.edu/)

[Demystifying AI Agents: A Guide for Beginners](https://www.mongodb.com/resources/basics/artificial-intelligence/ai-agents)

[ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/pdf/2210.03629)

[Chip Huyen Blog post on AI Agents](https://x.com/chipro/status/1876681640505901266)

[321 real-world gen AI use cases from the world's leading organizations](https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders)

[LLM Agents MOOC YouTube Playlist by Berkeley RDI Center on Decentralization & AI](https://www.youtube.com/playlist?list=PLS01nW3RtgopsNLeM936V4TNSsvvVglLc)

[Which AI Agent framework should i use? (CrewAI, Langgraph, Majestic-one and pure code)](https://medium.com/@aydinKerem/which-ai-agent-framework-i-should-use-crewai-langgraph-majestic-one-and-pure-code-e16a6e4d9252)

[crewAI](https://github.com/crewAIInc/crewAI)

[Langgraph](https://github.com/langchain-ai/langgraph)

[Smolagents](https://github.com/huggingface/smolagents)

[autogen](https://github.com/microsoft/autogen)

[Amazon Bedrock Agents](https://aws.amazon.com/bedrock/agents/)
