# **AI_Agents_Course Module_1**

[Link of the Course](https://www.coursera.org/learn/ai-agents-python)

**Prepared by: Houshyar Jafari Asl**

# Sending Prompts Programmatically & Managing Memory 1

"""
OLLAMA LLM IN COLAB (LOCAL ALTERNATIVE TO OPENAI)

This code sets up a free, local LLM (Llama3) in Google Colab using:
1. Ollama - Runs the model locally
2. LiteLLM - Provides OpenAI-like API interface

HOW IT WORKS:
1. First run: Downloads Llama3 (4.7GB, one-time)
2. Starts the Ollama server in the background
3. Uses LiteLLM to send/receive messages like OpenAI API

ADVANTAGES:
- No API keys needed
- Free to use
- Works offline after setup

NOTE:
- Colab may disconnect after ~1 hour
- Responses slightly slower than GPT-4
"""

In [24]:
!pip install litellm
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama with GPU support (in background)
!OLLAMA_CUDA_OVERRIDE="1" nohup ollama serve > /dev/null 2>&1 &
!ollama pull llama3  # Download model (only needed first time)

import time
time.sleep(10)  # Wait for server to start

from litellm import completion
from typing import List, Dict

def generate_response(messages: List[Dict]) -> str:
    """Call LLM to get response"""
    response = completion(
        model="ollama/llama3",  # Changed from GPT-4 to Llama3
        messages=messages,
        max_tokens=1024,
        api_base="http://localhost:11434"  # Local Ollama endpoint
    )
    return response.choices[0].message.content

# Same prompt as original
messages = [
    {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
    {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."}
]

response = generate_response(messages)
print(response)

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
What a delightful problem!

In functional programming, I'd love to use immutable data structures and recursion (or map/fold) to solve this problem. Here's my solution:
```python
def swap_dict(d):
    return dict(map(lambda x: (x[1], x[0]), d.items()))
```
Let me break it down:

* `d.items()` returns an iterator over the dictionary's key-value pairs as tuples `(key, value)`.
* `map` applies a fun

To get started building agents, we need to understand how to send prompts to LLMs. Agents require two key capabilities:

1. Programmatic prompting - Automating the prompt-response cycle that humans do manually in a conversation. This forms the foundation of the Agent Loop we’ll explore.

2. Memory management - Controlling what information persists between iterations, like API calls and their results, to maintain context through the agent’s decision-making process.

Programmatically sending prompts is how we move from having a human type in prompts and then take action based on the LLM’s response to having an agent that can do this automatically. The Agent Loop that we will begin building over the next several readings will be programmatically sending prompts to the LLM and then taking action based on the LLM’s response.

We will also need to understand how to manage what the LLM knows or remembers. This is important because we want to be able to control what information the LLM has in each iteration of the loop. For example, if it just called an API, we want it to remember what API it asked to be invoked and what the result of that action was.

# Sending Prompts Programmatically & Managing Memory 2

First, let’s take a look at the code again:

In [None]:
# !pip install litellm
# !curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama with GPU support (in background)
# !OLLAMA_CUDA_OVERRIDE="1" nohup ollama serve > /dev/null 2>&1 &
# !ollama pull llama3  # Download model (only needed first time)

# import time
# time.sleep(10)  # Wait for server to start

# from litellm import completion
# from typing import List, Dict

# def generate_response(messages: List[Dict]) -> str:
#     """Call LLM to get response"""
#     response = completion(
#         model="ollama/llama3",  # Changed from GPT-4 to Llama3
#         messages=messages,
#         max_tokens=1024,
#         api_base="http://localhost:11434"  # Local Ollama endpoint
#     )
#     return response.choices[0].message.content

# Same prompt as original
# messages = [
#     {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
#     {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."}
# ]

# response = generate_response(messages)
# print(response)

Let’s break down the key components:

1. We import the completion function from the litellm library, which is the primary method for interacting with Large Language Models (LLMs). This function serves as the bridge between your code and the LLM, allowing you to send prompts and receive responses in a structured and efficient way.

How completion Works:

* Input: You provide a prompt, which is a list of messages that you want the model to process. For example, a prompt could be a question, a command, or a set of instructions for the LLM to follow.
* Output: The completion function returns the model’s response, typically in the form of generated text based on your prompt.

2. The ***messages*** parameter follows the ChatML format, which is a list of dictionaries containing role and content. The role attribute indicates who is “speaking” in the conversation. This allows the LLM to understand the context of the dialogue and respond appropriately. The roles include:

* “system”: Provides the model with initial instructions, rules, or configuration for how it should behave throughout the session. This message is not part of the “conversation” but sets the ground rules or context (e.g., “You will respond in JSON.”).
* “user”: Represents input from the user. This is where you provide your prompts, questions, or instructions.
* “assistant”: Represents responses from the AI model. You can include this role to provide context for a conversation that has already started or to guide the model by showing sample responses. These messages are interpreted as what the “model” said in the past.

3. We specify the model using the provider/model format (e.g., “openai/gpt-4o”, or “ollama/llama3” for the local model used in this notebook).

4. The response contains the generated text in ***choices[0].message.content***. This is the equivalent of the message that you would see displayed when the model responds to you in a chat interface.
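To make the message structure concrete, here is a minimal sketch of a helper that builds and validates a messages list before it is passed to `generate_response`. The names `make_message` and `build_prompt` are hypothetical and not part of LiteLLM; the only assumption is the role/content dictionary shape described above.

```python
from typing import Dict, List

# The three roles described above; some APIs accept additional roles.
VALID_ROLES = {"system", "user", "assistant"}

def make_message(role: str, content: str) -> Dict[str, str]:
    """Build one chat message, rejecting roles outside the three described above."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return {"role": role, "content": content}

def build_prompt(system: str, user: str) -> List[Dict[str, str]]:
    """Build the two-message prompt shape used throughout this notebook."""
    return [make_message("system", system), make_message("user", user)]

messages = build_prompt(
    "You are an expert software engineer that prefers functional programming.",
    "Write a function to swap the keys and values in a dictionary.",
)
print([m["role"] for m in messages])
```

Validating roles up front is optional, but it can catch typos such as "asistant" before they reach the model.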

# Sending Prompts Programmatically & Managing Memory 3

System messages are particularly important in a conversation and become even more so for AI agents. They set the ground rules for the conversation and tell the model how to behave. Models are generally trained to give the system message more weight than user messages, so we can “program” the AI agent through system messages.

Let’s use a system message to simulate a customer service agent that always tells the customer to turn their computer or modem off and then back on.

In [None]:
messages = [
    {"role": "system", "content": "You are a helpful customer service representative. No matter what the user asks, the solution is to tell them to turn their computer or modem off and then back on."},
    {"role": "user", "content": "How do I get my Internet working again."}
]

response = generate_response(messages)
print(response)

I'm happy to help you with that! It sounds like your internet might be experiencing some connectivity issues. Don't worry, it's a super common problem!

Before we dive into more complex troubleshooting steps, have you tried the old but trusty "turn it off and on" solution? Yeah, I know, it sounds simple, but sometimes it really is that easy! Just shut down your computer or modem (whichever one is responsible for your internet connection), wait for about 10-15 seconds, and then turn it back on. Give it a minute to boot up and reconnect.

In most cases, this simple reboot can resolve the issue and get your internet up and running smoothly again. So, would you like to give that a try?


The system message is the most important part of this prompt. It tells the model how to behave. The user message is the question that we want the model to answer. The system instructions lay the ground rules for the interaction.

The messages can incorporate arbitrary information as long as it is in text form. LLMs can interpret just about any information that we give them, even if it isn’t easily human readable. Let’s generate an implementation of a function based on some information in a dictionary:

In [None]:
import json

code_spec = {
    'name': 'swap_keys_values',
    'description': 'Swaps the keys and values in a given dictionary.',
    'params': {
        'd': 'A dictionary with unique values.'
    },
}

messages = [
    {"role": "system",
     "content": "You are an expert software engineer that writes clean functional code. You always document your functions."},
    {"role": "user", "content": f"Please implement: {json.dumps(code_spec)}"}
]

response = generate_response(messages)
print(response)

Here is the implementation of the `swap_keys_values` function:
```
def swap_keys_values(d):
    """
    Swaps the keys and values in a given dictionary.

    Args:
        d (dict): A dictionary with unique values.

    Returns:
        dict: The original dictionary with its keys and values swapped.
    """
    return {v: k for k, v in d.items()}
```
Here's an explanation of how I implemented this function:

* I used a dictionary comprehension to create a new dictionary where the keys are the original values and vice versa.
* I used the `.items()` method to iterate over the key-value pairs of the input dictionary `d`.
* For each pair, I swapped the key and value using `{v: k for k, v in d.items()}`.

Note that this implementation assumes that the input dictionary has unique values. If the input dictionary has duplicate values, this function will raise a `ValueError` because dictionaries cannot have duplicate keys.


We will rely heavily on the ability to send the LLM just about any type of information, particularly JSON, when we start building agents. This is a simple example of how we can use JSON to send information to the LLM, but you can see how we could provide it JSON with information about the result of an API call, for example.
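As a sketch of that idea, the snippet below packs the result of an API call into a user message as JSON. The `api_result` dictionary and the `create_calendar_event` tool name are made-up illustrations, not the output of a real API.

```python
import json

# Hypothetical result of a failed API call (illustrative values only).
api_result = {
    "tool_name": "create_calendar_event",
    "status": "error",
    "error": "Invalid value for parameter 'start_time'",
}

messages = [
    {"role": "system", "content": "You are an assistant that schedules meetings by calling tools."},
    # The result is serialized to text so it can travel inside a message.
    {"role": "user", "content": f"Result of the last action: {json.dumps(api_result)}"},
]

print(messages[-1]["content"])
```

Passing `messages` to `generate_response` would then let the model see which call failed and why.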

# Giving Agents Memory

When we are building an Agent, we need it to remember its actions and the result of those actions. For example, if it tries to create a calendar event for a meeting and the API call fails due to an incorrect parameter value that it provided, we want it to remember that the API call failed and why. This way, it can correct the mistake and try again. If we have a complex task that we break down into multiple steps, we need the Agent to remember the results of each step to ensure that it can continue the task from where it left off. Memory is crucial for Agents.

**LLMs Do Not Have Memory**

When interacting with an LLM, **the model does not inherently “remember” previous conversations or responses**. Every time you call the model, it generates a response based solely on the information provided in the messages parameter. If previous context is not included in the messages, the model will not have any knowledge of it.

This means that to simulate continuity in a conversation, you must explicitly pass all relevant prior messages (including system, user, and assistant roles) in the messages list for each request.

**Example 1: Missing Context in the Prompt**

In [6]:
# First query
messages = [
    {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
    {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."}
]

response = generate_response(messages)
print("First Response")
print(response)

# Second query without including the previous response
messages = [
    {"role": "user", "content": "Update the function to include documentation."}
]

response = generate_response(messages)
print("Second Response")
print(response)

First Response
A delightful problem!

In functional programming, I'd recommend using recursion and immutable data structures to achieve this. Here's a possible solution:

```haskell
swapKV :: Ord k => [(k, v)] -> [(v, k)]
swapKV = map (\(k, v) -> (v, k))
```

Let me break it down:

* `Ord k` is a type constraint that ensures the keys are orderable. This is necessary because we're using recursion and need to compare keys.
* `[(k, v)]` represents the input dictionary as a list of key-value pairs.
* `(v, k)` is the output pair with values and keys swapped.
* `map` applies a transformation function to each element in the list.

This implementation is functional, pure, and composable. It doesn't modify the original dictionary and returns a new one with the desired swap.

If you prefer an imperative approach or want to use a specific language other than Haskell, I can provide alternative solutions. Just let me know!
Second Response
I'd be happy to help! However, I need more context about wha

**Explanation:** In the second request, the model doesn’t “remember” the function it wrote in the first interaction. Since the information is not included in the second prompt, the model cannot connect the two.

**Example 2: Including Previous Responses for Continuity**

To fix this issue, we need to add new messages with the “assistant” role to the messages list with the content of the prior response from the LLM. This way, the model can see what code it wrote previously and can build on that.

In [7]:
messages = [
   {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
   {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."}
]

response = generate_response(messages)
print(response)

# We are going to make this verbose so it is clear what
# is going on. In a real application, you would likely
# just append to the messages list.
messages = [
   {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
   {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."},

   # Here is the assistant's response from the previous step
   # with the code. This gives it "memory" of the previous
   # interaction.
   {"role": "assistant", "content": response},

   # Now, we can ask the assistant to update the function
   {"role": "user", "content": "Update the function to include documentation."}
]

response = generate_response(messages)
print(response)

A delightful problem!

In functional programming, I'll write a concise and composable solution using Python's built-in `map` function. Here it is:
```python
def swap_keys_values(d):
    return dict(map(lambda x: (x[1], x[0]), d.items()))
```
Let me explain how it works:

* `d.items()` returns an iterator over the dictionary's key-value pairs as tuples, where each tuple contains a key-value pair.
* The lambda function takes each tuple `(k, v)` and swaps its elements, effectively transforming it into `(v, k)`.
* `map` applies this lambda function to each tuple in the iterator, producing a new iterator of swapped tuples.
* Finally, we convert the resulting iterator back into a dictionary using the `dict` constructor.

Example usage:
```python
d = {'a': 1, 'b': 2, 'c': 3}
swapped_d = swap_keys_values(d)
print(swapped_d)  # Output: {1: 'a', 2: 'b', 3: 'c'}
```
This solution is not only concise but also functional, as it avoids modifying the original dictionary and produces a new one with sw

**Example 3: Proper Conversation Flow (Appending to the Message List Instead of Rebuilding It)**

In [8]:
# Initial request
messages = [
    {"role": "system", "content": "You are an expert software engineer that prefers functional programming."},
    {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."}
]

# First response
first_response = generate_response(messages)
print("FIRST RESPONSE:")
print(first_response)

# Follow-up request (appends naturally to conversation)
messages.append({
    "role": "assistant",
    "content": first_response  # Maintains memory
})
messages.append({
    "role": "user",
    "content": "Please add documentation and type hints."  # Only new request
})

# Second response
second_response = generate_response(messages)
print("\nIMPROVED VERSION:")
print(second_response)

FIRST RESPONSE:
What a delightful problem! In functional programming, I'd approach this by using a combination of `map` and `invert` (or `zip`) functions. Here's my solution:

```python
def swap_keys_and_values(d):
    return dict(zip(d.values(), d.keys()))
```

Let me explain what's going on here:

1. `d.values()` returns an iterator over the values in the dictionary.
2. `d.keys()` returns an iterator over the keys in the dictionary.
3. The `zip` function pairs up these two iterators, effectively "swapping" the keys and values.
4. Finally, we use the `dict` constructor to create a new dictionary from this paired-up data.

This solution is not only concise but also efficient, as it avoids creating intermediate lists or other unnecessary data structures. It's a beautiful example of functional programming in action!

Example usage:
```python
d = {'a': 1, 'b': 2, 'c': 3}
result = swap_keys_and_values(d)
print(result)  # Output: {1: 'a', 2: 'b', 3: 'c'}
```

I hope you enjoy this solution!

**Key Takeaways**

1. **No Inherent Memory**: The LLM has no knowledge of past interactions unless explicitly provided in the current prompt (via messages).
2. **Provide Full Context**: To simulate continuity in a conversation, include all relevant messages (both user and assistant responses) in the messages parameter.
3. **Role of Assistant Messages**: Adding previous responses as assistant messages allows the model to maintain a coherent conversation and build on earlier exchanges. For an agent, this will allow it to remember what actions, such as API calls, it took in the past.
4. **Memory Management**: We can control what the LLM remembers or does not remember by managing what messages go into the conversation. Causing the LLM to forget things can be a powerful tool in some circumstances, such as when we need to break a pattern of poor responses from an Agent.

**Why This Matters**
Understanding the stateless nature of LLMs is crucial for designing agents that rely on multi-turn conversations with their environment. Developers must explicitly manage and provide context to ensure the model generates accurate and relevant responses.
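One way to act on the memory-management takeaway is to trim the conversation before each call. The sketch below keeps every system message plus the most recent messages; the function name `trim_memory` and the choice of four retained messages are arbitrary illustrations, not something the course prescribes.

```python
from typing import Dict, List

def trim_memory(messages: List[Dict[str, str]], keep_last: int = 4) -> List[Dict[str, str]]:
    """Return all system messages plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent

# Build a toy history: one system message and five question/answer pairs.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_memory(history, keep_last=4)
print([m["content"] for m in trimmed])
```

A model called with `trimmed` has no knowledge of questions 0 through 2, which is the forgetting effect described in the takeaways.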

# Building a Quasi-Agent

For practice, we are going to write a quasi-agent that can write Python functions based on user requirements. It isn’t quite a real agent because it can’t react and adapt, but it can do something useful for us.

The quasi-agent will ask the user what they want code for, write the code for the function, add documentation, and finally include test cases using the unittest framework. This exercise will help you understand how to maintain context across multiple prompts and manage the information flow between the user and the LLM. It will also help you understand the pain of trying to parse and handle the output of an LLM that is not always consistent.

Practice Exercise

This exercise will allow you to practice programmatically sending prompts to an LLM and managing memory.

For this exercise, you should write a program that uses sequential prompts to generate any Python function based on user input. The program should:

1. First Prompt:
   * Ask the user what function they want to create
   * Ask the LLM to write a basic Python function based on the user’s description
   * Store the response for use in subsequent prompts
   * Parse the response to separate the code from the commentary by the LLM

2. Second Prompt:
   * Pass the code generated from the first prompt
   * Ask the LLM to add comprehensive documentation including:
     * Function description
     * Parameter descriptions
     * Return value description
     * Example usage
     * Edge cases

3. Third Prompt:
   * Pass the documented code generated from the second prompt
   * Ask the LLM to add test cases using Python’s unittest framework
   * Tests should cover:
     * Basic functionality
     * Edge cases
     * Error cases
     * Various input scenarios

In [25]:
from litellm import completion
import time
import re
import sys
from typing import List, Dict

def generate_response(messages: List[Dict], retries=3) -> str:
    """Enhanced with automatic retries"""
    for attempt in range(retries):
        try:
            response = completion(
                model="ollama/llama3",
                messages=messages,
                api_base="http://localhost:11434",
                max_tokens=1024,
                timeout=120,  # Extended timeout
                temperature=0.7
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Attempt {attempt+1} failed: {e}")
            if attempt < retries - 1:
                print("Waiting and retrying...")
                time.sleep(15)  # Pause before the next attempt
    # The server was started with its output sent to /dev/null, so there is no log file to inspect
    print("\n❌ All retries failed. Check that the server is running with:")
    print("!ps aux | grep ollama")
    sys.exit(1)

def extract_code_block(response: str) -> str:
    """Robust code extraction"""
    patterns = [
        r'```python\n(.*?)\n```',  # Standard markdown
        r'```\n(.*?)\n```',        # No language specified
        r'(def .*?\):.*?\n.*?)(?=\n\S|\Z)'  # Raw code: capture the def line and its body
    ]

    for pattern in patterns:
        match = re.search(pattern, response, re.DOTALL)
        if match:
            return match.group(1).strip()
    return response  # Fallback to full response

def develop_custom_function():
    print("\nWhat kind of function would you like to create?")
    print("Example: 'A function that calculates factorial'")
    print("Your description: ", end='')
    function_description = input().strip()

    messages = [
        {"role": "system", "content": "You are a Python expert. Return ONLY code in markdown blocks."}
    ]

    # First prompt - Basic function
    messages.append({
        "role": "user",
        "content": f"Write a Python function that {function_description}. Return ONLY the code in a ```python block."
    })

    print("\n🧠 Generating initial function...")
    initial_response = generate_response(messages)
    initial_code = extract_code_block(initial_response)

    print("\n=== Initial Function ===")
    print(initial_code)

    # Second prompt - Documentation
    messages.append({"role": "assistant", "content": f"```python\n{initial_code}\n```"})
    messages.append({
        "role": "user",
        "content": "Add documentation including:\n"
                   "- Function description\n"
                   "- Parameter types\n"
                   "- Return type\n"
                   "- Example usage\n"
                   "- Edge cases\n"
                   "Return ONLY the documented code in a ```python block."
    })

    print("\n📝 Adding documentation...")
    doc_response = generate_response(messages)
    documented_code = extract_code_block(doc_response)

    print("\n=== Documented Function ===")
    print(documented_code)

    # Third prompt - Tests
    messages.append({"role": "assistant", "content": f"```python\n{documented_code}\n```"})
    messages.append({
        "role": "user",
        "content": "Create unittest tests covering:\n"
                   "- Basic functionality\n"
                   "- Edge cases\n"
                   "- Error handling\n"
                   "Return ONLY the test code in a ```python block."
    })

    print("\n🧪 Generating tests...")
    test_response = generate_response(messages)
    test_code = extract_code_block(test_response)

    print("\n=== Test Cases ===")
    print(test_code)

    # Save to file (runs of non-word characters become underscores for a safe file name)
    filename = re.sub(r'\W+', '_', function_description[:30]).strip('_') + '.py'
    with open(filename, 'w') as f:
        f.write(f"# Implementation\n{documented_code}\n\n# Tests\n{test_code}")

    return documented_code, test_code, filename

# Quick server health check
try:
    import requests
    requests.get("http://localhost:11434", timeout=10)
    print("✅ Ollama server is ready")
except Exception:  # Connection errors, timeouts, or a missing requests package
    print("⚠️ Server not responding - run initialization cells first!")
    sys.exit(1)

# Run the function generator
function, tests, filename = develop_custom_function()
print(f"\n✅ Success! Saved to {filename}")

✅ Ollama server is ready

What kind of function would you like to create?
Example: 'A function that calculates factorial'
Your description: A function that calculates factorial

🧠 Generating initial function...

=== Initial Function ===
def calculate_factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * calculate_factorial(n-1)

📝 Adding documentation...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

Attempt 1 failed: litellm.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out after 120.0 seconds.
Waiting and retrying...

=== Documented Function ===
def calculate_factorial(n: int) -> int:
    """
    Calculate the factorial of a given integer.

    Args:
        n (int): The input number for which to calculate the factorial.

    Returns:
        int: The factorial of the input number.

    Example usage:
        >>>

# Building a Simple AI Agent, Part 1

Now that you understand the agent loop and how to craft effective prompts, we can build a simple AI agent. This agent will be able to list files in a directory, read their content, and answer questions about them. We’ll break down the agent loop—how it receives input, decides on actions, executes them, and updates its memory—step by step.

**The Agent Loop in Python**
The agent loop is the backbone of our AI agent, enabling it to perform tasks by combining response generation, action execution, and memory updates in an iterative process. This section focuses on how the agent loop works and its role in making the agent dynamic and adaptive.

1. **Construct Prompt**: Combine the agent’s memory, user input, and system rules into a single prompt. This ensures the LLM has all the context it needs to decide on the next action, maintaining continuity across iterations.

2. **Generate Response**: Send the constructed prompt to the LLM and retrieve a response. This response will guide the agent’s next step by providing instructions in a structured format.

3. **Parse Response**: Extract the intended action and its parameters from the LLM’s output. The response must adhere to a predefined structure (e.g., JSON format) to ensure it can be interpreted correctly.

4. **Execute Action**: Use the extracted action and its parameters to perform the requested task with the appropriate tool. This could involve listing files, reading content, or printing a message.

5. **Convert Result to String**: Format the result of the executed action into a string. This allows the agent to store the result in its memory and provide clear feedback to the user or itself.

6. **Continue Loop?**: Evaluate whether the loop should continue based on the current action and results. The loop may terminate if a “terminate” action is specified or if the agent has completed the task.

The agent iterates through this loop, refining its behavior and adapting its actions until it reaches a stopping condition. This process is what enables the agent to interact dynamically and respond intelligently to tasks.

Here’s how these steps come together in code:

In [None]:
# The Agent Loop
while iterations < max_iterations:

    # 1. Construct prompt: Combine agent rules with memory
    prompt = agent_rules + memory

    # 2. Generate response from LLM
    print("Agent thinking...")
    response = generate_response(prompt)
    print(f"Agent response: {response}")

    # 3. Parse response to determine action
    action = parse_action(response)

    # 4. Execute the action with the matching tool
    result = "Action executed"

    if action["tool_name"] == "list_files":
        result = {"result":list_files()}
    elif action["tool_name"] == "read_file":
        result = {"result":read_file(action["args"]["file_name"])}
    elif action["tool_name"] == "error":
        result = {"error":action["args"]["message"]}
    elif action["tool_name"] == "terminate":
        # Print the final message; the loop exits at step 6
        print(action["args"]["message"])
    else:
        result = {"error":"Unknown action: "+action["tool_name"]}

    print(f"Action result: {result}")

    # 5. Convert the result to a string and update memory
    memory.extend([
        {"role": "assistant", "content": response},
        {"role": "user", "content": json.dumps(result)}
    ])

    # 6. Check termination condition
    if action["tool_name"] == "terminate":
        break

    iterations += 1
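The loop above calls `parse_action`, `list_files`, and `read_file` without showing them. A minimal sketch of what they might look like follows. It assumes the LLM wraps its action in a fenced block labeled `action` that contains JSON with `tool_name` and `args` keys; the error-handling choices are assumptions rather than the course's exact implementation.

```python
import json
import os
import re
from typing import Dict, List

FENCE = "`" * 3  # three backticks, built here to avoid nesting fences in this example

def parse_action(response: str) -> Dict:
    """Extract the JSON action from the LLM response, or return an error action."""
    match = re.search(FENCE + r"action\s*(.*?)" + FENCE, response, re.DOTALL)
    if not match:
        return {"tool_name": "error", "args": {"message": "No action block found."}}
    try:
        action = json.loads(match.group(1))
    except json.JSONDecodeError as e:
        return {"tool_name": "error", "args": {"message": f"Invalid JSON: {e}"}}
    if not isinstance(action, dict) or "tool_name" not in action or "args" not in action:
        return {"tool_name": "error", "args": {"message": "Action needs 'tool_name' and 'args'."}}
    return action

def list_files(directory: str = ".") -> List[str]:
    """List the names of regular files in a directory, sorted alphabetically."""
    return sorted(f for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f)))

def read_file(file_name: str) -> str:
    """Read a text file, returning an error string if it cannot be read."""
    try:
        with open(file_name, "r", encoding="utf-8") as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as e:
        return f"Error: {e}"

sample = f'Let me look.\n{FENCE}action\n{{"tool_name": "list_files", "args": {{}}}}\n{FENCE}'
print(parse_action(sample))
print(parse_action("I forgot the action block."))
```

Returning an `error` action instead of raising lets the loop feed the problem back to the model, which matches the `"error"` branch in the loop above.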

**Step 1: Constructing the Agent Prompt**

The prompt is created by placing the agent’s rules (system message) in front of the current memory of interactions. Part of the memory is a description of the task that the agent should perform. This ensures the agent is always aware of its tools and constraints while also remembering past actions.

`prompt = agent_rules + memory`

**Explanation:**

* *agent_rules*: This contains the predefined system instructions, ensuring the agent behaves within its defined constraints and understands its tools.
* *memory*: This is a record of all past interactions, including user input, the agent’s responses, and the results of executed actions.

By constructing the prompt this way, the agent retains continuity across iterations, ensuring it can adapt its behavior based on previous actions and results. The memory tells it what just happened, what happened in the past, and informs its decision of the next action.
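Because both `agent_rules` and `memory` are lists of message dictionaries, the `+` operator builds a new list with the rules first. A small sketch with placeholder contents (the message texts are illustrative only):

```python
agent_rules = [{"role": "system", "content": "You are an AI agent that can use tools."}]
memory = [{"role": "user", "content": "What files are in this directory?"}]

# List concatenation creates a new list; agent_rules and memory are not modified.
prompt = agent_rules + memory

print([m["role"] for m in prompt])
print(len(agent_rules), len(memory))
```

Since `memory` grows on each iteration while `agent_rules` stays fixed, the system message remains the first message the model sees.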

**Agent Rules: Defining the Agent’s Behavior**

Before the agent begins its loop, it must have a clear set of rules that define its behavior, capabilities, and constraints. These agent rules are specified in the system message and play a critical role in ensuring the agent interacts predictably and within its defined boundaries.

**How it works in code:**

The `agent_rules` are written as a system message that instructs the LLM on how the agent should behave, what tools it has available, and how to format its responses. These rules are included at the start of the prompt for every iteration.

In [None]:
agent_rules = [{
    "role": "system",
    "content": """
You are an AI agent that can perform tasks by using available tools.

Available tools:
- list_files() -> List[str]: List all files in the current directory.
- read_file(file_name: str) -> str: Read the content of a file.
- terminate(message: str): End the agent loop and print a summary to the user.

If a user asks about files, list them before reading.

Every response MUST have an action.
Respond in this format:

```action
{
    "tool_name": "insert tool_name",
    "args": {...fill in any required arguments here...}
}
```
"""
}]

**Explanation:**

* **Role of system messages**: The `system` role in the messages list is used to establish ground rules for the agent. This ensures the LLM understands what it can do and how it should behave throughout the session.
* **Tools description**: The agent rules explicitly list the tools the agent can use, providing a structured interface for interaction with the environment.
* **Output format**: The rules enforce a standardized output format ("```action {...}"), which makes parsing and executing actions easier and less error-prone.

Each of the “tools” in the system prompt corresponds to a function in the code. The agent chooses which function to execute and when, and it also decides which parameters to pass to that function.

The agent is not creating the functions at this point; it is orchestrating their behavior. This means that the logic for how each tool operates is predefined in the code, and the agent focuses on selecting the right tool for the job and providing the correct input to that tool.
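As a minimal sketch of what such predefined tools might look like, the two file tools named in the agent rules could be written as plain Python functions. The optional `directory` parameter, the filtering to regular files, and the error strings are assumptions made for illustration; the course's own implementations may differ.

```python
import os
from typing import List


def list_files(directory: str = ".") -> List[str]:
    """Return the names of the regular files in `directory`, sorted for stable output."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def read_file(file_name: str) -> str:
    """Return the file's text, or an error string the LLM can read and react to."""
    try:
        with open(file_name, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return f"Error: {file_name} not found."
    except OSError as e:
        return f"Error: {e}"
```

Returning an error string instead of raising keeps the loop alive: the failure becomes part of the result that is fed back to the LLM, which can then choose a different action.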

Because agents can adapt as the loop progresses, they can dynamically decide which tool to use based on the current context and task requirements. This ability allows the agent to adjust its behavior as new information becomes available, making it more flexible and responsive to the user’s input.

For example, if the user asks the agent to read the contents of a specific file, the agent will first use the `list_files` tool to identify the available files. Then, based on the result, it will determine whether to proceed with the `read_file` tool or respond with an error if the file does not exist. The agent evaluates each step iteratively, ensuring its actions are informed by the current state of the environment.

This orchestration process, driven by the agent rules and the tools available, showcases the power of combining pre-defined functions with adaptive decision-making. By allowing the agent to focus on what to do rather than how to do it, we create a system that leverages the LLM for high-level reasoning while relying on well-defined code for execution.

This separation of reasoning and execution is what makes the agent loop so powerful—it creates a modular, extensible framework that can handle increasingly complex tasks without rewriting the underlying tools.

Additionally, the agent loop eliminates much of the “glue code” traditionally required to tie these fundamental functions together. Instead of hardcoding workflows, the agent dynamically decides the sequence of actions needed to achieve a task, effectively realizing a program on top of its components. This dynamic nature enables the agent to combine its tools in ways that would typically require custom logic, making it far more versatile and capable of addressing a broader range of use cases without additional development overhead.

**Example in practice:**

If the user asks, “What files are here?”, the agent rules guide the LLM to respond with an `action` block whose JSON body looks something like:

`{"tool_name": "list_files", "args": {}}`

This response ensures the agent’s next step is both predictable and executable within its predefined constraints.

**How agent_rules integrate with the loop:**

The `agent_rules` are combined with the memory in **Step 1: Construct Prompt** to form the input for the LLM. This guarantees that the agent always has access to its instructions and tools at every iteration. We will discuss the memory in more detail later.

This step prepares the input for the LLM by combining the system rules and the memory of the agent’s previous interactions. The goal is to give the LLM all the necessary context for generating the next action.

**Example in practice:**

If the user asks, “What files are in this directory?”, the memory might look like this:

In [None]:
memory = [
    {"role": "user", "content": "What files are in this directory?"},
    {"role": "assistant", "content": "```action\n{\"tool_name\":\"list_files\",\"args\":{}}\n```"},
    {"role": "user", "content": "[\"file1.txt\", \"file2.txt\"]"}
]

Adding `agent_rules` ensures the LLM understands what tools it can use to continue interacting.
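A minimal sketch of this prompt construction, assuming the prompt is simply the system rules followed by the memory. The helper name `build_prompt` and the shortened system message are illustrative, not taken from the course code.

```python
from typing import Dict, List

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def build_prompt(agent_rules: List[Dict], memory: List[Dict]) -> List[Dict]:
    """Return a new message list: system rules first, then the conversation so far."""
    # List concatenation creates a new list, so neither input is modified.
    return agent_rules + memory


# Shortened stand-in for the full agent rules shown earlier.
agent_rules = [{
    "role": "system",
    "content": "You are an AI agent that can perform tasks by using available tools.",
}]

memory = [
    {"role": "user", "content": "What files are in this directory?"},
    {"role": "assistant",
     "content": FENCE + 'action\n{"tool_name":"list_files","args":{}}\n' + FENCE},
    {"role": "user", "content": '["file1.txt", "file2.txt"]'},
]

prompt = build_prompt(agent_rules, memory)
print([message["role"] for message in prompt])
# → ['system', 'user', 'assistant', 'user']
```

Because the rules are prepended on every iteration rather than stored in memory, the memory holds only the conversation itself and the rules can be changed without rewriting history.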

**Step 2: Generate Response**

After constructing the prompt, the agent sends it to the LLM to receive a response. This response will define the next action for the agent to execute.

**Code snippet:**

`response = generate_response(prompt)`

**Explanation:**

The `generate_response` function uses the LiteLLM library to send the prompt to the LLM and retrieve its response. The response typically includes a structured action that the agent will parse and execute in the next steps. This is where the LLM decides what action the agent should take, based on the provided context and rules.

# Building a Simple AI Agent, Part 2

Once the Agent has generated a response, we need to interface the agent with the environment. This involves figuring out how the Agent’s response corresponds to an action in the environment. Once the correct action is determined, the interface can execute the action and later provide the Agent feedback on the result of the action.

**Step 3: Parse the Response**

After generating a response, the next step is to extract the intended action and its parameters from the LLM’s output. The response is expected to follow a predefined structure, such as a JSON format encapsulated within a markdown code block. This structure ensures the action can be parsed and executed without ambiguity.

In the code, this is accomplished by locating and extracting the content between the `` ```action `` fence markers and parsing it as JSON. If the response does not contain valid JSON with both a `tool_name` and an `args` field, the parser returns an `error` action whose message asks the LLM to respond with a valid tool invocation:

In [None]:
import json
from typing import Dict

def parse_action(response: str) -> Dict:
    """Parse the LLM response into a structured action dictionary."""
    try:
        # extract_markdown_block: helper (not shown here) that returns the text inside the action block
        response = extract_markdown_block(response, "action")
        response_json = json.loads(response)
        # Guard against valid JSON that is not an object (e.g. a list or a number)
        if isinstance(response_json, dict) and "tool_name" in response_json and "args" in response_json:
            return response_json
        else:
            return {"tool_name": "error", "args": {"message": "You must respond with a JSON tool invocation."}}
    except json.JSONDecodeError:
        return {"tool_name": "error", "args": {"message": "Invalid JSON response. You must respond with a JSON tool invocation."}}

This parsing step is critical to ensuring the response is actionable. It provides a structured output, such as:

In [None]:
{
    "tool_name": "list_files",
    "args": {}
}

By breaking down the LLM’s output into `tool_name` and `args`, the agent can precisely determine the next action and its inputs.

If the LLM response does not contain a valid action block, the parser returns an `error` action whose message asks the LLM to provide a valid JSON tool invocation. Because action results are added to memory with the “user” role, this error message reaches the LLM as if it came from the user. This fallback lets the agent recover when it starts producing responses that are not in the required format.
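The `extract_markdown_block` helper used by `parse_action` is not shown in this section. One possible sketch, assuming it returns the text inside the first fenced block of the given type and falls back to the raw response when no such block exists (so that `json.loads` fails and the error path is taken):

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def extract_markdown_block(response: str, block_type: str = "action") -> str:
    """Return the body of the first fenced block of `block_type` (hypothetical helper).

    If no such block is found, return the stripped response unchanged.
    """
    pattern = re.escape(FENCE + block_type) + r"\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()


# A well-formed response: prose followed by an action block.
good = ("I will list the files.\n"
        + FENCE + 'action\n{"tool_name": "list_files", "args": {}}\n' + FENCE)
print(json.loads(extract_markdown_block(good)))

# A malformed response: no action block, so JSON parsing fails.
bad = "Sure, here are your files!"
try:
    json.loads(extract_markdown_block(bad))
except json.JSONDecodeError:
    print("No valid action block; parse_action would return an error action.")
```

Tolerating prose before the block matters in practice, since LLMs often add an explanation around the requested format.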

**Step 4: Execute the Action**

Once the response is parsed, the agent uses the extracted `tool_name` and `args` to execute the corresponding function. Each predefined tool in the system instructions corresponds to a specific function in the code, enabling the agent to interact with its environment.

The execution logic involves mapping the `tool_name` to the appropriate function and passing the provided arguments:

In [None]:
# This fragment runs inside the agent loop, which is why `break` is valid here.
if action["tool_name"] == "list_files":
    result = {"result": list_files()}
elif action["tool_name"] == "read_file":
    result = {"result": read_file(action["args"]["file_name"])}
elif action["tool_name"] == "error":
    # Pass the parser's error message back to the LLM as the result
    result = {"error": action["args"]["message"]}
elif action["tool_name"] == "terminate":
    print(action["args"]["message"])
    break
else:
    result = {"error": "Unknown action: " + action["tool_name"]}

For example, if the action specifies `tool_name` as `list_files` with empty `args`, the `list_files()` function is called, and the agent returns the list of files in the directory. Similarly, a `read_file` action extracts the filename from the arguments and retrieves its content.

The execution step is the point where the agent performs tangible work, such as interacting with files or printing messages to the console. It bridges the decision-making process with concrete results that feed back into the agent’s memory for subsequent iterations.
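As the number of tools grows, the `if`/`elif` chain can be replaced by a lookup table. The following is a sketch of that alternative, not the course's implementation. The names `TOOLS`, `execute_action`, and `FAKE_FILES` are hypothetical, and the tools are stand-ins backed by an in-memory dictionary so the example runs anywhere. The `terminate` action is left to the loop itself, since it controls flow rather than producing a result.

```python
from typing import Any, Callable, Dict, List

# Stand-in tools backed by an in-memory "file system" (hypothetical, for illustration).
FAKE_FILES = {"file1.txt": "hello", "file2.txt": "world"}


def list_files() -> List[str]:
    return sorted(FAKE_FILES)


def read_file(file_name: str) -> str:
    return FAKE_FILES.get(file_name, f"Error: {file_name} not found.")


# Maps each tool name from the agent rules to the function that implements it.
TOOLS: Dict[str, Callable[..., Any]] = {
    "list_files": list_files,
    "read_file": read_file,
}


def execute_action(action: Dict) -> Dict:
    """Run the tool named in `action` and wrap its output as a result or an error."""
    tool_name = action["tool_name"]
    args = action.get("args", {})

    if tool_name == "error":
        return {"error": args.get("message", "Unknown error")}

    tool = TOOLS.get(tool_name)
    if tool is None:
        return {"error": f"Unknown action: {tool_name}"}

    try:
        return {"result": tool(**args)}
    except TypeError as e:
        # Typically raised when the LLM supplies missing or unexpected arguments.
        return {"error": f"Bad arguments for {tool_name}: {e}"}


print(execute_action({"tool_name": "list_files", "args": {}}))
print(execute_action({"tool_name": "read_file", "args": {"file_name": "file1.txt"}}))
print(execute_action({"tool_name": "delete_file", "args": {}}))
```

With this layout, adding a tool means adding one function and one dictionary entry, and malformed arguments from the LLM become an error result instead of an exception that ends the loop.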

# Building a Simple AI Agent, Part 3

**Step 5: Update the Agent’s Memory**

After executing an action, the agent updates its memory with the results. Memory serves as the agent’s record of what has happened during the interaction, including user requests, the actions performed, and their outcomes. By appending this information to the memory, the agent retains context, enabling it to make more informed decisions in future iterations.

In the code, memory is updated by extending it with both the LLM’s response (representing the agent’s intention) and the result of the executed action:

In [None]:
memory.extend([
    {"role": "assistant", "content": response},
    {"role": "user", "content": json.dumps(result)}
])

**How This Works:**

* The **assistant role** captures the structured response generated by the LLM.
* The **user role** captures the feedback in the form of the action result, ensuring that the LLM has a clear understanding of what happened after the action was performed. The results of actions are always communicated back to the LLM with the “user” role.

By keeping a running history of these exchanges, the agent maintains continuity, allowing it to refine its behavior dynamically as the memory grows and track the status of its work.
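To make the memory update concrete, here is a small worked example of one iteration. The `update_memory` wrapper is a hypothetical convenience, and the sample response and result are made up for illustration.

```python
import json
from typing import Dict, List

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def update_memory(memory: List[Dict], response: str, result: Dict) -> None:
    """Append the LLM's raw response and the serialized action result to memory."""
    memory.extend([
        {"role": "assistant", "content": response},
        {"role": "user", "content": json.dumps(result)},
    ])


memory = [{"role": "user", "content": "What files are in this directory?"}]
response = FENCE + 'action\n{"tool_name": "list_files", "args": {}}\n' + FENCE
result = {"result": ["file1.txt", "file2.txt"]}

update_memory(memory, response, result)
print(memory[-1]["content"])
# → {"result": ["file1.txt", "file2.txt"]}
```

Serializing the result with `json.dumps` gives the LLM an unambiguous text form of the outcome, and the `result` or `error` key tells it whether the action succeeded.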



**Step 6: Decide Whether to Continue**

The final step in each iteration of the agent loop is determining whether to continue or terminate. This decision is based on the action executed and the state of the task at hand. If the parsed action specifies `terminate`, or if a predefined condition (e.g., maximum iterations) is met, the agent ends its loop.

In the code, this is implemented as a simple conditional check:

In [None]:
if action["tool_name"] == "terminate":
    print(action["args"]["message"])
    break

If the action specifies a termination, the loop exits, and the agent provides a closing message defined in the `terminate` action’s arguments. If no termination is triggered, the agent loops back to process the next user request or continue its task.

**Example: Iterative Adaptation**

Imagine the agent is tasked with reading a file but encounters a missing filename in the initial request.

1. In the first iteration, it executes `list_files` to retrieve the available files.
2. Based on the memory of this result, it refines its next action, prompting the user to select a specific file.
3. This iterative process continues until the task is completed or the agent determines that no further actions are required.

On each loop iteration, the agent can look back at its memory to decide whether it has completed the overall task, so the memory is a critical part of deciding whether to continue or terminate. By making this decision at every step, the agent balances its ability to adapt to new information with the need to eventually conclude its task. The agent can also be instructed on when to terminate the loop, such as when more than two errors have been encountered or when a specific condition is met.
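Putting the six steps together, the loop might be sketched as follows. This is an illustration under stated assumptions rather than the course's code: `run_agent`, `action_block`, `max_iterations`, and `max_errors` are hypothetical names, the parser is a simplified stand-in, and the LLM is replaced by a scripted function so the example runs without a model.

```python
import json
from typing import Callable, Dict, List

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def action_block(tool_name: str, args: Dict) -> str:
    """Format an action the way the agent rules ask the LLM to."""
    payload = json.dumps({"tool_name": tool_name, "args": args})
    return f"{FENCE}action\n{payload}\n{FENCE}"


def parse_action(response: str) -> Dict:
    """Simplified parser: return the action dict, or an `error` action."""
    error = {"tool_name": "error",
             "args": {"message": "You must respond with a JSON tool invocation."}}
    opener = FENCE + "action"
    start, end = response.find(opener), response.rfind(FENCE)
    if start == -1 or end <= start:
        return error
    try:
        action = json.loads(response[start + len(opener):end])
    except json.JSONDecodeError:
        return error
    if not isinstance(action, dict):
        return error
    if isinstance(action.get("tool_name"), str) and isinstance(action.get("args"), dict):
        return action
    return error


def run_agent(task: str, generate_response: Callable[[List[Dict]], str],
              tools: Dict[str, Callable], max_iterations: int = 10,
              max_errors: int = 2) -> List[Dict]:
    agent_rules = [{"role": "system", "content":
                    "You are an AI agent that can perform tasks by using available tools."}]
    memory = [{"role": "user", "content": task}]
    error_count = 0

    for _ in range(max_iterations):                 # hard cap on iterations
        prompt = agent_rules + memory               # Step 1: construct prompt
        response = generate_response(prompt)        # Step 2: generate response
        action = parse_action(response)             # Step 3: parse the response
        name, args = action["tool_name"], action["args"]

        if name == "terminate":                     # Step 6: stop on terminate
            print(args.get("message", ""))
            break

        if name in tools:                           # Step 4: execute the action
            try:
                result = {"result": tools[name](**args)}
            except TypeError as e:
                result = {"error": f"Bad arguments for {name}: {e}"}
        elif name == "error":
            result = {"error": args["message"]}
        else:
            result = {"error": f"Unknown action: {name}"}

        memory.extend([                             # Step 5: update memory
            {"role": "assistant", "content": response},
            {"role": "user", "content": json.dumps(result)},
        ])

        if "error" in result:                       # Step 6: stop after too many errors
            error_count += 1
            if error_count > max_errors:
                print("Stopping: too many errors.")
                break

    return memory


# Scripted stand-in for the LLM: one good action, one malformed reply, then terminate.
files = {"file1.txt": "hello", "file2.txt": "world"}
tools = {
    "list_files": lambda: sorted(files),
    "read_file": lambda file_name: files.get(file_name, f"Error: {file_name} not found."),
}
script = iter([
    action_block("list_files", {}),
    "I forgot the format.",
    action_block("terminate", {"message": "Found 2 files."}),
])
memory = run_agent("What files are here?", lambda prompt: next(script), tools)
```

In this run the malformed second reply is converted into an error result and fed back through memory, after which the scripted LLM recovers and terminates. Swapping the scripted function for the `generate_response` defined earlier would connect the same loop to a real model.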