<a href="https://colab.research.google.com/github/Kishan-Kumar-Zalavadia/AgenticAI_Experiments/blob/main/2_First_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a Simple AI Agent

## *The agent can list files in a given directory, read their content, and answer questions about them.*


> If you wish to see the agent code, it's at the end.



This agent will be able to list files in a directory, read their content, and answer questions about them. We’ll break down the agent loop—how it receives input, decides on actions, executes them, and updates its memory—step by step.


# The Agent Loop in Python

The agent loop is the backbone of our AI agent, enabling it to perform tasks by combining response generation, action execution, and memory updates in an iterative process. This section focuses on how the agent loop works and its role in making the agent dynamic and adaptive.

1. **Construct Prompt**: Combine the agent’s memory, user input, and system rules into a single prompt. This ensures the LLM has all the context it needs to decide on the next action, maintaining continuity across iterations.

2. **Generate Response**: Send the constructed prompt to the LLM and retrieve a response. This response will guide the agent’s next step by providing instructions in a structured format.

3. **Parse Response**: Extract the intended action and its parameters from the LLM’s output. The response must adhere to a predefined structure (e.g., JSON format) to ensure it can be interpreted correctly.

4. **Execute Action**: Use the extracted action and its parameters to perform the requested task with the appropriate tool. This could involve listing files, reading content, or printing a message.

5. **Convert Result to String**: Format the result of the executed action into a string. This allows the agent to store the result in its memory and provide clear feedback to the user or itself.

6. **Continue Loop?**: Evaluate whether the loop should continue based on the current action and results. The loop may terminate if a “terminate” action is specified or if the agent has completed the task.

The agent iterates through this loop, refining its behavior and adapting its actions until it reaches a stopping condition. This process is what enables the agent to interact dynamically and respond intelligently to tasks.

Here’s how these steps come together in code:

```
# The Agent Loop
while iterations < max_iterations:

    # 1. Construct prompt: Combine agent rules with memory
    prompt = agent_rules + memory

    # 2. Generate response from LLM
    print("Agent thinking...")
    response = generate_response(prompt)
    print(f"Agent response: {response}")

    # 3. Parse response to determine action
    action = parse_action(response)

    # 4. Execute the action with the matching tool
    result = "Action executed"

    if action["tool_name"] == "list_files":
        result = {"result":list_files()}
    elif action["tool_name"] == "read_file":
        result = {"result":read_file(action["args"]["file_name"])}
    elif action["tool_name"] == "error":
        result = {"error":action["args"]["message"]}
    elif action["tool_name"] == "terminate":
        print(action["args"]["message"])
        break
    else:
        result = {"error":"Unknown action: "+action["tool_name"]}

    print(f"Action result: {result}")

    # 5. Update memory with response and results
    memory.extend([
        {"role": "assistant", "content": response},
        {"role": "user", "content": json.dumps(result)}
    ])

    # 6. Check termination condition
    #    (defensive only: the terminate branch above already breaks out of the loop)
    if action["tool_name"] == "terminate":
        break

    iterations += 1
```

# **Step 1: Constructing the Agent Prompt**

The prompt is created by prepending the agent’s rules (the system message) to the current memory of interactions. Part of the memory is a description of the task that the agent should perform. This ensures the agent is always aware of its tools and constraints while also remembering past actions.

```
prompt = agent_rules + memory
```

**Explanation:**

- **agent_rules**: This contains the predefined system instructions, ensuring the agent behaves within its defined constraints and understands its tools.
- **memory**: This is a record of all past interactions, including user input, the agent’s responses, and the results of executed actions.


By constructing the prompt this way, the agent retains continuity across iterations, ensuring it can adapt its behavior based on previous actions and results. The memory tells it what just happened, what happened in the past, and informs its decision of the next action.
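A small sketch of what this concatenation produces. The message texts here are placeholders chosen for illustration, not the agent's real rules. Because `+` builds a new list, neither `agent_rules` nor `memory` is modified when the prompt is constructed.

```python
# Illustrative contents only; the real rules and memory are defined in the full code.
agent_rules = [{"role": "system", "content": "You are an AI agent that can use tools."}]
memory = [{"role": "user", "content": "What files are in this directory?"}]

# List concatenation returns a new list: the rules first, then the conversation so far.
prompt = agent_rules + memory

print([message["role"] for message in prompt])
print(len(agent_rules), len(memory))  # both source lists are unchanged
```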

## Agent Rules: Defining the Agent’s Behavior

Before the agent begins its loop, it must have a clear set of rules that define its behavior, capabilities, and constraints. These agent rules are specified in the system message and play a critical role in ensuring the agent interacts predictably and within its defined boundaries.

**How it works in code:**

The agent_rules are written as a system message that instructs the LLM on how the agent should behave, what tools it has available, and how to format its responses. These rules are included at the start of the prompt for every iteration.

````
agent_rules = [{
    "role": "system",
    "content": """
You are an AI agent that can perform tasks by using available tools.

Available tools:
- list_files() -> List[str]: List all files in the current directory.
- read_file(file_name: str) -> str: Read the content of a file.
- terminate(message: str): End the agent loop and print a summary to the user.

If a user asks about files, list them before reading.

Every response MUST have an action.
Respond in this format:

```action
{
    "tool_name": "insert tool_name",
    "args": {...fill in any required arguments here...}
}
```
"""
}]
````

**Explanation:**

- **Role of system messages**: The system role in the messages list is used to establish ground rules for the agent. This ensures the LLM understands what it can do and how it should behave throughout the session.

- **Tools description**: The agent rules explicitly list the tools the agent can use, providing a structured interface for interaction with the environment.

- **Output format**: The rules enforce a standardized output format (a JSON object wrapped in an `action` code block), which makes parsing and executing actions easier and less error-prone.


Each of the “tools” in the system prompt corresponds to a function in the code. The agent chooses which function to execute and when. It also decides which parameters are passed to those functions.

The agent is not creating the functions at this point; it is orchestrating their behavior. This means that the logic for how each tool operates is predefined in the code, and the agent focuses on selecting the right tool for the job and providing the correct input to that tool.

Because agents can adapt as the loop progresses, they can dynamically decide which tool to use based on the current context and task requirements. This ability allows the agent to adjust its behavior as new information becomes available, making it more flexible and responsive to the user’s input.

For example, if the user asks the agent to read the contents of a specific file, the agent will first use the list_files tool to identify the available files. Then, based on the result, it will determine whether to proceed with the read_file tool or respond with an error if the file does not exist. The agent evaluates each step iteratively, ensuring its actions are informed by the current state of the environment.

This orchestration process, driven by the agent rules and the tools available, showcases the power of combining pre-defined functions with adaptive decision-making. By allowing the agent to focus on what to do rather than how to do it, we create a system that leverages the LLM for high-level reasoning while relying on well-defined code for execution.

This separation of reasoning and execution is what makes the agent loop so powerful—it creates a modular, extensible framework that can handle increasingly complex tasks without rewriting the underlying tools.

Additionally, the agent loop eliminates much of the “glue code” traditionally required to tie these fundamental functions together. Instead of hardcoding workflows, the agent dynamically decides the sequence of actions needed to achieve a task, effectively realizing a program on top of its components. This dynamic nature enables the agent to combine its tools in ways that would typically require custom logic, making it far more versatile and capable of addressing a broader range of use cases without additional development overhead.
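One way to picture this separation is a lookup table from tool names to functions. The sketch below is an assumption for illustration, not the notebook's implementation (the loop in this notebook uses an `if`/`elif` chain): `TOOLS` and `execute` are hypothetical names, and the two tools are stand-ins that return fixed values instead of touching the disk.

```python
from typing import Any, Callable, Dict, List


def list_files() -> List[str]:
    """Stand-in tool: returns a fixed listing instead of reading the disk."""
    return ["file1.txt", "file2.txt"]


def read_file(file_name: str) -> str:
    """Stand-in tool: returns placeholder content for the named file."""
    return f"(contents of {file_name})"


# Hypothetical registry mapping tool names from the system prompt to functions.
TOOLS: Dict[str, Callable[..., Any]] = {
    "list_files": list_files,
    "read_file": read_file,
}


def execute(action: Dict[str, Any]) -> Dict[str, Any]:
    """Look up the requested tool and call it with the LLM-supplied arguments."""
    tool = TOOLS.get(action["tool_name"])
    if tool is None:
        return {"error": f"Unknown action: {action['tool_name']}"}
    try:
        return {"result": tool(**action["args"])}
    except TypeError as exc:
        # Typically raised when the LLM supplies wrong or missing arguments.
        return {"error": str(exc)}


print(execute({"tool_name": "list_files", "args": {}}))
print(execute({"tool_name": "read_file", "args": {"file_name": "file1.txt"}}))
print(execute({"tool_name": "delete_file", "args": {}}))
```

With a table like this, adding a tool means registering one more function and describing it in the system prompt; the dispatch code itself does not change.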

**Example in practice:**

If the user asks, “What files are here?”, the agent rules guide the LLM to respond with something like:

```
{"tool_name": "list_files", "args": {}}
```

This response ensures the agent’s next step is both predictable and executable within its predefined constraints.

### How agent_rules integrate with the loop:

The agent_rules are combined with the memory in Step 1: Construct Prompt to form the input for the LLM. This guarantees that the agent always has access to its instructions and tools at every iteration. We will discuss the memory in more detail later.

This step prepares the input for the LLM by combining the system rules and the memory of the agent’s previous interactions. The goal is to give the LLM all the necessary context for generating the next action.

**Example in practice:**

If the user asks, “What files are in this directory?”, the memory might look like this:

```
memory = [
    {"role": "user", "content": "What files are in this directory?"},
    {"role": "assistant", "content": "```action\n{\"tool_name\":\"list_files\",\"args\":{}}\n```"},
    {"role": "user", "content": "{\"result\": [\"file1.txt\", \"file2.txt\"]}"}
]
```

Adding agent_rules ensures the LLM understands what tools it can use to continue interacting.



# **Step 2: Generate Response**

After constructing the prompt, the agent sends it to the LLM to receive a response. This response will define the next action for the agent to execute.

**Code snippet:**

```
response = generate_response(prompt)
```

**Explanation**:

The generate_response function uses the LiteLLM library to send the prompt to the LLM and retrieve its response. The response typically includes a structured action that the agent will parse and execute in the next steps. This is where the LLM decides what action the agent should take, based on the provided context and rules.

Once the Agent has generated a response, we need to interface the agent with the environment. This involves figuring out how the Agent’s response corresponds to an action in the environment. Once the correct action is determined, the interface can execute the action and later provide the Agent feedback on the result of the action.
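Because `generate_response` calls a hosted model, it can be handy to exercise the rest of the loop offline. The following is a sketch under that assumption and is not part of the original notebook: `make_scripted_llm` is a hypothetical helper that replays canned replies in order and has the same call shape as `generate_response(prompt)`.

```python
from typing import Callable, Dict, List

FENCE = "`" * 3  # three backticks, built here so the example nests cleanly in markdown


def make_scripted_llm(replies: List[str]) -> Callable[[List[Dict[str, str]]], str]:
    """Return a stand-in for generate_response that replays canned replies in order."""
    remaining = iter(replies)

    def fake_generate_response(messages: List[Dict[str, str]]) -> str:
        # The messages are ignored here; a real LLM would condition on them.
        return next(remaining)

    return fake_generate_response


scripted = make_scripted_llm([
    FENCE + 'action\n{"tool_name": "list_files", "args": {}}\n' + FENCE,
    FENCE + 'action\n{"tool_name": "terminate", "args": {"message": "Done."}}\n' + FENCE,
])

first = scripted([{"role": "user", "content": "What files are here?"}])
second = scripted([])
print(first)
print(second)
```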

# **Step 3: Parse the Response**

After generating a response, the next step is to extract the intended action and its parameters from the LLM’s output. The response is expected to follow a predefined structure, such as a JSON format encapsulated within a markdown code block. This structure ensures the action can be parsed and executed without ambiguity.

In the code, this is accomplished by locating and extracting the content between the `action` block markers and parsing it as JSON. If the response does not contain a valid action block, or the JSON lacks `tool_name` or `args`, the parser returns an `error` action instead of a tool call:

```
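import json
from typing import Dict

# extract_markdown_block is a helper defined in the full agent code at the end.
def parse_action(response: str) -> Dict: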
    """Parse the LLM response into a structured action dictionary."""
    try:
        response = extract_markdown_block(response, "action")
        response_json = json.loads(response)
        if "tool_name" in response_json and "args" in response_json:
            return response_json
        else:
            return {"tool_name": "error", "args": {"message": "You must respond with a JSON tool invocation."}}
    except json.JSONDecodeError:
        return {"tool_name": "error", "args": {"message": "Invalid JSON response. You must respond with a JSON tool invocation."}}
```
This parsing step is critical to ensuring the response is actionable. It provides a structured output, such as:

```
{
    "tool_name": "list_files",
    "args": {}
}
```

By breaking down the LLM’s output into tool_name and args, the agent can precisely determine the next action and its inputs.

If the LLM response does not contain a valid action block, the agent defaults to an error message that prompts the LLM to provide a valid JSON tool invocation. Because action results are stored in memory under the “user” role, the LLM sees this error as if it came from the user. This fallback lets the agent recover when it produces responses that are not in the required format.
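The extraction and fallback behavior described above can be sketched as follows. This is a minimal illustration rather than the notebook's exact helper: it uses `str.find` and, as an added assumption, tolerates a missing closing fence by reading to the end of the text.

```python
import json
from typing import Dict

FENCE = "`" * 3  # three backticks, built here to avoid nesting fences in this example


def extract_markdown_block(text: str, block_name: str) -> str:
    """Return the text inside the first block_name fence, or the text unchanged if none exists."""
    start_token = FENCE + block_name
    start = text.find(start_token)
    if start == -1:
        return text
    start += len(start_token)
    end = text.find(FENCE, start)
    if end == -1:
        end = len(text)  # assumption: accept a block whose closing fence is missing
    return text[start:end].strip()


def parse_action(response: str) -> Dict:
    """Parse the LLM response into an action dict, or an error action on failure."""
    try:
        action = json.loads(extract_markdown_block(response, "action"))
    except json.JSONDecodeError:
        return {"tool_name": "error",
                "args": {"message": "Invalid JSON response. You must respond with a JSON tool invocation."}}
    if isinstance(action, dict) and "tool_name" in action and "args" in action:
        return action
    return {"tool_name": "error",
            "args": {"message": "You must respond with a JSON tool invocation."}}


good = "I will list the files.\n" + FENCE + 'action\n{"tool_name": "list_files", "args": {}}\n' + FENCE
bad = "There are two files in the folder."

print(parse_action(good))
print(parse_action(bad))
```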

# **Step 4: Execute the Action**

Once the response is parsed, the agent uses the extracted tool_name and args to execute the corresponding function. Each predefined tool in the system instructions corresponds to a specific function in the code, enabling the agent to interact with its environment.

The execution logic involves mapping the tool_name to the appropriate function and passing the provided arguments:

```
if action["tool_name"] == "list_files":
    result = {"result": list_files()}
elif action["tool_name"] == "read_file":
    result = {"result": read_file(action["args"]["file_name"])}
elif action["tool_name"] == "error":
    result = {"error": action["args"]["message"]}
elif action["tool_name"] == "terminate":
    print(action["args"]["message"])
    break
else:
    result = {"error":"Unknown action: "+action["tool_name"]}
```

For example, if the action specifies tool_name as list_files with empty args, the list_files() function is called, and the agent returns the list of files in the directory. Similarly, a read_file action extracts the filename from the arguments and retrieves its content.

The execution step is the point where the agent performs tangible work, such as interacting with files or printing messages to the console. It bridges the decision-making process with concrete results that feed back into the agent’s memory for subsequent iterations.
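For reference, here is one possible shape for the two file tools with the signatures named in the system prompt. It is a sketch under stated assumptions: it operates on the current working directory, and it returns error strings rather than raising so that the LLM can read the failure and react. The demonstration runs in a temporary directory so that no real files are touched.

```python
import os
import tempfile
from typing import List


def list_files() -> List[str]:
    """List regular files in the current working directory, sorted by name."""
    return sorted(name for name in os.listdir(".") if os.path.isfile(name))


def read_file(file_name: str) -> str:
    """Return the file's text, or an error string the LLM can read and react to."""
    try:
        with open(file_name, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        return f"Error: {file_name} not found."
    except OSError as exc:
        return f"Error: {exc}"


original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)
    try:
        with open("notes.txt", "w", encoding="utf-8") as handle:
            handle.write("hello")
        listing = list_files()
        content = read_file("notes.txt")
        missing = read_file("absent.txt")
    finally:
        os.chdir(original_dir)  # leave the directory before it is deleted

print(listing)
print(content)
print(missing)
```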

# **Step 5: Update the Agent’s Memory**

After executing an action, the agent updates its memory with the results. Memory serves as the agent’s record of what has happened during the interaction, including user requests, the actions performed, and their outcomes. By appending this information to the memory, the agent retains context, enabling it to make more informed decisions in future iterations.

In the code, memory is updated by extending it with both the LLM’s response (representing the agent’s intention) and the result of the executed action:

```
memory.extend([
    {"role": "assistant", "content": response},
    {"role": "user", "content": json.dumps(result)}
])
```

**How This Works:**

- The assistant role captures the structured response generated by the LLM.
- The user role captures the feedback in the form of the action result, ensuring that the LLM has a clear understanding of what happened after the action was performed. The results of actions are always communicated back to the LLM with the “user” role.

By keeping a running history of these exchanges, the agent maintains continuity, allowing it to refine its behavior dynamically as the memory grows and track the status of its work.
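To make the growth of memory concrete, the sketch below wraps the same `memory.extend` call in a hypothetical `update_memory` helper (the name is an assumption for illustration) and shows the state after one iteration.

```python
import json
from typing import Any, Dict, List


def update_memory(memory: List[Dict[str, str]], response: str, result: Dict[str, Any]) -> None:
    """Append the LLM's reply and the serialized tool result to memory in place."""
    memory.extend([
        {"role": "assistant", "content": response},
        {"role": "user", "content": json.dumps(result)},
    ])


memory = [{"role": "user", "content": "What files are here?"}]
update_memory(
    memory,
    '{"tool_name": "list_files", "args": {}}',
    {"result": ["file1.txt", "file2.txt"]},
)

for message in memory:
    print(message["role"], "->", message["content"])
```

Each iteration adds exactly two messages, so the prompt grows linearly with the number of actions taken.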

# **Step 6: Decide Whether to Continue**

The final step in each iteration of the agent loop is determining whether to continue or terminate. This decision is based on the action executed and the state of the task at hand. If the parsed action specifies terminate, or if a predefined condition (e.g., maximum iterations) is met, the agent ends its loop.

In the code, this is implemented as a simple conditional check:

```
if action["tool_name"] == "terminate":
    print(action["args"]["message"])
    break
```

If the action specifies a termination, the loop exits, and the agent provides a closing message defined in the terminate action’s arguments. If no termination is triggered, the agent loops back to process the next user request or continue its task.

## Example: Iterative Adaptation

Imagine the agent is tasked with reading a file but encounters a missing filename in the initial request.

1. In the first iteration, it executes list_files to retrieve the available files.
2. Based on the memory of this result, it refines its next action, prompting the user to select a specific file.
3. This iterative process continues until the task is completed or the agent determines that no further actions are required.

In each loop iteration, the agent can look back at its memory to decide whether it has completed the overall task. The memory is a critical part of deciding whether the agent should continue or terminate. By deciding whether to continue at each step, the agent balances its ability to adapt dynamically to new information with the need to eventually conclude its task. The agent can also be instructed on when to terminate the loop, such as if more than two errors are encountered or if a specific condition is met.
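The stopping conditions mentioned above (a terminate action, an iteration cap, and an error budget) could be combined in a single check. This is a hedged sketch, not code from the notebook: `should_continue` and `MAX_ERRORS` are hypothetical names, and the threshold of two comes from the "more than two errors" example.

```python
from typing import Any, Dict, Tuple

MAX_ERRORS = 2  # assumed budget, taken from the "more than two errors" example


def should_continue(action: Dict[str, Any], result: Dict[str, Any],
                    error_count: int, iteration: int,
                    max_iterations: int) -> Tuple[bool, int]:
    """Return (keep_going, updated_error_count) after one loop iteration."""
    if "error" in result:
        error_count += 1
    if action["tool_name"] == "terminate":
        return False, error_count
    if error_count > MAX_ERRORS:
        return False, error_count
    if iteration + 1 >= max_iterations:
        return False, error_count
    return True, error_count


# A normal tool call early in the run keeps the loop going.
print(should_continue({"tool_name": "list_files", "args": {}}, {"result": []}, 0, 0, 5))
# A third error exceeds the budget and stops the loop.
print(should_continue({"tool_name": "error", "args": {}}, {"error": "bad JSON"}, 2, 1, 5))
```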

```
!!pip install litellm
import os
from google.colab import userdata

os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")
```

````
import os
import json
from litellm import completion
from google.colab import userdata

# -----------------------------
# Setup API Key
# -----------------------------
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")


# -----------------------------
# LLM Response Generator
# -----------------------------
def generate_response(messages):
    """Generate a response from the LLM using LiteLLM."""
    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=messages,
        max_tokens=1024
    )
    return response.choices[0].message.content


# -----------------------------
# Helper Functions (Tools)
# -----------------------------
def list_files(folder_name):
    """List files inside the given folder."""
    try:
        if not os.path.isdir(folder_name):
            return [f"Error: Folder '{folder_name}' not found."]
        return os.listdir(folder_name)
    except Exception as e:
        return [f"Error listing files: {str(e)}"]


def read_file(folder_name, file_name):
    """Read content of a file inside the given folder."""
    try:
        file_path = os.path.join(folder_name, file_name)
        if not os.path.exists(file_path):
            return f"Error: File '{file_name}' not found in folder '{folder_name}'."
        with open(file_path, 'r') as f:
            return f.read()
    except Exception as e:
        return f"Error reading file {file_name}: {str(e)}"


# -----------------------------
# Utility Functions
# -----------------------------
def extract_markdown_block(text, block_name):
    """Extract content between code fences like ```action ... ```."""
    start_token = f"```{block_name}"
    end_token = "```"
    if start_token in text:
        start = text.index(start_token) + len(start_token)
        # Raises ValueError if the closing fence is missing; parse_action catches it.
        end = text.index(end_token, start)
        return text[start:end].strip()
    return text


def parse_action(response):
    """Parse the LLM response into a structured action dictionary."""
    try:
        response = extract_markdown_block(response, "action")
        response_json = json.loads(response)
        if "tool_name" in response_json and "args" in response_json:
            return response_json
        else:
            return {"tool_name": "error", "args": {"message": "Response missing 'tool_name' or 'args'."}}
    except json.JSONDecodeError:
        return {"tool_name": "error", "args": {"message": "Invalid JSON response format."}}
    except Exception as e:
        return {"tool_name": "error", "args": {"message": f"Parsing error: {str(e)}"}}


# -----------------------------
# Define Agent Rules
# -----------------------------
agent_rules = [{
    "role": "system",
    "content": """
You are an AI agent that can perform tasks by using available tools.

Available tools:
- list_files(folder_name: str) -> List[str]: List all files inside the given folder.
- read_file(folder_name: str, file_name: str) -> str: Read the content of a file inside the given folder.
- terminate(message: str): End the agent loop and print a summary to the user.

If a user asks about files, list them before reading.

Every response MUST have an action.
Respond in this format:

```action
{
    "tool_name": "insert_tool_name_here",
    "args": {...fill in required arguments...}
}
```
"""
}]


# -----------------------------
# Main Agent Loop
# -----------------------------
def run_agent(folder_name, user_input, max_iterations=5):
    # Memory starts with a folder-specific system message and the user's task.
    memory = [
        {
            "role": "system",
            "content": f"You are a file interaction agent. You can only access files inside the folder '{folder_name}'. "
                       f"Whenever you output an action, always include 'folder_name': '{folder_name}' instead of '.'."
        },
        {"role": "user", "content": user_input}
    ]
    iterations = 0

    while iterations < max_iterations:
        # 1-3. Construct the prompt, query the LLM, and parse its reply.
        prompt = agent_rules + memory
        print("\nAgent thinking...")
        response = generate_response(prompt)
        print(f"\n🧠 Agent Response:\n{response}")
        action = parse_action(response)
        print(f"\n🧩 Parsed Action: {action}")

        # 4. Execute the action. The folder always comes from run_agent's
        #    argument, not from the LLM-supplied args.
        if action["tool_name"] == "list_files":
            result = {"result": list_files(folder_name)}
        elif action["tool_name"] == "read_file":
            result = {"result": read_file(folder_name, action["args"].get("file_name", ""))}
        elif action["tool_name"] == "error":
            result = {"error": action["args"]["message"]}
        elif action["tool_name"] == "terminate":
            # 6. Terminate: print the closing message and leave the loop.
            print("\n✅ Agent Terminated:", action["args"].get("message", ""))
            break
        else:
            result = {"error": f"Unknown action: {action['tool_name']}"}

        print(f"\n📄 Action Result:\n{result}")

        # 5. Update memory with the response and the action result.
        memory.extend([
            {"role": "assistant", "content": response},
            {"role": "user", "content": json.dumps(result)}
        ])

        iterations += 1

    print("\n💾 Agent Finished Loop.\n")
````
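The system message in `run_agent` tells the model it may only access files inside `folder_name`, but that is an instruction to the LLM rather than something the code enforces. As a hedged sketch (not part of the original notebook), a hypothetical helper `resolve_inside` could reject file names that resolve outside the folder before `read_file` opens them.

```python
import os
import tempfile


def resolve_inside(folder_name: str, file_name: str) -> str:
    """Return the real path of file_name inside folder_name, or raise ValueError if it escapes."""
    base = os.path.realpath(folder_name)
    target = os.path.realpath(os.path.join(base, file_name))
    # commonpath equals base only when target is base itself or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"'{file_name}' is outside '{folder_name}'")
    return target


with tempfile.TemporaryDirectory() as folder:
    inside = resolve_inside(folder, "notes.txt")
    try:
        resolve_inside(folder, "../secret.txt")
        escaped = True
    except ValueError:
        escaped = False

print(os.path.basename(inside))
print(escaped)
```

A tool could call such a helper first and return the `ValueError` text as its error string, which keeps the failure visible to the LLM through memory.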




### Example Input - 1 : Read the file notes.txt

```
# -----------------------------
# Main Entry Point
# -----------------------------
if __name__ == "__main__":
    print("=== Simple AI Agent (Folder-Specific Version) ===")
    folder_name = input("Enter folder name in the current directory: ").strip()
    user_query = input("Ask the agent something (e.g., 'What files are here?' or 'Read the file notes.txt'): ")
    run_agent(folder_name, user_query)
```

=== Simple AI Agent (Folder-Specific Version) ===
Enter folder name in the current directory: Sample_Data_2
Ask the agent something (e.g., 'What files are here?' or 'Read the file notes.txt'): Read the file notes.txt

Agent thinking...

🧠 Agent Response:
```action
{
    "tool_name": "list_files",
    "args": {
        "folder_name": "Sample_Data_2"
    }
}
```

🧩 Parsed Action: {'tool_name': 'list_files', 'args': {'folder_name': 'Sample_Data_2'}}

📄 Action Result:
{'result': ['todo.md', 'readme.txt', 'notes.txt', 'data.json', '.ipynb_checkpoints']}

Agent thinking...

🧠 Agent Response:
```action
{
    "tool_name": "read_file",
    "args": {
        "folder_name": "Sample_Data_2",
        "file_name": "notes.txt"
    }
}
```

🧩 Parsed Action: {'tool_name': 'read_file', 'args': {'folder_name': 'Sample_Data_2', 'file_name': 'notes.txt'}}

📄 Action Result:
{'result': 'Meeting notes for project X:\n- Review progress on backend integration\n- Test API responses\n- Update documentation\n'}

A

### Example Input - 2 : Read the file todo.md and also list the tasks that are not completed.

```
# -----------------------------
# Main Entry Point
# -----------------------------
if __name__ == "__main__":
    print("=== Simple AI Agent (Folder-Specific Version) ===")
    folder_name = input("Enter folder name in the current directory: ").strip()
    user_query = input("Ask the agent something (e.g., 'What files are here?' or 'Read the file notes.txt'): ")
    run_agent(folder_name, user_query)
```

=== Simple AI Agent (Folder-Specific Version) ===
Enter folder name in the current directory: Sample_Data_2
Ask the agent something (e.g., 'What files are here?' or 'Read the file notes.txt'): Read the file todo.md and also list the tasks that are not completed.

Agent thinking...

🧠 Agent Response:
To read the file and list tasks that are not completed, first, it's necessary to list all files to confirm the existence of "todo.md", and then read its content.

```action
{
    "tool_name": "list_files",
    "args": {
        "folder_name": "Sample_Data_2"
    }
}
```

🧩 Parsed Action: {'tool_name': 'list_files', 'args': {'folder_name': 'Sample_Data_2'}}

📄 Action Result:
{'result': ['todo.md', 'readme.txt', 'notes.txt', 'data.json', '.ipynb_checkpoints']}

Agent thinking...

🧠 Agent Response:
The file "todo.md" exists. Now, we can read its content.

```action
{
    "tool_name": "read_file",
    "args": {
        "folder_name": "Sample_Data_2",
        "file_name": "todo.md"
    }
}
```

