# Demo 3: The Agent Toolmaker

## Concept: Self-Modification / Hot-swapping

 We'll build an agent that can **create its own tools at runtime** - acquiring capabilities it didn't have 5 minutes ago.

```
Request → Tool Missing → LLM Writes Tool → Hot-swap → Request Fulfilled
```

### What We'll Build

An agent that:
1. Starts with basic math tools (add, subtract, multiply)
2. Receives a request it can't handle (sentiment analysis)
3. Detects the missing capability
4. Writes a new tool using an LLM
5. Hot-reloads the tool library
6. Completes the original request

### The Key Concept: Open-Endedness

The system gains complexity with use. Each new tool persists, making the agent more capable over time.

---

### Related Papers

- **Voyager**: An Open-Ended Embodied Agent with Large Language Models  
  [arXiv:2305.16291](https://arxiv.org/abs/2305.16291)

- **MetaGPT**: Meta Programming for A Multi-Agent Collaborative Framework  
  [arXiv:2308.00352](https://arxiv.org/abs/2308.00352)

## Setup

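You have three options for a hosted LLM provider. The code cells below default to a locally installed Ollama model (Option 0), which needs no API key: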

### Option 1: OpenAI (Recommended)
1. Get an API key from [platform.openai.com](https://platform.openai.com)
2. Add secret `OPENAI_API_KEY` in Colab Secrets (key icon in sidebar)

### Option 2: Google Gemini (FREE)
1. Get a free API key from [Google AI Studio](https://aistudio.google.com/apikey)
2. Add secret `GEMINI_API_KEY` in Colab Secrets
3. In the next cell, comment out the OpenAI section and uncomment the Gemini section

### Option 3: Groq (FREE - Very Fast)
1. Get a free API key from [console.groq.com](https://console.groq.com)
2. Add secret `GROQ_API_KEY` in Colab Secrets
3. In the next cell, comment out the OpenAI section and uncomment the Groq section
4. Uses Llama 3.1 model

We'll also install TextBlob for sentiment analysis (which the agent will learn to use!).

In [None]:
# ============================================================
# OPTION 0: Ollama locally installed
# ============================================================
#!pip install ollama -q
import ollama as client
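# Pick one model. Each assignment overrides the previous one, so only the
# last uncommented line takes effect.
# MODEL='qwen2.5:0.5b'    # ollama pull qwen2.5:0.5b
# MODEL='granite4:350m'   # ollama pull granite4:350m
# MODEL='granite4:1b'     # ollama pull granite4:1b
MODEL='gemma3n:e2b'       # ollama pull gemma3n:e2b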

response = client.chat(model=MODEL, 
                       messages=[{'role': 'user', 'content': 'Hello, how are you?'}],
                       options={ 'temperature': 1 },
                      )

print(response['message']['content'])
print()
print("Setup complete! Using Ollama")

In [None]:
# Install TextBlob dependencies (needed for all options)
# !pip install textblob -q
# !python -m textblob.download_corpora lite

import nltk
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

# ============================================================
# OPTION 0: Ollama locally installed
# ============================================================
#!pip install ollama -q
import ollama as client


# ============================================================
# OPTION 1: OpenAI (Recommended - requires API key with credits)
# ============================================================
# !pip install openai -q

# from google.colab import userdata
# from openai import OpenAI
# import importlib
# import re
# import os

# client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

# print("Setup complete! Using OpenAI")

# ============================================================
# OPTION 2: Google Gemini (FREE - uncomment below, comment above)
# ============================================================
# !pip install google-generativeai -q
#
# from google.colab import userdata
# import google.generativeai as genai
# import importlib
# import re
# import os
#
# genai.configure(api_key=userdata.get('GEMINI_API_KEY'))
#
# # Wrapper class to make Gemini API compatible with OpenAI-style calls
# class GeminiClient:
#     def __init__(self):
#         self._model = genai.GenerativeModel('gemini-1.5-flash')
#
#     class _Completions:
#         def __init__(self, model):
#             self._model = model
#
#         def create(self, model=None, messages=None, temperature=0.7, **kwargs):
#             prompt_parts = []
#             for msg in messages:
#                 role = msg.get('role', 'user')
#                 content = msg.get('content', '')
#                 if role == 'system':
#                     prompt_parts.append(f"Instructions: {content}")
#                 else:
#                     prompt_parts.append(content)
#
#             prompt = "\n\n".join(prompt_parts)
#
#             response = self._model.generate_content(
#                 prompt,
#                 generation_config=genai.GenerationConfig(temperature=temperature)
#             )
#
#             class Message:
#                 def __init__(self, text):
#                     self.content = text
#
#             class Choice:
#                 def __init__(self, text):
#                     self.message = Message(text)
#
#             class Response:
#                 def __init__(self, text):
#                     self.choices = [Choice(text)]
#
#             return Response(response.text)
#
#     @property
#     def chat(self):
#         class Chat:
#             def __init__(chat_self):
#                 chat_self.completions = GeminiClient._Completions(self._model)
#         return Chat()
#
# client = GeminiClient()
#
# print("Setup complete! Using Google Gemini (FREE)")

# ============================================================
# OPTION 3: Groq (FREE - very fast, uncomment below, comment above)
# ============================================================
# !pip install openai -q
#
# from google.colab import userdata
# from openai import OpenAI
# import importlib
# import re
# import os
#
# client = OpenAI(
#     api_key=userdata.get('GROQ_API_KEY'),
#     base_url="https://api.groq.com/openai/v1"
# )
#
# # IMPORTANT: When using Groq, change the model in API calls from
# # "gpt-4o-mini" to "openai/gpt-oss-20b"
#
# print("Setup complete! Using Groq (FREE)")

Finished.


[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\Usuario\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\brown.zip.
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Usuario\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Usuario\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\Usuario\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger_eng.zip.


## Creating the Initial Tool Library

We'll create a `tools.py` file that contains the agent's initial capabilities. This file will be **modified at runtime** when the agent learns new skills.

Key components:
- Individual tool functions
- A `TOOLS` registry dictionary that maps names to functions

In [1]:
# Create the initial tools module
tools_code = '''"""Agent's tool library - can be extended at runtime!"""

def add(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtract b from a."""
    return a - b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers together."""
    return a * b

# Registry of available tools
TOOLS = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply,
}
'''

# Write the tools file
with open("tools.py", "w") as f:
    f.write(tools_code)

print("✅ Created tools.py with basic math tools")

# Import the tools module
import tools
print(f"📦 Available tools: {list(tools.TOOLS.keys())}")

✅ Created tools.py with basic math tools
📦 Available tools: ['add', 'subtract', 'multiply']


## The Toolmaker Agent

This is our self-improving agent. It has three key capabilities:

1. **`execute_tool`**: Run a tool by name (raises `ToolNotFoundError` if missing)
2. **`create_new_tool`**: Ask the LLM to write a new tool function
3. **`add_tool_to_library`**: Modify `tools.py` and hot-reload it

The magic is in `importlib.reload()` - it reloads a Python module without restarting the interpreter!
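
To make the reload step concrete, here is a minimal, self-contained sketch. It does not touch the real `tools.py`: the module name `demo_tools` and the temporary directory are assumptions made for illustration. It also shows why the agent re-reads `tools.TOOLS` after each reload: a reference taken before the reload still points at the old dictionary.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Work in a throwaway directory; "demo_tools" is a hypothetical module name.
workdir = Path(tempfile.mkdtemp())
module_path = workdir / "demo_tools.py"
sys.path.insert(0, str(workdir))
sys.dont_write_bytecode = True  # avoid stale cached bytecode on quick rewrites

# Version 1 of the module: a single tool.
module_path.write_text('TOOLS = {"add": lambda a, b: a + b}\n')
importlib.invalidate_caches()
import demo_tools

before = sorted(demo_tools.TOOLS)
stale_registry = demo_tools.TOOLS  # reference taken before the reload

# Version 2: rewrite the file on disk, then reload the same module object.
module_path.write_text(
    'TOOLS = {"add": lambda a, b: a + b, "negate": lambda x: -x}\n'
)
importlib.reload(demo_tools)
after = sorted(demo_tools.TOOLS)

print(before)                      # ['add']
print(after)                       # ['add', 'negate']
print("negate" in stale_registry)  # False: old references are not updated
```

Reloading re-executes the module's source in the existing module object, so names looked up through the module see the new definitions, while objects captured earlier remain unchanged.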

In [None]:
import importlib  # required by ToolmakerAgent.reload_tools()


class ToolNotFoundError(Exception):
    """Raised when the agent tries to use a tool that doesn't exist."""
    pass


class ToolmakerAgent:
    """
    An agent that can create its own tools at runtime.

    When a tool is missing, it uses an LLM to write the tool,
    adds it to tools.py, and hot-reloads the module.
    """

    def __init__(self):
        self.reload_tools()

    def reload_tools(self):
        """Hot-reload the tools module to pick up new tools."""
        import tools
        importlib.reload(tools)
        self.tools = tools.TOOLS
        print(f"🔄 Tools reloaded: {list(self.tools.keys())}")

    def execute_tool(self, tool_name: str, **kwargs):
        """
        Execute a tool by name.

        Args:
            tool_name: Name of the tool to execute
            **kwargs: Arguments to pass to the tool

        Returns:
            The tool's return value

        Raises:
            ToolNotFoundError: If the tool doesn't exist
        """
        if tool_name not in self.tools:
            raise ToolNotFoundError(f"Tool '{tool_name}' not found. Available: {list(self.tools.keys())}")
        return self.tools[tool_name](**kwargs)

    def create_new_tool(self, tool_name: str, description: str) -> str:
        """
        Ask the LLM to write a new tool function.

        Args:
            tool_name: Name for the new function
            description: What the function should do

        Returns:
            The generated Python function code as a string
        """
        prompt = f"""Create a Python function called `{tool_name}` that does the following:
                    {description}

                    Requirements:
                    - Use simple, standard libraries (textblob for NLP is available via `from textblob import TextBlob`)
                    - Include a docstring explaining what the function does
                    - Include type hints for parameters and return value
                    - Return a clear, structured result (dict for complex data)
                    - Handle edge cases gracefully

                    Return ONLY the function code, no imports (they should be inside the function if needed), no explanation.
                    """

        response = client.chat(
            model=MODEL, 
            messages=[{"role": "user", "content": prompt}],
            options={ 'temperature': 0 }, # Deterministic for reliable code
        )
        func_code = response.message.content

        # response = client.chat.completions.create(
        #     model="gpt-4o-mini",
        #     messages=[{"role": "user", "content": prompt}],
        #     temperature=0  # Deterministic for reliable code
        # )
        #func_code = response.choices[0].message.content

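        # Clean up code block markers if present. Drop the entire opening
        # fence line so a language tag such as "python3" cannot leak into
        # the extracted code.
        if "```" in func_code:
            fenced = func_code.split("```", 1)[1]
            fenced = fenced.split("\n", 1)[1] if "\n" in fenced else fenced
            func_code = fenced.split("```", 1)[0]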

        return func_code.strip()

    def _extract_imports(self, code: str) -> tuple[list[str], str]:
        """
        Extract import statements from code and return them separately.

        Args:
            code: Python code that may contain import statements

        Returns:
            Tuple of (list of import lines, code without imports)
        """
        lines = code.split('\n')
        imports = []
        other_lines = []

        for line in lines:
            stripped = line.strip()
            if stripped.startswith('import ') or stripped.startswith('from '):
                # Use stripped version to avoid indentation issues
                imports.append(stripped)
            else:
                other_lines.append(line)

        # Remove leading empty lines from other_lines
        while other_lines and not other_lines[0].strip():
            other_lines.pop(0)

        return imports, '\n'.join(other_lines)

    def add_tool_to_library(self, tool_name: str, func_code: str):
        """
        Add a new tool to tools.py and hot-reload.

        This modifies the tools.py file to:
        1. Extract any imports and add them to the top of the file
        2. Add the new function definition
        3. Register it in the TOOLS dictionary

        Args:
            tool_name: Name of the new tool
            func_code: The Python function code (may include imports)
        """
        # Extract imports from the generated code
        new_imports, func_code_clean = self._extract_imports(func_code)

        # Read current tools.py
        with open("tools.py", "r") as f:
            current_code = f.read()

        # If there are new imports, add them after the docstring
        if new_imports:
            # Find the end of the docstring
            docstring_end = current_code.find('"""', 3) + 3  # Find closing """

            # Get existing imports to avoid duplicates
            existing_imports = set()
            for line in current_code.split('\n'):
                stripped = line.strip()
                if stripped.startswith('import ') or stripped.startswith('from '):
                    existing_imports.add(stripped)

            # Filter out duplicate imports
            unique_new_imports = [imp for imp in new_imports if imp not in existing_imports]

            if unique_new_imports:
                import_block = '\n'.join(unique_new_imports)
                current_code = (
                    current_code[:docstring_end] +
                    '\n' + import_block +
                    current_code[docstring_end:]
                )
                print(f"📥 Added imports: {unique_new_imports}")

        # Add the new function before the TOOLS registry
        tools_dict_line = "# Registry of available tools"

        new_code = current_code.replace(
            tools_dict_line,
            f"{func_code_clean}\n\n{tools_dict_line}"
        )

        # Update TOOLS dict to include the new function
        new_code = new_code.replace(
            "TOOLS = {",
            f'TOOLS = {{\n    "{tool_name}": {tool_name},'
        )

        # Write the modified file
        with open("tools.py", "w") as f:
            f.write(new_code)

        print(f"✅ Added '{tool_name}' to tools.py")

        # Hot-reload to make the new tool available
        self.reload_tools()

## Testing the Agent's Existing Tools

Let's verify the agent works with its built-in math tools.

In [11]:
import importlib
import ollama
# Create the agent
agent = ToolmakerAgent()

print("\n🧮 Testing existing tools:")
print(f"  add(5, 3) = {agent.execute_tool('add', a=5, b=3)}")
print(f"  subtract(10, 4) = {agent.execute_tool('subtract', a=10, b=4)}")
print(f"  multiply(7, 8) = {agent.execute_tool('multiply', a=7, b=8)}")

🔄 Tools reloaded: ['add', 'subtract', 'multiply']

🧮 Testing existing tools:
  add(5, 3) = 8
  subtract(10, 4) = 6
  multiply(7, 8) = 56


## The Self-Improvement Demo

Now let's ask the agent to do something it **can't do yet**: sentiment analysis.

Watch what happens:
1. The agent tries to use `analyze_sentiment`
2. It fails with `ToolNotFoundError`
3. The agent writes a new tool using the LLM
4. It adds the tool to its library
5. It successfully completes the request!

In [None]:
print("\n" + "=" * 60)
print("🤔 Let's ask for sentiment analysis...")
print("=" * 60 + "\n")

try:
    # This will fail - the tool doesn't exist yet!
    result = agent.execute_tool("analyze_sentiment", text="I love this workshop!")
    print(f"Result: {result}")

except ToolNotFoundError as e:
    print(f"❌ {e}")
    print("\n🔧 Agent is creating a new tool...\n")

    # The agent creates the tool itself!
    func_code = agent.create_new_tool(
        tool_name="analyze_sentiment",
        description="""Analyze the sentiment of a text string.
                    Use the TextBlob library for analysis.
                    Return a dictionary with:
                    - 'polarity': float from -1 (negative) to 1 (positive)
                    - 'subjectivity': float from 0 (objective) to 1 (subjective)
                    - 'label': string 'positive', 'negative', or 'neutral' based on polarity"""
    )

    print("📝 Generated function:")
    print("-" * 40)
    print(func_code)
    print("-" * 40)

    # Add to the tool library
    print("\n💾 Adding to tools.py...")
    agent.add_tool_to_library("analyze_sentiment", func_code)


🤔 Let's ask for sentiment analysis...

❌ Tool 'analyze_sentiment' not found. Available: ['add', 'subtract', 'multiply']

🔧 Agent is creating a new tool...

📝 Generated function:
----------------------------------------
3
import textblob

def analyze_sentiment(text: str) -> dict:
    """
    Analyze the sentiment of a given text string using TextBlob library.

    Parameters:
    - text (str): The text to analyze for sentiment.

    Returns:
    - dict: A dictionary containing the polarity, subjectivity,
            and a label indicating if it's positive, negative or neutral.
    """

    # Use TextBlob to get the sentiment analysis
    blob = textblob.TextBlob(text)
    
    # Convert polarity from 0-1 to -1-1 for consistency in our function
    blob.polarity = -blob.sentiment.polarity

    # Return the result as a dictionary, where 'polarity', 'subjectivity', and 'label' are fields
    return {
        "polarity": blob.polarity,
        "subjectivity": blob.sentiment.subj

## Using the Newly Created Tool

The agent now has sentiment analysis capabilities! Let's test it on various texts.

In [13]:
print("\n🎉 Testing the newly created tool:\n")

test_texts = [
    "The weather in Barcelona is absolutely perfect today!",
    "I can't believe how amazing La Sagrada Familia looks.",
    "This PyDay workshop is incredibly inspiring!",
    "Python is a programming language.",
    "I'm so frustrated with these bugs!"
]

for text in test_texts:
    result = agent.execute_tool("analyze_sentiment", text=text)
    print(f'📝 "{text}"')
    print(f"   → {result}")
    print()


🎉 Testing the newly created tool:

📝 "The weather in Barcelona is absolutely perfect today!"
   → {'polarity': -1.0, 'subjectivity': 1.0, 'label': 'positive'}

📝 "I can't believe how amazing La Sagrada Familia looks."
   → {'polarity': -0.6000000000000001, 'subjectivity': 0.9, 'label': 'positive'}

📝 "This PyDay workshop is incredibly inspiring!"
   → {'polarity': -0.625, 'subjectivity': 1.0, 'label': 'positive'}

📝 "Python is a programming language."
   → {'polarity': -0.0, 'subjectivity': 0.0, 'label': 'neutral'}

📝 "I'm so frustrated with these bugs!"
   → {'polarity': 0.875, 'subjectivity': 0.2, 'label': 'negative'}
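
In the output above, the sign of `polarity` disagrees with `label` (for example, `-1.0` is labelled `'positive'`), because the generated function negates TextBlob's polarity. A cheap consistency check can catch this kind of mistake before a tool is trusted. The sketch below is one possible check, not part of the agent: `check_sentiment_result` is a hypothetical helper, and it only compares signs because the threshold the generated code uses for `'neutral'` is not visible.

```python
def check_sentiment_result(result: dict) -> list[str]:
    """Return a list of inconsistencies found in a sentiment result.

    Hypothetical helper: an empty list means no problem was detected.
    """
    missing = {"polarity", "subjectivity", "label"} - set(result)
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    polarity = result["polarity"]
    label = result["label"]

    # Ranges taken from the tool description given to the LLM.
    if not -1.0 <= polarity <= 1.0:
        problems.append(f"polarity {polarity} outside [-1, 1]")
    if not 0.0 <= result["subjectivity"] <= 1.0:
        problems.append(f"subjectivity {result['subjectivity']} outside [0, 1]")

    # Sign check only: the neutral threshold is unknown, so it is not tested.
    if label == "positive" and polarity < 0:
        problems.append("label is 'positive' but polarity is negative")
    if label == "negative" and polarity > 0:
        problems.append("label is 'negative' but polarity is positive")
    return problems


# Values copied from the run above.
print(check_sentiment_result({"polarity": -1.0, "subjectivity": 1.0, "label": "positive"}))
print(check_sentiment_result({"polarity": -0.0, "subjectivity": 0.0, "label": "neutral"}))
```

The first call reports the sign mismatch; the second returns an empty list.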



## Inspecting the Modified Tools

Let's look at how `tools.py` was modified. The agent literally rewrote its own code!

In [14]:
print("📄 Current tools.py content:")
print("=" * 60)

with open("tools.py", "r") as f:
    print(f.read())

📄 Current tools.py content:
"""Agent's tool library - can be extended at runtime!"""
import textblob

def add(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtract b from a."""
    return a - b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers together."""
    return a * b

3

def analyze_sentiment(text: str) -> dict:
    """
    Analyze the sentiment of a given text string using TextBlob library.

    Parameters:
    - text (str): The text to analyze for sentiment.

    Returns:
    - dict: A dictionary containing the polarity, subjectivity,
            and a label indicating if it's positive, negative or neutral.
    """

    # Use TextBlob to get the sentiment analysis
    blob = textblob.TextBlob(text)
    
    # Convert polarity from 0-1 to -1-1 for consistency in our function
    blob.polarity = -blob.sentiment.polarity

    # Return the result as a dictionary, where 

## Bonus: Creating Another Tool

Let's demonstrate that the agent can keep learning! Let's ask it for a word counting tool.

In [15]:
print("\n" + "=" * 60)
print("🆕 Let's teach the agent to count words...")
print("=" * 60 + "\n")

try:
    result = agent.execute_tool("count_words", text="Hello world")
except ToolNotFoundError as e:
    print(f"❌ {e}")
    print("\n🔧 Creating word count tool...\n")

    func_code = agent.create_new_tool(
        tool_name="count_words",
        description="""Count words in a text string.
Return a dictionary with:
- 'total_words': total number of words
- 'unique_words': number of unique words (case-insensitive)
- 'char_count': total characters (excluding spaces)"""
    )

    print("📝 Generated function:")
    print(func_code)
    print()

    agent.add_tool_to_library("count_words", func_code)

# Now use it
print("\n🧪 Testing word count tool:\n")
test_text = "The quick brown fox jumps over the lazy dog. The dog was very lazy."
result = agent.execute_tool("count_words", text=test_text)
print(f'Text: "{test_text}"')
print(f"Result: {result}")

print(f"\n📦 Final tool inventory: {list(agent.tools.keys())}")


🆕 Let's teach the agent to count words...

❌ Tool 'count_words' not found. Available: ['analyze_sentiment', 'add', 'subtract', 'multiply']

🔧 Creating word count tool...

📝 Generated function:
from textblob import TextBlob

def count_words(text: str) -> dict:
    """
    Count words in a given text.
    
    Parameters:
    - text (str): The text string to analyze.
    
    Returns:
    - dict: A dictionary containing the following keys:
        total_words: Total number of unique words
        unique_words: Number of unique words
        char_count: Total characters excluding spaces
    """
    # Initialize a TextBlob object for the text
    blob = TextBlob(text)
    
    # Convert all words to lowercase and filter out punctuation
    word_counts = {word.lower(): count for word, count in blob.words}
    
    return {
        'total_words': len(word_counts),
        'unique_words': len(word_counts),
        'char_count': len(''.join(blob.words))
    }

# Function to check t

ValueError: too many values to unpack (expected 2)
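
The `ValueError` comes from the generated comprehension `for word, count in blob.words`: `blob.words` yields plain words, so each word is unpacked character by character into two names and any word longer than two characters fails. For comparison, here is a hand-written sketch of a function meeting the same description. It is not what the LLM produced; the regular expression used to define a "word" is an assumption.

```python
import re


def count_words(text: str) -> dict:
    """Count words, unique words (case-insensitive), and non-space characters."""
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "total_words": len(words),
        "unique_words": len({word.lower() for word in words}),
        "char_count": sum(1 for ch in text if not ch.isspace()),
    }


sample = "The quick brown fox jumps over the lazy dog. The dog was very lazy."
print(count_words(sample))
# {'total_words': 14, 'unique_words': 10, 'char_count': 54}
```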

## Key Takeaways

### What We Learned

1. **Open-Ended Systems**: The agent grows more capable with each interaction. It started with 3 tools and now has 5 (or more!).

2. **Hot-Swapping with `importlib.reload()`**: Python lets us reload modules at runtime without restarting. This enables true self-modification.

3. **Tool Discovery Pattern**: When a capability is missing, the agent:
   - Detects the gap (ToolNotFoundError)
   - Synthesizes a solution (LLM writes code)
   - Integrates it (add to tools.py)
   - Persists it (the tool exists for future use)
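
The four steps above can be condensed into a single helper. The sketch below is a simplified illustration rather than the notebook's implementation: `run_with_toolmaking` and `stub_writer` are hypothetical names, the LLM call is replaced by a stub that returns fixed source code, and the registry lives in memory instead of in `tools.py`.

```python
from typing import Callable, Dict, List


def run_with_toolmaking(
    registry: Dict[str, Callable],
    tool_name: str,
    write_tool: Callable[[str], str],
    **kwargs,
):
    """Run a tool by name, creating it first if it is missing."""
    if tool_name not in registry:                    # 1. detect the gap
        source = write_tool(tool_name)               # 2. synthesize a solution
        namespace: dict = {}
        exec(source, namespace)                      # 3. integrate (unsafe for untrusted code)
        registry[tool_name] = namespace[tool_name]   # 4. persist for future calls
    return registry[tool_name](**kwargs)


writer_calls: List[str] = []


def stub_writer(tool_name: str) -> str:
    """Stand-in for the LLM: returns fixed source for a token counter."""
    writer_calls.append(tool_name)
    return (
        f"def {tool_name}(text: str) -> int:\n"
        f"    return len(text.split())\n"
    )


registry: Dict[str, Callable] = {"add": lambda a, b: a + b}
first = run_with_toolmaking(registry, "count_tokens", stub_writer, text="hello agent world")
second = run_with_toolmaking(registry, "count_tokens", stub_writer, text="again")

print(first, second)   # 3 1
print(writer_calls)    # ['count_tokens']: the tool was written only once
```

The second call reuses the registered function, which is the persistence step in miniature.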

### Safety Considerations

⚠️ **This is powerful but dangerous!**

In production, you must:
- **Sandbox generated code**: Run in isolated environments (containers, VMs)
- **Validate before execution**: Check for dangerous operations (file deletion, network access, etc.)
- **Human review**: Critical tools should be reviewed before deployment
- **Audit trail**: Log all self-modifications for debugging and security
- **Rollback capability**: Keep backups of previous tool versions
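
As one illustration of "validate before execution", generated source can be inspected with Python's `ast` module before it is written to `tools.py`. The sketch below is a starting point under stated assumptions: `validate_tool_source` is a hypothetical helper, and the allowlist and denylist are example values. A static check like this is easy to bypass and does not replace sandboxing.

```python
import ast

# Assumed example policy; adjust to the tools you expect.
ALLOWED_IMPORTS = {"textblob", "re", "math", "string", "collections"}
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def validate_tool_source(source: str, tool_name: str) -> list[str]:
    """Return a list of problems found in generated code (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []

    # The code must define the function the agent is about to register.
    top_level = [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]
    if tool_name not in top_level:
        problems.append(f"no top-level function named {tool_name!r}")

    for node in ast.walk(tree):
        # Collect the top-level package of every import, at any nesting depth.
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module not in ALLOWED_IMPORTS:
                problems.append(f"import of {module!r} is not allowed")

        # Flag direct calls to blocked built-ins.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}() is not allowed")

    return problems


good = "def f(text: str) -> int:\n    import re\n    return len(re.findall(r'\\w+', text))\n"
bad = "import os\ndef f():\n    return eval('1')\n"
print(validate_tool_source(good, "f"))
print(validate_tool_source(bad, "f"))
```

A caller could refuse to modify `tools.py` whenever the returned list is non-empty and log the reasons for the audit trail.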

### Real-World Applications

- **Voyager (Minecraft AI)**: Creates new skills as it explores the game world
- **Auto-GPT style agents**: Expand capabilities based on task requirements
- **Code assistants**: Learn new patterns from codebases they interact with
- **Domain-specific agents**: Acquire specialized tools for niche tasks

### The Big Picture

We've seen three levels of code self-improvement:

1. **Demo 1 (Self-Healer)**: Fix broken code using error feedback
2. **Demo 2 (Evolution)**: Optimize code through selection and mutation
3. **Demo 3 (Toolmaker)**: Create entirely new capabilities on demand

Together, these patterns point toward **truly adaptive software** - systems that improve, evolve, and grow with use.

---

**Thank you for attending this workshop!** 🎉

Questions? Let's discuss!