
# Functionsmith
<a target='_blank' href='https://colab.research.google.com/github/google/earthengine-community/blob/master/experimental/functionsmith/functionsmith.ipynb'>   <img src='https://colab.research.google.com/assets/colab-badge.svg' alt='Open In Colab'/> </a>

Functionsmith is a general-purpose problem-solving agent using *dynamic
function calling*.

**USING THIS AGENT IS UNSAFE**. It directly runs LLM-produced code, and thus
should only be used for demonstration purposes. However, Colab serves as
a moderately effective sandbox - the damage would be limited to whatever
this notebook has access to.

See [a sample session output](https://github.com/google/earthengine-community/blob/master/experimental/functionsmith/sample_session.txt).

## Configuration

To run with the default task investigating a CSV file with airport data,
[obtain a Gemini API key](https://ai.google.dev/gemini-api/docs/api-key)
and save it into a [Colab secret](https://colab.sandbox.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) named "GOOGLE_API_KEY".

Run the notebook, then scroll to the end. You will see an empty text area
and the task definition under it. Hit Enter in the task definition input box
to start processing.

By default, the Gemini API is used. You can switch to Claude, ChatGPT,
or DeepSeek APIs by uncommenting the corresponding LLM class
in the last cell. You will also need to save ANTHROPIC_API_KEY,
OPENAI_API_KEY, or DEEPSEEK_API_KEY secrets.

## Approach

This agent uses dynamic function calling, which means that instead of
relying on a fixed set of tools predefined in the agent
[in normal LLM function calling](https://ai.google.dev/gemini-api/docs/function-calling),
we let the agent itself write all the functions it needs.

The functionsmith system prompt asks the agent to first write any low-level
function it needs, as well as tests for them. The agent loop will try
to run these functions and ask the LLM to make corrections if necessary.
Once all the functions are ready, the agent will write and run the code
to solve the actual user task.
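
The loop described above can be sketched as follows. This is a minimal illustration, not the actual agent code: `fake_llm`, `run_code`, and `agent_loop` are hypothetical names, the LLM is replaced by a stub that picks a scripted reply, and the sketch keeps a shared namespace between turns instead of resending saved source code.

```python
import contextlib
import io
import traceback

# Scripted replies for the hypothetical LLM stub below.
BUGGY = (
    'def double(x):\n'
    '  return x + 1\n'
    "assert double(2) == 4, 'double(2) should be 4'\n"
    "print('success')\n"
)
FIXED = BUGGY.replace('x + 1', 'x * 2')


def fake_llm(feedback: str) -> str:
  """Hypothetical LLM stub: chooses a reply based on the last feedback."""
  if feedback.startswith('TASK'):
    return BUGGY  # First attempt: a helper whose test fails.
  if 'AssertionError' in feedback:
    return FIXED  # The error message was sent back, so correct the helper.
  if 'success' in feedback:
    return 'print(double(21))'  # Helpers are ready: solve the task.
  return ''  # No code in the reply: the loop stops.


def run_code(code: str, namespace: dict) -> str:
  """Runs code and returns its printed output, plus a traceback on error."""
  buffer = io.StringIO()
  try:
    with contextlib.redirect_stdout(buffer):
      exec(code, namespace)  # UNSAFE: executes arbitrary code.
  except Exception:
    return buffer.getvalue() + traceback.format_exc()
  return buffer.getvalue()


def agent_loop(task: str, max_turns: int = 10) -> list[str]:
  """Feeds each execution result back to the stub until it sends no code."""
  namespace = {}
  transcript = []
  message = task
  for _ in range(max_turns):
    code = fake_llm(message)
    if not code:
      break
    message = run_code(code, namespace)
    transcript.append(message)
  return transcript


transcript = agent_loop('TASK: double the number 21')
print(transcript[-1])
```

The first turn fails its own test, the second turn passes after the correction, and the third turn calls the helper to produce the answer.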

The agent does not use function calling features of LLM clients. Instead,
it simply tries to parse all the 'python' or 'tool_use' sections
present in the raw LLM output. It keeps all function definitions as well
as their source code in memory. Each message sent to the LLM is accompanied
by the signatures of these functions to let the LLM know what functions are
available locally.
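
As a rough illustration of this kind of parsing, the sketch below pulls fenced blocks out of raw text with a regular expression and lists function signatures with the standard `ast` module. The helper names `extract_code_blocks` and `function_signatures` are hypothetical, and the actual `functionsmith.code_parser` module may work differently.

```python
import ast
import re

# The fence marker is built programmatically to avoid nesting literal fences.
FENCE = '`' * 3

# Matches fenced blocks tagged 'python' or 'tool_use' and captures their body.
BLOCK_RE = re.compile(
    FENCE + r'(?:python|tool_use)[ \t]*\n(.*?)' + FENCE, re.DOTALL)


def extract_code_blocks(llm_output: str) -> list[str]:
  """Returns the bodies of all python/tool_use blocks in raw LLM text."""
  return [body.strip('\n') for body in BLOCK_RE.findall(llm_output)]


def function_signatures(source: str) -> dict[str, str]:
  """Maps each top-level function name to a one-line signature."""
  signatures = {}
  for node in ast.parse(source).body:
    if isinstance(node, ast.FunctionDef):
      signatures[node.name] = f'def {node.name}({ast.unparse(node.args)})'
  return signatures


reply = (
    "Let's define the function.\n"
    + FENCE + 'python\n'
    + 'def area(width, height=1):\n'
    + '  return width * height\n'
    + 'print(area(2, 3))\n'
    + FENCE + '\n'
)
blocks = extract_code_blocks(reply)
signatures = function_signatures(blocks[0])
print(signatures)
```

Signatures collected this way are short enough to attach to every message, while the full source stays in memory for execution.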

The functions are not saved permanently, though this feature can be added.
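
One possible way to add persistence, shown only as a sketch: store a mapping from function name to source code in a JSON file and reload it at startup. The helpers `save_functions` and `load_functions` and the file name `functions.json` are assumptions; the agent itself keeps `code_parser.Function` objects, not plain strings.

```python
import json
import pathlib
import tempfile


def save_functions(functions: dict[str, str], path: pathlib.Path) -> None:
  """Writes a name -> source code mapping to a JSON file."""
  path.write_text(json.dumps(functions, indent=2, sort_keys=True))


def load_functions(path: pathlib.Path) -> dict[str, str]:
  """Reads the mapping back; returns an empty dict if the file is missing."""
  if not path.exists():
    return {}
  return json.loads(path.read_text())


# A temporary directory stands in for wherever the functions would be kept.
store = pathlib.Path(tempfile.mkdtemp()) / 'functions.json'
save_functions({'double': 'def double(x):\n  return x * 2\n'}, store)

restored = load_functions(store)
namespace = {}
for source in restored.values():
  exec(source, namespace)  # UNSAFE: only load source code you trust.
result = namespace['double'](21)
print(result)
```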

## Alternatives

To run a (VERY UNSAFE) command-line version of this agent, run 
`pip install functionsmith`, then run `functionsmith_cli`.

## Attribution

Functionsmith was written by Simon Ilyushchenko (simonf@google.com).
I am grateful to Renee Johnston and other Googlers for implementation advice,
as well as to Earth Engine expert advisors Jeffrey Cardille, Erin Trochim,
Morgan Crowley, and Samapriya Roy, who helped me choose the right training
tasks.

In [None]:
!pip install functionsmith

In [None]:
!wget https://raw.githubusercontent.com/davidmegginson/ourairports-data/refs/heads/main/airports.csv

# Imports

In [None]:
import asyncio
import copy
import enum
import inspect
import logging
import os
import sys
import time

import google.colab
import ipywidgets as widgets
from google.colab import userdata
from IPython.display import display, clear_output, HTML, Javascript

from functionsmith import code_parser
from functionsmith import executor
from functionsmith import llm


# System prompt

In [None]:
system_instruction="""
To solve the task given below, write first low-level python functions with
tests for each of them in a ```python block. Include all the necessary imports.
The tests should be as simple as possible and not rely on anything external.
All asserts in tests should have an error message to make sure their failure is
easy to detect. Do not check for __main__ - just write the top-level code
directly in the output.

In later responses, never omit parts of the code referring to earlier output -
if you need to do this, define a function and then call it later.

I will save the functions locally, and you can write higher-level code that
will invoke them later. I will pass you the output from the code or any error
messages.

Call the task_done() function when you consider the task done.
Ask the user questions if you need additional input.

If I ask you to compute the factorial of 10 and then ask the user whether they
want more factorials computed, your responses should look like the example chat
session below. Return one response at a time, each in a separate answer:

    Question 1:
    Please compute the factorial of 10

    Response 1:
    Let's define the requested function and test it.
    ```python
    import math
    def factorial(x):
      return math.factorial(x)
    def test_factorial():
      assert factorial(3) == 6
      assert factorial(4) == 24
      print('success')

    test_factorial()
    ```

    Question 2:
    The code output was "success"

    Response 2:

    Now let's call the previously defined function to solve the user task.
    ```python
      print(factorial(10))
    ```

    Question 3:
    The code output was "3628800"

    Response 3:

    The computed answer looks reasonable. Please enter a number if you want
    another factorial to be computed, or instruct me to exit.

    Question 4:
    You can exit here

    Response 4:

    ```python
      task_done('We can exit')
    ```
"""

# Task

In [None]:
# The header of the CSV file
schema = """
"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
"""

task = f"""Please explore a local file airports.csv. First, make some 
hypotheses about the data, and then write code to test them to learn something 
interesting about the data. By 'interesting', I mean something you wouldn't 
have guessed from first principles - eg, finding that the largest countries 
have the most airports is not interesting. Explain why what you discovered 
seems interesting. When done, ask the user if they want to find out something 
else about this file. 

The file has the following schema: {schema}"""

# If you are having problems with the above task, use this simple task definition
# for debugging the agent.
if False:
  task = """
  Compute the factorial of 20. When done, return a chat message asking the user
  if they want to compute another factorial and compute it if they give you
  a new value.
  """


# Helper classes for IO and code execution

In [None]:
# The agent needs three helper classes:
# * ColabIOManager knows how to interact with Colab
# * Supervisor keeps track of agent state and helps it terminate
# * CustomLoggingHandler captures logs from code parsing and execution

STARS = '*' * 20 + '\n'

class IOState(enum.StrEnum):
  THINKING = 'THINKING'
  RUNNING_CODE = 'RUNNING_CODE'
  WAITING_FOR_USER_INPUT = 'WAITING_FOR_USER_INPUT'
  DONE = 'DONE'

class IOManager:
  """Base class for I/O strategies."""

  def task_done(self, done_message: str='') -> None:
    self.display(f"Agent said: {done_message}\n\nTask Done!")

  def display(self, text: str) -> None:
    raise NotImplementedError

  def set_state(self, state: IOState):
    raise NotImplementedError


class Supervisor:
  """Class responsible for controlling the agent."""

  # A public property indicating whether the agent is running
  running: bool
  _io_manager: IOManager

  def __init__(self, io_manager):
    self.running = False
    self._io_manager = io_manager

  def syscalls(self):
    return [self.task_done]

  def task_done(self, done_message: str='') -> None:
    """Signals the agent that the task is done to terminate execution."""
    self.running = False
    self._io_manager.task_done(done_message)


class ColabIOManager(IOManager):
  """I/O Manager for Google Colab execution.

  This class exists to connect logging output from code parsing and execution
  to the Colab UI.
  """
  def __init__(self):
    self._user_input_handler = None

    # Create output area with a unique ID
    self._output_id = f"output_{int(time.time())}"
    self._output_area = widgets.Output(
        layout=widgets.Layout(
            width='95%',
            height='400px',
            border='1px solid black',
            overflow='auto'
        )
    )
    # Add a unique CSS class to output area
    self._output_area.add_class(self._output_id)

    self._command_input = widgets.Text(
        placeholder='Type your message and press Enter...',
        description='❓',
        value=task,
        layout=widgets.Layout(width='95%')
    )

    # Container for the UI elements
    ui_container = widgets.VBox([
        self._output_area,
        self._command_input,
    ], layout=widgets.Layout(width='100%'))

    # Add CSS styling
    display(HTML("""
    <style>
    .widget-text input[type="text"] {
        width: 100% !important;
        padding: 8px;
        margin: 8px 0;
        box-sizing: border-box;
    }
    .jupyter-widgets-output-area {
        overflow-y: auto !important;
    }
    </style>
    """))

    self._command_input.on_submit(self._on_command)
    display(ui_container)

  def _set_emoji(self, emoji):
    self._command_input.description = emoji

  def set_state(self, state: IOState):
    match state:
      case IOState.THINKING:
        self._set_emoji('🤔')
      case IOState.RUNNING_CODE:
        self._set_emoji('🌎')
      case IOState.WAITING_FOR_USER_INPUT:
        self._set_emoji('❓')
      case IOState.DONE:
        self._set_emoji('✅ ')
      case _:
        self._set_emoji('🦙')

  def _on_command(self, widget):
    """Accepts user input and passes it to the agent."""
    message = widget.value
    if message.strip():
      self.display(f"> {message}")
      widget.value = ''

      if self._user_input_handler:
        self._user_input_handler(message)

  def set_user_input_handler(self, handler):
    """Sets a handler function to be called when the user submits input."""
    self._user_input_handler = handler

  def display(self, text: str) -> None:
    with self._output_area:
      display(HTML(f"<p style='white-space: pre-wrap;'>{text}</p>"))
    self._scroll_to_bottom()

  def _scroll_to_bottom(self):
    js_code = f"""
        requestAnimationFrame(() => {{
            const element = document.querySelector('.{self._output_id}');
            if (element) {{
                element.scrollTop = element.scrollHeight;
            }}
        }});
    """
    display(Javascript(js_code))



class CustomLoggingHandler(logging.Handler):
  """Custom logging handler that sends agent internal logs to the Colab UI."""
  _io_manager: IOManager

  def __init__(self, io_manager):
    super().__init__(logging.INFO)
    self._io_manager = io_manager

  def emit(self, record):
    msg = self.format(record)
    self._io_manager.display(msg)

# Colab agent

In [None]:
# Stop the agent after this many turns to prevent runaway loops.
MAX_TURNS = 100

class Agent:
  """Main class for running the functionsmith agent."""
  _llm: llm.LLM
  _num_turns: int

  # The two dictionaries below will contain functions that the LLM can call.
  # The _syscalls dict has system functions - they are defined
  # by the agent beforehand. Their output is not intercepted.
  # The _functions dict will have functions dynamically created by the LLM.
  _syscalls: dict[str, code_parser.Function]
  _functions: dict[str, code_parser.Function]

  _io_manager: IOManager
  _supervisor: Supervisor
  _code_parser: code_parser.Parser
  _code_executor: executor.Executor

  def __init__(self, io_manager: IOManager, llm_interface: llm.LLM):
    self._io_manager = io_manager
    io_manager.set_user_input_handler(self.handle_user_input)
    self._supervisor = Supervisor(io_manager)

    self._llm = llm_interface
    self._syscalls = {}
    self._functions = {}

    logger = self._create_logger()
    self._code_parser = code_parser.Parser(logger)
    self._code_executor = executor.Executor(logger)

    self._extract_syscalls()

  def _create_logger(self):
    logger = logging.getLogger('functionsmith')
    logger.handlers = []
    logger.addHandler(CustomLoggingHandler(self._io_manager))
    logger.propagate = False
    return logger

  def _extract_syscalls(self):
    """Extracts system calls from the supervisor."""
    for method in self._supervisor.syscalls():
      supervisor_syscalls = self._code_parser.extract_functions(
          inspect.getsource(method))
      self._syscalls.update(supervisor_syscalls.functions)

  def _function_signatures(self) -> str:
    all_functions = copy.deepcopy(self._functions)
    all_functions.update(self._syscalls)
    # We tell the LLM about the signatures and docstrings of all the functions
    # available so far, either predefined in the agent as syscalls or defined
    # dynamically during the earlier turns.
    return (
        'The following functions are available:\n' +
        '\n'.join([x.signature() for x in all_functions.values()]))

  def _get_llm_response(self, question: str) -> code_parser.ParsedResponse:
    self._io_manager.set_state(IOState.THINKING)
    question_with_tools = question + '\n\n' + self._function_signatures()
    response = self._llm.chat(question_with_tools)
    self._io_manager.display(f"Agent: {response}")
    return self._code_parser.extract_functions(response)

  def _handle_no_code_response(self):
    """Handles the case where the LLM response has no code."""
    self._io_manager.set_state(IOState.WAITING_FOR_USER_INPUT)
    self._supervisor.running = False

  def _execute_code(self, code: str) -> str:
    # We add the code for all the functions defined so far.
    # Only non-syscall source code is used, as we intercept syscalls
    # in execution.
    code_with_tools = (
        '\n'.join([x.code for x in self._functions.values()]) +
        '\n' + code
    )

    sandbox_env = {
      'task_done': self._supervisor.task_done,
    }
    self._io_manager.set_state(IOState.RUNNING_CODE)
    return self._code_executor.run_code(code_with_tools, sandbox_env)

  def handle_user_input(self, user_input):
    question = user_input
    self._supervisor.running = True

    # To respond to user input, we loop until one of these things happens:
    # 1. The agent returns a response without any code, which probably means
    #    it's asking the user something.
    # 2. The agent is no longer running, which probably means it thinks
    #    the task is done.
    # 3. The agent exceeds MAX_TURNS turns.
    self._num_turns = 0
    while self._supervisor.running:
      self._io_manager.display(STARS)
      self._num_turns += 1
      if self._num_turns > MAX_TURNS:
        self._io_manager.display(f'REACHED {MAX_TURNS} TURNS, TERMINATING')
        return

      parsed_response = self._get_llm_response(question)

      if not parsed_response.code and not parsed_response.functions:
        if parsed_response.error:
          # We couldn't parse the LLM-produced code, so we send the parsing
          # error to the LLM.
          question = parsed_response.error
          continue

        # The answer has no functions or top-level code.
        # This means the task is not done - the LLM needs user input.
        # We return control to the user.
        self._handle_no_code_response()
        return

      # If we are here, the response has top-level code, functions, or both.
      self._functions.update(parsed_response.functions)

      if not parsed_response.code:
        # The agent only defined functions, but gave no top-level code.
        question = 'go on'
        continue

      # The code execution output is saved into 'question', as it will be
      # sent to the LLM as the next user turn.
      question = self._execute_code(parsed_response.code)
    # End of while loop

    self._io_manager.display(STARS)
    self._io_manager.set_state(IOState.DONE)


# Start the agent

In [None]:
llm_interface = llm.Gemini(system_instruction, api_key=userdata.get('GOOGLE_API_KEY'))
#llm_interface = llm.Claude(system_instruction, api_key=userdata.get('ANTHROPIC_API_KEY'))
#llm_interface = llm.ChatGPT(system_instruction, api_key=userdata.get('OPENAI_API_KEY'))
#llm_interface = llm.DeepSeek(system_instruction, api_key=userdata.get('DEEPSEEK_API_KEY'))

io_manager = ColabIOManager()
agent = Agent(io_manager, llm_interface)

print("""
Legend:
❓ = Waiting for user input
🤔 = Thinking
🌎 = Running code
✅ = The task is done
""")
