# Code Execution: Giving LLMs the Power to Run Code

**Duration**: ~12 minutes

## Learning Objectives

By the end of this notebook, you will:

1. Understand **why** LLMs need code execution (a fixed set of tools cannot cover every request)
2. Build a simple **code execution pipeline** using `<execute_python>` tags
3. See code execution handle math, analysis, and visualization tasks
4. Learn safety considerations for running LLM-generated code

---

In [None]:
%pip install --quiet google-genai

In [None]:
from getpass import getpass

# getpass keeps the key out of the notebook's visible output
GOOGLE_API_KEY = getpass("Enter your Google API key: ")

## 1.1 The Problem: A Limited Calculator Agent

Imagine you build an agent with only 4 math tools: **add**, **subtract**, **multiply**, and **divide**.

What happens when a user asks for something outside those 4 operations — like a square root?

Let's see.

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

# Define 4 basic calculator tools
calculator_tools = [
    types.Tool(function_declarations=[
        types.FunctionDeclaration(
            name="add",
            description="Add two numbers together",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "a": types.Schema(type="NUMBER", description="First number"),
                    "b": types.Schema(type="NUMBER", description="Second number"),
                },
                required=["a", "b"],
            ),
        ),
        types.FunctionDeclaration(
            name="subtract",
            description="Subtract second number from first",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "a": types.Schema(type="NUMBER", description="First number"),
                    "b": types.Schema(type="NUMBER", description="Second number"),
                },
                required=["a", "b"],
            ),
        ),
        types.FunctionDeclaration(
            name="multiply",
            description="Multiply two numbers",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "a": types.Schema(type="NUMBER", description="First number"),
                    "b": types.Schema(type="NUMBER", description="Second number"),
                },
                required=["a", "b"],
            ),
        ),
        types.FunctionDeclaration(
            name="divide",
            description="Divide first number by second",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "a": types.Schema(type="NUMBER", description="First number"),
                    "b": types.Schema(type="NUMBER", description="Second number"),
                },
                required=["a", "b"],
            ),
        )
    ])
]

# Force the model to use one of these tools (mode=ANY)
tool_config = types.ToolConfig(
    function_calling_config=types.FunctionCallingConfig(mode="ANY")
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the square root of 2?",
    config=types.GenerateContentConfig(
        tools=calculator_tools,
        tool_config=tool_config,
    ),
)

# Show what the model tried to do
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"Tool called: {part.function_call.name}")
        print(f"Arguments:   {dict(part.function_call.args)}")
        print()
        print("The model was FORCED to pick from add/subtract/multiply/divide.")
        print("It has no sqrt tool, so it had to improvise (incorrectly).")
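
To make the limitation concrete, here is a minimal sketch of what running the model's forced tool call locally might look like. The names `TOOL_IMPLS` and `run_tool_call` are hypothetical helpers, not part of the `google-genai` SDK, and the sketch assumes the call arrives as a tool name plus `a`/`b` arguments, as declared above.

```python
import operator

# Hypothetical local implementations for the four declared tools.
TOOL_IMPLS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def run_tool_call(name, args):
    """Execute one function call of the shape the model returns (name + a/b args)."""
    if name not in TOOL_IMPLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_IMPLS[name](args["a"], args["b"])


# None of the four operations computes a square root, so whichever tool
# the model picks, the result is only as good as the arguments it guessed.
print(run_tool_call("divide", {"a": 2, "b": 2}))  # 1.0
```

Adding a `sqrt` entry would fix this one question, but every new kind of request would need another hand-written tool.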

## 1.2 The Solution: `<execute_python>` Tags

Instead of giving the LLM a fixed set of tools, we can give it **one universal tool: code execution**.

The idea:
1. Tell the LLM to write Python code inside `<execute_python>` tags
2. Parse the code from its response
3. Execute it and return the output

This way, the LLM can tackle **any** problem it can express in Python, not just the ones we pre-built tools for.
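
Steps 2 and 3 can be tried offline before involving the API. The sketch below uses a canned reply (`fake_reply`, an assumption standing in for a real model response) and two hypothetical helpers, `extract_code` and `run_and_capture`. It uses `contextlib.redirect_stdout` to capture output, which is one of several ways to do it.

```python
import contextlib
import io
import re

# A canned reply standing in for a real model response (no API call here).
fake_reply = (
    "Sure!\n<execute_python>\nimport math\n"
    "print(round(math.sqrt(2), 4))\n</execute_python>"
)


def extract_code(reply):
    """Step 2: pull every code block out of the <execute_python> tags."""
    blocks = re.findall(r"<execute_python>(.*?)</execute_python>", reply, re.DOTALL)
    return [block.strip() for block in blocks]


def run_and_capture(code):
    """Step 3: run the code and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh namespace, but NOT a sandbox
    return buffer.getvalue().strip()


for block in extract_code(fake_reply):
    print(run_and_capture(block))
```

The non-greedy `(.*?)` together with `re.DOTALL` lets one reply contain several multi-line code blocks.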

In [None]:
import re
import io
import sys
import math
import statistics

CODE_SYSTEM_PROMPT = """You are a Python code assistant. When asked to compute or analyze something, 
write Python code inside <execute_python> tags. The code should use print() to output results.

Example:
User: What is 2 + 2?
Assistant: <execute_python>
print(2 + 2)
</execute_python>

Always write complete, runnable Python code. You have access to the math and statistics modules."""


def execute_code_from_response(response_text):
    """Extract and execute Python code from <execute_python> tags."""
    # Find all code blocks between tags
    code_blocks = re.findall(
        r"<execute_python>(.*?)</execute_python>", response_text, re.DOTALL
    )

    if not code_blocks:
        return "No code found in response."

    results = []
    for code in code_blocks:
        code = code.strip()
        # Capture stdout
        old_stdout = sys.stdout
        sys.stdout = captured = io.StringIO()

        try:
            # Fresh namespace with math and statistics pre-loaded.
            # Not a sandbox: the full __builtins__ still allow imports and file access.
            allowed_globals = {
                "__builtins__": __builtins__,
                "math": math,
                "statistics": statistics,
            }
            exec(code, allowed_globals)
            output = captured.getvalue()
            results.append(output.strip())
        except Exception as e:
            results.append(f"Error: {e}")
        finally:
            sys.stdout = old_stdout

    return "\n".join(results)


def ask_with_code_execution(question):
    """Send a question to the LLM and execute any code in its response."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=question,
        config=types.GenerateContentConfig(
            system_instruction=CODE_SYSTEM_PROMPT,
        ),
    )

    llm_text = response.text
    print("LLM Response:")
    print(llm_text)
    print("\n" + "=" * 50)

    result = execute_code_from_response(llm_text)
    print(f"Execution Output: {result}")
    return result


print("Code execution pipeline ready!")

In [None]:
# Demo 1: Factorial of 20
print("DEMO 1: What is the factorial of 20?")
print("=" * 50)
ask_with_code_execution("What is the factorial of 20?");

In [None]:
# Demo 2: Square root of 2 — the problem from earlier, now solved!
print("DEMO 2: What is the square root of 2?")
print("=" * 50)
ask_with_code_execution("What is the square root of 2?");

In [None]:
# Demo 3: Tabulate sin(x) values
print("DEMO 3: Table of sin(x) from 0 to 2*pi")
print("=" * 50)
ask_with_code_execution(
    "Print a table of sin(x) values from 0 to 2*pi in 8 equal steps. "
    "Format each line as: x_value -> sin(x_value), rounded to 4 decimals."
);

## 1.3 Exercise: Data Analysis via Code Execution

**Your turn!** Use the pipeline from 1.2 to perform basic data analysis. The system prompt already mentions the `statistics` module, so a clear question is enough.

Try asking it to analyze a list of numbers — compute the mean, median, standard deviation, min, and max.

In [None]:
# Exercise: ask the code execution pipeline to analyze this data
# Hint: just call ask_with_code_execution() with a clear question!

data = [23, 45, 12, 67, 34, 89, 56, 11, 78, 43]

# TODO: call ask_with_code_execution with a question about `data`
# Example: ask_with_code_execution(f"Analyze this list: {data}. Compute mean, median, stdev, min, max.")
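
To check the LLM's answer, it may help to compute reference values directly. This is a small sketch using only the standard library. The `reference` dictionary is an illustrative name, and it assumes the sample (not population) standard deviation is wanted.

```python
import statistics

data = [23, 45, 12, 67, 34, 89, 56, 11, 78, 43]

# Reference values to compare against the pipeline's printed output.
reference = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
    "min": min(data),
    "max": max(data),
}

for name, value in reference.items():
    print(f"{name}: {value:.2f}")
```

If the LLM's standard deviation differs slightly, it may have used `statistics.pstdev` (population) instead. Asking for one explicitly in the question removes the ambiguity.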

## 2.0 Advanced: Code Execution with Package Installation

In the basic pipeline above, the system prompt limited the LLM to `math` and `statistics`. Real-world tasks often need **external packages**: pandas for data analysis, matplotlib for visualization, numpy for numerical computing, requests for web APIs, and so on.

Tools like **ChatGPT Code Interpreter** follow the same idea: the LLM writes code against a rich package environment, and a sandbox runs it.

**What we'll build next:**
- An upgraded executor that detects `import` statements in LLM-generated code
- Auto-installs missing packages via `pip install`
- Runs code in a fresh namespace with full `__builtins__` and no pre-loaded modules
- Handles matplotlib plots by saving to file and displaying inline

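**Safety note:** This executor runs arbitrary LLM-generated code, and installs arbitrary packages, inside your notebook's own process. That is fine for a demo but dangerous in production; always run untrusted code in a container or sandbox.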
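
Import detection can be done in more than one way. As an alternative to pattern matching on source text, the sketch below parses the code with the standard `ast` module, so it handles `import a, b` and ignores the word "import" inside strings. The function name `third_party_imports` is hypothetical, and the standard-library filter relies on `sys.stdlib_module_names`, which only exists on Python 3.10 and later.

```python
import ast
import sys


def third_party_imports(code):
    """Return top-level module names imported by `code` that are not in the stdlib."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import x`
            names.add(node.module.split(".")[0])
    # sys.stdlib_module_names exists on Python 3.10+; older versions skip this filter.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return names - set(stdlib)


sample = "import os, numpy as np\nfrom sklearn.linear_model import LinearRegression\n"
print(sorted(third_party_imports(sample)))
```

Filtering out standard-library names up front avoids ever passing something like `os` or `re` to `pip install`.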

In [None]:
import re
import io
import os
import sys
import subprocess

ADVANCED_SYSTEM_PROMPT = """You are a Python code assistant with access to ANY Python package.
When asked to compute, analyze, or visualize something, write Python code inside <execute_python> tags.
Use print() to output results. For plots, use matplotlib and call plt.savefig('_output.png') then plt.close().
You can freely use: pandas, numpy, matplotlib, requests, or any other package.

Example:
User: Analyze this data
Assistant: <execute_python>
import pandas as pd
df = pd.DataFrame({'values': [1, 2, 3]})
print(df.describe())
</execute_python>

Always write complete, runnable Python code with print() for all outputs."""


def auto_install_packages(code):
    """Detect import statements and auto-install missing packages."""
    # Find all top-level package names from import statements
    imports = re.findall(r'^\s*(?:import|from)\s+(\w+)', code, re.MULTILINE)
    # Map common import names to pip package names
    pip_name_map = {
        "cv2": "opencv-python",
        "sklearn": "scikit-learn",
        "PIL": "Pillow",
        "bs4": "beautifulsoup4",
    }
    for pkg in set(imports):
        try:
            __import__(pkg)
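        except ImportError:
            # Assumes the pip name equals the import name unless mapped above;
            # a wrong guess can install an unrelated package.
            pip_name = pip_name_map.get(pkg, pkg)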
            print(f"Installing {pip_name}...")
            subprocess.check_call(
                [sys.executable, "-m", "pip", "install", "-q", pip_name],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            print(f"Installed {pip_name} successfully.")


def advanced_execute_code(code):
    """Execute code with full builtins and auto-install support."""
    # Auto-install any missing packages
    auto_install_packages(code)

    # If code uses matplotlib, inject Agg backend
    if "matplotlib" in code or "plt" in code:
        code = "import matplotlib\nmatplotlib.use('Agg')\n" + code

    # Capture stdout
    old_stdout = sys.stdout
    sys.stdout = captured = io.StringIO()

    try:
        exec_globals = {"__builtins__": __builtins__}
        exec(code, exec_globals)
        output = captured.getvalue()
        return output.strip()
    except Exception as e:
        return f"Error: {e}"
    finally:
        sys.stdout = old_stdout


def ask_advanced(question):
    """Send a question to the LLM and execute code with full package support."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=question,
        config=types.GenerateContentConfig(
            system_instruction=ADVANCED_SYSTEM_PROMPT,
        ),
    )

    llm_text = response.text
    print("LLM Response:")
    print(llm_text)
    print("\n" + "=" * 50)

    # Extract code blocks
    code_blocks = re.findall(
        r"<execute_python>(.*?)</execute_python>", llm_text, re.DOTALL
    )

    if not code_blocks:
        print("No code found in response.")
        return

    for code in code_blocks:
        code = code.strip()
        result = advanced_execute_code(code)
        print(f"Execution Output:\n{result}")

        # If a plot was saved, display it
        if os.path.exists("_output.png"):
            from IPython.display import display, Image
            display(Image(filename="_output.png"))
            os.remove("_output.png")

    return result


print("Advanced code execution pipeline ready!")

### 2.1 Scenario: Data Analysis with Pandas

Let's ask the LLM to create and analyze a dataset using **pandas** — the most popular Python data analysis library.

The advanced executor will automatically detect `import pandas` and install it if needed.

In [None]:
# Scenario 2.1: Pandas data analysis
print("SCENARIO 2.1: Data Analysis with Pandas")
print("=" * 50)
ask_advanced(
    "Create a pandas DataFrame with columns Name, Age, Salary, Department for 8 employees. "
    "Then compute: average salary by department, the oldest employee, and sort by salary descending. "
    "Print all results."
);

### 2.2 Scenario: Visualization with Matplotlib

Now let's ask the LLM to generate a **plot**. The pipeline handles this in three steps:
1. The executor injects `matplotlib.use('Agg')` for non-interactive rendering
2. The LLM's code saves the plot to `_output.png`
3. `ask_advanced` displays the file inline with `IPython.display.Image`, then deletes it

In [None]:
# Scenario 2.2: Matplotlib visualization
print("SCENARIO 2.2: Visualization with Matplotlib")
print("=" * 50)
ask_advanced(
    "Using numpy and matplotlib, generate 100 random data points from a normal distribution "
    "(mean=0, std=1). Plot a histogram with 20 bins. Add a title 'Normal Distribution Histogram' "
    "and axis labels. Save the plot with plt.savefig('_output.png') then call plt.close(). "
    "Also print summary statistics: mean, std, min, max of the generated data."
);

### 2.3 Scenario: Web Data Fetching with Requests

The LLM can also write code that fetches **live data from the web**. Here we'll hit a public JSON API and format the results.

In [None]:
# Scenario 2.3: Web data fetching with requests
print("SCENARIO 2.3: Web Data Fetching with Requests")
print("=" * 50)
ask_advanced(
    "Using the requests library, fetch data from https://jsonplaceholder.typicode.com/users "
    "and print a formatted table showing each user's name, email, and city. "
    "Use string formatting to align the columns nicely."
);

### 2.4 Scenario: Text Processing with regex + collections

Not every task needs external packages. Here we use **stdlib modules** (`re`, `collections`) through the advanced pipeline to do text analysis.

In [None]:
# Scenario 2.4: Text processing with stdlib
print("SCENARIO 2.4: Text Processing with regex + collections")
print("=" * 50)

paragraph = (
    "Artificial intelligence is transforming the way we live and work. "
    "Machine learning algorithms can analyze vast amounts of data. "
    "Natural language processing enables computers to understand human language. "
    "Deep learning has achieved remarkable results in image recognition. "
    "The future of AI holds tremendous potential for solving complex problems."
)

ask_advanced(
    f"Given this paragraph: '{paragraph}'\n\n"
    "Use Python to: count word frequencies, find the top 5 most common words, "
    "count the number of sentences, and compute the average word length. "
    "Use the collections module (Counter). Print all results clearly."
);

## Summary

### What We Learned

1. **Fixed tools are limiting** — a calculator agent with only 4 operations can't handle square roots, factorials, or data analysis
2. **Code execution is a universal tool** — instead of building N tools, give the LLM the ability to write and run code
3. **The basic pattern is simple**: system prompt + `<execute_python>` tags + `exec()` with stdout capture
4. **Advanced execution with auto-install** — detect imports, pip install missing packages, run with full builtins
5. **Real-world scenarios** — pandas for data analysis, matplotlib for visualization, requests for web APIs, stdlib for text processing

### Key Takeaways

| Feature | Basic Pipeline | Advanced Pipeline |
|---|---|---|
| Packages | `math`, `statistics` pre-loaded | Any Python package |
| Package installation | Not supported | Auto-detect and install |
| Globals | Full `__builtins__` plus `math`, `statistics` | Full `__builtins__` only |
| Visualization | Not supported | matplotlib with auto-save/display |
| Use case | Simple math, stats | Data analysis, web, visualization |

### Safety Considerations

- **Never run LLM-generated code in production without sandboxing** — use Docker containers, E2B, or similar
- **Review generated code** before execution when possible
- **Restrict network access** if the code doesn't need it
- **Set timeouts** to prevent infinite loops
- **Use read-only file systems** to prevent data corruption
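
As one illustration of the timeout point, the sketch below runs code in a separate Python process with a time limit. The helper name `run_in_subprocess` is hypothetical. A child process is only a process boundary, not a sandbox, so this addresses infinite loops but not file or network access.

```python
import subprocess
import sys


def run_in_subprocess(code, timeout_seconds=5):
    """Run code in a separate Python process; return (ok, output_or_error)."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return False, f"Timed out after {timeout_seconds}s"
    if completed.returncode != 0:
        return False, completed.stderr.strip()
    return True, completed.stdout.strip()


print(run_in_subprocess("print(sum(range(10)))"))
print(run_in_subprocess("while True: pass", timeout_seconds=1))
```

A crash or runaway loop in the generated code then cannot take down the notebook kernel, which `exec()` in the same process cannot guarantee.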

### Next Steps

- Explore **Google's built-in code execution** tool in the Gemini API
- Look into sandboxed execution environments (Docker, E2B, etc.)
- Combine code execution with other tools for a more capable agent
- Add **error recovery** — if code fails, send the error back to the LLM for a retry
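
The error-recovery idea could look roughly like the sketch below. `run_code` and `ask_with_retries` are hypothetical helpers, and `generate` is an assumed callable that takes a prompt string and returns the model's text. In this notebook it would wrap `client.models.generate_content`; here a scripted stand-in is used so the example runs without an API key.

```python
import contextlib
import io
import re
import traceback


def run_code(code):
    """Run code; return (ok, text) with captured stdout or the final traceback line."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # NOT a sandbox
        return True, buffer.getvalue().strip()
    except Exception:
        return False, traceback.format_exc().strip().splitlines()[-1]


def ask_with_retries(question, generate, max_attempts=3):
    """Call generate(prompt) until its code runs cleanly or attempts run out."""
    prompt = question
    result = ""
    for _ in range(max_attempts):
        reply = generate(prompt)
        match = re.search(r"<execute_python>(.*?)</execute_python>", reply, re.DOTALL)
        if match is None:
            ok, result = False, "No <execute_python> block found."
        else:
            ok, result = run_code(match.group(1).strip())
        if ok:
            return result
        # Feed the failure back so the model can correct itself.
        prompt = f"{question}\n\nYour previous code failed with:\n{result}\nPlease fix it."
    return f"Gave up after {max_attempts} attempts. Last error: {result}"


# Scripted stand-in for a model: the first reply has a bug, the second is correct.
replies = iter([
    "<execute_python>print(1 / 0)</execute_python>",
    "<execute_python>print(10 / 4)</execute_python>",
])
print(ask_with_retries("What is 10 / 4?", lambda prompt: next(replies)))
```

Capping the number of attempts matters: without `max_attempts`, a model that keeps producing broken code would loop forever and keep consuming API calls.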