# Shell Tool with OpenAI Responses API

Author: [Priyanshu Deshmukh](https://github.com/priyansh4320)

This notebook demonstrates how to use the shell tool with OpenAI's Responses API. The shell tool lets the model propose shell commands that your integration then executes, so the model can interact with your local machine through a controlled command-line interface.

**Warning: Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow-/deny-lists before forwarding commands to the system shell in production.**
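
To make the allow-list idea concrete, here is a minimal sketch of a command check you could run before anything reaches the system shell. The names `ALLOWED_EXECUTABLES`, `FORBIDDEN_TOKENS`, and `is_command_allowed` are hypothetical and not part of AG2 or the OpenAI API. The list contents are only an illustration, and how such a check is wired into your integration is left open.

```python
import shlex

# Hypothetical allow-list: only these executables may be run.
ALLOWED_EXECUTABLES = {"ls", "cat", "grep", "df", "ps", "python", "pytest"}

# Shell metacharacters that enable chaining, redirection, or substitution.
FORBIDDEN_TOKENS = (";", "&", "|", ">", "<", "`", "$(", "\n")


def is_command_allowed(command: str) -> bool:
    """Return True only if the command is a single call to an allow-listed executable."""
    if any(token in command for token in FORBIDDEN_TOKENS):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    return bool(parts) and parts[0] in ALLOWED_EXECUTABLES


for candidate in ["ls -la", "rm -rf /", "ls; rm -rf /"]:
    print(candidate, "->", is_command_allowed(candidate))
```

This check is deliberately conservative: it also rejects pipes and redirection, which the model may legitimately want to use. A production setup would more likely combine a filter like this with a real sandbox such as a container or a restricted user account.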

## Install AG2 and dependencies

To run this notebook, install AG2 with the `openai` extra.
````{=mdx}
:::info Requirements
Install `ag2` with the `openai` extra:

```bash
pip install ag2[openai]
```

For more information, please refer to the [installation guide](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/installing-ag2).
:::
````

## Setup

First, let's configure the LLM with the Responses API and enable the shell tool.

In [None]:
import os

from autogen import ConversableAgent, LLMConfig

# Configure the LLM with Responses API and shell tool
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)

# Create the assistant agent
assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
    You can use the shell tool to execute commands and interact with the filesystem.
    The local shell environment is on Mac/Linux.
    Keep your responses concise and include command output when helpful.
    """,
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# Create a user proxy agent
user_proxy = ConversableAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
)
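
The configuration above reads the API key with `os.getenv`, which silently returns `None` when the variable is missing. As an optional safeguard, a small helper can fail fast with a clear message instead. The function name `require_env` is hypothetical and not part of AG2.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, `"api_key": require_env("OPENAI_API_KEY")` could replace the `os.getenv` call in the configuration, so a missing key surfaces before any request is made.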

## Example 1: Automating Filesystem Diagnostics

The shell tool is well suited to automating filesystem and process diagnostics. In this example, the assistant finds the largest PDF file in a directory and shows information about running Python processes.

In [None]:
# Example 1: Find the largest PDF and show processes
result = assistant.initiate_chat(
    recipient=user_proxy,
    message="""
    Please help me with the following tasks:
    1. Find the largest PDF file in the current directory (or create a test directory with PDFs)
    2. Show me information about running Python processes
    """,
    max_turns=5,
)

## Example 2: Extending Model Capabilities with UNIX Utilities

The shell tool extends the model's capabilities by letting it use built-in UNIX utilities, the Python runtime, and other CLIs available in your environment. This enables the model to perform tasks that require system-level operations.

In [None]:
# Example 2: Use UNIX utilities and Python CLI
result = assistant.initiate_chat(
    recipient=user_proxy,
    message="""
    Please help me:
    1. Check the current Python version using the python CLI
    2. Get system information like disk usage and memory
    3. Create a simple text file and then use grep to search within it
    """,
    max_turns=6,
)

print("\n" + "=" * 80)
print("Chat Summary:")
print("=" * 80)
print(result.summary)

## Example 3: Multi-Step Build and Test Flows

The shell tool excels at running multi-step build and test flows, chaining commands together to complete complex workflows. In this example, we'll set up a Python project, install dependencies, and run tests.

In [None]:
# Example 3: Multi-step build and test flow
result = assistant.initiate_chat(
    recipient=user_proxy,
    message="""
    Please help me set up a simple Python project:
    1. Create a directory called 'test_project'
    2. Create a simple Python module with a function to test
    3. Create a test file using pytest format
    4. Install pytest if needed
    5. Run the tests and show me the results
    """,
    max_turns=8,
)

print("\n" + "=" * 80)
print("Chat Summary:")
print("=" * 80)
print(result.summary)

## Notes

- The shell tool executes commands immediately when `shell_call` items are generated by the model
- Each `shell_call` can contain multiple commands that are executed concurrently
- Commands support timeouts and output length limits
- Always be cautious when executing shell commands, especially in production environments
- Consider implementing sandboxing or command allow-lists for security
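
The notes on concurrency, timeouts, and output limits can be illustrated with a small standalone executor. This is a sketch under stated assumptions, not how AG2 or OpenAI implement the shell tool internally. The names `run_command`, `run_commands_concurrently`, and `MAX_OUTPUT_CHARS`, as well as the default limits, are hypothetical. It assumes a POSIX shell, matching the Mac/Linux environment used in this notebook.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_OUTPUT_CHARS = 4096  # hypothetical output limit, in characters


def run_command(command: str, timeout_s: float = 10.0, max_output_chars: int = MAX_OUTPUT_CHARS) -> dict:
    """Run one shell command with a timeout and truncate its captured output."""
    try:
        completed = subprocess.run(
            command,
            shell=True,  # the command string is interpreted by the system shell
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return {"command": command, "exit_code": None, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "command": command,
        "exit_code": completed.returncode,
        "timed_out": False,
        "stdout": completed.stdout[:max_output_chars],
        "stderr": completed.stderr[:max_output_chars],
    }


def run_commands_concurrently(commands: list[str], timeout_s: float = 10.0) -> list[dict]:
    """Run several commands in parallel threads; results keep the input order."""
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda c: run_command(c, timeout_s), commands))


results = run_commands_concurrently(["echo hello", "echo world"])
for r in results:
    print(r["command"], "->", r["stdout"].strip())
```

Threads are sufficient here because each worker spends its time waiting on a subprocess. Because `shell=True` hands the raw string to the shell, a function like this should only ever receive commands that have already passed your sandboxing or allow-list checks.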